Metadata

MathSciNet is full of metadata. We create our own metadata, and we receive metadata from many of the publishers of the journals we cover. So what are metadata? (Or what is metadata?) The simplest explanation is that metadata are data that describe other data. The classic example is the metadata found in library card catalogs.

[Figure: A card from a card catalog, with annotations]

There is a lot of information on the card. Note that, apart from the added annotations, nothing on the card is labeled. Accepted rules tell a librarian (or a patron) what each piece of data is. Even so, a non-librarian could probably work out what most of it means.

In an online catalog (such as from the Library of Congress), you might see:

Personal name: Malinowski, Bronislaw, 1884-1942.
Title: Magic, science and religion and other essays
Published/Created: Boston, Beacon Press, 1948.
LCCN Permalink:  https://lccn.loc.gov/48006987
Description: xii, 327 p. front. 22 cm.
LC classification (full): GN8 .M286
LC classification (partial): GN8
Related names: Redfield, Robert, 1897- ed.
Contents: Magic, science and religion.–Myth in primitive psychology.–Baloma: the spirits of the dead in the Trobriand Islands.–The problem of meaning in primitive language.–An anthropological analysis of war.
Subjects: Anthropology.
Notes: Bibliographic footnotes.
LCCN: 48006987
Dewey class no. 572.04
Type of material: Book

Notice that now the data fields are all labeled.

However, in modern libraries there is more information than (normally) meets the eye. Here is a more detailed view of the same digital record, in MARC format:

000 01142cam a22002891 4500
001 8352402
005 20050721194408.0
008 750617s1948 mau b 000 0 eng
035 __ |9 (DLC) 48006987
906 __ |a 7 |b cbc |c oclcrpl |d u |e ncip |f 19 |g y-gencatlg
010 __ |a 48006987
035 __ |a (OCoLC)1395532
040 __ |a DLC |c FTS |d OCoLC |d DLC
042 __ |a premarc
050 00 |a GN8 |b .M286
082 __ |a 572.04
100 1_ |a Malinowski, Bronislaw, |d 1884-1942.
245 10 |a Magic, science and religion, and other essays; |c selected and with an introd. by Robert Redfield.
260 __ |a Boston, |b Beacon Press, |c 1948.
300 __ |a xii, 327 p. |b front. |c 22 cm.
504 __ |a Bibliographical footnotes.
505 0_ |a Magic, science and religion.--Myth in primitive psychology.--Baloma: the spirits of the dead in the Trobriand Islands.--The problem of meaning in primitive language.--An anthropological analysis of war.
650 _0 |a Anthropology.
700 1_ |a Redfield, Robert, |d 1897- |e ed.
985 __ |e OCLC REPLACEMENT cdsdistr
991 __ |b c-GenColl |h GN8 |i .M286 |t Copy 1 |w OCLCREP
991 __ |b c-GenColl |h GN8 |i .M286 |p 00017792015 |t Copy 2 |w CCF

Wow! Lots more information! Moreover, I can’t understand most of it. It is all labeled, but in a secret code. How is this helpful? Well, MARC stands for “MAchine-Readable Cataloging”. So I’m not supposed to be able to understand this, but a computer can parse it easily. This is an example of metadata for the digital world. (The same record is also available here as an XML file, in case you want to be unable to read it in another format.)
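To illustrate why this layout is easy for a program, here is a minimal Python sketch that parses the MARC display shown above into tags, indicators, and subfields. It assumes exactly the text layout used in this post: a three-digit tag; for tags 010 and up, a two-character indicator pair (with _ standing for a blank); and subfields introduced by | followed by a one-character code. Real MARC records are exchanged as ISO 2709 files or as MARCXML, and existing libraries (pymarc, for example) read those formats. This toy parser handles only the display text, and the names Field, parse_display_line, and subfield_values are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    tag: str
    indicators: str = ""  # two characters; "_" marks a blank indicator
    subfields: list[tuple[str, str]] = field(default_factory=list)
    data: str = ""  # used only by the leader (000) and control fields (001-009)


def parse_display_line(line: str) -> Field:
    """Parse one line of the display format shown above (a toy format, not ISO 2709)."""
    tag, _, rest = line.partition(" ")
    if tag < "010":
        # Leader and control fields carry raw data: no indicators, no subfields.
        return Field(tag=tag, data=rest)
    indicators, _, rest = rest.partition(" ")
    subfields = []
    # Each chunk after a "|" begins with a one-character subfield code.
    for chunk in rest.split("|")[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return Field(tag=tag, indicators=indicators, subfields=subfields)


def subfield_values(fields: list[Field], tag: str, code: str) -> list[str]:
    """Collect every value of subfield `code` in fields with the given tag."""
    return [v for f in fields if f.tag == tag for c, v in f.subfields if c == code]


# A few lines copied from the record above.
record_text = """\
001 8352402
100 1_ |a Malinowski, Bronislaw, |d 1884-1942.
245 10 |a Magic, science and religion, and other essays; |c selected and with an introd. by Robert Redfield.
260 __ |a Boston, |b Beacon Press, |c 1948.
650 _0 |a Anthropology.
"""

fields = [parse_display_line(line) for line in record_text.splitlines() if line.strip()]
print(subfield_values(fields, "100", "a"))  # ['Malinowski, Bronislaw,']
print(subfield_values(fields, "260", "b"))  # ['Beacon Press,']
```

Every element carries a machine-readable label (tag 100 is the main personal-name entry, tag 260 holds publication information, and subfield b of 260 is the publisher's name), so a program can pick out exactly the piece it needs without guessing from position or punctuation.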

Metadata at Mathematical Reviews

In the right hands, metadata are pieces of information that are developed, structured, and maintained to describe materials in ways that meet the particular needs of a group of users. At Mathematical Reviews, the metadata for each bibliographic entry in MathSciNet are created to serve the research needs of mathematicians, librarians, and others who work with the mathematics literature. They describe each item in the database by its type of publication, creators and publisher, length, edition, online availability, subject area (using the Mathematics Subject Classification, or MSC), and other identifying characteristics.

MathSciNet metadata provide consistent, well-structured information about what has been published in mathematics over time and about how publications are related to each other through elements such as authors, publication date, subject matter, references, and editions. Staff members at Mathematical Reviews work to ensure the consistency and accuracy of the metadata created for roughly 120,000 items each year. High-quality metadata are essential to many features of MathSciNet, including author and publication searches, the Citation Database, Author Profile pages, and the rich and growing set of links within MathSciNet that enable efficient and accurate exploration of the mathematical literature.

The creation of bibliographic metadata at Mathematical Reviews takes place at two levels. The first level is the bibliographic description of books, journals, and journal issues; here the work resembles cataloging at a library. The second level is the description of individual papers within journal issues and book collections. Here the work can be quite different: it deals with finer detail and a narrower range of data.

Cataloging principles and standards guide the bibliographic description of materials at Mathematical Reviews. One important set of principles is found in the Resource Description and Access (RDA) standards. (See also this page from the Library of Congress.) These and other principles and standards must be interpreted and incorporated into a cataloging framework designed to meet the information needs of the mathematical community. Adhering to them allows us to maintain continuity and consistency in MathSciNet across all the literature we cover, and with the wider bibliographic world (e.g., your library).

Preliminary Data

Publishers create metadata for their publications. One of the earliest resources for collecting such metadata was Bowker’s Books in Print. (See also this.) Years ago, it was produced annually as a bound volume that you could find in your library or bookstore. It attempted to list the bibliographic information for every book in print in a given year. It was a book about books: metadata! For scholarly and academic journals, there was Ulrich’s Periodicals Directory. Bowker’s and Ulrich’s had information only from the publishers who supplied it. Both resources were widely used in libraries, for purchasing as well as for identification and awareness. Bowker’s was also heavily used by bookstores and book distributors, particularly for making purchases. Needless to say, it was a good idea for publishers to provide data to Bowker’s and to Ulrich’s. Both are now available only online.

For books, Bowker’s established a format for the electronic delivery of metadata. Data in this format were frequently used by both online and brick-and-mortar bookstores. Other formats exist, such as the one amazon.com uses. Library of Congress records like the one shown above are also important. While preparing a book for publication, the publisher applies for Cataloging in Publication (CIP) data. The record the Library of Congress returns to the publisher becomes an important component of the metadata attached to the publication.

Mathematical Reviews receives metadata from publishers through our Preliminary Data program. Publisher metadata received through this program are used to create preliminary entries in MathSciNet while editorial decisions, cataloging, and classification are being completed. These entries are marked in MathSciNet with a preliminary-data icon. Preliminary entries make information about new publications available in MathSciNet several weeks sooner.

While preliminary data accelerate the posting of items to MathSciNet, it is not the case that the data arrive and we can just blithely post away! Data arrive in all sorts of formats, and with various extras. Upon arrival, preliminary data go through an initial check, during which cover images, front and back matter, and other non-article material are removed. Duplicates are also removed, and issue- and journal-level information is verified and corrected as needed. The papers are then sent to our editors, who select those that will receive permanent listings in MathSciNet.
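The removal of non-article material and duplicates during the initial check can be pictured as a filter over an incoming batch. The following Python sketch is purely illustrative: the item categories, the IncomingItem record, and the rule of deduplicating by DOI (falling back to a normalized title) are assumptions made for this example, not a description of the actual Mathematical Reviews workflow. It does not model the verification of issue- and journal-level information or the editors' selection.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for material that is not a research article. The real
# categories and rules used at Mathematical Reviews are not described here.
NON_ARTICLE_KINDS = {"cover", "front-matter", "back-matter"}


@dataclass(frozen=True)
class IncomingItem:
    kind: str  # hypothetical item type supplied by the publisher
    title: str
    doi: Optional[str]  # DOI, if the publisher sent one


def dedup_key(item: IncomingItem) -> str:
    """Prefer the DOI; otherwise fall back to a case- and whitespace-normalized title."""
    if item.doi:
        return "doi:" + item.doi.lower()
    return "title:" + " ".join(item.title.lower().split())


def initial_check(items: list[IncomingItem]) -> list[IncomingItem]:
    """Drop non-article material and duplicates, keeping the first copy of each item."""
    seen: set[str] = set()
    kept: list[IncomingItem] = []
    for item in items:
        if item.kind in NON_ARTICLE_KINDS:
            continue
        key = dedup_key(item)
        if key in seen:
            continue
        seen.add(key)
        kept.append(item)
    return kept


# A made-up incoming batch (the DOIs are fictitious).
batch = [
    IncomingItem("cover", "Cover image", None),
    IncomingItem("article", "On the Riemann zeta function", "10.9999/example.1"),
    IncomingItem("article", "On the Riemann Zeta Function", "10.9999/EXAMPLE.1"),
    IncomingItem("back-matter", "Index", None),
    IncomingItem("article", "A note on  planar graphs", None),
]
print([item.title for item in initial_check(batch)])
# ['On the Riemann zeta function', 'A note on  planar graphs']
```

Keying on the DOI when one is present is a natural choice because a DOI identifies a single item regardless of how its title is capitalized or spaced; comparing normalized titles is only a rough fallback and could merge distinct papers that happen to share a title.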

The selected papers move through the departments at Mathematical Reviews, and the bibliographic data are checked, edited, and enhanced at each point. Discrepancies between the data received and the published online version are addressed, and additional information about the paper may be added. Author disambiguation is done, links to Author Profile pages are made, institutional codes are added, and classifications are assigned. When these steps are completed, the preliminary entry is replaced with its permanent and complete MathSciNet entry. Articles from approximately 800 journals now arrive through our Preliminary Data program. We continue to add journals to the program and look forward to working with additional publishers in the future.

Accurate, timely, and consistent metadata are integral to the information and services we provide to the mathematical community. MathSciNet metadata development is an ongoing process accomplished by experienced and dedicated staff throughout Mathematical Reviews.

 


I am very grateful to Kathy Wolcott,  Librarian and Manager of the Mathematical Reviews Acquisitions Department, for help with this post.  Large swathes of this post are plagiarized verbatim from a document written by her.


Is “data” singular or plural?  For that question, I defer to the wisdom of xkcd.

About Edward Dunne

I am the Executive Editor of Mathematical Reviews. Previously, I was an editor for the AMS Book Program for 17 years. Before working for the AMS, I had an academic career working at Rice University, Oxford University, and Oklahoma State University. In 1990-91, I worked for Springer-Verlag in Heidelberg. My Ph.D. is from Harvard. I received a world-class liberal arts education as an undergraduate at Santa Clara University.
This entry was posted in General information.
