The Constellate Dataset Builder features content from each of the providers below. (Some providers also contain overlapping content.)  All content in Constellate is available to you for analysis, regardless of whether your institution subscribes to the content for access. You may build visualizations by publisher for any of the content provided by JSTOR or Portico.  In addition, you may browse and select for analysis any of the serial titles within Constellate.

JSTOR

JSTOR is a digital library for the intellectually curious. We help everyone discover, share, and connect valuable ideas.

Content Type: journal articles, book chapters, research reports, pamphlets
Size: over 12 million documents
Document Publication Date Distribution:

Distribution of JSTOR Content in Constellate by Year from 1700-Current

Metadata Quality: High
Text Accuracy: High
Download Availability:

Collection                              Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Early Journal Content (EJC)             yes       yes       yes      yes       yes
Archive Collections (outside of EJC)    yes       yes       yes      yes       no
Open Access Books                       yes       yes       yes      yes       yes
Research Reports                        yes       yes       yes      yes       yes
19th Century British Pamphlets          yes       yes       yes      yes       no
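
The Unigrams, Bigrams, and Trigrams columns refer to counts of one-, two-, and three-word sequences in a document. As a rough illustration only, the sketch below derives such counts from a short sample string. The function name `ngram_counts` and the lowercase, whitespace-based tokenization are assumptions made for this example; the tokenization Constellate actually uses is not described here and may differ.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-word sequences using naive lowercase whitespace tokenization.

    Hypothetical helper for illustration; not Constellate's own method.
    """
    tokens = text.lower().split()
    # Zip the token list against shifted copies of itself to form n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


sample = "the moving wall is the moving wall"
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)

print(unigrams["wall"])             # 2
print(bigrams["moving wall"])       # 2
print(trigrams["the moving wall"])  # 2
```

Counts like these support word-frequency analysis even for collections where the full text itself cannot be downloaded.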

Important Considerations: Most of JSTOR’s journal titles are impacted by the “moving wall.” The moving wall is an agreement with the journal publishers on how many years behind the current year JSTOR will remain. For example, if it is 2020, a journal with a 5-year moving wall will only have content through 2014 available for access in JSTOR and analysis in Constellate.
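
The arithmetic in the example above can be sketched as a small function. This assumes the current, incomplete year is not counted toward the wall, which is the reading consistent with the 2020 example; the function name `last_available_year` is hypothetical, and individual journals may be handled differently.

```python
def last_available_year(current_year, moving_wall_years):
    """Estimate the most recent publication year available for a journal.

    Assumption: the current (incomplete) year is excluded before the
    moving wall is applied, hence the extra "- 1".
    """
    if moving_wall_years < 0:
        raise ValueError("moving_wall_years must be zero or greater")
    return current_year - moving_wall_years - 1


print(last_available_year(2020, 5))  # 2014, matching the example above
```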

Portico

Portico works with libraries and publishers to preserve scholarly content.

Content Type: journal articles, book chapters, full books
Size: over 16 million documents
Document Publication Date Distribution:

Distribution of Portico Content in Constellate by Year from 1700-Current

Metadata Quality: Variable
Text Accuracy: High
Download Availability:

Content       Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Journals      yes       yes       yes      yes       no
Books         yes       yes       yes      yes       no

Important Considerations:

  • Full text from Portico is not currently available for download.
  • Some Portico books are available for text analysis as chapters; others are available only as full books.
  • Portico’s content is not impacted by a moving wall. In general, Portico has content preserved within a month or two of publication.

The Portico publishers that have chosen to include their content in Constellate are:

  • Academicus - International Scientific Journal
  • Academy of Science of South Africa
  • Begell House
  • Berghahn
  • BioOne
  • Bond University
  • The British Editorial Society of Bone & Joint Surgery
  • Cambridge University Press
  • Copernicus Publications
  • CSIRO Publishing
  • Edinburgh University Press
  • EDP Sciences
  • Edward Elgar Publishing
  • Emerald Group Publishing
  • F1000 Research
  • Guilford Publications
  • HBKU Press (QScience)
  • Hindawi
  • Indiana University Press
  • International Association of Online Engineering
  • The Institute for Business and Finance Research
  • John Benjamins Publishing Company
  • John Wiley & Sons, Inc.
  • LED Edizioni Universitarie
  • Michigan State University Press
  • Microbiology Society
  • Morgan & Claypool Publishers
  • Nomos Verlagsgesellschaft mbH & Co. KG
  • Nordic Open Access Scholarly Publishing
  • Philosophy Documentation Center
  • Pluto Press
  • Project MUSE
  • Royal Irish Academy
  • The Royal Society
  • SAGE Publications
  • Society for Imaging Science & Technology
  • Thieme Publishing Group
  • Universitetsforlaget / Scandinavian University Press
  • University of California Press
  • University of Huddersfield Press
  • University of Kansas Libraries
  • University of South Florida
  • Vittorio Klostermann

Chronicling America

Chronicling America provides historic newspaper pages from 1789 to 1963.

Content Type: newspaper issues
Size: over 1 million documents
Document Publication Date Distribution:

Distribution of Chronicling America Content in Constellate by Year from 1700-Current

Metadata Quality: High
Text Accuracy: Variable
Download Availability:

Content       Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Newspapers    yes       yes       yes      yes       yes

Important Considerations:

  • The OCR in Chronicling America is highly variable and that variability is not necessarily tied to age (e.g., we have identified very old issues with quality OCR and more recent issues with poor OCR).
  • This is a freely available collection of content. If you want all of the content locally in the format Constellate delivers, you are welcome to download it from Constellate over several days, at up to 10 datasets a day, with each dataset holding up to 25,000 documents (or 50,000 if you are at a participating institution). You may also download all of it in its original format directly from the Library of Congress.
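
To see what the download limits above mean in practice, the following sketch estimates how many datasets and days a full download would take. It assumes the 25,000 and 50,000 figures count documents per dataset and uses 1,000,000 documents as a lower bound for the collection size ("over 1 million"); the function name `download_plan` is hypothetical.

```python
DATASETS_PER_DAY = 10  # daily dataset limit stated in the text


def download_plan(total_documents, documents_per_dataset):
    """Return (datasets_needed, days_needed) for a full download.

    Uses integer ceiling division so partial datasets and partial days
    are rounded up.
    """
    datasets = -(-total_documents // documents_per_dataset)
    days = -(-datasets // DATASETS_PER_DAY)
    return datasets, days


print(download_plan(1_000_000, 25_000))  # (40, 4)
print(download_plan(1_000_000, 50_000))  # (20, 2)
```

Because the collection holds more than 1 million documents, the real figures will be somewhat higher than these estimates.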

Doc South

Documenting the American South (Doc South) is a digital publishing initiative from the University of North Carolina at Chapel Hill that provides access to texts, images, and audio files related to Southern history, literature, and culture.

Content Type: Documents, Books
Document Publication Date Distribution:

Distribution of Doc South Content in Constellate by Year from 1700-Current (note: this is the date of production, not the date of publication)

Size: ~600 documents
Metadata Quality: High
Text Accuracy: High
Download Availability:

Content       Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Documents     yes       yes       yes      yes       yes
Books         yes       yes       yes      yes       yes

Important Considerations:

  • DocSouth has made four collections available for text analytics: The Church in the Southern Black Community, First-Person Narratives of the American South, Library of Southern Literature, and North American Slave Narratives.
  • This is a freely available collection of content. If you want to download all of the content locally and need it in the format we deliver, you are welcome to download it from Constellate. You may download these four collections in their original format directly from Documenting the American South, as well.

South Asia Open Archives

South Asia Open Archives is a free open-access resource for research and teaching—a rich and growing curated collection of key historical and contemporary sources in arts, humanities and social sciences, from and about South Asia, in English and other languages of the region. SAOA's collection currently contains hundreds of thousands of pages of books, journals, newspapers, census data, magazines, and documents, with particular focus on social & economic history, literature, women & gender, and caste & social structure.

Content Type: journal articles, reports, newspapers, periodicals, pamphlets, and surveys
Size: ~13,000 documents
Document Publication Date Distribution:

Metadata Quality: High
Text Accuracy: Medium
Download Availability:

Content       Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Documents     yes       yes       yes      yes       no

Important Considerations:

  • The South Asia Open Archives contains a rich variety of content in 27 different languages. The most popular languages are Bengali, English, Urdu, Tamil, and Hindi.
  • The collection materials are free and open-access on JSTOR.

Reveal Digital

The Reveal Digital collection contains materials from Independent Voices, an open access digital collection of alternative press newspapers, magazines and journals, drawn from the special collections of participating libraries. These periodicals were produced by feminists, dissident GIs, campus radicals, Native Americans, anti-war activists, Black Power advocates, Hispanics, LGBT activists, the extreme right-wing press and alternative literary magazines during the latter half of the 20th century.

Content Type: newspapers, magazines, journals
Size: ~19,000 documents
Document Publication Date Distribution:

Metadata Quality: High
Text Accuracy: High
Download Availability:

Content       Metadata  Unigrams  Bigrams  Trigrams  Full-Text
Documents     yes       yes       yes      yes       yes

Your Own Institutional Content

We have received requests from institutions to load their own content into the platform, making it available either only to their constituents or to anyone using the platform. This is a feature we are considering building. If this is of interest to you, please contact us; we would love to brainstorm with institutions about this possibility.

Alternatively, if you are an individual with a collection of content you hold locally, you may load it into our Analytics Lab to work with on its own or side-by-side with datasets built on our platform. Please note that our Analytics Lab instances are ephemeral and should not be considered to have any permanency (e.g., if you load content and then walk away for 10 minutes, your session will be discontinued and you will need to start a new session and re-upload your content).

Suggestions

We would like to continue to increase the variety and amount of content available for analysis.  If you have specific requests, please let us know at tdm@ithaka.org.