Are you interested in learning more about participation and roll-out? Read on...

The demand for analytics skills is growing rapidly across all domains. Text and data analysis is one of those skills, yet it remains difficult to learn. Researchers and students are often drawn in by black-box, point-and-click tools that produce a few quick visualizations and whet the appetite; the next step, however, is a steep one that requires learning statistics and programming.

Our primary goal is to make it easier for anyone to learn these skills by creating a learning platform that empowers faculty, librarians, and other instructors to educate a generation in text and data analysis. It provides users with the ability to build datasets for analysis from a variety of sources and provides a gathering space for the growing community of practitioners.

Our solution is centered on student and researcher success, providing text and data analysis capabilities and access to content from some of the world’s most respected databases in an open environment with a variety of teaching materials that can be used, modified, and shared.

Summary of the Platform

The platform provides value to users in three core areas -- they can teach and learn text analytics, build datasets from across multiple content sources, and visualize and analyze their datasets:

Learn & Teach

  • Template and Tutorial Code: Work with template Jupyter Notebooks to analyze your dataset and learn about text analytics (with additional environments, such as RStudio, forthcoming).
  • Lessons and Documentation: Lessons and educational materials created by a community of experts, including those from the NEH-funded Text Analysis Pedagogy Institutes.
  • Collaborative Teaching Materials Creation: Users may create, edit, reuse, and collaborate on tutorials, code, documentation, and other educational resources for text analysis (our tutorial notebooks are all available on GitHub, in addition to being accessible for use in our Analytics Lab).


Build Datasets

  • Multiple Collections: Anchor collections from JSTOR and Portico, with additional content sources continually added (such as the Library of Congress's Chronicling America). Further details about the collections are available.
  • Data Download in JSON
    • All content - bibliographic metadata, unigrams, bigrams, trigrams
    • Open content - bibliographic metadata, unigrams, bigrams, trigrams, full-text
  • Dataset Dashboard: Easily view datasets you have built or accessed.
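
As a rough illustration of working with a downloaded dataset, the sketch below sums per-document unigram counts across a small sample. It assumes one JSON document per line and uses hypothetical field names ("id", "title", "unigramCount"); the platform's actual file layout and field names may differ, so check a downloaded file before adapting it.

```python
import io
import json
from collections import Counter

# Hypothetical sample data: one JSON document per line.
# The field names ("id", "title", "unigramCount") are assumptions
# made for illustration, not a documented schema.
SAMPLE = io.StringIO(
    '{"id": "doc-1", "title": "First", "unigramCount": {"text": 3, "data": 1}}\n'
    '{"id": "doc-2", "title": "Second", "unigramCount": {"text": 2, "mining": 4}}\n'
)


def total_unigrams(lines):
    """Sum per-document unigram counts across every record in a dataset."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Counter.update adds counts when given a mapping of token -> count.
        totals.update(record.get("unigramCount", {}))
    return totals


totals = total_unigrams(SAMPLE)
print(totals.most_common(2))  # [('text', 5), ('mining', 4)]
```

To run this against a real download, replace SAMPLE with an open file handle, since the function accepts any iterable of lines.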


Visualize & Analyze

  • Analytics Lab: An integrated computational environment, powered by BinderHub, that lets users analyze text content using the provided template Jupyter Notebooks and tutorials.
  • Visualize: Built-in visualizations for your datasets.
  • Work with Rights Restricted Full-Text: Access to substantial compute cycles to work with the full text of rights restricted content (forthcoming in late 2021 -- until then, it is possible to request JSTOR content through a personal agreement).

Roll Out

We are rolling out the subscription service by offering a six-month beta evaluation to institutions that participate in JSTOR or Portico. It is important to us that the platform be as widely available as possible while also covering our costs. To that end, a free tier of service will always be available to individuals, and it improves on JSTOR’s self-service Data for Research (DfR) functionality (see our documentation about the differences between this new platform and DfR).

Institutional participants in the free trial will be able to provide their users with additional computational power in the Analytics Lab and participate in training sessions:

| Feature | Non-Trial Users | Trial Participants |
| --- | --- | --- |
| Build & visualize datasets up to a specified number of items | 25K | 50K |
| Download datasets up to a specified number of items | 25K | 50K |
| View and download built-in visualizations for datasets | | |
| Access to computational environment resources sufficient for: | Learning | Teaching & research |
| Computational environment with learn-to-text-mine notebooks | | |
| Compute environment - CPUs | <Core Tier | 4 cores |
| Compute environment - maximum memory | 2 GB | 8 GB |
| Unlimited simultaneous users in computational environment | | |
| Adopt, adapt, and contribute tutorials and documentation | | |
| Run institutional users’ (instructors, students, etc.) repositories of code in our computational environment | | |
| Attend our Train-the-Trainer workshops | | |

This free beta evaluation period will help institutions gauge the demand for this tool on their campuses. It will also help us assess how much usage the platform may see, so that we can estimate costs more accurately and determine appropriate fees.

If you are interested in signing up for the free trial, please contact us at

Subscription Service

In the second half of 2021, we expect to offer institutions subscriptions to a paid tier of service sized to be used for teaching and learning. We do not yet have pricing for these subscriptions. We want to balance the need to both cover our costs and keep these subscriptions reasonably priced. The beta evaluation period associated with our 2021 launch will help us and our institutions evaluate both cost and value. By the end of 2021, we plan to offer an additional tier of service aimed at meeting the more substantial demands of advanced researchers requiring computing power and access to the full text of rights restricted content. If you are an advanced researcher interested in exploring with us what might meet your needs, please let us know at