This page is intended to answer questions that developers, or other technical people, may have about Constellate.

How do I log in?

  • Navigate to the dataset dashboard and click “Log in via MyJSTOR.” If you don’t have a MyJSTOR account, you will be prompted to create one from the MyJSTOR account page. Logged-in users can save the datasets they have built with the Constellate application and access their dashboard across devices and browsers. A login is not currently required for the Analytics Lab.

How do I find example datasets?

How do I launch a notebook?

  • A notebook can be launched from any dataset on the dashboard page by clicking the Analyze button. Notebooks may also be launched from our textbook, which is available under the “Tutorials” link in the navigation bar.

How do I access the raw data for a dataset?

  • In the Constellate application, each dataset on the dashboard has a download button that lets you download the raw data behind it. For all documents, this includes the metadata and ngrams (phrases of up to three words). Documents that are out of copyright or openly available also include the full text.
  • Within the Analytics Lab, the included Constellate client library will download the raw data automatically to the “data” directory in the home directory of your project.
  • The raw data is structured according to the Constellate Document Format, a JSON-based schema that includes metadata and text data (ngrams, or full text when available) for each document on the Constellate platform. A detailed description of each attribute is available, and the JSON document schema can be downloaded or referenced in code.
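As a rough illustration of working with the raw data, the sketch below reads a downloaded dataset file one document at a time and totals word counts across all documents. It assumes the file is gzip-compressed JSON Lines (one JSON document per line) and that each document carries a `unigramCount` field mapping words to counts; check these names against the published schema before relying on them. The file name `my-dataset.jsonl.gz` is hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_documents(path: Path) -> Iterator[dict]:
    """Yield one document dict per non-empty line of a gzipped JSON Lines file."""
    # Assumption: the dataset file is gzip-compressed JSON Lines.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def total_unigrams(path: Path) -> Counter:
    """Sum the per-document unigram counts across the whole dataset."""
    totals: Counter = Counter()
    for doc in read_documents(path):
        # Assumption: "unigramCount" maps each word to its count in the document.
        # Counter.update with a mapping adds the counts rather than replacing them.
        totals.update(doc.get("unigramCount", {}))
    return totals


if __name__ == "__main__":
    # Hypothetical file name inside the "data" directory of the home directory.
    dataset_file = Path.home() / "data" / "my-dataset.jsonl.gz"
    if dataset_file.exists():
        print(total_unigrams(dataset_file).most_common(10))
```

Reading line by line keeps memory use flat, which matters for datasets with tens of thousands of documents.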

Can I save a notebook and return to it later?

  • The answer is both no and yes. Sessions in the Analytics Lab don’t persist, so to save a notebook, download it to your local machine. To use it later, upload it back into the notebook environment.
  • However, as of June 2021, the Analytics Lab is based on the open-source BinderHub software. This means you can run your own GitHub repositories (and repositories from a number of other providers) within the environment. The easiest way to get started is probably to fork our repository of tutorial notebooks and either modify those notebooks or create your own.
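BinderHub launches GitHub repositories through a standard URL path of the form `/v2/gh/<owner>/<repo>/<ref>`. The sketch below assembles such a URL, assuming the Analytics Lab exposes that standard BinderHub path; `binder_launch_url` is a hypothetical helper, and the base URL and repository names in the example are placeholders, not the Lab’s actual address.

```python
def binder_launch_url(base_url: str, owner: str, repo: str, ref: str = "HEAD") -> str:
    """Build a BinderHub launch URL for a GitHub repository.

    Follows BinderHub's /v2/gh/<owner>/<repo>/<ref> convention. base_url is a
    placeholder for whichever BinderHub deployment you are targeting.
    """
    # Strip any trailing slash so the path joins cleanly.
    return f"{base_url.rstrip('/')}/v2/gh/{owner}/{repo}/{ref}"


if __name__ == "__main__":
    # Placeholder deployment and a hypothetical fork of the tutorial notebooks.
    print(binder_launch_url("https://binder.example.org", "my-user", "my-notebooks", "main"))
```

Pinning `ref` to a branch, tag, or commit makes the launched environment reproducible; leaving it as `HEAD` follows the repository’s default branch.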

What Python libraries are available?

  • If you are working with the Constellate tutorial notebooks, you can view a current list of installed libraries in the requirements file.
  • You can always install a package by running a command from within the notebook session, for example: !pip install my-package
  • If you think there is a package we should install by default, please reach out to us at
  • If you are launching your own repository in the Constellate Analytics Lab environment, you can use the methods documented by the MyBinder project, which allow for a lot of flexibility.
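The same install can be scripted from Python so that a notebook only installs what is missing. This is a standard-library sketch; `ensure_package` is a hypothetical helper name, and the package names are only examples.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(import_name: str, pip_name: Optional[str] = None) -> bool:
    """Install a package with pip only if it cannot already be imported.

    Returns True if an install was run, False if the module was already available.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    # Invoking pip through sys.executable targets the interpreter running this
    # kernel, so the package lands in the same environment the notebook uses.
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or import_name])
    return True
```

The optional second argument covers packages whose import name differs from their pip name, for example `ensure_package("yaml", "pyyaml")`.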

What are the dataset limits?

  • Users from organizations participating in the beta program can work with datasets of up to 50,000 documents. All other users are limited to 25,000 documents. See our dataset options page for more details.