Building a coherent dataset is key to any text analysis. If you are just exploring, we recommend building a tightly focused, subject-specific dataset. You can do this with robust filtering, but you might also want to focus on one or two publications.
Constellate provides a variety of filtering options to help you build a dataset appropriate to your research:
- Keyword: This field is expansive and lets you use many search and Boolean operators (read about the Constellate search syntax). To search for a multi-word phrase, put quotes around it. For example, "foster care" AND (adopt* OR reunification OR reunifies OR reunify) will find all documents that contain the phrase foster care and also contain either a word that starts with adopt or one of the listed forms of reunify.
- Publication Title(s): This field lets you limit your dataset to a specific set of journals, newspapers, or books -- start typing the title and matches will be presented to you for selection. Note that Constellate contains tens of thousands of journals, newspapers, and books, so you may want to choose “Browse titles” to browse all of the choices more easily and select the titles of interest to you.
- Publication Dates: Constellate has content available for analysis published between 1700 and the current year. Please note that recent content from JSTOR is affected by the moving wall. In addition, some of the Portico content that looks like it was published very early appears to have a metadata problem and was actually published more recently (a good example of why it's important to understand your dataset when doing text analysis!).
- Languages: This is a multi-select menu, so start typing the language in which you are interested and then select it when it appears.
- Document type: You may not want to include all document types in your dataset (does it make sense for your research to include both historic newspaper articles and current journal articles?). If not, limit the dataset to specific types here.
- Provider: While the cornerstones of the content available for analysis in Constellate originate with JSTOR and Portico, we also have content from other providers. You may read a detailed explanation of our providers and their content.
- Category: We have used machine learning to assign categories to journal and book content in JSTOR and Portico. If you are interested in only JSTOR content, you may also choose “JSTOR subjects” to select from the subjects assigned to JSTOR journals (these are the same subjects you'll find on these journals in JSTOR and have been manually assigned).
- Download availability: For some content in Constellate, you can download the full text directly from the user interface. If you want to limit your dataset to just that subset of content, do it here. You may separately request the full text of any JSTOR content in Constellate.
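
To make the logic of the keyword example above concrete, here is a minimal Python sketch that applies the same Boolean test to a few sample sentences. It is only a local illustration and not how Constellate itself evaluates searches. The function name `matches_query`, the sample sentences, and the case-insensitive, whole-word matching are all assumptions made for this example.

```python
import re


def matches_query(text: str) -> bool:
    """Illustrative check for:
    "foster care" AND (adopt* OR reunification OR reunifies OR reunify)

    Hypothetical helper: matching is case-insensitive and on whole words,
    which is an assumption about how such a query behaves.
    """
    lowered = text.lower()
    # Quoted phrase: both words must appear together, in order.
    has_phrase = re.search(r"\bfoster care\b", lowered) is not None
    # adopt* is a wildcard: any word starting with "adopt".
    # The reunify forms are listed explicitly, as in the query.
    has_term = re.search(
        r"\b(adopt\w*|reunification|reunifies|reunify)\b", lowered
    ) is not None
    return has_phrase and has_term


# Made-up sample sentences standing in for documents.
docs = [
    "Children in foster care were later adopted by relatives.",
    "Foster care policy emphasizes reunification with parents.",
    "Adoption rates rose last year.",
    "The foster care system is under strain.",
]

matched = [d for d in docs if matches_query(d)]
for d in matched:
    print(d)
```

Only the first two sentences pass: the third lacks the phrase foster care, and the fourth contains the phrase but none of the terms inside the parentheses.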