The Constellate team teaches live classes for member institutions. The classes are based on our open educational lessons, which are licensed under the Creative Commons Attribution (CC BY) license.

Here are some of the current classes:

Introduction to Text Analysis (12 sessions)
An introduction to text analysis, assuming no prior experience with Jupyter Notebooks or Python.


Python Basics (4 sessions)
An introduction to basic Python programming, including operators, expressions, data types, variables, functions, flow control, lists, and dictionaries.


Building and Refining a Constellate Dataset (1-2 sessions)
An introduction to creating a research dataset with the Constellate dataset builder, including background on data types, features, and formats. The class can also cover Pandas methods for working with metadata and refining research projects.


Word Frequencies and Word Clouds (1-2 sessions)
Create a dataset with the Constellate dataset builder. Learn how to extract n-grams, build a stopwords list, use Counter objects, clean your data, save your output data, and create a word cloud visualization.
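
As a rough illustration of the counting steps named above, here is a minimal sketch that uses only the Python standard library. The sample text and the stopword list are hypothetical and are not taken from the Constellate lessons, which work from a Constellate dataset instead.

```python
import re
from collections import Counter

# Hypothetical sample text and stopword list, chosen only for illustration.
text = "The cat sat on the mat. The dog sat on the log."
stopwords = {"the", "on"}

# Clean the data: lowercase the text and keep only alphabetic tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Remove stopwords before counting.
filtered = [token for token in tokens if token not in stopwords]

# Unigram frequencies stored in a Counter object.
word_counts = Counter(filtered)

# Bigrams (n-grams with n = 2): pairs of adjacent tokens.
bigram_counts = Counter(zip(filtered, filtered[1:]))

# Show the most frequent words and word pairs.
print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Frequencies like these are the usual input to a word cloud, where more frequent words are drawn larger.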


Significant Terms Analysis (1-2 sessions)
Create a dataset with the Constellate dataset builder. Learn how to compute Term Frequency-Inverse Document Frequency (TF-IDF), use the Gensim library, find significant terms in a set of documents, and create a simple search engine.


Reading, Writing, and Cleaning Data with Python (3 sessions)
Learn to open, read, and write files in Python, including text files (.txt), comma-separated value files (.csv), and JavaScript Object Notation files (.json). Learn the basics of Pandas Series and DataFrames.


Pandas (3 sessions)
An introduction to working with data in Pandas, including Series, DataFrames, accessing and changing data, Boolean operators, and creating filters.


Tokenization (3 sessions)
An introduction to tokenizing your own texts for use with existing Constellate notebooks.


Sentiment Analysis (1-2 sessions)
Analyze the sentiment of documents using Valence Aware Dictionary and sEntiment Reasoner (VADER) and scikit-learn.


Topic Modeling (1-2 sessions)
Create a dataset with the Constellate dataset builder. Learn how to apply Latent Dirichlet Allocation (LDA) using Gensim, evaluate your model, and visualize your topics with pyLDAvis.


Optical Character Recognition (3 sessions)
Learn to use Tesseract to extract text from images and PDFs. Design an OCR project, including its key workflows and methods.


Working with Strings (1-2 sessions)
Learn essential functions and libraries for working with strings, including an introduction to regular expressions.
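
To give a sense of the kind of string handling involved, here is a minimal sketch using built-in string methods and the standard-library re module. The sample strings and the patterns are hypothetical, deliberately simplified, and not drawn from the lessons themselves.

```python
import re

# Hypothetical sample strings, used only for illustration.
line = "Contact: ada@example.org, 1843-07-10; grace@example.org, 1952-05-01"
messy = "  too   many   spaces "

# Built-in string methods: split on a delimiter and trim whitespace.
records = [part.strip() for part in line.split(";")]

# Regular expression: find dates written as YYYY-MM-DD.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", line)

# Regular expression: a simplified email pattern (not a full validator).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", line)

# Substitution: collapse runs of whitespace into a single space.
normalized = re.sub(r"\s+", " ", messy).strip()

print(records)
print(dates)
print(emails)
print(normalized)
```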


Text Analysis for Digital Humanities (3-6 sessions)
A set of lessons aimed at the particular needs of digital humanities scholars.


Text Analysis for Data Science (3-6 sessions)
A set of lessons aimed at the particular needs of data scientists.