Author: Laura Isabel Vargas García
This project leverages Digital Humanities and Natural Language Processing (NLP) to analyze the language and themes of the African American Civil Rights Movement. By applying distant reading techniques, it explores the fight against racial segregation in the United States through two key corpora:
- Colored Conventions Project (CCP): Documents from Black conventions held between the mid-19th and early 20th centuries, organized by free Black community leaders in the North.
- Martin Luther King Jr. (MLK) Speeches: A collection of 11 speeches from the 1950s and 1960s, focused on racial equality, social justice, and nonviolent resistance.
The project is organized into several main scripts and data folders:
- Data Preprocessing: Cleaning and preparing raw text data, including stopword and punctuation removal using spaCy.
- XML Conversion: Transforming cleaned texts into XML format with metadata, POS tags, and named entity annotations.
- Collocation Analysis: Extracting and analyzing frequent word pairs (bigrams) from the corpora.
- NER Analysis: Identifying and counting the most frequent named entities, excluding numbers and locations.
- Topic Modeling: Filtering tokens by grammatical category and entity type, segmenting texts, and applying the BERTopic model.
Preprocessing:
- Used spaCy (en_core_web_sm) to clean texts by removing stopwords and punctuation.
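A minimal sketch of the cleaning step, assuming each raw text has already been parsed with spaCy's en_core_web_sm pipeline. The helper names clean_tokens and clean_text are hypothetical and not taken from the project's scripts. The filter relies only on the Token attributes is_stop, is_punct, and is_space, which spaCy sets on every token.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class TokenLike(Protocol):
    """The subset of spaCy's Token attributes this sketch relies on."""
    text: str
    is_stop: bool
    is_punct: bool
    is_space: bool


def clean_tokens(doc: Iterable[TokenLike]) -> list[str]:
    """Drop stopwords, punctuation, and whitespace tokens from a parsed text.

    `doc` is expected to be a spaCy Doc (hypothetical usage: nlp(text) with
    en_core_web_sm), but any iterable of objects with the same attributes works.
    """
    return [
        tok.text
        for tok in doc
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    ]


def clean_text(doc: Iterable[TokenLike]) -> str:
    """Join the surviving tokens back into one space-separated string."""
    return " ".join(clean_tokens(doc))
```

With spaCy and the model installed, a call would look like clean_text(spacy.load("en_core_web_sm")(raw_text)), where raw_text is one document's contents. Casing is left unchanged, which matches capitalized pairs such as "Abraham Lincoln" in the collocation results.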
XML Conversion:
- Converted texts to XML.
- Extracted metadata and annotated with POS tags and named entities.
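A sketch of the XML conversion using Python's standard xml.etree.ElementTree module. The element and attribute names (text, body, w, pos, ent) and the function name to_xml are hypothetical, since the project's actual schema is not described here. Each token is passed in as a (word, POS tag, entity type) triple, which corresponds to what spaCy exposes as Token.text, Token.pos_, and Token.ent_type_.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def to_xml(tokens: list[tuple[str, str, str]], metadata: dict[str, str]) -> str:
    """Serialize one annotated text as an XML string (hypothetical schema).

    tokens: (word, POS tag, entity type) triples; the entity type is "" for
        tokens outside any entity, as with spaCy's Token.ent_type_.
    metadata: document-level fields (e.g. corpus, title); values must be strings.
    """
    root = ET.Element("text", dict(metadata))  # metadata stored as root attributes
    body = ET.SubElement(root, "body")
    for word, pos, ent in tokens:
        w = ET.SubElement(body, "w", pos=pos)  # one <w> element per token
        if ent:  # annotate only tokens that belong to a named entity
            w.set("ent", ent)
        w.text = word
    return ET.tostring(root, encoding="unicode")
```

For example, tokens = [(t.text, t.pos_, t.ent_type_) for t in nlp(text)] followed by to_xml(tokens, {"corpus": "MLK"}). Keeping the POS tag and entity type on every token means later steps can filter on them without re-running spaCy.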
Collocation Analysis:
- Extracted tokens from XML.
- Generated bigrams and identified the most frequent collocations in each corpus.
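A minimal sketch of bigram extraction, assuming each token is stored in the XML as a <w> element (a hypothetical element name). The function names and the cutoff n are illustrative. Pairs are counted within each document only, so no bigram spans two texts. Because stopwords are removed before pairing, a phrase such as "love your enemies" presumably surfaces as the pair "love enemies".

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter


def tokens_from_xml(xml_string: str) -> list[str]:
    """Return the text of every <w> element in document order (hypothetical schema)."""
    root = ET.fromstring(xml_string)
    return [w.text for w in root.iter("w") if w.text]


def top_bigrams(docs: list[list[str]], n: int = 10) -> list[tuple[str, int]]:
    """Count adjacent word pairs within each document and return the n most common."""
    counts: Counter[tuple[str, str]] = Counter()
    for tokens in docs:
        # zip pairs each token with its successor; pairs never cross documents
        counts.update(zip(tokens, tokens[1:]))
    return [(f"{first} {second}", c) for (first, second), c in counts.most_common(n)]
```

Running top_bigrams separately on the MLK and CCP documents gives one ranked list per corpus, which is how the two collocation lists can be compared.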
Named Entity Recognition (NER):
- Identified the 20 most frequent named entities.
- Filtered out numbers and locations to focus on relevant people, organizations, and concepts.
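A sketch of the entity count, assuming entities are read from spaCy's Doc.ents as (ent.text, ent.label_) pairs. How "numbers" and "locations" map onto spaCy's labels is an assumption here: the default below drops CARDINAL and GPE, and spaCy also uses LOC and FAC for other kinds of locations, which can be added to the excluded set if needed. The name top_entities is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Assumption: "numbers" ~ CARDINAL and "locations" ~ GPE in spaCy's label scheme.
DEFAULT_EXCLUDED = frozenset({"CARDINAL", "GPE"})


def top_entities(
    entities: Iterable[tuple[str, str]],
    n: int = 20,
    excluded: frozenset[str] = DEFAULT_EXCLUDED,
) -> list[tuple[str, int]]:
    """Return the n most frequent entity strings whose label is not excluded."""
    counts = Counter(text for text, label in entities if label not in excluded)
    return counts.most_common(n)
```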
Topic Modeling:
- Filtered for nouns, adjectives, proper nouns, and relevant named entities.
- Segmented data into smaller documents.
- Applied BERTopic to extract and compare core topics in the MLK and CCP texts.
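A minimal sketch of the filtering, segmentation, and modeling steps. The set of "relevant" entity types (PERSON, ORG, NORP), the segment size of 100 words, and the helper names are assumptions for illustration. BERTopic is used with its default settings through BERTopic().fit_transform, which returns one topic ID per document along with topic probabilities.

```python
from __future__ import annotations

# POS tags kept, per the project description (spaCy Token.pos_ values).
KEEP_POS = {"NOUN", "ADJ", "PROPN"}
# Assumption: the source does not say which entity types count as "relevant".
KEEP_ENTS = {"PERSON", "ORG", "NORP"}


def filter_tokens(
    tokens: list[tuple[str, str, str]],
    keep_pos: set[str] = KEEP_POS,
    keep_ents: set[str] = KEEP_ENTS,
) -> list[str]:
    """Keep words whose POS tag or entity type is in the allowed sets."""
    return [word for word, pos, ent in tokens if pos in keep_pos or ent in keep_ents]


def segment(words: list[str], size: int = 100) -> list[str]:
    """Split a filtered word list into consecutive documents of `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def fit_topics(docs: list[str]):
    """Fit BERTopic with default settings; requires the bertopic package."""
    from bertopic import BERTopic  # imported here so the helpers above work without it

    model = BERTopic()
    topics, _probs = model.fit_transform(docs)
    return model, topics
```

For example, docs = segment(filter_tokens(tokens)), then model, topics = fit_topics(docs), and model.get_topic_info() lists each topic with its top terms. Splitting long speeches and convention proceedings into shorter chunks gives BERTopic more documents to cluster, which presumably matters for a corpus as small as 11 MLK speeches.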
Most frequent collocations (bigrams):
MLK:
United States, freedom ring, let march, civil rights, direct action, God children, Abraham Lincoln, let freedom, let dissatisfied, New York, love enemies, stop help, years ago, dream day, Montgomery Alabama
CCP:
colored people, United States, New York, Colored men, J.H, Vice President, people State, W.H, J.W, equal rights, white men
Most frequent named entities (top 20):
MLK:
Negro, today, Jesus, American, South, tonight, Love, Christian, morning, years, Abraham Lincoln, Vietnamese, French, Americans, Christians, Negroes, John, Diem, Stanton, tomorrow
CCP:
Convention, State, American, South, Committee, Congress, Business Committee, Negro, Constitution, years, Association, Republican, evening, second, 1865, annual, Executive Committee, Christian, Resolved Convention, Africa
Topics identified with BERTopic:
Opposition to Violence
- Key terms: negro, nation, freedom, sir, people, white, men
- Focuses on civil rights, denounces violence and inaction, and emphasizes justice and leadership.
Christian Tone
- Key terms: love, life, God, right, morning, man, people
- Highlights love for enemies, Christian philosophy, and moral reflection.
Institutional Tone and Main Concern
- Key terms: convention, state, committee, people, mr, colored, men
- Formal political discourse on emancipation, enfranchisement, and dignity.
Concern about Cuba
- Key terms: cuba, Spanish, government, slavery, island, spain, Cuban
- Discusses slavery in Cuba, US intervention, and the broader political context.
Education Issue
- Key terms: Kentucky, Frankfort, Lexington, association, Henderson, convention, Louisville
- Focuses on educational advancement and legislative reform in Kentucky conventions.
Install Dependencies
- Ensure Python 3.x is installed.
- Install the required packages, including spaCy (with the en_core_web_sm model) and BERTopic.
Prepare Data
- Place the raw corpora in the appropriate data/ folders.
Run Preprocessing Scripts
- Execute scripts for cleaning, XML conversion, and annotation.
Run Analysis Scripts
- Execute scripts for collocation analysis, NER, and topic modeling.
Review Results
- Results will be saved to the output/ folder and may include visualizations (e.g., word clouds, topic summaries).
Acknowledgments:
- Colored Conventions Project
- MLK Speeches courtesy of the Richton Park Library MLK Program