RedTools

A collection of tools for analysing Reddit data from the AusReddit collection, designed for deployment on Nectar BinderHub.

| File | Description |
| --- | --- |
| `api.py` | API wrapper for the AusReddit collection |
| `far_bot.py` | Feasibility assessment bot: evaluates whether a topic has sufficient data for research |
| `reddit_topic_trees.py` | Builds directed conversation graphs from Reddit submissions and comments |
| `ausreddit_metrics.py` | Computes per-submission conversation metrics from a conversation graph |
| `emo_intensity_over_time.py` | NRC emotion intensity analysis over time |
| `LDA_over_time.py` | LDA topic modelling over time |
| `NLP_over_time.py` | Basic NLP analysis |
| `topic_window.py` | BERTopic topic modelling, with and without time windows |
| `config.yaml.example` | Template configuration file; copy it to `config.yaml` and fill in your credentials |

Setup

1. Copy the example config

cp config.yaml.example config.yaml

`config.yaml` is excluded from version control via `.gitignore`, so your credentials are not committed by accident.

2. Fill in your credentials

Open config.yaml and replace each placeholder with your real values:

reddit:
  client_id: 'your_client_id'         # from https://www.reddit.com/prefs/apps
  client_secret: 'your_client_secret'
  redirect_uri: 'your_redirect_uri'
  user_agent: 'your_user_agent'

ausreddit:
  api_key: 'your_api_key'             # AusReddit collection API key

open_ai:
  api_key: 'your_openai_api_key'      # optional, only needed for OpenAI-backed tools

far_bot:
  google_api_key: 'your_google_api_key'         # Google AI Studio API key
  langsmith_tracing: 'true'
  langsmith_endpoint: 'https://api.smith.langchain.com'
  langsmith_api_key: 'your_langsmith_api_key'   # LangSmith API key
  langsmith_project: 'your_langsmith_project'   # LangSmith project name
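Before running any of the tools, it can help to confirm that no placeholder was left behind. The following is a minimal sketch, not part of RedTools: `find_placeholders` is a hypothetical helper, and the `config` dictionary stands in for the parsed contents of `config.yaml` (for example, the result of `yaml.safe_load` if you use PyYAML). It assumes every unfilled value still starts with `your_`, as in the template above.

```python
from typing import Any


def find_placeholders(config: dict[str, Any], prefix: str = "your_") -> list[str]:
    """Return dotted paths of string values that still start with `prefix`."""
    unfilled = []
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as `reddit` or `far_bot`.
            unfilled.extend(f"{key}.{path}" for path in find_placeholders(value, prefix))
        elif isinstance(value, str) and value.startswith(prefix):
            unfilled.append(key)
    return unfilled


# Stand-in for the parsed config.yaml (hypothetical values).
config = {
    "reddit": {"client_id": "abc123", "client_secret": "your_client_secret"},
    "ausreddit": {"api_key": "your_api_key"},
    "far_bot": {"langsmith_tracing": "true"},
}

print(find_placeholders(config))  # ['reddit.client_secret', 'ausreddit.api_key']
```

An empty list means every credential in the checked sections has been replaced.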

The `umap`, `pca`, `tsvd`, `hdbscan`, `kmeans`, and `bertopic` sections contain tunable hyperparameters; the defaults in `config.yaml.example` are a good starting point.

Set `hardware: CPU` if you do not have a GPU available.

Conversation Trees (`reddit_topic_trees.py` + `ausreddit_metrics.py`)

`Reddit_trees` collects Reddit data through PRAW (the Python Reddit API Wrapper) and builds directed conversation graphs in which each node is a submission or comment and each edge represents a reply relationship.

`AusredditMetrics` takes one of those graphs and returns a DataFrame of per-submission structural and time-based metrics.

Building a conversation graph

from reddit_topic_trees import Reddit_trees
from ausreddit_metrics import AusredditMetrics

trees = Reddit_trees()

# Collect data
submissions_df = trees.search_subreddit("housing affordability", subreddit="australia")
comments_df = trees.fetch_comments(submissions_df['id'].tolist())

# Build graph — submissions_df is required so each submission becomes the root node
G, adj = trees.tree_graph_and_adj_list(comments_df, submissions_df)

Each weakly connected component in `G` corresponds to exactly one submission and all of its comments. The submission node (in-degree 0) is the root of each tree.
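The tree structure can be illustrated without PRAW or a graph library. The sketch below is not the implementation of `tree_graph_and_adj_list`; the records and the helpers `build_adjacency` and `find_roots` are hypothetical. It relies on one Reddit convention: a comment's `parent_id` is prefixed with `t3_` when the parent is a submission and `t1_` when the parent is another comment.

```python
# Toy records (hypothetical data).
submissions = [{"id": "s1"}, {"id": "s2"}]
comments = [
    {"id": "c1", "parent_id": "t3_s1"},  # top-level comment on s1
    {"id": "c2", "parent_id": "t1_c1"},  # reply to comment c1
    {"id": "c3", "parent_id": "t3_s1"},  # another top-level comment on s1
    {"id": "c4", "parent_id": "t3_s2"},  # top-level comment on s2
]


def build_adjacency(submissions, comments):
    """Map each node ID to the IDs that reply to it (edges run parent -> reply)."""
    adjacency = {s["id"]: [] for s in submissions}
    for c in comments:
        adjacency.setdefault(c["id"], [])
    for c in comments:
        parent = c["parent_id"].split("_", 1)[1]  # strip the "t1_" / "t3_" prefix
        adjacency.setdefault(parent, []).append(c["id"])
    return adjacency


def find_roots(adjacency):
    """Roots are nodes that never appear as a reply, i.e. nodes with in-degree 0."""
    replies = {child for children in adjacency.values() for child in children}
    return [node for node in adjacency if node not in replies]


adjacency = build_adjacency(submissions, comments)
roots = find_roots(adjacency)
print(adjacency)
print(roots)  # ['s1', 's2']
```

Each root here is a submission, and following the adjacency lists from a root visits exactly that submission's comments.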

Computing metrics

metrics = AusredditMetrics()
df = metrics.analyze_conversation_graphs(G)
print(df)

The returned DataFrame is indexed by submission ID and includes:

| Column | Description |
| --- | --- |
| `num_comments` | Number of comment nodes (excludes the submission root) |
| `num_nodes` | Total nodes, including the submission root |
| `num_edges` | Number of reply edges |
| `longest_path_length` | Depth of the deepest reply chain |
| `average_path_length` | Mean depth across all nodes |
| `num_branches` | Nodes with more than one direct reply |
| `num_endpoints` | Leaf nodes (comments with no replies) |
| `total_duration` | Time from submission to last comment (HH:MM:SS) |
| `shortest_response_time` | Fastest reply in the thread (HH:MM:SS) |
| `longest_response_time` | Slowest reply in the thread (HH:MM:SS) |
| `average_response_time` | Mean reply time across all edges (HH:MM:SS) |
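To make the column definitions concrete, here is a small sketch that computes the same quantities for one toy thread in plain Python. It is an illustration of the definitions in the table, not the code inside `AusredditMetrics`. The data and the helpers `node_depths`, `fmt_hms`, and `thread_metrics` are hypothetical, the mean depth is assumed to include the root at depth 0, and the thread is assumed to have at least one comment.

```python
# Toy thread (hypothetical): parent -> replies, and creation times in seconds.
adjacency = {"s1": ["c1", "c3"], "c1": ["c2"], "c2": [], "c3": []}
created = {"s1": 0, "c1": 90, "c2": 4000, "c3": 600}


def node_depths(adjacency, root):
    """Depth of every node reachable from root (the root has depth 0)."""
    depths = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in adjacency[node]:
            depths[child] = depths[node] + 1
            stack.append(child)
    return depths


def fmt_hms(seconds):
    """Format a number of seconds as HH:MM:SS."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def thread_metrics(adjacency, created, root):
    depths = node_depths(adjacency, root)
    # One response time per reply edge: reply time minus parent time.
    gaps = [created[c] - created[p] for p in depths for c in adjacency[p]]
    return {
        "num_comments": len(depths) - 1,
        "num_nodes": len(depths),
        "num_edges": len(gaps),
        "longest_path_length": max(depths.values()),
        # Assumption: the mean is taken over all nodes, root included.
        "average_path_length": sum(depths.values()) / len(depths),
        "num_branches": sum(1 for n in depths if len(adjacency[n]) > 1),
        "num_endpoints": sum(1 for n in depths if n != root and not adjacency[n]),
        "total_duration": fmt_hms(max(created[n] for n in depths) - created[root]),
        "shortest_response_time": fmt_hms(min(gaps)),
        "longest_response_time": fmt_hms(max(gaps)),
        "average_response_time": fmt_hms(sum(gaps) / len(gaps)),
    }


metrics_row = thread_metrics(adjacency, created, "s1")
for name, value in metrics_row.items():
    print(f"{name}: {value}")
```

In this thread only `s1` has more than one direct reply, so `num_branches` is 1, while `c2` and `c3` are the two endpoints.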

Column name overrides

`tree_graph_and_adj_list` accepts keyword arguments that remap column names for non-Reddit data schemas:

G, adj = trees.tree_graph_and_adj_list(
    comments_df,
    submissions_df,
    id_col='commentId',
    author_col='username',
    body_col='text',
    link_id_col='threadId',
    parent_id_col='responseTo',
    time_col='date',
    time_is_utc=False,
    submission_title_col='headline',
    submission_body_col='content',
)

Feasibility Assessment Bot (`far_bot.py`)

Assesses whether a topic has enough data in the AusReddit collection to be worth studying. Given a query and a date range, it retrieves submission counts and ngram frequencies, generates charts, and produces a short report covering:

- Occurrence: is the topic present, and when does it first and last appear?
- Frequency: how many submissions mention it over time?
- Volume: what proportion of total comments mention it?
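The three questions can be read as simple calculations over per-period counts. The sketch below uses made-up monthly numbers and a hypothetical `summarise` helper; it does not call the AusReddit API and is not how `far_bot.py` is implemented.

```python
# Hypothetical monthly counts for one topic.
monthly = {
    "2024-01": {"submissions": 0, "comment_mentions": 0, "comment_total": 1200},
    "2024-02": {"submissions": 2, "comment_mentions": 3, "comment_total": 1500},
    "2024-03": {"submissions": 5, "comment_mentions": 12, "comment_total": 1600},
    "2024-04": {"submissions": 0, "comment_mentions": 0, "comment_total": 1400},
}


def summarise(monthly):
    months = sorted(monthly)
    present = [m for m in months if monthly[m]["submissions"] > 0]
    return {
        # Occurrence: is the topic present, and when is it first and last seen?
        "present": bool(present),
        "first_seen": present[0] if present else None,
        "last_seen": present[-1] if present else None,
        # Frequency: submissions mentioning the topic in each period.
        "frequency": {m: monthly[m]["submissions"] for m in months},
        # Volume: percentage of all comments in the period that mention the topic.
        "volume_pct": {
            m: round(100 * monthly[m]["comment_mentions"] / monthly[m]["comment_total"], 2)
            if monthly[m]["comment_total"]
            else 0.0
            for m in months
        },
    }


summary = summarise(monthly)
print(summary["first_seen"], summary["last_seen"])  # 2024-02 2024-03
print(summary["volume_pct"])
```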

Usage

Command line:

python far_bot.py "bluey" --start 2024-01-01 --end 2025-01-01 --save

As a module:

from far_bot import run
run("bluey", start="2024-01-01", end="2025-01-01", save=True)

The `--save` flag (or `save=True`) writes the report (`.md`) and charts (`.png`) to files named after the topic.

Date formats

`--start` and `--end` accept dates as `yyyy-mm-dd` or `dd/mm/yyyy`.
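If you build date strings programmatically before passing them in, a helper along these lines can validate them first. This is a sketch of one way to accept both formats using the standard library; `parse_date` is a hypothetical name and is not necessarily how `far_bot.py` parses its arguments.

```python
from datetime import date, datetime

# The two formats listed above: yyyy-mm-dd and dd/mm/yyyy.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text: str) -> date:
    """Try each accepted format in turn and return the first successful parse."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Expected yyyy-mm-dd or dd/mm/yyyy, got {text!r}")


print(parse_date("2024-01-01"))  # 2024-01-01
print(parse_date("01/02/2024"))  # 2024-02-01 (day first, then month)
```

Note that the slash form is day-first, so `01/02/2024` is 1 February 2024, not 2 January.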

Output

- A feasibility report printed to the terminal (and optionally saved as a `.md` file)
- A bar chart of submission counts over time (`submission_frequency.png`)
- A line chart of ngram usage percentages over time (`ngram_volume.png`)
