Methodology

How we measure community sentiment toward AI models.

Data Collection

We collect posts and comments from 24 AI-focused subreddits, including communities like r/ChatGPT, r/ClaudeAI, r/LocalLLaMA, r/MachineLearning, and others. Data is scraped regularly using the Reddit API.
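The collection pipeline itself isn't published here, but the windowing and deduplication step described above can be sketched in a few lines. This is a hypothetical helper, not the production scraper; the `id` and `created_utc` field names follow what Reddit API wrappers such as PRAW expose, and `select_recent` is an illustrative name:

```python
from datetime import datetime, timedelta, timezone

def select_recent(items, now=None, hours=48):
    """Keep items created within the past `hours`, deduplicated by id.

    Each item is a dict with at least "id" and "created_utc"
    (a Unix timestamp), mirroring Reddit API post/comment records.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    seen, kept = set(), []
    for item in items:
        created = datetime.fromtimestamp(item["created_utc"], tz=timezone.utc)
        if item["id"] not in seen and created >= cutoff:
            seen.add(item["id"])
            kept.append(item)
    return kept
```

Deduplicating by id matters because overlapping 24-48 hour scrape windows will fetch the same posts more than once.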

Tracked Subreddits

r/ChatGPT, r/ClaudeAI, r/OpenAI, r/LocalLLaMA, r/MachineLearning, r/Bard, r/GeminiAI, r/PromptEngineering, r/LLM, r/AGI, r/DeepLearning, r/LLMDevs, r/AI_Agents, r/ChatGPTCoding, r/vibecoding, and more.

Sentiment Classification

Each piece of content is analyzed by a local LLM to classify sentiment toward three major AI providers. We use the Qwen3-30B-A3B model running locally via Ollama for privacy and cost efficiency.

Classification Process

For each post or comment, the LLM analyzes the text and determines the sentiment expressed toward each AI provider. The model considers context, sarcasm, and comparative statements to provide accurate classifications. A single piece of content can express different sentiments toward different providers (e.g., positive about Claude, negative about GPT).
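The exact prompt and parsing logic are not published; the sketch below shows one plausible shape for the per-provider classification step, assuming the model is asked to reply with JSON. The prompt wording and the `build_prompt`/`parse_classification` helpers are illustrative, not the production code:

```python
import json

PROVIDERS = ("anthropic", "openai", "google")
LABELS = {"very_positive", "positive", "neutral",
          "negative", "very_negative", "not_mentioned"}

def build_prompt(text):
    """Ask the model for one sentiment label per provider, as JSON."""
    return (
        "Classify the sentiment toward each AI provider in the text below.\n"
        f"Providers: {', '.join(PROVIDERS)}. "
        f"Allowed labels: {', '.join(sorted(LABELS))}.\n"
        "Respond with a JSON object mapping provider to label.\n\n"
        f"Text: {text}"
    )

def parse_classification(raw):
    """Validate the model's JSON reply; unknown or missing providers
    fall back to not_mentioned."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    return {p: data.get(p) if data.get(p) in LABELS else "not_mentioned"
            for p in PROVIDERS}
```

Returning one label per provider is what lets a single comment be, say, positive about Claude and negative about GPT at the same time.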

Anthropic (Claude)

Includes Claude, Claude 3, Claude 3.5, Sonnet, Opus, Haiku

OpenAI (GPT)

Includes GPT-4, GPT-4o, ChatGPT, o1, o3

Google (Gemini)

Includes Gemini, Gemini Pro, Bard, PaLM

Sentiment Scale

Content is classified on a 6-point scale:

  • Very Positive: Strong enthusiasm, strong recommendation; expressions like "best", "amazing", "love"
  • Positive: Praise, satisfaction, mild preference, general approval
  • Neutral: Factual mention, questions, no clear opinion, balanced comparisons
  • Negative: Criticism, disappointment, mild complaint, frustration
  • Very Negative: Strong criticism, frustration, rejection; expressions like "worst", "hate", "terrible"
  • Not Mentioned: Provider not referenced in the content

Scoring

The leaderboard score is calculated using weighted sentiment:

Score = (very_positive × 2 + positive × 1 - negative × 1 - very_negative × 2) / total_opinionated × 100

Neutral mentions are excluded from the score calculation but included in total mention counts. This gives extra weight to strong opinions while normalizing across different mention volumes.
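The formula translates directly to code. A minimal sketch (the `leaderboard_score` helper name is ours; the weights and the exclusion of neutral mentions follow the formula above):

```python
def leaderboard_score(counts):
    """Weighted sentiment score on a -100..+100 scale.

    counts: mention counts for one provider, with keys
    very_positive, positive, neutral, negative, very_negative.
    Neutral mentions are excluded from the denominator.
    """
    opinionated = (counts["very_positive"] + counts["positive"]
                   + counts["negative"] + counts["very_negative"])
    if opinionated == 0:
        return 0.0  # no opinionated mentions at all
    weighted = (2 * counts["very_positive"] + counts["positive"]
                - counts["negative"] - 2 * counts["very_negative"])
    return weighted / opinionated * 100
```

For example, counts of 10 very positive, 30 positive, 50 neutral, 15 negative, and 5 very negative give (20 + 30 − 15 − 10) / 60 × 100 ≈ 41.7.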

Limitations

  • Reddit bias: Reddit users don't represent all AI users. The platform skews toward technical, English-speaking users.
  • Subreddit bias: Provider-specific subreddits (r/ClaudeAI, r/ChatGPT) naturally have more positive sentiment toward their subject.
  • Classification accuracy: Current models achieve ~80% accuracy on our golden test set. Sarcasm and nuanced comparisons are particularly challenging.
  • Temporal effects: Major releases, outages, or news events can cause temporary sentiment spikes.

Updates

Data is collected and classified regularly. Each scrape run gathers posts and comments from the past 24-48 hours across all tracked subreddits; new content is then processed through the sentiment classification pipeline.

The dashboard reflects the cumulative sentiment data, with the ability to filter by date range to analyze specific time periods. The "Last Updated" timestamp in the stats bar shows when the most recent data collection occurred.