Methodology
How we measure community sentiment toward AI models.
Data Collection
We collect posts and comments from 24 AI-focused subreddits, including communities like r/ChatGPT, r/ClaudeAI, r/LocalLLaMA, r/MachineLearning, and others. Data is scraped regularly using the Reddit API.
Tracked Subreddits
Sentiment Classification
Each piece of content is analyzed by a local LLM to classify sentiment toward three major AI providers. We use the Qwen3-30B-A3B model running locally via Ollama for privacy and cost efficiency.
Classification Process
For each post or comment, the LLM analyzes the text and determines the sentiment expressed toward each AI provider. The model considers context, sarcasm, and comparative statements to provide accurate classifications. A single piece of content can express different sentiments toward different providers (e.g., positive about Claude, negative about GPT).
Anthropic (Claude)
Includes Claude, Claude 3, Claude 3.5, Sonnet, Opus, Haiku
OpenAI (GPT)
Includes GPT-4, GPT-4o, ChatGPT, o1, o3
Google (Gemini)
Includes Gemini, Gemini Pro, Bard, PaLM
Sentiment Scale
Content is classified on a 6-point scale:
- Very Positive Strong enthusiasm, strong recommendation, expressions like "best", "amazing", "love"
- Positive Praise, satisfaction, mild preference, general approval
- Neutral Factual mention, questions, no clear opinion, balanced comparisons
- Negative Criticism, disappointment, mild complaint, frustration
- Very Negative Strong criticism, frustration, rejection, expressions like "worst", "hate", "terrible"
- Not Mentioned Provider not referenced in content
Scoring
The leaderboard score is calculated using weighted sentiment:
Score = (very_positive × 2 + positive × 1 - negative × 1 - very_negative × 2) / total_opinionated × 100 Neutral mentions are excluded from the score calculation but included in total mention counts. This gives extra weight to strong opinions while normalizing across different mention volumes.
Limitations
- Reddit bias: Reddit users don't represent all AI users. The platform skews toward technical, English-speaking users.
- Subreddit bias: Provider-specific subreddits (r/ClaudeAI, r/ChatGPT) naturally have more positive sentiment toward their subject.
- Classification accuracy: Current models achieve ~80% accuracy on our golden test set. Sarcasm and nuanced comparisons are particularly challenging.
- Temporal effects: Major releases, outages, or news events can cause temporary sentiment spikes.
Updates
Data is collected and classified on a regular basis. Each scrape run collects posts and comments from the past 24-48 hours from all tracked subreddits. New content is then processed through the sentiment classification pipeline.
The dashboard reflects the cumulative sentiment data, with the ability to filter by date range to analyze specific time periods. The "Last Updated" timestamp in the stats bar shows when the most recent data collection occurred.