Methodology

How we measure community sentiment toward AI models.

Data Collection

We collect posts and comments from 24 AI-focused subreddits, including communities like r/ChatGPT, r/ClaudeAI, r/LocalLLaMA, r/MachineLearning, and others. Data is scraped regularly using the Reddit API.
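The collection pipeline itself isn't published here, but the windowing and deduplication step described above can be sketched in a few lines. This is a hypothetical helper, not the production scraper; the `id` and `created_utc` field names follow what Reddit API wrappers such as PRAW expose, and `select_recent` is an illustrative name:

```python
from datetime import datetime, timedelta, timezone

def select_recent(items, now=None, hours=48):
    """Keep items created within the past `hours`, deduplicated by id.

    Each item is a dict with at least "id" and "created_utc"
    (a Unix timestamp), mirroring Reddit API post/comment records.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    seen, kept = set(), []
    for item in items:
        created = datetime.fromtimestamp(item["created_utc"], tz=timezone.utc)
        if item["id"] not in seen and created >= cutoff:
            seen.add(item["id"])
            kept.append(item)
    return kept
```

Deduplicating by id matters because overlapping 24-48 hour scrape windows will fetch the same posts more than once.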

Tracked Subreddits

r/ChatGPT, r/ClaudeAI, r/OpenAI, r/LocalLLaMA, r/MachineLearning, r/Bard, r/GeminiAI, r/PromptEngineering, r/LLM, r/AGI, r/DeepLearning, r/LLMDevs, r/AI_Agents, r/ChatGPTCoding, r/vibecoding, and more.

Sentiment Classification

Each piece of content is analyzed by a local LLM to classify sentiment toward three major AI providers. We use the Qwen3-30B-A3B model running locally via Ollama for privacy and cost efficiency.

Classification Process

For each post or comment, the LLM analyzes the text and determines the sentiment expressed toward each AI provider. The model considers context, sarcasm, and comparative statements to provide accurate classifications. A single piece of content can express different sentiments toward different providers (e.g., positive about Claude, negative about GPT).
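The exact prompt and parsing logic are not published; the sketch below shows one plausible shape for the per-provider classification step, assuming the model is asked to reply with JSON. The prompt wording and the `build_prompt`/`parse_classification` helpers are illustrative, not the production code:

```python
import json

PROVIDERS = ("anthropic", "openai", "google")
LABELS = {"very_positive", "positive", "neutral",
          "negative", "very_negative", "not_mentioned"}

def build_prompt(text):
    """Ask the model for one sentiment label per provider, as JSON."""
    return (
        "Classify the sentiment toward each AI provider in the text below.\n"
        f"Providers: {', '.join(PROVIDERS)}. "
        f"Allowed labels: {', '.join(sorted(LABELS))}.\n"
        "Respond with a JSON object mapping provider to label.\n\n"
        f"Text: {text}"
    )

def parse_classification(raw):
    """Validate the model's JSON reply; unknown or missing providers
    fall back to not_mentioned."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    return {p: data.get(p) if data.get(p) in LABELS else "not_mentioned"
            for p in PROVIDERS}
```

Returning one label per provider is what lets a single comment be, say, positive about Claude and negative about GPT at the same time.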

Anthropic (Claude)

Includes Claude, Claude 3, Claude 3.5, Sonnet, Opus, Haiku

OpenAI (GPT)

Includes GPT-4, GPT-4o, ChatGPT, o1, o3

Google (Gemini)

Includes Gemini, Gemini Pro, Bard, PaLM

Sentiment Scale

Content is classified on a 6-point scale:

  • Very Positive: Strong enthusiasm, strong recommendation; expressions like "best", "amazing", "love"
  • Positive: Praise, satisfaction, mild preference, general approval
  • Neutral: Factual mention, questions, no clear opinion, balanced comparisons
  • Negative: Criticism, disappointment, mild complaint, frustration
  • Very Negative: Strong criticism, frustration, rejection; expressions like "worst", "hate", "terrible"
  • Not Mentioned: Provider not referenced in the content

Scoring

The leaderboard score is calculated using weighted sentiment:

Score = (very_positive × 2 + positive × 1 - negative × 1 - very_negative × 2) / total_opinionated × 100

Neutral mentions are excluded from the score calculation but included in total mention counts. This gives extra weight to strong opinions while normalizing across different mention volumes.
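The formula translates directly to code. A minimal sketch (the `leaderboard_score` helper name is ours; the weights and the exclusion of neutral mentions follow the formula above):

```python
def leaderboard_score(counts):
    """Weighted sentiment score on a -100..+100 scale.

    counts: mention counts for one provider, with keys
    very_positive, positive, neutral, negative, very_negative.
    Neutral mentions are excluded from the denominator.
    """
    opinionated = (counts["very_positive"] + counts["positive"]
                   + counts["negative"] + counts["very_negative"])
    if opinionated == 0:
        return 0.0  # no opinionated mentions at all
    weighted = (2 * counts["very_positive"] + counts["positive"]
                - counts["negative"] - 2 * counts["very_negative"])
    return weighted / opinionated * 100
```

For example, counts of 10 very positive, 30 positive, 50 neutral, 15 negative, and 5 very negative give (20 + 30 − 15 − 10) / 60 × 100 ≈ 41.7.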

Limitations

  • Reddit bias: Reddit users don't represent all AI users. The platform skews toward technical, English-speaking users.
  • Subreddit bias: Provider-specific subreddits (r/ClaudeAI, r/ChatGPT) naturally have more positive sentiment toward their subject.
  • Classification accuracy: Current models achieve ~80% accuracy on our golden test set. Sarcasm and nuanced comparisons are particularly challenging.
  • Temporal effects: Major releases, outages, or news events can cause temporary sentiment spikes.

Updates

Data is collected and classified regularly. Each scrape run gathers posts and comments from the past 24-48 hours across all tracked subreddits; new content is then processed through the sentiment classification pipeline.

The dashboard reflects the cumulative sentiment data, with the ability to filter by date range to analyze specific time periods. The "Last Updated" timestamp in the stats bar shows when the most recent data collection occurred.