Fake News Detector

Results & Visualizations

1. What We Found in the Tweet Data / Key Features

Most Twitter users are very consistent: they either tell the truth almost all the time or share false information almost all the time.
The users who spread false information are fewer in number, but they post many more tweets, sometimes over 16,000.

People also tend to retweet messages without checking whether they are true or false. Tweets that reference articles usually agree with those articles, even if the articles themselves are misleading.

The features that proved most useful for separating truthful from false tweeters were:

  • Total word count and total token count
  • How often short words appear
  • Adpositions (words like “to”, “from”, “with”)
  • The number of periods and sentences

Features that were much less useful included average word length, the shortest word, hashtags, and various account statistics.
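As an illustration, the surface features above can be computed with simple string handling. This is a minimal sketch, not our actual pipeline: the function name and the small adposition set are our own, and a real system would likely use a POS tagger for adpositions.

```python
import re

# Illustrative hand-picked adposition set; a real pipeline would use a POS tagger.
ADPOSITIONS = {"to", "from", "with", "in", "on", "at", "by", "of", "for"}

def tweet_features(text):
    """Compute the surface cues listed above for one tweet (sketch)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)        # words plus punctuation marks
    words = [t for t in tokens if t.isalpha()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "token_count": len(tokens),
        "short_word_freq": sum(len(w) <= 3 for w in words) / max(len(words), 1),
        "adposition_count": sum(w.lower() in ADPOSITIONS for w in words),
        "period_count": text.count("."),
        "sentence_count": len(sentences),
    }

feats = tweet_features("They flew from NYC to DC. Big news!")
```

Features like these are cheap to extract and, in our data, carried more signal than account-level statistics.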

2. What We Found in the News Dataset

For news articles, we used a Kaggle fake vs real news dataset. During data exploration we saw that fake news and true news behave very differently:

  • Fake news has more topic categories; true news topics are more consistent.
  • Fake news titles often have messy formatting, such as extra spaces or strange capitalization.
  • Fake news appears more evenly spread out over time, while true news is clustered in clearer time periods.
  • Fake news articles tend to be longer and more dramatic, while true news is shorter and more factual.
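
The messy-formatting cue in titles can be checked with simple string heuristics. A minimal sketch; the function name and the 50% capitals threshold are illustrative choices, not the ones used in our exploration:

```python
import re

def messy_title(title):
    """Flag titles with the quirks we saw in fake news:
    runs of extra whitespace or an unusually high share of capitals."""
    has_extra_spaces = bool(re.search(r"\s{2,}", title))
    letters = [c for c in title if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    return has_extra_spaces or caps_ratio > 0.5   # illustrative threshold

flagged = messy_title("BREAKING:  You WON'T Believe This")   # double space, heavy caps
clean = messy_title("Senate passes budget resolution")
```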

BERT vs ChatGPT Score Distributions

Histogram of BERT scores

Histogram of ChatGPT scores

To compare the two systems, we ran approximately 200 randomly selected tweets from the NYC mayoral election through both our fine-tuned BERT model and a standardized ChatGPT prompt, collecting the predicted label and confidence score from each model. On average, ChatGPT rated the tweets about 20% lower on truthfulness. BERT remained the stronger strict classifier: its predictions were consistent and usually high in confidence, reflecting how it was trained, directly on labeled truth data using only the text. As the plots above show, BERT produced very confident, mostly binary predictions, whereas ChatGPT's scores were more moderate and varied.
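The distributional difference described above can be summarized numerically. A minimal sketch, assuming each model's per-tweet confidence scores are lists of floats in [0, 1]; the score values below are made up for illustration, not our actual data:

```python
import statistics

# Illustrative stand-ins for the ~200 per-tweet scores.
bert_scores = [0.02, 0.97, 0.99, 0.01, 0.95, 0.98]   # near-binary, confident
chat_scores = [0.30, 0.70, 0.60, 0.25, 0.55, 0.65]   # moderate, varied

def summarize(scores):
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),
        # Share of near-binary predictions: a proxy for how "strict"
        # a classifier's scoring behavior is.
        "extreme_share": sum(s < 0.1 or s > 0.9 for s in scores) / len(scores),
    }

bert = summarize(bert_scores)
chat = summarize(chat_scores)
```

A high `extreme_share` with high spread matches BERT's behavior in the histograms; ChatGPT's lower, more centered scores produce the opposite profile.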

XGBoost vs ChatGPT

Prediction Accuracy

Confidence Level Distribution

When testing our XGBoost model against ChatGPT, we drew a random sample of 200 data points from the original dataset and compared both models' results. We have to admit that accuracy tends to drop once we leave the sphere of 2016 election data: when given more current data, the model is skewed, labeling tweets as Fake news much more often than Real.
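The sampling-and-comparison step can be sketched as follows. Everything here is illustrative: `data` stands in for the labeled dataset as (text, label) pairs, and `predict_a`/`predict_b` stand in for the XGBoost model and the ChatGPT prompt; the toy "always fake" model mirrors the skew we saw on post-2016 data.

```python
import random

def compare_on_sample(data, predict_a, predict_b, n=200, seed=42):
    """Draw a random sample and report each model's accuracy on it."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(n, len(data)))
    acc_a = sum(predict_a(t) == y for t, y in sample) / len(sample)
    acc_b = sum(predict_b(t) == y for t, y in sample) / len(sample)
    return acc_a, acc_b

# Toy demo: a degenerate model that always answers "fake" versus an
# oracle that recovers the label encoded in the dummy text.
data = [("t%d" % i, "fake" if i % 2 else "real") for i in range(400)]
always_fake = lambda t: "fake"
oracle = lambda t: "fake" if int(t[1:]) % 2 else "real"
acc_fake, acc_oracle = compare_on_sample(data, always_fake, oracle)
```

Fixing the random seed keeps the sample reproducible, so both models are always evaluated on the same 200 points.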