Our project uses multiple data sources, machine learning models, and AI tools to accurately identify fake information. Here are the main steps and models we used:
For the tweet prediction model, we used the "Truth Seeker 2023" dataset from the Canadian Institute for Cybersecurity at the University of New Brunswick. The dataset was originally gathered from roughly 180,000 tweets posted in 2016-2017, largely about news and politics. For the news prediction model, we used the Kaggle Fake & Real News dataset, which consists of ~45,000 news articles (23k fake, 21k real). We analyzed the data characteristics to guide further cleaning: making the subject column consistent, unifying letter case, tokenizing text, and removing common (stop) words.
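A minimal sketch of these cleaning steps is below; the stop-word set and subject mapping shown here are illustrative placeholders, not the exact lists used in the project:

```python
import re

# Common English stop words (a small illustrative subset, not the full list used)
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on", "for"}

# Hypothetical mapping to make the dataset's inconsistent "subject" labels uniform
SUBJECT_MAP = {"politicsNews": "politics", "Politics": "politics",
               "worldnews": "world", "News": "news"}

def clean_text(text: str) -> list[str]:
    """Lowercase, tokenize on word characters, and drop common (stop) words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def normalize_subject(subject: str) -> str:
    """Collapse inconsistent subject labels into one canonical form."""
    return SUBJECT_MAP.get(subject, subject.lower())
```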
We explored logistic regression, Random Forest, XGBoost, and BERT. BERT was chosen for tweets because it processes natural language directly: it reads the full text of each tweet and learns meaning from context, sentence structure, and the way ideas are expressed. XGBoost was chosen for news because it performed most reliably on noisy text, nonlinear patterns, and imbalanced subject distributions.
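To sketch the feature side of the news pipeline, the snippet below hand-rolls TF-IDF weighting with the standard library; in the project itself such features would come from a proper vectorizer and feed the XGBoost classifier, and the tiny corpus here is purely illustrative:

```python
import math
from collections import Counter

def tfidf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for each tokenized document in the corpus."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({
            # Term frequency scaled by inverse document frequency
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

A term appearing in every document (like "news" below) gets weight 0, while rarer terms score higher, which is the property that makes these features useful to a downstream classifier.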
We developed a Python script and a purpose-built prompt to query ChatGPT whenever a user submits text. ChatGPT returns an additional True/False prediction and a confidence score based on its general language knowledge.
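A hedged sketch of that query flow is below; the prompt wording and response shape are illustrative assumptions, and the actual call to the OpenAI API is stubbed out:

```python
import json

def build_prompt(user_text: str) -> str:
    """Assemble the classification prompt sent to ChatGPT (illustrative wording)."""
    return (
        "Decide whether the following text contains fake information. "
        'Respond with JSON: {"prediction": true or false, "confidence": 0-1}.\n\n'
        f"Text: {user_text}"
    )

def parse_response(raw: str) -> tuple[bool, float]:
    """Extract the True/False prediction and confidence score from the JSON reply."""
    data = json.loads(raw)
    return bool(data["prediction"]), float(data["confidence"])
```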
Framework: The prediction service is built with Python Flask. Model Integration: The API integrates and hosts all three core models: BERT, XGBoost, and the ChatGPT query pipeline. AWS API Gateway forwards requests from our GitHub Pages website to our EC2 instance and returns structured JSON results; to ensure reliable web access, Cross-Origin Resource Sharing (CORS) is configured to allow requests from the GitHub Pages frontend.
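The structured JSON result the API returns can be sketched as follows; the field names and the simple majority vote are illustrative assumptions, and the Flask route, API Gateway, and CORS configuration are omitted:

```python
import json

def build_result(bert_pred: bool, bert_conf: float,
                 xgb_pred: bool, xgb_conf: float,
                 gpt_pred: bool, gpt_conf: float) -> str:
    """Combine the three model outputs into one structured JSON payload."""
    result = {
        "models": {
            "bert":    {"prediction": bert_pred, "confidence": bert_conf},
            "xgboost": {"prediction": xgb_pred,  "confidence": xgb_conf},
            "chatgpt": {"prediction": gpt_pred,  "confidence": gpt_conf},
        },
        # Simple majority vote across the three predictions (illustrative)
        "consensus": [bert_pred, xgb_pred, gpt_pred].count(True) >= 2,
    }
    return json.dumps(result)
```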