Showing posts with label Programming & Web Development. Show all posts

Tuesday, July 1, 2025

Project on Analyzing Fake vs Real News Patterns

Introduction

The rise of fake news is a grave concern for the media industry. Fake news erodes trust, can influence public opinion on critical matters such as elections, and can create fear among the public. This project explores fake news patterns by comparing them with true news patterns, with the aim of producing findings that apply to the media industry in general. It uses the “fake-and-real-news-dataset” from Kaggle, available as two CSV files (Fake.csv, True.csv) that contain an almost equal number of fake and real news articles. Python was used for data cleaning, transformation, and merging, and a Tableau dashboard was then created to analyze fake news patterns visually and to support suggestions for spotting fake news.


Business Impact

This analysis is expected to help media strategists and content moderators to:

  • Identify Sensationalism: Detect fake news characteristics (e.g., longer titles, sensational words) to enhance content filtering or fact-checking systems.
  • Understand Publication Trends: Understand when fake news spikes (e.g., election periods) to optimize editorial planning during major political events.
  • Enhance Content Strategy: Leverage true news characteristics (e.g., neutral language) to craft credible content.

Data

  • Dataset Name: “fake-and-real-news-dataset” from Kaggle
  • File Names: Fake.csv, True.csv
  • Description: Fake and Real News articles from various sources, covering 2015–2018.
  • Dataset Details: Initially 44,898 rows (23,420 Fake, 21,478 True). After cleaning (removing 41 null dates and 6,165 duplicate titles): ~38,688 rows and 6 columns after merging and processing. Merged file name: news_merged.CSV
  • Size: ~100 MB (cleaned CSV file after merging).
  • Target Features:
    • title (text): Article titles for word frequency analysis.
    • date (datetime): Publication date for trend analysis.                 
    • subject (categorical): Article category (merged to 5: worldnews, politicsNews, Government News, US_News, left-news).
    • label (categorical): Fake or True, the primary target for comparison.
    • title_length (numerical): Word count of titles, new feature introduced.

 

These features addressed the problem by enabling analysis of content, timing, and characteristics of fake vs. true news.
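The cleaning steps listed under Dataset Details can be sketched roughly as follows. This is a minimal standard-library illustration on a few made-up rows, not the project's actual script (which the post does not show); the helper name clean_and_merge and the sample rows are assumptions.

```python
def clean_and_merge(fake_rows, true_rows):
    """Label, merge, and clean article rows (hypothetical helper)."""
    # Add the label column before merging the two sources.
    merged = [dict(row, label="Fake") for row in fake_rows]
    merged += [dict(row, label="True") for row in true_rows]

    cleaned, seen_titles = [], set()
    for row in merged:
        title = (row.get("title") or "").strip()
        date = (row.get("date") or "").strip()
        if not date:
            continue  # drop rows with a null or empty date
        if title in seen_titles:
            continue  # drop duplicate titles, keeping the first occurrence
        seen_titles.add(title)
        # title_length is the word count of the title.
        cleaned.append(dict(row, title=title, date=date,
                            title_length=len(title.split())))
    return cleaned


# Made-up sample rows, for illustration only.
fake_sample = [
    {"title": "You Won't Believe What Happened Next", "date": "2016-10-01", "subject": "News"},
    {"title": "You Won't Believe What Happened Next", "date": "2016-10-02", "subject": "News"},
    {"title": "Shocking Video Surfaces", "date": "", "subject": "politics"},
]
true_sample = [
    {"title": "Senate Approves New Policy", "date": "2017-03-15", "subject": "politicsNews"},
]

rows = clean_and_merge(fake_sample, true_sample)
for r in rows:
    print(r["label"], r["title_length"], r["title"])
```

With real data, the same steps would typically be done with pandas on the two CSV files; the order of the two drops affects the exact row counts.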


Data Analysis & Computation

There were 5 major analyses done on the dataset to obtain insights into the data:

 

Analysis #1: Prevalence of fake news and real news in the dataset

After cleaning, the news_merged.CSV dataset shows that fake news and real news are almost equally represented, which provides a balanced dataset for analysis. A Tableau worksheet snapshot illustrates this split.



Analysis #2: Title Length Distribution

Histograms of title_length by label, built with “matplotlib.pyplot” and Tableau, reveal:

  • Fake: Right-skewed, median ~14 words, longer due to sensational phrases (e.g., “Sheriff David Clarke Becomes An Internet Joke For Threatening To Poke People ‘In The Eye’”).
  • True: Very slightly left-skewed, median ~10 words, concise (e.g., “Senate Approves New Policy”).

The histogram also shows that Fake titles have higher variability and more outliers (ranging from 3 to 42 words), which supports the assumption that fake news uses longer titles as a sensationalist or attention-grabbing tactic.
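The per-label medians and ranges quoted above are simple summary statistics. A minimal sketch of how they could be computed, using made-up (label, title_length) pairs rather than the real dataset; the function name summarize_title_lengths is an assumption:

```python
from collections import defaultdict
from statistics import median

# Made-up (label, title_length) pairs, for illustration only.
sample = [
    ("Fake", 9), ("Fake", 15), ("Fake", 18),
    ("True", 4), ("True", 8), ("True", 11),
]


def summarize_title_lengths(pairs):
    """Return {label: (min, median, max)} of title lengths."""
    by_label = defaultdict(list)
    for label, length in pairs:
        by_label[label].append(length)
    return {label: (min(v), median(v), max(v)) for label, v in by_label.items()}


summary = summarize_title_lengths(sample)
print(summary)
```

The median is used rather than the mean because a skewed distribution with long outliers would pull the mean upward.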


Analysis #3: Label and Subject Distribution

The analysis of label and subject (after merging “News” to “worldnews” and “politics” to “politicsNews”) via (df.groupby(['subject', 'label']).size() / len(df) * 100) shows:

  • Label: ~46% Fake (~17,862 articles), ~54% True (~20,826 articles).
  • Subject:
    • worldnews: ~48% (half Fake, half True, per curation).
    • politicsNews: ~45% (mixed Fake/True, True dominant).
    • Government News: ~1.5% (Fake only).
    • US_News: ~2% (Fake only).
    • left-news: ~2% (Fake only).

This near-even Fake/True split, with worldnews and politicsNews dominating, highlights source differences (Fake.csv vs. True.csv). The stacked bar chart in Tableau visualizes this distribution.
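The groupby expression quoted above computes each (subject, label) pair's share of all rows. A standard-library stand-in for that calculation, including the subject merge described in this section, might look like the sketch below; the sample rows are made up and the function name is an assumption.

```python
from collections import Counter

# Subject merge described in the post: "News" -> "worldnews", "politics" -> "politicsNews".
SUBJECT_MERGE = {"News": "worldnews", "politics": "politicsNews"}

# Made-up (subject, label) rows, for illustration only.
rows = [
    ("News", "Fake"),
    ("worldnews", "True"),
    ("politics", "Fake"),
    ("politicsNews", "True"),
    ("politicsNews", "True"),
]


def subject_label_share(pairs):
    """Percentage of all rows in each (subject, label) group."""
    merged = [(SUBJECT_MERGE.get(subject, subject), label) for subject, label in pairs]
    counts = Counter(merged)
    return {key: 100 * n / len(merged) for key, n in counts.items()}


shares = subject_label_share(rows)
for key, pct in sorted(shares.items()):
    print(key, f"{pct:.1f}%")
```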


Analysis #4: Date Trends

Grouping date by year and label (via df.groupby([df['date'].dt.year, 'label']).size()) shows temporal patterns:

  • Fake Articles: Spike in 2016 (~67% of Fake articles), likely due to the U.S. presidential election, then a decline in 2017–2018.
  • True Articles: None in 2015, fewer (4,650) in 2016, and many more (16,176) in 2017.

 

A line chart produced in Python visualizes this. It supports the hypothesis that fake news surges during high-profile events, which suggests planning extra editorial scrutiny for election periods.
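The year-by-label grouping behind this trend can be sketched without pandas as follows. The dates are made up for illustration, and the helper names counts_by_year_and_label and yearly_share are assumptions.

```python
from collections import Counter
from datetime import date

# Made-up (publication date, label) pairs, for illustration only.
articles = [
    (date(2016, 9, 1), "Fake"), (date(2016, 10, 1), "Fake"),
    (date(2016, 11, 8), "Fake"), (date(2017, 1, 20), "Fake"),
    (date(2016, 12, 5), "True"), (date(2017, 3, 15), "True"),
    (date(2017, 6, 1), "True"),
]


def counts_by_year_and_label(items):
    """Count articles per (year, label), like a groupby on year and label."""
    return Counter((d.year, label) for d, label in items)


def yearly_share(counts, label):
    """Share (%) of one label's articles that fall in each year."""
    total = sum(n for (_, lab), n in counts.items() if lab == label)
    return {year: 100 * n / total
            for (year, lab), n in sorted(counts.items()) if lab == label}


counts = counts_by_year_and_label(articles)
fake_share = yearly_share(counts, "Fake")
print(counts)
print(fake_share)
```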



Analysis #5: Word Frequency Analysis

Using NLTK, the top 50 words in title by label were extracted and saved to a new CSV file, word_frequencies.csv. A word cloud was then generated in Python to visualize the words that recur under the Fake and True labels.

  • Fake: Sensational words dominate (e.g., “trump”, “video”, “breaking”), suggesting clickbait tactics.
  • True: Neutral words prevail (e.g., “official”, “senate”, “police”), indicating factual reporting.

The word cloud in Python (and the corresponding bubble chart in Tableau, with word size by frequency and color by label) highlights these stylistic differences, supporting the hypothesis that fake news uses attention-grabbing language, though some words overlap.
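The post does this step with NLTK; the sketch below is a simplified standard-library stand-in that shows the idea of counting the most frequent title words per label. The regex tokenizer, the tiny stopword set, and the sample titles are all assumptions for illustration, not NLTK's behavior or the project's data.

```python
import re
from collections import Counter

# Tiny stand-in stopword list (an assumption; NLTK ships a much fuller one).
STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "on", "and", "is"}


def top_words(titles, n=50):
    """Return the n most common non-stopword tokens across the titles."""
    words = []
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)


# Made-up titles, for illustration only.
titles = {
    "Fake": ["WATCH: Video Of The Rally", "Breaking Video Leaks"],
    "True": ["Senate Approves New Policy", "Senate Official Comments On Policy"],
}

top = {label: top_words(ts, n=3) for label, ts in titles.items()}
print(top)
```

Writing each (label, word, count) triple to a CSV file would then give a table of the same shape as word_frequencies.csv.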


Conclusion & Future Work

After carefully analyzing the data and charts, we have come to the following conclusions to spot fake news:

  • The title length of fake news is longer (~14 words) in comparison to true news (~10 words).
  • Fake news is not specific to any subject and can be found among diverse categories or subjects.
  • Fake news or misinformation is rampant during major geopolitical events, such as the 2016 US election, so news monitoring is especially necessary during those times.
  • The presence of words like ‘trump’, ‘video’, ‘watch’, etc. in a news article is a red flag; the authenticity of such news should be re-checked or the content moderated.


This project used only one dataset, which may contain errors or biases. In the future, other data sources could be merged with this dataset, and sentiment analysis could be added for deeper insights.

References

https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset 

Dashboard

https://public.tableau.com/app/profile/lok.pandey/viz/C1Fakenewsproject-LokPandey/FakevsRealNewsDashboard

Wednesday, May 18, 2022

How I built a real estate sales representative website using WordPress

This video explains how I made a real estate sales representative website using WordPress. The website is built using the "blogwaves" theme and a number of plugins such as "Accordion Slider Gallery", "Easy Property Listings", "Ninja Forms", and "MetaSlider".

 

If you are interested in making a custom website, please give me a call at (+1) 647-401-8611 or email me at lokprakashpandey@gmail.com

Friday, January 3, 2020

SOLID Principles

The principles that SOLID outlines are important to writing clear, maintainable code. SOLID is an acronym that describes five principles developers should always consider when writing code. Those five principles are:
S – Single Responsibility
O – Open for Extension / Closed for Modification
L – Liskov Substitution
I – Interface Segregation
D – Dependency Inversion


Single Responsibility – A class should have a single responsibility. Generally, responsibilities are potential points of change. It follows that the more a class is responsible for, the more likely it will need to change as the application that uses it evolves. Keeping classes focused on a specific responsibility ensures that they perform their task effectively and efficiently.


Open for Extension / Closed for Modification – Classes are generally instantiated in several different parts of an application. Take, for example, a Customer class in an order-entry system. The Customer class will probably be instantiated when creating a new customer, updating a customer, adding orders to a customer, and so on. If a developer needs to modify the Customer class to meet a new requirement, that developer must understand where and why the class is used in every part of the application. This can quickly become overwhelming. By providing extensibility points in a class through inheritance, developers can instead override any methods or properties marked as virtual, adding new behavior without modifying the existing class.
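As a rough illustration of the Customer example, here is a minimal sketch. It is written in Python rather than C# only to keep it short (Python methods are overridable by default, so no virtual keyword is needed); the discount_rate extension point and the PremiumCustomer class are hypothetical names, not part of any real system.

```python
class Customer:
    def __init__(self, name):
        self.name = name

    def discount_rate(self):
        # Extension point: subclasses may override this method.
        return 0.0

    def price_after_discount(self, amount):
        # Existing behavior stays untouched when new customer types are added.
        return amount * (1 - self.discount_rate())


class PremiumCustomer(Customer):
    # New requirement handled by extension, not by editing Customer.
    def discount_rate(self):
        return 0.5


regular_price = Customer("Ann").price_after_discount(100)
premium_price = PremiumCustomer("Bob").price_after_discount(100)
print(regular_price, premium_price)
```

Code that already creates and uses Customer keeps working unchanged, which is the point of the principle.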


Liskov Substitution – This principle states that ‘if a child class C is a subtype of P then objects of type P may be replaced with objects of type C without altering the correctness of that program’. In other words, child classes must not break the functionality of the parent. The classic example is one in which we have a rectangle parent class and a square child class. The square class is-a rectangle, but when we constrain the square to keep its length and width equal, we break the Liskov Substitution principle because the parent can no longer be safely replaced with the child class.
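The rectangle/square problem can be made concrete with a small sketch. It is written in Python for brevity, and the class, method, and function names are illustrative assumptions.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)

    # Keeping the sides equal changes what the setters do.
    def set_width(self, width):
        self.width = self.height = width

    def set_height(self, height):
        self.width = self.height = height


def stretch(rect):
    """Caller written against Rectangle: expects an area of 5 * 4."""
    rect.set_width(5)
    rect.set_height(4)
    return rect.area()


rectangle_area = stretch(Rectangle(2, 3))
square_area = stretch(Square(2))
print(rectangle_area, square_area)
```

The caller gets a different area when handed a Square, so Square cannot stand in for Rectangle without changing the program's behavior.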


Interface Segregation – In alignment with the Single Responsibility principle, the Interface Segregation principle dictates that interfaces should be as simple and focused as possible. It’s better to have multiple interfaces that are focused than a single interface that tries to do everything. In a language like Java or C#, the designer and developer lose nothing by splitting a large interface into multiple interfaces, because a class can implement several interfaces. Since a class that implements an interface must implement every member the interface declares, small interfaces keep classes from having to implement members they do not need.
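A minimal sketch of the idea, using Python abstract base classes as a stand-in for Java or C# interfaces. The Describable and Rechargeable interfaces and their methods are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class Describable(ABC):
    @abstractmethod
    def my_message(self):
        """Return a short status message."""


class Rechargeable(ABC):
    @abstractmethod
    def recharge(self):
        """Return a message about recharging."""


class Cpu(Describable):
    # A CPU only needs the small Describable interface.
    def my_message(self):
        return "Calculating"


class Battery(Describable, Rechargeable):
    # A battery implements both focused interfaces.
    def my_message(self):
        return "Charging"

    def recharge(self):
        return "Recharging"


cpu = Cpu()
battery = Battery()
print(cpu.my_message(), battery.my_message(), battery.recharge())
```

Had both methods lived in one large interface, Cpu would have been forced to provide a recharge method it has no use for.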


Dependency Inversion – Classes should depend on abstractions rather than concretions. The Dependency Inversion principle specifies that we should move the instantiation of the classes that a higher-level class depends on outside of that class, thus inverting the dependency. This allows us to replace the dependency with another class without breaking the open/closed principle.

For example, consider the following code:

using System;

public class Program
{
    private static Computer _computer;
    public static void Main()
    {
        _computer = new Computer();
        _computer.Test();
    }
}

public class Computer
{
    private MotherBoard _mb;
    private Cpu _cpu;
    private Ram _ram;
    private HardDrive _hd;
    private PowerSupply _ps;
    public Computer()
    {
        _mb = new MotherBoard();
        _cpu = new Cpu();
        _ram = new Ram();
        _hd = new HardDrive();
        _ps = new PowerSupply();
    }

    public void Test()
    {
        Console.WriteLine(_mb.Motherboarding());
        Console.WriteLine(_cpu.Calculate());
        Console.WriteLine(_ram.StoringData());
        Console.WriteLine(_hd.WritingData());
        Console.WriteLine(_ps.ProducePower());
    }
}

internal class PowerSupply
{
    public string ProducePower()
    {
        return "Producing Power";
    }
}

internal class HardDrive
{
    public string WritingData()
    {
        return "Writing Data";
    }
}

internal class Ram
{
    public string StoringData()
    {
        return "Storing Data";
    }
}

internal class Cpu
{
    public string Calculate()
    {
        return "Calculating";
    }
}

internal class MotherBoard
{
    public string Motherboarding()
    {
        return "Connecting everything";
    }
}


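This code is not SOLID: Computer creates all of its concrete parts itself, so supporting another kind of device means modifying the class. What we want is to be able to build a mobile, computer, laptop, etc. as needed using SOLID principles. The final code may look like this:
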
using System;
using System.Collections.Generic;
namespace Test
{
    public class Program
    {
        private static void Build(Device dev)
        {
            dev.Test();
        }
        public static void Main()
        {
            Build(new Device(new List<IPart>{
                new MotherBoard(), new Cpu(),
                new Ram(), new HardDrive(),
                new PowerSupply() }));
            Build(new Device(new List<IPart>{
                new MotherBoard(), new Cpu(),
                new Ram(), new HardDrive(),
                new PowerSupply(), new Battery() }));
        }
    }
    //Open Closed Principle
    internal class Device
    {
        private List<IPart> _parts;
        //Dependency Inversion
        public Device(List<IPart> parts)
        {
            _parts = parts;
        }
        public void Test()
        {
            //Liskov Substitution Principle
            foreach(IPart p in _parts)
                Console.WriteLine(p.MyMessage());
        }
    }
    //Single Responsibility Principle
    internal class Battery : IPart
    {
        public string MyMessage()
        {
            return "Charging";
        }
    }
    internal class PowerSupply : IPart
    {
        public string MyMessage()
        {
            return "Producing Power";
        }
    }
    internal class HardDrive : IPart
    {
        public string MyMessage()
        {
            return "Writing Data";
        }
    }
    internal class Ram : IPart
    {
        public string MyMessage()
        {
            return "Storing Data";
        }
    }
    internal class Cpu : IPart
    {
        public string MyMessage()
        {
            return "Calculating";
        }
    }
    internal class MotherBoard : IPart
    {
        public string MyMessage()
        {
            return "Connecting everything";
        }
    }
    //Interface Segregation
    interface IPart
    {
        string MyMessage();
    }