SAS Hackathon Team Profiles (Past)

See the use cases and winners from past SAS Hackathon events!

Fake News Detection

Started ‎02-15-2021 by
Modified ‎10-20-2022 by
Views 4,158
Nupeak Tachyon Fake News Detection (HACKIN SAS 2021) #Short video
    Fake News Nupeak Tachyon (HACKIN SAS 2021) # Long Video
        Nupeak_Tachyon Presentation.mp4
            Team Name: Nupeak Tachyon
            Track: START UP
            Use Case: Fake News Detection
            Technology: NLP, ML
            Region: India
            Team lead: Jatin Pithva @Jatin_P
            Team members: @uttam631 @N_AK @toshi @HITESH_MALI

             

            Introduction

            The authenticity of information has become a longstanding issue affecting businesses and society, in both printed and digital media. On social networks, information spreads so quickly and is amplified so widely that distorted, inaccurate, or false information can cause real-world impact for millions of users within minutes. Recently, several public concerns about this problem, and some approaches to mitigate it, have been expressed.

            Fake news refers to misinformation or disinformation that is spread through word of mouth and traditional media and, more recently, through digital forms of communication such as edited videos, memes, unverified advertisements, and rumors propagated on social media. In this project, we discuss the problem by grouping detection proposals into three categories:

            1. Content Based
            2. Source Based
            3. Diffusion Based

            We describe two opposite approaches and propose an algorithmic solution that synthesizes the main concerns. We conclude by raising awareness of the concerns and opportunities for businesses that are currently seeking to help detect fake news automatically.

             

            Goal

            The main objective is to detect fake news, which is a classic text classification problem with a straightforward proposition. We need to build a model that can differentiate between “Real” news and “Fake” news and, at the same time, identify the sources that publish fake news.

            From different points of view:

            1. Citizens: Citizens can use this tool to identify fake news, see its source, and receive email alerts about the content of fake news.
            2. Government: The concerned government authority can use this model to identify the source of fake news and take immediate action.
            3. Publishers: Publishers can identify whether someone is using their broadcaster name for unauthenticated propaganda.

             

            Tools and Technology

            1. SAS Visual Text Analytics
            2. SAS VDMML
            3. Python
            4. SAS Viya
            5. SAS Studio
            6. SAS Visual Analytics
            7. Microsoft Azure

            Process Flow Diagram

             

            uttam631_0-1613051691277.png

             

             

            Project Implementation Approach

            uttam631_1-1613051691282.png

             

            Description

            1. Crawl the news from the different source URLs.
            2. Store the crawled news data in tabular format.
            3. Data cleansing: Articles with no body text, or with fewer than 10 words in the article body, are removed. These operations are performed on all the datasets to achieve a consistent format and structure. The relevant attributes are then selected after the data cleaning and exploration phase.
            4. Linguistic features: Certain textual characteristics are converted into numerical form so that they can be used as input for training the models.
            5. Feature selection: Select the correlated variables that are important for the model.
            6. The input features are used to train the different machine learning models. Each dataset is divided into training and testing data with a 70/30 split.
            7. The learning algorithms are trained with different hyperparameters to achieve maximum accuracy for a given dataset, with an optimal balance between variance and bias.
            8. Compare the output of all the models that we created.
            9. Identify the best-fit model.
            10. Use the output of the best-fit model to give the final conclusion on whether the news is true or fake.
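
            Steps 3, 4, and 6 above can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's SAS implementation: the `body` field name, the specific features in `linguistic_features`, and the fixed random seed are hypothetical choices. Only the 10-word threshold and the 70/30 split come from the description.

```python
import random
import re

MIN_WORDS = 10  # threshold stated in step 3


def clean_articles(articles):
    """Drop articles with no body text or fewer than MIN_WORDS words.

    Each article is assumed to be a dict with a "body" key (hypothetical schema).
    """
    return [a for a in articles if len((a.get("body") or "").split()) >= MIN_WORDS]


def linguistic_features(text):
    """Convert raw text into a few numeric features (illustrative choices only)."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "exclamation_count": text.count("!"),
        # Share of fully upper-case words longer than one letter.
        "uppercase_ratio": (
            sum(w.isupper() and len(w) > 1 for w in words) / n_words if n_words else 0.0
        ),
    }


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

            Calling `clean_articles` first, then `linguistic_features` on each remaining body, and finally `train_test_split` on the resulting rows mirrors the order of the steps in the list.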

            Model Implementation Approach

            The model output is shown below.

             

            Confusion Matrix:

                             Predicted REAL     Predicted FAKE
            Actual REAL      TRUE POSITIVE      FALSE NEGATIVE
            Actual FAKE      FALSE POSITIVE     TRUE NEGATIVE
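
            As a rough sketch of how the four cells of a confusion matrix turn into performance metrics, the following Python counts each cell and derives accuracy, precision, recall, and F1. It treats "REAL" as the positive class, as the matrix does. The label strings "REAL" and "FAKE" and the function names are assumptions made for illustration, and the source does not state which metrics the team actually reported.

```python
def confusion_counts(actual, predicted, positive="REAL"):
    """Count the four confusion-matrix cells for a binary classifier."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # real article predicted as real
        elif a == positive:
            fn += 1  # real article predicted as fake
        elif p == positive:
            fp += 1  # fake article predicted as real
        else:
            tn += 1  # fake article predicted as fake
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def metrics(counts):
    """Derive standard metrics from the cell counts, guarding against zero division."""
    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

            Passing the list of true labels and the list of predicted labels to `confusion_counts`, and its result to `metrics`, gives one comparable row of scores per model.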

            Visualization Report:

            uttam631_2-1613051691284.png

             

            Conclusion

            With the increasing popularity of social media, more and more people consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has strong negative impacts on individual users and broader society.

            The task of classifying news manually requires in-depth knowledge of the domain and expertise to identify anomalies in the text. In this project, we discussed the problem of classifying fake news articles using machine learning models and ensemble techniques. The data we used in our work was collected from different source URLs and contains news articles from various domains. The primary aim of the project is to identify patterns in text that differentiate fake articles from true news. We extracted different textual features from the articles using different SAS tools and used the feature set as input to the models. The learning models were trained and parameter-tuned to obtain optimal accuracy. Some models achieved comparatively higher accuracy than others. We used multiple performance metrics to compare the results for each algorithm. The ensemble learners showed an overall better score on all performance metrics compared to the individual learners.
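
            The source does not say which ensemble technique was used. As one possible illustration, a simple majority vote over the predictions of several individual models could look like the sketch below. The function name and the "REAL"/"FAKE" label strings are assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned per-model predictions by taking the most common label.

    predictions_per_model is a list with one prediction list per model;
    position i in every inner list refers to the same article.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)] for the top label.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

            Using an odd number of models avoids tied votes in the binary case.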

            Fake news detection has many open issues that require further attention. For instance, to reduce the spread of fake news, identifying the key elements involved in the spread of news is an important step. Machine learning techniques can be employed to identify the key sources involved in the spread of fake news.

            To detect fake news accurately, we check news against our model, identify the sources that continuously publish fake news, and identify the categories of fake news. The model will also help estimate the probability that fake news will spread.
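
            One simple way to surface sources that repeatedly publish fake news is to aggregate the model's predictions by source. The sketch below is a hypothetical illustration: the `(source, label)` record shape, the "FAKE" label string, and the `min_articles` parameter are assumptions, not part of the team's described solution.

```python
from collections import defaultdict


def fake_rate_by_source(records, min_articles=1):
    """Rank sources by the share of their articles predicted as fake.

    records is an iterable of (source, predicted_label) pairs.
    Sources with fewer than min_articles articles are skipped.
    """
    totals = defaultdict(int)
    fakes = defaultdict(int)
    for source, label in records:
        totals[source] += 1
        if label == "FAKE":
            fakes[source] += 1
    rates = {
        source: fakes[source] / totals[source]
        for source in totals
        if totals[source] >= min_articles
    }
    # Highest fake rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

            Raising `min_articles` keeps a source with a single misclassified article from appearing at the top of the ranking.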

            It will also help citizens and the government identify whether news is true or not, which is helpful in meaningful ways:

            • Reduced media noise
            • More optimal use of resources
            • Improved public sentiment toward the government’s handling of the upsurge
            • Improved business sentiment, so that business can run smoothly

             

            Comments

            Team Name -  

            Nupeak Tachyon

            Fake News Detection

            Team lead@Jatin_P

            Team Members :    @toshi  @HITESH_MALI  @N_AK   @uttam631 

            Hi Guys,

            We have submitted our use case videos:

            Short Video, Long Video

            We hope this use case helps strengthen the relationship between the government and the public, and we hope the jury members will consider our context.

            Our team also thanks SAS India and the entire SAS Hackathon team, who helped us think in different ways and guided our approach to achieving our use case.

            Great work team and all the very best 🙂

            Thanks a Lot..🙂

            Version history
            Last update:
            ‎10-20-2022 12:25 PM