SAS Hackathon Team Profiles (Past)

See the use cases and winners from past SAS Hackathon events!

Fake News Detection

Started ‎02-15-2021 by
Modified ‎10-20-2022 by
Views 4,376
Nupeak Tachyon Fake News Detection (HACKIN SAS 2021) #Short video
      Fake News Nupeak Tachyon (HACKIN SAS 2021) # Long Video
                                               
          Nupeak_Tachyon Presentation.mp4
                                                                                    
               
              Team Name Nupeak Tachyon
              Track START UP
              Use Case Fake News Detection
              Technology NLP, ML
              Region India
              Team lead Jatin Pithva @Jatin_P 
              Team members @uttam631 @N_AK  @toshi @HITESH_MALI 

               

              Introduction

              The authenticity of information has long been an issue for businesses and society, in both printed and digital media. On social networks, information spreads so quickly and is so amplified that distorted, inaccurate, or false information can cause real-world impact for millions of users within minutes. Public concern about this problem has grown recently, and several approaches to mitigating it have been proposed.

              Fake news refers to misinformation or disinformation in the country that is spread through word of mouth and traditional media and, more recently, through digital forms of communication such as edited videos, memes, unverified advertisements, and rumors propagated on social media. In this project, we discuss the problem by grouping the proposed detection approaches into three categories:

              1. Content Based
              2. Source Based
              3. Diffusion Based

              We describe two opposing approaches and propose an algorithmic solution that synthesizes their main concerns. We conclude by raising awareness of the concerns and opportunities facing businesses that are currently working to detect fake news automatically.

               

              Goal

              The main objective is to detect fake news, which is a classic text classification problem with a straightforward proposition: build a model that can differentiate between “Real” news and “Fake” news and, at the same time, identify the sources that publish fake news.

              From different points of view:

              1. Citizens- Citizens can use this tool to identify fake news, see its source, and receive email alerts about fake news content.
              2. Government- The government authority concerned can use this model to identify the source of fake news and take immediate action.
              3. Publishers- Publishers can identify whether someone is using their broadcaster name for unauthenticated propaganda.

               

              Tools and Technology

              1. SAS Visual Text Analytics
              2. SAS VDMML (Visual Data Mining and Machine Learning)
              3. Python
              4. SAS Viya
              5. SAS Studio
              6. SAS Visual Analytics
              7. Microsoft Azure

              Process Flow Diagram

               

              uttam631_0-1613051691277.png

               

               

              Project Implementation Approach

              uttam631_1-1613051691282.png

               

              Description

              1. Crawl the news from the different source URLs.
              2. Store the crawled news data in tabular format.
              3. Data Cleansing- Articles with no body text, or with fewer than 10 words in the article body, are removed. These operations are performed on all the datasets to achieve consistency of format and structure. The relevant attributes are then selected after the data cleaning and exploration phase.
              4. Linguistic features- Certain textual characteristics are converted into numerical form so that they can be used as input for training the models.
              5. Feature Selection- Select the correlated variables that are important for the model.
              6. The input features are used to train the different machine learning models. Each dataset is divided into training and testing data with a 70/30 split.
              7. The learning algorithms are trained with different hyperparameters to achieve maximum accuracy for a given dataset, with an optimal balance between variance and bias.
              8. Compare the output of all the models that we created.
              9. Identify the best-fit model.
              10. Give the final conclusion, based on the output of the best-fit model, on whether the news is true or fake.
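
The cleansing, feature, and split steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's SAS implementation: the record layout (a dict with a "body" key), the function names, and the particular features are hypothetical choices. Only the 10-word threshold and the 70/30 split come from the description.

```python
import random
import re

MIN_WORDS = 10  # from step 3: articles with fewer than 10 words are removed


def clean(articles):
    """Keep only articles whose body has at least MIN_WORDS words.

    An empty body has zero words, so it is dropped as well.
    """
    return [a for a in articles if len(a["body"].split()) >= MIN_WORDS]


def linguistic_features(body):
    """Convert raw text into a few numeric features (illustrative choices)."""
    words = re.findall(r"[A-Za-z']+", body)
    n_words = len(words)
    if n_words == 0:
        return {"word_count": 0, "avg_word_length": 0.0,
                "exclamation_count": body.count("!"),
                "uppercase_word_ratio": 0.0}
    shouted = sum(1 for w in words if w.isupper() and len(w) > 1)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "exclamation_count": body.count("!"),
        "uppercase_word_ratio": shouted / n_words,
    }


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it, 70/30 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed keeps the split reproducible, which matters when several models are compared on the same training and testing data.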

              Model Implementation Approach

              The model output is shown below.

               

              Confusion Matrix:

                                    Predicted Class: REAL    Predicted Class: FAKE
              Actual Class: REAL    TRUE POSITIVE            FALSE NEGATIVE
              Actual Class: FAKE    FALSE POSITIVE           TRUE NEGATIVE
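
The four cells of a confusion matrix can be counted directly from paired actual and predicted labels. The sketch below assumes REAL is the positive class and treats any other label as negative; the function names are hypothetical, and the metrics shown (accuracy, precision, recall) are common choices rather than the specific ones the team reported.

```python
def confusion_counts(actual, predicted, positive="REAL"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["tp"] += 1   # real news predicted as real
        elif a == positive:
            counts["fn"] += 1   # real news predicted as fake
        elif p == positive:
            counts["fp"] += 1   # fake news predicted as real
        else:
            counts["tn"] += 1   # fake news predicted as fake
    return counts


def metrics(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Accuracy alone can mislead when one class dominates the dataset, which is one reason to look at precision and recall alongside it.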

              Visualization Report:

              uttam631_2-1613051691284.png

               

              Conclusion

              With the increasing popularity of social media, more and more people consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has strong negative impacts on individual users and broader society.

              The task of classifying news manually requires in-depth knowledge of the domain and expertise to identify anomalies in the text. In this project, we discussed the problem of classifying fake news articles using machine learning models and ensemble techniques. The data we used in our work was collected from different source URLs and contains news articles from various domains. The primary aim of the project is to identify patterns in text that differentiate fake articles from true news. We extracted different textual features from the articles using different SAS tools and used the feature set as input to the models. The learning models were trained and parameter-tuned to obtain optimal accuracy. Some models achieved comparatively higher accuracy than others. We used multiple performance metrics to compare the results for each algorithm. The ensemble learners showed an overall better score on all performance metrics than the individual learners.
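
The write-up does not say which ensemble technique was used, so the following is only a sketch of one common option, hard (majority) voting. The three rule-based learners are hypothetical stand-ins for trained models, included solely so the example runs on its own.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers, text):
    """Ask every classifier for a label and return the majority label."""
    return majority_vote([clf(text) for clf in classifiers])


# Hypothetical stand-in learners; real ones would be trained models.
def exclamation_rule(text):
    return "FAKE" if text.count("!") >= 2 else "REAL"


def caps_rule(text):
    shouted = sum(1 for w in text.split() if w.isupper() and len(w) > 3)
    return "FAKE" if shouted >= 1 else "REAL"


def clickbait_rule(text):
    return "FAKE" if "you won't believe" in text.lower() else "REAL"


CLASSIFIERS = [exclamation_rule, caps_rule, clickbait_rule]
```

With an odd number of learners and two classes, a vote can never tie, so a single weak learner that misfires is outvoted by the other two.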

              Fake news detection has many open issues that require further attention. For instance, to reduce the spread of fake news, identifying the key elements involved in the spread of news is an important step. Machine learning techniques can be employed to identify the key sources involved in the spread of fake news.

              To detect fake news accurately, we run news through our model, identify the sources that continuously publish fake news, and identify the categories of the fake news. The model will also help to estimate the probability of fake news spreading.
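
Once each article has a predicted label, finding sources that repeatedly publish fake news is a simple aggregation. The sketch below assumes predictions arrive as (source, label) pairs; that layout, the function names, and the min_fake threshold are hypothetical.

```python
from collections import Counter


def fake_counts_by_source(predictions):
    """Count how many articles from each source were labelled FAKE."""
    return Counter(source for source, label in predictions if label == "FAKE")


def repeat_offenders(predictions, min_fake=2):
    """List (source, fake_count) pairs at or above the threshold,
    ordered from most to fewest fake articles."""
    counts = fake_counts_by_source(predictions)
    return [(s, n) for s, n in counts.most_common() if n >= min_fake]
```

The threshold keeps a source from being flagged on the strength of a single misclassified article.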

              It will also help citizens and the government to identify whether news is true or not, which is helpful in meaningful ways:

              • Reduced media noise
              • More optimal use of resources
              • Improved public sentiment toward the government’s handling of the upsurge
              • Improved business sentiment, so that business can run smoothly
               

              Comments

              Team Name -  

              Nupeak Tachyon

              Fake News Detection

              Team lead@Jatin_P

              Team Members :    @toshi  @HITESH_MALI  @N_AK   @uttam631 

              Hi Guys,

              We have submitted our use case videos:

              Short Video. Long Video.

              I hope this use case helps to strengthen government and public relations, and we hope the jury members will consider our context.

              Our team also thanks SAS India and the entire SAS Hackathon team, who helped us think in different ways and guided our approach to achieving our use case.

              Great work team and all the very best 🙂

              Thanks a Lot..🙂

              Version history
              Last update:
              ‎10-20-2022 12:25 PM
              Updated by:
