<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fraud Detection Using Supervised Machine Learning in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/543520#M7727</link>
    <description>&lt;P&gt;Use PROC PLS or PROC HPGENSELECT to select variables.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Mar 2019 13:19:15 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2019-03-15T13:19:15Z</dc:date>
    <item>
      <title>Fraud Detection Using Supervised Machine Learning</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/543448#M7726</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am a student working on a project to identify fraud in e-commerce transactions.&lt;/P&gt;&lt;P&gt;Fraud (the target) is a rare event (around 5% of the observations), which leads the model to classify everything as not fraud.&lt;/P&gt;&lt;P&gt;I am looking for an oversampling method that does not discard most of the observations. When I use sampling to make the fraud/not-fraud split 50/50, the program keeps only the 2000 fraud cases and 2000 randomly selected non-fraud observations, so I lose the rest of the dataset, which is almost 25000 observations.&lt;/P&gt;&lt;P&gt;Is there a way to make the dataset 50/50 (fraud, not fraud) without removing observations? My idea is to duplicate the fraud observations in the training dataset until it is 50%/50%, while leaving the test dataset as it was (95% not fraud, 5% fraud). Is there a step-by-step method for doing this in SAS?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also have a lot of variables, almost 600. I am thinking of using PCA to find the most relevant ones. How can I do this in SAS?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2019 09:29:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/543448#M7726</guid>
      <dc:creator>hakonstrand</dc:creator>
      <dc:date>2019-03-15T09:29:38Z</dc:date>
    </item>
    <item>
      <title>Re: Fraud Detection Using Supervised Machine Learning</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/543520#M7727</link>
      <description>&lt;P&gt;Use PROC PLS or PROC HPGENSELECT to select variables.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2019 13:19:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/543520#M7727</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-03-15T13:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: Fraud Detection Using Supervised Machine Learning</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/546187#M7743</link>
      <description>&lt;P&gt;Here are some ideas on how to deal with rare events:&lt;/P&gt;&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/emxndg/67980/HTML/default/viewer.htm#p1w6fewo0jhzxdn1rytuk1kt0pqj.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/emxndg/67980/HTML/default/viewer.htm#p1w6fewo0jhzxdn1rytuk1kt0pqj.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a paper about a useful method called SMOTE (Synthetic Minority Over-sampling Technique). Unfortunately, this implementation works only on continuous variables:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/3604-2018.pdf" target="_blank"&gt;https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/3604-2018.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2019 15:03:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Fraud-Detection-Using-Supervised-Machine-Learning/m-p/546187#M7743</guid>
      <dc:creator>MBRACH</dc:creator>
      <dc:date>2019-03-26T15:03:19Z</dc:date>
    </item>
  </channel>
</rss>

