<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reduce Data by using Random Sample in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Reduce-Data-by-using-Random-Sample/m-p/665614#M199064</link>
    <description>&lt;P&gt;Modify your post so that it doesn't rely on an Excel file. This might entice more replies.&lt;/P&gt;</description>
    <pubDate>Sun, 28 Jun 2020 06:39:04 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-06-28T06:39:04Z</dc:date>
    <item>
      <title>Reduce Data by using Random Sample</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-Data-by-using-Random-Sample/m-p/665135#M198815</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I posted a similar question a couple of weeks ago; however, this time I need to perform a slightly different exercise.&lt;/P&gt;&lt;P&gt;I attached 'have' and 'want' tables in the Excel file; please have a look.&lt;/P&gt;&lt;P&gt;This is just a snapshot of what I have: my data covers over 10 years, with over a hundred million records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I have: several years of monthly historical account data, with an indicator of a late payment in the respective month and a rate (the share of accounts with a late payment in a given month). Within each month the accounts are unique; for example, Aug-10 has 1,270,000 unique accounts, while the entire dataset has 33M non-unique account records. Another point to mention: some accounts get closed over time (once that happens, they no longer appear in the following months), while new accounts are added. This is why the month-over-month (MoM) total account count goes up, but not all accounts from the previous month appear in the following month.&lt;/P&gt;&lt;P&gt;This is why table 'want' would have 1,647,283 unique accounts (all distinct accounts that appear at least once during the observation period).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I need is the following:&lt;/P&gt;&lt;P&gt;I need to reduce the data to the unique-account level without losing the insights from the monthly late-payment rate and the overall late-payment rate.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So far, I have tried two approaches:&lt;/P&gt;&lt;P&gt;1. SurveySelect with STRATA by account and SAMPSIZE=1&lt;/P&gt;&lt;P&gt;2. rand('Uniform'), followed by sorting and NODUPKEY by account&lt;/P&gt;&lt;P&gt;In both cases, I managed to reduce the data to one record per unique account; however, when running frequencies for each month and in total, the rates were not close to the rates in 'have'. For example, the oldest months came back with a rate of ~6.5% and the most recent months with ~3.5%, while the total rate was ~5.2%.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone suggest a procedure or steps to handle this through random selection/exclusion, so that the data is reduced while all the insightful information is kept?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jun 2020 19:35:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-Data-by-using-Random-Sample/m-p/665135#M198815</guid>
      <dc:creator>_MVB_</dc:creator>
      <dc:date>2020-06-25T19:35:26Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce Data by using Random Sample</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-Data-by-using-Random-Sample/m-p/665614#M199064</link>
      <description>&lt;P&gt;Modify your post so that it doesn't rely on an Excel file. This might entice more replies.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Jun 2020 06:39:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-Data-by-using-Random-Sample/m-p/665614#M199064</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-28T06:39:04Z</dc:date>
    </item>
  </channel>
</rss>

