<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reduce a dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570529#M160889</link>
    <description>&lt;P&gt;Hi Patrick, thanks for the reply. I used a DATA step to extract a certain number of observations. I will work with that first and see how I progress later.&lt;/P&gt;</description>
    <pubDate>Tue, 02 Jul 2019 12:51:24 GMT</pubDate>
    <dc:creator>Anita_n</dc:creator>
    <dc:date>2019-07-02T12:51:24Z</dc:date>
    <item>
      <title>Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570462#M160854</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I don't really know if this question makes sense, but I will just ask it.&lt;/P&gt;&lt;P&gt;Is there any way to reduce a SAS dataset of about 9 million observations to, let's say, 20,000 and still keep the major variables? There are some duplicates, but I will exclude those. I am only familiar with PROC SUMMARY/PROC FREQ. I don't know if there is something better.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 09:19:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570462#M160854</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T09:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570463#M160855</link>
      <description>&lt;P&gt;I don't understand this question. Do you simply want to remove duplicates? Or do you want to remove some specific variables?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 09:21:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570463#M160855</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-02T09:21:12Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570468#M160859</link>
      <description>&lt;P&gt;The duplicates have been removed but the dataset is still too large. I want to reduce this&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 09:50:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570468#M160859</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T09:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570470#M160861</link>
      <description>&lt;P&gt;Too large to do what?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There are several ways to reduce the size of your data set. You can:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- &lt;A href="https://documentation.sas.com/?docsetId=lestmtsref&amp;amp;docsetTarget=n1capr0s7tilbvn1lypdshkgpaip.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" target="_self"&gt;Drop variables&lt;/A&gt;&amp;nbsp;you do not need&lt;/P&gt;
&lt;P&gt;- Compress your data set.&lt;/P&gt;
&lt;P&gt;- Delete observations you do not need&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
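&lt;P&gt;As a minimal sketch, the three options could be combined in one DATA step. The data set names have and want, the variable names unused_var1, unused_var2 and visit_date, and the date cutoff are placeholders, not taken from your data. Note that compression only reduces the size on disk, not the number of observations:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* have, want and all variable names are hypothetical placeholders */
data want (compress=yes);                  /* store the output data set compressed */
set have (drop=unused_var1 unused_var2);   /* drop variables you do not need */
if visit_date lt '01JAN2015'd then delete; /* delete observations you do not need */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;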
&lt;P&gt;.....&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 10:06:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570470#M160861</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-02T10:06:49Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570475#M160863</link>
      <description>&lt;P&gt;I need to check the plausibility of data in a databank against some given standards using a specific program.&lt;/P&gt;&lt;P&gt;To do that, I have to do some preformatting and reduce the dataset in SAS before exporting the data to the other program.&lt;/P&gt;&lt;P&gt;The external program cannot handle large datasets as SAS does. That is why I need to reduce the data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two ids in the original data set: one is the patient id and the other is the ailment id (the ailment id is responsible for the duplicates). That means a patient can have two or more different kinds of ailment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I used PROC FREQ/PROC SUMMARY to remove these, so I have the frequency in the new dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My problem is that I also need to get rid of the patient id because the data is still very large, but I still need some variables from the original data, which I would perhaps merge back into the new data with a left join. But if I drop the patient id, I no longer have an id to use as the key.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any solution to that?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 10:26:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570475#M160863</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T10:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570482#M160869</link>
      <description>&lt;P&gt;It depends on what you are going to do with the external program.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For instance, the first record for a given patient id may be sufficient to do further work as it may have all variables of interest except "ailment". Data Step programming knowledge may be required in addition to PROCs.&lt;/P&gt;
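&lt;P&gt;A minimal sketch of that idea, assuming the data set is called have and the patient id variable is called patient_id (both names are placeholders, as are have_sorted and want):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* have, have_sorted, want and patient_id are hypothetical names */
proc sort data=have out=have_sorted;
by patient_id;
run;

data want;
set have_sorted;
by patient_id;
if first.patient_id; /* keep only the first record of each patient */
run;&lt;/CODE&gt;&lt;/PRE&gt;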
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is better to start with a small SAS data set and build from there what you want to take out of it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The question is still vague, so a practical suggestion can't be given.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 10:49:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570482#M160869</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2019-07-02T10:49:36Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570485#M160871</link>
      <description>&lt;P&gt;OK, thanks. I will try using a DATA step.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 10:56:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570485#M160871</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T10:56:03Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570506#M160876</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It sounds like you need to take a sample of the data for your further analysis. You can use PROC SURVEYSELECT for that or, if SAS/STAT is not licensed, there are also DATA step approaches for the same task.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/kb/24/722.html" target="_blank"&gt;http://support.sas.com/kb/24/722.html&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
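&lt;P&gt;A minimal sketch for a simple random sample of 20,000 observations with PROC SURVEYSELECT; the data set names have and want and the seed value are placeholders:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* have and want are hypothetical data set names; the seed is arbitrary */
proc surveyselect data=have out=want
  method=srs      /* simple random sampling without replacement */
  sampsize=20000  /* number of observations to select */
  seed=12345;     /* fixed seed makes the sample reproducible */
run;&lt;/CODE&gt;&lt;/PRE&gt;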
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Whether a sample is suitable depends, of course, on your downstream data assessment, as it won't contain all the records with outliers.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 11:59:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570506#M160876</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-07-02T11:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570529#M160889</link>
      <description>&lt;P&gt;Hi Patrick, thanks for the reply. I used a DATA step to extract a certain number of observations. I will work with that first and see how I progress later.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 12:51:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570529#M160889</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T12:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570535#M160894</link>
      <description>&lt;P&gt;Extract every Nth observation, and determine N in a separate step:&lt;/P&gt;
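&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* sampling interval N: 9,000,000 observations / 20,000 wanted = 450 */
data _null_;
call symputx('factor',int(9000000/20000));
run;

/* keep every Nth observation of have */
data want;
set have;
if not mod(_n_,&amp;amp;factor); /* mod() is 0 (false) on every Nth row, so NOT keeps exactly those rows */
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>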
      <pubDate>Tue, 02 Jul 2019 12:58:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570535#M160894</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-07-02T12:58:08Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570541#M160897</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This actually reduces the dataset to about 1900 obs. I just wonder what happened here.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 13:11:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570541#M160897</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2019-07-02T13:11:42Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570675#M160949</link>
      <description>&lt;P&gt;What does the log say?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 17:59:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570675#M160949</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-07-02T17:59:04Z</dc:date>
    </item>
    <item>
      <title>Re: Reduce a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570822#M161006</link>
      <description>&lt;P&gt;Just to give you an easy proof:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
do i = 1 to 9000000;
  output;
end;
run;

data _null_;
call symputx('factor',int(9000000/20000));
run;

data want;
set have;
if not mod(_n_,&amp;amp;factor); /* not true = 0 */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Log of the final data step:&lt;/P&gt;
&lt;PRE&gt;34         data want;
35         set have;
36         if not mod(_n_,&amp;amp;factor); /* not true = 0 */
37         run;

NOTE: There were 9000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 20000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           1.27 seconds
      cpu time            0.72 seconds&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Jul 2019 05:35:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reduce-a-dataset/m-p/570822#M161006</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-07-03T05:35:46Z</dc:date>
    </item>
  </channel>
</rss>

