<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using macro to split a big data into subsets does not work in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917304#M361335</link>
    <description>&lt;P&gt;I am trying to split a big data file into several subsets. The subsets only have two repeated observations. Why my code does not work? How come the two duplicated records on each subsets?&amp;nbsp; Thank you for being so helpful!&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;options mlogic ;&lt;/DIV&gt;&lt;DIV&gt;options MPRINT;&lt;/DIV&gt;&lt;DIV&gt;options SYMBOLGEN;&lt;/DIV&gt;&lt;DIV&gt;data tt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; do i=1 to 1e7;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; KEY = int(ranuni(0)*1e5);&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; output;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; end;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; run;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;%macro split_1(N=3, data=tt);&lt;/DIV&gt;&lt;DIV&gt;%do i=1 %to &amp;amp;N;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; data or_sub&amp;amp;i;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; IF i &amp;lt; &amp;amp;N&amp;nbsp; THEN&amp;nbsp; do;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; %let&amp;nbsp; &amp;nbsp;a = %eval(10000*&amp;amp;i);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; %let b = %eval(10000+10000*&amp;amp;i);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;set tt(FirstObs=&amp;amp;a obs=&amp;amp;b&amp;nbsp; );&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;end;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; run;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;%put ERROR: &amp;amp;i;&lt;/DIV&gt;&lt;DIV&gt;%end;&lt;/DIV&gt;&lt;DIV&gt;%mend split_1;&lt;/DIV&gt;&lt;DIV&gt;%split_1(N=3, data=Tt)&lt;/DIV&gt;</description>
    <pubDate>Thu, 22 Feb 2024 05:41:48 GMT</pubDate>
    <dc:creator>victorleehc0</dc:creator>
    <dc:date>2024-02-22T05:41:48Z</dc:date>
    <item>
      <title>Using macro to split a big data into subsets does not work</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917304#M361335</link>
      <description>&lt;P&gt;I am trying to split a big data file into several subsets. The subsets only have two repeated observations. Why my code does not work? How come the two duplicated records on each subsets?&amp;nbsp; Thank you for being so helpful!&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;options mlogic ;&lt;/DIV&gt;&lt;DIV&gt;options MPRINT;&lt;/DIV&gt;&lt;DIV&gt;options SYMBOLGEN;&lt;/DIV&gt;&lt;DIV&gt;data tt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; do i=1 to 1e7;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; KEY = int(ranuni(0)*1e5);&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; output;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; end;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; run;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;%macro split_1(N=3, data=tt);&lt;/DIV&gt;&lt;DIV&gt;%do i=1 %to &amp;amp;N;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; data or_sub&amp;amp;i;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; IF i &amp;lt; &amp;amp;N&amp;nbsp; THEN&amp;nbsp; do;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; %let&amp;nbsp; &amp;nbsp;a = %eval(10000*&amp;amp;i);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; %let b = %eval(10000+10000*&amp;amp;i);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;set tt(FirstObs=&amp;amp;a obs=&amp;amp;b&amp;nbsp; );&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;end;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; run;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;%put ERROR: &amp;amp;i;&lt;/DIV&gt;&lt;DIV&gt;%end;&lt;/DIV&gt;&lt;DIV&gt;%mend split_1;&lt;/DIV&gt;&lt;DIV&gt;%split_1(N=3, data=Tt)&lt;/DIV&gt;</description>
      <pubDate>Thu, 22 Feb 2024 05:41:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917304#M361335</guid>
      <dc:creator>victorleehc0</dc:creator>
      <dc:date>2024-02-22T05:41:48Z</dc:date>
    </item>
    <item>
      <title>Re: Using macro to split a big data into subsets does not work</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917308#M361339</link>
      <description>&lt;P&gt;What SAS code do you think that macro is going to generate?&lt;/P&gt;
&lt;P&gt;Let's reformat the code so it is easier to tell the difference between the macro statements and the SAS statements it is going to generate.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Indenting the macro statements and SAS statements independently will make it clearer the scope of your code blocks make it harder to confuse the macro code for the SAS code.&lt;/P&gt;
&lt;P&gt;Let's also move the macro statements to where they are actually going to execute relative to when the generated SAS code will execute.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro split_1(N=3, data=tt);
  %do i=1 %to &amp;amp;N;
    %let a = %eval(10000*&amp;amp;i);
    %let b = %eval(10000+10000*&amp;amp;i);
data or_sub&amp;amp;i;
  IF i &amp;lt; &amp;amp;N  THEN  do;
    set tt(FirstObs=&amp;amp;a obs=&amp;amp;b  );
  end;
run;
    %put ERROR: &amp;amp;i;
  %end;
%mend split_1;
%split_1(N=3, data=Tt)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;What do you think that IF statement is doing?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Why do you write an ERROR message to the log every time the %DO loop executes?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After you work that out then look at the LOGIC you appear to be trying to generate.&amp;nbsp; It looks like perhaps you meant to use the macro to generate SAS code like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data or_sub1;
  set tt(FirstObs=10000 obs=20000  );
run;
data or_sub2;
  set tt(FirstObs=20000 obs=30000  );
run;
data or_sub3;
  set tt(FirstObs=30000 obs=40000  );
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Why did you skip the first 9,999 observations?&amp;nbsp; Why did you write the 10,000th observation to both OR_SUB1 and to OR_SUB2?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What is the actual goal of the macro?&amp;nbsp; Split it HOW?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's assume the goal is to split into datasets with no more than 10,000 observations each. A much faster way to do that would be something like this that only reads the dataset once creating all of the subsets at once.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data or_sub1 or_sub2 or_sub3 ;
  set tt;
  select;
    when (_n_ &amp;lt; 10001) output or_sub1;
    when (_n_ &amp;lt; 20001) output or_sub2;
    when (_n_ &amp;lt; 30001) output or_sub3;
    otherwise do; put 'ERROR: More observations than expected.'; stop; end;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;PRE&gt;67   data class1 class2 class3 ;
68     set sashelp.class;
69     select;
70       when (_n_ &amp;lt; 6)  output class1;
71       when (_n_ &amp;lt; 11) output class2;
72       when (_n_ &amp;lt; 16) output class3;
73       otherwise do; put 'ERROR: More observations than expected.'; stop; end;
74     end;
75   run;

ERROR: More observations than expected.
NOTE: There were 16 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS1 has 5 observations and 5 variables.
NOTE: The data set WORK.CLASS2 has 5 observations and 5 variables.
NOTE: The data set WORK.CLASS3 has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
&lt;/PRE&gt;
&lt;P&gt;So to generate that step you will need two separate %DO loops.&amp;nbsp; One to make the list of datasets in the DATA statement. And the second to generate the series of WHEN () statements for the SELECT block.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2024 06:16:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917308#M361339</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-02-22T06:16:45Z</dc:date>
    </item>
    <item>
      <title>Re: Using macro to split a big data into subsets does not work</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917326#M361350</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/297600"&gt;@victorleehc0&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am trying to split a big data file into several subsets.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Why? How are you going to analyze this big data when it is separated into many data sets? Depending on your answer, there may be much better solutions than using a macro.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2024 12:48:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-macro-to-split-a-big-data-into-subsets-does-not-work/m-p/917326#M361350</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-02-22T12:48:13Z</dc:date>
    </item>
  </channel>
</rss>

