<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to split large dataset into smaller one in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841878#M41628</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/366820"&gt;@_el_doredo&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;As you know, in most of our daily work we don't need this. But interviewers ask these kinds of questions, so I want to be sure about it. I don't want to be in a position where I don't know something at this level.&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If an interviewer asks a question about splitting data sets into many smaller data sets, then the first answer should be that this is rarely necessary in SAS, which has much better tools for dealing with "groups" of data points than splitting the data set into many smaller parts. Those tools are the BY statement and the WHERE statement. So, in answering the interviewer's question, you should say that you would first check whether those tools would be a better solution for the specific problem.&lt;/P&gt;</description>
    <pubDate>Tue, 01 Nov 2022 14:31:51 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2022-11-01T14:31:51Z</dc:date>
    <item>
      <title>How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841857#M41617</link>
      <description>&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;Hello Experts,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;I need to split a large data set, and I know how many smaller data sets I want from it.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;I want to split it into 3 data sets with equal counts if the total number of observations is divisible by 3; otherwise the first data set should have the higher observation count.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;This code gives me the result I want.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro temp(dataset=,noofsplits=);
data %do i=1 %to &amp;amp;noofsplits.;
  split&amp;amp;i. %end;;
  retain x;
  set &amp;amp;dataset. nobs=nobs;
  if _n_ eq 1 then do;
    if mod(nobs,&amp;amp;noofsplits.) eq 0 then
      x=int(nobs/&amp;amp;noofsplits.);
    else x=int(nobs/&amp;amp;noofsplits.)+1;
  end;
  %do i=1 %to &amp;amp;noofsplits;
    %if &amp;amp;i. &amp;gt; 1 %then %do;
      else %end;
    if _n_ le (&amp;amp;i.*x) then output split&amp;amp;i.;
  %end;
run;
%mend temp;
%temp(dataset=sashelp.cars , noofsplits=3);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;SASHELP.CARS has 428 observations, so I get 143 in the first data set, 143 in the second and 142 in the third. But I do not clearly understand this part of the code: &lt;FONT color="#FF0000"&gt;%if &amp;amp;i. &amp;gt; 1 %then %do; else %end;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;If I remove these lines from my code, the result breaks completely (143 observations in the first data set, 286 in the second and 428 in the third). Clearly these lines make each data set start from the observation after the last one written to the previous data set, but I do not see how that is done.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;It would be very helpful if someone could clarify this for me. If there is an easier way to achieve the same result without hard-coding the number of data sets needed, that would be most welcome.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;Thanks in Advance&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 13:28:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841857#M41617</guid>
      <dc:creator>_el_doredo</dc:creator>
      <dc:date>2022-11-01T13:28:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841859#M41618</link>
      <description>&lt;P&gt;You can just write out the code generated by this line, or have SAS do it by turning on the macro debugging option MPRINT. Run this line of code, then run the macro again, and you will see in the log exactly the code generated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options mprint;&lt;/CODE&gt;&lt;/PRE&gt;
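&lt;P&gt;For example, a complete run might look like the sketch below. It assumes the %temp macro from the original post has already been compiled in the current session; switching the option off again afterwards is optional.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options mprint;    /* write the SAS code generated by the macro to the log */
%temp(dataset=sashelp.cars, noofsplits=3);
options nomprint;  /* switch the option off again */&lt;/CODE&gt;&lt;/PRE&gt;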
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It turns out that this will be the code generated:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if _n_ le (1*x) then output split1; /* (when &amp;amp;i=1) */
else if _n_ le (2*x) then output split2; /* (when &amp;amp;i=2) */
else if _n_ le (3*x) then output split3; /* (when &amp;amp;i=3) */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I usually question why splitting a data set like this is necessary. In fact, I usually discourage people from doing it, since in most cases there is little or no benefit (although there are exceptions, of course). Why are you doing this? Why can't you have just one data set with the numbers 1 through 3 stored in a new variable?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 13:44:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841859#M41618</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-11-01T13:44:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841861#M41619</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I have tried that and I understand it. What I do not understand is how, when &amp;amp;i is 2, it knows to start from observation 144.&lt;BR /&gt;&lt;BR /&gt;When &amp;amp;i is 1, the condition in %if &amp;amp;i. &amp;gt; 1 %then %do; else %end; is false, but split1 still gets 143 observations.&lt;BR /&gt;When &amp;amp;i is 2, how does it know to start from observation 144 (since 143 observations were sent to split1)?&lt;BR /&gt;&lt;BR /&gt;That is my concern. Can you also please show me the approach you are suggesting?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 13:46:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841861#M41619</guid>
      <dc:creator>_el_doredo</dc:creator>
      <dc:date>2022-11-01T13:46:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841864#M41620</link>
      <description>&lt;P&gt;This part of the macro always executes; it is not conditional. Since the loop goes from 1 to 3, it is executed three times.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if _n_ le (&amp;amp;i.*x) then output split&amp;amp;i.;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This part of the macro code only executes when &amp;amp;i is greater than 1:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;else &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The loop goes from 1 to 3, so it is only executed for &amp;amp;i=2 and &amp;amp;i=3. So you get the resulting code I showed.&lt;/P&gt;
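&lt;P&gt;To see why split2 starts at observation 144, it may help to run the expanded code by hand. The following is only a sketch, assuming three splits and SASHELP.CARS as in the original post; CEIL is used here as a shorthand that should give the same x as the INT/MOD logic in the macro.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data split1 split2 split3;
  set sashelp.cars nobs=nobs;
  retain x;
  /* 428 observations / 3 splits, rounded up: x = 143 */
  if _n_ eq 1 then x = ceil(nobs / 3);
  /* an ELSE branch is tested only when every earlier condition was false */
  if _n_ le (1*x) then output split1;      /* observations 1-143   */
  else if _n_ le (2*x) then output split2; /* observations 144-286 */
  else if _n_ le (3*x) then output split3; /* observations 287-428 */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Nothing "remembers" where split1 stopped. For observation 144 the first condition is simply false, so SAS moves on to the first ELSE branch, and that one is true.&lt;/P&gt;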
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You also asked for simpler approaches. I usually question why splitting a data set in this fashion is necessary. In fact, I usually discourage people from doing this, since in most cases there is little or no benefit (although there are exceptions, of course). Why are you doing this? Why can't you have just one data set with the numbers 1 through 3 stored in a new variable? Wouldn't that be simpler?&lt;/SPAN&gt;&lt;/P&gt;
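&lt;P&gt;As a rough illustration of the single-data-set idea, here is a minimal sketch. The data set name cars_split, the variable name split and the global macro variable noofsplits are assumptions made for this example; MSRP is a variable in SASHELP.CARS.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let noofsplits = 3;  /* assumed: number of groups wanted */

data cars_split;      /* hypothetical output data set name */
  set sashelp.cars nobs=nobs;
  /* rows per group, rounded up: 143 when nobs = 428 */
  split = ceil(_n_ / ceil(nobs / &amp;amp;noofsplits.));
run;

/* one analysis per group, no separate data sets needed */
proc means data=cars_split mean;
  by split;
  var msrp;
run;

/* or look at a single group only */
proc print data=cars_split (obs=5);
  where split = 2;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Changing the number of groups then only means changing one value, and the group sizes should match the macro's (143, 143 and 142 for three groups).&lt;/P&gt;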
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 13:52:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841864#M41620</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-11-01T13:52:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841870#M41621</link>
      <description>If I remove this code (%if &amp;amp;i. &amp;gt; 1 %then %do; else %end;), why do I not get the correct result?&lt;BR /&gt;&lt;BR /&gt;With the method you suggest, we have to hard-code it. I create a variable called temp to store 1 to 3, and then my code will look like this:&lt;BR /&gt;if temp=1 then output split1;&lt;BR /&gt;else if temp=2 then output split2;&lt;BR /&gt;else if temp=3 then output split3;&lt;BR /&gt;&lt;BR /&gt;But suppose in the future I want to create 10 data sets; then I need to write more lines. So I don't want to hard-code the value here.</description>
      <pubDate>Tue, 01 Nov 2022 13:57:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841870#M41621</guid>
      <dc:creator>_el_doredo</dc:creator>
      <dc:date>2022-11-01T13:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841871#M41622</link>
      <description>&lt;P&gt;Just to muddy the waters a little bit ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The macro you use takes the first 143 observations and puts them in the first split. Would there be any advantage to a slightly different approach? For example, take observations 1, 4, 7, 10, etc. for the first split and observations 2, 5, 8, 11, etc. for the second split? Or even a random assignment?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 13:59:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841871#M41622</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2022-11-01T13:59:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841872#M41623</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/366820"&gt;@_el_doredo&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;If I remove this code (%if &amp;amp;i. &amp;gt; 1 %then %do; else %end;), why do I not get the correct result?&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Because you have modified the logic in the code, you get different results. You should not get the same results if you change the logic. There is no ELSE now in the code, and so different things happen. Take the three lines of code I wrote in my first reply, and look at them with the ELSE removed. What do you think will happen when that code executes with no ELSE in the code?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;With the method you suggest, we have to hard-code it. I create a variable called temp to store 1 to 3, and then my code will look like this:&lt;BR /&gt;if temp=1 then output split1;&lt;BR /&gt;else if temp=2 then output split2;&lt;BR /&gt;else if temp=3 then output split3;&lt;BR /&gt;&lt;BR /&gt;But suppose in the future I want to create 10 data sets; then I need to write more lines. So I don't want to hard-code the value here.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;BR /&gt;Absolutely not. There is no hard-coding of anything (other than the number 3 which is a macro argument) anywhere in what I am suggesting. But none of this explains why you need separate data sets. What can you do with separate data sets that you can't do with a single data set that has the split number in a variable?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 14:04:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841872#M41623</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-11-01T14:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841873#M41624</link>
      <description>&lt;P&gt;This bit of code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%if &amp;amp;i. &amp;gt; 1 %then %do; else %end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;is generating the SAS code: else.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you remove that bit of code, instead of the macro generating:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if _n_ le (1*x) then output split1; /* (when &amp;amp;i=1) */
else if _n_ le (2*x) then output split2; /* (when &amp;amp;i=2) */
else if _n_ le (3*x) then output split3; /* (when &amp;amp;i=3) */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;it will generate:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if _n_ le (1*x) then output split1; /* (when &amp;amp;i=1) */
if _n_ le (2*x) then output split2; /* (when &amp;amp;i=2) */
if _n_ le (3*x) then output split3; /* (when &amp;amp;i=3) */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That will give you the wrong result, because each IF is now tested independently for every observation.&lt;/P&gt;
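&lt;P&gt;A quick way to check this is to run the version without ELSE by hand. This is only a sketch, assuming three splits and SASHELP.CARS (428 observations) as in the original post, with CEIL standing in for the macro's INT/MOD calculation of x.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data split1 split2 split3;
  set sashelp.cars nobs=nobs;
  retain x;
  if _n_ eq 1 then x = ceil(nobs / 3);  /* x = 143 for 428 observations */
  /* no ELSE: all three conditions are tested for every observation */
  if _n_ le (1*x) then output split1;   /* observations 1-143 */
  if _n_ le (2*x) then output split2;   /* observations 1-286 */
  if _n_ le (3*x) then output split3;   /* observations 1-428 */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Observation 1, for instance, satisfies all three conditions and is written to all three data sets, which is where the counts of 143, 286 and 428 reported in the question come from.&lt;/P&gt;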
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The easiest way to see this is to run the code with MPRINT turned on, so you can see the generated code in the log.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 14:03:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841873#M41624</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2022-11-01T14:03:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841874#M41625</link>
      <description>As you know, in most of our daily work we don't need this. But interviewers ask these kinds of questions, so I want to be sure about it. I don't want to be in a position where I don't know something at this level.&lt;BR /&gt;&lt;BR /&gt;Thank you so much for explaining. I will try the method you suggest.</description>
      <pubDate>Tue, 01 Nov 2022 14:06:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841874#M41625</guid>
      <dc:creator>_el_doredo</dc:creator>
      <dc:date>2022-11-01T14:06:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841875#M41626</link>
      <description>&lt;P&gt;What about this?&lt;/P&gt;
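&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro temp(dataset=,noofsplits=);
/* list one output data set per split: split1 split2 ... */
data
%do i = 1 %to &amp;amp;noofsplits.;
  split&amp;amp;i.
%end;
;
set &amp;amp;dataset.;
/* round-robin: the remainder of _n_ picks the target data set */
/* (remainder 0 goes to split1, remainder 1 to split2, and so on) */
select (mod(_n_,&amp;amp;noofsplits.));
%do i = 1 %to &amp;amp;noofsplits.;
  when (%eval(&amp;amp;i. - 1)) output split&amp;amp;i.;
%end;
end;
run;
%mend temp;&lt;/CODE&gt;&lt;/PRE&gt;</description>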
      <pubDate>Tue, 01 Nov 2022 14:06:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841875#M41626</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-11-01T14:06:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841876#M41627</link>
      <description>Thanks a lot</description>
      <pubDate>Tue, 01 Nov 2022 14:11:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841876#M41627</guid>
      <dc:creator>_el_doredo</dc:creator>
      <dc:date>2022-11-01T14:11:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to split large dataset into smaller one</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841878#M41628</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/366820"&gt;@_el_doredo&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;As you know, in most of our daily work we don't need this. But interviewers ask these kinds of questions, so I want to be sure about it. I don't want to be in a position where I don't know something at this level.&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If an interviewer asks a question about splitting data sets into many smaller data sets, then the first answer should be that this is rarely necessary in SAS, which has much better tools for dealing with "groups" of data points than splitting the data set into many smaller parts. Those tools are the BY statement and the WHERE statement. So, in answering the interviewer's question, you should say that you would first check whether those tools would be a better solution for the specific problem.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 14:31:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-split-large-dataset-into-smaller-one/m-p/841878#M41628</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-11-01T14:31:51Z</dc:date>
    </item>
  </channel>
</rss>

