<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: separate the dataset based on the year in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758952#M239729</link>
    <description>&lt;P&gt;Why do you want to separate the dataset at all?&lt;/P&gt;
&lt;P&gt;You could simplify the code by using call execute:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   create table work.years as
      select distinct year(appln_filing_date) as year
         from have;
quit;

data _null_;
   set work.years;
   
   call execute(cats('data work.want_', year, ';'));
   call execute('set work.have;');
   call execute(cats('where year(appln_filing_date)=', year, ';'));
   call execute('run;');   
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 03 Aug 2021 05:26:52 GMT</pubDate>
    <dc:creator>andreas_lds</dc:creator>
    <dc:date>2021-08-03T05:26:52Z</dc:date>
    <item>
      <title>separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758947#M239724</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to separate the dataset based on the year of each observation (from 1985 to 2020). Is there any method to simplify the following code? I do not want to use %MACRO because it is a large dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;	data want_1985
		want_1986
		want_1987
		...
		want_2019
		want_2020
		;
		set have;
		if year(appln_filing_date)=1985 then output want_1985;
		if year(appln_filing_date)=1986 then output want_1986;
		...
		if year(appln_filing_date)=2019 then output want_2019;
		if year(appln_filing_date)=2020 then output want_2020;
	run;&lt;/PRE&gt;&lt;P&gt;Many thanks in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 04:34:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758947#M239724</guid>
      <dc:creator>Alexxxxxxx</dc:creator>
      <dc:date>2021-08-03T04:34:58Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758951#M239728</link>
      <description>&lt;P&gt;That's about as simple as you can make it unless you are willing to use a Macro.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But what's this about a large dataset and a Macro?&amp;nbsp; Macros just generate SAS code.&amp;nbsp; Macros don't cause problems with large datasets -- as long as you generate good code with your Macro.&amp;nbsp; How the code performs with a large dataset is dependent on your programming skill, not on the Macro facility.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OK, now, look.&amp;nbsp; First you need two little macros, like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%MACRO	Gen_Dataset_Names(First, Last);
	%DO	i	=	&amp;amp;First	%TO	&amp;amp;Last;
		Want_&amp;amp;i
	%END;
%MEND	Gen_Dataset_Names;

%MACRO	Gen_If_Statements(First, Last);
	%DO	i	=	&amp;amp;First	%TO	&amp;amp;Last;
		%IF	&amp;amp;i	&amp;gt;	&amp;amp;First	%THEN
			%DO;
				else
				if year(appln_filing_date) = &amp;amp;i then output want_&amp;amp;i;
			%END;
		%ELSE	
			%DO;
				if year(appln_filing_date) = &amp;amp;i then output want_&amp;amp;i;
			%END;
	%END;
%MEND	Gen_If_Statements;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The first generates the output dataset names for the Data statement.&amp;nbsp; The second generates the If statements.&amp;nbsp; By the way, notice the "else" I've placed in with the If statements.&amp;nbsp; This else will make your code execute&amp;nbsp;&lt;EM&gt;faster&lt;/EM&gt; than the way that you had it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We can test this code with the following Put statements,&amp;nbsp;&lt;STRONG&gt;but&lt;/STRONG&gt; we have to remove the semi-colons (temporarily) from the If statements or there will be an error.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%PUT %Gen_Dataset_Names(1986, 2020);
%PUT %Gen_If_Statements(1986, 2020);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here's the result from the first.&amp;nbsp; It's&amp;nbsp;&lt;STRONG&gt;just text&lt;/STRONG&gt;.&amp;nbsp; Macros are&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt; innately slow.&lt;/P&gt;
&lt;PRE&gt;Want_1986   Want_1987   Want_1988   Want_1989   Want_1990   Want_1991   Want_1992   Want_1993   Want_1994   Want_1995   
Want_1996   Want_1997   Want_1998   Want_1999   Want_2000   Want_2001   Want_2002   Want_2003   Want_2004   Want_2005   
Want_2006   Want_2007   Want_2008   Want_2009   Want_2010   Want_2011   Want_2012   Want_2013   Want_2014   Want_2015   
Want_2016   Want_2017   Want_2018   Want_2019   Want_2020
&lt;/PRE&gt;
&lt;P&gt;Here's the result from the second.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if year(appln_filing_date) = 1986 then output want_1986     else     if year(appln_filing_date) = 1987 then output want_1987 
    else     if year(appln_filing_date) = 1988 then output want_1988     else     if year(appln_filing_date) = 1989 then 
output want_1989     else     if year(appln_filing_date) = 1990 then output want_1990     else     if 
year(appln_filing_date) = 1991 then output want_1991     else     if year(appln_filing_date) = 1992 then output want_1992    
 else     if year(appln_filing_date) = 1993 then output want_1993     else     if year(appln_filing_date) = 1994 then output 
want_1994     else     if year(appln_filing_date) = 1995 then output want_1995     else     if year(appln_filing_date) = 1996 then output want_1996     else     if year(appln_filing_date) = 1997 then output want_1997     else     if 
year(appln_filing_date) = 1998 then output want_1998     else     if year(appln_filing_date) = 1999 then output want_1999    
 else     if year(appln_filing_date) = 2000 then output want_2000     else     if year(appln_filing_date) = 2001 then output 
want_2001     else     if year(appln_filing_date) = 2002 then output want_2002     else     if year(appln_filing_date) = 
2003 then output want_2003     else     if year(appln_filing_date) = 2004 then output want_2004     else     if 
year(appln_filing_date) = 2005 then output want_2005     else     if year(appln_filing_date) = 2006 then output want_2006    
 else     if year(appln_filing_date) = 2007 then output want_2007     else     if year(appln_filing_date) = 2008 then output 
want_2008     else     if year(appln_filing_date) = 2009 then output want_2009     else     if year(appln_filing_date) = 
2010 then output want_2010     else     if year(appln_filing_date) = 2011 then output want_2011     else     if 
year(appln_filing_date) = 2012 then output want_2012     else     if year(appln_filing_date) = 2013 then output want_2013    
 else     if year(appln_filing_date) = 2014 then output want_2014     else     if year(appln_filing_date) = 2015 then output 
want_2015     else     if year(appln_filing_date) = 2016 then output want_2016     else     if year(appln_filing_date) = 
2017 then output want_2017     else     if year(appln_filing_date) = 2018 then output want_2018     else     if 
year(appln_filing_date) = 2019 then output want_2019     else     if year(appln_filing_date) = 2020 then output want_2020&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It's not very tidy, but it will run (once we put the semi colons back in on the If statements).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The final program would look like the below.&amp;nbsp; I don't have your data otherwise I would run it.&amp;nbsp; I could have a typo or something, but this is&amp;nbsp; in general proper SAS code.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA	%Gen_Dataset_Names(1986, 2020);
	SET	Have;
	%Gen_If_Statements(1986, 2020);
RUN;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 05:34:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758951#M239728</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-03T05:34:01Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758952#M239729</link>
      <description>&lt;P&gt;Why do you want to separate the dataset at all?&lt;/P&gt;
&lt;P&gt;You could simplify the code by using call execute:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   create table work.years as
      select distinct year(appln_filing_date) as year
         from have;
quit;

data _null_;
   set work.years;
   
   call execute(cats('data work.want_', year, ';'));
   call execute('set work.have;');
   call execute(cats('where year(appln_filing_date)=', year, ';'));
   call execute('run;');   
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 03 Aug 2021 05:26:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758952#M239729</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-08-03T05:26:52Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758956#M239731</link>
      <description>&lt;P&gt;Very good,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15475"&gt;@andreas_lds&lt;/a&gt;!&amp;nbsp; That's an excellent idea to use Call Execute in that fashion.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One caution perhaps:&amp;nbsp; If it's a truly large dataset, then one could have a performance problem.&amp;nbsp; The entire dataset must be read for every one of the Call Executes, which, for years 1986 through 2020, would be about 35 times.&amp;nbsp; If, say, the dataset were 500,000,000 rows, then you could be in for a bit of a wait.&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If one executes the code in a single Data step, as in my expanded reply, above, then the dataset would be read only once.&amp;nbsp; I like the compactness of your solution, and if there were a performance problem, then one could switch to a single pass solution such as the one I added to my initial reply.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 05:47:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758956#M239731</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-03T05:47:39Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758966#M239733</link>
      <description>&lt;P&gt;Define "large" in terms of observation count and physical dataset size.&lt;/P&gt;
&lt;P&gt;Which analysis do you intend to do? All analysis procedures allow the use of BY or CLASS, so splitting is not necessary.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 05:56:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758966#M239733</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-08-03T05:56:50Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758968#M239734</link>
      <description>&lt;P&gt;I expect that using where to filter the observations can compensate the overhead partially, but you are right, a macro will most likely perform better, if the data is really large. Maybe &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/262815"&gt;@Alexxxxxxx&lt;/a&gt; can share some facts about the dimension of the dataset: number of variables, number of observations.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 05:58:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758968#M239734</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-08-03T05:58:19Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758971#M239739</link>
      <description>&lt;P&gt;After some tests i have to &lt;SPAN class="VIiyi"&gt;&lt;SPAN class="JLqJ4b ChMk0b" data-language-for-alternatives="en" data-language-to-translate-into="de" data-phrase-index="0"&gt;&lt;SPAN&gt;adapt expectations to reality:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="VIiyi"&gt;&lt;SPAN class="JLqJ4b ChMk0b" data-language-for-alternatives="en" data-language-to-translate-into="de" data-phrase-index="0"&gt;&lt;SPAN&gt;I created a dataset with 635000 obs and 21 variables. Splitting with call execute takes nearly five seconds, while using a macro the split was done less than half a second.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="VIiyi"&gt;&lt;SPAN class="JLqJ4b ChMk0b" data-language-for-alternatives="en" data-language-to-translate-into="de" data-phrase-index="0"&gt;&lt;SPAN&gt;The macro:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro split;
   %local minYear maxYear;
   proc sql noprint;
      select min(year(appln_filing_date)), max(year(appln_filing_date)) 
         into :minYear trimmed, :maxYear trimmed
         from work.have;
   quit;
   
   %local year;
   
   data 
      %do year = &amp;amp;minYear. %to &amp;amp;maxYear.;
         work.want_&amp;amp;year.
      %end;
      ;
      
      set work.have;
      
      select(year(appln_filing_date));
         %do year = &amp;amp;minYear. %to &amp;amp;maxYear.;
            when (&amp;amp;year) output work.want_&amp;amp;year.;
         %end;
      end;      
   run;
%mend;

%split;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 03 Aug 2021 06:49:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758971#M239739</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-08-03T06:49:09Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758975#M239740</link>
      <description>&lt;P&gt;One dynamically created data step will always be faster than a series of data steps, one for every selection. In the case of years, one can create a macro that does the loops internally:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro split(start,end);
%local i;
data
%do i = &amp;amp;start. %to %end.;
  want&amp;amp;i.
%end;
;
set have;
%do i = &amp;amp;start. %to &amp;amp;end.;
if year = &amp;amp;i. then output want&amp;amp;i.;
%end;
run;
%mend;
%split&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But still, splitting will move the contents of the dataset three times through I/O:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;to determine the split values (but only necessary if the edge values are not known beforehand)&lt;/LI&gt;
&lt;LI&gt;to read the dataset&lt;/LI&gt;
&lt;LI&gt;to write the split datasets&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;and all this before you have done ANY analysis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In all my professional work with SAS, I have done this exactly &lt;EM&gt;once&lt;/EM&gt;, and only because sorting a large dataset would have cracked my quota, preventing me from testing a production program in my personal environment (the batch user has no quotas). I split along the top-level sort variable, sorted/summarized the splits, and concatenated them back. It also has the side effect that the program can't eat too much work storage, possibly colliding with other batch jobs running concurrently.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 07:51:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/758975#M239740</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-08-03T07:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: separate the dataset based on the year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/759013#M239766</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15475"&gt;@andreas_lds&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;      select(year(appln_filing_date));
         %do year = &amp;amp;minYear. %to &amp;amp;maxYear.;
            when (&amp;amp;year) output work.want_&amp;amp;year.;
         %end;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Nice use of the select statement.&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":thumbs_up:"&gt;👍&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding the use of:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;   proc sql noprint;
      select min(year(appln_filing_date)), max(year(appln_filing_date)) 
         into :minYear trimmed, :maxYear trimmed
         from work.have;
   quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;that's going to depend a bit on the business need, at least that's how it works in my company.&amp;nbsp; Our database has many years in it, but we typically only want the current year + the prior three years.&amp;nbsp; So, is it better to use SQL to detect which years are present or is it better to build the macro with years as arguments like the below?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%Gen_Dataset_Names(1986, 2020);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Well, as say, "it depends."&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, as for the CALL EXECUTEs that you coded, even if it's not the fastest here, it's still a valuable technique to be aware of -- launching multiple steps from within a single Data step.&amp;nbsp; CALL EXECUTEs are run &lt;STRONG&gt;serially&lt;/STRONG&gt;, i.e. one after the other.&amp;nbsp; Another option, instead of CALL EXECUTE is RSUBMIT (or SYSTASK) which launches processes in &lt;STRONG&gt;parallel&lt;/STRONG&gt;, i.e. concurrently.&amp;nbsp; Parallel processing, depending on the requirements, can significantly speed overall processing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Tue, 03 Aug 2021 13:24:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/separate-the-dataset-based-on-the-year/m-p/759013#M239766</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-03T13:24:11Z</dc:date>
    </item>
  </channel>
</rss>

