BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

(My post here may seem verbose, but I hope that it will at least be clearly stated with the help of the context, the example dataset, and my codes.)

 

CONTEXT: I have a very large dataset that contains a date variable, and I am trying to come up with a way to subset this large dataset BY the year of this date variable (i.e. so that one subset contains ALL rows for 2018, and another contains ALL rows for 2019. So on and so forth). 

 

IDEA: Instead of using an if ... then output for each of the years in my data, I have attempted (and failed) to use a do loop and macro variables. (I'd also welcome other approaches that will get me the same results.) 

 

Just to illustrate my idea, assume that we have an imaginary table called 'test' that contains the birth year of a few kids:

 

data test;
input yr;
datalines;
2019
2019
2018
2018
2018
2017
2016
2016
2015
2015
;

The table above contains years from 2015 to 2019 (spanning 5 years). And I've written the following codes in an attempt to programmatically to create 5 subsets by the birth year:

 

 

 

** To save the max and min year as macro var **;
proc sql noprint;
  select min(yr), max(yr)
  into :min_yr, 
       :max_yr
  from test;
quit;

%let int = %eval(&max_yr - &min_yr); **int denotes the spanning interval**;

** Subset using do loop **;
data _null_;
  set test;
  do i=0 to &int.;
    call symput('iter', i);
    if yr = %eval(&min_yr. + i) then output "data&iter.";
  end;
run;

Unfortunately, my codes did not work, because inside the %eval( ) function, the "i" was not assigned a value. And SAS log had an error message that reads: A character operand was found in the %eval function ... where a numeric operand is required.

 

 

Sorry for making this so lengthy, but I'd really appreciate some help.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
if yr = %eval(&min_yr. + i) then output "data&iter.";

The %eval function needs macro variables or numbers to work properly. It doesn't know what to do with the data step variable i

 

When trying to get macros like this to work, you first need to have a data step that works without macros and without macro variables. Then, you should be able to substitute the macro variables and macro code to get it to work. Then, also, you could see that putting a name of a data set inside double quotes, as you did above, is not necessary, and will not work.

 

So, this works without macro variables and without macros (and I have saved myself some typing by assuming the only years are 2015 2016 and 2017).

 

data data2015 data2016 data2017;
  set test;
  if yr = 2015 then output data2015;
  if yr = 2016 then output data2016;
  if yr = 2017 then output data2017;
run;

So now you can turn this into a macro by placing macro loops in the places where the years are shown above. The macro processor really just replaces text upon program execution, and so you want a macro variable (let's call it &y) to have values (in the loop) going from 2015 to 2017.

 

%macro dothis;
     data %do y=&min_yr %to &max_yr; data&y %end; ;
         set test;
         %do y=&min_yr %to &max_yr;
              if yr=&y then output data&y;
          %end;
       run;
%mend;
%dothis

 

Finally, as a general comment, but usually an important concept that probably applies here ... it is inefficient to split up data sets into yearly data sets. If you need analyses by year, and you keep the entire data set whole (instead of splitting it up), you can get yearly analyses by using the BY command in your SAS code. If I were you, I would see if this works satisfactorily before considering splitting the data up into data sets by year. In other words, my advice to you is: DON'T DO THIS unless you are convinced that there's no way to get what you want with a single large data set.

 

 

 

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26
if yr = %eval(&min_yr. + i) then output "data&iter.";

The %eval function needs macro variables or numbers to work properly. It doesn't know what to do with the data step variable i

 

When trying to get macros like this to work, you first need to have a data step that works without macros and without macro variables. Then, you should be able to substitute the macro variables and macro code to get it to work. Then, also, you could see that putting a name of a data set inside double quotes, as you did above, is not necessary, and will not work.

 

So, this works without macro variables and without macros (and I have saved myself some typing by assuming the only years are 2015 2016 and 2017).

 

data data2015 data2016 data2017;
  set test;
  if yr = 2015 then output data2015;
  if yr = 2016 then output data2016;
  if yr = 2017 then output data2017;
run;

So now you can turn this into a macro by placing macro loops in the places where the years are shown above. The macro processor really just replaces text upon program execution, and so you want a macro variable (let's call it &y) to have values (in the loop) going from 2015 to 2017.

 

%macro dothis;
     data %do y=&min_yr %to &max_yr; data&y %end; ;
         set test;
         %do y=&min_yr %to &max_yr;
              if yr=&y then output data&y;
          %end;
       run;
%mend;
%dothis

 

Finally, as a general comment, but usually an important concept that probably applies here ... it is inefficient to split up data sets into yearly data sets. If you need analyses by year, and you keep the entire data set whole (instead of splitting it up), you can get yearly analyses by using the BY command in your SAS code. If I were you, I would see if this works satisfactorily before considering splitting the data up into data sets by year. In other words, my advice to you is: DON'T DO THIS unless you are convinced that there's no way to get what you want with a single large data set.

 

 

 

--
Paige Miller
aaronh
Quartz | Level 8
Thank you so much, Paige! The only reason I was trying to use a do loop here, instead of the if then output, is because I will likely use the same procedure again, sometimes without knowing exactly what years are present in my data. But I can always use the data step if then output if I have no better ways to do this. Thanks again!
PaigeMiller
Diamond | Level 26

@aaronh wrote:
Thank you so much, Paige! The only reason I was trying to use a do loop here, instead of the if then output, is because I will likely use the same procedure again, sometimes without knowing exactly what years are present in my data. But I can always use the data step if then output if I have no better ways to do this. Thanks again!

Except I also advised you to NOT break up the data set into data sets by year, except as a last resort. I think what you are trying to do is unnecessary extra work that only slows things down in the long run, IN MOST CASES.

--
Paige Miller
aaronh
Quartz | Level 8
The macro %dothis was very cool, thank you! I will keep in mind not to make the subsets unless it's necessary to do so. Thanks for the suggestions and reminders!
aaronh
Quartz | Level 8
p.s. thanks for providing the thought process behind how you arrived at the solution. It really helped me understand why you did these.
Tom
Super User Tom
Super User

You are close, but as you saw you cannot use mix data step and macro logic in that way.

It is easier to just have the data step generate the code.  One way to generate code from a data step is to use CALL EXECUTE().

So first get the range of years.  Do you want all years from min to max, even those without any records?

proc summary data=test ;
  var yr ;
  output out=range min=min_yr max=max_yr;
run;
data _null_;
   set range;
   call execute('data');
   do yr=min_yr to max_yr ;
      call execute(' '||cats('data',yr));
   end;
   call execute(';set test;');
   do yr=min_yr to max_yr;
      call execute(catx(' ','if yr=',yr,'then output',cats('data',yr),';'));
    end;
   call execute('run;');
run;

Or just the years that exist?

proc summary data=test nway;
  class yr ;
  output out=range;
run;
data _null_;
   call execute('data');
   do until(eof1);
     set range end=eof1;
     call execute(' '||cats('data',yr));
   end;
   call execute(';set test;');
   do until(eof2);
     set range end=eof2;
     call execute(catx(' ','if yr=',yr,'then output',cats('data',yr),';'));
   end;
   call execute('run;');
run;
novinosrin
Tourmaline | Level 20

Hi @aaronh 

 

Index your yr column and that will do an implicit sort and then hash is a breeze

 


data test;
input yr;
datalines;
2019
2019
2018
2018
2018
2017
2016
2016
2015
2015
;


proc sql ;                                                              
  create index  yr on test (yr) ;                      
quit ;                                                                  
                                                                        
data _null_ ;                                                           
  if _n_ = 1 then do ;                                                  
    dcl hash h () ;                                                     
    h.definekey  ("_n_") ;                                              
    h.definedata ('yr') ;
    h.definedone () ;                                                   
  end ;                                                                 
  do _n_ = 1 by 1 until (last.yr) ;                            
    set test ;                                                          
    by yr ;                                                    
    h.add() ;                                                           
  end ;                                                                 
  h.output (dataset: catx ("_", "year", yr)) ;          
  h.clear() ;                                                           
run ;   

 

 

ballardw
Super User

If your date variable is an actual SAS date variable for a very large number of purposes you can create groups directly as you describe without adding any variable just use the correct format:

data example;
   do date = '01Jan1960'd to '18Jul2020'd;
   output;
   end;
run;

proc freq data=example;
   tables date;
   format date year4.;
run;

or

proc freq data=example;
   tables date;
   format date yyq6.;
run;

or

proc freq data=example;
   tables date;
   format date yymon.;
run;

Groups created by a format applied to a SAS date variable are honored by most analysis and graphing procedures as well as reporting and summarizing such as Proc Freq , Tabulate, Report.

 

Or leave the data together and only subset as needed:

 

proc freq data=example;
   where year(date) > 2010;
   tables date;
   format date yymon.;
run;
Reeza
Super User
Why are you trying to split it up? If you're exporting, using ODS EXCEL with BY group processing can put each data set to a unique worksheet automatically. Or there are many other options. Usually the only reason to split it up, it's too big for SAS UE or a statistical proc that relies on memory, or for proc optmodel. Otherwise, there's almost always a better approach.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3988 views
  • 3 likes
  • 6 in conversation