BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Dansas5
Fluorite | Level 6

Hi guys,

My challenge is to find a way of enabling SAS to output the number of datasets determined by the number of month macro variables.

The source datasets are large and therefore created for each separate month.

However, I need to compare summary results of specific variables across the months.

The code I've written partially automates the data gathering step. It's the second part which I've not yet found an elegant way of enabling SAS to create the relevant number of datasets, hence my question.

 

First part/section of code declares:

%LET CurrentMonth = '30Jun2024'D

%LET INTNXIncrement = 6;

 

Two DATA steps follow creating the "MonthKeys" needed because the source dataset names end with the suffix yyyymm, e.g. SourceDS_202405, SourceDS_202404, etc.

These two DATA steps create the relevant MonthKeys via an INTNX statement in which the increment macro is called, in this case 6 for six (prior) months, i.e.:

DATA DS_1;

FORMAT MonthEndDates DATE9.;

DO i=1 TO &INTNXIncrement;

   MonthEndDates = INTNX('MONTH',&CurrentMonth,-i,'E');

   OUTPUT;

END;

DATA DS_2;

SET DS_1;

IF LENGTH(STRIP(MONTH(MonthEndDates ))) = 1 THEN MonthKeys=YEAR(MonthEndDates )||"0"||STRIP(MONTH(MonthEndDates ));
ELSE MonthKeys=YEAR(MonthEndDates )||STRIP(MONTH(MonthEndDates ));

RUN;

The DO loop outputs the end dates of the 6 months preceding the current month, the second DATA step simply converts these month-end dates to the yyymm format. 

 

Finally, those MonthKeys are then converted into macro variables via a PROC SQL step:

PROC SQL;
SELECT COUNT(*) INTO :NObs
FROM DS_2;
SELECT MonthKeys INTO :Month_Key1-:Month_Key%LEFT(&NObs)
FROM DS_2;
QUIT;

We now have an array of 6 macro variables in the yyymm format going back 6 months before the current month, i.e.:

Month_Key1, Month_Key2... MonthKey6 or

202405, 202404... 202312.

 

So the last step needed is one capable of outputting only the relevant number of datasets (in this case 6).

It would know to create datasets until the last macro variable is called and processed. Each step or iteration would refer to a different source dataset distinguished by the suffix and those suffixes are already stored in memory as macro variables.

This would avoid writing the code as shown below:

PROC SQL;

CREATE TABLE NewDS_&Month_Key1 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey1;

GROUP BY 1;

.

.

.

CREATE TABLE NewDS_&Month_Key6 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey6;

GROUP BY 1;

QUIT;

I.e. the total PROC SQL hard-coded steps being dependent on the number of months being analysed, which is no where near ideal.

 

Ultimately, all these grouped/summarised datasets need to be concatenated so that a monthly comparison can be made of each variable.

Hence, this last DATA step also needs refinement, instead of the hard-code approach currently used -

DATA NewDS_Combined;

SET NewDS_&Month_Key1... NewDS_&Month_Key6;

RUN;

 

My guess is either a DO loop or a macro DATA step is required to create the "n" number of datasets and then to concatenate/set them.

I would be very grateful for your advice.

Dan

 

1 ACCEPTED SOLUTION

Accepted Solutions
Dansas5
Fluorite | Level 6

Hi @Tom, apologies for my belated reply.

Your suggestion of combining a DO loop within a macro using a PROC SQL step has worked!

I also created another macro appending all the resulting new monthly summary datasets.

But in the end I managed to incorporate that DATA concatenating step within the existing CreateNewDS macro, as shown below.

Hence, the combination of the DATA _null_ step and the macro completely and beautifully solves my much-needed query.

I am indebted to you both @ballardw and @Tom for your assistance, you guys are the real stars!

Last, but not least, thank you for reminding me about PROC SUMMARY. I shall need to refresh my memory of that procedure too.

 

LIBNAME ...

%LET CurrentMonth = '30Jun2024'D

%LET INTNXIncrement = 6;

 

DATA _NULL_;
DO i=1 TO &INTNXIncrement;
     CALL SYMPUTX("Month_Key"||PUT(i,best3. -L),PUT(INTNX('MONTH',&CurrentMonth,-i,'E'),yymmn6.));
END;
RUN;
 
%MACRO CreateNewDS;
%DO i=1 %TO 6;
CREATE TABLE NewDS_&&Month_Key&i AS
  SELECT Month_Key, SUM(Var1) as Var1, SUM(Var2) as Var2, SUM(Var3) as Var3
  FROM SourceDS_&&MonthKey&i
  WHERE Var4 = "Adj520"
  GROUP BY 1;
%END;
 
DATA CombinedNewDS;
SET 
%DO i = 1 %TO 6;
  NewDS_&&Month_Key&i
%END;;
RUN;
 
%MEND;
%CreateNewDS
 

View solution in original post

8 REPLIES 8
ballardw
Super User

Besides not understanding why you need multiple data sets I don't believe that I understand what you want.

 

Please show the exact list of names you want to create for that given Currentdate and Intnxincrement

 

Note: If you find yourself tempted to write stuff like this again look to a proper FORMAT. SAS even allows creating custom formats for dates if needed

but

IF LENGTH(STRIP(MONTH(MonthEndDates ))) = 1 THEN MonthKeys=YEAR(MonthEndDates )||"0"||STRIP(MONTH(MonthEndDates ));
ELSE MonthKeys=YEAR(MonthEndDates )||STRIP(MONTH(MonthEndDates ));

one has problems as the automatic conversions for numeric to numeric can be proplematic

but try

monthkeys = put(monthenddates,yymmn6.);

(not terribly fond of variables indicating plurals as they generally should only have one value at a time)

 

And this makes the macro variables directly in the first step:

DATA _null_;
  DO i=1 TO &INTNXIncrement;
     call symputx("Month_key"||put(i,best3. -L),put(INTNX('MONTH',&CurrentMonth,-i,'E'),yymmn6.));
  end;
run;

I don't see anything actually related to extracting or creating multiple data sets from one so I suspect it is problematic.

 

Personally I suspect you would be way better off creating a start and end date of interest, subsetting the large data set into one based on those two dates, sort by the date variable and use:

Proc summary data=smallerdataset;
   by datevalue;
   format datevalue yymmn6.
   var var1 var2 var3 ;
   output out=want (drop=_type_ _freq_) sum= ;
run;

summary and single result set in one step.

 

Look up CALL EXECUTE as a way to create code using values in a data set (or loop).

Or use a data step to write text for the syntax to a file and then use %include to execute it.

 

Dansas5
Fluorite | Level 6

Many and sincere thanks @ballardw for your advice.

Apologies for not making the need clear.

The source datasets I am working with are all monthly tables, so each dataset relates to data for each month separately.

Therefore, to analyse how a variable changes across/over months it is necessary to combine or concatenate data from the relevant monthly datasets.

In short, I'm not dealing with one large dataset that needs subsetting by the month keys.

These monthly dataset names all follow this convention, i.e. x_y where x = an unchanging string and y being the plural yyyymm reference.

In creating the yyyymm macro variables, I was heading towards the PUT statement you've suggested, which is a far superior approach, so cheers for that.

I obviously need to familiarise myself more with CALL SYMPUTX (have often used CALL SYMPUT previously though).

May I ask what is the purpose of the "-L" in the first PUT statement in the DATA _null_ step? I.e., 

put(i,best3. -L)

@ballardw, your DATA _null_ step involving the CALL SYMPUTX routine achieves in one step what I was trying to achieve in three, which is very cool.

First argument creates the macro variable: Month_Key1...Month_Key6. I'm guessing the "-L" is similar to the low in PROC FORMAT specifying the lowest range of values? I'm curious to know what would happen if we didn't use the "L" specification?

Second argument provides/passes the text for the macro variable to store, which are the month-end dates outputted into yyymm format.

So, I only still need to know a way of instructing SAS to create the "i" number of monthly datasets that ultimately need concatenating.

Any ideas?

ballardw
Super User

First a simple answer: the -L in put makes sure the result is LEFT justified. Otherwise a put with a numeric value is right justified in the result. Which means you can get leading spaces in the result which remain when concatenated.

 

IF you have a number of sets to concatenate

data want;
    set  data_202405 data_202404 data_202403 <etc>;
run;

Which with call execute could be written in a data step

data _null_;
   call execute "data want;"
   call execute " set ";

  DO i=1 TO &INTNXIncrement;
     dsn= " sourceDs_"||put(INTNX('MONTH',&CurrentMonth,-i,'E'),yymmn6.));
     call execute (dsn); /* this should be just the name of data set(s) on the Set statement*/
  end;
  call execute (';');/* this ; ends the SET statement*/
  /* if you want to manipulate the variables in the set
  that code would go here
  */
  call execute ('run;';
run;

 

Tom
Super User Tom
Super User

Take a step back and explain exactly what SAS you want to end up running when you have a start date of '30Jun2024'D and you want to include 3 months of data.

 

Do you want to combine the data and then run the analysis?

data for_analysis/ view=for_analysis;
  set biglib.ds_202404 biglib.ds_202405 ;
run;
proc means data=for_analysis noprint;
  var a bc ;
  output out=want sum=;
run;

Or do you want to run the analysis and then combine the results?

proc means data=biglib.ds_202404;
  var a bc ;
  output out=summary_202404 sum=;
run;
proc means data=biglib.ds_202405;
  var a bc ;
  output out=summary_202405 sum=;
run;
data want;
  set summary_202404 summary_202405;
run;

Once you know that then you can design some logic to generate that code.  Either with your data step that is currently generating the list of months.  Or perhaps with a SAS macro instead.

Tom
Super User Tom
Super User

I thing you are making this way too hard, but let's just go through what you posted in order.

 

First there is no need to run two data steps to calculate two variables.  Calculate them both in the same step.

data DS_2;
  do i=1 to &INTNXIncrement;
    MonthEndDates = intnx('month',&CurrentMonth,-i,'E');
    MonthKeys = put(MonthEndDates,yymmn6.);
    output;
  end;
  format MonthEndDates DATE9.;
run;

Results:

Tom_0-1718852386787.png

Now that you have the set of values you can use it to generate the code you want.

file code temp;
data _null_;
  set ds_2;
  file code;
  put 'proc summary data=SourceDS_' MOnthKeys 'nway;'
    / '  class Month_Key;'
    / '  var var1 var2 var3;'
    / '  output out=NewDS_' MOnthKeys 'sum= ;'
    / 'run;'
    / 'proc append base=NewDS_Combined data=NewDS_' MOnthKeys 'force;'
    / 'run;'
  ;
run;

Which you can then easily run using %INCLUDE macro statement.

%include code / source2;

If you really want to use macro variables instead then there is not really any need for the dataset.

data _null_;
  do i=1 to &INTNXIncrement;
    MonthEndDates = INTNX('MONTH',&CurrentMonth,-i,'E');
    call symputx(cats('month_key',i),put(MonthEndDates,yymmn6.));
    output;
  end;
run;

Result:

704  %put &=month_key1 MONTH_KEY&INTNXIncrement=&&&month_key&INTNXIncrement;
MONTH_KEY1=202405 MONTH_KEY6=202312

Then inside of a macro definition you could use a %DO loop to generate the SAS code instead.

%do i=1 %to &INTNXIncrement;
proc summary data=SourceDS_&&month_key&i nway;
  class Month_Key;
  var var1 var2 var3;
  output out=NewDS_&&month_key&i sum= ;
run;
proc append base=NewDS_Combined data=NewDS_&&month_key&i force;
run;
%end;

You could even skip all of the extra macro variables and just have the %DO loop generate the next month_key when it needs it.  While you are at it why not generate them in chronological order to begin with?

%do i=&INTNXIncrement %to 1 %by -1;
  %let month_key=%sysfunc(intnx(month,&CurrentMonth,-&i,e),yymmn6.);
proc summary data=SourceDS_&month_key nway;
  class Month_Key;
  var var1 var2 var3;
  output out=NewDS_&month_key sum= ;
run;
proc append base=NewDS_Combined data=NewDS_&month_key force;
run;
%end;

 

Dansas5
Fluorite | Level 6

Hi @ballardw

Many thanks for your quick reply.

What you've suggested serves for the very last, ultimate, step of concatenating the six, in this case (as INTNXIncrement=6) new datasets ("NewDS").

We've created the relevant macro variables (Month_Key1... MonthKey6) through that DATA _null_ step you've suggested.

 

Now we need to create 6 new datasets. I did it the long way of hardcoding for each, as shown below.

This is the crucial step we need a solution for and that would manage any number of new monthly datasets.

Remembering that our source datasets are large monthly tables, and we need to query a sub-set number of each of them. That sub-set is determined by the CurrentMonth and INTNXIncrement macro vars.

Also, please accept my apologies for forgetting to write the WHERE clause in the PROC SQL steps I referred to initially in my post. I've now corrected that as shown below.

@Tom, thanks for your reply too, just seen it. To your question "Do you want to combine the data and then run the analysis?', the answer is yes.

That's because each of the six (in this case) datasets will only contain one observation, i.e., the Month_Key value and the summations of the variables (Var1-Var3).

The concatenated dataset will therefore contain six observations, with six Month_Key values and their corresponding summations for the three selected variables:

 

Month_KeyVar1Var2Var3
202405123503212153210
202404123403210143210
202403123033202032030
202402123203200332020
202401123013214032110
202312123213215032112

 

Can we perhaps create a macro capable of creating the required number of datasets, in which each iteration the relevant macro variable is passed as a parameter?

E.g. in the first iteration of the macro routine, Month_Key = 202405, and table NewDS_202405 is created by subsetting from SourceDS_202405.

In second iteration, Month_Key = 202404, and table NewDS_202404 is created by subsetting from SourceDS_202404.

Etc. until we reach the sixth macro variable where Month_Key = 202312.

This is what I'm envisioning and I trust the requirements are clearer now. Please advise either way, your help here is hugely appreciated!

The code excerpt below is the undynamic and laborious approach I've temporarily used.

 

PROC SQL;

CREATE TABLE NewDS_&Month_Key1 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey1

WHERE Var4 = "Adj520";

GROUP BY 1;

.

.

.

CREATE TABLE NewDS_&Month_Key6 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey6

WHERE Var4 = "Adj520";

GROUP BY 1;

QUIT;

Tom
Super User Tom
Super User

If you define a macro you can use %DO loop.

%do i=1 %to 6 ;
CREATE TABLE NewDS_&&Month_Key&i AS
  SELECT Month_Key, SUM(Var1) as Var1, SUM(Var2) as Var2, SUM(Var3) as Var3
  FROM SourceDS_&&MonthKey&i
  WHERE Var4 = "Adj520"
  GROUP BY 1
;
%end;

Note that it will probably be faster to use PROC SUMMARY than PROC SQL.  It is certainly easier to type the code (not so many pesky commas) and extend the list of variables to sum.

proc summary data=SourceDS_&&MonthKey&i nway;
  WHERE Var4 = "Adj520";
  class Month_Key ;
  var Var1 Var2 Var3 ;
  output out=NewDS_&&Month_Key&i sum=;
run;

 

Dansas5
Fluorite | Level 6

Hi @Tom, apologies for my belated reply.

Your suggestion of combining a DO loop within a macro using a PROC SQL step has worked!

I also created another macro appending all the resulting new monthly summary datasets.

But in the end I managed to incorporate that DATA concatenating step within the existing CreateNewDS macro, as shown below.

Hence, the combination of the DATA _null_ step and the macro completely and beautifully solves my much-needed query.

I am indebted to you both @ballardw and @Tom for your assistance, you guys are the real stars!

Last, but not least, thank you for reminding me about PROC SUMMARY. I shall need to refresh my memory of that procedure too.

 

LIBNAME ...

%LET CurrentMonth = '30Jun2024'D

%LET INTNXIncrement = 6;

 

DATA _NULL_;
DO i=1 TO &INTNXIncrement;
     CALL SYMPUTX("Month_Key"||PUT(i,best3. -L),PUT(INTNX('MONTH',&CurrentMonth,-i,'E'),yymmn6.));
END;
RUN;
 
%MACRO CreateNewDS;
%DO i=1 %TO 6;
CREATE TABLE NewDS_&&Month_Key&i AS
  SELECT Month_Key, SUM(Var1) as Var1, SUM(Var2) as Var2, SUM(Var3) as Var3
  FROM SourceDS_&&MonthKey&i
  WHERE Var4 = "Adj520"
  GROUP BY 1;
%END;
 
DATA CombinedNewDS;
SET 
%DO i = 1 %TO 6;
  NewDS_&&Month_Key&i
%END;;
RUN;
 
%MEND;
%CreateNewDS
 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 578 views
  • 3 likes
  • 3 in conversation