BookmarkSubscribeRSS Feed
Dansas5
New User | Level 1

Hi guys,

My challenge is to find a way of enabling SAS to output the number of datasets determined by the number of month macro variables.

The source datasets are large and therefore created for each separate month.

However, I need to compare summary results of specific variables across the months.

The code I've written partially automates the data gathering step. It's the second part which I've not yet found an elegant way of enabling SAS to create the relevant number of datasets, hence my question.

 

First part/section of code declares:

%LET CurrentMonth = '30Jun2024'D

%LET INTNXIncrement = 6;

 

Two DATA steps follow creating the "MonthKeys" needed because the source dataset names end with the suffix yyyymm, e.g. SourceDS_202405, SourceDS_202404, etc.

These two DATA steps create the relevant MonthKeys via an INTNX statement in which the increment macro is called, in this case 6 for six (prior) months, i.e.:

DATA DS_1;

FORMAT MonthEndDates DATE9.;

DO i=1 TO &INTNXIncrement;

   MonthEndDates = INTNX('MONTH',&CurrentMonth,-i,'E');

   OUTPUT;

END;

DATA DS_2;

SET DS_1;

IF LENGTH(STRIP(MONTH(MonthEndDates ))) = 1 THEN MonthKeys=YEAR(MonthEndDates )||"0"||STRIP(MONTH(MonthEndDates ));
ELSE MonthKeys=YEAR(MonthEndDates )||STRIP(MONTH(MonthEndDates ));

RUN;

The DO loop outputs the end dates of the 6 months preceding the current month, the second DATA step simply converts these month-end dates to the yyymm format. 

 

Finally, those MonthKeys are then converted into macro variables via a PROC SQL step:

PROC SQL;
SELECT COUNT(*) INTO :NObs
FROM DS_2;
SELECT MonthKeys INTO :Month_Key1-:Month_Key%LEFT(&NObs)
FROM DS_2;
QUIT;

We now have an array of 6 macro variables in the yyymm format going back 6 months before the current month, i.e.:

Month_Key1, Month_Key2... MonthKey6 or

202405, 202404... 202312.

 

So the last step needed is one capable of outputting only the relevant number of datasets (in this case 6).

It would know to create datasets until the last macro variable is called and processed. Each step or iteration would refer to a different source dataset distinguished by the suffix and those suffixes are already stored in memory as macro variables.

This would avoid writing the code as shown below:

PROC SQL;

CREATE TABLE NewDS_&Month_Key1 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey1;

GROUP BY 1;

.

.

.

CREATE TABLE NewDS_&Month_Key6 AS

SELECT Month_Key, SUM(Var1), SUM(Var2), SUM(Var3)

FROM SourceDS_&MonthKey6;

GROUP BY 1;

QUIT;

I.e. the total PROC SQL hard-coded steps being dependent on the number of months being analysed, which is no where near ideal.

 

Ultimately, all these grouped/summarised datasets need to be concatenated so that a monthly comparison can be made of each variable.

Hence, this last DATA step also needs refinement, instead of the hard-code approach currently used -

DATA NewDS_Combined;

SET NewDS_&Month_Key1... NewDS_&Month_Key6;

RUN;

 

My guess is either a DO loop or a macro DATA step is required to create the "n" number of datasets and then to concatenate/set them.

I would be very grateful for your advice.

Dan

 

1 REPLY 1
ballardw
Super User

Besides not understanding why you need multiple data sets I don't believe that I understand what you want.

 

Please show the exact list of names you want to create for that given Currentdate and Intnxincrement

 

Note: If you find yourself tempted to write stuff like this again look to a proper FORMAT. SAS even allows creating custom formats for dates if needed

but

IF LENGTH(STRIP(MONTH(MonthEndDates ))) = 1 THEN MonthKeys=YEAR(MonthEndDates )||"0"||STRIP(MONTH(MonthEndDates ));
ELSE MonthKeys=YEAR(MonthEndDates )||STRIP(MONTH(MonthEndDates ));

one has problems as the automatic conversions for numeric to numeric can be proplematic

but try

monthkeys = put(monthenddates,yymmn6.);

(not terribly fond of variables indicating plurals as they generally should only have one value at a time)

 

And this makes the macro variables directly in the first step:

DATA _null_;
  DO i=1 TO &INTNXIncrement;
     call symputx("Month_key"||put(i,best3. -L),put(INTNX('MONTH',&CurrentMonth,-i,'E'),yymmn6.));
  end;
run;

I don't see anything actually related to extracting or creating multiple data sets from one so I suspect it is problematic.

 

Personally I suspect you would be way better off creating a start and end date of interest, subsetting the large data set into one based on those two dates, sort by the date variable and use:

Proc summary data=smallerdataset;
   by datevalue;
   format datevalue yymmn6.
   var var1 var2 var3 ;
   output out=want (drop=_type_ _freq_) sum= ;
run;

summary and single result set in one step.

 

Look up CALL EXECUTE as a way to create code using values in a data set (or loop).

Or use a data step to write text for the syntax to a file and then use %include to execute it.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 91 views
  • 1 like
  • 2 in conversation