BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djk200399
Fluorite | Level 6

Hi, I am using SAS again after just over a year at a new company and I know that this is possible and can't remember if it needs to be a loop within a macro

 

What I have are datasets that hold a number of variables with the year and month on the end and I only want to bring back those that are relevant to the calculation period I want to look at. Example table is shown below but covers about 50 months of results in the real version:

 

Product_Number  Analysis_202001 Analysis 202002 Analysis 202003 Analysis 202004 Analysis 202005 Analysis 202006 

1                                 10                            10                  20                           10        

2                                  5                              20                20                             5

3                                 10                              5                   5                              5 

 

What I want to be able to do is put a date variable in the datastep as my start date e.g. 202001 and an end date of 202004 which then only brings back those columns in my new dataset  along with the product_number).

 

What I currently say in the datastep is this:

data test;

set temp (keep = product_number analysis_202001 analysis_202002 analysis_202003 analysis_202004);

run;

 

This obviously doesn't allow me to change this depending on the period I am looking at. I know there is a solution to this but my brain has frozen and just isn't letting me get the result I want so any help would be much appreciated

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

To do what you want you just need some method to generate the list of names.  For example just use the INTNX() and INTCK() functions to allow you to generate one name per month in the interval.

So code like this will create a dataset with the 5 months in the interval Jan 2020 thru April 2020.  If you want to eliminate the first month to get the number down to four then start the OFFSET variable from 1 instead of 0.

data names ;
  length start end offset date 8 name $32.;
  format stat end date yymmn6.;
  start='01JAN2020'd;
  end='01APR2020'd;
  do offset=0 to intck('month',start,end);
     date = intnx('month',start,offset);
     name=cats('Analysis_',put(date,yymmn6.));
     output;
  end;
run;

 Once you know how to generate the list it is easy to put it into a macro variable to use in a KEEP statement.  For example with PROC SQL INTO clause.

proc sql noprint;
  select name into :names separated by '  '
  from names;
quit;
data test;
  set temp (keep = product_number &names);
run;

View solution in original post

11 REPLIES 11
Reeza
Super User

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

You can use short cut references, in your case use the colon and Year most likely. 

 

keep = product_number analysis_2020: 

keep = product_number analysis_202001-analysis_202012

 

 

Also, this an example of why it's a bad idea to store data like this, use a long format so you can very easily add a WHERE statement.

 


@djk200399 wrote:

Hi, I am using SAS again after just over a year at a new company and I know that this is possible and can't remember if it needs to be a loop within a macro

 

What I have are datasets that hold a number of variables with the year and month on the end and I only want to bring back those that are relevant to the calculation period I want to look at. Example table is shown below but covers about 50 months of results in the real version:

 

Product_Number  Analysis_202001 Analysis 202002 Analysis 202003 Analysis 202004 Analysis 202005 Analysis 202006 

1                                 10                            10                  20                           10        

2                                  5                              20                20                             5

3                                 10                              5                   5                              5 

 

What I want to be able to do is put a date variable in the datastep as my start date e.g. 202001 and an end date of 202004 which then only brings back those columns in my new dataset  along with the product_number).

 

What I currently say in the datastep is this:

data test;

set temp (keep = product_number analysis_202001 analysis_202002 analysis_202003 analysis_202004);

run;

 

This obviously doesn't allow me to change this depending on the period I am looking at. I know there is a solution to this but my brain has frozen and just isn't letting me get the result I want so any help would be much appreciated

 

Thank you


 

Tom
Super User Tom
Super User

To do what you want you just need some method to generate the list of names.  For example just use the INTNX() and INTCK() functions to allow you to generate one name per month in the interval.

So code like this will create a dataset with the 5 months in the interval Jan 2020 thru April 2020.  If you want to eliminate the first month to get the number down to four then start the OFFSET variable from 1 instead of 0.

data names ;
  length start end offset date 8 name $32.;
  format stat end date yymmn6.;
  start='01JAN2020'd;
  end='01APR2020'd;
  do offset=0 to intck('month',start,end);
     date = intnx('month',start,offset);
     name=cats('Analysis_',put(date,yymmn6.));
     output;
  end;
run;

 Once you know how to generate the list it is easy to put it into a macro variable to use in a KEEP statement.  For example with PROC SQL INTO clause.

proc sql noprint;
  select name into :names separated by '  '
  from names;
quit;
data test;
  set temp (keep = product_number &names);
run;
djk200399
Fluorite | Level 6

Tom, 

 

Thank you for this as meant I didn't have to do a long winded keep statement every time.

Is there a way for me to then use these names within a calculation. 

 

I have attached a mini version of the closing_bals_tester dataset and attached it as a csv.

 

I want to be able to do the big calculation below but struggling as obviously I have created some in the names piece. How would I macro or loop this? Apologies for asking you as a reply but I just wondered if I could use some of the elements created in the earlier macro.

 

%let fstmonth = '31Oct2020'd;
%let lstmonth = '31Dec2025'd;

data names ;
length start end offset date 8 name $32.;
format stat end date yymmn6.;
start=&fstmonth;
end=&lstmonth;
do offset=0 to intck('month',start,end);
date = intnx('month',start,offset);
name=cats('cinr_',put(date,yymmn6.));
output;
end;
run;

proc sql noprint;
select name into :names separated by ' '
from names;
quit;

data test;
set ciandnr1 (keep = brand no1 product_group &names);
run;

/*Take the last actual closing balance */
data Last_Actual_Product_Balances;
set sav.sepoct20
(keep=ProdRepGroup prodrepgroupid
LedgerBal_oct20);

run;

/*Merge the tables */
proc sql;
create table closing_bals_calc as
select a.*,
b.*
from Last_Actual_product_Balances as a
left join test as b on a.prodrepgroupid=b.no1;
quit;

/* Now do the calcs which should be closing balance for the last actual month
and the rest are the result of the prior month plus the ci and net receipts-
we need to put something in to only run this for the months we want to choose
as we did in earlier macro */


data closing_bals_calca;
set closing_bals_calc;
bal_202010 = ledgerbal_Oct20 / 1000000;
bal_202011 = bal_202010 + cinr_202011;
bal_202012 = bal_202011 + cinr_202012;
bal_202101 = bal_202012 + cinr_202101;
bal_202102 = bal_202101 + cinr_202102;
bal_202103 = bal_202102 + cinr_202103;
bal_202104 = bal_202103 + cinr_202104;
bal_202105 = bal_202104 + cinr_202105;
bal_202106 = bal_202105 + cinr_202106;
bal_202107 = bal_202106 + cinr_202107;
bal_202108 = bal_202107 + cinr_202108;
bal_202109 = bal_202108 + cinr_202109;
bal_202110 = bal_202109 + cinr_202110;
bal_202111 = bal_202110 + cinr_202111;
bal_202112 = bal_202111 + cinr_202112;
bal_202201 = bal_202112 + cinr_202201;
bal_202202 = bal_202201 + cinr_202202;
bal_202203 = bal_202202 + cinr_202203;
bal_202204 = bal_202203 + cinr_202204;
bal_202205 = bal_202204 + cinr_202205;
bal_202206 = bal_202205 + cinr_202206;
bal_202207 = bal_202206 + cinr_202207;
bal_202208 = bal_202207 + cinr_202208;
bal_202209 = bal_202208 + cinr_202209;
bal_202210 = bal_202209 + cinr_202210;
bal_202211 = bal_202210 + cinr_202211;
bal_202212 = bal_202211 + cinr_202212;
bal_202301 = bal_202212 + cinr_202301;
bal_202302 = bal_202301 + cinr_202302;
bal_202303 = bal_202302 + cinr_202303;
bal_202304 = bal_202303 + cinr_202304;
bal_202305 = bal_202304 + cinr_202305;
bal_202306 = bal_202305 + cinr_202306;
bal_202307 = bal_202306 + cinr_202307;
bal_202308 = bal_202307 + cinr_202308;
bal_202309 = bal_202308 + cinr_202309;
bal_202310 = bal_202309 + cinr_202310;
bal_202311 = bal_202310 + cinr_202311;
bal_202312 = bal_202311 + cinr_202312;
bal_202401 = bal_202312 + cinr_202401;
bal_202402 = bal_202401 + cinr_202402;
bal_202403 = bal_202402 + cinr_202403;
bal_202404 = bal_202403 + cinr_202404;
bal_202405 = bal_202404 + cinr_202405;
bal_202406 = bal_202405 + cinr_202406;
bal_202407 = bal_202406 + cinr_202407;
bal_202408 = bal_202407 + cinr_202408;
bal_202409 = bal_202408 + cinr_202409;
bal_202410 = bal_202409 + cinr_202410;
bal_202411 = bal_202410 + cinr_202411;
bal_202412 = bal_202411 + cinr_202412;
bal_202501 = bal_202412 + cinr_202501;
bal_202502 = bal_202501 + cinr_202502;
bal_202503 = bal_202502 + cinr_202503;
bal_202504 = bal_202503 + cinr_202504;
bal_202505 = bal_202504 + cinr_202505;
bal_202506 = bal_202505 + cinr_202506;
bal_202507 = bal_202506 + cinr_202507;
bal_202508 = bal_202507 + cinr_202508;
bal_202509 = bal_202508 + cinr_202509;
bal_202510 = bal_202509 + cinr_202510;
bal_202511 = bal_202510 + cinr_202511;
bal_202512 = bal_202511 + cinr_202512;
run;

 

Appreciate any help / guidance

 

David

Tom
Super User Tom
Super User

First comment is that if you need to do calculations like that you have structured your dataset wrong.  The date should be in a variable instead being part of a variable name. So in your case instead of N observation with 20 BAL... variables and 20 CINR.. variables you will have three variables DATE, BAL and CINR and 20*N observations.

 

But given your current request just use ARRAY statements in your dataset and loop over the index into the array.

data closing_bals_calca;
  set closing_bals_calc;
  array bal bal_202010 bal_202011 ..... ;
  array cinr cinr_202011 cinr_202012 ... ;
  do index=1 to dim(bal);
    if index=1 then bal[index] = ledgerbal_Oct20 / 1000000;
    else bal[index]=bal[index] + cinr[index-1];
  end;
run;

To make so you don't have to type all of the names into the ARRAY statements make two macros variables.  One to contain the list of BAL variables and the other the list of CINR variables . So then your ARRAY statements can look like this:

  array bal &bal ;
  array cinr &cinr ;

 

djk200399
Fluorite | Level 6

Hi Tom,

 

I tried using your code which gave me the first bal_202010 result but thereafter each of the results was a missing calculation. Here is the code I ran (slight tweak as I want the first result to be ledgerbal / 1000000 + the same month's result of cinr. The next calc should be the result of what we have just created + cinr_202011 etc. but all come back as missing even though there are numeric values in there :

 

data closing_bals_calcz;
set closing_bals_calc;
array bal bal_202010 bal_202011 bal202012 bal_202101;
array cinr cinr_202010 cinr_202011 cinr_202112 cinr_202101 ;
do index=1 to dim(bal);
if index=1 then bal[index] = ledgerbal_Oct20 / 1000000+cinr[index];
else bal[index]=bal[index] + cinr[index];
end;
run;

 

 

Kurt_Bremser
Super User

All your troubles stem from a catastrophic design failure (keeping data in structure). As long as you permeate this, you will have issues with stupid code and unnecessary work.

Starting with the csv you provided, I first bring that into usable shape:

data closing_bals_calc;
infile
  "/folders/myfolders/CLOSING_BALS_TESTER.csv"
  dlm=","
  dsd
  truncover
  firstobs=2
;
input LedgerBal_Oct20 cinr_202011 cinr_202012 cinr_202101 cinr_202102 no1;
run;

proc transpose
  data=closing_bals_calc
  out=l1 (rename=(col1=cinr))
;
by no1;
var cinr:;
run;

data long;
set l1;
period = input(scan(_name_,2,'_'),yymmn6.);
format period yymmn6.;
drop _name_;
run;

Now the code looks like this:

data want;
merge 
  closing_bals_calc (keep=no1 ledgerbal_oct20)
  long
;
by no1;
retain balance;
if first.no1
then balance = LedgerBal_Oct20 / 1000000 + cinr;
else balance = balance + cinr;
drop ledgerbal_oct20;
run;

As I see from your code that the design failure seems to be present in most of your data, you need to correct it there, as most of the code you posted would not be necessary (the same functionality would then be served by simple WHERE conditions).

 

Maxim 19: Long Beats Wide.

djk200399
Fluorite | Level 6

Thank you for this Kurt, the problem is that people are keying into spreadsheets which then get inputted so we need to use all their fields to calc some elements, hence your transpose to a long file does the job perfectly bringing it in. 

 

I am now thinking I will export the results out and then use an excel macro to convert them back to the way they see them currently to avoid transposing it back again in SAS

ballardw
Super User

The superior method is to have a variable that holds the date of interest such as:

 

product date analysisvar

 

With an actual SAS date value, numeric and a date format applied that makes sense, then you can filter data based on range easily such as

Data need;

    set have;

    where '01Jan2011'd le date le '01Oct2015'd;

run;

 

IF, and this can be a very big if, the variables you want are in adjacent columns you can use Keep= Var1 - - Var23. The two dashes is a sequential list. But if you have variables: var1 othervar1 var2 othervar2 ... var23 othervar23 you would have the 23 var and 22 of the othervar variables kept.

 

If variable names have a common prefix and an actual sequential suffix then you can use: Keep Var1 - Var10. Which will keep variables Var1, Var2 ..., Var10. However your "date" names won't work well because crossing year boundaries is not sequential: Var201912 is followed with Var202001 which has a gap of about 89 sequential numbers.

Tom
Super User Tom
Super User

There is a trick that could make the use of a variable list possible. It could works because of these specific things about this example:

  • You wanted to use KEEP= dataset option on the input dataset 
  • Your variables have numeric suffixes on the dataset name
  • You are not using any other KEEP=, DROP= or RENAME= dataset options.

SAS has an option named DKRICOND (Drop Keep Rename Input CONDition) that controls what it does when you reference a non-existent variable with those options when reading from a dataset.  Normal default is to error when the variable is not found But you can reduce that to just a warning or no warning at all.  With that option you can then use a variable list and the fact that 202013 suffix does not exist will not cause an error. 

data have;
  length analysis_202001-analysis_202012 analysis_202101-analysis_202112 8;
run;

%let optsave= %sysfunc(getoption(dkricond,keyword));
options dkricond=nowarn;
data want;
  set have (keep=analysis_202010 - analysis_202102 );
run;
options &optsave;
Kurt_Bremser
Super User

Transpose to a long layout, and you can use simple WHERE conditions.

data wide;
input
  Product_Number
  Analysis_202001
  Analysis_202002
  Analysis_202003
  Analysis_202004
;
datalines;
1 10 10 20 10        
2  5 20 20  5
3 10  5  5  5
;

proc transpose
  data=wide
  out=long (rename=(col1=analysis))
;
by product_number;
var analysis:;
run;

data want;
set long;
period = input(scan(_name_,2,"_"),yymmn6.);
format period yymmn6.;
drop _name_;
run;

By extracting the period from the variable name, and storing it as SAS date, you open yourself up to the use of the SAS date functions, e.g.

where intnx('month',&refdate.,-3,'b') le period le intnx('month',&refdate.,3,'e');

which handles year boundaries without hassle.

SASJedi
SAS Super FREQ

If you always analyze a range of consecutively named variables, try something like this:

/* Set the macro variable value to the correct variable range */
%let range=analysis_202001 - analysis_202004;
data test;
   /* Wherever you need the variable list, reference the macro variable. */
   set temp (keep = product_number &range);
run;

 

Check out my Jedi SAS Tricks for SAS Users

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2067 views
  • 3 likes
  • 6 in conversation