Good afternoon everyone,
please help me with the automation of my code. I am relatively new to the automation of SAS codes. Please see the code that I am trying to automate below:
data sales;
set Applied_Accounts;
if sales_date <> . then do;
If sales_date le '05Apr2020'd then salesperiod = '00: Sales pre Apr';else
If sales_date le '05May2020'd then salesperiod = '01: 06Apr - 05May2020';else
If sales_date le '05Oct2020'd then salesperiod = '06: 06Sep - 05Oct2020';else
If sales_date le '05Nov2020'd then salesperiod = '07: 06Oct - 05Nov2020';else
If sales_date le '01Dec2020'd then salesperiod = '08: 06Nov - 01Dec2020';else
If sales_date le '05Jan2021'd then salesperiod = '09: 02Dec - 05Jan2021';else
If sales_date le '05Feb2021'd then salesperiod = '10: 06Jan - 05Feb2021';else
If sales_date le '05Mar2021'd then salesperiod = '11: 06Feb - 05Mar2021';else
If sales_date le '31Mar2021'd then salesperiod = '12: 06Mar - 31Mar2021';else
If sales_date le '30Apr2021'd then salesperiod = '13: 01Apr - 30Apr2021';else
If sales_date le '31May2021'd then salesperiod = '14: 01May - 31May2021';else
If sales_date le '30Jun2021'd then salesperiod = '15: 01Jun - 30Jun2021';else
If sales_date le '31Jul2021'd then salesperiod = '16: 01Jul - 31Jul2021';else
If sales_date le '31Aug2021'd then salesperiod = '17: 01Aug - 31Aug2021';else
If sales_date le '30Sep2021'd then salesperiod = '18: 01Sep - 30Sep2021';else
If sales_date le '31Oct2021'd then salesperiod = '19: 01Oct - 31Oct2021';else
If sales_date le '30Nov2021'd then salesperiod = '20: 01Nov - 30Nov2021';else
If sales_date le '31Dec2021'd then salesperiod = '21: 01Dec - 31Dec2021';else
If sales_date le '31Jan2022'd then salesperiod = '22: 01Jan - 31Jan2022';else
If sales_date le '28Feb2022'd then salesperiod = '23: 01Feb - 28Feb2022';else
If sales_date le '31Mar2022'd then salesperiod = '24: 01Mar - 31Mar2022';else
If sales_date le '30Apr2022'd then salesperiod = '25: 01Apr - 30Apr2022';else
If sales_date le '31May2022'd then salesperiod = '26: 01May - 31May2022';else
salesperiod = '999:Update code';
end;
run;
I want to automate this code so that I do not have to manually input the sales_data and the salesperiod.
Thank you in advance for your response.
@MagD wrote (emphasis mine)
Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month
You are saying you have an irregular historic set of date ranges up through the 06MAR2021-31MAR2021, and subsequently regular whole month date ranges. So you could just use a logical structure such as this pseudo-code:
data sales;
set Applied_Accounts;
if not missing(sales_date) then do;
if sales_date <='31mar2021'd then salesperiod = put(sales_date,dtfmt.); else
salesperiod=function-of-sales_date-distance-from-31mar2021;
end;
run;
This assumes you use part of the format DTFMT shown by @acordes. Something like
proc format ;
VALUE dtfmt
low - '05apr2020'd='00: Sales pre Apr'
'06APR2020'd- '05MAY2020'd='01: 06APR - 05MAY2020'
'06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
'06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
'06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
'02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
'06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
'06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
'06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
other='xxxxxx';
run;
data sales;
set Applied_Accounts;
if not missing(sales_date) then do;
if sales_date <= '31mar2021'd then salesperiod=put(sales_date,dtfmt.); else
salesperiod= catx(' '
,cats(intck('month','31mar2021'd,sales_date)+12,': 01',put(sales_date,monname3.))
,'-'
,put(intnx('month',sales_date,0,'e'),date9.)
);
end;
end;
run;
Since the range 06mar2021 through 31mar2021 is range number 12, then every date afterwards is just 12 plus the number of months the sales_date follows 31mar2021 - hence the "intck(.......) + 12" is the first argument of the CATS function.
I don't understand what you want to automate.
The code you showed means that you do not have to manually enter any dates, the dates are coming from the input dataset. You also do not need to manually enter any periods since the IF/THEN series will assign those based on the dates.
It might be easier to use a format instead.
If there is some RULE you can explain that generates the cutoff dates for the periods then you could make some code to generate a format. But I do not see any clear pattern to the cutoff dates.
But we can't help you automate this until you provide explanation of the pattern.
Why do some time periods begin on the 1st day of the month?
Why do other time periods begin on the 5th day of the month?
What is the logic here? How can we know what day of the month is the beginning of the time period?
Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month hence the differences but the beginning of the time period is the 1st- I hope this helps.
Does this mean that all time periods, even the ones that start you show starting at the 5th day of the month in your table, are really starting on the 1st day of the month? Or does it mean something else?
If all months begin on the first day of the month, you can use a format to produce a month name such as APR2022, or you can use a custom format from PROC FCMP if it absolutely has to be shown as 01APR2022–31APR2022. But that's extra work that (in my opinion) provides no extra value, as APR2022 is clear to everyone.
Your value of '25: 01Apr - 30Apr2022' also seems to be extra work for no benefit, why is the 25 necessary?
@MagD wrote (emphasis mine)
Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month
You are saying you have an irregular historic set of date ranges up through the 06MAR2021-31MAR2021, and subsequently regular whole month date ranges. So you could just use a logical structure such as this pseudo-code:
data sales;
set Applied_Accounts;
if not missing(sales_date) then do;
if sales_date <='31mar2021'd then salesperiod = put(sales_date,dtfmt.); else
salesperiod=function-of-sales_date-distance-from-31mar2021;
end;
run;
This assumes you use part of the format DTFMT shown by @acordes. Something like
proc format ;
VALUE dtfmt
low - '05apr2020'd='00: Sales pre Apr'
'06APR2020'd- '05MAY2020'd='01: 06APR - 05MAY2020'
'06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
'06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
'06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
'02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
'06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
'06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
'06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
other='xxxxxx';
run;
data sales;
set Applied_Accounts;
if not missing(sales_date) then do;
if sales_date <= '31mar2021'd then salesperiod=put(sales_date,dtfmt.); else
salesperiod= catx(' '
,cats(intck('month','31mar2021'd,sales_date)+12,': 01',put(sales_date,monname3.))
,'-'
,put(intnx('month',sales_date,0,'e'),date9.)
);
end;
end;
run;
Since the range 06mar2021 through 31mar2021 is range number 12, then every date afterwards is just 12 plus the number of months the sales_date follows 31mar2021 - hence the "intck(.......) + 12" is the first argument of the CATS function.
For me that makes no sense. It's a better idea to convince your peers that native formats in SAS should be used.
data have;
infile datalines;
input fecha;
format fecha ddmmyy10.;
informat fecha date9.;
datalines;
05Apr2020
05May2020
05Oct2020
05Nov2020
01Dec2020
05Jan2021
05Feb2021
05Mar2021
31Mar2021
30Apr2021
31May2021
30Jun2021
31Jul2021
31Aug2021
30Sep2021
31Oct2021
30Nov2021
31Dec2021
31Jan2022
28Feb2022
31Mar2022
30Apr2022
31May2022
;
run;
data step;
set have ;
format start stop date9.;
start=lag(fecha)+1;
stop=fecha;
run;
proc sql;
select cats (quote(put(start, date9.), "'"), "d-", quote(put(stop, date9.), "'"), 'd=',
"'"||put(intck('month', '05apr2020'd, start)+1, z2.), catx(" ", ":", substr(put(start, date9.),1,5), "-", put(stop, date9.))||"'") into :oks separated by " "
from step;
quit;
PROC FORMAT;
VALUE dtfmt
low - '05apr2020'd='00: Sales pre Apr'
'06APR2020'd-'05MAY2020'd='01: 06APR - 05MAY2020'
'06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
'06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
'06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
'02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
'06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
'06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
'06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
'01APR2021'd-'30APR2021'd='13: 01APR - 30APR2021'
'01MAY2021'd-'31MAY2021'd='14: 01MAY - 31MAY2021'
'01JUN2021'd-'30JUN2021'd='15: 01JUN - 30JUN2021'
'01JUL2021'd-'31JUL2021'd='16: 01JUL - 31JUL2021'
'01AUG2021'd-'31AUG2021'd='17: 01AUG - 31AUG2021'
'01SEP2021'd-'30SEP2021'd='18: 01SEP - 30SEP2021'
'01OCT2021'd-'31OCT2021'd='19: 01OCT - 31OCT2021'
'01NOV2021'd-'30NOV2021'd='20: 01NOV - 30NOV2021'
'01DEC2021'd-'31DEC2021'd='21: 01DEC - 31DEC2021'
'01JAN2022'd-'31JAN2022'd='22: 01JAN - 31JAN2022'
'01FEB2022'd-'28FEB2022'd='23: 01FEB - 28FEB2022'
'01MAR2022'd-'31MAR2022'd='24: 01MAR - 31MAR2022'
'01APR2022'd-'30APR2022'd='25: 01APR - 30APR2022'
'01MAY2022'd-'31MAY2022'd='26: 01MAY - 31MAY2022'
other = '999:Update code';
RUN;
data fake;
format f_date ddmmyy10.;
do i=1 to 1000;
f_date=rand("integer", '01jan2020'd, today());
output;
end;
run;
proc means data=fake n;
format f_date dtfmt.;
class f_date;
var i;
run;
Let's start with a few cornerstones:
That first IF statement if very fishy looking.
if sales_date <> . then do;
Why are you using the MAX operator?
Since the right hand value is missing then the result is just going to be the left hand value.
So it is the same as doing:
if sales_date then do;
Which will be true when ever SALES_DATE is not missing and not 01JAN1960.
Create a format that's dynamic, you can set how much it grows and use that instead. You could just set the end of the loop to a large number and use that.
data date_fmt;
fmtname = 'DATERANGE_FMT';
type = 'N';
length label $50.;
START=.; END='05APR2020'd; label = '00: Sales pre Apr'; HLO="L"; output;
call missing(HLO);
START= '06APR2020'd; END='05MAY2020'd; label = '01: 06Apr - 05May2020'; output;
/*****type out the rest of your ranges manually*****/
*everything after April is automated;
do i=13 to 30;
START=intnx('month', '01APR2020'd, i-1 , 'b');
END = intnx('month', '01APR2020'd, i-1, 'e');
label = cat(i, ": ", substr(put(start, date9.), 1, 5), " - ", put(end, date9.));
output;
end;
run;
proc format cntlin=date_fmt;
run;
data applied_accounts;
input sales_date date9.;
format sales_date date9.;
cards;
03APR2020
08APR2020
30APR2021
14MAR2022
;;;;
run;
data sales;
set applied_accounts;
salesperiod = put(sales_date, daterange_fmt.);
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.