BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MagD
Quartz | Level 8

Good afternoon everyone,

 

please help me with the automation of my code. I am relatively new to the automation of SAS codes. Please see the code that I am trying to automate below:

 

data sales;
set Applied_Accounts;

 

if sales_date <> . then do;
If sales_date le '05Apr2020'd then salesperiod  = '00: Sales pre Apr';else
If sales_date le '05May2020'd then salesperiod = '01: 06Apr - 05May2020';else
If sales_date le '05Oct2020'd then salesperiod  = '06: 06Sep - 05Oct2020';else
If sales_date le '05Nov2020'd then salesperiod = '07: 06Oct - 05Nov2020';else
If sales_date le '01Dec2020'd then salesperiod = '08: 06Nov - 01Dec2020';else
If sales_date le '05Jan2021'd then salesperiod  = '09: 02Dec - 05Jan2021';else
If sales_date le '05Feb2021'd then salesperiod  = '10: 06Jan - 05Feb2021';else
If sales_date le '05Mar2021'd then salesperiod = '11: 06Feb - 05Mar2021';else
If sales_date le '31Mar2021'd then salesperiod = '12: 06Mar - 31Mar2021';else
If sales_date le '30Apr2021'd then salesperiod = '13: 01Apr - 30Apr2021';else
If sales_date le '31May2021'd then salesperiod = '14: 01May - 31May2021';else
If sales_date le '30Jun2021'd then salesperiod = '15: 01Jun - 30Jun2021';else
If sales_date le '31Jul2021'd then salesperiod = '16: 01Jul - 31Jul2021';else
If sales_date le '31Aug2021'd then salesperiod = '17: 01Aug - 31Aug2021';else
If sales_date le '30Sep2021'd then salesperiod = '18: 01Sep - 30Sep2021';else
If sales_date le '31Oct2021'd then salesperiod = '19: 01Oct - 31Oct2021';else
If sales_date le '30Nov2021'd then salesperiod = '20: 01Nov - 30Nov2021';else
If sales_date le '31Dec2021'd then salesperiod = '21: 01Dec - 31Dec2021';else
If sales_date le '31Jan2022'd then salesperiod = '22: 01Jan - 31Jan2022';else
If sales_date le '28Feb2022'd then salesperiod = '23: 01Feb - 28Feb2022';else
If sales_date le '31Mar2022'd then salesperiod = '24: 01Mar - 31Mar2022';else
If sales_date le '30Apr2022'd then salesperiod = '25: 01Apr - 30Apr2022';else
If sales_date le '31May2022'd then salesperiod = '26: 01May - 31May2022';else

salesperiod = '999:Update code';
end;

run;

 

I want to automate this code so that I do not have to manually input the sales_data and the salesperiod.

 

Thank you in advance for your response.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@MagD wrote (emphasis mine)

 

Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month 


You are saying you have an irregular historic set of date ranges up through the 06MAR2021-31MAR2021, and subsequently regular whole month date ranges.  So you could just use a logical structure such as this pseudo-code:

 

data sales;
  set Applied_Accounts;
  if not missing(sales_date) then do;
    if sales_date <='31mar2021'd then salesperiod = put(sales_date,dtfmt.); else
    salesperiod=function-of-sales_date-distance-from-31mar2021;
  end;
run;

This assumes you use part of the format DTFMT shown by @acordes.  Something like

 

proc format ;
  VALUE dtfmt
    low         - '05apr2020'd='00: Sales pre Apr'    
   '06APR2020'd- '05MAY2020'd='01: 06APR - 05MAY2020'
   '06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
   '06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
   '06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
   '02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
   '06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
   '06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
   '06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
   other='xxxxxx';
run;

data sales;
  set Applied_Accounts;
  if not missing(sales_date) then do;
    if sales_date <= '31mar2021'd then salesperiod=put(sales_date,dtfmt.); else
    salesperiod= catx(' '
                     ,cats(intck('month','31mar2021'd,sales_date)+12,': 01',put(sales_date,monname3.))
                     ,'-'
                     ,put(intnx('month',sales_date,0,'e'),date9.)
                     );
    end;
  end;
run;

Since the range 06mar2021 through 31mar2021 is range number 12, then every date afterwards is just 12 plus the number of months the sales_date follows 31mar2021 - hence the "intck(.......) + 12" is the first argument of the CATS function.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

16 REPLIES 16
AMSAS
SAS Super FREQ

I'd suggest you look at custom formats:

 

PROC FORMAT 

Specifically the CNTLIN/CNTLOUT 

Tom
Super User Tom
Super User

I don't understand what you want to automate.

The code you showed means that you do not have to manually enter any dates, the dates are coming from the input dataset.  You also do not need to manually enter any periods since the IF/THEN series will assign those based on the dates.

 

It might be easier to use a format instead.

 

If there is some RULE you can explain that generates the cutoff dates for the periods then you could make some code to generate a format.  But I do not see any clear pattern to the cutoff dates.  

MagD
Quartz | Level 8
Hey Tom,

So I want to automate the code so that I do not need to keep manually adding more dates to the code, i.e. I won't have to manually add next month's date which will be June, i.e
if sales_date le '30Jun2022'd then salesperiod = '27:' 01Jun - 30Jun2022';
PaigeMiller
Diamond | Level 26

But we can't help you automate this until you provide explanation of the pattern.


Why do some time periods begin on the 1st day of the month?

Why do other time periods begin on the 5th day of the month?
What is the logic here? How can we know what day of the month is the beginning of the time period?

--
Paige Miller
MagD
Quartz | Level 8
Hi Paige,

Apologies for not providing a lot of information.

The beginning of each month is the 1st, so next month's date will be 01Jun - 30Jun2022.

Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month hence the differences but the beginning of the time period is the 1st- I hope this helps.
PaigeMiller
Diamond | Level 26

Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month hence the differences but the beginning of the time period is the 1st- I hope this helps.

Does this mean that all time periods, even the ones that start you show starting at the 5th day of the month in your table, are really starting on the 1st day of the month? Or does it mean something else?


If all months begin on the first day of the month, you can use a format to produce a month name such as APR2022, or you can use a custom format from PROC FCMP if it absolutely has to be shown as 01APR2022–31APR2022. But that's extra work that (in my opinion) provides no extra value, as APR2022 is clear to everyone.

 

Your value of  '25: 01Apr - 30Apr2022' also seems to be extra work for no benefit, why is the 25 necessary?

--
Paige Miller
mkeintz
PROC Star

@MagD wrote (emphasis mine)

 

Some time periods started at the 5th day of the month but that has been changed to the 1st day of the month 


You are saying you have an irregular historic set of date ranges up through the 06MAR2021-31MAR2021, and subsequently regular whole month date ranges.  So you could just use a logical structure such as this pseudo-code:

 

data sales;
  set Applied_Accounts;
  if not missing(sales_date) then do;
    if sales_date <='31mar2021'd then salesperiod = put(sales_date,dtfmt.); else
    salesperiod=function-of-sales_date-distance-from-31mar2021;
  end;
run;

This assumes you use part of the format DTFMT shown by @acordes.  Something like

 

proc format ;
  VALUE dtfmt
    low         - '05apr2020'd='00: Sales pre Apr'    
   '06APR2020'd- '05MAY2020'd='01: 06APR - 05MAY2020'
   '06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
   '06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
   '06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
   '02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
   '06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
   '06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
   '06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
   other='xxxxxx';
run;

data sales;
  set Applied_Accounts;
  if not missing(sales_date) then do;
    if sales_date <= '31mar2021'd then salesperiod=put(sales_date,dtfmt.); else
    salesperiod= catx(' '
                     ,cats(intck('month','31mar2021'd,sales_date)+12,': 01',put(sales_date,monname3.))
                     ,'-'
                     ,put(intnx('month',sales_date,0,'e'),date9.)
                     );
    end;
  end;
run;

Since the range 06mar2021 through 31mar2021 is range number 12, then every date afterwards is just 12 plus the number of months the sales_date follows 31mar2021 - hence the "intck(.......) + 12" is the first argument of the CATS function.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MagD
Quartz | Level 8
Thank you so much for this solution - it works perfectly.

Thank you for your assistance. I appreciate it.
acordes
Rhodochrosite | Level 12

For me that makes no sense. It's a better idea to convince your peers that native formats in SAS should be used.

 

data have;
infile datalines;
input fecha;
format fecha ddmmyy10.;
informat fecha date9.;
datalines;
05Apr2020
05May2020
05Oct2020
05Nov2020
01Dec2020
05Jan2021
05Feb2021
05Mar2021
31Mar2021
30Apr2021
31May2021
30Jun2021
31Jul2021
31Aug2021
30Sep2021
31Oct2021
30Nov2021
31Dec2021
31Jan2022
28Feb2022
31Mar2022
30Apr2022
31May2022
;
run;

data step;
set have ;
format start stop date9.;
start=lag(fecha)+1;
stop=fecha;
run;

proc sql;
select cats (quote(put(start, date9.), "'"), "d-", quote(put(stop, date9.), "'"), 'd=', 
"'"||put(intck('month', '05apr2020'd, start)+1, z2.), catx(" ", ":", substr(put(start, date9.),1,5), "-", put(stop, date9.))||"'") into :oks separated by " "
from step;
quit;

PROC FORMAT;

VALUE dtfmt
low -       '05apr2020'd='00: Sales pre Apr'    
'06APR2020'd-'05MAY2020'd='01: 06APR - 05MAY2020'
'06MAY2020'd-'05OCT2020'd='02: 06MAY - 05OCT2020'
'06OCT2020'd-'05NOV2020'd='07: 06OCT - 05NOV2020'
'06NOV2020'd-'01DEC2020'd='08: 06NOV - 01DEC2020'
'02DEC2020'd-'05JAN2021'd='09: 02DEC - 05JAN2021'
'06JAN2021'd-'05FEB2021'd='10: 06JAN - 05FEB2021'
'06FEB2021'd-'05MAR2021'd='11: 06FEB - 05MAR2021'
'06MAR2021'd-'31MAR2021'd='12: 06MAR - 31MAR2021'
'01APR2021'd-'30APR2021'd='13: 01APR - 30APR2021'
'01MAY2021'd-'31MAY2021'd='14: 01MAY - 31MAY2021'
'01JUN2021'd-'30JUN2021'd='15: 01JUN - 30JUN2021'
'01JUL2021'd-'31JUL2021'd='16: 01JUL - 31JUL2021'
'01AUG2021'd-'31AUG2021'd='17: 01AUG - 31AUG2021'
'01SEP2021'd-'30SEP2021'd='18: 01SEP - 30SEP2021'
'01OCT2021'd-'31OCT2021'd='19: 01OCT - 31OCT2021'
'01NOV2021'd-'30NOV2021'd='20: 01NOV - 30NOV2021'
'01DEC2021'd-'31DEC2021'd='21: 01DEC - 31DEC2021'
'01JAN2022'd-'31JAN2022'd='22: 01JAN - 31JAN2022'
'01FEB2022'd-'28FEB2022'd='23: 01FEB - 28FEB2022'
'01MAR2022'd-'31MAR2022'd='24: 01MAR - 31MAR2022'
'01APR2022'd-'30APR2022'd='25: 01APR - 30APR2022'
'01MAY2022'd-'31MAY2022'd='26: 01MAY - 31MAY2022'
other = '999:Update code';

RUN;

data fake;
format f_date ddmmyy10.;
do i=1 to 1000;
f_date=rand("integer", '01jan2020'd, today());
output;
end;
run;

proc means data=fake n;
format f_date dtfmt.;
class f_date;
var i;
run;

pic1.pngpic.png 

 

 

MagD
Quartz | Level 8
Thank you so much for this approach - I tested it out and it also works perfectly as well.

Thank you for taking the time to assist me.
Tom
Super User Tom
Super User

That first IF statement if very fishy looking.

if sales_date <> . then do;

Why are you using the MAX operator? 

Since the right hand value is missing then the result is just going to be the left hand value.

So it is the same as doing:

if sales_date then do;

Which will be true when ever SALES_DATE is not missing and not 01JAN1960.

Reeza
Super User

Create a format that's dynamic, you can set how much it grows and use that instead. You could just set the end of the loop to a large number and use that. 

 

data date_fmt;
fmtname = 'DATERANGE_FMT';
type = 'N';

length label $50.;
START=.; END='05APR2020'd; label = '00: Sales pre Apr'; HLO="L"; output; 
call missing(HLO);
START= '06APR2020'd; END='05MAY2020'd; label = '01: 06Apr - 05May2020'; output;

/*****type out the rest of your ranges manually*****/

*everything after April is automated;
do i=13 to 30;
START=intnx('month', '01APR2020'd, i-1 , 'b');
END = intnx('month', '01APR2020'd, i-1, 'e');
label = cat(i, ": ", substr(put(start, date9.), 1, 5), " - ", put(end, date9.));
output;
end;

run;

proc format cntlin=date_fmt;
run;


data applied_accounts;
input sales_date date9.;
format sales_date date9.;
cards;
03APR2020
08APR2020
30APR2021
14MAR2022
;;;;
run;


data sales;
set applied_accounts;

salesperiod = put(sales_date, daterange_fmt.);
run;
MagD
Quartz | Level 8
Thank you so much for your response and recommended approach. I truly appreciate it.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 1931 views
  • 6 likes
  • 8 in conversation