Hi,
I want to create a new variable "MonthPaid" that is dependent on two other date variables. Here is how I created the variable for one month (January 2016):
data new;
set old;
if (pwrkcode='D' and workcode='P' and ('01jan2016'd <= dtclpaid <='31jan2016'd) and ('01jan2016'd <= dtwork <='31jan2016'd))
then MonthPaid='Jan-16';
run;
How would I turn this data step into an array/macro to create the variable for 12 months (I want 'MonthPaid' to populate with 'Jan-16', 'Feb-16', 'Mar-16', ........'Dec-16')? Ideally, I'd like to factor in the year, as well, ('MonthPaid would also populate with 'Jan-17', 'Feb-17', etc.) but if that's too complicated, I'm fine having a data step for each year of interest.
If there's a better approach, I'm happy to adjust my plan.
AC
OK, using your added cases and using Tom's idea to switch to "2016-01" instead of "01-2016":
data new;
set old;
if (pwrkcode='D' and workcode='P') then do;
if put(dtclpaid, yymmd7.) = put(dtwork, yymmd7.) then MonthPaid=put(dtwork, yymmd7.);
end;
if (wkstat='C' and workcode in ('P','D','PL','RS','PH','PT','DH') and workflag in ('A','G','N'))
the MonthPaid = put(dtclpaid, yymmd7.);
run;
There are many ways to format the logic ... just pick one that you find easy to interpret.
If you want to convert a date to the first of the month then just use the INTNX() function. You can then attach a format that only displays the Year and Month. Or use the PUT() function if you want to store the value as a character string instead, but using a DATE will probably give you more flexibility.
IF pwrkcode='D' and workcode='P'
and intnx('month',dtclpaid,0,'b') = intnx('month',dtwork ,0,'b')
THEN MonthPaid = intnx('month',dtclpaid,0,'b')
;
format MonthPaid yymm7. ;
Wow. This is a whole new way of thinking about the logic. So actually, the entire logic is this:
if (pwrkcode='D' and workcode='P' and ('01jan2016'd <= dtclpaid <='31jan2016'd) and ('01jan2016'd <= dtwork <='31jan2016'd))
or
(wkstat='C' and workcode in ('P','D','PL','RS','PH','PT','DH') and workflag in ('A','G','N') and ('01jan2016'd <= dtclpaid <='31jan2016'd))
then MonthPaid='Jan-16';
I don't think it would work to replace that second dtclpaid with the intnx format. Correct?
If you want to replace the value with character strings then you could just use one of the available date formats that ignore the day of the month. Or strip out the day of the month from one that does. If you do store it into a character variable I would recommend using one in YYYYMM format in some form so that it will sort properly with January coming before December.
Not sure why you need to test if the date is within the month. You might want to test if the date is within reason.
Ok. You've make me re-think my logic, and I need to discuss this with the business again. Thanks for the feedback and making me think more critically.
Amanda
I would suggest changing the outcome variable. Use four-digit years, and a format that would sort in chronological order such as "01-2016". At any rate, here is one way to get it all done in one step:
data new;
set old;
if (pwrkcode='D' and workcode='P' then do;
if put(dtclpaid, mmyyd7.) = put(dtwork, mmyyd7.) then MonthPaid=put(dtwork, mmyyd7.);
end;
run;
Note that you still need to make decisions:
Changing the outcome variable is a good idea. I wanted to simplify my code for this discussion, but in doing so I removed some relevant logic. Here is the entire logic I need to use:
if (pwrkcode='D' and workcode='P' and ('01jan2016'd <= dtclpaid <='31jan2016'd) and ('01jan2016'd <= dtwork <='31jan2016'd))
or
(wkstat='C' and workcode in ('P','D','PL','RS','PH','PT','DH') and workflag in ('A','G','N') and ('01jan2016'd <= dtclpaid <='31jan2016'd))
then MonthPaid='Jan-16';
Note:
I don't want to populate the the 'MonthPaid' variable unless both dtclpaid and dtwork fall in the same month, so this logic is ok.
OK, using your added cases and using Tom's idea to switch to "2016-01" instead of "01-2016":
data new;
set old;
if (pwrkcode='D' and workcode='P') then do;
if put(dtclpaid, yymmd7.) = put(dtwork, yymmd7.) then MonthPaid=put(dtwork, yymmd7.);
end;
if (wkstat='C' and workcode in ('P','D','PL','RS','PH','PT','DH') and workflag in ('A','G','N'))
the MonthPaid = put(dtclpaid, yymmd7.);
run;
There are many ways to format the logic ... just pick one that you find easy to interpret.
Thanks to you both!
Using logic in meta data to extract payment data
inspired by
https://goo.gl/orP6AA
https://communities.sas.com/t5/Base-SAS-Programming/How-do-a-I-use-an-array-or-macro-to-replace-my-repeating-code/m-p/343776
You need to cycle through logic involving four arrays
using date ranges and constants for every record
in the paid dataset.
This can be solved using SQL quite easily but I don't think
that was the solution the op wanted.
HASH may also be possible, but the logic in the
mata data can change.
HAVE Two datasets meta and paid datasets
====
HAV1ST with valid data data combinations of DTC and DTW)
HAV2ND data to be checked against meta data and
This defines the combinations(meta data) of valid we want to check against
Up to 40 obs WORK.HAV1ST total obs=16
HAV1ST
Obs PWRKCODE WORKCODE DTC DTW
1 D P JAN2016 JAN2016 * matches a range in data
2 D P JAN2016 MAY2016
3 D P JAN2016 JUL2016
4 D P JAN2016 AUG2016
5 D P MAY2016 JAN2016
6 D P MAY2016 MAY2016 * matches
7 D P MAY2016 JUL2016
8 D P MAY2016 AUG2016
9 D P JUL2016 JAN2016
10 D P JUL2016 MAY2016
11 D P JUL2016 JUL2016 *
12 D P JUL2016 AUG2016
13 D P AUG2016 JAN2016 * no match
14 D P AUG2016 MAY2016 * no match
15 D P AUG2016 JUL2016 * no match
16 D P AUG2016 AUG2016 * no match
HAV22ND (This is the main dataset)
Obs PWRKCODE WORKCODE DTCLPAID DTWORK
1 D P 15JAN2016 15JAN2016 * this is valid (in HAV1ST month January)
2 D P 15JAN2016 22JAN2016 * this is valid (in HAV1ST)
3 X P 15MAR2016 15MAR2016 * invalid not in HAV1ST
4 X P 15MAR2016 22MAR2016 * invalid not in HAV1ST
5 D P 15MAY2016 15MAY2016 * this is valid (in HAV1ST)
6 D P 15MAY2016 22MAY2016 * this is valid (in HAV1ST)
7 D P 15JUL2016 15JUL2016
8 D P 15JUL2016 22JUL2016
WANT (Months where customers paid)
====
Macro variable
Month_paid = MONTHPAID=JAN2016 MAY2016 JUL2016
DETAILS
You need to cycle through logic involving four arrays
using date ranges and constants for every record
in the paid dataset.
Note I only used 31 day months to simplify the problem.
if pwrkcode=pwr[mmyy]
and workcode=wrk[mmyy]
and (dtclower[mmyy] <= dtclpaid <=dtcupper[mmyy])
and (dtwlower[mmyy] <= dtwork <=dtwupper[mmyy])
then Month=dtc[mmyy];
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/_____| (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
proc datasets lib=work kill;
run;quit;
data hav1st;
retain pwrkcode 'D' workcode 'P';
do dtc = 'JAN2016','MAY2016','JUL2016','AUG2016';
do dtw = 'JAN2016','MAY2016','JUL2016','AUG2016';
output;
end;
end;
run;quit;
data hav2nd;
retain pwrkcode 'D' workcode 'P' dtclpaid '01jan2016'd dtwork '31jan2016'd;
format dtclpaid dtwork date9.;
do dtclpaid = '15jan2016'd,'15mar2016'd,'15may2016'd,'15jul2016'd;
if dtclpaid = '15mar2016'd then pwrkcode='X';else pwrkcode='D';
dtwork = dtclpaid;
output;
dtwork = dtwork + 7;
output;
end;
run;quit;
* _ _ _
___ ___ | |_ _| |_(_) ___ _ __
/ __|/ _ \| | | | | __| |/ _ \| '_ \
\__ \ (_) | | |_| | |_| | (_) | | | |
|___/\___/|_|\__,_|\__|_|\___/|_| |_|
;
%symdel dtc dtw pwr wrk monthpaid;
data _null_;
* note it is good idea to do the sql code
here because macro variable sqlobs is very volatile;
* create data for logic arrays;
if _n_=0 then do;
%sysfunc(dosubl('
proc sql;
select
quote(min(dtc))
,quote(min(dtw))
,quote(min(pwrkcode))
,quote(min(workcode))
into
:dtc separated by ','
,:dtw separated by ','
,:pwr separated by ','
,:wrk separated by ','
from
hav1st
group
by dtc
,dtw
,pwrkcode
,workcode
;quit;
'));
end;
length monthpaid $1000;
retain monthpaid ' ';
array dtc[&sqlobs] $7 (&dtc.) ;
array dtw[&sqlobs] $7 (&dtw.) ;
array pwr[&sqlobs] $7 (&pwr.) ;
array wrk[&sqlobs] $7 (&wrk.) ;
set hav2nd end=dne;
format _all_;
do mmyy=1 to &sqlobs;
dtc_1=input(cats("01",dtc[mmyy]),date9.);
dtc_2=input(cats("31",dtc[mmyy]),date9.);
dtw_1=input(cats("01",dtw[mmyy]),date9.);
dtw_2=input(cats("31",dtw[mmyy]),date9.);
if pwrkcode=pwr[mmyy]
and workcode=wrk[mmyy]
and (dtc_1 <= dtclpaid <=dtc_2)
and (dtw_1 <= dtwork <=dtw_2)
then Month=dtc[mmyy];
if month ne '' then leave;
end;
if indexw(monthpaid,month)=0 then monthpaid=catx(' ',monthpaid,month);
if dne then call symputx('monthpaid',monthpaid);
run;quit;
%put &=monthpaid;
MONTHPAID=JAN2016 MAY2016 JUL2016
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.