Hi,
I have these 36-character strings where each character represents a month between 201701 and 201912 (in chronological order). I need to create EMPDT which would capture the date of the first 1, 2, or 3 in the string that occurs on or after GRADDT. For ID #1, EMPDT would be 201702 because the second character in the string (which corresponds to 201702) is a 1; for ID #3, EMPDT should be 201706. I had initially created 36 month-level variables, but I haven't figured out how to avoid writing clunky code and loop over them in a way that would get me what I need… I would appreciate any tips.
ID MONTHSTR GRADDT EMPDT
1 210000000000000000000000000000010111 201702 201702
2 000000000000000000000000000000011111 201805 201908
3 111001111000000000000000001111100022 201704 201706
4 222200000000000030000000000000000011 201903 201911
5 000001111111000200000000000000011111 201706 201706
6 001110000000000000000000001000001111 201707 201903
7 100000022000000000000000000000022211 201809 201908
8 000220000001111111111000000000000000 201712 201712
9 033330000000000000000000000001111100 201801 201806
data have;
input id monthstr : $80. graddt $;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;
%let s=%sysevalf('1jan2017'd);
%let e=%sysevalf('1dec2019'd);
data want;
set have;
array x{&s : &e} $ 1 _temporary_;
n=0;
do i=&s to &e ;
if day(i)=1 then do; n+1; x{i}=char(monthstr,n); end;
end;
idx=input(graddt,yymmn6.);
if &s>idx then idx=&s;
if idx>&e then idx='1jan9999'd;
do j=idx to &e;
if day(j)=1 and x{j} in ('1' '2' '3') then do;empdt=put(j,yymmn6.);leave;end;
end;
drop i j n idx;
run;
I don't understand this part
for ID #3, EMPDT should be 201706.
Please explain further.
The string for ID #3 is 111001111000000000000000001111100022; the first 1/2/3 that comes on or after the GRADDT of 201704 (so in 4th position or after) is in the 6th position (in red). Since the string represents months in chronological order between 201601 and 201912, the 6th position means 201706. Hope this makes sense. GRADDT is not a SAS date, it's character. I'd like the new variable to also be character.
If your data is like this:
data have;
length id 8 monthstr $36;
informat graddt yymmn6.;
format graddt yymmn6.;
input id monthstr graddt;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;
You can do the calculation like this:
data want;
set have;
start=intck('month','31DEC2016'd,graddt);
dif=indexc(substr(monthstr,start),'123')-1;
empdt=intnx('month',graddt,dif);
format empdt yymmn6.;
run;
START is the number of months between the end of 2016 and GRADDT, the place where we should start looking in the string. DIF is the difference in months between GRADDT and EMPDT.
This is of course assuming that GRADDT is a SAS date, if it is a string, you should use the INPUT function to calculate the SAS date. Likewise, if you want the EMPDT as a string, you can use PUT to get it.
Thank you! This works pretty well. There are a few places where it doesn't work because the data are a bit more messy than mentioned earlier.
I have some cases with GRADDT before 201701. Using this code sets EMPDT to the month before GRADDT instead of the first occurrence of 1/2/3 in the string. In the example below, EMPDT gets set to 201608 instead of 201702.
MONTHSTR | GRADDT | EMPDT |
011110000000000000000000000000000000 | 201609 | 201608 |
@mh04 wrote:
Similar to the issue mentioned earlier, if there are no values of 1/2/3 after GRADDT, it returns the date of the prior month instead of 0 or missing. Not sure how to edit the code to fix that.
Quality of code is directly dependent on the quality of the program description.
You did not provide any mention that GRADDT would be, basically, invalid in terms of your general rules. Or how to deal with it. And what about missing graddt? Values later than that range?
My approach was something along these lines:
data read; informat GRADDT EMPDT yymmn6.; input ID $ @40 GRADDT EMPDT @; input @3 @; scoredate = '01Jan2017'd; do until (scoredate > "01Dec2019"d); input score 1. @; output; scoredate = intnx('month',scoredate,1,'b'); end; input; format GRADDT EMPDT scoredate yymmn6.; datalines; 1 210000000000000000000000000000010111 201702 201702 2 000000000000000000000000000000011111 201805 201908 3 111001111000000000000000001111100022 201704 201706 4 222200000000000030000000000000000011 201903 201911 5 000001111111000200000000000000011111 201706 201706 6 001110000000000000000000001000001111 201707 201903 7 100000022000000000000000000000022211 201809 201908 8 000220000001111111111000000000000000 201712 201712 9 033330000000000000000000000001111100 201801 201806 ; data want; set read (where=(scoredate ge graddt and score in (1,2,3))); by id scoredate; if first.id; run;
If you need what I call scoredate because I couldn't actually tell if you already had an EMPDT in your data or not, merge this result back onto the original data, which would have a missing scoredate for most invalid graddt values.
So what should EMPDT be set to if GRADDT is before 2017?
BTW, if you are using SAS dates, setting EMPDT to 0 will mean setting it to January 1st, 1960, as SAS date values are internally represented as the number of days from that date (or before, if the value is negative). If you just want to distinguish it from the cases where GRADDT is missing, you can also use a special missing value, e.g.:
data want;
set have;
if missing(graddt) then empdt=.;
else do;
start=max(intck('month','31DEC2016'd,graddt),1);
dif=indexc(substr(monthstr,start),'123')-1;
if dif=-1 then empdt=.A;
else empdt=intnx('month',graddt,dif);
end;
format empdt yymmn6.;
run;
So here EMPDT is set to the special missing value .A when there is no 1/2/3 after GRADDT. I also put in code to look from the beginning of the string when GRADDT is before 2017, using the MAX function.
data have;
input id monthstr : $80. graddt $;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;
%let s=%sysevalf('1jan2017'd);
%let e=%sysevalf('1dec2019'd);
data want;
set have;
array x{&s : &e} $ 1 _temporary_;
n=0;
do i=&s to &e ;
if day(i)=1 then do; n+1; x{i}=char(monthstr,n); end;
end;
idx=input(graddt,yymmn6.);
if &s>idx then idx=&s;
if idx>&e then idx='1jan9999'd;
do j=idx to &e;
if day(j)=1 and x{j} in ('1' '2' '3') then do;empdt=put(j,yymmn6.);leave;end;
end;
drop i j n idx;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.