BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mh04
Obsidian | Level 7

Hi,

I have these 36-character strings where each character represents a month between 201701 and 201912 (in chronological order). I need to create EMPDT which would capture the date of the first 1, 2, or 3 in the string that occurs on or after GRADDT. For ID #1, EMPDT would be 201702 because the second character in the string (which corresponds to 201702) is a 1; for ID #3, EMPDT should be 201706. I had initially created 36 month-level variables, but I haven't figured out how to avoid writing clunky code and loop over them in a way that would get me what I need… I would appreciate any tips.

ID MONTHSTR GRADDT EMPDT
1 210000000000000000000000000000010111 201702 201702
2 000000000000000000000000000000011111 201805 201908
3 111001111000000000000000001111100022 201704 201706
4 222200000000000030000000000000000011 201903 201911
5 000001111111000200000000000000011111 201706 201706
6 001110000000000000000000001000001111 201707 201903
7 100000022000000000000000000000022211 201809 201908
8 000220000001111111111000000000000000 201712 201712
9 033330000000000000000000000001111100 201801 201806

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input id monthstr : $80. graddt $;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;

%let s=%sysevalf('1jan2017'd);
%let e=%sysevalf('1dec2019'd);

data want;
 set have;
 array x{&s : &e} $ 1 _temporary_;
n=0;
do i=&s to &e ;
  if day(i)=1 then do; n+1; x{i}=char(monthstr,n); end;
end;

idx=input(graddt,yymmn6.); 
if &s>idx  then idx=&s;
if idx>&e  then idx='1jan9999'd;
 do j=idx to &e;
   	if day(j)=1 and x{j} in ('1' '2' '3') then do;empdt=put(j,yymmn6.);leave;end;
 end;
 drop i j n idx;
run;

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

I don't understand this part

 

for ID #3, EMPDT should be 201706. 

 

Please explain further.

--
Paige Miller
mh04
Obsidian | Level 7

The string for ID #3 is 111001111000000000000000001111100022; the first 1/2/3 that comes on or after the GRADDT of 201704 (so in 4th position or after) is in the 6th position (in red). Since the string represents months in chronological order between 201601 and 201912, the 6th position means 201706. Hope this makes sense. GRADDT is not a SAS date, it's character. I'd like the new variable to also be character.

mh04
Obsidian | Level 7
I meant chronological order between 201701 and 201912..
s_lassen
Meteorite | Level 14

If your data is like this:

data have;
length id 8 monthstr $36;
informat graddt yymmn6.;
format graddt yymmn6.;
input id monthstr graddt;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;

You can do the calculation like this:

data want;
  set have;
  start=intck('month','31DEC2016'd,graddt);
  dif=indexc(substr(monthstr,start),'123')-1;
  empdt=intnx('month',graddt,dif);
  format empdt yymmn6.;
run;

START is the number of months between the end of 2016 and GRADDT, the place where we should start looking in the string. DIF is the difference in months between GRADDT and EMPDT.

 

This is of course assuming that GRADDT is a SAS date, if it is a string, you should use the INPUT function to calculate the SAS date. Likewise, if you want the EMPDT as a string, you can use PUT to get it.

 

mh04
Obsidian | Level 7

Thank you! This works pretty well. There are a few places where it doesn't work because the data are a bit more messy than mentioned earlier.

I have some cases with GRADDT before 201701.  Using this code sets EMPDT to the month before GRADDT instead of the first occurrence of 1/2/3 in the string. In the example below, EMPDT gets set to 201608 instead of 201702.

 

MONTHSTR

GRADDT

EMPDT

011110000000000000000000000000000000

201609

201608

mh04
Obsidian | Level 7
Similar to the issue mentioned earlier, if there are no values of 1/2/3 after GRADDT, it returns the date of the prior month instead of 0 or missing. Not sure how to edit the code to fix that.
ballardw
Super User

@mh04 wrote:
Similar to the issue mentioned earlier, if there are no values of 1/2/3 after GRADDT, it returns the date of the prior month instead of 0 or missing. Not sure how to edit the code to fix that.

Quality of code is directly dependent on the quality of the program description.

You did not provide any mention that GRADDT would be, basically, invalid in terms of your general rules. Or how to deal with it. And what about missing graddt? Values later than that range?

 

My approach was something along these lines:

data read;
  informat  GRADDT EMPDT yymmn6.;
  input   ID $  @40 GRADDT EMPDT @;
  input @3 @;
  scoredate = '01Jan2017'd;
  do until (scoredate > "01Dec2019"d);
     input score 1. @;
     output;
     scoredate = intnx('month',scoredate,1,'b');
  end;
  input;
  format GRADDT EMPDT scoredate yymmn6.;
datalines;
1 210000000000000000000000000000010111 201702 201702
2 000000000000000000000000000000011111 201805 201908
3 111001111000000000000000001111100022 201704 201706
4 222200000000000030000000000000000011 201903 201911
5 000001111111000200000000000000011111 201706 201706
6 001110000000000000000000001000001111 201707 201903
7 100000022000000000000000000000022211 201809 201908
8 000220000001111111111000000000000000 201712 201712
9 033330000000000000000000000001111100 201801 201806
;

data want;
   set read (where=(scoredate ge graddt and score in (1,2,3)));
   by id scoredate;
   if first.id;
run;

If you need what I call scoredate because I couldn't actually tell if you already had an EMPDT in your data or not, merge this result back onto the original data, which would have a missing scoredate for most invalid graddt values.

mh04
Obsidian | Level 7
Yes, sorry for leaving out important details about the data. GRADDT ranges between 200901 and 201912. EMPDT doesn't exist in the data and should be missing if GRADDT is missing. If there's no 1/2/3 on or after GRADDT, EMPDT should be 0. My data are always in SAS datasets so I know nothing about reading in raw data. I'm not sure how to edit your code to make it work for me. Thanks anyway though.
s_lassen
Meteorite | Level 14

So what should EMPDT be set to if GRADDT is before 2017?

 

BTW, if you are using SAS dates, setting EMPDT to 0 will mean setting it to January 1st, 1960, as SAS date values are internally represented as the number of days from that date (or before, if the value is negative). If you just want to distinguish it from the cases where GRADDT is missing, you can also use a special missing value, e.g.:

data want;
  set have;
  if missing(graddt) then empdt=.;
  else do;
    start=max(intck('month','31DEC2016'd,graddt),1);
    dif=indexc(substr(monthstr,start),'123')-1;
    if dif=-1 then empdt=.A;
    else empdt=intnx('month',graddt,dif);
    end;
  format empdt yymmn6.;
run;

So here EMPDT is set to the special missing value .A when there is no 1/2/3 after GRADDT. I also put in code to look from the beginning of the string when GRADDT is before 2017, using the MAX function.

mh04
Obsidian | Level 7
Thank you! If GRADDT is before 2017, EMPDT should be set to the first month&year in the string that equals 1/2/3. I had very few cases in that scenario so to save time I hardcoded based on position of first 1/2/3.
Ksharp
Super User
data have;
input id monthstr : $80. graddt $;
cards;
1 210000000000000000000000000000010111 201702
2 000000000000000000000000000000011111 201805
3 111001111000000000000000001111100022 201704
4 222200000000000030000000000000000011 201903
5 000001111111000200000000000000011111 201706
6 001110000000000000000000001000001111 201707
7 100000022000000000000000000000022211 201809
8 000220000001111111111000000000000000 201712
9 033330000000000000000000000001111100 201801
;run;

%let s=%sysevalf('1jan2017'd);
%let e=%sysevalf('1dec2019'd);

data want;
 set have;
 array x{&s : &e} $ 1 _temporary_;
n=0;
do i=&s to &e ;
  if day(i)=1 then do; n+1; x{i}=char(monthstr,n); end;
end;

idx=input(graddt,yymmn6.); 
if &s>idx  then idx=&s;
if idx>&e  then idx='1jan9999'd;
 do j=idx to &e;
   	if day(j)=1 and x{j} in ('1' '2' '3') then do;empdt=put(j,yymmn6.);leave;end;
 end;
 drop i j n idx;
run;
mh04
Obsidian | Level 7
Thank you! I was able to make this work for me.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 790 views
  • 4 likes
  • 5 in conversation