Below is an example of data I have. I want to find an array of unique types based on the startmonth and startyear. I want to find an array of valid end dates for each unique type.
data HAVE;
input caseid startmonth961 startyear961 startmonth962 startyear962 startmonth963 startyear963 startmonth964 startyear964
startmonth981 startyear981 startmonth982 startyear982 startmonth983 startyear983 startmonth984 startyear984
endmonth961 endyear961 endmonth962 endyear962 endmonth963 endyear963 endmonth964 endyear964
endmonth981 endyear981 endmonth982 endyear82 endmonth983 endyear983 endmonth984 endyear984
type961 type962 type963 type964 type981 type982 type983 type984;
datalines;
1 1 1995 5 1996 10 1997 -4 -4 1 1998 10 1997 -4 -4 -4 -4 1 1995 5 1988 -4 -4 -4 -4 -4 -4 12 1997 -4 -4 -4 -4 6 8 1 . 6 1 . .
;
run;
This is my code. I was able to find unique types. I'm unable to get the valid end dates.
array train type961-type964 type981-type984;
array start_mo startmonth961-startmonth964 startmonth981-startmonth984;
array start_yr startyear961-startyear964 startyear981-startyear984;
array end_mo endmonth961-endmonth964 endmonth981 - endmonth984;
array end_yr endyear961-endyear964 endyear981 - endyear984;
This is what I want. If there are duplicates of types, some have invalid end dates and some valid end dates, then the end date should be the most current valid end date. If duplicates have all valid end dates, then it should be the most current valid end dates. If duplicates have all invalid end dates, then the end date should be invalid.
If type is unique, then assign its own end dates (it could be valid or invalid).
For example, type963 and type982 have same type (1) and same start month & year (10/1997). However, endmonth963 (-4) and endyear963 (-4) differ from endmonth982 (12) and endyear1982(1997). So, the third unique type should be: u03 =1, stmo03=10, styr03=1997, endyr03=12, endyr03=1997, m03 =1;
data want;
input id u01 - u5 stmo01-stmo5 styr01-styr5 endmo01-endmo05 endyr01-endyr5 m01-m5;
datalines;
1 6 8 1 6 . 1 5 10 1 . 1995 1996 1997 1998 . 1 5 12 -4 . 1995 1988 1997 -4 . . . 1 . .
;
run;
Is this an easier problem to think about if your data are vertical?
data vert;
set have;
length num 5 startmo startyr endmo endyr type 8;
array sm {*} startmonth:;
array sy {*} startyear:;
array em {*} endmonth:;
array ey {*} endyear:;
array tp {*} type:;
do i=1 to dim(sm);
num=substr(vname(sm[i]),anydigit(vname(sm[i])))*1;
startmo=sm[i];
startyr=sy[i];
endmo=em[i];
endyr=ey[i];
type=tp[i];
output;
end;
keep caseid num startmo startyr endmo endyr type;
run;
proc print data=vert width=min; run;
Questions:
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.