Below is an example of data I have. I want to find an array of unique types based on the startmonth and startyear. I want to find an array of valid end dates for each unique type. data HAVE; input caseid startmonth961 startyear961 startmonth962 startyear962 startmonth963 startyear963 startmonth964 startyear964 startmonth981 startyear981 startmonth982 startyear982 startmonth983 startyear983 startmonth984 startyear984 endmonth961 endyear961 endmonth962 endyear962 endmonth963 endyear963 endmonth964 endyear964 endmonth981 endyear981 endmonth982 endyear82 endmonth983 endyear983 endmonth984 endyear984 type961 type962 type963 type964 type981 type982 type983 type984; datalines; 1 1 1995 5 1996 10 1997 -4 -4 1 1998 10 1997 -4 -4 -4 -4 1 1995 5 1988 -4 -4 -4 -4 -4 -4 12 1997 -4 -4 -4 -4 6 8 1 . 6 1 . . ; run; This is my code. I was able to find unique types. I'm unable to get the valid end dates. data mycode; set have; array train type961-type964 type981-type984; array start_mo startmonth961-startmonth964 startmonth981-startmonth984; array start_yr startyear961-startyear964 startyear981-startyear984; array end_mo endmonth961-endmonth964 endmonth981 - endmonth984; array end_yr endyear961-endyear964 endyear981 - endyear984; array unique u01 - u10; array stmo_unique stmo01-stmo10; array styr_unique styr01-styr10; array m m01-m10; array endmo_unique endmo01-endmo10; array endyr_unique endyr01-endyr10; unique_count = 0; do i =1 to dim(train); if (train{i} ne .) then do; match_found = 0; do j= 1 to unique_count; if (train{i} = unique{j} and start_mo{i} = stmo_unique{j} and start_yr{i} = styr_unique{j}) then do; if end_mo{i} ne -4 and end_yr{i} ne -4 then do; if endmo_unique{j} =-4 and endyr_unique{j}=-4 or (end_yr{i}>endyr_unique{j}) or (end_yr{i} = endyr_unique{j} and end_mo{i} > endmo_unique{j}) then do; endmo_unique{j} =end_mo{i}; endyr_unique{j}=end_yr{i}; end; end; match_found = 1; m{j} = 1; leave; end; end; if (match_found = 0) then do; unique_count + 1; unique{unique_count} = train{i}; stmo_unique{unique_count}=start_mo{i}; styr_unique{unique_count}=start_yr{i}; endmo_unique{unique_count}=end_mo{i}; endyr_unique{unique_count}=end_yr{i}; end; end; end; keep caseid run; This is what I want. If there are duplicates of types, some have invalid end dates and some valid end dates, then the end date should be the most current valid end date. If duplicates have all valid end dates, then it should be the most current valid end dates. If duplicates have all invalid end dates, then the end date should be invalid. If type is unique, then assign its own end dates (it could be valid or invalid). For example, type963 and type982 have same type (1) and same start month & year (10/1997). However, endmonth963 (-4) and endyear963 (-4) differ from endmonth982 (12) and endyear1982(1997). So, the third unique type should be: u03 =1, stmo03=10, styr03=1997, endyr03=12, endyr03=1997, m03 =1; data want; input id u01 - u5 stmo01-stmo5 styr01-styr5 endmo01-endmo05 endyr01-endyr5 m01-m5; datalines; 1 6 8 1 6 . 1 5 10 1 . 1995 1996 1997 1998 . 1 5 12 -4 . 1995 1988 1997 -4 . . . 1 . . ; run;
... View more