Find the array of unique type and valid date

kk13 · Posted 04-04-2025 04:31 PM

Below is an example of data I have. I want to find an array of unique types based on the startmonth and startyear. I want to find an array of valid end dates for each unique type.

data HAVE;

input caseid startmonth961 startyear961 startmonth962 startyear962 startmonth963 startyear963 startmonth964 startyear964

startmonth981 startyear981 startmonth982 startyear982 startmonth983 startyear983 startmonth984 startyear984

endmonth961 endyear961 endmonth962 endyear962 endmonth963 endyear963 endmonth964 endyear964

endmonth981 endyear981 endmonth982 endyear82 endmonth983 endyear983 endmonth984 endyear984

type961 type962 type963 type964 type981 type982 type983 type984;

datalines;

1 1 1995 5 1996 10 1997 -4 -4 1 1998 10 1997 -4 -4 -4 -4 1 1995 5 1988 -4 -4 -4 -4 -4 -4 12 1997 -4 -4 -4 -4 6 8 1 . 6 1 . .

;

run;

This is my code. I was able to find unique types. I'm unable to get the valid end dates.

data mycode;

set have;

array train type961-type964 type981-type984;

array start_mo startmonth961-startmonth964 startmonth981-startmonth984;

array start_yr startyear961-startyear964 startyear981-startyear984;

array end_mo endmonth961-endmonth964 endmonth981 - endmonth984;

array end_yr endyear961-endyear964 endyear981 - endyear984;

array unique u01 - u10;

array stmo_unique stmo01-stmo10;

array styr_unique styr01-styr10;

array m m01-m10;

array endmo_unique endmo01-endmo10;

array endyr_unique endyr01-endyr10;

unique_count = 0;

do i =1 to dim(train);

if (train{i} ne .) then do;

match_found = 0;

do j= 1 to unique_count;

if (train{i} = unique{j} and start_mo{i} = stmo_unique{j} and start_yr{i} = styr_unique{j}) then do;

if end_mo{i} ne -4 and end_yr{i} ne -4 then do;

if endmo_unique{j} =-4 and endyr_unique{j}=-4 or (end_yr{i}>endyr_unique{j}) or (end_yr{i} = endyr_unique{j} and end_mo{i} > endmo_unique{j}) then do;

endmo_unique{j} =end_mo{i};

endyr_unique{j}=end_yr{i};

end;

match_found = 1;

m{j} = 1;

leave;

end;

if (match_found = 0) then do;

unique_count + 1;

unique{unique_count} = train{i};

stmo_unique{unique_count}=start_mo{i};

styr_unique{unique_count}=start_yr{i};

endmo_unique{unique_count}=end_mo{i};

endyr_unique{unique_count}=end_yr{i};

end;

keep caseid

run;

This is what I want. If there are duplicates of types, some have invalid end dates and some valid end dates, then the end date should be the most current valid end date. If duplicates have all valid end dates, then it should be the most current valid end dates. If duplicates have all invalid end dates, then the end date should be invalid.

If type is unique, then assign its own end dates (it could be valid or invalid).

For example, type963 and type982 have same type (1) and same start month & year (10/1997). However, endmonth963 (-4) and endyear963 (-4) differ from endmonth982 (12) and endyear1982(1997). So, the third unique type should be: u03 =1, stmo03=10, styr03=1997, endyr03=12, endyr03=1997, m03 =1;

data want;

input id u01 - u5 stmo01-stmo5 styr01-styr5 endmo01-endmo05 endyr01-endyr5 m01-m5;
datalines;
1 6 8 1 6 . 1 5 10 1 . 1995 1996 1997 1998 . 1 5 12 -4 . 1995 1988 1997 -4 . . . 1 . .
;
run;

quickbluefish · Posted 04-04-2025 08:34 PM

Is this an easier problem to think about if your data are vertical?

data vert;
set have;
length num 5 startmo startyr endmo endyr type 8;
array sm {*} startmonth:;
array sy {*} startyear:;
array em {*} endmonth:;
array ey {*} endyear:;
array tp {*} type:;
do i=1 to dim(sm);
	num=substr(vname(sm[i]),anydigit(vname(sm[i])))*1;
	startmo=sm[i];
	startyr=sy[i];
	endmo=em[i];
	endyr=ey[i];
	type=tp[i];
	output;
end;
keep caseid num startmo startyr endmo endyr type;
run;

proc print data=vert width=min; run;

mkeintz · Posted 04-04-2025 11:17 PM

Questions:

What do you mean by "most current"?
I don't think I understand what you mean by "unique". Can you provide a more complete explanation?
Why does your code provide for 10 TYPE variables in WANT, when there are only 8 TYPE variables in HAVE?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Find the array of unique type and valid date

Re: Find the array of unique type and valid date

Re: Find the array of unique type and valid date

Find the array of unique type and valid date

Re: Find the array of unique type and valid date

Re: Find the array of unique type and valid date

Registration is open

SAS Training: Just a Click Away