Hi,
I have a dataset as follows:
ID Var1 Var2 Var3 Var4 Var5
1 Y Y Y Y N
1 Y Y Y Y N
1 N N Y N
2 N N N N Y
2 Y Y N N Y
I want to identify the similarities and differences between each variable (Var1-Var5) by ID.
eg: For ID1, obs 1 and 2 are same for all variables, but obs3 is different in terms of Var1, Var2, Var4 and Var5.
I am not sure how I will be able to flag each of these differences without making the code too long.
Thank you for your help.
Please show what you expect the output for that input to look like. With 5 variables and apparenlty 3 values (including missing) you have 243 possible value combinations to consider.
Also if an Id has 10 records are each of the 10 to compared to all of the other records for the id, only the first or some other rule? You potentially are dealing with factorial numbers of comparisons which depending on how many records per ID can lead to extremely large numbers of comparisons.
I assume the dataset is already sorted by ID.
Are duplicates always contiguous? If so, then this simple program is all you need:
data want;
set have;
by ID var1-var5 notsorted;
if first.var5 then first_obs_this_pattern=1;
else first_obs_this_pattern=0;
run;
Notes:
If the data are not pre-grouped by VAR1-VAR5 and sorting is not expsensive (i.e. data set is not big), then this works:
proc sort data=have out=need;
by ID var1-var5;
run;
data want;
set need;
by ID var1-var5 ;
if first.var5 then first_obs_this_pattern=1;
else first_obs_this_pattern=0;
run;
NOtes:
If (1) the data set is so big that you want to avoid sorting, and (2) you don't have too many records per ID, then a little more programming as in:
data want;
array ny_list {10} $20 _temporary_;
set have;
by id;
length ny_test $20 ;
ny_test=cat(of var:);
if first.id then N_patterns=0;
retain N_patterns;
do I=1 by 1 while (I<=N_patterns);
if ny_test=ny_list{I} then leave;
end;
if I>N_patterns then do;
first_obs_this_pattern=1;
N_patterns+1;
ny_list{N_patterns}=ny_test;
end;
else first_obs_this_pattern=0;
drop ny_test N_patterns I;
run;
NOtes:
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.