Hi,
I have a dataset as follows:
ID Var1 Var2 Var3 Var4 Var5
1 Y Y Y Y N
1 Y Y Y Y N
1 N N Y N
2 N N N N Y
2 Y Y N N Y
I want to identify the similarities and differences between each variable (Var1-Var5) by ID.
eg: For ID1, obs 1 and 2 are same for all variables, but obs3 is different in terms of Var1, Var2, Var4 and Var5.
I am not sure how I will be able to flag each of these differences without making the code too long.
Thank you for your help.
Please show what you expect the output for that input to look like. With 5 variables and apparenlty 3 values (including missing) you have 243 possible value combinations to consider.
Also if an Id has 10 records are each of the 10 to compared to all of the other records for the id, only the first or some other rule? You potentially are dealing with factorial numbers of comparisons which depending on how many records per ID can lead to extremely large numbers of comparisons.
I assume the dataset is already sorted by ID.
Are duplicates always contiguous? If so, then this simple program is all you need:
data want;
set have;
by ID var1-var5 notsorted;
if first.var5 then first_obs_this_pattern=1;
else first_obs_this_pattern=0;
run;
Notes:
If the data are not pre-grouped by VAR1-VAR5 and sorting is not expsensive (i.e. data set is not big), then this works:
proc sort data=have out=need;
by ID var1-var5;
run;
data want;
set need;
by ID var1-var5 ;
if first.var5 then first_obs_this_pattern=1;
else first_obs_this_pattern=0;
run;
NOtes:
If (1) the data set is so big that you want to avoid sorting, and (2) you don't have too many records per ID, then a little more programming as in:
data want;
array ny_list {10} $20 _temporary_;
set have;
by id;
length ny_test $20 ;
ny_test=cat(of var:);
if first.id then N_patterns=0;
retain N_patterns;
do I=1 by 1 while (I<=N_patterns);
if ny_test=ny_list{I} then leave;
end;
if I>N_patterns then do;
first_obs_this_pattern=1;
N_patterns+1;
ny_list{N_patterns}=ny_test;
end;
else first_obs_this_pattern=0;
drop ny_test N_patterns I;
run;
NOtes:
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.