I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below.
ID Fruit Colour Flag
1001 Apple Purple 1
1003 Banana Black 0
1001 Apple Purple 1
1005 Mango Blue 0
1005 Mango Pink 0
1002 Apple Red 0
1006 Cherry Yellow 0
1001 Pear Purple 1
For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same).
Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!
Data name; infile cards expandtabs; input icustomerid debt_code rep_code dr_inits $ dr_name $; datalines ; 48196 367762168 131 Tami Baker 48196 337656029 131 Tami Baker 48196 302678693 131 Tami Baker 48197 302678693 131 Tami Baker 48197 302678693 131 Tami Bak ; run; proc sql; create table want as select *,count(distinct dr_name) ne 1 as flag from name group by icustomerid ; quit;
Alternative option with hash tables:
data have; input ID Fruit : $ Colour : $; cards; 1001 Apple Purple 1 1003 Banana Black 0 1001 Apple Purple 1 1005 Mango Blue 0 1005 Mango Pink 0 1002 Apple Red 0 1006 Cherry Yellow 0 1001 Pear Purple 1 ; run; proc print; run; data want; length F $ 8 ; drop F; declare hash H(); H.defineKey("ID"); H.defineData("F","Flag"); H.defineDone(); do until(EOF1); set have end=EOF1; if 0=H.find() then do; if F NE Fruit then Flag+1; end; else Flag=-1; _N_=H.replace(); end; do until(EOF2); set have end=EOF2; _N_=H.find(); Flag=Flag>0; output; end; stop; run; proc print; run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.