Hi there,
I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below.
ID Fruit Colour Flag
1001 Apple Purple 1
1003 Banana Black 0
1001 Apple Purple 1
1005 Mango Blue 0
1005 Mango Pink 0
1002 Apple Red 0
1006 Cherry Yellow 0
1001 Pear Purple 1
For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same).
Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;
proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
from name
group by icustomerid ;
quit;
Alternative option with hash tables:
data have;
input ID Fruit : $ Colour : $;
cards;
1001 Apple Purple 1
1003 Banana Black 0
1001 Apple Purple 1
1005 Mango Blue 0
1005 Mango Pink 0
1002 Apple Red 0
1006 Cherry Yellow 0
1001 Pear Purple 1
;
run;
proc print;
run;
data want;
length F $ 8 ; drop F;
declare hash H();
H.defineKey("ID");
H.defineData("F","Flag");
H.defineDone();
do until(EOF1);
set have end=EOF1;
if 0=H.find() then do; if F NE Fruit then Flag+1; end;
else Flag=-1;
_N_=H.replace();
end;
do until(EOF2);
set have end=EOF2;
_N_=H.find();
Flag=Flag>0;
output;
end;
stop;
run;
proc print;
run;
Bart
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.