Hi there,
I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below.
ID Fruit Colour Flag
1001 Apple Purple 1
1003 Banana Black 0
1001 Apple Purple 1
1005 Mango Blue 0
1005 Mango Pink 0
1002 Apple Red 0
1006 Cherry Yellow 0
1001 Pear Purple 1
For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same).
Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;
proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
from name
group by icustomerid ;
quit;
Alternative option with hash tables:
data have;
input ID Fruit : $ Colour : $;
cards;
1001 Apple Purple 1
1003 Banana Black 0
1001 Apple Purple 1
1005 Mango Blue 0
1005 Mango Pink 0
1002 Apple Red 0
1006 Cherry Yellow 0
1001 Pear Purple 1
;
run;
proc print;
run;
data want;
length F $ 8 ; drop F;
declare hash H();
H.defineKey("ID");
H.defineData("F","Flag");
H.defineDone();
do until(EOF1);
set have end=EOF1;
if 0=H.find() then do; if F NE Fruit then Flag+1; end;
else Flag=-1;
_N_=H.replace();
end;
do until(EOF2);
set have end=EOF2;
_N_=H.find();
Flag=Flag>0;
output;
end;
stop;
run;
proc print;
run;
Bart
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.