BookmarkSubscribeRSS Feed
jnivi
Calcite | Level 5

Hi there, 

 

I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below. 

 

ID       Fruit     Colour         Flag

1001   Apple    Purple          1

1003   Banana  Black          0

1001   Apple      Purple        1

1005   Mango   Blue            0

1005   Mango    Pink           0

1002   Apple     Red            0

1006   Cherry   Yellow         0

1001   Pear       Purple        1

 

For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same). 

 

Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!

 

2 REPLIES 2
Ksharp
Super User
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;

proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
 from name
  group by icustomerid ;
quit;
yabwon
Amethyst | Level 16

Alternative option with hash tables:

data have;
input ID       Fruit : $     Colour : $;
cards;
1001   Apple    Purple          1
1003   Banana  Black          0
1001   Apple      Purple        1
1005   Mango   Blue            0
1005   Mango    Pink           0
1002   Apple     Red            0
1006   Cherry   Yellow         0
1001   Pear       Purple        1
;
run;
proc print;
run;

data want;

  length F $ 8 ; drop F;
  declare hash H();
  H.defineKey("ID");
  H.defineData("F","Flag");
  H.defineDone();


  do until(EOF1);
    set have end=EOF1;

    if 0=H.find() then do; if F NE Fruit then Flag+1; end;
    else Flag=-1;

    _N_=H.replace();
  end;


  do until(EOF2);
    set have end=EOF2;

    _N_=H.find();

    Flag=Flag>0;
    output;
  end;
stop;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 868 views
  • 0 likes
  • 3 in conversation