BookmarkSubscribeRSS Feed
jnivi
Calcite | Level 5

Hi there, 

 

I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below. 

 

ID       Fruit     Colour         Flag

1001   Apple    Purple          1

1003   Banana  Black          0

1001   Apple      Purple        1

1005   Mango   Blue            0

1005   Mango    Pink           0

1002   Apple     Red            0

1006   Cherry   Yellow         0

1001   Pear       Purple        1

 

For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same). 

 

Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!

 

2 REPLIES 2
Ksharp
Super User
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;

proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
 from name
  group by icustomerid ;
quit;
yabwon
Meteorite | Level 14

Alternative option with hash tables:

data have;
input ID       Fruit : $     Colour : $;
cards;
1001   Apple    Purple          1
1003   Banana  Black          0
1001   Apple      Purple        1
1005   Mango   Blue            0
1005   Mango    Pink           0
1002   Apple     Red            0
1006   Cherry   Yellow         0
1001   Pear       Purple        1
;
run;
proc print;
run;

data want;

  length F $ 8 ; drop F;
  declare hash H();
  H.defineKey("ID");
  H.defineData("F","Flag");
  H.defineDone();


  do until(EOF1);
    set have end=EOF1;

    if 0=H.find() then do; if F NE Fruit then Flag+1; end;
    else Flag=-1;

    _N_=H.replace();
  end;


  do until(EOF2);
    set have end=EOF2;

    _N_=H.find();

    Flag=Flag>0;
    output;
  end;
stop;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 161 views
  • 0 likes
  • 3 in conversation