BookmarkSubscribeRSS Feed
jnivi
Calcite | Level 5

Hi there, 

 

I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below. 

 

ID       Fruit     Colour         Flag

1001   Apple    Purple          1

1003   Banana  Black          0

1001   Apple      Purple        1

1005   Mango   Blue            0

1005   Mango    Pink           0

1002   Apple     Red            0

1006   Cherry   Yellow         0

1001   Pear       Purple        1

 

For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same). 

 

Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!

 

2 REPLIES 2
Ksharp
Super User
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;

proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
 from name
  group by icustomerid ;
quit;
yabwon
Onyx | Level 15

Alternative option with hash tables:

data have;
input ID       Fruit : $     Colour : $;
cards;
1001   Apple    Purple          1
1003   Banana  Black          0
1001   Apple      Purple        1
1005   Mango   Blue            0
1005   Mango    Pink           0
1002   Apple     Red            0
1006   Cherry   Yellow         0
1001   Pear       Purple        1
;
run;
proc print;
run;

data want;

  length F $ 8 ; drop F;
  declare hash H();
  H.defineKey("ID");
  H.defineData("F","Flag");
  H.defineDone();


  do until(EOF1);
    set have end=EOF1;

    if 0=H.find() then do; if F NE Fruit then Flag+1; end;
    else Flag=-1;

    _N_=H.replace();
  end;


  do until(EOF2);
    set have end=EOF2;

    _N_=H.find();

    Flag=Flag>0;
    output;
  end;
stop;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 489 views
  • 0 likes
  • 3 in conversation