BookmarkSubscribeRSS Feed
jnivi
Calcite | Level 5

Hi there, 

 

I have a really large dataset and I was wondering if there was a way I can flag whether a particular variable value is the same or not for IDs with more than 1 record. I am not looking for exact duplicates across all variable values (i.e can't use NODUPKEY), nor am I interested in removing the exact duplicates, I just wanted to flag whether the values were the same or not. I have included a same dataset below. 

 

ID       Fruit     Colour         Flag

1001   Apple    Purple          1

1003   Banana  Black          0

1001   Apple      Purple        1

1005   Mango   Blue            0

1005   Mango    Pink           0

1002   Apple     Red            0

1006   Cherry   Yellow         0

1001   Pear       Purple        1

 

For the above dataset, among IDs with more than 1 record, I am interested in flagging whether the fruit variable value is the same/different among all observations for that particular ID. I have included how I would like my flag variable to code. For example ID 1001, the fruit variable is the same only for 2/3 observations, so I wanted to flag this as 1 (not the same). 

 

Thank you once again for taking the time to read through my question and I am looking forward to reading any suggestions you might have to solve my problem!

 

2 REPLIES 2
Ksharp
Super User
Data name;
infile cards expandtabs;
input icustomerid debt_code rep_code dr_inits $ dr_name $;
datalines ;
48196 367762168 131 Tami Baker
48196 337656029 131 Tami Baker
48196 302678693 131 Tami Baker
48197 302678693 131 Tami Baker
48197 302678693 131 Tami Bak
;
run;

proc sql;
create table want as
select *,count(distinct dr_name) ne 1 as flag
 from name
  group by icustomerid ;
quit;
yabwon
Onyx | Level 15

Alternative option with hash tables:

data have;
input ID       Fruit : $     Colour : $;
cards;
1001   Apple    Purple          1
1003   Banana  Black          0
1001   Apple      Purple        1
1005   Mango   Blue            0
1005   Mango    Pink           0
1002   Apple     Red            0
1006   Cherry   Yellow         0
1001   Pear       Purple        1
;
run;
proc print;
run;

data want;

  length F $ 8 ; drop F;
  declare hash H();
  H.defineKey("ID");
  H.defineData("F","Flag");
  H.defineDone();


  do until(EOF1);
    set have end=EOF1;

    if 0=H.find() then do; if F NE Fruit then Flag+1; end;
    else Flag=-1;

    _N_=H.replace();
  end;


  do until(EOF2);
    set have end=EOF2;

    _N_=H.find();

    Flag=Flag>0;
    output;
  end;
stop;
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 372 views
  • 0 likes
  • 3 in conversation