dpachorek
Fluorite | Level 6

Hi,

 

I have a dataset from which I am trying to remove specific duplicates. I am creating an email list from an appended dataset, and because certain people are labeled as both staff and student, we have duplicate records. Here is one such case:

 

[Image: dpachorek_0-1601389103979.png]

 

This is not the case for everyone, but I am trying to remove the duplicate records that are labeled as student, since the staff label takes precedence.

 

Any help? Thanks!

5 REPLIES
SwissC
Obsidian | Level 7

Assuming that the email address is held in a variable called email, and the staff/student label is held in a variable called staff_student:

 

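PROC SQL;
  /* Count the distinct labels per email; SAS remerges the count onto every row */
  CREATE TABLE part1 AS SELECT
  *, count(distinct staff_student) as nrc
  FROM have
  GROUP BY email;
QUIT;
DATA part2;
  /* Widen the variable first so "Staff/Student" (13 characters) is not truncated */
  LENGTH staff_student $13;
  SET part1;
  /* Two distinct labels for one email means the person is both staff and student */
  IF nrc=2 THEN staff_student="Staff/Student";
  DROP nrc;
RUN;
PROC SQL;
  /* The relabeled rows are now identical, so DISTINCT collapses them into one */
  CREATE TABLE want AS SELECT
  distinct *
  FROM part2;
QUIT;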

You would then end up with a single record labeled Staff/Student for each of these people, assuming the label is the only thing that differs between their records.

dpachorek
Fluorite | Level 6

You're right that the emails are under a variable called email. However, staff and student are part of a variable called group that has 4 options (Student, Staff, Faculty, and Lib_Faculty).

SwissC
Obsidian | Level 7
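DATA part1;
  SET have;
  /* Flag the rows that carry one of the two labels of interest */
  IF group in("Staff" "Student") THEN cnt=1;
    ELSE cnt=0;
RUN;

PROC SQL;
  /* Total the flags per email; SAS remerges the sum onto every row */
  CREATE TABLE part2 AS SELECT
  *, sum(cnt) as nrc
  FROM part1
  GROUP BY email;
QUIT;
DATA part3;
  /* Widen group first so "Staff/Student" (13 characters) is not truncated */
  LENGTH group $13;
  SET part2;
  /* nrc=2 assumes at most one Staff row and one Student row per email */
  IF nrc=2 AND cnt=1 THEN group="Staff/Student";
  DROP nrc cnt;
RUN;
PROC SQL;
  /* The relabeled rows are now identical, so DISTINCT collapses them into one */
  CREATE TABLE want AS SELECT
  distinct *
  FROM part3;
QUIT;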

Does this fix it?

SwissC
Obsidian | Level 7

Another option would be to transpose the data.

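DATA part1;
  SET have;
  x=1; /* value that will fill the new flag variables */
RUN;

/* BY-group processing requires the data to be sorted by the BY variables */
PROC SORT data=part1;
  BY email;
RUN;

PROC TRANSPOSE data=part1 out=want(drop=_NAME_);
  /* Add all other variables except group and x to BY, here and in PROC SORT */
  BY email;
  ID group;
  VAR x;
RUN;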

This would then give a dataset with one row per email and a flag variable for each value of group, and it is probably the better way.
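
As a rough sketch of how those flags could then be used: the step below assumes want came out of the PROC TRANSPOSE above, so it has one row per email and numeric flag variables named Staff and Student (1 when the person had that label, missing otherwise). The output name email_list is hypothetical and not from the original data.

```sas
/* Sketch only: assumes WANT has one row per email and numeric
   flag variables Staff and Student created by PROC TRANSPOSE. */
DATA email_list; /* hypothetical output dataset name */
  SET want;
  /* Staff takes precedence: clear the Student flag when both are set */
  IF Staff=1 AND Student=1 THEN Student=.;
RUN;
```

Each email already appears only once after the transpose, so this step only settles which label to use for people who had both.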

dpachorek
Fluorite | Level 6

Yes! Thank you. Now I can easily filter based on these flags.
