BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dpachorek
Fluorite | Level 6

Hi,

 

I have a dataset where I am trying to sort out specific duplicates. I am creating an email list from an appended dataset and because certain people are labeled as both staff and students, we have duplicate records. Here is one such case:

 

dpachorek_0-1601389103979.png

 

This is not the case for everyone but I am trying to sort out the duplicate records that are labeled as student since their staff label takes precedence.

 

Any help? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
SwissC
Obsidian | Level 7

Another option would be to transpose the data.

DATA part1;
  SET have;
  x=1;
RUN;

PROC TRANSPOSE data=part1 out=want;
  BY email (add all other variables except group and x);
  ID group;
  VAR x;
RUN;

This would then give a dataset with flags for each department and actually is prob a better way.

View solution in original post

5 REPLIES 5
SwissC
Obsidian | Level 7

Assuming that email is held in variable email, and the staff student is held in a column called staff_student.

 

PROC SQL;
  CREATE TABLE part1 AS SELECT
  *, count(distinct(staff_student)) as nrc
  FROM have
  GROUP BY email;
QUIT;
DATA part2;
  SET part1;
  IF nrc=2 THEN staff_studuent="Staff/Student";
  DROP nrc;
RUN;
PROC SQL;
  CREATE TABLE want AS SELECT
  distinct *
  FROM part2;
QUIT;

 You would then end with a record labeled Staff/Student for these, assuming these are the only things that are different in the record.

dpachorek
Fluorite | Level 6

You're right where the emails are under a variable called email. However, staff and student are apart of a variable called group that has 4 options (Student, Staff, Faculty, and Lib_Faculty).

SwissC
Obsidian | Level 7
DATA part1;
  SET have;
  IF group in("Staff" "Student") THEN cnt=1;
    ELSE cnt=0;
RUN;

PROC SQL;
  CREATE TABLE part2 AS SELECT
  *, sum(staff_student) as nrc
  FROM part1
  GROUP BY email;
QUIT;
DATA part3;
  SET part2;
  IF nrc=2 THEN staff_studuent="Staff/Student";
  DROP nrc cnt;
RUN;
PROC SQL;
  CREATE TABLE want AS SELECT
  distinct *
  FROM part3;
QUIT;

Does this fix it?

SwissC
Obsidian | Level 7

Another option would be to transpose the data.

DATA part1;
  SET have;
  x=1;
RUN;

PROC TRANSPOSE data=part1 out=want;
  BY email (add all other variables except group and x);
  ID group;
  VAR x;
RUN;

This would then give a dataset with flags for each department and actually is prob a better way.

dpachorek
Fluorite | Level 6

Yes! Thank you. Now, I can easily sort out based off these flags.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1166 views
  • 1 like
  • 2 in conversation