BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dpachorek
Fluorite | Level 6

Hi,

 

I have a dataset where I am trying to sort out specific duplicates. I am creating an email list from an appended dataset and because certain people are labeled as both staff and students, we have duplicate records. Here is one such case:

 

dpachorek_0-1601389103979.png

 

This is not the case for everyone but I am trying to sort out the duplicate records that are labeled as student since their staff label takes precedence.

 

Any help? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
SwissC
Obsidian | Level 7

Another option would be to transpose the data.

DATA part1;
  SET have;
  x=1;
RUN;

PROC TRANSPOSE data=part1 out=want;
  BY email (add all other variables except group and x);
  ID group;
  VAR x;
RUN;

This would then give a dataset with flags for each department and actually is prob a better way.

View solution in original post

5 REPLIES 5
SwissC
Obsidian | Level 7

Assuming that email is held in variable email, and the staff student is held in a column called staff_student.

 

PROC SQL;
  CREATE TABLE part1 AS SELECT
  *, count(distinct(staff_student)) as nrc
  FROM have
  GROUP BY email;
QUIT;
DATA part2;
  SET part1;
  IF nrc=2 THEN staff_studuent="Staff/Student";
  DROP nrc;
RUN;
PROC SQL;
  CREATE TABLE want AS SELECT
  distinct *
  FROM part2;
QUIT;

 You would then end with a record labeled Staff/Student for these, assuming these are the only things that are different in the record.

dpachorek
Fluorite | Level 6

You're right where the emails are under a variable called email. However, staff and student are apart of a variable called group that has 4 options (Student, Staff, Faculty, and Lib_Faculty).

SwissC
Obsidian | Level 7
DATA part1;
  SET have;
  IF group in("Staff" "Student") THEN cnt=1;
    ELSE cnt=0;
RUN;

PROC SQL;
  CREATE TABLE part2 AS SELECT
  *, sum(staff_student) as nrc
  FROM part1
  GROUP BY email;
QUIT;
DATA part3;
  SET part2;
  IF nrc=2 THEN staff_studuent="Staff/Student";
  DROP nrc cnt;
RUN;
PROC SQL;
  CREATE TABLE want AS SELECT
  distinct *
  FROM part3;
QUIT;

Does this fix it?

SwissC
Obsidian | Level 7

Another option would be to transpose the data.

DATA part1;
  SET have;
  x=1;
RUN;

PROC TRANSPOSE data=part1 out=want;
  BY email (add all other variables except group and x);
  ID group;
  VAR x;
RUN;

This would then give a dataset with flags for each department and actually is prob a better way.

dpachorek
Fluorite | Level 6

Yes! Thank you. Now, I can easily sort out based off these flags.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1134 views
  • 1 like
  • 2 in conversation