Hi,
I have a dataset where I am trying to sort out specific duplicates. I am creating an email list from an appended dataset and because certain people are labeled as both staff and students, we have duplicate records. Here is one such case:
This is not the case for everyone, but where it is, I am trying to remove the duplicate record labeled Student, since the Staff label takes precedence.
Any help? Thanks!
Assuming the email address is held in a variable called email, and the staff/student label is held in a variable called staff_student:
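/* Count the distinct labels each email carries */
PROC SQL;
  CREATE TABLE part1 AS
  SELECT *, count(distinct staff_student) AS nrc
  FROM have
  GROUP BY email;
QUIT;

/* Relabel emails that carry both labels */
DATA part2;
  /* Room for "Staff/Student"; assumes existing values are 13 characters or shorter */
  LENGTH staff_student $13;
  SET part1;
  IF nrc=2 THEN staff_student="Staff/Student";
  DROP nrc;
RUN;

/* Collapse the rows that are now identical */
PROC SQL;
  CREATE TABLE want AS
  SELECT distinct *
  FROM part2;
QUIT;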
You would then end up with a single record labeled Staff/Student for each of these people, assuming the label is the only thing that differs between their records.
You're right that the emails are under a variable called email. However, staff and student are part of a variable called group that has 4 options (Student, Staff, Faculty, and Lib_Faculty).
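/* Flag the Staff and Student rows */
DATA part1;
  SET have;
  IF group in ("Staff" "Student") THEN cnt=1;
  ELSE cnt=0;
RUN;

/* Count the flagged rows per email */
PROC SQL;
  CREATE TABLE part2 AS
  SELECT *, sum(cnt) AS nrc
  FROM part1
  GROUP BY email;
QUIT;

/* Relabel the Staff and Student rows of emails that have both.
   Assumes at most one Staff and one Student row per email. */
DATA part3;
  /* Room for "Staff/Student"; assumes existing values are 13 characters or shorter */
  LENGTH group $13;
  SET part2;
  IF nrc=2 AND cnt=1 THEN group="Staff/Student";
  DROP nrc cnt;
RUN;

/* Collapse the rows that are now identical */
PROC SQL;
  CREATE TABLE want AS
  SELECT distinct *
  FROM part3;
QUIT;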
Does this fix it?
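If the goal is simply to drop the Student row whenever the same email also has a Staff row, as the original question describes, a correlated subquery may be enough. This is only a sketch: the sample emails below are made up, and it assumes the variables are named email and group as described above.

```sas
/* Hypothetical sample data; the email addresses are invented for illustration */
DATA have;
  LENGTH email $30 group $11;
  INPUT email $ group $;
  DATALINES;
asmith@example.edu Staff
asmith@example.edu Student
bjones@example.edu Student
clee@example.edu Faculty
;
RUN;

/* Keep every row except a Student row whose email also has a Staff row */
PROC SQL;
  CREATE TABLE want AS
  SELECT a.*
  FROM have AS a
  WHERE NOT (a.group = "Student"
             AND EXISTS (SELECT 1
                         FROM have AS b
                         WHERE b.email = a.email
                           AND b.group = "Staff"));
QUIT;
```

With this sample, the Student row for asmith@example.edu should be dropped and the other three rows kept. Students with no Staff record are left untouched.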
Another option would be to transpose the data.
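/* Add a constant to transpose into the flag columns */
DATA part1;
  SET have;
  x=1;
RUN;

/* PROC TRANSPOSE with a BY statement needs sorted input */
PROC SORT data=part1;
  BY email; /* list the same variables as in the BY statement below */
RUN;

PROC TRANSPOSE data=part1 out=want;
  BY email; /* add all other variables except group and x */
  ID group;
  VAR x;
RUN;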
This gives a dataset with one row per email and a flag column for each group, and is probably the better way.
Yes! Thank you. Now, I can easily sort out based off these flags.
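For anyone following along, here is a rough end-to-end sketch of the transpose approach and one way the flags might then be used. The sample emails are made up, and clearing the Student flag is just one possible way to apply the "Staff takes precedence" rule.

```sas
/* Hypothetical sample data; the email addresses are invented for illustration */
DATA have;
  LENGTH email $30 group $11;
  INPUT email $ group $;
  x = 1;  /* constant that becomes the flag value */
  DATALINES;
asmith@example.edu Staff
asmith@example.edu Student
bjones@example.edu Student
clee@example.edu Faculty
dlee@example.edu Lib_Faculty
;
RUN;

/* BY-group processing in PROC TRANSPOSE needs sorted input */
PROC SORT DATA=have;
  BY email;
RUN;

/* One row per email, one flag column per value of group */
PROC TRANSPOSE DATA=have OUT=flags (DROP=_NAME_);
  BY email;
  ID group;
  VAR x;
RUN;

/* Staff takes precedence: clear the Student flag when both are set */
DATA want;
  SET flags;
  IF Staff = 1 AND Student = 1 THEN Student = .;
RUN;
```

Each email ends up on a single row, so the email list has no duplicates, and the flag columns (Staff, Student, Faculty, Lib_Faculty) show which groups still apply.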