Hello
I've a dataset with 4 variables and many rows, among the records few rows have duplicate records except one variable (See the attached picture). If any rows that have a, b, c common values and different d value then i want to delete one row and change the value of d to z.
How can I do that?
Thanks in advance!
First, sort your data set if it is not already in order:
proc sort data=have;
by a b c;
run;
Then get the data set you want:
data want;
set have;
by a b c;
if last.c;
if first.c=0 then d='z';
run;
Here's an SQL version that might do it:
data have;
input a b c d $;
datalines;
1 2 3 x
1 2 2 x
1 2 2 y
2 1 1 x
1 2 4 y
1 2 4 x
;
proc sql;
CREATE TABLE want AS
SELECT DISTINCT a, b, c,
CASE count(DISTINCT d) WHEN 1 THEN d
ELSE 'z'
END AS d
FROM have
GROUP BY a, b, c;
quit;
I'm not sure how you'd easily get the final sort order you have in your image, but I think this gets the data right.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.