01-23-2017 04:31 PM - edited 01-23-2017 04:39 PM
I've a dataset with 4 variables and many rows, among the records few rows have duplicate records except one variable (See the attached picture). If any rows that have a, b, c common values and different d value then i want to delete one row and change the value of d to z.
How can I do that?
Thanks in advance!
01-23-2017 05:19 PM
First, sort your data set if it is not already in order:
proc sort data=have;
by a b c;
Then get the data set you want:
by a b c;
if first.c=0 then d='z';
01-23-2017 08:49 PM
Here's an SQL version that might do it:
data have; input a b c d $; datalines; 1 2 3 x 1 2 2 x 1 2 2 y 2 1 1 x 1 2 4 y 1 2 4 x ; proc sql; CREATE TABLE want AS SELECT DISTINCT a, b, c, CASE count(DISTINCT d) WHEN 1 THEN d ELSE 'z' END AS d FROM have GROUP BY a, b, c; quit;
I'm not sure how you'd easily get the final sort order you have in your image, but I think this gets the data right.