Duplicate values

Sujithpeta · Posted 01-23-2017 04:31 PM

Hello

I've a dataset with 4 variables and many rows, among the records few rows have duplicate records except one variable (See the attached picture). If any rows that have a, b, c common values and different d value then i want to delete one row and change the value of d to z.

How can I do that?

Thanks in advance!

Astounding · Posted 01-23-2017 05:19 PM

First, sort your data set if it is not already in order:

proc sort data=have;

by a b c;

run;

Then get the data set you want:

data want;

set have;

by a b c;

if last.c;

if first.c=0 then d='z';

run;

collinelliot · Posted 01-23-2017 08:49 PM

Here's an SQL version that might do it:

data have;
input a b c d $;
datalines;	
1 2 3 x
1 2 2 x
1 2 2 y
2 1 1 x
1 2 4 y
1 2 4 x
;

proc sql;
    CREATE TABLE want AS
    SELECT DISTINCT a, b, c, 
           CASE count(DISTINCT d) WHEN 1 THEN d
                ELSE 'z'
           END AS d
    FROM have
    GROUP BY a, b, c;
quit;

I'm not sure how you'd easily get the final sort order you have in your image, but I think this gets the data right.

Duplicate values

Re: Duplicate values

Re: Duplicate values

Catch up on SAS Innovate 2026

Duplicate values

Re: Duplicate values

Re: Duplicate values

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away