DATA Step, Macro, Functions and more

Duplicate values

Reply
Contributor
Posts: 43

Duplicate values

[ Edited ]

Hello 

 

I've a dataset with 4 variables and many rows, among the records few rows have duplicate records except one variable (See the attached picture). If any rows that have a, b, c common values and different d value then i want to delete one row and change the value of d to z.

How can I do that?

 

Thanks in advance!


Untitled.png
Super User
Posts: 5,505

Re: Duplicate values

Posted in reply to Sujithpeta

First, sort your data set if it is not already in order:

 

proc sort data=have;

by a b c;

run;

 

Then get the data set you want:

 

data want;

set have;

by a b c;

if last.c;

if first.c=0 then d='z';

run;

PROC Star
Posts: 307

Re: Duplicate values

Posted in reply to Sujithpeta

Here's an SQL version that might do it:

 

data have;
input a b c d $;
datalines;	
1 2 3 x
1 2 2 x
1 2 2 y
2 1 1 x
1 2 4 y
1 2 4 x
;

proc sql;
    CREATE TABLE want AS
    SELECT DISTINCT a, b, c, 
           CASE count(DISTINCT d) WHEN 1 THEN d
                ELSE 'z'
           END AS d
    FROM have
    GROUP BY a, b, c;
quit;

I'm not sure how you'd easily get the final sort order you have in your image, but I think this gets the data right.

Ask a Question
Discussion stats
  • 2 replies
  • 194 views
  • 0 likes
  • 3 in conversation