BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Below is a snippet from very large data set. The problem I am experiencing is to do with duplicates. As you can see below I have duplicated values in the 'ID' column, however the rest of the values in the other coloumns are not duplicates. What I want to do is:

  • Remove the row with the 'first' duplicate, where there a zero entry in Var1 and blanks in Var2 and Var 3.
  • Therefore keeping the row where there is information for all variables.

How can I achieve this is SAS?

 

Thanks

 

data.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way:

proc sort data=have;
   by id Var1;
run;

data want;
   set have;
   by id;
   if first.id and var1=0 and missing(var2) and missing(var3) then delete;
run;

View solution in original post

2 REPLIES 2
ballardw
Super User

One way:

proc sort data=have;
   by id Var1;
run;

data want;
   set have;
   by id;
   if first.id and var1=0 and missing(var2) and missing(var3) then delete;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 457 views
  • 0 likes
  • 2 in conversation