Below is a snippet from a very large data set. My problem is with duplicates: as you can see below, the 'ID' column contains duplicated values, but the values in the other columns are not duplicates. What I want to do is:

  • Remove the 'first' row of each duplicated ID, where there is a zero entry in Var1 and blanks in Var2 and Var3.
  • Keep the row where there is information for all variables.

How can I achieve this in SAS?

 

Thanks

 

[Image: data.PNG]

1 ACCEPTED SOLUTION

ballardw
Super User

One way:

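/* Sort by ID, then Var1, so a Var1=0 row sorts ahead of rows with larger Var1 values. */
proc sort data=have;
   by id Var1;
run;

/* Within each ID, drop the first row when it is the empty one
   (Var1=0 and both Var2 and Var3 missing). */
data want;
   set have;
   by id;
   if first.id and var1=0 and missing(var2) and missing(var3) then delete;
run;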
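
One thing to watch with the step above: an ID that appears only once is also FIRST.ID, so a lone row with Var1=0 and blank Var2 and Var3 would be deleted as well. If those single rows should be kept, a possible variation is to also require that the row is not the last one for its ID. The sketch below is only an illustration: the HAVE data is made up, and it assumes Var1 to Var3 are numeric, which the original post does not confirm.

```sas
/* Hypothetical sample data, invented for illustration only. */
/* Assumes numeric variables; a period reads in as a missing value. */
data have;
   input id var1 var2 var3;
   datalines;
1 0 . .
1 5 2 3
2 0 . .
3 4 1 6
;
run;

proc sort data=have;
   by id var1;
run;

data want;
   set have;
   by id;
   /* FIRST.ID and LAST.ID are both 1 when an ID has a single row, */
   /* so "not last.id" limits the delete to IDs with duplicates.   */
   if first.id and not last.id
      and var1=0 and missing(var2) and missing(var3) then delete;
run;
```

With this sample, the empty row for ID 1 is dropped, while the single empty row for ID 2 is kept. Whether a single empty row should be kept or dropped depends on what you need from the data.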

