BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Randy1
Calcite | Level 5

Sorry if this is redundant but I don't see anything about removing duplicates in here that answers my question.  I've tried IF-THEN, and the Case Where statements.  I want to remove the Var2 obs with a 7 only when Var1=X (character data).  This is a very large consolidated dataset with over 23.5 million rows and 156 variables.  The other variables in the Y and Z rows have different data and I need to retain it.  So I just want to get rid of the X duplicate.

 

What I have:  

 

Var1Var2
X5
X6
X7
Y5
Y6
Y7
Z7
Z9
Z10

 

 

What a want:

 

 

Var1Var2
X2
X3
Y5
Y6
Y7
Z7
Z9
Z10

 

Thank you ahead of time!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If code like this does not work post your code and log and copies of some of the records it should delete but doesn't:

 

data want;
   set have;
   if var1='X' and var2=7 then delete;
run;

Note the character comparison is case sensitive so if you say var1='ABC' and the actual value is 'Abc' then that is not a match.

 

If that is an issue then you may want to UPCASE the variable: If upcase(var1) = 'ABC'

View solution in original post

13 REPLIES 13
ballardw
Super User

If code like this does not work post your code and log and copies of some of the records it should delete but doesn't:

 

data want;
   set have;
   if var1='X' and var2=7 then delete;
run;

Note the character comparison is case sensitive so if you say var1='ABC' and the actual value is 'Abc' then that is not a match.

 

If that is an issue then you may want to UPCASE the variable: If upcase(var1) = 'ABC'

Randy1
Calcite | Level 5

Thanks Ballardw for your quick reply.  Yes, I did try that and it seems it should be that simple.  It retains all of the X's that are other than 7, but it eliminates all of the rows with Var2=7, that being the ones that have Var1=Y, and Z.  I'm sorting by Var1 and Var2.  Would this make a different?

ballardw
Super User

@Randy1 wrote:

Thanks Ballardw for your quick reply.  Yes, I did try that and it seems it should be that simple.  It retains all of the X's that are other than 7, but it eliminates all of the rows with Var2=7, that being the ones that have Var1=Y, and Z.  I'm sorting by Var1 and Var2.  Would this make a different?


Please show the code that you ran as that is not what will happen with the code I provided.

data have;
input Var1 $ Var2 ;
datalines;
X 5 
X 6 
X 7 
Y 5 
Y 6 
Y 7 
Z 7 
Z 9 
Z 10 
;

run;
data want;
   set have;
   if var1='X' and var2=7 then delete;
run;
proc print noobs;
run;

Note that your example want data contains values that were not in the have data.

 

Randy1
Calcite | Level 5

Unfortunately I'm not allowed to post my actual data so I have to create an example.

ballardw
Super User

@Randy1 wrote:

Unfortunately I'm not allowed to post my actual data so I have to create an example.


At least post the code that you ran.

The behavior you claim for the "if var1='X' and var2=7 then delete;"

would make me think that you either included an OR some where in the code or left out the AND in the comparison.

Randy1
Calcite | Level 5

Ballardw, I removed the BY statement in the code below and it ran.  Do you know why that would mess it up?

 

 

DATA TEMP1;
SET QUERY_FOR_MERGE_STORET_SRBT_0006;
BY Organizati POINT_X POINT_Y DATE YEAR SLID_Rep;
IF Var1='X' and Var2>1 THEN DELETE;

RUN

Randy1
Calcite | Level 5
Thanks for all you help everyone. I'll let it at that.
ballardw
Super User

@Randy1 wrote:

Ballardw, I removed the BY statement in the code below and it ran.  Do you know why that would mess it up?

 

 

DATA TEMP1;
SET QUERY_FOR_MERGE_STORET_SRBT_0006;
BY Organizati POINT_X POINT_Y DATE YEAR SLID_Rep;
IF Var1='X' and Var2>1 THEN DELETE;

RUN


If that code did not yield the result desired I would suspect soem error to appear in the log. Most likely would be something like a sort order error or possibly one of the variables on the by statement. If the log shows something like:

ERROR: BY variables are not properly sorted on data set

then the resulting set would not be replaced. If you have an existing temp1 created incorrectly then that would be what you would see when you examine the values in Temp1. Or

 

ERROR: BY variable <any of your variables on the by statement> is not on input data set

would also not replace an existing Temp1.

 

Mokben
Fluorite | Level 6

I don't really understand what you want !!

 

If you want to delete duplicates you can use NODUPKEYS in Proc Sort  or an Proc sql with distinct :

 

Proc Sort data = your_table NODUPKEYS ; 

By _All_ ;

Run ; 

Randy1
Calcite | Level 5

Thanks Mokben for your quick reply and suggestion.  I'll will give your suggestion a try as I prefer SQL and using SAS EG.  I was afraid my example might not be clear.

kaverisonu1
Fluorite | Level 6

if i understood correctly.


data new;
input var1 $ var2;
cards;
X 5
X 6
X 7
Y 5
Y 6
Y 7
Z 7
Z 9
Z 10
;
run;
data new1;
set new;
if var1='X' and var2=7 then delete;
run;

Randy1
Calcite | Level 5

Thanks for your quick reply Kaverisonu1.  I used the suggested code above (that is essentially the same as yours here)   I removed a sort BY statement and it seems to work now.  Not sure why that would mess anything up, but as long as it work, I'm happy.  Have a good one.

ballardw
Super User

Sort order would not affect the results of the posted code. If you have a "sort by" in the data step that would cause an error and not replace an existing incorrect dataset. As I mentioned, posting your code can clarify a number of questions.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2414 views
  • 0 likes
  • 4 in conversation