BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SasDewd
Obsidian | Level 7

Hello,

 

I have a data set full of orders. 

The data set has 150+ columns and when I update it every day some duplicates will be introduced.

I need to remove those duplicates (based on order_id) and keep the record with the most recent update_date.

The following code works most of the time but I just noticed there are duplicates that the code for some reason is not removing and it does not make any sense.

Here is my code:

 

proc sort data=Orders out=Orders;
by order_id update_date;
run;

data Orders;
set Orders;
by order_id update_date;
if last.update_date;
run;

 

What am I missing here and why is this working for some orders but not the others.

I've been staring at one particular order for the past 2hrs and it's the same exact order_id and the update_date is clearly different yet the code returns both records!?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star
LAST.ORDER_ID is true when you hit the last observation for an ORDER_ID. Those are the observations you are looking for.

LAST.UPDATE_DATE is true when you hit the last observation for an UPDATE_DATE. That would select one observation per date.

View solution in original post

6 REPLIES 6
Astounding
PROC Star

Your subsetting statement is incorrect:

 

if last.update_date;

 

It should be:

 

if last.order_id;

 

Also notice, this is dangerous:

 

data orders;

   set orders;

   ...........

 

If your logic is wrong, you can destroy your data set ORDERS.  Safer:

 

data orders2;

   set orders;

   ..............

 

 

SasDewd
Obsidian | Level 7

Hmm..OK...I will try that.

But why?

If I have 2 records:

Order_Id         Update_date

100                  1/1/2019

100                  2/1/2019

 

So I want to keep the one from 2/1/2019 why would I say if  last.order_id and not  last.update_date??

 

Just trying to understand.

Second part makes perfect sense..I will change that.

 

Thank you for your answer @Astounding !!

SasDewd
Obsidian | Level 7

I forgot to mention...especially since the code I had was removing duplicates as I was hoping it would...just not all of them

sotojcr
Obsidian | Level 7
proc sort data=Orders;
by order_id descending update_date;
run;

proc sort data=Orders nodupkey;
by order_id;
run;
ballardw
Super User

@SasDewd wrote:

Hello,

 

I have a data set full of orders. 

The data set has 150+ columns and when I update it every day some duplicates will be introduced.

I need to remove those duplicates (based on order_id) and keep the record with the most recent update_date.

The following code works most of the time but I just noticed there are duplicates that the code for some reason is not removing and it does not make any sense.

Here is my code:

 

proc sort data=Orders out=Orders;
by order_id update_date;
run;

data Orders;
set Orders;
by order_id update_date;
if last.update_date;
run;

 

What am I missing here and why is this working for some orders but not the others.

I've been staring at one particular order for the past 2hrs and it's the same exact order_id and the update_date is clearly different yet the code returns both records!?


You may need to consider how you are "updating" you data. I sounds like you might be APPENDING data. Either with Proc Append or using a SET statement with two data sets, the master and the "new" data.

 

If you want a single record for ORDER_ID perhaps you need to use a data step UPDATE statement.

Astounding
PROC Star
LAST.ORDER_ID is true when you hit the last observation for an ORDER_ID. Those are the observations you are looking for.

LAST.UPDATE_DATE is true when you hit the last observation for an UPDATE_DATE. That would select one observation per date.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2368 views
  • 3 likes
  • 4 in conversation