BookmarkSubscribeRSS Feed
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am merging three datasets. I have done by icustomerid. All three datasets have trace_dt (it's a date). I want the output data to show the latest trace_dt out of all the three datasets. I don't know how to approach for this. Could you please help?

Sample dataset:

Data Test;
infile cards expandtabs;
input icustomerid debt_code Trace_Month$ Trace_DT :date9. N Distinct_Lima_accounts2 Distinct_TPT_accounts2 Distinct_Man_accounts2;
format Trace_DT date9.;
datalines ;
1071 1001452 Sep-18 28SEP2018:03:05:06.310 1 1 0 0
1226 270840689 Dec-17 09DEC2017:13:57:43.730 1 1 0 0
1234 379021629 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 379059801 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 396978322 Feb-22 03MAY2022:00:00:00.000 1 1 1 0
1234 401458237 Apr-22 03MAY2022:00:00:00.000 1 1 1 0
;
run;

5 REPLIES 5
LinusH
Tourmaline | Level 20

Assuming this is your merged dataset: make sure it sorted on icustomerid and trace_dt.

Then do a SET BY, and if last.trace_dt filter.

Data never sleeps
Sandeep77
Lapis Lazuli | Level 10
Hi, yes this is merged dataset and I have sorted by icustomerid and then merged. Do I need to sort by trace_dt also? I didn't get exactly how should I do set by and if last.trace_dt filter. Could you please explain in detail?
LinusH
Tourmaline | Level 20
Yes, you need to sort by trace_dt to able to use the if last.trace_dt logic. @Kurt_Bremser pionted out you need to do your merge in away that you a record for each trace_dt. Without have seeing your original data, I would suffest that your do a merge on both icustomerid and trace_dt. Then you can do the if last.trace_dt in the same step.
Data never sleeps
Kurt_Bremser
Super User

If you MERGE datasets which all have the same variable (besides the BY variable(s)), you will lose values because they are overwritten. Please supply examples for the datasets before the "merge", and the code you used for this.

Tom
Super User Tom
Super User

If all three datasets have the ID and DATETIME variables then you do NOT want to merge just on the ID variable.

You should either merge by BOTH.

data want;
  merge ds1 ds2 ds3 ;
  by id datetime;
  if last.id;
run;

Or interleave by BOTH.

So this will combine the datesets (not merge them) and keep only the latest observation per ID.

data want;
  set ds1 ds2 ds3 ;
  by id datetime;
  if last.id;
run;

NOTE: Datasets have to be sorted already.  Use your dataset names and variable names.

 

If there is only one observation per ID per dataset (or at most one dataset that has multiple observations) then you can  merge by ID.  But to keep all three DATETIME values you will need to rename two of them.

data want;
  merge ds1 ds2(rename=(datetime=dateime2)) ds3(rename=(datetime=datetime3));
  by id;
  datetime = max(datetime,datetime2,datetime3);
run;

If one of them did have multiple observations per ID then you will need add a new SORT step and data step to pick the last date after this since taking the max of the three datetime values might have changed the related order of the date values.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

From SAS Users blog
Want more? Visit our blog for more articles like these.
5 Steps to Your First Analytics Project Using SAS

For SAS newbies, this video is a great way to get started. James Harroun walks through the process using SAS Studio for SAS OnDemand for Academics, but the same steps apply to any analytics project.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 231 views
  • 1 like
  • 4 in conversation