BookmarkSubscribeRSS Feed
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am merging three datasets. I have done by icustomerid. All three datasets have trace_dt (it's a date). I want the output data to show the latest trace_dt out of all the three datasets. I don't know how to approach for this. Could you please help?

Sample dataset:

Data Test;
infile cards expandtabs;
input icustomerid debt_code Trace_Month$ Trace_DT :date9. N Distinct_Lima_accounts2 Distinct_TPT_accounts2 Distinct_Man_accounts2;
format Trace_DT date9.;
datalines ;
1071 1001452 Sep-18 28SEP2018:03:05:06.310 1 1 0 0
1226 270840689 Dec-17 09DEC2017:13:57:43.730 1 1 0 0
1234 379021629 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 379059801 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 396978322 Feb-22 03MAY2022:00:00:00.000 1 1 1 0
1234 401458237 Apr-22 03MAY2022:00:00:00.000 1 1 1 0
;
run;

5 REPLIES 5
LinusH
Tourmaline | Level 20

Assuming this is your merged dataset: make sure it sorted on icustomerid and trace_dt.

Then do a SET BY, and if last.trace_dt filter.

Data never sleeps
Sandeep77
Lapis Lazuli | Level 10
Hi, yes this is merged dataset and I have sorted by icustomerid and then merged. Do I need to sort by trace_dt also? I didn't get exactly how should I do set by and if last.trace_dt filter. Could you please explain in detail?
LinusH
Tourmaline | Level 20
Yes, you need to sort by trace_dt to able to use the if last.trace_dt logic. @Kurt_Bremser pionted out you need to do your merge in away that you a record for each trace_dt. Without have seeing your original data, I would suffest that your do a merge on both icustomerid and trace_dt. Then you can do the if last.trace_dt in the same step.
Data never sleeps
Kurt_Bremser
Super User

If you MERGE datasets which all have the same variable (besides the BY variable(s)), you will lose values because they are overwritten. Please supply examples for the datasets before the "merge", and the code you used for this.

Tom
Super User Tom
Super User

If all three datasets have the ID and DATETIME variables then you do NOT want to merge just on the ID variable.

You should either merge by BOTH.

data want;
  merge ds1 ds2 ds3 ;
  by id datetime;
  if last.id;
run;

Or interleave by BOTH.

So this will combine the datesets (not merge them) and keep only the latest observation per ID.

data want;
  set ds1 ds2 ds3 ;
  by id datetime;
  if last.id;
run;

NOTE: Datasets have to be sorted already.  Use your dataset names and variable names.

 

If there is only one observation per ID per dataset (or at most one dataset that has multiple observations) then you can  merge by ID.  But to keep all three DATETIME values you will need to rename two of them.

data want;
  merge ds1 ds2(rename=(datetime=dateime2)) ds3(rename=(datetime=datetime3));
  by id;
  datetime = max(datetime,datetime2,datetime3);
run;

If one of them did have multiple observations per ID then you will need add a new SORT step and data step to pick the last date after this since taking the max of the three datetime values might have changed the related order of the date values.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 575 views
  • 1 like
  • 4 in conversation