Hi all,
I am merging three datasets. I have done by icustomerid. All three datasets have trace_dt (it's a date). I want the output data to show the latest trace_dt out of all the three datasets. I don't know how to approach for this. Could you please help?
Sample dataset:
Data Test;
infile cards expandtabs;
input icustomerid debt_code Trace_Month$ Trace_DT :date9. N Distinct_Lima_accounts2 Distinct_TPT_accounts2 Distinct_Man_accounts2;
format Trace_DT date9.;
datalines ;
1071 1001452 Sep-18 28SEP2018:03:05:06.310 1 1 0 0
1226 270840689 Dec-17 09DEC2017:13:57:43.730 1 1 0 0
1234 379021629 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 379059801 Mar-21 03MAY2022:00:00:00.000 1 1 1 0
1234 396978322 Feb-22 03MAY2022:00:00:00.000 1 1 1 0
1234 401458237 Apr-22 03MAY2022:00:00:00.000 1 1 1 0
;
run;
Assuming this is your merged dataset: make sure it sorted on icustomerid and trace_dt.
Then do a SET BY, and if last.trace_dt filter.
If you MERGE datasets which all have the same variable (besides the BY variable(s)), you will lose values because they are overwritten. Please supply examples for the datasets before the "merge", and the code you used for this.
If all three datasets have the ID and DATETIME variables then you do NOT want to merge just on the ID variable.
You should either merge by BOTH.
data want;
merge ds1 ds2 ds3 ;
by id datetime;
if last.id;
run;
Or interleave by BOTH.
So this will combine the datesets (not merge them) and keep only the latest observation per ID.
data want;
set ds1 ds2 ds3 ;
by id datetime;
if last.id;
run;
NOTE: Datasets have to be sorted already. Use your dataset names and variable names.
If there is only one observation per ID per dataset (or at most one dataset that has multiple observations) then you can merge by ID. But to keep all three DATETIME values you will need to rename two of them.
data want;
merge ds1 ds2(rename=(datetime=dateime2)) ds3(rename=(datetime=datetime3));
by id;
datetime = max(datetime,datetime2,datetime3);
run;
If one of them did have multiple observations per ID then you will need add a new SORT step and data step to pick the last date after this since taking the max of the three datetime values might have changed the related order of the date values.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.