Hi everyone, help please! I am very new to using SAS and I would like to combine two datasets by ID and Date. Some visits in the two datasets fall on the same date while some can be different +/- 90 days. Please see the below for example datasets.
Dataset One
ID | Date | Pulse Rate |
1 | 4/2/18 | 85 |
1 | 10/3/19 | 90 |
1 | 3/2/20 | 75 |
2 | 10/2/18 | 85 |
2 | 15/3/19 | 69 |
2 | 25/2/20 | 99 |
Dataset Two
ID | Date | BP |
1 | 4/2/18 | 125 |
1 | 15/5/19 | 110 |
1 | 3/2/20 | 100 |
2 | 10/2/18 | 160 |
2 | 15/3/19 | 150 |
2 | 12/12/19 | 120 |
The output dataset I want is
ID | Date | Pulse Rate | BP |
1 | 4/2/18 | 85 | 125 |
1 | 10/3/19 | 90 | 110 |
1 | 3/2/20 | 75 | 100 |
2 | 10/2/18 | 85 | 160 |
2 | 15/3/19 | 69 | 150 |
2 | 25/2/20 | 99 | 120 |
As you can see, some visits are on different dates. But I want to merge them if the date in Dataset Two is within +/- 90 days of the Dataset One. Could you please advise how to write the script for this. Thanks in advance.
You can do this in SQL:
data one;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. pulse_rate;
format date yymmddd10.;
datalines;
1 4/2/18 85
1 10/3/19 90
1 3/2/20 75
2 10/2/18 85
2 15/3/19 69
2 25/2/20 99
;
data two;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. bp;
format date yymmddd10.;
datalines;
1 4/2/18 125
1 15/5/19 110
1 3/2/20 100
2 10/2/18 160
2 15/3/19 150
2 12/12/19 120
;
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
;
quit;
Note:
You can do this in SQL:
data one;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. pulse_rate;
format date yymmddd10.;
datalines;
1 4/2/18 85
1 10/3/19 90
1 3/2/20 75
2 10/2/18 85
2 15/3/19 69
2 25/2/20 99
;
data two;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. bp;
format date yymmddd10.;
datalines;
1 4/2/18 125
1 15/5/19 110
1 3/2/20 100
2 10/2/18 160
2 15/3/19 150
2 12/12/19 120
;
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
;
quit;
Note:
Thanks, Kurt. It helps. Just an additional question. I wonder if I can modify the script to link only to a nearest date in data two within +/- 90 days to avoid catersian join. Thanks again.
Expand the SQL by creating a column that holds the difference and a group by, and select for the minimum difference:
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp,
abs(two.date - one.date) as datedif
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
group by one.id, one.date
having datedif = min(datedif)
;
quit;
Note that you still might get two observations for a single date if you have two with the same distance.
To avoid this, you would need to run a data step:
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp,
abs(two.date - one.date) as datedif
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
order by one.id, one.date, calculated datedif
;
quit;
data final_result;
set want;
by id date;
if first.date;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.