Hi everyone, help please! I am very new to using SAS and I would like to combine two datasets by ID and Date. Some visits in the two datasets fall on the same date while some can be different +/- 90 days. Please see the below for example datasets.
Dataset One
| ID | Date | Pulse Rate |
| 1 | 4/2/18 | 85 |
| 1 | 10/3/19 | 90 |
| 1 | 3/2/20 | 75 |
| 2 | 10/2/18 | 85 |
| 2 | 15/3/19 | 69 |
| 2 | 25/2/20 | 99 |
Dataset Two
| ID | Date | BP |
| 1 | 4/2/18 | 125 |
| 1 | 15/5/19 | 110 |
| 1 | 3/2/20 | 100 |
| 2 | 10/2/18 | 160 |
| 2 | 15/3/19 | 150 |
| 2 | 12/12/19 | 120 |
The output dataset I want is
| ID | Date | Pulse Rate | BP |
| 1 | 4/2/18 | 85 | 125 |
| 1 | 10/3/19 | 90 | 110 |
| 1 | 3/2/20 | 75 | 100 |
| 2 | 10/2/18 | 85 | 160 |
| 2 | 15/3/19 | 69 | 150 |
| 2 | 25/2/20 | 99 | 120 |
As you can see, some visits are on different dates. But I want to merge them if the date in Dataset Two is within +/- 90 days of the Dataset One. Could you please advise how to write the script for this. Thanks in advance.
You can do this in SQL:
data one;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. pulse_rate;
format date yymmddd10.;
datalines;
1 4/2/18 85
1 10/3/19 90
1 3/2/20 75
2 10/2/18 85
2 15/3/19 69
2 25/2/20 99
;
data two;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. bp;
format date yymmddd10.;
datalines;
1 4/2/18 125
1 15/5/19 110
1 3/2/20 100
2 10/2/18 160
2 15/3/19 150
2 12/12/19 120
;
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
;
quit;
Note:
You can do this in SQL:
data one;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. pulse_rate;
format date yymmddd10.;
datalines;
1 4/2/18 85
1 10/3/19 90
1 3/2/20 75
2 10/2/18 85
2 15/3/19 69
2 25/2/20 99
;
data two;
infile datalines dlm='09'x dsd truncover;
input id $ date :ddmmyy10. bp;
format date yymmddd10.;
datalines;
1 4/2/18 125
1 15/5/19 110
1 3/2/20 100
2 10/2/18 160
2 15/3/19 150
2 12/12/19 120
;
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
;
quit;
Note:
Thanks, Kurt. It helps. Just an additional question. I wonder if I can modify the script to link only to a nearest date in data two within +/- 90 days to avoid catersian join. Thanks again.
Expand the SQL by creating a column that holds the difference and a group by, and select for the minimum difference:
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp,
abs(two.date - one.date) as datedif
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
group by one.id, one.date
having datedif = min(datedif)
;
quit;
Note that you still might get two observations for a single date if you have two with the same distance.
To avoid this, you would need to run a data step:
proc sql;
create table want as
select
one.id,
one.date,
one.pulse_rate,
two.bp,
abs(two.date - one.date) as datedif
from one, two
where one.id = two.id and abs(one.date - two.date) le 90
order by one.id, one.date, calculated datedif
;
quit;
data final_result;
set want;
by id date;
if first.date;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.