Hi all,
I am hoping you can help me! **Please help!!!!** 🙏
I am in SAS and I have datasets A and B with different measurements (relating to patient's health) as follows:
Dataset A
ID Date measurement_a
1 20JUN2013 52.3
1 12JUL2013 65.6
1 28NOV2014 37.4
1 02DEC2014 61.3
1 22SEP2015 40.5
1 15OCT2015 60.5
2 03JUN2011 46.5
2 19JUL2011 54.1
2 29OCT2012 53.6
Dataset B
ID Date measurement_b
1 21MAR2007 43
1 13JUL2007 45
1 07APR2009 47
1 14MAY2009 46
1 16FEB2012 42
1 27AUG2012 53
1 12DEC2012 58
1 20JUN2013 56
1 10DEC2013 53
1 23MAY2014 49
1 17SEP2014 44
1 23SEP2015 40
2 16DEC2011 58
2 22AUG2012 54
2 20FEB2013 56
2 29MAY2013 53
What I am looking for is that if the date in Dataset B is within 6 months of the date in Dataset a, then a new variable called "time" will be added, saying 1,2,3,etc. for how many ever match with ** only measurement_a** length (in other words, I do not need to retain values of measurement_b if it does not match the date in Dataset a. Here is an example of what I mean:
Desired result/dataset
ID Time measure_a measure_b
1 1 52.3 56 (Dataset B Date = 20JUN2013 - Matched exactly)
1 2 65.6 53 (Dataset B date = 10DEC2013 - Within six months of 12JUL2013 [Dataset A Date])
1 3 37.4 44 (Dataset B date = 17SEP2014 - Within six months of 28NOV2014 [Dataset A Date])
1 4 61.3 . (because 17SEP2014 [Dataset B] is closest to 28NOV2014 [Dataset A])
1 5 40.5 40 (because 23SEP2015 [Dataset B] is closest to 22SEP2015 [Dataset A])
1 6 60.5 . (No date in Dataset B that is within 6 months of Date in Dataset A [15OCT2015])
2 1 46.5 . (See below)
2 2 54.1 58 (because 03JUL2011 [Dataset B] is closest to 19JUL2011 [Dataset A])
2 3 53.6 54 (Dataset B date = 22AUG2012 - Within 6 months of Dataset A date = 29OCT2012)
I have joined on ID but the times is proving difficult. I know it could be the difference in months in the "where" statement in the following code:
PROC SQL;
CREATE TABLE join_test as
SELECT * FROM data_a as a
LEFT_JOIN data_b as b
ON a.id = b.id
WHERE days(a.Date - b.Date) <= 180 ;
QUIT;
But this does not do the trick.
Can some **please** help me?
I really appreciate it. Thanks in advance.
Looks like there are several scenarios that need to be handled
1. If the date in dataset B is earlier by 6 months or less
2. If the date in dataset B is later by 6 months or less
3. If the date in dataset B is exactly the same date
We need to define their order of precedence as there are entries that can fit two of the above scenarios, so we can decide what will be the final "measurement_b" value to take.
This is what I have so far, you can play around with the order of the if-then handling below to suit what outcome you want to achieve
data ds1;
set ds1;
time + 1; /*add counter*/
by id date;
if first.id then time = 1;
run;
data ds;
set ds1 ds2; /*combine so they can be sorted by date*/
run;
proc sort data=ds;
by id date; /*sorting*/
run;
data look_both_ways (keep=id date measurement_a time keep rename=(keep=measurement_b) where=(time ne .));
set ds;
by ID;
set ds (firstobs=2 keep=date measurement_b rename=(date=next_date measurement_b=nxt_m_b))
ds (obs=1 keep=time);
format prev_date date9.;
prev_date=ifn(first.ID,(.),lag(date));
next_date=ifn(last.ID,(.),next_date);
prev_m_b = ifn( first.Id, (.), lag(measurement_b));
nxt_m_b = ifn(last.Id, (.), nxt_m_b);
if intck('month',prev_date,date) <= 6 then keep=prev_m_b; /*scenario 1*/
if intck('month',date,next_date) <= 6 then keep=nxt_m_b; /*scenario 2*/
if date=next_date then keep=nxt_m_b; /*scenario 3*/
run;
If you read the log you should have an error about an apparent function DAYS not existing. SAS SQL does not have a "days" function. You can get the number of days between two date values by 1) subtraction or 2) use of the INTCK function.
By "6 months" do you mean "6 calendar months" as in 1 January is "6 months" before "30 July " [month 7 - month 1=6] or do you actually want 180 days? There is a difference. Months have differing numbers of days and the exact comparison you expect needs to be defined.
Also, you need to consider order. Any value that precedes another when using subtraction will yield a negative number and hence is "<180". So are you only concerned with dates in B greater than the dates in A?
You're absolutely right about the days - thank you. I will use the 'intck' function from now on.
By 6 months i mean 180 days actually, since I feel as though it is a more holistic capture of a patient.
That is an interesting question and something I did not consider, actually. Now that I think about it, I do not need dates in B after dates in A - I want any date in Dataset B that falls within +6 months or -6 months of the date in Dataset A. So I would use an absolute value for this!
Thanks for these questions. These are exactly what I need to consider! Thank you so much for your reply!
What if several measurements from B fall into the 6-month range of a particular observation in A? What is the rule for selecting only one?
Hi there, thank you for your reply. If there are many measurements, the general rule of thumb is is to select the date in Dataset B that is the closest to the date in Dataset A.
@mugsie91 wrote:
Hi there, thank you for your reply. If there are many measurements, the general rule of thumb is is to select the date in Dataset B that is the closest to the date in Dataset A.
Then you should try this:
data a;
input ID $ Date :date9. measurement_a;
format date yymmddd10.;
datalines;
1 20JUN2013 52.3
1 12JUL2013 65.6
1 28NOV2014 37.4
1 02DEC2014 61.3
1 22SEP2015 40.5
1 15OCT2015 60.5
2 03JUN2011 46.5
2 19JUL2011 54.1
2 29OCT2012 53.6
;
data b;
input ID $ Date :date9. measurement_b;
format date yymmddd10.;
datalines;
1 21MAR2007 43
1 13JUL2007 45
1 07APR2009 47
1 14MAY2009 46
1 16FEB2012 42
1 27AUG2012 53
1 12DEC2012 58
1 20JUN2013 56
1 10DEC2013 53
1 23MAY2014 49
1 17SEP2014 44
1 23SEP2015 40
2 16DEC2011 58
2 22AUG2012 54
2 20FEB2013 56
2 29MAY2013 53
;
proc sql;
create table want as
select
a.id,
a.date,
b.date as b_date,
a.measurement_a,
b.measurement_b,
abs(a.date - b.date) as datdif
from a
left join b
on a.id = b.id and abs(a.date - b.date) le 180
group by a.id, a.date
having datdif = min(datdif)
;
quit;
Hello - thank you so much! This does the job for me. Thank you so much, I really appreciate it!
Looks like there are several scenarios that need to be handled
1. If the date in dataset B is earlier by 6 months or less
2. If the date in dataset B is later by 6 months or less
3. If the date in dataset B is exactly the same date
We need to define their order of precedence as there are entries that can fit two of the above scenarios, so we can decide what will be the final "measurement_b" value to take.
This is what I have so far, you can play around with the order of the if-then handling below to suit what outcome you want to achieve
data ds1;
set ds1;
time + 1; /*add counter*/
by id date;
if first.id then time = 1;
run;
data ds;
set ds1 ds2; /*combine so they can be sorted by date*/
run;
proc sort data=ds;
by id date; /*sorting*/
run;
data look_both_ways (keep=id date measurement_a time keep rename=(keep=measurement_b) where=(time ne .));
set ds;
by ID;
set ds (firstobs=2 keep=date measurement_b rename=(date=next_date measurement_b=nxt_m_b))
ds (obs=1 keep=time);
format prev_date date9.;
prev_date=ifn(first.ID,(.),lag(date));
next_date=ifn(last.ID,(.),next_date);
prev_m_b = ifn( first.Id, (.), lag(measurement_b));
nxt_m_b = ifn(last.Id, (.), nxt_m_b);
if intck('month',prev_date,date) <= 6 then keep=prev_m_b; /*scenario 1*/
if intck('month',date,next_date) <= 6 then keep=nxt_m_b; /*scenario 2*/
if date=next_date then keep=nxt_m_b; /*scenario 3*/
run;
Hi @sustagens, thank you so much for your reply and your solution! I will try that now, and play around with it to get the outcome.
Thank you so much, I really, really, REALLY appreciate your help!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.