BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mugsie91
Fluorite | Level 6

Hi all,

 

I am hoping you can help me! **Please help!!!!** 🙏

I am in SAS and I have datasets A and B with different measurements (relating to patient's health) as follows:

 

Dataset A
ID            Date               measurement_a
1          20JUN2013                52.3
1          12JUL2013                 65.6
1          28NOV2014                37.4
1          02DEC2014                61.3
1          22SEP2015                40.5
1         15OCT2015                 60.5
2         03JUN2011                 46.5
2         19JUL2011                  54.1
2         29OCT2012                 53.6

Dataset B

ID            Date                  measurement_b
1          21MAR2007                  43
1          13JUL2007                   45
1           07APR2009                 47
1           14MAY2009                 46
1           16FEB2012                  42
1           27AUG2012                 53
1           12DEC2012                 58
1           20JUN2013                  56
1           10DEC2013                  53
1           23MAY2014                  49
1           17SEP2014                  44
1           23SEP2015                  40
2           16DEC2011                  58
2           22AUG2012                 54
2           20FEB2013                  56
2           29MAY2013                 53


What I am looking for is that if the date in Dataset B is within 6 months of the date in Dataset a, then a new variable called "time" will be added, saying 1,2,3,etc. for how many ever match with ** only measurement_a** length (in other words, I do not need to retain values of measurement_b if it does not match the date in Dataset a. Here is an example of what I mean:

 

Desired result/dataset

ID      Time    measure_a    measure_b
1          1           52.3           56 (Dataset B Date = 20JUN2013 - Matched exactly)
1          2           65.6           53 (Dataset B date = 10DEC2013 - Within six months of 12JUL2013 [Dataset A Date])
1          3           37.4           44 (Dataset B date = 17SEP2014 - Within six months of 28NOV2014 [Dataset A Date])
1          4           61.3              . (because 17SEP2014 [Dataset B] is closest to 28NOV2014 [Dataset A])
1          5           40.5           40 (because 23SEP2015 [Dataset B] is closest to 22SEP2015 [Dataset A])
1          6           60.5              . (No date in Dataset B that is within 6 months of Date in Dataset A [15OCT2015])

2          1           46.5              . (See below)
2          2           54.1            58 (because 03JUL2011 [Dataset B] is closest to 19JUL2011 [Dataset A])
2          3           53.6            54 (Dataset B date = 22AUG2012 - Within 6 months of Dataset A date = 29OCT2012)

 

I have joined on ID but the times is proving difficult. I know it could be the difference in months in the "where" statement in the following code:


PROC SQL;
CREATE TABLE join_test as
SELECT * FROM data_a as a
LEFT_JOIN data_b as b
ON a.id = b.id
WHERE days(a.Date - b.Date) <= 180 ;
QUIT;

But this does not do the trick.


Can some **please** help me?

I really appreciate it. Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
sustagens
Pyrite | Level 9

Looks like there are several scenarios that need to be handled

1. If the date in dataset B is earlier by 6 months or less

2. If the date in dataset B is later by 6 months or less

3. If the date in dataset B is exactly the same date

 

We need to define their order of precedence as there are entries that can fit two of the above scenarios, so we can decide what will be the final "measurement_b" value to take.

 

This is what I have so far, you can play around with the order of the if-then handling below to suit what outcome you want to achieve

data ds1;
  set ds1;
  time + 1; /*add counter*/
  by id date;
  if first.id then time = 1;
run;

data ds;
set ds1 ds2; /*combine so they can be sorted by date*/
run;

proc sort data=ds;
by id date; /*sorting*/
run;

data look_both_ways (keep=id date measurement_a time keep rename=(keep=measurement_b) where=(time ne .));
set ds;
by ID;
set ds (firstobs=2 keep=date measurement_b rename=(date=next_date measurement_b=nxt_m_b))
    ds (obs=1 keep=time);
format prev_date date9.;

prev_date=ifn(first.ID,(.),lag(date));
next_date=ifn(last.ID,(.),next_date);

prev_m_b = ifn( first.Id, (.), lag(measurement_b));
nxt_m_b = ifn(last.Id, (.), nxt_m_b);

if intck('month',prev_date,date) <= 6 then keep=prev_m_b; /*scenario 1*/
if intck('month',date,next_date) <= 6 then keep=nxt_m_b; /*scenario 2*/
if date=next_date then keep=nxt_m_b; /*scenario 3*/
run;

View solution in original post

8 REPLIES 8
ballardw
Super User

If you read the log you should have an error about an apparent function DAYS not existing. SAS SQL does not have a "days" function. You can get the number of days between two date values by 1) subtraction or 2) use of the INTCK function.

 

By "6 months" do you mean "6 calendar months" as in 1 January is "6 months" before "30 July " [month 7 - month 1=6] or do you actually want 180 days? There is a difference. Months have differing numbers of days and the exact comparison you expect needs to be defined.

Also, you need to consider order. Any value that precedes another when using subtraction will yield a negative number and hence is "<180". So are you only concerned with dates in B greater than the dates in A?

mugsie91
Fluorite | Level 6

You're absolutely right about the days - thank you. I will use the 'intck' function from now on. 

 

By 6 months i mean 180 days actually, since I feel as though it is a more holistic capture of a patient. 

 

That is an interesting question and something I did not consider, actually. Now that I think about it, I do not need dates in B after dates in A - I want any date in Dataset B that falls within +6 months or -6 months of the date in Dataset A. So I would use an absolute value for this! 

 

Thanks for these questions. These are exactly what I need to consider! Thank you so much for your reply!

mugsie91
Fluorite | Level 6

Hi there, thank you for your reply. If there are many measurements, the general rule of thumb is is to select the date in Dataset B that is the closest to the date in Dataset A.

Kurt_Bremser
Super User

@mugsie91 wrote:

Hi there, thank you for your reply. If there are many measurements, the general rule of thumb is is to select the date in Dataset B that is the closest to the date in Dataset A.


Then you should try this:

data a;
input ID $ Date :date9. measurement_a;
format date yymmddd10.;
datalines;
1 20JUN2013 52.3
1 12JUL2013 65.6
1 28NOV2014 37.4
1 02DEC2014 61.3
1 22SEP2015 40.5
1 15OCT2015 60.5
2 03JUN2011 46.5 
2 19JUL2011 54.1
2 29OCT2012 53.6
;

data b;
input ID $ Date :date9. measurement_b;
format date yymmddd10.;
datalines;
1 21MAR2007 43
1 13JUL2007 45
1 07APR2009 47
1 14MAY2009 46
1 16FEB2012 42
1 27AUG2012 53
1 12DEC2012 58 
1 20JUN2013 56
1 10DEC2013 53
1 23MAY2014 49
1 17SEP2014 44
1 23SEP2015 40
2 16DEC2011 58
2 22AUG2012 54
2 20FEB2013 56
2 29MAY2013 53
;

proc sql;
create table want as
  select
    a.id,
    a.date,
    b.date as b_date,
    a.measurement_a,
    b.measurement_b,
    abs(a.date - b.date) as datdif
  from a
  left join b
  on a.id = b.id and abs(a.date - b.date) le 180
  group by a.id, a.date
  having datdif = min(datdif)
;
quit;
mugsie91
Fluorite | Level 6

Hello - thank you so much! This does the job for me. Thank you so much, I really appreciate it!

sustagens
Pyrite | Level 9

Looks like there are several scenarios that need to be handled

1. If the date in dataset B is earlier by 6 months or less

2. If the date in dataset B is later by 6 months or less

3. If the date in dataset B is exactly the same date

 

We need to define their order of precedence as there are entries that can fit two of the above scenarios, so we can decide what will be the final "measurement_b" value to take.

 

This is what I have so far, you can play around with the order of the if-then handling below to suit what outcome you want to achieve

data ds1;
  set ds1;
  time + 1; /*add counter*/
  by id date;
  if first.id then time = 1;
run;

data ds;
set ds1 ds2; /*combine so they can be sorted by date*/
run;

proc sort data=ds;
by id date; /*sorting*/
run;

data look_both_ways (keep=id date measurement_a time keep rename=(keep=measurement_b) where=(time ne .));
set ds;
by ID;
set ds (firstobs=2 keep=date measurement_b rename=(date=next_date measurement_b=nxt_m_b))
    ds (obs=1 keep=time);
format prev_date date9.;

prev_date=ifn(first.ID,(.),lag(date));
next_date=ifn(last.ID,(.),next_date);

prev_m_b = ifn( first.Id, (.), lag(measurement_b));
nxt_m_b = ifn(last.Id, (.), nxt_m_b);

if intck('month',prev_date,date) <= 6 then keep=prev_m_b; /*scenario 1*/
if intck('month',date,next_date) <= 6 then keep=nxt_m_b; /*scenario 2*/
if date=next_date then keep=nxt_m_b; /*scenario 3*/
run;
mugsie91
Fluorite | Level 6

Hi @sustagens, thank you so much for your reply and your solution! I will try that now, and play around with it to get the outcome. 

 

Thank you so much, I really, really, REALLY appreciate your help! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1971 views
  • 6 likes
  • 4 in conversation