DATA Step, Macro, Functions and more

Match and replace row information by a threshold of dates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 76
Accepted Solution

Match and replace row information by a threshold of dates

[ Edited ]

Hello all, I have data set of lab tests where some of the tests were performed in house and some were sent out to a different lab to get tested for a patient. Those lab test that get sent out have a date that is 1-7 days post the date of the patients visit/ time of sample collection (thus added to a different row). I would like to add this information to the same row as all the other patient information collection for that visit. For example I have an example of two variables (lab_test_1 and lab_test_2) shown below for a single id where those lab tests order dates were 1 day after. I will have to do this for every id

 

ID var1 var2 var3 var4 var5 labtest_1 labtest_2 test_order_date Time Between Tests
1 24 0.33 Y 7 23 .   11/3/2015 0
1 45 0.54 Y 9 22 .   12/17/2015 44
1 . . . . . 1.28 50-75 12/18/2015 1
1 354 0.36 N 8 21 .   6/22/2016 187
1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57
1 . . . . . 1.31 0-2 8/18/2016 0
1 4 0.32 N 12 28 .   11/15/2016 89

 

Want dataset:

ID var1 var2 var3 var4 var5 labtest_1 labtest_2 test_order_date Time Between Tests alternative_lab_test__date
1 24 0.33 Y 7 23 .   11/3/2015 0  
1 45 0.54 Y 9 22 1.28 50-75 12/17/2015 44    12/18/2015
1 354 0.36 N 8 21 .   6/22/2016 187  
1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57  
1 4 0.32 N 12 28 .   11/15/2016 89  

 

 

I think I would need to create some sort of threshold to look at time between test dates (shown above) that is between 1-7 days post collection date and somehow match onto the row that has the most non-missing information. I am mainly stuck on how to create a theshold for the dates. 

 

Best

 

EDIT: I played around a little bit with the code that PG created and I think I made my example data a little too simple. If I only have one id, this would work well, but I have cases with multiple IDs and the lag function does not seem to work well when you have multple subject ids (it will carry over the date from the last row from the prevous id to the first row of the current id, even though they are two different subjects. I have added a second subject as an example:

data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. ;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015
1 1 45 0.54 Y 9 22 . . 12/17/2015
1 2 . . . . . 1.28 50-75 12/18/2015
1 1 354 0.36 N 8 21 . . 6/22/2016
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016
1 2 . . . . . 1.31 0-2 8/18/2016
1 1 4 0.32 N 12 28 . . 11/15/2016
2 1 14 2.2 Y 5 23 . . 8/16/2015
2 2 . . Y 5 23 . . 8/16/2015
2 1 . . . . . 0-2 1.013 8/21/2015
2 2	. . . . . 0-2 1.013	8/21/2015
2 1	22 2.1 N 5 25 .	. 9/12/2015
2 1	23 2.5 N 4 21 .	. 10/28/2015
2 1	18 2.1 N 2 22 .	. 4/8/2016
2 2	26 1.7 . . . 0-2 1.014 10/26/2016
;

Accepted Solutions
Solution
‎03-20-2017 01:19 PM
Respected Advisor
Posts: 4,932

Re: Match and replace row information by a threshold of dates

Posted in reply to daszlosek

Had to fix your data a bit. Added variable labId to identify the results origin and fixed the number of columns on some obs. Didn't require variable days_between_tests.

 

data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. Days_Between_Tests;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015 0
1 1 45 0.54 Y 9 22 . . 12/17/2015 44
1 2 . . . . . 1.28 50-75 12/18/2015 1
1 1 354 0.36 N 8 21 . . 6/22/2016 187
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57
1 2 . . . . . 1.31 0-2 8/18/2016 0
1 1 4 0.32 N 12 28 . . 11/15/2016 89
;

proc sort data=have; by id test_order_date; run;

data temp;
set have; by id;
prevDate = lag(test_order_date);
prevLabId = lag(labId);
if first.id then test + 1;
else if labId = prevLabId or intck("day", prevDate, test_order_date) > 7 then test + 1;
drop prev:;
run;

data want;
update temp(where=(labId=1)) temp(where=(labId ne 1)); by test;
drop labId test days_between_tests;
run;

proc print data=want; run;
PG

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Match and replace row information by a threshold of dates

Posted in reply to daszlosek

Please post data in the form of a datastep.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Anything else we have to attempt to create data to test code with and we may make choices on variable types that will not work with your data.

 

And in the case of some browsers and the form you pasted the data many of the cells are appearing over the marginal information in the forum and just plain hard to read.

Solution
‎03-20-2017 01:19 PM
Respected Advisor
Posts: 4,932

Re: Match and replace row information by a threshold of dates

Posted in reply to daszlosek

Had to fix your data a bit. Added variable labId to identify the results origin and fixed the number of columns on some obs. Didn't require variable days_between_tests.

 

data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. Days_Between_Tests;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015 0
1 1 45 0.54 Y 9 22 . . 12/17/2015 44
1 2 . . . . . 1.28 50-75 12/18/2015 1
1 1 354 0.36 N 8 21 . . 6/22/2016 187
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57
1 2 . . . . . 1.31 0-2 8/18/2016 0
1 1 4 0.32 N 12 28 . . 11/15/2016 89
;

proc sort data=have; by id test_order_date; run;

data temp;
set have; by id;
prevDate = lag(test_order_date);
prevLabId = lag(labId);
if first.id then test + 1;
else if labId = prevLabId or intck("day", prevDate, test_order_date) > 7 then test + 1;
drop prev:;
run;

data want;
update temp(where=(labId=1)) temp(where=(labId ne 1)); by test;
drop labId test days_between_tests;
run;

proc print data=want; run;
PG
Frequent Contributor
Posts: 76

Re: Match and replace row information by a threshold of dates

[ Edited ]

I just wanted to give an update and say that the code PGStats has pointed me in the correct direcion. I have a much better grasp on how I want to code the issue. Once it is complete, I will post it. 

 

Thank you very much!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 163 views
  • 1 like
  • 3 in conversation