Hello all, I have data set of lab tests where some of the tests were performed in house and some were sent out to a different lab to get tested for a patient. Those lab test that get sent out have a date that is 1-7 days post the date of the patients visit/ time of sample collection (thus added to a different row). I would like to add this information to the same row as all the other patient information collection for that visit. For example I have an example of two variables (lab_test_1 and lab_test_2) shown below for a single id where those lab tests order dates were 1 day after. I will have to do this for every id
ID | var1 | var2 | var3 | var4 | var5 | labtest_1 | labtest_2 | test_order_date | Time Between Tests |
1 | 24 | 0.33 | Y | 7 | 23 | . | 11/3/2015 | 0 | |
1 | 45 | 0.54 | Y | 9 | 22 | . | 12/17/2015 | 44 | |
1 | . | . | . | . | . | 1.28 | 50-75 | 12/18/2015 | 1 |
1 | 354 | 0.36 | N | 8 | 21 | . | 6/22/2016 | 187 | |
1 | 6 | 0.45 | N | 13 | 22 | 1.31 | 0-2 | 8/18/2016 | 57 |
1 | . | . | . | . | . | 1.31 | 0-2 | 8/18/2016 | 0 |
1 | 4 | 0.32 | N | 12 | 28 | . | 11/15/2016 | 89 |
Want dataset:
ID | var1 | var2 | var3 | var4 | var5 | labtest_1 | labtest_2 | test_order_date | Time Between Tests | alternative_lab_test__date |
1 | 24 | 0.33 | Y | 7 | 23 | . | 11/3/2015 | 0 | ||
1 | 45 | 0.54 | Y | 9 | 22 | 1.28 | 50-75 | 12/17/2015 | 44 | 12/18/2015 |
1 | 354 | 0.36 | N | 8 | 21 | . | 6/22/2016 | 187 | ||
1 | 6 | 0.45 | N | 13 | 22 | 1.31 | 0-2 | 8/18/2016 | 57 | |
1 | 4 | 0.32 | N | 12 | 28 | . | 11/15/2016 | 89 |
I think I would need to create some sort of threshold to look at time between test dates (shown above) that is between 1-7 days post collection date and somehow match onto the row that has the most non-missing information. I am mainly stuck on how to create a theshold for the dates.
Best
EDIT: I played around a little bit with the code that PG created and I think I made my example data a little too simple. If I only have one id, this would work well, but I have cases with multiple IDs and the lag function does not seem to work well when you have multple subject ids (it will carry over the date from the last row from the prevous id to the first row of the current id, even though they are two different subjects. I have added a second subject as an example:
data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. ;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015
1 1 45 0.54 Y 9 22 . . 12/17/2015
1 2 . . . . . 1.28 50-75 12/18/2015
1 1 354 0.36 N 8 21 . . 6/22/2016
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016
1 2 . . . . . 1.31 0-2 8/18/2016
1 1 4 0.32 N 12 28 . . 11/15/2016
2 1 14 2.2 Y 5 23 . . 8/16/2015
2 2 . . Y 5 23 . . 8/16/2015
2 1 . . . . . 0-2 1.013 8/21/2015
2 2 . . . . . 0-2 1.013 8/21/2015
2 1 22 2.1 N 5 25 . . 9/12/2015
2 1 23 2.5 N 4 21 . . 10/28/2015
2 1 18 2.1 N 2 22 . . 4/8/2016
2 2 26 1.7 . . . 0-2 1.014 10/26/2016
;
Had to fix your data a bit. Added variable labId to identify the results origin and fixed the number of columns on some obs. Didn't require variable days_between_tests.
data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. Days_Between_Tests;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015 0
1 1 45 0.54 Y 9 22 . . 12/17/2015 44
1 2 . . . . . 1.28 50-75 12/18/2015 1
1 1 354 0.36 N 8 21 . . 6/22/2016 187
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57
1 2 . . . . . 1.31 0-2 8/18/2016 0
1 1 4 0.32 N 12 28 . . 11/15/2016 89
;
proc sort data=have; by id test_order_date; run;
data temp;
set have; by id;
prevDate = lag(test_order_date);
prevLabId = lag(labId);
if first.id then test + 1;
else if labId = prevLabId or intck("day", prevDate, test_order_date) > 7 then test + 1;
drop prev:;
run;
data want;
update temp(where=(labId=1)) temp(where=(labId ne 1)); by test;
drop labId test days_between_tests;
run;
proc print data=want; run;
Please post data in the form of a datastep.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Anything else we have to attempt to create data to test code with and we may make choices on variable types that will not work with your data.
And in the case of some browsers and the form you pasted the data many of the cells are appearing over the marginal information in the forum and just plain hard to read.
Had to fix your data a bit. Added variable labId to identify the results origin and fixed the number of columns on some obs. Didn't require variable days_between_tests.
data have;
infile datalines truncover;
input ID labID var1 var2 var3 $ var4 var5 labtest_1 labtest_2 $ test_order_date :mmddyy10. Days_Between_Tests;
format test_order_date yymmdd10.;
datalines;
1 1 24 0.33 Y 7 23 . . 11/3/2015 0
1 1 45 0.54 Y 9 22 . . 12/17/2015 44
1 2 . . . . . 1.28 50-75 12/18/2015 1
1 1 354 0.36 N 8 21 . . 6/22/2016 187
1 1 6 0.45 N 13 22 1.31 0-2 8/18/2016 57
1 2 . . . . . 1.31 0-2 8/18/2016 0
1 1 4 0.32 N 12 28 . . 11/15/2016 89
;
proc sort data=have; by id test_order_date; run;
data temp;
set have; by id;
prevDate = lag(test_order_date);
prevLabId = lag(labId);
if first.id then test + 1;
else if labId = prevLabId or intck("day", prevDate, test_order_date) > 7 then test + 1;
drop prev:;
run;
data want;
update temp(where=(labId=1)) temp(where=(labId ne 1)); by test;
drop labId test days_between_tests;
run;
proc print data=want; run;
I just wanted to give an update and say that the code PGStats has pointed me in the correct direcion. I have a much better grasp on how I want to code the issue. Once it is complete, I will post it.
Thank you very much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.