Hi,
I am processing blood lead tests. Below is a sample of the data:
Record_id | Collected_date | Pb_result | Days |
LEAD1000 | 9/14/2007 | 9 | . |
LEAD1005 | 4/30/2007 | 5 | . |
LEAD1005 | 1/22/2008 | 7 | 267 |
LEAD1007 | 3/9/2007 | 6 | . |
LEAD1009 | 7/11/2007 | 17 | . |
LEAD1009 | 8/31/2007 | 22 | 51 |
LEAD1009 | 9/21/2007 | 10 | 21 |
LEAD1009 | 2/22/2008 | 15 | 154 |
LEAD1011 | 1/22/2007 | 6 | . |
LEAD1011 | 7/9/2007 | 7 | 168 |
LEAD1011 | 9/17/2007 | 10 | 70 |
LEAD1012 | 10/22/2007 | 8 | . |
The record_id is the unique id for each child in the dataset, collected_date is the date of collection/test, Pb_result is the result of the test, and Days is the number of days between tests for the same child. I have sorted the data by record_id and collected_date. I need to look for confirmatory pairs - tests where the the same child (record_id) received a blood lead test within 1 to 84 days of a previous test, and then select the test with the highest result. Record_id LEAD1009 had 2 tests within 51 days of each other and the second test, where the record for the 8/31/2007 test with Pb_result=22, should be output. The next pair would be the test performed on 8/31/2007 and 9/21/2007, where the 8/31/2007 test should again be output because it was the higher test result of the two.The third and fourth tests for LEAD1009 should not result in any output because the time between tests was greater than 84 days. If the child had only one test then I'd like to output those records to a different dataset.
I have attached an additional schematic of what I am trying to accomplish for further clarification.
I just can't seem to wrap my head around how to successfully code to achieve the results I am looking for. Thanks in advance for any help.
Are you saying that you want LEAD1009 8/31/2007 output 2 times?
If so, then this works:
data have;
input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 .
LEAD1005 4/30/2007 5 .
LEAD1005 1/22/2008 7 267
LEAD1007 3/9/2007 6 .
LEAD1009 7/11/2007 17 .
LEAD1009 8/31/2007 22 51
LEAD1009 9/21/2007 10 21
LEAD1009 2/22/2008 15 154
LEAD1011 1/22/2007 6 .
LEAD1011 7/9/2007 7 168
LEAD1011 9/17/2007 10 70
LEAD1012 10/22/2007 8 .
run;
data want (drop=nxt_:);
set have ;
by record_id;
/* Compare to predecessor */
if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;
/* ReCompare, this time to successor */
merge have (keep=record_id)
have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;
run;
Notes;
Are you saying that you want LEAD1009 8/31/2007 output 2 times?
If so, then this works:
data have;
input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 .
LEAD1005 4/30/2007 5 .
LEAD1005 1/22/2008 7 267
LEAD1007 3/9/2007 6 .
LEAD1009 7/11/2007 17 .
LEAD1009 8/31/2007 22 51
LEAD1009 9/21/2007 10 21
LEAD1009 2/22/2008 15 154
LEAD1011 1/22/2007 6 .
LEAD1011 7/9/2007 7 168
LEAD1011 9/17/2007 10 70
LEAD1012 10/22/2007 8 .
run;
data want (drop=nxt_:);
set have ;
by record_id;
/* Compare to predecessor */
if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;
/* ReCompare, this time to successor */
merge have (keep=record_id)
have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;
run;
Notes;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.