Selection based on pairs of records within BY group where test occurred within defined time period

Accepted Solution Solved
Reply
Senior User
Posts: 1
Accepted Solution

Selection based on pairs of records within BY group where test occurred within defined time period

Hi,

I am processing blood lead tests.  Below is a sample of the data:

 

Record_idCollected_datePb_resultDays
LEAD10009/14/20079.
LEAD10054/30/20075.
LEAD10051/22/20087267
LEAD10073/9/20076.
LEAD10097/11/200717.
LEAD10098/31/20072251
LEAD10099/21/20071021
LEAD10092/22/200815154
LEAD10111/22/20076.
LEAD10117/9/20077168
LEAD10119/17/20071070
LEAD101210/22/20078.

   
The record_id is the unique id for each child in the dataset, collected_date is the date of collection/test, Pb_result is the result of the test, and Days is the number of days between tests for the same child. I have sorted the data by record_id and collected_date. I need to look for confirmatory pairs - tests where the the same child (record_id) received a blood lead test within 1 to 84 days of a previous test, and then select the test with the highest result. Record_id LEAD1009 had 2 tests within 51 days of each other and the second test, where the record for the 8/31/2007 test with Pb_result=22, should be output. The next pair would be the test performed on 8/31/2007 and 9/21/2007, where the 8/31/2007 test should again be output because it was the higher test result of the two.The third and fourth tests for LEAD1009 should not result in any output because the time between tests was greater than 84 days. If the child had only one test then I'd like to output those records to a different dataset.

I have attached an additional schematic of what I am trying to accomplish for further clarification.

I just can't seem to wrap my head around how to successfully code to achieve the results I am looking for. Thanks in advance for any help.

 

 

Attachment

Accepted Solutions
Solution
‎03-07-2017 09:00 AM
Super User
Posts: 789

Re: Selection based on pairs of records within BY group where test occurred within defined time peri

[ Edited ]

Are you saying that you want LEAD1009  8/31/2007 output 2 times?

  • Once because it is within 84 days of 7/11/2007 and has a higher pb?
  • Once because it is within 84 days of 9/21/2007 and has a higher pb?

If so, then this works:

 

data have;
  input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
  format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 . 
LEAD1005 4/30/2007 5 . 
LEAD1005 1/22/2008 7 267 
LEAD1007 3/9/2007 6 . 
LEAD1009 7/11/2007 17 . 
LEAD1009 8/31/2007 22 51     
LEAD1009 9/21/2007 10 21 
LEAD1009 2/22/2008 15 154 
LEAD1011 1/22/2007 6 . 
LEAD1011 7/9/2007 7 168 
LEAD1011 9/17/2007 10 70     
LEAD1012 10/22/2007 8 . 
run;

data want (drop=nxt_:);
  set have ;
  by record_id;

  /* Compare to predecessor */
  if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;

  /* ReCompare, this time to successor */
  merge have (keep=record_id)
        have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
  if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;

run;

Notes;

 

  1. Comparing to predessor is pretty straightforward, but one caveat  In the IF statement make sure that the "pb_result>lag(pb_result) condition precedes the other conditions.  Because all the conditions are connected by "and", sas is smart enough to stop checking once a failed condition is detected.  But since we always want the lag queue (this queue have size 1) updated, it should be the first condition.

  2. SAS has no lead functino.  To the compare to successor uses the MERGE statement (notice no accompanying BY statement), where the data set is merged with itself, but the second HAVE has "firstobs=2" meaning that it reads one observation ahead of the first have.  The "rename=" parameters is needed to avoid two records trying to populate the same variable.

View solution in original post


All Replies
Solution
‎03-07-2017 09:00 AM
Super User
Posts: 789

Re: Selection based on pairs of records within BY group where test occurred within defined time peri

[ Edited ]

Are you saying that you want LEAD1009  8/31/2007 output 2 times?

  • Once because it is within 84 days of 7/11/2007 and has a higher pb?
  • Once because it is within 84 days of 9/21/2007 and has a higher pb?

If so, then this works:

 

data have;
  input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
  format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 . 
LEAD1005 4/30/2007 5 . 
LEAD1005 1/22/2008 7 267 
LEAD1007 3/9/2007 6 . 
LEAD1009 7/11/2007 17 . 
LEAD1009 8/31/2007 22 51     
LEAD1009 9/21/2007 10 21 
LEAD1009 2/22/2008 15 154 
LEAD1011 1/22/2007 6 . 
LEAD1011 7/9/2007 7 168 
LEAD1011 9/17/2007 10 70     
LEAD1012 10/22/2007 8 . 
run;

data want (drop=nxt_:);
  set have ;
  by record_id;

  /* Compare to predecessor */
  if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;

  /* ReCompare, this time to successor */
  merge have (keep=record_id)
        have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
  if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;

run;

Notes;

 

  1. Comparing to predessor is pretty straightforward, but one caveat  In the IF statement make sure that the "pb_result>lag(pb_result) condition precedes the other conditions.  Because all the conditions are connected by "and", sas is smart enough to stop checking once a failed condition is detected.  But since we always want the lag queue (this queue have size 1) updated, it should be the first condition.

  2. SAS has no lead functino.  To the compare to successor uses the MERGE statement (notice no accompanying BY statement), where the data set is merged with itself, but the second HAVE has "firstobs=2" meaning that it reads one observation ahead of the first have.  The "rename=" parameters is needed to avoid two records trying to populate the same variable.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 136 views
  • 0 likes
  • 2 in conversation