BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aldreilly
Calcite | Level 5

Hi,

I am processing blood lead tests.  Below is a sample of the data:

 

Record_idCollected_datePb_resultDays
LEAD10009/14/20079.
LEAD10054/30/20075.
LEAD10051/22/20087267
LEAD10073/9/20076.
LEAD10097/11/200717.
LEAD10098/31/20072251
LEAD10099/21/20071021
LEAD10092/22/200815154
LEAD10111/22/20076.
LEAD10117/9/20077168
LEAD10119/17/20071070
LEAD101210/22/20078.

   
The record_id is the unique id for each child in the dataset, collected_date is the date of collection/test, Pb_result is the result of the test, and Days is the number of days between tests for the same child. I have sorted the data by record_id and collected_date. I need to look for confirmatory pairs - tests where the the same child (record_id) received a blood lead test within 1 to 84 days of a previous test, and then select the test with the highest result. Record_id LEAD1009 had 2 tests within 51 days of each other and the second test, where the record for the 8/31/2007 test with Pb_result=22, should be output. The next pair would be the test performed on 8/31/2007 and 9/21/2007, where the 8/31/2007 test should again be output because it was the higher test result of the two.The third and fourth tests for LEAD1009 should not result in any output because the time between tests was greater than 84 days. If the child had only one test then I'd like to output those records to a different dataset.

I have attached an additional schematic of what I am trying to accomplish for further clarification.

I just can't seem to wrap my head around how to successfully code to achieve the results I am looking for. Thanks in advance for any help.

 

 


SAS help example.PNG
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Are you saying that you want LEAD1009  8/31/2007 output 2 times?

  • Once because it is within 84 days of 7/11/2007 and has a higher pb?
  • Once because it is within 84 days of 9/21/2007 and has a higher pb?

If so, then this works:

 

data have;
  input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
  format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 . 
LEAD1005 4/30/2007 5 . 
LEAD1005 1/22/2008 7 267 
LEAD1007 3/9/2007 6 . 
LEAD1009 7/11/2007 17 . 
LEAD1009 8/31/2007 22 51     
LEAD1009 9/21/2007 10 21 
LEAD1009 2/22/2008 15 154 
LEAD1011 1/22/2007 6 . 
LEAD1011 7/9/2007 7 168 
LEAD1011 9/17/2007 10 70     
LEAD1012 10/22/2007 8 . 
run;

data want (drop=nxt_:);
  set have ;
  by record_id;

  /* Compare to predecessor */
  if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;

  /* ReCompare, this time to successor */
  merge have (keep=record_id)
        have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
  if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;

run;

Notes;

 

  1. Comparing to predessor is pretty straightforward, but one caveat  In the IF statement make sure that the "pb_result>lag(pb_result) condition precedes the other conditions.  Because all the conditions are connected by "and", sas is smart enough to stop checking once a failed condition is detected.  But since we always want the lag queue (this queue have size 1) updated, it should be the first condition.

  2. SAS has no lead functino.  To the compare to successor uses the MERGE statement (notice no accompanying BY statement), where the data set is merged with itself, but the second HAVE has "firstobs=2" meaning that it reads one observation ahead of the first have.  The "rename=" parameters is needed to avoid two records trying to populate the same variable.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

1 REPLY 1
mkeintz
PROC Star

Are you saying that you want LEAD1009  8/31/2007 output 2 times?

  • Once because it is within 84 days of 7/11/2007 and has a higher pb?
  • Once because it is within 84 days of 9/21/2007 and has a higher pb?

If so, then this works:

 

data have;
  input Record_id :$8. Collected_date :mmddyy10. Pb_result Days ;
  format collected_date yymmddn8.;
datalines;
LEAD1000 9/14/2007 9 . 
LEAD1005 4/30/2007 5 . 
LEAD1005 1/22/2008 7 267 
LEAD1007 3/9/2007 6 . 
LEAD1009 7/11/2007 17 . 
LEAD1009 8/31/2007 22 51     
LEAD1009 9/21/2007 10 21 
LEAD1009 2/22/2008 15 154 
LEAD1011 1/22/2007 6 . 
LEAD1011 7/9/2007 7 168 
LEAD1011 9/17/2007 10 70     
LEAD1012 10/22/2007 8 . 
run;

data want (drop=nxt_:);
  set have ;
  by record_id;

  /* Compare to predecessor */
  if pb_result>lag(pb_result) and first.record_id=0 and days<=84 then output;

  /* ReCompare, this time to successor */
  merge have (keep=record_id)
        have (firstobs=2 keep=pb_result days rename=(pb_result=nxt_pb days=nxt_days));
  if pb_result>nxt_pb and last.record_id=0 and nxt_days<=84 then output;

run;

Notes;

 

  1. Comparing to predessor is pretty straightforward, but one caveat  In the IF statement make sure that the "pb_result>lag(pb_result) condition precedes the other conditions.  Because all the conditions are connected by "and", sas is smart enough to stop checking once a failed condition is detected.  But since we always want the lag queue (this queue have size 1) updated, it should be the first condition.

  2. SAS has no lead functino.  To the compare to successor uses the MERGE statement (notice no accompanying BY statement), where the data set is merged with itself, but the second HAVE has "firstobs=2" meaning that it reads one observation ahead of the first have.  The "rename=" parameters is needed to avoid two records trying to populate the same variable.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 853 views
  • 0 likes
  • 2 in conversation