Hi,
I am working with health care date. I have a data set in which patients with unique IDs have either received a drug, ambulatory care or been hospitalized. My event of interest is receiving a drug and hospitalization or ambulatory care are judged useful or not based on drug use. Basically:
1. If a patient has received ambulatory care before using the drug then I do not need that ambulatory service. Also if the patient has received ambulatory care more than 90 days after receiving the drug I do not need that ambulatory care.
2. If a patient has been hospitalized within 90 days prior to receiving the drug I do want that hospitalization. Also, if a patient has been hospitalized within 90 days after receiving the drug I do need that hospitalization.
This is how original data set might look like:
| ID | service_date | Drug | amb | Hosp | 
| 1 | 12-May-12 | 1 | ||
| 1 | 20-May-12 | 1 | ||
| 1 | 29-Jun-12 | 1 | ||
| 1 | 30-Sep-12 | 1 | ||
| 1 | 15-Dec-12 | 1 | ||
| 1 | 19-Nov-13 | 1 | ||
| 2 | 13-May-12 | 1 | ||
| 2 | 13-Jul-12 | 1 | ||
| 2 | 18-Jul-12 | 1 | ||
| 2 | 10-Aug-12 | 1 | ||
| 2 | 01-Sep-12 | 1 | ||
| 2 | 15-Nov-12 | 1 | ||
| 2 | 05-Feb-13 | 1 | ||
| 2 | 18-Feb-13 | 1 | ||
| 2 | 01-Mar-13 | 1 | ||
| 2 | 02-Mar-13 | 1 | ||
| 2 | 07-May-13 | 1 | ||
| 2 | 07-May-14 | 1 | ||
| 3 | 10-Nov-12 | 1 | ||
| 3 | 20-Nov-12 | 1 | ||
| 3 | 01-Dec-12 | 1 | ||
| 3 | 03-Dec-12 | 1 | ||
| 3 | 25-Jan-13 | 1 | ||
| 3 | 01-Mar-13 | 1 | ||
| 3 | 08-Mar-13 | 1 | ||
| 3 | 08-Apr-13 | 1 | ||
| 3 | 09-Apr-13 | 1 | ||
| 3 | 03-Aug-14 | 1 | 
And this is what I expect to get out of the code:
| ID | service_date | Drug | amb | Hosp | 
| 1 | 12-May-12 | 1 | ||
| 1 | 20-May-12 | 1 | ||
| 1 | 29-Jun-12 | 1 | ||
| 1 | 30-Sep-12 | 1 | ||
| 1 | 15-Dec-12 | 1 | ||
| 2 | 18-Jul-12 | 1 | ||
| 2 | 10-Aug-12 | 1 | ||
| 2 | 01-Sep-12 | 1 | ||
| 2 | 15-Nov-12 | 1 | ||
| 2 | 18-Feb-13 | 1 | ||
| 2 | 01-Mar-13 | 1 | ||
| 2 | 02-Mar-13 | 1 | ||
| 2 | 07-May-13 | 1 | ||
| 3 | 10-Nov-12 | 1 | ||
| 3 | 03-Dec-12 | 1 | ||
| 3 | 25-Jan-13 | 1 | ||
| 3 | 01-Mar-13 | 1 | ||
| 3 | 08-Mar-13 | 1 | ||
| 3 | 08-Apr-13 | 1 | ||
| 3 | 09-Apr-13 | 1 | 
I have come up with some ways to clean up my data set but it has many data steps like reversing the data set and frankly I am not sure if it is actually working.
Does anyone have a reliable compact code for this?
Thanks,
Your rule can be restated as keep (1) all drug records (2) all ambl records up to 90 days after any drug record, (3) all hosp records within 90 days of any drug record.
So pass through the drug records for a given id. Keep an two arrays indexed by dates covering the entire date range of your study, one array HOSP_DATES will have a "Y" for all qualifying elements (i.e. all qualifying hospiital dates). A similar AMBL_DATES array can be kept for ambulatory records:
data want (drop=d);
  set have (where=(drug=1) in=firstpass)
      have (in=secondpass);
  /* Correction - put in $1 to make the arrays as character arrays */
  array hosp_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d) $1 _temporary_;
  array ambl_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d) $1 _temporary_;
  if first.id then call missing (of hosp_dates{*}, of ambl_dates{*});
  if firstpass then do;
    do d=service_date-90 to service_date+90; hosp_dates{d}='Y'; end;
    do d=service_date to service_date+90;    ambl_dates{d}='Y'; end;
  end;
  if secondpass;
  if (drug=1) or (hosp=1 and hosp_dates{service_date}='Y') or (ambl=1 and ambl_dates{service_date}='Y');
run;
Thanks @Kurt_Bremser I've now corrected the array declarations to specify character arrays.
Your rule can be restated as keep (1) all drug records (2) all ambl records up to 90 days after any drug record, (3) all hosp records within 90 days of any drug record.
So pass through the drug records for a given id. Keep an two arrays indexed by dates covering the entire date range of your study, one array HOSP_DATES will have a "Y" for all qualifying elements (i.e. all qualifying hospiital dates). A similar AMBL_DATES array can be kept for ambulatory records:
data want (drop=d);
  set have (where=(drug=1) in=firstpass)
      have (in=secondpass);
  /* Correction - put in $1 to make the arrays as character arrays */
  array hosp_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d) $1 _temporary_;
  array ambl_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d) $1 _temporary_;
  if first.id then call missing (of hosp_dates{*}, of ambl_dates{*});
  if firstpass then do;
    do d=service_date-90 to service_date+90; hosp_dates{d}='Y'; end;
    do d=service_date to service_date+90;    ambl_dates{d}='Y'; end;
  end;
  if secondpass;
  if (drug=1) or (hosp=1 and hosp_dates{service_date}='Y') or (ambl=1 and ambl_dates{service_date}='Y');
run;
Thanks @Kurt_Bremser I've now corrected the array declarations to specify character arrays.
Hi there,
First of all I want to thank you for taking the time to answer my question. I am not good in SAS and i don't know what I would do without kind strangers who voluntarily answer my questions.
I love your answer; it is concise and efficient. But somehow it did not work. There were a few syntax errors; right before both _temporary_ there are missing }. I also added a "by ID" right before defining the arrays because you have used first.id.
I also increased the date range for arrays to make sure everything in my dataset is covered.
After running the code only 2 records were omitted. I was expecting 100 to 200 be omitted. I can see ambulatory care instances that have occurred two years before first onset of drug use in patients are still there. I am going over your code and it seems all correct to me, I don't get why it is not working!
Thanks again,
But
I think I might have solved the problem! It's really the most ridiculous fix. It seems like SAS does not like 'Y' in the arrays. I changed tem to 1 :
data want (drop=d);
  set have (where=(drug=1) in=firstpass)
      have (in=secondpass);
  by id;
  array hosp_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d)} _temporary_;
  array ambl_dates {%sysevalf("01jan2011"d):%sysevalf("31dec2018"d)} _temporary_;
  if first.id then call missing (of hosp_dates{*}, of ambl_dates{*});
  if firstpass then do;
    do d=service_date-90 to service_date+90; hosp_dates{d}=1; end;
    do d=service_date to service_date+90;    ambl_dates{d}=1; end;
  end;
  if secondpass;
  if (drug=1) or (hosp=1 and hosp_dates{service_date}=1) or (ambl=1 and ambl_dates{service_date}=1);
run;
and it appears to be working. Unbelievable how much time I put into this!
Thanks 🙂
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
