Hello All,
I have a dataset where each row represents a patient and each column a day. The columns have 0 if the patient is on no medication, 1 if the patient is on one medication, 2 if the patient is on 2 medications and so on. There are 1825 days in the form of an array day1-day1825. I need to select 365 days from these 1825 days for each patient. The start and end date will vary for each patient. How do I create this new array of 365 days for each patient based on the individual patients specific start and end date.
What I have
Patient_id Start_date End_date Day1.......day1825
What I need
Patient_Id Start_date End_date Day1...day365
Based on each individual patients start and end date.
Any guidance will be appreciated.
Regards,
Pooja Desai
The University of Texas at Austin
Hi,
A few questions:
Q1) For each patient (row) is Day1 the same as Start_date? If not, how do you figure out which Day column relates to Start_date?
Q2) If you always want 365 Day columns what is End_date for?
Regards,
Amir.
Hello Amir,
For each patient day1 is not the same as the start_date. I have the start_date as a separate variable which is a continuous number between 1 and 1825.
You are right I want to have exactly 365 days following start_date so end_date would be irrelevant.
Thanks!
day1 of collection would be the date to which your current variable day1 refers to. That is, you have data gathered over 1825 days or 5 years. If your study took place from 01JUNE2006 to 31MAY2011, then "day1 of collection" would be 01JUNE2006. What my code example does is take the number of days between START_DATE and this 01JUNE2006 to find which array column to begin with for each patient and simply copies the 365 following values to a new array.
I had merely used a macro variable to clearly depict that you need to know what date "day1" variable in your dataset refers to. You could very well just hard type that date in the intck function's 3rd parameter.
Vincent
I assume you know exactly what date is equivalent to day1 of collection. In order to appropriately build the sub-array, I would go as follow
%let collection_day1_date=01MAY2011;
data want;
set have;
array day{*} day1-day1825;
array newday{*} newday1-newday365;
index=intck('day', start_date, "&collection_day1_date"d);
do i=1 to 365;
newday{i}=day{index+i};
end;
drop day: index i;
run;
Hello Vince,
Thanks for your code. What exactly do you mean by day1 of collection? I know which day the 365 day period for each patient starts and which day it ends. Please could you clarify?
day{i} would not necessarily be 0 for days outside of the start_date and end_date. I need a particular snapshot of the data and hence I am trying to get 365 out of the 1825 days.
Thanks!
So the start and end date represent the 365 days you want?
What date does day one correspond to?
I had data over a longer period so day1 is the day they indexed on the medication. However I need data only over a particular 365 days snapshot. Yes start_day and end_day are continuous numbers between 1 and 1825 and represent the 365 day period over which I need data. The final product I am hoping for is a new array newday1-newday365.
Thanks!
O wait, start_date and end_date are not actual date values but simply values ranging from 1-1825? If so, it is even more straight forward as you can use start_date to index day{} directly
data want;
set have;
array day{*} day1-day1825;
array newday{*} newday1-newday365;
do i=1 to 365;
newday{i}=day{start_date+i-1};
end;
drop day: i;
run;
Thanks for the clarification Vincent. I used the code but for the newday array I get all missing values. All the 365 columns for all the patients are missing.
Can you post some sample data. Vince's code should have worked, so there's something else missing.
Was there anything else in the log?
For example is this how your data looks like?
Patient_id | Start_date | End_date | Day1.......day1825 |
111 2 367 0 0 0 1 2.....2
Reeza,
Vince's code ran with no errors. Only the newday array had missing values for all the columns. The data look exactly like how you described it.
The output dataset was like:
Patient_ID start_dat end_day day1......day1825 newday1....newday365
111 2 367 0 1 1 1 1 2 2 2 ............................
with the entire newday array missing.
Thanks!
If you copy pasted my code above and did not remove the drop statement, then it means that your variables day1-day1825 are only labeled day1-day1825 and actually bear a different name. Thus, 1825 missing values were created to populate 365 missing values, the former 1825 created were dropped but the actual appropriate columns were never used because they are not named appropriately
Had they been named day1-day1825, they would have been removed from the output dataset completely assuming you kept the drop statement.
P.s. I am leaving work at noon today so I won't be able to provide any further debugging until tomorrow if the above does not solve your issue.
Oops..you're right Vince, I forgot to change the variable name!:smileyblush:
Thanks a lot for your help!
How would you do this if you weren't allowed to use computers? What could you look at in the data that would tell you what you want to know?
Also, how do you know that each patient should end 365 days after starting? Would you want to confirm that by looking at the data?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.