BookmarkSubscribeRSS Feed
Pooja
Fluorite | Level 6

Hello All,

I have a dataset where each row represents a patient and each column a day. The columns have 0 if the patient is on no medication, 1 if the patient is on one medication, 2 if the patient is on 2 medications and so on. There are 1825 days in the form of an array day1-day1825. I need to select 365 days from these 1825 days for each patient. The start and end date will vary for each patient. How do I create this new array of 365 days for each patient based on the individual patients specific start and end date.

What I have

Patient_id     Start_date          End_date     Day1.......day1825

What I need

Patient_Id     Start_date     End_date     Day1...day365

Based on each individual patients start and end date.

Any guidance will be appreciated.

Regards,

Pooja Desai

The University of Texas at Austin

15 REPLIES 15
Amir
PROC Star

Hi,

A few questions:

Q1) For each patient (row) is Day1 the same as Start_date? If not, how do you figure out which Day column relates to Start_date?

Q2) If you always want 365 Day columns what is End_date for?

Regards,

Amir.

Pooja
Fluorite | Level 6

Hello Amir,

For each patient day1 is not the same as the start_date. I have the start_date as a separate variable which is a continuous number between 1 and 1825.

You are right I want to have exactly 365 days following start_date so end_date would be irrelevant.

Thanks!

Vince28_Statcan
Quartz | Level 8

day1 of collection would be the date to which your current variable day1 refers to. That is, you have data gathered over 1825 days or 5 years. If your study took place from 01JUNE2006 to 31MAY2011, then "day1 of collection" would be 01JUNE2006. What my code example does is take the number of days between START_DATE and this 01JUNE2006 to find which array column to begin with for each patient and simply copies the 365 following values to a new array.

I had merely used a macro variable to clearly depict that you need to know what date "day1" variable in your dataset refers to. You could very well just hard type that date in the intck function's 3rd parameter.

Vincent

Vince28_Statcan
Quartz | Level 8

I assume you know exactly what date is equivalent to day1 of collection. In order to appropriately build the sub-array, I would go as follow

%let collection_day1_date=01MAY2011;

data want;

     set have;

     array day{*} day1-day1825;

     array newday{*} newday1-newday365;

     index=intck('day', start_date, "&collection_day1_date"d);

     do i=1 to 365;

          newday{i}=day{index+i};

     end;

     drop day: index i;

run;

Pooja
Fluorite | Level 6

Hello Vince,

Thanks for your code. What exactly do you mean by day1 of collection? I know which day the 365 day period for each patient starts and which day it ends. Please could you clarify?

day{i} would not necessarily be 0 for days outside of the start_date and end_date. I need a particular snapshot of the data and hence I am trying to get 365 out of the 1825 days. 

Thanks!

Reeza
Super User

So the start and end date represent the 365 days you want?

What date does day one correspond to?

Pooja
Fluorite | Level 6

I had data over a longer period so day1 is the day they indexed on the medication. However I need data only over a particular 365 days snapshot. Yes start_day and end_day are continuous numbers between 1 and 1825 and represent the 365 day period over which I need data. The final product I am hoping for is a new array newday1-newday365.

Thanks!

Vince28_Statcan
Quartz | Level 8

O wait, start_date and end_date are not actual date values but simply values ranging from 1-1825? If so, it is even more straight forward as you can use start_date to index day{} directly

data want;

     set have;

     array day{*} day1-day1825;

     array newday{*} newday1-newday365;

     do i=1 to 365;

          newday{i}=day{start_date+i-1};

     end;

     drop day: i;

run;

Pooja
Fluorite | Level 6

Thanks for the clarification Vincent. I used the code but for the newday array I get all missing values. All the 365 columns for all the patients are missing.

Reeza
Super User

Can you post some sample data. Vince's code should have worked, so there's something else missing.

Was there anything else in the log?

For example is this how your data looks like?

Patient_id Start_date      End_date Day1.......day1825

111                  2                 367           0  0 0 1 2.....2

Pooja
Fluorite | Level 6

Reeza,

Vince's code ran with no errors. Only the newday array had missing values for all the columns. The data look exactly like how you described it.

The output dataset was like:

Patient_ID     start_dat     end_day     day1......day1825     newday1....newday365

111               2                    367          0 1 1 1 1 2 2 2          ............................

with the entire newday array missing.

Thanks!

Vince28_Statcan
Quartz | Level 8


If you copy pasted my code above and did not remove the drop statement, then it means that your variables day1-day1825 are only labeled day1-day1825 and actually bear a different name. Thus, 1825 missing values were created to populate 365 missing values, the former 1825 created were dropped but the actual appropriate columns were never used because they are not named appropriately

Had they been named day1-day1825, they would have been removed from the output dataset completely assuming you kept the drop statement.

P.s. I am leaving work at noon today so I won't be able to provide any further debugging until tomorrow if the above does not solve your issue.

Pooja
Fluorite | Level 6

Oops..you're right Vince, I forgot to change the variable name!:smileyblush:

Thanks a lot for your help!

Astounding
PROC Star

How would you do this if you weren't allowed to use computers?  What could you look at in the data that would tell you what you want to know?

Also, how do  you know that each patient should end 365 days after starting?  Would  you want to confirm that by looking at the data?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 15 replies
  • 4688 views
  • 0 likes
  • 5 in conversation