Re: Selecting specific column from an array.

Pooja · Posted 10-02-2013 04:50 AM

Hello All,

I have a dataset where each row represents a patient and each column a day. The columns have 0 if the patient is on no medication, 1 if the patient is on one medication, 2 if the patient is on 2 medications and so on. There are 1825 days in the form of an array day1-day1825. I need to select 365 days from these 1825 days for each patient. The start and end date will vary for each patient. How do I create this new array of 365 days for each patient based on the individual patients specific start and end date.

What I have

Patient_id Start_date End_date Day1.......day1825

What I need

Patient_Id Start_date End_date Day1...day365

Based on each individual patients start and end date.

Any guidance will be appreciated.

Regards,

Pooja Desai

The University of Texas at Austin

Amir · Posted 10-02-2013 07:48 AM

Hi,

A few questions:

Q1) For each patient (row) is Day1 the same as Start_date? If not, how do you figure out which Day column relates to Start_date?

Q2) If you always want 365 Day columns what is End_date for?

Regards,

Amir.

Pooja · Posted 10-02-2013 10:51 AM

Hello Amir,

For each patient day1 is not the same as the start_date. I have the start_date as a separate variable which is a continuous number between 1 and 1825.

You are right I want to have exactly 365 days following start_date so end_date would be irrelevant.

Thanks!

Vince28_Statcan · Posted 10-02-2013 10:59 AM

day1 of collection would be the date to which your current variable day1 refers to. That is, you have data gathered over 1825 days or 5 years. If your study took place from 01JUNE2006 to 31MAY2011, then "day1 of collection" would be 01JUNE2006. What my code example does is take the number of days between START_DATE and this 01JUNE2006 to find which array column to begin with for each patient and simply copies the 365 following values to a new array.

I had merely used a macro variable to clearly depict that you need to know what date "day1" variable in your dataset refers to. You could very well just hard type that date in the intck function's 3rd parameter.

Vincent

Vince28_Statcan · Posted 10-02-2013 08:52 AM

I assume you know exactly what date is equivalent to day1 of collection. In order to appropriately build the sub-array, I would go as follow

%let collection_day1_date=01MAY2011;

data want;

set have;

array day{*} day1-day1825;

array newday{*} newday1-newday365;

index=intck('day', start_date, "&collection_day1_date"d);

do i=1 to 365;

newday{i}=day{index+i};

end;

drop day: index i;

run;

Pooja · Posted 10-02-2013 10:48 AM

Hello Vince,

Thanks for your code. What exactly do you mean by day1 of collection? I know which day the 365 day period for each patient starts and which day it ends. Please could you clarify?

day{i} would not necessarily be 0 for days outside of the start_date and end_date. I need a particular snapshot of the data and hence I am trying to get 365 out of the 1825 days.

Thanks!

Reeza · Posted 10-02-2013 10:52 AM

So the start and end date represent the 365 days you want?

What date does day one correspond to?

Pooja · Posted 10-02-2013 10:56 AM

I had data over a longer period so day1 is the day they indexed on the medication. However I need data only over a particular 365 days snapshot. Yes start_day and end_day are continuous numbers between 1 and 1825 and represent the 365 day period over which I need data. The final product I am hoping for is a new array newday1-newday365.

Thanks!

Vince28_Statcan · Posted 10-02-2013 11:02 AM

O wait, start_date and end_date are not actual date values but simply values ranging from 1-1825? If so, it is even more straight forward as you can use start_date to index day{} directly

data want;

set have;

array day{*} day1-day1825;

array newday{*} newday1-newday365;

do i=1 to 365;

newday{i}=day{start_date+i-1};

end;

drop day: i;

run;

Pooja · Posted 10-02-2013 11:08 AM

Thanks for the clarification Vincent. I used the code but for the newday array I get all missing values. All the 365 columns for all the patients are missing.

Reeza · Posted 10-02-2013 11:11 AM

Can you post some sample data. Vince's code should have worked, so there's something else missing.

Was there anything else in the log?

For example is this how your data looks like?

Patient_id

Start_date

End_date

Day1.......day1825

111 2 367 0 0 0 1 2.....2

Pooja · Posted 10-02-2013 11:16 AM

Reeza,

Vince's code ran with no errors. Only the newday array had missing values for all the columns. The data look exactly like how you described it.

The output dataset was like:

Patient_ID start_dat end_day day1......day1825 newday1....newday365

111 2 367 0 1 1 1 1 2 2 2 ............................

with the entire newday array missing.

Thanks!

Vince28_Statcan · Posted 10-02-2013 11:42 AM

If you copy pasted my code above and did not remove the drop statement, then it means that your variables day1-day1825 are only labeled day1-day1825 and actually bear a different name. Thus, 1825 missing values were created to populate 365 missing values, the former 1825 created were dropped but the actual appropriate columns were never used because they are not named appropriately

Had they been named day1-day1825, they would have been removed from the output dataset completely assuming you kept the drop statement.

P.s. I am leaving work at noon today so I won't be able to provide any further debugging until tomorrow if the above does not solve your issue.

Pooja · Posted 10-02-2013 12:47 PM

Oops..you're right Vince, I forgot to change the variable name!:smileyblush:

Thanks a lot for your help!

Astounding · Posted 10-02-2013 08:57 AM

How would you do this if you weren't allowed to use computers? What could you look at in the data that would tell you what you want to know?

Also, how do you know that each patient should end 365 days after starting? Would you want to confirm that by looking at the data?

New Learning Events in April