Solved: Re: Creating extra date columns (transpose or arrays?)

devsas · Posted 12-06-2016 11:55 AM

Thanks in advance!

Basically I want to create new columns with dates starting from date_from date and going until days_supply -1 and populate the valid dates with perday number. details below

data have;

input id $ id2 $ date_from drug $ days_supply perday;

cards;

xxxxxx1	yyyyyy1	6/24/2016	HYDROMORPHONE HCL	8	60
xxxxxx2	yyyyyy2	7/5/2016	MORPHINE SULFATE ER	14	120
xxxxxx3	yyyyyy3	7/5/2016	FENTANYL	15	60
xxxxxx4	yyyyyy4	7/5/2016	TRAMADOL HCL	14	40

;

run

I want the output to look like this-

id1	id2	drug	6/24/2016	6/25/2016	6/26/2016	6/27/2016	6/28/2016	6/29/2016	6/30/2016	7/1/2016	7/2/2016	7/3/2016	7/4/2016	7/5/2016	7/6/2016	7/7/2016	7/8/2016	7/9/2016	7/10/2016	7/11/2016	7/12/2016	7/13/2016	7/14/2016	7/15/2016	7/16/2016	7/17/2016	7/18/2016	7/19/2016
xxxxxx1	yyyyyy1	HYDROMORPHONE HCL	60	60	60	60	60	60	60	60
xxxxxx2	yyyyyy2	MORPHINE SULFATE ER												120	120	120	120	120	120	120	120	120	120	120	120	120	120
xxxxxx3	yyyyyy3	FENTANYL												60	60	60	60	60	60	60	60	60	60	60	60	60	60	60
xxxxxx4	yyyyyy4	TRAMADOL HCL												40	40	40	40	40	40	40	40	40	40	40	40	40	40

Astounding · Posted 12-06-2016 01:35 PM

Getting closer then. The variable names are still not valid because of the slashes. I would recommend putting the year first in the variable names, such as d_20160624. That way, the variable names will be in the same order as the dates they represent.

Follow-up questions to get the next step:

Do you want a variable for every date in the time period even if no meds were used on that date?

Do you know the time period, or do you need to find it based on the data?

View solution in original post

Astounding · Posted 12-06-2016 12:25 PM

You would be well advised to actually discuss your intended result. There may be much easier ways to get there.

Regarding what you asked for:

The column names you illustrated are not valid names for variables in SAS. You would have to supply different names.
The number of new variables could be in the thousands. How would that be useful to you?

devsas · Posted 12-06-2016 01:02 PM

Thanks a lot for your reply. To answer your questions, i know about that..perhaps i was thinking columns starting with d_6/24/2016 and so on. I wont have thousands of variable names, maximum 300-400 variable names for all dates in a year.

The intended result output is the same as i posted above. I tried using this code, but its not getting me much.

data want;
set have;
array i _numeric_;
do i=date_from to ((date_from + days_supply) -1);
i= perday;
end;
run;

Astounding · Posted 12-06-2016 01:35 PM

Getting closer then. The variable names are still not valid because of the slashes. I would recommend putting the year first in the variable names, such as d_20160624. That way, the variable names will be in the same order as the dates they represent.

Follow-up questions to get the next step:

Do you want a variable for every date in the time period even if no meds were used on that date?

Do you know the time period, or do you need to find it based on the data?

devsas · Posted 12-06-2016 01:42 PM

Thanks again! To answer your questions.

1. ideally, yes, but if there is a date on which none of the id's used the drug, its fine not to have that date column.

2. Based on the data-meaning start the columns from the first date which appears and end with the last date possible. For example, in the given sample, the first date column starts from 20160624 and end with 20160719.

Astounding · Posted 12-06-2016 01:53 PM

OK, I might just take the easy way out here. The results will be difficult to work with no matter what, but all the necessary dates will be there. So ...

data halfway_there;

set have;

do date = date_from to date_from + days_supply - 1;

column_header = 'd_' || put(date, yymmddn8.);

output;

end;

drop date;

run;

proc transpose data=halfway_there out=want;

by id1 id2 drug;

var days_supply;

id column_header;

run;

There may be an extra variable or two hanging around that you need to drop, and you will need to sort your data set ahead of time. But it will get you there with a minimal amount of coding.

(It's also untested. I don't expect problems, but let me know if any debugging is needed.)

devsas · Posted 12-06-2016 03:02 PM

thanks so much, it worked. var was perday number not days_supply, but that was a minor thing and yes i dropped _name_ and_label_ variables.

ballardw · Posted 12-06-2016 12:38 PM

How are you going to use the result?

There is great potential for a difficult to manage dataset with hundreds of variables.

Shmuel · Posted 12-06-2016 02:01 PM

As your final datae will hold variables for all dates created in all observations,

even if some of them are nor relevant and shall be missing value, I would prefer

calculate the minimum of date_from, the maximum of date_last, how many dates should be in the array

and create a list of variable names per date as a mcaro variable.

On next step declare the list of variables and an array to parse the per_day:

data _NULL_;

set have end=eof;

retain min_date max_date;

length var_names $1000; /* up to 800 variable names each of 12 characters long and a space as seperator */

if _N_ = 1 then do;

min_date = date_from;

max_date = date_from + days_supply;

end; else do;

min_date = min( min_date, date_from);

max_date = max(max_date, date_from + days_supply);

end;

if eof then do;

dates_count = max_date - min_date +1;

do i=min_date to max_date;

vname = cat('V_', put(i , yymmdd10.);

var_names = catx(' ', var_names , vname);

end;

PUT dates_count= min_date= date9. max_date= date9. ; /* check in LOG */

call symput('dates_count', left(dates_count));

call symput('date_min', left(min_date));;

call symput('var_names' , trim(var_names);

end;

run;

data want;

set have;

length &var_names 8; /* thus adding/declaring the variables */

array dtx {&dates_count} &var_names;

retain date_min = symget('date_min');

i1 = date_from - date_min +1;

do i = i1 to i1 + days_supply -1;

dtx(i) = perday;

end;

run;

devsas · Posted 12-06-2016 03:10 PM

Thanks for the solution, i will try this one too.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away