Thanks in advance!
Basically I want to create new columns with dates starting from date_from date and going until days_supply -1 and populate the valid dates with perday number. details below
data have;
input id $ id2 $ date_from drug $ days_supply perday;
cards;
xxxxxx1 | yyyyyy1 | 6/24/2016 | HYDROMORPHONE HCL | 8 | 60 |
xxxxxx2 | yyyyyy2 | 7/5/2016 | MORPHINE SULFATE ER | 14 | 120 |
xxxxxx3 | yyyyyy3 | 7/5/2016 | FENTANYL | 15 | 60 |
xxxxxx4 | yyyyyy4 | 7/5/2016 | TRAMADOL HCL | 14 | 40 |
;
run
I want the output to look like this-
id1 | id2 | drug | 6/24/2016 | 6/25/2016 | 6/26/2016 | 6/27/2016 | 6/28/2016 | 6/29/2016 | 6/30/2016 | 7/1/2016 | 7/2/2016 | 7/3/2016 | 7/4/2016 | 7/5/2016 | 7/6/2016 | 7/7/2016 | 7/8/2016 | 7/9/2016 | 7/10/2016 | 7/11/2016 | 7/12/2016 | 7/13/2016 | 7/14/2016 | 7/15/2016 | 7/16/2016 | 7/17/2016 | 7/18/2016 | 7/19/2016 |
xxxxxx1 | yyyyyy1 | HYDROMORPHONE HCL | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | ||||||||||||||||||
xxxxxx2 | yyyyyy2 | MORPHINE SULFATE ER | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | ||||||||||||
xxxxxx3 | yyyyyy3 | FENTANYL | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | |||||||||||
xxxxxx4 | yyyyyy4 | TRAMADOL HCL | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
Getting closer then. The variable names are still not valid because of the slashes. I would recommend putting the year first in the variable names, such as d_20160624. That way, the variable names will be in the same order as the dates they represent.
Follow-up questions to get the next step:
Do you want a variable for every date in the time period even if no meds were used on that date?
Do you know the time period, or do you need to find it based on the data?
You would be well advised to actually discuss your intended result. There may be much easier ways to get there.
Regarding what you asked for:
Thanks a lot for your reply. To answer your questions, i know about that..perhaps i was thinking columns starting with d_6/24/2016 and so on. I wont have thousands of variable names, maximum 300-400 variable names for all dates in a year.
The intended result output is the same as i posted above. I tried using this code, but its not getting me much.
data want;
set have;
array i _numeric_;
do i=date_from to ((date_from + days_supply) -1);
i= perday;
end;
run;
Getting closer then. The variable names are still not valid because of the slashes. I would recommend putting the year first in the variable names, such as d_20160624. That way, the variable names will be in the same order as the dates they represent.
Follow-up questions to get the next step:
Do you want a variable for every date in the time period even if no meds were used on that date?
Do you know the time period, or do you need to find it based on the data?
Thanks again! To answer your questions.
1. ideally, yes, but if there is a date on which none of the id's used the drug, its fine not to have that date column.
2. Based on the data-meaning start the columns from the first date which appears and end with the last date possible. For example, in the given sample, the first date column starts from 20160624 and end with 20160719.
OK, I might just take the easy way out here. The results will be difficult to work with no matter what, but all the necessary dates will be there. So ...
data halfway_there;
set have;
do date = date_from to date_from + days_supply - 1;
column_header = 'd_' || put(date, yymmddn8.);
output;
end;
drop date;
run;
proc transpose data=halfway_there out=want;
by id1 id2 drug;
var days_supply;
id column_header;
run;
There may be an extra variable or two hanging around that you need to drop, and you will need to sort your data set ahead of time. But it will get you there with a minimal amount of coding.
(It's also untested. I don't expect problems, but let me know if any debugging is needed.)
thanks so much, it worked. var was perday number not days_supply, but that was a minor thing and yes i dropped _name_ and_label_ variables.
How are you going to use the result?
There is great potential for a difficult to manage dataset with hundreds of variables.
As your final datae will hold variables for all dates created in all observations,
even if some of them are nor relevant and shall be missing value, I would prefer
calculate the minimum of date_from, the maximum of date_last, how many dates should be in the array
and create a list of variable names per date as a mcaro variable.
On next step declare the list of variables and an array to parse the per_day:
data _NULL_;
set have end=eof;
retain min_date max_date;
length var_names $1000; /* up to 800 variable names each of 12 characters long and a space as seperator */
if _N_ = 1 then do;
min_date = date_from;
max_date = date_from + days_supply;
end; else do;
min_date = min( min_date, date_from);
max_date = max(max_date, date_from + days_supply);
end;
if eof then do;
dates_count = max_date - min_date +1;
do i=min_date to max_date;
vname = cat('V_', put(i , yymmdd10.);
var_names = catx(' ', var_names , vname);
end;
PUT dates_count= min_date= date9. max_date= date9. ; /* check in LOG */
call symput('dates_count', left(dates_count));
call symput('date_min', left(min_date));;
call symput('var_names' , trim(var_names);
end;
run;
data want;
set have;
length &var_names 8; /* thus adding/declaring the variables */
array dtx {&dates_count} &var_names;
retain date_min = symget('date_min');
i1 = date_from - date_min +1;
do i = i1 to i1 + days_supply -1;
dtx(i) = perday;
end;
run;
Thanks for the solution, i will try this one too.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.