Hi all, I need help formatting my data for Proc Traj. I have steroid data covering a 1-year period, with the dose and the number of days supplied (days_supp) for each fill. I am trying to clean the dataset so that every month is recorded properly, but my data is fairly messy. Here is a small example of what I am working with (have) and what I would like it to become (want):

data have;
input id month dose days_supp;
datalines;
1 1 1 90
1 2 5 90
1 3 . .
1 4 . .
1 5 4 30
1 6 6 90
1 7 . .
1 8 . .
1 9 . .
1 10 2 30
1 11 3 30
1 12 . .
2 1 . .
2 2 5 60
2 3 2 30
2 4 . .
2 5 . .
2 6 . .
2 7 5 30
2 8 . .
2 9 5 30
2 10 5 30
2 11 5 30
2 12 5 30
;
run;
data want;
input id month dose days_supp;
datalines;
1 1 1 30
1 2 1 30
1 3 1 30
1 4 5 30
1 5 5 30
1 6 5 30
1 7 4 30
1 8 6 30
1 9 6 30
1 10 6 30
1 11 2 30
1 12 3 30
2 1 . .
2 2 5 30
2 3 5 30
2 4 2 30
2 5 . .
2 6 . .
2 7 5 30
2 8 . .
2 9 5 30
2 10 5 30
2 11 5 30
2 12 5 30
;
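run;

Basically, if someone got a 90-day supply in month 1, that dose should be present for months 1, 2, and 3. A fill that arrives while an earlier supply is still running should start once that supply runs out: for id 1, the 90-day fill in month 2 ends up covering months 4 to 6. This is what I have tried so far:

data month_lag;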
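  set un_t_month;
  /* Dose and supply from the previous one, two, and three rows */
  /* (the third lag is computed but not used below)             */
  dose_lag=lag(avg_dose);
  supp_lag=lag(days_supp);
  dose_lag2=lag(dose_lag);
  supp_lag2=lag(supp_lag);
  dose_lag3=lag(dose_lag2);
  supp_lag3=lag(supp_lag2);
  /* Previous row's supply between 35 and 61 days: carry it into this empty month */
  if 35<supp_lag<61 and _name_ ^= "dose12" and avg_dose=. then do;
    avg_dose=dose_lag;
    days_supp=supp_lag-30;
    avg_dose=(avg_dose*days_supp)/30;  /* prorate by the days left over */
  end;
  /* Previous row's supply between 61 and 91 days */
  else if 61<supp_lag<91 and _name_ not in ("dose12","dose11") and avg_dose=. then do;
    avg_dose=dose_lag;
    days_supp=supp_lag-60;
    avg_dose=(avg_dose*days_supp)/30;
  end;
  /* Supply from two rows back between 61 and 91 days */
  else if 61<supp_lag2<91 and _name_ not in ("dose12", "dose11") and avg_dose=. then do;
    avg_dose=dose_lag2;
    days_supp=supp_lag2-60;
    avg_dose=(avg_dose*days_supp)/30;
  end;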
run;

But this doesn't work when I have several large supplies next to each other (e.g. 90 days in month 1 and another 90 days in month 2). I'm also sure the code is very inefficient. Any help would be appreciated. I also have the data in transposed form, in case it's better to approach it that way.
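To make the rule concrete, the have/want example can be read as a first-in, first-out queue of 30-day blocks: each fill adds one block per 30 days supplied, and each month uses up the oldest block still waiting. Below is a minimal sketch of that reading, not a tested solution for the full data. It assumes have is sorted by id and month with one row per id per month, that days_supp is a multiple of 30 (other values are rounded to the nearest block), and that no id queues more than 36 blocks in the year. The names want_sketch, q, head, and tail are made up for illustration.

```sas
/* Sketch only: treat the fills as a first-in, first-out queue of 30-day blocks. */
/* Assumes HAVE is sorted by id and month, with one row per id per month.        */
data want_sketch;
  set have;
  by id;

  array q[36] _temporary_;   /* queued doses; temporary arrays keep their values across rows */
  retain head tail;          /* head = next block to use, tail = last block queued           */

  if first.id then do;       /* start every id with an empty queue */
    head = 1;
    tail = 0;
  end;

  /* Queue one block per 30 days supplied this month (90 days -> 3 blocks). */
  /* The min() guard keeps the index inside the array bounds.               */
  if not missing(dose) and days_supp > 0 then
    do i = 1 to min(round(days_supp / 30), dim(q) - tail);
      tail = tail + 1;
      q[tail] = dose;
    end;

  /* Use the oldest queued block for this month, if there is one. */
  if head <= tail then do;
    dose = q[head];
    days_supp = 30;
    head = head + 1;
  end;
  else call missing(dose, days_supp);

  drop i head tail;
run;
```

Because a new fill only joins the back of the queue, two 90-day fills in consecutive months simply line up one after the other, which is the case the lag-based attempt has trouble with. Supply still queued after month 12 is dropped in this sketch.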