Hi,
I'm working on a database describing medicines reimbursement. I'm counting the number of reimbursement by year, month and by ATC (a categorical variable describing "classes" of medicines).
proc sql noprint;
create table remb_month as
select month, year, ATC, count(*) as nb_remb
from remb
group by year, month, ATC
quit;
I obtain something like that :
month year ATC nb_remb
1 2006 1 1234
1 2006 2 3456
1 2006 3 7890
1 2006 5 9876
You can see there is no reimbursement of medicines of the ATC class 4 during this month. I would like to obtain a additionnal line when there are no reimbursement of a certain ATC during a given month :
month year ATC nb_remb
1 2006 1 1234
1 2006 2 3456
1 2006 3 7890
1 2006 4 0
1 2006 5 9876
Any idea of how I could do that ?
Thank you
I also regularly encounter this in my data. And I use the Double Transpose method.
My solution would be like this:
data defaults;
do month = 1 to 12;
do year = 2000 to 2015;
do ATC = 1 to &max_atc;
output;
end;
end;
end;
run;
proc sql;
create table want as
select
a.month,
a.year,
a.ATC,
case
when b.nb_remb is missing
then 0
else b.nb_remb
end as nb_remb
from
defaults as a left join
remb_month as b
on a.month = b.month and
a.year = b.year and
a.ATC = b.ATC
;
quit;
(untested)
Nice Kurt, basically the answer is you are knowing that dates should measurements are to be expected, than:
1/ build a template for all the levels that should be there (your datastep)
2/ merge that one with the results that are being delivered.
3/ Correct / analyse them NULL/missing results as being information of the join what should be there and what is there. (only the levels, info like counts excluded)
You may consider a non-SQL approach. The MEANS procedure (and her sister SUMMARY for that matter) have the PRELOADFMT option. This assumes you have a format value for each existing value of your classification variable. In combination with COMPLETETYPES you can create an output dataset that has a row for each value in the format, if its in the data or not. The other way around it also allows you to exclude values may you wish so.
Some nice uses of the PRELOADFMT option (which is not limited to MEANS/SUMMARY) can be found in this paper: http://support.sas.com/resources/papers/proceedings11/239-2011.pdf
I also regularly encounter this in my data. And I use the Double Transpose method.
Please try
data have;
input month year ATC nb_remb;
cards;
1 2006 1 1234
1 2006 2 3456
1 2006 3 7890
1 2006 5 9876
;
proc sort data=have out=dummy(keep=month year ) nodupkey;
by month year;
run;
proc sort data=have ;
by month year atc;
run;
data dummy_;
set dummy;
do atc=1 to 5;
output;
end;
run;
proc sort data=dummy_;
by month year atc;
run;
data want;
merge dummy_(in=b) have(in=a);
by month year atc;
if b;
if nb_remb=. then nb_remb=0;
run;
proc sort data=want;
by month year atc;
run;
Thanks,
Jag
try this , if your data is not already sorted then you may have to sort it:
data have;
input month year ATC nb_remb;
cards;
1 2006 1 1234
1 2006 2 3456
1 2006 3 7890
1 2006 5 9876
;
data want;
set have;
by atc;
if dif(atc)>1 then do;
month=month;year=year;atc=atc-1;nb_remb=0;
output;
set have point=_N_;output;
end;
else output;
run;
Thanks for all your answers.
I didnt know the double transpose method, it's very useful in such cases !
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.