Hi,
I have a large dataset (20 year period) in the following format:
month | numerator | denominator |
Jan-16 | 10 | 200 |
Feb-16 | 15 | 300 |
Mar-16 | 20 | 300 |
Apr-16 | 10 | 150 |
May-16 | 5 | 100 |
Jun-16 | 30 | 350 |
Jul-16 | 20 | 400 |
Aug-16 | 25 | 350 |
Sep-16 | 15 | 200 |
Oct-16 | 40 | 450 |
Nov-16 | 10 | 200 |
Dec-16 | 40 | 400 |
I need to transform to a quarterly view in the following way.
Quarter_end | Rate |
Mar-16 | 22.5% |
Jun-16 | 30.0% |
Sep-16 | 15.0% |
Dec-16 | 20.0% |
Mar-16 (quarter end) = sum of numerator in Jan-16, Feb-16 and Mar-16 divided by denominator in Jan-16 (quarter start)
= (10+15+20) / 200
The logic for the other quarters follow in the same way.
I am currently doing this in Excel but it can get messy.
Is there a more efficient way of doing this in SAS?
Thanks in advance
Same update, to take into account the year:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data denominator (keep= year quarter denominator) last_month (keep=year quarter month);
set have_q;
by year quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by year quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by year quarter;
Rate = sum_numerator / denominator;
run;
Hi @PetePatel
Here is one approach to achieve this, using a data step:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
;
run;
data have_q;
set have;
Quarter = qtr(month);
run;
data want (keep= Quarter_end Rate);
set have_q;
by Quarter;
format month monyy. Rate percent8.1;
retain numerator_sum;
retain denominator_sum;
if first.Quarter then do;
numerator_sum=0;
denominator_sum=0;
end;
numerator_sum + numerator;
if first.Quarter then denominator_sum + denominator;
Rate = numerator_sum / denominator_sum;
if last.Quarter then output;
rename month=Quarter_end;
run;
Thanks Ed, this is really close.
It works for the one year period I provided but as soon as you add in more quarters it fails.
So if I add on the below to datalines
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
Is there a way around this?
Hi @PetePatel
I have added two statements in the program to take into account the year:
Best,
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data want (keep= Quarter_end Rate);
set have_q;
by Year Quarter;
format month monyy. Rate percent8.1;
retain numerator_sum;
retain denominator_sum;
if first.Quarter then do;
numerator_sum=0;
denominator_sum=0;
end;
numerator_sum + numerator;
if first.Quarter then denominator_sum + denominator;
Rate = numerator_sum / denominator_sum;
if last.Quarter then output;
rename month=Quarter_end;
run;
Another option could be to use a proc means:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
;
run;
data have_q;
set have;
quarter = qtr(month);
run;
data denominator (keep=quarter denominator) last_month (keep=quarter month);
set have_q;
by quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by quarter;
Rate = sum_numerator / denominator;
run;
Same update, to take into account the year:
data have;
infile datalines dlm="09"x;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data have_q;
set have;
Year = year(month);
Quarter = qtr(month);
run;
data denominator (keep= year quarter denominator) last_month (keep=year quarter month);
set have_q;
by year quarter;
if first.quarter then output denominator;
if last.quarter then output last_month;
run;
proc means data=have_q sum noprint;
var numerator;
by year quarter;
output out=numerator (drop=_:) sum=sum_numerator;
run;
data want (keep= month Rate rename=(month=Quarter_end));
merge numerator denominator last_month;
format Rate percent8.1;
by year quarter;
Rate = sum_numerator / denominator;
run;
Works perfectly, thank you.
Is this for reporting purposes or do you need a SAS data set?
For reporting but required in SAS to merge onto other datasets.
Hi @PetePatel Nice to see the question has been answered in very legible terms. Good morning, Assuming your sample represents your sample and is neatly sorted by calendar month, this presents a case for nice application of groupformats akin to what we are used to in procedures. One would think, why not use a procedure, well datastep is more of a work horse than readymeals, the options plethora are staggering
data have;
infile datalines ;
input month:monyy. numerator denominator;
format month monyy.;
datalines;
Jan-16 10 200
Feb-16 15 300
Mar-16 20 300
Apr-16 10 150
May-16 5 100
Jun-16 30 350
Jul-16 20 400
Aug-16 25 350
Sep-16 15 200
Oct-16 40 450
Nov-16 10 200
Dec-16 40 400
Jan-17 40 450
Feb-17 10 200
Mar-17 40 400
;
run;
data want;
call missing(Quarter_end);
do until(last._m);
set have(rename=month=_m);
by _m groupformat;
format _m yyq6.;
Quarter_end=_m;
if first._m then _d=denominator;
_n=sum(numerator,_n);
end;
Rate=_n/_d;
format rate percent8.2 Quarter_end monyy7.;
keep Quarter_end rate;
run;
All the best!
Thank you novinosrin, I didn't know you could use a groupformat in such a way.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.