Hello forum,
I am trying to calculate cumulative percent for the variable prob for each of the bin level.
so for bin = 'a' with 4 counts, the cum % should be
0.2/4
(0.2+0.3)/4
and so on .. for each of the bin levels.
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
.... ;
what I would like to see is ..
bin prob cum_per
a .2 0.05
a .3 0.125
a .6 0.275
a .8 0.475
b .5 0.125
b .6 0.275
b .7 0.45
b .8 0.65
c .1 0.02
c .4 0.08
c .5 0.1
c .7 0.14
c .8 0.16
Thanks in advance for you help!
Inelegant solution, but it works:
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
.... ;
proc sql;
create table temp as
select a.bin, a.prob, count(bin) as tot
from test a
group by bin
order by bin, prob;
quit;
data want;
set temp;
retain cum_prob;
by bin;
if first.bin then cum_prob=prob/tot;
else cum_prob=cum_prob+(prob/tot);
run;
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
data want;
set test;
by bin;
if first.bin then cumsum=0;
cumsum+prob;
run;
proc sql;
create table need as
select bin,prob,cumsum/count(bin) as cum_per from want
group by bin
order by bin,prob;
quit;
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
run;
data test2;
do until (last.bin);
set test;
by bin;
if first.bin then no=0;
no + 1;
end;
do until (last.bin);
set test;
by bin;
cum_per = sum(cum_per, (prob / no));
output;
end;
run;
A SQL approach (warning: not for the best efficiency):
data have;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
proc sql;
create table want as
select *, (select sum(prob) from have where bin=a.bin and prob <= a.prob )/count(bin) as cumsum
from have a
group by bin
order by a.bin, a.prob
;
quit;
Haikuo
Hi another simple solution :
PROC SORT DATA=HAVE;
BY BIN;
RUN;
ODS LISTING CLOSE;
PROC TABULATE DATA=HAVE OUT=FREQ;
CLASS BIN / ASCENDING;
TABLE BIN;
RUN;
ODS LISTING;
DATA WANT;
MERGE TEST (IN=I) FREQ (KEEP=BIN N);
BY BIN;
IF I;
RETAIN CUM_PROB;
IF FIRST.BIN THEN CUM_PROB = 0;
CUM_PROB = CUM_PROB + PROB;
CUMM_PCT = CUM_PROB / N;
RUN;
data test; input bin $ prob; cards; a .2 a .3 a .6 a .8 b .5 b .6 b .7 b .8 c .1 c .4 c .5 c .7 c .8 ; run; data want; n=0; do until(last.bin); set test; by bin; n+1; end; sum=0; do until(last.bin); set test; by bin; sum+prob;cum_per=sum/n;output; end; drop n sum; run;
Xia Keshan
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.