Hello forum,
I am trying to calculate cumulative percent for the variable prob for each of the bin level.
so for bin = 'a' with 4 counts, the cum % should be
0.2/4
(0.2+0.3)/4
and so on .. for each of the bin levels.
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
.... ;
what I would like to see is ..
bin prob cum_per
a .2 0.05
a .3 0.125
a .6 0.275
a .8 0.475
b .5 0.125
b .6 0.275
b .7 0.45
b .8 0.65
c .1 0.02
c .4 0.08
c .5 0.1
c .7 0.14
c .8 0.16
Thanks in advance for you help!
Inelegant solution, but it works:
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
.... ;
proc sql;
create table temp as
select a.bin, a.prob, count(bin) as tot
from test a
group by bin
order by bin, prob;
quit;
data want;
set temp;
retain cum_prob;
by bin;
if first.bin then cum_prob=prob/tot;
else cum_prob=cum_prob+(prob/tot);
run;
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
data want;
set test;
by bin;
if first.bin then cumsum=0;
cumsum+prob;
run;
proc sql;
create table need as
select bin,prob,cumsum/count(bin) as cum_per from want
group by bin
order by bin,prob;
quit;
data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
run;
data test2;
do until (last.bin);
set test;
by bin;
if first.bin then no=0;
no + 1;
end;
do until (last.bin);
set test;
by bin;
cum_per = sum(cum_per, (prob / no));
output;
end;
run;
A SQL approach (warning: not for the best efficiency):
data have;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
proc sql;
create table want as
select *, (select sum(prob) from have where bin=a.bin and prob <= a.prob )/count(bin) as cumsum
from have a
group by bin
order by a.bin, a.prob
;
quit;
Haikuo
Hi another simple solution :
PROC SORT DATA=HAVE;
BY BIN;
RUN;
ODS LISTING CLOSE;
PROC TABULATE DATA=HAVE OUT=FREQ;
CLASS BIN / ASCENDING;
TABLE BIN;
RUN;
ODS LISTING;
DATA WANT;
MERGE TEST (IN=I) FREQ (KEEP=BIN N);
BY BIN;
IF I;
RETAIN CUM_PROB;
IF FIRST.BIN THEN CUM_PROB = 0;
CUM_PROB = CUM_PROB + PROB;
CUMM_PCT = CUM_PROB / N;
RUN;
data test; input bin $ prob; cards; a .2 a .3 a .6 a .8 b .5 b .6 b .7 b .8 c .1 c .4 c .5 c .7 c .8 ; run; data want; n=0; do until(last.bin); set test; by bin; n+1; end; sum=0; do until(last.bin); set test; by bin; sum+prob;cum_per=sum/n;output; end; drop n sum; run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.