Not applicable
Posts: 1

# Cumulative Percentage within a formatted variable

Hello forum,

I am trying to calculate cumulative percent for the variable prob for each of the bin level.

so for bin = 'a' with 4 counts, the cum % should be

0.2/4

(0.2+0.3)/4

and so on .. for each of the bin levels.

data test;

input bin \$ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

.... ;

what I would like to see is ..

bin  prob  cum_per

a    .2     0.05

a    .3    0.125

a    .6    0.275

a    .8    0.475

b    .5    0.125

b    .6    0.275

b    .7    0.45

b    .8    0.65

c    .1    0.02

c    .4    0.08

c    .5    0.1

c    .7    0.14

c    .8    0.16

Thanks in advance for you help!

Super User
Posts: 23,758

## Re: Cumulative Percentage within a formatted variable

Inelegant solution, but it works:

data test;

input bin \$ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

.... ;

proc sql;

create table temp as

select a.bin, a.prob, count(bin) as tot

from test a

group by bin

order by bin, prob;

quit;

data want;

set temp;

retain cum_prob;

by bin;

if first.bin then cum_prob=prob/tot;

else cum_prob=cum_prob+(prob/tot);

run;

Posts: 1,270

## Re: Cumulative Percentage within a formatted variable

data test;
input bin \$ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;

data want;
set test;
by bin;
if first.bin then cumsum=0;
cumsum+prob;
run;

proc sql;
create table need as
select bin,prob,cumsum/count(bin) as cum_per from want
group by bin
order by bin,prob;
quit;

Occasional Contributor
Posts: 8

## Re: Cumulative Percentage within a formatted variable

data test;

input bin \$ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

;

run;

data test2;

do until (last.bin);

set test;

by bin;

if first.bin then no=0;

no + 1;

end;

do until (last.bin);

set test;

by bin;

cum_per = sum(cum_per, (prob / no));

output;

end;

run;

Posts: 3,167

## Re: Cumulative Percentage within a formatted variable

A SQL approach (warning: not for the best efficiency):

data have;

input bin \$ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

;

proc sql;

create table want as

select *, (select sum(prob) from have where bin=a.bin and prob <= a.prob )/count(bin) as cumsum

from have a

group by bin

order by a.bin, a.prob

;

quit;

Haikuo

Occasional Contributor
Posts: 8

## Re: Cumulative Percentage within a formatted variable

Hi another simple solution :

PROC SORT DATA=HAVE;

BY BIN;

RUN;

ODS LISTING CLOSE;

PROC TABULATE DATA=HAVE OUT=FREQ;

CLASS BIN  /  ASCENDING;

TABLE BIN;

RUN;

ODS LISTING;

DATA WANT;

MERGE TEST (IN=I) FREQ (KEEP=BIN N);

BY BIN;

IF I;

RETAIN CUM_PROB;

IF FIRST.BIN THEN CUM_PROB = 0;

CUM_PROB = CUM_PROB + PROB;

CUMM_PCT = CUM_PROB / N;

RUN;

Super User
Posts: 10,784

## Re: Cumulative Percentage within a formatted variable

```data test;
input bin \$ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
run;
data want;
n=0;
do until(last.bin);
set test;
by bin;
n+1;
end;
sum=0;
do until(last.bin);
set test;
by bin;
sum+prob;cum_per=sum/n;output;
end;
drop n sum;
run;

```

Xia Keshan

Discussion stats
• 6 replies
• 615 views
• 0 likes
• 7 in conversation