Cumulative Percentage within a formatted variable

Reply
N/A
Posts: 1

Cumulative Percentage within a formatted variable

Hello forum,

I am trying to calculate cumulative percent for the variable prob for each of the bin level.

so for bin = 'a' with 4 counts, the cum % should be

0.2/4

(0.2+0.3)/4

and so on .. for each of the bin levels.

data test;

input bin $ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

.... ;

what I would like to see is ..

bin  prob  cum_per

a    .2     0.05

a    .3    0.125

a    .6    0.275

a    .8    0.475

b    .5    0.125

b    .6    0.275

b    .7    0.45

b    .8    0.65

c    .1    0.02

c    .4    0.08

c    .5    0.1

c    .7    0.14

c    .8    0.16

Thanks in advance for you help!

Super User
Posts: 19,878

Re: Cumulative Percentage within a formatted variable

Inelegant solution, but it works:

data test;

input bin $ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

.... ;

proc sql;

  create table temp as

  select a.bin, a.prob, count(bin) as tot

  from test a

  group by bin

  order by bin, prob;

quit;

data want;

set temp;

retain cum_prob;

by bin;

if first.bin then cum_prob=prob/tot;

else cum_prob=cum_prob+(prob/tot);

run;

Trusted Advisor
Posts: 1,231

Re: Cumulative Percentage within a formatted variable

data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;

data want;
set test;
by bin;
if first.bin then cumsum=0;
cumsum+prob;
run;

proc sql;
create table need as
select bin,prob,cumsum/count(bin) as cum_per from want
group by bin
order by bin,prob;
quit;

Occasional Contributor
Posts: 8

Re: Cumulative Percentage within a formatted variable

data test;

input bin $ prob;

cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

;

run;

data test2;

  do until (last.bin);

    set test;

    by bin;

    if first.bin then no=0;

    no + 1;

  end;

  do until (last.bin);

    set test;

    by bin;

    cum_per = sum(cum_per, (prob / no));

    output;

  end;

run;

Respected Advisor
Posts: 3,156

Re: Cumulative Percentage within a formatted variable

A SQL approach (warning: not for the best efficiency):

data have;

     input bin $ prob;

     cards;

a .2

a .3

a .6

a .8

b .5

b .6

b .7

b .8

c .1

c .4

c .5

c .7

c .8

;

proc sql;

     create table want as

           select *, (select sum(prob) from have where bin=a.bin and prob <= a.prob )/count(bin) as cumsum

                from have a

                     group by bin

                           order by a.bin, a.prob

     ;

quit;


Haikuo

Occasional Contributor
Posts: 8

Re: Cumulative Percentage within a formatted variable

Hi another simple solution :

PROC SORT DATA=HAVE;

     BY BIN;

RUN;

ODS LISTING CLOSE;

PROC TABULATE DATA=HAVE OUT=FREQ;

     CLASS BIN  /  ASCENDING;

     TABLE BIN;

RUN;

ODS LISTING;

DATA WANT;

MERGE TEST (IN=I) FREQ (KEEP=BIN N);

     BY BIN;

     IF I;

     RETAIN CUM_PROB;

     IF FIRST.BIN THEN CUM_PROB = 0;

     CUM_PROB = CUM_PROB + PROB;

     CUMM_PCT = CUM_PROB / N;

RUN;

Super User
Posts: 10,047

Re: Cumulative Percentage within a formatted variable

data test;
input bin $ prob;
cards;
a .2
a .3
a .6
a .8
b .5
b .6
b .7
b .8
c .1
c .4
c .5
c .7
c .8
;
run;
data want;
n=0;
 do until(last.bin);
  set test;
  by bin;
  n+1;
 end;
sum=0;
 do until(last.bin);
  set test;
  by bin;
  sum+prob;cum_per=sum/n;output;
 end;
drop n sum;
run;

Xia Keshan

Ask a Question
Discussion stats
  • 6 replies
  • 396 views
  • 0 likes
  • 7 in conversation