Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

cumulative sum by group and under some criteria

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

cumulative sum by group and under some criteria

I need to create a new variable 'cumulative'.

Within each by group, which has same n1, m1,I need to look at the p2, p12, and term, if there is any obs that p2 <= current p2, p12> current p12, then sum term. for example, the first obs, p2= 0.75,p12 = 0.1,so for all other sets in the by group (n1 = 3 m1 =16),if p2 <=0.75 and p12 > 0.1, then sum the 'term',in this example, only the 2nd obs satisfies the criteria,so cumulative= 0.3;for second obs,p2= 0.75,p12= 0.2, because no obs satisfy the criteria (p2<=0.75 and p12 >0.2) in the by group , so the cumulative = 0. 

Old data: 

Obs n1 m1 x P2 p12 Term

1 3 16 4 0.75 0.1 5

2 3 16 5 0.75 0.2 3

3 3 16 6 0.76 0.5 6

4 3 16 7 0.76 0.1 1

5 3 16 9 0.76 0.6 8

6 3 16 4 0.78 0.3 5

7 3 16 7 0.78 0.2 8

8 3 17 4 0.25 0.5 4

9 3 17 3 0.25 0.2 3

10 3 17 2 0.35 0.3 9

11 3 17 1 0.35 0.5 9

12 3 17 0 0.46 0.6 5

13 3 17 7 0.46 0.5 7

New data:

Obs n1 m1 x P2 p12l Term cumulative

1 3 16 4 0.75 0.1 5  3

2 3 16 5 0.75 0.2 3  0

3 3 16 6 0.76 0.5 6  8

4 3 16 7 0.76 0.1 1  17

5 3 16 9 0.76 0.6 8  0

6 3 16 4 0.78 0.3 5  14

7 3 16 7 0.78 0.2 8  19

8 3 17 4 0.25 0.5 4  0

9 3 17 3 0.25 0.2 3  4

10 3 17 2 0.35 0.3 9  13

11 3 17 1 0.35 0.5 9  0

12 3 17 0 0.46 0.6 5  0

13 3 17 7 0.46 0.5 7  5


Accepted Solutions
Solution
‎09-23-2012 07:01 PM
Respected Advisor
Posts: 3,156

Re: cumulative sum by group and under some criteria

And you feel SQL has a performance issue, then try following hash():

data want;

  if _n_=1 then do;

      if 0 then set have(rename=(term=bterm p2=bp2 p12=bp12));

         dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp2','bp12','bterm') ;

         h.definedone();

  end;

    set have;

       cumulative=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp2<=p2 and bp12>p12 then cumulative+bterm;

          rc=h.find_next();

       end;

drop b: rc;

run;

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: cumulative sum by group and under some criteria

Here is one Proc SQL approach:

data have;

input Obs n1 m1 x P2 p12 Term;

cards;

1 3 16 4 0.75 0.1 5

2 3 16 5 0.75 0.2 3

3 3 16 6 0.76 0.5 6

4 3 16 7 0.76 0.1 1

5 3 16 9 0.76 0.6 8

6 3 16 4 0.78 0.3 5

7 3 16 7 0.78 0.2 8

8 3 17 4 0.25 0.5 4

9 3 17 3 0.25 0.2 3

10 3 17 2 0.35 0.3 9

11 3 17 1 0.35 0.5 9

12 3 17 0 0.46 0.6 5

13 3 17 7 0.46 0.5 7

;

proc sql noprint;

  create table want  as

  select distinct Obs ,n1, m1, x, P2, p12, case when sum(bterm)>0 then sum(bterm) else 0 end as cumulative from

  ( select distinct a.*, b.term as bterm from

            have a

            left join

            have b

            on a.n1=b.n1 and a.m1=b.m1     and b.p2<=a.p2 and b.p12>a.p12 )

group by obs;

quit;

proc print;run;

Haikuo

Solution
‎09-23-2012 07:01 PM
Respected Advisor
Posts: 3,156

Re: cumulative sum by group and under some criteria

And you feel SQL has a performance issue, then try following hash():

data want;

  if _n_=1 then do;

      if 0 then set have(rename=(term=bterm p2=bp2 p12=bp12));

         dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp2','bp12','bterm') ;

         h.definedone();

  end;

    set have;

       cumulative=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp2<=p2 and bp12>p12 then cumulative+bterm;

          rc=h.find_next();

       end;

drop b: rc;

run;

Haikuo

Contributor
Posts: 50

Re: cumulative sum by group and under some criteria

Thank you!

PROC Star
Posts: 1,167

Re: cumulative sum by group and under some criteria

I don't really understand how you're deriving your "cumulative", but here's some code that I think does some of the things you want.

Tom

DATA HAVE;
INPUT n1 m1 x P2 p12 Term;
DATALINES;
3 16 4 0.75 0.1 5
3 16 5 0.75 0.2 3
3 16 6 0.76 0.5 6
3 16 7 0.76 0.1 1
3 16 9 0.76 0.6 8
3 16 4 0.78 0.3 5
3 16 7 0.78 0.2 8
3 17 4 0.25 0.5 4
3 17 3 0.25 0.2 3
3 17 2 0.35 0.3 9
3 17 1 0.35 0.5 9
3 17 0 0.46 0.6 5
3 17 7 0.46 0.5 7
RUN;

DATA WANT;
RETAIN P2_REFERENCE P12_REFERENCE CUMULATIVE;
SET HAVE;
  BY n1 m1;
IF FIRST.m1 THEN
DO;
  P2_REFERENCE = P2;
  P12_REFERENCE = P12;
  CUMULATIVE = 0;
END;
ELSE IF P2 <= P2_REFERENCE & P12 > P12_REFERENCE THEN CUMULATIVE = CUMULATIVE + TERM;
RUN;

Contributor
Posts: 50

Re: cumulative sum by group and under some criteria

Thank you. The reference value should be the value of p2 and p12 for the obs itself, not the first obs in the by group.

Contributor
Posts: 50

Re: cumulative sum by group and under some criteria

this probably is the most efficient method, it works very well with a part of  my data, with the whole data, I got a message "computer is out of resources'. So I want to separate the data to several subsets of data(for example, every 5 blocks in one dataset) and try to create a macro using the code you provide, I am not familar with the hash table code, for the following statement : dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y'); , can the 'have' be a macro variable?

Respected Advisor
Posts: 3,156

Re: cumulative sum by group and under some criteria

  It sure can. And then you need to replace the single quote to double quote so that macro variable can resolve.

dcl hash h(dataset:"&var.(rename=(term=bterm p2=bp2 p12=bp12))", multidata:'y');

Haikuo

Contributor
Posts: 50

Re: cumulative sum by group and under some criteria

is it possible to include the hash table while reading each observation and find the cumulative sum for each observation? I am trying to use more efficient code, I need to run this on a giant dataset that includes over 500 million obs. Specifically, can the second data step be included in the first step?

%let userN = 10;

%let p01 = 0.05;

%let p02= 0.05;

%let p11 = 0.25;

%let p12 = 0.25;

data s1_A;

  do n1 = 2 to &userN-2;

    do m1 = 2 to &userN-2 while ((m1 + n1) < &userN-2);

       

     do y1 = 0 to n1-1

           do x1 = 0 to m1-1 ;

           p11 = x1/m1;

                 p12 = y1/n1;

                

                if p11 > p12 then do;

              p11_final = (m1 * x1/m1 + n1 * y1/n1)/(m1+n1);   

              p12_final = p11_final;

              end;

                    

                 else do;

                       P11_final = p11;

              P12_final = p12;

                       end;

                 term1_p1 = pdf('BINOMIAL', x1, &p11, m1)* pdf('BINOMIAL', y1, &p12, n1);

                 term1_p0 = pdf('BINOMIAL', x1, &p01, m1)* pdf('BINOMIAL', y1, &p02, n1);

           term1_p0p1 = pdf('BINOMIAL', x1, &p01, m1)* pdf('BINOMIAL', y1, &p12, n1);

       

         output;

           end;

     end;    

  end;

end;  

run;

data s1_B;

if _n_=1 then do;

      if 0 then set s1_A(rename=(term1_p0=bt1p0 term1_p1=bt1p1 term1_p0p1=bt1p0p1 p12_final=bp12 p11_final=bp11));

         dcl hash h(dataset:'s1_A(rename=(term1_p0=bt1p0 term1_p1=bt1p1 term1_p0p1=bt1p0p1 p12_final=bp12 p11_final=bp11))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp12','bp11', 'bt1p0', 'bt1p1', 'bt1p0p1') ;

         h.definedone();

  end;

    set s1_A;

       ct1p0H=0;

          ct1p1H=0;

          ct1p0p1H=0;

          ct1p0L=0;

          ct1p1L=0;    

          ct1p0p1L=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp12<=p12_final then do;

             ct1p1H + bt1p1;

                      ct1p0H + bt1p0;

                      ct1p0p1H + bt1p0p1;

                      end;

                if bp11<=p11_final then do;

             ct1p1L + bt1p1;

                      ct1p0L + bt1p0;                 

                      ct1p0p1L + bt1p0p1;

                      end;

          rc=h.find_next();        

       

       end;

drop b: rc;

if ct1p1H <=0.15 or ct1p1L <= 0.15;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 1328 views
  • 6 likes
  • 3 in conversation