BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jojo
Obsidian | Level 7

I need to create a new variable 'cumulative'.

Within each by group, which has same n1, m1,I need to look at the p2, p12, and term, if there is any obs that p2 <= current p2, p12> current p12, then sum term. for example, the first obs, p2= 0.75,p12 = 0.1,so for all other sets in the by group (n1 = 3 m1 =16),if p2 <=0.75 and p12 > 0.1, then sum the 'term',in this example, only the 2nd obs satisfies the criteria,so cumulative= 0.3;for second obs,p2= 0.75,p12= 0.2, because no obs satisfy the criteria (p2<=0.75 and p12 >0.2) in the by group , so the cumulative = 0. 

Old data: 

Obs n1 m1 x P2 p12 Term

1 3 16 4 0.75 0.1 5

2 3 16 5 0.75 0.2 3

3 3 16 6 0.76 0.5 6

4 3 16 7 0.76 0.1 1

5 3 16 9 0.76 0.6 8

6 3 16 4 0.78 0.3 5

7 3 16 7 0.78 0.2 8

8 3 17 4 0.25 0.5 4

9 3 17 3 0.25 0.2 3

10 3 17 2 0.35 0.3 9

11 3 17 1 0.35 0.5 9

12 3 17 0 0.46 0.6 5

13 3 17 7 0.46 0.5 7

New data:

Obs n1 m1 x P2 p12l Term cumulative

1 3 16 4 0.75 0.1 5  3

2 3 16 5 0.75 0.2 3  0

3 3 16 6 0.76 0.5 6  8

4 3 16 7 0.76 0.1 1  17

5 3 16 9 0.76 0.6 8  0

6 3 16 4 0.78 0.3 5  14

7 3 16 7 0.78 0.2 8  19

8 3 17 4 0.25 0.5 4  0

9 3 17 3 0.25 0.2 3  4

10 3 17 2 0.35 0.3 9  13

11 3 17 1 0.35 0.5 9  0

12 3 17 0 0.46 0.6 5  0

13 3 17 7 0.46 0.5 7  5

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

And you feel SQL has a performance issue, then try following hash():

data want;

  if _n_=1 then do;

      if 0 then set have(rename=(term=bterm p2=bp2 p12=bp12));

         dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp2','bp12','bterm') ;

         h.definedone();

  end;

    set have;

       cumulative=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp2<=p2 and bp12>p12 then cumulative+bterm;

          rc=h.find_next();

       end;

drop b: rc;

run;

Haikuo

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

Here is one Proc SQL approach:

data have;

input Obs n1 m1 x P2 p12 Term;

cards;

1 3 16 4 0.75 0.1 5

2 3 16 5 0.75 0.2 3

3 3 16 6 0.76 0.5 6

4 3 16 7 0.76 0.1 1

5 3 16 9 0.76 0.6 8

6 3 16 4 0.78 0.3 5

7 3 16 7 0.78 0.2 8

8 3 17 4 0.25 0.5 4

9 3 17 3 0.25 0.2 3

10 3 17 2 0.35 0.3 9

11 3 17 1 0.35 0.5 9

12 3 17 0 0.46 0.6 5

13 3 17 7 0.46 0.5 7

;

proc sql noprint;

  create table want  as

  select distinct Obs ,n1, m1, x, P2, p12, case when sum(bterm)>0 then sum(bterm) else 0 end as cumulative from

  ( select distinct a.*, b.term as bterm from

            have a

            left join

            have b

            on a.n1=b.n1 and a.m1=b.m1     and b.p2<=a.p2 and b.p12>a.p12 )

group by obs;

quit;

proc print;run;

Haikuo

Haikuo
Onyx | Level 15

And you feel SQL has a performance issue, then try following hash():

data want;

  if _n_=1 then do;

      if 0 then set have(rename=(term=bterm p2=bp2 p12=bp12));

         dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp2','bp12','bterm') ;

         h.definedone();

  end;

    set have;

       cumulative=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp2<=p2 and bp12>p12 then cumulative+bterm;

          rc=h.find_next();

       end;

drop b: rc;

run;

Haikuo

jojo
Obsidian | Level 7

Thank you!

TomKari
Onyx | Level 15

I don't really understand how you're deriving your "cumulative", but here's some code that I think does some of the things you want.

Tom

DATA HAVE;
INPUT n1 m1 x P2 p12 Term;
DATALINES;
3 16 4 0.75 0.1 5
3 16 5 0.75 0.2 3
3 16 6 0.76 0.5 6
3 16 7 0.76 0.1 1
3 16 9 0.76 0.6 8
3 16 4 0.78 0.3 5
3 16 7 0.78 0.2 8
3 17 4 0.25 0.5 4
3 17 3 0.25 0.2 3
3 17 2 0.35 0.3 9
3 17 1 0.35 0.5 9
3 17 0 0.46 0.6 5
3 17 7 0.46 0.5 7
RUN;

DATA WANT;
RETAIN P2_REFERENCE P12_REFERENCE CUMULATIVE;
SET HAVE;
  BY n1 m1;
IF FIRST.m1 THEN
DO;
  P2_REFERENCE = P2;
  P12_REFERENCE = P12;
  CUMULATIVE = 0;
END;
ELSE IF P2 <= P2_REFERENCE & P12 > P12_REFERENCE THEN CUMULATIVE = CUMULATIVE + TERM;
RUN;

jojo
Obsidian | Level 7

Thank you. The reference value should be the value of p2 and p12 for the obs itself, not the first obs in the by group.

jojo
Obsidian | Level 7

this probably is the most efficient method, it works very well with a part of  my data, with the whole data, I got a message "computer is out of resources'. So I want to separate the data to several subsets of data(for example, every 5 blocks in one dataset) and try to create a macro using the code you provide, I am not familar with the hash table code, for the following statement : dcl hash h(dataset:'have(rename=(term=bterm p2=bp2 p12=bp12))', multidata:'y'); , can the 'have' be a macro variable?

Haikuo
Onyx | Level 15

  It sure can. And then you need to replace the single quote to double quote so that macro variable can resolve.

dcl hash h(dataset:"&var.(rename=(term=bterm p2=bp2 p12=bp12))", multidata:'y');

Haikuo

jojo
Obsidian | Level 7

is it possible to include the hash table while reading each observation and find the cumulative sum for each observation? I am trying to use more efficient code, I need to run this on a giant dataset that includes over 500 million obs. Specifically, can the second data step be included in the first step?

%let userN = 10;

%let p01 = 0.05;

%let p02= 0.05;

%let p11 = 0.25;

%let p12 = 0.25;

data s1_A;

  do n1 = 2 to &userN-2;

    do m1 = 2 to &userN-2 while ((m1 + n1) < &userN-2);

       

     do y1 = 0 to n1-1

           do x1 = 0 to m1-1 ;

           p11 = x1/m1;

                 p12 = y1/n1;

                

                if p11 > p12 then do;

              p11_final = (m1 * x1/m1 + n1 * y1/n1)/(m1+n1);   

              p12_final = p11_final;

              end;

                    

                 else do;

                       P11_final = p11;

              P12_final = p12;

                       end;

                 term1_p1 = pdf('BINOMIAL', x1, &p11, m1)* pdf('BINOMIAL', y1, &p12, n1);

                 term1_p0 = pdf('BINOMIAL', x1, &p01, m1)* pdf('BINOMIAL', y1, &p02, n1);

           term1_p0p1 = pdf('BINOMIAL', x1, &p01, m1)* pdf('BINOMIAL', y1, &p12, n1);

       

         output;

           end;

     end;    

  end;

end;  

run;

data s1_B;

if _n_=1 then do;

      if 0 then set s1_A(rename=(term1_p0=bt1p0 term1_p1=bt1p1 term1_p0p1=bt1p0p1 p12_final=bp12 p11_final=bp11));

         dcl hash h(dataset:'s1_A(rename=(term1_p0=bt1p0 term1_p1=bt1p1 term1_p0p1=bt1p0p1 p12_final=bp12 p11_final=bp11))', multidata:'y');

         h.definekey('n1','m1');

         h.definedata('bp12','bp11', 'bt1p0', 'bt1p1', 'bt1p0p1') ;

         h.definedone();

  end;

    set s1_A;

       ct1p0H=0;

          ct1p1H=0;

          ct1p0p1H=0;

          ct1p0L=0;

          ct1p1L=0;    

          ct1p0p1L=0;

       rc=h.find();

       do rc=0 by 0 while (rc=0);

          if bp12<=p12_final then do;

             ct1p1H + bt1p1;

                      ct1p0H + bt1p0;

                      ct1p0p1H + bt1p0p1;

                      end;

                if bp11<=p11_final then do;

             ct1p1L + bt1p1;

                      ct1p0L + bt1p0;                 

                      ct1p0p1L + bt1p0p1;

                      end;

          rc=h.find_next();        

       

       end;

drop b: rc;

if ct1p1H <=0.15 or ct1p1L <= 0.15;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2903 views
  • 6 likes
  • 3 in conversation