Help with vertical processing in SAS

Accepted Solution Solved
Reply
Super Contributor
Posts: 371
Accepted Solution

Help with vertical processing in SAS

Hi Everyone,

I have a problem with the below dataset:

Time     Price     Level

1     5     9

2     2     3

3     8     5

4     10     5

5     12     6

For each observation, I want to find the first future observation That:

Price(t) > Current Level

Example, for the first observation, the level is 9.

So the First future price greater than 9 belong to observation 4 (price=10)

For the second observation (level=3), the first future price reported should be the 3rd observation where price equal to 8.

The output file should be:

Time     Price     Level          First obsevation with price greater than level

1     5     9          4

2     2     3          3

3     8     5          10

4     10     5

5     12     6          

Thank you so much for your help.

HC


Accepted Solutions
Solution
‎05-23-2013 09:07 AM
Super Contributor
Posts: 282

Re: Help with vertical processing in SAS

Hi,

I'm not sure I fully understood your example output, I thought the 10 would be 4 and I think the "time" column is what you refer to as the "observation" column.

Were you looking for something like the following:

data have;

  input time price level;

  datalines;

1     5     9

2     2     3

3     8     5

4     10    5

5     12    6

;

data want(keep=time price level greater);

  set have nobs=totalobs;

  current_level=level;

  found        =0;

  greater      =0;

  n            =_n_;

  do while(not found and n ne totalobs);

    n=n+1;

    set have(rename=(time=time2 price=price2 level=leve2)) point=n;

    if price2 gt current_level then

    do;

      found=1;

      greater=time2;

    end;

  end;

  output;

run;

Regards,

Amir.

View solution in original post


All Replies
Solution
‎05-23-2013 09:07 AM
Super Contributor
Posts: 282

Re: Help with vertical processing in SAS

Hi,

I'm not sure I fully understood your example output, I thought the 10 would be 4 and I think the "time" column is what you refer to as the "observation" column.

Were you looking for something like the following:

data have;

  input time price level;

  datalines;

1     5     9

2     2     3

3     8     5

4     10    5

5     12    6

;

data want(keep=time price level greater);

  set have nobs=totalobs;

  current_level=level;

  found        =0;

  greater      =0;

  n            =_n_;

  do while(not found and n ne totalobs);

    n=n+1;

    set have(rename=(time=time2 price=price2 level=leve2)) point=n;

    if price2 gt current_level then

    do;

      found=1;

      greater=time2;

    end;

  end;

  output;

run;

Regards,

Amir.

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS

Thank you so much, Amir.

The output is exactly what I need.

Have a nice day.

HC

Super User
Super User
Posts: 6,308

Re: Help with vertical processing in SAS

You can take advantage of the flexibility of the DO loop syntax to make the program simpler.

data want (keep=time price level greater);

  set have nobs=totalobs;

  do n=_n_+1 to totalobs until (found) ;

    set have(keep=time price rename=(time=greater price=price2)) point=n;

    found = (price2 > level) ;

  end;

  if not found then greater=0;

run;

Super Contributor
Posts: 282

Re: Help with vertical processing in SAS

Hi,

Thanks for the tip. I had a few attempts at the code which is why there are some redundant bits, e.g. output and current_level, but in the end your code is obviously more succinct.

I also tried to use the firstobs data set option, but it wouldn't accept a data step variable.

Thanks,

Amir.

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS

Hi Tom,

I like your solution alot.

However, I dont understand how SAS reads and execute your code from one line to the other.

In a effort to simplify your code, I make a file K as below. Still I dont understand how SAS work on both file have and K at the same time to get the output.

data K; set have;

rename time=greater price=price2;

data want3;

set have nobs=totalobs;

do n=_n_+1 to totalobs until (found);

       set K point=n;

       found = (price2>level);

end;

if not found then greater=0;

run;

I very much appreciate if you could explain step by step your code.

Thank you and have a nice weekend.

HC

Super User
Super User
Posts: 6,308

Re: Help with vertical processing in SAS

Let's peel this like an onion.

Outer Layer:    SAS will loop through the data step incrementing the automatic variable _N_ each time.  It will stop when the SET statement reads past the end of the file.  Because there are no OUTPUT statements in the datastep SAS will automatically write the observation when gets to the end of the data step.

Now let's look at the DO loop.

The DO N=_N_+1 to TOTALOBS part will iterate from the next observation ( hence the plus one) to the last observation (this is known because we used the NOBS option on the SET statement above.  BUT the UNTIL (FOUND) clause will be tested at the end of the loop and could stop the looping before N reaches TOTALOBS. It will stop when the FOUND variable is TRUE.

Inside the loop.

The SET statement will use direct access to load observation number N from the dataset (WANT in the original or your exact copy in your updated version) because of the use of the POINT= option.  Note that this SET statement can never read past the end of the file because of the boundaries on N because of the DO loop.  The reason why you NEED the KEEP (or equivalent DROP) dataset option on the dataset in this SET statement is that you do NOT want to read in LEVEL and overwrite the value of LEVEL that was read by the SET statement before the DO loop started.  The reason why it is better to use the same dataset as used in the other SET statement is to eliminate the risk that the duplicate is not an exact duplicate.  SAS has no trouble reading from the same dataset multiple times. Each SET statement will keep track of where it is in reading, although in this case the second doesn't need to as we use the POINT= option to tell it where to read.

The assignment statement will set the variable FOUND to the value of the comparison of PRICE2 and LEVEL .  When the condition is true (that is PRICE2 is greater than LEVEL) then FOUND will be set to TRUE otherwise it will be set to FALSE.  In reality FOUND will be set to 1 when PRICE2 is greater than LEVEL and and 0 otherwise.

After the loop we need to reset GREATER to zero if we did not find any prices over the value of LEVEL.  This is because each time we SET the dataset with POINT=N inside the DO loop we are reading in the value of PRICE2 and GREATER, even when that value of PRICE2 is NOT greater than LEVEL.  When we FOUND the right version then GREATER will have the right value, but if we did NOT find one then GREATER will just have the value from the last observation.  So we want to set it to 0.

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS

Thank you so much, Tom.

Your explanation helps me a lot.

Base on your explanation, if I am the machine, below is how I will execute your order line by line.

What I feel confuse is the position of DO. Does this DO apply to file K or file HAVE.

If it applied to K, I thought (I know it is wrong), logically, it should be below the SET K statement. For example, if this obs in K does not meet the condition to FOUND, go to the next obs in K.

Thank you again.

HHC

Data k; set have;

keep time price;

rename time=greater price=price2;run;

data want;                                 *Create dataset want;

set have nobs=totalobs;                    *Open dataset Have, work on all obs; START at Obs=1;

do n=_n_+1 to totalobs until (found);       *On dataset K below, start from obs n+1=1+1=2 till found; IS THAT RIGHT?

set k point=n;                             * Open dataset K, start at obs =1, dictated by the first SET statement;

found = (price2>level);                    * compare price2 in K with level in HAVE; IF Found, then COMBINE OBS=1 in Have with the OBS in K;

end  ;

if

not found then greater=0;

run

;


Super User
Super User
Posts: 6,308

Re: Help with vertical processing in SAS

Why not just put in some PUT statements and watch what is happening?

692  data want (keep=time price level greater);

693    put "START of STEP: " (_N_ N time price level found greater price2) (=);

694    set have nobs=totalobs;

695    put "AFTER  SET#1 : " (_N_ N time price level found greater price2) (=);

696    do n=_n_+1 to totalobs until (found) ;

697    put "BEFORE SET#2 : " (_N_ N time price level found greater price2) (=);

698      set have(keep=time price rename=(time=greater price=price2)) point=n;

699    put "AFTER  SET#2 : " (_N_ N time price level found greater price2) (=);

700      found = (price2 > level) ;

701    put "AFTER  TEST  : " (_N_ N time price level found greater price2) (=);

702    end;

703    put "AFTER DO Loop: " (_N_ N time price level found greater price2) (=);

704    if not found then greater=0;

705    put "END of STEP  : " (_N_ N time price level found greater price2) (=);

706    put ;

707  run;

START of STEP: _N_=1 N=0 time=. price=. level=. found=. greater=. price2=.

AFTER  SET#1 : _N_=1 N=0 time=1 price=5 level=9 found=. greater=. price2=.

BEFORE SET#2 : _N_=1 N=2 time=1 price=5 level=9 found=. greater=. price2=.

AFTER  SET#2 : _N_=1 N=2 time=1 price=5 level=9 found=. greater=2 price2=2

AFTER  TEST  : _N_=1 N=2 time=1 price=5 level=9 found=0 greater=2 price2=2

BEFORE SET#2 : _N_=1 N=3 time=1 price=5 level=9 found=0 greater=2 price2=2

AFTER  SET#2 : _N_=1 N=3 time=1 price=5 level=9 found=0 greater=3 price2=8

AFTER  TEST  : _N_=1 N=3 time=1 price=5 level=9 found=0 greater=3 price2=8

BEFORE SET#2 : _N_=1 N=4 time=1 price=5 level=9 found=0 greater=3 price2=8

AFTER  SET#2 : _N_=1 N=4 time=1 price=5 level=9 found=0 greater=4 price2=10

AFTER  TEST  : _N_=1 N=4 time=1 price=5 level=9 found=1 greater=4 price2=10

AFTER DO Loop: _N_=1 N=4 time=1 price=5 level=9 found=1 greater=4 price2=10

END of STEP  : _N_=1 N=4 time=1 price=5 level=9 found=1 greater=4 price2=10

START of STEP: _N_=2 N=4 time=1 price=5 level=9 found=. greater=4 price2=10

AFTER  SET#1 : _N_=2 N=4 time=2 price=2 level=3 found=. greater=4 price2=10

BEFORE SET#2 : _N_=2 N=3 time=2 price=2 level=3 found=. greater=4 price2=10

AFTER  SET#2 : _N_=2 N=3 time=2 price=2 level=3 found=. greater=3 price2=8

AFTER  TEST  : _N_=2 N=3 time=2 price=2 level=3 found=1 greater=3 price2=8

AFTER DO Loop: _N_=2 N=3 time=2 price=2 level=3 found=1 greater=3 price2=8

END of STEP  : _N_=2 N=3 time=2 price=2 level=3 found=1 greater=3 price2=8

START of STEP: _N_=3 N=3 time=2 price=2 level=3 found=. greater=3 price2=8

AFTER  SET#1 : _N_=3 N=3 time=3 price=8 level=5 found=. greater=3 price2=8

BEFORE SET#2 : _N_=3 N=4 time=3 price=8 level=5 found=. greater=3 price2=8

AFTER  SET#2 : _N_=3 N=4 time=3 price=8 level=5 found=. greater=4 price2=10

AFTER  TEST  : _N_=3 N=4 time=3 price=8 level=5 found=1 greater=4 price2=10

AFTER DO Loop: _N_=3 N=4 time=3 price=8 level=5 found=1 greater=4 price2=10

END of STEP  : _N_=3 N=4 time=3 price=8 level=5 found=1 greater=4 price2=10

START of STEP: _N_=4 N=4 time=3 price=8 level=5 found=. greater=4 price2=10

AFTER  SET#1 : _N_=4 N=4 time=4 price=10 level=5 found=. greater=4 price2=10

BEFORE SET#2 : _N_=4 N=5 time=4 price=10 level=5 found=. greater=4 price2=10

AFTER  SET#2 : _N_=4 N=5 time=4 price=10 level=5 found=. greater=5 price2=12

AFTER  TEST  : _N_=4 N=5 time=4 price=10 level=5 found=1 greater=5 price2=12

AFTER DO Loop: _N_=4 N=5 time=4 price=10 level=5 found=1 greater=5 price2=12

END of STEP  : _N_=4 N=5 time=4 price=10 level=5 found=1 greater=5 price2=12

START of STEP: _N_=5 N=5 time=4 price=10 level=5 found=. greater=5 price2=12

AFTER  SET#1 : _N_=5 N=5 time=5 price=12 level=6 found=. greater=5 price2=12

AFTER DO Loop: _N_=5 N=6 time=5 price=12 level=6 found=. greater=5 price2=12

END of STEP  : _N_=5 N=6 time=5 price=12 level=6 found=. greater=0 price2=12

START of STEP: _N_=6 N=6 time=5 price=12 level=6 found=. greater=0 price2=12

NOTE: There were 5 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.WANT has 5 observations and 4 variables.

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS

That's awesome!

I learn a lot from your post, Tom :-)

Thank you,

HC

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS


data want (keep=time price level p2 t2);
set have nobs=totalobs;
found=0;
greater=0;
p2=0;
t2=0;
i+1;
do j=i+1 to min(nobs, i+12) until (found=1);
set have(keep=time price rename=(time=t2 price=p2)) point=j nobs=nobs;

if p2>level then do;
found=1;
end;
end;
run;

Respected Advisor
Posts: 3,124

Re: Help with vertical processing in SAS

Coding wise, Proc SQL is pretty straightforward; while efficiency wise, Hash() is better:

data have;

input Time Price Level;

cards;

1 5 9

2 2 3

3 8 5

4 10 5

5 12 6

;

/*Proc SQL*/

proc sql;

  create table want_sql as

  select a.*, b.time as b_time from have a

left join have b

on a.time < b.time

and b.price > a.level

group by a.time

having b.time=min(b.time);

quit;

/*Hash*/

data want_hash;

  if _n_=1 then do;

  if 0 then set have (rename=(time=b_time price=_price) drop=level);

declare hash h(dataset:'have (rename=(time=b_time price=_price) drop=level)', ordered:'a');

h.definekey('b_time');

h.definedata(all:'y');

h.definedone();

declare hiter hi('h');

  end;

set have;

  do rc=hi.setcur(key:time) by 0 while (rc=0);

rc=hi.next();

if _price> level then leave;

end;

if rc ne 0 then call missing(b_time);

drop _price rc;

run;

Haikuo

Super Contributor
Posts: 371

Re: Help with vertical processing in SAS

Thank you, Haikuo.

That sql is very clear.

Hc

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 552 views
  • 8 likes
  • 4 in conversation