BookmarkSubscribeRSS Feed
rmg
Calcite | Level 5 rmg
Calcite | Level 5

I'm trying to compute the number of observations that fulfil a requirement.

I have a table such as this:

opening_tradetrade_timetrade_priceopening_timeobs
1931005093000.
1934005593000.
1935005793000.
2938006093800should be 2  (93400 and 93500)


When "opening_trade" = 2, I would like to determine how many observations happened within the last 500 units (5 min) as determined with opening_time and trade_time.  In other words, the number of observations that happened from 93800 to 93800 - 500 or 93800 to 93300.  The value should be 2 under "obs" in the last row.

IF opening_trade=2

THEN

DO;

x=0;

y=0;

%LET k=1;

DO WHILE (y<500);

x=lag&k(trade_time);

y=opening_time - x;

%LET k=%SYSEVALF(&k+1);

END;

END;

The issue seems to be with the evaluation of x in the DO WHILE Loop.  I'm not sure how to fix it.  Thanks for any help you can provide!

9 REPLIES 9
Haikuo
Onyx | Level 15

of course you can do anything using Macro, but I don't know if in this case Macro is a needed call.

In you case, proc sql seems to be handy:

data have;

input opening_trade trade_time trade_price opening_time;

cards;

1 93100 50 93000

1 93400 55 93000

1 93500 57 93000

2 93800 60 93800

;

proc sql;

  create table want as

    select a.*, case when a.opening_trade=2 then count(*) else . end as obs

    from have a

       left join have b

     on a.trade_time-500 < b.trade_time < a.trade_time

   group by a.opening_trade, a.trade_time, a.trade_price, a.opening_time

;

quit;

Haikuo

rmg
Calcite | Level 5 rmg
Calcite | Level 5

Hello Haikuo,

Thanks, but I really need to do it within a loop within the same DATA Step b/c it involves analyzing terabytes of data.  Creating new tables will significantly increase the processing time and demand on resources.

Would you know how to do it using a DO WHILE Loop or another Loop structure?

RichardinOz
Quartz | Level 8

There is no difference between the SQL and your desired data step as regards to creation of another table.

Because you apparently want to reference rows before your event of interest a standard data step (which operated sequentially) is unsuitable.  You could get it to work possibly using an array of lagged values, or potentially with some hash join; or you could sort the original data in descending order so that before becomes after as far as processing is concerned.  However, sorting will require around three times the disk space of the original table.

Try the SQL.  It can handle terebytes of data.  If it runs out of space, the same problem is likely to occur with a datastep. 

Richard

rmg
Calcite | Level 5 rmg
Calcite | Level 5

Thanks for your input RichardinOz.

I spoke with a SAS instructor before getting to this point in the code.  According to her, there is a difference.  Not noted in the code above are a number of IF/THEN statements that utilize RETAIN and affect roughly 10 other Variables.  These are all computed for each row and carried forward to the next row.  Interrupting that process and creating new tables would be significantly more inefficient than using a Loop.

The issue is that I do not know what the lagged value is going to be ahead of time.  The code above is meant to discern it when the condition is met.

Its crucial to construct the code in a Loop structure.

Haikuo
Onyx | Level 15

We still don't have the big picture here. Maybe revealing more details, such as your purpose, instead of just a technique you choose to use. Since you mentioned SAS instructor, is this a homework? Anyway, if you HAVE to use do-loop, then you would need 3 passes to get what you want, that is why I choose to use proc sql at the first place,  yet the code is still creating a new table one way or the other, and it is rather awkward. The only scenario where no new table is made is to use Modify or UPdate statement, which seems way off per your situation.

data have;

input opening_trade trade_time trade_price opening_time;

cards;

1 93100 50 93000

1 93400 55 93000

1 93500 57 93000

2 93800 60 93800

;

data want;

   do until (opening_trade=2);

       set have;

   end;

   _t=trade_time;

   do until (opening_trade=2);

       set have;

    if _t-500 < trade_time < _t then _obs+1;

   end;

   do until (opening_trade=2);

   set have;

   obs=ifn(opening_trade=2,_obs,.);

   output;

   end;

  drop _:;

   run;

Good luck,

Haikuo

RichardinOz
Quartz | Level 8

You should try both solutions by Haikuo with a subset of the data, eg the first 100,000 rows:

options obs = 100000 ;

{code}

options obs = max ;

That will give you a good indication of which method is "better" and also what the likely maximum number of previous values would be encountered.  If they are about equal I would go for the SQL solution because the code is more compact and easier to maintain.

Richard

rmg
Calcite | Level 5 rmg
Calcite | Level 5

The project is doctoral research into Market Microstructure.  I'll try both and let you know if I run into any issues and how it turns out.  Thanks!

rmg
Calcite | Level 5 rmg
Calcite | Level 5

I noticed a mistake in my original post: it should be "y=opening_time - x;",  before it stated "y=opening_time;".

The Loop does not refer to "opening_time".  I adjusted it as shown below.  It seems to be working.  I'll confirm tomorrow and let you know, thanks!

data loop;

obs=0;

do until (opening_trade=2);

set tbl_1;

end;

output;

_t=opening_time;

do until (opening_trade=2);

set tbl_1;

if _t-500 < nbbo_time < _t then _obs+1;

end;

do until (opening_trade=2);

set tbl_1;

obs=ifn(opening_trade=2,_obs,.);

output;

end;

drop _:;

run;

rmg
Calcite | Level 5 rmg
Calcite | Level 5

I cannot get the above Loop code to work within the existing DATA Step.

I have the code below.  It iterates only once for some reason.  Any ideas why it does not loop or how to fix it?

IF opening_time>93000 and opening_trade ^= p_opening_trade

THEN

DO;

x=opening_time;

DO WHILE (opening_time < x <= opening_time);

%LET k=1;

x=lag&k(nbbo_time);

%LET k=%SYSEVALF(&k+1);

END;

k=&k;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3751 views
  • 6 likes
  • 3 in conversation