Help using Base SAS procedures

how to select the first obs after a given time?

Accepted Solution Solved
Reply
Contributor
Posts: 56
Accepted Solution

how to select the first obs after a given time?

I have a dataset of 8000 observation each with a datetime variable.

I want to create a new dataset where it only takes the

1. first observation after 9.15am each day

2. first observation after 12.30pm each day

3. first observation after 4pm each day

My dataset has 8000 obs spanning 65 days (all days are week days, no obs on weekends), and so I wish for this new data set to have only 3 obs each day, i.e the dataset would have 65*3=195 observations.

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

any advise would be much appreciated.


Accepted Solutions
Solution
‎03-21-2012 10:45 AM
Respected Advisor
Posts: 3,124

Re: how to select the first obs after a given time?

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

   set have2;

   by date datetimevar;

   if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

   if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

   if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

    set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: how to select the first obs after a given time?

What if , say, you only have 1 record after 9 and 12, is '3 record per day' a mandate or in some special case you will take 0-2 records per day? if it has to be 3, how you want them to be repeated?

Haikuo

Contributor
Posts: 56

how to select the first obs after a given time?

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

Super User
Posts: 5,072

how to select the first obs after a given time?

Along similar lines ...

If the first record of the day is at 1 p.m., should that record be selected once or twice?

The programming won't be that difficult, assuming you can sort the data by your date-time variable.  It's the objective that is critical.

Contributor
Posts: 56

Re: how to select the first obs after a given time?

If the first record of the day is at 1 p.m, then select it once. (i.e  there will only be at most 2 records for that day, 2 records if we have  some data after 4pm and only 1 record for that day if we have no obs after 4pm)

My dataset has 8000 observations each observation has 3 variables

- Trade_Date_Time (datetime variable)

- Options_Trade_Price

- Futures_Trade_Price

and is already sorted by Trade_date_time


Solution
‎03-21-2012 10:45 AM
Respected Advisor
Posts: 3,124

Re: how to select the first obs after a given time?

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

   set have2;

   by date datetimevar;

   if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

   if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

   if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

    set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

Contributor
Posts: 56

Re: how to select the first obs after a given time?

thanks, that works great, really appreciate your time, thank you so much :-)

Super Contributor
Posts: 1,636

Re: how to select the first obs after a given time?

how about:

data test ;

format date mmddyy10. time time5. datetime datetime18.;

input datetime

datetime18. ;

date=datepart(datetime);

time=timepart(datetime);

if time <="9:15"t then delete;

else if "9:15"t<time <="12:30"t then group=1;

   else if "12:30"t<time<="16:00"t then group=2;

      else group=3;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

;

proc sort;

  by date group time;

data want;

  set test;

  by date group;

  if first.group;

run;

proc print;run;

Obs          date     time              datetime    group

1     07/04/2006     9:30      04JUL06:09:30:00      1

2     07/04/2006    13:45      04JUL06:13:45:00      2

3     07/04/2006    16:35      04JUL06:16:35:00      3

4     07/06/2006     9:20      06JUL06:09:20:00      1

5     07/06/2006    14:21      06JUL06:14:21:00      2

6     07/06/2006    17:35      06JUL06:17:35:00      3

Linlin

Message was edited by: Linlin

PROC Star
Posts: 7,356

Re: how to select the first obs after a given time?

This is the same as Linlin's suggested code, thus she should get the credit for the answer, but it has one minor change and uses what I think it an easier way to document the code:

data want;

  format date mmddyy10. time time5. datetime datetime18.;

  input datetime

  datetime18. ;

  date=datepart(datetime);

  time=timepart(datetime);

  if "9:15"t <time <= "12:30"t then group=1;

   else if "12:30"t <time <= "16:00"t then group=2;

    else if "16:00"t < time then group=3;

     else group=0;

  if group >0;

  cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07jul2006:17:35

07jul2006:18:35

;

proc sort data=want;

  by date group time;

run;

data want;

  set want;

  by date group;

  if first.group;

run;

Super Contributor
Posts: 1,636

Re: how to select the first obs after a given time?

Hi Art,

Thank you! I made some other changes too.

.

Contributor
Posts: 56

Re: how to select the first obs after a given time?

thank you also, this works great as well

Respected Advisor
Posts: 4,641

how to select the first obs after a given time?

You could do this:

data periods(keep=period hbeg hend);
input hb mb he me;
hbeg = hms(hb, mb, 0);
hend = hms(he, me, 59.99);
period = _n_;
datalines;
9 15 12 29
12 30 15 59
16 0 23 59
;

data test;
input Trade_Date_Time ANYDTDTM. Options_Trade_Price Futures_Trade_Price;
format Trade_Date_Time datetime20.0;
datalines;
01012012 08:30:00  1 2
01012012 10:30:00  1 2
01012012 12:30:00  1 2
01012012 14:30:00  1 2
01012012 16:30:00  1 2
01012012 18:30:00  1 2
01012012 20:30:00  1 2
01012012 22:30:00  1 2
02012012 08:00:00  1 2
02012012 10:00:00  1 2
02012012 12:00:00  1 2
02012012 14:00:00  1 2
02012012 16:00:00  1 2
02012012 18:00:00  1 2
02012012 20:00:00  1 2
02012012 22:00:00  1 2
;

proc sql;
create table firstTimes as
select min(Trade_date_time) as firstTime, datePart(Trade_date_time) as tradeDate, period
from test inner join periods on timepart(trade_date_time) between hbeg and hend
group by calculated tradeDate, period;

create table firstTrades as
select test.*, period
from test inner join firstTimes on test.Trade_date_time=firstTime;

quit;

PG

PG
Respected Advisor
Posts: 3,124

Re: how to select the first obs after a given time?

Here is a Hash version, no need to presort, no need to create grouping variables, presumely  more efficient than the previous one that I had posted. Borrowed LinLin's sample set:

data have;

format  datetime datetime18.;

input datetime datetime18. ;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07aug2008:18:35

07aug2008:18:36

07aug2008:18:36

07aug2008:18:33

07aug2008:10:35

;

data want (keep=datetime);

  if _n_=1 then do;

     set have point=_n_;

     dcl hash h(dataset:'have', multidata:'yes', ordered: 'a');

      h.definekey('datetime');

      h.definedata('datetime');

      h.definedone();

      dcl hiter hi('h');

   end;

do _rc=hi.first() by 0 while (_rc=0);

       date=datepart(datetime);

        do  while (_rc=0);

          if timepart(datetime) >= '16:00't then

             do;

                 ct1+1;

                 if ct1 =1 then output;

             end;

            else if timepart(datetime) >='12:30't then

                 do;

                     ct2+1;

                     if ct2=1 then output;

                 end;

            else if timepart(datetime) >= '9:15't then

                 do;

                    ct3+1;

                    if ct3=1 then output;

                 end;

          _rc=hi.next();

          if date ne datepart(datetime) then leave;

      end;

     call missing(of ct1-ct3);

  end;

    stop;

run;

proc print;run;

Regards,

Haikuo

Super User
Posts: 9,662

Re: how to select the first obs after a given time?

It is my two cents.

data test ;
format datetime datetime18.;
input datetime datetime18. ;
date=datepart(datetime);
time=timepart(datetime);
cards;
04jul2006:08:15
04jul2006:09:15
04jul2006:09:30
04jul2006:12:30
04jul2006:13:45
04jul2006:16:00
04jul2006:23:59
04jul2006:12:30
04jul2006:14:00
04jul2006:16:00
04jul2006:16:35
04jul2006:18:35
06jul2006:09:00
06jul2006:09:20
06jul2006:09:35
06jul2006:12:00
06jul2006:12:30
06jul2006:14:21
06jul2006:16:00
06jul2006:17:35
06jul2006:18:35
;
run;
proc sort data=test; by date time;run;
data want(keep=datetime);
 set test;
 by date;
 retain one two three;
 if first.date then do;one=0;two=0;three=0;end;
 if (time ge '9:15:00't) then do;if not one then output;one=1;end;
 if (time ge '12:30:00't) then do;if not two then output;two=1;end;
 if (time ge '16:00:00't ) then do;if not three then output;three=1;end;
run;


Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 363 views
  • 6 likes
  • 7 in conversation