Solved: Re: how to select the first obs after a given time?

skipper · Posted 03-21-2012 10:07 AM

I have a dataset of 8000 observation each with a datetime variable.

I want to create a new dataset where it only takes the

1. first observation after 9.15am each day

2. first observation after 12.30pm each day

3. first observation after 4pm each day

My dataset has 8000 obs spanning 65 days (all days are week days, no obs on weekends), and so I wish for this new data set to have only 3 obs each day, i.e the dataset would have 65*3=195 observations.

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

any advise would be much appreciated.

Haikuo · Posted 03-21-2012 10:45 AM

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

set have2;

by date datetimevar;

if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

View solution in original post

Haikuo · Posted 03-21-2012 10:15 AM

What if , say, you only have 1 record after 9 and 12, is '3 record per day' a mandate or in some special case you will take 0-2 records per day? if it has to be 3, how you want them to be repeated?

Haikuo

skipper · Posted 03-21-2012 10:18 AM

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

Astounding · Posted 03-21-2012 10:31 AM

Along similar lines ...

If the first record of the day is at 1 p.m., should that record be selected once or twice?

The programming won't be that difficult, assuming you can sort the data by your date-time variable. It's the objective that is critical.

skipper · Posted 03-21-2012 10:37 AM

If the first record of the day is at 1 p.m, then select it once. (i.e there will only be at most 2 records for that day, 2 records if we have some data after 4pm and only 1 record for that day if we have no obs after 4pm)

My dataset has 8000 observations each observation has 3 variables

- Trade_Date_Time (datetime variable)

- Options_Trade_Price

- Futures_Trade_Price

and is already sorted by Trade_date_time

Haikuo · Posted 03-21-2012 10:45 AM

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

set have2;

by date datetimevar;

if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

skipper · Posted 03-21-2012 11:07 AM

thanks, that works great, really appreciate your time, thank you so much 🙂

Linlin · Posted 03-21-2012 10:49 AM

how about:

data test ;

format date mmddyy10. time time5. datetime datetime18.;

input datetime

datetime18. ;

date=datepart(datetime);

time=timepart(datetime);

if time <="9:15"t then delete;

else if "9:15"t<time <="12:30"t then group=1;

else if "12:30"t<time<="16:00"t then group=2;

else group=3;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

;

proc sort;

by date group time;

data want;

set test;

by date group;

if first.group;

run;

proc print;run;

Obs date time datetime group

1 07/04/2006 9:30 04JUL06:09:30:00 1

2 07/04/2006 13:45 04JUL06:13:45:00 2

3 07/04/2006 16:35 04JUL06:16:35:00 3

4 07/06/2006 9:20 06JUL06:09:20:00 1

5 07/06/2006 14:21 06JUL06:14:21:00 2

6 07/06/2006 17:35 06JUL06:17:35:00 3

Linlin

Message was edited by: Linlin

art297 · Posted 03-21-2012 11:07 AM

This is the same as Linlin's suggested code, thus she should get the credit for the answer, but it has one minor change and uses what I think it an easier way to document the code:

data want;

format date mmddyy10. time time5. datetime datetime18.;

input datetime

datetime18. ;

date=datepart(datetime);

time=timepart(datetime);

if "9:15"t <time <= "12:30"t then group=1;

else if "12:30"t <time <= "16:00"t then group=2;

else if "16:00"t < time then group=3;

else group=0;

if group >0;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07jul2006:17:35

07jul2006:18:35

;

proc sort data=want;

by date group time;

run;

data want;

set want;

by date group;

if first.group;

run;

Linlin · Posted 03-21-2012 11:22 AM

Hi Art,

Thank you! I made some other changes too.

.

skipper · Posted 03-21-2012 11:08 AM

thank you also, this works great as well

PGStats · Posted 03-21-2012 11:24 AM

You could do this:

data periods(keep=period hbeg hend);
input hb mb he me;
hbeg = hms(hb, mb, 0);
hend = hms(he, me, 59.99);
period = _n_;
datalines;
9 15 12 29
12 30 15 59
16 0 23 59
;

data test;
input Trade_Date_Time ANYDTDTM. Options_Trade_Price Futures_Trade_Price;
format Trade_Date_Time datetime20.0;
datalines;
01012012 08:30:00 1 2
01012012 10:30:00 1 2
01012012 12:30:00 1 2
01012012 14:30:00 1 2
01012012 16:30:00 1 2
01012012 18:30:00 1 2
01012012 20:30:00 1 2
01012012 22:30:00 1 2
02012012 08:00:00 1 2
02012012 10:00:00 1 2
02012012 12:00:00 1 2
02012012 14:00:00 1 2
02012012 16:00:00 1 2
02012012 18:00:00 1 2
02012012 20:00:00 1 2
02012012 22:00:00 1 2
;

proc sql;
create table firstTimes as
select min(Trade_date_time) as firstTime, datePart(Trade_date_time) as tradeDate, period
from test inner join periods on timepart(trade_date_time) between hbeg and hend
group by calculated tradeDate, period;

create table firstTrades as
select test.*, period
from test inner join firstTimes on test.Trade_date_time=firstTime;

quit;

PG

Haikuo · Posted 03-21-2012 09:06 PM

Here is a Hash version, no need to presort, no need to create grouping variables, presumely more efficient than the previous one that I had posted. Borrowed LinLin's sample set:

data have;

format datetime datetime18.;

input datetime datetime18. ;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07aug2008:18:35

07aug2008:18:36

07aug2008:18:33

07aug2008:10:35

;

data want (keep=datetime);

if _n_=1 then do;

set have point=_n_;

dcl hash h(dataset:'have', multidata:'yes', ordered: 'a');

h.definekey('datetime');

h.definedata('datetime');

h.definedone();

dcl hiter hi('h');

end;

do _rc=hi.first() by 0 while (_rc=0);

date=datepart(datetime);

do while (_rc=0);

if timepart(datetime) >= '16:00't then

do;

ct1+1;

if ct1 =1 then output;

end;

else if timepart(datetime) >='12:30't then

do;

ct2+1;

if ct2=1 then output;

end;

else if timepart(datetime) >= '9:15't then

do;

ct3+1;

if ct3=1 then output;

end;

_rc=hi.next();

if date ne datepart(datetime) then leave;

end;

call missing(of ct1-ct3);

end;

stop;

run;

proc print;run;

Regards,

Haikuo

Ksharp · Posted 03-22-2012 04:08 AM

It is my two cents.

data test ;
format datetime datetime18.;
input datetime datetime18. ;
date=datepart(datetime);
time=timepart(datetime);
cards;
04jul2006:08:15
04jul2006:09:15
04jul2006:09:30
04jul2006:12:30
04jul2006:13:45
04jul2006:16:00
04jul2006:23:59
04jul2006:12:30
04jul2006:14:00
04jul2006:16:00
04jul2006:16:35
04jul2006:18:35
06jul2006:09:00
06jul2006:09:20
06jul2006:09:35
06jul2006:12:00
06jul2006:12:30
06jul2006:14:21
06jul2006:16:00
06jul2006:17:35
06jul2006:18:35
;
run;
proc sort data=test; by date time;run;
data want(keep=datetime);
 set test;
 by date;
 retain one two three;
 if first.date then do;one=0;two=0;three=0;end;
 if (time ge '9:15:00't) then do;if not one then output;one=1;end;
 if (time ge '12:30:00't) then do;if not two then output;two=1;end;
 if (time ge '16:00:00't ) then do;if not three then output;three=1;end;
run;

Ksharp

Registration is open

SAS Training: Just a Click Away