BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
skipper
Calcite | Level 5

I have a dataset of 8000 observation each with a datetime variable.

I want to create a new dataset where it only takes the

1. first observation after 9.15am each day

2. first observation after 12.30pm each day

3. first observation after 4pm each day

My dataset has 8000 obs spanning 65 days (all days are week days, no obs on weekends), and so I wish for this new data set to have only 3 obs each day, i.e the dataset would have 65*3=195 observations.

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

any advise would be much appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

   set have2;

   by date datetimevar;

   if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

   if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

   if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

    set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

View solution in original post

13 REPLIES 13
Haikuo
Onyx | Level 15

What if , say, you only have 1 record after 9 and 12, is '3 record per day' a mandate or in some special case you will take 0-2 records per day? if it has to be 3, how you want them to be repeated?

Haikuo

skipper
Calcite | Level 5

if say, there was no more records after 4pm, then yes that day would just have 2 records. So no it does not have to be 3 records per day.

Astounding
PROC Star

Along similar lines ...

If the first record of the day is at 1 p.m., should that record be selected once or twice?

The programming won't be that difficult, assuming you can sort the data by your date-time variable.  It's the objective that is critical.

skipper
Calcite | Level 5

If the first record of the day is at 1 p.m, then select it once. (i.e  there will only be at most 2 records for that day, 2 records if we have  some data after 4pm and only 1 record for that day if we have no obs after 4pm)

My dataset has 8000 observations each observation has 3 variables

- Trade_Date_Time (datetime variable)

- Options_Trade_Price

- Futures_Trade_Price

and is already sorted by Trade_date_time


Haikuo
Onyx | Level 15

I am sure there must be better approaches, effiency wise. But for the time being, try this, since it is not tested, error could happen:

/*to make a group variable by day*/

data have2;

set have2;

date=datepart(Trade_Date_Time);

run;

/*sorting by day and time*/

proc sort data=have2;

by date Trade_Date_Time;

run;

data want (drop=date n1-n3);

/*to get the first row no. after criteria point*/

do _n_=1 by 1 until (last.date);

   set have2;

   by date datetimevar;

   if timepart(Trade_Date_Time)>= '9:15:00't then n1=coalesce(n1,_n_);

   if timepart(Trade_Date_Time)>='12:30:00't then n2=coalesce(n2,_n_);

   if timepart(Trade_Date_Time)>='16:00:00't then n3=coalesce(n3,_n_);

end;

/*output the results*/

do _n_=1 by 1 until (last.date);

    set have2;

by date Trade_Date_Time;

if _n_=n1 then output;

else if _n_=n2 then output;

else if _n_=n3 then output;

end;

run;

Regards,

Haikuo

skipper
Calcite | Level 5

thanks, that works great, really appreciate your time, thank you so much 🙂

Linlin
Lapis Lazuli | Level 10

how about:

data test ;

format date mmddyy10. time time5. datetime datetime18.;

input datetime

datetime18. ;

date=datepart(datetime);

time=timepart(datetime);

if time <="9:15"t then delete;

else if "9:15"t<time <="12:30"t then group=1;

   else if "12:30"t<time<="16:00"t then group=2;

      else group=3;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

;

proc sort;

  by date group time;

data want;

  set test;

  by date group;

  if first.group;

run;

proc print;run;

Obs          date     time              datetime    group

1     07/04/2006     9:30      04JUL06:09:30:00      1

2     07/04/2006    13:45      04JUL06:13:45:00      2

3     07/04/2006    16:35      04JUL06:16:35:00      3

4     07/06/2006     9:20      06JUL06:09:20:00      1

5     07/06/2006    14:21      06JUL06:14:21:00      2

6     07/06/2006    17:35      06JUL06:17:35:00      3

Linlin

Message was edited by: Linlin

art297
Opal | Level 21

This is the same as Linlin's suggested code, thus she should get the credit for the answer, but it has one minor change and uses what I think it an easier way to document the code:

data want;

  format date mmddyy10. time time5. datetime datetime18.;

  input datetime

  datetime18. ;

  date=datepart(datetime);

  time=timepart(datetime);

  if "9:15"t <time <= "12:30"t then group=1;

   else if "12:30"t <time <= "16:00"t then group=2;

    else if "16:00"t < time then group=3;

     else group=0;

  if group >0;

  cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07jul2006:17:35

07jul2006:18:35

;

proc sort data=want;

  by date group time;

run;

data want;

  set want;

  by date group;

  if first.group;

run;

Linlin
Lapis Lazuli | Level 10

Hi Art,

Thank you! I made some other changes too.

.

skipper
Calcite | Level 5

thank you also, this works great as well

PGStats
Opal | Level 21

You could do this:

data periods(keep=period hbeg hend);
input hb mb he me;
hbeg = hms(hb, mb, 0);
hend = hms(he, me, 59.99);
period = _n_;
datalines;
9 15 12 29
12 30 15 59
16 0 23 59
;

data test;
input Trade_Date_Time ANYDTDTM. Options_Trade_Price Futures_Trade_Price;
format Trade_Date_Time datetime20.0;
datalines;
01012012 08:30:00  1 2
01012012 10:30:00  1 2
01012012 12:30:00  1 2
01012012 14:30:00  1 2
01012012 16:30:00  1 2
01012012 18:30:00  1 2
01012012 20:30:00  1 2
01012012 22:30:00  1 2
02012012 08:00:00  1 2
02012012 10:00:00  1 2
02012012 12:00:00  1 2
02012012 14:00:00  1 2
02012012 16:00:00  1 2
02012012 18:00:00  1 2
02012012 20:00:00  1 2
02012012 22:00:00  1 2
;

proc sql;
create table firstTimes as
select min(Trade_date_time) as firstTime, datePart(Trade_date_time) as tradeDate, period
from test inner join periods on timepart(trade_date_time) between hbeg and hend
group by calculated tradeDate, period;

create table firstTrades as
select test.*, period
from test inner join firstTimes on test.Trade_date_time=firstTime;

quit;

PG

PG
Haikuo
Onyx | Level 15

Here is a Hash version, no need to presort, no need to create grouping variables, presumely  more efficient than the previous one that I had posted. Borrowed LinLin's sample set:

data have;

format  datetime datetime18.;

input datetime datetime18. ;

cards;

04jul2006:08:15

04jul2006:09:15

04jul2006:09:30

04jul2006:12:30

04jul2006:13:45

04jul2006:16:00

04jul2006:23:59

04jul2006:12:30

04jul2006:14:00

04jul2006:16:00

04jul2006:16:35

04jul2006:18:35

06jul2006:09:00

06jul2006:09:20

06jul2006:09:35

06jul2006:12:00

06jul2006:12:30

06jul2006:14:21

06jul2006:16:00

06jul2006:17:35

06jul2006:18:35

07aug2008:18:35

07aug2008:18:36

07aug2008:18:36

07aug2008:18:33

07aug2008:10:35

;

data want (keep=datetime);

  if _n_=1 then do;

     set have point=_n_;

     dcl hash h(dataset:'have', multidata:'yes', ordered: 'a');

      h.definekey('datetime');

      h.definedata('datetime');

      h.definedone();

      dcl hiter hi('h');

   end;

do _rc=hi.first() by 0 while (_rc=0);

       date=datepart(datetime);

        do  while (_rc=0);

          if timepart(datetime) >= '16:00't then

             do;

                 ct1+1;

                 if ct1 =1 then output;

             end;

            else if timepart(datetime) >='12:30't then

                 do;

                     ct2+1;

                     if ct2=1 then output;

                 end;

            else if timepart(datetime) >= '9:15't then

                 do;

                    ct3+1;

                    if ct3=1 then output;

                 end;

          _rc=hi.next();

          if date ne datepart(datetime) then leave;

      end;

     call missing(of ct1-ct3);

  end;

    stop;

run;

proc print;run;

Regards,

Haikuo

Ksharp
Super User

It is my two cents.

data test ;
format datetime datetime18.;
input datetime datetime18. ;
date=datepart(datetime);
time=timepart(datetime);
cards;
04jul2006:08:15
04jul2006:09:15
04jul2006:09:30
04jul2006:12:30
04jul2006:13:45
04jul2006:16:00
04jul2006:23:59
04jul2006:12:30
04jul2006:14:00
04jul2006:16:00
04jul2006:16:35
04jul2006:18:35
06jul2006:09:00
06jul2006:09:20
06jul2006:09:35
06jul2006:12:00
06jul2006:12:30
06jul2006:14:21
06jul2006:16:00
06jul2006:17:35
06jul2006:18:35
;
run;
proc sort data=test; by date time;run;
data want(keep=datetime);
 set test;
 by date;
 retain one two three;
 if first.date then do;one=0;two=0;three=0;end;
 if (time ge '9:15:00't) then do;if not one then output;one=1;end;
 if (time ge '12:30:00't) then do;if not two then output;two=1;end;
 if (time ge '16:00:00't ) then do;if not three then output;three=1;end;
run;


Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1590 views
  • 6 likes
  • 7 in conversation