BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LzEr23
Obsidian | Level 7

Dear all,

I am trying to convert my data set into an time interval data, and I would be glad if I can get some help on this.

 

The current data that I have is a set of trade (or transaction) record, and it includes datetime information looking like this:

01JAN2018 09:00:00.00

 

There are many other variables in the dataset, but it doesn't matter. Basically, it is a list of trade records with its traded time.

 

What I am trying to do is to transform this data into a 5 min time interval data, and get the number of observations for each interval.

For example, if there are 15 obs in the first 5 minutes of the data and 10 obs in the second,

then I would like to have a new dataset with list(or table) looking like this:

interval1    15

interval2    10

 

The last time I tried this, there weren't many obs, and I could do it manually, one by one.

But this time, there are thousands of obs and the given total time span is too long (more than a month).

So I would be happy get some advice on how I can code this efficiently and properly.

I hope I've been clear about what I am trying to say...

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This breaks down into two steps,

- assign a variable for time window

- count items grouped by new var

As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:

data want;
  set have;
  by idvar;
  retain lsttime timewindow;
  if first.idvar then do
    timewindow=1;
    lsttime=datetime;
  end;
  else do;
    if intck('minute',datetime,lsttime) > 5 then do;
      timewindow=timewindow+1;
      lsttime=datetime;
    end;
  end;
run;
 
proc sql;
  create table counts as
  select  idvar,
             timewindow,
             count(*) as result
  from    want
  group by idvar,timewindow;
quit;

Of course you will need to change var names and such like as I have no test data in datastep to work with!!

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This breaks down into two steps,

- assign a variable for time window

- count items grouped by new var

As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:

data want;
  set have;
  by idvar;
  retain lsttime timewindow;
  if first.idvar then do
    timewindow=1;
    lsttime=datetime;
  end;
  else do;
    if intck('minute',datetime,lsttime) > 5 then do;
      timewindow=timewindow+1;
      lsttime=datetime;
    end;
  end;
run;
 
proc sql;
  create table counts as
  select  idvar,
             timewindow,
             count(*) as result
  from    want
  group by idvar,timewindow;
quit;

Of course you will need to change var names and such like as I have no test data in datastep to work with!!

LzEr23
Obsidian | Level 7

Thank a lot for the help!

I can see how it works, but somehow the timewindow variable appears to be 1 all the way down.

 

The datastep is too much to be posted here but for the var names,  it looks like:

trade_date    trade_code    trade_price      datetime

20180101       AAAAAA                  30          01JAN2018 09:00:00.00

20180101       AAAAAA                  30          01JAN2018 09:00:01.27

20180101       AAAAAA                  31          01JAN2018 09:00:02.12

20180101       AAAAAA                  31          01JAN2018 10:00:01.01

20180102       AAAAAA                  32          01JAN2018 09:00:00.00

 

where code is just a random code for identification.

I'm not sure what you mean by 'idvar' there.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

idvar would just be whatever variables group the data, possibly trade_code in your example.  Do note that test data should be in a datastep and just needs to be illustrative of the problem:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

LzEr23
Obsidian | Level 7

Thank you for the help.

What you suggested does work.

 

Only, that it doesn't include the interval without observations.

in case there are 5 obs in first 5mins, 12 obs in second, 5 in third,

it successfully count 5, 12, 5.

but when there are 5 obs in first, 0 ob in second and 5 tin third,

instead of counting 5, 0, 5,

it simply gives out 5, 5.

Would there be anyway to solve this problem? to get 0 for the 5min interval without any obs?

Patrick
Opal | Level 21

You could just align your datetime values to 5 minutes intervals.

data have;
  format dttm datetime20.;
  do dttm=datetime() to datetime()+10000 by 1;
    output;
  end;
  stop;
run;

data want;
  set have;
  format groupDttm datetime20.;
  groupDttm=intnx('dtminute5',dttm,0,'b');
run;
LzEr23
Obsidian | Level 7

Thank you.

your codes seem to work as well.

But still having the same problem I posted above.

But thank you for your suggestion, it did help a lot.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3295 views
  • 1 like
  • 3 in conversation