BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LzEr23
Obsidian | Level 7

Dear all,

I am trying to convert my data set into an time interval data, and I would be glad if I can get some help on this.

 

The current data that I have is a set of trade (or transaction) record, and it includes datetime information looking like this:

01JAN2018 09:00:00.00

 

There are many other variables in the dataset, but it doesn't matter. Basically, it is a list of trade records with its traded time.

 

What I am trying to do is to transform this data into a 5 min time interval data, and get the number of observations for each interval.

For example, if there are 15 obs in the first 5 minutes of the data and 10 obs in the second,

then I would like to have a new dataset with list(or table) looking like this:

interval1    15

interval2    10

 

The last time I tried this, there weren't many obs, and I could do it manually, one by one.

But this time, there are thousands of obs and the given total time span is too long (more than a month).

So I would be happy get some advice on how I can code this efficiently and properly.

I hope I've been clear about what I am trying to say...

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This breaks down into two steps,

- assign a variable for time window

- count items grouped by new var

As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:

data want;
  set have;
  by idvar;
  retain lsttime timewindow;
  if first.idvar then do
    timewindow=1;
    lsttime=datetime;
  end;
  else do;
    if intck('minute',datetime,lsttime) > 5 then do;
      timewindow=timewindow+1;
      lsttime=datetime;
    end;
  end;
run;
 
proc sql;
  create table counts as
  select  idvar,
             timewindow,
             count(*) as result
  from    want
  group by idvar,timewindow;
quit;

Of course you will need to change var names and such like as I have no test data in datastep to work with!!

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This breaks down into two steps,

- assign a variable for time window

- count items grouped by new var

As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:

data want;
  set have;
  by idvar;
  retain lsttime timewindow;
  if first.idvar then do
    timewindow=1;
    lsttime=datetime;
  end;
  else do;
    if intck('minute',datetime,lsttime) > 5 then do;
      timewindow=timewindow+1;
      lsttime=datetime;
    end;
  end;
run;
 
proc sql;
  create table counts as
  select  idvar,
             timewindow,
             count(*) as result
  from    want
  group by idvar,timewindow;
quit;

Of course you will need to change var names and such like as I have no test data in datastep to work with!!

LzEr23
Obsidian | Level 7

Thank a lot for the help!

I can see how it works, but somehow the timewindow variable appears to be 1 all the way down.

 

The datastep is too much to be posted here but for the var names,  it looks like:

trade_date    trade_code    trade_price      datetime

20180101       AAAAAA                  30          01JAN2018 09:00:00.00

20180101       AAAAAA                  30          01JAN2018 09:00:01.27

20180101       AAAAAA                  31          01JAN2018 09:00:02.12

20180101       AAAAAA                  31          01JAN2018 10:00:01.01

20180102       AAAAAA                  32          01JAN2018 09:00:00.00

 

where code is just a random code for identification.

I'm not sure what you mean by 'idvar' there.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

idvar would just be whatever variables group the data, possibly trade_code in your example.  Do note that test data should be in a datastep and just needs to be illustrative of the problem:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

LzEr23
Obsidian | Level 7

Thank you for the help.

What you suggested does work.

 

Only, that it doesn't include the interval without observations.

in case there are 5 obs in first 5mins, 12 obs in second, 5 in third,

it successfully count 5, 12, 5.

but when there are 5 obs in first, 0 ob in second and 5 tin third,

instead of counting 5, 0, 5,

it simply gives out 5, 5.

Would there be anyway to solve this problem? to get 0 for the 5min interval without any obs?

Patrick
Opal | Level 21

You could just align your datetime values to 5 minutes intervals.

data have;
  format dttm datetime20.;
  do dttm=datetime() to datetime()+10000 by 1;
    output;
  end;
  stop;
run;

data want;
  set have;
  format groupDttm datetime20.;
  groupDttm=intnx('dtminute5',dttm,0,'b');
run;
LzEr23
Obsidian | Level 7

Thank you.

your codes seem to work as well.

But still having the same problem I posted above.

But thank you for your suggestion, it did help a lot.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2839 views
  • 1 like
  • 3 in conversation