Dear all,
I am trying to convert my data set into an time interval data, and I would be glad if I can get some help on this.
The current data that I have is a set of trade (or transaction) record, and it includes datetime information looking like this:
01JAN2018 09:00:00.00
There are many other variables in the dataset, but it doesn't matter. Basically, it is a list of trade records with its traded time.
What I am trying to do is to transform this data into a 5 min time interval data, and get the number of observations for each interval.
For example, if there are 15 obs in the first 5 minutes of the data and 10 obs in the second,
then I would like to have a new dataset with list(or table) looking like this:
interval1 15
interval2 10
The last time I tried this, there weren't many obs, and I could do it manually, one by one.
But this time, there are thousands of obs and the given total time span is too long (more than a month).
So I would be happy get some advice on how I can code this efficiently and properly.
I hope I've been clear about what I am trying to say...
This breaks down into two steps,
- assign a variable for time window
- count items grouped by new var
As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:
data want; set have; by idvar; retain lsttime timewindow; if first.idvar then do timewindow=1; lsttime=datetime; end; else do; if intck('minute',datetime,lsttime) > 5 then do; timewindow=timewindow+1; lsttime=datetime; end; end; run; proc sql; create table counts as select idvar, timewindow, count(*) as result from want group by idvar,timewindow; quit;
Of course you will need to change var names and such like as I have no test data in datastep to work with!!
This breaks down into two steps,
- assign a variable for time window
- count items grouped by new var
As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:
data want; set have; by idvar; retain lsttime timewindow; if first.idvar then do timewindow=1; lsttime=datetime; end; else do; if intck('minute',datetime,lsttime) > 5 then do; timewindow=timewindow+1; lsttime=datetime; end; end; run; proc sql; create table counts as select idvar, timewindow, count(*) as result from want group by idvar,timewindow; quit;
Of course you will need to change var names and such like as I have no test data in datastep to work with!!
Thank a lot for the help!
I can see how it works, but somehow the timewindow variable appears to be 1 all the way down.
The datastep is too much to be posted here but for the var names, it looks like:
trade_date trade_code trade_price datetime
20180101 AAAAAA 30 01JAN2018 09:00:00.00
20180101 AAAAAA 30 01JAN2018 09:00:01.27
20180101 AAAAAA 31 01JAN2018 09:00:02.12
20180101 AAAAAA 31 01JAN2018 10:00:01.01
20180102 AAAAAA 32 01JAN2018 09:00:00.00
where code is just a random code for identification.
I'm not sure what you mean by 'idvar' there.
idvar would just be whatever variables group the data, possibly trade_code in your example. Do note that test data should be in a datastep and just needs to be illustrative of the problem:
Thank you for the help.
What you suggested does work.
Only, that it doesn't include the interval without observations.
in case there are 5 obs in first 5mins, 12 obs in second, 5 in third,
it successfully count 5, 12, 5.
but when there are 5 obs in first, 0 ob in second and 5 tin third,
instead of counting 5, 0, 5,
it simply gives out 5, 5.
Would there be anyway to solve this problem? to get 0 for the 5min interval without any obs?
You could just align your datetime values to 5 minutes intervals.
data have;
format dttm datetime20.;
do dttm=datetime() to datetime()+10000 by 1;
output;
end;
stop;
run;
data want;
set have;
format groupDttm datetime20.;
groupDttm=intnx('dtminute5',dttm,0,'b');
run;
Thank you.
your codes seem to work as well.
But still having the same problem I posted above.
But thank you for your suggestion, it did help a lot.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.