Dear all,
I am trying to convert my data set into an time interval data, and I would be glad if I can get some help on this.
The current data that I have is a set of trade (or transaction) record, and it includes datetime information looking like this:
01JAN2018 09:00:00.00
There are many other variables in the dataset, but it doesn't matter. Basically, it is a list of trade records with its traded time.
What I am trying to do is to transform this data into a 5 min time interval data, and get the number of observations for each interval.
For example, if there are 15 obs in the first 5 minutes of the data and 10 obs in the second,
then I would like to have a new dataset with list(or table) looking like this:
interval1 15
interval2 10
The last time I tried this, there weren't many obs, and I could do it manually, one by one.
But this time, there are thousands of obs and the given total time span is too long (more than a month).
So I would be happy get some advice on how I can code this efficiently and properly.
I hope I've been clear about what I am trying to say...
This breaks down into two steps,
- assign a variable for time window
- count items grouped by new var
As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:
data want; set have; by idvar; retain lsttime timewindow; if first.idvar then do timewindow=1; lsttime=datetime; end; else do; if intck('minute',datetime,lsttime) > 5 then do; timewindow=timewindow+1; lsttime=datetime; end; end; run; proc sql; create table counts as select idvar, timewindow, count(*) as result from want group by idvar,timewindow; quit;
Of course you will need to change var names and such like as I have no test data in datastep to work with!!
This breaks down into two steps,
- assign a variable for time window
- count items grouped by new var
As you have not provided any test data in the form of a datastep or what the output should look like, this is just a shell:
data want; set have; by idvar; retain lsttime timewindow; if first.idvar then do timewindow=1; lsttime=datetime; end; else do; if intck('minute',datetime,lsttime) > 5 then do; timewindow=timewindow+1; lsttime=datetime; end; end; run; proc sql; create table counts as select idvar, timewindow, count(*) as result from want group by idvar,timewindow; quit;
Of course you will need to change var names and such like as I have no test data in datastep to work with!!
Thank a lot for the help!
I can see how it works, but somehow the timewindow variable appears to be 1 all the way down.
The datastep is too much to be posted here but for the var names, it looks like:
trade_date trade_code trade_price datetime
20180101 AAAAAA 30 01JAN2018 09:00:00.00
20180101 AAAAAA 30 01JAN2018 09:00:01.27
20180101 AAAAAA 31 01JAN2018 09:00:02.12
20180101 AAAAAA 31 01JAN2018 10:00:01.01
20180102 AAAAAA 32 01JAN2018 09:00:00.00
where code is just a random code for identification.
I'm not sure what you mean by 'idvar' there.
idvar would just be whatever variables group the data, possibly trade_code in your example. Do note that test data should be in a datastep and just needs to be illustrative of the problem:
Thank you for the help.
What you suggested does work.
Only, that it doesn't include the interval without observations.
in case there are 5 obs in first 5mins, 12 obs in second, 5 in third,
it successfully count 5, 12, 5.
but when there are 5 obs in first, 0 ob in second and 5 tin third,
instead of counting 5, 0, 5,
it simply gives out 5, 5.
Would there be anyway to solve this problem? to get 0 for the 5min interval without any obs?
You could just align your datetime values to 5 minutes intervals.
data have;
format dttm datetime20.;
do dttm=datetime() to datetime()+10000 by 1;
output;
end;
stop;
run;
data want;
set have;
format groupDttm datetime20.;
groupDttm=intnx('dtminute5',dttm,0,'b');
run;
Thank you.
your codes seem to work as well.
But still having the same problem I posted above.
But thank you for your suggestion, it did help a lot.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.