How to reduce a large time series dataset

Contributor
Posts: 57

Dear all,

This question stems partly from a larger problem that is currently being addressed. For more info, see below:

I have a large dataset with 64 million observations containing high-frequency (down to the second) currency data for a period of 68 days.

I would like to reduce this dataset to a one-minute interval, that is, to randomly pick one observation per minute for each currency. This should yield 60 x 24 = 1,440 observations per day per currency and around 100,000 (1,440 x 68 = 97,920) per currency for the whole time period.

Since I have around 10 currencies, the dataset will be reduced from 64 million to roughly 10 x 100,000 = 1 million observations.
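
As a quick check of these counts, the arithmetic can be sketched in a DATA step. The variable names are hypothetical, and 10 is only the approximate number of currencies mentioned above.

```sas
/* Expected size of the reduced dataset (illustrative arithmetic only) */
data _null_;
  obs_per_day      = 60 * 24;                /* one observation per minute   */
  obs_per_currency = obs_per_day * 68;       /* 68 days of data              */
  obs_total        = obs_per_currency * 10;  /* assumed: about 10 currencies */
  put obs_per_day= obs_per_currency= obs_total=;
run;
```

This writes obs_per_day=1440 obs_per_currency=97920 obs_total=979200 to the log, and that is an upper bound: any minute with no quotes for a currency contributes no observation.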

Do you have any ideas on how to reduce the dataset based on my suggestion?

I will then use this reduced dataset to overcome the computational difficulties that appear in the matching question post (see above link).

Attached is a sample of the dataset for only one currency.

Thank you

Best

Neo

Attachment

Accepted Solutions
Solution
‎03-03-2013 04:06 PM
Respected Advisor
Posts: 4,927

Re: How to reduce a large time series dataset

If your full dataset is sorted by _RIC, date_G_ and time_G_, you could use PROC SURVEYSELECT this way:

proc sql;
create view chfReuters3 as
select
     *,
     /* Truncate each timestamp to the start of its minute */
     intnx("MINUTE", dhms(datepart(date_G_), hour(time_G_), minute(time_G_), second(time_G_)), 0)
          as minute format=datetime13.
from sasforum.chfReuters3;
quit;

options nonotes; /* Prevents the printing of a note for every minute with only 1 obs */

/* Draw one observation at random from each (_RIC, minute) stratum */
proc surveyselect data=chfReuters3 out=chfReutersMinute method=srs sampsize=1;
strata _RIC minute;
run;

options notes;
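
If the sample needs to be reproducible, a variation along these lines may help. It is only a sketch: the seed value and the table name strataCheck are made up for illustration, and the rest reuses the names from the code above. Without SEED=, PROC SURVEYSELECT takes its initial seed from the system clock, so each run draws a different sample.

```sas
options nonotes;

/* Same selection as above, with a fixed seed for repeatable draws */
proc surveyselect data=chfReuters3 out=chfReutersMinute
                  method=srs sampsize=1
                  seed=20130303; /* hypothetical value; any positive integer */
  strata _RIC minute;
run;

options notes;

/* Sanity check: list any (_RIC, minute) stratum that does not have exactly one row */
proc sql;
  create table strataCheck as
  select _RIC, minute, count(*) as nObs
  from chfReutersMinute
  group by _RIC, minute
  having count(*) ne 1;
quit;
```

An empty strataCheck table would indicate that every currency-minute combination present in the data contributed exactly one observation.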

PG


All Replies
Contributor
Posts: 57

Re: How to reduce a large time series dataset

PG,

Many thanks for this one; it worked perfectly. Apologies for the late reply; I was under the impression that I had already provided feedback.

Cheers

Neo
