# How to reduce a large time series dataset

Dear all,

I have a large dataset with 64 million observations containing high-frequency (down to the second) currency data covering a period of 68 days.

I would like to reduce this dataset to one-minute intervals, that is, to randomly pick one observation per minute for each currency. This should yield 60 × 24 = 1,440 observations per day per currency, and around 100,000 (1,440 × 68) per currency for the whole period.

Since I have around 10 currencies, the dataset would shrink from 64 million to roughly 10 × 100,000 = 1 million observations.
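The figures above work out as follows. They are upper bounds: a minute in which a currency has no quote contributes no observation, so the actual counts can be lower.

$$
\begin{aligned}
60 \times 24 &= 1440 \quad \text{obs per day per currency}\\
1440 \times 68 &= 97{,}920 \approx 100{,}000 \quad \text{obs per currency}\\
10 \times 97{,}920 &= 979{,}200 \approx 1 \text{ million obs in total}
\end{aligned}
$$

That is roughly a 65-fold reduction from 64 million observations.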

Do you have any ideas on how to reduce the dataset in this way?

I will then use this reduced dataset to overcome the computational difficulties that come up in the matching question post (see link above).

Attached is a sample of the dataset for only one currency.

Thank you

Best

Neo

03-03-2013 04:06 PM

## Re: How to reduce a large time series dataset

If your full dataset is sorted by `_RIC`, `date_G_` and `time_G_`, you could use PROC SURVEYSELECT this way:

```sas
proc sql;
create view chfReuters3 as
select
    *,
    /* Truncate each timestamp to the start of its minute */
    intnx("MINUTE", dhms(datepart(date_G_), hour(time_G_), minute(time_G_), second(time_G_)), 0)
        as minute format=datetime13.
from sasforum.chfReuters3;
quit;

options nonotes; /* Prevents the printing of a note for every minute with only 1 obs */

/* Draw one obs at random (simple random sampling) within each _RIC/minute stratum */
proc surveyselect data=chfReuters3 out=chfReutersMinute method=srs sampsize=1;
strata _RIC minute;
run;

options notes;
```
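PROC SURVEYSELECT is part of SAS/STAT. Where only Base SAS is available, a rough alternative (a swapped-in technique, not PG's method) is to give each observation a random key, sort by currency, minute and that key, and keep the first observation of each group. This is a sketch assuming the `chfReuters3` view above, with its `_RIC` and `minute` variables; the names `chfKeyed`, `chfShuffled`, `chfReutersMinute2` and the seed 2013 are hypothetical.

```sas
/* Sketch: random-key alternative to PROC SURVEYSELECT (Base SAS only).
   Assumes the view chfReuters3 created above, with _RIC and minute.
   chfKeyed, chfShuffled, chfReutersMinute2 and the seed are hypothetical. */
data chfKeyed / view=chfKeyed;
  if _n_ = 1 then call streaminit(2013);  /* fixed seed for a reproducible draw */
  set chfReuters3;
  _u = rand("uniform");                   /* random sort key for each obs */
run;

/* Put the obs inside each _RIC/minute group into random order */
proc sort data=chfKeyed out=chfShuffled;
  by _RIC minute _u;
run;

/* Keep the first, i.e. a randomly chosen, obs per currency and minute */
data chfReutersMinute2(drop=_u);
  set chfShuffled;
  by _RIC minute;
  if first.minute;
run;
```

Unlike the SURVEYSELECT step, this needs a full sort of all 64 million rows, so it is likely to be slower and to need more temporary disk space. Adding SEED= to the PROC SURVEYSELECT statement makes PG's draw reproducible in the same way.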

PG

## Re: How to reduce a large time series dataset

PG,

Many thanks for this one; it worked perfectly. Apologies for the late reply, I had the impression I had already provided feedback.

Cheers

Neo

