Dear all,
This question partly stems from a larger problem I am currently working on, which is described in a related post.
I have a large dataset with 64 million observations containing high-frequency (down to the second) data for currencies over a period of 68 days.
I would like to reduce this dataset to a one-minute interval, that is, randomly pick one observation per minute for each currency. This should give up to 60x24 = 1,440 observations per day per currency, and around 100,000 (1,440x68 = 97,920) per currency over the whole period.
Since I have around 10 currencies, the dataset would shrink from 64 million to about 10x100,000 = 1 million observations.
Do you have any ideas on how to reduce the dataset along these lines?
I will then use this reduced dataset to overcome the computational difficulties described in my related question post.
Attached is a sample of the dataset for one currency only.
Thank you
Best
Neo
If your full dataset is sorted by _RIC, date_G_ and time_G_ (the STRATA statement requires input sorted by the strata variables), you could use SURVEYSELECT this way:
proc sql;
create view chfReuters3 as
select
*,
/* Truncate each tick's timestamp to the start of its minute */
intnx("MINUTE", dhms(datepart(date_G_), hour(time_G_), minute(time_G_), second(time_G_)), 0)
as minute format=datetime13.
from sasforum.chfReuters3;
quit;
options nonotes; /* Prevents the printing of a note for every minute with only 1 obs */
/* Simple random sample of 1 obs per currency (_RIC) and minute; SEED= makes the draw reproducible */
proc surveyselect data=chfReuters3 out=chfReutersMinute method=srs sampsize=1 seed=12345;
strata _RIC minute;
run;
options notes;
PG
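PROC SURVEYSELECT is part of SAS/STAT. If that product is not available, a similar one-per-minute random sample can be drawn with Base SAS alone. The sketch below is an alternative, not the method posted above, and it rests on assumptions: it builds a small hypothetical dataset ticks (the variables t and price, the dates and the seeds are illustrative stand-ins for sasforum.chfReuters3), gives every tick a uniform random key, sorts by _RIC, minute and that key, and keeps the first tick of each currency-minute. Because the keys are independent and uniform, every tick within a given minute is equally likely to be kept, which mirrors method=srs sampsize=1 per stratum.

```sas
/* Hypothetical toy data standing in for sasforum.chfReuters3: */
/* two currencies, one tick every 10 seconds over 5 minutes.   */
data ticks;
  length _RIC $8;
  call streaminit(2013);                 /* arbitrary seed for the dummy quotes */
  do _RIC = "CHF=", "EUR=";
    do t = '01JAN2013:00:00:00'dt to '01JAN2013:00:04:50'dt by 10;
      minute = intnx("MINUTE", t, 0);    /* start of the tick's minute */
      price = 1 + rand("uniform");       /* dummy quote */
      output;
    end;
  end;
  format t minute datetime19.;
run;

/* Attach a uniform random key to every tick (a view, so no extra copy on disk) */
data ticks_r / view=ticks_r;
  set ticks;
  if _n_ = 1 then call streaminit(12345);   /* arbitrary seed for reproducibility */
  u = rand("uniform");
run;

/* Put the ticks of each currency-minute in random order */
proc sort data=ticks_r out=ticks_sorted;
  by _RIC minute u;
run;

/* The first tick of each currency-minute is now a uniformly random one */
data ticks_minute;
  set ticks_sorted;
  by _RIC minute;
  if first.minute;
  drop u;
run;
```

On the full 64 million rows the extra sort is likely to be the expensive step; the SURVEYSELECT approach avoids it when the data are already in _RIC/date/time order.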
PG,
Many thanks for this one; it worked perfectly. Apologies for the late reply, I was under the impression I had already provided feedback.
Cheers
Neo