Help using Base SAS procedures

correlation on large data set

Reply
N/A
Posts: 0

correlation on large data set

Hi all,

Firstly, I'm slightly new to SAS. I would like to compute correlation coefficients on a large data set X with a single common variable Y.

The data set X is now sorted such that daily observations run down the table, stock names run across the top, with returns in the table. i.e.,

stock_a stock_b stock_c
day 1 . . .
day 2 . . .
day 3 . . .

The difficulty I'm having is that I would like to compute the correlation between the X's and the Y on a monthly basis (where I have daily data). In Matlab for instance I would do this by looping over the rows then the columns and filling up containers then computing the correlations for each individual month.

Does SAS have an easy way to do this given my X is large (60 million observations) and contains missing data.

Thank a bundle. I'm learning fast and I'm liking SAS so far.

Message was edited by: thepowertoknow? Message was edited by: thepowertoknow?
PROC Star
Posts: 7,473

Re: correlation on large data set

Posted in reply to deleted_user
You have to tell us more! If you have 60 million records, one for each day and with each row containing some kind of values for a series of stocks, then you have 164,271 years worth of data. I didn't know that stocks have been around that long.

You can build loops into SAS code but, more likely, you will want to use something like proc summary to calculate the desired averages for each month and, if necessary, transpose the file.

However, for anyone to help, they would have to know what your data really are, and whic variables you want to obtain correlations for.

Art
> Hi all,
>
> Firstly, I'm slightly new to SAS. I would like to
> compute correlation coefficients on a large data set
> X with a single common variable Y.
>
> The data set X is now sorted such that daily
> observations run down the table, stock names run
> across the top, with returns in the table. i.e.,
>
> stock_a stock_b stock_c
> . . .
> day 2 . . .
> day 3 . . .
>
> The difficulty I'm having is that I would like to
> compute the correlation between the X's and the Y on
> a monthly basis (where I have daily data). In Matlab
> for instance I would do this by looping over the rows
> then the columns and filling up containers then
> computing the correlations for each individual month.
>
>
> Does SAS have an easy way to do this given my X is
> large (60 million observations) and contains missing
> data.
>
> Thank a bundle. I'm learning fast and I'm liking SAS
> so far.
>
> Message was edited by: thepowertoknow?
>
> Message was edited by: thepowertoknow?
Super User
Posts: 10,028

Re: correlation on large data set

Posted in reply to deleted_user
Proc corr can give you correlation coefficients ( Pearson or Spearman ).




Ksharp
Super User
Posts: 11,343

Re: correlation on large data set

Posted in reply to deleted_user
If I understand it sounds like you are wanting to use your date in a BY group.
If the "day" variable is a SAS date variable and the data is sorted by that variable, then try:

PROC CORR DATA=;
by date;
var x y;
format date monyy7.;
run;

This will create one correlation output table for each month and year that appears in the data. You may want to direct the output to a data set.
N/A
Posts: 0

Re: correlation on large data set

Hi,

yes - this is exactly how I did it in the end. Thanks a lot.
Ask a Question
Discussion stats
  • 4 replies
  • 352 views
  • 0 likes
  • 4 in conversation