correlation on large data set


03-19-2011 01:52 PM

Hi all,

Firstly, I'm slightly new to SAS. I would like to compute correlation coefficients on a large data set X with a single common variable Y.

The data set X is now sorted such that daily observations run down the table, stock names run across the top, with returns in the table. i.e.,

        stock_a   stock_b   stock_c
day 1      .         .         .
day 2      .         .         .
day 3      .         .         .

The difficulty I'm having is that I would like to compute the correlation between the X's and the Y on a monthly basis (where I have daily data). In Matlab, for instance, I would do this by looping over the rows and then the columns, filling up containers, and then computing the correlations for each individual month.

Does SAS have an easy way to do this, given that my X is large (60 million observations) and contains missing data?

Thanks a bundle. I'm learning fast and I'm liking SAS so far.

Message was edited by: thepowertoknow?


03-20-2011 12:43 PM

You have to tell us more! If you have 60 million records, one for each day, with each row containing some kind of values for a series of stocks, then you have 164,271 years' worth of data. I didn't know that stocks had been around that long.

You can build loops into SAS code but, more likely, you will want to use something like proc summary to calculate the desired averages for each month and, if necessary, transpose the file.

However, for anyone to help, they would have to know what your data really are, and which variables you want to obtain correlations for.

Art


03-20-2011 09:50 PM

Proc corr can give you correlation coefficients ( Pearson or Spearman ).

Ksharp


03-21-2011 10:57 AM

If I understand correctly, it sounds like you want to use your date in a BY group.

If the "day" variable is a SAS date variable and the data is sorted by that variable, then try:

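PROC CORR DATA= ;   /* supply your data set name after DATA= */
  by date;
  var x y;
  format date monyy7.;   /* BY groups are formed on the formatted value, so one group per month and year */
run;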

This will create one correlation output table for each month and year that appears in the data. You may want to direct the output to a data set.
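To direct the output to a data set, something along these lines might work. It is only a sketch: the data set names work.returns and work.monthly_corr are hypothetical, and it assumes that Y sits in the same data set as the stock columns, that the date variable is called date, and that the data are already sorted by date.

```sas
/* Sketch only. Hypothetical names: work.returns, work.monthly_corr,
   date, y, stock_a, stock_b, stock_c.
   Assumes work.returns is already sorted by date. */
proc corr data=work.returns
          outp=work.monthly_corr   /* Pearson statistics go to this data set */
          noprint;                 /* suppress printed tables for a large input */
  by date;
  format date monyy7.;             /* one BY group per month and year */
  var stock_a stock_b stock_c;     /* the X columns */
  with y;                          /* correlate each X with Y only */
run;
```

The WITH statement limits the output to the correlations of each stock with Y rather than the full stock-by-stock matrix. In the output data set, the rows with _TYPE_='CORR' hold the coefficients for each month. By default PROC CORR uses all nonmissing pairs for each correlation, so missing returns in one stock do not drop that day for the others; add the NOMISS option if you want listwise deletion instead.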


03-21-2011 02:26 PM

Hi,

yes - this is exactly how I did it in the end. Thanks a lot.
