turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- PROC CORR in rolling macro w a certain amount of n...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-02-2011 12:07 AM

Hey guys,

I'm programming a rolling correlations macro on something like 18 variables in column form each starting at a different date and ending at a different date. My window is approx 250 days, total duration of the data is 16 yrs. Code collects correlation results through OUTP= option and appends them in a dataset. I have found one problem though, if i specify a global window, it will start from the first 250 days but then when some variables start to have observations it will calculate correlations using those few observations. so 5 variables will have say 250 observations each one, and then 1 variable will have 1 observation and that variable correlation will be calculated using that 1 observation. I would use vector logic if i worked in R, but idk how to do it in SAS. It's a bit hard to visualize/explain so i'm pasting the code below:

the macro vars are initialized before and the macro itself runs good...

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;

%Let RollingDate=%sysevalf(&time.+&window.);

proc corr data=work.tseries outp=outp noprint;

where &time LE date LE &RollingDate;

var &TickerList;

run;

data outp;

set outp;

Start= &time;

End= &RollingDate;

format start date9.;

format end date9.;

run;

data RollingCorr;

set RollingCorr outp;

run;

%if &RollingDate. GE &ToYear %then %goto leave;

%End;

%leave:

%Mend;

The resulting data piece is below (note copy paste messes up the table a bit):

MEAN 0.001694017 0.313450873 0.002982988 05Mar1995

STD 0.031866551 0.327143332 0.033331209 05Mar1995

N 176 2 176 05Mar1995

CORR ADI 0.529862765 1 0.531684736 05Mar1995

CORR ALTR 0.583095622 1 0.534006834 05Mar1995

note the middle column N =2, that variable just started observations so the correlations are obviously are wrong. We should be able either to limit proc corr to calculate the correlation only if amount nonmissing observation for that variable is above certain number or somehow edit the table afterwards and set row and column in the corr matrix to missing if the N right above is less than some number? Or maybe i'm missing some very simple solution SAS style

Would really appreciate any help!

Thanks,

Arsenio

Message was edited by: Arsenio_Staer Message was edited by: Arsenio_Staer

I'm programming a rolling correlations macro on something like 18 variables in column form each starting at a different date and ending at a different date. My window is approx 250 days, total duration of the data is 16 yrs. Code collects correlation results through OUTP= option and appends them in a dataset. I have found one problem though, if i specify a global window, it will start from the first 250 days but then when some variables start to have observations it will calculate correlations using those few observations. so 5 variables will have say 250 observations each one, and then 1 variable will have 1 observation and that variable correlation will be calculated using that 1 observation. I would use vector logic if i worked in R, but idk how to do it in SAS. It's a bit hard to visualize/explain so i'm pasting the code below:

the macro vars are initialized before and the macro itself runs good...

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;

%Let RollingDate=%sysevalf(&time.+&window.);

proc corr data=work.tseries outp=outp noprint;

where &time LE date LE &RollingDate;

var &TickerList;

run;

data outp;

set outp;

Start= &time;

End= &RollingDate;

format start date9.;

format end date9.;

run;

data RollingCorr;

set RollingCorr outp;

run;

%if &RollingDate. GE &ToYear %then %goto leave;

%End;

%leave:

%Mend;

The resulting data piece is below (note copy paste messes up the table a bit):

MEAN 0.001694017 0.313450873 0.002982988 05Mar1995

STD 0.031866551 0.327143332 0.033331209 05Mar1995

N 176 2 176 05Mar1995

CORR ADI 0.529862765 1 0.531684736 05Mar1995

CORR ALTR 0.583095622 1 0.534006834 05Mar1995

note the middle column N =2, that variable just started observations so the correlations are obviously are wrong. We should be able either to limit proc corr to calculate the correlation only if amount nonmissing observation for that variable is above certain number or somehow edit the table afterwards and set row and column in the corr matrix to missing if the N right above is less than some number? Or maybe i'm missing some very simple solution SAS style

Would really appreciate any help!

Thanks,

Arsenio

Message was edited by: Arsenio_Staer Message was edited by: Arsenio_Staer

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-02-2011 12:10 AM

Hey guys,

Is my post above cut off? My post is long but the code looks cut off.

A

Is my post above cut off? My post is long but the code looks cut off.

A

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-02-2011 12:20 AM

Arsenio,

Yes your post didn't include all of your code. It only showed:

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;

%Let RollingDate=%sysevalf(&time.+&window.);

proc corr data=work.tseries outp=outp noprint;

where &time.

When you repost it, I'd suggest that you also include some sample data. If there is some character in your code that is limiting the inclusion of your full post, you can identify it by simply clicking on preview before actually submitting the post.

HTH,

Art

Yes your post didn't include all of your code. It only showed:

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;

%Let RollingDate=%sysevalf(&time.+&window.);

proc corr data=work.tseries outp=outp noprint;

where &time.

When you repost it, I'd suggest that you also include some sample data. If there is some character in your code that is limiting the inclusion of your full post, you can identify it by simply clicking on preview before actually submitting the post.

HTH,

Art

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-02-2011 12:38 AM

Apparently less than equal sign automatically stops the post so i substituted it by LE but apparently the table got messed up quite a bit at the end . Didn't see the preview button in the post editing window.

Hope this is at least understandable for you guys. Sorry for the hassle.

Arsenio

p.s. if the admins are reading this it would be great to have some html capability in the post editing window to make the code more readable ( perhaps some markup code for sas syntax) and also some sort of the copy paste html table would be awesome!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 12:28 PM

> p.s. if the admins are reading this it would be great

> to have some html capability in the post editing

> window to make the code more readable ( perhaps some

> markup code for sas syntax) and also some sort of the

> copy paste html table would be awesome!

Until that happy day, there's this: http://support.sas.com/forums/thread.jspa?messageID=27609毙

> to have some html capability in the post editing

> window to make the code more readable ( perhaps some

> markup code for sas syntax) and also some sort of the

> copy paste html table would be awesome!

Until that happy day, there's this: http://support.sas.com/forums/thread.jspa?messageID=27609毙

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 12:07 AM

It would be helpful,if post some data and explain what you want base these data.

Ksharp

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 12:24 AM

Hi,

Basically, it's a rolling correlation with a certain window (nr of obs to calculate the correlations with) with unequal starts and unequal endings for many variables. With a global start and date, the results in the corr table can be biased for variables that end or start at different dates than the global dates

Example below, BRCM just started observations but the rolling correlations say a month before with a month window will produce correlation for BRCM with the other 3 variables using only 1 observation of BRCM on 19980420 which obviously is not what we want.

The way i visualize this, I would fill the corr table for BRCM both vertical and horizontal with missing until the number of observations for BRCM is equal to at least "window" (remember the rolling correlations is basically a moving window that scans down the dataset) so for the next month of correlations the corr table for BRCM would have missing values and then (after the 1st month of observations of BRCM has appeared) would start with corr table values based on the months worth of observations.

date VTSS XLNX SNDK BRCM

19980416 -0.003472 -0.027143 -0.032967

19980417 0.016260 -0.007342 -0.002841

19980420 0.026286 0.041420 0.005698 -0.039627

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.000000

19980423 -0.002183 -0.043357 -0.027473 -0.040724

Hope this clarifies things a bit.

Basically, it's a rolling correlation with a certain window (nr of obs to calculate the correlations with) with unequal starts and unequal endings for many variables. With a global start and date, the results in the corr table can be biased for variables that end or start at different dates than the global dates

Example below, BRCM just started observations but the rolling correlations say a month before with a month window will produce correlation for BRCM with the other 3 variables using only 1 observation of BRCM on 19980420 which obviously is not what we want.

The way i visualize this, I would fill the corr table for BRCM both vertical and horizontal with missing until the number of observations for BRCM is equal to at least "window" (remember the rolling correlations is basically a moving window that scans down the dataset) so for the next month of correlations the corr table for BRCM would have missing values and then (after the 1st month of observations of BRCM has appeared) would start with corr table values based on the months worth of observations.

date VTSS XLNX SNDK BRCM

19980416 -0.003472 -0.027143 -0.032967

19980417 0.016260 -0.007342 -0.002841

19980420 0.026286 0.041420 0.005698 -0.039627

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.000000

19980423 -0.002183 -0.043357 -0.027473 -0.040724

Hope this clarifies things a bit.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 05:13 AM

After check documentation about proc corr,The following is what you want?

[pre]

PROC CORR excludes observations with missing values in the WEIGHT and FREQ variables. By

default, PROC CORR uses pairwise deletion when observations contain missing values. PROC

CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations.

Therefore, the correlation statistics might be based on different numbers of observations.

If you specify the NOMISS option, PROC CORR uses listwise deletion when a value of the VAR or

WITH statement variable is missing. PROC CORR excludes all observations with missing values

from the analysis. Therefore, the number of observations for each pair of variables is identical.

The PARTIAL statement always excludes the observations with missing values by automatically

invoking the NOMISS option. With the NOMISS option, the data are processed more efficiently

because fewer resources are needed. Also, the resulting correlation matrix is nonnegative definite.

In contrast, if the data set contains missing values for the analysis variables and the NOMISS option

is not specified, the resulting correlation matrix might not be nonnegative definite. This leads to

several statistical difficulties if you use the correlations as input to regression or other statistical

procedures.

[/pre]

Ksharp

[pre]

PROC CORR excludes observations with missing values in the WEIGHT and FREQ variables. By

default, PROC CORR uses pairwise deletion when observations contain missing values. PROC

CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations.

Therefore, the correlation statistics might be based on different numbers of observations.

If you specify the NOMISS option, PROC CORR uses listwise deletion when a value of the VAR or

WITH statement variable is missing. PROC CORR excludes all observations with missing values

from the analysis. Therefore, the number of observations for each pair of variables is identical.

The PARTIAL statement always excludes the observations with missing values by automatically

invoking the NOMISS option. With the NOMISS option, the data are processed more efficiently

because fewer resources are needed. Also, the resulting correlation matrix is nonnegative definite.

In contrast, if the data set contains missing values for the analysis variables and the NOMISS option

is not specified, the resulting correlation matrix might not be nonnegative definite. This leads to

several statistical difficulties if you use the correlations as input to regression or other statistical

procedures.

[/pre]

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 11:24 AM

Hey Ksharp,

Thank you for your answer!

Well, not exactly. NOMISS will just exclude any missing observations from all variables in the corr matrix so instead of having a month of observations for 3 variables out of 4 ( we have month for 3 first vars, and several days for the 4th), i will have the least common denominator so to speak, only 1 day of observations for all 4 vars. The normal way, without the NOMISS option, will use only pairwise deletion, and it's the situation right now, 3 vars are ok, but their correlation with the 4th is based on fewer observations than the window.

A

Thank you for your answer!

Well, not exactly. NOMISS will just exclude any missing observations from all variables in the corr matrix so instead of having a month of observations for 3 variables out of 4 ( we have month for 3 first vars, and several days for the 4th), i will have the least common denominator so to speak, only 1 day of observations for all 4 vars. The normal way, without the NOMISS option, will use only pairwise deletion, and it's the situation right now, 3 vars are ok, but their correlation with the 4th is based on fewer observations than the window.

A

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-03-2011 09:57 PM

Yes. proc corr will exclude missing value in obs. you said "will have the least common denominator " .So you want set the missing value of 4th variable to what ? mean or nearest value ? Like it?

[pre]

date VTSS XLNX SNDK BRCM

19980416 -0.003472 -0.027143 -0.032967**-0.039627**

19980417 0.016260 -0.007342 -0.002841**-0.039627**

19980420 0.026286 0.041420 0.005698**-0.039627**

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.000000

19980423 -0.002183 -0.043357 -0.027473 -0.040724

[/pre]

correlation coefficient is only covariance of standardized data.

Ksharp Message was edited by: Ksharp

[pre]

date VTSS XLNX SNDK BRCM

19980416 -0.003472 -0.027143 -0.032967

19980417 0.016260 -0.007342 -0.002841

19980420 0.026286 0.041420 0.005698

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.000000

19980423 -0.002183 -0.043357 -0.027473 -0.040724

[/pre]

correlation coefficient is only covariance of standardized data.

Ksharp Message was edited by: Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-04-2011 02:40 AM

A

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-04-2011 06:50 AM

I think I understand much more .How about:

Maybe you need post some more dummy data would be helpful,But I have to leave,Tomorrow I will be here.

I recode it just to optimize it. How about this:

[pre]

data temp;

infile datalines truncover;

input date : yymmdd10. VTSS XLNX SNDK BRCM ;

format date yymmddn8.;

datalines;

19980416 -0.003472 -0.027143 -0.032967

19980417 0.016260 -0.007342 -0.002841 0.00456

19980418 0.012160 -0.007342 -0.031841

19980419 0.016420 -0.007342 -0.002841

19980420 0.026286 0.041420 0.005698 -0.039627

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.002353

19980423 -0.002183 -0.043357 -0.027473 -0.040724

19980424 -0.002134 -0.043357 -0.026773 -0.035524

19980425 -0.002233 -0.043357 -0.027873

19980426 -0.004333 -0.043357 -0.024573 -0.040214

19980427 -0.003233 -0.053357 -0.067873 -0.031214

19980428 -0.003233 -0.023357 -0.067873 -0.034214

;

run;

options nomprint nomlogic symbolgen;

%Macro EliLoop(FromYear=16apr1998,ToYear=24apr1998,window=3,TickerList=VTSS XLNX SNDK BRCM);

proc corr data=temp(obs=10) outp=_want noprint;

var &TickerList;

run;

data want;

set _want;

stop;

run;

%let fromyear= %sysfunc(inputn(&fromyear,date9.) );

%let toyear=%sysfunc(inputn(&toyear,date9.));

%do time= &FromYear %to &ToYear;

%let missing=N;

data op;

set temp (where=( date between &time and (&time + &window)));

if missing(BRCM) then call symputx('missing','Y');

run;

%put &missing;

%if &missing eq N %then %do;

proc corr data=op outp=outp noprint;

var &TickerList;

run;

proc append base=want data=outp force;

run;

%End;

%end;

%Mend;

%eliloop()

[/pre]

Ksharp Message was edited by: Ksharp

Maybe you need post some more dummy data would be helpful,But I have to leave,Tomorrow I will be here.

I recode it just to optimize it. How about this:

[pre]

data temp;

infile datalines truncover;

input date : yymmdd10. VTSS XLNX SNDK BRCM ;

format date yymmddn8.;

datalines;

19980416 -0.003472 -0.027143 -0.032967

19980417 0.016260 -0.007342 -0.002841 0.00456

19980418 0.012160 -0.007342 -0.031841

19980419 0.016420 -0.007342 -0.002841

19980420 0.026286 0.041420 0.005698 -0.039627

19980421 0.008909 -0.001420 -0.008499 0.072816

19980422 0.011038 0.017070 0.040000 0.002353

19980423 -0.002183 -0.043357 -0.027473 -0.040724

19980424 -0.002134 -0.043357 -0.026773 -0.035524

19980425 -0.002233 -0.043357 -0.027873

19980426 -0.004333 -0.043357 -0.024573 -0.040214

19980427 -0.003233 -0.053357 -0.067873 -0.031214

19980428 -0.003233 -0.023357 -0.067873 -0.034214

;

run;

options nomprint nomlogic symbolgen;

%Macro EliLoop(FromYear=16apr1998,ToYear=24apr1998,window=3,TickerList=VTSS XLNX SNDK BRCM);

proc corr data=temp(obs=10) outp=_want noprint;

var &TickerList;

run;

data want;

set _want;

stop;

run;

%let fromyear= %sysfunc(inputn(&fromyear,date9.) );

%let toyear=%sysfunc(inputn(&toyear,date9.));

%do time= &FromYear %to &ToYear;

%let missing=N;

data op;

set temp (where=( date between &time and (&time + &window)));

if missing(BRCM) then call symputx('missing','Y');

run;

%put &missing;

%if &missing eq N %then %do;

proc corr data=op outp=outp noprint;

var &TickerList;

run;

proc append base=want data=outp force;

run;

%End;

%end;

%Mend;

%eliloop()

[/pre]

Ksharp Message was edited by: Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-09-2011 11:35 PM

Hi KSharp, thanks for the code!

I analyzed and applied the code, i think it works fine for one firm, BRCM, but if i want to apply the missing argument to many firms (changing it to say cmiss()), so that it would dynamically calculate correlations only for the firms (vars) with full observations inside the window, it doesn't seem to work. I think because the symputx assigns a global macro var ( global in the sense for all vars) to govern over the execution status of the proc corr block.

Sorry for delay, non-research duties :/

Thanks,

Arsenio

I analyzed and applied the code, i think it works fine for one firm, BRCM, but if i want to apply the missing argument to many firms (changing it to say cmiss()), so that it would dynamically calculate correlations only for the firms (vars) with full observations inside the window, it doesn't seem to work. I think because the symputx assigns a global macro var ( global in the sense for all vars) to govern over the execution status of the proc corr block.

Sorry for delay, non-research duties :/

Thanks,

Arsenio

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-10-2011 12:38 AM

Yes. But you can split the datasets into many sub-datasets then use this code.

After all are done, using proc append to union them all.

Ksharp

After all are done, using proc append to union them all.

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2017 12:31 PM

Arsenio-

Did you ever get this figured out? I want to do the same thing.

Thanks,

Melanie