BookmarkSubscribeRSS Feed
Arsenio_Staer
Calcite | Level 5
Hey guys,

I'm programming a rolling correlations macro on something like 18 variables in column form each starting at a different date and ending at a different date. My window is approx 250 days, total duration of the data is 16 yrs. Code collects correlation results through OUTP= option and appends them in a dataset. I have found one problem though, if i specify a global window, it will start from the first 250 days but then when some variables start to have observations it will calculate correlations using those few observations. so 5 variables will have say 250 observations each one, and then 1 variable will have 1 observation and that variable correlation will be calculated using that 1 observation. I would use vector logic if i worked in R, but idk how to do it in SAS. It's a bit hard to visualize/explain so i'm pasting the code below:
the macro vars are initialized before and the macro itself runs good...

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;
%Let RollingDate=%sysevalf(&time.+&window.);
proc corr data=work.tseries outp=outp noprint;
where &time LE date LE &RollingDate;
var &TickerList;
run;

data outp;
set outp;
Start= &time;
End= &RollingDate;
format start date9.;
format end date9.;
run;
data RollingCorr;
set RollingCorr outp;
run;
%if &RollingDate. GE &ToYear %then %goto leave;

%End;
%leave:
%Mend;

The resulting data piece is below (note copy paste messes up the table a bit):
MEAN 0.001694017 0.313450873 0.002982988 05Mar1995
STD 0.031866551 0.327143332 0.033331209 05Mar1995
N 176 2 176 05Mar1995
CORR ADI 0.529862765 1 0.531684736 05Mar1995
CORR ALTR 0.583095622 1 0.534006834 05Mar1995

note the middle column N =2, that variable just started observations so the correlations are obviously are wrong. We should be able either to limit proc corr to calculate the correlation only if amount nonmissing observation for that variable is above certain number or somehow edit the table afterwards and set row and column in the corr matrix to missing if the N right above is less than some number? Or maybe i'm missing some very simple solution SAS style 🙂

Would really appreciate any help!

Thanks,

Arsenio

Message was edited by: Arsenio_Staer Message was edited by: Arsenio_Staer
14 REPLIES 14
Arsenio_Staer
Calcite | Level 5
Hey guys,

Is my post above cut off? My post is long but the code looks cut off.

A
art297
Opal | Level 21
Arsenio,

Yes your post didn't include all of your code. It only showed:

%Macro EliLoop;

%Do time=&FromYear %to &ToYear;
%Let RollingDate=%sysevalf(&time.+&window.);
proc corr data=work.tseries outp=outp noprint;
where &time.

When you repost it, I'd suggest that you also include some sample data. If there is some character in your code that is limiting the inclusion of your full post, you can identify it by simply clicking on preview before actually submitting the post.

HTH,
Art
Arsenio_Staer
Calcite | Level 5
art297, thanks!!

Apparently less than equal sign automatically stops the post so i substituted it by LE but apparently the table got messed up quite a bit at the end 😞 . Didn't see the preview button in the post editing window.

Hope this is at least understandable for you guys. Sorry for the hassle.

Arsenio

p.s. if the admins are reading this it would be great to have some html capability in the post editing window to make the code more readable ( perhaps some markup code for sas syntax) and also some sort of the copy paste html table would be awesome! 🙂
Tim_SAS
Barite | Level 11
> p.s. if the admins are reading this it would be great
> to have some html capability in the post editing
> window to make the code more readable ( perhaps some
> markup code for sas syntax) and also some sort of the
> copy paste html table would be awesome! 🙂

Until that happy day, there's this: http://support.sas.com/forums/thread.jspa?messageID=27609毙
Ksharp
Super User
It would be helpful,if post some data and explain what you want base these data.


Ksharp
Arsenio_Staer
Calcite | Level 5
Hi,

Basically, it's a rolling correlation with a certain window (nr of obs to calculate the correlations with) with unequal starts and unequal endings for many variables. With a global start and date, the results in the corr table can be biased for variables that end or start at different dates than the global dates

Example below, BRCM just started observations but the rolling correlations say a month before with a month window will produce correlation for BRCM with the other 3 variables using only 1 observation of BRCM on 19980420 which obviously is not what we want.
The way i visualize this, I would fill the corr table for BRCM both vertical and horizontal with missing until the number of observations for BRCM is equal to at least "window" (remember the rolling correlations is basically a moving window that scans down the dataset) so for the next month of correlations the corr table for BRCM would have missing values and then (after the 1st month of observations of BRCM has appeared) would start with corr table values based on the months worth of observations.

date VTSS XLNX SNDK BRCM
19980416 -0.003472 -0.027143 -0.032967
19980417 0.016260 -0.007342 -0.002841
19980420 0.026286 0.041420 0.005698 -0.039627
19980421 0.008909 -0.001420 -0.008499 0.072816
19980422 0.011038 0.017070 0.040000 0.000000
19980423 -0.002183 -0.043357 -0.027473 -0.040724

Hope this clarifies things a bit.
Ksharp
Super User
After check documentation about proc corr,The following is what you want?
[pre]
PROC CORR excludes observations with missing values in the WEIGHT and FREQ variables. By
default, PROC CORR uses pairwise deletion when observations contain missing values. PROC
CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations.
Therefore, the correlation statistics might be based on different numbers of observations.
If you specify the NOMISS option, PROC CORR uses listwise deletion when a value of the VAR or
WITH statement variable is missing. PROC CORR excludes all observations with missing values
from the analysis. Therefore, the number of observations for each pair of variables is identical.
The PARTIAL statement always excludes the observations with missing values by automatically
invoking the NOMISS option. With the NOMISS option, the data are processed more efficiently
because fewer resources are needed. Also, the resulting correlation matrix is nonnegative definite.
In contrast, if the data set contains missing values for the analysis variables and the NOMISS option
is not specified, the resulting correlation matrix might not be nonnegative definite. This leads to
several statistical difficulties if you use the correlations as input to regression or other statistical
procedures.
[/pre]

Ksharp
Arsenio_Staer
Calcite | Level 5
Hey Ksharp,

Thank you for your answer!

Well, not exactly. NOMISS will just exclude any missing observations from all variables in the corr matrix so instead of having a month of observations for 3 variables out of 4 ( we have month for 3 first vars, and several days for the 4th), i will have the least common denominator so to speak, only 1 day of observations for all 4 vars. The normal way, without the NOMISS option, will use only pairwise deletion, and it's the situation right now, 3 vars are ok, but their correlation with the 4th is based on fewer observations than the window.


A
Ksharp
Super User
Yes. proc corr will exclude missing value in obs. you said "will have the least common denominator " .So you want set the missing value of 4th variable to what ? mean or nearest value ? Like it?
[pre]
date VTSS XLNX SNDK BRCM
19980416 -0.003472 -0.027143 -0.032967 -0.039627
19980417 0.016260 -0.007342 -0.002841 -0.039627
19980420 0.026286 0.041420 0.005698 -0.039627
19980421 0.008909 -0.001420 -0.008499 0.072816
19980422 0.011038 0.017070 0.040000 0.000000
19980423 -0.002183 -0.043357 -0.027473 -0.040724
[/pre]



correlation coefficient is only covariance of standardized data.

Ksharp Message was edited by: Ksharp
Arsenio_Staer
Calcite | Level 5
Tim@SAS, thanks for the info! I will take a look at those posts.

Ksharp, i'm using a rolling correlation macro so i'm advancing the window by 1 day and recalculating correlations, and appending results to a table. My idea was to populate just that row and column (both for BRCM) in the correlation table by missing values until i have at least the necessary amount of observations in BRCM (remember the window is scrolling down or forward in time, so with each day, we have 1 more observation for BRCM until the window is full with observations and we can calculate correlations of BRCM with all other vars). Obviously that should not affect all other variables, for each one of those i have necessary amount of observations so correlations for them (besides BRCM) should be calculated easily. I can attach or post an .xls file with output if that helps. May help other people doing rolling calcs in sas, which in stata is really trivial.

A
Ksharp
Super User
I think I understand much more .How about:
Maybe you need post some more dummy data would be helpful,But I have to leave,Tomorrow I will be here.
I recode it just to optimize it. How about this:


[pre]
data temp;
infile datalines truncover;
input date : yymmdd10. VTSS XLNX SNDK BRCM ;
format date yymmddn8.;
datalines;
19980416 -0.003472 -0.027143 -0.032967
19980417 0.016260 -0.007342 -0.002841 0.00456
19980418 0.012160 -0.007342 -0.031841
19980419 0.016420 -0.007342 -0.002841
19980420 0.026286 0.041420 0.005698 -0.039627
19980421 0.008909 -0.001420 -0.008499 0.072816
19980422 0.011038 0.017070 0.040000 0.002353
19980423 -0.002183 -0.043357 -0.027473 -0.040724
19980424 -0.002134 -0.043357 -0.026773 -0.035524
19980425 -0.002233 -0.043357 -0.027873
19980426 -0.004333 -0.043357 -0.024573 -0.040214
19980427 -0.003233 -0.053357 -0.067873 -0.031214
19980428 -0.003233 -0.023357 -0.067873 -0.034214
;
run;
options nomprint nomlogic symbolgen;
%Macro EliLoop(FromYear=16apr1998,ToYear=24apr1998,window=3,TickerList=VTSS XLNX SNDK BRCM);

proc corr data=temp(obs=10) outp=_want noprint;
var &TickerList;
run;
data want;
set _want;
stop;
run;


%let fromyear= %sysfunc(inputn(&fromyear,date9.) );
%let toyear=%sysfunc(inputn(&toyear,date9.));
%do time= &FromYear %to &ToYear;
%let missing=N;
data op;
set temp (where=( date between &time and (&time + &window)));
if missing(BRCM) then call symputx('missing','Y');
run;
%put &missing;
%if &missing eq N %then %do;
proc corr data=op outp=outp noprint;
var &TickerList;
run;

proc append base=want data=outp force;
run;
%End;
%end;
%Mend;

%eliloop()

[/pre]

Ksharp Message was edited by: Ksharp
Arsenio_Staer
Calcite | Level 5
Hi KSharp, thanks for the code!

I analyzed and applied the code, i think it works fine for one firm, BRCM, but if i want to apply the missing argument to many firms (changing it to say cmiss()), so that it would dynamically calculate correlations only for the firms (vars) with full observations inside the window, it doesn't seem to work. I think because the symputx assigns a global macro var ( global in the sense for all vars) to govern over the execution status of the proc corr block.



Sorry for delay, non-research duties 😕

Thanks,

Arsenio
Ksharp
Super User
Yes. But you can split the datasets into many sub-datasets then use this code.
After all are done, using proc append to union them all.

Ksharp
Melanie1
Calcite | Level 5

Arsenio-

Did you ever get this figured out?  I want to do the same thing.

Thanks,

Melanie

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2761 views
  • 0 likes
  • 5 in conversation