DATA Step, Macro, Functions and more

Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Reply
Contributor
Posts: 65

Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Hey guys,

A quick question related to my rolling correlations discussion in a different subforum. I'm trying to operate directly on proc corr output results and substitute rows or cols or specific ranges of values with missing values.

say i have this output

_TYPE_ _NAME_ ADI ALTR AMAT AMD AMKR
MEAN 0.00 0.00 0.00 0.00 -0.01
STD 0.04 0.04 0.03 0.04 0.05
N 126 126 126 126 63
CORR ADI 1.00 0.52 0.46 0.33 0.28
CORR ALTR 0.52 1.00 0.54 0.22 0.34
CORR AMAT 0.46 0.54 1.00 0.38 0.32
CORR AMD 0.33 0.22 0.38 1.00 0.25
CORR AMKR 0.28 0.34 0.32 0.25 1.00

so i want to find all columns/vars with N less than some dynamic nr, say100, and then populat just the corresponding column and row with missing values. So in the example above, the end result should be like this

_TYPE_ _NAME_ ADI ALTR AMAT AMD AMKR
MEAN 0.00 0.00 0.00 0.00 -0.01
STD 0.04 0.04 0.03 0.04 0.05
N 126 126 126 126 63
CORR ADI 1.00 0.52 0.46 0.33 .
CORR ALTR 0.52 1.00 0.54 0.22 .
CORR AMAT 0.46 0.54 1.00 0.38 .
CORR AMD 0.33 0.22 0.38 1.00 .
CORR AMKR . . . . .

been trying to approach this problem from several angles, but right now other than call R within SAS, i see no way. There should be an elegant SAS-wise solution i think. It's somewhat urgent and the code should be dynamic enough to adjust for different n, several variables with N less than some nr and lots of variables.

Thanks!!

Arsenio
Super User
Posts: 10,018

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Posted in reply to Arsenio_Staer
Oh.It much more clear. Once you post what you want output dataset to look like.
How about:


[pre]
data temp;
infile datalines truncover;
input date : yymmdd10. VTSS XLNX SNDK BRCM ;
format date yymmddn8.;
datalines;
19980416 -0.003472 -0.027143 -0.032967
19980417 0.016260 -0.007342 -0.002841 0.00456
19980418 0.012160 -0.007342 -0.031841
19980419 0.016420 -0.007342 -0.002841
19980420 0.026286 0.041420 0.005698 -0.039627
19980421 0.008909 -0.001420 -0.008499 0.072816
19980422 0.011038 0.017070 0.040000 0.002353
19980423 -0.002183 -0.043357 -0.027473 -0.040724
19980424 -0.002134 -0.043357 -0.026773 -0.035524
19980425 -0.002233 -0.043357 -0.027873
19980426 -0.004333 -0.043357 -0.024573 -0.040214
19980427 -0.003233 -0.053357 -0.067873 -0.031214
19980428 -0.003233 -0.023357 -0.067873 -0.034214
;
run;
options nomprint nomlogic nosymbolgen;
%Macro EliLoop(FromYear=16apr1998,ToYear=24apr1998,window=3,TickerList=VTSS XLNX SNDK BRCM,sort_var=BRCM);


proc corr data=temp(obs=10) outp=_want noprint;
var &TickerList;
run;
data want;
set _want;
stop;
run;

%let fromyear= %sysfunc(inputn(&fromyear,date9.) );
%let toyear=%sysfunc(inputn(&toyear,date9.));
%do time=&FromYear %to &ToYear;
data op;
set temp (where=( date between &time and (&time + &window)));
run;

proc sort data=op;
by &sort_var;
run;

data op;
set op;
retain missing 'N';
if missing(BRCM) then missing='Y';
if missing='Y' then call missing(BRCM);
run;


proc corr data=op outp=outp noprint;
var &TickerList;
run;

proc append base=want data=outp force;
run;
%end;
%Mend;

%eliloop()
[/pre]


Ksharp Message was edited by: Ksharp
Contributor
Posts: 65

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Hey Ksharp, thanks again!

Can i adjust this code, so it's applicable to any variable that might have some obs missing? I have to repeat this analysis for many subperiods and over 1000+ subsets of variables from say 8 to 2000 variables per subset. I will never know ex ante which ones start and which ones end during the estimation period so i can't prespecify the variable that might have some observations missing like BRCM in this case.

Any ideas anybody?

Thanks a lot, especially to KSharp, for his help!
Super User
Posts: 10,018

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Posted in reply to Arsenio_Staer
OK. It is much more complicated.
Assuming each dataset has a variable date.
And I only process one dataset, you can expand it to process multi-datasets by yourself.
I think I am too tired.

[pre]
data temp;
infile datalines dsd truncover dlm=' ';
input date : yymmdd10. VTSS XLNX SNDK BRCM ;
format date yymmddn8.;
datalines;
19980416 -0.003472 -0.027143 -0.032967
19980417 0.016260 -0.007342 -0.002841 0.00456
19980418 0.012160 -0.007342 -0.031841 0.002841
19980419 0.016420 -0.007342 -0.002841 0.002841
19980420 0.026286 0.041420 0.005698 -0.039627
19980421 0.008909 -0.001420 -0.008499 0.072816
19980422 0.017070 0.040000 0.002353 0.002841
19980423 -0.002183 -0.043357 -0.027473 -0.040724
19980424 -0.002134 -0.026773 -0.035524
19980425 -0.002233 -0.043357 -0.027873
19980426 -0.004333 -0.043357 -0.024573 -0.040214
19980427 -0.003233 -0.053357 -0.067873 -0.031214
19980428 -0.003233 -0.023357 -0.067873 -0.034214
19980429 -0.002133 -0.043357 -0.067233 -0.023214
;
run;

%Macro EliLoop(lib=work,dataset=temp,window=3);

proc sql noprint;
select name into : name_list separated by ' '
from dictionary.columns
where libname="%upcase(&lib)" and memname="%upcase(&dataset)" and name ne 'date';
quit;
proc corr data=&lib..&dataset(obs=10) outp=_want noprint;
var &name_list;
run;
data want_&dataset; /*unique output dataset for each dataset*/
set _want;
stop;
run;


proc sort data=&dataset;
by date;
run;

data _null_;
set &lib..&dataset end=last;
if _n_ eq 1 then call symputx('from',date);
if last then call symputx('to',date);
run;

%do time=&from %to &to-&window;
data op;
set &lib..&dataset(where=( date between &time and (&time + &window))) end=last;
array varname{*} &name_list;
array missing_var{10000} $50 _temporary_ ;
do i=1 to dim(varname);
if missing(varname{i}) then missing_var{i}=vname(varname{i});
end;
if last then call symputx('missing_var',catx(' ',of missing_var{*}));
run;
/*%put _user_; */
data op;
set op;
call missing(of &missing_var);
run;
proc corr data=op outp=outp noprint;
var &name_list;
run;

proc append base=want_&dataset data=outp force;
run;
%end;



%mend;

%eliloop()


[/pre]


Ksharp
Contributor
Posts: 65

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Thanks, Ksharp!

I used some tricks from your code, and i think it works now! Spent hrs looking for smth like vname function before i saw your code. I ran the proc corr normally but i used your code to edit the "outp" dataset to set the column and row in the square corr matrix to missing if the there are less than some nr of observations!!! I think it works ok now.

I wonder you are the only one that helps out here or are there more people? Anyway, thanks a lot and have a great weekend!

A
Super User
Posts: 10,018

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Posted in reply to Arsenio_Staer
Hi.
I think there are more people just as me to help you in this forum.
Such as : Peter.C Patrick Art.C Art.T data _null_ ...... they are all seasoned programmer and
enthusiastic warm-hearted man.
If you have some other problem in near future, I am sure they will be happy to help you.

Good Luck.
Ksharp
Respected Advisor
Posts: 3,799

Re: Vector logic (as in R): how to cond-ly operate on ranges in a dataset?

Posted in reply to Arsenio_Staer
Your description of the problem was difficult for me to understand and since you mention R and I don't know R jargon I passed. Reading your "description" more carefull I believe this does what you want. The program should be general enough for any size correlation matrix. You will need to learn some SAS to understand it.

[pre]
data corr;
input _TYPE_ :$8. _NAME_ :$32. ADI ALTR AMAT AMD AMKR;
cards;
MEAN . 0.00 0.00 0.00 0.00 -0.01
STD . 0.04 0.04 0.03 0.04 0.05
N . 126 26 126 126 63
CORR ADI 1.00 0.52 0.46 0.33 0.28
CORR ALTR 0.52 1.00 0.54 0.22 0.34
CORR AMAT 0.46 0.54 1.00 0.38 0.32
CORR AMD 0.33 0.22 0.38 1.00 0.25
CORR AMKR 0.28 0.34 0.32 0.25 1.00
;;;;
run;
proc print;
run;
* Index list of target vars;
proc transpose data=corr(drop=_name_) out=lt100(where=(col1 lt 100) index=(_name_));
where _type_ eq 'N';
run;
proc print;
run;
* When _NAME_ is in LT100 set numeric vars to missing;
data corr2 / pgm=corr2;
set corr(where=(_type_ eq 'CORR'));
set lt100(drop=col1) key=_name_/unique;
if _iorc_ eq 0 then do;
call missing(of _numeric_);
end;
_error_ = 0;
run;
* run stored program;
data pgm=corr2;
run;
*Flip it;
proc transpose data=corr2 out=corr;
by _type_;
run;
proc print;
run;
* Run stored program;
data pgm=corr2;
run;
proc print;
run;
[/pre]
Ask a Question
Discussion stats
  • 6 replies
  • 217 views
  • 0 likes
  • 3 in conversation