I have two datasets, one with stock prices and the other with Foreign Ownership (See sample attached). I want to calculate the cross sectional deviation of returns (CSAD), based on the level of foreign ownership of each firm.
Specifically, instead of calculating CSAD on all stocks (as in my code), I want to calculate this on stocks with FR>5; so for each day only the firms with FR>5, which is on the second dataset, should be used in order to calculate CSAD in the "data Step1" command .
Another thing I would like to do is create a macro variable with the number of stocks, so that I don't have to manually adjust these on each dataset (numbers in arrays)
Any ideas to sort these out?
Many thanks in advance.
*Import Datasets; PROC IMPORT OUT=PRICES DATAFILE= "C:\Users\Laptop\Desktop\DATA.xlsx" DBMS=XLSX REPLACE; SHEET="Sheet1"; GETNAMES=YES; RUN; PROC IMPORT OUT=FR DATAFILE= "C:\Users\Laptop\Desktop\DATA.xlsx" DBMS=XLSX REPLACE; SHEET="Sheet2"; GETNAMES=YES; RUN; *Remove all variables with 'Error'; proc sql noprint; select trim(compress(name)) into :drop_vars separated by ' ' from SASHELP.VCOLUMN where libname = upcase('WORK') and memname = upcase('PRICES') and upcase(name) like '%ERROR%' ; quit; %put &drop_vars.; data PRICE; set PRICES; drop &drop_vars.; run; proc sql noprint; select trim(compress(name)) into :drop_vars separated by ' ' from SASHELP.VCOLUMN where libname = upcase('WORK') and memname = upcase('FR') and upcase(name) like '%ERROR%' ; quit; %put &drop_vars.; data FR2; set FR; drop &drop_vars.; run; *Calculate Cross Sectional Absolute Deviation of Returns; data Step1; set PRICE; N= n(of TELECOM_ARGENTINA--GRUPO_FINO_GALICIA_DEAD___DELIST); array vars(*) TELECOM_ARGENTINA--GRUPO_FINO_GALICIA_DEAD___DELIST; array diff(179); do i=1 to dim(vars); diff(i)=log(vars(i)/lag(vars(i))); end; Average= mean(of diff1-diff179); array var(*) diff1--diff179; array ar(179); do i=1 to dim(var); ar(i)=(var(i)-average); end; array vas(*) ar1--ar179; array ab(179); do i=1 to dim(vas); ab(i)=abs(vas(i)); end; run; data Step2; set Step1; CSAD= sum(of ab1-ab179)/n; abs=abs(Average); sqr=Average*Average; keep date average CSAD abs sqr; run
I dont have too much time to actually try this. I'm just trying to give you an overall idea.
However my suggestion is only viable if the following is true from your write up and my quick read.
In your write up you are trying to avoid manually figuring out the "n" in which to divide or do the calculation by.
If that is the case (generally) then here is my suggestion mock up.
I suggest using the idea of letting the "data" drive code (in this case format) to then use within your code.
I hope my pseudo code allows the suggested idea to come thru for you to then put into your own work.
/** take data and create a freq of ownership to get 'n' ***/ proc freq data=work.fr ; table var_to_count ; output out=work.counted; run;
/** you will need to play with this to get the desired columns to use later **/
/** in my idea you would need the column for ownership and the 'count' **/ data work.counted; set work.counted; /** prep for use with format and cntlin **/ /** create label, fmt name, fmt type etc **/
/** the link below in proc format shows a good example of proc format cntlin **/
/** also look up Wendi Wrights papers on proc format because she rocks with it **/ run; proc format cntlin=work.counted; /** see example: http://www2.sas.com/proceedings/forum2007/068-2007.pdf **/ run; /** then set n value based on count to data using format **/ data work.nowtry; set work.fr; count= put(var_to_apply_count , $fmtname.); /** then do your math using new count var **/ run;
How do you determine when FR>5? I don't see a variable of that name, FR, in either sheet of your example.
I also suspect that this code:
data FR2; set FR; drop _ERR: ; run;
might get rid of all of those error variables without the SQL and macro variable.
I personally would be very concerned about receiving input data with that many column headings of Error.
Thanks for your reply. Data is from Thomson Datastream so this is what I have to work with. Sheet 1 contains prices and Sheet 2 contains FR for the same stocks (not all stocks have a vailable data on FR, so these are less). So, what I want to do is keep the observations in prices (sheet1 or dataset Prices) only when for the same day the FR of the company is >5 (Sheet 2 or dataset FR). Eventually what I want to do seems like that:
data prices; input Date $ 1-10 ARG YPF BBVA CENTRAL ; cards; 01/01/2003 3.21 5.27 7.23 9.88 02/01/2003 3.23 5.30 7.20 9.78 03/01/2003 3.26 5.32 7.18 9.70 04/01/2003 3.23 5.38 7.20 9.71 05/01/2003 3.20 5.35 7.19 9.80 ;;; data FR; input Date $ 1-10 ARG YPF BBVA CENTRAL ; cards; 01/01/2003 3 11 7 . 02/01/2003 3 11 7 . 03/01/2003 3 11 7 . 04/01/2003 6 11 4 . 05/01/2003 6 11 4 . ;;; data need; input Date $ 1-10 ARG YPF BBVA CENTRAL ; cards; 01/01/2003 . 5.27 7.23 . 02/01/2003 . 5.30 7.20 . 03/01/2003 . 5.32 7.18 . 04/01/2003 3.23 5.38 . . 05/01/2003 3.20 5.35 . . ;;;
Please see the attached excel file. This is how the data is downloaded from Datastream. The stock prices are in Sheet 1 (e.g. TELECOM ARGENTINA) and the foreign holdings (FR) in Sheet 2 (e.g. TELECOM ARGENTINA - FOREIGN HOLDINGS). So, I would like to keep the stock price of TELECOM ARGENTINA for every observation only when TELECOM ARGENTINA - FOREIGN HOLDINGS is greater than 5 for the same observation.
I just thought that it would be easier to have the same names in the two databases and maybe merge somehow the two of them.
I hope this helps.
I managed to figure out a way to do what I want by transposing the two datasets and merging them. If someone has any suggestions on how to make this faster, please feel free to comment.
By the way, how could I create a macro to count the variables in order not to mannually write each time COL1--COL25 and then in each of the arrays the number of variables?
*Import Datasets; PROC IMPORT OUT=PRICE DATAFILE= "C:\Users\Laptop\Desktop\DATA.xlsx" DBMS=XLSX REPLACE; SHEET="Sheet1"; GETNAMES=YES; RUN; PROC IMPORT OUT=FR DATAFILE= "C:\Users\Laptop\Desktop\DATA.xlsx" DBMS=XLSX REPLACE; SHEET="Sheet2"; GETNAMES=YES; RUN; *Remove all variables with 'Error'; data price; set price; drop _ERR: ; run; data fr; set fr; drop _ERR: ; run; *transpose and sort tables; proc transpose data=price out=pr(rename=(col1=Price));; by date; var TELECOM_ARGENTINA--GRUPO_FINO_GALICIA_DEAD___DELIST; run; proc sort data=pr; by date _LABEL_; run; proc transpose data=fr out=fr2(rename=(col1=FR));; by date; var TELECOM_ARGENTINA--CENTRAL_PUERTO_SOCIEDAD_ANONIMA; run; proc sort data=fr2; by date _LABEL_; run; *merge tables; data data; merge pr fr2; by date _LABEL_; run; *keep observations with FR>5; data data2; set data; where Fr>5; drop FR _NAME_; run; *Transpose table; proc transpose data=data2 out=data3(drop=_NAME_); by DATE; run; *Calculate Cross Sectional Absolute Deviation of Returns; data Step1; set data3; N= n(of COL1--COL25); array vars(*) COL1--COL25; array diff(25); do i=1 to dim(vars); diff(i)=log(vars(i)/lag(vars(i))); end; Average= mean(of diff1-diff25); array var(*) diff1--diff25; array ar(25); do i=1 to dim(var); ar(i)=(var(i)-average); end; array vas(*) ar1--ar25; array ab(25); do i=1 to dim(vas); ab(i)=abs(vas(i)); end; run; data Step2; set Step1; CSAD= sum(of ab1-ab25)/n; abs=abs(Average); sqr=Average*Average; keep date average CSAD abs sqr; if CSAD=0 then delete; run;
Here is a little helper macro.
%macro NumVars(dataset=, LastVar=, FirstVar=,MvarName=); %global &MvarName; data _null_; dsid = open("&dataset","i"); Last = varnum(dsid,"&LastVar"); First = varnum(dsid,"&FirstVar"); NumVars= Last - first +1; rc = close(dsid); call symputx("&MvarName",Numvars); run; %mend; %NumVars(dataset=price , LastVar = GRUPO_FINO_GALICIA_DEAD___DELIST, FirstVar = TELECOM_ARGENTINA, MvarName = dvar); %put &dvar;
You can specify the name of the resultant macro variable to use in the places where you need the number of variables such as
Col1 - Col&dvar
Array ar( &dvar);
Note this works only because the variables you want are in sequence in the data.
Execute the Macro (the %NUMVARS line) BEFORE the first time that you need the value for any thing but AFTER you have run the macro definition code (the code that starts with %macro and ends with %mend).
Bad things will happen if you don't put the first and last variable names in the right place, misspell them or point to the wrong dataset.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.