Hi,
I am an almost new SAS user. I have a query that I am writing below.
I have many datasets and each dataset contains many different variables (firms). These datasets contain time series of price data of those firms. I need to perform some equations for all those variables. For this, I formed the array. But the problem is that I need to define arrays for each equation. And since each dataset has different number of variables, I need to change the number of array elements for each dataset. Is there any way to reduce this task? I think macro can solve this. Any suggested solutions will be highly appreciated.
Abu Chowdhury
Since you are explicitly listing the variables you no longer need the tricks to skip the extra numeric variable for DATE. So take out the extra SET and the +1 in the array indexing.
You can't use the same list of variables for all of the arrays. Since except for the input list (array VAR) you are making NEW variables you do not need to list the variable names since SAS will automatically create them for you by adding numeric suffix to the array name.. If it is less confusing you can replace "array rvar(&nvar);" with "array rvar rvar1 - rvar&nvar;".
%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/
%let nvar = %sysfunc(countw(&varlist,%str( )));
data P20040by01;
set Benchmar.P20040by0 (keep = date &varlist);
*set Benchmar.P20040by0;
array var (&nvar) &varlist;
array rvar (&nvar) &varlist;
array gvar (&nvar) &varlist;
array lgvar (&nvar) &varlist;
array ret (&nvar) &varlist;
do i = 1 to &nvar;
rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));
gvar(i) = rvar(i)+1;
lgvar(i) = lag(gvar(i));
end;
rgla = mean( of lgvar(*) );
do i = 1 to &nvar;
ret(i) = (rvar(i)*lgvar(i))/rgla;
end;
run;
It sounds like a task that could be solved.
But please try to explain your requirement in more detail, perhaps with sample data set structure(s), and equation logic you need to apply to the variables.
I wrote the following codes:
data P20040by01;
set Benchmar.P20040by0 (keep = Date CA45245E1097 CH0018666781 … … … US98974P1003;
Array var {104} CA45245E1097 CH0018666781 … … … US98974P1003;
Array rvar {104} r1 r2 … … … r104;
Array gvar {104} rg1 rg2 … … … rg104;
Array lgvar {104} rgl1 rgl2 … … … rgl104;
do i = 1 to 104;
rvar{i} = (var{i}-lag(var{i}))/lag(var{i});
gvar {i} = rvar{i}+1;
lgvar {i} = lag(gvar {i});
end;
rgla = mean (rgl1, rgl2, … … … rgl104);
Array ret {104} ret1 ret2 … … … ret104;
do i = 1 to 104;
ret {i} = (rvar{i}*lgvar{i})/rgla;
end;
run;
Description of array names are as follows:
var: price data of each variable. Each variable is a firm.
rvar: (price –lag(price))/lag(price) i.e. rvar is the daily return
gvar: rvar + 1 i.e. gvar is the gross return
lgvar: lag(gvar) i.e. lgvar is the prior day gross return
rgla: mean(lgvar) i.e. rgla is the average of prior day gross return of the stocks
ret: return i.e. ret is the daily return multiplied by the prior day gross
return divided by the average of prior day gross return of stocks
In this dataset there are 104 variables. I need to perform those equations for all variables. For this, I formed arrays for each equation. For other datasets, I have different number of variables with different names. So I need to change the number of elements in each of the arrays I wrote for each dataset. I am attaching the dataset.
Is there any way so that I can reduce this task? I think macro can help. Let me know if you need more information.
If you want to treat all numeric variables except DATE as price data for firms then your data step code can be much simpler.
Assuming you know the number of variables then you could write your data step like this. Note the extra SET to make sure that DATE is the first variable and the +1 in the indexes into the VAR() array to skip over the DATE variable. Also I think you will have an issue getting the LAG() stacks to work since for the first observation LAG(VAR(x)) will be missing. What do you want the value of RVAR to be for the first date recorded?
%let nvar=104 ;
data P20040by01;
set P20040by0 (keep=date);
set P20040by0 ;
array var _numeric_ ;
array rvar (&nvar);
array gvar (&nvar);
array lgvar (&nvar);
array ret (&nvar);
do i = 1 to &nvar;
rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));
gvar(i) = rvar(i)+1;
lgvar(i) = lag(gvar(i));
end;
rgla = mean( of gvar(*) );
do i = 1 to &nvar;
ret(i) = (rvar(i)*lgvar(i))/rgla;
end;
run;
If you do NOT know the number of variables then this simple data step will find it for you.
data _null_;
set P20040by0 ;
array var _numeric_;
call symputx('nvar',dim(var)-1);
stop;
run;
Thanks a lot for your suggestions. It works. But I find the different results from results generated by my previous code. Why is this? Again if I want to keep these 104 variables (firms in my dataset) from the main dataset (where I have more than 2000 firms), then how can I select those 104 variables if I write %let nvar = 104 and set P20040by0 (keep = date)? I am eagerly waiting for your answer.
Actually I will start from 1/1/2004 in my analysis. Since I know that first two observations will be blank because of my equations, so I started my dataset from the date of 30/12/2003. I will remove those two blank observations (for 30/12/2003 and 31/12/2003) by using if ret1 =. then delete; Please let me know whether it's ok or not.
If you only want to use some of the variables then you must list them. If you list them then you can use COUNTW() function to find the number.
%let varlist= CA45245E1097 ... US98974P1003 ;
%let nvars = %sysfunc(countw(&varlist,%str( )));
...
set Benchmar.P20040by0 (keep = Date &varlist);
...
array var (&nvars) &varlist ;
If there is a pattern to the names then you can use variable lists to make it easier.
For example if they are the only variables that start with the letter C you could write
set P20040by0 (keep=date c:);
Or if they are in uninterrupted order in the dataset you could write:
set Benchmar.P20040by0 (keep = Date CA45245E1097 -- US98974P1003);
Hi, Thanks for your helpful suggestions. But this time something is getting wrong. Let me explain. Say, P20040by0 dataset has 2271 firms and date. From this dataset I want to keep those selected 104 firms and date. I have the following queries: Do I need to write all the variable names in %let varlist= ? Again, do I need to remove the _numeric_ from array var (that you suggested previously)? Finally, don't I need to write &varlist in other arrays?
I wrote the following codes but SAS log says error and 0 observation. Please correct me. You are highly appreciated for everything.
%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/
%let nvar = %sysfunc(countw(&varlist,%str( )));
data P20040by01;
set Benchmar.P20040by0 (keep = date &varlist);
set Benchmar.P20040by0;
array var (&nvar) &varlist;
array rvar (&nvar) &varlist;
array gvar (&nvar) &varlist;
array lgvar (&nvar) &varlist;
array ret (&nvar) &varlist;
do i = 1 to &nvar;
rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));
gvar(i) = rvar(i)+1;
lgvar(i) = lag(gvar(i));
end;
rgla = mean( of lgvar(*) );
do i = 1 to &nvar;
ret(i) = (rvar(i)*lgvar(i))/rgla;
end;
run;
Since you are explicitly listing the variables you no longer need the tricks to skip the extra numeric variable for DATE. So take out the extra SET and the +1 in the array indexing.
You can't use the same list of variables for all of the arrays. Since except for the input list (array VAR) you are making NEW variables you do not need to list the variable names since SAS will automatically create them for you by adding numeric suffix to the array name.. If it is less confusing you can replace "array rvar(&nvar);" with "array rvar rvar1 - rvar&nvar;".
%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/
%let nvar = %sysfunc(countw(&varlist,%str( )));
data P20040by01;
set Benchmar.P20040by0 (keep = date &varlist);
*set Benchmar.P20040by0;
array var (&nvar) &varlist;
array rvar (&nvar) &varlist;
array gvar (&nvar) &varlist;
array lgvar (&nvar) &varlist;
array ret (&nvar) &varlist;
do i = 1 to &nvar;
rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));
gvar(i) = rvar(i)+1;
lgvar(i) = lag(gvar(i));
end;
rgla = mean( of lgvar(*) );
do i = 1 to &nvar;
ret(i) = (rvar(i)*lgvar(i))/rgla;
end;
run;
Hi, thank you very very much. Everything is fine now.
By the by, can I have your email address if you don't mind? My email address is rasel0608@yahoo.com. I will be very happy if you send email to me so that I can get your email address.
There is a facility on this site for sending private communications.
Hi Tom, back to my previous dataset and the last equation in the code (ret(i) = .........
If I use this then SAS calculates the return for selected firms (say 104) and names such as ret1, ret2, ….. etc. Now this is one of the datasets for 2004. This dataset has 104 firms and say this is dataset1 for year 2004. I will have 100 datasets for each year and this 100 datasets will have returns for 2271 firms (this is my sample size). There will be different firms and different number of firms for each dataset for each year. I have a query:
How will I combine these 100 datasets? Each dataset has variables such as ret1, ret2, … etc. but each dataset has different firms. But I need to identify the returns for each firm for all 2271 firms for those 100 datasets because I need to calculate returns for those firms in 2005 and other years as well. Now, say dataset1 in 2005 has different firms and different number of firms than those of 2004 i.e. the firm that was in dataset1 for 2004 may be in another dataset for 2005. So I need to identify the returns for each firm so that I can form a time series dataset (say from 2004 to 2010) for returns of each firm.
In summary, 100 datasets will be combined for each year and this combined dataset will have returns for all 2271 firms. So there will be one dataset for every year. Then another dataset will have to be formed combining all yearly datasets.
I am not sure whether I could explain properly. Please let me know if you do not understand properly.
If you are getting different results hen one way to help debug is to try running i both ways for just one or two firms and use PROC COMPARE to see where the differences are.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.