BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AbuChowdhury
Fluorite | Level 6

Hi,

I am an almost new SAS user. I have a query that I am writing below.

I have many datasets and each dataset contains many different variables (firms). These datasets contain time series of price data of those firms. I need to perform some equations for all those variables. For this, I formed the array. But the problem is that I need to define arrays for each equation. And since each dataset has different number of variables, I need to change the number of array elements for each dataset. Is there any way to reduce this task? I think macro can solve this. Any suggested solutions will be highly appreciated.

Abu Chowdhury

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Since you are explicitly listing the variables you no longer need the tricks to skip the extra numeric variable for DATE. So take out the extra SET and the +1 in the array indexing.

You can't use the same list of variables for all of the arrays. Since except for the input list (array VAR) you are making NEW variables you do not need to list the variable names since SAS will automatically create them for you by adding numeric suffix to the array name.. If it is less confusing you can replace "array rvar(&nvar);"  with "array rvar rvar1 - rvar&nvar;".

%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/

%let nvar = %sysfunc(countw(&varlist,%str( )));

data P20040by01;

  set Benchmar.P20040by0 (keep = date &varlist);

*set Benchmar.P20040by0;

  array var (&nvar) &varlist;

  array rvar (&nvar) &varlist;

  array gvar (&nvar) &varlist;

  array lgvar (&nvar) &varlist;

  array ret (&nvar) &varlist;

  do i = 1 to &nvar;

    rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));

    gvar(i) = rvar(i)+1;

    lgvar(i) = lag(gvar(i));

  end;

  rgla = mean( of lgvar(*) );

  do i = 1 to &nvar;

    ret(i) = (rvar(i)*lgvar(i))/rgla;

  end;

run;

View solution in original post

12 REPLIES 12
LinusH
Tourmaline | Level 20

It sounds like a task that could be solved.

But please try to explain your requirement in more detail, perhaps with sample data set structure(s), and equation logic you need to apply to the variables.

Data never sleeps
AbuChowdhury
Fluorite | Level 6

I wrote the following codes:

data P20040by01;

set Benchmar.P20040by0 (keep = Date CA45245E1097      CH0018666781 … … … US98974P1003;

Array var {104} CA45245E1097  CH0018666781 … … … US98974P1003;

Array rvar {104} r1 r2 … … … r104;

Array gvar {104} rg1 rg2 … … … rg104;

Array lgvar {104} rgl1 rgl2 … … … rgl104;

do i = 1 to 104;

rvar{i} = (var{i}-lag(var{i}))/lag(var{i});

gvar {i} = rvar{i}+1;

lgvar {i} = lag(gvar {i});

end;

rgla = mean (rgl1, rgl2, … … … rgl104);

Array ret {104} ret1 ret2 … … … ret104;

do i = 1 to 104;

ret {i} = (rvar{i}*lgvar{i})/rgla;

end;

run;


Description of array names are as follows:

var: price data of each variable. Each variable is a firm.

rvar: (price –lag(price))/lag(price)       i.e. rvar is the daily return

gvar: rvar + 1                                         i.e. gvar is the gross return

lgvar: lag(gvar)                                         i.e. lgvar is the prior day gross return

rgla: mean(lgvar)                                    i.e. rgla is the average of prior day gross return of the stocks

ret: return                                                   i.e. ret is the daily return multiplied by the prior day gross

return divided by the average of prior day gross return of stocks



In this dataset there are 104 variables. I need to perform those equations for all variables. For this, I formed arrays for each equation. For other datasets, I have different number of variables with different names. So I need to change the number of elements in each of the arrays I wrote for each dataset. I am attaching the dataset.


Is there any way so that I can reduce this task? I think macro can help. Let me know if you need more information.

Tom
Super User Tom
Super User

If you want to treat all numeric variables except DATE as price data for firms then your data step code can be much simpler.

Assuming you know the number of variables then you could write your data step like this.  Note the extra SET to make sure that DATE is the first variable and the +1 in the indexes into the VAR() array to skip over the DATE variable.  Also I think you will have an issue getting the LAG() stacks to work since for the first observation LAG(VAR(x)) will be missing.  What do you want the value of RVAR to be for the first date recorded?

%let nvar=104 ;

data P20040by01;

  set P20040by0 (keep=date);

  set P20040by0 ;

  array var _numeric_ ;

  array rvar (&nvar);

  array gvar (&nvar);

  array lgvar (&nvar);

  array ret (&nvar);

  do i = 1 to &nvar;

    rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));

    gvar(i) = rvar(i)+1;

    lgvar(i) = lag(gvar(i));

  end;

  rgla = mean( of gvar(*) );

  do i = 1 to &nvar;

    ret(i) = (rvar(i)*lgvar(i))/rgla;

  end;

run;


If you do NOT know the number of variables then this simple data step will find it for you.

data _null_;

  set P20040by0 ;

   array var _numeric_;

   call symputx('nvar',dim(var)-1);

   stop;

run;



AbuChowdhury
Fluorite | Level 6

Thanks a lot for your suggestions. It works. But I find the different results from results generated by my previous code. Why is this? Again if I want to keep these 104 variables (firms in my dataset) from the main dataset (where I have more than 2000 firms), then how can I select those 104 variables if I write %let nvar = 104 and set P20040by0 (keep = date)? I am eagerly waiting for your answer.

Actually I will start from 1/1/2004 in my analysis. Since I know that first two observations will be blank because of my equations, so I started my dataset from the date of 30/12/2003. I will remove those two blank observations (for 30/12/2003 and 31/12/2003) by using if ret1 =. then delete; Please let me know whether it's ok or not.

Tom
Super User Tom
Super User

If you only want to use some of the variables then you must list them.  If you list them then you can use COUNTW() function to find the number.

%let varlist= CA45245E1097 ... US98974P1003 ;

%let nvars = %sysfunc(countw(&varlist,%str( )));

...

set Benchmar.P20040by0 (keep = Date &varlist);

...

array var (&nvars) &varlist ;

If there is a pattern to the names then you can use variable lists to make it easier.

For example if they are the only variables that start with the letter C you could write

set P20040by0 (keep=date c:);

Or if they are in uninterrupted order in the dataset you could write:

set Benchmar.P20040by0 (keep = Date CA45245E1097 -- US98974P1003);

AbuChowdhury
Fluorite | Level 6

Hi, Thanks for your helpful suggestions. But this time something is getting wrong. Let me explain. Say, P20040by0 dataset has 2271 firms and date. From this dataset I want to keep those selected 104 firms and date. I have the following queries: Do I need to write all the variable names in %let varlist= ? Again, do I need to remove the _numeric_ from array var (that you suggested previously)? Finally, don't I need to write &varlist in other arrays?

I wrote the following codes but SAS log says error and 0 observation. Please correct me. You are highly appreciated for everything.

%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/

%let nvar = %sysfunc(countw(&varlist,%str( )));

data P20040by01;

set Benchmar.P20040by0 (keep = date &varlist);

set Benchmar.P20040by0;

array var (&nvar) &varlist;

array rvar (&nvar) &varlist;

array gvar (&nvar) &varlist;

array lgvar (&nvar) &varlist;

array ret (&nvar) &varlist;

do i = 1 to &nvar;

rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));

    gvar(i) = rvar(i)+1;

    lgvar(i) = lag(gvar(i));

  end;

  rgla = mean( of lgvar(*) );

  do i = 1 to &nvar;

    ret(i) = (rvar(i)*lgvar(i))/rgla;

  end;

run;

Tom
Super User Tom
Super User

Since you are explicitly listing the variables you no longer need the tricks to skip the extra numeric variable for DATE. So take out the extra SET and the +1 in the array indexing.

You can't use the same list of variables for all of the arrays. Since except for the input list (array VAR) you are making NEW variables you do not need to list the variable names since SAS will automatically create them for you by adding numeric suffix to the array name.. If it is less confusing you can replace "array rvar(&nvar);"  with "array rvar rvar1 - rvar&nvar;".

%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/

%let nvar = %sysfunc(countw(&varlist,%str( )));

data P20040by01;

  set Benchmar.P20040by0 (keep = date &varlist);

*set Benchmar.P20040by0;

  array var (&nvar) &varlist;

  array rvar (&nvar) &varlist;

  array gvar (&nvar) &varlist;

  array lgvar (&nvar) &varlist;

  array ret (&nvar) &varlist;

  do i = 1 to &nvar;

    rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));

    gvar(i) = rvar(i)+1;

    lgvar(i) = lag(gvar(i));

  end;

  rgla = mean( of lgvar(*) );

  do i = 1 to &nvar;

    ret(i) = (rvar(i)*lgvar(i))/rgla;

  end;

run;

AbuChowdhury
Fluorite | Level 6

Hi, thank you very very much. Everything is fine now.

By the by, can I have your email address if you don't mind? My email address is rasel0608@yahoo.com. I will be very happy if you send email to me so that I can get your email address.

Tom
Super User Tom
Super User

There is a facility on this site for sending private communications.

AbuChowdhury
Fluorite | Level 6

Hi Tom, back to my previous dataset and the last equation in the code (ret(i) = .........

If I use this then SAS calculates the return for selected firms (say 104) and names such as ret1, ret2, ….. etc. Now this is one of the datasets for 2004. This dataset has 104 firms and say this is dataset1 for year 2004. I will have 100 datasets for each year and this 100 datasets will have returns for 2271 firms (this is my sample size).  There will be different firms and different number of firms for each dataset for each year. I have a query:

How will I combine these 100 datasets? Each dataset has variables such as ret1, ret2, … etc. but each dataset has different firms. But I need to identify the returns for each firm for all 2271 firms for those 100 datasets because I need to calculate returns for those firms in 2005 and other years as well. Now, say dataset1 in 2005 has different firms and different number of firms than those of 2004 i.e. the firm that was in dataset1 for 2004 may be in another dataset for 2005. So I need to identify the returns for each firm so that I can form a time series dataset (say from 2004 to 2010) for returns of each firm.

In summary, 100 datasets will be combined for each year and this combined dataset will have returns for all 2271 firms. So there will be one dataset for every year. Then another dataset will have to be formed combining all yearly datasets.

I am not sure whether I could explain properly. Please let me know if you do not understand properly.

Reeza
Super User

Post this as a new question and mark this as answered.

Also, your question isn't clear.

Tom
Super User Tom
Super User

If you are getting different results hen one way to help debug is to try running i both ways for just one or two firms and use PROC COMPARE to see where the differences are.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2595 views
  • 0 likes
  • 4 in conversation