Hello,
I'm trying to take variables from future years and insert them into the same row of data as the current year's value, i.e. take dividends from 2019, 2020 and 2021 and add them to the data row of fiscal year 2018.
I'm trying to do it via the Macro below, my problem is that for some reason in the output, only i=3 is actually present, i=1 and i=2 are not there. In my example I would have the row for fiscal year 2018 and would have the 2021 dividends, but not those from 2019 and 2020.
How can I modify my code to not have this problem?
Thanks in advance!
%Macro Leadvariables (var=, dataset=); %Do i = 1 %to 3; Proc SQL; Create table &dataset. as select a.*, b.&var. as F&i._&var. label = " " From &dataset. as a left join &dataset. as b
On a.gvkey = b.gvkey and a.fyear = b.fyear - &i.; Quit; %End; %Mend;
You are processing the entire dataset three times, once for each desired forward period. I'd suggest processing the data once, in a data step, using an array to build up a complete history of dividends, and then subsequently retrieve those dividends
BTW - variables FYEAR? GVKEY? You are apparently using Compustat data. And it is probably sorted by GVKEY/DATADATE (almost, but not quite the same as sorted by GVKEY/FYEAR). I assume you are looking at annual data. So here is a DATA step to take advantage of the fact that the data are sorted by GVKEY (it won't matter what the order is within GVKEY).
Untested in the absence of sample data in a working data step:
%let minyear=1998;
%let maxyear=2021;
%let upper_bound=%eval(&maxyear+3);
data want (drop=_:);
array _v1_hist{&minyear:&upper_bound} ; /*dividend history*/
array _v2_hist{&minyear:&upper_bound} ; /*some other history*/
do until (last.gvkey); /*Build history of variables of interest */
set have;
by gvkey;
_v1_hist{fyear}=var1;
_v2_hist{fyear}=var2;
end;
/* Now reread the data and fetch values from future years*/
do until (last.gvkey);
set have;
by gvkey;
var1_f1=_v1_hist{fyear+1};
var2_f1=_v2_hist{fyear+1};
var1_f2=_v1_hist{fyear+2};
var2_f2=_v2_hist{fyear+2};
var1_f3=_v1_hist{fyear+3};
var2_f3=_v2_hist{fyear+3};
output;
end;
run;
The arrays are indexed by fiscal year. You can modify the macrovars minyear and maxyear for your data. This code makes the upper bound of the array equivalent to maxyear+3, so as to avoid an error when retrieving a value for three years after 2021 in this case.
Hi,
If you turn on options MPRINT and look at the log, I think you'll see that the macro did generate all three PROC SQL steps, as you expect, and hopefully they all ran without any errors.
The problem is you're using the CREATE TABLE statement which does not insert data into an existing dataset, it creates a dataset. If that dataset already exists, it over-writes the dataset. Also a problem is that you are over-writing the same dataset you read in.
The easiest way in SAS to add rows to an existing dataset is to use PROC APPEND.
That said, if you can post a little example of the data you have, and the data you want, people might be able to help with other approaches. For example, it sounds like you're trying to do a 'look-ahead'? Generally the structure of data you have (with a year variable) is better than having separate variables for Y2019dividend, Y2020dividend, Y2021dividend..
can you post some sample data as datalines? it looks like you're joining the same table to itself. not sure if you meant to do that.
When creating macros, you really ought to first have working code without macros and without macro variables that does what you want for 2 instances (in this case 2 years). I'm guessing you didn't do that, and so it is difficult to get code with macros and with macro variables to work.
If you did write the code without macros and without macro variables and it works properly for two years, it should be simple to turn this code into a macro. So, please show us the working code without macros and without macro variables and it works properly for two years.
NOTE: Most people ignore this advice and continue to struggle writing macros that work. Don't be one of those people.
You are processing the entire dataset three times, once for each desired forward period. I'd suggest processing the data once, in a data step, using an array to build up a complete history of dividends, and then subsequently retrieve those dividends
BTW - variables FYEAR? GVKEY? You are apparently using Compustat data. And it is probably sorted by GVKEY/DATADATE (almost, but not quite the same as sorted by GVKEY/FYEAR). I assume you are looking at annual data. So here is a DATA step to take advantage of the fact that the data are sorted by GVKEY (it won't matter what the order is within GVKEY).
Untested in the absence of sample data in a working data step:
%let minyear=1998;
%let maxyear=2021;
%let upper_bound=%eval(&maxyear+3);
data want (drop=_:);
array _v1_hist{&minyear:&upper_bound} ; /*dividend history*/
array _v2_hist{&minyear:&upper_bound} ; /*some other history*/
do until (last.gvkey); /*Build history of variables of interest */
set have;
by gvkey;
_v1_hist{fyear}=var1;
_v2_hist{fyear}=var2;
end;
/* Now reread the data and fetch values from future years*/
do until (last.gvkey);
set have;
by gvkey;
var1_f1=_v1_hist{fyear+1};
var2_f1=_v2_hist{fyear+1};
var1_f2=_v1_hist{fyear+2};
var2_f2=_v2_hist{fyear+2};
var1_f3=_v1_hist{fyear+3};
var2_f3=_v2_hist{fyear+3};
output;
end;
run;
The arrays are indexed by fiscal year. You can modify the macrovars minyear and maxyear for your data. This code makes the upper bound of the array equivalent to maxyear+3, so as to avoid an error when retrieving a value for three years after 2021 in this case.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.