I have information on many companies for many years sorted by company and by year. The raw data contains a row for one company for one year.
In my analyses, I want to use a variable from one row (say 2006 GE sales) as the dependent variable, but information from previous rows as independent variables (say 2005 GE advertising, 2004 GE advertising, and 2003 GE advertising). I know how to obtain variables from 2005 for 2006 by using the lag () function: lag(sales). Is there any way to pull variables from more than just one previous row? Ideally, I could just say something like lag2(sales) to pull data from two rows back.
Hi, in fact, the lag function documentation does show how you can use lag2, lag3 etc. Depending on the procedure you're going to use for your analysis, you may want to turn your long skinny data into wide data with PROC TRANSPOSE, instead of using LAG.
The Transpose documentation has some good examples of turning long skinny data into wide data (so that for one company, you'd have all the information for all the years on the same observation).
It really comes down to how your analytical procedure needs to have the data structured for analysis. If you look at the doc on your procedure of choice, then you can see how they structure the sample data that's going into the procedure. Then, by comparing your data structure to the sample data, you can figure out the transformations, if any, that you need. And whether LAG or TRANSPOSE will work better for you.
I believe there have been quite a few previous forum postings on both LAG and TRANSPOSE, too, so perhaps a search of the forum would be beneficial.