I have a very simple bit of code below that I setup to calculate percentage change between two periods.
The calculations appear fine except for the first valid record.
The data set is arranged so that there are multiple stock tickers in time series. The data is sorted by ticker and first_of_month date.
I check to see if the ticker is equal to the prior record ticker. If it is, then I calculate the percentage change in price. If it is not, that indicates one ticker series has ended and another begun so I do not calculate the change as there is no prior period for the new ticker. For debugging purposes, I fill in the price_change field with 666
As you can see from the output snippet, the first record appripriately gets a 666 but then the second record fails to calcualte the price_change at all. I have included other debugging variables that indicate while the code sees the prior record with the same ticker, it fails to see the lagged price and does not perform the calculation.
This failure only happens on the first two records of the data set. When the ticker changes later on in the data set, everyone performs fine. I also checked this on several different datasets and the behavior is the same - the first ticker misses out on the first opportunity for a price_change calculation but everything else works fine.
DATA pct_changes;
SET ticker_files;
IF lag1(ticker) = ticker THEN
DO;
price_change = dif1(close_price)/lag1(close_price)*100;
lag1price = lag1(close_price);
dif1price = dif1(close_price);
lag2price = lag2(close_price);
old_price = "I see it";
END;
ELSE
price_change = 666;
ticker first_of_month close_price price_change lag1price dif1price lag2price old_price
MMC 01/01/98 16.2454 666 . . .
MMC 02/01/98 16.2575 . . . . I see it
MMC 03/01/98 18.9624 16.637859449 16.2575 2.7049 . I see it
MMC 04/01/98 19.2384 1.4555119605 18.9624 0.276 16.2575 I see it
MMC 05/01/98 20.1771 4.8793038922 19.2384 0.9387 18.9624 I see it
I appreciate any suggestions
You haven't explained what you are trying to do but I expect that you want to run the LAG functions on every observation instead of conditionally running it only on some of the observations. The LAG() function does not take the value from the previous observation. It takes it from the stack that is build from the values that are pushed down with in runs. Skipping some observations prevents those values from making into the lag stack to be later retrieved. Try this little test.
data test;
input a @@;
laga=lag(a);
if _N_> 1 then NOT_REALLY_laga=lag(a);
cards;
1 2 3 4 5 6 7 8 9
;
You need to run the LAG() every time.
DATA pct_changes;
SET ticker_files;
by ticker ;
lag1price = lag1(close_price);
dif1price = dif1(close_price);
lag2price = lag2(close_price);
if first.ticker then call missing(of lag1price dif1price lag2price);
else price_change = 100*(dif1price/lag1price);
run;
You haven't explained what you are trying to do but I expect that you want to run the LAG functions on every observation instead of conditionally running it only on some of the observations. The LAG() function does not take the value from the previous observation. It takes it from the stack that is build from the values that are pushed down with in runs. Skipping some observations prevents those values from making into the lag stack to be later retrieved. Try this little test.
data test;
input a @@;
laga=lag(a);
if _N_> 1 then NOT_REALLY_laga=lag(a);
cards;
1 2 3 4 5 6 7 8 9
;
You need to run the LAG() every time.
DATA pct_changes;
SET ticker_files;
by ticker ;
lag1price = lag1(close_price);
dif1price = dif1(close_price);
lag2price = lag2(close_price);
if first.ticker then call missing(of lag1price dif1price lag2price);
else price_change = 100*(dif1price/lag1price);
run;
I agree with Tom, but not only did you misuse the lag function (i.e., by applying it conditionally), you also misused the dif function. Here is a way to use both correctly:
data ticker_files; informat first_of_month mmddyy8.; input ticker $ first_of_month close_price; cards; MMC 01/01/98 16.2454 MMC 02/01/98 16.2575 MMC 03/01/98 18.9624 MMC 04/01/98 19.2384 MMC 05/01/98 20.1771 ; data pct_changes (drop=counter); SET ticker_files; by ticker; if first.ticker then counter=1; else counter+1; lag1price = ifn(counter gt 1,lag(close_price),.); dif1price = ifn(counter gt 1,dif1(close_price),.); lag2price = ifn(counter gt 2,lag2(close_price),.); old_price = "I see it"; price_change = ifn(counter gt 1,dif(close_price)*100,666); run;
Art, CEO, AnalystFinder.com
Thanks guys - I understand it much better now based on your explanations.
As a bonus, I learned about first.var, CALL MISSING and proper use of the BY statement!
Just to make sure I understand this completely, is the stack reset when the next ticker arrives due to the BY statement? That's what I want and the code seems to behave that wya but thought I should verify.
Thanks again! It's been too many years since I coded in SAS...
No, the stack isn't reset with a by statement. The code simply substitutes a missing value when the first record of a by group is being processed.
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.