Hi,
My data are sorted by gvkey and year. I ran the following code.
Data want;
set have;
if first.gvkey then sales_growth=0;
else sales_growth=log((1+sale)/lag((1+sale)));
run;
I get the sales growth numbers I expect for the whole data set, except for the following.
Gvkey year sale sales_growth
1004 2000 28796 0
1004 2001 34697 .
1004 2002 36784 0.0601
My question is: why am I getting a missing value in the 2nd year? In my data set, 1004 is the first gvkey. The problem happens only for the first gvkey. I don't have this problem for the other gvkeys. Can anyone help me, please?
Well, you didn't run that exact code. There's no variable "sale" in the data you show, despite what the code you show says. And of course, that code won't run unless it's in a DATA step, so let's see the entire DATA step. So, show us the entire LOG (code plus notes, warnings, errors) for this DATA step. Paste the log into the window that appears when you click on the </> icon so as to preserve the formatting and make it easier for us to read.
Also, show us a portion of the actual data, where the variable names match the variable names in the code.
I don't want to see "edited" code. I want to see the actual code you are running. And show us the LOG anyway, even if there are no errors or warnings.
3643 Data comp33;
3644 set comp3;
3645 by gvkey fy;
3646 if first.gvkey then sales_growth=0;
3647 else sales_growth=log((1+wsale)/lag((1+wsale)));
3648 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 3647:19 1 at 3647:32
NOTE: There were 110195 observations read from the data set WORK.COMP3.
NOTE: The data set WORK.COMP33 has 110195 observations and 43 variables.
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
cpu time 0.07 seconds
By the way, I don't have any missing observations in my first gvkey, and the problem is that I get a missing value for the first gvkey only.
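NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 3647:19 1 at 3647:32
This NOTE is telling you there is a problem. At line 3647, a missing value is produced once at column 32 and once at column 19. Column 32 is where the result of the LAG() function is used (the division), and column 19 is the LOG() that wraps it.
The LAG() function doesn't work the way most people think it works. It does not look back at the previous observation. Each LAG() call keeps its own queue: it returns the value stored the last time that call executed, then stores the current value. Your code executes LAG() only when first.gvkey is zero. On the second record of gvkey 1004, LAG() runs for the first time, so its queue is still empty and it returns a missing value. For every later gvkey, the second-year result is not missing, but it is computed against the last record of the previous gvkey, so those values are wrong even though no NOTE is issued. You need LAG() to be evaluated on every record, not just the records where first.gvkey is zero.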
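To see the queue behaviour on a small scale, here is a minimal sketch. The data set names demo and compare, the variables prev_every and prev_cond, and the gvkey 1013 rows are invented for illustration; only the three 1004 rows come from the question.

```sas
/* Toy data: the 1004 rows are from the question;
   gvkey 1013 and its values are hypothetical. */
data demo;
  input gvkey fy wsale;
  datalines;
1004 2000 28796
1004 2001 34697
1004 2002 36784
1013 2000 100
1013 2001 110
;
run;

data compare;
  set demo;
  by gvkey fy;
  /* Unconditional call: this queue is updated on every record. */
  prev_every = lag(wsale);
  /* Conditional call: this separate queue is updated only when
     first.gvkey is 0, as in the original code. */
  if not first.gvkey then prev_cond = lag(wsale);
run;

proc print data=compare noobs;
run;
```

In the printed output, prev_every should hold the previous record's wsale on every row. prev_cond should be missing for 1004 in 2001 (the first time that call runs), and for 1013 in 2001 it should hold 36784, the last value of gvkey 1004, not 100.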
So, this should fix the issue, by evaluating LAG() on every record.
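data comp33;
  set comp3;
  by gvkey fy;
  /* Call LAG() unconditionally so its queue is updated on every record */
  zz = lag(1 + wsale);
  /* First record of each gvkey has no prior year, so growth is set to 0 */
  if first.gvkey then sales_growth = 0;
  else sales_growth = log((1 + wsale) / zz);
run;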
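If you would rather not keep the helper variable zz, one alternative sketch uses the IFN function, which evaluates all of its arguments on every record, so the LAG() inside it runs every time. This is only a variation on the step above, not something tested against your data, and it assumes the same comp3 data set and variable names.

```sas
data comp33;
  set comp3;
  by gvkey fy;
  /* IFN evaluates all three arguments on every record, so LAG() is
     executed each time even though its result is only used when
     first.gvkey is 0. */
  sales_growth = ifn(first.gvkey, 0, log((1 + wsale) / lag(1 + wsale)));
run;
```

With this form you should still expect the "Missing values were generated" NOTE for the very first record of the data set, because the third argument is computed there before IFN discards it. The stored sales_growth for that record is 0.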