BookmarkSubscribeRSS Feed
Saurabh_Rana
Obsidian | Level 7

Col-A Col-B

   X        9

   X        4

   X        3

   Y        .

   X        1

 

I want to add the last values of Col-B when Col-A is equal to "Y" and replace the null value by it

 Output ->

 

Col-A Col-B

   X        9

   X        4

   X        3

   Y        7

   X        1

To achieve this output I am using this code->

 

 
data table;
set table;
if "Col-A"n = "Y"
then "Col-B"n=lag1("Col-B"n)+lag2("Col-B"n);
run;
But I am getting the following error in this code ->
NOTE: Missing values were generated as a result of performing an operation on missing values.

 

Note- I am getting proper output when I am storing the lag function variables first and then adding them,

But I don't want to unnecessarily store variables.

 

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input a $ b;
datalines;
X 9 
X 4 
X 3 
Y . 
X 7 
;

data want;
   set have;
   b = ifn(b = ., sum(lag1(b), lag2(b)), b);
run;
Saurabh_Rana
Obsidian | Level 7

Your code worked perfectly but can you please explain to me what is wrong with the code that I am using?

Shmuel
Garnet | Level 18

@Saurabh_Rana wrote:

Your code worked perfectly but can you please explain to me what is wrong with the code that I am using?


Suppose x value is missing and y=5.
result of x+y will be missing value, while sum(of x, y) = 5

that is because sum function ignores missing value or relate to it as zero.

Saurabh_Rana
Obsidian | Level 7
Yeah but there are no null values, in this case, it's simply 3+4=7.
PeterClemmensen
Tourmaline | Level 20

Sure. The Lag Function is not a lookback function. Rather, it is a queue function. When you call Lag2(b) you set up a queue with two entries. THe values in the entries are initially missing. Each time the Lag Function is called, the top value is pushed out and returned from the queue. The remaining values are shifted upward, and the new value of the argument is placed at the bottom of the queue. 

 

That is why you will almost certainly get undesired results when calling the Lag Function in conditional logic such as If-Then Statements. However, when we use the Ifn Function, the Lag Function is executed regardless of whether the expression in the first argument is true or not.

 

Hope this clears things up 🙂

Saurabh_Rana
Obsidian | Level 7
Thank you for your in-depth explanation but I am still not very clear about why it doesn't work with IF-Then on the other works perfectly with Ifn function.
I guess my basic understanding of base sas is not very clear.
PeterClemmensen
Tourmaline | Level 20

The difference is:

 

With the If-Then Statements, the lag function executes only when the condition is true.

 

The Ifn Function is designed so that the lag function executes no matter if the condition is true or not.

Saurabh_Rana
Obsidian | Level 7
Yes exactly, I want the lag function to be executed just for the case where Col-A is equal to "Y",

Like this -

A B lag1(B) lag2(B)
X 4
X 3
Y . 3 4
PeterClemmensen
Tourmaline | Level 20

Then this is the code for you.

 

Makes sense?

 

data have;
input a $ b;
datalines;
X 9 
X 4 
X 3 
Y . 
X 7 
;

data want;
   set have;
   b = ifn(a = "Y", sum(lag1(b), lag2(b)), b);
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1058 views
  • 2 likes
  • 3 in conversation