Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- lag function not working

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 11-23-2023 10:10 AM
(2382 views)

Hello,

Here is the sample data set similar to what I am handling now.

data temp; infile datalines dlm='#'; input id $ dttm :datetime20. var1 var2 var3 var4; format dttm datetime20.; datalines; A # 20JAN2021:13:37:39.000 # 1 # # # A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0 A # 04SEP2021:11:17:24.000 # 1 # # # A # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0 A # 10OCT2021:06:15:52.000 # 0 # # # A # 13OCT2021:13:42:22.000 # 1 # # # A # 17OCT2021:10:44:29.000 # 0 # 0 # # A # 11DEC2021:09:54:47.000 # 0 # 1 # 1 # A # 15DEC2021:16:55:44.000 # 1 # 0 # 0 # A # 15MAY2022:11:33:08.000 # 0 # # # ; run;

Here are the codes I wrote.

data temp1;set temp; by id; length var_combined $20.; if first.id then do; ct_one=.; ct_zero=.; end; var_combined=catx(', ', of var:); ct_one=count(var_combined, '1'); ct_zero=count(var_combined, '0'); if (ct_one>0 and ct_zero=0) then outcome=1; if (ct_one=0 and ct_zero>0) then outcome=0; if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome); run;

Explanations.

- I combined var1-var4 (var_combined) since I need to know whether all vars have the same value (e.g., all zeros or all ones).
- If one and zero are mixed, then the outcome needs to be the opposite of the value on the previous date. For instance, if the outcome for obs#1 is 1, then the outcome for obs#2 needs to be 0, not 1. I used the lag function together with the if statement, but it's not working. My codes assign a missing value for the obs with ones and zeros mixed.
- I understand this conditional lag() will work at the execution level, not the case level and that's why I get a missing value for the lagged outcome. Still, I cannot figure out how to correct the codes to achieve what I want.
- FYI, I don't have to use this approach for the goal.

Any advice or help would be greatly appreciated! Please let me know if you need additional information. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Can you clarify the rules?

You mentioned "previous date", but your data has DATETIME values, not DATE values.

Can you have two observations on the date, but different times of day. In that case how will that impact your rule #2 ?

If you did mean the previous DATETIME value then can two observations have the same DATETIME value? If so then what do you want to do then?

Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?

What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?

Can you provide the output you want for the given input?

If the rule is that when the first observation has mixed values then new variable is missing then this is my interpretation of what you EXPECT.

```
data temp;
input id $ dttm :datetime. var1-var4 expect reason $40.;
format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous
B 11DEC2021:09:54:47 1 1 0 0 0 Toggle previous
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;
```

Which you can get by doing something like this:

```
data temp1;
set temp;
by id;
if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
else if first.id then want=.;
else want = not want;
retain want;
run;
```

Now if you really did want to toggle the value from the previous DATETIME (and not just the previous observation) then you could retain that "lagged" value instead of the actual value and only update it on the last observation for a DATETIME value.

```
data temp;
input id $ dttm :datetime. var1-var4 expect reason $40.;
format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous datetime
B 11DEC2021:09:54:47 1 1 0 0 1 Toggle previous datetime
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;
data temp1;
set temp;
by id dttm;
if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
else if first.id then want=.;
else want = not lag_want;
output;
if last.dttm then lag_want=want;
if last.id then lag_want=.;
retain lag_want;
run;
```

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Instead of

` if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);`

you should use

```
prev_outcome=lag(outcome);
if (ct_one>0 and ct_zero>0) then outcome=1-prev_outcome;
```

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you, @PaigeMiller. I replaced the line with your suggested two lines, and it partially works. The outcome is missing when the first obs has multiple values (id=C, var_combined='1, 0, 0'). So I made the codes into two steps, and it seems working as I want now. It seems like where and when the lag function is executed makes a difference. Please let me know if you see any issues. Below are the codes that I tried.

data temp1;set temp; length var_combined $20.; by id; var_combined=catx(', ', of var:); ct_one=count(var_combined, '1'); ct_zero=count(var_combined, '0'); if (ct_one>0 and ct_zero=0) then outcome=1; if (ct_one=0 and ct_zero>0) then outcome=0; if first.id and (ct_one>0 and ct_zero>0) then outcome=var1; run; data temp2;set temp1; by id; pre_outcome=lag(outcome); if first.id then pre_outcome=.; if not first.id and (ct_one>0 and ct_zero>0) then outcome=1-pre_outcome; run;

The results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Well, I tested it with the real data, and it doesn't work perfectly. When there are two consecutive observations with multiple values, the second one doesn't have any value. I updated the data set (temp) and then ran the codes again. See the results below. Please advise!

data temp; infile datalines dlm='#'; input id $ dttm :datetime20. var1 var2 var3 var4; format dttm datetime20.; datalines; A # 20JAN2021:13:37:39.000 # 1 # # # A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0 A # 04SEP2021:11:17:24.000 # 1 # # # B # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0 B # 10OCT2021:06:15:52.000 # 0 # # # B # 13OCT2021:13:42:22.000 # 1 # # # B # 17OCT2021:10:44:29.000 # 0 # 0 # # B # 11DEC2021:09:54:47.000 # 0 # 1 # 1 # B # 11DEC2021:09:54:47.000 # 1 # 1 # 0 # 0 C # 15DEC2021:16:55:44.000 # 1 # 0 # 0 # C # 15MAY2022:11:33:08.000 # 0 # # # ; run; data temp1;set temp; length var_combined $20.; by id; var_combined=catx(', ', of var:); ct_one=count(var_combined, '1'); ct_zero=count(var_combined, '0'); if (ct_one>0 and ct_zero=0) then outcome=1; if (ct_one=0 and ct_zero>0) then outcome=0; if first.id and (ct_one>0 and ct_zero>0) then outcome=0; /* first obs and multiple values, then assign "sinus" (0) */ run; data temp2;set temp1; by id; pre_outcome=lag(outcome); if first.id then pre_outcome=.; if not first.id and (ct_one>0 and ct_zero>0) then outcome=1-pre_outcome; run;

The results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

LAG() does not look back into the observation read previously from a dataset. Instead it sets up a FIFO queue, and puts a value into it *every time it is called*. By calling it in a conditional branch, you uncouple the filling of the queue from the progress through the incoming dataset.

LAG cannot be used to calculate a value and also put it into the queue; when LAG is called in your code, outcome will always be missing (not yet set), and that value will end up in the queue for the next call.

Instead, make outcome a RETAINed variable, and initialize it properly at first.id.

This part

```
if first.id then do;
ct_one=.;
ct_zero=.;
end;
```

makes no sense, as the variables are not retained, and will be set to missing at the start of each DATA step iteration anyway.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you, @Kurt_Bremser. First, I removed the part you pointed out. Yes, it doesn't make sense (it was a leftover from my previous tries!). Then I added the retain statement. Still, it doesn't work as I want. Here are the codes that I tried.

data temp1;set temp; by id; length var_combined $20.; retain outcome; if first.id then outcome=.; var_combined=catx(', ', of var:); ct_one=count(var_combined, '1'); ct_zero=count(var_combined, '0'); if (ct_one>0 and ct_zero=0) then outcome=1; if (ct_one=0 and ct_zero>0) then outcome=0; if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome); run;

The results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Once again, this

`if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);`

won't work. At the moment LAG is called, it puts the current value of outcome (retained from the previous iteration) into the queue, not the one that is calculated.

And you still call LAG conditionally, which will load the queue inconsistently.

Also point out which values of outcome you expect for your example data.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Can you clarify the rules?

You mentioned "previous date", but your data has DATETIME values, not DATE values.

Can you have two observations on the date, but different times of day. In that case how will that impact your rule #2 ?

If you did mean the previous DATETIME value then can two observations have the same DATETIME value? If so then what do you want to do then?

Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?

What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?

Can you provide the output you want for the given input?

If the rule is that when the first observation has mixed values then new variable is missing then this is my interpretation of what you EXPECT.

```
data temp;
input id $ dttm :datetime. var1-var4 expect reason $40.;
format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous
B 11DEC2021:09:54:47 1 1 0 0 0 Toggle previous
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;
```

Which you can get by doing something like this:

```
data temp1;
set temp;
by id;
if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
else if first.id then want=.;
else want = not want;
retain want;
run;
```

Now if you really did want to toggle the value from the previous DATETIME (and not just the previous observation) then you could retain that "lagged" value instead of the actual value and only update it on the last observation for a DATETIME value.

```
data temp;
input id $ dttm :datetime. var1-var4 expect reason $40.;
format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous datetime
B 11DEC2021:09:54:47 1 1 0 0 1 Toggle previous datetime
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;
data temp1;
set temp;
by id dttm;
if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
else if first.id then want=.;
else want = not lag_want;
output;
if last.dttm then lag_want=want;
if last.id then lag_want=.;
retain lag_want;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- You mentioned "previous date", but your data has DATETIME values, not DATE values.
- You are right. The data needs to be sorted by DATETIME, not DATE. Each observation is a new datetime.

- Can you have two observations on the date, but different times of day. In that case how will that impact your rule #2 ?
- Two obs can have the same or different date and time.

- If you did mean the previous DATETIME value then can two observations have the same DATETIME value? If so then what do you want to do then?
- I meant the previous datetime.
- No, two separate observations cannot have the same datetime.

- Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?
- If there are multiple values on the same date and time, then toggle the value on the previous datetime.

- What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?
- Good point. If mixed in the first obs, then the value should be 0.

- Can you provide the output you want for the given input?
- I modified the codes ("else if first.id then var=0;") in your second approach, and it perfectly works! The last three lines for lagging make sense to me.

```
data temp;
infile datalines dlm='#';
input id $ dttm :datetime20. var1 var2 var3 var4;
format dttm datetime20.;
datalines;
A # 20JAN2021:13:37:39.000 # 1 # # #
A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0
A # 04SEP2021:11:17:24.000 # 1 # # #
B # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0
B # 10OCT2021:06:15:52.000 # 0 # # #
B # 13OCT2021:13:42:22.000 # 1 # # #
B # 17OCT2021:10:44:29.000 # 0 # 0 # #
B # 11DEC2021:09:54:47.000 # 0 # 1 # 1 #
B # 11DEC2021:09:54:47.000 # 1 # 1 # 0 # 0
C # 15DEC2021:16:55:44.000 # 1 # 0 # 0 #
C # 15MAY2022:11:33:08.000 # 0 # # #
;
run;
data temp1; set temp;
by id dttm;
if whichn(1, of var:) and 0=whichn(0, of var:) then var=1;
else if 0=whichn(1, of var:) and whichn(0, of var:) then var=0;
else if first.id then var=0;
else var= not lag_var;
output;
if last.dttm then lag_var=var;
if last.id then lag_var=.;
retain lag_var;
run;
```

The results (which I wanted).

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Don't call the new variable VAR and then use OF VAR: in the code because the NEW variable will be in included in that list (at least in the ELSE clause after the data step compiler has had a chance to see that VAR exists).

Either use VAR1-VAR4 or call the variable something that does not start with VAR.

Also don't indent lines of data. It will just confuse you (and SAS) about where the data starts on the line. And if you have accidentally turned off the editor option to replace tabs with spaces you might end up with actual TAB characters in your data if you run the code without using Display Manager.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The final codes.

```
data temp1; set temp;
by id dttm;
if whichn(1, of var:) and 0=whichn(0, of var:) then outcome=1;
else if 0=whichn(1, of var:) and whichn(0, of var:) then outcome=0;
else if first.id then outcome=0;
else outcome= not lag_outcome;
output;
if last.dttm then lag_outcome=outcome;
if last.id then lag_outcome=.;
retain lag_outcome;
run;
```

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Ready to level-up your skills? Choose your own adventure.