BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
X605191
Fluorite | Level 6

Hello, 

 

Here is the sample data set similar to what I am handling now.

 

data temp;
	infile datalines dlm='#';
	input id $ dttm :datetime20. var1 var2 var3 var4;
	format dttm datetime20.;
	datalines;
		A # 20JAN2021:13:37:39.000 # 1 #  #  # 
		A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0
		A # 04SEP2021:11:17:24.000 # 1 #  #  # 
		A # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0
		A # 10OCT2021:06:15:52.000 # 0 #  #  # 
		A # 13OCT2021:13:42:22.000 # 1 #  #  # 
		A # 17OCT2021:10:44:29.000 # 0 # 0 #  # 
		A # 11DEC2021:09:54:47.000 # 0 # 1 # 1	 # 
		A # 15DEC2021:16:55:44.000 # 1 # 0 # 0	 # 
		A # 15MAY2022:11:33:08.000 # 0 #  #  # 
	;
run;

 

 

Here are the codes I wrote.

 

data temp1;set temp;
	by id;
	length var_combined $20.;

	if first.id then do;
		ct_one=.;
		ct_zero=.;
	end;

	var_combined=catx(', ', of var:);
	ct_one=count(var_combined, '1');
	ct_zero=count(var_combined, '0');
	
	if (ct_one>0 and ct_zero=0) then outcome=1;
	if (ct_one=0 and ct_zero>0) then outcome=0;
	if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);
run;

 

 

Explanations.

  1. I combined var1-var4 (var_combined) since I need to know whether all vars have the same value (e.g., all zeros or all ones).
  2. If one and zero are mixed, then the outcome needs to be the opposite of the value on the previous date. For instance, if the outcome for obs#1 is 1, then the outcome for obs#2 needs to be 0, not 1. I used the lag function together with the if statement, but it's not working. My codes assign a missing value for the obs with ones and zeros mixed.
  3. I understand this conditional lag() will work at the execution level, not the case level and that's why I get a missing value for the lagged outcome. Still, I cannot figure out how to correct the codes to achieve what I want.
  4. FYI, I don't have to use this approach for the goal.

 

Any advice or help would be greatly appreciated! Please let me know if you need additional information. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Can you clarify the rules?

 

You mentioned "previous date", but your data has DATETIME values, not DATE values. 

 

Can you have two observations on the date, but different times of day.  In that case how will that impact your rule #2 ? 

 

If you did mean the previous DATETIME value then can two observations have the same DATETIME value?  If so then what do you want to do then?

 

Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?

 

What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?

 

Can you provide the output you want for the given input?

 

If the rule is that when the first observation has mixed values then new variable is missing then this is my interpretation of what you EXPECT.

data temp;
  input id $ dttm :datetime. var1-var4 expect reason $40.;
  format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous
B 11DEC2021:09:54:47 1 1 0 0 0 Toggle previous
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;

Which you can get by doing something like this:

data temp1;
  set temp;
  by id;
  if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
  else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
  else if first.id then want=.;
  else want = not want;
  retain want;
run;

Now if you really did want to toggle the value from the previous DATETIME (and not just the previous observation) then you could retain that "lagged" value instead of the actual value and only update it on the last observation for a DATETIME value.

data temp;
  input id $ dttm :datetime. var1-var4 expect reason $40.;
  format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous datetime
B 11DEC2021:09:54:47 1 1 0 0 1 Toggle previous datetime
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;

data temp1;
  set temp;
  by id dttm;
  if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
  else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
  else if first.id then want=.;
  else want = not lag_want;
  output;
  if last.dttm then lag_want=want;
  if last.id then lag_want=.;
  retain lag_want;
run;

 

 

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

Instead of

 

	if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);

 

you should use

 

prev_outcome=lag(outcome);
if (ct_one>0 and ct_zero>0) then outcome=1-prev_outcome;
--
Paige Miller
X605191
Fluorite | Level 6

Thank you, @PaigeMiller. I replaced the line with your suggested two lines, and it partially works. The outcome is missing when the first obs has multiple values (id=C, var_combined='1, 0, 0'). So I made the codes into two steps, and it seems working as I want now. It seems like where and when the lag function is executed makes a difference. Please let me know if you see any issues. Below are the codes that I tried.

 

data temp1;set temp;
	length var_combined $20.;
	by id;

	var_combined=catx(', ', of var:);
	ct_one=count(var_combined, '1');
	ct_zero=count(var_combined, '0');
	
	if (ct_one>0 and ct_zero=0) then outcome=1;
	if (ct_one=0 and ct_zero>0) then outcome=0;
	if first.id and (ct_one>0 and ct_zero>0) then outcome=var1;
run;

data temp2;set temp1;
	by id;

	pre_outcome=lag(outcome);
	if first.id then pre_outcome=.;

	if not first.id and (ct_one>0 and ct_zero>0) then outcome=1-pre_outcome;
run;

The results.

X605191_1-1700843261543.png

 

 

 

X605191
Fluorite | Level 6

Well, I tested it with the real data, and it doesn't work perfectly. When there are two consecutive observations with multiple values, the second one doesn't have any value. I updated the data set (temp) and then ran the codes again. See the results below. Please advise!

 

data temp;
	infile datalines dlm='#';
	input id $ dttm :datetime20. var1 var2 var3 var4;
	format dttm datetime20.;
	datalines;
		A # 20JAN2021:13:37:39.000 # 1 #  #  # 
		A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0
		A # 04SEP2021:11:17:24.000 # 1 #  #  # 
		B # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0
		B # 10OCT2021:06:15:52.000 # 0 #  #  # 
		B # 13OCT2021:13:42:22.000 # 1 #  #  # 
		B # 17OCT2021:10:44:29.000 # 0 # 0 #  # 
		B # 11DEC2021:09:54:47.000 # 0 # 1 # 1	 # 
		B # 11DEC2021:09:54:47.000 # 1 # 1 # 0	 # 0 
		C # 15DEC2021:16:55:44.000 # 1 # 0 # 0	 # 
		C # 15MAY2022:11:33:08.000 # 0 #  #  # 
	;
run;

data temp1;set temp;
	length var_combined $20.;
	by id;

	var_combined=catx(', ', of var:);
	ct_one=count(var_combined, '1');
	ct_zero=count(var_combined, '0');
	
	if (ct_one>0 and ct_zero=0) then outcome=1;
	if (ct_one=0 and ct_zero>0) then outcome=0;
	if first.id and (ct_one>0 and ct_zero>0) then outcome=0; /* first obs and multiple values, then assign "sinus" (0) */
run;

data temp2;set temp1;
	by id;

	pre_outcome=lag(outcome);
	if first.id then pre_outcome=.;

	if not first.id and (ct_one>0 and ct_zero>0) then outcome=1-pre_outcome;
run;

The results.

X605191_0-1700928195466.png

 

Kurt_Bremser
Super User

LAG() does not look back into the observation read previously from a dataset. Instead it sets up a FIFO queue, and puts a value into it every time it is called. By calling it in a conditional branch, you uncouple the filling of the queue from the progress through the incoming dataset.

 

LAG cannot be used to calculate a value and also put it into the queue; when LAG is called in your code, outcome will always be missing (not yet set), and that value will end up in the queue for the next call.

Instead, make outcome a RETAINed variable, and initialize it properly at first.id.

This part

if first.id then do;
	ct_one=.;
	ct_zero=.;
end;

makes no sense, as the variables are not retained, and will be set to missing at the start of each DATA step iteration anyway.

 

X605191
Fluorite | Level 6

Thank you, @Kurt_Bremser. First, I removed the part you pointed out. Yes, it doesn't make sense (it was a leftover from my previous tries!). Then I added the retain statement. Still, it doesn't work as I want. Here are the codes that I tried.

 

data temp1;set temp;
	by id;
	length var_combined $20.;
	retain outcome;

	if first.id then outcome=.;

	var_combined=catx(', ', of var:);
	ct_one=count(var_combined, '1');
	ct_zero=count(var_combined, '0');
	
	if (ct_one>0 and ct_zero=0) then outcome=1;
	if (ct_one=0 and ct_zero>0) then outcome=0;
	if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);
run;

The results.

X605191_0-1700841994642.png

 

 

Kurt_Bremser
Super User

Once again, this

if (ct_one>0 and ct_zero>0) then outcome=1-lag(outcome);

won't work. At the moment LAG is called, it puts the current value of outcome (retained from the previous iteration) into the queue, not the one that is calculated.

And you still call LAG conditionally, which will load the queue inconsistently.

 

Also point out which values of outcome you expect for your example data.

Tom
Super User Tom
Super User

Can you clarify the rules?

 

You mentioned "previous date", but your data has DATETIME values, not DATE values. 

 

Can you have two observations on the date, but different times of day.  In that case how will that impact your rule #2 ? 

 

If you did mean the previous DATETIME value then can two observations have the same DATETIME value?  If so then what do you want to do then?

 

Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?

 

What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?

 

Can you provide the output you want for the given input?

 

If the rule is that when the first observation has mixed values then new variable is missing then this is my interpretation of what you EXPECT.

data temp;
  input id $ dttm :datetime. var1-var4 expect reason $40.;
  format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous
B 11DEC2021:09:54:47 1 1 0 0 0 Toggle previous
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;

Which you can get by doing something like this:

data temp1;
  set temp;
  by id;
  if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
  else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
  else if first.id then want=.;
  else want = not want;
  retain want;
run;

Now if you really did want to toggle the value from the previous DATETIME (and not just the previous observation) then you could retain that "lagged" value instead of the actual value and only update it on the last observation for a DATETIME value.

data temp;
  input id $ dttm :datetime. var1-var4 expect reason $40.;
  format dttm datetime19.;
datalines;
A 20JAN2021:13:37:39 1 . . . 1 Only one and missing
A 10FEB2021:09:45:46 1 0 0 0 0 Toggle previous
A 04SEP2021:11:17:24 1 . . . 1 Only one and missing
B 09OCT2021:18:01:40 1 0 0 0 . Mixed on first observation
B 10OCT2021:06:15:52 0 . . . 0 Only zero and missing
B 13OCT2021:13:42:22 1 . . . 1 Only one and missing
B 17OCT2021:10:44:29 0 0 . . 0 Only zero and missing
B 11DEC2021:09:54:47 0 1 1 . 1 Toggle previous datetime
B 11DEC2021:09:54:47 1 1 0 0 1 Toggle previous datetime
C 15DEC2021:16:55:44 1 0 0 . . Mixed on first observation
C 15MAY2022:11:33:08 0 . . . 0 Only zero and missing
;

data temp1;
  set temp;
  by id dttm;
  if whichn(1,of var:) and 0=whichn(0,of var:) then want=1;
  else if 0=whichn(1,of var:) and whichn(0,of var:) then want=0;
  else if first.id then want=.;
  else want = not lag_want;
  output;
  if last.dttm then lag_want=want;
  if last.id then lag_want=.;
  retain lag_want;
run;

 

 

X605191
Fluorite | Level 6
  • You mentioned "previous date", but your data has DATETIME values, not DATE values. 
    • You are right. The data needs to be sorted by DATETIME, not DATE.  Each observation is a new datetime.
  • Can you have two observations on the date, but different times of day.  In that case how will that impact your rule #2 ? 
    • Two obs can have the same or different date and time.
  • If you did mean the previous DATETIME value then can two observations have the same DATETIME value?  If so then what do you want to do then?
    • I meant the previous datetime.
    • No, two separate observations cannot have the same datetime.
  • Of is the "DATE" actually only used for sorting and the actual rule is just to toggle from the pervious result?
    • If there are multiple values on the same date and time, then toggle the value on the previous datetime.
  • What is the rule when the FIRST observation for an ID has mixed 1 and 0 values?
    • Good point. If mixed in the first obs, then the value should be 0.
  • Can you provide the output you want for the given input?
    • I modified the codes ("else if first.id then var=0;") in your second approach, and it perfectly works! The last three lines for lagging make sense to me.  

 

data temp;
	infile datalines dlm='#';
	input id $ dttm :datetime20. var1 var2 var3 var4;
	format dttm datetime20.;
	datalines;
		A # 20JAN2021:13:37:39.000 # 1 #  #  # 
		A # 10FEB2021:09:45:46.000 # 1 # 0 # 0 # 0
		A # 04SEP2021:11:17:24.000 # 1 #  #  # 
		B # 09OCT2021:18:01:40.000 # 1 # 0 # 0 # 0
		B # 10OCT2021:06:15:52.000 # 0 #  #  # 
		B # 13OCT2021:13:42:22.000 # 1 #  #  # 
		B # 17OCT2021:10:44:29.000 # 0 # 0 #  # 
		B # 11DEC2021:09:54:47.000 # 0 # 1 # 1	 # 
		B # 11DEC2021:09:54:47.000 # 1 # 1 # 0	 # 0 
		C # 15DEC2021:16:55:44.000 # 1 # 0 # 0	 # 
		C # 15MAY2022:11:33:08.000 # 0 #  #  # 
	;
run;

data temp1; set temp;
  by id dttm;

  if whichn(1, of var:) and 0=whichn(0, of var:) then var=1;
  else if 0=whichn(1, of var:) and whichn(0, of var:) then var=0;
  else if first.id then var=0;
  else var= not lag_var;
  output;

  if last.dttm then lag_var=var;
  if last.id then lag_var=.;
  retain lag_var;
run;

 

The results (which I wanted).

 

  • X605191_1-1700966826792.png

     


     

Tom
Super User Tom
Super User

Don't call the new variable VAR and then use OF VAR: in the code because the NEW variable will be in included in that list (at least in the ELSE clause after the data step compiler has had a chance to see that VAR exists).

 

Either use VAR1-VAR4 or call the variable something that does not start with VAR.

 

Also don't indent lines of data.  It will just confuse you (and SAS) about where the data starts on the line.  And if you have accidentally turned off the editor option to replace tabs with spaces you might end up with actual TAB characters in your data if you run the code without using Display Manager.

X605191
Fluorite | Level 6

Thank you for the tips, @Tom. It actually solved the problem I had with the real data. My codes didn't work perfectly when I used the real data, and not using the same prefix (var) solved the problem! 

X605191
Fluorite | Level 6

The final codes.

data temp1; set temp;
  by id dttm;

  if whichn(1, of var:) and 0=whichn(0, of var:) then outcome=1;
  else if 0=whichn(1, of var:) and whichn(0, of var:) then outcome=0;
  else if first.id then outcome=0;
  else outcome= not lag_outcome;
  output;

  if last.dttm then lag_outcome=outcome;
  if last.id then lag_outcome=.;
  retain lag_outcome;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4604 views
  • 6 likes
  • 4 in conversation