scwein
Fluorite | Level 6

**ADDENDUM to original post:** I realized that this issue was caused by starting the DATA step with a "RETAIN" statement, which I use to put the variables in the desired order. I'd still like to leave this question up because I'd appreciate any feedback on:

  1. How does a RETAIN statement work? When does it affect the outputs of a command in a DATA step?
  2. Does anyone have alternate/preferred strategies for reordering the variables in a dataset?

Thanks!

***********************************************************************

Original post:

 

Hello SAS community,

 

I'm very confused about how SAS deciphers "IF" Statements in the DATA step.

 

In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date.

Earlier in my data step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM.

 

Here's what I tried:

 

DATA WORK.dates_test;
  SET WORK.raw_dates;

  close_eff_n = acct_close_dte_n;
  IF closed = 1 AND acct_close_dte_n = . THEN DO;
    close_eff_n = maxdate_n;
  END;
  *(omitting some additional logic used here for parsimony);
  IF close_eff_n > 0 THEN DO;
    close_dte_eff = put(close_eff_n,yymmn.);
  END;
RUN;

 

I had earlier written this last segment as:

 

close_dte_eff = put(close_eff_n,yymmn.);

but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic.

 

The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect.

e.g.

I have:

close_eff_n
01MAR2023
01APR2023
.
.
01JUL2021

 

I want:

close_eff_n   close_dte_eff
01MAR2023     202303
01APR2023     202304
.
.
01JUL2021     202107

 

But instead I get:

close_eff_n   close_dte_eff
01MAR2023     202303
01APR2023     202304
.             202304
.             202304
01JUL2021     202107

 

When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF-statements.

 

I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening?

 

There's something fundamental about how the "IF-statement" is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic.

 

I had thought that situations like this, where SAS operates on one row depending on what was in the previous row, only happen when there is a "BY" statement, but obviously that's incorrect, as there is no "BY" statement in this DATA step.

 

I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to.

 

Please let me know if I can provide any other context or information that would be helpful.

 

Many thanks,

Scott

ACCEPTED SOLUTION

Tom
Super User

That is exactly what RETAIN is intended for.  Note that your usage of RETAIN to set the variable order is just taking advantage of the fact that SAS sets the order of the variables when they first are "seen" by the compiler. It can be useful since a simple RETAIN statement (without any initial values) will not force SAS to set the TYPE of the variable.
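For the reordering question, one option is to keep the RETAIN trick but move it into its own step after the calculations are done. Variables read by SET are retained automatically and overwritten on every observation, so a RETAIN that lists only variables already in the input changes nothing except their position. A minimal sketch, assuming the calculated table is WORK.dates_test as in the original post; the output names dates_ordered and dates_ordered2 and the chosen variable list are made up for illustration:

```sas
/* Option 1: reorder in a separate step (dates_ordered is a hypothetical name). */
data work.dates_ordered;
  /* RETAIN before SET fixes the column order. Every variable listed   */
  /* already exists in dates_test, so SET overwrites it on each row    */
  /* and nothing is carried over from the previous observation.        */
  retain close_eff_n close_dte_eff closed;
  set work.dates_test;
run;

/* Option 2: PROC SQL writes columns in the order they are selected.   */
/* Only the columns named here are kept (dates_ordered2 is hypothetical). */
proc sql;
  create table work.dates_ordered2 as
  select close_eff_n, close_dte_eff, closed
  from work.dates_test;
quit;
```

In option 1, any variables not named in the RETAIN statement follow the listed ones in their original order.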

 

Since the value is retained it can only change when you explicitly change it.

 

You just need to add an ELSE clause.

IF close_eff_n > 0 THEN DO;
  close_dte_eff = put(close_eff_n,yymmn.);
END;
else close_dte_eff =' ';

So that the value is set on every observation.
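To see the effect end to end, here is a small reproduction under assumed data. The table demo_in and its five rows are hypothetical stand-ins for the poster's values, and the format width is written out as yymmn6. to make the YYYYMM output explicit:

```sas
/* Hypothetical sample data mirroring the values shown in the question. */
data work.demo_in;
  input close_eff_n :date9.;
  format close_eff_n date9.;
  datalines;
01MAR2023
01APR2023
.
.
01JUL2021
;

data work.demo_fixed;
  retain close_dte_eff;            /* ordering trick, but it also retains the value */
  set work.demo_in;
  if not missing(close_eff_n) then
    close_dte_eff = put(close_eff_n, yymmn6.);
  else
    close_dte_eff = ' ';           /* explicit reset for rows that fail the test */
run;
```

Deleting the ELSE branch from demo_fixed brings back the behavior in the question: the two missing rows keep the 202304 left over from the 01APR2023 row.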

 

Alternatively you could change the value of the MISSING option and eliminate the IF statement.

option missing=' ';
....
close_dte_eff = put(close_eff_n,yymmn.);

Remember to set the missing option back to a period after the data step.
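A sketch of the full sequence with the option changed and then restored, using a hypothetical two-row table (demo_missing) and an explicit yymmn6. width:

```sas
options missing=' ';               /* numeric missing values now print as a blank */

data work.demo_missing;
  input close_eff_n :date9.;
  /* Assigned on every row, so no IF/ELSE is needed. */
  close_dte_eff = put(close_eff_n, yymmn6.);
  datalines;
01APR2023
.
;

options missing='.';               /* restore the default for later steps */
```

Because the option is global, forgetting the last line would also blank out missing values in later PROC PRINT output and in any other PUT of a missing number.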

 

PS Your IF statement is checking for values after 01JAN1960 which is the date that zero represents.  Is that really what you meant to do?  If you wanted to test for missing why not do that instead?

IF not missing(close_eff_n) THEN DO;
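The difference between the two tests only shows up for dates before 1960, which SAS stores as negative numbers. A quick check with an arbitrary example date (the variable names are made up for illustration):

```sas
data _null_;
  d        = '15JUN1955'd;         /* valid date, stored as a negative number */
  gt_zero  = (d > 0);              /* 0: fails the original test              */
  not_miss = not missing(d);       /* 1: passes the missing() test            */
  put d= date9. gt_zero= not_miss=;
run;
```

A missing value fails both tests, since SAS treats missing as smaller than any number, so the two conditions agree for every date after 01JAN1960.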

 


5 REPLIES 5
ballardw
Super User

Can you describe the rules involved for selecting the effective close?

 

An example input data set and the expected result would go a long way toward a workable solution.

 

Serious comment: DO NOT MAKE YOUR EFFECTIVE DATE A CHARACTER VALUE. As soon as you try to use the effective date you will find that many things are going to involve turning that character value back into an actual date so start with one.
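If the goal is only to see YYYYMM, a format on the numeric variable can do that without creating a character copy. A minimal sketch with hypothetical data (demo_fmt and close_year are names made up here):

```sas
data work.demo_fmt;
  input close_eff_n :date9.;
  format close_eff_n yymmn6.;      /* displays as YYYYMM, value stays a SAS date */
  /* Date functions still work because the variable is numeric. */
  if not missing(close_eff_n) then close_year = year(close_eff_n);
  datalines;
01MAR2023
.
;

proc print data=work.demo_fmt;
run;
```

The missing row still prints as a period here unless the MISSING= system option is changed.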

Astounding
PROC Star
You're right, the RETAIN statement causes the problem, and in a simple way. RETAIN tells SAS to let the value sit there and not reset it just because you begin a new observation. IF THEN is irrelevant, although you can capitalize on it by adding an ELSE statement:

else close_dte_eff = " ";
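For contrast, the same logic without any RETAIN statement behaves the way the original poster expected, which would explain why the simplified test did not reproduce the problem. A sketch with hypothetical inline data (demo_no_retain is a made-up name):

```sas
data work.demo_no_retain;
  input close_eff_n :date9.;
  format close_eff_n date9.;
  /* close_dte_eff is created by assignment and is not retained, so SAS */
  /* resets it to missing (blank) at the start of every iteration.      */
  if not missing(close_eff_n) then
    close_dte_eff = put(close_eff_n, yymmn6.);
  datalines;
01APR2023
.
01JUL2021
;
```

Here the middle row ends up with a blank close_dte_eff even though there is no ELSE branch.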

Patrick
Opal | Level 21

I suggest you share some representative sample data (your table raw_dates) with all the variables required to derive the effective date, show us the desired result based on this sample data, and explain the logic for getting from have to want.

If you provide us with this information then we can certainly help you with the code.

 

Please amend the code below to share the sample data.

data raw_dates;
  infile datalines dsd dlm=',' truncover;
  input closed acct_close_dte:date9.;
  format acct_close_dte date9.;
  datalines;
1,01MAR2023
1,01APR2023
0,01May2023
1,.
1,01JUL2023
;
scwein
Fluorite | Level 6

Thanks to all for your responses!
