ADDENDUM to original post: I realized that this issue was caused by starting with a "RETAIN" statement, which I use to put the variables in the desired order. I'd still like to leave this question up because I'd appreciate any feedback on:
- How does a RETAIN statement work? When does it affect the outputs of a command in a DATA step?
- Does anyone have alternate/preferred strategies for reordering the variables in a dataset?
Thanks!
***********************************************************************
Original post:
Hello SAS community,
I'm very confused about how SAS deciphers "IF" Statements in the DATA step.
In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date.
Earlier in my data step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM.
Here's what I tried:
DATA WORK.dates_test;
    SET WORK.raw_dates;
    close_eff_n = acct_close_dte_n;
    IF closed = 1 AND acct_close_dte_n = . THEN DO;
        close_eff_n = maxdate_n;
    END;
    *(omitting some additional logic used here for parsimony);
    IF close_eff_n > 0 THEN DO;
        close_dte_eff = put(close_eff_n,yymmn.);
    END;
RUN;
I had earlier written this last segment as:
close_dte_eff = put(close_eff_n,yymmn.);
but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic.
The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect.
e.g.
I have:
close_eff_n |
01MAR2023 |
01APR2023 |
. |
. |
01JUL2021 |
I want:
close_eff_n | close_dte_eff |
01MAR2023 | 202303 |
01APR2023 | 202304 |
. | |
. | |
01JUL2021 | 202107 |
But instead I get:
close_eff_n | close_dte_eff |
01MAR2023 | 202303 |
01APR2023 | 202304 |
. | 202304 |
. | 202304 |
01JUL2021 | 202107 |
When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF-statements.
I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening?
There's something fundamental about how the "IF-statement" is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic.
I had thought that these sorts of situations, where SAS operates on one row depending on what was in the previous row, only happen when there is a "BY" statement, but obviously that's incorrect, as there is no "BY" statement in this DATA step.
I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to.
Please let me know if I can provide any other context or information that would be helpful.
Many thanks,
Scott
Accepted Solutions
That is exactly what RETAIN is intended for. Note that your usage of RETAIN to set the variable order is just taking advantage of the fact that SAS sets the order of the variables when they first are "seen" by the compiler. It can be useful since a simple RETAIN statement (without any initial values) will not force SAS to set the TYPE of the variable.
Since the value is retained it can only change when you explicitly change it.
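On the reordering question, one possible approach (a sketch, not the only option) is to list only the variables that come from the SET data set in RETAIN and to declare the new variable with LENGTH instead. Variables read by SET are retained regardless, while a variable created by assignment is reset to missing at the start of each iteration. The data set work.have and its values are made up for illustration, and yymmn6. is written with an explicit width as an assumption about the desired output.

```sas
/* Hypothetical sample data: two dates, two missing values, one date */
data work.have;
    input close_eff_n :date9.;
    format close_eff_n date9.;
    datalines;
01MAR2023
01APR2023
.
.
01JUL2021
;
run;

data work.want;
    retain close_eff_n;        /* comes from SET, so it is retained anyway */
    length close_dte_eff $6;   /* new variable: gets its position here, is not retained */
    set work.have;
    /* No ELSE needed: close_dte_eff starts each iteration as blank */
    if not missing(close_eff_n) then close_dte_eff = put(close_eff_n, yymmn6.);
run;

proc print data=work.want;
run;
```

The variable order is still fixed by the first appearance of each name, but only RETAIN carries a value across iterations, so the new variable stays out of it.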
You just need to add an ELSE clause so that the value is set on every observation:
IF close_eff_n > 0 THEN DO;
    close_dte_eff = put(close_eff_n,yymmn.);
END;
ELSE close_dte_eff = ' ';
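A minimal sketch of the whole situation, assuming the original DATA step began with a RETAIN that listed close_dte_eff (the post does not show that statement, so this is a reconstruction). The data set work.have and its values are made up, and yymmn6. is used with an explicit width.

```sas
/* Hypothetical sample data matching the "I have" table */
data work.have;
    input close_eff_n :date9.;
    format close_eff_n date9.;
    datalines;
01MAR2023
01APR2023
.
.
01JUL2021
;
run;

data work.want;
    /* Naming close_dte_eff here sets its position AND retains its value */
    retain close_eff_n close_dte_eff;
    length close_dte_eff $6;
    set work.have;
    if not missing(close_eff_n) then close_dte_eff = put(close_eff_n, yymmn6.);
    /* Without this ELSE, the two missing rows keep the previous row's value */
    else close_dte_eff = ' ';
run;

proc print data=work.want;
run;
```

Commenting out the ELSE line should reproduce the carried-forward values described in the question.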
Alternatively you could change the value of the MISSING option and eliminate the IF statement.
option missing=' ';
....
close_dte_eff = put(close_eff_n,yymmn.);
Remember to set the missing option back to a period after the data step.
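The MISSING option approach could look roughly like this. The data set work.have and its values are hypothetical, and yymmn6. is written with an explicit width; the point is only to show the option being changed before the step and restored afterwards.

```sas
/* Hypothetical sample data */
data work.have;
    input close_eff_n :date9.;
    format close_eff_n date9.;
    datalines;
01APR2023
.
01JUL2021
;
run;

/* Print numeric missing values as a blank instead of a period */
options missing=' ';

data work.want;
    set work.have;
    length close_dte_eff $6;
    /* A missing date is now formatted as blanks, so no IF is required */
    close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* Restore the default so later steps print missing values as a period */
options missing='.';
```

Because MISSING is a system option, it affects every later step in the session until it is reset, which is why the restore line matters.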
PS Your IF statement is checking for values after 01JAN1960 which is the date that zero represents. Is that really what you meant to do? If you wanted to test for missing why not do that instead?
IF not missing(close_eff_n) THEN DO;
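To make the difference concrete, here is a small sketch comparing the two tests. The variable names and the 1955 date are made up for illustration. SAS stores dates as days since 01JAN1960, so earlier dates are negative, and a missing numeric value compares lower than any number.

```sas
data _null_;
    d_old  = '15JUN1955'd;   /* before 01JAN1960: stored as a negative number */
    d_zero = '01JAN1960'd;   /* stored as 0 */
    d_miss = .;              /* missing */

    /* Comparisons return 1 for true and 0 for false */
    gt_old  = (d_old  > 0);        /* 0: a valid date fails the test */
    gt_zero = (d_zero > 0);        /* 0: a valid date fails the test */
    gt_miss = (d_miss > 0);        /* 0: missing fails, as intended */

    nm_old  = not missing(d_old);  /* 1 */
    nm_zero = not missing(d_zero); /* 1 */
    nm_miss = not missing(d_miss); /* 0 */

    put gt_old= gt_zero= gt_miss=;
    put nm_old= nm_zero= nm_miss=;
run;
```

Both tests reject the missing value, but only missing() accepts every valid date.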
Can you describe the rules involved for selecting the effective close?
An example input data set and the expected result would go a long way toward a workable solution.
Serious comment: DO NOT MAKE YOUR EFFECTIVE DATE A CHARACTER VALUE. As soon as you try to use the effective date you will find that many things are going to involve turning that character value back into an actual date so start with one.
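One way to follow this advice, sketched under the assumption that only the displayed form needs to be YYYYMM: keep the variable numeric and attach a format to it. The data set work.have, the variable close_month, and the sample values are hypothetical.

```sas
/* Hypothetical sample data */
data work.have;
    input close_eff_n :date9.;
    format close_eff_n date9.;
    datalines;
01MAR2023
.
01JUL2021
;
run;

data work.want;
    set work.have;
    /* Still a numeric SAS date; only the display changes */
    close_month = close_eff_n;
    format close_month yymmn6.;
run;

proc print data=work.want;
run;
```

The underlying value remains a date, so date functions and comparisons keep working without converting a string back.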
else close_dte_eff = " ";
I suggest you share some representative sample data (your table raw_dates) with all the variables required to derive the effective date, show us the desired result based on this sample data, and explain the logic for getting from have to want.
If you provide this information, then we can certainly help you with the code.
Please amend the code below to share the sample data.
data raw_dates;
infile datalines dsd dlm=',' truncover;
input closed acct_close_dte:date9.;
format acct_close_dte date9.;
datalines;
1,01MAR2023
1,01APR2023
0,01May2023
1,.
1,01JUL2023
;
Thanks to all for your responses!