Hello everyone,
I've tried tackling the problem below multiple ways including using FIRST. and LAST. in BY-GROUP analysis, WHILE loops, but have not been unsuccessful so far. I'm running out of time and if anyone can provide some direction it'd be greatly appreciated.
Here's the situation:
I have a data set which is 'tenurized' based on the occurrence of a particular event. In this case, the event being a 'limit placement' which happens at a particular date defined as 'Limit_dt'. Therefore, as of the 'Limit_dt' the 'Tenure' field gets set to = 0 and increments by 1 until the last row for that particular customer group (ID). The data is sorted by ID, Limit_dt and Date.
See below an illustration of what the data looks like as well as better explanation of each field. Field Tenure2 (highlighted in yellow) is what I'm attempting to tackle right now. Please ignore it for now.
Essentially, what I need to do here is 're-tenurize' this data set based on the FIRST occurrence Score_flag ='Y' within each customer group. To expand a bit on this, Score_flag = 'Y' when Score >= Score30 so this flag can basically jump back and forth between 'Y' and 'N' from month to month.
What I need to do is create a field named 'Tenure2' which gets set to '0' at the FIRST occurrence (and ONLY the first occurrence for each ID) of Score_flag='Y' for a particular ID and increment by 1 from there until the last record/row for that customer (ID). The field 'Tenure2' in the illustration above exemplifies the values that this field should take.
Hoping someone can direct me to a simple solution to this.
Thanks in advance for your support.
Add another flag. If this doesn't work, please post sample data following these instructions:
data WANT;
retain TENURE2;
set HAVE;
by ID;
length first_flag $1.;
retain first_flag;
if first.ID then call missing(TENURE2, first_flag);;
if score_flag='Y' and missing(first_flag) then do;
tenure2=0;
first_flag='Y';
end;
else TENURE2=TENURE2+1;
run;
@renanm wrote:
Hi @ChrisNZ , thanks for your attempt at helping to solve this.
This is similar to one of my previous attempts but unfortunately it doesn't solve the problem here.
Here are some of the reasons why:
#1 - Whenever there are consecutive Score_flag = 'Y' month after month, the Tenure2 fields gets the value '0' again and again instead of incrementing by 1.
#2 - Similarly, if Score_flag alternates between 'Y' and 'N' from month to month, every time it gets set to 'Y' again, the field Tenure2 gets re-set to '0' again.
See illustrations of these situations below:
Example 1:
Example 2:
Example 3:
Essentially, Tenure2 needs to be set to '0' ONLY at the FIRST occurrence of Score_flag = 'Y' within each customer group.
This is the missing piece I can't seem to figure out.
Any additional help is much appreciated.
Thanks !
Like this?
data WANT;
retain TENURE2;
set HAVE;
by ID;
if first.ID then TENURE2=.;
if SCORE_FLAG='Y' then TENURE2=0;
else TENURE2=TENURE2+1;
run;
Hi @ChrisNZ , thanks for your attempt at helping to solve this.
This is similar to one of my previous attempts but unfortunately it doesn't solve the problem here.
Here are some of the reasons why:
#1 - Whenever there are consecutive Score_flag = 'Y' month after month, the Tenure2 fields gets the value '0' again and again instead of incrementing by 1.
#2 - Similarly, if Score_flag alternates between 'Y' and 'N' from month to month, every time it gets set to 'Y' again, the field Tenure2 gets re-set to '0' again.
See illustrations of these situations below:
Example 1:
Example 2:
Example 3:
Essentially, Tenure2 needs to be set to '0' ONLY at the FIRST occurrence of Score_flag = 'Y' within each customer group.
This is the missing piece I can't seem to figure out.
Any additional help is much appreciated.
Thanks !
Hi @Reeza , my apologies. I'm new to this forum and learning how to go about this.
How would you suggest I go about illustrating the situation? I've created a hypothetical data set which mirrors the issue I'm having due to confidentiality reasons.
Thanks for your support and advice.
Thanks @Reeza for sharing these instructions and guidelines. I'll ensure to follow them the next time around.
Add another flag. If this doesn't work, please post sample data following these instructions:
data WANT;
retain TENURE2;
set HAVE;
by ID;
length first_flag $1.;
retain first_flag;
if first.ID then call missing(TENURE2, first_flag);;
if score_flag='Y' and missing(first_flag) then do;
tenure2=0;
first_flag='Y';
end;
else TENURE2=TENURE2+1;
run;
@renanm wrote:
Hi @ChrisNZ , thanks for your attempt at helping to solve this.
This is similar to one of my previous attempts but unfortunately it doesn't solve the problem here.
Here are some of the reasons why:
#1 - Whenever there are consecutive Score_flag = 'Y' month after month, the Tenure2 fields gets the value '0' again and again instead of incrementing by 1.
#2 - Similarly, if Score_flag alternates between 'Y' and 'N' from month to month, every time it gets set to 'Y' again, the field Tenure2 gets re-set to '0' again.
See illustrations of these situations below:
Example 1:
Example 2:
Example 3:
Essentially, Tenure2 needs to be set to '0' ONLY at the FIRST occurrence of Score_flag = 'Y' within each customer group.
This is the missing piece I can't seem to figure out.
Any additional help is much appreciated.
Thanks !
Thanks @Reeza. This seems to solve it and quite interesting the way you approached this. Helped me to learn a lot. Much appreciated.
> Whenever there are consecutive Score_flag = 'Y' month after month, the Tenure2 fields gets the value '0'
This seems easy to solve. Have you even tried?
data WANT;
retain TENURE2;
set HAVE;
by ID;
if first.ID then TENURE2=.;
if SCORE_FLAG='Y' & TENURE2=. then TENURE2=-1;
TENURE2=TENURE2+1;
run;
And yes, as @Reeza said, posting images as data prevents us from running code on actual data (unless we do the work for you and create actual data, but why would we do your work?)
Hi @ChrisNZ ,
Thanks again for your help.
As mentioned in my response to Reeza above, I'm new to this forum and did not know posting images was not acceptable. I thought that would be a good way to clearly illustrate the situation I was dealing with and clearly demonstrate it to the contributors of the forum.
It won't happen again. I'll follow the guidelines shared by Reeza next time. That wasn't meant for you to do my work for me, I simply
didn't know better.
And to answer your other question of whether or not I have even tried, the answer is YES. I have tried multiple times and searched for different solutions. Unfortunately, as you can tell, I'm learning and wasn't able to find a solution. I'll get there and hopefully one day be
able to help others with similar situations.
Thanks for sharing another alternative to the problem.
@renanm wrote:
Hi @ChrisNZ ,
Thanks again for your help.
As mentioned in my response to Reeza above, I'm new to this forum and did not know posting images was not acceptable. I thought that would be a good way to clearly illustrate the situation I was dealing with and clearly demonstrate it to the contributors of the forum.
It isn't that images are "not acceptable" but that they can be misleading. We will get a number of posts each week often involving dates but sometimes other values, that appear to be one thing, SAS date value or numeric and are actually character values in the users data. Or "dates" that look like 20190215 that are actually simple numbers and do not have the proper values for use with date functions and formats.
Also testing code without actual values is difficult and too time consuming for most of us to type. There is a SAS macro program at https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
We also like to see the code that you tried because sometimes all it takes is tweak to get it working. It is not uncommon for someone coming from a different programming or scripting language to use something that may be valid in SAS but means something different than the other language. The fun thing with this can be the rules SAS uses.
A moderately common issue is comparing values:
if x = 3 or 4 then do <some code>; for example. Does NOT throw an error but will always do the code represented by <some code>
Reason: the value 4, being non-zero is treated as true all by itself and x is only compared to 3 because there was no instruction to compare x to 4. So the code could be fixed by either
if x=3 or x=4 then do <some code>.
or
if x in (3,4) then do <some code>.
So if you had posted code with something like the first If we could relatively quickly point that out. Code and log entries should be pasted into code boxes opened with either the {I} or "running man" icon to preserve formatting. The main message windows will reformat text removing lots of white space and that often makes data step code fail in one way or another. Error messages from the log also often have diagnostic characters like an _ where the error is found. If you paste one of those into the main message window the _ character usually ends up under something not related to the error.
Very helpful context and guidelines. Thanks @ballardw !
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.