I'm not well versed in the SAS language; I know just enough to get by with basic logic in data and proc sql steps, maybe a sort here and there.
Hoping for some direction on a problem I'm facing.
Dataset
DATE | CUST_ID | SYSTEM | COLOR |
3/31/2021 | 15 | ORA | BLUE |
4/30/2021 | 13 | ORA | BLUE |
5/31/2021 | 10 | ORA | BLUE |
6/30/2021 | 19 | ORA | BLUE |
7/31/2021 | 11 | ORA | BLUE |
8/31/2021 | 14 | ORA | BLUE |
9/30/2021 | 21 | ORA | BLUE |
10/31/2021 | 16 | ORA | BLUE |
11/30/2021 | 12 | ORA | BLUE |
12/31/2021 | 17 | ORA | BLUE |
1/31/2022 | 22 | ORA | BLUE |
2/28/2022 | 18 | ORA | BLUE |
3/31/2022 | 20 | SIS | UNK |
4/30/2022 | 23 | SIS | UNK |
The issue: when a customer switches to another system, their color data goes missing, hence the returned value of UNK. I need help coming up with the right logic to set the unknown fields to the last known value. It doesn't strictly have to be the last known value, since in theory the color should never change once set, until it loads into another system.
So the dataset I want to return is:
DATE | CUST_ID | SYSTEM | COLOR |
3/31/2021 | 15 | ORA | BLUE |
4/30/2021 | 13 | ORA | BLUE |
5/31/2021 | 10 | ORA | BLUE |
6/30/2021 | 19 | ORA | BLUE |
7/31/2021 | 11 | ORA | BLUE |
8/31/2021 | 14 | ORA | BLUE |
9/30/2021 | 21 | ORA | BLUE |
10/31/2021 | 16 | ORA | BLUE |
11/30/2021 | 12 | ORA | BLUE |
12/31/2021 | 17 | ORA | BLUE |
1/31/2022 | 22 | ORA | BLUE |
2/28/2022 | 18 | ORA | BLUE |
3/31/2022 | 20 | SIS | BLUE |
4/30/2022 | 23 | SIS | BLUE |
Thanks for your help in advance!
In the case of unknown color you want to "assign the unknown fields to the last known value." Apparently, you are ok with assigning a color to CUST_ID 20 with a known value from CUST_ID 18. Is that correct?
Assuming the answer is yes, then the code below does what you need (untested in the absence of sample data in the form of a working data step).
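data want (drop=_:);   /* drop helper variables whose names start with an underscore */
  set have;
  length _lastknowncolor $4;   /* assumes COLOR values fit in 4 characters; widen if they do not */
  retain _lastknowncolor;
  if color ^= 'UNK' then _lastknowncolor = color;   /* remember the most recent known color */
  else color = _lastknowncolor;                     /* replace UNK with the remembered color */
run;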
The key here is the RETAIN statement, which tells SAS not to reset the retained variable to missing with each new iteration of the data step (i.e. each incoming observation in the above case).
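For anyone who wants to try the step above, here is a minimal sketch of a data step that builds HAVE from the rows posted in the question. The informats, the variable lengths ($3 for SYSTEM, $4 for COLOR) and the trailing PROC PRINT are my assumptions, not something the original poster supplied, and the rows are assumed to already be in date order.

```sas
/* Sample data typed in from the question. Informats and lengths are assumptions. */
data have;
  /* treat both the pipe and the blank as delimiters so padded values read cleanly */
  infile datalines dlm='| ' truncover;
  input date :mmddyy10. cust_id system :$3. color :$4.;
  format date mmddyy10.;
  datalines;
3/31/2021 | 15 | ORA | BLUE
4/30/2021 | 13 | ORA | BLUE
5/31/2021 | 10 | ORA | BLUE
6/30/2021 | 19 | ORA | BLUE
7/31/2021 | 11 | ORA | BLUE
8/31/2021 | 14 | ORA | BLUE
9/30/2021 | 21 | ORA | BLUE
10/31/2021 | 16 | ORA | BLUE
11/30/2021 | 12 | ORA | BLUE
12/31/2021 | 17 | ORA | BLUE
1/31/2022 | 22 | ORA | BLUE
2/28/2022 | 18 | ORA | BLUE
3/31/2022 | 20 | SIS | UNK
4/30/2022 | 23 | SIS | UNK
;
run;

/* After running the DATA WANT step, inspect the result */
proc print data=want noobs;
run;
```

Note that the fill depends on row order: an UNK row takes whatever known color appeared most recently above it, so if the real data is not already sorted by date, sort it first.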