I'm not well versed in the SAS language. I know just enough to get by with basic logic in data and proc sql steps, and maybe a sort here and there.
I'm hoping for some direction on a problem I'm facing.
Dataset
DATE | CUST_ID | SYSTEM | COLOR |
3/31/2021 | 15 | ORA | BLUE |
4/30/2021 | 13 | ORA | BLUE |
5/31/2021 | 10 | ORA | BLUE |
6/30/2021 | 19 | ORA | BLUE |
7/31/2021 | 11 | ORA | BLUE |
8/31/2021 | 14 | ORA | BLUE |
9/30/2021 | 21 | ORA | BLUE |
10/31/2021 | 16 | ORA | BLUE |
11/30/2021 | 12 | ORA | BLUE |
12/31/2021 | 17 | ORA | BLUE |
1/31/2022 | 22 | ORA | BLUE |
2/28/2022 | 18 | ORA | BLUE |
3/31/2022 | 20 | SIS | UNK |
4/30/2022 | 23 | SIS | UNK |
The issue: When a customer switches to another system, their color data goes missing, hence the returned value of unknown (UNK). What I need help with is the right logic to assign the unknown fields the last known value. It doesn't strictly have to be the last known value, since in theory the color should never change once set, until the record loads into another system.
So the dataset I want to return is:
DATE | CUST_ID | SYSTEM | COLOR |
3/31/2021 | 15 | ORA | BLUE |
4/30/2021 | 13 | ORA | BLUE |
5/31/2021 | 10 | ORA | BLUE |
6/30/2021 | 19 | ORA | BLUE |
7/31/2021 | 11 | ORA | BLUE |
8/31/2021 | 14 | ORA | BLUE |
9/30/2021 | 21 | ORA | BLUE |
10/31/2021 | 16 | ORA | BLUE |
11/30/2021 | 12 | ORA | BLUE |
12/31/2021 | 17 | ORA | BLUE |
1/31/2022 | 22 | ORA | BLUE |
2/28/2022 | 18 | ORA | BLUE |
3/31/2022 | 20 | SIS | BLUE |
4/30/2022 | 23 | SIS | BLUE |
Thanks for your help in advance!
In the case of unknown color you want to "assign the unknown fields to the last known value." Apparently, you are ok with assigning a color to CUST_ID 20 with a known value from CUST_ID 18. Is that correct?
Assuming the answer is yes, then the code below does what you need (untested in the absence of sample data in the form of a working data step).
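data want (drop=_:);              /* drop helper variables whose names start with _ */
  set have;
  length _lastknowncolor $4;      /* assumes COLOR is at most 4 characters; widen if needed */
  retain _lastknowncolor;         /* keep the value from one observation to the next */
  if color ^= 'UNK' then _lastknowncolor = color;   /* remember the latest known color */
  else color = _lastknowncolor;                     /* replace UNK with the remembered color */
run;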
The key here is the RETAIN statement, which tells SAS not to reset the retained variable to missing at the start of each new iteration of the data step (i.e. for each incoming observation in the above case).
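Since the question did not include a working data step, here is a minimal sketch for trying the approach out. The step that builds HAVE is an assumption on my part, typed from the table in the question (pipe-delimited values, DATE read with the MMDDYY10. informat, COLOR stored as a 4-character variable), so adjust it to match your real data.

```sas
/* Assumed sample data, typed from the table in the question */
data have;
  infile datalines dlm='|' truncover;
  input date :mmddyy10. cust_id system :$3. color :$4.;
  format date mmddyy10.;
  datalines;
3/31/2021|15|ORA|BLUE
4/30/2021|13|ORA|BLUE
5/31/2021|10|ORA|BLUE
6/30/2021|19|ORA|BLUE
7/31/2021|11|ORA|BLUE
8/31/2021|14|ORA|BLUE
9/30/2021|21|ORA|BLUE
10/31/2021|16|ORA|BLUE
11/30/2021|12|ORA|BLUE
12/31/2021|17|ORA|BLUE
1/31/2022|22|ORA|BLUE
2/28/2022|18|ORA|BLUE
3/31/2022|20|SIS|UNK
4/30/2022|23|SIS|UNK
;
run;

/* Carry the last known color forward into rows where COLOR is UNK */
data want (drop=_:);
  set have;
  length _lastknowncolor $4;
  retain _lastknowncolor;
  if color ^= 'UNK' then _lastknowncolor = color;
  else color = _lastknowncolor;
run;

/* Inspect the result */
proc print data=want noobs;
run;
```

With this sample, the two SIS rows should come out with COLOR set to BLUE. Note that the fill depends on the order of the observations in HAVE. If UNK appears before any known color has been read, COLOR is set to blank, because the retained variable is still empty at that point.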