Hello,
We have old sas programs where existing or new variable are mapped using the combined values of two or more variables.
Most of the time, a select statement is used as below:
data Transactions;
length TRANS $ 3;
infile datalines delimiter=',';
input Policy_category $ Network_Provider $;
datalines;
R,20000
A,20000
A,10008
M,10008
R,10015
M,
,10015
,
;
run;
Data Transactions;
set Transactions;
SELECT;
WHEN (Policy_Category = 'R') TRANS = 'CGN';
WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
WHEN (Policy_Category = 'A' and Network_Provider<>'20000') TRANS='BBB';
OTHERWISE TRANS='CCC';
END;
run;
The problem that I see with this approach is that the mapping in imbedded into the SAS code making the maintenance hard to do.
What's the best way to do that task in SAS if we want to centralize the information.
P.S: All the good ideas are welcome!
Outcome Variable | Outcome Value | Criteria |
Trans | CGM | Policy_Category = 'R' |
Trans | AAA | Policy_Category = 'A' and Network_Provider='20000' |
Trans | BBB | Policy_Category = 'A' and Network_Provider ne '20000' |
Trans | CCC | OTHER |
The most basic form to track/generate these SELECT statements.
Obviously, it depends on multiple factors.
How many types of transformations are there? Is there any pattern to them?
If there is no pattern then code is probably the easiest way to record the transformation. To make it reusable you might want to make a macro. Then when the rules change you just publish a new version of the macro.
Note your code is wrong. You are trying to use the MAX operator ( <> ) with character values. The MAX operator only works with numeric values.
Plus it looks like you probably meant to test for non equality. Use the NE operator for that. (Only in SQL does that strange less than and greater than symbol mean not equal).
But in reality you don't need that part of the test anyway since you already captured the cases where it was equal with the previous WHEN statement.
SELECT;
WHEN (Policy_Category = 'R') TRANS = 'CGN';
WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
WHEN (Policy_Category = 'A') TRANS='BBB';
OTHERWISE TRANS='CCC';
END;
SELECT;
WHEN (Policy_Category = 'R') TRANS = 'CGN';
WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
WHEN (Policy_Category = 'A') and Network_Provider <> '20000') TRANS='BBB';
OTHERWISE TRANS='CCC';
END;
Here's the correct code.
My questions are: is there a way to centralize the information ? Is there a procedure to do that kind of mapping or better way to do it ?
You are still trying to use the MAX operator with character values.
That CANNOT work.
61 data test; 62 string ='Hello'; 63 if string <> 'Hello' then put 'What??'; NOTE: The "<>" operator is interpreted as "MAX". 64 run; NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 63:13 NOTE: Invalid numeric data, 'Hello' , at line 63 column 13. string=Hello _ERROR_=1 _N_=1 NOTE: The data set WORK.TEST has 1 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds
Just remove <> and >< from the set of operators you use in code (any code) as they are confusing. Not sure who thought that the first looks like it means either max or not equal. And if it does then what logic suggests that the second one means something different?
If you want to test is two value are not equal then tell the computer that is what you want to do. There are many options that do not involve using a symbol that has no clear meaning.
Network_Provider ne '20000'
not (Network_Provider = '20000')
not (Network_Provider eq '20000')
Outcome Variable | Outcome Value | Criteria |
Trans | CGM | Policy_Category = 'R' |
Trans | AAA | Policy_Category = 'A' and Network_Provider='20000' |
Trans | BBB | Policy_Category = 'A' and Network_Provider ne '20000' |
Trans | CCC | OTHER |
The most basic form to track/generate these SELECT statements.
If you were dealing with a single variable then a custom format would be one of the most efficient ways to deal with recoding or "mapping" values as you would not need to add a variable at all for most uses.
As soon as you go to two or more variables you don't have much choice. If the values are discrete then perhaps storing the value sets in another data set and then combining with that set, whether a data step Merge or Hash or SQL join might be the choice. But then you have one or more permanent data sets you need to make sure don't get lost or corrupted.
Caution: Using the construct of the same data set as input and output like below is dangerous as it completely replaces the source data set. Logic errors may result in loss of data or possibly worse minor unexpected value changes.
Data Transactions; set Transactions;
Personally, when I have something like your admittedly limited example, I place the mapping code into the same step that READS the data.
I also tend to create custom informats for discrete value variables like your Policy_category and Network_provider to through invalid data messages if a value other than the ones you expect to see, including unexpected missing values, appears in the source data.
Those invalid data messages then become a clue that the mapping code may need to be updated to handle new values.
Given that your "lookup" conditions do not all use the same number of variables, it may be messy to centralize as @Reeza suggested.
However, you might benefit simply from nesting conditions, which could simplify maintenance. For instance, the code you present could be recoded as:
select ;
WHEN (Policy_Category = 'R') TRANS = 'CGN';
WHEN (Policy_Category = 'A') do;
select(netword_provider);
when ('20000') TRANS='AAA';
otherwise TRANS='BBB';
end;
end;
OTHERWISE TRANS='CCC';
end;
This will be best, of course, if the various conditions have a nice nesting structure.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.