BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alepage
Barite | Level 11

Hello,

 

We have old sas programs where existing or new variable are mapped using the combined values of two or more variables.

Most of the time, a select statement is used as below:

 

data Transactions;
length TRANS $ 3;
   infile datalines delimiter=',';
   input Policy_category $ Network_Provider $;
   datalines;
R,20000
A,20000
A,10008
M,10008
R,10015
M,      
 ,10015
  , 
;

run;

Data Transactions;
set Transactions;
      SELECT;
             WHEN (Policy_Category = 'R') TRANS = 'CGN';
             WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
             WHEN (Policy_Category = 'A' and Network_Provider<>'20000') TRANS='BBB';
             OTHERWISE TRANS='CCC';
        END;

run;

The problem that I see with this approach is that the mapping in imbedded into the SAS code making the maintenance hard to do.

 

What's the best way to do that task in SAS if we want to centralize the information.

P.S:   All the good ideas are welcome!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Outcome Variable Outcome Value Criteria
Trans CGM Policy_Category = 'R'
Trans AAA Policy_Category = 'A' and Network_Provider='20000'
Trans BBB Policy_Category = 'A' and Network_Provider ne '20000'
Trans CCC OTHER

 

The most basic form to track/generate these SELECT statements.

 

View solution in original post

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

Obviously, it depends on multiple factors.

 

  1. How many mappings are there?
  2. Is the mapping done in multiple programs or just this one?
alepage
Barite | Level 11
They are many mappings, and that, in many SAS programs. So the maitenance is hard to do. It is why I was asking if there is a better way to do it othterwise using the select statement and to centralize those mappings.
Tom
Super User Tom
Super User

How many types of transformations are there?  Is there any pattern to them?

If there is no pattern then code is probably the easiest way to record the transformation.  To make it reusable you might want to make a macro.  Then when the rules change you just publish a new version of the macro.

 

Note your code is wrong.  You are trying to use the MAX operator ( <> ) with character values.  The MAX operator only works with numeric values.

 

Plus it looks like you probably meant to test for non equality.  Use the NE operator for that.  (Only in SQL does that strange less than and greater than symbol mean not equal).  

 

But in reality you don't need that part of the test anyway since you already captured the cases where it was equal with the previous WHEN statement.

SELECT;
  WHEN (Policy_Category = 'R') TRANS = 'CGN';
  WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
  WHEN (Policy_Category = 'A') TRANS='BBB';
  OTHERWISE TRANS='CCC';
END;
alepage
Barite | Level 11
SELECT;
  WHEN (Policy_Category = 'R') TRANS = 'CGN';
  WHEN (Policy_Category = 'A' and Network_Provider='20000') TRANS='AAA';
  WHEN (Policy_Category = 'A') and Network_Provider <> '20000') TRANS='BBB';
  OTHERWISE TRANS='CCC';
END;

Here's the correct code.

 

My questions are: is there a way to centralize the information ? Is there a procedure to do that kind of mapping or better way to do it ?

Reeza
Super User
No, only processes you can design.
For example, you could create a lookup file with the conditions, either each in a column or line and then build the code dynamically. But that depends on the complexity of the conditions. For something as shown, very doable. For more complex conditions, very much a pain.
Tom
Super User Tom
Super User

You are still trying to use the MAX operator with character values.

That CANNOT work.

61   data test;
62     string ='Hello';
63     if string <> 'Hello' then put 'What??';
NOTE: The "<>" operator is interpreted as "MAX".
64   run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      63:13
NOTE: Invalid numeric data, 'Hello' , at line 63 column 13.
string=Hello _ERROR_=1 _N_=1
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Just remove <> and >< from the set of operators you use in code (any code) as they are confusing.  Not sure who thought that the first looks like it means either max or not equal.  And if it does then what logic suggests that the second one means something different?

 

If you want to test is two value are not equal then tell the computer that is what you want to do.  There are many options that do not involve using a symbol that has no clear meaning.

Network_Provider ne '20000'
not (Network_Provider = '20000')
not (Network_Provider eq '20000')

 

 

 

 

 

 

alepage
Barite | Level 11
Thank you very much for this information. I will keep that in mind .
Reeza
Super User
Outcome Variable Outcome Value Criteria
Trans CGM Policy_Category = 'R'
Trans AAA Policy_Category = 'A' and Network_Provider='20000'
Trans BBB Policy_Category = 'A' and Network_Provider ne '20000'
Trans CCC OTHER

 

The most basic form to track/generate these SELECT statements.

 

alepage
Barite | Level 11
Hello Reeza, Thanks for the table. I will use it in my supporting documentation.
Regarding the SAS code, I was asking the community just to see it could exist a nice way to do that task.
It seems that the best approach is still the select statement.

What I could do , it's to create a macro function, one per outcome variable, then keep all those into a subfolder named mapping and then call these macro when needed. At least, all the information will be hosted at the same place.

Thank all of you for your time and help.
Reeza
Super User
Given your stated problem, I agree this is probably a good approach. It makes it reusable so you aren't having multiple select in different places and are updating code in one place.

Reeza
Super User
As Peter mentioned, it really depends. You would need to go through the different selects and develop a system that makes sense for your programs. It could be as simple as having each condition in it's own columns but if you have nested OR conditions that could get complex.
ballardw
Super User

If you were dealing with a single variable then a custom format would be one of the most efficient ways to deal with recoding or "mapping" values as you would not need to add a variable at all for most uses.

 

As soon as you go to two or more variables you don't have much choice. If the values are discrete then perhaps storing the value sets in another data set and then combining with that set, whether a data step Merge or Hash or SQL join might be the choice. But then you have one or more permanent data sets you need to make sure don't get lost or corrupted.

 

Caution: Using the construct of the same data set as input and output like below is dangerous as it completely replaces the source data set. Logic errors may result in loss of data or possibly worse minor unexpected value changes.

Data Transactions;
set Transactions;

Personally, when I have something like your admittedly limited example, I place the mapping code into the same step that READS the data.

I also tend to create custom informats for discrete value variables like your Policy_category and Network_provider to through invalid data messages if a value other than the ones you expect to see, including unexpected missing values, appears in the source data.

 

Those invalid data messages then become a clue that the mapping code may need to be updated to handle new values.

 

mkeintz
PROC Star

Given that your "lookup" conditions do not all use the same number of variables, it may be messy to centralize as @Reeza suggested.

 

However, you might benefit simply from nesting conditions, which could simplify maintenance.  For instance, the code you present could be recoded as:

 

select ;
  WHEN (Policy_Category = 'R') TRANS = 'CGN';
  WHEN (Policy_Category = 'A') do;
     select(netword_provider);
       when ('20000') TRANS='AAA';
       otherwise TRANS='BBB';
     end;
  end;
  OTHERWISE TRANS='CCC';
end;

This will be best, of course, if the various conditions have a nice nesting structure.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2605 views
  • 5 likes
  • 6 in conversation