BookmarkSubscribeRSS Feed
CeciliaE
Fluorite | Level 6

Hello,

 

I have a couple of case when statements that will be used in about 18 different proc SQL programs within a project. Rather than copying and pasting the case when statements in each program, I'm wondering if there's a way for me to create a macro that stores the case when statements so that I can reference the macro from each of the programs.

 

The first macro is quite simple. I have a variable that has a few invalid values that must be mapped to valid ones. However, the vast majority of observations can remain as they are. My case when statement is structured as follows:

(CASE 
/*if variable is an invalid value, return its valid equivalent*/ WHEN VARIABLE = INVALID_VALUE1 THEN VALID_VALUE1 WHEN VARIABLE = INVALID_VALUE2 THEN VALID_VALUE2 ...
/*otherwise, return the variable as-is*/ ELSE VARIABLE END) as VARIABLE_CORRECTED,

 

This is what I've tried, but it didn't work:

%macro variable_crosswalk(variable);
    %if &variable = 'invalid_value1' %then %let ;
        &variable = 'valid_value1'; %end;
    %else %if &variable = 'invalid_value2' %then %do;
        &variable = 'valid_value2'; %end;
...
    %else %do;
        &variable
    %end;
%mend;

The second case when statement I'd like to store in a macro is a little more complicated. It uses several different race/ethnicity fields to calculate a person's race. It's structured as follows:

(case
when race1 = 1 then 'Race 1'
when race8 = 2 then 'Race 2'
...
when race1 = 1 and race2 = 2 then 'Two or More Races'
when race1 = 1 and race3 = 3 then 'Two or More Races'
/*all other possible race combinations*/ ... else 'Unknown' end) as Race_Ethnicity,

Is there a way for me to store one or both of the case when statements above as macros that can be accessed from all 18 programs in my project? If so, can someone help me with the syntax?

Thanks for any assistance you can provide!

6 REPLIES 6
Reeza
Super User
Have you considered a custom format instead? At least for the ones that use a single variable mapping.

CeciliaE
Fluorite | Level 6
I saw that option a prior post, but I don't see how the format would return the valid values as themselves. The invalid values comprise only 10 values out of several hundred; all of the values that are already valid can be returned as themselves. Would I need to include all possible values in the format, even the valid ones?

Here's the post I was referring to: https://communities.sas.com/t5/SAS-Programming/Case-Statement-from-a-Macro-Possible/td-p/460068
[https://communities.sas.com/html/assets/sas-comm-social-blue.png]<>
Case Statement from a Macro: Possible?<>
I have a case statement that will need to be used in multiple PROC SQL procedures and wanted to see if this can be stored in a macro similar to how it can be done with an IN statement. Case Statement Sample: ,case when t1.FIELD_NM IN ('EAGLE') THEN 'BIRD' when t1.FIELD_NM IN ('SALMON') THEN 'FISH'...
communities.sas.com

Tom
Super User Tom
Super User

You only need to list of values you want changed.

proc format;
value myfmt 99='XXX' other=[z3.];
run;
1203  data _null_;
1204    do myval=.,-1,0,1,12,99;
1205      put myval= myval=myfmt.;
1206    end;
1207  run;

myval=. myval=.
myval=-1 myval=-01
myval=0 myval=000
myval=1 myval=001
myval=12 myval=012
myval=99 myval=XXX
Reeza
Super User

 Your macro code needs to look similar to the code. It also looks like you're mixing data step and SQL code as well. One thing with formats as well, it works in either use case.

 

Very base case:

%let new_type = case when type = 'Hybrid' then 'Electric'
     when type in ('Truck', 'SUV') then 'Truck'
     else type end as new_type ;


proc sql;
create table want as
select model, make, invoice, msrp, &new_type.
from sashelp.cars;
quit;


Also, if you're repeating something multiple times, 18....there's likely something in your process that should be changed.

Tom
Super User Tom
Super User

Do you want to put the code into a macro variable?

%let race1_case=
case
when race1 = 1 then 'Race 1'
when race8 = 2 then 'Race 2'
...
when race1 = 1 and race2 = 2 then 'Two or More Races'
when race1 = 1 and race3 = 3 then 'Two or More Races'
else 'Unspecified'
end
;

Which you can then use:

select &race1_case as RACE1_CASE
     , ....

You will be limited to 65K bytes in the macro variable.

 

Or you can define an actual macro instead.

%macro race1_case;
case
when race1 = 1 then 'Race 1'
when race8 = 2 then 'Race 2'
...
when race1 = 1 and race2 = 2 then 'Two or More Races'
when race1 = 1 and race3 = 3 then 'Two or More Races'
else 'Unspecified'
end
%mend;

Which you can then use:

select %race1_case as RACE1_CASE
     , ....

And you will be able to have unlimited length.

ballardw
Super User

When code involves setting values based on a single variable then a Format or Informat may be the more flexible approach. For a great many purposes you don't even need to change the value in the data set at all for a format to be flexible.

 

Note: the Macro language basically does not see the values of the variables so any "%if &variable" will not in generally work if &variable is supposed to reference the name of a variable in a data set.

 

If  you have never looked at the SASHELP.CLASS data set please do so. It is a small, 19 observations and 5 variables.

Consider the variable SEX which has values of F and M. Perhaps I want to display values of 'Female' and 'Male' as those would be nicer to read.

Then run this code:

Proc format library=work;
value $class_sex
'F'='Female'
'M'='Male'
other='Unknown'
;
run;

proc freq data=sashelp.class;
   tables sex*age ;
   format sex $class_sex.;
run;

However formats are limited to mapping single variable values.

 

Your example for race needs a complete description of values and possibly meanings. You appear to have as many as 8 race variables since your code shows a Race8. The implication you have is that your variables are numeric so there are some interesting approaches

For example you can use the RANGE function on the list of all the variables: Range(race1, race2,race3,...,race8);

If the result from range is something other than 0 then at least two of the Race variables have different values which I suspect will take care of your "2 or more races" logic.

 

If the RANGE result is missing then all of the variables have missing values and Race_ethnicity='Unknown'.

 

If the Range is 0 that means only one value appeared in the list. I am going to guess that means that only one of Race1, Race2, etc will have a value (expecting Race1=1 , Race2=2, Race3=3 etc when present). Which means that in the Range=0 case that Race_ethnicity could be :

catx(' ', 'Race',put(max(race1,race2,race3...,race8),1.) )

 

In my personal code I would perhaps have Race_ethnicity as numeric and have format display "Race 1" whatever that text should be and have missing assigned to Race_ethnicity with the format .

 

Also personally I wouldn't do this in Proc SQL because you lose some of the tools that allow writing shorter code such as ARRAY. In a data step you can write something like this:

data want;
   set have;
   array r (*) race1-race8;
   if range(of r(*)) >0 then Race_ethnicity=9; /* where 9 is going to mean the 2 or more races in a format*/
   else if range(of r(*)) = then Race_ethnicity= max(of r(*));
run;

If you have many of those invalid values that get mapped to different valid values then an informat is extremely flexible when only one variable is involved.

 

Going back to the SASHELP.CLASS set.

Try this example:

Proc format library=work;
invalue $Newname 
'Alfred' = 'Fred'
'Henry'  = 'Hank'
'Carol','Louise' ='Dorothy'
other=_same_;
run;

data want;  
   set sashelp.class;
   newname = input(name,$newname.);
run;

The above example in Want creates a new variable so you can see the resulting behavior side by side the old Name and Newname variables. Rerun the code with Name= instead of Newname= to replace values in place.

The Input statement would work in SQL as well:

 

Proc sql;
   create table example as
   select input(name,$newname.) as name,sex, age
   from sashelp.class
   ;
run;

You can see with this sort of example that the code in the SQL would not have to change if I want to use the same rules in the Format or Informat created in Proc Format. I provided an example of mapping two incoming values to the same out come name for Dorothy as an example. The Case when would either require adding WHENs for new values or possibly changing a When var=value to When var in (<value list>). Since the list can be provided in the format or informat definition you move logic into a place that is easier to update.

 

If the results of the Proc Format code are assigned to permanent library, the LIbrary=work example, you can typically add that library to the system options looking for formats to search there when SAS is running. Or just execute the Proc Format code at start up of SAS and would always be available.

The formats and informats can be applied to multiple variables if the ranges of values are the same or a similar subset. I have an informat for reading "Yes" "No" "True" "False" "Y" "N" "T" "F" as numeric 1/0 coded values as that is very handy for some things.

 

I have another place where I have a list of "reasons" entered into an open text field that I use to create a new variable that has 4 code values. The list has about 120 text values up to 60 or so characters long.

You can use this code to IDENTIFY invalid values as well.

 

Suppose my "rules" say that the values for Sex are supposed to be "G" instead of "F" and expect "B" instead of "M"

Proc format library=work;
invalue $newsex
'F' = 'G'
'B' = _same_
other=_error_
;
run;

data example;
   set sashelp.class;
   sex= input(sex,$newsex.);
run;

The keyword _SAME_ in and INVALUE statement means basically use the existing value as the result.

The OTHER means this rule is applied to any value not explicitly stated prior. _ERROR_ means that those values other than 'F' and 'B' are errors and the LOG will show invalid data messages and usually enough information to determine what needs to be added to the invalue. In this case we would need to add a "M"="B".

 

I have a long running project, as in 20+years, that has a site location code in the data. I use an informat similar to that for $newsex whenI read the data files that has the expected values (_same_) and an other=_error_. That error triggers me to ask the program providing data for additional details we need about the site so I can update other code (some is formats) as needed. Then add the "new" site codes to the data validation invalue and rerun the code to build the proper data sets.

 

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 333 views
  • 3 likes
  • 4 in conversation