Hello,
I have a huge dataset with a ton of different injury types, and am wondering if there is an easier way to create a different variable other than a bunch of of-then statements. I obviously don't need every type of injury to have its own category, and was hoping to group similar ones together, but it's taking so long.
I attached the cause of injury list below.
Here is the code I started typing:
if cause_of_injury in ("FALL") then cause="FALL";
if cause_of_injury in ("FALL SL") then cause="FALL_SL";
if cause_of_injury in ("MVA") then cause="MVA";
if cause_of_injury in ("FALL HEIGHT") then cause="FALL_HEIGHT";
if cause_of_injury in ("FALL SL") then cause="FALL_SL";
if cause_of_injury in ("MCA") then cause="MCA";
if cause_of_injury in ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") then cause="SNOW";
if cause_of_injury in ("ASLT") then cause="ASLT";
if cause_of_injury in ("BIKE") then cause="BIKE";
There are sooo many categories. Any tips/ideas?
According to sample given it seems that in most cases the cause = cause_of_injury,
in other cases the different cause_of_injury are gathered to one cause then you can write:
if cause_of_injury in (list 1 of "value"s) then cause = <value1>; else
if cause_of_injury in (list 2 of "value"s) then cause = <value2>; else
..... up to last group list .... ; else cause = cause_of_injury;
Alternatively create a table of distinct values of cause_of_injury
and assign a second variable the wanted category cause .
Having that table - then run SQL JOIN of your data with categorized table.
Use PROC FREQ on the cause_of_injury type and pipe it to Excel. In Excel, map your categories to the new ones which is a lot easier because you can sort and group them visually. Then re-import that file as a lookup table and merge them together.
I often use a mix of excel and sas to do something like this.
For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value. Then in third column, I create the sas code statement, referring to the excel cells. Excel is very easy to copy/paste cells so this is pretty efficient. For example (and need to pay attention to the quotes so the text is properly quoted in the code):
="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' "
Then copy all rows back to SAS code. You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.
Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.
For example, in excel:
=" ' " & a1 & " ' = ' " & b1
Then in the sas code:
proc format;
value $causefmt;
[copy all the lines from excel]
data tmp;
set data;
cause=put(cause_of_injury,$causefmt.);
run;
If you import that look up table you can create a format from a data set as well using the CNTLIN option.
@Sascoder wrote:
I often use a mix of excel and sas to do something like this.
For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value. Then in third column, I create the sas code statement, referring to the excel cells. Excel is very easy to copy/paste cells so this is pretty efficient. For example (and need to pay attention to the quotes so the text is properly quoted in the code):
="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' "
Then copy all rows back to SAS code. You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.
Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.
For example, in excel:
=" ' " & a1 & " ' = ' " & b1
Then in the sas code:
proc format;value $causefmt;
[copy all the lines from excel]
data tmp;
set data;
cause=put(cause_of_injury,$causefmt.);
run;
Depending on what you want to do, there are several options.
If you want to simplify the code a bit, you can use a SELECT statement instead of all the IFs:
select (cause_of_injury);
when ("FALL SL") cause="FALL_SL";
when ("MVA") cause="MVA";
when ("FALL") cause="FALL";
when ("FALL HEIGHT") cause="FALL_HEIGHT";
when ("FALL SL") cause="FALL_SL";
when ("MCA") cause="MCA";
when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
when ("ASLT") cause="ASLT";
when ("BIKE") cause="BIKE";
end;
And then there are many cases where the two variables end up the same, and some where blanks are converted to underscores, no other change. You can put all of those into an OTHERWISE statement:
select (cause_of_injury);
when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
otherwise cause=translate(trim(cause_of_injury),'_',' ');
end;
I know, this is not all of them, just those in your example. But it makes the code a lot simpler, nevertheless.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.