BookmarkSubscribeRSS Feed
stancemcgraw
Obsidian | Level 7

Hello,

 

  I have a huge dataset with a ton of different injury types, and am wondering if there is an easier way to create a different variable other than a bunch of of-then statements. I obviously don't need every type of injury to have its own category, and was hoping to group similar ones together, but it's taking so long.

I attached the cause of injury list below. 

Here is the code I started typing:

 

if cause_of_injury in ("FALL") then cause="FALL";

if cause_of_injury in ("FALL SL") then cause="FALL_SL";

if cause_of_injury in ("MVA") then cause="MVA";

if cause_of_injury in ("FALL HEIGHT") then cause="FALL_HEIGHT";

if cause_of_injury in ("FALL SL") then cause="FALL_SL";

if cause_of_injury in ("MCA") then cause="MCA";

if cause_of_injury in ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") then cause="SNOW";

if cause_of_injury in ("ASLT") then cause="ASLT";

if cause_of_injury in ("BIKE") then cause="BIKE";

 

There are sooo many categories. Any tips/ideas?

 

 

 

5 REPLIES 5
Shmuel
Garnet | Level 18

According to sample given it seems that in most cases the cause = cause_of_injury,

in other cases the different cause_of_injury are gathered to one cause then you can write:

      if cause_of_injury  in (list 1 of "value"s) then cause = <value1>; else

      if cause_of_injury  in (list 2 of "value"s) then cause = <value2>; else

      ..... up to last group list .... ; else  cause = cause_of_injury;

 

Alternatively create a table of distinct values of cause_of_injury 

and assign a second variable the wanted category cause .

Having that table - then run SQL JOIN of your data with categorized table.

Reeza
Super User

Use PROC FREQ on the cause_of_injury type and pipe it to Excel. In Excel, map your categories to the new ones which is a lot easier because you can sort and group them visually. Then re-import that file as a lookup table and merge them together. 

Sascoder
Obsidian | Level 7

I often use a mix of excel and sas to do something like this.

For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value.  Then in third column, I create the sas code statement, referring to the excel cells.  Excel is very easy to copy/paste cells so this is pretty efficient.  For example (and need to pay attention to the quotes so the text is properly quoted in the code):

 ="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' " 

Then copy all rows back to SAS code.  You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.

 

Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.

For example, in excel:

=" ' " & a1 & " ' = ' "  & b1

 

Then in the sas code:
proc format;

value $causefmt;

[copy all the lines from excel]

 

data tmp;

  set data;

 cause=put(cause_of_injury,$causefmt.);

run;

 

 

 

Reeza
Super User

If you import that look up table you can create a format from a data set as well using the CNTLIN option. 

 


@Sascoder wrote:

I often use a mix of excel and sas to do something like this.

For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value.  Then in third column, I create the sas code statement, referring to the excel cells.  Excel is very easy to copy/paste cells so this is pretty efficient.  For example (and need to pay attention to the quotes so the text is properly quoted in the code):

 ="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' " 

Then copy all rows back to SAS code.  You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.

 

Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.

For example, in excel:

=" ' " & a1 & " ' = ' "  & b1

 

Then in the sas code:
proc format;

value $causefmt;

[copy all the lines from excel]

 

data tmp;

  set data;

 cause=put(cause_of_injury,$causefmt.);

run;

 

 

 


 

s_lassen
Meteorite | Level 14

Depending on what you want to do, there are several options.

 

If you want to simplify the code a bit, you can use a SELECT statement instead of all the IFs:

select (cause_of_injury);
  when ("FALL SL") cause="FALL_SL";
  when ("MVA") cause="MVA";
  when ("FALL") cause="FALL";
  when ("FALL HEIGHT") cause="FALL_HEIGHT";
  when ("FALL SL") cause="FALL_SL";
  when ("MCA") cause="MCA";
  when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
  when ("ASLT") cause="ASLT";
  when ("BIKE") cause="BIKE";
  end;

And then there are many cases where the two variables end up the same, and some where blanks are converted to underscores, no other change. You can put all of those into an OTHERWISE statement:

select (cause_of_injury);
  when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
  otherwise cause=translate(trim(cause_of_injury),'_',' ');
  end;

I know, this is not all of them, just those in your example. But it makes the code a lot simpler, nevertheless.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 980 views
  • 0 likes
  • 5 in conversation