- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a huge dataset with a ton of different injury types, and am wondering if there is an easier way to create a different variable other than a bunch of of-then statements. I obviously don't need every type of injury to have its own category, and was hoping to group similar ones together, but it's taking so long.
I attached the cause of injury list below.
Here is the code I started typing:
if cause_of_injury in ("FALL") then cause="FALL";
if cause_of_injury in ("FALL SL") then cause="FALL_SL";
if cause_of_injury in ("MVA") then cause="MVA";
if cause_of_injury in ("FALL HEIGHT") then cause="FALL_HEIGHT";
if cause_of_injury in ("FALL SL") then cause="FALL_SL";
if cause_of_injury in ("MCA") then cause="MCA";
if cause_of_injury in ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") then cause="SNOW";
if cause_of_injury in ("ASLT") then cause="ASLT";
if cause_of_injury in ("BIKE") then cause="BIKE";
There are sooo many categories. Any tips/ideas?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
According to sample given it seems that in most cases the cause = cause_of_injury,
in other cases the different cause_of_injury are gathered to one cause then you can write:
if cause_of_injury in (list 1 of "value"s) then cause = <value1>; else
if cause_of_injury in (list 2 of "value"s) then cause = <value2>; else
..... up to last group list .... ; else cause = cause_of_injury;
Alternatively create a table of distinct values of cause_of_injury
and assign a second variable the wanted category cause .
Having that table - then run SQL JOIN of your data with categorized table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use PROC FREQ on the cause_of_injury type and pipe it to Excel. In Excel, map your categories to the new ones which is a lot easier because you can sort and group them visually. Then re-import that file as a lookup table and merge them together.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I often use a mix of excel and sas to do something like this.
For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value. Then in third column, I create the sas code statement, referring to the excel cells. Excel is very easy to copy/paste cells so this is pretty efficient. For example (and need to pay attention to the quotes so the text is properly quoted in the code):
="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' "
Then copy all rows back to SAS code. You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.
Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.
For example, in excel:
=" ' " & a1 & " ' = ' " & b1
Then in the sas code:
proc format;
value $causefmt;
[copy all the lines from excel]
data tmp;
set data;
cause=put(cause_of_injury,$causefmt.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you import that look up table you can create a format from a data set as well using the CNTLIN option.
@Sascoder wrote:
I often use a mix of excel and sas to do something like this.
For example, I generate the unique categories with sas (like with proc freq) then copy them to a column in excel, then in second column, I enter the new value. Then in third column, I create the sas code statement, referring to the excel cells. Excel is very easy to copy/paste cells so this is pretty efficient. For example (and need to pay attention to the quotes so the text is properly quoted in the code):
="if cause_of_injury in (' " & a1 & " ') then cause=' " & b1 " ';' "
Then copy all rows back to SAS code. You can even store in a separate .sas file and use the %include statement which helps keep code cleaner.
Sometimes I create a giant proc format statement with this same method, and then use PUT statement to assign the new variable.
For example, in excel:
=" ' " & a1 & " ' = ' " & b1
Then in the sas code:
proc format;value $causefmt;
[copy all the lines from excel]
data tmp;
set data;
cause=put(cause_of_injury,$causefmt.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Depending on what you want to do, there are several options.
If you want to simplify the code a bit, you can use a SELECT statement instead of all the IFs:
select (cause_of_injury);
when ("FALL SL") cause="FALL_SL";
when ("MVA") cause="MVA";
when ("FALL") cause="FALL";
when ("FALL HEIGHT") cause="FALL_HEIGHT";
when ("FALL SL") cause="FALL_SL";
when ("MCA") cause="MCA";
when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
when ("ASLT") cause="ASLT";
when ("BIKE") cause="BIKE";
end;
And then there are many cases where the two variables end up the same, and some where blanks are converted to underscores, no other change. You can put all of those into an OTHERWISE statement:
select (cause_of_injury);
when ("SKI", "SNOWB", "SMOB", "SLED", "SNOWB") cause="SNOW";
otherwise cause=translate(trim(cause_of_injury),'_',' ');
end;
I know, this is not all of them, just those in your example. But it makes the code a lot simpler, nevertheless.