BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TL93
Obsidian | Level 7

Question: I have a character variable where some responses are coded as all numbers, and some responses include letters. Is there a way I can tell SAS to format only the responses coded as numbers as numeric?

 

Context: My variable of interest is "CAUSE_DEATH" (cause of death) which is a character variable; length 4, format/informat $4. 

Causes of death are coded using ICD-9 and ICD-10 classification system depending on what year my respondent died. ICD 9 codes are 4 digit numbers, whereas ICD 10 codes include a letter as the first digit.

I want to group these deaths into broad categories in the most efficient way possible. Is there a way to tell SAS to format the responses with only numbers (i.e. ICD 9 codes) as numeric, while keeping the ICD 10-coded responses as character? I am using SAS Enterprise Guide 7.1.

 

Example: One broad cause of death category is Infectious Diseases. ICD 9 codes for this are between 0010 and 1399. ICD 10 codes for this are between A000 and B999. The following code works (for the ICD 10 codes):

proc freq data=lib.datafile;
table CAUSE_DEATH;
where substr(CAUSE_DEATH,1,1='A') or substr(CAUSE_DEATH,1,1='B');
run;

The following code obviously does not work (for the ICD 9 codes):

proc freq data=lib.datafile;
table CAUSE_DEATH;
where 0010<=CAUSE_DEATH<=1399;
run;

My goal is to be able to do this:

data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=CAUSE_DEATH<=1399 OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;

Can someone please let me know the best way to make this possible? I would not want to write out each individual ICD 9 code in my syntax since that would be very time consuming.

 

Thank you for your time and help!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@TL93 wrote:

 

The following code obviously does not work (for the ICD 9 codes):

proc freq data=lib.datafile;
table CAUSE_DEATH;
where 0010<=CAUSE_DEATH<=1399;
run;

 


Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?

 

If so, the you can convert CAUSE_DEATH to numeric to do the comparison:

 

where 10<=input(CAUSE_DEATH,4.)<=1399;

and this solves the problem, you can now modify your final code this way as well.

 

Incidentally, you can also use this to find strings which begin with letters A or B:

 

where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';

 

 

 

 

--
Paige Miller

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

@TL93 wrote:

 

The following code obviously does not work (for the ICD 9 codes):

proc freq data=lib.datafile;
table CAUSE_DEATH;
where 0010<=CAUSE_DEATH<=1399;
run;

 


Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?

 

If so, the you can convert CAUSE_DEATH to numeric to do the comparison:

 

where 10<=input(CAUSE_DEATH,4.)<=1399;

and this solves the problem, you can now modify your final code this way as well.

 

Incidentally, you can also use this to find strings which begin with letters A or B:

 

where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';

 

 

 

 

--
Paige Miller
TL93
Obsidian | Level 7

Hi Paige,

 

Thank you so much for your response, I think it has answered my question. Yes, CAUSE_DEATH is a character variable. I used the following syntax with your suggestions and it worked:

data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=input(CAUSE_DEATH,4.)<=1399 or CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B' then infectious_disease=1; else infectious_disease=0;
run;

Sorry for my ignorance, I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.

PaigeMiller
Diamond | Level 26

@TL93 wrote:

I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.


CAUSE_DEATH is unchanged (which you could confirm simply by looking at the data set).

 

A numeric version of CAUSE_DEATH is used in the IF statement, but this numeric version of CAUSE_DEATH is not stored in the data set. 

 

--
Paige Miller
TL93
Obsidian | Level 7
Got it! Again, thank you very much for your help!
Peter_C
Rhodochrosite | Level 12
Looks like a candidate for special user-defined in/formats.
That separates the definition of the "group" from the coding which is logically straightforward
..... Is this candidate one of the required group .....
If input( cause_death, user_informat. ) ;
34reqrwe
Quartz | Level 8

FYI: You can do size comparisions of character variables . e.g. '1' < '2' .   Characters have an order, and one can be less than another.

However be careful , as '10' < '2' also . 

 

In your case you are OK though because you have leading zeros . Therfore you could just do this.

 

 

data causedeath; set lib.datafile;
infectious_disease=.;
if '0010'<=CAUSE_DEATH<='1399' OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2394 views
  • 2 likes
  • 4 in conversation