Question: I have a character variable where some responses are coded as all numbers, and some responses include letters. Is there a way I can tell SAS to format only the responses coded as numbers as numeric?
Context: My variable of interest is "CAUSE_DEATH" (cause of death) which is a character variable; length 4, format/informat $4.
Causes of death are coded using ICD-9 and ICD-10 classification system depending on what year my respondent died. ICD 9 codes are 4 digit numbers, whereas ICD 10 codes include a letter as the first digit.
I want to group these deaths into broad categories in the most efficient way possible. Is there a way to tell SAS to format the responses with only numbers (i.e. ICD 9 codes) as numeric, while keeping the ICD 10-coded responses as character? I am using SAS Enterprise Guide 7.1.
Example: One broad cause of death category is Infectious Diseases. ICD 9 codes for this are between 0010 and 1399. ICD 10 codes for this are between A000 and B999. The following code works (for the ICD 10 codes):
proc freq data=lib.datafile;
table CAUSE_DEATH;
where substr(CAUSE_DEATH,1,1='A') or substr(CAUSE_DEATH,1,1='B');
run;
The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile;
table CAUSE_DEATH;
where 0010<=CAUSE_DEATH<=1399;
run;
My goal is to be able to do this:
data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=CAUSE_DEATH<=1399 OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;
Can someone please let me know the best way to make this possible? I would not want to write out each individual ICD 9 code in my syntax since that would be very time consuming.
Thank you for your time and help!
@TL93 wrote:
The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile; table CAUSE_DEATH; where 0010<=CAUSE_DEATH<=1399; run;
Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?
If so, the you can convert CAUSE_DEATH to numeric to do the comparison:
where 10<=input(CAUSE_DEATH,4.)<=1399;
and this solves the problem, you can now modify your final code this way as well.
Incidentally, you can also use this to find strings which begin with letters A or B:
where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';
@TL93 wrote:
The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile; table CAUSE_DEATH; where 0010<=CAUSE_DEATH<=1399; run;
Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?
If so, the you can convert CAUSE_DEATH to numeric to do the comparison:
where 10<=input(CAUSE_DEATH,4.)<=1399;
and this solves the problem, you can now modify your final code this way as well.
Incidentally, you can also use this to find strings which begin with letters A or B:
where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';
Hi Paige,
Thank you so much for your response, I think it has answered my question. Yes, CAUSE_DEATH is a character variable. I used the following syntax with your suggestions and it worked:
data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=input(CAUSE_DEATH,4.)<=1399 or CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B' then infectious_disease=1; else infectious_disease=0;
run;
Sorry for my ignorance, I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.
@TL93 wrote:
I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.
CAUSE_DEATH is unchanged (which you could confirm simply by looking at the data set).
A numeric version of CAUSE_DEATH is used in the IF statement, but this numeric version of CAUSE_DEATH is not stored in the data set.
FYI: You can do size comparisions of character variables . e.g. '1' < '2' . Characters have an order, and one can be less than another.
However be careful , as '10' < '2' also .
In your case you are OK though because you have leading zeros . Therfore you could just do this.
data causedeath; set lib.datafile;
infectious_disease=.;
if '0010'<=CAUSE_DEATH<='1399' OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.