Question: I have a character variable where some responses are coded as all numbers, and some responses include letters. Is there a way I can tell SAS to format only the responses coded as numbers as numeric?
Context: My variable of interest is "CAUSE_DEATH" (cause of death) which is a character variable; length 4, format/informat $4.
Causes of death are coded using ICD-9 and ICD-10 classification system depending on what year my respondent died. ICD 9 codes are 4 digit numbers, whereas ICD 10 codes include a letter as the first digit.
I want to group these deaths into broad categories in the most efficient way possible. Is there a way to tell SAS to format the responses with only numbers (i.e. ICD 9 codes) as numeric, while keeping the ICD 10-coded responses as character? I am using SAS Enterprise Guide 7.1.
Example: One broad cause of death category is Infectious Diseases. ICD 9 codes for this are between 0010 and 1399. ICD 10 codes for this are between A000 and B999. The following code works (for the ICD 10 codes):
proc freq data=lib.datafile;
table CAUSE_DEATH;
where substr(CAUSE_DEATH,1,1='A') or substr(CAUSE_DEATH,1,1='B');
run;The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile;
table CAUSE_DEATH;
where 0010<=CAUSE_DEATH<=1399;
run;My goal is to be able to do this:
data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=CAUSE_DEATH<=1399 OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;Can someone please let me know the best way to make this possible? I would not want to write out each individual ICD 9 code in my syntax since that would be very time consuming.
Thank you for your time and help!
@TL93 wrote:
The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile; table CAUSE_DEATH; where 0010<=CAUSE_DEATH<=1399; run;
Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?
If so, the you can convert CAUSE_DEATH to numeric to do the comparison:
where 10<=input(CAUSE_DEATH,4.)<=1399;
and this solves the problem, you can now modify your final code this way as well.
Incidentally, you can also use this to find strings which begin with letters A or B:
where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';
@TL93 wrote:
The following code obviously does not work (for the ICD 9 codes):
proc freq data=lib.datafile; table CAUSE_DEATH; where 0010<=CAUSE_DEATH<=1399; run;
Why doesn't it work? Because the variable CAUSE_DEATH is character and not numeric?
If so, the you can convert CAUSE_DEATH to numeric to do the comparison:
where 10<=input(CAUSE_DEATH,4.)<=1399;
and this solves the problem, you can now modify your final code this way as well.
Incidentally, you can also use this to find strings which begin with letters A or B:
where CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B';
Hi Paige,
Thank you so much for your response, I think it has answered my question. Yes, CAUSE_DEATH is a character variable. I used the following syntax with your suggestions and it worked:
data causedeath; set lib.datafile;
infectious_disease=.;
if 0010<=input(CAUSE_DEATH,4.)<=1399 or CAUSE_DEATH=:'A' or CAUSE_DEATH=:'B' then infectious_disease=1; else infectious_disease=0;
run;Sorry for my ignorance, I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.
@TL93 wrote:
I wasn't aware that I can do the conversion in a "where" (or in this case "if") statement. Does this permanently change my CAUSE_DEATH format? I would like to keep it as a character variable so that my ICD 10-coded responses are not altered. I need to have the letter in front of the ICD 10-coded responses so that I can continue to make other groupings.
CAUSE_DEATH is unchanged (which you could confirm simply by looking at the data set).
A numeric version of CAUSE_DEATH is used in the IF statement, but this numeric version of CAUSE_DEATH is not stored in the data set.
FYI: You can do size comparisions of character variables . e.g. '1' < '2' . Characters have an order, and one can be less than another.
However be careful , as '10' < '2' also .
In your case you are OK though because you have leading zeros . Therfore you could just do this.
data causedeath; set lib.datafile;
infectious_disease=.;
if '0010'<=CAUSE_DEATH<='1399' OR substr(CAUSE_DEATH,1,1='A') OR substr(CAUSE_DEATH,1,1='B') then infectious_disease=1; else infectious_disease=0;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.