BookmarkSubscribeRSS Feed
jb915
Calcite | Level 5

One of the variables in my dataset has ICD-10 codes, and I am trying to calculate the cause-specific numbers of cancer deaths and cancer death rates. I've figured out a way to calculate the number of cancer deaths, but I am stumped on how to calculate deaths rates in one statement within the DATA step. I am only focused on the first three letters of each observation (e.g., "C10"), and I want to calculate it for different cancer types and used the SUBSTR statement. This is what my code looks like right now (I only did it for the first two cancer types and used this same structure for the remaining cancer types), and I want to know the best way to achieve this same result in one statement in a DATA step with all death rates rounded to 1 decimal place:

 

DATA want;

SET have;

*Cancer Overall:;

IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1;

ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;

*Stomach: C16;

IF SUBSTR(cause,1,3)="C16" THEN stomach_deaths=1;

ELSE IF SUBSTR(cause,1,3) ne "C16" THEN stomach_deaths=0;

RUN;

 

DATA want;

SET have;

overall_rate=((overall_cancer_deaths=1)/(12,000,000*100,000));

stomach rate=((stomach_deaths=1)/(12,000,000*100,000));

RUN

 

Any suggestions about how to achieve this in one statement in a DATA step is very much appreciated. Thanks in advance! 

4 REPLIES 4
Tom
Super User Tom
Super User

If you want to calculate a RATE you need two numbers, a numerator and a denominator.   Does not seem like something you could do based on data that has individual ICD-10.  You probably need to first COUNT something.

ballardw
Super User

Saying "one statement" is going to be very hard to fulfill as there is no data provided to know if this may even be possible depending on what  you may mean by "one statement". Or where.

 

You can shorten your code such as

 

IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1;
ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;

to

 

 

overall_cancer_deaths = (substr(cause,1,1)='C') ;

SAS will return 1 for true and 0 for false.

 

 

Your "rates" might be done with arrays. Replacing your code : (DO not places commas in numbers to use in calculations, it will result in an error).

 

DATA want;
   SET have;
   array c (*) overall_cancer_deaths   stomach_deaths other_deaths;
   array r (*) overall_rate   stomach_rate other_rate;
   do i=1 to dim(c);
       r[i] =((c[i]=1)/(12000000*100000)); 
   end;
   drop i;
RUN

However, I think that your have your () incorrect ant the calculation should be

 

 r[i] =((c[i]=1)/12000000)*100000;

if you want a rate per 100,000. As currently using your code you have the equivalent to dividing by 1,200,000,000,000

 

 

 

Reeza
Super User

I think you're missing some steps in between there, like a count?

If so, you may be able to use a custom multilabel format proc means and a single data step to get this done. You would need to show some raw data and the expected output to help us build that solution though. 

PaigeMiller
Diamond | Level 26

While I don't know how to do what you ask in one statement (whatever that means), you can simplify your typing as follows:

 

IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1;
ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;

 

can be replaced with

 

    c_deaths=(cause=:'C');

 

which will give you a one or zero depending on the value of variable named cause. Please note the colon after the cause= .

 

From there you could create a macro to compute this 0 or 1 and sum it, so you could write

 

%macro death(string);
     sum_&string + (cause=:"&string");
%mend;

 

and then list all the codes like this:

 

data want;
     set have end=eof;
     array sums sum_:;
     %death(C)
     %death(C16)
     ...
     if eof then do;
         do i=1 to dim(sums);
             sums(i)=sums(i)/(12000000*100000);
         end;
         output;
    end;
run;

 

This could even be simplified further if you have all of the possible codes to be used in a SAS data set, then you can replace the typing of all %death statements with a macro loop that calls %death for each possible cause, so you don't have to type them all in. Or similar with CALL EXECUTE. I realize this isn't anywhere close to the (probably impossible) request to do it in one statement, but it seems to me it would be one of the fastest ways to code this.

 

Which brings us to a question I have for you. What is your real goal? To find the fastest way to program this? Or are you looking for something else? Too often, people decide upon a solution first — write it in one line — and this doesn't express their real goal here, and gives little guidance as the apparent goal of writing it in one line seems impossible.

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1408 views
  • 0 likes
  • 5 in conversation