- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One of the variables in my dataset has ICD-10 codes, and I am trying to calculate the cause-specific numbers of cancer deaths and cancer death rates. I've figured out a way to calculate the number of cancer deaths, but I am stumped on how to calculate deaths rates in one statement within the DATA step. I am only focused on the first three letters of each observation (e.g., "C10"), and I want to calculate it for different cancer types and used the SUBSTR statement. This is what my code looks like right now (I only did it for the first two cancer types and used this same structure for the remaining cancer types), and I want to know the best way to achieve this same result in one statement in a DATA step with all death rates rounded to 1 decimal place:
DATA want;
SET have;
*Cancer Overall:;
IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1;
ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;
*Stomach: C16;
IF SUBSTR(cause,1,3)="C16" THEN stomach_deaths=1;
ELSE IF SUBSTR(cause,1,3) ne "C16" THEN stomach_deaths=0;
RUN;
DATA want;
SET have;
overall_rate=((overall_cancer_deaths=1)/(12,000,000*100,000));
stomach rate=((stomach_deaths=1)/(12,000,000*100,000));
RUN
Any suggestions about how to achieve this in one statement in a DATA step is very much appreciated. Thanks in advance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to calculate a RATE you need two numbers, a numerator and a denominator. Does not seem like something you could do based on data that has individual ICD-10. You probably need to first COUNT something.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Saying "one statement" is going to be very hard to fulfill as there is no data provided to know if this may even be possible depending on what you may mean by "one statement". Or where.
You can shorten your code such as
IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1; ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;
to
overall_cancer_deaths = (substr(cause,1,1)='C') ;
SAS will return 1 for true and 0 for false.
Your "rates" might be done with arrays. Replacing your code : (DO not places commas in numbers to use in calculations, it will result in an error).
DATA want; SET have; array c (*) overall_cancer_deaths stomach_deaths other_deaths; array r (*) overall_rate stomach_rate other_rate; do i=1 to dim(c); r[i] =((c[i]=1)/(12000000*100000)); end; drop i; RUN
However, I think that your have your () incorrect ant the calculation should be
r[i] =((c[i]=1)/12000000)*100000;
if you want a rate per 100,000. As currently using your code you have the equivalent to dividing by 1,200,000,000,000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you're missing some steps in between there, like a count?
If so, you may be able to use a custom multilabel format proc means and a single data step to get this done. You would need to show some raw data and the expected output to help us build that solution though.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
While I don't know how to do what you ask in one statement (whatever that means), you can simplify your typing as follows:
IF SUBSTR(cause,1,1)="C" THEN overall_cancer_deaths=1;
ELSE IF SUBSTR(cause,1,1) ne "C" THEN overall_cancer_deaths=0;
can be replaced with
c_deaths=(cause=:'C');
which will give you a one or zero depending on the value of variable named cause. Please note the colon after the cause= .
From there you could create a macro to compute this 0 or 1 and sum it, so you could write
%macro death(string);
sum_&string + (cause=:"&string");
%mend;
and then list all the codes like this:
data want;
set have end=eof;
array sums sum_:;
%death(C)
%death(C16)
...
if eof then do;
do i=1 to dim(sums);
sums(i)=sums(i)/(12000000*100000);
end;
output;
end;
run;
This could even be simplified further if you have all of the possible codes to be used in a SAS data set, then you can replace the typing of all %death statements with a macro loop that calls %death for each possible cause, so you don't have to type them all in. Or similar with CALL EXECUTE. I realize this isn't anywhere close to the (probably impossible) request to do it in one statement, but it seems to me it would be one of the fastest ways to code this.
Which brings us to a question I have for you. What is your real goal? To find the fastest way to program this? Or are you looking for something else? Too often, people decide upon a solution first — write it in one line — and this doesn't express their real goal here, and gives little guidance as the apparent goal of writing it in one line seems impossible.
Paige Miller