DATA Step, Macro, Functions and more

New Variable Recode

Reply
Contributor
Posts: 35

New Variable Recode

Hi,

I have a cancer data set that have a Seer_group variable (have the same coding as SEER Cancer Data (20010, 20020...31010, 31020...) and a beh variable coded as 0, 1, 2 and 3 ( depending on the malignancy behavior).

I'm trying to create a new variable CaSite81 which includes all the seer group variables plus adding 6 more variables for brain and breast cancer (also change the seer group variable into numeric)as follows:

CASite81=Seer group*1;
if Seer_group="" then CAsite81 = 99999;
if Seer_group in ( 26000 ) and beh=2
then CASite81= 26010 ; *Breast, In Situ;
if Seer_group in ( 26000 ) and beh=3
then CASite81= 26020 ; *Breast, Invasive;
if Seer_group in ( 31010 31040 ) and beh in (0 1)
then CASite81= 31020 ; *Brain and Other Nervous System (Benign);
if Seer_group in ( 31010 31040 ) and beh=3
then CASite81= 31021 ; *Brain and Other Nervous System (Malignant);
if Seer_group in ( 31010 ) and beh=3
then CASite81= 31030 ; *Brain (Malignant);
if Seer_group in ( 31040 ) and beh=3
then CASite81= 31031 ; *Cranial Nerves Other Nervous System (Malignant);

 

When I run the Freq Casite81*beh or Casite*year all have frequencies except 31021. I can't find any explanation for that. 

The freq table for the original variables (31010 1nd 31040) *beh has the following:

                               Beh

Seer_group      0      1       2        3

  31010          197   215     0      2465  

  31040         2172  118     0      187

 

Freq for Casite81 I get:

 

                             Beh

Seer_group      0        1       2        3

  31020           2369  333     0        0

  31030             0       0        0      2465

  31031             0       0        0      187

 

Freq Casite81 Seer_group *Year have only 31020, 31030, 31031 also.

I'm so confused why this 31021 does not show at all.

 

Thanks

 

 

Super User
Posts: 5,079

Re: New Variable Recode

There can't be any 31021 in your final data set. 

 

Look at the final three IF/THEN statements.  Any time 31021 was assigned, one of the last two statements will change that value to either 31030 or 31031.  You can't have CASite81 taking on both 31021 and 31030 on the same observation.

Contributor
Posts: 35

Re: New Variable Recode

Thank you Astounding.

So is there any other way I can have those 4 new variables in my data set. 

Brain and Caranial nerves and other nervous system (Benign); with beh =  0 or 1

Brain and Caranial nerves and other nervous system (Malignant); with beh =3

Brain (malignant); with beh =3

Cranial nerves and other Nervous system (malignant); with beh =3

 

Brain code is 31010

Cranial nerves and other Nervous system code is 31040

beh benign code 0 or 1

beh malignant code is 3

Thanks,

Super User
Posts: 5,079

Re: New Variable Recode

One approach would be this.  Don't replace CASite81 with values of 31030 and 31031.  Instead, create a new variable:


if Seer_group in ( 31010 31040 ) and beh=3
then do;

   CASite81= 31021 ; *Brain and Other Nervous System (Malignant);
   if Seer_group = 31010 then CAsubSite81= 31030 ; /* Brain (Malignant) */
   else CAsubSite81= 31031 ; /* Cranial Nerves Other Nervous System (Malignant) */

end;

 

Notice that the style of commenting changes also.  That's needed to permit the ELSE statement to work.

Super User
Posts: 10,483

Re: New Variable Recode


mayasak wrote:

When I run the Freq Casite81*beh or Casite*year all have frequencies except 31021. I can't find any explanation for that. 

The freq table for the original variables (31010 1nd 31040) *beh has the following:

                               Beh

Seer_group      0      1       2        3

  31010          197   215     0      2465  

  31040         2172  118     0      187

 

Freq for Casite81 I get:

 

                             Beh

Seer_group      0        1       2        3

  31020           2369  333     0        0

  31030             0       0        0      2465

  31031             0       0        0      187

 

Freq Casite81 Seer_group *Year have only 31020, 31030, 31031 also.

I'm so confused why this 31021 does not show at all. 


Here is a way of looking at proc freq output that will often help diagnose coding issues:

 

Proc freq data=have;

   tables seer_group*beh*Casite81 /list missing;

run;

This will show the combinations of seer_group and beh that were assigned to Casite81 AND the missing results if any.

Often this will point to the places in your code you need to address logic issues like boundary values or unconsidered value combinations.

Ask a Question
Discussion stats
  • 4 replies
  • 237 views
  • 0 likes
  • 3 in conversation