Hi,

I have a cancer data set with a Seer_group variable (coded the same way as the SEER cancer data: 20010, 20020...31010, 31020...) and a beh variable coded 0, 1, 2, or 3 (depending on the malignancy behavior). I'm trying to create a new variable, CASite81, that keeps all the Seer_group codes, adds six more codes for breast and brain cancer, and converts Seer_group to numeric, as follows:

CASite81 = Seer_group*1;
if Seer_group = "" then CASite81 = 99999;
if Seer_group in (26000) and beh = 2 then CASite81 = 26010; *Breast, In Situ;
if Seer_group in (26000) and beh = 3 then CASite81 = 26020; *Breast, Invasive;
if Seer_group in (31010 31040) and beh in (0 1) then CASite81 = 31020; *Brain and Other Nervous System (Benign);
if Seer_group in (31010 31040) and beh = 3 then CASite81 = 31021; *Brain and Other Nervous System (Malignant);
if Seer_group in (31010) and beh = 3 then CASite81 = 31030; *Brain (Malignant);
if Seer_group in (31040) and beh = 3 then CASite81 = 31031; *Cranial Nerves Other Nervous System (Malignant);

When I run a frequency table of CASite81*beh or CASite81*year, every code has frequencies except 31021. I can't find any explanation for that.

The frequency table for the original variable (codes 31010 and 31040) by beh shows:

                    beh
Seer_group       0      1      2      3
31010          197    215      0   2465
31040         2172    118      0    187

The frequency table for CASite81 by beh shows:

                    beh
CASite81         0      1      2      3
31020         2369    333      0      0
31030            0      0      0   2465
31031            0      0      0    187

The frequency table of CASite81 by year also has only 31020, 31030, and 31031.

I'm so confused about why 31021 does not show up at all.

Thanks
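P.S. In case it helps anyone reproduce this, here is a minimal sketch of the brain-cancer part of the recode on a few made-up rows. The data set names test and test2 and the six input rows are hypothetical, not my real data, and the sketch assumes Seer_group is already numeric. It only shows which IF statements each row passes through; note that every row with beh = 3 and Seer_group 31010 or 31040 also satisfies one of the last two conditions.

```sas
/* Hypothetical test rows (not the real data): one row per */
/* Seer_group/beh combination relevant to the brain codes. */
data test;
  input Seer_group beh;
  datalines;
31010 0
31010 1
31010 3
31040 0
31040 1
31040 3
;
run;

data test2;
  set test;
  /* Assumption: Seer_group is numeric here, so no conversion is needed */
  CASite81 = Seer_group;
  if Seer_group in (31010 31040) and beh in (0 1) then CASite81 = 31020;
  if Seer_group in (31010 31040) and beh = 3 then CASite81 = 31021;
  /* These two statements are evaluated for the same beh = 3 rows */
  /* as the 31021 statement above, because they are plain IFs     */
  /* rather than ELSE IFs.                                        */
  if Seer_group in (31010) and beh = 3 then CASite81 = 31030;
  if Seer_group in (31040) and beh = 3 then CASite81 = 31031;
run;

/* Cross-tabulate the new code against beh, counts only */
proc freq data=test2;
  tables CASite81*beh / norow nocol nopercent;
run;
```

If the intent is for 31021 to be a combined category that exists alongside 31030 and 31031, a single variable cannot hold both values for the same row, so a second variable might be needed for the combined grouping.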