BookmarkSubscribeRSS Feed
mayasak
Quartz | Level 8

Hi,

I have a cancer data set that have a Seer_group variable (have the same coding as SEER Cancer Data (20010, 20020...31010, 31020...) and a beh variable coded as 0, 1, 2 and 3 ( depending on the malignancy behavior).

I'm trying to create a new variable CaSite81 which includes all the seer group variables plus adding 6 more variables for brain and breast cancer (also change the seer group variable into numeric)as follows:

CASite81=Seer group*1;
if Seer_group="" then CAsite81 = 99999;
if Seer_group in ( 26000 ) and beh=2
then CASite81= 26010 ; *Breast, In Situ;
if Seer_group in ( 26000 ) and beh=3
then CASite81= 26020 ; *Breast, Invasive;
if Seer_group in ( 31010 31040 ) and beh in (0 1)
then CASite81= 31020 ; *Brain and Other Nervous System (Benign);
if Seer_group in ( 31010 31040 ) and beh=3
then CASite81= 31021 ; *Brain and Other Nervous System (Malignant);
if Seer_group in ( 31010 ) and beh=3
then CASite81= 31030 ; *Brain (Malignant);
if Seer_group in ( 31040 ) and beh=3
then CASite81= 31031 ; *Cranial Nerves Other Nervous System (Malignant);

 

When I run the Freq Casite81*beh or Casite*year all have frequencies except 31021. I can't find any explanation for that. 

The freq table for the original variables (31010 1nd 31040) *beh has the following:

                               Beh

Seer_group      0      1       2        3

  31010          197   215     0      2465  

  31040         2172  118     0      187

 

Freq for Casite81 I get:

 

                             Beh

Seer_group      0        1       2        3

  31020           2369  333     0        0

  31030             0       0        0      2465

  31031             0       0        0      187

 

Freq Casite81 Seer_group *Year have only 31020, 31030, 31031 also.

I'm so confused why this 31021 does not show at all.

 

Thanks

 

 

4 REPLIES 4
Astounding
PROC Star

There can't be any 31021 in your final data set. 

 

Look at the final three IF/THEN statements.  Any time 31021 was assigned, one of the last two statements will change that value to either 31030 or 31031.  You can't have CASite81 taking on both 31021 and 31030 on the same observation.

mayasak
Quartz | Level 8

Thank you Astounding.

So is there any other way I can have those 4 new variables in my data set. 

Brain and Caranial nerves and other nervous system (Benign); with beh =  0 or 1

Brain and Caranial nerves and other nervous system (Malignant); with beh =3

Brain (malignant); with beh =3

Cranial nerves and other Nervous system (malignant); with beh =3

 

Brain code is 31010

Cranial nerves and other Nervous system code is 31040

beh benign code 0 or 1

beh malignant code is 3

Thanks,

Astounding
PROC Star

One approach would be this.  Don't replace CASite81 with values of 31030 and 31031.  Instead, create a new variable:


if Seer_group in ( 31010 31040 ) and beh=3
then do;

   CASite81= 31021 ; *Brain and Other Nervous System (Malignant);
   if Seer_group = 31010 then CAsubSite81= 31030 ; /* Brain (Malignant) */
   else CAsubSite81= 31031 ; /* Cranial Nerves Other Nervous System (Malignant) */

end;

 

Notice that the style of commenting changes also.  That's needed to permit the ELSE statement to work.

ballardw
Super User

@mayasak wrote:

When I run the Freq Casite81*beh or Casite*year all have frequencies except 31021. I can't find any explanation for that. 

The freq table for the original variables (31010 1nd 31040) *beh has the following:

                               Beh

Seer_group      0      1       2        3

  31010          197   215     0      2465  

  31040         2172  118     0      187

 

Freq for Casite81 I get:

 

                             Beh

Seer_group      0        1       2        3

  31020           2369  333     0        0

  31030             0       0        0      2465

  31031             0       0        0      187

 

Freq Casite81 Seer_group *Year have only 31020, 31030, 31031 also.

I'm so confused why this 31021 does not show at all. 


Here is a way of looking at proc freq output that will often help diagnose coding issues:

 

Proc freq data=have;

   tables seer_group*beh*Casite81 /list missing;

run;

This will show the combinations of seer_group and beh that were assigned to Casite81 AND the missing results if any.

Often this will point to the places in your code you need to address logic issues like boundary values or unconsidered value combinations.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 962 views
  • 0 likes
  • 3 in conversation