Hi all,
I'm working on a (merged) dataset named HNC and trying to group disease codes (C ...) together into subsites. I tried grouping them as below but I'm getting errors (also below). Does anyone know a quick, simple fix, please?
Any help would be appreciated.
KR,
Craig
(CODE)
DATA = HNC;
IF SUBSITE = C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
C02.0, C02.1, C02.2, C02.3, C02.8, C02.9
C03.0, C03.1, C03.9
C04.0, C04.1, C04.8, C04.9
C05.0 C05.1, C05.2, C05.8, C05.9
C06.0, C06.1, C06.2 C06.8, C06.9,
THEN SUBSITE = ORAL CAVITY CANCER;
IF SUBSITE = C01.X,
C02.4,
C09.0, C09.1, C09.8, C09.9,
C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
C14.0, C14.2,
THEN SUBSITE = OROPHARYNGEAL CANCER;
IF SUBSITE = C32.0, C32.1, C32.2, C32.3, C32.8, C32.9,
THEN SUBSITE = LARYNGEAL CANCER;
IF SUBSITE = C00.3, C00.4, C00.5, C00.6, C00.8, C00.9
C01.X,
C02.0, C02.1, C02.2, C02.3, C02.4, C02.8, C02.9
C03.0, C03.1, C03.9
C04.0, C04.1, C04.8, C04.9
C05.0, C05.1, C05.2, C05.8, C05.9
C06.0, C06.1, C06.2, C06.8, C06.9
C09.0, C09.1, C09.8, C09.9
C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
C11.0, C11.1, C11.2, C11.3, C11.8, C11.9
C12.X,
C13.0, C13.1, C13.2, C13.8, C13.9
C14.0, C14.2, C14.8
C32.0, C32.1, C32.2, C32.3, C32.8, C32.9,
THEN SUBSITE = HEAD AND NECK CANCER;
RUN;
(ERRORS)
455 DATA = HNC;
----
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
456
457 IF SUBSITE = C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
--
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
458 C02.0, C02.1, C02.2, C02.3, C02.8, C02.9
459 C03.0, C03.1, C03.9
460 C04.0, C04.1, C04.8, C04.9
461 C05.0 C05.1, C05.2, C05.8, C05.9
462 C06.0, C06.1, C06.2 C06.8, C06.9,
463 THEN SUBSITE = ORAL CAVITY CANCER;
464
465 IF SUBSITE = C01.X,
--
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
466 C02.4,
467 C09.0, C09.1, C09.8, C09.9,
468 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
469 C14.0, C14.2,
470 THEN SUBSITE = OROPHARYNGEAL CANCER;
471
472 IF SUBSITE = C32.0, C32.1, C32.2, C32.3, C32.8, C32.9,
--
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
473 THEN SUBSITE = LARYNGEAL CANCER;
474
475 IF SUBSITE = C00.3, C00.4, C00.5, C00.6, C00.8, C00.9
--
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
476 C01.X,
477 C02.0, C02.1, C02.2, C02.3, C02.4, C02.8, C02.9
478 C03.0, C03.1, C03.9
479 C04.0, C04.1, C04.8, C04.9
480 C05.0, C05.1, C05.2, C05.8, C05.9
481 C06.0, C06.1, C06.2, C06.8, C06.9
482 C09.0, C09.1, C09.8, C09.9
483 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
484 C11.0, C11.1, C11.2, C11.3, C11.8, C11.9
485 C12.X,
486 C13.0, C13.1, C13.2, C13.8, C13.9
487 C14.0, C14.2, C14.8
488 C32.0, C32.1, C32.2, C32.3, C32.8, C32.9,
489 THEN SUBSITE = HEAD AND NECK CANCER;
490
491
492 RUN;
DATA = HNC;
No equal sign here.
IF SUBSITE = C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
This is not correct syntax; SAS does not understand it. What you probably want is this:
IF SUBSITE in ('C00.3','C00.4','C00.5','C00.6','C00.8','C00.9',
And of course, you will have to fix the rest of your code to use proper syntax.
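For illustration, here is a minimal sketch of what the first statement might look like once completed. It assumes SUBSITE is a character variable that stores the codes exactly as written (for example 'C00.3'), and that HNC is the existing data set. The output name HNC_GROUPED and the new variable SUB are hypothetical names chosen for this example.

```sas
/* Sketch only: HNC_GROUPED and SUB are hypothetical names. */
data hnc_grouped;        /* no equal sign after DATA */
  set hnc;               /* assumes HNC already exists and contains SUBSITE */

  /* IN takes a parenthesized list of quoted constants;      */
  /* the values may be separated by commas or by blanks.     */
  if subsite in ('C00.3','C00.4','C00.5','C00.6','C00.8','C00.9',
                 'C02.0','C02.1','C02.2','C02.3','C02.8','C02.9',
                 'C03.0','C03.1','C03.9',
                 'C04.0','C04.1','C04.8','C04.9',
                 'C05.0','C05.1','C05.2','C05.8','C05.9',
                 'C06.0','C06.1','C06.2','C06.8','C06.9')
    then sub = 'ORAL CAVITY CANCER';   /* text values need quotes too */
run;
```

Writing the label to a separate variable leaves the original SUBSITE codes available for the later IF statements.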
Thanks Paige. Using your feedback, I readjusted the code as below:
DATA HNC;
SET SOURCE;
IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
C02.0, C02.1, C02.2, C02.3, C02.8, C02.9
C03.0, C03.1, C03.9
C04.0, C04.1, C04.8, C04.9
C05.0 C05.1, C05.2, C05.8, C05.9
C06.0, C06.1, C06.2 C06.8, C06.9)
THEN SUB = 'ORAL CAVITY CANCER';
IF SUBSITE IN = (C01.X,
C02.4,
C09.0, C09.1, C09.8, C09.9,
C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
C14.0, C14.2)
THEN SUB = 'OROPHARYNGEAL CANCER';
IF SUBSITE IN = (C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
THEN SUB = 'LARYNGEAL CANCER';
IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9
C01.X,
C02.0, C02.1, C02.2, C02.3, C02.4, C02.8, C02.9
C03.0, C03.1, C03.9
C04.0, C04.1, C04.8, C04.9
C05.0, C05.1, C05.2, C05.8, C05.9
C06.0, C06.1, C06.2, C06.8, C06.9
C09.0, C09.1, C09.8, C09.9
C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
C11.0, C11.1, C11.2, C11.3, C11.8, C11.9
C12.X,
C13.0, C13.1, C13.2, C13.8, C13.9
C14.0, C14.2, C14.8
C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
THEN SUB = 'HEAD AND NECK CANCER';
RUN;
This produced more errors though 😞
531 DATA HNC;
532 SET SOURCE;
ERROR: File WORK.SOURCE.DATA does not exist.
533 IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
-
390
76
ERROR 390-185: Expecting an relational or arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
534 C02.0, C02.1, C02.2, C02.3, C02.8, C02.9
535 C03.0, C03.1, C03.9
536 C04.0, C04.1, C04.8, C04.9
537 C05.0 C05.1, C05.2, C05.8, C05.9
538 C06.0, C06.1, C06.2 C06.8, C06.9)
539 THEN SUB = 'ORAL CAVITY CANCER';
540
541 IF SUBSITE IN = (C01.X,
-
22
-----
557
ERROR: The right-hand operand must be an array name or a constant value list. The specified
name NAME, is not an array.
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 22-322: Expecting a name.
ERROR 557-185: Variable C01 is not an object.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
542 C02.4,
543 C09.0, C09.1, C09.8, C09.9,
544 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
545 C14.0, C14.2)
546 THEN SUB = 'OROPHARYNGEAL CANCER';
547
548 IF SUBSITE IN = (C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
549 THEN SUB = 'LARYNGEAL CANCER';
550
551 IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9
552 C01.X,
553 C02.0, C02.1, C02.2, C02.3, C02.4, C02.8, C02.9
554 C03.0, C03.1, C03.9
555 C04.0, C04.1, C04.8, C04.9
556 C05.0, C05.1, C05.2, C05.8, C05.9
557 C06.0, C06.1, C06.2, C06.8, C06.9
558 C09.0, C09.1, C09.8, C09.9
559 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
560 C11.0, C11.1, C11.2, C11.3, C11.8, C11.9
561 C12.X,
562 C13.0, C13.1, C13.2, C13.8, C13.9
563 C14.0, C14.2, C14.8
564 C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
565 THEN SUB = 'HEAD AND NECK CANCER';
566
567
568 RUN;
You need a SET statement to bring in your existing data from the data set that has the variable Subsite; otherwise there are no values for Subsite and you get nothing in the output. But it needs the name of YOUR data set. @PaigeMiller used "Set Source;" to indicate that you needed an existing data set.
Hey Ballard,
Thanks for the message.
I see. I replaced it with the name of my dataset, but that still hasn't solved it.
KR,
Craig
569 DATA HNC;
570 SET HNC;
571 IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
-
390
76
ERROR 390-185: Expecting an relational or arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
572 C02.0, C02.1, C02.2, C02.3, C02.8, C02.9
573 C03.0, C03.1, C03.9
574 C04.0, C04.1, C04.8, C04.9
575 C05.0 C05.1, C05.2, C05.8, C05.9
576 C06.0, C06.1, C06.2 C06.8, C06.9)
577 THEN SUB = 'OCC';
578
579 IF SUBSITE IN = (C01.X,
-
22
-----
557
ERROR: The right-hand operand must be an array name or a constant value list. The specified
name NAME, is not an array.
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 22-322: Expecting a name.
ERROR 557-185: Variable C01 is not an object.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
580 C02.4,
581 C09.0, C09.1, C09.8, C09.9,
582 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
583 C14.0, C14.2)
584 THEN SUB = 'OPC';
585
586 IF SUBSITE IN = (C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
587 THEN SUB = 'Larynx';
588
589 IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9
590 C01.X,
591 C02.0, C02.1, C02.2, C02.3, C02.4, C02.8, C02.9
592 C03.0, C03.1, C03.9
593 C04.0, C04.1, C04.8, C04.9
594 C05.0, C05.1, C05.2, C05.8, C05.9
595 C06.0, C06.1, C06.2, C06.8, C06.9
596 C09.0, C09.1, C09.8, C09.9
597 C10.0, C10.1, C10.2, C10.3, C10.4, C10.8, C10.9
598 C11.0, C11.1, C11.2, C11.3, C11.8, C11.9
599 C12.X,
600 C13.0, C13.1, C13.2, C13.8, C13.9
601 C14.0, C14.2, C14.8
602 C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
603 THEN SUB = 'HNC';
604
605
606 RUN;
This is NOT what I said you should use.
IF SUBSITE IN = (C00.3, C00.4, C00.5, C00.6, C00.8, C00.9,
Please look very carefully at what I said your code should be in my earlier message, and you should do the same in your code.
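Side by side, the two differences from the earlier suggestion are the stray = after IN and the missing quotes around each code. ERROR 390-185 points at the =, and the "Variable C01 is not an object" message is probably SAS reading the unquoted C01.X as dot notation rather than as a text value. A short sketch using the laryngeal group, where HNC_NEW is a hypothetical output name and SUBSITE is assumed to be a character variable:

```sas
data hnc_new;    /* hypothetical output name */
  set hnc;

  /* As submitted (fails): "=" after IN, and unquoted values
     IF SUBSITE IN = (C32.0, C32.1, C32.2, C32.3, C32.8, C32.9)
        THEN SUB = 'Larynx';
  */

  /* As suggested: IN followed directly by a list of quoted constants */
  if subsite in ('C32.0','C32.1','C32.2','C32.3','C32.8','C32.9')
    then sub = 'Larynx';
run;
```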
Until you become more familiar with SAS, and the data step in particular, I strongly suggest that you do not use the same data set for both input and output, as this attempts:
DATA HNC; SET HNC;
This syntax completely replaces the data set if there are no syntax errors. But while you are learning you are very likely to make logic errors, and those can corrupt your data in a way that is not recoverable short of rereading the data from scratch.
I strongly recommend using a different data set for output, such as this or something similar:
DATA HNC_new; SET HNC;
That way you still have the set HNC and can actually compare data sets if needed.
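One way to do that comparison is PROC COMPARE. The sketch below assumes the input data set is HNC; HNC_NEW is just an example output name, and the grouping statements are left as a placeholder.

```sas
/* Write to a new data set so the original HNC is untouched. */
data hnc_new;    /* hypothetical output name */
  set hnc;
  /* ... grouping IF / THEN statements go here ... */
run;

/* Report differences between the original and the new data set. */
proc compare base=hnc compare=hnc_new;
run;
```

If the new data set looks wrong, fix the code and rerun the step; HNC itself is never modified.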
Also, it is best to post text from the LOG into a text box opened on the forum with the </> icon that appears above the message window. It will preserve the formatting of the text. Note how the message window moved the location of the underscore from where it actually appeared in your log?
Got it working. I was missing quotes. Thanks so much, Paige!!
One thing though: how can I get a data entry to belong to 2 different groups simultaneously? As you'll see, HNC is all of the sites (and groups) combined.
Best wishes,
Craig
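The thread does not include an answer to this follow-up, so the following is only one possible approach, not a confirmed solution. A single variable can hold only one value per record, so if every group is written to SUB, the last IF that matches (here the combined HNC list) overwrites the earlier label. A sketch that keeps the specific subsite in SUB and records overall membership in a separate 0/1 variable follows. HNC_NEW and HNC_FLAG are hypothetical names, SUB is assumed not to exist in HNC already, and the codes are assumed to be stored exactly as listed (including the literal text 'C01.X' and 'C12.X').

```sas
data hnc_new;                 /* hypothetical output name */
  set hnc;
  length sub $20;             /* long enough for 'OROPHARYNGEAL CANCER' */

  /* Specific subsite group: at most one label per record */
  if subsite in ('C00.3','C00.4','C00.5','C00.6','C00.8','C00.9',
                 'C02.0','C02.1','C02.2','C02.3','C02.8','C02.9',
                 'C03.0','C03.1','C03.9',
                 'C04.0','C04.1','C04.8','C04.9',
                 'C05.0','C05.1','C05.2','C05.8','C05.9',
                 'C06.0','C06.1','C06.2','C06.8','C06.9')
    then sub = 'ORAL CAVITY CANCER';
  else if subsite in ('C01.X','C02.4',
                      'C09.0','C09.1','C09.8','C09.9',
                      'C10.0','C10.1','C10.2','C10.3','C10.4','C10.8','C10.9',
                      'C14.0','C14.2')
    then sub = 'OROPHARYNGEAL CANCER';
  else if subsite in ('C32.0','C32.1','C32.2','C32.3','C32.8','C32.9')
    then sub = 'LARYNGEAL CANCER';

  /* Overall head and neck membership in its own variable (1 = yes, 0 = no). */
  /* It covers the three groups above plus the codes listed only under HNC.  */
  hnc_flag = (sub ne ' ' or
              subsite in ('C11.0','C11.1','C11.2','C11.3','C11.8','C11.9',
                          'C12.X',
                          'C13.0','C13.1','C13.2','C13.8','C13.9',
                          'C14.8'));
run;
```

With this layout a record coded C32.1 has SUB = 'LARYNGEAL CANCER' and HNC_FLAG = 1, so it counts in both groups. The LENGTH statement matters because otherwise SUB would take its length from the first assignment (18 characters) and the longer labels would be truncated.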