02-28-2016 08:42 AM
02-28-2016 09:24 AM - edited 02-28-2016 09:26 AM
First of all, something must be missing from your code, probably a SET statement reading another dataset. Otherwise, where would variable AJCC come from?
The most likely reason SAS seems to omit "Stage III" is that you did not specify a sufficient length for the character variable AJCC in the other dataset. There should be a statement like the following in the data step that creates the dataset containing AJCC:
length AJCC $9;
Without a LENGTH statement, AJCC would be assigned either the default length for character variables, which is 8, or the length of the first value assigned to it. But the string "Stage III" has length 9, so if AJCC had length 8, the last Roman numeral "I" would be truncated. Hence, all values that should read "Stage III" would become "Stage II". Not surprisingly, "Stage 2" has a particularly large percentage in your output, most likely because it is in fact the union of Stage II and Stage III.
The same goes for variable MYSTAGE: the value "Unknown Stage" is truncated to "Unknown" because its length was not specified.
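To see the effect in isolation, here is a minimal sketch with made-up inline data (the dataset names STAGE_DEFAULT and STAGE_FIXED are hypothetical). The & modifier lets list input read a value containing a single embedded blank, and without a LENGTH statement AJCC gets the default length of 8:

```sas
/* Hypothetical inline data; the & modifier allows one embedded blank. */
data stage_default;
   input AJCC $ &;          /* no LENGTH statement: AJCC gets length 8 */
   datalines;
Stage II
Stage III
;

data stage_fixed;
   length AJCC $9;          /* long enough for "Stage III" */
   input AJCC $ &;
   datalines;
Stage II
Stage III
;

proc freq data=stage_default;
   tables AJCC;             /* both records are counted as "Stage II" */
run;
```

Running PROC FREQ on STAGE_FIXED instead should show the two stages separately.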
02-28-2016 10:23 AM
02-28-2016 10:52 AM - edited 02-28-2016 03:34 PM
Your INPUT statement is invalid. [Edit: No; as Tom pointed out, it is actually syntactically valid, but still incorrect: it requests column input (for column 13 alone), which makes no sense here and is misleading.] Either correct it to read
input AJCC :$13.;
or insert a LENGTH statement before the INPUT statement:
length AJCC $13;
Then you can omit the informat specification completely and just write
input AJCC;
The LENGTH statement for MYSTAGE is still missing in the second data step.
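Putting the pieces together, a minimal sketch might look like the following. The inline data, the dataset names AJCC_READ and RECODED, and the AJCC-to-MYSTAGE mapping are all hypothetical stand-ins for your actual code. DLM=',' mirrors the comma delimiter in your INFILE statement, so list input keeps the blank inside "Stage III"; input AJCC :$13.; would behave the same way under that delimiter.

```sas
/* Hypothetical data; DLM=',' means blanks do not end a value. */
data ajcc_read;
   infile datalines dlm=',';
   length AJCC $13;         /* declare the length first ...          */
   input AJCC;              /* ... then plain list input is enough   */
   datalines;
Stage I
Stage III
Unknown Stage
;

/* Second data step: hypothetical recoding into MYSTAGE. */
data recoded;
   length MYSTAGE $13;      /* without this, MYSTAGE would take the length
                               of the first literal below ("Stage 1")  */
   set ajcc_read;
   if AJCC = "Stage I" then MYSTAGE = "Stage 1";
   else if AJCC = "Stage II" then MYSTAGE = "Stage 2";
   else if AJCC = "Stage III" then MYSTAGE = "Stage 3";
   else MYSTAGE = "Unknown Stage";
run;
```

The position of the LENGTH statement matters: SAS fixes a variable's length the first time it encounters the variable while compiling the step, so the LENGTH statement must come before the first assignment (and before SET for a variable that comes from the input dataset).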
02-28-2016 11:07 AM - edited 02-28-2016 11:07 AM
Look carefully at your INPUT statement.
input AJCC $13;
Since there is no period after the 13, it is taken to mean a column number instead of an informat. So it reads only the 13th character of each line. That is why you do not see any of the values.
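A quick way to confirm this is a small sketch with inline data (the dataset name COL13 and the sample values are hypothetical). Column input with a single column number creates a one-character variable:

```sas
/* "$13" without a period is column input: read column 13 only. */
data col13;
   input AJCC $13;          /* AJCC gets length 1 */
   datalines;
Stage III
Unknown Stage
;

proc print data=col13;      /* first row is blank, second row is "e" */
run;
```

"Stage III" has only 9 characters, so column 13 is blank; in "Unknown Stage" the 13th character is the final "e".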
Also, why did you set the delimiter to a comma on the INFILE statement? If the file really is a CSV file with more than one column, then you should use the DSD option to make sure missing values are handled properly. If you just have a single column of values, then it is easiest to just read the value and not worry about delimiters. But you might want to add the TRUNCOVER option so that lines with fewer than 13 characters are handled properly.
infile "C:\sas\bcancer.csv" firstobs=2 truncover;
input AJCC $13.;
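To try this without the real file, a sketch like the one below writes a small temporary file and reads it the same way. The fileref TMP, the header line, and the sample values are hypothetical stand-ins for C:\sas\bcancer.csv:

```sas
/* Write a throwaway file: one header line plus three values. */
filename tmp temp;

data _null_;
   file tmp;
   put "AJCC";              /* header line, skipped by FIRSTOBS=2 */
   put "Stage I";
   put "Stage III";
   put "Unknown Stage";
run;

/* Read it back: formatted input of columns 1-13. */
data ajcc;
   infile tmp firstobs=2 truncover;  /* short lines are read as they are */
   input AJCC $13.;                  /* keeps the blank in "Stage III"   */
run;

proc print data=ajcc;
run;
```

The line "Stage I" is shorter than 13 characters; TRUNCOVER keeps SAS from moving on to the next line to complete the field, which is what the default FLOWOVER behavior would do.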
03-05-2016 12:46 PM