Dear All,
The data contains a variable called AJCC with the values Stage I, Stage II, Stage III, Stage IV, and Unknown Stage. But "Stage III" keeps going missing from my output. See the attached document for details. Thank you for any help with this problem.
abuanuazu
Hello @abuanuazu,
First of all, something must be missing in your code, probably a SET statement reading another dataset. Otherwise, where should variable AJCC come from?
The reason why SAS seems to omit "Stage III" is most likely that you did not specify a sufficient length for character variable AJCC in the other dataset. There should be a statement like the following in the data step creating the dataset which contains AJCC:
length AJCC $9;
If there was no LENGTH statement for AJCC, this variable would be assigned the default length for character variables, which is 8, or the length of the first value assigned to it. But the string "Stage III" has length 9. The last Roman numeral "I" would be truncated if AJCC had length 8. Hence, all values which should read "Stage III" would get the value "Stage II". Not surprisingly, "Stage 2" has a particularly large percentage in your output (because most probably it is in fact the union of Stage II and Stage III).
The same goes for variable MYSTAGE. The value "Unknown Stage" is truncated to "Unknown" due to the missing length specification.
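To make the truncation concrete, here is a minimal sketch. It is not the poster's program (which is only in the attachment); the dataset names TRUNCATED and FIXED and the assignments are assumptions chosen to mirror the values mentioned above. In the first step neither variable has a LENGTH statement, so each takes its length from the first value assigned to it.

```sas
/* Hypothetical demo: dataset names and assignments are illustrative only. */
data truncated;
    /* No LENGTH statement: AJCC gets length 8 from "Stage II",
       MYSTAGE gets length 7 from "Stage 2". */
    AJCC = "Stage II";      MYSTAGE = "Stage 2";       output;
    AJCC = "Stage III";     MYSTAGE = "Stage 3";       output;  /* AJCC is cut to "Stage II" */
    AJCC = "Unknown Stage"; MYSTAGE = "Unknown Stage"; output;  /* MYSTAGE is cut to "Unknown" */
run;

data fixed;
    /* Declaring the lengths first leaves room for the longest value (13 characters). */
    length AJCC $13 MYSTAGE $13;
    AJCC = "Stage II";      MYSTAGE = "Stage 2";       output;
    AJCC = "Stage III";     MYSTAGE = "Stage 3";       output;
    AJCC = "Unknown Stage"; MYSTAGE = "Unknown Stage"; output;
run;

/* Compare the two frequency tables to see which values survive. */
proc freq data=truncated; tables AJCC MYSTAGE; run;
proc freq data=fixed;     tables AJCC MYSTAGE; run;
```

In the first table "Stage III" should no longer appear as its own level, which matches the symptom described in the question.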
Your INPUT statement is invalid. [Edit: No, as pointed out by Tom, it's actually valid syntactically, although incorrect, in that it requests column input (for column 13) both senselessly and in a misleading way.] Either correct it to read
input AJCC :$13.;
or insert a LENGTH statement before the INPUT statement:
length AJCC $13;
Then you can omit the informat specification completely and just write
input AJCC;
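Both variants can be tried on in-line data. The following is only a sketch: the dataset names and the DATALINES records are made up for illustration, and DLM=',' is kept because the values contain blanks, which list input would otherwise treat as delimiters.

```sas
/* Variant 1: modified list input with an informat that also sets the length. */
data with_informat;
    infile datalines dlm=',' truncover;
    input AJCC :$13.;
    datalines;
Stage I
Stage II
Stage III
Stage IV
Unknown Stage
;
run;

/* Variant 2: LENGTH statement first, then plain list input. */
data with_length;
    infile datalines dlm=',' truncover;
    length AJCC $13;   /* defines AJCC as character, so no $ is needed on INPUT */
    input AJCC;
    datalines;
Stage I
Stage II
Stage III
Stage IV
Unknown Stage
;
run;

proc print data=with_informat; run;
proc print data=with_length;   run;
```

Either way AJCC ends up as a character variable of length 13, long enough for "Unknown Stage".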
The LENGTH statement for MYSTAGE is still missing in the second data step.
Look carefully at your INPUT statement.
input AJCC $13;
Since there is no period after the 13 it is taken to mean a column number instead of an informat. So it means to read the 13th character. That is why you do not see any of the values.
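A small experiment shows the difference the period makes. This is a sketch with made-up in-line records rather than the original CSV file, and the dataset names are hypothetical.

```sas
/* Without the period: column input, only column 13 of each line is read. */
data column_13;
    infile datalines truncover;
    input AJCC $13;
    datalines;
Stage I
Stage III
Unknown Stage
;
run;

/* With the period: formatted input, columns 1-13 are read with the $13. informat. */
data informat_13;
    infile datalines truncover;
    input AJCC $13.;
    datalines;
Stage I
Stage III
Unknown Stage
;
run;

proc print data=column_13;   run;
proc print data=informat_13; run;
```

In the first dataset AJCC has length 1 and is blank for every line shorter than 13 characters; only "Unknown Stage" contributes a character (its 13th, "e"). The second dataset holds the full values.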
Also, why did you set the delimiter to a comma on the INFILE statement? If the file really is a CSV file with more than one column, then you should use the DSD option to make sure missing values are properly handled. If you just have a single column of values, then it is easiest to just read the value and not worry about delimiters. But you might want to add a TRUNCOVER option so that it properly handles lines with fewer than 13 characters.
data cancer;
    /* FIRSTOBS=2 skips the header row; TRUNCOVER handles lines shorter than 13 characters */
    infile "C:\sas\bcancer.csv" firstobs=2 truncover;
    input AJCC $13.;
run;
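For the multi-column case mentioned above, the effect of DSD can be sketched as follows. The columns ID and GRADE and the data lines are invented for illustration; nothing is known about the real file beyond AJCC.

```sas
/* Hypothetical three-column CSV with an empty AJCC field in the second record. */
data with_dsd;
    infile datalines dsd truncover;      /* DSD: comma-delimited, ",," means a missing value */
    length ID 8 AJCC $13 GRADE $2;
    input ID AJCC GRADE;
    datalines;
1,Stage I,G1
2,,G2
3,Stage III,G3
;
run;

data without_dsd;
    infile datalines dlm=',' truncover;  /* DLM only: consecutive commas count as one delimiter */
    length ID 8 AJCC $13 GRADE $2;
    input ID AJCC GRADE;
    datalines;
1,Stage I,G1
2,,G2
3,Stage III,G3
;
run;

proc print data=with_dsd;    run;
proc print data=without_dsd; run;
```

With DSD the second record keeps AJCC missing and GRADE equal to "G2". Without it, the two commas collapse into one delimiter, so "G2" slides into AJCC and GRADE is left blank.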
Dear All,
Thank you for all the help with my DATA step question. I made a mistake when converting the real data from .xlsx to .csv, and that caused all the problems.
I imported the original data into SAS instead and it works fine. Thank you all again. Please see the attached result.
Abuanuazu