Data step

Occasional Contributor
Posts: 13

Dear All,

 

The data contains a variable called AJCC with the values Stage I, Stage II, Stage III, Stage IV, and Unknown Stage. But "Stage III" keeps going missing from my output. See the attached document for details. Thank you for any help with this problem.

 

abuanuazu

 

Trusted Advisor
Posts: 1,117

Re: Data step

[ Edited ]
Posted in reply to abuanuazu

Hello @abuanuazu,

 

First of all, something must be missing in your code, probably a SET statement reading another dataset. Otherwise, where should variable AJCC come from?

 

The reason why SAS seems to omit "Stage III" is most likely that you did not specify a sufficient length for character variable AJCC in the other dataset. There should be a statement like the following in the data step creating the dataset which contains AJCC:

length AJCC $9;

If there was no LENGTH statement for AJCC, this variable would be assigned the default length for character variables, which is 8, or the length of the first value assigned to it. But the string "Stage III" has length 9. The last Roman numeral "I" would be truncated if AJCC had length 8. Hence, all values which should read "Stage III" would get the value "Stage II". Not surprisingly, "Stage 2" has a particularly large percentage in your output (most probably because it is in fact the union of Stage II and Stage III).

 

The same goes for variable MYSTAGE. The value "Unknown Stage" is truncated to "Unknown" due to the missing length specification.
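
A minimal sketch of both truncations, using literal values rather than the real data. The LENGTH statement here only simulates a variable that ended up with length 8; it is an assumption for illustration, not the code from the attachment.

```sas
/* Sketch only: literal values stand in for the real data. */
data _null_;
  length AJCC $8;             /* simulate a character variable of length 8 */
  AJCC = "Stage III";         /* 9 characters: the last "I" is cut off */
  mystage = "Stage 1";        /* first assignment fixes the length at 7 */
  mystage = "Unknown Stage";  /* 13 characters: cut to 7 */
  put AJCC= mystage=;         /* log: AJCC=Stage II mystage=Unknown */
run;
```

The length of a character variable is fixed when the step is compiled, so a later, longer value cannot widen it.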

Occasional Contributor
Posts: 13

Re: Data step

Posted in reply to FreelanceReinhard
Thank you for your prompt response.

This is the actual code:
Data cancer;
Infile "C:\sas\bcancer.csv" dlm="," firstobs=2;
Input AJCC $13;
Run;

Data status;
Set cancer;
if AJCC="Stage I" then mystage="Stage 1";
else if AJCC="Stage II" then mystage="Stage 2";
else if AJCC="Stage III" then mystage="Stage 3";
else if AJCC="Stage IV" then mystage="Stage 4";
else
mystage="Unknown Stage";
Run;

proc freq data=status;
table mystage/chisq;
run;

Initially, to test the data, I didn't assign a length. After your response, I added the length $13 to AJCC and ran the code. The output now assigns 100% of the data to "Unknown".

Any suggestion?
Trusted Advisor
Posts: 1,117

Re: Data step

[ Edited ]
Posted in reply to abuanuazu

Your INPUT statement is invalid. [Edit: No, as pointed out by Tom, it's actually valid syntactically, although incorrect, in that it requests column input (for column 13) both senselessly and in a misleading way.]  Either correct it to read

input AJCC :$13.;

or insert a LENGTH statement before the INPUT statement:

length AJCC $13;

Then you can omit the informat specification completely and just write

input AJCC;

 

The LENGTH statement for MYSTAGE is still missing in the second data step.
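
For illustration, the second data step with the LENGTH statement added might look like the sketch below. The length $13 is an assumption, chosen because "Unknown Stage" has 13 characters; the rest is the code as posted.

```sas
data status;
  set cancer;
  length mystage $13;  /* assumed length: fits "Unknown Stage" */
  if AJCC = "Stage I" then mystage = "Stage 1";
  else if AJCC = "Stage II" then mystage = "Stage 2";
  else if AJCC = "Stage III" then mystage = "Stage 3";
  else if AJCC = "Stage IV" then mystage = "Stage 4";
  else mystage = "Unknown Stage";
run;
```

The LENGTH statement has to come before the first assignment to MYSTAGE, otherwise the length is already fixed by the value "Stage 1".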

Super User
Posts: 7,039

Re: Data step

[ Edited ]
Posted in reply to abuanuazu

Look carefully at your INPUT statement.

 

input AJCC $13;

 

Since there is no period after the 13, it is taken to mean a column number instead of an informat. So the statement reads only the 13th character of each line. That is why you do not see any of the values.
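
A small sketch of the difference, reading the same in-stream lines both ways. The variable names col13 and full are made up for this illustration.

```sas
/* Sketch only: col13 and full are hypothetical names. */
data _null_;
  infile datalines truncover;
  input col13 $13      /* no period: column input, column 13 only */
        @1 full $13.;  /* with period: $13. informat, columns 1-13 */
  put col13= full=;
  datalines;
Stage III
Unknown Stage
;
run;
```

Here col13 should hold only the character in column 13 (a blank for "Stage III", the final "e" for "Unknown Stage"), while full holds the whole value.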

 

Also, why did you set the delimiter to a comma on the INFILE statement? If the file really is a CSV file with more than one column, then you should use the DSD option to make sure missing values are properly handled. If you just have a single column of values, then it is easiest to just read the value and not worry about delimiters. But you might want to add the TRUNCOVER option so that it properly handles lines with fewer than 13 characters.

 

data cancer;
  infile "C:\sas\bcancer.csv" firstobs=2 truncover;
  input AJCC $13.;
run;

Occasional Contributor
Posts: 13

Re: Data step

Dear All,

 

Thank you for all the help with my DATA step question. I made a mistake when converting the real data from .xlsx to .csv, and that caused the whole problem.

I imported the original data into SAS and it works fine. Thank you all again. Please see the attached result.

 

Abuanuazu
