BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yewkeong
Calcite | Level 5

Hi, i have trouble importing my text files, my output does not get the same as my lecure given.

Bellow are my codes and files attachment.

Thnks!

 

data ass.titaniccsv;
length row_names 3 pclass $3 survived 3 name $82 age 3
embarked $3 home_dest $70 room $5 ticket $20 boat $3 sex $10;

 

infile '/home/titanic.txt' dlm=',' dsd;
input row_names pclass $ survived name ~ $82. age
embarked $ home_dest $ room $ ticket $ boat $ sex $;

run;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

If you want a column to be numeric, but have non-numeric data in the input stream that signals a missing value, your best choice is to read into a temporary character variable and conditionally convert from that:

data ....;
infile .....;
length ...... _age $3 age 3 ......;
input ..... _age .....;
if _age = 'NA'
then age = .;
else age = input(_age,3.);
drop _age;
run;

View solution in original post

9 REPLIES 9
Reeza
Super User

Any errors in your log?

yewkeong
Calcite | Level 5

no error, just warning. 

the invalid data appeared every line.

 

NOTE: Invalid data for age in line 73 41-42.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
73 "72","1st",0,"Crafton, Mr John Bertram",NA,"Southampton","Roachdale, IN","","","","male" 88
row_names=72 pclass=1st survived=0 name="Crafton, Mr John Bertram" age=. embarked=Sou home_dest=Roachdale, IN room= ticket= boat=
sex=male _ERROR_=1 _N_=73
Reeza
Super User

Your code between this and your other question differ. 

 

I would suggest using proc import and set GUESSINGROWS to the number of observation. The code to read the data will be in your log. Use that to compare t your current code to find the errors. 

yewkeong
Calcite | Level 5

alrigh noted with thanks!!!

 

Shmuel
Garnet | Level 18

Just to relate to the error message:

       NOTE: Invalid data for age in line 73 41-42.

 

in your code you defined:   length ... age 3 ...  whichs means that a numeric data is expected

the input value for age in line 73 (and many others) is NA that is an alphanumeric data,

and as such it is invalid, therfore output is: age=. which means age has missing value.

 

In your first post you wrote: my output does not get the same as my lecure given.

What have you been given ? Are there other discompatibilities ?

yewkeong
Calcite | Level 5
oh i see! didnt notice my age variable having an alpha data. Thanks man!

the output must be the same as the text files that i attached
Kurt_Bremser
Super User

If you want a column to be numeric, but have non-numeric data in the input stream that signals a missing value, your best choice is to read into a temporary character variable and conditionally convert from that:

data ....;
infile .....;
length ...... _age $3 age 3 ......;
input ..... _age .....;
if _age = 'NA'
then age = .;
else age = input(_age,3.);
drop _age;
run;
yewkeong
Calcite | Level 5
hey it works! but can u explain more details for me about this code? i dont really understand y must be two variables. At first store inside the character variables which is _age $3, if got 'NA' then set the numeric variables equals to missing value. if is not 'NA' the convert the _age into numeric? is it??

i'm sorry i very new aboout sas studio. =')
Kurt_Bremser
Super User

@yewkeong wrote:
hey it works! but can u explain more details for me about this code? i dont really understand y must be two variables. At first store inside the character variables which is _age $3, if got 'NA' then set the numeric variables equals to missing value. if is not 'NA' the convert the _age into numeric? is it??

i'm sorry i very new aboout sas studio. =')

You are completely right about how the step works.

As for your question: in a dataset, a column can only have one type throughout the dataset, just like you have in database systems. This is where SAS differs from a spreadsheet program like Excel or OpenOffice calc. Since the data step data structures are taken from the contributing datasets, the same is true for the data step: a variable can only be of one type.

So one has to read mixed type values into a separate character variable and then conditionally convert to numeric values, the character variable can be kept for future reference or be dropped.

 

You could avoid the temporary variable, but trying to read mixed data directly into numeric will cause a lot of messages and finally a WARNING or ERROR in the log. Clean programming avoids those, so when you have them appearing in the log unexpectedly, you know something's wrong in the input data.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5310 views
  • 0 likes
  • 4 in conversation