BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Gilles-Protais
Obsidian | Level 7

Greetings everyone

 I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data. 

But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot, 

Please, what can I do to handle this? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
16 REPLIES 16
Gilles-Protais
Obsidian | Level 7

Greetings everyone

 I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data. 

But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot, 

Please, what can I do to handle this? Thanks.

Kurt_Bremser
Super User

From which source do you import the data into SAS?

 

Please do not post the same question multiple times. It only makes following the flow of the discussion harder, so I merged your posts.

Kurt_Bremser
Super User

You can not import from SAS. You either import into SAS, or export from SAS.

If you have an external data source containing "NA" for missing values, you use an informat during the import process which reads "NA" as missing, and everything else as numbers.

Gilles-Protais
Obsidian | Level 7
Yes it is an external data source. Precisely csv

Gilles-Protais
Obsidian | Level 7

Thanks very much, but can i have the code format, to import the csv file as informal?,

Kurt_Bremser
Super User

First, create an informat:

proc format;
invalue mynum
  'NA' = .
  other = [best20.]
;
run;

Then, use it while you read the file:

data want;
infile "path_to_your_file.csv" dlm="," dsd truncover firstobs=2;
input
  ...
  numvar :mynum.
  ...
;
run;
Gilles-Protais
Obsidian | Level 7

Thank you for the answer,

 but can easily decode what you have written, because still new to was coding. 

For instance if the columns of my csv file where a,b,c and c is the numeric with missing values how could I fit them into the code you given me. so sorry if am too much asking questions.

Tom
Super User Tom
Super User

To read a CSV file with 4 columns. Three character strings with max lengths of 10,20 and 300 followed by one numeric variable that has NA instead of null strings to represent missing you could do:

proc format;
invalue na 'NA'=. other=[32.];
run;
data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  length a $10 b $20 c $300 D 8;
  input a b c d :na. ;
run;
Kurt_Bremser
Super User

The columns of the csv file must appear in the correct order (as stated in the documentation/description of the file) in the INPUT statement, with informats if needed. See the documentation of List INPUT 

Gilles-Protais
Obsidian | Level 7
"subplotID","flowerID","garden","species","rater","compound","tot.vase.days"
1,"11812",1,1,8,1,25
2,"11713",1,1,7,1,10
3,"11611",1,1,6,1,25
4,"11411",1,1,4,1,23
5,"11112",1,1,1,1,NA
6,"11113",1,1,1,1,NA

these are the few lines of my csv file

Kurt_Bremser
Super User

Looks like everything except the last column is for identification purposes and should therefore be stored as character.

data want;
infile "......" dlm="," dsdtruncover firstobs=2;
input
  subplotID $
  flowerID $
  garden $
  species $
  rater $
  compound $
  tot_vase_days :mynum.
;
run;
Tom
Super User Tom
Super User

For a simple file like that I wouldn't bother with doing anything fancy about the NA values.

 

In general the process I follow it to take first line from the file and paste it into the program editor.  Remove the quotes and commas (and fix any column headers that are not valid variable names).  Then add LENGTH at the front and insert length specifications for each variable so I can convert it into a LENGTH statement to DEFINE the variables.

 

But for something this simple I might just use in-line informat specifications in the INPUT statement and let SAS impute the variable type and length from the type and width of the informat used to read the variable. That is because the first 6 variables are all identifier variables and so should be defined as character since you will never be doing arithmetic with the values.  To suppress the error messages from the NA value in the one numeric variable I would just add the ?? modifier to the INPUT statement.

 

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  input (subplotID flowerID garden species rater compound) (:$8.) tot_vase_days ??;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 1831 views
  • 0 likes
  • 3 in conversation