Greetings everyone
I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data.
But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot,
Please, what can I do to handle this? Thanks.
Greetings everyone
I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data.
But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot,
Please, what can I do to handle this? Thanks.
From which source do you import the data into SAS?
Please do not post the same question multiple times. It only makes following the flow of the discussion harder, so I merged your posts.
Yes, I want to import from SAS
You can not import from SAS. You either import into SAS, or export from SAS.
If you have an external data source containing "NA" for missing values, you use an informat during the import process which reads "NA" as missing, and everything else as numbers.
Thanks very much, but can i have the code format, to import the csv file as informal?,
First, create an informat:
proc format;
invalue mynum
'NA' = .
other = [best20.]
;
run;
Then, use it while you read the file:
data want;
infile "path_to_your_file.csv" dlm="," dsd truncover firstobs=2;
input
...
numvar :mynum.
...
;
run;
Thank you for the answer,
but can easily decode what you have written, because still new to was coding.
For instance if the columns of my csv file where a,b,c and c is the numeric with missing values how could I fit them into the code you given me. so sorry if am too much asking questions.
To read a CSV file with 4 columns. Three character strings with max lengths of 10,20 and 300 followed by one numeric variable that has NA instead of null strings to represent missing you could do:
proc format;
invalue na 'NA'=. other=[32.];
run;
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
length a $10 b $20 c $300 D 8;
input a b c d :na. ;
run;
The columns of the csv file must appear in the correct order (as stated in the documentation/description of the file) in the INPUT statement, with informats if needed. See the documentation of List INPUT
Open your csv file with a text editor (not with Excel!), and copy/paste the first few lines into a window opened with this button:
"subplotID","flowerID","garden","species","rater","compound","tot.vase.days" 1,"11812",1,1,8,1,25 2,"11713",1,1,7,1,10 3,"11611",1,1,6,1,25 4,"11411",1,1,4,1,23 5,"11112",1,1,1,1,NA 6,"11113",1,1,1,1,NA
these are the few lines of my csv file
Looks like everything except the last column is for identification purposes and should therefore be stored as character.
data want;
infile "......" dlm="," dsdtruncover firstobs=2;
input
subplotID $
flowerID $
garden $
species $
rater $
compound $
tot_vase_days :mynum.
;
run;
For a simple file like that I wouldn't bother with doing anything fancy about the NA values.
In general the process I follow it to take first line from the file and paste it into the program editor. Remove the quotes and commas (and fix any column headers that are not valid variable names). Then add LENGTH at the front and insert length specifications for each variable so I can convert it into a LENGTH statement to DEFINE the variables.
But for something this simple I might just use in-line informat specifications in the INPUT statement and let SAS impute the variable type and length from the type and width of the informat used to read the variable. That is because the first 6 variables are all identifier variables and so should be defined as character since you will never be doing arithmetic with the values. To suppress the error messages from the NA value in the one numeric variable I would just add the ?? modifier to the INPUT statement.
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
input (subplotID flowerID garden species rater compound) (:$8.) tot_vase_days ??;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.