- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Greetings everyone
I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data.
But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot,
Please, what can I do to handle this? Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Greetings everyone
I have a dataset with a variable with missing values called NA. Because of this, it makes that variable a character variable, so I don't know how to make this numerical so that I can use multiple imputations to impute the missing data.
But I removed the Na and replaced it with a space; it still shows that the variable is a character variable, same with the replacing with the dot,
Please, what can I do to handle this? Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
From which source do you import the data into SAS?
Please do not post the same question multiple times. It only makes following the flow of the discussion harder, so I merged your posts.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I want to import from SAS
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can not import from SAS. You either import into SAS, or export from SAS.
If you have an external data source containing "NA" for missing values, you use an informat during the import process which reads "NA" as missing, and everything else as numbers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks very much, but can i have the code format, to import the csv file as informal?,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First, create an informat:
proc format;
invalue mynum
'NA' = .
other = [best20.]
;
run;
Then, use it while you read the file:
data want;
infile "path_to_your_file.csv" dlm="," dsd truncover firstobs=2;
input
...
numvar :mynum.
...
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the answer,
but can easily decode what you have written, because still new to was coding.
For instance if the columns of my csv file where a,b,c and c is the numeric with missing values how could I fit them into the code you given me. so sorry if am too much asking questions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To read a CSV file with 4 columns. Three character strings with max lengths of 10,20 and 300 followed by one numeric variable that has NA instead of null strings to represent missing you could do:
proc format;
invalue na 'NA'=. other=[32.];
run;
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
length a $10 b $20 c $300 D 8;
input a b c d :na. ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The columns of the csv file must appear in the correct order (as stated in the documentation/description of the file) in the INPUT statement, with informats if needed. See the documentation of List INPUT
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Open your csv file with a text editor (not with Excel!), and copy/paste the first few lines into a window opened with this button:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"subplotID","flowerID","garden","species","rater","compound","tot.vase.days" 1,"11812",1,1,8,1,25 2,"11713",1,1,7,1,10 3,"11611",1,1,6,1,25 4,"11411",1,1,4,1,23 5,"11112",1,1,1,1,NA 6,"11113",1,1,1,1,NA
these are the few lines of my csv file
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks like everything except the last column is for identification purposes and should therefore be stored as character.
data want;
infile "......" dlm="," dsdtruncover firstobs=2;
input
subplotID $
flowerID $
garden $
species $
rater $
compound $
tot_vase_days :mynum.
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For a simple file like that I wouldn't bother with doing anything fancy about the NA values.
In general the process I follow it to take first line from the file and paste it into the program editor. Remove the quotes and commas (and fix any column headers that are not valid variable names). Then add LENGTH at the front and insert length specifications for each variable so I can convert it into a LENGTH statement to DEFINE the variables.
But for something this simple I might just use in-line informat specifications in the INPUT statement and let SAS impute the variable type and length from the type and width of the informat used to read the variable. That is because the first 6 variables are all identifier variables and so should be defined as character since you will never be doing arithmetic with the values. To suppress the error messages from the NA value in the one numeric variable I would just add the ?? modifier to the INPUT statement.
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
input (subplotID flowerID garden species rater compound) (:$8.) tot_vase_days ??;
run;