Hi everyone,
My data is from a big csv file and in that file, the authors used ####### as a missing value. How do I get sas to identify this as a missing value and not try to fill it in?
My code right now is:
proc import out=work.problems
datafile="/home/u63556826/sasuser.v94/problems.csv"
dbms=csv
replace;
getnames=YES;
run;
proc print data=work.problems (obs = 6); run;
While there are ways to do this when you read in the data, it will probably be easier to convert it after you have read the data into a SAS dataset.
So you could do:
data problems2;
set problems ;
array chars{*} _character_ ;
do i=1 to dim(chars);
if chars{i}='########' then call missing(chars{i});
end;
run;
And if you want numeric variables, you could add code to create numeric variables from the character variables, using the INPUT function.
Would that look something like this:
proc import out=work.problems
datafile="/home/u63556826/sasuser.v94/problems.csv"
dbms=csv
replace;
getnames=YES;
run;
data problems2;
set problems ;
array chars{*} _character_ ;
do i=1 to dim(chars);
if chars{i}='########' then call missing(chars{i});
end;
run;
proc print data=work.problems (obs = 6); run;
Yes, that was my thought. How does it work for your need?
If the data are in a text or csv or other ascii file, you can use INFORMATs to tell SAS to convert '########' to missing values, both for numeric, and for character variables:
proc format ;
invalue shnum '########'=.
other=[best32.];
invalue $shchr '########'=' ' min=8 max=20
other=[$20.];
run;
data test;
informat default=shnum. ;
input id $5. num1 num2 char1 :$shchr10. char2 :$shchr. ;
datalines;
AAAAA ######## 1002 ch11_6789A ch12_6789ABC
BBBBB 2001 2002 ######## ch22_6789ABC
CCCCC 3001 ######## ch31_6789A ########
DDDDD ######## ######## ######## ########
run;
proc print;
var id num1 num2 char1 char2;
run;
Importing a csv file with non-standard missing values (standard = empty field) with proc import will lead to extra work and may lead to variables with a wrong guessed type. So better write a data step yourself to avoid all the things you have to do get the data step with proper typed variables (and lengths).
@andreas_lds wrote:
Importing a csv file with non-standard missing values (standard = empty field) with proc import will lead to extra work and may lead to variables with a wrong guessed type. So better write a data step yourself to avoid all the things you have to do get the data step with proper typed variables (and lengths).
Definitely agree. I have data sources where they can't even be consistent and one file will have: NA, not answered, UNK , unknown, refused to answer in ONE variable just for what should be missing values. Proc format can handle multiple strings like that to all set one standard "missing" value.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.