Hi Experts:
Simple question:
I have a CSV file and one of a column has a dot (.) value
I use infile to read this column into SAS table as a character value and the dot is gone.
What is the reason behind?
The single period is read as MISSING for both numeric and character variables. For character variables a missing values means all blank. So it is on purpose, or at least a foreseeable side effect of how SAS reads data by default.
Consider the more normal LIST MODE input designed to read a more human readable text file where extra spaces (aka delimiters) are inserted to make the columns of values line up.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
cards;
Alfred 14 M 69 112.5
Alice 13 F 56.5 84
Barbara 13 F 65.3 98
Carol 14 F 62.8 102.5
;
How would you want to indicate that we don't know the gender for Carol? If we leave only spaces then the value for HEIGHT will be read into the SEX variable. So instead you put a period to indicate the missing values, numeric or character.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
cards;
Alfred 14 M 69 112.5
Alice . F 56.5 84
Barbara 13 F . 98
Carol 14 . 62.8 102.5
;
If you really want to read a single period into a character variable then use the $CHAR informat instead of the $ informat.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
informat sex $char.;
cards;
Alfred 14 M 69 112.5
Alice . F 56.5 84
Barbara 13 F . 98
Carol 14 . 62.8 102.5
;
Results with $CHAR informat.
Obs name age sex height weight 1 Alfred 14 M 69.0 112.5 2 Alice . F 56.5 84.0 3 Barbara 13 F . 98.0 4 Carol 14 . 62.8 102.5
Share the data step code you used to read the CSV for help in seeing how to modify it to use the $CHAR informat for the variables where you want the periods not to be used as indicating missing values.
The single period is read as MISSING for both numeric and character variables. For character variables a missing values means all blank. So it is on purpose, or at least a foreseeable side effect of how SAS reads data by default.
Consider the more normal LIST MODE input designed to read a more human readable text file where extra spaces (aka delimiters) are inserted to make the columns of values line up.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
cards;
Alfred 14 M 69 112.5
Alice 13 F 56.5 84
Barbara 13 F 65.3 98
Carol 14 F 62.8 102.5
;
How would you want to indicate that we don't know the gender for Carol? If we leave only spaces then the value for HEIGHT will be read into the SEX variable. So instead you put a period to indicate the missing values, numeric or character.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
cards;
Alfred 14 M 69 112.5
Alice . F 56.5 84
Barbara 13 F . 98
Carol 14 . 62.8 102.5
;
If you really want to read a single period into a character variable then use the $CHAR informat instead of the $ informat.
data class;
length name $20 age 8 sex $1 height weight 8 ;
input name age sex height weight ;
informat sex $char.;
cards;
Alfred 14 M 69 112.5
Alice . F 56.5 84
Barbara 13 F . 98
Carol 14 . 62.8 102.5
;
Results with $CHAR informat.
Obs name age sex height weight 1 Alfred 14 M 69.0 112.5 2 Alice . F 56.5 84.0 3 Barbara 13 F . 98.0 4 Carol 14 . 62.8 102.5
Share the data step code you used to read the CSV for help in seeing how to modify it to use the $CHAR informat for the variables where you want the periods not to be used as indicating missing values.
Hi Tom, I am actually using file reader in DI to read CSV file. I changed the informat to $char but still not working.
Does DI file reader generate SAS code? Can you see it? If not can you at least see the SAS log for that step?
Does DI allow you to run a custom SAS data step to read the file? Reading a CSV file is very easy.
infile "xxx"
lrecl = 10000
delimiter = ','
dsd
missover
firstobs = 2
encoding = "ANY"
termstr = CRLF;
;
attrib column1 length = $1000
format = $1000.
informat = $1000.;
So you did NOT change the INFORMAT from $ to $CHAR. Does it not allow you to specify the informat? How is it able to read date values if you cannot change the informat?
You probably need this
attrib column1 length = $1000
format = $1000.
informat = $CHAR1000.
;
But it also might matter how they generated the INPUT statement.
The code I sent to you is the original copy without any changes. I changed to $char but it is not working.
What does the generated INPUT statement look like?
financial_year and column1 are different variables. What are the ATTRIB settings of financial_year?
A correctly set informat of $CHAR will read the dot as-is, proof here:
data check;
infile datalines dlm="," dsd;
attrib
col1 informat=$char10.
col2 informat=$10.
;
input
col1
col2
;
datalines;
2017,2017
.,.
;
proc print data=check noobs;
run;
Result:
col1 col2 2017 2017 .
So we need to see the complete code of the data step that reads the file.
See if you can figure out how to see the SAS log for that data step and make sure it is actually running the code you see and reading the file you think.
Also make sure that there is not some other step in your data flow that is changing the period into a space.
You've given us the symptom of your problem but no evidence to help us figure out why it is happening.
Maxim 2 for maximally-efficient SAS programmers is read your SAS log. This will likely contain sufficient evidence to identify the cause. Feel free to post your complete log including notes if you need help interpreting it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.