BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gyambqt
Obsidian | Level 7

Hi Experts:

Simple question:

I have a CSV file and one of a column has a dot (.) value

I use infile to read this column into SAS table as a character value and the dot is gone.

What is the reason behind?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The single period is read as MISSING for both numeric and character variables. For character variables a missing values means all blank.  So it is on purpose, or at least a foreseeable side effect of how SAS reads data by default. 

 

Consider the more normal LIST MODE input designed to read a more human readable text file where extra spaces (aka delimiters) are inserted to make the columns of values line up.  

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
cards;
Alfred  14 M 69   112.5
Alice   13 F 56.5  84
Barbara 13 F 65.3  98
Carol   14 F 62.8 102.5
;

How would you want to indicate that we don't know the gender for Carol?  If we leave only spaces then the value for HEIGHT will be read into the SEX variable.  So instead you put a period to indicate the missing values, numeric or character.

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
cards;
Alfred  14 M 69   112.5
Alice    . F 56.5  84
Barbara 13 F   .   98
Carol   14 . 62.8 102.5
;

If you really want to read a single period into a character variable then use the $CHAR informat instead of the $ informat.

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
  informat sex $char.;
cards;
Alfred  14 M 69   112.5
Alice    . F 56.5  84
Barbara 13 F   .   98
Carol   14 . 62.8 102.5
;

Results with $CHAR informat.

Obs     name      age    sex    height    weight

 1     Alfred      14     M      69.0      112.5
 2     Alice        .     F      56.5       84.0
 3     Barbara     13     F        .        98.0
 4     Carol       14     .      62.8      102.5

Share the data step code you used to read the CSV for help in seeing how to modify it to use the $CHAR informat for the variables where you want the periods not to be used as indicating missing values.

 

View solution in original post

13 REPLIES 13
Tom
Super User Tom
Super User

The single period is read as MISSING for both numeric and character variables. For character variables a missing values means all blank.  So it is on purpose, or at least a foreseeable side effect of how SAS reads data by default. 

 

Consider the more normal LIST MODE input designed to read a more human readable text file where extra spaces (aka delimiters) are inserted to make the columns of values line up.  

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
cards;
Alfred  14 M 69   112.5
Alice   13 F 56.5  84
Barbara 13 F 65.3  98
Carol   14 F 62.8 102.5
;

How would you want to indicate that we don't know the gender for Carol?  If we leave only spaces then the value for HEIGHT will be read into the SEX variable.  So instead you put a period to indicate the missing values, numeric or character.

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
cards;
Alfred  14 M 69   112.5
Alice    . F 56.5  84
Barbara 13 F   .   98
Carol   14 . 62.8 102.5
;

If you really want to read a single period into a character variable then use the $CHAR informat instead of the $ informat.

data class;
  length name $20 age 8 sex $1 height weight 8 ;
  input name age sex height weight ;
  informat sex $char.;
cards;
Alfred  14 M 69   112.5
Alice    . F 56.5  84
Barbara 13 F   .   98
Carol   14 . 62.8 102.5
;

Results with $CHAR informat.

Obs     name      age    sex    height    weight

 1     Alfred      14     M      69.0      112.5
 2     Alice        .     F      56.5       84.0
 3     Barbara     13     F        .        98.0
 4     Carol       14     .      62.8      102.5

Share the data step code you used to read the CSV for help in seeing how to modify it to use the $CHAR informat for the variables where you want the periods not to be used as indicating missing values.

 

gyambqt
Obsidian | Level 7

Hi Tom, I am actually using file reader in DI to read CSV file. I changed the informat to $char but still not working.

Tom
Super User Tom
Super User

Does DI file reader generate SAS code? Can you see it?  If not can you at least see the SAS log for that step?

 

Does DI allow you to run a custom SAS data step to read the file? Reading a CSV file is very easy.

gyambqt
Obsidian | Level 7

infile "xxx"
lrecl = 10000
delimiter = ','
dsd
missover
firstobs = 2
encoding = "ANY"
termstr = CRLF;
;
attrib column1 length = $1000
format = $1000.
informat = $1000.;

Tom
Super User Tom
Super User

So you did NOT change the INFORMAT from $ to $CHAR. Does it not allow you to specify the informat?  How is it able to read date values if you cannot change the informat?

 

You probably need this

attrib column1 length = $1000
  format = $1000.
  informat = $CHAR1000.
;

But it also might matter how they generated the INPUT statement.

 

 

gyambqt
Obsidian | Level 7

The code I sent to you is the original copy without any changes. I changed to $char but it is not working.

Tom
Super User Tom
Super User

What does the generated INPUT statement look like?

gyambqt
Obsidian | Level 7
input Financial_Year;
gyambqt
Obsidian | Level 7
They are the same column
Kurt_Bremser
Super User

A correctly set informat of $CHAR will read the dot as-is, proof here:

data check;
infile datalines dlm="," dsd;
attrib
  col1 informat=$char10.
  col2 informat=$10.
;
input
  col1
  col2
;
datalines;
2017,2017
.,.
;

proc print data=check noobs;
run;

Result:

col1	col2
2017	2017
.	 

So we need to see the complete code of the data step that reads the file.

Tom
Super User Tom
Super User

See if you can figure out how to see the SAS log for that data step and make sure it is actually running the code you see and reading the file you think.

 

Also make sure that there is not some other step in your data flow that is changing the period into a space.

SASKiwi
PROC Star

You've given us the symptom of your problem but no evidence to help us figure out why it is happening.

 

Maxim 2 for maximally-efficient SAS programmers is read your SAS log. This will likely contain sufficient evidence to identify the cause. Feel free to post your complete log including notes if you need help interpreting it.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1911 views
  • 0 likes
  • 4 in conversation