04-29-2016 09:08 AM
I am trying to import a bad raw text data file in SAS EG using an INFILE statement. The file has 100,000 rows, but only the first 2,000 rows are imported. When I looked at the text file, I saw an 'arrow' symbol in the middle of the data on row 2001. Maybe that's the reason nothing from that row onwards gets imported.
Is there any way to import all the data even if the file contains some special characters?
04-29-2016 09:28 AM
You could try pre-processing the file to exclude any characters you don't need, something like:
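data _null_;
  infile "C:\test.txt" recfm=n;
  file "C:\NEW_Test.txt" recfm=n;
  input a $char1.;
  /* COMPRESS returns a blank for a dropped character, so test blanks separately */
  if a = " " or compress(a, , "knps") ne " " then put a $char1.;
run;
So the file is read and written one character at a time. With the k modifier, COMPRESS keeps the listed character classes instead of removing them: letters, digits and underscores (n), punctuation (p), and space characters such as blanks, tabs, carriage returns and line feeds (s), so the line breaks survive. Any other character is not written out, and you can then read in the NEW_Test.txt file without the special characters.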
04-29-2016 09:37 AM
If your SAS is running on Windows, another possibility is that your text file contains an embedded, non-printing 'end of file' character, namely '1A'x. In that case, you need to tell SAS to ignore it with the IGNOREDOSEOF option on the INFILE statement (here test is the fileref of your file):
infile test ignoredoseof;
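To see the effect of the option, here is a small self-contained sketch that writes a three-line file with a '1A'x byte embedded in the second line and then reads it twice, once without and once with IGNOREDOSEOF. The fileref demo, the file name eof_demo.txt and the data set names are hypothetical, chosen only for this illustration, and the option is assumed to be used on SAS for Windows.

```sas
/* Hypothetical demo file in the WORK folder */
filename demo "%sysfunc(pathname(work))/eof_demo.txt";

/* Write three lines; the second one contains an embedded '1A'x byte */
data _null_;
  file demo;
  ctrl = '1A'x;
  put "line 1";
  put "line 2 " ctrl $char1. " still line 2";
  put "line 3";
run;

/* Default read: on Windows, '1A'x is treated as an end-of-file marker */
data without_option;
  infile demo truncover;
  input line $char80.;
run;

/* Same read, but '1A'x is treated as ordinary character data */
data with_option;
  infile demo truncover ignoredoseof;
  input line $char80.;
run;
```

Compare the observation counts in the log: on Windows the first step should stop reading at the '1A'x byte, while the second should read all three lines.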
04-29-2016 09:46 AM
Thanks, it worked.
So this is a non-printing character that we don't see in the file? That is tricky. How do we know whether the data is not being fully imported because of some weird character or symbol, or because of an end-of-file character?
04-29-2016 10:09 AM
I am afraid there is really no easy programmatic way to tell. I would reach out to the data provider to get some metadata, at least at a ballpark level, such as the total number of records and fields. Understanding how the data is generated also helps: for instance, if you know you are getting a whole year of data that was collected on a monthly basis, the monthly pieces may have been concatenated, so there is a chance that an 'end of file' character is embedded in the middle.
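A byte scan will not prove that every record was imported, but it can at least show whether a file contains '1A'x or other control characters, and where. The sketch below reuses the read-one-byte-at-a-time approach from earlier in the thread; the fileref raw, the data set odd_chars and the variables byte_pos and hex_value are hypothetical names, and the path is the example path used above. Depending on the session encoding, bytes belonging to accented or other non-ASCII characters may also be flagged.

```sas
/* Hypothetical fileref: point it at the raw file you are importing */
filename raw "C:\test.txt";

/* One observation per suspicious byte: its offset and hex code */
data odd_chars (keep=byte_pos hex_value);
  infile raw recfm=n;
  input c $char1.;
  byte_pos = _n_;                      /* 1-based byte offset in the file */
  /* flag non-printable bytes, except tab, line feed and carriage return */
  if notprint(c) > 0 and c not in ('09'x '0A'x '0D'x) then do;
    hex_value = put(c, $hex2.);
    output;
  end;
run;

/* How often each suspicious byte occurs; 1A is the DOS end-of-file byte */
proc freq data=odd_chars;
  tables hex_value;
run;
```

If 1A shows up in the frequency table, IGNOREDOSEOF is the likely fix; other codes point to characters that the COMPRESS pre-processing step would strip.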