10-20-2017 02:54 PM - edited 10-20-2017 03:00 PM
Hello, I am trying to figure out a file translation issue when using filename ftp. I have attached a sample of the file I have been trying to import.
When I use the exact same data step code to import the exact same file I get bad results when I use the filename ftp (FTP Method below) versus good results when copying the same file (drag and drop) to the SAS server then importing from the copied file (Direct Copy Method below). I'm assuming the ftp messes it up because there are some encoding translation issues when pulling from a Windows FTP server to the Unix SAS Server. I've tried tweaking around with all the infile options but haven't been able get it to import everything. Sometimes it will even import a whole "column" of the data as a single attribute value. Does anyone know how I can get this to import to a table correctly (by correctly I mean having every bit of data in the table, the formatting can come later)?
/* FTP Method */ FILENAME ftpfiles FTP 'filename.txt' CD='/directory' HOST='host.com' USER='user' PASS='pass'; data _temp; /* Exact same code */ infile ftpfiles /* Exact same code */ run; /* Direct Copy Method */ /* After copying file to sas server at '/sas_server/filename.txt' */ data _temp; /* Exact same code */ infile ftpfiles '/sas_server/filename.txt' /* Exact same code */ run;
10-20-2017 06:09 PM
If you are reading from a Unix system running on windows you may want the option TERMSTR=LF on the INFILE statement.
Windows and Unix use different characters to end a "line" of data. LF is Unix and a Windows system won't always recognize the end of a line especially with varying line lengths.
You don't show any LRECL option value either, which setting may be needed if some of your data is very long lines.
10-26-2017 04:28 PM
I've tried all three options of CRLF, LF, and CR and none of them read the lines correctly. Each line isn't too long either maybe 180 characters max. It's weird because if I use dlm='0A'x it finds the end-of-line correctly, but if I use termstr=LF it does not.
10-26-2017 05:02 PM
First look at the file on the source system and see what it looks like. For example if you SAS then you can run a simple program like this to see what is in the first 500 bytes of the file.
data _null_; infile "&path\sample_import_file.txt" lrecl=100 recfm=f obs=5 ; input; list; run;
The file you posted has CR+LF at the end of each line. It also says that it is using UTF16 character encoding.
NOTE: A byte-order mark in the file "c:\downloads\sample_import_file.txt" (for fileref "#LN00079") indicates that the data is encoded in "utf-16le". This encoding will be used to process the file.
You might want to tell the FTP engine that information as it might not be able to autodetect that. I am not sure the FTP engine can detect the BOM the same way that SAS can when reading directly from a disk file.
Second check what your "drag and drop" method actually did. How do you drag and drop a file from Windows to Unix? Are you using some type of GUI FTP interface? if so then perhaps it transferred the file in ASCII mode and so converted the CR+LF to just LF. Or perhaps you have mounted the same physical disk onto both a Windows machine and a Unix machine. Then most likely the file was copied as BINARY and so the CR+LF stayed exactly as they were before.
Third try using the RCMD option on the FTP filename engine to tell your FTP server whether you want to move the file as BINARY. You might need to save it to a temporary physical file and then read from there.