08-12-2016 11:40 AM
I'm importing some data from the web using an infile statement in a data step. Some of the fields contain carriage returns / line feeds in the middle of the data that cause bad records to be returned. I'm trying to find the simplest was to clean up the culprit fields which have these characters so the resulting data set in my work file doesn't have these errors.
I was thinking of using the Compress function to remove the characters, but I don't know where to put it in the code.
Here's a simplified version of my code that creates a work file but with errors.
filename notes url "http://www.website.com/...."; data WORK.NOTES; infile NOTES delimiter = ',' MISSOVER DSD firstobs=2; informat name $150. ; informat note $300. ; format name $150. ; format note $300. ; input name $ note $; run;
08-12-2016 12:18 PM - edited 08-12-2016 03:44 PM
Not tested, you may need to tweak,
filename notes url "http://www.website.com/...."; data WORK.NOTES; infile NOTES delimiter = ',' MISSOVER DSD firstobs=2 lrecl=1000; informat name $150. ; informat note $300. ; format name $150. ; format note $300. ; input @1 @; _infile_=compress(_infile_,'0D0A'x); input name $ note $; run;
08-12-2016 01:29 PM
I got this error when inserting this code
ERROR: The INFILE statement MISSOVER option and the INPUT statement double trailing @ option, are being used in an inconsistent manner. The execution of the DATA STEP is being terminated to prevent an infinite loop condition.
08-12-2016 03:40 PM - edited 08-12-2016 03:44 PM
Right, changing the double @@ to single @ should fix it.
input @1 @;
Previous post has been updated to reflect the change.
08-12-2016 04:45 PM
Seems to me you have more/different issues than you have presented. So far, we have followed the path you set, while did not get to know exactly what you mean by 'error' or 'problem'. current process should take care of the 2 control characters you think you have, if not, you can try a full-blown version of it: COMPRESS(_INFILE_,,'C'). This will take out all of the control characters, but still may not tackle your 'error' or 'problem'.
08-12-2016 12:21 PM
Do you know if the issue is ONLY linefeed or is it Carriage Return Linefeed pair, or carriage return only?
What OS are you running on?
You may be able to set the TERMSTR= option on the infile to correct this but settings would depend on your OS and the actual content of the file.
08-12-2016 02:03 PM
You might try TERMSTR=CRLF on the infile. This would set the requirement for both a carriage return and linefeed to appear to end a line. Note that if this works you will still have CR in the variable. That should be removed with a string function such as Translate though you'll likely need to use the Hex coded value in code.
08-12-2016 02:24 PM
I tried TERMSTR=CRLF and TERMSTR=CR, both produced zero records
TERMSTR=LF produced results, but with the bad records due to carriage returns
08-12-2016 06:24 PM
The FILENAME statement supports the TERMSTR= option for the URL method. It can't hurt to try it there. LF is the default.
08-13-2016 12:42 AM
Make it as a Stream File : filename notes url "http://www.website.com/...."; data WORK.NOTES; infile NOTES recfm=n dlm='<' dsd; input @'>' x : $2000. @@; run; After that you can remove it by compress(x,'0D0A'x);
08-16-2016 12:20 PM
Here's some sample data and simplified code that illustrates my issue. The imported 'notes' work file has 7 records instead of 5 due to the multiple lines in the 'notes' field for records 3 and 5 in the .txt file.
filename notes 'C:\NotesTest.txt'; proc import datafile=notes out=notes dbms=tab replace; getnames = yes; run;
Need further help from the community? Please ask a new question.