BookmarkSubscribeRSS Feed
Kuda35
Calcite | Level 5

 

Hi,  I am trying to import data from a   '|' delimited text file into a dataset. However when SAS encounters a field when there a multiple spaces of blanks it seems to be reading these a delimiter and treating that this entries in to the following fields as missing values

I am sure I am missing something simple.

 

So for example is I have

Age|Name|Town

20|Kuda ZZZ|Harare

25|Fraser Bernhard             |New York

30|Matthew Benhard|London 

 

and I am getting 

 

Age       Name                        Town

 20        Kuda ZZZ                  Harare

 25        Fraser Bernhard        .

 .           .                                 .        

 30        Matthew Benhard      London   

 

I am using the following code

 

 

data .Input    ;
        %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
         infile 'C:/Data.txt' delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 ;
               informat Age best32.. ;
               informat Name $30. ;
               informat Town  $20. ;
                       format Age best32.. ;
                       format Name $30. ;
                       format Town  $20. ;
             input
                        Age 
                        Name $
                        Town  $
 ;
           if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
            run;

 

9 REPLIES 9
Haikuo
Onyx | Level 15

Your code runs fine in my systems (SAS 9.4 on Linux 64 /Windows 7).

 

filename FT15F001 temp; 

data Input    ;
        %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
         infile FT15F001 delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 ;
               informat Age best32. ;
               informat Name $30. ;
               informat Town  $20. ;
                       format Age best32. ;
                       format Name $30. ;
                       format Town  $20. ;
             input
                        Age 
                        Name $
                        Town  $
 ;
           if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
parmcards; 
Age|Name|Town
20|Kuda ZZZ|Harare
25|Fraser Bernhard             |New York
30|Matthew Benhard|London 
;;;;
            run;

It must be something else.

 

ballardw
Super User

Use a hex editor or similar on your data set. The example output you show, especially the line with missing AGE makes me suspect that you may have some linefeed or carriage return or other non-printable values in the data that appear as blanks but are not actually so.

FreelanceReinh
Jade | Level 19

Hi @Kuda35,

 

I cannot imagine that those "multiple spaces" are misinterpreted as delimiters as you use the DELIMITERS= option correctly.

 

A possible explanation could be premature line breaks (e.g. '0A'x) or a large number (e.g. larger than the length of the respective variable) of invisible characters (like non-breakable spaces, 'A0'x) in front of the actual value.

 

Can you post a few data lines from C:\Data.txt (with confidential information overtyped, e.g., with 'zzzz') where you observed the issue?

 

Also, do you get any suspicious log messages (such as "NOTE: Invalid data ...")?

Kuda35
Calcite | Level 5

Thanks all for the responses. I have look at the data using a hex editor and you are all correct. It appears to be due to premature line breaks. '0D0A' coming through in some lines in the data. 

 

What would be the best approach to deal with these?

FreelanceReinh
Jade | Level 19

I think, the best approach would be to avoid the stray "0D0A"s in the .txt file (if this is an option).

 

If these '0D0A'x characters really occur in more or less unpredictable places of your data (e.g. between first name and last name of the field "name" and also in other places), it could be fairly difficult to work around them in SAS. This is because we would have to distinguish between the unwanted line breaks and the correct ones at the end of each record. Also, it is not clear in general if all of those line breaks should be replaced by (single) blanks or null strings or maybe delimiters.


The problem would be much easier if the additional line breaks were different (e.g., only '0A'x) from the end-of-record character (presumably '0D0A'x in your file, or?).

 

In the worst case, I think we'd have to examine the file structure (in terms of delimiters, types of values and line breaks) of Data.txt in order to develop a strategy for the next step. You can see an example of how this "next step" could look like (in a very simple case) in the last "Solution" of section "Reading Delimited Data" in the INFILE statement documentation.

 

Needless to say that especially a large number of columns ("variables") in Data.txt would aggravate the problem.

Kuda35
Calcite | Level 5

Thanks for getting back,  I look again at the hex code and I noticed that the when this issue occurs the '0d0a'x is right next to the delimiter '|', so the code is always '0d0a7c'x.  Is there a code I could use to simply replace the '0d0a7c'x with '7c' or '|' in the file. That should solve the issue. 

FreelanceReinh
Jade | Level 19

This is good news. I was hoping that there was some regular pattern that we could take advantage of.

 

I've just performed this find-and-replace action on a test file in an external text editor which allows to search for line breaks (CRLF, i.e. 0D0A) by entering ^p into the "Find what" field, i.e. find ^p|, replace with |.

 

Maybe your hex editor has this capability, too? Actually it does not require hex mode, only a feature to search for line breaks. Even the SAS Enhanced Editor has this feature (and could be used if the .txt file is not too large for it). For my test file, the following replacement worked:

replace.png

The only difference between the results of the two editors was that the Enhanced Editor seemed to add a single CRLF to the end of the file.

 

If none of these point-and-click solutions works for you, we can still develop a little SAS program to perform the replacement.

Kuda35
Calcite | Level 5

Thanks Reinhard, really appreciated  your help with this.  Unfortunately my file  is too large and none of teh point and click solutions work. How would I do this in SAS? I have tried the following code but it doesn't work. Could you help me with the code I would need to run.  

 

data _null_;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'c/data.txt' lrecl=1 recfm=n end=eof sharebuffers;
file 'c:/Datav2.txt' lrecl=1000000 recfm=n;
input char $char3.;
	 if char = '0D0A7c'x then put '|';
     else put char $char3.;
if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
run;
FreelanceReinh
Jade | Level 19

No problem. Try this:

data _null_;
infile 'C:\Data.txt' recfm=n lrecl=1000000 eof=eof;
file 'C:\Datav2.txt' recfm=n lrecl=1000000;
input a $char1.;
if a ne '0D'x then put a $1.;
else do;
  af=1;
  input b $char1.;
  if b ne '0A'x then put '0D'x b $1.;
  else do;
    bf=1;
    input c $char1.;
    if c ne '|' then put '0D0A'x c $1.;
    else put '|';
  end;
end;
return;
eof:
  if af then put '0D'x;
  if bf then put '0A'x;
run;

This program inputs only one character at a time and takes special action if that character could be part of a '0D0A7C'x string, always considering the possibility that the end of the .txt file is reached with one of the INPUT statements. Variables AF and BF are flags to remember that characters stored in A or B, respectively, still need to be written to the output file.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1589 views
  • 1 like
  • 4 in conversation