import multiple line value of csv file

Reply
SAS Employee
Posts: 2

import multiple line value of csv file

I have a large csv file that export from another data source, I want to convert to SAS dataset. The file has over 100 columns and some columns has multiple line value (people paste email information here). I tried both proc import and data step, it could only import the first line, the second line overwrite the next column or wrap to next row. Here is an example of description field:

first row: There are 5 Black Belt in Italy.

second row: They are using Excel

 

my code is:

data test;

infile "c:\test" missover dsd lrecl=32767 firstobs=2;

informat description $1000.;

format description $1000.;

input description $;

run;

 

Any ideas how to fix the code? 

 

Thank you very much!

Grand Advisor
Posts: 10,210

Re: import multiple line value of csv file

Easiest in the long run is likely to go to the data source that the file is created from and have that field remove any of the carriage return, linefeed or what ever character is used to create the multiple lines. This becomes more important if this process is to be repeated as this type of bad data will cause issues forever otherwise.

 

If this is a one time thing and there are not millions of lines then using a text editor such as notepad you can delete the end of line markers manually (they will always be at the end of line so not hard to find) and then read the file.

 

Your problem is that programs look can look for patterns but you have to be able to describe the pattern. Proc import doesn't allow much and even a datastep means you have to be able to figure out how and that is not going to be a trivial exercise.

 

Grand Advisor
Posts: 17,332

Re: import multiple line value of csv file

In a well defined CSV file, a field with multiple values would be wrapped in quotation marks? Is that the case here?

 

What happens if you open the file in Excel, how do those fields look?

Trusted Advisor
Posts: 1,508

Re: import multiple line value of csv file

Difficult to know from the data you posted, but it seems from your description that option truncover may help.

Grand Advisor
Posts: 10,210

Re: import multiple line value of csv file

I am attaching a very small example of what I think is the problem. I have run into these and the issue is the characters getting treated as end of line mean the MISSOVER or TRUNCOVER do not finish reading the first line, then the coninued text is attempting to be read into the incorrect variables. I changed the extension to TXT so that it uploads and is readable easily.

 

Very basic code to read that file:

data junk;
   infile "d:\data\examplecsv.txt" dlm=',' firstobs=2 missover;
   informat id  Var1 best4.
            description $200.
            var3 best4.
   ;
   input id var1 description var3;
run;

 

Lots of Invalid data for id messages with this kind of data.

The file was generated use Excel and Alt-Enter to put vertical spaces within a single cell.

Super User
Super User
Posts: 6,323

Re: import multiple line value of csv file

[ Edited ]

Your example file has  embedded CR+LF characters.  But when I make such a file from Excel it only embeds LF characters.

You should be able to read a file with embedded LF characters and end of lines marked by CR+LF if you add the TERMSTR=CRLF option to your infile statement.

 

If that doesn't work then you can use code like this to convert any CR or LF in the middle of a quoted string to a pipe character.  

filename copy temp;

data _null_;
  infile 'exampleCSV.txt' recfm=n ;
  file copy recfm=n;
  input ch $char1.;
  retain q 0;
  q = mod(q+(ch='"'),2);
  if q and ch in ('0D'x,'0A'x) then put '|';
  else put ch $char1. ;
run;

Then you should be able to read the file.

data _null_;
  infile copy dsd truncover;
  length x1-x5 $200;
  input x1-x5 ;
  put (_all_) (=);
run;


x1=Id x2=Var1 x3=Description x4=Var3 x5=
x1=1 x2=23 x3=A simple comment x4=17 x5=
x1=2 x2=18 x3=A comment entered|when using Alt-Enter in |Excel to create a multiline single entry x4=44 x5=
x1=3 x2=29 x3=Another simple comment x4=55 x5=
NOTE: 4 records were read from the infile COPY.
Respected Advisor
Posts: 4,606

Re: import multiple line value of csv file

See if your situation is similar to my simulated example:

 

proc print data=test; run;

                            Obs            text

                             1     "This is one line
                             2     and another line
                             3     and a third line"
                             4     Some text in line two

data want;
length line $1000;
do until ((char(line,1) = '"') = (substr(line,length(line),1) = '"')) ;
    set test;
    line = catx(" ", line, text);
    end;
drop text;
run;

proc print data=want; run;

            Obs    line

             1     "This is one line and another line and a third line"
             2     Some text in line two
PG
SAS Employee
Posts: 2

Re: import multiple line value of csv file

Thank you for all your suggestions. I have tried all the ideas. Since the data is not clean, none of them can completed resolve the issue, I decide to move the column to the last and remove from import if necessary. 

Trusted Advisor
Posts: 1,508

Re: import multiple line value of csv file

Do try @Tom 's TERMST=CRLF suggestion. It could well solve your issue.

Ask a Question
Discussion stats
  • 8 replies
  • 893 views
  • 5 likes
  • 6 in conversation