DATA Step, Macro, Functions and more

Import data with unexpected split lines

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

Import data with unexpected split lines

I got a datasets with some cells contain multiple lines. Like row3 under SrreComm column, there are two lines (see picture below. When importing the data, sas will automatically read this one cell as 2 records. I tried to use TERMSTR=CRLF but did not work. Anyone know how to deal with it?

 

Thanks,

Andrea

 

My codes:

 

data WORK.TUMOUR2 ;
%let _EFIERR_ = 0;
infile '&dir2.\&foldername.\tumour_noclass_MA.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 TERMSTR=CRLF;
informat DDtlODsc $89. ;
informat SrceComm $255. ;
informat COUNT best12.;
informat tumour_type $9. ;
informat indicator_type $10. ;
informat MA best12. ;
format DDtlODsc $89. ;
format SrceComm $255. ;
format COUNT best12. ;
format tumour_type $9. ;
format indicator_type $10. ;
format MA best12. ;
input DDtlODsc $ SrceComm $ COUNT tumour_type $ indicator_type $ MA;
if _ERROR_ then call symputx('_EFIERR_',1);
run;

Capture.JPG

 


Accepted Solutions
Solution
‎06-19-2017 09:25 PM
PROC Star
Posts: 1,561

Re: Import data with unexpected split lines

Notepad is useless for this as it doesn't display non-printable characters.

You need to identify what character is causing the issue.

Run this

data TEST3;
  infile "&dir2.\&foldername.\tumour_noclass_MA.csv" recfm=n;
  input X $1. @@;
  putlog X $1. @3 X $hex2.;
run; 

It will print one line par character in the log, together with its hexadecimal code.

Look after 133,,, see what you have.

Regardless, you should clean your data *before* you export it to CSV, and remove non-printable characters from strings if they are unneeded, like this:

STRING=compress(STRING,,'wk');

Non-printable characters will always mess up CSV files.

 

 

 

View solution in original post


All Replies
Super User
Posts: 17,836

Re: Import data with unexpected split lines

What happens if you remove the MISSOVER, should be TRUNCOVER anyways and DSD. 

 

Usually this gets read in properly in my experience, but I'm assuming you created this CSV from Excel for some reason?

Are you sure that comment field doesn't have comma's in it?

 

Occasional Contributor
Posts: 11

Re: Import data with unexpected split lines

Hi Reeza,

 

Yes. The data comes from an excel file, which was from SAS! Some cells do have cooma but the major problem seems like coming from the split lines instead of comma. Not quite understand the reason. I swap missover to truncover now and put ignoredoseof. Get nothing! And no error in the log.

 

Andrea

Super User
Posts: 17,836

Re: Import data with unexpected split lines

Did you figure out the character that's causing the issue?

Is it 

'041'x?

Super User
Posts: 17,836

Re: Import data with unexpected split lines

SAS Super FREQ
Posts: 683

Re: Import data with unexpected split lines

Hi

 

To read a file that has LF,  x"0a" as part of their data value, you can use the option TERMSTR=CRLF, this was alread metioned.

 

If you are using SAS Enterprise Guide, by default you will not see the text that comes after the LF. Better do a Proc PRINT from your data.

 

The following code:

filename xcsv "c:\temp\sample.csv" termstr=crlf;
proc import file=xcsv out=want dbms=csv replace ;
run;
proc print data=want;
run;

Will read a file that looks like this:

Snap8.png

 

 

Occasional Contributor
Posts: 11

Re: Import data with unexpected split lines

Hi Bruno,

 

An exellent idea! But SAS Enterprise Guide seems can't find the file. I will have a look the problem.

 

Cheers,

Andrea

Super User
Posts: 17,836

Re: Import data with unexpected split lines

Most likely EG is on the server and the file is on your local drive. 

 

To see 'non-printable' characters you need to use a decent text editor, ie NotePad++ or TextWranger are the two that I use. 

Occasional Contributor
Posts: 11

Re: Import data with unexpected split lines

Hi Reeze,

 

I just used Chris's idea, remove the non-printable characters and exported the file again! It is sovled!

 

Thanks for all you guys suggestions!

 

Cheers,

Andrea

Occasional Contributor
Posts: 11

Re: Import data with unexpected split lines

Hi Reeze,

 

I use notepad to look at the data, finding there is a quote when data is split into 2 lines. I suspect it is the quote but not sure, I will have a try.

 

Capture.JPG

 

Cheers,

Andrea

Solution
‎06-19-2017 09:25 PM
PROC Star
Posts: 1,561

Re: Import data with unexpected split lines

Notepad is useless for this as it doesn't display non-printable characters.

You need to identify what character is causing the issue.

Run this

data TEST3;
  infile "&dir2.\&foldername.\tumour_noclass_MA.csv" recfm=n;
  input X $1. @@;
  putlog X $1. @3 X $hex2.;
run; 

It will print one line par character in the log, together with its hexadecimal code.

Look after 133,,, see what you have.

Regardless, you should clean your data *before* you export it to CSV, and remove non-printable characters from strings if they are unneeded, like this:

STRING=compress(STRING,,'wk');

Non-printable characters will always mess up CSV files.

 

 

 

Occasional Contributor
Posts: 11

Re: Import data with unexpected split lines

Hi Chris,

 

I just removed the non-printable characters and exported the file again! It works well now!

 

Thanks,

Andrea

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 146 views
  • 0 likes
  • 4 in conversation