DATA Step, Macro, Functions and more

Loading CSV files with Non Quoted Strings

Reply
Frequent Contributor
Posts: 81

Loading CSV files with Non Quoted Strings

Hi,

I have received a CSV file with non quoted strings. One of the string variables in the file may contain values with a comma.

So when I load the data through Proc Import, those records will all be shifted to an additional variable.

Unfortunately the data is not aligned correctly for me to run it with fixed widths.

Has anyone figured out a way to get around this?

Thanks in advance for your help.

Respected Advisor
Posts: 3,156

Re: Loading CSV files with Non Quoted Strings

Unless there is an additional rule that can help to identify those fields containing comma, I don't see there is a reliable way to import data precisely. I would go back asking your data source to modify their delivery in a way that the data can be easily interpreted, either adding quotes or using special delimiters that can't be found inside your field contents.

Haikuo

Frequent Contributor
Posts: 81

Re: Loading CSV files with Non Quoted Strings

I figured that was the case.

Thanks for the quick reply.

Regular Contributor
Posts: 180

Re: Loading CSV files with Non Quoted Strings

I would try to correct the input file using a Data step program before the Proc Import if it is possible to create a business rule.
For example: "only the third field has commas and there are ten fields in total".
In this case I would count the commas in each line, if more than 9 (for ten fields) then they belong to the third field so I can insert quotes arround it.

If you are not able to find a correction business rule, I'm afraid you would have to ask them to send you the file again created correctly.

CTorres

Frequent Contributor
Posts: 81

Re: Loading CSV files with Non Quoted Strings

Interesting..

I can probably come up with a business rule, my data will have 15 variables, and the 10th variable is the field that may have the additional comma.

I'm lost at how I would insert quotes around the 10th field in a datastep?

Thanks,

KD

Super User
Posts: 19,768

Re: Loading CSV files with Non Quoted Strings

It depends on your data.

You can import all as text and then clean up afterwards, i.e. if you have 16 variables instead of expected 15 then concatenate the 10th/11th and move everything over one field.

But for absolute accuracy get a new version. If source is Excel file, it doesn't have a default for quoted delimited which is usually the issue Smiley Sad

Regular Contributor
Posts: 180

Re: Loading CSV files with Non Quoted Strings

The following program fix the incorrect input CSV file than contains 7 fields but the 3rd one includes commas:


%let invalid=3;
%let total=7;


data _null_;
length record $200 newrecord $200;
infile datalines ;
input;
record= _infile_;
from=findc(record,',');
do i=2 to &Invalid-1;
   from=findc(record,',',from+1);
end;
to=findc(record,',','B');
do i=&invalid+1 to &total-1;
   to=findc(substr(record,1,to-1),',','B');
end;
newrecord=cats(substr(record,1,from),'"',substr(record,from+1,to-from-1),'"',substr(record,to));
put newrecord;
datalines;
12,125,JOHN TAYLOR,NC,48,1,XYZ
22,7897,BROWN, PETER,MA,62,2,ZZY
333,1740,PETER WHITE,NY,77.2,1,XYZ
555,45,JUAN, MARIA, LUIS,XX,18,,BBB
;
RUN;

LOG:

653  %let invalid=3;
654  %let total=7;
655
656  data _null_;
657   length record $200 newrecord $200;
658   infile datalines ;
659   input;
660   record= _infile_;
661   from=findc(record,',');
662   do i=2 to &Invalid-1;
663     from=findc(record,',',from+1);
664   end;
665   to=findc(record,',','B');
666   do i=&invalid+1 to &total-1;
667     to=findc(substr(record,1,to-1),',','B');
668   end;
669   newrecord=cats(substr(record,1,from),'"',substr(record,from+1,to-from-1),'"',substr(record,to));
670  put newrecord;
671  datalines;

12,125,"JOHN TAYLOR",NC,48,1,XYZ
22,7897,"BROWN, PETER",MA,62,2,ZZY
333,1740,"PETER WHITE",NY,77.2,1,XYZ
555,45,"JUAN, MARIA, LUIS",XX,18,,BBB
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


676  ;
677  RUN;

CTorres

Ask a Question
Discussion stats
  • 6 replies
  • 314 views
  • 3 likes
  • 4 in conversation