BookmarkSubscribeRSS Feed
DangIT
Fluorite | Level 6

Hi,

I have received a CSV file with non quoted strings. One of the string variables in the file may contain values with a comma.

So when I load the data through Proc Import, those records will all be shifted to an additional variable.

Unfortunately the data is not aligned correctly for me to run it with fixed widths.

Has anyone figured out a way to get around this?

Thanks in advance for your help.

6 REPLIES 6
Haikuo
Onyx | Level 15

Unless there is an additional rule that can help to identify those fields containing comma, I don't see there is a reliable way to import data precisely. I would go back asking your data source to modify their delivery in a way that the data can be easily interpreted, either adding quotes or using special delimiters that can't be found inside your field contents.

Haikuo

DangIT
Fluorite | Level 6

I figured that was the case.

Thanks for the quick reply.

CTorres
Quartz | Level 8

I would try to correct the input file using a Data step program before the Proc Import if it is possible to create a business rule.
For example: "only the third field has commas and there are ten fields in total".
In this case I would count the commas in each line, if more than 9 (for ten fields) then they belong to the third field so I can insert quotes arround it.

If you are not able to find a correction business rule, I'm afraid you would have to ask them to send you the file again created correctly.

CTorres

DangIT
Fluorite | Level 6

Interesting..

I can probably come up with a business rule, my data will have 15 variables, and the 10th variable is the field that may have the additional comma.

I'm lost at how I would insert quotes around the 10th field in a datastep?

Thanks,

KD

Reeza
Super User

It depends on your data.

You can import all as text and then clean up afterwards, i.e. if you have 16 variables instead of expected 15 then concatenate the 10th/11th and move everything over one field.

But for absolute accuracy get a new version. If source is Excel file, it doesn't have a default for quoted delimited which is usually the issue Smiley Sad

CTorres
Quartz | Level 8

The following program fix the incorrect input CSV file than contains 7 fields but the 3rd one includes commas:


%let invalid=3;
%let total=7;


data _null_;
length record $200 newrecord $200;
infile datalines ;
input;
record= _infile_;
from=findc(record,',');
do i=2 to &Invalid-1;
   from=findc(record,',',from+1);
end;
to=findc(record,',','B');
do i=&invalid+1 to &total-1;
   to=findc(substr(record,1,to-1),',','B');
end;
newrecord=cats(substr(record,1,from),'"',substr(record,from+1,to-from-1),'"',substr(record,to));
put newrecord;
datalines;
12,125,JOHN TAYLOR,NC,48,1,XYZ
22,7897,BROWN, PETER,MA,62,2,ZZY
333,1740,PETER WHITE,NY,77.2,1,XYZ
555,45,JUAN, MARIA, LUIS,XX,18,,BBB
;
RUN;

LOG:

653  %let invalid=3;
654  %let total=7;
655
656  data _null_;
657   length record $200 newrecord $200;
658   infile datalines ;
659   input;
660   record= _infile_;
661   from=findc(record,',');
662   do i=2 to &Invalid-1;
663     from=findc(record,',',from+1);
664   end;
665   to=findc(record,',','B');
666   do i=&invalid+1 to &total-1;
667     to=findc(substr(record,1,to-1),',','B');
668   end;
669   newrecord=cats(substr(record,1,from),'"',substr(record,from+1,to-from-1),'"',substr(record,to));
670  put newrecord;
671  datalines;

12,125,"JOHN TAYLOR",NC,48,1,XYZ
22,7897,"BROWN, PETER",MA,62,2,ZZY
333,1740,"PETER WHITE",NY,77.2,1,XYZ
555,45,"JUAN, MARIA, LUIS",XX,18,,BBB
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


676  ;
677  RUN;

CTorres

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1015 views
  • 3 likes
  • 4 in conversation