CSV File Import :

Reply
Occasional Contributor
Posts: 6

CSV File Import :

Hi,

I have to import a Comma Seprated file into SAS but there is a problem with one coulm (A), Coulmn(A) has multiple values seprated by commas.

while importing SAS also reads these commas, so shift the values in subsequent coulmns(B,C,..). so the data become courrpt.

Can anay one please advice how can this exception(issue) be handle? so the multiple values in coulmn(A) remains in  Coulmn(A).

SAS gurus your support in this would be highly appreciated

Regards

Ali J

Respected Advisor
Posts: 3,777

Re: CSV File Import :

I don't believe you have a CSV because the specification for a CSV "says" that fields that contain the delimiter are to be quoted.

If you know the total number of fields you can count from the right to find the column where the first field should end.  Then read that field with $VARYING and continue with the other fields.

Show example data.

Super Contributor
Posts: 282

Re: CSV File Import :

Hi,

Can you please provide some example problem data, the code you are using to read it and what results you want.

Regards,

Amir.

Super Contributor
Posts: 644

Re: CSV File Import :

If column A always has the same number of commas (eg 2) then you can define extra temporary columns A1 A2 A3 to import the data; after the input statement you then need something like

     A = catx (', ', A1, A2, A3) ;

which will restore the commas in a single column A.  Follow with

     Drop A1 A2 A3 ;

Otherwise this might work:

     Infile xxx MISSOVER [etc] ;

     Input @ ;

     _Infile_ = reverse (_Infile_) ;

     Input     D  C  B A3 A2 A1 ;

     D = left (reverse (D)) ;

     C = left (reverse (C)) ;

     ...

     A1 = left (reverse (A1)) ;

     A = catx (', ', A1, A2, A3) ;

     Drop A1 A2 A3 ;

Good luck.

Richard

Occasional Contributor
Posts: 6

Re: CSV File Import :

Dear All,

Thanks for your replys.


Probably I did not explain the problem correctly. Please find attached sample data "SAMPL.csv"

There are coulmns from A to W , when I import data it create coulmns by commas but in coulmn 'N' there are multiple values(>=2, not fixed) also seprated by  comma . SAS also read these values as different coulmns and shifts the value of adjacent coulmn. due to this extra coulmns cretae automatically and inapproprite data is filled in coulmns.

Kindly help me to resolve it, i am using SAS EG for data import

Regards

Attachment
Super Contributor
Posts: 644

Re: CSV File Import :

I'm not sure how EG handles input where the apparent number of data items exceeds the number of columns defined in the input.  I think you will have to write a data step anyway in a program module to handle this irregular data: modify my suggestion (above) to start at column N.

Richard

Contributor
Posts: 43

Re: CSV File Import :

I agree with RichardinOz.  The csv sample file was not built correctly.  The text string for N should have been "11,15,17".

If you know that N is the only column that can have multiple values together, then write your data step/ input statement to read in extra fields at the end (X1, X2, X3, etc.) and then shift everything back, depending on the number of extra values found.  I can also tell from the sample, that the number of extra values can vary from row to row.

I've had to deal with this myself, not fun.

You can do this using arrays, but a simple way to show what you are doing with your sample data:

(let's say if you know you can have up to 5 values in that multiple string)

* starting with the maximum number of extra fields ;

length N_string $16 ; * (to hold 5 2-digit numbers and commas);

input A B  C etc... W X1 X2 X3 X4 ;

If X4 ^= ' ' then do ;

     * we have 5 values for N ;

     N_string = N || ',' || O || ',' || P || ',' || Q || ',' || R ;

     O = S ;

     P = T ;

     *fill in the rest as above;

     W = X4 ;

end ;

else if X3 ^= ' ' then do ;

     and so on

Super User
Posts: 10,538

Re: CSV File Import :

Looking at the file example only, HOW do you know the problem is with column N? That makes me think you know more about the file source than you are telling here. As structured, the file is NOT CSV as mentioned data_null. In this case it is only the file extension.

If possible I would go to the source of the file and try to find out how it is created. Then work to see if the source can properly quote qualify the text.

Contributor
Posts: 43

Re: CSV File Import :

I agree that my first choice would be to go back to the entity that generated the false "csv" file, but assuming that's not possible, or the source can't do what you want, and you KNOW what the data should be, you can work around it.

If more than one column can have multiple values, then you can't work around it.

Also, this is only going to work if all the columns after the problem column (column N in the sample) are the same data type.  If you have a date column, or a text string in the columns that might be shifted, then you could lose data unless you read all columns in as string variables, and then convert the numbers.

Now that I think about it, I should have included conversions from the numbers to strings when contatenating N, O, P, etc.

Occasional Contributor
Posts: 6

Re: CSV File Import :

I recive this data from extrenal source frequently and used to work on it, the external source often forget the quantifiers(" "), so to get the correct file waste time which is critical. i-e, i want to handle this exception at my end rather wasting time oftenly to get correct data.

help is required to write a procedure/data step to address this issue when it occurs.

Regards

Ali J

Occasional Contributor
Posts: 7

Re: CSV File Import :

I know that this is really simplistic and clunky (I never really got out of SASv6 mode!), but perhaps it can start you down a trail. Some of the code is bloated to provide intermediate outputs, and could easily be replaced. The assumption is that column N is the field with multiple commas, and that quotation marks are always omitted.

data test;

  infile "C:\My SAS Files\sample.csv" lrecl=256 pad;

  length aa a b c d e f g h i j k l m n o p q r s t u v w $12;

  length ipt ript tipt $256.;

  input ipt $char256.;

  ript=trim(reverse(ipt));

  array field {*} a b c d e f g h i j k l m n o p q r s t u v w;

  do ii=dim(field) to 15 by -1;

    jj=index(ript,",");

    if jj=1 then do;

      field(ii)=" ";

      ript=substr(ript,2);

      end;

    else do;

      field{ii}=reverse(substr(ript,1,jj-1));

      ript=trim(substr(ript,jj+1));

      end;

    end;

  tipt=left(reverse(ript));

  do ii=1 to 13;

    jj=index(tipt,",");

    if jj=1 then do;

      field(ii)=" ";

      tipt=substr(tipt,2);

      end;

    else do;

      field{ii}=substr(tipt,1,jj-1);

      tipt=trim(substr(tipt,jj+1));

      end;

    end;

  n=trim(tipt);

  run;

Occasional Contributor
Posts: 6

Re: CSV File Import :

DanWaldo !

your approch and code was awsome and i had sucefully implemented it with some customization

Respected Advisor
Posts: 3,777

Re: CSV File Import :

Assuming my assumptions are right.  If this works you can add the proper INFORMATS to read the numbers and dates etc.

data csv;
   infile cards dsd column=col;
   input (a b c d e f g h i j k l m)(:$12.) @;
   column=col;
  
CALL SCAN(_infile_,9, position, length,',','mb');
   lengthN = Position - col-1;
  
input n $varying32. lengthN +1 (o p q r s t u v w)(:$12.);
   cards;
A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W
C1,6,T,45600,31-MAR-13,3800,0,0,0,3800,31-JUL-03,31-JUL-05,S,15 ,17,0,,0,0,12,0,0,0,0
C2,54,E,5000000,30-NOV-12,0,0,0,0,0,06-MAY-11,30-SEP-12,S,11 ,18,0,,0,1,0,0,0,0,0
C3,6,T,27835,31-MAR-13,15467,0,0,0,15467,04-AUG-07,04-FEB-09,S,15 ,17,0,,0,0,9,0,0,0,0
C4,54,T,1000000,31-AUG-12,999932,0,0,0,1317036,09-AUG-00,31-JUL-09,S,11 ,15 ,17,0,,0,0,5,0,0,0,0
;;;;
   run;

Message was edited by: data _null_ "+1"

Contributor
Posts: 43

Re: CSV File Import :

I love your idea, but a proc print shows that it is not quite working.

i thought perhaps the empty value of two commas together might be messing up the SCAN() (because SCAN might be calling two delimiters together as one delimiter), but it still didn't work when I added a space there.  The variable "o" is not getting populated.

Respected Advisor
Posts: 3,777

Re: CSV File Import :

I need to advance the pointer +1 after reading N.

Ask a Question
Discussion stats
  • 17 replies
  • 1275 views
  • 8 likes
  • 8 in conversation