BookmarkSubscribeRSS Feed
AliJ
Calcite | Level 5

Hi,

I have to import a Comma Seprated file into SAS but there is a problem with one coulm (A), Coulmn(A) has multiple values seprated by commas.

while importing SAS also reads these commas, so shift the values in subsequent coulmns(B,C,..). so the data become courrpt.

Can anay one please advice how can this exception(issue) be handle? so the multiple values in coulmn(A) remains in  Coulmn(A).

SAS gurus your support in this would be highly appreciated

Regards

Ali J

17 REPLIES 17
data_null__
Jade | Level 19

I don't believe you have a CSV because the specification for a CSV "says" that fields that contain the delimiter are to be quoted.

If you know the total number of fields you can count from the right to find the column where the first field should end.  Then read that field with $VARYING and continue with the other fields.

Show example data.

Amir
PROC Star

Hi,

Can you please provide some example problem data, the code you are using to read it and what results you want.

Regards,

Amir.

RichardinOz
Quartz | Level 8

If column A always has the same number of commas (eg 2) then you can define extra temporary columns A1 A2 A3 to import the data; after the input statement you then need something like

     A = catx (', ', A1, A2, A3) ;

which will restore the commas in a single column A.  Follow with

     Drop A1 A2 A3 ;

Otherwise this might work:

     Infile xxx MISSOVER [etc] ;

     Input @ ;

     _Infile_ = reverse (_Infile_) ;

     Input     D  C  B A3 A2 A1 ;

     D = left (reverse (D)) ;

     C = left (reverse (C)) ;

     ...

     A1 = left (reverse (A1)) ;

     A = catx (', ', A1, A2, A3) ;

     Drop A1 A2 A3 ;

Good luck.

Richard

AliJ
Calcite | Level 5

Dear All,

Thanks for your replys.


Probably I did not explain the problem correctly. Please find attached sample data "SAMPL.csv"

There are coulmns from A to W , when I import data it create coulmns by commas but in coulmn 'N' there are multiple values(>=2, not fixed) also seprated by  comma . SAS also read these values as different coulmns and shifts the value of adjacent coulmn. due to this extra coulmns cretae automatically and inapproprite data is filled in coulmns.

Kindly help me to resolve it, i am using SAS EG for data import

Regards

RichardinOz
Quartz | Level 8

I'm not sure how EG handles input where the apparent number of data items exceeds the number of columns defined in the input.  I think you will have to write a data step anyway in a program module to handle this irregular data: modify my suggestion (above) to start at column N.

Richard

cwilson
Calcite | Level 5

I agree with RichardinOz.  The csv sample file was not built correctly.  The text string for N should have been "11,15,17".

If you know that N is the only column that can have multiple values together, then write your data step/ input statement to read in extra fields at the end (X1, X2, X3, etc.) and then shift everything back, depending on the number of extra values found.  I can also tell from the sample, that the number of extra values can vary from row to row.

I've had to deal with this myself, not fun.

You can do this using arrays, but a simple way to show what you are doing with your sample data:

(let's say if you know you can have up to 5 values in that multiple string)

* starting with the maximum number of extra fields ;

length N_string $16 ; * (to hold 5 2-digit numbers and commas);

input A B  C etc... W X1 X2 X3 X4 ;

If X4 ^= ' ' then do ;

     * we have 5 values for N ;

     N_string = N || ',' || O || ',' || P || ',' || Q || ',' || R ;

     O = S ;

     P = T ;

     *fill in the rest as above;

     W = X4 ;

end ;

else if X3 ^= ' ' then do ;

     and so on

ballardw
Super User

Looking at the file example only, HOW do you know the problem is with column N? That makes me think you know more about the file source than you are telling here. As structured, the file is NOT CSV as mentioned data_null. In this case it is only the file extension.

If possible I would go to the source of the file and try to find out how it is created. Then work to see if the source can properly quote qualify the text.

cwilson
Calcite | Level 5

I agree that my first choice would be to go back to the entity that generated the false "csv" file, but assuming that's not possible, or the source can't do what you want, and you KNOW what the data should be, you can work around it.

If more than one column can have multiple values, then you can't work around it.

Also, this is only going to work if all the columns after the problem column (column N in the sample) are the same data type.  If you have a date column, or a text string in the columns that might be shifted, then you could lose data unless you read all columns in as string variables, and then convert the numbers.

Now that I think about it, I should have included conversions from the numbers to strings when contatenating N, O, P, etc.

AliJ
Calcite | Level 5

I recive this data from extrenal source frequently and used to work on it, the external source often forget the quantifiers(" "), so to get the correct file waste time which is critical. i-e, i want to handle this exception at my end rather wasting time oftenly to get correct data.

help is required to write a procedure/data step to address this issue when it occurs.

Regards

Ali J

DanWALDO
Calcite | Level 5

I know that this is really simplistic and clunky (I never really got out of SASv6 mode!), but perhaps it can start you down a trail. Some of the code is bloated to provide intermediate outputs, and could easily be replaced. The assumption is that column N is the field with multiple commas, and that quotation marks are always omitted.

data test;

  infile "C:\My SAS Files\sample.csv" lrecl=256 pad;

  length aa a b c d e f g h i j k l m n o p q r s t u v w $12;

  length ipt ript tipt $256.;

  input ipt $char256.;

  ript=trim(reverse(ipt));

  array field {*} a b c d e f g h i j k l m n o p q r s t u v w;

  do ii=dim(field) to 15 by -1;

    jj=index(ript,",");

    if jj=1 then do;

      field(ii)=" ";

      ript=substr(ript,2);

      end;

    else do;

      field{ii}=reverse(substr(ript,1,jj-1));

      ript=trim(substr(ript,jj+1));

      end;

    end;

  tipt=left(reverse(ript));

  do ii=1 to 13;

    jj=index(tipt,",");

    if jj=1 then do;

      field(ii)=" ";

      tipt=substr(tipt,2);

      end;

    else do;

      field{ii}=substr(tipt,1,jj-1);

      tipt=trim(substr(tipt,jj+1));

      end;

    end;

  n=trim(tipt);

  run;

AliJ
Calcite | Level 5

DanWaldo !

your approch and code was awsome and i had sucefully implemented it with some customization

data_null__
Jade | Level 19

Assuming my assumptions are right.  If this works you can add the proper INFORMATS to read the numbers and dates etc.

data csv;
   infile cards dsd column=col;
   input (a b c d e f g h i j k l m)(:$12.) @;
   column=col;
  
CALL SCAN(_infile_,9, position, length,',','mb');
   lengthN = Position - col-1;
  
input n $varying32. lengthN +1 (o p q r s t u v w)(:$12.);
   cards;
A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W
C1,6,T,45600,31-MAR-13,3800,0,0,0,3800,31-JUL-03,31-JUL-05,S,15 ,17,0,,0,0,12,0,0,0,0
C2,54,E,5000000,30-NOV-12,0,0,0,0,0,06-MAY-11,30-SEP-12,S,11 ,18,0,,0,1,0,0,0,0,0
C3,6,T,27835,31-MAR-13,15467,0,0,0,15467,04-AUG-07,04-FEB-09,S,15 ,17,0,,0,0,9,0,0,0,0
C4,54,T,1000000,31-AUG-12,999932,0,0,0,1317036,09-AUG-00,31-JUL-09,S,11 ,15 ,17,0,,0,0,5,0,0,0,0
;;;;
   run;

Message was edited by: data _null_ "+1"

cwilson
Calcite | Level 5

I love your idea, but a proc print shows that it is not quite working.

i thought perhaps the empty value of two commas together might be messing up the SCAN() (because SCAN might be calling two delimiters together as one delimiter), but it still didn't work when I added a space there.  The variable "o" is not getting populated.

data_null__
Jade | Level 19

I need to advance the pointer +1 after reading N.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2216 views
  • 8 likes
  • 8 in conversation