DATA Step, Macro, Functions and more

Reading comma Delimited File

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

Reading comma Delimited File

I have a millions of records like this (comma separated records in text file)

000000000123456789,30,ABC,2010-09-21,JANUDG_BUS2,BH1 MK1,4,201111,100,,78562561,Correct Char,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,

000000000987654321,20,ZXF,2011-09-21,GNYUDG_BUS2,BY1 MK1,4,201111,100,,78562561,In SAS Version 7 and beyond, SAS provides new date formats that allow

you to specify the type of separator, or no separator when writing out date values,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,

I am able to read the first record with out any problem I am able to read it with the help of DSD MISSOVER TRUNCOVER options.

But if you look at the second record the highlighted character value coming with multiple commas with in the character variable, Can anyone help me to read these kind of record.

Thanks

Dhanasekaran R


Accepted Solutions
Solution
‎01-10-2012 02:27 PM
PROC Star
Posts: 7,492

Re: Reading comma Delimited File

The following works with your sample records:

data want (drop=_Smiley Happy;

  infile "c:\art\stuff.txt" lrecl=500;

  input;

  format var4 date9.;

  var1=scan(_infile_,1,',');

  var2=input(scan(_infile_,2,','),12.);

  var3=scan(_infile_,3,',');

  var4=input(scan(_infile_,4,','),yymmdd10.);

  var5=scan(_infile_,5,',');

  var6=scan(_infile_,6,',');

  var7=input(scan(_infile_,7,','),12.);

  var8=input(scan(_infile_,8,','),12.);

  var9=input(scan(_infile_,9,','),12.);

  var10=input(scan(_infile_,10,','),12.);

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  var12=input(scan(_infile_,-9,','),12.);

  var13=input(scan(_infile_,-8,','),12.);

  var14=scan(_infile_,-7,',');

  var15=scan(_infile_,-6,',');

  var16=input(scan(_infile_,-5,','),12.);

  var17=input(scan(_infile_,-4,','),12.);

  var18=scan(_infile_,-3,',');

  var19=input(scan(_infile_,-2,','),12.);

  var20=input(scan(_infile_,-1,','),12.);

run;

View solution in original post


All Replies
PROC Star
Posts: 7,492

Reading comma Delimited File

If the string in question had been in quotes, I don't think you would have a problem.

However, if it is just one field, you could read each record as a string, and then use the scan function to capture all of the other variables from left to right, and then from right to left, and the remainder as the field you are currently having a problem with.

I'm not going to write the code unless you confirm that the above scenario appears to be the case, and provide a full set of three to five such records for testing purposes.

Of course, if you can do it on your own, even better.

Frequent Contributor
Posts: 75

Re: Reading comma Delimited File

Hi Art,

Thanks for your suggestion. I have attached a file with some sample records.

Can you please help me to read this.

Attachment
Solution
‎01-10-2012 02:27 PM
PROC Star
Posts: 7,492

Re: Reading comma Delimited File

The following works with your sample records:

data want (drop=_Smiley Happy;

  infile "c:\art\stuff.txt" lrecl=500;

  input;

  format var4 date9.;

  var1=scan(_infile_,1,',');

  var2=input(scan(_infile_,2,','),12.);

  var3=scan(_infile_,3,',');

  var4=input(scan(_infile_,4,','),yymmdd10.);

  var5=scan(_infile_,5,',');

  var6=scan(_infile_,6,',');

  var7=input(scan(_infile_,7,','),12.);

  var8=input(scan(_infile_,8,','),12.);

  var9=input(scan(_infile_,9,','),12.);

  var10=input(scan(_infile_,10,','),12.);

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  var12=input(scan(_infile_,-9,','),12.);

  var13=input(scan(_infile_,-8,','),12.);

  var14=scan(_infile_,-7,',');

  var15=scan(_infile_,-6,',');

  var16=input(scan(_infile_,-5,','),12.);

  var17=input(scan(_infile_,-4,','),12.);

  var18=scan(_infile_,-3,',');

  var19=input(scan(_infile_,-2,','),12.);

  var20=input(scan(_infile_,-1,','),12.);

run;

Frequent Contributor
Posts: 75

Re: Reading comma Delimited File

Art this is really excellent...! I didn't think in this way...!

Thanks a lot ...!

Regards

Dhana

PROC Star
Posts: 7,492

Re: Reading comma Delimited File

Dhana,  I've been known to think in unusual ways Smiley Happy  I'm sure that the code could be optimized, but you get the general idea.

Super Contributor
Posts: 1,636

Re: Reading comma Delimited File

Hi Art,

I borrowed your code.  Thank you!

data want (drop=i j);

infile "\\your location\samplerecord.txt" lrecl=500;

input;

  array _v(10) $ 80 var1-var10;

    do i=1 to 10;

       _v{i}=scan(_infile_,i,',');

    end;

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  array _x(12:20) $ 80 var12-var20;

  do i=12 to 20;

  j=i-21;

  _x(i)=scan(_infile_,j,',');

  end;

run;

Linlin

Frequent Contributor
Posts: 129

Re: Reading comma Delimited File

Dhana,

How about having the file regenerated with some other character as the delimiter, say a tab or obscure character such as '|' or '~'? 

A well behaved file generating application should not use characters which are embedded in data strings as delimiters. 

I know this is a large file, but if this issue will be repeated in the future, it would be nice to have the data in a format that any general data application could read easily and correctly.  To see how this is a problem for other applications, try reading your 5 line example file into Excel.

Will new files be generated in the future or is it easy to get this file re-generated?  If so I strongly suggest fixing at the file generation level rather than as you read into SAS.  Even if this is a one of kind problem, it may be easier to fix and may require fewer computational resources than the relatively complicate scan solution.

Art's solution will work if it is only the eleventh field which contains embedded commas,  but if embedded commas are encountered in other fields those will need to be fixed also.

Frequent Contributor
Posts: 75

Re: Reading comma Delimited File

Posted in reply to LarryWorley

You are exactly correct Larry. Thanks for your valuable input.

But unfortunately the file comes from a third party vendor, which we dont have control on them.

Regards

Dhana

Super User
Super User
Posts: 7,081

Re: Reading comma Delimited File

Can you fire the third party vendor?  Or withhold payment until they produce the data in a usable format?

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 442 views
  • 0 likes
  • 5 in conversation