BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dhana
Fluorite | Level 6

I have a millions of records like this (comma separated records in text file)

000000000123456789,30,ABC,2010-09-21,JANUDG_BUS2,BH1 MK1,4,201111,100,,78562561,Correct Char,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,

000000000987654321,20,ZXF,2011-09-21,GNYUDG_BUS2,BY1 MK1,4,201111,100,,78562561,In SAS Version 7 and beyond, SAS provides new date formats that allow

you to specify the type of separator, or no separator when writing out date values,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,

I am able to read the first record with out any problem I am able to read it with the help of DSD MISSOVER TRUNCOVER options.

But if you look at the second record the highlighted character value coming with multiple commas with in the character variable, Can anyone help me to read these kind of record.

Thanks

Dhanasekaran R

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

The following works with your sample records:

data want (drop=_:);

  infile "c:\art\stuff.txt" lrecl=500;

  input;

  format var4 date9.;

  var1=scan(_infile_,1,',');

  var2=input(scan(_infile_,2,','),12.);

  var3=scan(_infile_,3,',');

  var4=input(scan(_infile_,4,','),yymmdd10.);

  var5=scan(_infile_,5,',');

  var6=scan(_infile_,6,',');

  var7=input(scan(_infile_,7,','),12.);

  var8=input(scan(_infile_,8,','),12.);

  var9=input(scan(_infile_,9,','),12.);

  var10=input(scan(_infile_,10,','),12.);

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  var12=input(scan(_infile_,-9,','),12.);

  var13=input(scan(_infile_,-8,','),12.);

  var14=scan(_infile_,-7,',');

  var15=scan(_infile_,-6,',');

  var16=input(scan(_infile_,-5,','),12.);

  var17=input(scan(_infile_,-4,','),12.);

  var18=scan(_infile_,-3,',');

  var19=input(scan(_infile_,-2,','),12.);

  var20=input(scan(_infile_,-1,','),12.);

run;

View solution in original post

9 REPLIES 9
art297
Opal | Level 21

If the string in question had been in quotes, I don't think you would have a problem.

However, if it is just one field, you could read each record as a string, and then use the scan function to capture all of the other variables from left to right, and then from right to left, and the remainder as the field you are currently having a problem with.

I'm not going to write the code unless you confirm that the above scenario appears to be the case, and provide a full set of three to five such records for testing purposes.

Of course, if you can do it on your own, even better.

dhana
Fluorite | Level 6

Hi Art,

Thanks for your suggestion. I have attached a file with some sample records.

Can you please help me to read this.

art297
Opal | Level 21

The following works with your sample records:

data want (drop=_:);

  infile "c:\art\stuff.txt" lrecl=500;

  input;

  format var4 date9.;

  var1=scan(_infile_,1,',');

  var2=input(scan(_infile_,2,','),12.);

  var3=scan(_infile_,3,',');

  var4=input(scan(_infile_,4,','),yymmdd10.);

  var5=scan(_infile_,5,',');

  var6=scan(_infile_,6,',');

  var7=input(scan(_infile_,7,','),12.);

  var8=input(scan(_infile_,8,','),12.);

  var9=input(scan(_infile_,9,','),12.);

  var10=input(scan(_infile_,10,','),12.);

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  var12=input(scan(_infile_,-9,','),12.);

  var13=input(scan(_infile_,-8,','),12.);

  var14=scan(_infile_,-7,',');

  var15=scan(_infile_,-6,',');

  var16=input(scan(_infile_,-5,','),12.);

  var17=input(scan(_infile_,-4,','),12.);

  var18=scan(_infile_,-3,',');

  var19=input(scan(_infile_,-2,','),12.);

  var20=input(scan(_infile_,-1,','),12.);

run;

dhana
Fluorite | Level 6

Art this is really excellent...! I didn't think in this way...!

Thanks a lot ...!

Regards

Dhana

art297
Opal | Level 21

Dhana,  I've been known to think in unusual ways Smiley Happy  I'm sure that the code could be optimized, but you get the general idea.

Linlin
Lapis Lazuli | Level 10

Hi Art,

I borrowed your code.  Thank you!

data want (drop=i j);

infile "\\your location\samplerecord.txt" lrecl=500;

input;

  array _v(10) $ 80 var1-var10;

    do i=1 to 10;

       _v{i}=scan(_infile_,i,',');

    end;

  call scan(_infile_, 11, _sposition, _slength, ',');

  call scan(_infile_, -9, _eposition, _elength, ',');

  var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);

  array _x(12:20) $ 80 var12-var20;

  do i=12 to 20;

  j=i-21;

  _x(i)=scan(_infile_,j,',');

  end;

run;

Linlin

LarryWorley
Fluorite | Level 6

Dhana,

How about having the file regenerated with some other character as the delimiter, say a tab or obscure character such as '|' or '~'? 

A well behaved file generating application should not use characters which are embedded in data strings as delimiters. 

I know this is a large file, but if this issue will be repeated in the future, it would be nice to have the data in a format that any general data application could read easily and correctly.  To see how this is a problem for other applications, try reading your 5 line example file into Excel.

Will new files be generated in the future or is it easy to get this file re-generated?  If so I strongly suggest fixing at the file generation level rather than as you read into SAS.  Even if this is a one of kind problem, it may be easier to fix and may require fewer computational resources than the relatively complicate scan solution.

Art's solution will work if it is only the eleventh field which contains embedded commas,  but if embedded commas are encountered in other fields those will need to be fixed also.

dhana
Fluorite | Level 6

You are exactly correct Larry. Thanks for your valuable input.

But unfortunately the file comes from a third party vendor, which we dont have control on them.

Regards

Dhana

Tom
Super User Tom
Super User

Can you fire the third party vendor?  Or withhold payment until they produce the data in a usable format?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2764 views
  • 0 likes
  • 5 in conversation