I have a millions of records like this (comma separated records in text file)
000000000123456789,30,ABC,2010-09-21,JANUDG_BUS2,BH1 MK1,4,201111,100,,78562561,Correct Char,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,
000000000987654321,20,ZXF,2011-09-21,GNYUDG_BUS2,BY1 MK1,4,201111,100,,78562561,In SAS Version 7 and beyond, SAS provides new date formats that allow
you to specify the type of separator, or no separator when writing out date values,987654321,1112131415,FEE,VEEFEE_BUS2,456109231,1011,LMS,1,0,
I am able to read the first record with out any problem I am able to read it with the help of DSD MISSOVER TRUNCOVER options.
But if you look at the second record the highlighted character value coming with multiple commas with in the character variable, Can anyone help me to read these kind of record.
Thanks
Dhanasekaran R
The following works with your sample records:
data want (drop=_:);
infile "c:\art\stuff.txt" lrecl=500;
input;
format var4 date9.;
var1=scan(_infile_,1,',');
var2=input(scan(_infile_,2,','),12.);
var3=scan(_infile_,3,',');
var4=input(scan(_infile_,4,','),yymmdd10.);
var5=scan(_infile_,5,',');
var6=scan(_infile_,6,',');
var7=input(scan(_infile_,7,','),12.);
var8=input(scan(_infile_,8,','),12.);
var9=input(scan(_infile_,9,','),12.);
var10=input(scan(_infile_,10,','),12.);
call scan(_infile_, 11, _sposition, _slength, ',');
call scan(_infile_, -9, _eposition, _elength, ',');
var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);
var12=input(scan(_infile_,-9,','),12.);
var13=input(scan(_infile_,-8,','),12.);
var14=scan(_infile_,-7,',');
var15=scan(_infile_,-6,',');
var16=input(scan(_infile_,-5,','),12.);
var17=input(scan(_infile_,-4,','),12.);
var18=scan(_infile_,-3,',');
var19=input(scan(_infile_,-2,','),12.);
var20=input(scan(_infile_,-1,','),12.);
run;
If the string in question had been in quotes, I don't think you would have a problem.
However, if it is just one field, you could read each record as a string, and then use the scan function to capture all of the other variables from left to right, and then from right to left, and the remainder as the field you are currently having a problem with.
I'm not going to write the code unless you confirm that the above scenario appears to be the case, and provide a full set of three to five such records for testing purposes.
Of course, if you can do it on your own, even better.
Hi Art,
Thanks for your suggestion. I have attached a file with some sample records.
Can you please help me to read this.
The following works with your sample records:
data want (drop=_:);
infile "c:\art\stuff.txt" lrecl=500;
input;
format var4 date9.;
var1=scan(_infile_,1,',');
var2=input(scan(_infile_,2,','),12.);
var3=scan(_infile_,3,',');
var4=input(scan(_infile_,4,','),yymmdd10.);
var5=scan(_infile_,5,',');
var6=scan(_infile_,6,',');
var7=input(scan(_infile_,7,','),12.);
var8=input(scan(_infile_,8,','),12.);
var9=input(scan(_infile_,9,','),12.);
var10=input(scan(_infile_,10,','),12.);
call scan(_infile_, 11, _sposition, _slength, ',');
call scan(_infile_, -9, _eposition, _elength, ',');
var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);
var12=input(scan(_infile_,-9,','),12.);
var13=input(scan(_infile_,-8,','),12.);
var14=scan(_infile_,-7,',');
var15=scan(_infile_,-6,',');
var16=input(scan(_infile_,-5,','),12.);
var17=input(scan(_infile_,-4,','),12.);
var18=scan(_infile_,-3,',');
var19=input(scan(_infile_,-2,','),12.);
var20=input(scan(_infile_,-1,','),12.);
run;
Art this is really excellent...! I didn't think in this way...!
Thanks a lot ...!
Regards
Dhana
Dhana,  I've been known to think in unusual ways  I'm sure that the code could be optimized, but you get the general idea.
  I'm sure that the code could be optimized, but you get the general idea.
Hi Art,
I borrowed your code. Thank you!
data want (drop=i j);
infile "\\your location\samplerecord.txt" lrecl=500;
input;
array _v(10) $ 80 var1-var10;
do i=1 to 10;
_v{i}=scan(_infile_,i,',');
end;
call scan(_infile_, 11, _sposition, _slength, ',');
call scan(_infile_, -9, _eposition, _elength, ',');
var11 = substrn(_infile_, _sposition, _eposition-_sposition-1);
array _x(12:20) $ 80 var12-var20;
do i=12 to 20;
j=i-21;
_x(i)=scan(_infile_,j,',');
end;
run;
Linlin
Dhana,
How about having the file regenerated with some other character as the delimiter, say a tab or obscure character such as '|' or '~'?
A well behaved file generating application should not use characters which are embedded in data strings as delimiters.
I know this is a large file, but if this issue will be repeated in the future, it would be nice to have the data in a format that any general data application could read easily and correctly. To see how this is a problem for other applications, try reading your 5 line example file into Excel.
Will new files be generated in the future or is it easy to get this file re-generated? If so I strongly suggest fixing at the file generation level rather than as you read into SAS. Even if this is a one of kind problem, it may be easier to fix and may require fewer computational resources than the relatively complicate scan solution.
Art's solution will work if it is only the eleventh field which contains embedded commas, but if embedded commas are encountered in other fields those will need to be fixed also.
You are exactly correct Larry. Thanks for your valuable input.
But unfortunately the file comes from a third party vendor, which we dont have control on them.
Regards
Dhana
Can you fire the third party vendor? Or withhold payment until they produce the data in a usable format?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
