Hi Team, I facing issue while importing .csv file into SAS. I tried with different method but didn't succeed.
I have multiple .csv files and these files contains some columns in which double quotation was part of the values . because of this double quotation, columns got truncated ,value mismatched and getting wrong output.
Please refer below sample data and code..
data _null_;
file "&path.real_data.txt" lrecl=10000;
put '"Turner, Alfred","2A"';
put '"Anandsinh Sindha","4C"';
put '"Arundhati , "tiwari"","1A"';
run;
data test;
infile "&path.real_data.txt" dsd dlm=',' missover;
length v1 $200 v2 $20 ;
input
'V1'n :$CHAR200.
'V2'n :$CHAR20.;
run;
based on above code getting below output
In this output, In observation-3 and V1 column is break before delimiter , value is separated while it finds double quotation in string value. I want to ignore that Quotation and add full string in single column.
My desire output is mentioned below..
Please assist me to get my desire output in SAS.
Thanks in Advance.
Your csv file has not been created correctly. By rule, quotes that are part of the data have to be doubled:
put '"Arundhati , ""tiwari""","1A"';
And if quotes are not part of the data, they shouldn't be there in the first place. Return the files to have them corrected.
Hi @Kurt_Bremser ,
Thank you for your response ..
Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in SAS Code ?
@anandsindha wrote:
Hi @Kurt_Bremser ,
Thank you for your response ..
Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in SAS Code ?
Describe the "system". Most normal database systems know how to create proper CSV files, you just have to understand how to tell it that is what you want.
@anandsindha wrote:
Hi @Kurt_Bremser ,
Thank you for your response ..
Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in SAS Code ?
Have you checked to see if you have "partial" lines of data?
When such stuff is accepted for entry it is entirely likely that line delimiters (carriage return, linefeed or both) have been entered.
I would suggest checking with people maintaining that data for the SETTINGS used in their export to CSV generating system. Perhaps you could have them change the field delimiter to something that people don't enter very often like a | character.
Who created the CSV file? Have them create the file properly.
If you are creating the file with SAS then use the DSD option on the FILE statement and SAS will properly quote the values that need quotes.
Let's convert your attempt to make a file into an actual SAS dataset with some values without commas, some with commas, and some with double quotes.
data have ;
infile datalines dlm='|' truncover;
input name :$50. code :$2.;
datalines;
Turner, Alfred|2A
Anandsinh Sindha|4C
Arundhati , "tiwari"|1A
;
Now SAS can write a readable CSV file with a simple data step like:
data _null_;
set have;
file 'myfile.csv' dsd;
put name code;
run;
Which will make a file like this:
"Turner, Alfred",2A Anandsinh Sindha,4C "Arundhati , ""tiwari""",1A
Hi @Tom ,
No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.
is there any code or method available , which can replace double quoted string into actual format before reading file ?
@anandsindha wrote:
Hi @Tom ,
No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.
is there any code or method available , which can replace double quoted string into actual format before reading file ?
NO. The code you shared would create a text file that looks like this:
"Turner, Alfred","2A" "Anandsinh Sindha","4C" "Arundhati , "tiwari"","1A"
That third line is confused. It opens the quoted string and then closes it. Then has text. Then a quoted string of length zero. You might try fixing the file manually.
If (and it is a big if) you know that only one field in the line can contain embedded delimiters then you can possible fix the file by counting the delimiters. In this case treat everything after the last comma as the last field and everything before the last comma as the value of the first field.
Some systems think they should "escape" special characters instead of following normal quoting rules. So check the bad lines and see if the extra double quotes (and possible the extra delimiters) have backslash character in front of them. If so then that file can be fixed by replacing the \" with "" instead. And the resulting fixed file should be readable.
Another problem that SAS will have is with files that have the end of line characters embedded in the value of a field. That you can handle IF the field with the embedded end of line characters is quoted (and the quoting is properly done, unlike in your example) by removing/replacing any end of line characters that appear when the count of the number of double quote characters that have been seen is ODD. https://github.com/sasutils/macros/blob/master/replace_crlf.sas
@anandsindha wrote:
Hi @Tom ,
No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.
is there any code or method available , which can replace double quoted string into actual format before reading file ?
Can you share some details about how you down load these files? Are the CSV on the portal before you download or are they made by the web portal when requested?
Is the description field that causes problems in the middle of the other data columns or at the "end" of the line of data in the CSV?
These files are made by web portal when requested. these description columns are in middle of the data, so it creates problem for other data also.
In order to fix this, you would need to first make up a rule in plain language, which, if applied manually by you, results in correct data without any unwanted side-effects. If such a rule is possible, it can be translated into code; if not, then there's no way to create an automatic, code-based fix.
One possible approach: read the whole line (using the automatic _INFILE_ variable), and parsing it from front (up to the interesting field) and from back (down to the interesting field), counting along the commas. Keep the comma positions, so you can finally retrieve everything between these positions into your "description" variable with SUBSTR. But this will only work if the number of values after the interesting field is always the same in all lines.
But really, the operators of the portal need to fix this. Either reject any input containing quotes, or run a simple routine over it that duplicates any quotes in the value. Just a matter of being professional and taking your job seriously.
If these are free-text fields where users can enter "anything" and when downloading the data it doesn't create a clean .csv format then you need to try if you can define a character as delimiter that's highly unlikely to also get used by users in their free-text. Or if there are other formats like json available then consider using these.
Below code works for your sample data but there is a big chance that you've got other cases in your actual data that won't be covered.
%let path=%sysfunc(pathname(work));
data _null_;
file "&path.\have_data.txt" lrecl=10000;
put '"Turner, Alfred","2A"';
put '"Anandsinh Sindha","4C"';
put '"Arundhati , "tiwari"","1A"';
run;
data _null_;
file "&path.\need_data.txt" lrecl=10000;
infile "&path.\have_data.txt" lrecl=10000;
input;
/*_infile_=prxchange('s/(")([^"]*?)("")(,")/$3$2$1$4/oi',-1,trim(_infile_));*/
_infile_=prxchange('s/"([^"]*?)"","/$1","/oi',-1,trim(_infile_));
put _infile_;
run;
data work.want;
infile "&path.\need_data.txt" dlm=',' truncover dsd lrecl=10000;
input var1:$40. var2:$40.;
run;
proc print data=want;
run;
how about:
data _null_;
file "R:/real_data.txt" lrecl=10000;
put '"Turner, Alfred","2A"';
put '"Anandsinh Sindha","4C"';
put '"Arundhati , "tiwari"","1A"';
run;
data test;
dlm=",";
infile "R:/real_data.txt" dlm=dlm truncover;
input @;
_infile_=left(reverse(_infile_));
input V2 :$20. @@;
dlm='0a'x;
input V1 ~$200.;
v1 = left(reverse(v1));
v1 = substr(v1,2,length(v1)-2);
v2 = dequote(left(reverse(v2)));
run;
proc print;
run;
Bart
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.