BookmarkSubscribeRSS Feed
anandsindha
Calcite | Level 5

Hi Team, I facing issue while importing .csv file into SAS. I tried with different method but didn't succeed.

 

I have multiple .csv files and these files contains some columns in which double quotation was part of the values . because of this double quotation, columns got truncated ,value mismatched  and getting wrong output. 

 

Please refer below sample data and code..

 

data _null_;
file "&path.real_data.txt" lrecl=10000;
put '"Turner, Alfred","2A"';
put '"Anandsinh Sindha","4C"';
put '"Arundhati , "tiwari"","1A"';
run;

 

data test;
infile "&path.real_data.txt" dsd dlm=',' missover;
length v1 $200 v2 $20 ;
input
'V1'n :$CHAR200.
'V2'n :$CHAR20.;
run;

 

based on above code getting below output

 

image.png

 

In this output, In observation-3  and V1 column is break before delimiter , value is separated while it finds double quotation in string value. I want to ignore that Quotation and add full string in single column. 

 

My desire output is mentioned below.. 

 

anandsindha_0-1711527458720.png

 

Please assist me to get my desire output in SAS.

 

Thanks in Advance.

12 REPLIES 12
Kurt_Bremser
Super User

Your csv file has not been created correctly. By rule, quotes that are part of the data have to be doubled:

put '"Arundhati , ""tiwari""","1A"';

And if quotes are not part of the data, they shouldn't be there in the first place. Return the files to have them corrected.

anandsindha
Calcite | Level 5

Hi @Kurt_Bremser ,

 

Thank you for your response ..

 

Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in  SAS Code ?

Tom
Super User Tom
Super User

@anandsindha wrote:

Hi @Kurt_Bremser ,

 

Thank you for your response ..

 

Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in  SAS Code ?


Describe the "system".  Most normal database systems know how to create proper CSV files, you just have to understand how to tell it that is what you want.

ballardw
Super User

@anandsindha wrote:

Hi @Kurt_Bremser ,

 

Thank you for your response ..

 

Actually these are system generated files, and columns are human written like comments or suggestion or ideas, etc. so I can't correct them from source side. Is there any way to handle this kind of problem with in  SAS Code ?


Have you checked to see if you have "partial" lines of data?

When such stuff is accepted for entry it is entirely likely that line delimiters (carriage return, linefeed or both) have been entered.

 

I would suggest checking with people maintaining that data for the SETTINGS used in their export to CSV generating system. Perhaps you could have them change the field delimiter to something that people don't enter very often like a | character.

Tom
Super User Tom
Super User

Who created the CSV file?  Have them create the file properly.

If you are creating the file with SAS then use the DSD option on the FILE statement and SAS will properly quote the values that need quotes.

Let's convert your attempt to make a file into an actual SAS dataset with some values without commas, some with commas, and some with double quotes.

data have ;
  infile datalines dlm='|' truncover;
  input name :$50. code :$2.;
datalines;
Turner, Alfred|2A
Anandsinh Sindha|4C
Arundhati , "tiwari"|1A
;

Now SAS can write a readable CSV file with a simple data step like:

data _null_;
  set have;
  file 'myfile.csv' dsd;
  put name code;
run;

Which will make a file like this:

"Turner, Alfred",2A
Anandsinh Sindha,4C
"Arundhati , ""tiwari""",1A
anandsindha
Calcite | Level 5

Hi @Tom ,

 

No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.

 

is there any code or method available , which can replace double quoted string into actual format before reading file ? 

Tom
Super User Tom
Super User

@anandsindha wrote:

Hi @Tom ,

 

No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.

 

is there any code or method available , which can replace double quoted string into actual format before reading file ? 


NO.   The code you shared would create a text file that looks like this:

"Turner, Alfred","2A"
"Anandsinh Sindha","4C"
"Arundhati , "tiwari"","1A"

That third line is confused.  It opens the quoted string and then closes it.  Then has text. Then a quoted string of length zero.  You might try fixing the file manually.

 

If (and it is a big if) you know that only one field in the line can contain embedded delimiters then you can possible fix the file by counting the delimiters.  In this case treat everything after the last comma as the last field and everything before the last comma as the value of the first field.

 

Some systems think they should "escape" special characters instead of following normal quoting rules.  So check the bad lines and see if the extra double quotes (and possible the extra delimiters) have backslash character in front of them.  If so then that file can be fixed by replacing the \" with "" instead.  And the resulting fixed file should be readable.

 

Another problem that SAS will have is with files that have the end of line characters embedded in the value of a field. That you can handle IF the field with the embedded end of line characters is quoted (and the quoting is properly done, unlike in your example) by removing/replacing any end of line characters that appear when the count of the number of double quote characters that have been seen is ODD. https://github.com/sasutils/macros/blob/master/replace_crlf.sas

ballardw
Super User

@anandsindha wrote:

Hi @Tom ,

 

No, I m not creating these file. Actually we are downloading these files from web portal, this portal has functionality to collect the data and by batch job we are downloaded in our space and it's not generated by SAS.

 

is there any code or method available , which can replace double quoted string into actual format before reading file ? 


Can you share some details about how you down load these files? Are the CSV on the portal before you download or are they made by the web portal when requested?

Is the description field that causes problems in the middle of the other data columns or at the "end" of the line of data in the CSV?

 

anandsindha
Calcite | Level 5

These files are made by web portal when requested. these description columns are in middle of the data, so it creates problem for other data also.

Kurt_Bremser
Super User

In order to fix this, you would need to first make up a rule in plain language, which, if applied manually by you, results in correct data without any unwanted side-effects. If such a rule is possible, it can be translated into code; if not, then there's no way to create an automatic, code-based fix.

 

One possible approach: read the whole line (using the automatic _INFILE_ variable), and parsing it from front (up to the interesting field) and from back (down to the interesting field), counting along the commas. Keep the comma positions, so you can finally retrieve everything between these positions into your "description" variable with SUBSTR. But this will only work if the number of values after the interesting field is always the same in all lines.

 

But really, the operators of the portal need to fix this. Either reject any input containing quotes, or run a simple routine over it that duplicates any quotes in the value. Just a matter of being professional and taking your job seriously.

Patrick
Opal | Level 21

If these are free-text fields where users can enter "anything" and when downloading the data it doesn't create a clean .csv format then you need to try if you can define a character as delimiter that's highly unlikely to also get used by users in their free-text. Or if there are other formats like json available then consider using these.

Below code works for your sample data but there is a big chance that you've got other cases in your actual data that won't be covered.

%let path=%sysfunc(pathname(work));

data _null_;
  file "&path.\have_data.txt" lrecl=10000;
  put '"Turner, Alfred","2A"';
  put '"Anandsinh Sindha","4C"';
  put '"Arundhati , "tiwari"","1A"';
run;

data _null_;
  file "&path.\need_data.txt" lrecl=10000;
  infile "&path.\have_data.txt" lrecl=10000;
  input;
  /*_infile_=prxchange('s/(")([^"]*?)("")(,")/$3$2$1$4/oi',-1,trim(_infile_));*/
  _infile_=prxchange('s/"([^"]*?)"","/$1","/oi',-1,trim(_infile_));
  put _infile_;
run;

data work.want;
  infile "&path.\need_data.txt" dlm=',' truncover dsd lrecl=10000;
  input var1:$40. var2:$40.;
run;

proc print data=want;
run;

Patrick_0-1711596499704.png

 

yabwon
Onyx | Level 15

how about:

data _null_;
file "R:/real_data.txt" lrecl=10000;
put '"Turner, Alfred","2A"';
put '"Anandsinh Sindha","4C"';
put '"Arundhati , "tiwari"","1A"';
run;

 

data test;
  dlm=",";
  infile "R:/real_data.txt" dlm=dlm truncover;
  input @;
  _infile_=left(reverse(_infile_));
  input V2 :$20. @@;
  dlm='0a'x;
  input V1 ~$200.;
  v1 = left(reverse(v1));
  v1 = substr(v1,2,length(v1)-2);
  v2 = dequote(left(reverse(v2)));
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 674 views
  • 0 likes
  • 6 in conversation