BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xyxu
Quartz | Level 8

The CSV file has many columns, and 2 of these columns' values are text (short paragraphs). I am using 

delimiter = ',' 

It seems SAS recognizes any comma inside the paragraphs as a delimiter and reports an error. Is there a good way to properly import such files?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You have a line feed (LF) in your data. Add the following to your Infile statement and see if this helps: TERMSTR=CRLF

View solution in original post

11 REPLIES 11
ballardw
Super User

The infile option DSD should allow you to read the data if the value in the data is inside quotes.

 

If the values contain quotes in the body you may have issues though.

xyxu
Quartz | Level 8
I used DSD option, but still get errors when SAS goes over a comma inside the string.
Patrick
Opal | Level 21

The strings with commas need to be quoted. If so then below two options should work (for the Infile statement DSD adds the "magic" you're after).

%let path=%sysfunc(pathname(work));
data _null_;
  file "&path/text.csv";
  put 'col1,col2,col3';
  put '1,"ab, cd", xyz';
  stop;
run;

/* using proc import */
proc import 
  file="&path/text.csv"
  out=want1
  dbms=csv;
  delimiter = ",";
  getnames=yes;
  guessingrows=max;
run;

proc print data=want1;
run;

/* using data step */
data want2;
  infile "&path/text.csv" dsd truncover firstobs=2;
  informat col1 best32. col2 col3 $10.;
  input col1 col2 col3;
run;

proc print data=want2;
run;

 

Patrick_0-1591475798353.png

 

xyxu
Quartz | Level 8

Thanks! Your solution helps me realize that the problem I am getting may be different from what I initially thought. Please see attached for a one-row example. I got errors when using the following code:

data want;
	infile "&my_path\test2.csv" dsd truncover firstobs = 2;
	informat 
		NUMBER 10.
		Current_Incremental_Terms Current_Covenants Current_Other $100.
		EBITDA $10.
		FirstLienLev TotalLev 10.
		Equity_perc percent5.2
		Revolver $10.
		Tla Concurrent_Bonds 10.
		Purpose_comment $50.;
	input NUMBER Current_Incremental_Terms	Current_Covenants Current_Other EBITDA FirstLienLev TotalLev Equity_perc Revolver Tla Concurrent_Bonds Purpose_comment;
run;

While the long strings are indeed quoted in the csv file, SAS still breaks the string between commas. Could you tell what's causing the error? 

Patrick
Opal | Level 21

You have a line feed (LF) in your data. Add the following to your Infile statement and see if this helps: TERMSTR=CRLF

xyxu
Quartz | Level 8

That works! After that, the remaining errors are due to data formats (numerical value = "N/A" when it should indeed be missing). So it should be fine now. Thank you!

Patrick
Opal | Level 21

You probably shouldn't mark your own post as solution. That's normally not very appreciated here in the forums.

For the N/A issue and if you just want this to become a missing use the ??. Done in below code for Concurrent_Bonds as there was another issue in your data.

 

data want;
	infile "~/test/test2.csv" dsd truncover firstobs = 2 termstr=crlf;
	informat 
		NUMBER 10.
		Current_Incremental_Terms Current_Covenants Current_Other $100.
		EBITDA $10.
		FirstLienLev TotalLev 10.
		Equity_perc percent5.2
		Revolver $10.
		Tla Concurrent_Bonds 10.
		Purpose_comment $50.;
	input NUMBER Current_Incremental_Terms	Current_Covenants 
    Current_Other EBITDA FirstLienLev 
    TotalLev Equity_perc Revolver Tla 
    Concurrent_Bonds ?? best32.
    Purpose_comment
    ;
run;

proc print data=want;
run;

 

 

xyxu
Quartz | Level 8

That was a mis-click. Already corrected. Thanks again! 

ballardw
Super User

@xyxu wrote:

That works! After that, the remaining errors are due to data formats (numerical value = "N/A" when it should indeed be missing). So it should be fine now. Thank you!


Depending on the actual desired informat you can create a custom informat using the Proc Format invalue statement to read the N/A "cleanly" as missing. By clean I mean that there will not be any "invalid data" messages.

Example:

proc format ;
invalue mybest (upcase)
"N/A"=.
other = best8.    /*<= informat here should be what you want for actual values*/
;
run;

data example;
   informat x mybest. y best8.;
   input x y;
datalines;
1 1
n/a n/a
N/a 3
n/A 4
345.5  16
;

The Invalue statement names the informat you are creating, the option (UPCASE) turns any text into upper case before comparing to the list of values, otherwise it is an exact comparison.

Notice in the example data I only assigned the informat to one of the variables to show that you still get the invalid data for the variable without the format but the X is not causing the messages.

 

Going one step further with the informats if the variables are supposed to contain only a certain range of values, such as survey question response codes of 1,2,3,4 and 5 you can validate the data as it is read. This can get tricky with continuous variables because the left side  of the definitions should be explicit text values. Here's a brief example.

proc format ;
invalue lickert
'1','2','3','4','5'= f1.
' ','.'  = .
other = _error_
;
run;

data example;
   infile datalines truncover;
   informat x lickert.;
   input x ;
datalines;
1 
2
.
 
99
5
;

Note the two ways to mention "missing" for input. Also, if your actual data had a numeric code, such as 99 to indicate missing, you might want to assign missing when read as well. Example for the interested reader: modify the informat LIckert to do that.

Bonus points for looking up "special missing" and assigning a special missing value to indicate that a missing code was encountered and not just a blank column in the data.

 

 

Tom
Super User Tom
Super User

Some notes on creating an INFORMAT to handle NA, N/A, null and other text to replace missing values.

You could have the N/A text converted to a special missing if you need to distinguish it from the normal missing or empty value.

To have the informat just use the normal input method you can use the keyword _SAME_ on the right of the equal sign.  

To specify the nested use of another informat use square brackets, [ ], around the informat name on the right of the equal sign.  There is no need to use the BEST alias for the normal numeric informat, what the documentation calls the w.d informat.  If you want to use an alias it might be clearer to use F instead BEST, it is not clear what "best" would mean in terms of an informat anyway.

proc format ;
invalue na (upcase)
  'N/A','NA' = .N
  other = _same_
;
run;

data test;
  input x na32. ;
  put x= ' ' _infile_;
cards;
12
0.5
n/A
NA
.
12345
;

proc format ;
invalue na_comma (upcase)
  'N/A','NA' = .N
  other = [comma.]
;
run;

data test;
  input x na_comma32. ;
  put x= ' ' _infile_;
cards;
12
0.5
n/A
NA
.
12,345
;
  
ballardw
Super User

You can post text without it getting reformatted by the message windows by copying the line from a text editor like NotePad, WordPad or similar. Then open a code box on the forum using the </> and paste.

So your attachment looks like:

 

NUMBER,Current Incremental Terms,Current Covenants,Current Other,EBITDA,FirstLienLev,TotalLev,Equity %,Revolver,Tla,Concurrent Bonds,Purpose comment
2413,"Incremental facilities allow leverage to increase to 5.5x net first-lien, 7.25x secured, and 7.75x net total through unsecured debt, which could either cushion against an EBITDA decline or allow incremental debt; company has access to $165 million F&C beyond the ratio debt, and those loans would be governed by 75 bps of MFN protection, subject to a six month sunset
",,"Privately-placed $355M SLTL; 24 months of portability, to a financial sponsor with at least $1 billion of AUM  leverage is capped at 5.5x net first-lien and 7.75x net total; the other portability requirement is that the sponsor contribute 35% equity to the transaction.",,5.1,7.2,,75,,`,Fund a $113M dividend and refinance existing debt

And your actual issue is a line feed, carriage return (or both) depending on operating system and what program had the data entry job.

 

 

There is a chance, depending on which actual characters are appearing, that you could use the INFILE option TERMSTR=CRLF. If the character causing the issue in the middle of the text is a single line feed or single carriage return the option CRLF requires both a carriage return and line feed character to end the line. However if the actual data entry was done in some application that allowed both characters (use of Enter Key for example) when entering data that may not work.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3624 views
  • 2 likes
  • 4 in conversation