BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
andreas_lds
Jade | Level 19

@lbarwick wrote:
Understood but the 'creator' of the spreadsheet in this case is Jira, I get the .csv as an export and then save it as .xlsx. I tried converting all of the values in this column to text and Excel will turn 100% into 1 and 50% into 0.5. The problem with this is that there could be other rows with a legitimate value of 1 (1%) so in this conversion I cannot distinguish between what was originally 100% and what was originally 1%.

So, don't open the csv-file with excel, write a data step to read it directly. The code depends on the contents of the file, so it is hardly possible to suggest something useful.

Tom
Super User Tom
Super User

Please go support this now 3+ year old SASWARE Ballot Idea for SAS to make enhancement to allow easy processing of such text files.

 

https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-...

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @lbarwick 

 

I have encountered the same problem several times, and I have found the following approach useful. There is no conversion outside SAS.

 

1. Ged rid of the embedded line shifts. Use a data step to read the CSV file as text lines. Store the line in a retained output-line variable. Count the number of delimiters in the current line and add to the output line until it has the expected number of deliniters. Then write the output-line variable to a temporary text file, initiate the output-line variable and counter and read next line etc.

 

2.Use another data step to read the temporary file into SAS as a delimited file. I prefer to read all variables as text and convert them to numeric or dates or whatever where appropriate.

 

 

 

 

lbarwick
Quartz | Level 8
I don't suppose you have example code for this solution?
Tom
Super User Tom
Super User

@lbarwick wrote:
I don't suppose you have example code for this solution?

There are literally dozens if not hundreds of examples on this site alone.

Try a simple google search:

https://www.google.com/search?q=%40sas.com+embedded+line+breaks+in+csv+file

 

Here are a couple of solutions:

https://communities.sas.com/t5/New-SAS-User/Reading-CSV-file-with-double-quotes-line-breaks-and-spac...

https://communities.sas.com/t5/SAS-Programming/How-to-import-line-feed-or-carriage-return-values-fro...

 

SAS has even posted a note about this, but watch out as that method modifies the original file.

https://support.sas.com/kb/26/065.html

ErikLund_Jensen
Rhodochrosite | Level 12

Sure, here we go..

 

* Write a test CSV file with embedded line breaks in fields;
data _null_;
	infile cards truncover;
	file 'c:\temp\csvtest.csv';
	input;
	put _infile_;
	cards4;
101;3/11/2020;"text field";something;217.3;another field
102;;"text field with 
line break";more;;another field
103;12/1/2021;;something;25.2.3;another field
104;2/2/2021;"text field 
with line break";something;10000000;another 
field with 
line break
105;3/11/2020;"text field";something;217.3;another field
;;;;
run;

* Filename for temp file;
filename temp 'c:\temp\csvtest_cleaned.txt';

* Read CSV file and remove embedded line breaks - we expect 5 separators in each line;
data _null_;
	infile 'c:\temp\csvtest.csv' truncover lrecl=255 end=eof;
	file temp lrecl=255;
	length outstr $255;
	retain outstr;
	input;
	l = count(outstr,';');
	if l = 5 and count(_infile_,';') > 0 then do;
		put outstr;
		outstr = '';
	end;
	outstr = catx(' ',outstr,_infile_);
	if eof then put outstr;
run;

* Read cleaned CSV file - the "hard way" with full control;
data want;
	infile temp lrecl=255;
	length ID Date 8 Text1 text2 $200 Value 8 Text3 $200;
	format Date date9. Value 12.3;
	input;
	ID = input(scan(_infile_,1,';','m'),??8.);
	Date = input(scan(_infile_,2,';','m'),??mmddyy10.);
	Text1 = dequote(scan(_infile_,3,';','m'));
	Text2 = scan(_infile_,4,';','m');
	Value = input(scan(_infile_,5,';','m'),??12.3);
	Text3 = scan(_infile_,6,';','m');
run;
lbarwick
Quartz | Level 8

Here is the solution that seems to work best for my needs:

 

%let dl=C:\Users\lbarwick\Desktop;
filename fixed temp;
data _null_;
  infile "&dl/Jira_Data.csv" end=eof  LRECL=32767 truncover dsd;
  file "&dl/Jira_Data_cleaned.csv" ;
  input;
  nq+countc(_infile_,'"');
  nq=mod(nq,2);
  put _infile_ @;
  if eof or not nq then put; 
  else put '|' @;
run;

One issue I noticed is that after 693 columns or so, SAS starts to not assign names to columns so I end up with a column of data in the 'cleaned' csv with no variable name in the header.

Tom
Super User Tom
Super User

Sounds like the header row is longer than 32 K bytes.  For 2400 names that would be an average of about 12 characters per name.

 

You will need to switch to one of the methods that does not use _INFILE_ automatic variable when pre-processing the file.  There are versions out that that process the file byte by byte so they should work for any recrod length.

 

If you are using PROC IMPORT to guess at how to name the variables from the header row you might still have a problem.  I seem to remember it also could not really handle more than 32K for the length of the header row.

lbarwick
Quartz | Level 8

Here is the solution I ended up going with:

 

*read-in raw Jira csv data WITHOUT names;
%let dl=C:\Users\lbarwick\Desktop;

filename csv "&dl/Jira_Data.csv" lrecl=2000000;
proc import datafile=csv out=jiranames replace dbms=csv;
 getnames=no;
 datarow=2;
run;

*transpose generic VAR names from raw Jira import from above;
proc transpose data=jiranames(obs=0) out=names ;
 var _all_;
run;


*now that we have variable names set, read-in raw Jira data once again and this time correct issues with carriage returns and line feeds so we get
	all records/values lined up with header.;
%let dl=C:\Users\lbarwick\Desktop;
filename fixed temp;
data _null_;
  infile "&dl/Jira_Data.csv" end=eof;
  file "&dl/Jira_Data_cleaned.csv" ; *output file with carriage return/line feed issues fixed;
  input;
  nq+countc(_infile_,'"');
  nq=mod(nq,2);
  put _infile_ @;
  if eof or not nq then put; 
  else put '|' @;
run;

*import the cleaned Jira data but again, we're not going to import variable names;
filename csv 'C:\Users\lbarwick\Desktop\Jira_Data_cleaned.csv';
proc import out=prr /*(where=(substr(issue_key,1,3)='PRR'))*/
		datafile = "C:\Users\lbarwick\Desktop\Jira_Data_cleaned.csv"
		dbms=csv replace;
		delimiter=',';
		guessingrows=1000;
	getnames=no; 
	datarow=2;
run;

*create temp file which will create code to rename generic variables to the cleaned up variables;
filename code temp;
data _null_;
  file code;
  set names end=eof;
  if not eof2 then set new_names1 end=eof2;
  if _n_=1 then put 'rename';
  put _name_ '=' final_new  ;
  if eof then put ';';
run;

*replace generic variable names with cleaned up variable names;
proc datasets nolist lib=work;
  modify prr;
  %include code;
run;
quit;
Kurt_Bremser
Super User

I have written a little C utility that reads a file byte-by-byte, counts the quotes, and if mod(count,2) = 1 and a linefeed is encountered, inserts <BR> instead. Is blindingly fast and can be used as filter in a FILENAME PIPE.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 4283 views
  • 9 likes
  • 8 in conversation