BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

Do you ever needed to read delimited text (aka CSV or Comma Separated Values) files into SAS datasets?

Unhappy with the speed, accuracy, flexibility of PROC IMPORT?

I know I was.

 

I have created a %CSV2DS() macro for reading delimited text files into SAS datasets. 

 

It has a number of enhancements (or just different behavior) over how PROC IMPORT DBMS=CSV (or DBMS=DLM) behaves. Many of which I have listed in the macro header.

Differences from PROC IMPORT
- Supports header lines with more than 32,767 characters
- Supports ZIP and GZIP source files
- Generates unique variable names by adding numeric suffix
- Does not overestimate maxlength when longest value is quoted
- Does NOT force character type if all values are quoted
- Generates label when generated variable name is different than header
- Supports NAMEROW option
- Supports numeric fields with special missing values (MISSING statement)
- Does not attach unneeded informats or formats
- Allows overriding calculated metadata
- Allow using random sample of rows to guess metadata
- Generates more compact SAS code
- Generates analysis summary dataset and raw data view
- Saves generated SAS code to a file
- Forces DATE and DATETIME formats to show century
- Difference in generated V7 compatible variable names
  - Replaces adjacent non-valid characters with single underscore

Please try it out and let me know whether it works for you.

Link to raw code:  https://raw.githubusercontent.com/sasutils/macros/master/csv2ds.sas

Make sure to also get the %PARMV()  parameter validation macro it uses to check the user inputs:

https://raw.githubusercontent.com/sasutils/macros/master/parmv.sas 

 

In my tests it is able to reduce import times by up to 90% over PROC IMPORT with GUESSINGROWS=MAX.  

Some example uses:

* Convert CSV file to default dataset ;
%csv2ds("myfile.csv");
* Use FILEREF to reference the file, name output dataset ;
filename csv "myfile.csv";
%csv2ds(csv,out=test);
* Use larger LRECL for file  ;
%csv2ds("myfile.csv" lrecl=1000000);
* Convert ZIP compressed file ;
%csv2ds("myfiles.zip" zip member="myfile.csv")
* Convert GZIP compressed file ;
%csv2ds("myfile.csv.gz" zip gzip)

 

3 REPLIES 3
AllanBowe
Barite | Level 11
thanks for sharing!
/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs
Tom
Super User Tom
Super User

I have also created a %REPLACE_CRLF() macro that will attempt to fix delimited files that have embedded line breaks.  SAS has published some code for this,  https://support.sas.com/kb/26/065.html , but that code modifies to file in place.  Which in addition to having a risk of losing your data also means you can only replace the characters with a single replacement character and you cannot remove the characters.

 

Code:

%macro replace_crlf
/*----------------------------------------------------------------------------
Replace carriage return or linefeed characters that are inside quotes
----------------------------------------------------------------------------*/
(infile   /* Fileref or quoted physical name of input file */
,outfile  /* Fileref or quoted physical name of output file */
,cr='\r'  /* Replacement string for carriage return */
,lf='\n'  /* Replacement string for linefeed */
);
/*----------------------------------------------------------------------------
SAS cannot parse delimited text files that have end of line characters in the
value of a column, even if the value is quoted.  This macro will read a file
byte by byte and keep track of the number of quote characters seen. When the
number of quotes seen is odd then the location is inside of quotes.  So any
carriage return or linefeed inside quotes will be replaced with the CR or LF
parameter, respectively.

The values of CR and LF can be anything that is valid in a PUT statement.

To write nothing in place of the character just set the parameter empty:
To leave one of the characters unchanged just set the hexcode as the value:
  CR='0D'x  or LF='0A'x

Examples:

* Replace only CR characters ;
%replace_crlf('in.csv','out.csv',lf='0A'x);

* Remove CR and replace LF ;
%replace_crlf('in.csv','out.csv',cr=);

* Replace LF with pipe character and leave CD unchanged ;
%replace_crlf('in.csv','out.csv',cr='0D'x,lf='|');

* Read from ZIP file ;
%replace_crlf('myfile.zip' zip member='myfile.csv','myfile.csv')

----------------------------------------------------------------------------*/
%if 0=%length(&infile) or 0=%length(&outfile) %then %do;
  %put ERROR: Both the INFILE and OUTFILE parameters are required by &sysmacroname..;
  %put ERROR: &=infile ;
  %put ERROR: &=outfile ;
%end;
%else %do;
data _null_;
  infile &infile recfm=f lrecl=1;
  file &outfile recfm=f lrecl=1;
  input ch $char1. ;
  retain q 0;
  q=mod(q+(ch='"'),2);
  if q and ch='0D'x then put &cr ;
  else if q and ch='0A'x then put &lf ;
  else put ch $char1. ;
run;
%end;
%mend replace_crlf;
AllanBowe
Barite | Level 11

Nice.  We have a similar macro in the SASjs/core library:  https://core.sasjs.io/mp__cleancsv_8sas.html

/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 778 views
  • 5 likes
  • 2 in conversation