<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Optimized way to read variable length csv file in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745454#M233659</link>
    <description>&lt;P&gt;Thanks &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;, that worked perfectly. It took around 2 minutes to read the data, and the dataset was compressed to only 1 GB.&lt;/P&gt;&lt;P&gt;I just need to verify whether reading this compressed dataset in other data steps increases processing time.&lt;/P&gt;</description>
    <pubDate>Thu, 03 Jun 2021 13:02:54 GMT</pubDate>
    <dc:creator>hhh123</dc:creator>
    <dc:date>2021-06-03T13:02:54Z</dc:date>
    <item>
      <title>Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745435#M233652</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a comma-delimited csv file with around 240 columns. These columns are of variable length, so I have used PROC IMPORT (as below) to read the file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC IMPORT OUT= WORK.&amp;amp;table_name. DATAFILE= holdit DBMS=DLM REPLACE;
				DELIMITER="&amp;amp;delimiter.";
				GETNAMES=YES;
				DATAROW=2;
				GUESSINGROWS=MAX;
			RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works fine for smaller amounts of data, but reading a csv file with 2 million records takes around 1 hour, and this is because of &lt;STRONG&gt;GUESSINGROWS=MAX.&amp;nbsp;&lt;/STRONG&gt;Since the data is of variable length I am forced to use &lt;STRONG&gt;MAX.&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I took another approach: an infile/input statement that defines the maximum width of each column.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data WORK.&amp;amp;wrk_table. ;
infile &amp;amp;var_fref delimiter = "&amp;amp;delimiter" MISSOVER DSD lrecl=32767 firstobs=2 ;
attrib MEMBER_ID informat=$32. format=$32.;
/*prop_list - Contains list of other column names*/
%do p=1 %to %sysfunc(countw(&amp;amp;prop_list));
	attrib %scan(&amp;amp;prop_list,&amp;amp;p)  informat=$255. format=$255.;
%end;
input
/*inp_str - Contains list of all input variable names*/
%do k=1 %to %sysfunc(countw(&amp;amp;inp_str));
	%scan(&amp;amp;inp_str,&amp;amp;k) $
%end;
;
if missing(MEMBER_ID) then delete;
run;
			&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This code takes around 7-8 minutes to complete, but the dataset created is 140 GB (with PROC IMPORT it was 9 GB), and this is because of the default length of 255 bytes.&lt;/P&gt;&lt;P&gt;So is there an optimal way to handle this situation that reduces the read time and does not consume a lot of disk space?&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2021 11:51:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745435#M233652</guid>
      <dc:creator>hhh123</dc:creator>
      <dc:date>2021-06-03T11:51:23Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745450#M233656</link>
      <description>&lt;P&gt;Use the COMPRESS system option (or the COMPRESS= dataset option) to generate a compressed dataset.&amp;nbsp; That will reduce the disk space used by the dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your program could be a lot simpler.&amp;nbsp; Note there is no need to attach the $ format or $ informat to character variables.&amp;nbsp; SAS already knows how to write and read character variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  infile &amp;amp;var_fref obs=1 ;
  input;
  call symputx('varlist',translate(_infile_,' ',"&amp;amp;delimiter"));
run;

data &amp;amp;wrk_table. (compress=yes) ;
  length MEMBER_ID $32 &amp;amp;varlist $255 ;
  infile &amp;amp;var_fref dsd dlm = "&amp;amp;delimiter" truncover firstobs=2 ;
  input &amp;amp;varlist;
  if missing(MEMBER_ID) then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 03 Jun 2021 12:46:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745450#M233656</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-06-03T12:46:44Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745454#M233659</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;, that worked perfectly. It took around 2 minutes to read the data, and the dataset was compressed to only 1 GB.&lt;/P&gt;&lt;P&gt;I just need to verify whether reading this compressed dataset in other data steps increases processing time.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2021 13:02:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745454#M233659</guid>
      <dc:creator>hhh123</dc:creator>
      <dc:date>2021-06-03T13:02:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745458#M233662</link>
      <description>&lt;P&gt;Is this file static?&amp;nbsp; Do you get multiple copies of it over time? (like a weekly or monthly delivery)&lt;/P&gt;
&lt;P&gt;If the column names and order do not change, then just take the time to define the structure you need once and read all instances into the same structure.&amp;nbsp; Just define each variable long enough to hold the maximum allowed value for that variable, independent of the maximum value in this one set of 2 million observations.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2021 13:28:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745458#M233662</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-06-03T13:28:47Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745459#M233663</link>
      <description>&lt;P&gt;This is not a static file; the column names and widths can change in each run. Due to this use case I have previously used PROC IMPORT to read the file.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2021 13:31:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745459#M233663</guid>
      <dc:creator>hhh123</dc:creator>
      <dc:date>2021-06-03T13:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745489#M233682</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/102722"&gt;@hhh123&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;This is not a static file; the column names and widths can change in each run. Due to this use case I have previously used PROC IMPORT to read the file.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;When I see a statement that "column name and width" can change, I suggest approaching the &lt;STRONG&gt;source &lt;/STRONG&gt;of the file for documentation of what is supposed to be created. If they can't provide that, how do they know what they are creating?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We used to do some contract data-related work for a large company, and they asked us "why do you charge us a programming surcharge so often?" When we explained that 1) the order of the columns in the data they provided changed, 2) the column headings changed, and 3) the structure of the content in some of those columns changed, multiple times per week, it wasn't very long before that quit happening, as the charges were running thousands per month.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2021 14:58:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745489#M233682</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-06-03T14:58:01Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745685#M233796</link>
      <description>&lt;P&gt;Actually, this is a generic application I am working on. It is to be used by multiple consumers, each with their own csv file structure, so I need to handle the varying column names and lengths in the code itself.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jun 2021 04:46:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745685#M233796</guid>
      <dc:creator>hhh123</dc:creator>
      <dc:date>2021-06-04T04:46:00Z</dc:date>
    </item>
    <item>
      <title>Re: Optimized way to read variable length csv file</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745689#M233798</link>
      <description>&lt;P&gt;Instead of putting only column names into a macro variable, have a dataset that describes all the necessary variable attributes (name, type, length, informat, format) and use that in a DATA _NULL_ step which creates the import code with CALL EXECUTE.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jun 2021 05:16:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimized-way-to-read-variable-length-csv-file/m-p/745689#M233798</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-06-04T05:16:26Z</dc:date>
    </item>
  </channel>
</rss>