<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Import large csv problem in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328224#M9602</link>
    <description>&lt;P&gt;If 1 million doesn't get it that would be very surprising.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suggested 1.5 million.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 28 Jan 2017 15:15:49 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2017-01-28T15:15:49Z</dc:date>
    <item>
      <title>Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328210#M9595</link>
      <description>Hi all,&lt;BR /&gt;I'm trying to import a CSV file with 4 million rows and ~150 columns, and I don't know the order of the columns.&lt;BR /&gt;It is impossible to use GUESSINGROWS on such a file.&lt;BR /&gt;And my character columns get truncated.&lt;BR /&gt;&lt;BR /&gt;Any ideas what I can do?</description>
      <pubDate>Sat, 28 Jan 2017 13:44:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328210#M9595</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-28T13:44:33Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328213#M9597</link>
      <description>&lt;P&gt;Very simple. Write a data step according to the file description.&lt;/P&gt;
&lt;P&gt;To get help with this, it would be best to post that description here.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 13:59:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328213#M9597</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-01-28T13:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328218#M9598</link>
      <description>As I wrote, I don't know the order and sometimes the description of the columns... because of this I can't write a data step...</description>
      <pubDate>Sat, 28 Jan 2017 14:51:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328218#M9598</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-28T14:51:25Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328219#M9599</link>
      <description>&lt;P&gt;Then DEMAND the description from whoever gave you that file.&lt;/P&gt;
&lt;P&gt;Your only other option is to run proc import with a large enough guessingrows value, and hope for the best.&lt;/P&gt;
&lt;P&gt;Unless you want to inspect all rows with the good old eyeballs Mk 1.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 14:58:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328219#M9599</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-01-28T14:58:50Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328220#M9600</link>
      <description>&lt;P&gt;There isn't much of a choice then. I suggest the standard process of using proc import to generate code. Copy the code from the log and customize according to your errors until they're gone.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would recommend setting GUESSINGROWS to 1 million for initial proc import.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can limit total amount read using OBS option.&amp;nbsp;&lt;/P&gt;
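&lt;P&gt;A minimal sketch of that initial PROC IMPORT call. The file path and output dataset name are placeholders, not taken from the actual job:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical path and dataset name; adjust to your environment */
proc import datafile="/path/to/myfile.csv"
            out=work.import_try
            dbms=csv
            replace;
   getnames=yes;           /* take variable names from the first line */
   guessingrows=1000000;   /* rows scanned to guess types and lengths */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For delimited files the generated data step is written to the log, which is where the code to copy and customize comes from.&lt;/P&gt;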
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Option obs=1500000;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then reset after:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;option obs=Max;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 15:03:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328220#M9600</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-28T15:03:47Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328221#M9601</link>
      <description>I tried GUESSINGROWS 4 million ...it doesn't end.&lt;span class="lia-unicode-emoji" title=":grinning_face_with_big_eyes:"&gt;😃&lt;/span&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 28 Jan 2017 15:09:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328221#M9601</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-28T15:09:58Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328224#M9602</link>
      <description>&lt;P&gt;If 1 million doesn't get it that would be very surprising.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suggested 1.5 million.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 15:15:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328224#M9602</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-28T15:15:49Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328225#M9603</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/125955"&gt;@evgenys&lt;/a&gt; wrote:&lt;BR /&gt;I tried GUESSINGROWS 4 million ...it doesn't end.&lt;span class="lia-unicode-emoji" title=":grinning_face_with_big_eyes:"&gt;😃&lt;/span&gt;&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Oh yeah, 4 million rows * 150 columns takes a lot of time to inspect. It is what happens when you try something stupid.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Get back to the data source and demand the description, and tell them that the data is useless without it.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 15:18:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328225#M9603</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-01-28T15:18:10Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328230#M9604</link>
      <description>&lt;span class="lia-unicode-emoji" title=":grinning_face_with_big_eyes:"&gt;😃&lt;/span&gt;&lt;BR /&gt;I did that first... but until it's done I have to cope with this...</description>
      <pubDate>Sat, 28 Jan 2017 15:41:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328230#M9604</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-28T15:41:37Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328236#M9605</link>
      <description>&lt;P&gt;This isn't an easy problem!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do any of your fields have internal commas? If not, then you could use (untested):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data _null_;&lt;BR /&gt;retain MaxCols 0;&lt;BR /&gt;infile x end=LastRec;&lt;BR /&gt;input;&lt;BR /&gt;ColCount = count(_infile_, ",") + 1;&lt;BR /&gt;if ColCount &amp;gt; MaxCols then MaxCols = ColCount;&lt;BR /&gt;/* symputx strips the leading blanks that put(..., best8.) would leave in the macro variable */&lt;BR /&gt;if LastRec then call symputx("ColCount", MaxCols);&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This should give you the highest number of columns in your file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then maybe something along the lines of:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;%macro GetLen;&lt;BR /&gt;%local i;&lt;BR /&gt;data _null_;&lt;BR /&gt;retain MaxLen 0;&lt;BR /&gt;infile x end=LastRec;&lt;BR /&gt;input;&lt;BR /&gt;%do i = 1 %to &amp;amp;ColCount;&lt;BR /&gt;/* comma as the only delimiter; the "m" modifier keeps empty fields from shifting the column positions */&lt;BR /&gt;ColLen = length(scan(_infile_, &amp;amp;i, ",", "m"));&lt;BR /&gt;if ColLen &amp;gt; MaxLen then MaxLen = ColLen;&lt;BR /&gt;%end;&lt;BR /&gt;if LastRec then call symputx("ColLen", MaxLen);&lt;BR /&gt;run;&lt;BR /&gt;%mend;&lt;BR /&gt;%GetLen;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;And finally:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;%macro GetData;&lt;BR /&gt;%local i;&lt;BR /&gt;data Want;&lt;BR /&gt;length Col1-Col&amp;amp;ColCount. $&amp;amp;ColLen.;&lt;BR /&gt;infile x;&lt;BR /&gt;input;&lt;BR /&gt;%do i = 1 %to &amp;amp;ColCount;&lt;BR /&gt;Col&amp;amp;i. = scan(_infile_, &amp;amp;i, ",", "m");&lt;BR /&gt;%end;&lt;BR /&gt;run;&lt;BR /&gt;%mend;&lt;BR /&gt;%GetData;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck!&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 16:37:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328236#M9605</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2017-01-28T16:37:49Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328237#M9606</link>
      <description>Thanks Tom. I'll try it tomorrow.</description>
      <pubDate>Sat, 28 Jan 2017 16:53:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328237#M9606</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-28T16:53:55Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328238#M9607</link>
      <description>&lt;P&gt;You don't need to know what is in a CSV file to read it in as character strings. If you don't even know how many columns there are just use a larger number (150 in the example below) than you expect. &amp;nbsp;If the last column is not empty then increase the number and read the file again.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data temp (compress=yes);
   infile 'myfile.csv' dsd truncover firstobs=2 ;
   length x1-x150 $200 ;
   input x1-x150;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You can then analyze the character strings yourself and make your own decision on what is in it. &amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Find the maximum length (if you find any with maximum length close to 200 then you might want to use more than $200 in the step above).&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Check if they can be converted to a number by using the INPUT function with the COMMA32. informat.&lt;/LI&gt;
&lt;LI&gt;Check if they can be converted to a date, time, or datetime by using the ANYDTDTE, ANYDTTME, and ANYDTDTM informats, respectively.&lt;/LI&gt;
&lt;/UL&gt;
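&lt;P&gt;The checks listed above could be sketched roughly as follows. This is untested; the output dataset name (column_profile) and the summary variable names are made up for illustration, and x1-x150 are assumed to match the step above:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data column_profile (keep=varname maxlen n_filled n_numeric n_date);
   set temp end=last;
   length varname $32;
   array x {*} x1-x150;              /* the character columns read above */
   array mlen  {150} _temporary_;    /* longest value seen per column */
   array nfill {150} _temporary_;    /* non-blank values per column */
   array nnum  {150} _temporary_;    /* values that convert to a number */
   array ndate {150} _temporary_;    /* values that convert to a date */
   do i = 1 to dim(x);
      if not missing(x{i}) then do;
         nfill{i} = sum(nfill{i}, 1);
         mlen{i}  = max(mlen{i}, length(x{i}));
         /* the ?? modifier suppresses log notes for values that do not convert */
         if not missing(input(x{i}, ?? comma32.)) then nnum{i} = sum(nnum{i}, 1);
         if not missing(input(x{i}, ?? anydtdte32.)) then ndate{i} = sum(ndate{i}, 1);
      end;
   end;
   /* after the last record, write one summary row per column */
   if last then do i = 1 to dim(x);
      varname   = vname(x{i});
      maxlen    = mlen{i};
      n_filled  = nfill{i};
      n_numeric = nnum{i};
      n_date    = ndate{i};
      output;
   end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A column where n_numeric equals n_filled is a candidate for a numeric variable, and likewise n_date for a date. The final decision is still a judgment call on your part.&lt;/P&gt;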
&lt;P&gt;If the first line has variable names then you could read that line in separately and use it to rename the variables.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Jan 2017 18:49:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328238#M9607</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2017-01-28T18:49:04Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328272#M9613</link>
      <description>&lt;P&gt;Here is a quick way to take a 1% random sample of the records using a line pointer control in the input statement&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
length str $200;
infile "&amp;amp;sasforum\datasets\frame.csv" truncover line=lineNo;
linePt = ceil(2 * 100 * rand("uniform"));
input #linePt str&amp;amp;;
line + lineNo;
keep line str;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 29 Jan 2017 03:20:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328272#M9613</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2017-01-29T03:20:59Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328501#M9626</link>
      <description>&lt;P&gt;My rough stab would be to run proc import with guessingrows in the 32000 range. Then save and&amp;nbsp;inspect the generated data step code.&lt;/P&gt;
&lt;P&gt;I would likely increase the lengths of character variables in case the generated 73 or such doesn't quite work for all of them, likely around 10 percent added.&lt;/P&gt;
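&lt;P&gt;As a rough illustration of that edit, the saved step might end up looking something like the sketch below. The file path and the column names are invented here; the real ones come from the code PROC IMPORT writes to the log, which also uses INFORMAT and FORMAT statements rather than this simplified form:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical path and column names, for illustration only */
data work.want;
   infile "/path/to/myfile.csv" dsd dlm="," truncover firstobs=2 lrecl=32767;
   /* PROC IMPORT guessed $73 for these; padded by roughly 10 percent */
   length customer_name $80 comment_text $80;
   input id customer_name comment_text amount;
run;
&lt;/CODE&gt;&lt;/PRE&gt;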
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you don't have a document that says what any of the columns are then what are you going to do with this data file? Unless you have column headers long enough to be explanatory it may be hard to tell what anything else is.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 16:40:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328501#M9626</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-01-30T16:40:17Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328522#M9628</link>
      <description>Let's see... you have software whose output CSV file can change. You don't know what you'll get, and you can't change your import code every time. That's why you must have a suitable function.</description>
      <pubDate>Mon, 30 Jan 2017 17:24:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328522#M9628</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-30T17:24:38Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328557#M9630</link>
      <description>&lt;P&gt;Rapidly changing file structures are a VERY BAD IDEA, as they only cause unnecessary work and are a sign of no design or a very idiotic one.&lt;/P&gt;
&lt;P&gt;For one-shots, PROC IMPORT is &amp;nbsp;the way to go, but let me tell you that even for one-off flat files that I have to deal with, I always get the same type of documentation that I get for those that will be created and imported daily.&lt;/P&gt;
&lt;P&gt;So you either learn to live with the shortcomings of PROC IMPORT, or go through a lot of work determining the file structure by trial-and-error, or tell the idiot that heaped that task on you to finally grow a brain.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 19:51:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328557#M9630</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-01-30T19:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328567#M9631</link>
      <description>&lt;P&gt;Point out to management how much "extra time" this process takes for one iteration and the costs. Then the next time the additional time and associated costs. And the next time. Repeat as needed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It may sink into someone's head that having a process with an order and controls may actually make financial sense to the organization.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I once worked as a contractor doing some data analysis for a client. They at one point asked about the typical $200 to $400 monthly fee we charged for programming. When they learned it was because we had to reprogram processes each month due to file column order and content changes, the data stabilized pretty quickly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that we did not have that issue with the client's software side of the house, only with the hardware folks.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 20:32:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328567#M9631</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-01-30T20:32:28Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328574#M9633</link>
      <description>One of the reasons for such a problem (and it's my problem too) is that when you integrate new software you'll run it 1000s of times with 100s of different output files.&lt;BR /&gt;Moreover, I expected that with software like SAS, if I specify that fields are separated by ',' it would read the data between two ','.</description>
      <pubDate>Mon, 30 Jan 2017 21:04:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328574#M9633</guid>
      <dc:creator>evgenys</dc:creator>
      <dc:date>2017-01-30T21:04:54Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328606#M9638</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/125955"&gt;@evgenys&lt;/a&gt; wrote:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Moreover, I expected from software like sas if I've written separated by '','' it would read data between two'','' .&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;That's what it does. How does SAS not do that?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Honestly, your question and your actual issue are unclear. We're making standard guesses and comments, but beyond "you need to know your data", what else can we say? I'm 99% sure any other language operates&amp;nbsp;pretty much the same way with a file that is unspecified. This is also why people use databases.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 22:38:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328606#M9638</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-30T22:38:37Z</dc:date>
    </item>
    <item>
      <title>Re: Import large csv problem</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328694#M9646</link>
      <description>&lt;P&gt;Moreover: if I had to make a guess, I'd say that 80% of my work now involves documentation, either writing or reading. The actual programming consumes very little time, partly owing to the tools I created to make my work easier, but mostly&amp;nbsp;&lt;STRONG&gt;because of the documentation!&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2017 10:54:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Import-large-csv-problem/m-p/328694#M9646</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-01-31T10:54:29Z</dc:date>
    </item>
  </channel>
</rss>

