<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Import series of CSV files only with specific rows in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746012#M233959</link>
    <description>Thank you, some questions:&lt;BR /&gt;1- I need to import CSV files from last year ( so there will be 365 CSV files to import).&lt;BR /&gt;In the code you wrote, where can I see that you import CSV files from last 1 year?&lt;BR /&gt;Remember that each CSV file is called : tYYYYMMDD (I cannot see it in your program)&lt;BR /&gt;2-I see that you wrote two times input &lt;BR /&gt;input @;&lt;BR /&gt;input customer var1 var2 .... ;&lt;BR /&gt;Can you explain why?&lt;BR /&gt;3-Did you put all CSV files that you imported into one DATA set or into multiple data sets?</description>
    <pubDate>Sat, 05 Jun 2021 16:49:38 GMT</pubDate>
    <dc:creator>Ronein</dc:creator>
    <dc:date>2021-06-05T16:49:38Z</dc:date>
    <item>
      <title>Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745969#M233939</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;For each day there is CSV file called tYYYYMMDD.&lt;/P&gt;
&lt;P&gt;Each CSV file contain around 2 millions of rows.&lt;/P&gt;
&lt;P&gt;I want to import CSV file from last 1 year and put them in one data set.&lt;/P&gt;
&lt;P&gt;The problem is that it exceeds the maxmimum number of rows. (Question1-What is max number of rows possible in data set?)&lt;/P&gt;
&lt;P&gt;I have also a data set called List_customers with list of customers that purcahsed in the shop in last 1 year .&lt;/P&gt;
&lt;P&gt;I need to select rows from CSV files only for the list of customers in data set&amp;nbsp;List_customers.&lt;/P&gt;
&lt;P&gt;Question2:&lt;/P&gt;
&lt;P&gt;What is the way to import multiple CSV files only for the customers in List_customers&amp;nbsp; dataset and then put them in one data set?&lt;/P&gt;
&lt;P&gt;Is it possible to perform import rows of CSV files only for these specific customers?&lt;/P&gt;
&lt;P&gt;Or the only way is to import the fully CSV files and then perform inner join with&amp;nbsp; List_customers&amp;nbsp; dataset?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 07:31:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745969#M233939</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-06-05T07:31:37Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745971#M233941</link>
      <description>&lt;P&gt;700 million observations is way below any SAS limit I know of.&lt;/P&gt;
&lt;P&gt;What variables do you have in the CSV files, and what are their attributes (take them from the documentation you received along with the files)?&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 07:39:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745971#M233941</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-06-05T07:39:40Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745974#M233943</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;In CSV file there are 25 varaibles and all are numeric.&lt;/P&gt;
&lt;P&gt;I still ask if it is more efficient to import CSV files and select only the rows that are relevent to my customers and then stack the rows to one data set?&lt;/P&gt;
&lt;P&gt;(Instead of import millions of rows that most of them are not relevent)?&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 08:48:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745974#M233943</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-06-05T08:48:55Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745992#M233951</link>
      <description>&lt;P&gt;Run the import for all data in one data step, as described in your other thread.&lt;/P&gt;
&lt;P&gt;Load the customers into a hash object, and use the CHECK method to filter the relevant observations.&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 13:51:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745992#M233951</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-06-05T13:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745995#M233953</link>
      <description>&lt;P&gt;CSV files are text files.&amp;nbsp; And text files with variable length lines.&amp;nbsp; So you have to read the whole file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That does not mean you have to write back out all of the lines read.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Similarly the location of the variables on the line is not fixed.&amp;nbsp; So if you only want some of the variables on the line you have to read them all (or at least up to the last one on the line you want).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To filter observations based on a list of values a HASH object is useful.&amp;nbsp; (If the list of customers it too large to load into a hash then use an indexed dataset instead.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With your naming convention you can use a wildcard in the physical filename to read all of the CSV files in one step.&amp;nbsp; If the CSV files have header rows then you will need to include logic to skip the header row.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's assume the key variable is named CUSTOMER in both your LIST_CUSTOMERS dataset and in the dataset you want to create from the CSV files.&lt;/P&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want ;
* Load the customer list into a hash object ;
  if _n_= 1 then do;
    declare hash h(dataset:'list_customers');
    rc=h.definekey('customer');
    rc=h.definedone();
  end;
* Read all of the files for a particular year ;
* Skip the header line when starting a new file ;
  length fname $256 ;
  infile '/mypath/t2021*.csv' dsd truncover filename=fname ;
  input @;
  if fname ne lag(fname) then delete;
  * Logic for reading the data from a line in the CSV file here ;
  input customer var1 var2 .... ;
* Delete lines for customers that are not in the hash object ;
  if h.find() then delete;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 14:27:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/745995#M233953</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-06-05T14:27:55Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746012#M233959</link>
      <description>Thank you, some questions:&lt;BR /&gt;1- I need to import CSV files from last year ( so there will be 365 CSV files to import).&lt;BR /&gt;In the code you wrote, where can I see that you import CSV files from last 1 year?&lt;BR /&gt;Remember that each CSV file is called : tYYYYMMDD (I cannot see it in your program)&lt;BR /&gt;2-I see that you wrote two times input &lt;BR /&gt;input @;&lt;BR /&gt;input customer var1 var2 .... ;&lt;BR /&gt;Can you explain why?&lt;BR /&gt;3-Did you put all CSV files that you imported into one DATA set or into multiple data sets?</description>
      <pubDate>Sat, 05 Jun 2021 16:49:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746012#M233959</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-06-05T16:49:38Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746017#M233963</link>
      <description>&lt;P&gt;The * in the filename is a wildcard.&amp;nbsp; So the example was for 2021.&amp;nbsp; If you also other files, like "t2021GEORGE.csv" that match that pattern you will have a problem (not a hard problem since you can use the FNAME variable to test the actual name of the file that was found.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You need the extra INPUT with the trailing&amp;nbsp;@ to get the FNAME variable to be updated so you can test if it is the start of a new file or not.&amp;nbsp; If the files do not contain header rows then you do not need this.&amp;nbsp; But see below discussion about whether the DATE value appears in the data or not.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes. It makes no sense to make 365 separate SAS datasets.&amp;nbsp; If the fields in the file do not contain the date then pull the date from the FNAME variable.&amp;nbsp; Note the FNAME variable will not appear in the output dataset so use a new variable for this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;date_string = scan(fname,-2,'.t');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 17:10:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746017#M233963</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-06-05T17:10:13Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746029#M233970</link>
      <description>&lt;P&gt;How can you know what's in a row until you read it? Unless you have some external logic or additional information you can't know. Therefore you kind of have to process the entire file and then parse out what you want. So you read the file but only write the records of interest.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 19:04:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746029#M233970</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-06-05T19:04:17Z</dc:date>
    </item>
    <item>
      <title>Re: Import series of CSV files only with specific rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746038#M233976</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;In CSV file there are 25 varaibles and all are numeric.&lt;/P&gt;
&lt;P&gt;I still ask if it is more efficient to import CSV files and select only the rows that are relevent to my customers and then stack the rows to one data set?&lt;/P&gt;
&lt;P&gt;(Instead of import millions of rows that most of them are not relevent)?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;How many "relevent customers" are their?&lt;/P&gt;
&lt;P&gt;If the number is "small" it might be as simple as adding something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if customer id in ('id1' 'id2' 'id3') ;&lt;/P&gt;
&lt;P&gt;to the data step reading the file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You would &lt;STRONG&gt;read&lt;/STRONG&gt; all of the records but only &lt;STRONG&gt;keep &lt;/STRONG&gt;the desired ones.&lt;/P&gt;
&lt;P&gt;If you have "many" then read the whole and filter later such as by merging with a data set that contains the relevant customer id values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are the same customers "relevant" in all the source files?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Why do you reading 2 million records "exceeds the maximum number of rows"? Are you using one of the SAS editions that has file size limits because it is intended as a learning tool and not a production data system?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Jun 2021 20:40:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-series-of-CSV-files-only-with-specific-rows/m-p/746038#M233976</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-06-05T20:40:25Z</dc:date>
    </item>
  </channel>
</rss>

