<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Hash Tables - problems with large datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77506#M16785</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;SPDE needs physically separate volumes for the "bins" to improve I/O. Running it on a single disk array will not improve anything, as the array already spreads the workload. Using SPDE with a quadcore and 4 arrays (preferably on 4 separate PCIe buses) will improve things. But that needs a real server and not a Windows toybox.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 16 Jun 2015 06:02:01 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2015-06-16T06:02:01Z</dc:date>
    <item>
      <title>Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77468#M16747</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Does anyone have links to a good beginner's tutorial on hash tables? I have been googling and reading a lot, but the papers I have found so far are narrow in scope yet vague on fundamentals, so I find it difficult to understand the overall structure. I have a dataset with 530 million observations and 250+ columns of sensor data (~3 TB). The powers that be want stats summaries on ALL of the columns (n, min, max, mean, stddev, skew, kurtosis, var) by equipment id by date. Being new to SAS, I did a lot of research, and it appears that hash tables would be the best approach, but several aspects of the programming are not clear to me.&lt;/P&gt;&lt;P&gt;My initial approach (and please direct me if there is a better approach) is to use the hash table to subset the data by id (or id/date) and then run proc summary on the subset. I tried running the hash subset and ran out of memory (Win7, 8 GB memory).&lt;/P&gt;&lt;PRE&gt;data hash_results;
    set myLargeDataset;
    if (_n_ eq 1) then
        do;
            declare hash a(dataset:'myLargeDataset');
            a.defineKey('equipmentsernum', 'Date');
            a.defineData(all:'y');
            a.defineDone();
        end;

    equipmentsernum = '296737';
    if (a.find() eq 0);
run;&lt;/PRE&gt;&lt;P&gt;This code works on a subset of myLargeDataset, but on the big set it quickly runs out of memory. Some things I haven't figured out with hash tables are:&lt;/P&gt;&lt;P&gt;1) Can I save the resulting hash table to re-use outside of the data step?&lt;/P&gt;&lt;P&gt;2) Can I write a macro to loop through the hash? My thought was to use the hash table to subset myLargeDataset into a smaller table of just one serial number or id, then call proc summary to get stats for that unit, then loop through to the next serial number, etc.&lt;/P&gt;&lt;P&gt;Any hash tutorials or pointers would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Fred&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 17:31:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77468#M16747</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T17:31:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77469#M16748</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, before someone slaps me for doing something stupid: I realized that defineData(all:'y') with that large a dataset was crazy. So I have removed it and am currently running the following against the 3 TB dataset, just to see if I can create the hash table at all. But my questions still hold: is there a better approach? Can I save the hash table? Can I loop through the hash table to subset the data? Links to hash table tutorials?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;F:&lt;/P&gt;&lt;PRE&gt;data hash_results;
    set myLargeDataset;
    if (_n_ eq 1) then
        do;
            declare hash a(dataset:'myLargeDataset', multidata:'y');
            a.defineKey('equipmentsernum');
            a.defineData('equipmentsernum', 'Date');
            a.defineDone();
        end;
run;&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 18:08:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77469#M16748</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T18:08:49Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77470#M16749</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am still learning hash objects as well, so I will be interested in other comments, but my understanding is that they have to fit in memory or they cannot be used. That is where the efficiency is gained: results bypass being written to disk.&lt;/P&gt;&lt;P&gt;I have to be honest and say I haven't worked with files of the TB nature, and without knowing a lot about the file structure, there might be ways to reduce the on-disk size to something more manageable, such as reviewing variable lengths for wasted space (i.e., character fields that are longer than needed, or numeric fields where full precision is not needed, so a length of less than 8 could be used). But even tricks like that may not reduce it enough, and they would mean running through all the data to make the adjustments, in which case you might as well run proc summary over the dataset and save the results to a dataset that you can report from.&lt;/P&gt;&lt;P&gt;Alternatively, you could add an index to the file, but that will also take time and additional disk resources to store.&lt;/P&gt;&lt;P&gt;Very interesting question ... interested to hear other responses.&lt;/P&gt;&lt;P&gt;EJ&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 18:19:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77470#M16749</guid>
      <dc:creator>esjackso</dc:creator>
      <dc:date>2013-05-16T18:19:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77471#M16750</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am pretty sure hashes are only available during the SAS session; saving one out would mean creating a SAS dataset from it, which I don't think is your intent (but maybe it is).&lt;/P&gt;&lt;P&gt;Is the data static, or is it continuously updated? If it is being updated, then maybe a filtered view through proc SQL might be better.&lt;/P&gt;&lt;P&gt;The approach you take may also depend on whether this is a one-time task or one that will be repeated on some interval. A brute-force method might be fine for a one-time thing, but for repeated tasks a more efficient process is probably desired.&lt;/P&gt;&lt;P&gt;EJ&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 18:26:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77471#M16750</guid>
      <dc:creator>esjackso</dc:creator>
      <dc:date>2013-05-16T18:26:23Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77472#M16751</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;EJ - the data is static, and we just need to generate summaries to create smaller datasets that we can work with. As for saving the hash table, my only thought was that if I am going to loop through by id, subset, and then proc summary, I would either need to save the hash table or regenerate it for each loop. Regenerating it for each loop seems inefficient. But again, that is assuming my approach is valid &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;.&lt;/P&gt;&lt;P&gt;FG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 18:32:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77472#M16751</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T18:32:52Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77473#M16752</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I think I'm catching on ... so you are using the hash somewhat like an index in order to subset the large dataset. I think at this point my hash knowledge has been exhausted.&lt;/P&gt;&lt;P&gt;If I were trying to do this, I might start with a 20% random sample of the large dataset, stratified by equipment id and date (or whatever the summary groupings are). That should give you a dataset small enough to do the summaries on without taking a day for proc summary to run. Of course, you would have to run through the data again to draw the sample.&lt;/P&gt;&lt;P&gt;I might be leading you down the wrong path, so I will wait to see if others respond.&lt;/P&gt;&lt;P&gt;EJ&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 18:50:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77473#M16752</guid>
      <dc:creator>esjackso</dc:creator>
      <dc:date>2013-05-16T18:50:44Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77474#M16753</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You can't save a hash table. And there are likely to be other approaches. First, a few preliminary questions ...&lt;/P&gt;&lt;P&gt;How many unique values for equipsernum? (order of magnitude would do)&lt;/P&gt;&lt;P&gt;How many for date?&lt;/P&gt;&lt;P&gt;Do you need to show every date for every equipsernum, or only the dates that actually exist in the data?&lt;/P&gt;&lt;P&gt;I suspect you will end up with a SQL step to extract a table of the equipsernum values:&lt;/P&gt;&lt;PRE&gt;proc sql noprint;
   create table sernums as select distinct equipsernum from MyLargeDataset;
quit;&lt;/PRE&gt;&lt;P&gt;That would make it easy to loop through using CALL EXECUTE ... generating a separate PROC SUMMARY with CLASS DATE for each EQUIPSERNUM. If that turns out to be viable, I can sketch out more of the code. Of course, "viable" doesn't mean "fast". So let's start with the questions above.&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 19:17:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77474#M16753</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2013-05-16T19:17:34Z</dc:date>
    </item>
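The CALL EXECUTE idea above is a code-generation pattern: pull the distinct key values once, then emit one summarization step per key. A minimal, hypothetical sketch in Python (the serial numbers and the generated PROC SUMMARY text are illustrative, not from the thread):

```python
# Code-generation sketch: build one summary step per distinct key value,
# mirroring the CALL EXECUTE loop described above.
sernums = ["296737", "296738"]  # stand-in for the distinct-equipsernum table

def gen_summary_step(sernum):
    # Build the text of one per-key summarization step.
    return (
        "proc summary data=MyLargeDataset(where=(equipsernum='" + sernum + "')) nway;\n"
        "   class date;\n"
        "   var _numeric_;\n"
        "   output out=stats_" + sernum + " / autoname;\n"
        "run;"
    )

program = "\n".join(gen_summary_step(s) for s in sernums)
```

Each generated step filters to a single equipsernum, so no step needs the whole table in memory; the cost is one scan of the data per key, which is why this is "viable" rather than "fast".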
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77475#M16754</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks, Astounding (that sounds strange &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;),&lt;/P&gt;&lt;P&gt;We did think about PROC SQL, and we tried a test subset using proc sql for one specific equipsernum; that query alone took about 30 hrs (a bit over 1 day). There are almost 800 unique ids in equipsernum, which means it would take over 2 years to subset the entire dataset. I was hoping for something a bit speedier LOL. The sensor data is stored at 5-minute intervals, which means there are 288 observations per day.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;FG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 20:37:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77475#M16754</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T20:37:17Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77476#M16755</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;With only 800 equipsernums, you should be able to summarize directly with one pass through the data ...&lt;/P&gt;&lt;PRE&gt;proc summary data=MyLargeDataset nway;
   class equipsernum date;
   var ...;
   output out=summary.stats (drop=_type_ _freq_) ...;
run;&lt;/PRE&gt;&lt;P&gt;There is a way to specify the AUTONAME option that escapes me at the moment, but you should be able to use it so you don't have to spell out the full list of statistics for each variable.&lt;/P&gt;&lt;P&gt;You can run out of memory if there are too many equipsernum/date combinations, but we didn't get into how many date values are in the data. Memory usage would be unrelated to the number of observations, only to the number of equipsernum/date combinations.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 20:48:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77476#M16755</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2013-05-16T20:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77477#M16756</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Astounding:&lt;/P&gt;&lt;P&gt;I did mention the date values above (every 5 min, so 288 observations or date values per day). So the summary would have to cover the 288 obs for each day for each equipmentsernum.&lt;/P&gt;&lt;P&gt;We did try running the following:&lt;/P&gt;&lt;PRE&gt;proc summary data=MyLargeDataset nway;
    class equipmentsernum Date flag;
    var _numeric_;
    output out=SummaryDataset (drop=_type_ _freq_)
        sum=
        max=
        min=
        median=
        mean=
        std=
        kurt=
        skew=
        n=
        / autoname
        ;
run;&lt;/PRE&gt;&lt;P&gt;We ran out of memory after 30 hrs. By the way, the flag is either 1 or 0 (1 = full speed, 0 = partial speed). Would it be better to do a BY equipmentsernum Date flag instead of NWAY?&lt;/P&gt;&lt;P&gt;FG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 21:07:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77477#M16756</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T21:07:39Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77478#M16757</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, I guess I was imagining there might be more than 1 day in the data.&lt;/P&gt;&lt;P&gt;Yes, you can definitely switch to a BY statement if the data are sorted, and that would solve the memory problems. So it all depends on what the sorted order of the data is. If it's in order by all three variables, you can just use a BY statement instead of a CLASS statement. But if it's only in order by one variable (say by DATE), you can use a combination:&lt;/P&gt;&lt;PRE&gt;by date;
class equipmentsernum flag;&lt;/PRE&gt;&lt;P&gt;Sorting this amount of data doesn't seem realistic, however. You would have to rely on it already being in sorted order.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 21:26:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77478#M16757</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2013-05-16T21:26:23Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77479#M16758</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;A hash table should be used as output in this case, not as input, since the data will never fit in memory.&lt;/P&gt;&lt;P&gt;So you need to read the table sequentially and store the summary values in a hash table as you go.&lt;/P&gt;&lt;P&gt;1- First pass: derive and store in the hash table things like sum, min, max, n, nmiss, etc.&lt;/P&gt;&lt;P&gt;2- Then process and update the hash table to derive things like mean and percentages.&lt;/P&gt;&lt;P&gt;3- A second pass, similar to the first, is used to derive std and var.&lt;/P&gt;&lt;P&gt;4- Output the hash table.&lt;/P&gt;&lt;P&gt;Your hash table will have as many rows as there are classification groups, and as many columns as the number of _NUM_ variables times the number of stats.&lt;/P&gt;&lt;P&gt;In my experience, this runs slower than proc summary if you need 2-pass stats like std, but with a smaller memory footprint. I have never worked on such a large dataset, though.&lt;/P&gt;&lt;P&gt;My 2 cents.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 21:51:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77479#M16758</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2013-05-16T21:51:19Z</dc:date>
    </item>
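The accumulate-as-you-read scheme described in the post above can be sketched outside SAS. Below is a minimal Python analogue, not the SAS hash object API: a dict plays the hash table's role, and Welford's online update is swapped in so that std falls out of a single pass instead of requiring the second pass the post mentions. Field names such as `equipmentsernum` and `temp` are illustrative, not from the original data.

```python
from math import sqrt

def stream_stats(rows, key_fields, value_field):
    """One sequential pass over rows, accumulating per-group stats in a
    dict (the role the SAS hash object plays): n, min, max, mean, and
    variance via Welford's online update."""
    acc = {}  # key tuple -> [n, mean, M2, min, max]
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        x = row[value_field]
        if key not in acc:
            acc[key] = [0, 0.0, 0.0, x, x]
        a = acc[key]
        a[0] += 1
        delta = x - a[1]
        a[1] += delta / a[0]
        a[2] += delta * (x - a[1])  # running sum of squared deviations
        a[3] = min(a[3], x)
        a[4] = max(a[4], x)
    return {k: {"n": n, "mean": mean, "min": lo, "max": hi,
                "std": sqrt(m2 / (n - 1)) if n > 1 else 0.0}
            for k, (n, mean, m2, lo, hi) in acc.items()}

# Hypothetical sample rows standing in for the sensor data.
rows = [
    {"equipmentsernum": "A", "date": "2013-05-16", "temp": 10.0},
    {"equipmentsernum": "A", "date": "2013-05-16", "temp": 14.0},
    {"equipmentsernum": "B", "date": "2013-05-16", "temp": 7.0},
]
out = stream_stats(rows, ["equipmentsernum", "date"], "temp")
```

As in the post, memory scales with the number of classification groups, not with the 530 million input rows.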
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77480#M16759</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So is your "DATE" variable really a datetime variable?&amp;nbsp; If so you might be able to use it as a class variable by using a format.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;STRONG style="font-style: inherit; font-family: 'Courier New'; color: navy;"&gt;proc&lt;/STRONG&gt; &lt;STRONG style="font-style: inherit; font-family: 'Courier New'; color: navy;"&gt;summary&lt;/STRONG&gt;&amp;nbsp; &lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: blue;"&gt;data&lt;/SPAN&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;=MyLargeDataset &lt;SPAN style="font-style: inherit; color: blue;"&gt;nway&lt;/SPAN&gt;&lt;SPAN style="font-style: inherit;"&gt;;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: blue;"&gt;class&lt;/SPAN&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt; equipmentsernum Date flag;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; format date dtdate.;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 21:52:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77480#M16759</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2013-05-16T21:52:04Z</dc:date>
    </item>
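The point of applying the dtdate. format in the post above is that grouping on a datetime only needs its date part. A tiny Python analogue of that bucketing (the timestamps are made up for illustration):

```python
from datetime import datetime

# Collapse datetime stamps to calendar dates for grouping, the role
# the SAS dtdate. format plays when DATE is used as a CLASS variable.
stamps = [
    datetime(2013, 5, 16, 9, 30),
    datetime(2013, 5, 16, 17, 45),
    datetime(2013, 5, 17, 8, 0),
]
by_date = {}
for ts in stamps:
    by_date.setdefault(ts.date(), []).append(ts)
```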
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77481#M16760</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Tom,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In this case, the DATE variable is just that, MMDDYYYY.&amp;nbsp; It was created as a subset of a date/time stamp for this reason. And running the proc summary above, we ran out of memory.&amp;nbsp; We do have more memory on order, but have to wait on purchase order approvals, sourcing, purchasing, etc.&amp;nbsp; So I was hoping to find a quicker way to subset the data.&amp;nbsp; If I can subset the data quickly into tables by equipmentsernum, then SAS can handle that file size without too much trouble.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;FG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 22:15:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77481#M16760</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-16T22:15:27Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77482#M16761</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Can you run a proc contents on those three fields in the class statement and place them here?&lt;/P&gt;&lt;P&gt;You can write a macro to subset the data by &lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;equipmentsernum&amp;nbsp; and loop through it if you wanted to, but you'd have to be able to run a proc freq on the dataset first. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;Can you run the following without issue?&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;proc freq data=have noprint;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;table &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;equipmentsernum&amp;nbsp; /out=equiplist;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;run;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-style: inherit; font-family: 'Courier New'; color: black;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 22:21:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77482#M16761</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2013-05-16T22:21:37Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77483#M16762</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;A simple fix if you have neither sorted data, nor enough memory, is to subset the data.&amp;nbsp; For example, run your PROC SUMMARY as is, but run it twice:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;where flag=1;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;where flag=0;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You'll need half the memory.&amp;nbsp; If that is still consuming too much memory, run it 10 times:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;where equipmentsernum =: '1';&lt;/P&gt;&lt;P&gt;where equipmentsernum =: '2';&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sure, it will take a while.&amp;nbsp; But any solution will take a while.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 May 2013 23:49:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77483#M16762</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2013-05-16T23:49:30Z</dc:date>
    </item>
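The run-it-twice WHERE trick from the post above can be mimicked in miniature. In this hedged Python sketch, `summarize` is a toy stand-in for PROC SUMMARY and the row values are invented: each subset is summarized on its own, so only that subset's class groups occupy memory at once, and because the flag is part of the group key the partial results combine without collisions.

```python
def summarize(rows, key_fields, value_field):
    """Toy stand-in for PROC SUMMARY: per-group n / sum / min / max."""
    out = {}
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        x = row[value_field]
        if key not in out:
            out[key] = {"n": 0, "sum": 0.0, "min": x, "max": x}
        g = out[key]
        g["n"] += 1
        g["sum"] += x
        g["min"] = min(g["min"], x)
        g["max"] = max(g["max"], x)
    return out

# Hypothetical sample rows.
rows = [
    {"equipmentsernum": "101", "flag": 1, "temp": 5.0},
    {"equipmentsernum": "101", "flag": 0, "temp": 9.0},
    {"equipmentsernum": "205", "flag": 1, "temp": 3.0},
]

# One summarization per WHERE-style subset (where flag=1; where flag=0),
# so only one subset's groups are held in memory at a time; the group
# keys are disjoint across subsets, so update() cannot overwrite anything.
combined = {}
for flag_value in (1, 0):
    subset = [r for r in rows if r["flag"] == flag_value]
    combined.update(summarize(subset, ["equipmentsernum", "flag"], "temp"))
```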
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77484#M16763</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, a little late for the party, but FWIW this is what I would do if you input data is at least sorted by equip id:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: green; background: white;"&gt;/*To get your 250 variable names for Hash use*/&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="color: navy; background: white; font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN style="color: navy; background: white; font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;STRONG&gt;sql&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;select&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; quote(cats(name)) &lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;into&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; :qname separated &lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;by&lt;/SPAN&gt; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;','&lt;/SPAN&gt; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;from&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; dictionary.columns &lt;/SPAN&gt;&lt;SPAN style="font-size: 
10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;where&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; LIBNAME=&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;'YOURLIBNAME'&lt;/SPAN&gt; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;AND&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; MEMNAME=&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;'MYLARGEDATASET'&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;;quit;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: green; background: white;"&gt;/*dynamic output dataset by equipmentsernum*/&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="color: navy; background: white; font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;_null_&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;declare&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; hash h(multidata:&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: 
white;"&gt;'y'&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&amp;nbsp; h.definekey(&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;'equipmentsernum'&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&amp;nbsp; h.definedata(&amp;amp;q&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: teal; background: white;"&gt;name.&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&amp;nbsp; h.definedone();&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;do&lt;/SPAN&gt; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;until&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; (last.equipmentsernum);&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;set&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; MYLARGEDATASET;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 
0.0001pt;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;by&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt; equipmentsernum;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rc=h.add();&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp; &lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: blue; background: white;"&gt;end&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&amp;nbsp; rc=h.output(dataset:&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;'out'&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;||&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: purple; background: white;"&gt;'_'&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;||equipmentsernum);&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&amp;nbsp; &lt;SPAN style="color: navy; background: white; font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&lt;BR 
/&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;After subset, you probably want to use Macro to loop through Proc mean, you can still use &amp;amp;name for downstream Macro processing.&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;HTH&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: 0.0001pt;"&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New'; color: black; background: white;"&gt;Haikuo&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 May 2013 02:34:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77484#M16763</guid>
      <dc:creator>Haikuo</dc:creator>
      <dc:date>2013-05-17T02:34:09Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77485#M16764</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If your dataset is sorted beforehand, it would be possible to use a hash table, clearing it after you get each group's MEAN value.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Ksharp&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Message was edited by: xia keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 May 2013 03:24:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77485#M16764</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2013-05-17T03:24:25Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77486#M16765</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Reeza,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am running proc freq as you suggested now.&amp;nbsp; I imagine it will take a while to get through the data (if it gets through all of it).&amp;nbsp; Will report back when something happens.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;FG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 May 2013 14:03:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77486#M16765</guid>
      <dc:creator>FredGIII</dc:creator>
      <dc:date>2013-05-17T14:03:55Z</dc:date>
    </item>
    <item>
      <title>Re: Hash Tables - problems with large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77487#M16766</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Did you try the summary with a small subset of variables in the VAR statement instead of _numeric_? I would be tempted to see if doing batches of 10 or so variables would run without exhausting memory, and possibly within a reasonable time frame.&amp;nbsp; Then merge the resulting summary datasets.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 May 2013 16:07:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-Tables-problems-with-large-datasets/m-p/77487#M16766</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2013-05-17T16:07:19Z</dc:date>
    </item>
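The batching idea in the post above, summarize a handful of variables per pass and then merge the per-batch results, can be sketched as follows. This is a toy Python model, not SAS; the variable names `v1`..`v3` and the stats kept are hypothetical, and each batch corresponds to one PROC SUMMARY run over the full data with a short VAR list.

```python
def batch_summaries(rows, key_fields, var_names, batch_size=2):
    """Summarize a few variables at a time (one pass over the data per
    batch, mimicking repeated PROC SUMMARY runs with short VAR lists),
    then merge the per-batch results into one table keyed by group."""
    merged = {}
    for start in range(0, len(var_names), batch_size):
        batch = var_names[start:start + batch_size]
        for row in rows:
            key = tuple(row[k] for k in key_fields)
            stats = merged.setdefault(key, {})
            for v in batch:
                x = row[v]
                s = stats.setdefault(v, {"n": 0, "min": x, "max": x})
                s["n"] += 1
                s["min"] = min(s["min"], x)
                s["max"] = max(s["max"], x)
    return merged

# Hypothetical sample rows: one group, three numeric variables,
# processed in batches of two variables.
rows = [
    {"id": "A", "v1": 1.0, "v2": 5.0, "v3": 2.0},
    {"id": "A", "v1": 3.0, "v2": 4.0, "v3": 8.0},
]
res = batch_summaries(rows, ["id"], ["v1", "v2", "v3"], batch_size=2)
```

Peak memory per pass is proportional to groups times the batch size, not groups times all 250+ columns, at the cost of reading the data once per batch.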
  </channel>
</rss>

