<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: sas large dataset summary statistics in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479466#M123775</link>
    <description>&lt;P&gt;AFAIK both procs are optimized for this job, so you will hardly find anything more efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;EDIT: But if you post example input data and the required result, another approach might be possible.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Jul 2018 11:55:19 GMT</pubDate>
    <dc:creator>andreas_lds</dc:creator>
    <dc:date>2018-07-19T11:55:19Z</dc:date>
    <item>
      <title>sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479452#M123769</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have large SAS tables (100 million rows) with up to 200 columns each. Could someone share code that produces summary statistics for both categorical and numeric variables?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need something similar to the thread below, but with more statistics (count, min, max, mean, std, Q1, Q3, etc.):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Procedures/Count-missing-values-of-100-variables-in-with-column-output/td-p/235075/page/2" target="_blank"&gt;https://communities.sas.com/t5/SAS-Procedures/Count-missing-values-of-100-variables-in-with-column-output/td-p/235075/page/2&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 11:04:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479452#M123769</guid>
      <dc:creator>gabriel_k</dc:creator>
      <dc:date>2018-07-19T11:04:32Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479453#M123770</link>
      <description>&lt;P&gt;&lt;A href="http://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.3&amp;amp;docsetId=procstat&amp;amp;docsetTarget=procstat_freq_toc.htm&amp;amp;locale=en" target="_blank"&gt;proc freq&lt;/A&gt; and &lt;A href="http://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.3&amp;amp;docsetId=proc&amp;amp;docsetTarget=p0aq3hsvflztfzn1xa2wt6s35oy6.htm&amp;amp;locale=de" target="_blank"&gt;proc summary&lt;/A&gt;. Study the examples given.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 11:08:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479453#M123770</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-07-19T11:08:06Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479455#M123772</link>
      <description>&lt;P&gt;Is there another option? I get an insufficient memory error.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 12:49:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479455#M123772</guid>
      <dc:creator>gabriel_k</dc:creator>
      <dc:date>2018-07-19T12:49:12Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479466#M123775</link>
      <description>&lt;P&gt;AFAIK both procs are optimized for this job, so you will hardly find anything more efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;EDIT: But if you post example input data and the required result, another approach might be possible.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 11:55:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479466#M123775</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2018-07-19T11:55:19Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479473#M123780</link>
      <description>&lt;P&gt;It's just a regular dataset with both numeric and character variables. The problem is that the tables are probably too large to run on our server setup.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 12:51:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479473#M123780</guid>
      <dc:creator>gabriel_k</dc:creator>
      <dc:date>2018-07-19T12:51:28Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479488#M123793</link>
      <description>&lt;P&gt;If you run out of memory, you need to sort the tables first by the class variables, so you can use "by" instead of "class" in proc summary. Running statistics for one group at a time (that's what by-processing does) needs next to no memory.&lt;/P&gt;
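&lt;P&gt;A minimal sketch of the BY approach, assuming the input table is WORK.HAVE and REGION is a hypothetical character grouping variable (substitute your own class variables; HAVE_SORTED and STATS_BY_REGION are placeholder names too). Keep in mind that the sort itself needs extra utility disk space, and quantiles such as Q1/Q3 are still computed from all rows within each group:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort once by the (hypothetical) grouping variable so BY processing can be used */
proc sort data=have out=have_sorted;
  by region;
run;

/* With BY instead of CLASS, PROC SUMMARY finishes one group before starting the next */
proc summary data=have_sorted;
  by region;
  var _numeric_;
  output out=stats_by_region n= nmiss= min= max= mean= std= q1= q3= / autoname;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;AUTONAME builds the output column names from the variable and statistic names (for example, a variable X gets X_Mean, X_StdDev and so on), so you do not have to list 200 variables by hand.&lt;/P&gt;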
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jul 2018 13:38:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479488#M123793</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-07-19T13:38:21Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479531#M123814</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/58279"&gt;@gabriel_k&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Is there another option? I get an insufficient memory error.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;1) Show your code; we have no idea what you actually ran.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) PROC FREQ can create a LOT of output, and the tables are built in memory for display in the Results window.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; If there are variables you aren't really interested in, such as a unique identifier (which produces one row of PROC FREQ output for every distinct value), consider dropping them with the (drop=variables) data set option on the DATA= option of the PROC statement.&lt;/P&gt;
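&lt;P&gt;For example, assuming CUSTOMER_ID and TXN_ID are hypothetical unique identifier columns in HAVE, this sketch keeps them out of PROC FREQ entirely and tabulates only the remaining character variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* customer_id and txn_id are placeholder names for high-cardinality ID columns */
proc freq data=have(drop=customer_id txn_id);
  tables _character_ / missing;   /* MISSING treats missing values as a level */
run;
&lt;/CODE&gt;&lt;/PRE&gt;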
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might want to send the output directly to a different ODS destination than the Results window:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* turn off the HTML destination (the Results window) while the large output is written */
ods html close;
ods rtf file="&amp;lt;path&amp;gt;\summary.rtf";
proc freq data=have;    /* one-way tables for every variable */
run;
proc means data=have;   /* default statistics for every numeric variable */
run;
ods rtf close;
ods html;               /* reopen HTML output afterwards, if you want it back */
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Jul 2018 14:54:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479531#M123814</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-07-19T14:54:38Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479637#M123844</link>
      <description>&lt;P&gt;Hi &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/58279"&gt;@gabriel_k&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Enterprise Guide has that feature within the &lt;STRONG&gt;Characterize Data Wizard&lt;/STRONG&gt;!&lt;BR /&gt;The attached macro is a wrapped version of the code generated by EG, so you can run it in batch on your SAS server.&lt;BR /&gt;&lt;BR /&gt;Here is how you can run it in your SAS session &lt;span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:"&gt;😊&lt;/span&gt;:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%_EG_CHARACT_WRAPPER_ (p_inDsName=&amp;lt;LIB.&amp;gt;DataSet, p_inCatMaxLimit=30);  /* p_inCatMaxLimit: Maximum count for unique category value */
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note: For processing large SAS data sets, you can specify the following options on the SAS command line:&lt;/P&gt;
&lt;P&gt;-MEMSIZE 4G -SORTSIZE 2G&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Enjoy,&lt;BR /&gt;Ahmed&lt;/P&gt;
&lt;DIV class="_n_n3"&gt;
&lt;DIV class="conductorContent" tabindex="-1"&gt;
&lt;DIV id="primaryContainer" class="_n_e" style="position: absolute;"&gt;
&lt;DIV style="left: 0px; top: 50px; width: auto; height: auto; right: 0px; bottom: 0px; overflow: hidden; position: absolute;"&gt;
&lt;DIV class="conductorContent" tabindex="-1"&gt;
&lt;DIV class="_n_T" style="position: absolute;"&gt;
&lt;DIV class="_n_X" style="left: 0px; top: 0px; width: auto; height: auto; right: 0px; bottom: 0px; position: absolute;"&gt;
&lt;DIV class="_n_X" style="left: 197px; top: 0px; width: auto; height: auto; right: 0px; bottom: 0px; position: absolute;"&gt;
&lt;DIV style="left: 0px; top: 40px; width: auto; height: auto; right: 0px; bottom: 0px; position: absolute;"&gt;
&lt;DIV class="_n_X" style="position: absolute;"&gt;
&lt;DIV class="_n_Y" style="left: 4px; top: 312px; width: auto; height: auto; right: 0px; bottom: 0px; position: absolute;" tabindex="-1"&gt;
&lt;DIV class="allowTextSelection"&gt;
&lt;DIV class="conductorContent"&gt;
&lt;DIV class="_rp_k" tabindex="-1"&gt;
&lt;DIV class="_rp_k allowTextSelection" style="position: relative;"&gt;
&lt;DIV class="_rp_l allowTextSelection scrollContainer" style="left: 0px; top: 0px; width: auto; height: auto; right: 0px; bottom: 0px; position: absolute;" tabindex="-1"&gt;
&lt;DIV&gt;
&lt;DIV tabindex="-1"&gt;
&lt;DIV tabindex="-1"&gt;
&lt;DIV tabindex="0"&gt;
&lt;DIV&gt;
&lt;DIV class="_rp_m5" tabindex="-1"&gt;
&lt;DIV class="_rp_Y4 ms-border-color-neutralLight ShowConsesusSchedulingLink ShowReferenceAttachmentsLinks" tabindex="-1"&gt;
&lt;DIV class="_rp_b5 _rp_a5"&gt;
&lt;DIV&gt;
&lt;DIV id="Item.MessagePartBody" class="_rp_05"&gt;
&lt;DIV id="Item.MessageUniqueBody" class="_rp_15 ms-font-weight-regular ms-font-color-neutralDark rpHighlightAllClass rpHighlightBodyClass" style="font-family: 'wf_segoe-ui_normal', 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif,serif,'EmojiFont';"&gt;
&lt;DIV&gt;
&lt;DIV dir="ltr"&gt;
&lt;DIV id="divtagdefaultwrapper"&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;I had a user asking if I knew of a macro that creates a frequency table for every variable in a given data set. My initial answer was, NO. But SAS Enterprise Guide has that feature within the Characterize Data Wizard!&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;So, I cheated and wrapped the code generated by EG into this attached macro, so you can run it on your SAS server in batch &lt;span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:"&gt;😊&lt;/span&gt;&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV id="Signature"&gt;
&lt;DIV id="divtagdefaultwrapper" style="background-color: white;"&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;&lt;U&gt;Usage Example:&lt;/U&gt;&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;%_EG_CHARACT_WRAPPER_ (&lt;STRONG&gt;p_inDsName&lt;/STRONG&gt;=&amp;lt;LIB.&amp;gt;DataSet, &lt;STRONG&gt;p_inCatMaxLimit&lt;/STRONG&gt;=30);&amp;nbsp; /* &lt;STRONG&gt;p_inCatMaxLimit&lt;/STRONG&gt;: Maximum count for unique category value */&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;Enjoy,&lt;/DIV&gt;
&lt;DIV style="margin-top: 0px; margin-bottom: 0px;"&gt;Ahmed&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 19 Jul 2018 18:34:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479637#M123844</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2018-07-19T18:34:02Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479769#M123879</link>
      <description>&lt;P&gt;If you're getting a shortfall in memory, it's more likely due to frequency tabulations than the calculation of parametric statistics.&amp;nbsp; If so, then just do fewer variables at a time.&lt;/P&gt;
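&lt;P&gt;A minimal sketch of doing fewer variables at a time, assuming the categorical variables carry a numbered prefix (C1-C200 here is hypothetical) so that numbered range lists can be used; adjust the lists to your own variable names:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Batch 1: the first 50 categorical variables (hypothetical names c1-c50) */
proc freq data=have;
  tables c1-c50 / missing;
run;

/* Batch 2: the next 50; continue until every variable has been covered */
proc freq data=have;
  tables c51-c100 / missing;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each batch re-reads the whole table, so you trade extra I/O for a lower peak memory use.&lt;/P&gt;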
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you have yet to show us the requested code that produced the problem, I will assume you are asking the stat procedure(s) to process as many variables as possible in a single run, which is why I made the suggestion above.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 01:21:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479769#M123879</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-07-20T01:21:01Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479791#M123888</link>
      <description>&lt;P&gt;What information do you want for the character variables?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 03:53:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479791#M123888</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-07-20T03:53:29Z</dc:date>
    </item>
    <item>
      <title>Re: sas large dataset summary statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479793#M123889</link>
      <description>&lt;P&gt;Some statistics are very memory-hungry when the cardinality is high. That's just the way it is.&lt;/P&gt;
&lt;P&gt;Q1, the median, etc. potentially need all the values to be available at the same time.&lt;/P&gt;
&lt;P&gt;So consider whether you really need them.&lt;/P&gt;
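&lt;P&gt;As an illustration, a sketch assuming the table is WORK.HAVE (CHEAP_STATS and APPROX_QUARTILES are placeholder output names): the first step requests only statistics that PROC MEANS can accumulate in a single pass, and the optional second step asks for approximate quartiles with QMETHOD=P2, which trades exactness for a much smaller memory footprint than the default order-statistics method:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Single-pass statistics: memory use does not grow with the number of rows */
proc means data=have noprint;
  var _numeric_;
  output out=cheap_stats n= nmiss= min= max= mean= std= / autoname;
run;

/* Approximate Q1/median/Q3 via the P2 algorithm (P2 requires QNTLDEF=5) */
proc means data=have qmethod=p2 qntldef=5 noprint;
  var _numeric_;
  output out=approx_quartiles q1= median= q3= / autoname;
run;
&lt;/CODE&gt;&lt;/PRE&gt;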
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Likewise for the number of distinct values.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you need them, then you just have to bite the bullet and spend the resources.&lt;/P&gt;
&lt;P&gt;This may mean deriving them in smaller batches of variables and making more runs.&lt;/P&gt;
&lt;P&gt;That's expensive, as I said.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Giving SAS as much RAM as possible helps, if you have access to the relevant options: REALMEMSIZE for the SAS session, SUMSIZE for PROC MEANS, and the confusingly named UBUFSIZE option (formerly called BUFFERSIZE) for PROC SQL.&lt;/P&gt;
&lt;P&gt;MEMSIZE includes paging space, which you should be careful not to rely on here: it's much faster to run several batches than to page data.&lt;/P&gt;
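&lt;P&gt;A small sketch for checking what you currently have before asking for more. MEMSIZE and REALMEMSIZE can only be set when SAS starts (for example on the command line or in the configuration file), whereas SUMSIZE can be changed in a running session; the MAX value below is only an example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Print the current settings to the log */
%put MEMSIZE=%sysfunc(getoption(memsize));
%put REALMEMSIZE=%sysfunc(getoption(realmemsize));
%put SUMSIZE=%sysfunc(getoption(sumsize));

/* SUMSIZE limits memory for summarization with CLASS variables (PROC MEANS/SUMMARY among others) */
options sumsize=max;
&lt;/CODE&gt;&lt;/PRE&gt;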
</description>
      <pubDate>Fri, 20 Jul 2018 04:25:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sas-large-dataset-summary-statistics/m-p/479793#M123889</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-07-20T04:25:33Z</dc:date>
    </item>
  </channel>
</rss>

