<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc SQL with Large Data Sets in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110801#M30729</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Okay...But I'll need some information.&lt;/P&gt;&lt;P&gt;Let's assume you have a table with corteges &amp;lt;pid, fid, year, ret&amp;gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The objective is to calculate&lt;/P&gt;&lt;P&gt;ret2 for each cortege by this mnemonic formula:&lt;/P&gt;&lt;P&gt;&amp;lt;pid', fid', y', ret'&amp;gt; - the cortege for which we calculate our ret2.&lt;/P&gt;&lt;P&gt;ret2 = avg(ret) over ( pid &amp;lt;&amp;gt; pid' , y == y', fid &amp;lt;&amp;gt; fid') &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here it will be usefull if you provide some euristics about the data.&lt;/P&gt;&lt;P&gt;Like "Portid could be only 1 to 5". That will help to find a better way of solving the task.&lt;/P&gt;&lt;P&gt;And moreover, if you'll have portid=3 in some row, will ret2 be calculated as an AVG through different portid's or it will be seperate for each value of portid&amp;lt;&amp;gt;portid' ?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 30 Oct 2012 09:46:47 GMT</pubDate>
    <dc:creator>ghastly_kitten</dc:creator>
    <dc:date>2012-10-30T09:46:47Z</dc:date>
    <item>
      <title>Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110797#M30725</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear All,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below is a portion of my data set. The code below does what I need to accomplish (which greatly benefited from your help in the past): merge these two data sets (return and need), but make sure that myear is the same, but portid is not the same. The actual sizes of the data sets are 961,896 rows X 12 columns for return, and 15,600 rows X 3,085 columns for need. When I use the actual data, the program runs for over 24 hours and then says it has to stop due to lack of hard disk space (after creating a temporary file of about 300 gb). Is there a waPry to run this PROC SQL part more efficiently (or avoid it)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;**************************************************************************************************;&lt;/P&gt;&lt;P&gt;DATA return;&lt;/P&gt;&lt;P&gt;INPUT portid firmid myear return;&lt;/P&gt;&lt;P&gt;DATALINES;&lt;/P&gt;&lt;P&gt;1 100 200301&amp;nbsp; 2.11&lt;/P&gt;&lt;P&gt;1 120 200301&amp;nbsp; 3.12&lt;/P&gt;&lt;P&gt;1 130 200301&amp;nbsp; 2.13&lt;/P&gt;&lt;P&gt;1 140 200301&amp;nbsp; 3.67&lt;/P&gt;&lt;P&gt;1 100 200302&amp;nbsp; 5.23&lt;/P&gt;&lt;P&gt;1 120 200302&amp;nbsp; 6.81&lt;/P&gt;&lt;P&gt;1 130 200302&amp;nbsp; 2.55&lt;/P&gt;&lt;P&gt;1 140 200302&amp;nbsp; 4.21&lt;/P&gt;&lt;P&gt;1 150 200302&amp;nbsp; 3.21&lt;/P&gt;&lt;P&gt;2 300 200301&amp;nbsp; 1.11&lt;/P&gt;&lt;P&gt;2 320 200301&amp;nbsp; 0.12&lt;/P&gt;&lt;P&gt;2 330 200301 -2.13&lt;/P&gt;&lt;P&gt;2 340 200301 -1.67&lt;/P&gt;&lt;P&gt;2 300 200302&amp;nbsp; 4.23&lt;/P&gt;&lt;P&gt;2 320 200302&amp;nbsp; 3.81&lt;/P&gt;&lt;P&gt;2 330 200302&amp;nbsp; 1.55&lt;/P&gt;&lt;P&gt;2 340 200302&amp;nbsp; 2.21&lt;/P&gt;&lt;P&gt;2 350 200302&amp;nbsp; 4.23&lt;/P&gt;&lt;P&gt;2 100 200302&amp;nbsp; 3.66&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;**************************************************************************************************************;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc transpose data=return out=need (drop=_name_);&lt;/P&gt;&lt;P&gt;&amp;nbsp; var return;&lt;/P&gt;&lt;P&gt;&amp;nbsp; id firmid;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by portid myear;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;************************************************************************************************************;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select * &lt;/P&gt;&lt;P&gt;&amp;nbsp; from return a, need b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.myear=b.myear and a.portid ne b.portid;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;**********************************************************************************************************;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 29 Oct 2012 19:02:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110797#M30725</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-29T19:02:16Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110798#M30726</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Yes, Cartesian product will exhaust your resources. I haven't done much SQL join inside SAS, I usually do it via pass-through, which handles large datasets pretty well, but it would not help with your problem. So I will suggest you using Hash() approach. First make sure you have enough memory(computer RAM) to accommodate the entire 'need', then you may proceed with the following code:&lt;/P&gt;&lt;P&gt;DATA return;&lt;/P&gt;&lt;P&gt;INPUT portid firmid myear return;&lt;/P&gt;&lt;P&gt;DATALINES;&lt;/P&gt;&lt;P&gt;1 100 200301 2.11&lt;/P&gt;&lt;P&gt;1 120 200301 3.12&lt;/P&gt;&lt;P&gt;1 130 200301 2.13&lt;/P&gt;&lt;P&gt;1 140 200301 3.67&lt;/P&gt;&lt;P&gt;1 100 200302 5.23&lt;/P&gt;&lt;P&gt;1 120 200302 6.81&lt;/P&gt;&lt;P&gt;1 130 200302 2.55&lt;/P&gt;&lt;P&gt;1 140 200302 4.21&lt;/P&gt;&lt;P&gt;1 150 200302 3.21&lt;/P&gt;&lt;P&gt;2 300 200301 1.11&lt;/P&gt;&lt;P&gt;2 320 200301 0.12&lt;/P&gt;&lt;P&gt;2 330 200301 -2.13&lt;/P&gt;&lt;P&gt;2 340 200301 -1.67&lt;/P&gt;&lt;P&gt;2 300 200302 4.23&lt;/P&gt;&lt;P&gt;2 320 200302 3.81&lt;/P&gt;&lt;P&gt;2 330 200302 1.55&lt;/P&gt;&lt;P&gt;2 340 200302 2.21&lt;/P&gt;&lt;P&gt;2 350 200302 4.23&lt;/P&gt;&lt;P&gt;2 100 200302 3.66&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if 0 then set need(rename=portid=np);&lt;/P&gt;&lt;P&gt;&amp;nbsp; declare hash h(dataset:'need(rename=portid=np)', multidata:'y');&lt;/P&gt;&lt;P&gt;h.definekey('myear');&lt;/P&gt;&lt;P&gt;h.definedata(all:'y');&lt;/P&gt;&lt;P&gt;h.definedone();&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set return;&lt;/P&gt;&lt;P&gt;&amp;nbsp; rc=h.find();&lt;/P&gt;&lt;P&gt;do rc=0 by 0 while (rc=0);&lt;/P&gt;&lt;P&gt;if portid ne np then output;&lt;/P&gt;&lt;P&gt;rc=h.find_next();&lt;/P&gt;&lt;P&gt;end;&lt;/P&gt;&lt;P&gt;drop np rc;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Haikuo&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 29 Oct 2012 19:52:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110798#M30726</guid>
      <dc:creator>Haikuo</dc:creator>
      <dc:date>2012-10-29T19:52:03Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110799#M30727</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear finans_sas,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;could you please explain the target of your manipulation?&lt;/P&gt;&lt;P&gt;Whether you will use hash or not, the total design of calculation looks quite ineffective and strange.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What's the target?&lt;/P&gt;&lt;P&gt;You have the &lt;STRONG&gt;return&lt;/STRONG&gt; of the firm (firmid), but what do you want to calculate by joining the transposed table for a certain year period and &lt;STRONG&gt;portid&lt;/STRONG&gt; with it's source with different &lt;STRONG&gt;portid?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Maybe we will be able to propose a better design for you research?&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 07:11:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110799#M30727</guid>
      <dc:creator>ghastly_kitten</dc:creator>
      <dc:date>2012-10-30T07:11:39Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110800#M30728</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you, Haikuo and ghastly_kitten for your help. Below is what I want to accomplish;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My objective is to calculate average returns for firmid in each portid-myear pair by using returns from the same myear, but from different firmid and portid. For example,&lt;/P&gt;&lt;P&gt;the average return for:&lt;/P&gt;&lt;P&gt;firmid=100 in portid=2 and myear=200302 (observation # 19)=(6.81 + 2.55 + 4.21 + 3.21)/4=&lt;/P&gt;&lt;P&gt;firmid=120 in portid=1 and myear=200301 (observation #2)=(1.11 + 0.12 -2.13 -1.67)/4=&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would really appreciate your help with this problem.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 09:18:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110800#M30728</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-30T09:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110801#M30729</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Okay...But I'll need some information.&lt;/P&gt;&lt;P&gt;Let's assume you have a table with corteges &amp;lt;pid, fid, year, ret&amp;gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The objective is to calculate&lt;/P&gt;&lt;P&gt;ret2 for each cortege by this mnemonic formula:&lt;/P&gt;&lt;P&gt;&amp;lt;pid', fid', y', ret'&amp;gt; - the cortege for which we calculate our ret2.&lt;/P&gt;&lt;P&gt;ret2 = avg(ret) over ( pid &amp;lt;&amp;gt; pid' , y == y', fid &amp;lt;&amp;gt; fid') &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here it will be usefull if you provide some euristics about the data.&lt;/P&gt;&lt;P&gt;Like "Portid could be only 1 to 5". That will help to find a better way of solving the task.&lt;/P&gt;&lt;P&gt;And moreover, if you'll have portid=3 in some row, will ret2 be calculated as an AVG through different portid's or it will be seperate for each value of portid&amp;lt;&amp;gt;portid' ?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 09:46:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110801#M30729</guid>
      <dc:creator>ghastly_kitten</dc:creator>
      <dc:date>2012-10-30T09:46:47Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110802#M30730</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here's the scheme of calculation which could be useful.&lt;/P&gt;&lt;P&gt;The key idea is that calculating averages is easier than producing cartesian join.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;firmid=100 in portid=2 and myear=200302 (observation # 19)=(6.81 + 2.55 + 4.21 + 3.21)/4=&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;here you can first calculate average for ret for all firmids.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Let us assume, that we have firmid = 100 for portid = 1 and myear = 200302 with some ret value &lt;STRONG&gt;R.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;A = &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;(6.81 + 2.55 + 4.21 + 3.21 &lt;STRONG&gt;+ R&lt;/STRONG&gt;)/5&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;N = 5 (amount of values used)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Now, as we now that &lt;STRONG&gt;R &lt;/STRONG&gt;shouldn't be used for calculating real average (let's denote it &lt;STRONG&gt;B&lt;/STRONG&gt;) we can do such a thing&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;STRONG&gt;B = (&lt;/STRONG&gt;A * N - R) / (N - 1);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Now the real task is harder, since we can have more that 2 portids (but at the same time we assume that there is no entries with different &lt;STRONG&gt;ret &lt;/STRONG&gt;for the same &lt;STRONG&gt;firmid portid myear&lt;/STRONG&gt;), and we need to calc averages over different portids. But it's quite solveable in the same manner.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&lt;STRONG&gt;So my proposal is to use this scheme:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;1. Calculate averages and count (or sum and count) for each group of&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table Averages as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; portid as pid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myear,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg(ret) as A_pid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(ret) as N_pid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; portid, myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; portid, myear;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* Now! if we need to have an average over different portid which is not equal to source (for example for firmid=100 and portid=1 there are different firms for portid = 2,3,4,5,... ; but we need the result without grouping by this portid's, then we need averages over all portid's within the same year */&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table Averages as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myear,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; A_pid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; N_pid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum( A_pid * N_pid ) / sum (N_pid)&amp;nbsp; as total_average&amp;nbsp;&amp;nbsp;&amp;nbsp; /* that is just total average of all ret's; you don't need it, just to point the meaning */,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (sum( A_pid * N_pid )&amp;nbsp; - A_pid * N_pid ) / ( sum (N_pid) - N_pid ) as AVG_RET_for_other_pids_same_year,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum (N_pid) - N_pid as NUM_of_terms_in_avg&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Averages&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pid, myear;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; quit;&lt;/P&gt;&lt;P&gt;/* yes, the grouping is only within &lt;STRONG&gt;myear &lt;/STRONG&gt;and it will work, and quite fast */&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2. Now, to get ret2 for each firmid we need to make "an exclusion list" for each firmid.&lt;/P&gt;&lt;P&gt;As by calculating the target ret2 we don't need values of firmid at all, we calculate SUM (this will help to simplify formulas) and amount:&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table Firmid_averages as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; firmid as fid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myear,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(ret) as R_fid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(*) as N_fid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; firmid, myear;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3. Finally, get ret2 (which you can &lt;STRONG&gt;LEFT&lt;/STRONG&gt; join to source table after that, in order to compare ret and ret2)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table RETURN2 as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; f.fid as firmid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; p.pid as potid,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; f.myear,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (p.AVG_RET_for_other_pids_same_year * p.NUM_of_terms_in_avg - f.R_fid) / (p.NUM_of_terms_in_avg - f.N_fid) as RET2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Firmid_averages f left join&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Averages p on&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; f.myear = p.myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; portid, myear, firmid;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;4. Left join this to your source if you need to see the difference between ret and ret2.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... well, the code is a bit entangled. Actually, everything could be slightly simplified if you will use sums instead of averages. But I was afraid that the idea will not be clear from the simple code. So... I used this.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 10:34:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110802#M30730</guid>
      <dc:creator>ghastly_kitten</dc:creator>
      <dc:date>2012-10-30T10:34:16Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110803#M30731</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I got this right, it's just a matter of summing/counting the data by porid, myear and merge back with the original dataset with the correct join clause.&lt;/P&gt;&lt;P&gt;For this kind of data volume (under 1M) a simple two level SQL join will do the trick.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 10.0pt; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt; &lt;STRONG&gt;sql&lt;/STRONG&gt; noprint;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;create table want as&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;select a.portid, a.firmid, a.myear, a.return,&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;(b.sumreturn/b.cntreturn) as avgreturn&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Courier New'; font-size: 10pt;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return as a,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select portid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from &lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;return group by portid, myear) as b&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;where a.portid &amp;lt;&amp;gt; b.portid and a.myear = b.myear;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 10.0pt; font-family: 'Courier New';"&gt;quit&lt;/STRONG&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Cheers from Portugal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Daniel Santos @ &lt;A href="http://www.cgd.pt" target="_blank"&gt;www.cgd.pt&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 12:18:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110803#M30731</guid>
      <dc:creator>DanielSantos</dc:creator>
      <dc:date>2012-10-30T12:18:16Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110804#M30732</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;DanielSantos,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If &lt;STRONG&gt;I &lt;/STRONG&gt;got the task right, this is not that easy &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;P&gt;For each average return you need to exclude the firms with the same id from calculating averages.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 12:54:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110804#M30732</guid>
      <dc:creator>ghastly_kitten</dc:creator>
      <dc:date>2012-10-30T12:54:12Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110805#M30733</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dataset 'need' has 3,085 variables?&amp;nbsp; I hope that's a typo.&amp;nbsp; If it's not, it should be no surprise you're having a performance issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Get rid of the thousands of variables you don't need.&amp;nbsp; Either use a DATA step with a KEEP statement to specify only the (maybe) couple dozen variables you require before the join/merge or instead of SELECT * in your query Select a list of the variables you require.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 14:51:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110805#M30733</guid>
      <dc:creator>bentleyj1</dc:creator>
      <dc:date>2012-10-30T14:51:09Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110806#M30734</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK ghastly_kitten, you're right I missed this requirement, as long as &lt;SPAN style="text-decoration: underline;"&gt;there is no duplicates&lt;/SPAN&gt; I think it's just a matter of throwing firmid in a cross join and adjusting the calculation to exclude those records, like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 10.0pt; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt; &lt;STRONG&gt;sql&lt;/STRONG&gt; noprint;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;create table want as&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;select a.portid, a.firmid,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;(b.sumreturn-c.sumreturn+return)/(b.cntreturn-c.cntreturn+&lt;STRONG&gt;1&lt;/STRONG&gt;) as avgreturn&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return as a,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select portid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return group by portid, myear) as b,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select firmid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return group by firmid, myear) as c&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;where a.portid ne b.portid and a.myear eq b.myear and a.firmid eq c.firmid and b.myear eq c.myear;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 10.0pt; font-family: 'Courier New';"&gt;quit&lt;/STRONG&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Which will produce this, which I believe is right:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" style="width: 257px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" width="45"&gt;portid&lt;/TD&gt;&lt;TD class="xl67" style="border-left: none;" width="45"&gt;firmid&lt;/TD&gt;&lt;TD class="xl67" style="border-left: none;" width="49"&gt;myear&lt;/TD&gt;&lt;TD class="xl67" style="border-left: none;" width="46"&gt;return&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="72"&gt;avgreturn&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;120&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;3.12&lt;/TD&gt;&lt;TD align="right" class="xl79" style="border-top: none; border-left: none;"&gt;-0.6425000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;130&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;2.13&lt;/TD&gt;&lt;TD align="right" class="xl79" style="border-top: none; border-left: none;"&gt;-0.6425000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;100&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;2.11&lt;/TD&gt;&lt;TD align="right" class="xl79" style="border-top: none; border-left: none;"&gt;-0.6425000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;140&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;3.67&lt;/TD&gt;&lt;TD align="right" class="xl79" style="border-top: none; border-left: none;"&gt;-0.6425000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;120&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;6.81&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;3.2816667&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;130&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;2.55&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;3.2816667&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;150&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;3.21&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;3.2816667&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;140&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;4.21&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;3.2816667&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;1&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;100&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;5.23&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;3.2060000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;330&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl77" style="border-top: none; border-left: none;"&gt;-2.13&lt;/TD&gt;&lt;TD align="right" class="xl78" style="border-top: none; border-left: none;" width="72"&gt;2.7575000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;300&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;1.11&lt;/TD&gt;&lt;TD align="right" class="xl78" style="border-top: none; border-left: none;" width="72"&gt;2.7575000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;340&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl77" style="border-top: none; border-left: none;"&gt;-1.67&lt;/TD&gt;&lt;TD align="right" class="xl78" style="border-top: none; border-left: none;" width="72"&gt;2.7575000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl75" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="45"&gt;320&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="49"&gt;200301&lt;/TD&gt;&lt;TD align="right" class="xl76" style="border-top: none; border-left: none;" width="46"&gt;0.12&lt;/TD&gt;&lt;TD align="right" class="xl78" style="border-top: none; border-left: none;" width="72"&gt;2.7575000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;300&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;4.23&lt;/TD&gt;&lt;TD align="right" class="xl71" style="border-top: none; border-left: none;"&gt;-0.6425000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;320&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;3.81&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;4.4020000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;350&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;4.23&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;4.4020000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;340&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;2.21&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;4.4020000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl69" height="20" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="45"&gt;330&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl65" style="border-top: none; border-left: none;" width="46"&gt;1.55&lt;/TD&gt;&lt;TD align="right" class="xl70" style="border-top: none; border-left: none;" width="72"&gt;4.4020000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD align="right" class="xl72" height="21" style="border-top: none;" width="45"&gt;2&lt;/TD&gt;&lt;TD align="right" class="xl73" style="border-top: none; border-left: none;" width="45"&gt;100&lt;/TD&gt;&lt;TD align="right" class="xl73" style="border-top: none; border-left: none;" width="49"&gt;200302&lt;/TD&gt;&lt;TD align="right" class="xl73" style="border-top: none; border-left: none;" width="46"&gt;3.66&lt;/TD&gt;&lt;TD align="right" class="xl74" style="border-top: none; border-left: none;" width="72"&gt;4.1950000&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Courier New';"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Cheers from Portugal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Daniel Santos @ &lt;A href="http://www.cgd.pt" target="_blank"&gt;www.cgd.pt&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 15:45:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110806#M30734</guid>
      <dc:creator>DanielSantos</dc:creator>
      <dc:date>2012-10-30T15:45:37Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110807#M30735</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I probably don't understand the problem correct but, in the chance that I do, I think a simplified version of Daniel's code together with a small datastep can solve the problem.&amp;nbsp; Let me/us know.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql noprint;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select distinct a.portid,a.firmid,a.myear,a.return,b.sumreturn, b.cntreturn,bportid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return as a&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; left join&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select portid as bportid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return group by myear,bportid) as b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on a.myear=b.myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.portid eq bportid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by firmid,myear,portid&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by firmid myear;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if not (first.myear and last.myear) then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sumreturn=sumreturn-return;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cntreturn=cntreturn-1;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; average=sumreturn/cntreturn;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 20:50:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110807#M30735</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-10-30T20:50:53Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110808#M30736</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I sincerely thank each of you for sparing your valuable time to help me out. I tried Hash( ) approach, but my hard disk ran out of space. I am still trying to implement ghastly_kitten' suggestion. Daniel's code is very efficient (takes about 1 minute instead of 24+ hours) and it works correctly in the small data set, but for some reasons, in my real data set, it is creating more than necessary observations (961,896 is the number of rows in "return", but Daniel's code is creating 47,132,904 observations). Arthur's modification solves that problem, but the average for a given firmid still contains that firm's returns. Below is Daniel's code (modified by Arthur and slightly changed by me). The only remaining task is to exclude a stock's own return in the computation of its corresponding average return (a stock's average return should be calculated by averaging all &lt;SPAN style="text-decoration: underline;"&gt;different&lt;/SPAN&gt; stocks in&lt;SPAN style="text-decoration: underline;"&gt; different&lt;/SPAN&gt; portfolios. In other words, for portid=1 and firmid=100, I cannot have any stocks from portid=1 and I must also exclude firmid=100 since the empirical design allow a firmid to exist in more than one portid) The final result should match Daniel's output. I will be happy to provide more detail if you would like. I am also trying to figure out how to impose the condition that a firm's return should be excluded from its average (taking me some time to figure out how Daniel did and how I can incorporate that into Arthur's code), but I would really appreciate your help in figuring this out.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql noprint;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select distinct a.portid,a.firmid,a.myear,a.return,b.sumreturn, b.cntreturn,bportid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return as a&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; left join&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select portid as bportid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return group by myear,bportid) as b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on a.myear=b.myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.portid ne bportid *this is where I made a slight correction;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by firmid,myear,portid&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by firmid myear;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if not (first.myear and last.myear) then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sumreturn=sumreturn-return;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cntreturn=cntreturn-1;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt; average=sumreturn/cntreturn;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;proc sort data=want; by portid myear firmid; run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 22:23:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110808#M30736</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-30T22:23:34Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110809#M30737</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This is one of those cases where a macro and a loop may be more efficient than the BY method for calculating something, simply because of space and time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. Create your list of ID's and Years&lt;/P&gt;&lt;P&gt;2. Loop through doing each as required using a where clause to subset your data.&lt;/P&gt;&lt;P&gt;3. Calculate the stats and append to table, dropping intermediary tables&lt;/P&gt;&lt;P&gt;4. Output the end table.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 22:27:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110809#M30737</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2012-10-30T22:27:52Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110810#M30738</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I thought the latest suggested code did what you wanted.&amp;nbsp; What would help is an example dataset with the averages that you want to appear.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 23:02:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110810#M30738</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-10-30T23:02:52Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110811#M30739</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Below is what you have requested (based on the code I have). I hope this helps. The program works great, but takes a long time to process.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;DATA return;&lt;/P&gt;&lt;P&gt;INPUT portid firmid myear return;&lt;/P&gt;&lt;P&gt;DATALINES;&lt;/P&gt;&lt;P&gt;1 100 200301&amp;nbsp; 2.11&lt;/P&gt;&lt;P&gt;1 120 200301&amp;nbsp; 3.12&lt;/P&gt;&lt;P&gt;1 130 200301&amp;nbsp; 2.13&lt;/P&gt;&lt;P&gt;1 140 200301&amp;nbsp; 3.67&lt;/P&gt;&lt;P&gt;1 100 200302&amp;nbsp; 5.23&lt;/P&gt;&lt;P&gt;1 120 200302&amp;nbsp; 6.81&lt;/P&gt;&lt;P&gt;1 130 200302&amp;nbsp; 2.55&lt;/P&gt;&lt;P&gt;1 140 200302&amp;nbsp; 4.21&lt;/P&gt;&lt;P&gt;1 150 200302&amp;nbsp; 3.21&lt;/P&gt;&lt;P&gt;2 300 200301&amp;nbsp; 1.11&lt;/P&gt;&lt;P&gt;2 320 200301&amp;nbsp; 0.12&lt;/P&gt;&lt;P&gt;2 330 200301 -2.13&lt;/P&gt;&lt;P&gt;2 340 200301 -1.67&lt;/P&gt;&lt;P&gt;2 300 200302&amp;nbsp; 4.23&lt;/P&gt;&lt;P&gt;2 320 200302&amp;nbsp; 3.81&lt;/P&gt;&lt;P&gt;2 330 200302&amp;nbsp; 1.55&lt;/P&gt;&lt;P&gt;2 340 200302&amp;nbsp; 2.21&lt;/P&gt;&lt;P&gt;2 350 200302&amp;nbsp; 4.23&lt;/P&gt;&lt;P&gt;2 100 200302&amp;nbsp; 3.66&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc transpose data=return out=need (drop=_name_);&lt;/P&gt;&lt;P&gt;&amp;nbsp; var return;&lt;/P&gt;&lt;P&gt;&amp;nbsp; id firmid;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by portid myear;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data need; set need;&lt;/P&gt;&lt;P&gt;rename myear=myear2;&lt;/P&gt;&lt;P&gt;rename portid=portid2;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select *&lt;/P&gt;&lt;P&gt;&amp;nbsp; from return, need&lt;/P&gt;&lt;P&gt;&amp;nbsp; where return.myear=need.myear2 and return.portid ne need.portid2;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (drop=i _:);;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; array returns(*) _100--_350;&lt;/P&gt;&lt;P&gt;&amp;nbsp; do i=1 to dim(returns);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if input(substr(vname(returns(i)),2),8.) eq firmid then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(returns(i));&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; average=mean(of returns(*));&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;proc sort data=want;&amp;nbsp; by portid myear firmid; run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 30 Oct 2012 23:33:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110811#M30739</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-30T23:33:25Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110812#M30740</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I still don't know if I fully understand the problem you are trying to solve, but the following matches your test data results:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql noprint;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select distinct a.portid,a.firmid,a.myear,a.return,b.sumreturn, b.cntreturn,bportid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return as a&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; left join&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (select portid as bportid, myear, sum(return) as sumreturn, count(return) as cntreturn&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from return group by myear,bportid) as b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on a.myear=b.myear&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.portid ne bportid&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by firmid,myear,portid&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (drop=next_measure prev_measure measure);&lt;/P&gt;&lt;P&gt;&amp;nbsp; set want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by firmid myear;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set want ( firstobs = 2 keep = return rename = (return = Next_Measure) )&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; want (&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; obs = 1 drop = _all_&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; );&lt;/P&gt;&lt;P&gt;&amp;nbsp; Prev_Measure = ifn( first.myear, (.), lag(return) );&lt;/P&gt;&lt;P&gt;&amp;nbsp; Next_Measure = ifn(&amp;nbsp; last.myear, (.), Next_Measure );&lt;/P&gt;&lt;P&gt;&amp;nbsp; if not (first.myear and last.myear) then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if first.myear then measure=next_measure;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else measure=prev_measure;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sumreturn=sumreturn-measure;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cntreturn=cntreturn-1;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; average=sumreturn/cntreturn;&lt;/P&gt;&lt;P&gt; run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by myear portid&amp;nbsp; firmid;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2012 00:49:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110812#M30740</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-10-31T00:49:27Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110813#M30741</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A __default_attr="3219" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt; You apparently updated your post at the same time as I sent you a response.&amp;nbsp; Take a look at what I had sent.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2012 01:04:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110813#M30741</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-10-31T01:04:02Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110814#M30742</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you Arthur (art297) for spending your valuable time helping me out. Your updated version indeed generates the desirable output. However, it will take me a while to digest what exactly is going on in the code and to make sure that I can comfortably apply to my large data set. I am really sorry that I have not made my inquiry very clear. Maybe the story below may shed some light:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Imagine that you have two portfolios (portid 1 and 2) and company A is in both portid 1 and portid2. Portid 1 also contains Companies B and C. Portid 2 contains Companies D and E. In short,&lt;/P&gt;&lt;P&gt;Portid 1: A, B, C&lt;/P&gt;&lt;P&gt;Portid 2: A, D, E&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am interested in calculating an average value for each company by averaging returns of companies in other portfolios. For example,&lt;/P&gt;&lt;P&gt;average for C: (A+D+E)/3, all of which come from portid 2 (A is common)&lt;/P&gt;&lt;P&gt;average for E: (A+B+C)/3, all of which come from portid 1&lt;/P&gt;&lt;P&gt;average for A in portid 1= (D+E)/2&lt;/P&gt;&lt;P&gt;average for A in portid 2=(B+C)/2&lt;/P&gt;&lt;P&gt;Overall, I have two rules: a company's average return must exclude its own return (as in A) and must only include returns from other portfolios.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope my objective is clearer now. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2012 01:56:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110814#M30742</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-31T01:56:45Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110815#M30743</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As long as your real data is limited to two portfolios I think the code will work.&amp;nbsp; In the case where there is only one portfolio, it simply takes the average.&amp;nbsp; In the case where there are two portfolios, it gets the revenue to removed from the other portfolio.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2012 02:30:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110815#M30743</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-10-31T02:30:11Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL with Large Data Sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110816#M30744</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you for your further explanations. In this project, I have 50 portfolios, but sometimes it gets as high as 400. I should then be careful about the portfolio size.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2012 03:10:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-with-Large-Data-Sets/m-p/110816#M30744</guid>
      <dc:creator>finans_sas</dc:creator>
      <dc:date>2012-10-31T03:10:22Z</dc:date>
    </item>
  </channel>
</rss>

