<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Old SAS proc sql question needs explanation in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679649#M205246</link>
    <description>Somehow the "red" option does not work in the "SAS code" area, and I can't edit my own post.&lt;BR /&gt;The part I don't understand is this:&lt;BR /&gt;(select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode=P2.Jobcode) as Avg&lt;BR /&gt;</description>
    <pubDate>Thu, 27 Aug 2020 05:48:50 GMT</pubDate>
    <dc:creator>happy_sas_kitty</dc:creator>
    <dc:date>2020-08-27T05:48:50Z</dc:date>
    <item>
      <title>Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679648#M205245</link>
      <description>&lt;P&gt;This question has been asked before but I did not understand the explanation.&lt;/P&gt;&lt;P&gt;Old link: &lt;A href="https://communities.sas.com/t5/General-SAS-Programming/PROC-SQL/td-p/314917" target="_blank"&gt;https://communities.sas.com/t5/General-SAS-Programming/PROC-SQL/td-p/314917&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The data is here:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.pilots;
infile datalines;
input id name $ Jobcode $ Salary;
datalines;
001 Albert PT1 50000
002 Brenda PT1 70000
003 Carl PT1 60000
004 Donna PT2 80000
005 Edward PT2 90000
006 Flora PT3 100000
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;We want to get a result that looks like this:&lt;/P&gt;&lt;P&gt;&lt;SPAN class="fontstyle0"&gt;Jobcode Salary Avg&lt;BR /&gt;------- ------ -----&lt;BR /&gt;PT1 50000 60000&lt;BR /&gt;PT1 70000 60000&lt;BR /&gt;PT1 60000 60000&lt;BR /&gt;PT2 80000 85000&lt;BR /&gt;PT2 90000 85000&lt;BR /&gt;PT3 100000 100000&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;It seems that the results come from average(salary) by group, ordered by id.&lt;/P&gt;&lt;P&gt;Why does the code below work? The part I don't understand is highlighted in red.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;&lt;SPAN class="fontstyle0"&gt;select&lt;BR /&gt;Jobcode,&lt;/SPAN&gt;&lt;SPAN class="fontstyle2"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN class="fontstyle0"&gt;Salary,&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;(select avg(Salary) &lt;/FONT&gt;&lt;FONT color="#FF0000"&gt;from WORK.PILOTS as P1&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;where P1.Jobcode=P2.Jobcode&lt;/FONT&gt;) as Avg&lt;BR /&gt;from WORK.PILOTS as P2&lt;BR /&gt;order by Id&lt;BR /&gt;;&lt;/SPAN&gt; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here's what I'm thinking about the red subquery:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select avg(salary) from work.pilots as p1&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This gives us 1 value, the average of all salaries in p1. The result has no jobcode column, so the next line, "where p1.jobcode = p2.jobcode", should not work.&lt;/P&gt;&lt;P&gt;This would make more sense:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select avg(Salary) from WORK.PILOTS as P1&lt;BR /&gt;group by jobcode&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, this code, although it gives us 3 averages for 3 jobcodes, does not have a one-to-one relationship with pilots.id.&lt;/P&gt;&lt;P&gt;Can anyone explain step by step how the code marked in red works?&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 05:45:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679648#M205245</guid>
      <dc:creator>happy_sas_kitty</dc:creator>
      <dc:date>2020-08-27T05:45:53Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679649#M205246</link>
      <description>Somehow the "red" option does not work in the "SAS code" area, and I can't edit my own post.&lt;BR /&gt;The part I don't understand is this:&lt;BR /&gt;(select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode=P2.Jobcode) as Avg&lt;BR /&gt;</description>
      <pubDate>Thu, 27 Aug 2020 05:48:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679649#M205246</guid>
      <dc:creator>happy_sas_kitty</dc:creator>
      <dc:date>2020-08-27T05:48:50Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679650#M205247</link>
      <description>&lt;P&gt;The part in parentheses is a correlated subquery: it is evaluated for each row of the outer query, so for jobcode PT1 it selects the average salary of the first 3 rows (the ones having that jobcode).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So for those rows, the subquery is equivalent to&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode='PT1'&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;which returns a single value - and so on for the other jobcodes.&lt;/P&gt;
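&lt;P&gt;As a minimal sketch of that substitution (assuming the WORK.PILOTS table from the question), these three standalone queries are what the subquery amounts to for each jobcode. Each one returns exactly one value:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* what the subquery amounts to while P2.Jobcode is 'PT1' */
  select avg(Salary) as Avg from WORK.PILOTS as P1 where P1.Jobcode='PT1';
  /* ... while P2.Jobcode is 'PT2' */
  select avg(Salary) as Avg from WORK.PILOTS as P1 where P1.Jobcode='PT2';
  /* ... while P2.Jobcode is 'PT3' */
  select avg(Salary) as Avg from WORK.PILOTS as P1 where P1.Jobcode='PT3';
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the sample data these should correspond to the Avg values in your expected result: 60000, 85000 and 100000.&lt;/P&gt;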
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your GROUP BY suggestion would not work in such a subquery, as it would not return a single row.&amp;nbsp;&lt;/P&gt;
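&lt;P&gt;A GROUP BY can still be used in a portable way if it goes into an inline view that is joined back to the detail rows. This is only a sketch, assuming the WORK.PILOTS table from the question; the aliases P and G are made up for the example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select
    P.Jobcode,
    P.Salary,
    G.Avg
  from WORK.PILOTS as P
    inner join
    /* one row per jobcode, holding that jobcode's average */
    (select Jobcode, avg(Salary) as Avg
       from WORK.PILOTS
       group by Jobcode) as G
    on P.Jobcode = G.Jobcode
  order by P.Id;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here the grouped query may return several rows, because it is used as a table in the FROM clause and not as a single value in the SELECT list.&lt;/P&gt;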
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could, in SAS SQL, use a GROUP BY to get the same result like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select
    ID,
    Jobcode,
    Salary,
    avg(Salary) as avg 
    from WORK.PILOTS 
    group by Jobcode
    order by Id; 
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The reason it is not done like that is probably that this type of query would not work in most other SQL dialects, whereas the subquery solution would.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 06:20:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679650#M205247</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2020-08-27T06:20:32Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679651#M205248</link>
      <description>&lt;P&gt;It is a sub-select that calculates the avg for the jobcode currently encountered in the main SELECT.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 06:30:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679651#M205248</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-27T06:30:06Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679652#M205249</link>
      <description>&lt;P&gt;The sub-select is necessary in SQL dialects that do not allow re-merging. Since SAS SQL does allow this, the simpler code presented by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/76464"&gt;@s_lassen&lt;/a&gt;&amp;nbsp;will solve your issue.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 06:33:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679652#M205249</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-27T06:33:34Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679658#M205253</link>
      <description>Thank you for the fast reply. I somehow still don't understand what's going on.&lt;BR /&gt;How did&lt;BR /&gt;(select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode=P2.Jobcode) as Avg&lt;BR /&gt;transform into this:&lt;BR /&gt;select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode='PT1'&lt;BR /&gt;I probably need a step-by-step explanation.&lt;BR /&gt;&lt;BR /&gt;After reading your reply, my guess is that the whole thing happens like this:&lt;BR /&gt;PROC SQL runs row by row on the outer query.&lt;BR /&gt;It sees the "from pilots as p2" first, so it sets up the row(pdv?) id, jobcode, salary, avg and sets all to missing.&lt;BR /&gt;On the 1st row, it loads id, jobcode, salary from p2. It sees column Avg and tries to calculate it.&lt;BR /&gt;Avg is calculated from: (select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode=P2.Jobcode)&lt;BR /&gt;Since we are currently loading from p2 (because of the "from p2" clause), we see that p2.jobcode on this 1st row is 'PT1'.&lt;BR /&gt;Then SAS looks at "select avg(Salary) from WORK.PILOTS as P1 where P1.Jobcode='PT1'" and returns the avg value for jobcode='PT1', which is one single value, to the avg column.&lt;BR /&gt;If I get this correctly, for jobcode='PT1', sas calculated this avg(salary) 3 times?&lt;BR /&gt;</description>
      <pubDate>Thu, 27 Aug 2020 07:25:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679658#M205253</guid>
      <dc:creator>happy_sas_kitty</dc:creator>
      <dc:date>2020-08-27T07:25:55Z</dc:date>
    </item>
    <item>
      <title>Re: Old SAS proc sql question needs explanation</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679661#M205254</link>
      <description>&lt;P&gt;You are mostly correct.&lt;/P&gt;
&lt;P&gt;What is wrong is this:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"It sees the "from pilots as p2" first, so it sets up the row(pdv?) id, jobcode, salary, avg and sets all to missing."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;That is what a data step does. In SQL, the values are always taken from the incoming data; missing values occur only&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN&gt;if missing values are present in an incoming table&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;or if, in an outer (left/right/full) join, one of the contributing tables has no match; its values will then be missing&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;But that's a minor detail.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"If I get this correctly, for jobcode='PT1', sas calculated this avg(salary) 3 times?"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;SAS SQL does some optimization behind the scenes, so (that's a guess) it should see this construct and summarize once, like this:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;create table _temp_ as
  select
    jobcode,
    avg(salary) as avg
  from work.pilots
  group by jobcode
;
create table want as
  select
    p.Jobcode,
    p.Salary,
    t.avg
  from work.pilots p left join _temp_ t
  on p.jobcode = t.jobcode
;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;where _temp_ stands for a "hidden" table that exists only for the duration of the query.&lt;/P&gt;
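&lt;P&gt;One way to check this instead of guessing, assuming your SAS release accepts the sparsely documented _METHOD option of PROC SQL, is to have the chosen execution plan written to the log. A sketch using the query from the question:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* _method asks PROC SQL to write its execution plan to the log */
proc sql _method;
  select
    Jobcode,
    Salary,
    (select avg(Salary) from WORK.PILOTS as P1
       where P1.Jobcode=P2.Jobcode) as Avg
  from WORK.PILOTS as P2
  order by Id;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The plan is printed as a tree of internal step names, so reading it takes some practice, and what it shows can depend on the SAS version.&lt;/P&gt;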
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;But you may be right, and code like this could be (considerably) less performant than a separate summary:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data pilots;
infile datalines;
input id name $ Jobcode $ Salary;
datalines;
001 Albert PT1 50000
002 Brenda PT1 70000
003 Carl PT1 60000
004 Donna PT2 80000
005 Edward PT2 90000
006 Flora PT3 100000
;

proc summary data=pilots nway;
class jobcode;
var salary;
output
  out=avg (keep=jobcode avg)
  mean(salary)=avg
;
run;

data want;
set pilots;
if _n_ = 1
then do;
  length avg 8;
  declare hash a (dataset:"avg");
  a.definekey("jobcode");
  a.definedata("avg");
  a.definedone();
end;
if a.find() ne 0 then avg = .;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;This method avoids any sorting (all "sorted" tables are built solely in memory) and should be among the most performant ways to tackle your issue.&lt;/SPAN&gt;&lt;/P&gt;
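&lt;P&gt;To check that the hash lookup gives the same averages as the SQL approach, here is a rough sketch. It assumes the PILOTS and WANT tables created above; WANT_SQL is a name made up for this example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* same result via re-merging, kept in the original row order */
  create table want_sql as
    select Id, Name, Jobcode, Salary, avg(Salary) as avg
    from pilots
    group by Jobcode
    order by Id;
quit;

/* compares the two tables observation by observation */
proc compare base=want compare=want_sql;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the two approaches agree, PROC COMPARE reports no unequal values.&lt;/P&gt;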
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Aug 2020 08:07:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Old-SAS-proc-sql-question-needs-explanation/m-p/679661#M205254</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-27T08:07:23Z</dc:date>
    </item>
  </channel>
</rss>

