<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS left join via proc sql in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751474#M29684</link>
    <description>&lt;P&gt;I'm a bit confused at the final goal of the merge.&amp;nbsp; Is it to find out how many patients in ClientList have MainPlatform of XOZ?&amp;nbsp; There's a couple of other strategies I'd use if so:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The next example subsets the bigger dataset with an inline view that only grabs the two variables you're using and takes only the unique combinations of them prior to merging.&amp;nbsp; Again I don't use big data so I don't know if memory wise this is worse or not.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
CREATE TABLE merged_table AS
SELECT
t1.ID,
t1.Name,
t1.Location,
t2.MainPlatform
FROM ClientList as t1
LEFT JOIN 
  (select distinct ID, MainPlatform from MainPlatform 
        WHERE MainPlatform = 'XOZ') as t2
ON t1.ID = t2.ID;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The next example would add the MainPlatform variable to your ClientList dataset and then updates it wherever the ID variable is in MainPlatform instead of merging:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
ALTER TABLE merged_table 
    ADD MainPlatform char(3);
UPDATE merged_table
    set MainPlatform='XOZ'
    where ID in (select ID from MainPlatform where MainPlatform='XOZ');
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Neither of these examples help as much if the number of XOZ rows in MainPlatform matter for a particular ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 01 Jul 2021 14:58:26 GMT</pubDate>
    <dc:creator>JeffMeyers</dc:creator>
    <dc:date>2021-07-01T14:58:26Z</dc:date>
    <item>
      <title>SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751463#M29680</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to join/match a table (about 10,000 rows) to a table with about 5 million rows. It has been running for hours and I'm afraid I may have done something to prevent this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any best practice tips?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jul 2021 14:16:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751463#M29680</guid>
      <dc:creator>rapt1</dc:creator>
      <dc:date>2021-07-01T14:16:07Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751465#M29681</link>
      <description>I don't deal with big data, but I would think if you're doing a big join you could take a smaller sample (1000pts?) Of the big dataset to try it on if you're not confident. &lt;BR /&gt;What is the join code you used? Did you join on enough variables?</description>
      <pubDate>Thu, 01 Jul 2021 14:24:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751465#M29681</guid>
      <dc:creator>JeffMeyers</dc:creator>
      <dc:date>2021-07-01T14:24:21Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751467#M29682</link>
      <description>&lt;P&gt;Here's the code:&lt;BR /&gt;&lt;BR /&gt;PROC SQL;&lt;BR /&gt;&lt;BR /&gt;CREATE TABLE merged_table AS&lt;BR /&gt;SELECT&lt;BR /&gt;t1.ID,&lt;BR /&gt;t1.Name,&lt;BR /&gt;t1.Location,&lt;BR /&gt;t2.MainPlatform&lt;BR /&gt;FROM ClientList as t1&lt;BR /&gt;LEFT JOIN MainPlatform as t2&lt;BR /&gt;ON t1.ID = t2.ID&lt;BR /&gt;WHERE t2.MainPlatform = 'XOZ';&lt;BR /&gt;QUIT;&lt;BR /&gt;&lt;BR /&gt;so like, T1 had only 10,000 points but T2 had around 5 million&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jul 2021 14:33:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751467#M29682</guid>
      <dc:creator>rapt1</dc:creator>
      <dc:date>2021-07-01T14:33:34Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751474#M29684</link>
      <description>&lt;P&gt;I'm a bit confused at the final goal of the merge.&amp;nbsp; Is it to find out how many patients in ClientList have MainPlatform of XOZ?&amp;nbsp; There's a couple of other strategies I'd use if so:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The next example subsets the bigger dataset with an inline view that only grabs the two variables you're using and takes only the unique combinations of them prior to merging.&amp;nbsp; Again I don't use big data so I don't know if memory wise this is worse or not.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
CREATE TABLE merged_table AS
SELECT
t1.ID,
t1.Name,
t1.Location,
t2.MainPlatform
FROM ClientList as t1
LEFT JOIN 
  (select distinct ID, MainPlatform from MainPlatform 
        WHERE MainPlatform = 'XOZ') as t2
ON t1.ID = t2.ID;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The next example would add the MainPlatform variable to your ClientList dataset and then updates it wherever the ID variable is in MainPlatform instead of merging:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
ALTER TABLE merged_table 
    ADD MainPlatform char(3);
UPDATE merged_table
    set MainPlatform='XOZ'
    where ID in (select ID from MainPlatform where MainPlatform='XOZ');
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Neither of these examples help as much if the number of XOZ rows in MainPlatform matter for a particular ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jul 2021 14:58:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751474#M29684</guid>
      <dc:creator>JeffMeyers</dc:creator>
      <dc:date>2021-07-01T14:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751487#M29686</link>
      <description>&lt;P&gt;You're correct in the overall goal of the code block. I need to determine the subset of patients in the ClientList that XOZ MainPlatforms (which can be found in the other table).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think those are good alternatives. Honestly, I forgot what an inline view is and may have to review it again. I've read about inline before I am thinking the first one might do the trick. The second one is entirely new for me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have just been learning SAS for a couple months and now is the only time I have to use it in practice. Guess I am overwhelmed at how it actually works on a larger data set as opposed to just using sashelp tables&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jul 2021 15:31:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751487#M29686</guid>
      <dc:creator>rapt1</dc:creator>
      <dc:date>2021-07-01T15:31:16Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751655#M29699</link>
      <description>&lt;P&gt;5 million records doesn't sound very large, so unless you have a very slow system, it seems that PROC SQL is not able to optimize the query at all.&lt;/P&gt;
&lt;P&gt;And it's all about the data. Do you have any duplicates on ID in either table? If so, how many?&lt;/P&gt;
&lt;P&gt;Inner joins is easier to optimize, so figure out if you really need a left join.&lt;/P&gt;
&lt;P&gt;Indexing on ID &amp;lt;might&amp;gt; help (index join).&lt;/P&gt;
&lt;P&gt;Indexing on MainPlatform might help, especially if it has high number of discrete values.&lt;/P&gt;
&lt;P&gt;To understand how PROC SQL is doing, try add these options:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;optoins msglevel=i fullstimer;
proc sql _method _tree;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You can also paly with buffersize option to try to get a hash join (since your t1 table has so few rows) and magic= (influences which join algorithm PROC SQL will use).&lt;/P&gt;</description>
      <pubDate>Fri, 02 Jul 2021 08:39:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/751655#M29699</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2021-07-02T08:39:53Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752072#M29731</link>
      <description>Thank you for this alternative. Will try to read into these additional options further.</description>
      <pubDate>Mon, 05 Jul 2021 12:15:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752072#M29731</guid>
      <dc:creator>rapt1</dc:creator>
      <dc:date>2021-07-05T12:15:34Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752076#M29732</link>
      <description>&lt;P&gt;Whenever you need to subset a large dataset along the data in a smaller dataset, consider using a data step and a hash object:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data merged_table;
set mainplatform;
where mainplatform = " XOZ";
if _n_ = 1
then do;
  length
    name
    location /* set the lengths as needed */
  ;
  declare hash t1 (dataset:"clientlist");
  t1.definekey("id");
  t1.definedata("name","location");
  t1.definedone();
  call missing(name,location);
end;
if t1.find() = 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Caution: this will only work correctly if id is unique in clientlist.&lt;/P&gt;
&lt;P&gt;The code is not tested in any way.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jul 2021 12:30:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752076#M29732</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-07-05T12:30:18Z</dc:date>
    </item>
    <item>
      <title>Re: SAS left join via proc sql</title>
      <link>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752080#M29733</link>
      <description>&lt;P&gt;Thanks Kurt! and Thanks everyone for their responses. There is a lot to learn!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jul 2021 12:39:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/SAS-left-join-via-proc-sql/m-p/752080#M29733</guid>
      <dc:creator>rapt1</dc:creator>
      <dc:date>2021-07-05T12:39:51Z</dc:date>
    </item>
  </channel>
</rss>

