<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: getting the same result but spend a shorter time in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760814#M240634</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/37107"&gt;@jimbarbour&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good approach.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I may ask:&amp;nbsp; Why use&amp;nbsp;multidata:'y'?&amp;nbsp; It sounded as though we wanted a unique count.&amp;nbsp; Wouldn't&amp;nbsp;multidata:'y' &lt;EM&gt;permit&lt;/EM&gt; duplicates?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/37107"&gt;@jimbarbour&lt;/a&gt;&amp;nbsp;Ooops! Copy/paste error. Yes, certainly not multidata. I've fixed the code.&lt;/P&gt;</description>
    <pubDate>Wed, 11 Aug 2021 04:59:29 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2021-08-11T04:59:29Z</dc:date>
    <item>
      <title>getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760798#M240620</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using the following code,&lt;/P&gt;&lt;PRE&gt;data have1_1985;
	infile cards dsd  dlm=",";
	input
	Applicant_name :$8.
	appln_filing_year: 8.
	appln_id :8.
	cited_appln_id :8.
	pre_patent :8.
	;
	cards;
	A,2021,101,201,1
	B,2021,101,202,
	A,2017,102,203,
	A,2016,103,204,
	B,2018,104,205,
	;;;;
run;
data have2_1985;
	infile cards dsd  dlm=",";
	input
	appln_id_citing :8.
	appln_id_citing_p5 :8.
	;
	cards;
	101,201
	101,203
	102,203
	103,205
	104,205
	;;;;
run;
%MACRO Explo(year=);


PROC SQL;
	CREATE TABLE want1_&amp;amp;year. AS
	SELECT DISTINCT
	citing1.Applicant_name,
	citing1.appln_filing_year,
	citing1.appln_id,
	pre_patent,
	citing1.cited_appln_id,
	citing2.appln_id_citing_p5
	FROM
	have1_&amp;amp;year. (keep= Applicant_name appln_filing_year appln_id cited_appln_id pre_patent)as citing1
	JOIN have2_&amp;amp;year.(keep=appln_id_citing appln_id_citing_p5)  AS citing2 ON citing1.appln_id = citing2.appln_id_citing
	order by citing1.appln_id
	;
QUIT;
data want2_&amp;amp;year.;
	length cited_patent 3.;
	set want1_&amp;amp;year.;
	if cited_appln_id=appln_id_citing_p5 then cited_patent=1;
run; 
PROC SQL;
	CREATE TABLE want3_&amp;amp;year. AS
	SELECT DISTINCT
	Applicant_name,
	appln_filing_year,
	appln_id,
	cited_appln_id,
	pre_patent,
	cited_patent
	FROM want2_&amp;amp;year.
	GROUP BY Applicant_name,appln_id,cited_appln_id
	Having cited_patent=max(cited_patent) 
	;
QUIT;
%MEND Explo;
%Explo(year=1985)

&lt;/PRE&gt;&lt;P&gt;however, these codes spend around 10 hours getting the results. Is there any method to simplify the steps, getting the same result but spend a shorter time?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I expect to get the "Want3" (want1, and want2 are&amp;nbsp;&lt;SPAN&gt;just intermediary)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;many thanks in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 03:32:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760798#M240620</guid>
      <dc:creator>Alexxxxxxx</dc:creator>
      <dc:date>2021-08-11T03:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760799#M240621</link>
      <description>&lt;P&gt;Is your actual have data in SAS or in a database? Are these yearly tables as the sample data indicates? What are the actual data volumes (like number of rows and volume in GB)?&lt;/P&gt;
&lt;P&gt;Do you need all Want tables or only Want3 and the other two tables in your code are just intermediary?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 03:28:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760799#M240621</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2021-08-11T03:28:14Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760801#M240623</link>
      <description>Hi Patrick, thanks for your reply.&lt;BR /&gt;&lt;BR /&gt;the data is current in the SAS. the data is so big that I do not have enough space to get the result, so I split the data by year. By using 1985 year data as an example, have1 has 279295 rows and 5 columns, have2 has 299680568 rows.&lt;BR /&gt;&lt;BR /&gt;I expect to get the "Want3" (want1, and want2 are just intermediary).</description>
      <pubDate>Wed, 11 Aug 2021 03:45:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760801#M240623</guid>
      <dc:creator>Alexxxxxxx</dc:creator>
      <dc:date>2021-08-11T03:45:25Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760803#M240624</link>
      <description>&lt;P&gt;Can you post your log?&amp;nbsp; Exactly which step is it that is taking a long time?&amp;nbsp; The 2nd SQL step?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How many rows does Want1 wind up having?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The first thing that occurs to me is that a WHERE clause will be more efficient than a HAVING with a summary function.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Something like:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;	PROC SQL;
		CREATE TABLE want3_&amp;amp;year. AS
		SELECT DISTINCT
		Applicant_name,
		appln_filing_year,
		appln_id,
		cited_appln_id,
		pre_patent,
		cited_patent
		FROM want2_&amp;amp;year.
		WHERE	cited_patent	&amp;gt;	0
		ORDER BY 
/*		GROUP BY*/
			Applicant_name,appln_id,cited_appln_id
/*		Having cited_patent=max(cited_patent) */
		;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, if there are cases where the max(cited_patent) is going to be zero or missing and you're going to &lt;EM&gt;want&lt;/EM&gt; to have them in the final output, my WHERE suggestion won't work.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 04:34:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760803#M240624</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-11T04:34:53Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760805#M240626</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/262815"&gt;@Alexxxxxxx&lt;/a&gt;&amp;nbsp;Your have1 dataset is actually not that big so it's only the join that blows things up. If reformulating things a bit it might be possible to execute for all years at once. Do you have a year or date column in your actual data which we need to add for joining?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 04:10:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760805#M240626</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2021-08-11T04:10:50Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760809#M240629</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/262815"&gt;@Alexxxxxxx&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;that if there is a way you can get all the data you need without generating so many rows, that will be your best method of reducing run time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One thing I notice, is that it looks like the Data step really isn't necessary.&amp;nbsp; A Case statement in the first SQL procedure appears to produce the same results.&amp;nbsp; Check it and see what you think.&amp;nbsp; This would eliminate having to read every row again in the Data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's the modified SQL (added Case statement):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;	PROC SQL;
		CREATE TABLE want1_&amp;amp;year. AS
		SELECT DISTINCT
		citing1.Applicant_name,
		citing1.appln_filing_year,
		citing1.appln_id,
		pre_patent,
		citing1.cited_appln_id,
		citing2.appln_id_citing_p5
		,CASE	when	cited_appln_id=appln_id_citing_p5 then 1
			ELSE .	END	AS	cited_patent
		FROM
		have1_&amp;amp;year. (keep= Applicant_name appln_filing_year appln_id cited_appln_id pre_patent)as citing1
		JOIN have2_&amp;amp;year.(keep=appln_id_citing appln_id_citing_p5)  AS citing2 ON citing1.appln_id = citing2.appln_id_citing
		order by citing1.appln_id
		;
	QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jimbarbour_1-1628656663713.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/62463i92F5DB54581E8282/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jimbarbour_1-1628656663713.png" alt="jimbarbour_1-1628656663713.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 04:39:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760809#M240629</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-11T04:39:29Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760811#M240631</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/262815"&gt;@Alexxxxxxx&lt;/a&gt;&amp;nbsp;Looking at your have2 dataset and the number of rows I suspect that the two columns you give us are not the primary key and if de-duping volumes get eventually quite a bit lower.&lt;/P&gt;
&lt;P&gt;To test the theory can you please execute below against your real data and tell us if that works (fits into memory) and what number n_obs prints in the log.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  if 0 then set have2_1985;
  dcl hash h1(dataset:"have2_1985");
  h1.defineKey('appln_id_citing','appln_id_citing_p5');
  h1.defineDone();
  n_obs=h1.num_items;
  put n_obs=;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And an added question: Is your original data source from which you create HAVE2 already a SAS table or is this data in a data base? If it's the latter then one should reduce volumes already on the DB side.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below to illustrate my line of thought (fully working code) - but it all comes down if have2 can fit into memory else another coding approach will be required.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1_1985;
  infile cards dsd  dlm=",";
  input
    Applicant_name :$8.
    appln_filing_year: 8.
    appln_id :8.
    cited_appln_id :8.
    pre_patent :8.
  ;
  cards4;
A,2021,101,201,1
B,2021,101,202,
A,2017,102,203,
A,2016,103,204,
B,2018,104,205,
;;;;

data have2_1985;
  infile cards dsd  dlm=",";
  input
    appln_id_citing :8.
    appln_id_citing_p5 :8.
  ;
  cards4;
101,201
101,203
102,203
103,205
104,205
;;;;

%MACRO Explo(year=);

  data _null_;
    if 0 then set have2_&amp;amp;year.;
    dcl hash h1(dataset:"have2_&amp;amp;year.");
    h1.defineKey('appln_id_citing','appln_id_citing_p5');
    h1.defineData('appln_id_citing','appln_id_citing_p5');
    h1.defineDone();
    h1.output(dataset:"have2_dedup_&amp;amp;year.");
    stop;
  run;

  data want3_&amp;amp;year.;
    if _n_=1 then 
      do;
        if 0 then set have1_&amp;amp;year. have2_dedup_&amp;amp;year.;
        dcl hash h1(dataset:" have2_dedup_&amp;amp;year.",multidata:'y');
        h1.defineKey('appln_id_citing');
        h1.defineData('appln_id_citing','appln_id_citing_p5');
        h1.defineDone();
        length cited_patent 3.;
      end;

    set have1_&amp;amp;year.;
    do while(h1.do_over(key:appln_id) eq 0);
      if cited_appln_id=appln_id_citing_p5 then 
        do;
          cited_patent=1;
          output;
          leave;
        end;
    end;
    if cited_patent ne 1 then output;

    keep
      Applicant_name
      appln_filing_year
      appln_id
      cited_appln_id
      pre_patent
      cited_patent
      ;

  run;

  proc sort data=want3_&amp;amp;year. out=want3_&amp;amp;year.;
    by Applicant_name appln_filing_year cited_appln_id;
  run;

%MEND Explo;

%Explo(year=1985)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 11 Aug 2021 05:11:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760811#M240631</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2021-08-11T05:11:39Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760812#M240632</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good approach.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I may ask:&amp;nbsp; Why use&amp;nbsp;multidata:'y'?&amp;nbsp; It sounded as though we wanted a unique count.&amp;nbsp; Wouldn't&amp;nbsp;multidata:'y' &lt;EM&gt;permit&lt;/EM&gt; duplicates?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 04:58:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760812#M240632</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-11T04:58:14Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760814#M240634</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/37107"&gt;@jimbarbour&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good approach.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I may ask:&amp;nbsp; Why use&amp;nbsp;multidata:'y'?&amp;nbsp; It sounded as though we wanted a unique count.&amp;nbsp; Wouldn't&amp;nbsp;multidata:'y' &lt;EM&gt;permit&lt;/EM&gt; duplicates?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/37107"&gt;@jimbarbour&lt;/a&gt;&amp;nbsp;Ooops! Copy/paste error. Yes, certainly not multidata. I've fixed the code.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 04:59:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760814#M240634</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2021-08-11T04:59:29Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760820#M240639</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;however, these codes spend around 10 hours getting the results. Is there any method to simplify the steps, getting the same result but spend a shorter time?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Start by showing what you have by posting proc contents of the original files, then explain what you are trying to achieve. And you should post some details about the hardware you are using.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 05:31:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760820#M240639</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-08-11T05:31:25Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760866#M240663</link>
      <description>&lt;P&gt;If the example data represents your real data correctly, the first SQL will result in a (hidden) cartesian join that can easily blow up on you anytime. The two 101s in both datasets result in 4 observations in the output.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So you should get to know why there are repeats in both datasets.&lt;/P&gt;
&lt;P&gt;Next, the use of the DISTINCT clause is always a big performance-eater and should only be done when &lt;U&gt;&lt;EM&gt;absolutely necessary&lt;/EM&gt;&lt;/U&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you actually use DISTINCT to dedupe the result, then it is much better to dedupe before doing the join.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 09:05:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760866#M240663</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-08-11T09:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760889#M240675</link>
      <description>&lt;P&gt;1. Why do you&lt;/P&gt;
&lt;PRE&gt;order by citing1.appln_id&lt;/PRE&gt;
&lt;P&gt;when you never use that order?&lt;/P&gt;
&lt;P&gt;2. What is the HAVING supposed to achieve when the value is either missing or 1?&lt;/P&gt;
&lt;P&gt;3. As mentioned, is the DISTINCT keyword useful? Is there a DQ issue with the data?&lt;/P&gt;
&lt;P&gt;In any case this generates exactly the same result on your data and should be much faster:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
	CREATE TABLE want AS
	SELECT unique
	Applicant_name,
	appln_filing_year,
	appln_id,
	cited_appln_id,
	pre_patent,
	case when cited_appln_id=appln_id_citing_p5 then 1 end as cited_patent
	FROM (
		SELECT  
		citing1.Applicant_name,
		citing1.appln_filing_year,
		citing1.appln_id,
		pre_patent,
		citing1.cited_appln_id,
		citing2.appln_id_citing_p5
		FROM
        have1_&amp;amp;year.       as citing1
		JOIN have2_&amp;amp;year.  as citing2 
		ON citing1.appln_id = citing2.appln_id_citing
	)
	GROUP BY Applicant_name,appln_id,cited_appln_id
	HaViNg cited_patent=max(cited_patent) 
	;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 11:12:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760889#M240675</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-08-11T11:12:52Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760890#M240676</link>
      <description>&lt;P&gt;This also generates the same results and should be even faster:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
	CREATE TABLE t AS
	SELECT unique * 
	FROM (
		SELECT  
			citing1.Applicant_name,
			citing1.appln_filing_year,
			citing1.appln_id,
			pre_patent,
			citing1.cited_appln_id,
			case when cited_appln_id=appln_id_citing_p5 then 1 end as cited_patent
		FROM have1_&amp;amp;year.  as citing1
		JOIN have2_&amp;amp;year.  as citing2 
		ON citing1.appln_id = citing2.appln_id_citing
	)
	GROUP BY Applicant_name,appln_id,cited_appln_id
	Having cited_patent=max(cited_patent) 
	;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 11:20:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760890#M240676</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-08-11T11:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760943#M240706</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Select&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;Unique&lt;/STRONG&gt;&lt;/EM&gt;?&amp;nbsp; I know it works in Oracle, but is that valid in SAS?&amp;nbsp; Is its behavior any different than Select Distinct?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;H4 class="xis-argument"&gt;DISTINCT&lt;/H4&gt;
&lt;DIV class="xis-argumentDescription"&gt;
&lt;P class="xis-paraSimpleFirst"&gt;eliminates duplicate rows. The DISTINCT argument is identical to UNIQUE.&lt;/P&gt;
&lt;TABLE class="xis-summary"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="xis-alias"&gt;Alias&lt;/TD&gt;
&lt;TD class="xis-summaryText"&gt;UNIQUE&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD rowspan="2" class="xis-summaryNote"&gt;Notes&lt;/TD&gt;
&lt;TD class="xis-summaryText"&gt;Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard.&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Well, I'll be darned.&amp;nbsp; Learn something new every day.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 15:45:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760943#M240706</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-08-11T15:45:44Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760963#M240714</link>
      <description>&lt;P&gt;The best answer may depend on whether there is a known, sorted order to have2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is an approach to consider.&amp;nbsp; I'm not sure if it's feasible, because I'm not sure how you want to handle duplicate entries.&amp;nbsp; But there should be a few posters on this thread who will take the idea and run with it.&amp;nbsp; (Sorry, I can't spend enough time to look up the details.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Rather than creating a hash table, create an informat.&amp;nbsp; The advantage:&amp;nbsp; The hash table has to process the 300M records for each run (i.e., for each year).&amp;nbsp; But an informat can be created once, and permanently saved.&amp;nbsp; It also forces you to clean out any duplicates from HAVE2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once that is done, you would be appending to just a smaller (300K records) data set and the processing should be swift.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Aug 2021 16:32:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/760963#M240714</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-08-11T16:32:42Z</dc:date>
    </item>
    <item>
      <title>Re: getting the same result but spend a shorter time</title>
      <link>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/761048#M240754</link>
      <description>&lt;P&gt;Unsure how &lt;EM&gt;unique&lt;/EM&gt; became my go-to syntax rather than &lt;EM&gt;distinct&lt;/EM&gt;. Easier to type maybe? Yes, they are the same.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Aug 2021 00:40:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/getting-the-same-result-but-spend-a-shorter-time/m-p/761048#M240754</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-08-12T00:40:06Z</dc:date>
    </item>
  </channel>
</rss>

