BookmarkSubscribeRSS Feed
France
Quartz | Level 8

Dear all,

 

When I run the code below,

 

PROC SQL;
   CREATE TABLE Step4.M_3cit  AS
      SELECT
       t_base.HRM_L2_cited,
       t_base.earliest_filing_month,
       t_base.earliest_publn_date,
	   t_do.cited_docdb_family_id,
       t_do.docdb_family_id,
       t_ap2.appln_id,
       t_ap2.pat_publn_id,
       t_ap2.publn_date,
       t_ap2.HRM_L2_citing
      FROM
        Pat_ori.Docdbfamiliescitations AS t_do
        JOIN Step3.M_2cit AS t_base ON t_do.cited_docdb_family_id=t_base.docdb_family_id_base
        JOIN Step1.appln_publn AS t_ap2 ON t_do.docdb_family_id=t_ap2.docdb_family_id
      WHERE t_ap2.publn_date <= intnx('year', t_base.earliest_publn_date,3) /*a 3-year moving window starting at the earliest publication date of a patent family */
      AND t_ap2.publn_date >= intnx('year', t_base.earliest_publn_date,0)
      GROUP BY HRM_L2_cited,cited_docdb_family_id,t_do.Docdb_family_id
      Having sum(HRM_L2_cited = HRM_L2_citing)>0 /* exclude self_citation */
	  ORDER BY t_base.HRM_L2_cited
;
QUIT;

I get the 

320  PROC SQL;
321     CREATE TABLE Step4.M_3cit  AS
322        SELECT
323         t_base.HRM_L2_cited,
324         t_base.earliest_filing_month,
325         t_base.earliest_publn_date,
326         t_do.cited_docdb_family_id,
327         t_do.docdb_family_id,
328         t_ap2.appln_id,
329         t_ap2.pat_publn_id,
330         t_ap2.publn_date,
331         t_ap2.HRM_L2_citing
332        FROM
333          Con_ori.Docdbfamiliescitations AS t_do
334          JOIN Step3.M_2cit AS t_base ON t_do.cited_docdb_family_id=t_base.docdb_family_id_base
335          JOIN Step1.appln_publn AS t_ap2 ON t_do.docdb_family_id=t_ap2.docdb_family_id
336        WHERE t_ap2.publn_date <= intnx('year', t_base.earliest_publn_date,3) /*a 3-year moving
336! window starting at the earliest publication date of a patent family */
337        AND t_ap2.publn_date >= intnx('year', t_base.earliest_publn_date,0)
338        GROUP BY HRM_L2_cited,cited_docdb_family_id,t_do.Docdb_family_id
339        Having sum(HRM_L2_cited = HRM_L2_citing)>0 /* exclude self_citation */
340        ORDER BY t_base.HRM_L2_cited
341  ;
NOTE: The query requires remerging summary statistics back with the original data.
ERROR: Sort execution failure.

342  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           2:10:07.98
      cpu time            19:42.26

How can I overcome this problem? Could you please give me some suggestion about this ?

thanks in advance

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Have a read here and here.

Bottom line: not enough resources on the computer.

Use the _METHOD option to know more about the joins.

Workarounds:

- Presort the tables

- Break the query into 2 queries.

- Increase available RAM or disk space

 

 

Reeza
Super User
Data is too big to be worked on at once. you're doing cross joins so that's not too surprising. Try breaking it up into simpler steps and/or changing your WHERE to ON conditions instead. And remove the ORDER BY, since you have GROUP BY it should be the same.
France
Quartz | Level 8

Daer Reeza,

Could you please tell me how to change WHERE to ON conditions?

 

 

Besides, I simplify the code and run the following codes,

 

PROC SQL;
   CREATE TABLE Step4.M_3cit1 AS
      SELECT
       t_do.HRM_L2_cited,
       t_do.earliest_filing_month,
       t_do.earliest_publn_date,
	   t_do.cited_docdb_family_id,
       t_do.docdb_family_id,
       t_ap2.appln_id,
       t_ap2.pat_publn_id,
       t_ap2.publn_date,
       t_ap2.HRM_L2_citing
      FROM
        step5.M_3cit_step1 AS t_do
        JOIN Step1.appln_publn AS t_ap2 ON t_do.docdb_family_id=t_ap2.docdb_family_id
      WHERE t_ap2.publn_date <= intnx('year', t_do.earliest_publn_date,3) /*a 3-year moving window starting at the earliest publication date of a patent family */
      AND t_ap2.publn_date >= intnx('year', t_do.earliest_publn_date,0)
      GROUP BY HRM_L2_cited,t_do.cited_docdb_family_id,t_do.Docdb_family_id
      Having sum(HRM_L2_cited = HRM_L2_citing)>0 /* exclude self_citation */
;
QUIT;

and the get following results,

 

238  PROC SQL;
239     CREATE TABLE Step4.M_3cit1 AS
240        SELECT
241         t_do.HRM_L2_cited,
242         t_do.earliest_filing_month,
243         t_do.earliest_publn_date,
244         t_do.cited_docdb_family_id,
245         t_do.docdb_family_id,
246         t_ap2.appln_id,
247         t_ap2.pat_publn_id,
248         t_ap2.publn_date,
249         t_ap2.HRM_L2_citing
250        FROM
251          step5.M_3cit_step1 AS t_do
252          JOIN Step1.appln_publn AS t_ap2 ON t_do.docdb_family_id=t_ap2.docdb_family_id
253        WHERE t_ap2.publn_date <= intnx('year', t_do.earliest_publn_date,3) /*a 3-year moving
253! window starting at the earliest publication date of a patent family */
254        AND t_ap2.publn_date >= intnx('year', t_do.earliest_publn_date,0)
255        GROUP BY HRM_L2_cited,t_do.cited_docdb_family_id,t_do.Docdb_family_id
256        Having sum(HRM_L2_cited = HRM_L2_citing)>0 /* exclude self_citation */
257  ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Invalid argument 2 to function INTNX. Missing values may be generated.
NOTE: Invalid argument 2 to function INTNX. Missing values may be generated.
ERROR: User asked for termination.

258  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           18.27 seconds
      cpu time            0.78 seconds

closing clipboard failed

why there are

'NOTE: Invalid argument 2 to function INTNX. Missing values may be generated.
NOTE: Invalid argument 2 to function INTNX. Missing values may be generated.' 

 

Could you please give me some suggestion about this?

thanks in advance.

 

 

Reeza
Super User
Change WHERE too AND.

If you have missing values in your data, then INTNX will throw a warning. you can add a CASE statement to not do that calculation when you have missing data.
Kurt_Bremser
Super User

Things like that were the reason for writing Maxim 10.

Break your task down into several steps, and use data step merging wherever you do not need cartesian joins.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 5065 views
  • 5 likes
  • 4 in conversation