PROC SQL delete not exists .. remergance .. why?

Reply
Occasional Contributor
Posts: 5

PROC SQL delete not exists .. remergance .. why?

Hi, my question is about the sas log NOTE: The query requires remerging summary statistics back with the original data

 

My application uses proc sql to delete duplicate rows, and I've developed code that works. See below steps1,2 and 3

I added Step2 explicitly to use recno as the reference for deletion, and is actually not relevent is Step3 .. more to come

 

My concern is that other better solutions didn't work and i'm don't understand how SAS interprets remergence.

Here is another step3A solution which SAS did not accept because of remergence. Why? Recno is unique on each record. The MIN(sumAllPeriod) reduces the source data on the BY GROUP. I tried adding an index for recno but no luck. This kind of SQL operation works in other SQL engines

 

Thanks for any explanation

 

Step 3A

 PROC SQL UNDO_POLICY=NONE;
  DELETE FROM ciwork.claimshist_cat_x2 AS X
  WHERE NOT EXISTS (SELECT 1
   FROM (SELECT recno, MIN(sumAllPeriod)
       FROM ciwork.claimshist_cat_x2
      GROUP BY correlationId, t_s_date, policyCoverageId) AS Y      
   WHERE X.recno = Y.recno);
 QUIT;

 

Step 1 :

 PROC SQL;
  CREATE TABLE ciwork.claimshist_cat_x1 AS
   SELECT a.*,
      b.numClmsYr4Low,b.numClmsYr4High,b.numClmsYr3Low,b.numClmsYr3High,
      b.numClmsYr2Low,b.numClmsYr2High,b.numClmsYr1Low,b.numClmsYr1High, b.claimsHistCat,
      SUM(b.numClmsYr1High,b.numClmsYr2High,b.numClmsYr3High,b.numClmsYr4High) -
      SUM(b.numClmsYr1Low,b.numClmsYr2Low,b.numClmsYr3Low,b.numClmsYr4Low) AS sumAllPeriod
    FROM ciwork.wrkcmp_credible_risk1 AS a INNER JOIN ciprd0.Tcper_claimshist_cat AS b   
      ON a.rateset = b.rateset
        AND a.t_s_date >= b.effectiveDate
        AND a.p_effTimestamp >= b.changeTimestamp
        AND (a.t_s_date <= b.effectiveEndDate
         OR  a.p_effTimestamp < b.endDateTimestamp)
     AND (a.numClaimsMade_1 BETWEEN b.numClmsYr1Low AND b.numClmsYr1High)
     AND (a.numClaimsMade_2 BETWEEN b.numClmsYr2Low AND b.numClmsYr2High)
     AND (a.numClaimsMade_3 BETWEEN b.numClmsYr3Low AND b.numClmsYr3High)
     AND (a.numClaimsMade_4 BETWEEN b.numClmsYr4Low AND b.numClmsYr4High);
 QUIT;

 

Step 2

 DATA ciwork.claimshist_cat_x2 (index=(recno));
  attrib recno length = 5;
  SET ciwork.claimshist_cat_x1 (drop=numClmsYrSmiley Happy;
  recno = _N_;
 RUN;

 

Step 3

 PROC SQL UNDO_POLICY=NONE;
  DELETE FROM ciwork.claimshist_cat_x2 AS X
  WHERE NOT EXISTS (SELECT 1
   FROM (SELECT correlationId, policyCoverageId, t_s_date, MIN(sumAllPeriod) AS minSumAllPeriod
       FROM ciwork.claimshist_cat_x2
      GROUP BY correlationId, t_s_date, policyCoverageId) AS Y      
   WHERE X.correlationId = Y.correlationId
    AND X.policyCoverageId = Y.policyCoverageId
    AND X.t_s_date = Y.t_s_date
    AND X.sumAllPeriod = Y.minSumAllPeriod);
 QUIT;

 

 

 PROC SQL UNDO_POLICY=NONE;
  DELETE FROM ciwork.claimshist_cat_x2 AS X
  WHERE NOT EXISTS (SELECT 1
   FROM (SELECT recno, MIN(sumAllPeriod) AS minSumAllPeriod
       FROM ciwork.claimshist_cat_x2
      GROUP BY correlationId, t_s_date, policyCoverageId) AS Y      
   WHERE X.recno = Y.recno);
 QUIT;

PROC Star
Posts: 1,674

Re: PROC SQL delete not exists .. remergance .. why?

Yes SAS issues this message

WARNING: This statement recursively references the target table. A consequence of this is a possible data integrity problem.

when you use the same table to be created /modified and as a source.

 

The standard tech support answer is:

Because SAS does not fully adhere to the ANSI standards it was necessary for that Warning to be generated by Proc SQL.

 

This frankly doesn't mean much, and it is utter non-sense in the case of a create statement.

 

In the case of a delete statement, it makes more sense since you are deleting records as you are assessing which records to delete.

So some unexpected results can <possibly> take place.

 

 

 

 

 

Super User
Posts: 19,171

Re: PROC SQL delete not exists .. remergance .. why?

 

It issue the error when there's a variable on the select clause not on the group clause. 

In general, other instances of SQL don't allow this AFAIK. 

 

Add rec_no to the group by to remove the message, your where clause is outside of the subquery so in my opinion it's fair to issue the warning.

 

I think this is the code causing the issue, though I think you're aware of that.

SELECT recno, MIN(sumAllPeriod)
       FROM ciwork.claimshist_cat_x2
      GROUP BY correlationId, t_s_date, policyCoverageId

 

 

 

Respected Advisor
Posts: 4,821

Re: PROC SQL delete not exists .. remergance .. why?

The principle is a follows. When a query with a GROUP BY clause selects only group-by variables, constant expressions and summary expressions, there is no remerging. If the query selects a non-group-by variable or expression, SAS/SQL remerges the summary expressions into the original data.

Automatic remerging is a powerful feature of SAS/SQL not present in every implementation of SQL.

 

Compare 

 

proc sql;
select 
    catx(" - ", make, origin) as Company, 
    mean(invoice) as meanPrice
from sashelp.cars
group by make;
quit;

with

 

proc sql;
select 
    catx(" - ", make, origin) as Company, 
    mean(invoice) as meanPrice
from sashelp.cars
group by make, origin;
quit;
PG
Occasional Contributor
Posts: 5

Re: PROC SQL delete not exists .. remergance .. why?

Hi,

 

I get that the result set of a SUM operation is a new set of records

However, the result set of a MIN or MAX operation are records that exist in the source, so non-groupby variables should be allowed in the result set

 

Regards Peter

Super User
Posts: 19,171

Re: PROC SQL delete not exists .. remergance .. why?


pmg7670 wrote:

Hi,

 

I get that the result set of a SUM operation is a new set of records

However, the result set of a MIN or MAX operation are records that exist in the source, so non-groupby variables should be allowed in the result set

 

Regards Peter


 That's what you may want, but not how SQL is implemented Smiley Happy

Respected Advisor
Posts: 4,821

Re: PROC SQL delete not exists .. remergance .. why?

You advocate for MAX and MIN to be exempt from remerging. But MAX and MIN are summaries most likely to benefit from remerging. Consider finding information about the oldest students:

 

Proc SQL;
select * 
from sashelp.class
group by sex
having age = max(age);
quit;

the HAVING clause cannot be evaluated without remerging.

PG
Occasional Contributor
Posts: 5

Re: PROC SQL delete not exists .. remergance .. why?

... ahh yes .. a having clause is the trick, thanks PGStats, Reeza and ChrisNZ

See step2 which now works well

Regards Peter

 

Step 1

 PROC SQL;
  CREATE TABLE ciwork.claimshist_cat_x1 AS
   SELECT monotonic() as recno, a.*,
      b.numClmsYr4Low,b.numClmsYr4High,b.numClmsYr3Low,b.numClmsYr3High,
      b.numClmsYr2Low,b.numClmsYr2High,b.numClmsYr1Low,b.numClmsYr1High, b.claimsHistCat,
      SUM(b.numClmsYr1High,b.numClmsYr2High,b.numClmsYr3High,b.numClmsYr4High) -
      SUM(b.numClmsYr1Low,b.numClmsYr2Low,b.numClmsYr3Low,b.numClmsYr4Low) AS sumAllPeriod
    FROM ciwork.wrkcmp_credible_risk1 AS a INNER JOIN ciprd0.Tcper_claimshist_cat AS b   
      ON a.rateset = b.rateset
        AND a.t_s_date >= b.effectiveDate
        AND a.p_effTimestamp >= b.changeTimestamp
        AND (a.t_s_date <= b.effectiveEndDate
         OR  a.p_effTimestamp < b.endDateTimestamp)
     AND (a.numClaimsMade_1 BETWEEN b.numClmsYr1Low AND b.numClmsYr1High)
     AND (a.numClaimsMade_2 BETWEEN b.numClmsYr2Low AND b.numClmsYr2High)
     AND (a.numClaimsMade_3 BETWEEN b.numClmsYr3Low AND b.numClmsYr3High)
     AND (a.numClaimsMade_4 BETWEEN b.numClmsYr4Low AND b.numClmsYr4High);
 QUIT;

 

Step2

 OPTIONS NONOTES;
 PROC SQL UNDO_POLICY=NONE;
  DELETE FROM ciwork.claimshist_cat_x1 AS X
  WHERE NOT EXISTS (SELECT 1
   FROM (SELECT recno
       FROM ciwork.claimshist_cat_x1
      GROUP BY correlationId, t_s_date, policyCoverageId
      HAVING sumAllPeriod = MIN(sumAllPeriod)) AS Y      
   WHERE X.recno = Y.recno);
 QUIT;
 OPTIONS NOTES;

Ask a Question
Discussion stats
  • 7 replies
  • 634 views
  • 0 likes
  • 4 in conversation