Help using Base SAS procedures

Self-Join Proc SQL

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Self-Join Proc SQL

I'm looking for some assistance with a self-join using proc sql.

I have a table with the following variables: client_policy_number, transaction_date, plan_type, plan_description.

I want to select the minimum plan_type for each transaction_date and group by the policy_number and transaction_date.

The data looks like this:

policy_number      tx_date        plan_typ       plan_desc

       1                04JUL2011         15             Agro

       1                04JUL2011         12            GDP

       1                04JUL2011         7              VFFC


I'd like to return something like this for each policy number:

policy_number    tx_date       plan_typ    plan_desc

      1                04JUL2011      7               VFFC

I'm wondering if anyone has suggestions or can provide me guidance.

Thank you!


Accepted Solutions
Solution
‎10-09-2013 12:07 PM
Super User
Posts: 19,877

Re: Self-Join Proc SQL

Does the following work (untested):

proc sql;

     create table want as

select policy_number, tx_date, min(plan_type) as min_plan_type, plan_desc

group by policy_number, tx_date

having plan_typ=calculated min_plan_type;

quit;

This is also a very easy data step problem using first logic.

proc sort data=have; by policy_number tx_date plan_type plan_desc;

data want;

set have;

by policy_number tx_date;

if first.tx_date;

run;

View solution in original post


All Replies
Solution
‎10-09-2013 12:07 PM
Super User
Posts: 19,877

Re: Self-Join Proc SQL

Does the following work (untested):

proc sql;

     create table want as

select policy_number, tx_date, min(plan_type) as min_plan_type, plan_desc

group by policy_number, tx_date

having plan_typ=calculated min_plan_type;

quit;

This is also a very easy data step problem using first logic.

proc sort data=have; by policy_number tx_date plan_type plan_desc;

data want;

set have;

by policy_number tx_date;

if first.tx_date;

run;

Occasional Contributor
Posts: 5

Re: Self-Join Proc SQL

Question about the above:  having plan_typ =calculated min_plan_type

This is giving me an error but I don't understand why. It doesn't like something about it.

Super User
Posts: 19,877

Re: Self-Join Proc SQL

Can you post the log?

Occasional Contributor
Posts: 5

Re: Self-Join Proc SQL

I don't know if the log will be useful. I'm using SAS as a pass through. I've tried running your solution separate of the connection to our database and it works in a 2 step process where I extract just the 4 variables and then separately run the proc sql solution you gave above.

Here's my code, based on your help.

proc sql;

connect to XXXX as IQ (user = XXX password = XXX host='XXX' database='XXX' server='XXX' port='XXX');

create table Plan_home_details as

select * from connection to IQ

(

select

p.pol_no,

p.tx_pol_ver_dt,

min(p.plan_typ) as min_plan_type,

p.plan_desc

from FDSS_ANA2.pla_pol    p

where p.pol_no = 123456789   /*I'm testing on a single policy number*/

and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=calculated min_plan_type

);

disconnect from IQ;

quit;

The log says the following:

ERROR: CLI prepare error: [Sybase][ODBC Driver][Sybase IQ]Syntax error near 'min_plan_type' on line 1

N/A
Posts: 1

Re: Self-Join Proc SQL

Hi Please try this, its is working.

DATA POLICY_DATASET;
INPUT POLICY_NUMBER $ TX_DATE : DATE9. PLAN_TYP PLAN_DESC $;
FORMAT TX_DATE DATE9.;
CARDS;
1 04JUL2011 15 Agro
1 04JUL2011 12 GDP
1 04JUL2011 7 VFFC
2 05JUL2011 10 Agro
2 05JUL2011 11 GDP
2 05JUL2011 20 VFFC
3 06JUL2011 2 Agro
3 06JUL2011 10 GDP
3 06JUL2011 15 VFFC
;
RUN;


PROC SQL;
CREATE TABLE POLICY_NUMBERS AS
SELECT POLICY_NUMBER, TX_DATE, PLAN_TYP, PLAN_DESC, MIN(PLAN_TYP) AS MIN_PLAN
FROM POLICY_DATASET

GROUP BY POLICY_NUMBER, TX_DATE
HAVING PLAN_TYP=CALCULATED MIN_PLAN
;
QUIT;

Super User
Posts: 19,877

Re: Self-Join Proc SQL

When you use pass through you need to use SQL that is compliant with your system, so you need to provide the system information, in this case SYBASE.
Unfortunately I'm not familiar with SYBASE SQL, so I don't know the solution off the top of my head.

My next suggestion would be to use a subquery and join on the table itself.

Occasional Contributor
Posts: 5

Re: Self-Join Proc SQL

The data step solution you provided does work.

Unfortunately, I must do this one in proc sql and can't use a data step.

Super Contributor
Posts: 339

Re: Self-Join Proc SQL

I'm fairly positive it is because CALCULATED is for select and where statement references and is not required for HAVING statement. Simply using the column alias should be sufficient. The where statement normally applies before aggregate functions whereas having applies naturally after.

HAVING clause attributesWHERE clause attributes
is typically used to specify conditions
for including or excluding groups of rows from a table.
is used to specify conditions for
including or excluding individual rows from a table.
must follow the GROUP BY clause in a
query, if used with a GROUP BY clause.
must precede the GROUP BY clause in a
query, if used with a GROUP BY clause.
is affected by a GROUP BY clause; when
there is no GROUP BY clause, the HAVING clause is treated like a WHERE
clause.
is not affected by a GROUP BY
clause.
is processed after the GROUP BY clause and
any aggregate functions.

is processed before a GROUP BY clause, if
there is one, and before any aggregate functions.

One of the following codes should fix the issue

proc sql;

     create table want as

     select *

     from have

     group by policy_number, tx_date

     having plan_typ=min(plan_typ)

;

quit;

*edit* killed the where approach - does not work. However, calculated statement is not necessary in the having clause.

It's possible that simply removing the CALCULATED statement in Reeza's code will work as well since having is processed after aggregate functions.

Vince

Respected Advisor
Posts: 4,173

Re: Self-Join Proc SQL

Agree with Vince: "Calculated" is a SAS SQL flavour thing and won't work in Sybase (and other DBMS). Using "having p.plan_typ=min(p.plan_typ)" will work.

You also need to have "min(p.plan_typ) as min_plan_type" in your select statement. A "select *" won't do. Not sure if Sybase would allow a "select * " syntactically (Oracle wouldn't) but even if it doesn't throw an error (SAS SQL wouldn't) you would end up with one row per distinct value of plan_type per group. Something you don't want. You're only after one row with the minimum value for plan_type.

select
  p.pol_no,
  p.tx_pol_ver_dt,
  min(p.plan_typ) as min_plan_type,
  p.plan_desc
  from FDSS_ANA2.pla_pol p


where
  p.pol_no = 123456789   /*I'm testing on a single policy number*/
  and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=min(p.plan_typ)

If you're familiar with SAS SQL but not with the DB flavour SQL then one "trick" which sometimes works is to write the SQL first in SAS but also using options

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

The SAS/Access engine will try to send as much of your SAS SQL to the database for execution (so translates it to DB flavour SQL). Using above options will show you in the log the SQL SAS was able to send to the DB - and if it was the full SQL then you can copy/paste this SQL from the log into a pass-through SQL block (or at least use it as a starting point).

Super User
Posts: 19,877

Re: Self-Join Proc SQL

Doesn't SQL usually complain about selecting something that's not in the group by clause or summarized on as well, ie plan_desc?

That's why you usually need a subquery. 

Respected Advisor
Posts: 4,173

Re: Self-Join Proc SQL

Yes, some DB SQL flavours do. If "plan_desc" is constant per group then a way around this is to use a max or min function there as well:

select
  p.pol_no,
  p.tx_pol_ver_dt,
  min(p.plan_typ) as min_plan_type,
  min(p.plan_desc) as plan_desc
  from FDSS_ANA2.pla_pol p


where
  p.pol_no = 123456789   /*I'm testing on a single policy number*/
  and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=min(p.plan_typ)

Super Contributor
Posts: 339

Re: Self-Join Proc SQL

I don't think it's necessary to calculate the minimum in the select statement. The value of the minimum is stored for some row(s) of a group and thus the having statement will select the row(s) with value equal to minimum. This causes a remerging before having statement applies but realistically, plan_desc would cause the remerging anyway so there is no gain in calculating a minimum on plan_typ in the select statement unless you also edit the way you are computing plan_desc to not cause the remerging.

I also think this is what Reeza refers to. Including plan_desc in the query causes a warning (or a note) saying that the query caused a remerging. Which is not necessarily a bad thing - it would be just as long to do subquery + merge or self join.

You did point out that if somehow you can use another way to derive plan_desc either through an aggregate function or possibly a case over the calculated min(plan_typ), then there could be a real gain but if not, then there is no need to create a column for min(plan_typ) in the select statement as the having will only retain the corresponding appropriate row(s).

Respected Advisor
Posts: 4,173

Re: Self-Join Proc SQL

Posted in reply to Vince28_Statcan

"The value of the minimum is stored for some row(s) of a group and thus the having statement will select the row(s) with value equal to minimum"

That will work with SAS SQL but I'm pretty sure that with Oracle SQL it wouldn't and you have to use some function for all columns not being part of the group by clause.

I'm not sure if this behaviour is Oracle specific or an "ANSI standard" but using a function will always work - and as I don't have experience with the Sybase SQL flavour that's what I proposed to be on the safe side. Else it's the sub-query as Reeza proposed.

Frequent Contributor
Frequent Contributor
Posts: 83

Re: Self-Join Proc SQL

Try the following code

Proc sort data=have ;
by policy_number tx_date plan_typ;
run;

data want ;
set have;
by policy_number tx_date plan_typ;

if first.policy_number then output;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 1685 views
  • 0 likes
  • 6 in conversation