BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Shara
Calcite | Level 5

I'm looking for some assistance with a self-join using proc sql.

I have a table with the following variables: client_policy_number, transaction_date, plan_type, plan_description.

I want to select the minimum plan_type for each transaction_date and group by the policy_number and transaction_date.

The data looks like this:

policy_number      tx_date        plan_typ       plan_desc

       1                04JUL2011         15             Agro

       1                04JUL2011         12            GDP

       1                04JUL2011         7              VFFC


I'd like to return something like this for each policy number:

policy_number    tx_date       plan_typ    plan_desc

      1                04JUL2011      7               VFFC

I'm wondering if anyone has suggestions or can provide me guidance.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Does the following work (untested):

proc sql;

     create table want as

select policy_number, tx_date, min(plan_type) as min_plan_type, plan_desc

group by policy_number, tx_date

having plan_typ=calculated min_plan_type;

quit;

This is also a very easy data step problem using first logic.

proc sort data=have; by policy_number tx_date plan_type plan_desc;

data want;

set have;

by policy_number tx_date;

if first.tx_date;

run;

View solution in original post

14 REPLIES 14
Reeza
Super User

Does the following work (untested):

proc sql;

     create table want as

select policy_number, tx_date, min(plan_type) as min_plan_type, plan_desc

group by policy_number, tx_date

having plan_typ=calculated min_plan_type;

quit;

This is also a very easy data step problem using first logic.

proc sort data=have; by policy_number tx_date plan_type plan_desc;

data want;

set have;

by policy_number tx_date;

if first.tx_date;

run;

Shara
Calcite | Level 5

Question about the above:  having plan_typ =calculated min_plan_type

This is giving me an error but I don't understand why. It doesn't like something about it.

Reeza
Super User

Can you post the log?

Shara
Calcite | Level 5

I don't know if the log will be useful. I'm using SAS as a pass through. I've tried running your solution separate of the connection to our database and it works in a 2 step process where I extract just the 4 variables and then separately run the proc sql solution you gave above.

Here's my code, based on your help.

proc sql;

connect to XXXX as IQ (user = XXX password = XXX host='XXX' database='XXX' server='XXX' port='XXX');

create table Plan_home_details as

select * from connection to IQ

(

select

p.pol_no,

p.tx_pol_ver_dt,

min(p.plan_typ) as min_plan_type,

p.plan_desc

from FDSS_ANA2.pla_pol    p

where p.pol_no = 123456789   /*I'm testing on a single policy number*/

and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=calculated min_plan_type

);

disconnect from IQ;

quit;

The log says the following:

ERROR: CLI prepare error: [Sybase][ODBC Driver][Sybase IQ]Syntax error near 'min_plan_type' on line 1

UPENDRAKUMARB
Calcite | Level 5

Hi Please try this, its is working.

DATA POLICY_DATASET;
INPUT POLICY_NUMBER $ TX_DATE : DATE9. PLAN_TYP PLAN_DESC $;
FORMAT TX_DATE DATE9.;
CARDS;
1 04JUL2011 15 Agro
1 04JUL2011 12 GDP
1 04JUL2011 7 VFFC
2 05JUL2011 10 Agro
2 05JUL2011 11 GDP
2 05JUL2011 20 VFFC
3 06JUL2011 2 Agro
3 06JUL2011 10 GDP
3 06JUL2011 15 VFFC
;
RUN;


PROC SQL;
CREATE TABLE POLICY_NUMBERS AS
SELECT POLICY_NUMBER, TX_DATE, PLAN_TYP, PLAN_DESC, MIN(PLAN_TYP) AS MIN_PLAN
FROM POLICY_DATASET

GROUP BY POLICY_NUMBER, TX_DATE
HAVING PLAN_TYP=CALCULATED MIN_PLAN
;
QUIT;

Reeza
Super User

When you use pass through you need to use SQL that is compliant with your system, so you need to provide the system information, in this case SYBASE.
Unfortunately I'm not familiar with SYBASE SQL, so I don't know the solution off the top of my head.

My next suggestion would be to use a subquery and join on the table itself.

Shara
Calcite | Level 5

The data step solution you provided does work.

Unfortunately, I must do this one in proc sql and can't use a data step.

Vince28_Statcan
Quartz | Level 8

I'm fairly positive it is because CALCULATED is for select and where statement references and is not required for HAVING statement. Simply using the column alias should be sufficient. The where statement normally applies before aggregate functions whereas having applies naturally after.

HAVING clause attributesWHERE clause attributes
is typically used to specify conditions
for including or excluding groups of rows from a table.
is used to specify conditions for
including or excluding individual rows from a table.
must follow the GROUP BY clause in a
query, if used with a GROUP BY clause.
must precede the GROUP BY clause in a
query, if used with a GROUP BY clause.
is affected by a GROUP BY clause; when
there is no GROUP BY clause, the HAVING clause is treated like a WHERE
clause.
is not affected by a GROUP BY
clause.
is processed after the GROUP BY clause and
any aggregate functions.

is processed before a GROUP BY clause, if
there is one, and before any aggregate functions.

One of the following codes should fix the issue

proc sql;

     create table want as

     select *

     from have

     group by policy_number, tx_date

     having plan_typ=min(plan_typ)

;

quit;

*edit* killed the where approach - does not work. However, calculated statement is not necessary in the having clause.

It's possible that simply removing the CALCULATED statement in Reeza's code will work as well since having is processed after aggregate functions.

Vince

Patrick
Opal | Level 21

Agree with Vince: "Calculated" is a SAS SQL flavour thing and won't work in Sybase (and other DBMS). Using "having p.plan_typ=min(p.plan_typ)" will work.

You also need to have "min(p.plan_typ) as min_plan_type" in your select statement. A "select *" won't do. Not sure if Sybase would allow a "select * " syntactically (Oracle wouldn't) but even if it doesn't throw an error (SAS SQL wouldn't) you would end up with one row per distinct value of plan_type per group. Something you don't want. You're only after one row with the minimum value for plan_type.

select
  p.pol_no,
  p.tx_pol_ver_dt,
  min(p.plan_typ) as min_plan_type,
  p.plan_desc
  from FDSS_ANA2.pla_pol p


where
  p.pol_no = 123456789   /*I'm testing on a single policy number*/
  and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=min(p.plan_typ)

If you're familiar with SAS SQL but not with the DB flavour SQL then one "trick" which sometimes works is to write the SQL first in SAS but also using options

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

The SAS/Access engine will try to send as much of your SAS SQL to the database for execution (so translates it to DB flavour SQL). Using above options will show you in the log the SQL SAS was able to send to the DB - and if it was the full SQL then you can copy/paste this SQL from the log into a pass-through SQL block (or at least use it as a starting point).

Reeza
Super User

Doesn't SQL usually complain about selecting something that's not in the group by clause or summarized on as well, ie plan_desc?

That's why you usually need a subquery. 

Patrick
Opal | Level 21

Yes, some DB SQL flavours do. If "plan_desc" is constant per group then a way around this is to use a max or min function there as well:

select
  p.pol_no,
  p.tx_pol_ver_dt,
  min(p.plan_typ) as min_plan_type,
  min(p.plan_desc) as plan_desc
  from FDSS_ANA2.pla_pol p


where
  p.pol_no = 123456789   /*I'm testing on a single policy number*/
  and p.pol_tx_typ in (3,4,5)

group by p.pol_no, p.tx_pol_ver_dt

having p.plan_typ=min(p.plan_typ)

Vince28_Statcan
Quartz | Level 8

I don't think it's necessary to calculate the minimum in the select statement. The value of the minimum is stored for some row(s) of a group and thus the having statement will select the row(s) with value equal to minimum. This causes a remerging before having statement applies but realistically, plan_desc would cause the remerging anyway so there is no gain in calculating a minimum on plan_typ in the select statement unless you also edit the way you are computing plan_desc to not cause the remerging.

I also think this is what Reeza refers to. Including plan_desc in the query causes a warning (or a note) saying that the query caused a remerging. Which is not necessarily a bad thing - it would be just as long to do subquery + merge or self join.

You did point out that if somehow you can use another way to derive plan_desc either through an aggregate function or possibly a case over the calculated min(plan_typ), then there could be a real gain but if not, then there is no need to create a column for min(plan_typ) in the select statement as the having will only retain the corresponding appropriate row(s).

Patrick
Opal | Level 21

"The value of the minimum is stored for some row(s) of a group and thus the having statement will select the row(s) with value equal to minimum"

That will work with SAS SQL but I'm pretty sure that with Oracle SQL it wouldn't and you have to use some function for all columns not being part of the group by clause.

I'm not sure if this behaviour is Oracle specific or an "ANSI standard" but using a function will always work - and as I don't have experience with the Sybase SQL flavour that's what I proposed to be on the safe side. Else it's the sub-query as Reeza proposed.

Mit
Calcite | Level 5 Mit
Calcite | Level 5

Try the following code

Proc sort data=have ;
by policy_number tx_date plan_typ;
run;

data want ;
set have;
by policy_number tx_date plan_typ;

if first.policy_number then output;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 4690 views
  • 0 likes
  • 6 in conversation