Solved: Re: What is the equivalent SQL code for first. or last.

Julie4435637 · Posted 10-19-2023 10:13 AM

Is there an SQL equivalent to the following code?

data tst1;
infile cards delimiter='09x';
input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2;
cards;
AK  000753  352689  $945.00
AK  000753  332446  $14,175.00
AK  079773  047274  $0.00
AK  080409  744304  $0.00
AK  094749  348343  $18,794.28
AK  109935  709535  $0.00
AK  143363  378625  $132,141.56
CA  332181  398805  $381.43
CO  062290  612954  $0.00
CT  094749  718936  $0.00
CT  094749  201278  $4,355.00
FL  070780  372943  $0.00
FL  070780  352206  $6,433.83
IL  073560  566269  $15,776.78
MD  118837  319500  $960.30
NJ  173824  983294  $6,227.86
NM  076332  889071  $0.00
OR  159021  656278  $0.00
TX  158684  156785  $0.00
TX  158684  849608  $80.03
;;;;
run;

proc sort data=tst1;
   by st id1 pay;
run;

data tst2;
   set tst1;
   by st id1;
   if last.id1;
run;

FreelanceReinh · Posted 10-20-2023 07:32 AM

@Julie4435637 wrote:

... it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.

You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:

proc sql;
create view noties as
select st, id1, min(id2) as id2, pay from tst1
group by st, id1, pay;

create table want as
select t1.* from noties as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;

drop view noties;
quit;

View solution in original post

PaigeMiller · Posted 10-19-2023 10:16 AM

There is no equivalent. Use DATA step code.

--
Paige Miller

ballardw · Posted 10-19-2023 10:18 AM

None. SQL is intended to operate on sets of values. So there is no "first" or "last" operator.

If you expect to identify a "first" value then you need to add a variable to the data set for the order you expect and to identify such before any SQL code.

Perhaps if you describe what you are attempting to do in SQL someone can suggest an alternate approach.

Julie4435637 · Posted 10-19-2023 12:08 PM

I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.

ballardw · Posted 10-19-2023 12:17 PM

@Julie4435637 wrote:

I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.

Like this?

proc sql;
   create table want as 
   select *
   from tst1
   group by ST,id1
   having pay= max(pay)
   ;
quit;

but duplicate max values will appear.

Reeza · Posted 10-19-2023 12:25 PM

This will get you close...but if you have duplicate maximums in the data table, the SQL will not match the data step.

proc sql;
create table want as
select t1.* from tst1 as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;
quit;

This use case will 'break' the match.

data tst1;
infile cards delimiter='09x';
input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2;
cards;
AK  000753  352689  $945.00
AK  000753  332446  $14,175.00
AK  079773  047274  $0.00
AK  080409  744304  $0.00
AK  094749  348343  $18,794.28
AK  109935  709535  $0.00
AK  143363  378625  $132,141.56
CA  332181  398805  $381.43
CO  062290  612954  $0.00
CT  094749  718936  $0.00
CT  094749  201278  $4,355.00
FL  070780  372943  $0.00
FL  070780  352206  $6,433.83
IL  073560  566269  $15,776.78
MD  118837  319500  $960.30
NJ  173824  983294  $6,227.86
NM  076332  889071  $0.00
OR  159021  656278  $0.00
OR  159021  656279  $0.00
TX  158684  156785  $0.00
TX  158684  849608  $80.03
;;;;
run;

SQL will return both records as they both match the maximum but SAS will only keep the last record. Which is correct depends on your usage.

Julie4435637 · Posted 10-19-2023 01:43 PM

That is a problem when the max(pay) has duplicates within a group. I need the state&id1 to be distinct.

Reeza · Posted 10-19-2023 01:48 PM

And what is the rule? Which observation should get picked in that case?

Julie4435637 · Posted 10-19-2023 03:36 PM

I don't have a rule but for my purposes it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie. I wanted the code in a generic SQL version because this code will need to pushed eventually on another platform that doesn't use SAS but SQL based.

FreelanceReinh · Posted 10-20-2023 07:32 AM

@Julie4435637 wrote:

... it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.

You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:

proc sql;
create view noties as
select st, id1, min(id2) as id2, pay from tst1
group by st, id1, pay;

create table want as
select t1.* from noties as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;

drop view noties;
quit;

PaigeMiller · Posted 10-19-2023 02:58 PM

@Julie4435637 wrote:

I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.

Can you give us some context about why this has to be in SQL, in more depth than saying "may have to converted to us in SQL later"? As always, the context may be helpful in finding the right solution here. Because without context, the idea of taking code that works and translating it to some other part of SAS where it may not work seems like a poor idea.

--
Paige Miller

okayMrN · Posted 10-19-2023 10:19 AM

I don't think it is possible to re-write in proc sql.

In SQL(like ms sql , or Teradata SQL) , if you wish to write then may be will have to carefully partition by and rank function (window functions in general) .

but partition by and rank functions do not work in Proc sql .

So, writing it using data step is the way forward i suppose.

Thanks a lot .

Reeza · Posted 10-19-2023 11:21 AM

There isn't one, but if you examine the underlying assumptions, there usually is a workaround. For example, you may have another variable such as a date, where the last record really reflects the last date or something like that. Unfortunately that does rely on understanding the data and problem.