Is there an SQL equivalent to the following code?
data tst1;
infile cards delimiter='09x';
input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2;
cards;
AK 000753 352689 $945.00
AK 000753 332446 $14,175.00
AK 079773 047274 $0.00
AK 080409 744304 $0.00
AK 094749 348343 $18,794.28
AK 109935 709535 $0.00
AK 143363 378625 $132,141.56
CA 332181 398805 $381.43
CO 062290 612954 $0.00
CT 094749 718936 $0.00
CT 094749 201278 $4,355.00
FL 070780 372943 $0.00
FL 070780 352206 $6,433.83
IL 073560 566269 $15,776.78
MD 118837 319500 $960.30
NJ 173824 983294 $6,227.86
NM 076332 889071 $0.00
OR 159021 656278 $0.00
TX 158684 156785 $0.00
TX 158684 849608 $80.03
;;;;
run;
proc sort data=tst1;
by st id1 pay;
run;
data tst2;
set tst1;
by st id1;
if last.id1;
run;
@Julie4435637 wrote:
... it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.
You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:
proc sql; create view noties as select st, id1, min(id2) as id2, pay from tst1 group by st, id1, pay; create table want as select t1.* from noties as t1 right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay; drop view noties; quit;
There is no equivalent. Use DATA step code.
None. SQL is intended to operate on sets of values. So there is no "first" or "last" operator.
If you expect to identify a "first" value then you need to add a variable to the data set for the order you expect and to identify such before any SQL code.
Perhaps if you describe what you are attempting to do in SQL someone can suggest an alternate approach.
I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.
@Julie4435637 wrote:
I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.
Like this?
proc sql; create table want as select * from tst1 group by ST,id1 having pay= max(pay) ; quit;
but duplicate max values will appear.
This will get you close...but if you have duplicate maximums in the data table, the SQL will not match the data step.
proc sql;
create table want as
select t1.* from tst1 as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;
quit;
This use case will 'break' the match.
data tst1; infile cards delimiter='09x'; input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2; cards; AK 000753 352689 $945.00 AK 000753 332446 $14,175.00 AK 079773 047274 $0.00 AK 080409 744304 $0.00 AK 094749 348343 $18,794.28 AK 109935 709535 $0.00 AK 143363 378625 $132,141.56 CA 332181 398805 $381.43 CO 062290 612954 $0.00 CT 094749 718936 $0.00 CT 094749 201278 $4,355.00 FL 070780 372943 $0.00 FL 070780 352206 $6,433.83 IL 073560 566269 $15,776.78 MD 118837 319500 $960.30 NJ 173824 983294 $6,227.86 NM 076332 889071 $0.00 OR 159021 656278 $0.00 OR 159021 656279 $0.00 TX 158684 156785 $0.00 TX 158684 849608 $80.03 ;;;; run;
SQL will return both records as they both match the maximum but SAS will only keep the last record. Which is correct depends on your usage.
That is a problem when the max(pay) has duplicates within a group. I need the state&id1 to be distinct.
I don't have a rule but for my purposes it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie. I wanted the code in a generic SQL version because this code will need to pushed eventually on another platform that doesn't use SAS but SQL based.
@Julie4435637 wrote:
... it could be the the min(id2) value when there is a grouping where id1 and pay are the same. I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.
You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:
proc sql; create view noties as select st, id1, min(id2) as id2, pay from tst1 group by st, id1, pay; create table want as select t1.* from noties as t1 right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay; drop view noties; quit;
@Julie4435637 wrote:
I just updated my post with a small sample. I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.
Can you give us some context about why this has to be in SQL, in more depth than saying "may have to converted to us in SQL later"? As always, the context may be helpful in finding the right solution here. Because without context, the idea of taking code that works and translating it to some other part of SAS where it may not work seems like a poor idea.
I don't think it is possible to re-write in proc sql.
In SQL(like ms sql , or Teradata SQL) , if you wish to write then may be will have to carefully partition by and rank function (window functions in general) .
but partition by and rank functions do not work in Proc sql .
So, writing it using data step is the way forward i suppose.
Thanks a lot .
Check out this paper if you want to see SQL implementation for first. & last. Advanced Programming Techniques with PROC SQL
If you are trying to apply the SQL to a third party Relational Database such as Teradata/Oracle to name few, you may need to check this paper
Your Database Can Do SAS® Too!
Hope this helps,
Ahmed
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.