BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Julie4435637
Obsidian | Level 7

Is there an SQL equivalent to the following code?  

data tst1;
infile cards delimiter='09x';
input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2;
cards;
AK 000753 352689 $945.00
AK 000753 332446 $14,175.00
AK 079773 047274 $0.00
AK 080409 744304 $0.00
AK 094749 348343 $18,794.28
AK 109935 709535 $0.00
AK 143363 378625 $132,141.56
CA 332181 398805 $381.43
CO 062290 612954 $0.00
CT 094749 718936 $0.00
CT 094749 201278 $4,355.00
FL 070780 372943 $0.00
FL 070780 352206 $6,433.83
IL 073560 566269 $15,776.78
MD 118837 319500 $960.30
NJ 173824 983294 $6,227.86
NM 076332 889071 $0.00
OR 159021 656278 $0.00
TX 158684 156785 $0.00
TX 158684 849608 $80.03
;;;;
run;

proc sort data=tst1;
by st id1 pay;
run;

data tst2;
set tst1;
by st id1;
if last.id1;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

@Julie4435637 wrote:

... it could be the the min(id2) value when there is a grouping where id1 and pay are the same.  I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.


You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:

proc sql;
create view noties as
select st, id1, min(id2) as id2, pay from tst1
group by st, id1, pay;

create table want as
select t1.* from noties as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;

drop view noties;
quit;

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

There is no equivalent. Use DATA step code.

 

 

--
Paige Miller
ballardw
Super User

None. SQL is intended to operate on sets of values. So there is no "first" or "last" operator.

If you expect to identify a "first" value then you need to add a variable to the data set for the order you expect and to identify such before any SQL code.

 

Perhaps if you describe what you are attempting to do in SQL someone can suggest an alternate approach.

Julie4435637
Obsidian | Level 7

I just updated my post with a small sample.  I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.

ballardw
Super User

@Julie4435637 wrote:

I just updated my post with a small sample.  I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.


Like this?

proc sql;
   create table want as 
   select *
   from tst1
   group by ST,id1
   having pay= max(pay)
   ;
quit;

but duplicate max values will appear.

 

Reeza
Super User

This will get you close...but if you have duplicate maximums in the data table, the SQL will not match the data step. 

 

proc sql;
create table want as
select t1.* from tst1 as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;
quit;

This use case will 'break' the match.

data tst1;
infile cards delimiter='09x';
input st $2. @5 id1 $6. @13 id2 $6. @21 pay dollar10.2;
cards;
AK  000753  352689  $945.00
AK  000753  332446  $14,175.00
AK  079773  047274  $0.00
AK  080409  744304  $0.00
AK  094749  348343  $18,794.28
AK  109935  709535  $0.00
AK  143363  378625  $132,141.56
CA  332181  398805  $381.43
CO  062290  612954  $0.00
CT  094749  718936  $0.00
CT  094749  201278  $4,355.00
FL  070780  372943  $0.00
FL  070780  352206  $6,433.83
IL  073560  566269  $15,776.78
MD  118837  319500  $960.30
NJ  173824  983294  $6,227.86
NM  076332  889071  $0.00
OR  159021  656278  $0.00
OR  159021  656279  $0.00
TX  158684  156785  $0.00
TX  158684  849608  $80.03
;;;;
run;

SQL will return both records as they both match the maximum but SAS will only keep the last record. Which is correct depends on your usage. 

Julie4435637
Obsidian | Level 7

That is a problem when the max(pay) has duplicates within a group.  I need the state&id1 to be distinct.

Reeza
Super User
And what is the rule? Which observation should get picked in that case?
Julie4435637
Obsidian | Level 7

I don't have a rule but for my purposes it could be the the min(id2) value when there is a grouping where id1 and pay are the same.  I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.  I wanted the code in a generic SQL version because this code will need to pushed eventually on another platform that doesn't use SAS but SQL based.

 

FreelanceReinh
Jade | Level 19

@Julie4435637 wrote:

... it could be the the min(id2) value when there is a grouping where id1 and pay are the same.  I have the code to pull max(pay) but need something to pull just 1 record when there is a tie.


You could apply Reeza's solution to a view (in the outer query) where the ties have been removed:

proc sql;
create view noties as
select st, id1, min(id2) as id2, pay from tst1
group by st, id1, pay;

create table want as
select t1.* from noties as t1
right join(select st, id1, max(pay) as max_pay from tst1 group by st, id1) as f
on t1.st=f.st and t1.id1 = f.id1 and t1.pay= f.max_pay;

drop view noties;
quit;
PaigeMiller
Diamond | Level 26

@Julie4435637 wrote:

I just updated my post with a small sample.  I basically want to keep records within a by group (st & id1) with the max value of pay. This code may have to converted to use in SQL later - that is why I would like to avoid data step.


Can you give us some context about why this has to be in SQL, in more depth than saying "may have to converted to us in SQL later"? As always, the context may be helpful in finding the right solution here. Because without context, the idea of taking code that works and translating it to some other part of SAS where it may not work seems like a poor idea.

--
Paige Miller
okayMrN
Calcite | Level 5

I don't think it is possible to re-write in proc sql.

 

In SQL(like ms sql , or Teradata SQL) , if you wish to write then may be will have to carefully partition by and rank function (window functions in general) .

but partition by and rank functions do not work in Proc sql .

 

So, writing it using data step is the way forward i suppose. 

Thanks a lot .

Reeza
Super User
There isn't one, but if you examine the underlying assumptions, there usually is a workaround. For example, you may have another variable such as a date, where the last record really reflects the last date or something like that. Unfortunately that does rely on understanding the data and problem.
AhmedAl_Attar
Rhodochrosite | Level 12

Hi @Julie4435637 

Check out this paper if you want to see SQL implementation for first. & last. Advanced Programming Techniques with PROC SQL

If you are trying to apply the SQL to a third party Relational Database such as Teradata/Oracle to name few, you may need to check this paper 

Your Database Can Do SAS® Too!

 

Hope this helps,

Ahmed

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1697 views
  • 11 likes
  • 7 in conversation