Hi Everyone, Can anybody help me with the below issue?
-> wanted to convert the following datastep (containing first. last. functionality) into Proc sql
data readin; input ID Name $ Score; cards; 1 David 45 1 David 74 2 Sam 45 2 Ram 54 3 Bane 87 3 Mary 92 3 Bane 87 4 Dane 23 5 Jenny 87 5 Ken 87 6 Simran 63 8 Priya 72 ; run;
proc sort data=readin;
by ID Name;
run; data b; set readin; by ID Name; if last.ID or last.Name then output; run;
data readin;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;
proc sql;
create table want as
select distinct *
from (select monotonic() as _n_,* from readin)
group by id,name
having _n_=max(_n_);
quit;
The only problem using the MONOTONIC() function in PROC SQL is that it is unsupported. It works today, it may not work in the next release (or some future release) of SAS. And so, if you are writing code with an unsupported feature, this can lead to problems and the original problem is probably best left as a DATA step.
There is no such thing. SQL does not guarantee the order in which it selects observation, so the "last" observation is unreliable. It could change from run to run.
There is an unsupported, unguaranteed "monotonic" feature of SQL if you want to go that route.
Maxim 14: Use the Right Tool, which is the data step for such tasks. SQL code would be much more complicated, harder to maintain, and have worse performance. SQL does not have the concept of sequence that the data step has, and so that must be forced, usually with a sub-select which has very bad performance, or an intermediate step that creates a sequence number through use of the undocumented (and therefore not advisable) MONOTONIC() function.
BTW last.ID implies last.Name, so it is sufficient to check for last.Name in your example.
A possible SQL code might be
proc sql;
create table want as
select
id,
name,
score
from (
select *, monotonic() as seq
from readin
)
group by id, name
having seq = max(seq)
;
quit;
but that is NOT guaranteed to create the same result as the SORT/DATA steps.
I agree to both @Astounding and @PaigeMiller for their professional neat advice. Genius @Ksharp's solution is something that came to my mind too. However, this certainly is a not a question that's production code related and a fun question perhaps?
The subject matter though that's worth learning is whether or not there is a clean way of reading a dataset in a sequential order. Once we have this, a possible conditional construct utility may surface. I'm afraid Proc SQL doesn't quite have the row order functionality and in SQL of any variant requires explicit determination of a row order unlike the datastep.
The author of the best-selling SQL book of all time Ben Forta writes, "the retrieved data from underlying tables is not displayed in a mere random order. If unsorted, data will typically be displayed in the order in which it appears in the underlying tables. This could be the order in which the data was added to the tables initially. However, if data was subsequently updated or deleted, the order will be affected by how the DBMS reuses reclaimed storage space. The end result is that you cannot (and should not) rely on the sort order if you do not explicitly control it. Relational database design theory states that the sequence of retrieved data cannot be assumed to have significance if ordering was not explicitly specified"
Note: Some even go as far to believe almighty perhaps created EF Codd to relate and normalise tables and then Ben forta to play with it. If Ben wrote it, I believe him.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.