Re: First dot and last dot conversion into proc sql

karrora1201 · Posted 01-28-2021 08:34 AM

Hi Everyone, Can anybody help me with the below issue?

-> wanted to convert the following datastep (containing first. last. functionality) into Proc sql

data readin;
input ID Name $ Score;
cards;
1     David   45
1     David   74
2     Sam     45
2     Ram     54
3     Bane    87
3     Mary    92
3     Bane    87
4     Dane    23
5     Jenny   87
5     Ken     87
6     Simran  63
8     Priya   72
;
run;

proc sort data=readin;
by ID Name;
run;

data b;
set readin;
by ID Name;
if last.ID or last.Name then output;
run;

Ksharp · Posted 01-28-2021 08:43 AM

data readin;
input ID Name $ Score;
cards;
1     David   45
1     David   74
2     Sam     45
2     Ram     54
3     Bane    87
3     Mary    92
3     Bane    87
4     Dane    23
5     Jenny   87
5     Ken     87
6     Simran  63
8     Priya   72
;
run;

proc sql;
create table want as
select distinct *
 from (select monotonic() as _n_,* from readin)
  group by id,name
   having _n_=max(_n_);
quit;

PaigeMiller · Posted 01-28-2021 08:46 AM

The only problem using the MONOTONIC() function in PROC SQL is that it is unsupported. It works today, it may not work in the next release (or some future release) of SAS. And so, if you are writing code with an unsupported feature, this can lead to problems and the original problem is probably best left as a DATA step.

--
Paige Miller

Astounding · Posted 01-28-2021 08:46 AM

There is no such thing. SQL does not guarantee the order in which it selects observation, so the "last" observation is unreliable. It could change from run to run.

There is an unsupported, unguaranteed "monotonic" feature of SQL if you want to go that route.

Kurt_Bremser · Posted 01-28-2021 08:56 AM

Maxim 14: Use the Right Tool, which is the data step for such tasks. SQL code would be much more complicated, harder to maintain, and have worse performance. SQL does not have the concept of sequence that the data step has, and so that must be forced, usually with a sub-select which has very bad performance, or an intermediate step that creates a sequence number through use of the undocumented (and therefore not advisable) MONOTONIC() function.

BTW last.ID implies last.Name, so it is sufficient to check for last.Name in your example.

A possible SQL code might be

proc sql;
create table want as
  select
    id,
    name,
    score
  from (
    select *, monotonic() as seq
    from readin
  )
  group by id, name
  having seq = max(seq)
;
quit;

but that is NOT guaranteed to create the same result as the SORT/DATA steps.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

novinosrin · Posted 01-28-2021 09:05 AM

I agree to both @Astounding and @PaigeMiller for their professional neat advice. Genius @Ksharp's solution is something that came to my mind too. However, this certainly is a not a question that's production code related and a fun question perhaps?

The subject matter though that's worth learning is whether or not there is a clean way of reading a dataset in a sequential order. Once we have this, a possible conditional construct utility may surface. I'm afraid Proc SQL doesn't quite have the row order functionality and in SQL of any variant requires explicit determination of a row order unlike the datastep.

The author of the best-selling SQL book of all time Ben Forta writes, "the retrieved data from underlying tables is not displayed in a mere random order. If unsorted, data will typically be displayed in the order in which it appears in the underlying tables. This could be the order in which the data was added to the tables initially. However, if data was subsequently updated or deleted, the order will be affected by how the DBMS reuses reclaimed storage space. The end result is that you cannot (and should not) rely on the sort order if you do not explicitly control it. Relational database design theory states that the sequence of retrieved data cannot be assumed to have significance if ordering was not explicitly specified"

Note: Some even go as far to believe almighty perhaps created EF Codd to relate and normalise tables and then Ben forta to play with it. If Ben wrote it, I believe him.

First dot and last dot conversion into proc sql