problems in the "over" statement

ganatbasa · Posted 04-13-2023 11:55 AM

The following code runs perfectly in aginity/netezza sql, but does not work in

proc sql: 
Create table new_table as 
select * from (select * from( select a.* from (select *, rank () over (partition by var_a order by var_id) as rank1 from old_table) a where a.rank1=1) a) b

When I try to run this code in proc sql, it won't let me do this because of problems in the "over" statement. This code is used for removing duplicates : for groups of duplicate records within the data, it leaves only one record. Is there a way to modify this sql code so that in runs in proc sql? Thanks!

redtube

Tom · Posted 04-13-2023 12:09 PM

That is not valid ANSI 92 SQL syntax so it cannot work in PROC SQL.

If you want to take only the first observation in a BY group use a DATA step instead of trying to trick the set logic of SQL to handle observation order.

So if you have a dataset that is sorted by GROUP and DATE you can use a step like this to find the earliest observation for each group.

proc sort data=have;
  by group date;
run;
data first_per_group;
  set have;
  by group date ;
  if first.group;
run;

Looks like your SQL is using var_a as the GROUP variable and var_id as the sub-ordering variable within the groups, like the DATE variable in my example data step.

ballardw · Posted 04-13-2023 12:17 PM

Please post code or log examples in a text box opened on the forum using the </> icon above the message box. That will separate out code from question or problem descriptions and make it easier to follow. Also code pasted into a text box will retain formatting, such as code indentation, or diagnostic characters correctly if posting SAS log text as the main message windows will reformat pasted text.

Your SQL uses several items that are not available in SAS besides "over" : "Rank()" and "partition".

The SAS procedure SORT will remove duplicate observations.

Here is a brief example creating a small data set and then sort code to create wanted data without duplicate records.

data example;
  input x y z;
datalines;
1  2  3
2  3  4
1  2  3
2  3  5
3  4  5
1  2  3
;

proc sort data=example out=want nodup;
  by x;
run;

Note that the duplicate rows for 1 2 3 are removed.

The data step as shown to create the example data set is the way to provide an example of your data if needed. If this basic sort does not do what you want then provide an example starting data set and the desired result made from the example describing any rules.

HB · Posted 04-13-2023 03:06 PM

If you simply must have an SQL solution use DISTINCT.

Using BallardW's data statement you can do

proc sql;
select
distinct a.*
from
example a;
quit;

to get

x	y	z
1	2	3
2	3	4
2	3	5
3	4	5

problems in the "over" statement

Re: problems in the "over" statement

Re: problems in the "over" statement

Re: problems in the "over" statement

problems in the "over" statement

Re: problems in the "over" statement

Re: problems in the "over" statement

Re: problems in the "over" statement

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!