The following code runs perfectly in aginity/netezza sql, but does not work in
proc sql:
Create table new_table as
select * from (select * from( select a.* from (select *, rank () over (partition by var_a order by var_id) as rank1 from old_table) a where a.rank1=1) a) b
When I try to run this code in proc sql, it won't let me do this because of problems in the "over" statement. This code is used for removing duplicates : for groups of duplicate records within the data, it leaves only one record. Is there a way to modify this sql code so that in runs in proc sql? Thanks!
That is not valid ANSI 92 SQL syntax so it cannot work in PROC SQL.
If you want to take only the first observation in a BY group use a DATA step instead of trying to trick the set logic of SQL to handle observation order.
So if you have a dataset that is sorted by GROUP and DATE you can use a step like this to find the earliest observation for each group.
proc sort data=have;
by group date;
run;
data first_per_group;
set have;
by group date ;
if first.group;
run;
Looks like your SQL is using var_a as the GROUP variable and var_id as the sub-ordering variable within the groups, like the DATE variable in my example data step.
Please post code or log examples in a text box opened on the forum using the </> icon above the message box. That will separate out code from question or problem descriptions and make it easier to follow. Also code pasted into a text box will retain formatting, such as code indentation, or diagnostic characters correctly if posting SAS log text as the main message windows will reformat pasted text.
Your SQL uses several items that are not available in SAS besides "over" : "Rank()" and "partition".
The SAS procedure SORT will remove duplicate observations.
Here is a brief example creating a small data set and then sort code to create wanted data without duplicate records.
data example; input x y z; datalines; 1 2 3 2 3 4 1 2 3 2 3 5 3 4 5 1 2 3 ; proc sort data=example out=want nodup; by x; run;
Note that the duplicate rows for 1 2 3 are removed.
The data step as shown to create the example data set is the way to provide an example of your data if needed. If this basic sort does not do what you want then provide an example starting data set and the desired result made from the example describing any rules.
If you simply must have an SQL solution use DISTINCT.
Using BallardW's data statement you can do
proc sql;
select
distinct a.*
from
example a;
quit;
to get
x | y | z |
---|---|---|
1 | 2 | 3 |
2 | 3 | 4 |
2 | 3 | 5 |
3 | 4 | 5 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.