How do I select distinct on multiple variables using PROC SQL

Paige1 · Posted 04-24-2019 04:55 PM

I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:

proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N" 
;
quit;

When I run this code, it underlines ID with a 22 and puts:

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

SAS 9.4

Thank you very much

PeterClemmensen · Posted 04-24-2019 05:04 PM

Why not simply:

proc sort data=oldtable.fy18(keep=RSP ID PAR) nodupkey out=newtable;
   by ID PAR;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Paige1 · Posted 04-24-2019 05:11 PM

@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.

Tom · Posted 04-24-2019 07:37 PM

@Paige1 wrote:

@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.

You are going to have to sort to find the distinct values. PROC SORT is probably the fastest way to sort.

ballardw · Posted 04-24-2019 06:09 PM

@Paige1 wrote:

I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:
proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N" 
;
quit;
When I run this code, it underlines ID with a 22 and puts:

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

SAS 9.4

Thank you very much

Distinct to use for the purpose you intend has to be before the variables and applies to all variables on the SELECT clause. Since you have distinct following RSP then the compiler thinks that distinct is a variable you are selecting and is missing a comma or some operator between distinct and id.

Likely you would get what you intend with

Select distinct rsp, id, par

FreelanceReinh · Posted 04-25-2019 05:12 AM

Hi @Paige1,

You could simply

select distinct ID, PAR

because, given your WHERE condition, the RSP values would be redundant anyway.

In a quick test on my computer with 100 million observations (and 4 variables) containing 2 million distinct ID-PAR combinations (for RSP='N') in random order the PROC SQL step took 53 s. PROC SORT with NODUPKEY option was faster (46 s) and a DATA step using a hash object (as shown below) was slower (59 s, without "ordered:'y'" 57 s).

data _null_;
if 0 then set fy18;
dcl hash h(dataset:'fy18(where=(rsp="N"))', ordered:'y', hashexp:20);
h.definekey('id', 'par');
h.definedone();
h.output(dataset:'newtable');
stop;
run;

Paige1 · Posted 04-28-2019 12:58 PM

select distinct ID, Par worked.

I didn't realize that if you have a where statement, it will include that variable in the table you're creating. Thank you all!

FreelanceReinh · Posted 04-28-2019 02:10 PM

@Paige1 wrote:

I didn't realize that if you have a where statement, it will include that variable in the table you're creating.

No, this is not the case. With "redundant" I meant that all (possibly millions of) observations in dataset NEWTABLE would contain the same value of RSP (namely "N") and you would know this value beforehand because you specified it in the WHERE clause.

How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Re: How do I select distinct on multiple variables using PROC SQL

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away