I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:
proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N"
;
quit;
When I run this code, it underlines ID with a 22 and puts:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
SAS 9.4
Thank you very much
Why not simply:
proc sort data=oldtable.fy18(keep=RSP ID PAR) nodupkey out=newtable;
by ID PAR;
run;
@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.
@Paige1 wrote:
@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.
You are going to have to sort to find the distinct values. PROC SORT is probably the fastest way to sort.
@Paige1 wrote:
I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:
proc sql; create table newtable as select RSP, distinct ID, PAR from oldtable.fy18 where RSP="N" ; quit;
When I run this code, it underlines ID with a 22 and puts:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
SAS 9.4
Thank you very much
Distinct to use for the purpose you intend has to be before the variables and applies to all variables on the SELECT clause. Since you have distinct following RSP then the compiler thinks that distinct is a variable you are selecting and is missing a comma or some operator between distinct and id.
Likely you would get what you intend with
Select distinct rsp, id, par
Hi @Paige1,
You could simply
select distinct ID, PAR
because, given your WHERE condition, the RSP values would be redundant anyway.
In a quick test on my computer with 100 million observations (and 4 variables) containing 2 million distinct ID-PAR combinations (for RSP='N') in random order the PROC SQL step took 53 s. PROC SORT with NODUPKEY option was faster (46 s) and a DATA step using a hash object (as shown below) was slower (59 s, without "ordered:'y'" 57 s).
data _null_;
if 0 then set fy18;
dcl hash h(dataset:'fy18(where=(rsp="N"))', ordered:'y', hashexp:20);
h.definekey('id', 'par');
h.definedone();
h.output(dataset:'newtable');
stop;
run;
select distinct ID, Par worked.
I didn't realize that if you have a where statement, it will include that variable in the table you're creating. Thank you all!
@Paige1 wrote:
I didn't realize that if you have a where statement, it will include that variable in the table you're creating.
No, this is not the case. With "redundant" I meant that all (possibly millions of) observations in dataset NEWTABLE would contain the same value of RSP (namely "N") and you would know this value beforehand because you specified it in the WHERE clause.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.