- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:
proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N"
;
quit;
When I run this code, it underlines ID with a 22 and puts:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
SAS 9.4
Thank you very much
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not simply:
proc sort data=oldtable.fy18(keep=RSP ID PAR) nodupkey out=newtable;
by ID PAR;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Paige1 wrote:
@PeterClemmensen The oldtable is VERY large and would take a lot of work to sort.
You are going to have to sort to find the distinct values. PROC SORT is probably the fastest way to sort.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Paige1 wrote:
I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:
proc sql; create table newtable as select RSP, distinct ID, PAR from oldtable.fy18 where RSP="N" ; quit;
When I run this code, it underlines ID with a 22 and puts:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
SAS 9.4
Thank you very much
Distinct to use for the purpose you intend has to be before the variables and applies to all variables on the SELECT clause. Since you have distinct following RSP then the compiler thinks that distinct is a variable you are selecting and is missing a comma or some operator between distinct and id.
Likely you would get what you intend with
Select distinct rsp, id, par
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Paige1,
You could simply
select distinct ID, PAR
because, given your WHERE condition, the RSP values would be redundant anyway.
In a quick test on my computer with 100 million observations (and 4 variables) containing 2 million distinct ID-PAR combinations (for RSP='N') in random order the PROC SQL step took 53 s. PROC SORT with NODUPKEY option was faster (46 s) and a DATA step using a hash object (as shown below) was slower (59 s, without "ordered:'y'" 57 s).
data _null_;
if 0 then set fy18;
dcl hash h(dataset:'fy18(where=(rsp="N"))', ordered:'y', hashexp:20);
h.definekey('id', 'par');
h.definedone();
h.output(dataset:'newtable');
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
select distinct ID, Par worked.
I didn't realize that if you have a where statement, it will include that variable in the table you're creating. Thank you all!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Paige1 wrote:
I didn't realize that if you have a where statement, it will include that variable in the table you're creating.
No, this is not the case. With "redundant" I meant that all (possibly millions of) observations in dataset NEWTABLE would contain the same value of RSP (namely "N") and you would know this value beforehand because you specified it in the WHERE clause.