@Ronein wrote: Sometimes there are problems in the data, such as duplicate rows. Is it better to select all rows (without DISTINCT) in the query that creates the SAS data set from the Teradata table, and only then remove the duplicates with PROC SORT NODUPKEY in SAS?
That's what I would do. PROC SORT in SAS is usually the quickest way, unless the data fits into memory, in which case a hash object is an option.
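A minimal sketch of both approaches follows. It is self-contained: the small `work.raw` data set is hypothetical sample data standing in for the rows pulled from Teradata without DISTINCT, and `id` is an assumed key variable. Note that NODUPKEY removes rows with duplicate BY values only, so the second `id=3` row is dropped even though `amount` differs; use `by _all_;` instead if only rows identical in every variable should be removed. The hash version keeps the first row per key in input order and needs no sort, but all distinct keys must fit in memory.

```sas
/* Hypothetical sample data standing in for the Teradata pull, e.g.
   proc sql; create table work.raw as select * from tera.my_table; quit;
   (the libref tera and table my_table are assumptions) */
data work.raw;
  input id amount;
  datalines;
1 10
1 10
2 20
3 30
3 35
;
run;

/* Approach 1: PROC SORT NODUPKEY keeps the first row for each BY value */
proc sort data=work.raw out=work.dedup_sort nodupkey;
  by id;
run;

/* Approach 2: hash object keeps the first row per id in input order,
   without sorting; requires the distinct keys to fit in memory */
data work.dedup_hash;
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('id');
    seen.defineDone();
  end;
  set work.raw;
  /* check() returns 0 when the key is already in the hash */
  if seen.check() ne 0 then do;
    rc = seen.add();
    output;
  end;
  drop rc;
run;
```

With this sample data, both output data sets should contain one row each for `id` 1, 2 and 3.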