- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a dataset that is sorted by only an ID variable. However, I would like to re-sort the data ID and Date of BIRTH and SEX in that order.
Can I use Proc SQL to store the resorted data in a new table ? Or will the order by be used only to display the results?
I am interested in resorting and saving the data in a new table, so any hints will help. I know it is easy to do this using Proc Sort in the Data Step
Thanks
Proc SQL;
create table NEW as select * from OLD
order by ID, DOB, SEX;
quit;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I wouldn't be too quick in assuming SQL ORDER BY produces the same result as PROC SORT.
Yes it will IF ... each combination of the sort variables has only one observation.
But if there is more than one observation with the same ID/DOB/SEX triplet, then you can't make that assumption. Now PROC SORT has an option (EQUALS vs NOEQUALS) that gives you a little control. The EQUALS option (which is the default) preserves the original order of observations with tied triplets. NOEQUALS does not enforce that result (which theoretically might allow a bit more efficiency). I do not believe that the ORDER BY clause in PROC SQL provides for honoring the EQUALS behavior, which would be necessary to guarantee identical results to PROC SORT
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes! Your code will produce the outcome you are looking for. Dataset NEW will sorted by ID, DOB, SEX.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can I use Proc SQL to store the resorted data in a new table ? Or will the order by be used only to display the results?
SAS tables are sequential and the sort order is how data gets stored physically. That's different to most databases.
Whether you use the SQL syntax as shared or Proc Sort will have the same result. Afaik both procedures use the same "module" in the background for the actual sorting of a SAS table.
Proc Sort gives you more options like NODUPKEY or in cases where the source table is potentially already sorted the way you want it PRESORTED (which then will only physically sort the table if it isn't already sorted).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I wouldn't be too quick in assuming SQL ORDER BY produces the same result as PROC SORT.
Yes it will IF ... each combination of the sort variables has only one observation.
But if there is more than one observation with the same ID/DOB/SEX triplet, then you can't make that assumption. Now PROC SORT has an option (EQUALS vs NOEQUALS) that gives you a little control. The EQUALS option (which is the default) preserves the original order of observations with tied triplets. NOEQUALS does not enforce that result (which theoretically might allow a bit more efficiency). I do not believe that the ORDER BY clause in PROC SQL provides for honoring the EQUALS behavior, which would be necessary to guarantee identical results to PROC SORT
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for this response. I have come across more than one observation with the same ID/DOB/SEX triplet sort key. And you are correct. Comparing the Proc SORT vs. PROC SQL, the PROC SQL method doesn't have the same EQUALS behavior as PROC SORT, so when I attempted to pick the first record in the sort key based on PROC SQL or PROC SORT I have noticed slight differences in output. Yes you do get the same number of observations, but the distribution looks different.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is mostly (*) equivalent to
proc sort
data=old
out=new
;
by id dob sex;
run;
*) see the other posts in the thread.