I often find myself creating a data set and then sorting it, for instance because I want to do a visual inspection of my data.
For example, in pseudocode it looks like:
EDIT: When creating the data set, I also do some other things, such as keeping some columns and creating a new one.
* Creating my data-set;
data MyData;
  set SomeDataSet;
  NewCol = ColA - ColB;
  keep ColA ColB NewCol;
run;
* Sorting my data-set;
proc sort data = MyData;
  by ColA;
run;
This quickly becomes a lot of code. Can it be done more compactly, for instance by sorting the data already in the data step? Are there any other methods you would recommend?
Thanks.
If you switch from a DATA step to PROC SQL, you can do it in one step. My SQL is rusty at best, so you will need to test for syntax errors:
proc sql noprint;
  create table MyData as
  select ColA, ColB, ColA - ColB as NewCol
  from SomeDataSet
  order by ColA;
quit;
The first step is not necessary at all; just use:
proc sort data = SomeDataSet out = MyData;
  by ColA;
run;
I think your idea of "compact code" and my idea of "compact code" are probably different.
If you need to do a subtraction in a DATA step and then a sort, these are two different processes. You can't do them both in one step*, so you are left with two steps for two processes. NOTE: there are times when you can do two different processes in one step, but this isn't one of them.
* I suppose you could use hash objects to do all of this in one data step, if you want to use an advanced method.
@SasStatistics wrote:
Thanks, please view my edit, my pseudo code example was bad.
Yes, it was 😉
As @PaigeMiller pointed out: one could use a hash-object to do the sorting in the data step, but you would have to write more than three lines of code to get the same result, so using easy-to-read steps is by far more desirable than having one big step.
You could do a sort in a data step on the fly, but it can only work as long as the complete dataset (including the B-tree) can be kept in memory, and it would be A LOT MORE code.
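For illustration, here is a rough sketch of what the hash-object approach mentioned above might look like, using the data set and column names from the question. It assumes the whole table fits in memory, and the order of rows that share the same ColA value is not guaranteed to match what PROC SORT would produce, so treat it as a sketch rather than a drop-in replacement:

```sas
data _null_;
  /* Put ColA and ColB into the PDV without reading any rows */
  if 0 then set SomeDataSet(keep=ColA ColB);
  length NewCol 8;

  /* Ordered hash keyed on ColA; multidata allows duplicate key values */
  declare hash h(ordered: 'a', multidata: 'y');
  h.defineKey('ColA');
  h.defineData('ColA', 'ColB', 'NewCol');
  h.defineDone();

  /* Load every row, computing the new column on the way in */
  do until (eof);
    set SomeDataSet(keep=ColA ColB) end=eof;
    NewCol = ColA - ColB;
    h.add();
  end;

  /* Write the hash contents, in key order, to MyData */
  h.output(dataset: 'MyData');
  stop;
run;
```

As noted, this is considerably more code than the DATA step plus PROC SORT, which is why the two-step version is usually preferable.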
If you want to avoid storing an intermediate dataset, you can use your data step to define a view and use that view as input to PROC SORT. You will still have to write the code, though (even a little more, as you need the / VIEW= option in the DATA statement).
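A minimal sketch of the view approach, using the names from the question; MyView is a made-up name for the view, and any valid name would do:

```sas
/* Define a DATA step view: nothing is read or written until the view is used */
data MyView / view=MyView;
  set SomeDataSet;
  NewCol = ColA - ColB;
  keep ColA ColB NewCol;
run;

/* PROC SORT executes the view and writes only the sorted result */
proc sort data=MyView out=MyData;
  by ColA;
run;
```

The amount of code is about the same as before, but the unsorted intermediate table is never stored on disk.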