I have a data set from which a couple of data sets, identical to the source, are derived. The original code sorted each derived set one at a time. To speed up the program, I would like to sort only the source data set instead. But when I do this, the source set's metadata shows "SORTED BY whatever", while the derived sets do not show that they are sorted by anything.
My question is, does it matter? Is there any practical difference between two data sets with exactly the same record order if one's metadata indicates how it is sorted and the other's does not?
Thanks. I know it's kind of a weird question.
data source;
infile datalines dlm=',';
length a $1. b 3.;
input a $ b;
datalines;
a,1
b,2
c,3
d,4
e,5
;
proc sort;
by a;
run;
data derived1 derived2;
set source;
by a;
run;
Hi @Mike_B,
You can use the SORTEDBY= dataset option to attach that metadata to the derived datasets (without sorting them):
data derived1(sortedby=a) derived2(sortedby=a);
set source;
by a;
run;
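To confirm what the option recorded, something like the following should work. It is only a sketch, assuming the WORK data set derived1 from the question exists: PROC CONTENTS prints a "Sort Information" section, and the ATTRC function returns the BY variables stored in the data set header.

```sas
/* Print the descriptor; look for the "Sort Information" section */
proc contents data=derived1;
run;

/* Read the recorded BY variables programmatically */
data _null_;
  dsid = open('work.derived1');
  if dsid > 0 then do;
    sortedby = attrc(dsid, 'SORTEDBY');  /* blank if no sort order is recorded */
    put sortedby=;
    rc = close(dsid);
  end;
run;
```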
Thanks. I found this article that elaborates on the concept, so I decided to use the SORTEDBY= data set option together with the PRESORTED option of PROC SORT.
data derived1(sortedby=a) derived2(sortedby=a);
set source;
by a;
run;
proc sort data=derived1 presorted;
by a;
run;
proc sort data=derived2 presorted;
by a;
run;
For completeness, there is this article on "Sorted Data Sets" and "The Sort Indicator": http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000766829.htm#a0031...
> My question is, does it matter?
This metadata is hugely important if you want to leverage one of SAS's strengths in terms of storage: it keeps data sets sorted.
Note that there are two sort flags. The VALIDATED flag is the one that matters, because in some cases SAS can ignore a flag that you set yourself. See here.
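To see which of the two flags a data set carries, one option is the OUT= data set of PROC CONTENTS. This is a sketch, assuming derived1 from the question exists; the name WORK.SORTINFO is just a placeholder. As far as I understand the documentation, the SORTED variable is missing for an unsorted data set, 0 when the sort order was only asserted (for example with SORTEDBY=), and 1 when SAS validated it (for example after PROC SORT), while SORTEDBY gives each variable's position in the sort key.

```sas
/* One row per variable; SORTED and SORTEDBY carry the sort indicator */
proc contents data=derived1
              out=work.sortinfo(keep=memname name sorted sortedby)
              noprint;
run;

proc print data=work.sortinfo noobs;
run;
```

Running this before and after the PROC SORT ... PRESORTED step should show whether the flag changed from asserted to validated.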
The last step when creating a permanent table should be a proc sort (with option presorted if applicable) so the VALIDATED flag is set. The output data set should also set other options: index= if needed, compress= to choose the best method for that particular data, and the write= and alter= options, to keep anyone from overwriting the data set by accident. The password can be well known if needed; the goal is only to avoid accidental changes. Other useful options can be set too to improve performance.
For example something like:
proc sort data=SUMS
          out =PRODLIB.FINAL( index   =(CUSTOMER_ID)
                              compress=char
                              write   =ProdData
                              alter   =ProdData
                              label   ="Created %sysfunc(datetime(),datetime20.) by &sysuserid"
                              bufsize =32k
                              bufno   =25
                              pointobs=no)
          presorted;
  by DATE;
run;