I have a source data set from which a couple of data sets, identical to the source, are derived. The original code sorted each derived data set one at a time. To speed up the program, I would like to sort only the source data set instead. But when I do this, the source data set's metadata shows "SORTED BY whatever", while the derived data sets do not show that they are sorted by anything.
My question is, does it matter? Is there any practical difference between 2 sets with the same exact record order if one's metadata indicates how it is sorted and the other does not?
Thanks. I know kind of a weird question.
data source;
  infile datalines dlm=',';
  length a $1. b 3.;
  input a $ b;
datalines;
a,1
b,2
c,3
d,4
e,5
;
proc sort;
  by a;
run;
data derived1 derived2;
  set source;
  by a;
run;
Thanks. Found this article that elaborates on the concept. So I decided to use the sortedby data set option + the presorted proc sort option.
data derived1(sortedby=a) derived2(sortedby=a);
  set source;
  by a;
run;
proc sort data=derived1 presorted;
  by a;
proc sort data=derived2 presorted;
  by a;
run;
For completeness, there is this article on "Sorted Datasets" and "The Sort Indicator". http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000766829.htm#a0031...
> My question is, does it matter?
This metadata is hugely important if you want to leverage one of SAS's strengths in terms of storage: it keeps data sets sorted.
Note that there are 2 sorted flags. The VALIDATED sort flag is the one that matters as SAS can ignore the flag that you set yourself in some cases. See here.
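To see which flag a data set actually carries, something like the following can be run against the DERIVED1 data set from the question. This is only a sketch: it assumes DERIVED1 is in WORK, and the variable names sortedby and sortlvl are just illustrative. PROC CONTENTS prints a "Sort Information" section that lists the BY variables and whether the sort is validated. As far as I know, the ATTRC function returns the same information in a DATA step, with SORTLVL reporting WEAK for a flag you asserted yourself (for example with sortedby=) and STRONG for one that SAS set.

```sas
/* Sketch only: assumes WORK.DERIVED1 from the question exists. */
proc contents data=derived1;
run;

/* The same information read programmatically with ATTRC. */
data _null_;
  dsid = open('derived1');
  if dsid > 0 then do;
    sortedby = attrc(dsid, 'SORTEDBY');  /* BY variables, blank if no sort flag */
    sortlvl  = attrc(dsid, 'SORTLVL');   /* WEAK = asserted by user, STRONG = set by SAS */
    put sortedby= sortlvl=;
    rc = close(dsid);
  end;
run;
```

Running this before and after the proc sort ... presorted step should show whether the flag moved from asserted to validated.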
The last step when creating a permanent table should be a proc sort (with option presorted if applicable) so the VALIDATED flag is set. The output data set should also set other options: index= if needed, compress= to choose the best method for that particular data, and the write= and alter= options, to avoid anyone overwriting the data set by accident. The password can be well-known if needed. The goal is to avoid accidental changes. Other useful options can be set too to improve performance.
For example something like:
proc sort data=SUMS
          out =PRODLIB.FINAL( index   =(CUSTOMER_ID)
                              compress=char
                              write   =ProdData
                              alter   =ProdData
                              label   ="Created %sysfunc(datetime(),datetime20.) by &sysuserid"
                              bufsize =32k
                              bufno   =25
                              pointobs=no)
          presorted;
  by DATE;
run;