Hi Experts,
There is a question in my mind from few days and working on it.
Suppose say I have an data set a which has a variable id and its not sorted.
data a;
input id;
cards;
1
2
4
5
3
;
run;I want this dataset to be sorted without using any proc or order by statement. I am thinking to transpose this data and then use call sort and then again transpose. Need your suggestions.
Best Regards,
Arun
Option 1:
Hash object with "ordered:" parameter:
data _null_;
if 0 then set have;
declare hash h (dataset:'have',ordered:'a');
h.definekey('id');
h.definedata(all:'Y');
h.definedone();
rc=h.output(dataset:'want');
run;
Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:
data _null_;
if 0 then set have;
declare hash h (dataset:'have',hashexp:0);
h.definekey('id');
h.definedata(all:'Y');
h.definedone();
rc=h.output(dataset:'want2');
run;
The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket. The default is hashexp:8 implying 2**8=256 buckets). And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree. It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.
If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.
Why one would go out of their way to do this, I don't know. Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.
How are you planning on transposing the data without using a procedure? 🙂
I take it that you are allowed to use IML since you mention call sort?
Why? Proc sort, and order by are specifically built into SAS and SQL respectively for the sole purpose of ordering data. What purpose does it serve doing this yourself, you will write code that will be messy, and far slower than the built in functions - plus to get an accurate procedure, you will need to validate it etc.
If it was me in the chair I would say what I have stated above. If the interviewier is asking daft stuff like this, then I would considering how to get out of there as fast as possible. I mean where you going to go, maybe altering memory locations, heck go back to soldering transistors onto a board to order your data.
"If you're looking for an idiot who wastes your customer's money by wasting time doing stupid things, then I'm not the right person for you. I'm intelligent and like working hard at work worth doing."
Well, if you can't use PROC SORT, it doesn't seem fair that you can use CALL SORTN. So let's remove that tool as well. Purely as a theoretical exercise of ocurse.
First, find out the number of IDs that you have:
data _null_;
set have nobs=_nobs_;
call symputx('n_ids', _nobs_);
run;
Then use a DATA step to transpose, sort (without any CALL routines), and output:
data want;
array ids {&n_ids} id1-id&n_ids;
do i=1 by 1 until (done);
set have end=done;
ids{i} = id;
end;
That much loads the IDs into an array. Next, the same DATA step continues to sort them with no CALL routines:
do until (changes=0);
changes=0;
do i=2 to &n_ids;
if ids{i} < ids{i-1} then do;
* Two IDs are not in order. Switch them;
changes + 1;
temp = ids{i-1};
ids{i-1} = ids{i};
ids{i}=temp;
end;
end;
end;
Yes, DO UNTIL is dangerous and if incorrectly used can result in an infinite loop.
When all the switching is complete, output the results in the same DATA step:
do i=1 to &n_ids;
id = ids{i};
output;
end;
keep id;
run;
Most anything can be done. But as others have asked, would you really want to?
Open a table in tableview with openmode=edit and sort there by clicking on column heading and the sort icon.
Write the data out to a different file format and use someone else's sort utility.
I could write a very inefficient program comparing two rows of data with lag or dif,retain and a bunch of arrays to swap/write values. Really would not want to.
I would really be tempted to ask an interviewer why they want to use a crowbar instead of a screwdriver when the task calls for a screwdriver.
Option 1:
Hash object with "ordered:" parameter:
data _null_;
if 0 then set have;
declare hash h (dataset:'have',ordered:'a');
h.definekey('id');
h.definedata(all:'Y');
h.definedone();
rc=h.output(dataset:'want');
run;
Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:
data _null_;
if 0 then set have;
declare hash h (dataset:'have',hashexp:0);
h.definekey('id');
h.definedata(all:'Y');
h.definedone();
rc=h.output(dataset:'want2');
run;
The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket. The default is hashexp:8 implying 2**8=256 buckets). And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree. It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.
If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.
Why one would go out of their way to do this, I don't know. Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.
A very easy method is sort a dataset with proc sort or orderby satetment is indexing.
As we know the index in stored as sorted with manner along with the dataset.
data a;
input x;
cards;
1
3
4
2
5
;
run;
proc datasets lib=work  noprint;
modify a;
index create x;
run;
data b;
set a;
by x;
run;hi do you have a pdf notes on this .. it will be great for me.
Thanks,
Arun
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
