Help using Base SAS procedures

Reg:Sorting Data by Hash Keys

Reply
Contributor
Posts: 37

Reg:Sorting Data by Hash Keys


HI i having a dataset with 72 variables and having 12lacs observation
i want to sort the data on two variables but do reduce the time i want to do it by hash hey
how can i do it.

proc sort data=temp;
by name date;
run;

Trusted Advisor
Posts: 1,301

Reg:Sorting Data by Hash Keys

Are you confusing a hash join with a type of sort?  Are you wanting to create a new key as a hash value of the concatenation of the name and date fields?  Or something else? A sort of a file with only 72 variables and 1,200,000 observations should be very quick already...  What kind of system are you using?  You could try using the tagsort option.

Super User
Super User
Posts: 7,083

Reg:Sorting Data by Hash Keys

Trusted Advisor
Posts: 1,301

Reg:Sorting Data by Hash Keys

There is a balance in the performance offered by tagsorting.  If you use the tagsort option you remove the ability for the sort to be multithreaded.  So if your performance is hindered by I/O avaiablility for your utility files from the sort, then tagsort is a good option.  If you are not bounded by I/O or disk space for these files then you will likely see hindered sort performance.  You could try allocating additional memory to the sort procedure (see the sortsize option and other options in the memory/performace groups).  On my system, with default memory options a table with 4,000 columns and 1,000,000 obs took about 5 minutes with a lot of system competition.  However if I run the same sort with the tagsort option on it took over 10 minutes.

Super User
Posts: 10,048

Reg:Sorting Data by Hash Keys

You only need to use argument ordered: 'a' in hash table constructor.

and put these name into definekey () .Hash will automatically sort them for you.

declare hash ha1(dataset:'temp',ordered: 'a');

  ha1.definekey('name','age');

Ksharp

Contributor
Posts: 37

Reg:Sorting Data by Hash Keys

data temp;

input name age;

declare hash ha1(dataset:'temp',ordered: 'a');

  ha1.definekey('name','age');

cards;

23 45

67 89

12 56

run;

proc print;

run;

I have tryed this but it did not worked how can i do it and how can i use the hash key in proc sort:

proc sort data=temp;

by ha1.definekey;

run;

if i have used this iam getting error. can any one help me

Trusted Advisor
Posts: 1,301

Reg:Sorting Data by Hash Keys

Good thinking Ksharp, I do believe this is the method the OP was trying to ask about utilizing.  However, I do not think he understands it's use or implementation.

Super User
Super User
Posts: 7,083

Reg:Sorting Data by Hash Keys

I am really not clear on what you mean by "use the hash key" in proc sort.

Why do you want to sort your data?

Would an index serve your needs?

Also how large is your dataset? What number does 12lacs represent?

PROC Star
Posts: 7,492

Reg:Sorting Data by Hash Keys

Tom,  I can't answer why the OP would want to do this, but I can answer what the number represents: 1.2 million.  See: http://en.wikipedia.org/wiki/Lakh

Super User
Posts: 10,048

Re: Reg:Sorting Data by Hash Keys

You do not use Hash Table correctly.

See the followig code of example.

BTW. I am not sure this method will be more efficient than proc sort;

data temp;
input name age;
cards;
23 45
67 89
12 56
;
run;
data _null_;
*load dataset temp into Hash Table;
if 0 then set temp;
declare hash ha1(dataset:'temp',ordered: 'a');
  ha1.definekey('name','age');
  ha1.definedata('name','age');
  ha1.definedone();
*output Hash Table to a dataset;
  ha1.output(dataset:'want');
run;
*now see the data has already been sort by name age;
proc print data=want;run;

Ksharp

Ask a Question
Discussion stats
  • 9 replies
  • 173 views
  • 1 like
  • 5 in conversation