DATA Step, Macro, Functions and more

Covert Data step to hash code

Reply
Occasional Contributor
Posts: 13

Covert Data step to hash code

Hi,

I am new to the hash code and I have a data step that I want to convert to hash code to decrease run time ( i have millions of rows). I would appreciate any help to covert same code to hash code.

 

Here is the logic of my datastep:  I am creating a new variable "tag" which can 1 or 0. If it is my first.ID and value of status is not 8 then 0 else 1 until it hits 1 again. So basically I am retaining the tag value until I hit any other value other than 8.

 

Example:

 

data try;
input id status;
datalines;
1 8
1 . 
1 1
1 .
2 8
2 .
2 .
2 .
3 1
3 .
3 8
3 .
4 .
4 .
4 .
5 .
5 .
5 8
5 .
;
run;

 

data try1 ;
set try;
by id;
retain tag 0;
if first.id then
tag=0;
if status=8 then
tag=1;
if status=1 then
tag=0;
run;

 

 

 

Output:

 id status tag
1 8 1
1 . 1
1 1 0
1 . 0
2 8 1
2 . 1
2 . 1
2 . 1
3 1 0
3 . 0
3 8 1
3 . 1
4 . 0
4 . 0
4 . 0
5 . 0
5 . 0
5 8 1
5 . 1

 

Super User
Posts: 6,626

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

Your existing program is fine.  How could hashing speed this up in any way?

Occasional Contributor
Posts: 13

Re: Covert Data step to hash code

Posted in reply to Astounding
As I mentioned that I would several millions of rows and its taking forever.
Super User
Posts: 23,237

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

What's your definition of forever?

I don't think there's anyway to simplify this process so you're best off trying to figure out where/why it's being slow first. 

 

A few million rows should be processed pretty quickly unless you have memory or space restrictions.

Occasional Contributor
Posts: 13

Re: Covert Data step to hash code

Regardless of time constraint can anyone help me understand how can the same code be converted to hash code for knowledge purpose? I would really appreciate that.

 

I am not lying when I said it's very very slow (several hours) and I feel I want to test if using hash code improves timing.

 

Thanks in advance for your help and understanding.

Super User
Posts: 23,237

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

If a data step takes hours for less than 10 million rows, you either have some other step your'e not showing (a sort?) or something else is wrong with your set up. It takes about 20 minutes for me to process (multiple procs/data steps/sql) 30 million rows.

Super User
Posts: 23,237

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

If you'd like to convert it as an academic exercise, feel free to do so.

 

Asking someone else to do it for you, isn't what this forum is for. If you're looking for help with it, show us what you have so far and we can help with what you correct/modify if it as necessary. And the 'requirement' to switch to hash makes me think this is likely homework.

 


ArseneWenger wrote:

Regardless of time constraint can anyone help me understand how can the same code be converted to hash code for knowledge purpose? I would really appreciate that.

 

I am not lying when I said it's very very slow (several hours) and I feel I want to test if using hash code improves timing.

 

Thanks in advance for your help and understanding.


 

Occasional Contributor
Posts: 13

Re: Covert Data step to hash code

Thanks, everyone!

Reeza this platform is for helping not for being mean. I am not a student in a college which you should have understood from my original post.

 

I have always experienced this behavior from you from all of my posts. 

I understand posting useless replies also bumps your rating as it doesn't see the content you are posting.

 

Again, thanks for spoiling the thread, I will sure figure out how to optimize this code. 

 

 

 

Super User
Posts: 23,237

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

ArseneWenger wrote:

Thanks, everyone!

Reeza this platform is for helping not for being mean. I am not a student in a college which you should have understood from my original post.

 


Why should I understand you're not a student? We only see exactly what you post, so based on your question, I don't see any reason why I would conclude you're not a student. I don't care if you're a student, worker, manager or the CEO of a company, you'll get the same treatment from me. I care about how you phrase questions. It's one thing to ask for help, it's another to ask people to do work for you for free. People asking for the second without showing any attempts is my pet peeve, so yeah, I'm probably 'meaner' than necessary. I still think it's perfectly fair to ask someone to show what they've attempted before helping them. In fact, it's part of the requirements in other common forums (StackOverflow) and in general it's how you ask a  good question - see guidelines here:

https://stackoverflow.com/help/how-to-ask

and here:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-ask-a-question-in-SAS-Analytics-U-Comm...


I understand posting useless replies also bumps your rating as it doesn't see the content you are posting.

 


I don't care about status, views or the number of posts on here. If I did, wouldn't answering your question get me more points and 'status'? In fact, the answer is probably less characters than typing out this response. 


Again, thanks for spoiling the thread, I will sure figure out how to optimize this code.  

My apologies for spoiling your thread, I promise to never spoil another thread for you ever again. 

PROC Star
Posts: 1,558

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

Hi @ArseneWenger Sir/Madam, There's no reason for you get so heated about anything over a person or people just because they gave their opinion. Chill. The point all the super users were trying to make is that your question doesn't quite warrant a hash solution is all. All the three who responded in this thread are really amazing people and I have shamelessly used their solutions over the years. 

 

This is an amazing friendly forum and many have been benefitted(myself included) and so we shouldn't create room for unnecessary feelings of haste. The positive spirit of this forum should go on forever. Well, I hope.

 

OK, May i ask why do you want a hash solution? what are you trying to load in hash table? What would be your look up keys and what data would you associate with those keys? How would you add those to the table? Through direct load or load during the pass of dataset? If you choose the latter, how would you counter efficiency as this is going to be an overhead?  Now, if for some reason you loaded your data, how do you think the object dot notation method will fetch your  data? Doesn't this make a one full pass of a dataset, plus a hash table plus a routine of adding and fetching data into the PDV? So inessence, there is more I/O, memory and others etc to further affect performance?

 

If you can help me explain the above with logical reasoning and some critical thinking, we can make this discussion an enjoyable experience. Have a nice evening

Super User
Posts: 13,293

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

Hash is one way of looking things up in another data set. Since you do not show anything that resembles a look up then Hash does not seem likely to help.

 

You can get a small improvement by changing the code slightly:

data try1 ;
   set try;
   by id;
   retain tag 0;
   if first.id then  tag=0;
   if status=8 then  tag=1;
   Else if status=1 then  tag=0;
run;

Put the status value in the first if statement that occurs more often if such is the case.

 

 

If you have more such values then describe them.

Occasional Contributor
Posts: 13

Re: Covert Data step to hash code

Thanks will try that.

Super User
Posts: 13,293

Re: Covert Data step to hash code

Posted in reply to ArseneWenger

Another possibility for a small improvement especially if the number of missing is large.

 

data try1 ;
   set try;
   by id;
   retain tag 0;
   if first.id then  tag=0;
   select (status);
      when(1) tag=0;
      when(8) tag=1;
      otherwise ;
   end;
run;

The select branches directly to one of 3 instructions with the otherwise being any code executed for values other than 1 or 8 such as missing. Since you don't actually want to do anything specific then a null statement is appropriate.

 

The time improvement would not be large but the If/then/else/if will always do 2 comparisons if the status is missing where the select will do only one.

PROC Star
Posts: 1,558

Re: Covert Data step to hash code

@ArseneWenger  My humble suggestion is to proceed with @ballardw simple yet neat approach with one pass and done and that is pretty much what all of us would. But here you go, just for your needs, have fun.- Nonetheless, I would never do this for my work

 

data try;

input id status;

datalines;

1 8

1 .

1 1

1 .

2 8

2 .

2 .

2 .

3 1

3 .

3 8

3 .

4 .

4 .

4 .

5 .

5 .

5 8

5 .

;

run;

 

 

data want;

    if _N_ = 1 then do;

      declare hash h(dataset: 'try', ordered: 'yes',multidata:'yes');

      declare hiter iter('h');

      h.defineKey('id');

      h.defineData('id','status');

      h.defineDone();

      call missing(id, status);

   end;

   rc = iter.first();

   do while (rc = 0);

   if lag(id) ne id then tag=0;

   if status=8 then tag=1;

   else if status=1 then tag=0;

     output;

      rc = iter.next();

   end;

   drop rc;

run;

 

Ask a Question
Discussion stats
  • 13 replies
  • 218 views
  • 5 likes
  • 5 in conversation