DATA Step, Macro, Functions and more

hash object vs proc sql

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

hash object vs proc sql

I would like to calculate a sum, and add a field through the use of hash object
and get the same result (Table Calculated) obtained from the following proc sql:

 

data input;
input k1 k2 num;
datalines;
1 1  10
1 1 20
1 1 30
2 1 25
2 2 40
2 3 80
run;
proc sql;
create table calculated as 
select *, 
	sum(num) as sum_num,
	 case  when k1=1 then .
		  when k1=2 then 1
	 end as new_field
from input
group by  k1,k2;
quit;

Accepted Solutions
Solution
‎10-27-2016 11:00 AM
Frequent Contributor
Posts: 103

Re: hash object vs proc sql

[ Edited ]

In order to perform a match, hash needs clear keys. IMHO it can not deal with mnemonics operators like '<'

So you would have to calculate the sum_num and set the corresponding new_field first. Afterwards you can perform your merge with hash.

I would use @KurtBremser's code to perform the first step, while defining the new_field value as requested and load the table into the hash for match merging. - of course only if I expect the time requested for programming to be compensated by run time Smiley Wink

DATA help (keep=k1 k2 sum_num new_field);
   set input;
   retain sum_num;
   by k1 k2;
   if first.k2 then sum_num = 0;
   sum_num + num;

   if last.k2 then do;
      if k1 eq 1 then new_field=.;
      else if k1 eq 2 and sum_num < 50 then new_field=1;
      output;
   end;
RUN;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field, sum_num);
      declare hash NF(dataset:'help');
      NF.definekey('k1', 'k2');
      NF.definedata(all:'YES');
      NF.definedone();
   end;
   set input;
   rc=NF.find(key:k1,key:k2);
RUN;

 

________________________

- That still only counts as one -

View solution in original post


All Replies
Super User
Posts: 6,936

Re: hash object vs proc sql

If you are looking for a non-SQL solution (maybe because of performance problems?), consider this:

data help (keep=k1 k2 sum_num);
set input;
retain sum_num;
by k1 k2;
if first.k2 then sum_num = 0;
sum_num + num;
if last.k2 then output;
run;

data calculated;
merge
  input
  help
;
by k1 k2;
if k1 = 2 then new_field = 1;
run;

As opposed to the SQL, this also preserves the original order.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 103

Re: hash object vs proc sql

%MACRO sum(k1,k2);
   PROC SQL;
      UPDATE calculated2
      SET sum_num=(select sum(num) FROM calculated WHERE k1 eq &k1. AND k2 eq &k2. GROUP BY k1,k2)
      WHERE k1 eq &k1. AND k2 eq &k2. 
      ;
   QUIT;
%MEND;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field);
      declare hash NF();
      NF.definekey('k1');
      NF.definedata('new_field');
      NF.definedone(); 
      NF.add(key:1, data:.);
      NF.add(key:2, data:1);
   end;
   set input;
   call execute('%'||'sum('||put(k1,best.)||','||put(k2,best.)||');');
   rc=NF.find();
RUN;

Nice problem, thanks for asking Smiley Happy

Cheers,

Oligolas

________________________

- That still only counts as one -

Contributor
Posts: 33

Re: hash object vs proc sql

 

Thanks!

And if there was a more restrictive condition on the "case when"?

For example:

 

proc sql;
create table calculated as
select *,
sum(num) as sum_num,
case when k1=1 then .
when k1=2 and calculated sum_num < 50 then 1
end as new_field
from input
group by k1,k2;
quit;

Solution
‎10-27-2016 11:00 AM
Frequent Contributor
Posts: 103

Re: hash object vs proc sql

[ Edited ]

In order to perform a match, hash needs clear keys. IMHO it can not deal with mnemonics operators like '<'

So you would have to calculate the sum_num and set the corresponding new_field first. Afterwards you can perform your merge with hash.

I would use @KurtBremser's code to perform the first step, while defining the new_field value as requested and load the table into the hash for match merging. - of course only if I expect the time requested for programming to be compensated by run time Smiley Wink

DATA help (keep=k1 k2 sum_num new_field);
   set input;
   retain sum_num;
   by k1 k2;
   if first.k2 then sum_num = 0;
   sum_num + num;

   if last.k2 then do;
      if k1 eq 1 then new_field=.;
      else if k1 eq 2 and sum_num < 50 then new_field=1;
      output;
   end;
RUN;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field, sum_num);
      declare hash NF(dataset:'help');
      NF.definekey('k1', 'k2');
      NF.definedata(all:'YES');
      NF.definedone();
   end;
   set input;
   rc=NF.find(key:k1,key:k2);
RUN;

 

________________________

- That still only counts as one -

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 269 views
  • 1 like
  • 3 in conversation