Solved: Re: hash object vs proc sql

mariopellegrini · Posted 10-24-2016 09:13 AM

I would like to calculate a sum, and add a field through the use of hash object
and get the same result (Table Calculated) obtained from the following proc sql:

data input;
input k1 k2 num;
datalines;
1 1  10
1 1 20
1 1 30
2 1 25
2 2 40
2 3 80
run;
proc sql;
create table calculated as 
select *, 
	sum(num) as sum_num,
	 case  when k1=1 then .
		  when k1=2 then 1
	 end as new_field
from input
group by  k1,k2;
quit;

Oligolas · Posted 10-24-2016 12:23 PM

In order to perform a match, hash needs clear keys. IMHO it can not deal with mnemonics operators like '<'

So you would have to calculate the sum_num and set the corresponding new_field first. Afterwards you can perform your merge with hash.

I would use @Kurt_Bremser's code to perform the first step, while defining the new_field value as requested and load the table into the hash for match merging. - of course only if I expect the time requested for programming to be compensated by run time 😉

DATA help (keep=k1 k2 sum_num new_field);
   set input;
   retain sum_num;
   by k1 k2;
   if first.k2 then sum_num = 0;
   sum_num + num;

   if last.k2 then do;
      if k1 eq 1 then new_field=.;
      else if k1 eq 2 and sum_num < 50 then new_field=1;
      output;
   end;
RUN;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field, sum_num);
      declare hash NF(dataset:'help');
      NF.definekey('k1', 'k2');
      NF.definedata(all:'YES');
      NF.definedone();
   end;
   set input;
   rc=NF.find(key:k1,key:k2);
RUN;

________________________
- Cheers -

View solution in original post

Kurt_Bremser · Posted 10-24-2016 09:59 AM

If you are looking for a non-SQL solution (maybe because of performance problems?), consider this:

data help (keep=k1 k2 sum_num);
set input;
retain sum_num;
by k1 k2;
if first.k2 then sum_num = 0;
sum_num + num;
if last.k2 then output;
run;

data calculated;
merge
  input
  help
;
by k1 k2;
if k1 = 2 then new_field = 1;
run;

As opposed to the SQL, this also preserves the original order.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Oligolas · Posted 10-24-2016 10:05 AM

%MACRO sum(k1,k2);
   PROC SQL;
      UPDATE calculated2
      SET sum_num=(select sum(num) FROM calculated WHERE k1 eq &k1. AND k2 eq &k2. GROUP BY k1,k2)
      WHERE k1 eq &k1. AND k2 eq &k2. 
      ;
   QUIT;
%MEND;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field);
      declare hash NF();
      NF.definekey('k1');
      NF.definedata('new_field');
      NF.definedone(); 
      NF.add(key:1, data:.);
      NF.add(key:2, data:1);
   end;
   set input;
   call execute('%'||'sum('||put(k1,best.)||','||put(k2,best.)||');');
   rc=NF.find();
RUN;

Nice problem, thanks for asking 🙂

Cheers,

Oligolas

________________________
- Cheers -

mariopellegrini · Posted 10-24-2016 11:13 AM

Thanks!

And if there was a more restrictive condition on the "case when"?

For example:

proc sql;
create table calculated as
select *,
sum(num) as sum_num,
case when k1=1 then .
when k1=2 and calculated sum_num < 50 then 1
end as new_field
from input
group by k1,k2;
quit;

Oligolas · Posted 10-24-2016 12:23 PM

In order to perform a match, hash needs clear keys. IMHO it can not deal with mnemonics operators like '<'

So you would have to calculate the sum_num and set the corresponding new_field first. Afterwards you can perform your merge with hash.

I would use @Kurt_Bremser's code to perform the first step, while defining the new_field value as requested and load the table into the hash for match merging. - of course only if I expect the time requested for programming to be compensated by run time 😉

DATA help (keep=k1 k2 sum_num new_field);
   set input;
   retain sum_num;
   by k1 k2;
   if first.k2 then sum_num = 0;
   sum_num + num;

   if last.k2 then do;
      if k1 eq 1 then new_field=.;
      else if k1 eq 2 and sum_num < 50 then new_field=1;
      output;
   end;
RUN;

DATA calculated2;
   length new_field sum_num 8;
   drop rc; 
   if _n_=1 then do; 
      call missing(new_field, sum_num);
      declare hash NF(dataset:'help');
      NF.definekey('k1', 'k2');
      NF.definedata(all:'YES');
      NF.definedone();
   end;
   set input;
   rc=NF.find(key:k1,key:k2);
RUN;

________________________
- Cheers -

hash object vs proc sql

Re: hash object vs proc sql

Re: hash object vs proc sql

Re: hash object vs proc sql

Re: hash object vs proc sql

Re: hash object vs proc sql

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away