Re: getting max and min value by group using PROC IML

sebster24 · Posted 06-02-2016 10:42 AM

Ksharp = SAS Super Guru. Thank you.

Is there anything you cannot solve in SAS? 🙂

sebster24 · Posted 06-01-2016 08:27 AM

Hello,

I am trying to obtain the min, max values within a group (in this instance, "SITE") for a dataset.

I am new to proc iml, so please forgive me for my mistakes.

I have provided the datalines below for your perusal, along with my code (which does not work) and end solution:

data pottery;
        input Site X Fe Mg Ca Na;
        datalines;
      1 14.4 7.00 4.30 0.15 0.51
      1 13.8 7.08 3.43 0.12 0.17
      1 14.6 7.09 3.88 0.13 0.20
      1 11.5 6.37 5.64 0.16 0.14
      1 13.8 7.06 5.34 0.20 0.20
      1 10.9 6.26 3.47 0.17 0.22
      1 10.1 4.26 4.26 0.20 0.18
      1 11.6 5.78 5.91 0.18 0.16
      1 11.1 5.49 4.52 0.29 0.30
      1 13.4 6.92 7.23 0.28 0.20
      1 12.4 6.13 5.69 0.22 0.54
      1 13.1 6.64 5.51 0.31 0.24
      1 12.7 6.69 4.45 0.20 0.22
      1 12.5 6.44 3.94 0.22 0.23
      2 11.8 5.44 3.94 0.30 0.04
      2 11.6 5.39 3.77 0.29 0.06
      3 18.3 1.28 0.67 0.03 0.03
      3 15.8 2.39 0.63 0.01 0.04
      3 18.0 1.50 0.67 0.01 0.06
      3 18.0 1.88 0.68 0.01 0.04
      3 20.8 1.51 0.72 0.07 0.10
      4 17.7 1.12 0.56 0.06 0.06
      4 18.3 1.14 0.67 0.06 0.05
      4 16.7 0.92 0.53 0.01 0.05
      4 14.8 2.74 0.67 0.03 0.05
      4 19.1 1.64 0.60 0.10 0.03
      ;


	  proc iml;
        use pottery;
        read all var{site x Fe} into a;
		call sortndx(ndx,a,1);
       unique_rows=uniqueby(a,1,1:nrow(a)); 
	   unique_vals=a[ndx[unique_rows],1:3];
	   minC=a[><,];
	   maxC=a[<>,];
print unique_rows, unique_vals, minC, maxC;

The end solution needs to look somewhat like this:

site	Min_X	Min_Fe	Min_Mg	Min_Ca	Min_Na
1	10,1	4,26	3,43	0,12	0,14
2	11,6	5,39	3,77	0,29	0,04
3	15,8	1,28	0,63	0,01	0,03
4	14,8	0,92	0,53	0,01	0,03

site	Max_X	Max_Fe	Max_Mg	Max_Ca	Max_Na
1	14,6	7,09	7,23	0,31	0,54
2	11,8	5,44	3,94	0,3	0,06
3	20,8	2,39	0,72	0,07	0,1
4	19,1	2,74	0,67	0,1	0,06

I want to avoid using the datastep and Proc Sql option.

Please do let me know.

many thanks.

RW9 · Posted 06-01-2016 09:30 AM

The question really, is why do you want to avoid using datastep, proc, and SQL? Proc means/summary/freq are coded specifically for these types of process, they are optimised for it, and so not using them doesn't make much sense?

Ksharp · Posted 06-01-2016 10:57 PM

It is indeed more suitable for IML.

data pottery;
        input Site X Fe Mg Ca Na;
        datalines;
      1 14.4 7.00 4.30 0.15 0.51
      1 13.8 7.08 3.43 0.12 0.17
      1 14.6 7.09 3.88 0.13 0.20
      1 11.5 6.37 5.64 0.16 0.14
      1 13.8 7.06 5.34 0.20 0.20
      1 10.9 6.26 3.47 0.17 0.22
      1 10.1 4.26 4.26 0.20 0.18
      1 11.6 5.78 5.91 0.18 0.16
      1 11.1 5.49 4.52 0.29 0.30
      1 13.4 6.92 7.23 0.28 0.20
      1 12.4 6.13 5.69 0.22 0.54
      1 13.1 6.64 5.51 0.31 0.24
      1 12.7 6.69 4.45 0.20 0.22
      1 12.5 6.44 3.94 0.22 0.23
      2 11.8 5.44 3.94 0.30 0.04
      2 11.6 5.39 3.77 0.29 0.06
      3 18.3 1.28 0.67 0.03 0.03
      3 15.8 2.39 0.63 0.01 0.04
      3 18.0 1.50 0.67 0.01 0.06
      3 18.0 1.88 0.68 0.01 0.04
      3 20.8 1.51 0.72 0.07 0.10
      4 17.7 1.12 0.56 0.06 0.06
      4 18.3 1.14 0.67 0.06 0.05
      4 16.7 0.92 0.53 0.01 0.05
      4 14.8 2.74 0.67 0.03 0.05
      4 19.1 1.64 0.60 0.10 0.03
      ;
run;
proc iml;
use pottery nobs nobs;
read all var {Site};
read all var {X Fe Mg Ca Na} into x[c=vnames];
close;

start_end=t(loc(t(Site)^={.}||remove(Site,nobs)))||
          t(loc(t(Site)^=remove(Site,1)||{.}));
          
nrow=nrow(start_end);  
ncol=ncol(x);  
sites=site[start_end[,1]];  
minC=j(nrow,ncol,.);
maxC=j(nrow,ncol,.);

do i=1 to nrow;
 minC[i,]=x[start_end[i,1]:start_end[i,2],][><,];
 maxC[i,]=x[start_end[i,1]:start_end[i,2],][<>,];
end;
print sites minC[c=vnames],sites maxC[c=vnames];
quit;

RW9 · Posted 06-02-2016 05:12 AM

Hi,

Just wondering why IML is the better option than:

proc summary data=work.pottery;
  class site;
  var x fe mg ca na;
  output out=work.results min=min_x min_fe min_mg min_ca min_na max=max_x max_fe max_mg max_ca max_na;
run;

Seems to be simpler coding to do the above, and only requires Base SAS?

sebster24 · Posted 06-02-2016 10:41 AM

Hello RW9,

i was insisting on proc iml, because i wanted to dig deeper into statistical programming. It looks fun.

On another note, i wanted to see the difference myself, and found the following. Proc Iml is much efficient in resource usage and slightly quicker.

Spoiler

PROCEDURE IML used (Total process time):

real time 0.07 seconds

user cpu time 0.04 seconds

system cpu time 0.03 seconds

Memory 822k

OS Memory 11816k

Timestamp 02.06.2016 16:37:09

NOTE: There were 26 observations read from the data set WORK.POTTERY.

NOTE: The data set WORK.RESULTS has 5 observations and 13 variables.

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 0.04 seconds

user cpu time 0.00 seconds

system cpu time 0.04 seconds

Memory 7980k

OS Memory 20044k

Timestamp 02.06.2016 16:37:09

PROCEDURE IML used (Total process time): real time 0.07 seconds user cpu time 0.04 seconds system cpu time 0.03 seconds Memory 822k OS Memory 11816k Timestamp 02.06.2016 16:37:09 NOTE: There were 26 observations read from the data set WORK.POTTERY. NOTE: The data set WORK.RESULTS has 5 observations and 13 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.04 seconds user cpu time 0.00 seconds system cpu time 0.04 seconds Memory 7980k OS Memory 20044k Timestamp 02.06.2016 16:37:09

THank you for your help RW9. Much appreciated. You guys are just SAS Super-Gurus 🙂