BookmarkSubscribeRSS Feed
sebster24
Quartz | Level 8

Ksharp = SAS Super Guru. Thank you.

Is there anything you cannot solve in SAS? 🙂

5 REPLIES 5
sebster24
Quartz | Level 8

Hello,

I am trying to obtain the min, max values within a group (in this instance, "SITE") for a dataset.

I am new to proc iml, so please forgive me for my mistakes.

I have provided the datalines below for your perusal, along with my code (which does not work) and end solution:

 

data pottery;
        input Site X Fe Mg Ca Na;
        datalines;
      1 14.4 7.00 4.30 0.15 0.51
      1 13.8 7.08 3.43 0.12 0.17
      1 14.6 7.09 3.88 0.13 0.20
      1 11.5 6.37 5.64 0.16 0.14
      1 13.8 7.06 5.34 0.20 0.20
      1 10.9 6.26 3.47 0.17 0.22
      1 10.1 4.26 4.26 0.20 0.18
      1 11.6 5.78 5.91 0.18 0.16
      1 11.1 5.49 4.52 0.29 0.30
      1 13.4 6.92 7.23 0.28 0.20
      1 12.4 6.13 5.69 0.22 0.54
      1 13.1 6.64 5.51 0.31 0.24
      1 12.7 6.69 4.45 0.20 0.22
      1 12.5 6.44 3.94 0.22 0.23
      2 11.8 5.44 3.94 0.30 0.04
      2 11.6 5.39 3.77 0.29 0.06
      3 18.3 1.28 0.67 0.03 0.03
      3 15.8 2.39 0.63 0.01 0.04
      3 18.0 1.50 0.67 0.01 0.06
      3 18.0 1.88 0.68 0.01 0.04
      3 20.8 1.51 0.72 0.07 0.10
      4 17.7 1.12 0.56 0.06 0.06
      4 18.3 1.14 0.67 0.06 0.05
      4 16.7 0.92 0.53 0.01 0.05
      4 14.8 2.74 0.67 0.03 0.05
      4 19.1 1.64 0.60 0.10 0.03
      ;


	  proc iml;
        use pottery;
        read all var{site x Fe} into a;
		call sortndx(ndx,a,1);
       unique_rows=uniqueby(a,1,1:nrow(a)); 
	   unique_vals=a[ndx[unique_rows],1:3];
	   minC=a[><,];
	   maxC=a[<>,];
print unique_rows, unique_vals, minC, maxC;

 

 

The end solution needs to look somewhat like this:

site Min_X Min_Fe Min_Mg Min_Ca Min_Na
1 10,1 4,26 3,43 0,12 0,14
2 11,6 5,39 3,77 0,29 0,04
3 15,8 1,28 0,63 0,01 0,03
4 14,8 0,92 0,53 0,01 0,03

 

site Max_X Max_Fe Max_Mg Max_Ca Max_Na
1 14,6 7,09 7,23 0,31 0,54
2 11,8 5,44 3,94 0,3 0,06
3 20,8 2,39 0,72 0,07 0,1
4 19,1 2,74 0,67 0,1 0,06

 

I want to avoid using the datastep and Proc Sql option.

Please do let me know.

many thanks.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The question really, is why do you want to avoid using datastep, proc, and SQL?  Proc means/summary/freq are coded specifically for these types of process, they are optimised for it, and so not using them doesn't make much sense?

Ksharp
Super User

It is indeed more suitable for IML.

 

data pottery;
        input Site X Fe Mg Ca Na;
        datalines;
      1 14.4 7.00 4.30 0.15 0.51
      1 13.8 7.08 3.43 0.12 0.17
      1 14.6 7.09 3.88 0.13 0.20
      1 11.5 6.37 5.64 0.16 0.14
      1 13.8 7.06 5.34 0.20 0.20
      1 10.9 6.26 3.47 0.17 0.22
      1 10.1 4.26 4.26 0.20 0.18
      1 11.6 5.78 5.91 0.18 0.16
      1 11.1 5.49 4.52 0.29 0.30
      1 13.4 6.92 7.23 0.28 0.20
      1 12.4 6.13 5.69 0.22 0.54
      1 13.1 6.64 5.51 0.31 0.24
      1 12.7 6.69 4.45 0.20 0.22
      1 12.5 6.44 3.94 0.22 0.23
      2 11.8 5.44 3.94 0.30 0.04
      2 11.6 5.39 3.77 0.29 0.06
      3 18.3 1.28 0.67 0.03 0.03
      3 15.8 2.39 0.63 0.01 0.04
      3 18.0 1.50 0.67 0.01 0.06
      3 18.0 1.88 0.68 0.01 0.04
      3 20.8 1.51 0.72 0.07 0.10
      4 17.7 1.12 0.56 0.06 0.06
      4 18.3 1.14 0.67 0.06 0.05
      4 16.7 0.92 0.53 0.01 0.05
      4 14.8 2.74 0.67 0.03 0.05
      4 19.1 1.64 0.60 0.10 0.03
      ;
run;
proc iml;
use pottery nobs nobs;
read all var {Site};
read all var {X Fe Mg Ca Na} into x[c=vnames];
close;

start_end=t(loc(t(Site)^={.}||remove(Site,nobs)))||
          t(loc(t(Site)^=remove(Site,1)||{.}));
          
nrow=nrow(start_end);  
ncol=ncol(x);  
sites=site[start_end[,1]];  
minC=j(nrow,ncol,.);
maxC=j(nrow,ncol,.);

do i=1 to nrow;
 minC[i,]=x[start_end[i,1]:start_end[i,2],][><,];
 maxC[i,]=x[start_end[i,1]:start_end[i,2],][<>,];
end;
print sites minC[c=vnames],sites maxC[c=vnames];
quit;
          
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Just wondering why IML is the better option than:

proc summary data=work.pottery;
  class site;
  var x fe mg ca na;
  output out=work.results min=min_x min_fe min_mg min_ca min_na max=max_x max_fe max_mg max_ca max_na;
run;

Seems to be simpler coding to do the above, and only requires Base SAS?

sebster24
Quartz | Level 8

Hello RW9,

i was insisting on proc iml, because i wanted to dig deeper into statistical programming. It looks fun.

On another note, i wanted to see the difference myself, and found the following. Proc Iml is much efficient in resource usage and slightly quicker.

Spoiler

PROCEDURE IML used (Total process time):

real time 0.07 seconds

user cpu time 0.04 seconds

system cpu time 0.03 seconds

Memory 822k

OS Memory 11816k

Timestamp 02.06.2016 16:37:09




NOTE: There were 26 observations read from the data set WORK.POTTERY.

NOTE: The data set WORK.RESULTS has 5 observations and 13 variables.

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 0.04 seconds

user cpu time 0.00 seconds

system cpu time 0.04 seconds

Memory 7980k

OS Memory 20044k

Timestamp 02.06.2016 16:37:09

THank you for your help RW9. Much appreciated. You guys are just SAS Super-Gurus 🙂 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 5 replies
  • 1541 views
  • 2 likes
  • 3 in conversation