Ksharp = SAS Super Guru. Thank you.
Is there anything you cannot solve in SAS? 🙂
Hello,
I am trying to obtain the min, max values within a group (in this instance, "SITE") for a dataset.
I am new to proc iml, so please forgive me for my mistakes.
I have provided the datalines below for your perusal, along with my code (which does not work) and end solution:
data pottery;
input Site X Fe Mg Ca Na;
datalines;
1 14.4 7.00 4.30 0.15 0.51
1 13.8 7.08 3.43 0.12 0.17
1 14.6 7.09 3.88 0.13 0.20
1 11.5 6.37 5.64 0.16 0.14
1 13.8 7.06 5.34 0.20 0.20
1 10.9 6.26 3.47 0.17 0.22
1 10.1 4.26 4.26 0.20 0.18
1 11.6 5.78 5.91 0.18 0.16
1 11.1 5.49 4.52 0.29 0.30
1 13.4 6.92 7.23 0.28 0.20
1 12.4 6.13 5.69 0.22 0.54
1 13.1 6.64 5.51 0.31 0.24
1 12.7 6.69 4.45 0.20 0.22
1 12.5 6.44 3.94 0.22 0.23
2 11.8 5.44 3.94 0.30 0.04
2 11.6 5.39 3.77 0.29 0.06
3 18.3 1.28 0.67 0.03 0.03
3 15.8 2.39 0.63 0.01 0.04
3 18.0 1.50 0.67 0.01 0.06
3 18.0 1.88 0.68 0.01 0.04
3 20.8 1.51 0.72 0.07 0.10
4 17.7 1.12 0.56 0.06 0.06
4 18.3 1.14 0.67 0.06 0.05
4 16.7 0.92 0.53 0.01 0.05
4 14.8 2.74 0.67 0.03 0.05
4 19.1 1.64 0.60 0.10 0.03
;
proc iml;
use pottery;
read all var{site x Fe} into a;
call sortndx(ndx,a,1);
unique_rows=uniqueby(a,1,1:nrow(a));
unique_vals=a[ndx[unique_rows],1:3];
minC=a[><,];
maxC=a[<>,];
print unique_rows, unique_vals, minC, maxC;
The end solution needs to look somewhat like this:
site | Min_X | Min_Fe | Min_Mg | Min_Ca | Min_Na |
1 | 10,1 | 4,26 | 3,43 | 0,12 | 0,14 |
2 | 11,6 | 5,39 | 3,77 | 0,29 | 0,04 |
3 | 15,8 | 1,28 | 0,63 | 0,01 | 0,03 |
4 | 14,8 | 0,92 | 0,53 | 0,01 | 0,03 |
site | Max_X | Max_Fe | Max_Mg | Max_Ca | Max_Na |
1 | 14,6 | 7,09 | 7,23 | 0,31 | 0,54 |
2 | 11,8 | 5,44 | 3,94 | 0,3 | 0,06 |
3 | 20,8 | 2,39 | 0,72 | 0,07 | 0,1 |
4 | 19,1 | 2,74 | 0,67 | 0,1 | 0,06 |
I want to avoid using the datastep and Proc Sql option.
Please do let me know.
many thanks.
The question really, is why do you want to avoid using datastep, proc, and SQL? Proc means/summary/freq are coded specifically for these types of process, they are optimised for it, and so not using them doesn't make much sense?
It is indeed more suitable for IML.
data pottery;
input Site X Fe Mg Ca Na;
datalines;
1 14.4 7.00 4.30 0.15 0.51
1 13.8 7.08 3.43 0.12 0.17
1 14.6 7.09 3.88 0.13 0.20
1 11.5 6.37 5.64 0.16 0.14
1 13.8 7.06 5.34 0.20 0.20
1 10.9 6.26 3.47 0.17 0.22
1 10.1 4.26 4.26 0.20 0.18
1 11.6 5.78 5.91 0.18 0.16
1 11.1 5.49 4.52 0.29 0.30
1 13.4 6.92 7.23 0.28 0.20
1 12.4 6.13 5.69 0.22 0.54
1 13.1 6.64 5.51 0.31 0.24
1 12.7 6.69 4.45 0.20 0.22
1 12.5 6.44 3.94 0.22 0.23
2 11.8 5.44 3.94 0.30 0.04
2 11.6 5.39 3.77 0.29 0.06
3 18.3 1.28 0.67 0.03 0.03
3 15.8 2.39 0.63 0.01 0.04
3 18.0 1.50 0.67 0.01 0.06
3 18.0 1.88 0.68 0.01 0.04
3 20.8 1.51 0.72 0.07 0.10
4 17.7 1.12 0.56 0.06 0.06
4 18.3 1.14 0.67 0.06 0.05
4 16.7 0.92 0.53 0.01 0.05
4 14.8 2.74 0.67 0.03 0.05
4 19.1 1.64 0.60 0.10 0.03
;
run;
proc iml;
use pottery nobs nobs;
read all var {Site};
read all var {X Fe Mg Ca Na} into x[c=vnames];
close;
start_end=t(loc(t(Site)^={.}||remove(Site,nobs)))||
t(loc(t(Site)^=remove(Site,1)||{.}));
nrow=nrow(start_end);
ncol=ncol(x);
sites=site[start_end[,1]];
minC=j(nrow,ncol,.);
maxC=j(nrow,ncol,.);
do i=1 to nrow;
minC[i,]=x[start_end[i,1]:start_end[i,2],][><,];
maxC[i,]=x[start_end[i,1]:start_end[i,2],][<>,];
end;
print sites minC[c=vnames],sites maxC[c=vnames];
quit;
Hi,
Just wondering why IML is the better option than:
proc summary data=work.pottery; class site; var x fe mg ca na; output out=work.results min=min_x min_fe min_mg min_ca min_na max=max_x max_fe max_mg max_ca max_na; run;
Seems to be simpler coding to do the above, and only requires Base SAS?
Hello RW9,
i was insisting on proc iml, because i wanted to dig deeper into statistical programming. It looks fun.
On another note, i wanted to see the difference myself, and found the following. Proc Iml is much efficient in resource usage and slightly quicker.
PROCEDURE IML used (Total process time):
real time 0.07 seconds
user cpu time 0.04 seconds
system cpu time 0.03 seconds
Memory 822k
OS Memory 11816k
Timestamp 02.06.2016 16:37:09
NOTE: There were 26 observations read from the data set WORK.POTTERY.
NOTE: The data set WORK.RESULTS has 5 observations and 13 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.04 seconds
user cpu time 0.00 seconds
system cpu time 0.04 seconds
Memory 7980k
OS Memory 20044k
Timestamp 02.06.2016 16:37:09
THank you for your help RW9. Much appreciated. You guys are just SAS Super-Gurus 🙂
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.