Hello,
As I understand it, a new topic should be opened for this BY-group question.
Following on from the previous thread: if we add a grouping variable (gr) and want to
calculate the cumulative std of y within each value of gr in the data below,
what would be the best solution? Thanks a lot.
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;
data wanted ;
input gr dt y std ;
cards ;
1 1 80 .
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80 .
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
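proc freq data=available;
tables gr/noprint out=_a_; /* _a_ holds COUNT = number of rows per gr */
run;
data intermediate;
merge available _a_;
by gr;
if first.gr then seq=0; /* restart the row counter in each BY group */
seq+1;
do group=seq to count; /* row number seq is copied into windows seq..count */
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class gr group;
var thisy;
output out=want stddev=; /* thisy = std of the first GROUP rows of each gr */
run;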
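If the result is needed in the layout of the "wanted" data set (gr dt y std), one possible follow-up is a one-to-one merge. This is only a sketch: the data set name final is made up for illustration, and it assumes that available is already sorted by gr and that want comes straight from the PROC SUMMARY step above, so that both data sets have the same number of rows in the same order.

```sas
/* Sketch only: FINAL is a hypothetical name.                        */
/* Assumes AVAILABLE is sorted by gr and WANT is the PROC SUMMARY    */
/* output above (one row per gr/group, in the same row order).       */
data final;
  merge available
        want(keep=thisy rename=(thisy=std));
  /* no BY statement: rows are paired by position (one-to-one merge) */
run;
```

The first row of each gr gets a missing std, because a standard deviation of a single value is not defined.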
Did you try my code?
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
data want;
set available;
by gr;
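array x{999999} _temporary_; /* must be at least as large as the biggest BY group */
if first.gr then do; n=0; call missing(of x{*}); end; /* reset at the start of each group */
n+1;
x{n}=y;
std=std(of x{*}); /* STD ignores the missing (unused) elements */
drop n;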
run;
Thanks a lot, it works.
super
Hello @J111,
My solution from the previous thread can be generalized to BY groups as well:
data want(drop=_:);
_s=0; _v=0;
do _n=1 by 1 until(last.gr);
set available;
by gr;
_s+y; /* cumulative sum */
_m=_s/_n; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n>1 then do;
std=sqrt(_v+_d**2+_q/(_n-1)); /* cumulative standard deviation */
_v=((_n-1)*(_v+_d**2)+_q)/_n; /* cumulative population variance */
end;
output;
end;
run;
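For anyone wondering why the two update formulas in the step above work, here is a short derivation sketch. The symbols are my own shorthand and do not appear in the code: $m_n$ is the cumulative mean (`_m`), $d_n = m_n - m_{n-1}$ the mean change (`_d`), $q_n = (y_n - m_n)^2$ the new term (`_q`), $v_n$ the cumulative population variance (`_v`), and $s_n$ the sample standard deviation (`std`).

```latex
\begin{align*}
n\,v_n &= \sum_{i=1}^{n} (y_i - m_n)^2
        = \sum_{i=1}^{n-1} \bigl( (y_i - m_{n-1}) - d_n \bigr)^2 + q_n \\
       &= (n-1)\,v_{n-1} \;-\; 2\,d_n \sum_{i=1}^{n-1} (y_i - m_{n-1}) \;+\; (n-1)\,d_n^2 \;+\; q_n \\
       &= (n-1)\,\bigl( v_{n-1} + d_n^2 \bigr) + q_n
          \qquad \text{since } \sum_{i=1}^{n-1} (y_i - m_{n-1}) = 0, \\[4pt]
s_n &= \sqrt{\frac{n\,v_n}{n-1}}
     = \sqrt{\, v_{n-1} + d_n^2 + \frac{q_n}{n-1} \,}, \qquad n > 1 .
\end{align*}
```

As a check against the sample data: for the second row of gr=1 we have $v_1 = 0$, $m_1 = 80$, $m_2 = 50$, so $d_2 = -30$ and $q_2 = (20-50)^2 = 900$, which gives $s_2 = \sqrt{0 + 900 + 900} \approx 42.4264$, matching the wanted value.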
Thanks, it works!
Great
It would be nice to see a solution that uses the logic below (from P.Miller),
but applied within each BY group:
data _NULL_;
if 0 then set available nobs=n;
call symputx('nrows',n);
stop;
run;
data intermediate;
set available;
do group=_n_ to &nrows;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class group;
var thisy;
output out=want stddev=;
run;
Using the same logic within each BY group:
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
proc freq data=available;
tables gr/noprint out=_a_;
run;
data intermediate;
merge available _a_;
by gr;
if first.gr then seq=0;
seq+1;
do group=seq to count;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class gr group;
var thisy;
output out=want stddev=;
run;
Thanks a lot.
May I point out that this solution performs much better than the array-based one posted earlier.