Hello all.
I hve data such as
Name YEAR Weight
Jack 2000 200
Jack 2001 205
Jack 2002 206
Jack 2003 203
Jack 2004 209
Jack 2005 210
John 2000 200
John 2001 205
John 2002 206
John 2003 203
John 2004 209
John 2005 210
How to capture the previous 3 years standard deviation of WEIGHT for each NAME? If previous years only has observation less than 3, it can return 0.
I know STD code, but I don't know how to capure the standard deviation for a certain range of time.
This is the data table I want
Name YEAR Weight STD
Jack 2000 200 0
Jack 2001 205 0
Jack 2002 206 0
Jack 2003 203 3.5 (This is a number I make up, but it should be the STD of WEIGHT from 2000 to 2003)
Jack 2004 209 3.4 (Again, this should be the STD of WEIGHT from 2001 to 2004)
Jack 2005 210 5.2 (Again, this should be the STD of WEIGHT from 2002 to 2005)
John 2000 200
John 2001 205
John 2002 206
John 2003 203
John 2004 209
John 2005 210
proc sql;
create table want as select *,
(select std(WEIGHT) from have where YEAR between a.YEAR-3 and a.YEAR and NAME=a.NAME) as std
from have as a;
quit;
Above is the code I use.
But the result shows even for the beginning year, it also has a small standard deviation. That looks weird to me. Because, if it only has one value, and it should have no variation then no standard deviation.
and sub query is also confusing me. Why I cannot just put the condition from year-3 to year using the GROUP BY?
Thank you all!
If the data are sorted, use the DATA step, with a BY statement:
data want;
set have;
by name;
seq+1;
if first.name then seq=1;
wgt_std3=ifn(seq>=4
,std(lag3(weight),lag2(weight),lag1(weight),weight)
,0);
run;
The lag function takes the prior OBSERVATIONS, so the program assumes there are no gaps in the sequence of years.
Are you after something like this?
data have;
input Name $ Age Weight ;
datalines;
Jack 18 200
Jack 19 205
Jack 20 206
Jack 21 203
Jack 22 209
Jack 23 210
John 18 200
John 19 205
John 20 206
John 21 203
John 22 209
John 23 210
John 24 209
John 25 210
;
proc sql;
create table want1(drop=k) as
select *,floor(age/3) as k,ifn(count(calculated k)>=3,std(weight),0) as std
from have
group by name,k ;
quit;
or just name and std variables?
proc sql;
create table want(drop=k) as
select name,floor(age/3) as k,ifn(count(calculated k)>=3,std(weight),0) as std
from have
group by name,k ;
quit;
Hello.
I modify the problem a little.
What if the AGE is the YEAR?
And I only want the STD every 4 years.
If the data are sorted, use the DATA step, with a BY statement:
data want;
set have;
by name;
seq+1;
if first.name then seq=1;
wgt_std3=ifn(seq>=4
,std(lag3(weight),lag2(weight),lag1(weight),weight)
,0);
run;
The lag function takes the prior OBSERVATIONS, so the program assumes there are no gaps in the sequence of years.
Thank you very much.
This works.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.