New-ish SAS user.
I need to generate some curves to fit some vaguely bell-shaped data. (I'm looking at female fertility by age, if that matters.) It looks like splines are the way to go, and PROC TRANSREG seems to be a great procedure to use here. What I'd like to do is add conditions that f(a)=f'(a)=0 for my curve, where a is either "endpoint" of my data (in this case, for a=14 and a=50). Two questions:
1. Can I do this within PROC TRANSREG? And a follow-up question: if I need to generate several spline curves and display them on a single graph (say, one for each year or something), how would I do this with this procedure? (My thinking is that maybe I could save the curves on the fly and then write them to a single graph or something?)
2. Is there a better procedure for this kind of thing? There are so many to choose from...
Alternatively: how would you go about doing this? (fitting a curve to data with the requirement that the curve "level off" at zero)
There is nothing built into PROC TRANSREG that does what you want. The only way I have ever figured out to make TRANSREG produce a function that passes through (or very close to) specific points is to add those points to the data set and give them a very large weight. In the example code below, I add two points to the data, one at the minimum of x and one at the maximum, each with y = 0 and a weight of 1e6. Note that this pins only the function value at the endpoints; it does not force the derivative to be zero there. Then, yes, you can create an output data set and create graphs with one or more functions by using PROC SGPLOT. I only illustrate one function, but you can add more statements to get more.
/* Simulate 100 noisy, roughly bell-shaped observations */
data x;
   do i = 1 to 100;
      x = normal(151);
      y = exp(-0.5 * x * x) + 0.1 + 0.1 * normal(151);
      output;
   end;
run;
ods graphics on;

/* Unconstrained spline fit, for comparison */
proc transreg data=x;
   model ide(y) = spl(x / nkn=3);
   output out=b p;
run;
/* Find the minimum and maximum of x (one row per statistic in _STAT_) */
proc means noprint data=x;
   output out=m(where=(_stat_ in ('MIN', 'MAX')));
   var x;
run;
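/* Copy the data with weight 1, and add a heavily weighted y = 0 point */
/* at the minimum and at the maximum of x                              */
data x2(drop=_stat_ min max);
   retain min max;
   if _n_ eq 1 then do;
      set m(keep=x _stat_ where=(_stat_='MIN'));
      min = x;
      set m(keep=x _stat_ where=(_stat_='MAX'));
      max = x;
   end;
   set x;
   w = 1;
   output;
   /* At an endpoint, write a second copy of the row pinned to zero */
   if abs(x - min) le 1e-8 or abs(x - max) le 1e-8 then do;
      w = 1e6;
      y = 0;
      output;
   end;
run;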
/* Weighted spline fit; keep only the original (w = 1) rows in the output */
proc transreg data=x2 plots=fit(nocli);
   model ide(y) = spl(x / nkn=3);
   output out=b(where=(w=1)) p;
   weight w;
run;
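/* Sort by x so the fitted curve is drawn left to right */
proc sort data=b;
   by x;
run;

/* Plot the data, the predicted values (Py), and a reference line at zero */
proc sgplot;
   scatter y=y x=x;
   series y=py x=x;
   refline 0 / axis=y;
run;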
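For the follow-up about several curves on one graph (say, one per year), here is a minimal sketch of one possible approach: fit each year separately with a BY statement and then overlay the predicted values with GROUP= in PROC SGPLOT. The data set names multi and fits, the variable year, and the simulated data are all made up for illustration, and this sketch omits the weighted endpoint rows; those could be added per year in the same way as in the code above.

```sas
/* Hypothetical data: three years of noisy bell-shaped observations. */
/* The names multi, fits, and year are assumptions for illustration. */
data multi;
   do year = 2001 to 2003;
      do i = 1 to 100;
         x = normal(151);
         y = (1 + 0.2 * (year - 2001)) * exp(-0.5 * x * x)
             + 0.1 * normal(151);
         output;
      end;
   end;
run;

/* One spline fit per year; the predicted values are saved as Py */
proc transreg data=multi noprint;
   by year;
   model ide(y) = spl(x / nkn=3);
   output out=fits p;
run;

/* Sort within year so each curve is drawn left to right */
proc sort data=fits;
   by year x;
run;

/* Overlay one fitted curve per year on a single graph */
proc sgplot data=fits;
   series y=py x=x / group=year;
   refline 0 / axis=y;
run;
```

Adding a SCATTER statement with the same GROUP= option would show the raw points for each year alongside the curves.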
Alright, yes, I thought of doing something like this, but I was hoping there was a nicer way. Thanks!