DATA Step, Macro, Functions and more

Cubic Spline interpolation - MANY variables at once

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Cubic Spline interpolation - MANY variables at once

[ Edited ]

I have 500 or so variables that each has 11 points of data. I need to cubic spline between those 11 points across 100 smaller intervals. And I need to do that for all 500 variables.

 

Is there a way to do this using some sort of LOOP? And is there a way to do it so that it automatically knows how many variables to do it for? i.e. this month I have 500 variables, but last month it was only 493 variables.

 

Sample Data:


DATA Have;
INPUT Percentile Var1 Var2 Var499 Var500;
DATALINES;
0.01 -950 400 300 650
0.02 . . . .
0.05 -600 150 200 450
0.10 -300 100 125 200
0.20 . . . .
0.25 . . . .
0.50 0 0 0 0
0.75 . . . .
0.80 . . . .
0.90 60 -225 -150 -375
0.95 180 -390 -300 -500
0.98 . . . .
0.99 400 -675 -450 -800
;
RUN;

 

I need to fill in the missing values by cubic splining. And I need to do it for Var1-Var500. And need to be able to easily account for more or less variables without manual intervention.


Accepted Solutions
Solution
‎05-17-2018 05:04 PM
SAS Super FREQ
Posts: 4,277

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

 

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1].  If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

 

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix. 

 

data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);  /* variable name in char var */
   Value = x[vNum];        /* value for each variable for each obs */
   output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);     /* variable name in char var */
   Value = .;                  /* use missing values for scoring data */
   do Percentile = 0 to 1 by 0.1;   /* evenly spaced in [0,1] */
      output;
   end;
end;
drop Var:;
run;

/* 2. Sort by BY-group variable */
proc sort data=Long;  by vName Percentile;  run;

/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;

/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,628

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

transpose all VARs by percentile, sort by varname and percentile Then interpolate BY varname. Transpose back, if needed.

PG
Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

How do you interpolate across by Varname? I only know how to interpolate going down:

 


PROC TRANSREG DATA=Have;

MODEL IDENTITY(Var1) =SMOOTH(Percentile/SM=0);

OUTPUT OUT=CubicSpline p;

RUN;

Esteemed Advisor
Posts: 5,628

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

Using proc expand, if you have the licence,

 

DATA Have;
INPUT Percentile Var1 Var2 Var499 Var500;
DATALINES;
0.01 -950 400 300 650
0.02 . . . .
0.05 -600 150 200 450
0.10 -300 100 125 200
0.20 . . . .
0.25 . . . .
0.50 0 0 0 0
0.75 . . . .
0.80 . . . .
0.90 60 -225 -150 -375
0.95 180 -390 -300 -500
0.98 . . . .
0.99 400 -675 -450 -800
;

proc transpose data=have out=havet name=varname;
by percentile;
var var: ;
run;

proc sort data=havet; by varname percentile; run;

/* proc expand requires SAS/ETS licence */
proc expand data=havet out=haveExp factor=10;
by varname;
id percentile;
convert col1;
run;

proc sort data=haveExp; by percentile varname; run;

proc transpose data=haveExp out=want(drop=_name_);
by percentile;
var col1;
id varname;
run; 


PG
Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

I was afraid you were going to say that. I don't have the SAS/ETS license - might be time to petition for it though. Thanks, PG

SAS Super FREQ
Posts: 4,277

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

Do you have SAS/IML?

Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

I don't, but my colleague has that one (SAS/IML)

 

No one here has SAS/ETS

SAS Super FREQ
Posts: 4,277

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

Do you want a cubic interpolating spline (goes through every data point) or a cubic smoothing spline (smooths the data, but does not always go through the points)? If a smoothing spline is okay, does it have to be a spline? There are other nonpametric smoothers such as loess and penalized B-splines that are more powerful because they automatically pick a smoothing parameter based on the data.

 

Please read the following:

  1. Cubic splines: interpolating vs smoothing
  2. Smoothing splines
  3. Automatically select a smoother based on the data

My personal recommendation would be to use a loess or penalized B-spline.

 

If you decide that PROC LOESS or PROC TRANSREG can fit the smoothers that you want, then the next step is to use the technique to compute thousands of regression models in SAS.

Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

It has to be a cubic interpolating spline that goes through each of the known data points. The PROC TRANSREG I posted above is doing exactly what I need. Problem is that it only does it for one variable at a time.

 

Maybe I can use that and pair it with your link on how to compute thousands of regression models?

Solution
‎05-17-2018 05:04 PM
SAS Super FREQ
Posts: 4,277

Re: Cubic Spline interpolation - MANY variables at once

Posted in reply to SASaholic629

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

 

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1].  If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

 

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix. 

 

data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);  /* variable name in char var */
   Value = x[vNum];        /* value for each variable for each obs */
   output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);     /* variable name in char var */
   Value = .;                  /* use missing values for scoring data */
   do Percentile = 0 to 1 by 0.1;   /* evenly spaced in [0,1] */
      output;
   end;
end;
drop Var:;
run;

/* 2. Sort by BY-group variable */
proc sort data=Long;  by vName Percentile;  run;

/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;

/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;
Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

My scoring data set isn't evenly spaced, so I'll have to merge that in. But I ran through your example and I'm pretty sure this is doing exactly what I need. Going to try to apply to my data and hopefully will be coming back here to mark this as the solution! Thanks, Rick

Occasional Contributor
Posts: 19

Re: Cubic Spline interpolation - MANY variables at once

This is pure gold. Works perfectly! Thank you so much!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 219 views
  • 1 like
  • 3 in conversation