turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Programming
- /
- Cubic Spline interpolation - MANY variables at onc...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-17-2018 01:54 PM - edited 05-17-2018 01:57 PM

I have 500 or so variables that each has 11 points of data. I need to cubic spline between those 11 points across 100 smaller intervals. And I need to do that for all 500 variables.

Is there a way to do this using some sort of LOOP? And is there a way to do it so that it automatically knows how many variables to do it for? i.e. this month I have 500 variables, but last month it was only 493 variables.

Sample Data:

DATA Have;

INPUT Percentile Var1 Var2 Var499 Var500;

DATALINES;

0.01 -950 400 300 650

0.02 . . . .

0.05 -600 150 200 450

0.10 -300 100 125 200

0.20 . . . .

0.25 . . . .

0.50 0 0 0 0

0.75 . . . .

0.80 . . . .

0.90 60 -225 -150 -375

0.95 180 -390 -300 -500

0.98 . . . .

0.99 400 -675 -450 -800

;

RUN;

I need to fill in the missing values by cubic splining. And I need to do it for Var1-Var500. And need to be able to easily account for more or less variables without manual intervention.

Accepted Solutions

Solution

05-17-2018
05:04 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 04:14 PM

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1]. If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix.

```
data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
vName = vname(x[vNum]); /* variable name in char var */
Value = x[vNum]; /* value for each variable for each obs */
output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
vName = vname(x[vNum]); /* variable name in char var */
Value = .; /* use missing values for scoring data */
do Percentile = 0 to 1 by 0.1; /* evenly spaced in [0,1] */
output;
end;
end;
drop Var:;
run;
/* 2. Sort by BY-group variable */
proc sort data=Long; by vName Percentile; run;
/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;
/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 02:04 PM

transpose all VARs by percentile, sort by varname and percentile Then interpolate BY varname. Transpose back, if needed.

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

05-17-2018 02:08 PM

How do you interpolate across by Varname? I only know how to interpolate going down:

PROC TRANSREG DATA=Have;

MODEL IDENTITY(Var1) =SMOOTH(Percentile/SM=0);

OUTPUT OUT=CubicSpline p;

RUN;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 02:27 PM

Using **proc expand**, if you have the licence,

```
DATA Have;
INPUT Percentile Var1 Var2 Var499 Var500;
DATALINES;
0.01 -950 400 300 650
0.02 . . . .
0.05 -600 150 200 450
0.10 -300 100 125 200
0.20 . . . .
0.25 . . . .
0.50 0 0 0 0
0.75 . . . .
0.80 . . . .
0.90 60 -225 -150 -375
0.95 180 -390 -300 -500
0.98 . . . .
0.99 400 -675 -450 -800
;
proc transpose data=have out=havet name=varname;
by percentile;
var var: ;
run;
proc sort data=havet; by varname percentile; run;
/* proc expand requires SAS/ETS licence */
proc expand data=havet out=haveExp factor=10;
by varname;
id percentile;
convert col1;
run;
proc sort data=haveExp; by percentile varname; run;
proc transpose data=haveExp out=want(drop=_name_);
by percentile;
var col1;
id varname;
run;
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

05-17-2018 02:30 PM

I was afraid you were going to say that. I don't have the SAS/ETS license - might be time to petition for it though. Thanks, PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 02:35 PM

Do you have SAS/IML?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

05-17-2018 02:37 PM

I don't, but my colleague has that one (SAS/IML)

No one here has SAS/ETS

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 03:40 PM

Do you want a cubic interpolating spline (goes through every data point) or a cubic smoothing spline (smooths the data, but does not always go through the points)? If a smoothing spline is okay, does it have to be a spline? There are other nonpametric smoothers such as loess and penalized B-splines that are more powerful because they automatically pick a smoothing parameter based on the data.

Please read the following:

- Cubic splines: interpolating vs smoothing
- Smoothing splines
- Automatically select a smoother based on the data

My personal recommendation would be to use a loess or penalized B-spline.

If you decide that PROC LOESS or PROC TRANSREG can fit the smoothers that you want, then the next step is to use the technique to compute thousands of regression models in SAS.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

05-17-2018 03:46 PM

It has to be a cubic interpolating spline that goes through each of the known data points. The PROC TRANSREG I posted above is doing exactly what I need. Problem is that it only does it for one variable at a time.

Maybe I can use that and pair it with your link on how to compute thousands of regression models?

Solution

05-17-2018
05:04 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SASaholic629

05-17-2018 04:14 PM

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1]. If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix.

```
data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
vName = vname(x[vNum]); /* variable name in char var */
Value = x[vNum]; /* value for each variable for each obs */
output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
vName = vname(x[vNum]); /* variable name in char var */
Value = .; /* use missing values for scoring data */
do Percentile = 0 to 1 by 0.1; /* evenly spaced in [0,1] */
output;
end;
end;
drop Var:;
run;
/* 2. Sort by BY-group variable */
proc sort data=Long; by vName Percentile; run;
/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;
/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

05-17-2018 04:25 PM

My scoring data set isn't evenly spaced, so I'll have to merge that in. But I ran through your example and I'm pretty sure this is doing exactly what I need. Going to try to apply to my data and hopefully will be coming back here to mark this as the solution! Thanks, Rick

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

05-17-2018 05:06 PM

This is pure gold. Works perfectly! Thank you so much!