BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASaholic629
Fluorite | Level 6

I have 500 or so variables that each has 11 points of data. I need to cubic spline between those 11 points across 100 smaller intervals. And I need to do that for all 500 variables.

 

Is there a way to do this using some sort of LOOP? And is there a way to do it so that it automatically knows how many variables to do it for? i.e. this month I have 500 variables, but last month it was only 493 variables.

 

Sample Data:


DATA Have;
INPUT Percentile Var1 Var2 Var499 Var500;
DATALINES;
0.01 -950 400 300 650
0.02 . . . .
0.05 -600 150 200 450
0.10 -300 100 125 200
0.20 . . . .
0.25 . . . .
0.50 0 0 0 0
0.75 . . . .
0.80 . . . .
0.90 60 -225 -150 -375
0.95 180 -390 -300 -500
0.98 . . . .
0.99 400 -675 -450 -800
;
RUN;

 

I need to fill in the missing values by cubic splining. And I need to do it for Var1-Var500. And need to be able to easily account for more or less variables without manual intervention.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

 

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1].  If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

 

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix. 

 

data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);  /* variable name in char var */
   Value = x[vNum];        /* value for each variable for each obs */
   output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);     /* variable name in char var */
   Value = .;                  /* use missing values for scoring data */
   do Percentile = 0 to 1 by 0.1;   /* evenly spaced in [0,1] */
      output;
   end;
end;
drop Var:;
run;

/* 2. Sort by BY-group variable */
proc sort data=Long;  by vName Percentile;  run;

/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;

/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;

View solution in original post

11 REPLIES 11
PGStats
Opal | Level 21

transpose all VARs by percentile, sort by varname and percentile Then interpolate BY varname. Transpose back, if needed.

PG
SASaholic629
Fluorite | Level 6

How do you interpolate across by Varname? I only know how to interpolate going down:

 


PROC TRANSREG DATA=Have;

MODEL IDENTITY(Var1) =SMOOTH(Percentile/SM=0);

OUTPUT OUT=CubicSpline p;

RUN;

PGStats
Opal | Level 21

Using proc expand, if you have the licence,

 

DATA Have;
INPUT Percentile Var1 Var2 Var499 Var500;
DATALINES;
0.01 -950 400 300 650
0.02 . . . .
0.05 -600 150 200 450
0.10 -300 100 125 200
0.20 . . . .
0.25 . . . .
0.50 0 0 0 0
0.75 . . . .
0.80 . . . .
0.90 60 -225 -150 -375
0.95 180 -390 -300 -500
0.98 . . . .
0.99 400 -675 -450 -800
;

proc transpose data=have out=havet name=varname;
by percentile;
var var: ;
run;

proc sort data=havet; by varname percentile; run;

/* proc expand requires SAS/ETS licence */
proc expand data=havet out=haveExp factor=10;
by varname;
id percentile;
convert col1;
run;

proc sort data=haveExp; by percentile varname; run;

proc transpose data=haveExp out=want(drop=_name_);
by percentile;
var col1;
id varname;
run; 


PG
SASaholic629
Fluorite | Level 6

I was afraid you were going to say that. I don't have the SAS/ETS license - might be time to petition for it though. Thanks, PG

Rick_SAS
SAS Super FREQ

Do you have SAS/IML?

SASaholic629
Fluorite | Level 6

I don't, but my colleague has that one (SAS/IML)

 

No one here has SAS/ETS

Rick_SAS
SAS Super FREQ

Do you want a cubic interpolating spline (goes through every data point) or a cubic smoothing spline (smooths the data, but does not always go through the points)? If a smoothing spline is okay, does it have to be a spline? There are other nonpametric smoothers such as loess and penalized B-splines that are more powerful because they automatically pick a smoothing parameter based on the data.

 

Please read the following:

  1. Cubic splines: interpolating vs smoothing
  2. Smoothing splines
  3. Automatically select a smoother based on the data

My personal recommendation would be to use a loess or penalized B-spline.

 

If you decide that PROC LOESS or PROC TRANSREG can fit the smoothers that you want, then the next step is to use the technique to compute thousands of regression models in SAS.

SASaholic629
Fluorite | Level 6

It has to be a cubic interpolating spline that goes through each of the known data points. The PROC TRANSREG I posted above is doing exactly what I need. Problem is that it only does it for one variable at a time.

 

Maybe I can use that and pair it with your link on how to compute thousands of regression models?

Rick_SAS
SAS Super FREQ

Yes. I think that will solve your problem. You transpose from wide to long format and then use BY processing to fit the 500 models. (The transpose will work on arbitrary number of variables that have the same prefix, as shown below.

 

There is a little wrinkle: You want to score the models on a scoring data set. You haven't said what the scoring data looks like, but the code below scores all models on 101 points in the interval [0,1].  If you want each variable to be scored on different intervals (like [min, max]) then form the scoring data first and merge it into the long data.

 

The code below follows my blog post but uses your data set. I changed the names of some variables because your explanatory variables use the prefix 'Var', so I made sure I didn't introduce new variables that have the same prefix. 

 

data Long;
set Have;
array x [*] Var:;
/* the original data */
Orig = 1;
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);  /* variable name in char var */
   Value = x[vNum];        /* value for each variable for each obs */
   output;
end;
Orig = 0;
/* the scoring data set ... or merge with scoring data later */
do vNum = 1 to dim(x);
   vName = vname(x[vNum]);     /* variable name in char var */
   Value = .;                  /* use missing values for scoring data */
   do Percentile = 0 to 1 by 0.1;   /* evenly spaced in [0,1] */
      output;
   end;
end;
drop Var:;
run;

/* 2. Sort by BY-group variable */
proc sort data=Long;  by vName Percentile;  run;

/* 3. Call PROC REG and use BY statement to compute all regressions */
options nonotes;
proc transreg data=Long noprint;
by vName;
MODEL IDENTITY(Value) =SMOOTH(Percentile/SM=0);
OUTPUT OUT=CubicSpline p;
RUN;
options notes;

/* for this small example with 4 variables, you can graph the results */
proc sgplot data=CubicSpline;
series x=Percentile y=PValue / group=vName;
run;
SASaholic629
Fluorite | Level 6

My scoring data set isn't evenly spaced, so I'll have to merge that in. But I ran through your example and I'm pretty sure this is doing exactly what I need. Going to try to apply to my data and hopefully will be coming back here to mark this as the solution! Thanks, Rick

SASaholic629
Fluorite | Level 6

This is pure gold. Works perfectly! Thank you so much!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1882 views
  • 1 like
  • 3 in conversation