BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leonzheng
Obsidian | Level 7

I have a Data set and need to do some operations

Data set:

PARM       x         y

3               x0      y0

3.1            x1      y1

2.9            x2      y2

5               x3      y3

First I run a condition filter and get result like this:

PARM       x         y        Filter

3               x0      y0       pass

3.1            x1      y1       pass

2.9            x2      y2       pass

5               x3      y3       fail

 

Then I need to run a 4th-polynomial fit with points that pass the filter, below is my code:

proc glm data = wafdata noprint;
class x y;
model PARM = x
                         y
                         x * x
                         x * y
                         y * y
                         x * x * x
                         x * x * y
                         x * y * y
                         y * y * y
                         x * x * x * x
                         x * x * x * y
                         x * x * y * y
                         x * y * y * y
                         y * y * y * y /p;
output out = poly p = prediction r = residual;
ods graphics off;
run;

 

Then I need to get

PARM       x         y        Filter      Residual

3               x0      y0       pass       xxx

3.1            x1      y1       pass       xxx

2.9            x2      y2       pass       xxx

5               x3      y3       fail          xxx          (this data point is not included in fitting but have residual calculation as well)

 

The problem I cannot solve is that I need to get residual from this fit for all data point, including points that fail the filter, so I cannot just delete the filter fail points and run fit.

Last time someone told me try use score but I cannot figure out how, is there anyone can help me on this problem in detail, using score or any other method?

I hope I describe clear enough.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Create your data set like this

 

PARM       x         y        Filter     original_parm

3               x0      y0       pass          3

3.1            x1      y1       pass          3.1

2.9            x2      y2       pass          2.9

.               x3      y3          fail          5

 

Then the fail observation will not be used in the model fit, but it will produce a predicted value. Then it is a simple calculation to use the predicted value and original_parm to get the residual.

--
Paige Miller

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

Create your data set like this

 

PARM       x         y        Filter     original_parm

3               x0      y0       pass          3

3.1            x1      y1       pass          3.1

2.9            x2      y2       pass          2.9

.               x3      y3          fail          5

 

Then the fail observation will not be used in the model fit, but it will produce a predicted value. Then it is a simple calculation to use the predicted value and original_parm to get the residual.

--
Paige Miller
leonzheng
Obsidian | Level 7
I tried your way but it seems not work, predictions for PARM = '.' points are also '.'
PaigeMiller
Diamond | Level 26

@leonzheng wrote:
I tried your way but it seems not work, predictions for PARM = '.' points are also '.'

I don't know what this means. Show me the results you are getting.

--
Paige Miller
leonzheng
Obsidian | Level 7
I there any requirement for "by" in fitting in your way? I changed "by" group and it works... both "by" groups are character bu the working one is simpler (not working one is included in working one)
Reeza
Super User

What does a higher order for a categorical variable imply? Since it's 0/1 coding and squaring/cubing etc just result in similar values, ie -1 squared = 1, or -1 cubed = -1. I don't think that makes a lot of sense from a practical perspective. From a technical perspective it's easy enough to implement.

 

You use a BY statement when you're developing multiple regression models for different groups, for example if you were running this model for two or three different regions or countries you would want a separate model for each country.

 

SAS cannot calculate a residual with this method, because it does not have the data. However the method is mostly correct, you just need to add a second step to calculate it. Here's an example that works. 

 

data heart;
set sashelp.heart;
where status='Dead';
Answer=AgeAtDeath;
if _n_ in (5, 10, 15) then ageAtDeath=.; *set to missing;
run;

proc glm data=heart;
class bp_status weight_status;
model  ageatDeath =  bp_status
                         weight_status
                         bp_status * bp_status
                         bp_status * weight_status
                         weight_status * weight_status
                         bp_status * bp_status * bp_status
                         bp_status * bp_status * weight_status
                         bp_status * weight_status * weight_status
                         weight_status * weight_status * weight_status
                         bp_status * bp_status * bp_status * bp_status
                         bp_status * bp_status * bp_status * weight_status
                         bp_status * bp_status * weight_status * weight_status
                         bp_status * weight_status * weight_status * weight_status
                         weight_status * weight_status * weight_status * weight_status /p;
output out = poly p = prediction r = residual; ;
run;quit;

data poly;
set poly;

if missing(residual) then residual = Answer - prediction;
run;

@leonzheng wrote:
I there any requirement for "by" in fitting in your way? I changed "by" group and it works... both "by" groups are character bu the working one is simpler (not working one is included in working one)



leonzheng
Obsidian | Level 7

My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:

PARM       x         y          Field        ParmName      F_ParmName               Filter

3               x0      y0           1A              AA                    1A_AA                      pass

3.1            x1      y1           1A              BB                    1A_BB                      pass

2.9            x2      y2           1B              AA                    1B_AA                      pass

5               x3      y3           1B              BB                    1B_BB                      fail

 

Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result. 

Reeza
Super User

The By group isn't relevant to your original question. 

 

If your model isn't working, that's a different issue. From your initial question, does my response work and do what you expected?

 

If so, how is your current situation different from what you explained? The number of variables or BY statements shouldn't affect the solution I proposed.

 


@leonzheng wrote:

My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:

PARM       x         y          Field        ParmName      F_ParmName               Filter

3               x0      y0           1A              AA                    1A_AA                      pass

3.1            x1      y1           1A              BB                    1A_BB                      pass

2.9            x2      y2           1B              AA                    1B_AA                      pass

5               x3      y3           1B              BB                    1B_BB                      fail

 

Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result. 


 

leonzheng
Obsidian | Level 7

I also thought By statement should not affect the solution, however,

case 1. use your solution with by Field

             fitting works, have prediction and residual but fitting result is wired. 

case 2. use your solution with by F_ParmName or ParmName

             fitting works, but missing part has no prediction, thus cannot calculate residual.

PaigeMiller
Diamond | Level 26

@leonzheng wrote:

My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:

PARM       x         y          Field        ParmName      F_ParmName               Filter

3               x0      y0           1A              AA                    1A_AA                      pass

3.1            x1      y1           1A              BB                    1A_BB                      pass

2.9            x2      y2           1B              AA                    1B_AA                      pass

5               x3      y3           1B              BB                    1B_BB                      fail

 

Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result. 


If you are getting a missing value as a prediction, then either you have done something wrong, or your data is causing the missing value. But to be definitive, we need to see the code you used to implement the trick I showed in Message 2 above plus the code to fit the model, and we need to see your data, at least enough of the data to see the variables used the model. Please share this with us.

--
Paige Miller
leonzheng
Obsidian | Level 7

Here is my code:

data wafdata;
  set wafdata;
  PARM_fit = PARM;  
  if Tukey1 = 'fail' then PARM = .;
run;

 

proc glm data = wafdata noprint;
  by F_ParmName;
  class x y;
  model PARM = x
                           y
                           x * x
                           x * y
                           y * y
                           x * x * x
                           x * x * y
                           x * y * y
                           y * y * y
                           x * x * x * x
                           x * x * x * y
                           x * x * y * y
                           x * y * y * y
                           y * y * y * y /p;
  output out = poly p = prediction r = residual;
  ods graphics off;
run;

 

data poly;
set poly;
if missing(residual) then residual = PARM_fit - prediction;
run;

 

Part of table wafdata is as this, also attached a xlsx file for whole data

PARMNAME                PARM                x                y                    Field                  F_ParmName                Filter
MR5                            5.702313        -11.475        9.28                 10A                   10A_MR5                       pass
MR5                            5.701248         -9.945         9.28                 10A                   10A_MR5                      pass
MR5                            5.687179         -8.415         9.28                 11A                    11A_MR5                      pass
MR5                            -7777.7            -6.885         9.28                 11A                    11A_MR5                      fail
MR5                            5.632577         -5.355         9.28                 11A                    11A_MR5                      pass
MR5                            5.627558         -3.825         9.28                 11A                    11A_MR5                      pass
MR5                            5.653947         -2.295         9.28                 12A                    12A_MR5                      pass
MR5                            5.648395         -0.765         9.28                 12A                    12A_MR5                      pass
MR5                            5.740413          -22.185      8.7                   14A                     14A_MR5                    pass
MR5                            5.70026            -20.655      8.7                   14A                    14A_MR5                     pass
MR5                            5.715317         -19.125       8.7                   14A                     14A_MR5                    pass
MRSUB                       7295.616           18.36       39.73               11A                     11A_MRSUB                pass
MRSUB                       7404.475           19.89       39.73               11A                      11A_MRSUB               pass
MRSUB                       7431.816           21.42        39.73               11A                     11A_MRSUB               pass
MRSUB                      1500000000       0.765        51.91               13A                     13A_MRSUB               fail
MRSUB                       7061.483           2.295        51.91               13A                     13A_MRSUB               pass
MRSUB                       7110.674           3.825        51.91               13A                     13A_MRSUB                pass
MRSUB                      7063.209            5.355        51.91               14A                     14A_MRSUB                pass
MRSUB                      7122.559            6.885        51.91               14A                     14A_MRSUB                pass
MRSUB                      7108.894            8.415        51.91               14A                     14A_MRSUB                pass

 

Thank you!

PaigeMiller
Diamond | Level 26

Remove the CLASS statement.

--
Paige Miller
leonzheng
Obsidian | Level 7
It worked! Thank you, could you tell me why I cannot have class statement?
PaigeMiller
Diamond | Level 26

Your variables X and Y are continuous. They are not class (or categorical) variables. So the putting these variables in the CLASS statement is wrong here.

--
Paige Miller

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3112 views
  • 4 likes
  • 3 in conversation