I have a Data set and need to do some operations
Data set:
PARM x y
3 x0 y0
3.1 x1 y1
2.9 x2 y2
5 x3 y3
First I run a condition filter and get result like this:
PARM x y Filter
3 x0 y0 pass
3.1 x1 y1 pass
2.9 x2 y2 pass
5 x3 y3 fail
Then I need to run a 4th-polynomial fit with points that pass the filter, below is my code:
proc glm data = wafdata noprint;
class x y;
model PARM = x
y
x * x
x * y
y * y
x * x * x
x * x * y
x * y * y
y * y * y
x * x * x * x
x * x * x * y
x * x * y * y
x * y * y * y
y * y * y * y /p;
output out = poly p = prediction r = residual;
ods graphics off;
run;
Then I need to get
PARM x y Filter Residual
3 x0 y0 pass xxx
3.1 x1 y1 pass xxx
2.9 x2 y2 pass xxx
5 x3 y3 fail xxx (this data point is not included in fitting but have residual calculation as well)
The problem I cannot solve is that I need to get residual from this fit for all data point, including points that fail the filter, so I cannot just delete the filter fail points and run fit.
Last time someone told me try use score but I cannot figure out how, is there anyone can help me on this problem in detail, using score or any other method?
I hope I describe clear enough.
Thanks!
Create your data set like this
PARM x y Filter original_parm
3 x0 y0 pass 3
3.1 x1 y1 pass 3.1
2.9 x2 y2 pass 2.9
. x3 y3 fail 5
Then the fail observation will not be used in the model fit, but it will produce a predicted value. Then it is a simple calculation to use the predicted value and original_parm to get the residual.
Create your data set like this
PARM x y Filter original_parm
3 x0 y0 pass 3
3.1 x1 y1 pass 3.1
2.9 x2 y2 pass 2.9
. x3 y3 fail 5
Then the fail observation will not be used in the model fit, but it will produce a predicted value. Then it is a simple calculation to use the predicted value and original_parm to get the residual.
@leonzheng wrote:
I tried your way but it seems not work, predictions for PARM = '.' points are also '.'
I don't know what this means. Show me the results you are getting.
What does a higher order for a categorical variable imply? Since it's 0/1 coding and squaring/cubing etc just result in similar values, ie -1 squared = 1, or -1 cubed = -1. I don't think that makes a lot of sense from a practical perspective. From a technical perspective it's easy enough to implement.
You use a BY statement when you're developing multiple regression models for different groups, for example if you were running this model for two or three different regions or countries you would want a separate model for each country.
SAS cannot calculate a residual with this method, because it does not have the data. However the method is mostly correct, you just need to add a second step to calculate it. Here's an example that works.
data heart;
set sashelp.heart;
where status='Dead';
Answer=AgeAtDeath;
if _n_ in (5, 10, 15) then ageAtDeath=.; *set to missing;
run;
proc glm data=heart;
class bp_status weight_status;
model ageatDeath = bp_status
weight_status
bp_status * bp_status
bp_status * weight_status
weight_status * weight_status
bp_status * bp_status * bp_status
bp_status * bp_status * weight_status
bp_status * weight_status * weight_status
weight_status * weight_status * weight_status
bp_status * bp_status * bp_status * bp_status
bp_status * bp_status * bp_status * weight_status
bp_status * bp_status * weight_status * weight_status
bp_status * weight_status * weight_status * weight_status
weight_status * weight_status * weight_status * weight_status /p;
output out = poly p = prediction r = residual; ;
run;quit;
data poly;
set poly;
if missing(residual) then residual = Answer - prediction;
run;
@leonzheng wrote:
I there any requirement for "by" in fitting in your way? I changed "by" group and it works... both "by" groups are character bu the working one is simpler (not working one is included in working one)
My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:
PARM x y Field ParmName F_ParmName Filter
3 x0 y0 1A AA 1A_AA pass
3.1 x1 y1 1A BB 1A_BB pass
2.9 x2 y2 1B AA 1B_AA pass
5 x3 y3 1B BB 1B_BB fail
Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result.
The By group isn't relevant to your original question.
If your model isn't working, that's a different issue. From your initial question, does my response work and do what you expected?
If so, how is your current situation different from what you explained? The number of variables or BY statements shouldn't affect the solution I proposed.
@leonzheng wrote:
My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:
PARM x y Field ParmName F_ParmName Filter
3 x0 y0 1A AA 1A_AA pass
3.1 x1 y1 1A BB 1A_BB pass
2.9 x2 y2 1B AA 1B_AA pass
5 x3 y3 1B BB 1B_BB fail
Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result.
I also thought By statement should not affect the solution, however,
case 1. use your solution with by Field
fitting works, have prediction and residual but fitting result is wired.
case 2. use your solution with by F_ParmName or ParmName
fitting works, but missing part has no prediction, thus cannot calculate residual.
@leonzheng wrote:
My actual dataset is much more conplicated, there are many other columns beside the ones I listed, for more detail in the "by" group, let me describe as below:
PARM x y Field ParmName F_ParmName Filter
3 x0 y0 1A AA 1A_AA pass
3.1 x1 y1 1A BB 1A_BB pass
2.9 x2 y2 1B AA 1B_AA pass
5 x3 y3 1B BB 1B_BB fail
Field and ParmName is independent, I contact them as a new column F_ParmName, PARM is the actual value of each ParmName. The dataset is much larger so there are much more than one row of data for each category as I showed. I used by Field fitting, PaigeMiller's way is working but fitting result is wired since by group is not right, but when I try by F_ParmName fitting, the PARM '.' row will have prediction '.' result.
If you are getting a missing value as a prediction, then either you have done something wrong, or your data is causing the missing value. But to be definitive, we need to see the code you used to implement the trick I showed in Message 2 above plus the code to fit the model, and we need to see your data, at least enough of the data to see the variables used the model. Please share this with us.
Here is my code:
data wafdata;
set wafdata;
PARM_fit = PARM;
if Tukey1 = 'fail' then PARM = .;
run;
proc glm data = wafdata noprint;
by F_ParmName;
class x y;
model PARM = x
y
x * x
x * y
y * y
x * x * x
x * x * y
x * y * y
y * y * y
x * x * x * x
x * x * x * y
x * x * y * y
x * y * y * y
y * y * y * y /p;
output out = poly p = prediction r = residual;
ods graphics off;
run;
data poly;
set poly;
if missing(residual) then residual = PARM_fit - prediction;
run;
Part of table wafdata is as this, also attached a xlsx file for whole data
PARMNAME PARM x y Field F_ParmName Filter
MR5 5.702313 -11.475 9.28 10A 10A_MR5 pass
MR5 5.701248 -9.945 9.28 10A 10A_MR5 pass
MR5 5.687179 -8.415 9.28 11A 11A_MR5 pass
MR5 -7777.7 -6.885 9.28 11A 11A_MR5 fail
MR5 5.632577 -5.355 9.28 11A 11A_MR5 pass
MR5 5.627558 -3.825 9.28 11A 11A_MR5 pass
MR5 5.653947 -2.295 9.28 12A 12A_MR5 pass
MR5 5.648395 -0.765 9.28 12A 12A_MR5 pass
MR5 5.740413 -22.185 8.7 14A 14A_MR5 pass
MR5 5.70026 -20.655 8.7 14A 14A_MR5 pass
MR5 5.715317 -19.125 8.7 14A 14A_MR5 pass
MRSUB 7295.616 18.36 39.73 11A 11A_MRSUB pass
MRSUB 7404.475 19.89 39.73 11A 11A_MRSUB pass
MRSUB 7431.816 21.42 39.73 11A 11A_MRSUB pass
MRSUB 1500000000 0.765 51.91 13A 13A_MRSUB fail
MRSUB 7061.483 2.295 51.91 13A 13A_MRSUB pass
MRSUB 7110.674 3.825 51.91 13A 13A_MRSUB pass
MRSUB 7063.209 5.355 51.91 14A 14A_MRSUB pass
MRSUB 7122.559 6.885 51.91 14A 14A_MRSUB pass
MRSUB 7108.894 8.415 51.91 14A 14A_MRSUB pass
Thank you!
Remove the CLASS statement.
Your variables X and Y are continuous. They are not class (or categorical) variables. So the putting these variables in the CLASS statement is wrong here.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.