Hello! I'm about as green as they get when it comes to programming, let alone SAS, and I'm really struggling, so any help would be much appreciated.
I have 2 questions.
1) We had to split a data set into two (training and validation). In our training dataset we built our regression model. Now we need to test for sensitivity. There are two outliers. How do I do this?
2) Now that we have our model, we need to test it in our validation dataset. How do I do this? Below are the three ways I have tried and I get error messages that I don't understand.
Regarding #2, see the examples at the link below on how to score your data. Note that PROC SCORE is one method, but you never tell it which model data to use (the SCORE= option), and you never store the model output from PROC REG anywhere (for example with the OUTEST= option).
https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
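A minimal sketch of the OUTEST=/PROC SCORE approach, following the pattern in the linked post. It assumes TRAINING and VALIDATION both contain v6 v8 v9 v13 v14 v15 and that VALIDATION still holds the observed v18. The data set names REGPARMS, SCORED, and VALCHECK and the model label YHAT are illustrative choices, not names from the original code.

```sas
/* Fit on TRAINING and save the parameter estimates in REGPARMS */
proc reg data=training outest=regparms noprint;
   yhat: model v18 = v6 v8 v9 v13 v14 v15;   /* the model label YHAT becomes the name of the scored variable */
run;
quit;

/* Apply the saved estimates to VALIDATION; PREDICT produces predicted values of v18 */
proc score data=validation score=regparms type=parms predict out=scored;
   var v6 v8 v9 v13 v14 v15;
run;

/* Compare the predictions with the observed response in a new data set */
data valcheck;
   set scored;
   resid = v18 - yhat;      /* observed minus predicted */
   sqerr = resid**2;
run;

proc means data=valcheck n mean;
   var sqerr;               /* the mean of SQERR is the validation mean squared error */
run;
```

Because the scored output is written to new data sets, VALIDATION itself is left unchanged.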
Regarding #1 (testing sensitivity to the two outliers in the training data):
PROC REG has several options in the MODEL statement for checking whether data points are extreme or have high leverage. One is the R option, which produces a residual analysis that includes the Cook's D statistic; another is the INFLUENCE option, which produces studentized residuals (RStudent), leverage (the hat-matrix diagonal), DFFITS, and DFBETAS. Outliers can also be found with the R option: any observation with a large residual (either positive or negative) is a potential outlier.
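A sketch of one way to run the sensitivity check, assuming TRAINING holds the variables from the final model. It saves Cook's D, studentized residuals, and leverage with an OUTPUT statement, refits without the two outliers, and prints both sets of estimates side by side. Treating the two rows with the largest Cook's D as "the two outliers" is an assumption for illustration; if you identified them another way (for example by ID), delete those rows instead. Names such as DIAG, TRAINING_TRIM, and EST_COMPARE are hypothetical.

```sas
/* Step 1: fit on TRAINING and save diagnostics for every observation */
proc reg data=training;
   model v18 = v6 v8 v9 v13 v14 v15 / r influence;
   output out=diag cookd=cookd rstudent=rstud h=leverage;
run;
quit;

/* Step 2: list the most influential observations */
proc sort data=diag out=diag_sorted;
   by descending cookd;
run;

proc print data=diag_sorted(obs=5);
   var v18 cookd rstud leverage;
run;

/* Step 3 (assumption): drop the two rows with the largest Cook's D */
data training_trim;
   set diag_sorted;
   if _n_ <= 2 then delete;
run;

/* Step 4: refit with and without those rows, saving the estimates */
proc reg data=training outest=est_full noprint;
   model v18 = v6 v8 v9 v13 v14 v15;
run;
proc reg data=training_trim outest=est_trim noprint;
   model v18 = v6 v8 v9 v13 v14 v15;
run;
quit;

/* Step 5: stack the two sets of estimates for comparison */
data est_compare;
   length fit $8;
   set est_full(in=in_full) est_trim;
   if in_full then fit = "Full";
   else fit = "Trimmed";
run;

proc print data=est_compare;
   var fit intercept v6 v8 v9 v13 v14 v15 _rmse_;
run;
```

If the coefficients and root MSE barely move between the two rows of EST_COMPARE, the model is not very sensitive to those outliers.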
There are also multivariate outlier measures, such as the Mahalanobis distance, that use all of v6 v8 v9 v13 v14 v15 simultaneously.
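One such measure is the squared Mahalanobis distance of each observation from the centroid of the predictors. A sketch, assuming TRAINING holds the six predictors: PROC PRINCOMP with the STD option scales the component scores to unit variance, so the sum of their squares equals the squared Mahalanobis distance. The chi-square comparison is only a rough guide that assumes approximate multivariate normality; PCSCORES, MAHAL, and MAHAL_SORTED are hypothetical names.

```sas
/* Standardized principal component scores: with STD, each PRINn has unit variance */
proc princomp data=training std out=pcscores noprint;
   var v6 v8 v9 v13 v14 v15;
run;

data mahal;
   set pcscores;
   md2 = uss(of prin1-prin6);          /* sum of squared standardized scores = squared Mahalanobis distance */
   p_approx = 1 - probchi(md2, 6);     /* rough chi-square(6) reference; assumes multivariate normality */
run;

proc sort data=mahal out=mahal_sorted;
   by descending md2;
run;

proc print data=mahal_sorted(obs=5);
   var md2 p_approx v6 v8 v9 v13 v14 v15;
run;
```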
Regarding #2 (testing the model on the validation data), the final model was fit on the training data with:
proc reg data=training;
   model v18=v6 v8 v9 v13 v14 v15; /* Final Model */
run;
How do I append one data set to another?
It's better not to split them in the first place. Just include a variable in the data set that contains either "Training" or "Validation". For the validation rows, set Y (here v18) to missing and save the actual value of Y in another variable.
@koraskornel the link I included also shows how to do this.
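A minimal sketch of this approach, assuming TRAINING and VALIDATION have the same variables. Listing both data sets on one SET statement is also one way to append one data set to another (PROC APPEND is another). PROC REG does not use rows with a missing response to fit the model, but its OUTPUT statement still computes predictions for any row whose predictors are complete. ROLE, V18_ACTUAL, V18_PRED, COMBINED, SCORED, and VALCHECK are hypothetical names.

```sas
/* Stack the two data sets and label each row's role */
data combined;
   length role $10;
   set training(in=in_train) validation;
   if in_train then role = "Training";
   else role = "Validation";
   v18_actual = v18;                      /* keep the observed response for every row */
   if role = "Validation" then v18 = .;   /* these rows are excluded from the fit */
run;

/* Fit on the training rows only; OUTPUT predicts every row with complete predictors */
proc reg data=combined;
   model v18 = v6 v8 v9 v13 v14 v15;
   output out=scored p=v18_pred;
run;
quit;

/* Validation residuals: observed minus predicted */
data valcheck;
   set scored;
   where role = "Validation";
   resid = v18_actual - v18_pred;
run;
```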
I think there are a number of problems here; for example, you don't have a SET statement in your DATA step.
Also, the proper syntax of an IF-THEN statement that assigns a missing value to a numeric variable is:
IF some condition THEN variablename = .;
Got it. I'll fix those and report back soon. Thank you!
Hi all. Thank you so much for the help! Through the responses I understand the basics better and have sussed out a solution. Here is what I used to apply the parameter estimates from one data set (training) to another (validation).
@koraskornel wrote:
Thanks for the speedy response! I totally understand, but this is for a class and they are requiring us to split it via even and odd ID numbers. The following gives me an error:
data validation;
if 0>=v18=>0 then ".";
run;
You don't show any source for the data. SAS expects either a SET statement (or a MERGE, UPDATE, or MODIFY statement) naming an existing data set, or statements that read data from a file, so that the variable V18 exists.
Some other issues:
0>=v18=>0 can only be true when v18 equals 0, so you need to reconsider what the limits are actually supposed to be.
An IF-THEN statement requires either an action such as OUTPUT or DELETE, or a variable to assign a value to. To assign a value you must name the variable, for example v18 = .; (which assigns a missing value to a numeric variable).
The comparison you use for v18 implies that it is numeric, in which case you should not try to assign a character value: "." is the character value of a period, not a numeric missing value.
It is generally not a good idea to use the same data set as both the source and the result of a DATA step. It is not a syntax error, but if a coding mistake does not produce an error that halts the DATA step, you can corrupt your data and have to go back to an earlier point in the code to recreate the data set.
Example: suppose you intended to recode a value of 3 to missing with
if var = 3 then var = .;
but accidentally typed
if var >= 3 then var = .;
That recodes every value of 3 or larger, and if the step overwrote its own input, you cannot recover the values of var that were accidentally set to missing. It is better practice to write to a new data set:
data newdataset;
   set dataset;
   /* your recoding statements go here */
run;
To go along with that, it is better to move all of your recoding or such into a single step than to create a bunch of data sets where you are modifying one variable or adding one variable at a time. Use a temporary data set to test. Then when it is working move it into your "main" recoding step.
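Putting those points together, a corrected version of the attempted step might look like the sketch below. It assumes the goal is to blank out the response for every validation row (as in the missing-Y approach suggested earlier in the thread) while keeping the observed values in a new variable; VALIDATION_MISSING and V18_ACTUAL are hypothetical names.

```sas
data validation_missing;   /* new output data set, so VALIDATION itself is left intact */
   set validation;         /* read the existing data so that v18 is defined */
   v18_actual = v18;       /* keep the observed response for later comparison */
   v18 = .;                /* numeric missing value: a bare period, not the string "." */
run;
```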