Programming the statistical procedures from SAS

using WHERE syntax in the Regression


Hello All,

I want to remove an observation from the data. Suppose that I'm predicting Y from A, B and C, and that I initially removed, say, D from the analysis. Now, if I do not want observation 50 in my analysis, which variable must I use in the WHERE condition?

The code I used is:

proc reg data="c:\sasreg\crime";
model Y=A B C;
where D ne "50";
run;
quit;

My question is: since D is not in my model, will that WHERE statement remove observation 50 from the model I'm interested in?

Kind Regards,

Re: using WHERE syntax in the Regression

If variable D holds the observation number, then your logic would be correct. However, if D was originally one of your analysis variables, I find it hard to understand what purpose would have been served by having the observation number in your model.

Generally, to use WHERE logic to keep an observation from being used by a SAS procedure, you use criteria other than the observation number, because the observation number can change, for example if you sort the data set. What was observation 50 in one sorted order could become observation 214 in another sorted order. So, for example, if I did this:
proc reg data=sashelp.class;
model age = weight;
where name ne 'Alfred';
run;
quit;

Then only the observations where NAME is not equal to "Alfred" would be passed to PROC REG. If you run the program, you can see that even though the NAME variable is not in the model, the WHERE statement does its job: SASHELP.CLASS has 19 observations, so with Alfred's observation excluded, PROC REG gets only 18 observations to process.
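For comparison, here is a sketch of the same exclusion written as a WHERE= data set option on the input data set instead of a separate WHERE statement. It should likewise leave PROC REG with 18 of the 19 observations, and again NAME does not need to appear in the MODEL statement.

```sas
/* Same filter as the WHERE statement, attached to the input data set.  */
/* NAME is not a model variable, but it can still be used to subset     */
/* the rows before PROC REG reads them.                                 */
proc reg data=sashelp.class(where=(name ne 'Alfred'));
   model age = weight;
run;
quit;
```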

You might have to find another way to exclude observation 50, such as finding a unique combination of conditions that identifies that observation and only that observation. So the answer to your question has two parts: 1) the variable in your WHERE statement does not have to be in your MODEL statement in order to exclude observations; however, 2) your particular WHERE statement will delete observation #50 only if variable D holds the observation number or an ID value that uniquely identifies observation #50 and only observation #50.
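If nothing in the data identifies that row, one option is to create an identifier yourself before any sorting and then filter on it. The sketch below assumes the data set is still in the order in which the row you want to drop is observation 50; the libref SASREG, the output data set CRIME_ID and the variable OBSNUM are hypothetical names chosen for this example, and the path is the one from the question.

```sas
/* Hypothetical libref pointing at the folder used in the question. */
libname sasreg "c:\sasreg";

/* Stamp each row with its current position. With a plain SET        */
/* statement, _N_ counts one DATA step iteration per row read, so     */
/* OBSNUM equals the observation number and stays with the row even   */
/* if the data set is sorted later.                                   */
data crime_id;
   set sasreg.crime;
   obsnum = _n_;
run;

proc reg data=crime_id;
   model Y = A B C;
   where obsnum ne 50;   /* OBSNUM is numeric, so compare to 50, not "50" */
run;
quit;
```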

You are the only one who can answer #2 because it's your data. Does variable D uniquely identify observation 50 and only observation 50? What does variable D represent? What is the range of values in variable D?
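A few quick checks along these lines may help answer those questions. This is a sketch that assumes the data set from the question is c:\sasreg\crime.sas7bdat; the libref SASREG is a hypothetical name.

```sas
/* Hypothetical libref pointing at the folder used in the question. */
libname sasreg "c:\sasreg";

/* Show whether D is numeric or character (see the Type column). */
proc contents data=sasreg.crime;
run;

/* Print only the row currently in position 50 to see what D holds. */
proc print data=sasreg.crime(firstobs=50 obs=50);
run;

/* Count the rows that share that value of D. A count of 1 means the  */
/* WHERE statement from the question removes exactly one row.         */
/* Use "50" if D is character, or 50 without quotes if D is numeric.  */
proc sql;
   select count(*) as n_rows
   from sasreg.crime
   where D = "50";
quit;
```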
