Good Evening,
I am taking a data analysis course and we are using SAS. I checked the dataset for outliers and I want to print the data that corresponds to those outliers. There are 200 observations in the dataset and I want to see what values they have so rather than printing the full dataset and scrolling, I want to print only observations 3, 55, and 196. I've found examples of how to print a range of observations, but can't find anything that says how to print specific observations.
Thank you for your help!
One way is to create a subset of the data:
data subset; set have; if _n_ in (3,55,196); run;
Another way, if the value of a single variable is extreme, is to use that value in a WHERE statement:
Proc print data=have; where somevariable > 123456; run;
Or, if multiple variables are involved:
Proc print data=have; where somevar > 123456 or othervar < 0.0001 ; run;
The last two methods obviously require knowing a limit value of some type and the variable.
One of the big problems with doing things like this by observation number is that data sets get subsetted and resorted and the observation numbers change.
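If the observation numbers do need to be used, one way to keep them from drifting is to store the original row number as a regular variable before anything else touches the data. The following is a minimal sketch, assuming the input data set is named have as in the examples above; the names have_numbered and obs_id are hypothetical. It also works around the fact that _n_ exists only inside a DATA step and cannot be referenced in a WHERE statement.

```sas
/* Save the original row number as a real variable.
   have_numbered and obs_id are hypothetical names. */
data have_numbered;
    set have;
    obs_id = _n_;
run;

/* Print only the rows of interest. obs_id stays attached to
   each row even if the data set is later sorted or subset. */
proc print data=have_numbered;
    where obs_id in (3, 55, 196);
run;
```

Because obs_id is an ordinary variable, the same WHERE condition can be reused in any later procedure.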
Thank you for your help. I used the first solution to get the data I wanted to review. I used code to calculate studentized residuals and Cook's D, which generated a table that did not include the raw data from the csv file. I wanted to look at the raw data to see the values of the predictor variables before deciding whether I should remove those observations from the dataset. I realize that observation numbers can change as things get sorted or manipulated, but the data was taken directly from a csv file and there wouldn't be any intermediary steps changing the data at all, so I figured it would be safe to print the observations from the raw data. The code I used to generate the outlier and influential point data shows values for observations in relation to the dependent variable, so the only way to review the values of a specific datapoint is to cross-reference the observations from the table to the source file. There's probably a much easier/better way to do it, but I'm new and this was what I came up with. We weren't required to print the observation datapoints. It was just something I wanted to do. Thank you again for your help!
What criterion are you using to identify outliers? Why not use that criterion for a WHERE statement in a PROC PRINT? After all, if you re-order the original data, you'd have the same outliers, but they would not likely be in observations 3, 55, and 196.
I used studentized residuals and Cook's D. I probably could have used a WHERE statement in a PROC PRINT if I had more experience. This is my first course using this software and I don't have a lot of resources for it. This is the code I used:
TITLE "Identifying Outliers & Influential Points";
PROC REG data=PGATour;
model ln_prize = GIR BirdieConversion PuttsPerRound/influence r;
plot student.*(GIR BirdieConversion PuttsPerRound predicted.);
plot npp.*student.;
RUN;
Influence and r print a table that provides data, but don't show the raw data for the variables. I used the observations from that table to match with the corresponding raw data observations to determine if the datapoints should be removed or not. I was copying and pasting the observations from the raw table into a spreadsheet so I could review the data, but thought there should be a much easier way to do it.
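One possible way to avoid the copy-and-paste step is to have PROC REG write the diagnostics to an output data set, which keeps the raw variables on the same row as the statistics. The following is a sketch only, reusing the model above. The data set name diag and the variable names stud_resid and cooks_d are hypothetical. The cutoffs (absolute studentized residual above 2, Cook's D above 4/n with n = 200) are common rules of thumb assumed here for illustration, not values taken from the course.

```sas
/* Re-fit the model and save the diagnostics alongside the raw data.
   diag, stud_resid and cooks_d are hypothetical names. */
proc reg data=PGATour noprint;
    model ln_prize = GIR BirdieConversion PuttsPerRound;
    output out=diag student=stud_resid cookd=cooks_d;
run;
quit;

/* Print only the rows flagged by the assumed cutoffs:
   |studentized residual| > 2 or Cook's D > 4/200. */
proc print data=diag;
    where abs(stud_resid) > 2 or cooks_d > 4/200;
    var ln_prize GIR BirdieConversion PuttsPerRound stud_resid cooks_d;
run;
```

Because the filter is based on the diagnostic values rather than on row positions, it selects the same points even if PGATour is re-sorted.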