Solved: Re: print specific observations

russoj5 · Posted 04-25-2021 07:35 PM

Good Evening,

I am taking a data analysis course and we are using SAS. I checked the dataset for outliers and I want to print the data that corresponds to those outliers. There are 200 observations in the dataset and I want to see what values they have so rather than printing the full dataset and scrolling, I want to print only observations 3, 55, and 196. I've found examples of how to print a range of observations, but can't find anything that says how to print specific observations.

Thank you for your help!

ballardw · Posted 04-25-2021 08:09 PM

One way is to create a subset of the data:

data subset;
   set have;
   if _n_ in (3,55,196);
run;

Another, if the value of a single variable is extreme, is to use that variable in a WHERE statement:

Proc print data=have;
   where somevariable > 123456;
run;

Or, if multiple variables are involved:

Proc print data=have;
   where somevar > 123456 or othervar < 0.0001 ;
run;

The last two methods obviously require knowing a limit value of some type and the variable.

One of the big problems with doing things like this by observation number is that data sets get subsetted and resorted and the observation numbers change.
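
If the observation numbers do need to be used, one way to make them survive later sorting or subsetting is to store the row position in an ordinary variable before doing anything else. The following is only a sketch: the data set name have comes from the examples above, while obsnum is a hypothetical variable name chosen for illustration.

```sas
/* Sketch: save the original row position so it survives sorting/subsetting. */
/* obsnum is a hypothetical variable name, not part of the original data.    */
data subset;
   set have;
   obsnum = _n_;              /* row position at the time the data is read */
   if _n_ in (3, 55, 196);    /* keep only the observations of interest    */
run;

/* Print the subset; NOOBS hides the new 1-2-3 row counter so that */
/* obsnum is the only observation number shown.                    */
proc print data=subset noobs;
run;
```

Because the first data step only creates the subset, the PROC PRINT step is what actually displays the three rows.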

russoj5 · Posted 04-26-2021 07:57 AM

Thank you for your help. I used the first solution to get the data I wanted to review. I used code to calculate studentized residuals and Cook's D, which generated a table that did not include the raw data from the csv file. I wanted to look at the raw data to see the values of the predictor variables before deciding whether I should remove those observations from the dataset. I realize that observation numbers can change as things get sorted or manipulated, but the data was taken directly from a csv file and there wouldn't be any intermediary steps changing the data at all, so I figured it would be safe to print the observations from the raw data. The code I used to generate the outlier and influential point data shows values for observations in relation to the dependent variable, so the only way to review the values of a specific datapoint is to cross-reference the observations from the table with the source file. There's probably a much easier/better way to do it, but I'm new and this was what I came up with. We weren't required to print the observation datapoints. It was just something I wanted to do. Thank you again for your help!

mkeintz · Posted 04-25-2021 10:49 PM

What criterion are you using to identify outliers? Why not use that criterion for a WHERE statement in a PROC PRINT? After all, if you re-order the original data, you'd have the same outliers, but they would not likely be in observations 3, 55, and 196.

russoj5 · Posted 04-26-2021 08:07 AM

I used studentized residuals and Cook's D. I probably could have used a WHERE statement in a PROC PRINT if I had more experience. This is my first course using this software and I don't have a lot of resources for it. This is the code I used:

TITLE "Identifying Outliers & Influential Points";
PROC REG data=PGATour;
   model ln_prize = GIR BirdieConversion PuttsPerRound / influence r;
   plot student.*(GIR BirdieConversion PuttsPerRound predicted.);
   plot npp.*student.;
RUN;

Influence and r print a table that provides data, but don't show the raw data for the variables. I used the observations from that table to match with the corresponding raw data observations to determine if the datapoints should be removed or not. I was copying and pasting the observations from the raw table into a spreadsheet so I could review the data, but thought there should be a much easier way to do it.
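
A possible way to avoid the copy-and-paste step, sketched here under some assumptions, is to have PROC REG write the diagnostics to a data set alongside the raw variables with an OUTPUT statement, and then filter that data set with a WHERE statement as suggested earlier in the thread. The data set name diag and the variable names student_resid and cooks_d are hypothetical, and the cutoffs (an absolute studentized residual above 2, Cook's D above 4/n with n = 200 taken from the original question) are common rules of thumb rather than anything the course necessarily prescribes.

```sas
/* Sketch: save the diagnostics next to the raw data, then print only flagged rows. */
/* diag, student_resid, and cooks_d are hypothetical names chosen for illustration. */
proc reg data=PGATour;
   model ln_prize = GIR BirdieConversion PuttsPerRound;
   /* OUT= holds every input variable plus the requested statistics */
   output out=diag student=student_resid cookd=cooks_d;
run;
quit;

/* Assumed cutoffs: |studentized residual| > 2 or Cook's D > 4/n, with n = 200 */
proc print data=diag;
   where abs(student_resid) > 2 or cooks_d > 4/200;
run;
```

Since the rows are selected by the diagnostic values themselves, the result does not depend on the order of the observations in the source file.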
