turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Learn SAS
- /
- Analytics U
- /
- Dropping values >x

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 02:31 PM

Here is my coding for my analysis thus far:

proc import datafile="/folders/myfolders/Trayless/DHData.xlsx"

out=WORK.Analysis

DBMS=XLSX

Replace;

run;

proc freq DATA=WORK.Analysis;

WHERE DineHall='Snelling' and DayCat='After';

tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;

run;

I am trying to figure out the coding to where I can drop values >6 only for TotPlate.

I have tried the following coding and it does not end up dropping the values I need dropped:

data remove;

set WORK.Analysis;

if TotPlate>6 then delete;

run;

How do I get rid of values for TotPlate>6 only?

Accepted Solutions

Solution

04-22-2015
09:05 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 09:05 AM

This is very interesting data that you're analyzing. Are you planning a dinner party and trying to determine whether you have enough plates? I'm proud of you for using SAS for this. Or are you moonlighting as a plates accountant for UGA (the location of Snelling dining hall according to Google)?

Reading into the one record of data you supplied, it looks like TotPlates is actually the sum of the other variables. Is the maximum allowed plates assumed to be 6? But I'm not sure that the variables are all meant to be additive (is SalBowls the number of bowls you have while NoSalBow is how many are missing/not needed)?

Anyway, if TotPlat>6 is deemed "bad data", then my answer stands. You can mark that as MISSING while keeping the other variables with something like:

out=WORK.Analysis

DBMS=XLSX

Replace;

set WORK.Analysis;

if TotPlate>

TotPlat=

/* in case this is what you're after */

SumOfPlates = NEplate + SalBowls + NoSalBow + DesPlate + NoDesPlt

run;

WHERE DineHall='Snelling' and DayCat='After';

tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;

The output will show the number of non-missing values in the frequencies, which will be different than the total number of observations (if indeed this "cleansing" step results in missing values).

Chris

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 02:46 PM

where totplate>6;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 03:04 PM

That helps, but [where totplate>6;] also removes observations from the other variables as well. I am only trying to remove variables from TotPlate and not NEplate SalBowls NoSalBow DesPlate NoDesPlt.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 03:16 PM

Sounds like you want to replace this with a missing value? Here's a statement for your DATA step.

if totplate>6 then totplate=.;

Chris

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 04:43 PM

Chris,

So if my data looks like this (see below), for example, it will still include all of the other variables in the analysis besides TotPlate?

TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt

7 1 1 2 1 1

So [if totplate>6 then totplate=.;] will eliminate the 7, but still include everything else, correct? (I don't have access to SAS at the moment and want to make sure I enter it correctly once I do have access again).

If that is the case, where in my existing code do I insert [if totplate>6 then totplate=.;] or is there a new code I need to create?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 03:18 AM

You will have to split your proc freq and do the totplate table separately with the proper where condition.

In case you need a single report, output the freq tables to datasets and combine them.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

Solution

04-22-2015
09:05 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 09:05 AM

This is very interesting data that you're analyzing. Are you planning a dinner party and trying to determine whether you have enough plates? I'm proud of you for using SAS for this. Or are you moonlighting as a plates accountant for UGA (the location of Snelling dining hall according to Google)?

Reading into the one record of data you supplied, it looks like TotPlates is actually the sum of the other variables. Is the maximum allowed plates assumed to be 6? But I'm not sure that the variables are all meant to be additive (is SalBowls the number of bowls you have while NoSalBow is how many are missing/not needed)?

Anyway, if TotPlat>6 is deemed "bad data", then my answer stands. You can mark that as MISSING while keeping the other variables with something like:

out=WORK.Analysis

DBMS=XLSX

Replace;

set WORK.Analysis;

if TotPlate>

TotPlat=

/* in case this is what you're after */

SumOfPlates = NEplate + SalBowls + NoSalBow + DesPlate + NoDesPlt

run;

WHERE DineHall='Snelling' and DayCat='After';

tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;

The output will show the number of non-missing values in the frequencies, which will be different than the total number of observations (if indeed this "cleansing" step results in missing values).

Chris

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 11:56 AM

I am undergraduate student working with professors at UGA evaluating food waste in the dining halls.

TotPlate is actually not the sum of the other variables. It is just the total number of plates a student had on his/her tray including the number of non-empty plats and empty plates. The variables I have listed are the only variables I am focusing on at the moment.

TotPlate = total number of plates

NEplate = presence of non-empty plate (designated 1 or 0)

SalBowls = presence of salad bowls (designated 1 or 0)

NoSalBow = number of salad bowls on one tray

DesPlate = presence of dessert plate (designated 1 or 0)

NoDesPlt = number of dessert plates

Your answer worked! Thanks for the help!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 02:46 PM

If this analysis was done at my college dining hall, there would have been a lot of "0"s for the SalBowls dummy var, and NoDesPlt would have registered quite high. This was back before fresh salads were in vogue.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 03:18 PM

I'm not sure if this is what you mean but try this:

data remove;

set analysis;

if totplate > 6 then do:

neplate = .;

salbowls = .;

nosalbow = .;

desplate = .;

nodesplt = .;

end;

run;