Learning SAS? Welcome to the exclusive online community for all SAS learners.

Dropping values >x

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Dropping values >x

Here is my coding for my analysis thus far:

proc import datafile="/folders/myfolders/Trayless/DHData.xlsx"

     out=WORK.Analysis

     DBMS=XLSX

     Replace;

run;

proc freq DATA=WORK.Analysis;

WHERE DineHall='Snelling' and DayCat='After';

tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;

run;

I am trying to figure out the coding to where I can drop values >6 only for TotPlate.

I have tried the following coding and it does not end up dropping the values I need dropped:

data remove;

     set WORK.Analysis;

     if TotPlate>6 then delete;

     run;

How do I get rid of values for TotPlate>6 only?


Accepted Solutions
Solution
‎04-22-2015 09:05 AM
Community Manager
Posts: 2,955

Re: Dropping values >x

This is very interesting data that you're analyzing.  Are you planning a dinner party and trying to determine whether you have enough plates?  I'm proud of you for using SAS for this.  Or are you moonlighting as a plates accountant for UGA (the location of Snelling dining hall according to Google)?

Reading into the one record of data you supplied, it looks like TotPlates is actually the sum of the other variables.  Is the maximum allowed plates assumed to be 6?  But I'm not sure that the variables are all meant to be additive (is SalBowls the number of bowls you have while NoSalBow is how many are missing/not needed)?

Anyway, if TotPlat>6 is deemed "bad data", then my answer stands.  You can mark that as MISSING while keeping the other variables with something like:

proc import datafile="/folders/myfolders/Trayless/DHData.xlsx"
   
out=WORK.Analysis
   
DBMS=XLSX
   
Replace;
run;

data remove;
    set WORK.Analysis;
    if TotPlate>6 then
        TotPlat=
.;

   
/* in case this is what you're after */
    SumOfPlates = NEplate + SalBowls + NoSalBow + DesPlate + NoDesPlt
run;

proc freq DATA=WORK.Remove;
    WHERE DineHall='Snelling' and DayCat='After';
   
tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;
run;

The output will show the number of non-missing values in the frequencies, which will be different than the total number of observations (if indeed this "cleansing" step results in missing values).

Chris

View solution in original post


All Replies
Valued Guide
Posts: 860

Re: Dropping values >x

where totplate>6;

Occasional Contributor
Posts: 8

Re: Dropping values >x

Posted in reply to Steelers_In_DC

That helps, but [where totplate>6;] also removes observations from the other variables as well. I am only trying to remove variables from TotPlate and not NEplate SalBowls NoSalBow DesPlate NoDesPlt.

Community Manager
Posts: 2,955

Re: Dropping values >x

Sounds like you want to replace this with a missing value?  Here's a statement for your DATA step.

   if totplate>6 then totplate=.;

Chris

Occasional Contributor
Posts: 8

Re: Dropping values >x

Posted in reply to ChrisHemedinger

Chris,

So if my data looks like this (see below), for example, it will still include all of the other variables in the analysis besides TotPlate?

TotPlate   NEplate   SalBowls   NoSalBow   DesPlate   NoDesPlt  

   7                1                1                2                1                1

So [if totplate>6 then totplate=.;] will eliminate the 7, but still include everything else, correct? (I don't have access to SAS at the moment and want to make sure I enter it correctly once I do have access again).

If that is the case, where in my existing code do I insert [if totplate>6 then totplate=.;] or is there a new code I need to create?

Super User
Posts: 7,854

Re: Dropping values >x

You will have to split your proc freq and do the totplate table separately with the proper where condition.

In case you need a single report, output the freq tables to datasets and combine them.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎04-22-2015 09:05 AM
Community Manager
Posts: 2,955

Re: Dropping values >x

This is very interesting data that you're analyzing.  Are you planning a dinner party and trying to determine whether you have enough plates?  I'm proud of you for using SAS for this.  Or are you moonlighting as a plates accountant for UGA (the location of Snelling dining hall according to Google)?

Reading into the one record of data you supplied, it looks like TotPlates is actually the sum of the other variables.  Is the maximum allowed plates assumed to be 6?  But I'm not sure that the variables are all meant to be additive (is SalBowls the number of bowls you have while NoSalBow is how many are missing/not needed)?

Anyway, if TotPlat>6 is deemed "bad data", then my answer stands.  You can mark that as MISSING while keeping the other variables with something like:

proc import datafile="/folders/myfolders/Trayless/DHData.xlsx"
   
out=WORK.Analysis
   
DBMS=XLSX
   
Replace;
run;

data remove;
    set WORK.Analysis;
    if TotPlate>6 then
        TotPlat=
.;

   
/* in case this is what you're after */
    SumOfPlates = NEplate + SalBowls + NoSalBow + DesPlate + NoDesPlt
run;

proc freq DATA=WORK.Remove;
    WHERE DineHall='Snelling' and DayCat='After';
   
tables TotPlate NEplate SalBowls NoSalBow DesPlate NoDesPlt/chisq;
run;

The output will show the number of non-missing values in the frequencies, which will be different than the total number of observations (if indeed this "cleansing" step results in missing values).

Chris

Occasional Contributor
Posts: 8

Re: Dropping values >x

Posted in reply to ChrisHemedinger

I am undergraduate student working with professors at UGA evaluating food waste in the dining halls.

TotPlate is actually not the sum of the other variables. It is just the total number of plates a student had on his/her tray including the number of non-empty plats and empty plates. The variables I have listed are the only variables I am focusing on at the moment.

TotPlate = total number of plates

NEplate = presence of non-empty plate (designated 1 or 0)

SalBowls = presence of salad bowls (designated 1 or 0)

NoSalBow = number of salad bowls on one tray

DesPlate = presence of dessert plate (designated 1 or 0)

NoDesPlt = number of dessert plates

Your answer worked! Thanks for the help!

Community Manager
Posts: 2,955

Re: Dropping values >x

If this analysis was done at my college dining hall, there would have been a lot of "0"s for the SalBowls dummy var, and NoDesPlt would have registered quite high.  This was back before fresh salads were in vogue.

Valued Guide
Posts: 860

Re: Dropping values >x

I'm not sure if this is what you mean but try this:

data remove;

set analysis;

if totplate > 6 then do:

     neplate = .;

     salbowls = .;

     nosalbow = .;

     desplate = .;

     nodesplt = .;

end;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 502 views
  • 8 likes
  • 4 in conversation