BookmarkSubscribeRSS Feed
Nupur20
Calcite | Level 5

Hi,

I have a dataset like:

ColA   ColB    ColC

Red       5        10

Blue      3         24

Average  4         5

Orange

Purple

etc......

I want to have a code that would let me keep all the values at or above average and I dont know how to do it.

I would apprecaite your time nad suggestions regarding the same.

thanks a lot!

10 REPLIES 10
art297
Opal | Level 21

What do you want the file to look like after running the code?

Nupur20
Calcite | Level 5

I want the file to look like this:

ColA   ColB    ColC

Red       5        10

Blue      3         24

Average  4         5

Basically I want to keep all the rows above starting from average.

In this example data set above, I have "average" at row number three but this could change.

Thanks!

art297
Opal | Level 21

If 'Average' is the lowest value for ColA, depending upon the output you want, you might be able to use something like:

data have;

  input ColA $   ColB    ColC;

  cards;

Red       5        10

Blue      3         24

Average  4         5

Orange   7         4

Purple   2         6

;

proc sort data=have;

  by cola;

run;

data want (drop=_:);

  set have;

  retain _bavg _cavg;

  if _n_ eq 1 then do;

    _bavg=colb;

    _cavg=colc;

  end;

  else do;

    if colb lt _bavg then call missing(colb);

    if colc lt _cavg then call missing(colc);

    output;

  end;

run;

Nupur20
Calcite | Level 5

Thanks for your response. I am not really concerned with the value associated with "average". It can lowest or highest. I just want to keep average and all the rows above that. 

art297
Opal | Level 21

data have;

  input ColA $   ColB    ColC;

  cards;

Red       5        10

Blue      3         24

Average  4         5

Orange   7         4

Purple   2         6

;

data want;

  set have;

  output;

  if cola eq 'Average' then stop;

run;

Nupur20
Calcite | Level 5

Thank you so much for your response. Your code worked perfectly fine.

I am sorry. I missed something in my question above.

Along with what I asked before, I also want to have this:

Using the sample dataset above, I have added numbers to ColB and ColC for orange and purple as well.

ColA   ColB    ColC

Red       5        10

Blue      3         24

Average  4         5

Orange   4         6

Purple    4         3

etc......

Basically, I want to keep evrything which is at or above the averatge in the dataset ( by only looking at numbers in ColB). As you can see orange and purple have both 4 in ColB which is same as average so I want to keep them as well.

If a do a sort by ColA, this would sort ColA it alphabetically and I would still have Orange and purple below it.

Using your previous code would exclude orange and purple from the dataset.

Also, a separate data set for numbers below average which is in this case would be "Blue".

So I will have two datasets in the end.

1. At or above average

ColA   ColB    ColC

Red       5        10

Average  4         5

Orange   4         6

Purple    4         3

2. Below average

ColA   ColB    ColC

Blue      3         24

Can you please tell me the solution to it. I am extremely sorry, I didnt frame my question appropriately previuosly. I would appreciate your help again regarding the same.

thanks a ton!

art297
Opal | Level 21

Does the following do what you want?

data have;

  input ColA $   ColB    ColC;

  cards;

Red       5        10

Blue      3         24

Average  4         5

Orange   4         6

Purple    4         3

;

proc sort data=have;

  by cola;

run;

data want (drop=_:);

  set have;

  retain _bavg _cavg;

  if _n_ eq 1 then do;

    _bavg=colb;

    _cavg=colc;

  end;

  else if colb ge _bavg then output;

run;

Tom
Super User Tom
Super User

Is the row with average in the dataset as in your example? 

If so then you can read that row in first to get the value to use to test the other records.

data want ;

   if _n_=1 then set have(where=(colA='Average') keep=ColA ColB rename=(ColB=_average) );

   set have ;

   if ColB >= _average;

  drop _average;

run;

art297
Opal | Level 21

I don't recall having seen your requirement for two data sets.  How about:

data have;

  input ColA $   ColB    ColC;

  cards;

Red       5        10

Blue      3         24

Average  4         5

Orange   4         6

Purple    4         3

;

proc sort data=have;

  by cola;

run;

data above (drop=_:) below  (drop=_:)

  set have;

  retain _bavg;

  if _n_ eq 1 then _bavg=colb;

  else if colb ge _bavg then output above;

  else output below;

run;

Linlin
Lapis Lazuli | Level 10

data have;

  input ColA $   ColB    ColC;

  cards;

Red       5        10

Blue      3         24

Average   4         5

Orange    4         6

Purple    4         3

;

data at_or_above_average below_average;

   if _n_=1 then set have(where=(colA='Average') keep=ColA ColB rename=(ColB=_average) );

   set have ;

   if ColB >= _average then output at_or_above_average;

     else output below_average;

  drop _average;

run;

Linlin

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1670 views
  • 4 likes
  • 4 in conversation