Hi,
I have a dataset like:
ColA ColB ColC
Red 5 10
Blue 3 24
Average 4 5
Orange
Purple
etc......
I want to have a code that would let me keep all the values at or above average and I dont know how to do it.
I would apprecaite your time nad suggestions regarding the same.
thanks a lot!
What do you want the file to look like after running the code?
I want the file to look like this:
ColA ColB ColC
Red 5 10
Blue 3 24
Average 4 5
Basically I want to keep all the rows above starting from average.
In this example data set above, I have "average" at row number three but this could change.
Thanks!
If 'Average' is the lowest value for ColA, depending upon the output you want, you might be able to use something like:
data have;
input ColA $ ColB ColC;
cards;
Red 5 10
Blue 3 24
Average 4 5
Orange 7 4
Purple 2 6
;
proc sort data=have;
by cola;
run;
data want (drop=_:);
set have;
retain _bavg _cavg;
if _n_ eq 1 then do;
_bavg=colb;
_cavg=colc;
end;
else do;
if colb lt _bavg then call missing(colb);
if colc lt _cavg then call missing(colc);
output;
end;
run;
Thanks for your response. I am not really concerned with the value associated with "average". It can lowest or highest. I just want to keep average and all the rows above that.
data have;
input ColA $ ColB ColC;
cards;
Red 5 10
Blue 3 24
Average 4 5
Orange 7 4
Purple 2 6
;
data want;
set have;
output;
if cola eq 'Average' then stop;
run;
Thank you so much for your response. Your code worked perfectly fine.
I am sorry. I missed something in my question above.
Along with what I asked before, I also want to have this:
Using the sample dataset above, I have added numbers to ColB and ColC for orange and purple as well.
ColA ColB ColC
Red 5 10
Blue 3 24
Average 4 5
Orange 4 6
Purple 4 3
etc......
Basically, I want to keep evrything which is at or above the averatge in the dataset ( by only looking at numbers in ColB). As you can see orange and purple have both 4 in ColB which is same as average so I want to keep them as well.
If a do a sort by ColA, this would sort ColA it alphabetically and I would still have Orange and purple below it.
Using your previous code would exclude orange and purple from the dataset.
Also, a separate data set for numbers below average which is in this case would be "Blue".
So I will have two datasets in the end.
1. At or above average
ColA ColB ColC
Red 5 10
Average 4 5
Orange 4 6
Purple 4 3
2. Below average
ColA ColB ColC
Blue 3 24
Can you please tell me the solution to it. I am extremely sorry, I didnt frame my question appropriately previuosly. I would appreciate your help again regarding the same.
thanks a ton!
Does the following do what you want?
data have;
input ColA $ ColB ColC;
cards;
Red 5 10
Blue 3 24
Average 4 5
Orange 4 6
Purple 4 3
;
proc sort data=have;
by cola;
run;
data want (drop=_:);
set have;
retain _bavg _cavg;
if _n_ eq 1 then do;
_bavg=colb;
_cavg=colc;
end;
else if colb ge _bavg then output;
run;
Is the row with average in the dataset as in your example?
If so then you can read that row in first to get the value to use to test the other records.
data want ;
if _n_=1 then set have(where=(colA='Average') keep=ColA ColB rename=(ColB=_average) );
set have ;
if ColB >= _average;
drop _average;
run;
I don't recall having seen your requirement for two data sets. How about:
data have;
input ColA $ ColB ColC;
cards;
Red 5 10
Blue 3 24
Average 4 5
Orange 4 6
Purple 4 3
;
proc sort data=have;
by cola;
run;
data above (drop=_:) below (drop=_:)
set have;
retain _bavg;
if _n_ eq 1 then _bavg=colb;
else if colb ge _bavg then output above;
else output below;
run;
data have;
input ColA $ ColB ColC;
cards;
Red 5 10
Blue 3 24
Average 4 5
Orange 4 6
Purple 4 3
;
data at_or_above_average below_average;
if _n_=1 then set have(where=(colA='Average') keep=ColA ColB rename=(ColB=_average) );
set have ;
if ColB >= _average then output at_or_above_average;
else output below_average;
drop _average;
run;
Linlin
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.