I want to make a statement thats says;
If values in VALUE for each value in PLACE has a value over 400 most of the date
period make make a new variable that says '>400':
PLACE | DATE | VALUE | newvariable |
---|---|---|---|
A | 06/02/2021 | 676 | >400 |
A | 07/02/2021 | 566 | >400 |
A | 08/02/2021 | 345 | >400 |
A | 09/02/2021 | 23 | >400 |
A | 10/02/2021 | 1323 | >400 |
It is not clear what you want:
1) Does "for each value in place" mean the sum of VALUEs per place?
or you meant something else?
2) What do you mean by "most of the date"? "most" is ambiguous.
Each place has a value everyday.
Place A has different values, but if Place A mostly have values over 400 out of all dates then it should say >400 in the new column in every row,
I suppose "most" means more than 50% of days in a place.
The logic should be:
step 1) count number of days with value > 400 and number of days with value <= 400
step 2) assign the newvar value according to relation of those two counters
proc sort data=have; by place; run;
data temp;
set have;
by place;
retain gt400 le400;
if first.place then do; gt400=0; le400=0; end;
if value > 400 then gt400+1;
else le400+1;
if last.place then output;
run;
data want;
merge have temp;
by place;
length newvar $4;
retain newvar;
if first.place then newvar = ' ';
drop gt400 le400;
if gt400 > le400 then newvar = '>400';
run;
@mmea wrote:
I want to make a statement thats says;
If values in VALUE for each value in PLACE has a value over 400 most of the date period make make a new variable that says '>400':
PLACE DATE VALUE newvariable A 06/02/2021 676 >400 A 07/02/2021 566 >400 A 08/02/2021 345 >400 A 09/02/2021 23 >400 A 10/02/2021 1323 >400
By "PLACE has a value over 400 most of the date period", I think you mean the MEDIAN of value is >400 over the set of dates for a single place. If so then, you could use PROC UNIVARIATE to create a median for each PLACE, outputting the median to a dataset (NEED). Then merge NEED with your original dataset (call it HAVE) match-merging on PLACE.
proc univariate data=have noprint;
by place;
var value;
output out=need median=median_value;
run;
data want;
merge have need;
by place;
if median_value <= 400 then result='<=400';
else result='>400';
run;
This assumes your original data is sorted by PLACE.
If I understood right, you mean the count of >400 is greater than the count of <400 ?
data have;
input place $ date : $20. value;
cards;
A 06/02/2021 676
A 07/02/2021 566
A 08/02/2021 345
A 09/02/2021 23
A 10/02/2021 1323
;
proc sql;
create table want as
select *,case when sum(value>400)> sum(value<400) then '>400' else '<400' end as variable
from have
group by place;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.