BookmarkSubscribeRSS Feed
mmea
Quartz | Level 8

I want to make a statement thats says;

If values in VALUE for each value in  PLACE has a value over 400 most of the date 
period make make a new variable that says '>400':
PLACE DATE VALUE newvariable
A 06/02/2021 676 >400
A 07/02/2021 566 >400
A 08/02/2021 345 >400
A 09/02/2021 23 >400
A 10/02/2021 1323 >400
5 REPLIES 5
Shmuel
Garnet | Level 18

It is not clear what you want:

1) Does "for each value in place" mean the sum of VALUEs per place?

     or you meant something else?

2) What do you mean by "most of the date"? "most" is ambiguous.

mmea
Quartz | Level 8

Each place has a value everyday. 

Place A has different values, but if Place A mostly have values over 400 out of all dates then it should say >400 in the new column in every row,

 

 

Shmuel
Garnet | Level 18

I suppose "most" means more than 50% of days in a place. 

The logic should be:

step 1) count number of days with value > 400 and number of days with value <= 400

step 2) assign the newvar value according to relation of those two counters

proc sort data=have; by place; run;
data temp;
 set have;
  by place;
       retain gt400 le400;
       if first.place then do; gt400=0; le400=0; end;
       if value > 400 then gt400+1; 
       else le400+1;
       if last.place then output;
run;
data want;
 merge have temp;
   by place;
        length newvar $4;
        retain newvar;
if first.place then newvar = ' '; drop gt400 le400; if gt400 > le400 then newvar = '>400'; run;
mkeintz
PROC Star

@mmea wrote:

I want to make a statement thats says;

If values in VALUE for each value in  PLACE has a value over 400 most of the date 
period make make a new variable that says '>400':
PLACE DATE VALUE newvariable
A 06/02/2021 676 >400
A 07/02/2021 566 >400
A 08/02/2021 345 >400
A 09/02/2021 23 >400
A 10/02/2021 1323 >400

By "PLACE has a value over 400 most of the date period", I think you mean the MEDIAN of value is >400 over the set of dates for a single place.  If so then, you could use PROC UNIVARIATE to create a median for each PLACE, outputting the median to a dataset (NEED).  Then merge NEED with your original dataset (call it HAVE) match-merging on PLACE.

 

proc univariate data=have noprint;
  by place;
  var value;
  output out=need median=median_value;
run;

data want;
  merge have need;
  by place;
  if median_value <= 400 then result='<=400';
  else result='>400';
run;

This assumes your original data is sorted by PLACE.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

If I understood right, you mean the count of >400 is greater than the count of <400 ?

 

data have;
input place $ date : $20. value;
cards;
A	06/02/2021	676	
A	07/02/2021	566	
A	08/02/2021	345	
A	09/02/2021	23	
A	10/02/2021	1323
;
proc sql;
create table want as
select *,case when sum(value>400)> sum(value<400) then '>400' else '<400' end as variable
 from have
  group by place;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 511 views
  • 0 likes
  • 4 in conversation