Solved: Can proc expand return the ID / position where it found the max value of movmax?

acordes · Posted 09-30-2022 07:40 AM

data have;
  input date id value ;
datalines;
1 1 1
2 1 9
3 1 0
4 1 7
5 1 5
1 2 100
2 2 0
3 2 -2
4 2 2
5 2 0
6 2 1
;

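/* moving maximum over a 3-observation window, per id */
proc expand data=have out=outy method=none;
  by id;
  id date;
  convert value = max_value / transform=(movmax 3);
run;

/* keep the first row of each run of equal values within an id */
data have1;
  set have;
  by id value date notsorted;
  if first.value;
run;

/* look up a date on which each moving maximum occurred */
proc sql;
  create table want as
  select a.*, b.date as earliest_max_date
  from outy a left join have1 b
    on a.max_value=b.value and a.id=b.id
  order by id, date;
quit;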

mkeintz · Posted 10-01-2022 02:11 PM

I'm unaware of proc expand being able to extract an identifier for within-window observations having a particular value statistic. But you can do this in a fairly simple data step:

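data want (drop=_:);
  set have;
  by id;

  /* 3-slot circular buffers; _temporary_ arrays are retained across observations */
  array value_window {0:2} _temporary_;
  array date_window  {0:2} _temporary_;

  /* start each id with an empty window */
  if first.id then call missing(of value_window{*},of date_window{*});

  /* store the current observation in slot mod(_n_,3) */
  _i=mod(_n_,3);
  value_window{_i}=value;
  date_window{_i}=date;

  max_value=max(of value_window{*});

  /* scan the window oldest to newest; stop at the first slot holding the max */
  do _n=max(1,_n_-2) to _n_ until (value_window{_i}=max_value);
    _i=mod(_n,3);
    earliest_max_date=date_window{_i};
  end;
run;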

Note that the two arrays are usually NOT in date order (they are in order only about one-third of the time). That doesn't matter, because the arrays are not processed in index order. Instead, the loop steps through observation numbers, and each observation number is mapped to its array index _i.
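
As a small illustration of that mapping (a sketch, not part of the original answer; the names n and slot are made up for the example), this step prints the slot that mod(_n_,3) assigns to observations 1 through 6:

```sas
/* Illustration only: which array slot does each observation number map to? */
data _null_;
  do n = 1 to 6;        /* n stands in for _n_ */
    slot = mod(n, 3);   /* same mapping as _i=mod(_n_,3) in the answer */
    put n= slot=;       /* written to the log */
  end;
run;
```

Observations 1, 2, 3 land in slots 1, 2, 0, so after the third observation the arrays hold observations 3, 1, 2 in index order. That is why the lookup loop walks observation numbers rather than array indexes.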

Also: the do statement

  do _n=max(1,_n_-2) to _n_ until (value_window{_i}=max_value);

is used instead of

  do _n=_n_-2 to _n_ until (value_window{_i}=max_value);

because on the first observation _n_-2 is -1, and mod(-1,3) returns -1 in SAS, which is not a valid index for arrays declared as {0:2}. (On the second observation the loop would start at 0, which maps to the valid but still empty slot 0.) Starting the loop at max(1,_n_-2) avoids the invalid index.
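
To check the MOD behaviour this depends on, here is a minimal sketch (the names n and i are illustrative only) that prints mod(n,3) for a few loop start values, including a negative one:

```sas
/* Illustration only: sign of MOD when the first argument is negative */
data _null_;
  do n = -1 to 1;
    i = mod(n, 3);   /* same expression as _i=mod(_n,3) */
    put n= i=;       /* written to the log */
  end;
run;
```

In SAS, MOD returns a result with the sign of its first argument, so mod(-1,3) is -1 rather than 2, and -1 falls outside the {0:2} bounds of the two arrays.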

Edited note: The code sequence

  do _n=max(1,_n_-2) to _n_ until (value_window{_i}=max_value);
    _i=mod(_n,3);
    earliest_max_date=date_window{_i};
  end;

might be better put as

  do _n=max(1,_n_-2) to _n_ until (value_window{_i}=max_value);
    _i=mod(_n,3);
  end;
  earliest_max_date=date_window{_i};

The results are the same, but the second version makes it more obvious that there is no need to repeatedly assign a value to earliest_max_date inside the loop.


acordes · Posted 10-02-2022 09:03 AM

So arrays used that way intrinsically retain the value, right?

Tom · Posted 10-02-2022 10:15 AM

@acordes wrote:
So arrays used that way intrinsically retain the value, right?

It is not how they are used, but how they are defined.

Temporary arrays are retained.

If you define the array over actual data step variables, then whether each variable is retained is determined separately for that variable, the same as for variables that are not referenced through an array.
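
A minimal sketch of the difference, assuming a made-up three-row data set named three and the illustrative names t, v, v1, t_in and v_in (none of these come from the thread):

```sas
/* Hypothetical three-row input, only for this illustration */
data three;
  do k = 1 to 3;
    output;
  end;
run;

data _null_;
  set three;
  array t {1} _temporary_;   /* temporary array: no variable behind it */
  array v {1} v1;            /* array over an ordinary new variable v1 */

  /* values as they arrive at the top of this iteration */
  t_in = t{1};
  v_in = v{1};
  put _n_= t_in= v_in=;

  /* store the current row in both arrays */
  t{1} = k;
  v{1} = k;
run;
```

The log should show t_in picking up the value stored on the previous row, while v_in is missing on every row, because v1 is an ordinary new variable that is reset at the start of each iteration. Adding retain v1; would make it carry over as well.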
