Solved: How to find minimum value higher than current value-30?

Dodo3 · Posted 09-18-2023 09:47 AM

Hello.

I have a dataset with 2 columns - column ID and column DayOfVisit.

I want to add a column C, which contains foremost day of visit of the same person, within 30 previous days.

For example

ID DayOfVisit
a 1
a 20
a 25
a 40
a 52
b 3
b 80

In this example, 30previous days would be

ID DoV DoV-30
a 1 -29
a 20 -10
a 25 -5
a 40 10
a 52 22
b 3 -27
b 80 50

For example,
the fourth row should find DoV
with same ID (1, 20, 25, 40, 52)
and within 30 previous days (20, 25, 40)
and minimum among them (20).

The result should be
ID DoV Result
a 1 1
a 20 1
a 25 1
a 40 20
a 52 25
b 3 3
b 80 80

Is this possible?

FreelanceReinh · Posted 09-18-2023 11:44 AM

Hello @Dodo3 and welcome to the SAS Support Communities!

Your sample data are sorted by ID DoV and there are no duplicate observations. If this is true for your real data as well, you can also use this PROC SQL step:

proc sql;
create table want as
select a.*, min(b.DoV) as Result
from have a, have b
where a.ID=b.ID & a.DoV-30<=b.DoV<=a.DoV
group by 1, 2;
quit;

Otherwise (and if there is no unique key available such as a sequence number), you can temporarily create a unique key:

data v_temp / view=v_temp;
seqno=_n_;
set have;
run;

proc sql;
create table want(drop=seqno) as
select a.*, min(b.DoV) as Result
from v_temp a, have b
where a.ID=b.ID & a.DoV-30<=b.DoV<=a.DoV
group by 1, 2, 3;
drop view v_temp;
quit;

Variable SEQNO serves two purposes: It prevents the removal of duplicates and helps maintaining the original sort order.

View solution in original post

Oligolas · Posted 09-18-2023 10:05 AM

I unfortunately do not understand a single word of what you want to achieve and how you logically get to the results you posted.

Could you please rephrase the sentence "I want to add a column C, which contains minimum value of B with same ID and large or equal to current B value-30."

and explain how you get the 4th observation "a 40 20"?

________________________
- Cheers -

Dodo3 · Posted 09-18-2023 10:28 AM

A B
a 1
a 20
a 25
a 40
a 52
b 3
b 80

In this example, B-30 would be

A B B-30
a 1 -29
a 20 -10
a 25 -5
a 40 10
a 52 22
b 3 -27
b 80 50

Then, I want column C to contain minimum value of B, but larger or equal to
B-30 within same ID.

The fourth row should find minimum B, within the same group a (1, 20, 25,
40, 52) and larger or equal to 40-30 (20, 25, 40, 53).

Actually, column B contains the dates of visit, and I want to get the
foremost day of visit within 30 previous days.

Dodo3 · Posted 09-18-2023 10:42 AM

Thank you for your reply. I added example and specified what I need to do.

ballardw · Posted 09-18-2023 11:06 AM

This "works" for your example data. Note the first data step is to make data like you describe. You should provide such a data step to provide example data because there will be cases when we can't tell what your actual variable type or name may actually be and a data step clears that up.

data have;
   input id $ Dov;
datalines;
a 1
a 20
a 25
a 40
a 52
b 3
b 80
;

/* your actual data may want*/
proc sort data=have;
   by id dov;
run;
proc transpose data=have out=trans (drop=_name_)
     prefix=dv;
   by id ;
   var dov;
run;
   
data want;
   merge have
         trans
   ;
   by id;
   array d(*) dv: ;
   do i=1 to dim(d);
      if abs(dov- d[i]) le 30 then do;
         dov_30days=d[i];
         leave;
      end;
   end;
   drop i dv: ;
run;

What this does:

The sort is to get all of your ID values together in visit order.

The transpose will create a data set with one row for each ID and all of the visit date values.

The second data step MERGEs the existing data to the transposed set to allow comparison of the DOV variable to the entire list. Because of the order from proc sort the DV variables in the transposed set are in order of values.

The ARRAY statement uses a list creator DV: , note the colon, to use all variables whose names start with DV as elements of the array. If your data actually has variables that start withe DV then the PROC TRANSPOSE code would require changing the PREFIX option to change the name to something that won't cause a problem. This sort of list is kind of needed as we do not know how many actual values of visits your complete data set might have.

The DO loop tests each value of the array, the visit days values, with the described rule. ABS is "absolute value" so does the subtraction you showed in the example and returns a value which if less then or equal to30 meets your rule and sets the value of the target variable. The LEAVE instruction quits the Do loop when found, so stops on the first value found that matches your want.

The Drop removes the loop counter variable and all the DV variables.

FreelanceReinh · Posted 09-18-2023 11:44 AM

Hello @Dodo3 and welcome to the SAS Support Communities!

Your sample data are sorted by ID DoV and there are no duplicate observations. If this is true for your real data as well, you can also use this PROC SQL step:

proc sql;
create table want as
select a.*, min(b.DoV) as Result
from have a, have b
where a.ID=b.ID & a.DoV-30<=b.DoV<=a.DoV
group by 1, 2;
quit;

Otherwise (and if there is no unique key available such as a sequence number), you can temporarily create a unique key:

data v_temp / view=v_temp;
seqno=_n_;
set have;
run;

proc sql;
create table want(drop=seqno) as
select a.*, min(b.DoV) as Result
from v_temp a, have b
where a.ID=b.ID & a.DoV-30<=b.DoV<=a.DoV
group by 1, 2, 3;
drop view v_temp;
quit;

Variable SEQNO serves two purposes: It prevents the removal of duplicates and helps maintaining the original sort order.

Tom · Posted 09-18-2023 03:22 PM

So you want the rolling 30 day minimum of the actual DOV value? Normally you would do this sort of rolling calculation of some other variable.

The best solution is to use SAS/ETS procedures. So check if the those licensed.

Otherwise if the data is smallish then doing a self join in the PROC SQL is the clearest. Just left join the table with itself and use your DOV window in the ON condition.

This will work for your test data (renamed Result to EXPECT so we can make sure it worked).

data have;
  input ID $ DoV expect;
cards;
a  1  1
a 20  1
a 25  1
a 40 20
a 52 25
b  3  3
b 80 80
;

proc sql;
create table want as
  select a.*,min(b.dov) as Result
  from have a
  left join have b
  on a.id=b.id and b.dov between a.dov-30 and a.dov
  group by a.id,a.dov,a.expect
;
quit;

If the data is too large then you can use an array of 30 elements to keep track of the values for the last 30 days. You can use MOD() function to simulate a "wrap around" array. Make sure the empty out the array elements for the skipped days.

data want;
  do until (last.id);
    set have;
    by id dov ;
    array rolling[0:29] rolling0-rolling29 ;
    if not first.id then do index=lag_dov+1 to dov-1 ;
      rolling[mod(index,30)]=.;
    end;
    rolling[mod(dov,30)]=dov;
    Result=min(of rolling[*]);
    output;
    lag_dov=dov;
  end;
 drop index lag_dov rolling0-rolling29;
run;

How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Re: How to find minimum value higher than current value-30?

Registration is open

SAS Training: Just a Click Away