BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Bright
Obsidian | Level 7

Hello,

I have a dataset looks like this:

Location

Date

Number of warnings (N)

Number of correct warnings (T)

A

06/19/2020

3

2

A

06/29/2020

5

3

A

06/30/2020

1

1

A

07/1/2020

4

3

A

07/05/2020

7

1

A

07/15/2020

2

0

A

07/19/2020

3

1

A

….

B

05/12/2020

4

4

B

05/24/2020

1

0

B

05/27/2020

3

2

B

….

For each specific “Date” I want to calculate R=Sum(T)/Sum(N) during the last month. For example, if we consider Location “A” and Date “07/19/2020”, I want to have R=(0+1+3+1+3+2)/(2+7+4+1+5+3).

Any help is appreciated!

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Make 2 arrays, NVALS and TVALS, indexed by date.  Set the array bounds to be the lowest and highest expected dates.  Then for each new observation, sum the preceding 30 elements of the two arrays, and get the ration of those sums:

 

data want (drop=_:);
  set have;
  by location;
  array tvals{%sysevalf("01jan2019"d):%sysevalf("31dec2020"d)} _temporary_;
  array nvals{%sysevalf("01jan2019"d):%sysevalf("31dec2020"d)} _temporary_;
  if first.location=1 then call missing(of tvals{*},of nvals{*},tsum,nsum);
  tvals{date}=t;
  nvals{date}=n;
  do _d=1 to 30;
    tsum=sum(0,tsum,tvals{date-_d});
    nsum=sum(0,nsum,nvals{date-_d});
  end;
  ratio=divide(tsum,nsum);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
sbxkoenk
SAS Super FREQ

Hello,

Here you are.

A hash table look-up would be much faster though.

Solution published here is fine unless you have Gigabytes of data.

PROC DATASETS library=WORK memtype=DATA NoList;
 delete have0-have2 want ; run;
QUIT;

data have0;
infile cards delimiter='|';
input Location $ Date $ Nbr_warnings_N Nbr_correct_warnings_T;
datedate=input(date,mmddyy10.);
datedateMinus1Month=INTNX('MONTH',datedate,-1,'SAMEDAY');
format datedate datedateMinus1Month date9.; 
cards;
A|06/19/2020|3|2
A|06/29/2020|5|3
A|06/30/2020|1|1
A|07/01/2020|4|3
A|07/05/2020|7|1
A|07/15/2020|2|0
A|07/19/2020|3|1
B|05/12/2020|4|4
B|05/24/2020|1|0
B|05/27/2020|3|2
;
run;

PROC SQL noprint;
 create table work.have1 as
 select   t1.Location , t1.DateDate , t1.datedateMinus1Month 
        , t2.DateDate as DateDateCountsQ 
        , t2.Nbr_warnings_N as NcountsQ , t2.Nbr_correct_warnings_T as TcountsQ
 from
   work.have0(drop=Date) t1
 , work.have0            t2
 where     t1.Location = t2.Location
       AND t1.DateDateMinus1Month <= t2.DateDate < t1.DateDate
 order by t1.Location, t1.DateDate  
;
QUIT;

PROC MEANS data=work.have1 SUM noprint NWAY;
 CLASS Location DateDate;
 var NcountsQ TcountsQ;
 output out=have2 sum= / autoname;
run;

data want;
 set have2(DROP=_:);
 R=TcountsQ_Sum/NcountsQ_Sum;
 if Location='A' And DateDate='19JUL2020'd then R_check=(0+1+3+1+3+2)/(2+7+4+1+5+3);
 else R_check=.;
run;
/* end of program */

Cheers,

Koen

Bright
Obsidian | Level 7
Thanks Koen, I have a large dataset ...
mkeintz
PROC Star

What is your definition of "last month".   In your example, you apparently mean all the dates including the date exactly one month prior (i.e. for 7/19/2020 totals, include 6/19/2020).  Is that correct?

 

On the date 1/31/2020 you want not only all of January, but also the last date of December, right?

 

If so, then for October 30, 2020 you would want  Sep 30-Oct30.   But consider October 31, 2020.  There is no Sep 31, so would it be Oct 1- Oct31,  or Sep 30-Oct31?   In such cases, what is your definition of "the date exactly one month prior"?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Bright
Obsidian | Level 7
I want 30 days back, excluding the day itself.
mkeintz
PROC Star

Make 2 arrays, NVALS and TVALS, indexed by date.  Set the array bounds to be the lowest and highest expected dates.  Then for each new observation, sum the preceding 30 elements of the two arrays, and get the ration of those sums:

 

data want (drop=_:);
  set have;
  by location;
  array tvals{%sysevalf("01jan2019"d):%sysevalf("31dec2020"d)} _temporary_;
  array nvals{%sysevalf("01jan2019"d):%sysevalf("31dec2020"d)} _temporary_;
  if first.location=1 then call missing(of tvals{*},of nvals{*},tsum,nsum);
  tvals{date}=t;
  nvals{date}=n;
  do _d=1 to 30;
    tsum=sum(0,tsum,tvals{date-_d});
    nsum=sum(0,nsum,nvals{date-_d});
  end;
  ratio=divide(tsum,nsum);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Below should work without consuming too much resources.

data have;
  infile datalines truncover;
  input Location $ Date:mmddyy10. N_warnings N_correct_warnings;
  format date date9.;
  datalines;
A 06/19/2020 3 2
A 06/29/2020 5 3
A 06/30/2020 1 1
A 7/01/2020 4 3
A 7/05/2020 7 1
A 07/15/2020 2 0
A 07/19/2020 3 1
A 07/21/2020 0 0
A 07/22/2020 0 0
A 08/01/2020 0 0
A 08/05/2020 0 0
A 08/15/2020 0 0
B 5/12/2020 4 4
B 05/24/2020 1 0
B 05/27/2020 3 2
;

/* &lb: look_back days */
%let lb=30;

data want(drop=_:);
  set have;
  by location date;

  /* arrays to store values of last &lb days */
  array nw  {&lb} 8 _temporary_;
  array ncw {&lb} 8 _temporary_;
  array dt  {&lb} 8 _temporary_;

  if first.location then 
    do;
      call missing(of nw[*], of ncw[*], of dt[*]);
    end;

  else
  if not first.location then
    do;
      /* delete values for array elements not meeting date range condition */
      dt[mod(date,&lb)+1]=date;
      do _i=1 to dim(dt);
        if dt[_i]=date or .<dt[_i]<date-&lb then
          do;
            _ind=mod(dt[_i],&lb)+1;
            call missing(nw[_ind]);
            call missing(ncw[_ind]);
            call missing(dt[_ind]);
          end; 
      end;
    end;
  
  /* create desired sums and ratios */
  nw_sum  =sum(of nw[*]);
  ncw_sum =sum(of ncw[*]);
  ratio=divide(sum(of ncw[*]),sum(of nw[*]));

  /* prep for next row: populate array elements with current row values */
  /* - only at the end to not include into calculations for current row */
  _ind=mod(date,&lb)+1;
  nw[_ind]  =N_warnings;
  ncw[_ind] =N_correct_warnings;
  dt[_ind]  =date;

run;

Patrick_0-1624166781578.png

 

PaigeMiller
Diamond | Level 26

A PROC SUMMARY solution (because PROC SUMMARY is my favorite PROC 😁 ) with a MULTILABEL format. I didn't do the division at the end, but you can add that in yourself.

 

data have0;
infile cards delimiter='|' truncover;
input Location $ Date :mmddyy10. Nbr_warnings_N Nbr_correct_warnings_T;
cards;
A|06/19/2020|3|2
A|06/29/2020|5|3
A|06/30/2020|1|1
A|07/01/2020|4|3
A|07/05/2020|7|1
A|07/15/2020|2|0
A|07/19/2020|3|1
B|05/12/2020|4|4
B|05/24/2020|1|0
B|05/27/2020|3|2
;
data _null_;
    set have0 end=eof;
    if _n_=1 then call execute('proc format; value df (multilabel)');
    call execute(cats(date-31,'-',date-1,'=',date));
    if eof then call execute ('; run;');
run;
proc summary data=have0 nway;
    class location;
    class date / mlf;
    var nbr_warnings_n nbr_correct_warnings_t;
    output out=_sums_ sum=/autoname;
    format date df.;
run;
data _sums1_;
    set _sums_;
    ndate=input(left(date),7.);
    format ndate mmddyy10.;
run;
--
Paige Miller
Ksharp
Super User

Assuming there were no tie for DATE .

 

data have;
  infile datalines truncover;
  input Location $ Date:mmddyy10. N T;
  format date date9.;
  datalines;
A 06/19/2020 3 2
A 06/29/2020 5 3
A 06/30/2020 1 1
A 7/01/2020 4 3
A 7/05/2020 7 1
A 07/15/2020 2 0
A 07/19/2020 3 1
A 07/21/2020 0 0
A 07/22/2020 0 0
A 08/01/2020 0 0
A 08/05/2020 0 0
A 08/15/2020 0 0
B 5/12/2020 4 4
B 05/24/2020 1 0
B 05/27/2020 3 2
;

data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h();
  h.definekey('date');
  h.definedata('N','T');
  h.definedone();
 end;
set have;
by Location;
if first.Location then h.clear();
h.add();
sum_n=0;sum_t=0;
do i=date-1 to intnx('month',date,-1,'s') by -1;
  if h.find(key:i)=0 then do;sum_n+n;sum_t+t;end;
end;
want=divide(sum_t,sum_n);
drop i sum_: ;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 805 views
  • 6 likes
  • 6 in conversation