@RandyStan wrote:
Tom: I do not follow your question
You have multiple observations with the same value for the date. In theory they could each have a different value for varA.
Is that actually possible? Does it actually happen? And if it happens how does that change what you want for the new variable?
As a side note do you need to keep multiple observations for the same date in the output dataset? Is there some other variable that will have different values one each of the multiple observations for the same date? If not then why not just collapse to one observation per date and make the process easier.
Hi Tom:
Sorry for not getting back earlier in the day.
This is a trading data set and there are multiple observations for other variables on a particular date.
The study is an event study where the window opens some days prior to the event and shuts some days after the event. The event date is flagged as VarA = 1 and VarB = 1. For a normal event study the code is simple (from the SA s Boards)
data want; x='15JAN2019'd; fivedaysprior = intnx('day',x,-5); twodaysafter = intnx('day',x,2); format x fivedaysprior twodaysafter date9.; run;
But what is complicating is that there are multiple events and multiple observations of other variables on the same day.
Does anyone have any suggestions?
Randy
Here's a variant of the previous using dates as key unlike record identifier and setcur hash method
data have;
input Date :mmddyy10. VarA VarB;
format date mmddyy10.;
cards;
6/20/2019 . .
6/20/2019 . .
6/20/2019 . .
6/20/2019 . .
6/21/2019 . .
6/21/2019 . .
6/21/2019 . .
6/22/2019 . 1
6/22/2019 . 1
6/22/2019 . 1
6/22/2019 . 1
6/23/2019 1 .
6/23/2019 1 .
6/23/2019 1 .
6/24/2019 . .
6/24/2019 . .
6/24/2019 . .
6/24/2019 . .
6/26/2019 . .
6/26/2019 . .
6/27/2019 . .
6/27/2019 . .
6/27/2019 . .
6/28/2019 . .
;
dm log 'clear';
data want;
if 0 then set have have(rename=(date=_date));
dcl hash H (dataset:'have(rename=(date=_date))',ordered:'a');
h.definekey ("_date");
h.definedata ("_date");
h.definedone ();
dcl hiter hi('h');
do until(z);
set have end=z;
by date;
array u varA varB;
array v VarA_BF VarB_BF;
array t(2,999999) _temporary_;
if first.date then do;
_count=0;
do i=1 to dim(u);
if u(i)=1 then do;
rc = hi.setcur(key:date);
do _count = 1 to 2 while (rc = 0);
rc = hi.prev();
_n_+1;
t(i,_n_)=_date;
end;
do rc=hi.setcur(key:date) by 0 while(rc=0 and _count<=5);
_n_+1;
_count+1;
t(i,_n_)=_date;
rc=hi.next();
end;
end;
end;
end;
end;
z=0;
do until(z);
set have end=z;
by date;
if first.date then do;
call missing(of v(*));
do i=1,2;
do j=1 to 999999;
if t(i,j)=date then do;
v(i)=1;
leave;
end;
end;
end;
end;
output;
end;
stop;
keep date var:;
run;
If the general pattern of your input data set is exactly as you've shown - in other words, the data variations whose possibility @Tom has sagely spelled out do not apply, you can simplify your program logic by eliminating the duplicates by DATE (again, @Tom's suggestion), creating the required variables for the unduplicated dates, and merging the results back with the original data. It can be done as a multi-step process creating a separate data set at every step. Below, it's done in a single step via a hash table made use of two things at once: (a) auto-eliminating the duplicates and (b) flagging the required items which are then merged with the original data.
data have ;
input date :mmddyy10. vara varb ;
format date yymmdd10. ;
cards;
6/20/2019 . .
6/20/2019 . .
6/20/2019 . .
6/20/2019 . .
6/21/2019 . .
6/21/2019 . .
6/21/2019 . .
6/22/2019 . 1
6/22/2019 . 1
6/22/2019 . 1
6/22/2019 . 1
6/23/2019 1 .
6/23/2019 1 .
6/23/2019 1 .
6/24/2019 . .
6/24/2019 . .
6/24/2019 . .
6/24/2019 . .
6/26/2019 . .
6/26/2019 . .
6/27/2019 . .
6/27/2019 . .
6/27/2019 . .
6/28/2019 . .
;
run ;
data want (drop = _:) ;
if _n_ = 1 then do ;
dcl hash h (ordered:"a") ;
h.definekey ("date") ;
h.definedata ("date", "vara", "varb", "vara_bf", "varb_bf") ;
h.definedone () ;
dcl hiter hi ("h") ;
do until (z) ;
set have end = z ;
h.ref() ;
end ;
do _q = 1 by 1 while (hi.next() = 0) ; link fill ; end ;
do _q = 1 by 1 while (hi.prev() = 0) ; link fill ; end ;
end ;
set have ;
h.find() ;
return ;
fill: if _q = 1 then call missing (_qa, _qb) ;
if vara then _qa = _q ;
if varb then _qb = _q ;
if _qa then if _q <= _qa + 2 then vara_bf = 1 ;
if _qb then if _q <= _qb + 2 then varb_bf = 1 ;
h.replace() ;
return ;
run ;
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.