Dear All,
My dataset looks like this:
company date date1 var1
1 19960630 19960624 5
1 19960630 19960625 6
1 19960630 19960626 7
1 19960630 19960627 2
1 19960630 19960628 0
1 19960630 19960701 9
1 19960630 19960702 6
1 19960630 19960703 1
1 19960630 19960705 5
1 19960630 19960708 8
2 20041231 20041227 2
2 20041231 20041228 6
2 20041231 20041229 9
2 20041231 20041230 5
2 20041231 20041231 3
2 20041231 20050103 8
2 20041231 20050104 7
2 20041231 20050105 6
2 20041231 20050106 4
2 20041231 20050107 0
I need to obtain the dataset that looks like this:
company date date1 var1
1 19960630 19960627 2
1 19960630 19960628 0
1 19960630 19960701 9
1 19960630 19960702 6
1 19960630 19960703 6
2 20041231 20041229 9
2 20041231 20041230 5
2 20041231 20041231 3
2 20041231 20050103 8
2 20041231 20050104 7
So, for each company, I need to select two days before and two days after the specified date (and specified date itself). In my case, specified date is column "date" and the days surrouding the specified date are in column "date1". I have cases where specified date is included in the "date1" column, but I also have cases when specified date is not included in the "date1" column. In any case, I need to select five days: the one closest (or equal) to the specified date and four days surrounding the closest (or specified) date.
Any help on this problem will be hugely appreciated.
Kind regards,
Ruslan
I'm not sure I understand this entirely. For each company, you want to select only the rows in which date1 lies within 2 days from date? Or am I misunderstanding?
Yes, you are right. Please bear in mind cases when date is included and not included in date1.
Are your two date fields SAS dates or just numbers that look like dates?
Art, CEO, AnalystFinder.com
Columns date and date1 are in date format.
Do something like this
data have;
input company date:yymmdd10. date1:yymmdd10. var1;
format date date1 yymmdd10.;
datalines;
1 19960630 19960624 5
1 19960630 19960625 6
1 19960630 19960626 7
1 19960630 19960627 2
1 19960630 19960628 0
1 19960630 19960701 9
1 19960630 19960702 6
1 19960630 19960703 1
1 19960630 19960705 5
1 19960630 19960708 8
2 20041231 20041227 2
2 20041231 20041228 6
2 20041231 20041229 9
2 20041231 20041230 5
2 20041231 20041231 3
2 20041231 20050103 8
2 20041231 20050104 7
2 20041231 20050105 6
2 20041231 20050106 4
2 20041231 20050107 0
;
data want;
set have;
where date-4 < date1 < date+4;
run;
and adjust how many days you want on each side. I dont see how 19960627 is within 2 days of 19960630 eg? 🙂
Many thanks for your code.
However, when you said "within" I meant within the existing days in column "date1". Hence, in my post I mention that I need to select two closest days to the specified date. Any ideas how to select two closest days to the specified date?
The closest isn't a clear definition. So if all 5 date1 are below the date and the date after is ten days later, what would the results be? The 5 days from prior to date1?
If so, then calculate the diff between the dates, take the absolute value, sort them descending and take the top 5 records from each ID.
Thanks a lot for your reply, @Reeza!
In my data, I definitely have two days before and two days after the specified date. The case that you mentioned is very unlikely to exist in my data, however even if it exists, then there should be NO output at all.
So, I need to select two closest days before the specified date and two closest days after the specified date and the specified date itself. If very awkward cases exist (as in your example), SAS should just ignore such cases.
But you don't always have a 'center' so that's the issue here....If you don't how do you know which direction to go in?
Yes, that is the issue and that is why I posted my question here 🙂
So first, I need to select a 'center'. The 'center' should be either the specified date or the closest day in "date1" column to the specified date. Next, relative to this 'center', I need to select two closest days before and two closest days after the 'center'.
Please, @Reeza, I do rely on your help 🙂
@Reeza, Guys, please, I urgently need your help!!!
I hope the following works for your requirement.
data have;
input company date:yymmdd10. date1:yymmdd10. var1;
format date date1 yymmdd10.;
datalines;
1 19960630 19960624 5
1 19960630 19960625 6
1 19960630 19960626 7
1 19960630 19960627 2
1 19960630 19960628 0
1 19960630 19960701 9
1 19960630 19960702 6
1 19960630 19960703 1
1 19960630 19960705 5
1 19960630 19960708 8
2 20041231 20041227 2
2 20041231 20041228 6
2 20041231 20041229 9
2 20041231 20041230 5
2 20041231 20041231 3
2 20041231 20050103 8
2 20041231 20050104 7
2 20041231 20050105 6
2 20041231 20050106 4
2 20041231 20050107 0
;
data want;
if _N_ = 1 then do;
if 0 then set have;
declare hash h(dataset: "have(keep=company date1 var1)",multidata:'yes');
h.defineKey('date1');
h.defineData('var1','date1');
h.defineDone();
end;
do until(last.company);
set have(keep=company date);
by company;
if first.company then
do;
call missing(__flag);
do _temp=date by -1 while(1);
rc=h.find(key:_temp);
if rc=0 then
do;
_flag+1;
if _temp=date then __flag=1;
output;
call missing(var1,date1);
end;
if __flag=1 and _flag=3 then
do;
_flag=0;
leave;
end;
else if missing(__flag) and _flag=2 then
do;
_flag=0;
leave;
end;
end;
do _temp=date+1 by 1 while(1);
rc=h.find(key:_temp);
if rc=0 then
do;
_flag+1;
output;
call missing(var1,date1);
end;
if _flag=2 then
do;
_flag=0;
leave;
end;
end;
end;
else leave;
end;
drop _: rc;
run;
Regards,
Naveen Srinivasan
I think that the following does what you want (note: it assumes your data are already in company,date1 order):
data want (drop=counter check closest); do until (last.company); set have; by company; if first.company then closest=9999; closest=min(closest,abs(date-date1)); end; do until (last.company); set have; by company; if first.company then counter=1; else counter+1; if abs(date-date1) eq closest then check=counter; end; do until (last.company); set have; by company; if first.company then counter=1; else counter+1; if abs(date-date1) eq closest then check=counter; end; do until (last.company); set have; by company; if first.company then counter=1; else counter+1; if check-2 le counter le check+2 then output; end; run;
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.