BookmarkSubscribeRSS Feed
Ruslan
Calcite | Level 5

Dear All,

 

My dataset looks like this:

 

company      date            date1          var1      

1                   19960630   19960624     5

1                   19960630   19960625     6

1                   19960630   19960626     7

1                   19960630   19960627     2

1                   19960630   19960628     0

1                   19960630   19960701     9

1                   19960630   19960702     6

1                   19960630   19960703     1

1                   19960630   19960705     5 

1                   19960630   19960708     8

2                   20041231   20041227     2

2                   20041231   20041228     6

2                   20041231   20041229     9

2                   20041231   20041230     5

2                   20041231   20041231     3

2                   20041231   20050103     8

2                   20041231   20050104     7

2                   20041231   20050105     6

2                   20041231   20050106     4

2                   20041231   20050107     0

 

I need to obtain the dataset that looks like this:

 

company      date            date1          var1   

1                   19960630   19960627     2

1                   19960630   19960628     0

1                   19960630   19960701     9

1                   19960630   19960702     6

1                   19960630   19960703     6

2                   20041231   20041229     9

2                   20041231   20041230     5

2                   20041231   20041231     3

2                   20041231   20050103     8

2                   20041231   20050104     7

 

So, for each company, I need to select two days before and two days after the specified date (and specified date itself). In my case, specified date is column "date" and the days surrouding the specified date are in column "date1". I have cases where specified date is included in the "date1" column, but I also have cases when specified date is not included in the "date1" column. In any case, I need to select five days: the one closest (or equal) to the specified date and four days surrounding the closest (or specified) date.

 

Any help on this problem will be hugely appreciated.

 

Kind regards,

Ruslan

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

I'm not sure I understand this entirely. For each company, you want to select only the rows in which date1 lies within 2 days from date? Or am I misunderstanding?

Ruslan
Calcite | Level 5

Yes, you are right. Please bear in mind cases when date is included and not included in date1. 

art297
Opal | Level 21

Are your two date fields SAS dates or just numbers that look like dates?

 

Art, CEO, AnalystFinder.com

 

Ruslan
Calcite | Level 5

Columns date and date1 are in date format.

PeterClemmensen
Tourmaline | Level 20

Do something like this

 

data have;
input company date:yymmdd10. date1:yymmdd10. var1;
format date date1 yymmdd10.;
datalines; 
1 19960630 19960624 5
1 19960630 19960625 6
1 19960630 19960626 7
1 19960630 19960627 2
1 19960630 19960628 0
1 19960630 19960701 9
1 19960630 19960702 6
1 19960630 19960703 1
1 19960630 19960705 5 
1 19960630 19960708 8
2 20041231 20041227 2
2 20041231 20041228 6
2 20041231 20041229 9
2 20041231 20041230 5
2 20041231 20041231 3
2 20041231 20050103 8
2 20041231 20050104 7
2 20041231 20050105 6
2 20041231 20050106 4
2 20041231 20050107 0
;

data want;
   set have;
   where date-4 < date1 < date+4;
run;

 and adjust how many days you want on each side. I dont see how  19960627 is within 2 days of  19960630 eg? 🙂

Ruslan
Calcite | Level 5

Many thanks for your code. 

 

However, when you said "within" I meant within the existing days in column "date1". Hence, in my post I mention that I need to select two closest days to the specified date. Any ideas how to select two closest days to the specified date?

Reeza
Super User

The closest isn't a clear definition. So if all 5 date1 are below the date and the date after is ten days later, what would the results be? The 5 days from prior to date1?

 

If so, then calculate the diff between the dates, take the absolute value, sort them descending and take the top 5 records from each ID.

Ruslan
Calcite | Level 5

Thanks a lot for your reply, @Reeza!

 

In my data, I definitely have two days before and two days after the specified date. The case that you mentioned is very unlikely to exist in my data, however even if it exists, then there should be NO output at all. 

 

So, I need to select two closest days before the specified date and two closest days after the specified date and the specified date itself. If very awkward cases exist (as in your example), SAS should just ignore such cases. 

 

 

Reeza
Super User

But you don't always have a 'center' so that's the issue here....If you don't how do you know which direction to go in?

Ruslan
Calcite | Level 5

Yes, that is the issue and that is why I posted my question here 🙂

 

So first, I need to select a 'center'. The 'center' should be either the specified date or the closest day in "date1" column to the specified date. Next, relative to this 'center', I need to select two closest days before and two closest days after the 'center'.

 

Please, @Reeza, I do rely on your help 🙂

Ruslan
Calcite | Level 5

@Reeza, Guys, please, I urgently need your help!!!

novinosrin
Tourmaline | Level 20

I hope the following works for your requirement. 

 

data have;

input company date:yymmdd10. date1:yymmdd10. var1;

format date date1 yymmdd10.;

datalines;

1 19960630 19960624 5

1 19960630 19960625 6

1 19960630 19960626 7

1 19960630 19960627 2

1 19960630 19960628 0

1 19960630 19960701 9

1 19960630 19960702 6

1 19960630 19960703 1

1 19960630 19960705 5

1 19960630 19960708 8

2 20041231 20041227 2

2 20041231 20041228 6

2 20041231 20041229 9

2 20041231 20041230 5

2 20041231 20041231 3

2 20041231 20050103 8

2 20041231 20050104 7

2 20041231 20050105 6

2 20041231 20050106 4

2 20041231 20050107 0

;

 

 data want;

   if _N_ = 1 then do;

   if 0 then set have;

     declare hash h(dataset: "have(keep=company date1 var1)",multidata:'yes');

      h.defineKey('date1');

      h.defineData('var1','date1');

      h.defineDone();

   end;

   do until(last.company);

       set have(keep=company date);

       by company;

         if first.company then

            do;

                   call missing(__flag);

                   do _temp=date by -1 while(1);

                  rc=h.find(key:_temp);

                  if rc=0 then

                  do;

                          _flag+1;

                            if _temp=date then __flag=1;

                            output;

                          call missing(var1,date1);

                  end;

                       if __flag=1 and  _flag=3 then

                   do;

                            _flag=0;

                              leave;

                        end;

                       else if missing(__flag) and _flag=2 then

                        do;

                          _flag=0;

                             leave;

                        end;                        

                  end;

                  do _temp=date+1 by 1 while(1);

                  rc=h.find(key:_temp);

                  if rc=0 then

                  do;

                          _flag+1;

                     output;

                          call missing(var1,date1);

                  end;

                       if _flag=2 then

                   do;

                            _flag=0;

                              leave;

                        end;

                  end;

              end;

              else leave;

     end;

     drop _:  rc;

     run;

 

Regards,

Naveen Srinivasan

art297
Opal | Level 21

I think that the following does what you want (note: it assumes your data are already in company,date1 order):

data want (drop=counter check closest);
  do until (last.company);
    set have;
    by company;
    if first.company then closest=9999;
    closest=min(closest,abs(date-date1));
  end;
  do until (last.company);
    set have;
    by company;
    if first.company then counter=1;
    else counter+1;
    if abs(date-date1) eq closest then check=counter;
  end;
  do until (last.company);
    set have;
    by company;
    if first.company then counter=1;
    else counter+1;
    if abs(date-date1) eq closest then check=counter;
  end;
  do until (last.company);
    set have;
    by company;
    if first.company then counter=1;
    else counter+1;
    if check-2 le counter le check+2 then output; 
  end;
run;

Art, CEO, AnalystFinder.com

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3797 views
  • 1 like
  • 5 in conversation