Desktop productivity for business analysts and programmers

how to subset records from a table based upon conditional matching

Reply
Frequent Contributor
Posts: 87

how to subset records from a table based upon conditional matching

 

There are two tables Have1 and Have2:

 

 

data have1;
  infile cards truncover expandtabs;
  input ET $ Date :Date9.;
  format Date :date9.;
cards;
Lath 12Jan2015
Instal 15Feb2015
CRE 21May2015
. 24Jun2015 Lath 13Jul2015 CRE 01Oct2015 Lath 02Jan2016 Lath 13Jun2016 Lath 12Jan2015 Lath 30May2015 Lath 14Jan2016 Lath 05Aug2016 Lath 13Jun2015 Rem 17Jun2015 Instal 19Apr2016 Lath 12Jun2016 Lath 02Jul2015 Lath 24Jan2016 Lath 27Aug2016 Rem 19Feb2015 Lath 17Mar2015 Instal 07Oct2015 Lath 15Jan2016 Lath 05Aug2016 Rem 19Feb2015 Lath 16Mar2015 Instal 22Mar2015 Lath 02May2015 Rem 29Sep2015 Lath 28Jan2016 Instal 07Feb2016 Lath 14Mar2016 Lath 11Jun2016 CRE 20Feb2015 Lath 07Sep2015 Lath 07Sep2015 Lath 16Mar2016 Rem 23Jul2016 . 10Apr2015 Lath 14Apr2015 CRE 01Oct2015 Lath 01Jan2016 Lath 13Jun2016 Lath 06Aug2015 Lath 05Feb2016 Lath 09Feb2015 CRE 21May2015 . 23Sep2015 Lath 05Oct2015 CRE 21Mar2016 Lath 18Jun2016 Lath 07Nov2016 . 07Apr2015 Lath 13Apr2015 CRE 21May2015 CRE 21May2015 Lath 13Dec2015 Lath 10Mar2016 Lath 08Sep2016 ; run; data have2; infile cards truncover expandtabs; input Date :datetime19.2; format Date :datetime19.2; cards; 02Jan2016 18:50:46 04Jan2016 0:08:47 04Jan2016 5:40:23 05Jan2016 11:58:24 12Jan2016 10:06:35 24Jan2016 17:49:23 26Jan2016 9:58:08 29Jan2016 9:58:02 30Jan2016 11:48:34 02Feb2016 13:10:07 04Feb2016 9:12:03 04Feb2016 20:29:06 05Feb2016 11:18:21 06Feb2016 10:24:24 07Feb2016 1:09:35 08Feb2016 10:45:27 09Feb2016 13:10:47 10Feb2016 13:07:53 11Feb2016 13:15:14 12Feb2016 9:35:45 13Feb2016 1:11:07 15Feb2016 9:53:32 16Feb2016 9:51:57 14Mar2016 14:41:45 04Apr2016 1:09:09 07Apr2016 15:59:36 08Apr2016 10:49:23 09Apr2016 17:49:55 12Apr2016 15:55:17 15Apr2016 0:10:38 18Apr2016 11:53:01 19Apr2016 9:59:04 20Apr2016 9:50:01 26Apr2016 9:57:59 27Apr2016 10:45:19 30Apr2016 10:52:36 02May2016 10:01:18 03May2016 1:28:27 03May2016 10:49:17 04May2016 9:48:31 04May2016 14:47:58 05May2016 10:22:44 08May2016 11:53:59 08May2016 14:47:47 09May2016 1:06:01 09May2016 5:07:56 09May2016 10:46:49 11May2016 10:02:00 12May2016 1:30:07 12May2016 10:46:29 13May2016 16:13:38 15May2016 21:00:06 17May2016 0:09:47 18May2016 11:50:22 19May2016 9:57:19 20May2016 9:57:00 21May2016 17:49:05 22May2016 21:00:46 24May2016 0:12:04 26May2016 10:55:09 27May2016 10:02:30 28May2016 9:50:28 29May2016 15:44:50 29May2016 15:49:50 30May2016 9:02:56 31May2016 1:52:30 31May2016 10:45:11 02Jun2016 16:04:57 06Jun2016 0:08:38 07Jun2016 11:46:54 08Jun2016 10:04:26 09Jun2016 10:17:40 10Jun2016 9:57:10 12Jun2016 1:17:59 13Jun2016 9:59:42 15Jun2016 10:51:52 16Jun2016 15:15:07 17Jun2016 10:48:33 19Jun2016 1:12:13 20Jun2016 0:11:14 21Jun2016 13:00:25 23Jun2016 8:57:54 25Jun2016 12:07:01 26Jun2016 11:10:54 28Jun2016 10:01:16 29Jun2016 12:58:16 06Jul2016 11:05:23 07Jul2016 14:02:25 09Jul2016 19:53:57 11Jul2016 20:13:37 14Jul2016 11:14:45 15Jul2016 10:50:34 16Jul2016 16:10:53 19Jul2016 8:59:00 19Jul2016 19:10:27 20Jul2016 15:47:18 21Jul2016 15:45:18 22Jul2016 16:00:21 23Jul2016 15:42:42 24Jul2016 11:02:24 25Jul2016 1:09:19 27Jul2016 9:24:55 28Jul2016 9:00:54 28Jul2016 16:13:52 29Jul2016 14:09:31 02Aug2016 9:03:36 04Aug2016 13:55:52 07Aug2016 14:31:22 08Aug2016 11:11:00 09Aug2016 13:54:56 10Aug2016 8:57:20 13Aug2016 9:55:11 14Aug2016 11:09:51 23Aug2016 13:21:45 25Aug2016 12:56:55 26Aug2016 15:52:00 27Aug2016 18:50:02 29Aug2016 13:02:10 30Aug2016 8:11:25 31Aug2016 11:13:00 01Sep2016 13:55:02 02Sep2016 8:59:53 02Sep2016 18:55:42 04Sep2016 10:55:57 04Sep2016 18:38:39 05Sep2016 9:02:25 05Sep2016 19:17:13 08Sep2016 10:12:09 12Sep2016 10:16:22 12Sep2016 13:19:08 12Sep2016 13:25:59 13Sep2016 9:18:51 14Sep2016 9:02:19 16Sep2016 9:02:00 18Sep2016 21:48:05 22Sep2016 9:58:14 24Sep2016 13:53:58 25Sep2016 11:15:54 28Sep2016 9:03:51 28Sep2016 18:48:21 01Oct2016 13:02:22 02Oct2016 16:09:30 06Oct2016 8:58:00 07Oct2016 18:46:15 13Oct2016 16:03:55 19Oct2016 18:45:58 20Oct2016 18:56:03 21Oct2016 9:02:21 21Oct2016 18:45:19 24Oct2016 14:02:24 24Oct2016 18:50:13 26Oct2016 13:06:29 27Oct2016 13:55:43 28Oct2016 9:03:18 29Oct2016 13:00:58 03Nov2016 13:03:28 08Nov2016 13:55:16 11Nov2016 22:09:58 13Nov2016 21:56:39 14Nov2016 17:26:59 14Nov2016 17:33:54 17Nov2016 9:00:47 17Nov2016 18:45:14 ; run;

 

Let's assume that there are V1 and V2 columns (with NULL values, just for this case) in table Have2 so that we have 'DateTimeStamp', 'V1' and 'V1' columns.

 

 

What I want to do is to:

 

1. Create a subset from table Have2 in such a way that...

    a. Immediate previous and next dates (i.e only one date before and one date after) where Date in Have1 is equal to Date in Have2, and Have1.ET = "Lath"

 2. even If there is NO Have1.Date=Have2.Date, still for each Have1.Date, find immediate previous and next dates (one date before and one date after) from Have2, where Have1.ET = "Lath".

 

Please note the above dataset are just samples from larger ones.

Super User
Posts: 7,387

Re: how to subset records from a table based upon conditional matching

Since your second datastep would not produce usable data, I've taken the liberty to change it.

data have1;
  infile cards truncover expandtabs;
  input ET $ Date :Date9.;
  format Date :date9.;
cards;
Lath 12Jan2015
Instal 15Feb2015
CRE 21May2015. 24Jun2015
Lath 13Jul2015
CRE 01Oct2015
Lath 02Jan2016
Lath 13Jun2016
Lath 12Jan2015
Lath 30May2015
Lath 14Jan2016
Lath 05Aug2016
Lath 13Jun2015
Rem 17Jun2015
Instal 19Apr2016
Lath 12Jun2016
Lath 02Jul2015
Lath 24Jan2016
Lath 27Aug2016
Rem 19Feb2015
Lath 17Mar2015
Instal 07Oct2015
Lath 15Jan2016
Lath 05Aug2016
Rem 19Feb2015
Lath 16Mar2015
Instal 22Mar2015
Lath 02May2015
Rem 29Sep2015
Lath 28Jan2016
Instal 07Feb2016
Lath 14Mar2016
Lath 11Jun2016
CRE 20Feb2015
Lath 07Sep2015
Lath 07Sep2015
Lath 16Mar2016
Rem 23Jul2016
. 10Apr2015
Lath 14Apr2015
CRE 01Oct2015
Lath 01Jan2016
Lath 13Jun2016
Lath 06Aug2015
Lath 05Feb2016
Lath 09Feb2015
CRE 21May2015
. 23Sep2015
Lath 05Oct2015
CRE 21Mar2016
Lath 18Jun2016
Lath 07Nov2016
. 07Apr2015
Lath 13Apr2015
CRE 21May2015
CRE 21May2015
Lath 13Dec2015
Lath 10Mar2016
Lath 08Sep2016
;
run;


data have2;
  infile cards truncover expandtabs;
  input Date :date9. time :time8.;
  dt = date * 86400 + time;
  format Date date9. time time8. dt datetime19.;
cards;
02Jan2016 18:50:46
04Jan2016 0:08:47
04Jan2016 5:40:23
05Jan2016 11:58:24
12Jan2016 10:06:35
24Jan2016 17:49:23
26Jan2016 9:58:08
29Jan2016 9:58:02
30Jan2016 11:48:34
02Feb2016 13:10:07
04Feb2016 9:12:03
04Feb2016 20:29:06
05Feb2016 11:18:21
06Feb2016 10:24:24
07Feb2016 1:09:35
08Feb2016 10:45:27
09Feb2016 13:10:47
10Feb2016 13:07:53
11Feb2016 13:15:14
12Feb2016 9:35:45
13Feb2016 1:11:07
15Feb2016 9:53:32
16Feb2016 9:51:57
14Mar2016 14:41:45
04Apr2016 1:09:09
07Apr2016 15:59:36
08Apr2016 10:49:23
09Apr2016 17:49:55
12Apr2016 15:55:17
15Apr2016 0:10:38
18Apr2016 11:53:01
19Apr2016 9:59:04
20Apr2016 9:50:01
26Apr2016 9:57:59
27Apr2016 10:45:19
30Apr2016 10:52:36
02May2016 10:01:18
03May2016 1:28:27
03May2016 10:49:17
04May2016 9:48:31
04May2016 14:47:58
05May2016 10:22:44
08May2016 11:53:59
08May2016 14:47:47
09May2016 1:06:01
09May2016 5:07:56
09May2016 10:46:49
11May2016 10:02:00
12May2016 1:30:07
12May2016 10:46:29
13May2016 16:13:38
15May2016 21:00:06
17May2016 0:09:47
18May2016 11:50:22
19May2016 9:57:19
20May2016 9:57:00
21May2016 17:49:05
22May2016 21:00:46
24May2016 0:12:04
26May2016 10:55:09
27May2016 10:02:30
28May2016 9:50:28
29May2016 15:44:50
29May2016 15:49:50
30May2016 9:02:56
31May2016 1:52:30
31May2016 10:45:11
02Jun2016 16:04:57
06Jun2016 0:08:38
07Jun2016 11:46:54
08Jun2016 10:04:26
09Jun2016 10:17:40
10Jun2016 9:57:10
12Jun2016 1:17:59
13Jun2016 9:59:42
15Jun2016 10:51:52
16Jun2016 15:15:07
17Jun2016 10:48:33
19Jun2016 1:12:13
20Jun2016 0:11:14
21Jun2016 13:00:25
23Jun2016 8:57:54
25Jun2016 12:07:01
26Jun2016 11:10:54
28Jun2016 10:01:16
29Jun2016 12:58:16
06Jul2016 11:05:23
07Jul2016 14:02:25
09Jul2016 19:53:57
11Jul2016 20:13:37
14Jul2016 11:14:45
15Jul2016 10:50:34
16Jul2016 16:10:53
19Jul2016 8:59:00
19Jul2016 19:10:27
20Jul2016 15:47:18
21Jul2016 15:45:18
22Jul2016 16:00:21
23Jul2016 15:42:42
24Jul2016 11:02:24
25Jul2016 1:09:19
27Jul2016 9:24:55
28Jul2016 9:00:54
28Jul2016 16:13:52
29Jul2016 14:09:31
02Aug2016 9:03:36
04Aug2016 13:55:52
07Aug2016 14:31:22
08Aug2016 11:11:00
09Aug2016 13:54:56
10Aug2016 8:57:20
13Aug2016 9:55:11
14Aug2016 11:09:51
23Aug2016 13:21:45
25Aug2016 12:56:55
26Aug2016 15:52:00
27Aug2016 18:50:02
29Aug2016 13:02:10
30Aug2016 8:11:25
31Aug2016 11:13:00
01Sep2016 13:55:02
02Sep2016 8:59:53
02Sep2016 18:55:42
04Sep2016 10:55:57
04Sep2016 18:38:39
05Sep2016 9:02:25
05Sep2016 19:17:13
08Sep2016 10:12:09
12Sep2016 10:16:22
12Sep2016 13:19:08
12Sep2016 13:25:59
13Sep2016 9:18:51
14Sep2016 9:02:19
16Sep2016 9:02:00
18Sep2016 21:48:05
22Sep2016 9:58:14
24Sep2016 13:53:58
25Sep2016 11:15:54
28Sep2016 9:03:51
28Sep2016 18:48:21
01Oct2016 13:02:22
02Oct2016 16:09:30
06Oct2016 8:58:00
07Oct2016 18:46:15
13Oct2016 16:03:55
19Oct2016 18:45:58
20Oct2016 18:56:03
21Oct2016 9:02:21
21Oct2016 18:45:19
24Oct2016 14:02:24
24Oct2016 18:50:13
26Oct2016 13:06:29
27Oct2016 13:55:43
28Oct2016 9:03:18
29Oct2016 13:00:58
03Nov2016 13:03:28
08Nov2016 13:55:16
11Nov2016 22:09:58
13Nov2016 21:56:39
14Nov2016 17:26:59
14Nov2016 17:33:54
17Nov2016 9:00:47
17Nov2016 18:45:14
;
run;

data lookup (keep=date);
set have1 (rename=(date=olddate));
where et = 'Lath';
format date date9.;
do date = olddate - 1 to olddate + 1;
  output;
end;
run;

proc sort data=lookup nodupkey;
by date;
run;

data want;
merge
  lookup (in=a)
  have2
;
by date;
if a;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,682

Re: how to subset records from a table based upon conditional matching

Well, the code below will find the nearest dates around the datetime given.  I wouldn't call variables "date" by the way, you have fallen into describing data, but not acurately,  Have2 has a datetime not a date.  Also it doesn't show well in the editor.  Not sure if this code would be viable on large data, but try it:

data have1;
  infile cards truncover expandtabs;
  input ET $ Date :Date9.;
  format Date date9.;
cards;
Lath 12Jan2015
Instal 15Feb2015
CRE 21May2015. 24Jun2015
Lath 13Jul2015
CRE 01Oct2015
Lath 02Jan2016
Lath 13Jun2016
Lath 12Jan2015
Lath 30May2015
Lath 14Jan2016
Lath 05Aug2016
;
run;


data have2;
  infile cards truncover expandtabs;
  input Date datetime19.2;
  format Date datetime19.2;
cards;
02Jan2016:18:50:46
04Jan2016:08:47
04Jan2016:5:40:23
05Jan2016:11:58:24
12Jan2016:10:06:35
24Jan2016:17:49:23
26Jan2016:9:58:08
;
run;

data inter;
  set have1 (where=(et="Lath"));
run;

proc sql;
  create table WANT as
  select  HAVE2.*,
          (select max(DATE) from (select * from INTER where DATE le datepart(HAVE2.DATE))) as V1 format=date9.,
          (select min(DATE) from (select * from INTER where DATE ge datepart(HAVE2.DATE))) as V2 format=date9.
  from    HAVE2 HAVE2;
quit;

I split out the Lath part, as no need to drag in more data than necessary.

Trusted Advisor
Posts: 1,459

Re: how to subset records from a table based upon conditional matching

It seems that have2 is sorted by datetime.

Add to have2 dataset a variable = sequential number of the observatioN

data temp1;
 set have 2;
      seq = _N_;
run;

Add to have1 the minimum of have2.seq which is GE have1.date and a flag assigning is it equal (=0) or is it greater (=1);

You asked to subset it for ET = 'Lath' only:

proc sql;
       create table temp2
       as select a.et, a.date,
           min(b.seq) as seq
           case when b.date = a.date) then 0 else
           case when b.date > a.date  then 1 as flag
       from hav1(where=(et='Lath')  as a 
       left join temp1 as b
       on b.date ge a.date
order by seq ; quit;

Now that you have in temp2 the sequence number of the related observation in have2 

select the previous and if needed the next seq obs:

 

data temp3;
set temp2;
if seq = 1 then seq1 = .;
else seq1 = seq -1;
if flag=0 then seq2 = seq +1;
else seq2 = seq;
run;

proc sql
create table want
as select a.*,
case b.seq = a.seq1 then b.date as v1
case b.seq = a.seq2 then b.date as v2
from temp1 as a
left join temp2 as b
on b.et = a.et
;
quit;

I haven't tested the code therefore I left all seq: variables in want dataset for control.

Finally of ok DROP unwanted variables.

 

Frequent Contributor
Posts: 87

Re: how to subset records from a table based upon conditional matching

Hello, Thanks very much for the explanation and suggestions. There were couple of typo mistakes which i corrected them and run the code:

 

proc sql;
       create table temp2
       as select a.et, a.date,
           min(b.seq) as seq,
           case when b.date = a.date then 0 else
           case when b.date > a.date  then 1 as flag
       from have1(where=(et='Lath'))  as a 
       left join temp1 as b
       on b.date ge a.date       order by seq
; quit;
 
 

 

I got the following errors:

 

267                   case when b.date > a.date  then 1 as flag
                                                        __
                                                        22
                                                        76
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, 
              CONTAINS, ELSE, END, EQ, EQT, GE, GET, GT, GTT, IN, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, WHEN, ^, ^=, 
              |, ||, ~, ~=.  

ERROR 76-322: Syntax error, statement will be ignored.

268               from have1(where=(et='Lath'))  as a
269               left join temp1 as b
Super User
Posts: 7,387

Re: how to subset records from a table based upon conditional matching

The SQL case needs to be closed with an end:

proc sql;
create table temp2 as
select
  a.et, a.date,
  min(b.seq) as seq,
  case
    when b.date = a.date then 0
    when b.date > a.date then 1
  end as flag
from have1(where=(et='Lath'))  as a 
left join temp1 as b
on b.date ge a.date
order by seq
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 5 replies
  • 258 views
  • 0 likes
  • 4 in conversation