Solved: Create date range variables by comparing current observation with prev...

ravib · Posted 08-03-2016 03:15 AM

Hi,

I am facing problem in reading each observation from input table and compare its values against the previous record or next record based on certain conditions in the logic. Appreciate if I can get any help from this group.

My current data looks like:

***************************************

data one;
infile datalines;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669 6368875 2591876 1-Dec-12
7801669 7801669 3829995 1-Jan-13
7801669 7801669 97082679 1-Aug-15
7801669 6368875 106930088 1-Oct-15
7801669 7801669 107207912 1-Nov-15
;
run;

***************************************

The logic required is something like below:

For each curr_rk by group, I need to check as below:

>>

Scenario#1:

if first.curr_rk and curr_rk = orig_rk then assign

new_sk=CUST_SK and new_dt = VALID_DTTM

Scenario#2

if first.curr_rk and curr_rk ^= orig_rk then

go down below to the next matching record (i.e. with curr_rk = orig_rk)

Bring it's CUST_SK & VALID_DTTM to replace the values of the current record.

Scenario#3:

if not first.curr_rk and curr_rk = orig_rk then assign

new_sk=CUST_SK and new_dt = VALID_DTTM

Scenario#4:

if not first.curr_rk and curr_rk ^= orig_rk then

go up above to the nearest matching record (i.e. with curr_rk = orig_rk)

Bring it's CUST_SK & VALID_DTTM to replace the values of the current record.

In this process, I have sorted the input table and used "retain" statement to fetch desired values. But challenge is my input table sometimes has "non-matching" record occuring first and matching-record is followed later on. In such cases, retain statement is not working. Hence I thought of creating two date range variables (MAX_DT_BFOR_VALID_DTTM & MIN_DT_AFTR_VALID_DTTM) based on each record's VALID_DTTM compared against its corresponding pre-match and post-match records.

Please refer the attached excel sheet with desired output results shown.

I am ok if there is a better approach with out using these variables. Also I need to create the output in Oracle.

Hope I am able to explain the issue well. Please let me know if any more information is required.

Thanks a lot in advance.

Ksharp · Posted 08-04-2016 02:06 PM

OK. Maybe I understand what you mean.
You only want PRE date which is less than or equal with the current obs , Do not want any POST date ?



data one;
infile datalines expandtabs truncover;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669	6368875	2591876	1-Dec-12
7801669	7801669	3829995	1-Jan-13
7801669	7801669	97082679 1-Aug-15
7801669	6368875	106930088 1-Nov-15
7801669	7801669	107207912 1-Nov-15
;
run;


data have;
do total=1 by 1 until(last.curr_rk);
 set one;
 by curr_rk;
end;
do n=1 by 1 until(last.curr_rk);
 set one;
 by curr_rk;
 output;
end;
run;

data want;
if _n_=1 then do;
  if 0 then set have(rename=(CUST_SK=new_sk VALID_DTTM=new_dt));
  declare hash h2();
  h2.definekey('CURR_RK','n');
  h2.definedata('new_sk','new_dt');
  h2.definedone();
end;

do until(last.curr_rk);
 set have;
 by curr_rk;
 if CURR_RK=ORIG_RK then do;
  new_sk=CUST_SK; new_dt=VALID_DTTM;h2.add();
 end;
end;

do until(last.curr_rk);
 set have;
 by curr_rk;
 
 if CURR_RK=ORIG_RK then do;want_sk=CUST_SK;want_dt=VALID_DTTM; end;
 else do;
 
  do k=n-1 to 1 by -1;
    call missing(new_sk,new_dt);
    rc=h2.find(key:CURR_RK,key:k);
    if rc=0 then do;
     want_sk=new_sk;
     want_dt=new_dt;
     leave;
    end;
  end;
 
  do k=n+1 to total;
    call missing(new_sk,new_dt);
    rc=h2.find(key:CURR_RK,key:k);
    if rc=0 then do;
     if new_dt=VALID_DTTM or first.curr_rk then do;
       want_sk=new_sk;
       want_dt=new_dt;
     end;
     leave;
    end;
  end;
 
 end;
   

output;
end;

h2.clear();
format want_dt date9.;
drop rc  new_sk new_dt n k  total;
run;

View solution in original post

Kurt_Bremser · Posted 08-03-2016 03:37 AM

If you need to get data from 'later' records, how about reversing the order of the dataset, so you can use retain then?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ravib · Posted 08-03-2016 04:11 AM

Hi,

Thanks for the quick response.

I need both. Some times I need previous record values and some times I need later record values depending on the conditions.

Do you mean, first fetch from the previous records into a dataset and then sort it again in reverse order to fetch values from later records? Will it work?

Kurt_Bremser · Posted 08-03-2016 04:25 AM

Yes.

If you do not already have a variable that defines your order, then you can create one in the first pass by using _N_.

Then sort descending by that variable, do your reverse pass, and then sort again (without descending) to get the original order.

If you use out= in the proc sort statement to create new datasets, you can add a drop= dataset option to get rid of the "order" variable.

Here's an example to illustrate what I mean:

data have;
input value condition;
cards;
3 0
4 1
5 2
6 0
7 1
8 0
;
run;

data firstpass (drop=keepval);
set have;
retain keepval;
order = _n_;
if condition = 0 then keepval = value;
if condition = 1 then value = keepval;
run;

proc sort data=firstpass;
by descending order;
run;

data secondpass (drop=keepval);
set firstpass;
retain keepval;
if condition = 0 then keepval = value;
if condition = 2 then value = keepval;
run;

proc sort
  data=secondpass
  out=want (drop=order)
;
by order;
run;

proc print;
run;

In the end, you get this output:

Obs    value    condition

 1       3          0    
 2       3          1    
 3       6          2    
 4       6          0    
 5       6          1    
 6       8          0

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 08-03-2016 04:27 AM

PS if you only need data from the immediately preceding observation in both directions, you can also use the lag() function.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 08-03-2016 04:32 AM

ITs a bit oif a confusing post here, basically you need the current record, the previous record which matches, and the next record that matches, merge these three together, then perform your logic check. My question is how do you know the order of the data, does a "previous" record mean one that matches the criteria and has max(date) < current record? If so then something like:

proc sql;
  create table WANT as
  select  A.*,
          B.VALID_DTTM as PREV_DTTM,
          C.VALID_DTTM as NEXT_DTTM
  from    ONE A
  left join (select * from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM < A.VALID_DTTM and VALID_DTTM=max(VALID_DTTM)) B
  on      A.CURR_RK=B.CURR_RK
  left join (select * from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM > A.VALID_DTTM and VALID_DTTM=min(VALID_DTTM)) C
  on      A.CURR_RK=C.CURR_RK;
quit;

data want;
  set want;
  by curr_rk;
  if first.curr_rk and ...
run;

ravib · Posted 08-03-2016 08:14 AM

Hi RW9,

Yes, you got the question right. I want to merge those 3 records (pre-match, current, post-match) and create two new date variables

MAX_DT_BFOR_VALID_DTTM & MIN_DT_AFTR_VALID_DTTM as shown below.

I will then use these two date variables to do the further logic.

Desired Output for me:

CURR_RK	ORIG_RK	CUST_SK	VALID_DTTM	FLAG	MAX_DT_BFOR_VALID_DTTM	MIN_DT_AFTR_VALID_DTTM	EXPECTED FINAL VALID_DTTM	EXPECTED FINAL CUST_SK
41	2673	2342321	1-Sep-12	FALSE	-	1-Jan-13	1-Jan-13	12559928
41	41	12559928	1-Jan-13	TRUE	1-Jan-13	1-Oct-14	1-Jan-13	12559928
41	4615581	14452649	1-Feb-13	FALSE	1-Jan-13	1-Oct-14	1-Jan-13	12559928
41	4615581	67316104	1-Aug-14	FALSE	1-Jan-13	1-Oct-14	1-Jan-13	12559928
41	41	72971041	1-Oct-14	TRUE	1-Oct-14	-	1-Oct-14	72971041
41	4615581	74341618	1-Nov-14	FALSE	1-Oct-14	-	1-Oct-14	72971041
41	8828883	63829111	1-Jul-15	FALSE	1-Oct-14	-	1-Oct-14	72971041
9217	9217	21128775	1-Jan-13	TRUE	1-Jan-13	1-Mar-13	1-Jan-13	21128775
9217	10124544	15142897	1-Jan-13	FALSE	1-Jan-13	1-Mar-13	1-Jan-13	21128775
9217	9217	30874351	1-Mar-13	TRUE	1-Mar-13	1-Sep-14	1-Mar-13	30874351
9217	9217	71276102	1-Sep-14	TRUE	1-Sep-14	1-Nov-14	1-Sep-14	71276102
9217	10124544	74885982	1-Nov-14	FALSE	1-Sep-14	-	1-Sep-14	71276102
346372	346372	6478392	1-Sep-14	TRUE	1-Sep-14	1-Sep-14	1-Sep-14	6478392
7801669	6368875	2591876	1-Dec-12	FALSE	-	1-Jan-13	1-Jan-13	3829995
7801669	7801669	3829995	1-Jan-13	TRUE	1-Jan-13	1-Aug-15	1-Jan-13	3829995
7801669	7801669	97082679	1-Aug-15	TRUE	1-Aug-15	1-Nov-15	1-Aug-15	97082679
7801669	6368875	106930088	1-Nov-15	FALSE	1-Nov-15	1-Nov-15	1-Nov-15	107207912
7801669	7801669	107207912	1-Nov-15	TRUE	1-Nov-15	-	1-Nov-15	107207912

Thanks for the logic provided. I tried it in SAS but it's throwing CORRELATION errors.

proc sql;
create table WANT as
select A.*,
B.VALID_DTTM as PREV_DTTM,
C.VALID_DTTM as NEXT_DTTM
from ONE A
left join (select * from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM < A.VALID_DTTM and VALID_DTTM=max(VALID_DTTM)) B
on A.CURR_RK=B.CURR_RK
left join (select * from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM > A.VALID_DTTM and VALID_DTTM=min(VALID_DTTM)) C
on A.CURR_RK=C.CURR_RK;
quit;

SQL error:

ERROR: Correlated reference to column CURR_RK is not contained within a subquery.
ERROR: Correlated reference to column VALID_DTTM is not contained within a subquery.
ERROR: Correlated reference to column CURR_RK is not contained within a subquery.
ERROR: Correlated reference to column VALID_DTTM is not contained within a subquery.

The query seems to be correct but I am not sure what is causing the issue.

I need to investigate and see if it gives me the desired new date variables.

Thanks once again.

RW9 · Posted 08-03-2016 08:35 AM

Just re-jig a bit:

proc sql;
  create table WANT as
  select  A.*,
          (select distinct VALID_DTTM from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM < A.VALID_DTTM and VALID_DTTM=max(VALID_DTTM)) as PREV_DTTM,
          (select distinct VALID_DTTM from ONE group by CURR_RK having CURR_RK=A.CURR_RK and VALID_DTTM > A.VALID_DTTM and VALID_DTTM=min(VALID_DTTM)) as NEXT_DTTM
  from    ONE A;
quit;

Ksharp · Posted 08-03-2016 05:15 AM

No. Post data here. No one would like to download file from website.


data one;
infile datalines;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669 6368875 2591876 1-Dec-12
7801669 7801669 3829995 1-Jan-13
7801669 7801669 97082679 1-Aug-15
7801669 6368875 106930088 1-Oct-15
7801669 7801669 107207912 1-Nov-15
;
run;

data one;
 set one;
 by curr_rk;
 if first.curr_rk then n=0;
 n+1;
run;
data want;
 if _n_=1 then do;
  if 0 then set one(rename=(CUST_SK=new_sk VALID_DTTM=new_dt));
  declare hash h1(dataset:'one(rename=(CUST_SK=new_sk VALID_DTTM=new_dt)
   where=(CURR_RK=ORIG_RK))');
  h1.definekey('CURR_RK','ORIG_RK');
  h1.definedata('new_sk','new_dt');
  h1.definedone();
  
  if 0 then set one(keep=ORIG_RK rename=(ORIG_RK=_ORIG_RK));
  declare hash h2(dataset:'one(rename=(ORIG_RK=_ORIG_RK 
   CUST_SK=new_sk VALID_DTTM=new_dt))');
  h2.definekey('CURR_RK','n');
  h2.definedata('new_sk','new_dt','_ORIG_RK');
  h2.definedone();
 end;
set one;
by curr_rk;
if first.curr_rk then do;
 if curr_rk = orig_rk then do;new_sk=CUST_SK; new_dt=VALID_DTTM;end;
  else do;call missing(new_sk,new_dt);rc=h1.find(key:CURR_RK,key:CURR_RK);end;
end;
else do;
 if curr_rk = orig_rk then do;new_sk=CUST_SK; new_dt=VALID_DTTM;end;
  else do;
   do k=n-1 to 1 by -1;
    call missing(new_sk,new_dt);
    rc=h2.find(key:CURR_RK,key:k);
    if CURR_RK=_ORIG_RK then leave;
   end;
  end;
end;
drop rc k n _ORIG_RK;
run;

ravib · Posted 08-03-2016 09:04 AM

Hi Ksharp,

Thank you very much for the solution logic provided. I am able to create new_sk & new_dttm with this logic but we found a small here is a small challenge to it. For e.g. if you observe last 3 records in below table, the 3rd record which is a matching record should get as is new_sk & new_dttm. Where as since 4th record is a non-match record, we need to verify its VALID_DTTM against pre-match record and also with post-match record values. In this case, the 5th record VALID_DTTM matches to the 4th record VALID_DTTM and hence we need to get its CUST_SK, VALID_DTTM and replace the values in the 4th record. Else it should pick the values from 3rd record (pre-matching record) irrespective of its VALID_DTTM values.

Because of this complexity, we want to create two seperate date variables based on pre-match VALID_DTTM value and post-match VALID_DTTM value. Then take the min of them as FINAL_DTTM. Accordingly replace the CUST_SK values for each record.

They also want to use these dates in further jobs to implement some logic.

CURR_RK	ORIG_RK	CUST_SK	VALID_DTTM	VALID_END_DTTM	FLAG	MAX_DT_BFOR_VALID_DTTM	MIN_DT_AFTR_VALID_DTTM	EXPECTED FINAL VALID_DTTM	EXPECTED FINAL CUST_SK
7801669	6368875	2591876	1-Dec-12	31-Oct-15	FALSE	-	1-Jan-13	1-Jan-13	3829995
7801669	7801669	3829995	1-Jan-13	31-Jul-15	TRUE	1-Jan-13	1-Aug-15	1-Jan-13	3829995
7801669	7801669	97082679	1-Aug-15	31-Oct-15	TRUE	1-Aug-15	1-Nov-15	1-Aug-15	97082679
7801669	6368875	106930088	1-Nov-15	31-Dec-47	FALSE	1-Nov-15	1-Nov-15	1-Nov-15	107207912
7801669	7801669	107207912	1-Nov-15	31-Dec-47	TRUE	1-Nov-15	-	1-Nov-15	107207912

Hope I am able to explain well. Please help if you have any alternative solution for this.

Thank you in advance.

Ksharp · Posted 08-04-2016 05:48 AM

So you want the nearest  date no matter it PRE date or POST date?
But if PRE and POST have the same different days like below , which one you gonna to pick ?

7801669	7801669	97082679	1-Nov-15	
7801669	6368875	106930088	1-Nov-15	
7801669	7801669	107207912	1-Nov-15 


Assuming both either you could accept :


data one;
infile datalines expandtabs truncover;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669	6368875	2591876	1-Dec-12
7801669	7801669	3829995	1-Jan-13
7801669	7801669	97082679 1-Aug-15
7801669	6368875	106930088 1-Nov-15
7801669	7801669	107207912 1-Nov-15
;
run;

proc sql;
create table key as
 select *,max(VALID_DTTM) as max_date,min(VALID_DTTM) as min_date
  from one
   group by curr_rk
    order by curr_rk,valid_dttm;
quit;

data want;
 if _n_=1 then do;
  if 0 then set key(rename=(CUST_SK=new_sk VALID_DTTM=new_dt));
  declare hash h1(dataset:'one(rename=(CUST_SK=new_sk VALID_DTTM=new_dt)
   where=(CURR_RK=ORIG_RK))');
  h1.definekey('CURR_RK','new_dt');
  h1.definedata('new_sk','new_dt');
  h1.definedone();
 end;
set key;
dif=99999;
do i=min_date to max_date;
 call missing(new_sk,new_dt);
 rc=h1.find(key:CURR_RK,key:i);
 if rc=0 then do;
  abs=abs(VALID_DTTM-new_dt);
  if abs

ravib · Posted 08-04-2016 06:08 AM

Hi,

In such case, I should pick the 1st record values and place it into the 2nd record.

7801669 7801669 97082679 1-Nov-15
7801669 6368875 106930088 1-Nov-15
7801669 7801669 107207912 1-Nov-15

New data would be:

7801669 7801669 97082679 1-Nov-15
7801669 6368875 97082679 1-Nov-15
7801669 7801669 107207912 1-Nov-15

Anyway, the previous program you have sent is working fine now as I sorted the data with matching flag now.

The new problem I am facing is the HASH object execution failure due to insufficient memory in Work.

Can I push this entire code onto Oracle database processing? Will Hash programs work on Oracle using SQL pass through?

ravib · Posted 08-04-2016 05:23 AM

Hi Ksharp,

I tried to execute the sas code you have given (using Hash object) through DI Studio on our UAT environment but it is failing:

ERROR: Hash object added 85983216 items when memory failure occured

FATAL : Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.

ERROR: The SAS system stopped processing this step because of insufficient memory.

My Input table is having 112937794 observations and hence HASH object process is causing the failure.

I verified my "xmrlmem" option and it is currently set to 6442450944.

I am not getting any other option handle this issue. Do you have any suggestions?

Ksharp · Posted 08-04-2016 06:30 AM

OK. Use DOW skill. But you still don't answer my question yet ?



data one;
infile datalines expandtabs truncover;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669	6368875	2591876	1-Dec-12
7801669	7801669	3829995	1-Jan-13
7801669	7801669	97082679 1-Aug-15
7801669	6368875	106930088 1-Nov-15
7801669	7801669	107207912 1-Nov-15
;
run;

proc sql;
create table key as
 select *,max(VALID_DTTM) as max_date,min(VALID_DTTM) as min_date
  from one
   group by curr_rk
    order by curr_rk,valid_dttm;
quit;

data want;
 if _n_=1 then do;
  if 0 then set key(rename=(CUST_SK=new_sk VALID_DTTM=new_dt));
  declare hash h1(hashexp:20);
  h1.definekey('CURR_RK','new_dt');
  h1.definedata('new_sk','new_dt');
  h1.definedone();
 end;

do until(last.curr_rk);
 set key;
 by curr_rk;
 if CURR_RK=ORIG_RK then do;
  new_sk=CUST_SK; new_dt=VALID_DTTM; rc=h1.add();
 end;
end;

do until(last.curr_rk);
 set key;
 by curr_rk;
dif=99999;
do i=min_date to max_date;
 call missing(new_sk,new_dt);
 rc=h1.find(key:CURR_RK,key:i);
 if rc=0 then do;
  abs=abs(VALID_DTTM-new_dt);
  if abs

Ksharp · Posted 08-04-2016 06:33 AM

My code has been truncated and I can't delete it . Try this one .


data one;
infile datalines expandtabs truncover;
input CURR_RK ORIG_RK CUST_SK VALID_DTTM: anydtdte10. ;
format VALID_DTTM date9.;
datalines;
41 2673 2342321 1-Sep-12
41 41 12559928 1-Jan-13
41 4615581 14452649 1-Feb-13
41 4615581 67316104 1-Aug-14
41 41 72971041 1-Oct-14
41 4615581 74341618 1-Nov-14
41 8828883 63829111 1-Jul-15
9217 9217 21128775 1-Jan-13
9217 10124544 15142897 1-Jan-13
9217 9217 30874351 1-Mar-13
9217 9217 71276102 1-Sep-14
9217 10124544 74885982 1-Nov-14
346372 346372 6478392 1-Sep-14
7801669	6368875	2591876	1-Dec-12
7801669	7801669	3829995	1-Jan-13
7801669	7801669	97082679 1-Aug-15
7801669	6368875	106930088 1-Nov-15
7801669	7801669	107207912 1-Nov-15
;
run;

proc sql;
create table key as
 select *,max(VALID_DTTM) as max_date,min(VALID_DTTM) as min_date
  from one
   group by curr_rk
    order by curr_rk,valid_dttm;
quit;

data want;
 if _n_=1 then do;
  if 0 then set key(rename=(CUST_SK=new_sk VALID_DTTM=new_dt));
  declare hash h1(hashexp:20);
  h1.definekey('CURR_RK','new_dt');
  h1.definedata('new_sk','new_dt');
  h1.definedone();
 end;

do until(last.curr_rk);
 set key;
 by curr_rk;
 if CURR_RK=ORIG_RK then do;
  new_sk=CUST_SK; new_dt=VALID_DTTM; rc=h1.add();
 end;
end;

do until(last.curr_rk);
 set key;
 by curr_rk;
dif=99999;
do i=min_date to max_date;
 call missing(new_sk,new_dt);
 rc=h1.find(key:CURR_RK,key:i);
 if rc=0 then do;
  abs=abs(VALID_DTTM-new_dt);
  if abs lt dif then do;
   dif=abs;want_sk=new_sk;want_dt=new_dt;
  end;
 end;
end;
output;
end;

h1.clear();
format want_dt date9.;
drop rc dif i min_date max_date abs new_sk new_dt;
run;

Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Re: Create date range variables by comparing current observation with previous and post observation

Registration is open

SAS Training: Just a Click Away