Solved: Replacing missing IDs with the previous ones based on Date condition.

Saba1 · Posted 09-28-2018 02:15 AM

Hi

I have the following dataset:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

I want the below:

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

Thanks.

Kurt_Bremser · Posted 09-28-2018 03:20 AM

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Saba1 · Posted 09-28-2018 03:00 AM

Hi

I have the following dataset:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

I want the below:

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

Thanks.

Kurt_Bremser · Posted 09-28-2018 03:20 AM

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Saba1 · Posted 09-29-2018 08:38 AM

thanks a lot. it works really well.

RW9 · Posted 09-28-2018 03:24 AM

Something like:

data have;
  input id_w date_f: mmddyy10. id_fund;
  format date_f date9.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
  set have;
  retain lstid;
  if id_w ne . and year(date_f)=2016 then lstid=id_w;
  if id_w ne . and id_w ne lstid then lstid=.;
  if id_w=. and year(date_f) ge 2016 then id_w=lstid;
run;

Saba1 · Posted 09-29-2018 08:39 AM

thank you for the guidance.

mkeintz · Posted 09-28-2018 07:55 AM

Here is a solution that use the SET ... POINT= instead of retained variables.

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

data want;
  set have ;
  by id_fund;;
  if first.id_fund then ptr=.;
  if year(date_f)=2016 then ptr=_n_;
  if id_w=. and ptr^=. then set have (keep=id_w) point=ptr;
run;

The ptr variable keeps track of the position of most recent 2016 record within a BY ID_FUND group. When you are concerned with updating only the ID_W variable, it has no particular advantage over the retain approach. But if you wanted to recall (say) 20 variables from the 2016 record when ID_W is missing, then all you would need to do is list those 20 variables in the "keep=" parameter in the SET .... POINT= statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Saba1 · Posted 09-29-2018 08:39 AM

thanks a lot.

Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Replacing missing IDs with the previous ones based on Date condition

Re: Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Re: Replacing missing IDs with the previous ones based on Date condition.

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away