Hi
I have the following dataset:
Data have; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
I want the below:
Data want; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;
This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.
Thanks.
Use a retained variable to keep values across observations:
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;
data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;
proc print data=want noobs;
run;
Result:
ID_W Date_F ID_Fund 105246 2016-06-30 10 105246 2016-09-30 10 105246 2016-12-31 10 105246 2017-03-31 10 105246 2017-06-30 10 105246 2017-09-30 10 105246 2017-12-31 10 105250 2012-12-31 12 . 2017-06-30 12 . 2017-09-30 12 . 2017-12-31 12 105256 2016-03-31 25 105256 2016-09-30 25 105256 2017-06-30 25 105256 2017-09-30 25
Hi
I have the following dataset:
Data have; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
I want the below:
Data want; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;
This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.
Thanks.
Use a retained variable to keep values across observations:
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;
data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;
proc print data=want noobs;
run;
Result:
ID_W Date_F ID_Fund 105246 2016-06-30 10 105246 2016-09-30 10 105246 2016-12-31 10 105246 2017-03-31 10 105246 2017-06-30 10 105246 2017-09-30 10 105246 2017-12-31 10 105250 2012-12-31 12 . 2017-06-30 12 . 2017-09-30 12 . 2017-12-31 12 105256 2016-03-31 25 105256 2016-09-30 25 105256 2017-06-30 25 105256 2017-09-30 25
Something like:
data have; input id_w date_f: mmddyy10. id_fund; format date_f date9.; datalines; 105246 06/30/2016 10 105246 09/30/2016 10 105246 12/31/2016 10 . 03/31/2017 10 . 06/30/2017 10 . 09/30/2017 10 . 12/31/2017 10 105250 12/31/2012 12 . 06/30/2017 12 . 09/30/2017 12 . 12/31/2017 12 105256 03/31/2016 25 105256 09/30/2016 25 . 06/30/2017 25 . 09/30/2017 25 ; run; data want; set have; retain lstid; if id_w ne . and year(date_f)=2016 then lstid=id_w; if id_w ne . and id_w ne lstid then lstid=.; if id_w=. and year(date_f) ge 2016 then id_w=lstid; run;
Here is a solution that use the SET ... POINT= instead of retained variables.
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
data want;
set have ;
by id_fund;;
if first.id_fund then ptr=.;
if year(date_f)=2016 then ptr=_n_;
if id_w=. and ptr^=. then set have (keep=id_w) point=ptr;
run;
The ptr variable keeps track of the position of most recent 2016 record within a BY ID_FUND group. When you are concerned with updating only the ID_W variable, it has no particular advantage over the retain approach. But if you wanted to recall (say) 20 variables from the 2016 record when ID_W is missing, then all you would need to do is list those 20 variables in the "keep=" parameter in the SET .... POINT= statement.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.