BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saba1
Quartz | Level 8

Hi

I have the following dataset:

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

 

I want the below:

 

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

 

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25  

View solution in original post

7 REPLIES 7
Saba1
Quartz | Level 8

Hi

I have the following dataset:

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

 

I want the below:

 

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

 

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

 

Thanks.

Kurt_Bremser
Super User

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25  
Saba1
Quartz | Level 8
thanks a lot. it works really well.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data have;
  input id_w date_f: mmddyy10. id_fund;
  format date_f date9.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
  set have;
  retain lstid;
  if id_w ne . and year(date_f)=2016 then lstid=id_w;
  if id_w ne . and id_w ne lstid then lstid=.;
  if id_w=. and year(date_f) ge 2016 then id_w=lstid;
run;
  
  
  
Saba1
Quartz | Level 8
thank you for the guidance.
mkeintz
PROC Star

Here is a solution that use the SET ... POINT=  instead of retained variables.

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

data want;
  set have ;
  by id_fund;;
  if first.id_fund then ptr=.;
  if year(date_f)=2016 then ptr=_n_;
  if id_w=. and ptr^=. then set have (keep=id_w) point=ptr;
run;

The ptr variable keeps track of the position of most recent 2016 record within a BY ID_FUND group. When you are concerned with updating only the ID_W variable, it has no particular advantage over the retain approach.  But if you wanted to recall (say) 20 variables from the 2016 record when ID_W is missing, then all you would need to do is list those 20 variables in the "keep=" parameter in the SET .... POINT= statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 681 views
  • 0 likes
  • 4 in conversation