Write and run SAS programs in your web browser

Replace Missing Values with Lag and delete observations with all missing values

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

Replace Missing Values with Lag and delete observations with all missing values

Hi all,

I have an issue with my dataset that I hope you will help me to solve. I am using SAS University Edition v.9.4.

 

In my dataset, which is a panel data sorted by the variables customer ID, year and month, I have some missing values linked to the variable customer_age that I would like to substitute with their lagged values.
Below, a small data step of my dataset to play with it around. 

As you will see, I have four types of missing values:

1. Clients whose age is first observed, then missing and observed again (See cust_id 1). In this case, SAS should substitute the missing value with the previous observation.

2. Client whose age is first missing and then observed (See cust_id 3). In this case, SAS should substitute the missing with the first next observation it can find. In the case of cust_id 3, SAS should substitute the missing values from January 2017 and February 2017 with the customer_age observed on March 2017.

3. Clients whose age is always missing (See cust_id 5). In this case, I would like SAS to delete these observations from the dataset. In other words, if customer_age is a missing value from January 2017 to December 2017, then delete. 

I hope I've given you the idea of my issue and hopefully you will help! 

Thank you in advance.

 

data have;
input cust_id year month customer_age;
cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 23
1 2017 8 23
1 2017 9 23
1 2017 10 23
1 2017 11 23
1 2017 12 23
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
;
run;

 


Accepted Solutions
Solution
‎07-16-2017 05:14 AM
PROC Star
Posts: 7,468

Re: Replace Missing Values with Lag and delete observations with all missing values

[ Edited ]

I changed your example for cust_id eq 1 to meet your first condition. Since neither of the two proposed solutions came up with what I think you wanted, I came up with an alternative solution:

data have;
  input cust_id year month customer_age;
  cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 24
1 2017 8 24
1 2017 9 24
1 2017 10 .
1 2017 11 24
1 2017 12 24
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
8 2017 1 12
8 2017 2 .
8 2017 3 .
8 2017 4 .
8 2017 5 .
8 2017 6 .
8 2017 7 .
8 2017 8 .
8 2017 9 .
8 2017 10 .
8 2017 11 .
8 2017 12 .
9 2017 1 .
9 2017 2 .
9 2017 3 .
9 2017 4 .
9 2017 5 .
9 2017 6 .
9 2017 7 .
9 2017 8 .
9 2017 9 .
9 2017 10 .
9 2017 11 .
9 2017 12 13
;
run;

data need;
  set have;
  by cust_id;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group;
    if not missing(c_age) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Note: the above code assumes that the original file is already sorted in cust_id year month order. Minimally, if all of the records for each ID are grouped together and in Year Month order, then only the following changes would have to be made to the code:

data need;
  set have;
  by cust_id notsorted;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month notsorted;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(c_age) then output;
  end;
run;

View solution in original post


All Replies
Frequent Contributor
Posts: 95

Re: Replace Missing Values with Lag and delete observations with all missing values

Add these data steps to your code.

 

data new;  set have;   by cust_id year month;    retain holdage .;
if first.cust_id then holdage=.;
if customer_age not =. then holdage=customer_age;
if last.cust_id then output;
keep cust_id holdage;
proc print; title 'new';


data three; merge have new ; by cust_id ;

if customer_age=. then customer_age =holdage;

if customer_age=. then delete; *else output;

proc print data=three; id cust_id customer_age holdage; run;

 

 

Jim

PROC Star
Posts: 7,468

Re: Replace Missing Values with Lag and delete observations with all missing values

@Jim_G: Out of curiosity, are you my old friend from the Netherlands?

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 95

Re: Replace Missing Values with Lag and delete observations with all missing values

No Art,  I am from Ohio.  An old mainframe programmer.   Jim

Trusted Advisor
Posts: 1,137

Re: Replace Missing Values with Lag and delete observations with all missing values


proc sort data=have;
by cust_id year  descending customer_age month;
run;

data want;
set have;
by cust_id year  descending customer_age month;
retain new_age;
if first.cust_id then new_age =.;
if customer_age ne . then new_age = customer_age;
customer_age=coalesce(customer_age,new_age);
if customer_age^=.;
drop new_age;
run;

proc sort data=want;
by cust_id year month;
run;
Thanks,
Jag
Solution
‎07-16-2017 05:14 AM
PROC Star
Posts: 7,468

Re: Replace Missing Values with Lag and delete observations with all missing values

[ Edited ]

I changed your example for cust_id eq 1 to meet your first condition. Since neither of the two proposed solutions came up with what I think you wanted, I came up with an alternative solution:

data have;
  input cust_id year month customer_age;
  cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 24
1 2017 8 24
1 2017 9 24
1 2017 10 .
1 2017 11 24
1 2017 12 24
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
8 2017 1 12
8 2017 2 .
8 2017 3 .
8 2017 4 .
8 2017 5 .
8 2017 6 .
8 2017 7 .
8 2017 8 .
8 2017 9 .
8 2017 10 .
8 2017 11 .
8 2017 12 .
9 2017 1 .
9 2017 2 .
9 2017 3 .
9 2017 4 .
9 2017 5 .
9 2017 6 .
9 2017 7 .
9 2017 8 .
9 2017 9 .
9 2017 10 .
9 2017 11 .
9 2017 12 13
;
run;

data need;
  set have;
  by cust_id;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group;
    if not missing(c_age) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Note: the above code assumes that the original file is already sorted in cust_id year month order. Minimally, if all of the records for each ID are grouped together and in Year Month order, then only the following changes would have to be made to the code:

data need;
  set have;
  by cust_id notsorted;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month notsorted;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(c_age) then output;
  end;
run;
Contributor
Posts: 25

Re: Replace Missing Values with Lag and delete observations with all missing values

Thank you to everyone and each of you!! Smiley Happy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 405 views
  • 4 likes
  • 4 in conversation