BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
noemi_b
Obsidian | Level 7

Hi all,

I have an issue with my dataset that I hope you will help me to solve. I am using SAS University Edition v.9.4.

 

In my dataset, which is a panel data sorted by the variables customer ID, year and month, I have some missing values linked to the variable customer_age that I would like to substitute with their lagged values.
Below, a small data step of my dataset to play with it around. 

As you will see, I have four types of missing values:

1. Clients whose age is first observed, then missing and observed again (See cust_id 1). In this case, SAS should substitute the missing value with the previous observation.

2. Client whose age is first missing and then observed (See cust_id 3). In this case, SAS should substitute the missing with the first next observation it can find. In the case of cust_id 3, SAS should substitute the missing values from January 2017 and February 2017 with the customer_age observed on March 2017.

3. Clients whose age is always missing (See cust_id 5). In this case, I would like SAS to delete these observations from the dataset. In other words, if customer_age is a missing value from January 2017 to December 2017, then delete. 

I hope I've given you the idea of my issue and hopefully you will help! 

Thank you in advance.

 

data have;
input cust_id year month customer_age;
cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 23
1 2017 8 23
1 2017 9 23
1 2017 10 23
1 2017 11 23
1 2017 12 23
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I changed your example for cust_id eq 1 to meet your first condition. Since neither of the two proposed solutions came up with what I think you wanted, I came up with an alternative solution:

data have;
  input cust_id year month customer_age;
  cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 24
1 2017 8 24
1 2017 9 24
1 2017 10 .
1 2017 11 24
1 2017 12 24
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
8 2017 1 12
8 2017 2 .
8 2017 3 .
8 2017 4 .
8 2017 5 .
8 2017 6 .
8 2017 7 .
8 2017 8 .
8 2017 9 .
8 2017 10 .
8 2017 11 .
8 2017 12 .
9 2017 1 .
9 2017 2 .
9 2017 3 .
9 2017 4 .
9 2017 5 .
9 2017 6 .
9 2017 7 .
9 2017 8 .
9 2017 9 .
9 2017 10 .
9 2017 11 .
9 2017 12 13
;
run;

data need;
  set have;
  by cust_id;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group;
    if not missing(c_age) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Note: the above code assumes that the original file is already sorted in cust_id year month order. Minimally, if all of the records for each ID are grouped together and in Year Month order, then only the following changes would have to be made to the code:

data need;
  set have;
  by cust_id notsorted;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month notsorted;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(c_age) then output;
  end;
run;

View solution in original post

6 REPLIES 6
Jim_G
Pyrite | Level 9

Add these data steps to your code.

 

data new;  set have;   by cust_id year month;    retain holdage .;
if first.cust_id then holdage=.;
if customer_age not =. then holdage=customer_age;
if last.cust_id then output;
keep cust_id holdage;
proc print; title 'new';


data three; merge have new ; by cust_id ;

if customer_age=. then customer_age =holdage;

if customer_age=. then delete; *else output;

proc print data=three; id cust_id customer_age holdage; run;

 

 

Jim

art297
Opal | Level 21

@Jim_G: Out of curiosity, are you my old friend from the Netherlands?

 

Art, CEO, AnalystFinder.com

 

Jim_G
Pyrite | Level 9

No Art,  I am from Ohio.  An old mainframe programmer.   Jim

Jagadishkatam
Amethyst | Level 16

proc sort data=have;
by cust_id year  descending customer_age month;
run;

data want;
set have;
by cust_id year  descending customer_age month;
retain new_age;
if first.cust_id then new_age =.;
if customer_age ne . then new_age = customer_age;
customer_age=coalesce(customer_age,new_age);
if customer_age^=.;
drop new_age;
run;

proc sort data=want;
by cust_id year month;
run;
Thanks,
Jag
art297
Opal | Level 21

I changed your example for cust_id eq 1 to meet your first condition. Since neither of the two proposed solutions came up with what I think you wanted, I came up with an alternative solution:

data have;
  input cust_id year month customer_age;
  cards;
1 2017 1 23
1 2017 2 23
1 2017 3 23
1 2017 4 .
1 2017 5 .
1 2017 6 .
1 2017 7 24
1 2017 8 24
1 2017 9 24
1 2017 10 .
1 2017 11 24
1 2017 12 24
2 2017 1 41
2 2017 2 41
2 2017 3 41
2 2017 4 41
2 2017 5 41
2 2017 6 41
2 2017 7 41
2 2017 8 41
2 2017 9 41
2 2017 10 41
2 2017 11 41
2 2017 12 .
3 2017 1 .
3 2017 2 .
3 2017 3 22
3 2017 4 22
3 2017 5 22
3 2017 6 22
3 2017 7 22
3 2017 8 22
3 2017 9 22
3 2017 10 22
3 2017 11 22
3 2017 12 22
4 2017 1 34
4 2017 2 34
4 2017 3 34
4 2017 4 34
4 2017 5 34
4 2017 6 34
4 2017 7 34
4 2017 8 34
4 2017 9 34
4 2017 10 34
4 2017 11 34
4 2017 12 34
5 2017 1 .
5 2017 2 .
5 2017 3 .
5 2017 4 .
5 2017 5 .
5 2017 6 .
5 2017 7 .
5 2017 8 .
5 2017 9 .
5 2017 10 .
5 2017 11 .
5 2017 12 .
6 2017 1 56
6 2017 2 .
6 2017 3 .
6 2017 4 .
6 2017 5 .
6 2017 6 56
6 2017 7 56
6 2017 8 56
6 2017 9 56
6 2017 10 56
6 2017 11 56
6 2017 12 56
7 2017 1 45
7 2017 2 45
7 2017 3 45
7 2017 4 45
7 2017 5 45
7 2017 6 45
7 2017 7 45
7 2017 8 45
7 2017 9 45
7 2017 10 45
7 2017 11 45
7 2017 12 .
8 2017 1 12
8 2017 2 .
8 2017 3 .
8 2017 4 .
8 2017 5 .
8 2017 6 .
8 2017 7 .
8 2017 8 .
8 2017 9 .
8 2017 10 .
8 2017 11 .
8 2017 12 .
9 2017 1 .
9 2017 2 .
9 2017 3 .
9 2017 4 .
9 2017 5 .
9 2017 6 .
9 2017 7 .
9 2017 8 .
9 2017 9 .
9 2017 10 .
9 2017 11 .
9 2017 12 13
;
run;

data need;
  set have;
  by cust_id;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group;
    if not missing(c_age) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Note: the above code assumes that the original file is already sorted in cust_id year month order. Minimally, if all of the records for each ID are grouped together and in Year Month order, then only the following changes would have to be made to the code:

data need;
  set have;
  by cust_id notsorted;
  retain holdage .;
  if first.cust_id then do;
    group=1;
    counter=1;
  end;
  else if not missing(customer_age) then do;
    if counter ne 1 then group+1;
    counter+1;
  end;
  keep cust_id year month group;
run;

data want;
  merge have need;
  by cust_id year month notsorted;
run;

data want (drop=customer_age group rename=(c_age=customer_age));
  do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(customer_age) then c_age=customer_age;
  end;
   do until (last.group);
    set want;
    by cust_id group notsorted;
    if not missing(c_age) then output;
  end;
run;
noemi_b
Obsidian | Level 7

Thank you to everyone and each of you!! Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2492 views
  • 5 likes
  • 4 in conversation