Write and run SAS programs in your web browser

How to Remove observations that have the same value over time

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

How to Remove observations that have the same value over time

Hi all!

I am stuck with the process of removing missing values from my dataset that I hope you will help me to solve. I am using SAS University Edition v9.4 . 

Dataset Description:


I have a confidential dataset consisting of 135000 bank customers and 172 variables that describe the customer's demographics, the accounts' transactions volume, the client's geographical coordinates, dummies for having a mortgage or not, etc...

All observations are sorted by customer ID and collected across the time period that goes from April 2014 to August 2015. 
Well..this dataset is a panel data and among the 172 variables I have the variable "customer_status_numeric" that is 1 (if customer is Passive) and 0 if customer is Active. The peculiarity of this variable is that as soon as the customer turns from Active into Passive there are missing values for the rest of the observation period. It means that if the dummy "customer_status_numeric" is 1 (Passive) then all the other 169 variables (except for customer ID and month) become missing values. This is because when the customer becomes Passive it means that his/her bank account gets closed and therefore no data will be collected for the remaining time period.

Problem to be solved:

Now my problem is as follows. There are certain customers that are passive across all the observation period, and, thus, they are meaningless for my analysis given the fact that they present no observations but customer ID and Month. 
Therefore, I have to find a way to tell SAS to delete those customers that are Passive both on 01-Apr-2014 and on 01-Aug-2015. 
However, there are also customers that are Active until a certain date- i.e. 01-May-2015- and then they become Passive for the remaining period- i.e- from 01-May-2015 to 01-Aug-2015-. I need to keep those customers as they will be relevant for my study.

In other words, I need to delete customers whose dummy variable "customer_status_numeric" appears to be 1 both on 01-Apr-2014 and on 01-Aug-2015. In this way I will remove both customers who are Passive from the beginning to the end of the time period and those clients who are active first, then passive for a certain time interval, and then active again and finally passive again. 

Attached a small snapshot of my dataset.

 

I hope I gave you a bit of an overview of my problem and that you can help. Please, reply in simple terms. I am still a beginner of SAS so I will not understand if you will use too many difficult statistical terms in your answer. I only need a code that hopefully will help! 

Thank you in advance!


Screen Shot 2017-07-03 at 17.05.49.png

Accepted Solutions
Solution
‎07-11-2017 06:32 AM
Super User
Posts: 5,083

Re: How to Remove observations that have the same value over time

One piece that's not clear:  Is your date variable a character string, or is it a numeric?  PROC CONTENTS will tell you that.

 

You would identify the observations slightly differently.  For character strings:

 

set have (where=(date='2014-04-01'));

 

vs. for numerics:

 

set have (where=(date='01Apr2014'd));

 

Using the character strings to illustrate, here is a way you could do this:

 

data want;

set have (where=(date='2014-04-01') in=in1)

   have (where=(date='2015-08-01') in=in2)

   have (in=in3);

by customer_id;

if in1 then total_passive = customer_status_numeric;

else if in2 then total_passive + customer_status_numeric;

if in3 and total_passive < 2;

run;

 

This logic does assume that you have data for the complete time period for all customers.

View solution in original post


All Replies
Solution
‎07-11-2017 06:32 AM
Super User
Posts: 5,083

Re: How to Remove observations that have the same value over time

One piece that's not clear:  Is your date variable a character string, or is it a numeric?  PROC CONTENTS will tell you that.

 

You would identify the observations slightly differently.  For character strings:

 

set have (where=(date='2014-04-01'));

 

vs. for numerics:

 

set have (where=(date='01Apr2014'd));

 

Using the character strings to illustrate, here is a way you could do this:

 

data want;

set have (where=(date='2014-04-01') in=in1)

   have (where=(date='2015-08-01') in=in2)

   have (in=in3);

by customer_id;

if in1 then total_passive = customer_status_numeric;

else if in2 then total_passive + customer_status_numeric;

if in3 and total_passive < 2;

run;

 

This logic does assume that you have data for the complete time period for all customers.

Contributor
Posts: 25

Re: How to Remove observations that have the same value over time

[ Edited ]

@Astounding Thank you for the kind reply!

First of all, my date variable is numeric. Secondly, when you write "This logic does assume that you have data for the complete time period for all customers" what do you mean? Do you mean that the logic assumes I have data in the variable "customer_status_numeric" for the complete time period and for all customers? If yes, then sure I have data for all the customers and I will try your code out. 

Super User
Posts: 5,083

Re: How to Remove observations that have the same value over time

Actually, it means you have customer_status_numeric for both date values (starting and ending point).

 

Since date is numeric, you need to switch to the other form for referring to specific dates:

 

set have (where=(date=('01apr2014'd) in=in1)

   have (where=(date=('1Aug2015'd) in=in2)

   have (in=in3);

 

Contributor
Posts: 25

Re: How to Remove observations that have the same value over time

@Astounding Yes, I do have customer_status_numeric for both date values. Then your logic is right!

I only have one more question to address you. By writing in=in1, in=in2 and in=in3, do we tell to SAS that if the variable customer_status_numeric is 1 both on 01-Apr-2014 and on 01-Aug-2015 then delete customer_id from the dataset? What I actually need to solve my issue. 

Super User
Posts: 5,083

Re: How to Remove observations that have the same value over time

The in= variables determine where the observation came from.  So customer by customer ...

 

The in=in1 observations bring in just the April 1, 2014 observation, and sets up the initial total status value for that customer.

 

The in=in2 observations bring in just the August 1, 2015 observation, and possibly increments the total status value.

 

The in=in3 observations bring in all observations for the customer (including the April 1 and August 1 observations).

 

The IF statement subsets observations.  It looks for in3 and the previously computed total status to decide whether or not to keep the observation or delete it.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 249 views
  • 1 like
  • 2 in conversation