Hello Community,
Is it something that could be done using SAS?
I have a table like this:
Customer | Start_Date | Rental_Date | Grace_Period | MovieTitle | Movie1 | Movie2 | Movies_List |
12345 | 13-Mar-13 | 13-Mar-13 | 30 | Gone Girl | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 13-Mar-13 | 30 | Titanic | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 8-Apr-13 | 30 | Gone Girl | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 8-Apr-13 | 30 | Titanic | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 4-May-13 | 30 | Gone Girl | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 4-May-13 | 30 | Titanic | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 11-Feb-15 | 30 | Hunger Games | Gone Girl | Titanic | Gone Girl,Titanic |
12345 | 13-Mar-13 | 11-Feb-15 | 30 | Titanic | Gone Girl | Titanic | Gone Girl,Titanic |
67890 | 10-Aug-14 | 10-Aug-14 | 30 | Pitch Perfect | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 8-Sep-14 | 30 | Pitch Perfect | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 17-Oct-14 | 30 | Pitch Perfect | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 29-Nov-14 | 30 | Pitch Perfect | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 31-Dec-14 | 30 | Booksmart | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 10-Feb-15 | 30 | Booksmart | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 22-May-15 | 30 | Booksmart | Pitch Perfect |
| Pitch Perfect |
67890 | 10-Aug-14 | 29-Jun-15 | 30 | Booksmart | Pitch Perfect |
| Pitch Perfect |
… where there are many rows per 1 customer and the table records the dates (Rental_Date) and the movies (MovieTitle) the customer rented on those dates. Sometimes a customer rents more than 1 movie, ex: customer 12345. This customer rented two movies on March 13, 2013.
What I want to do is to create a new column called “Switch_Status” that records whether the customer switched movies after whatever their initial movie/movies selection was. For example, customer 12345 initially rented 2 movies: Gone Girl and Titanic. I want the new column to record “Switch_Status” = “yes” because in the time Rental_Date+Grace_Period, they eventually stopped renting Gone Girl and Titanic, and instead rented Hunger Games and Titanic (February 11, 2015).
The second example customer 67890 is easier to record. Here, this customer initially rented Pitch Perfect and then after Rental_Date+Grace Period, there is evidence that they switched the movie they initially rented (Pitch Perfect), and instead rented Booksmart. So for this customer also, “Switch_Status” would be “yes”.
And obviously, if there is no evidence of switch, I want “Switch_Status” to equal “no”.
Is there any easy way to do this?
Thank you!
Sure, this is doable but I'd say 2 years out is really not within any grace period...
It would help if you showed a smaller, realistic example and the expected outcome. A description goes only so far, whereas showing the expected output for an input data set really helps to illustrate what you're trying to accomplish.
@iced_tea wrote:
Hello Community,
Is it something that could be done using SAS?
I have a table like this:
Customer
Start_Date
Rental_Date
Grace_Period
MovieTitle
Movie1
Movie2
Movies_List
12345
13-Mar-13
13-Mar-13
30
Gone Girl
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
13-Mar-13
30
Titanic
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
8-Apr-13
30
Gone Girl
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
8-Apr-13
30
Titanic
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
4-May-13
30
Gone Girl
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
4-May-13
30
Titanic
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
11-Feb-15
30
Hunger Games
Gone Girl
Titanic
Gone Girl,Titanic
12345
13-Mar-13
11-Feb-15
30
Titanic
Gone Girl
Titanic
Gone Girl,Titanic
67890
10-Aug-14
10-Aug-14
30
Pitch Perfect
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
8-Sep-14
30
Pitch Perfect
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
17-Oct-14
30
Pitch Perfect
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
29-Nov-14
30
Pitch Perfect
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
31-Dec-14
30
Booksmart
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
10-Feb-15
30
Booksmart
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
22-May-15
30
Booksmart
Pitch Perfect
Pitch Perfect
67890
10-Aug-14
29-Jun-15
30
Booksmart
Pitch Perfect
Pitch Perfect
… where there are many rows per 1 customer and the table records the dates (Rental_Date) and the movies (MovieTitle) the customer rented on those dates. Sometimes a customer rents more than 1 movie, ex: customer 12345. This customer rented two movies on March 13, 2013.
What I want to do is to create a new column called “Switch_Status” that records whether the customer switched movies after whatever their initial movie/movies selection was. For example, customer 12345 initially rented 2 movies: Gone Girl and Titanic. I want the new column to record “Switch_Status” = “yes” because in the time Rental_Date+Grace_Period, they eventually stopped renting Gone Girl and Titanic, and instead rented Hunger Games and Titanic (February 11, 2015).
The second example customer 67890 is easier to record. Here, this customer initially rented Pitch Perfect and then after Rental_Date+Grace Period, there is evidence that they switched the movie they initially rented (Pitch Perfect), and instead rented Booksmart. So for this customer also, “Switch_Status” would be “yes”.
And obviously, if there is no evidence of switch, I want “Switch_Status” to equal “no”.
Is there any easy way to do this?
Thank you!
I'm wouldn't say this is easy, but it can be done. Here are the results, below, and at the very bottom, the SAS code. Note: I changed the last two rental dates for customer 12345 so that they would be within the 30 day grace period.
Is this what you were looking for?
Jim
DATA Cust_Movie_Data_Switch;
DROP _:;
SET Cust_Movie_Data;
BY Customer Rental_Date;
ARRAY Prior_Movies [*] $32 _Prior_Movie1 _Prior_Movie2;
FORMAT _Prior_Rental_Date DATE9.;
FORMAT _Prior2_Rental_Date DATE9.;
FORMAT _Temp_Date DATE9.;
LENGTH _Prior_Movie1 $32;
LENGTH _Prior_Movie2 $32;
LENGTH _Prior2_Movie1 $32;
LENGTH _Prior2_Movie2 $32;
LENGTH _Temp1 $32;
LENGTH _Temp2 $32;
Switch_Status = 'No ';
_Prior_Customer = LAG(Customer);
_Prior2_Customer = LAG2(Customer);
_Prior_Rental_Date = LAG(Rental_Date);
_Prior2_Rental_Date = LAG2(Rental_Date);
_Prior_Movie1 = LAG(Movie1);
_Prior_Movie2 = LAG(Movie2);
_Prior2_Movie1 = LAG2(Movie1);
_Prior2_Movie2 = LAG2(Movie2);
_Prior_Rental_Date = LAG(Rental_Date);
_Temp1 = Prior_Movies[1];
_Temp2 = Prior_Movies[2];
IF FIRST.Customer THEN
DO;
CALL MISSING( _Prior_Customer
,_Prior_Rental_Date
,_Prior2_Rental_Date
,_Prior_Movie1
,_Prior_Movie2
,_Prior_Rental_Date);
END;
IF Rental_Date = _Prior_Rental_Date THEN
IF _Prior2_Customer = Customer THEN
DO;
_Prior_Rental_Date = _Prior2_Rental_Date;
_Prior_Movie1 = _Prior2_Movie1;
_Prior_Movie2 = _Prior2_Movie2;
END;
ELSE
DO;
CALL MISSING( _Prior_Rental_Date
,_Prior_Movie1
,_Prior_Movie2);
END;
IF NOT MISSING(_Prior_Rental_Date) THEN
_Temp_Date = INTNX('DAYS', _Prior_Rental_Date, Grace_Period);
IF NOT MISSING(_Prior_Rental_Date) AND
Rental_Date < INTNX('DAYS', _Prior_Rental_Date, Grace_Period) THEN
IF Movie1 NOT IN Prior_Movies OR
Movie2 NOT IN Prior_Movies THEN
DO;
Switch_Status = 'Yes';
END;
RUN;
Thank you Jim. I really appreciate your taking a look.
The only thing is that - in the case of the last two rental dates for customer 12345 being way outside the 30 days, I still would want this customer to be flagged as "Switch_Status" = "yes".
My apologies. I might have not made it clear. So, if the customer has a different movie/movies within Rental_Date+Grace_Period, then it's no switch. It is a switch if they have different movie/movies AFTER Rental_Date+Grace_Period. Does that make sense?
Ah, ah, I see. I misunderstood. I assumed the opposite, that within the grace period was considered a switch.
Let me reverse the logic.
Jim
Yes, all I had to do was reverse the less than/greater than symbol. Now record 13 also shows as a switch ("Yes"). Record 13 is a day or two past the grace period.
Results and code are below.
Jim
DATA Cust_Movie_Data;
INFILE DATALINES
DSD DLM='09'x MISSOVER;
INPUT Customer $
Start_Date : ANYDTDTE10.
Rental_Date : ANYDTDTE10.
Grace_Period
MovieTitle : $32.
Movie1 : $32.
Movie2 : $32.
Movies_List : $64.
;
FORMAT Start_Date DATE9.;
FORMAT Rental_Date DATE9.;
DATALINES;
12345 13-Mar-13 13-Mar-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 13-Mar-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 8-Apr-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 8-Apr-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 4-May-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 4-May-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 11-Feb_15 30 Hunger Games Hunger Games Titanic Hunger Games,Titanic
12345 13-Mar-13 11-Feb_15 30 Titanic Hunger Games Titanic Hunger Games,Titanic
67890 10-Aug-14 10-Aug-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 8-Sep-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 17-Oct-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 29-Nov-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 31-Dec-14 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 10-Feb-15 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 22-May-15 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 29-Jun-15 30 Booksmart Booksmart Booksmart
;
RUN;
PROC SORT DATA=Cust_Movie_Data;
BY Customer Rental_Date;
RUN;
DATA Cust_Movie_Data_Switch;
DROP _:;
SET Cust_Movie_Data;
BY Customer Rental_Date;
ARRAY Prior_Movies [*] $32 _Prior_Movie1 _Prior_Movie2;
FORMAT _Prior_Rental_Date DATE9.;
FORMAT _Prior2_Rental_Date DATE9.;
FORMAT _Temp_Date DATE9.;
LENGTH _Prior_Movie1 $32;
LENGTH _Prior_Movie2 $32;
LENGTH _Prior2_Movie1 $32;
LENGTH _Prior2_Movie2 $32;
LENGTH _Temp1 $32;
LENGTH _Temp2 $32;
Switch_Status = 'No ';
_Prior_Customer = LAG(Customer);
_Prior2_Customer = LAG2(Customer);
_Prior_Rental_Date = LAG(Rental_Date);
_Prior2_Rental_Date = LAG2(Rental_Date);
_Prior_Movie1 = LAG(Movie1);
_Prior_Movie2 = LAG(Movie2);
_Prior2_Movie1 = LAG2(Movie1);
_Prior2_Movie2 = LAG2(Movie2);
_Prior_Rental_Date = LAG(Rental_Date);
_Temp1 = Prior_Movies[1];
_Temp2 = Prior_Movies[2];
IF FIRST.Customer THEN
DO;
CALL MISSING( _Prior_Customer
,_Prior_Rental_Date
,_Prior2_Rental_Date
,_Prior_Movie1
,_Prior_Movie2
,_Prior_Rental_Date);
END;
IF Rental_Date = _Prior_Rental_Date THEN
IF _Prior2_Customer = Customer THEN
DO;
_Prior_Rental_Date = _Prior2_Rental_Date;
_Prior_Movie1 = _Prior2_Movie1;
_Prior_Movie2 = _Prior2_Movie2;
END;
ELSE
DO;
CALL MISSING( _Prior_Rental_Date
,_Prior_Movie1
,_Prior_Movie2);
END;
IF NOT MISSING(_Prior_Rental_Date) THEN
_Temp_Date = INTNX('DAYS', _Prior_Rental_Date, Grace_Period);
IF NOT MISSING(_Prior_Rental_Date) AND
Rental_Date > INTNX('DAYS', _Prior_Rental_Date, Grace_Period) THEN
IF Movie1 NOT IN Prior_Movies OR
Movie2 NOT IN Prior_Movies THEN
DO;
Switch_Status = 'Yes';
END;
RUN;
Yes, probably the better option. With such a small dataset, I just threw in a SORT, but perhaps that's a bit lazy on my part. 😳
Jim
Interesting how it's working for you. I see the output you got. That's exactly what I'd like. But for me, even running the code to create Cust_Movie_Data only produces an empty table.
I tried reworking your code for my real data, and everyone ends up being Switch_Status = No, even though it is obvious some should be 'Yes', just by looking at their data row by row.
That's very very strange. You cut in paste every line of code even the DATALINES?
Are you using SAS Studio? Display Manager? Enterprise Guide? Which one are you using?
Try closing your session, opening a brand new session and cutting and pasting in the below code. I changed the DATALINES back to February 2015, so it's important to get these DATALINES, not the ones from before where I had the requirements backwards.
Jim
DATA Cust_Movie_Data;
INFILE DATALINES
DSD DLM='09'x MISSOVER;
INPUT Customer $
Start_Date : ANYDTDTE10.
Rental_Date : ANYDTDTE10.
Grace_Period
MovieTitle : $32.
Movie1 : $32.
Movie2 : $32.
Movies_List : $64.
;
FORMAT Start_Date DATE9.;
FORMAT Rental_Date DATE9.;
DATALINES;
12345 13-Mar-13 13-Mar-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 13-Mar-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 8-Apr-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 8-Apr-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 4-May-13 30 Gone Girl Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 4-May-13 30 Titanic Gone Girl Titanic Gone Girl,Titanic
12345 13-Mar-13 11-Feb_15 30 Hunger Games Hunger Games Titanic Hunger Games,Titanic
12345 13-Mar-13 11-Feb_15 30 Titanic Hunger Games Titanic Hunger Games,Titanic
67890 10-Aug-14 10-Aug-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 8-Sep-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 17-Oct-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 29-Nov-14 30 Pitch Perfect Pitch Perfect Pitch Perfect
67890 10-Aug-14 31-Dec-14 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 10-Feb-15 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 22-May-15 30 Booksmart Booksmart Booksmart
67890 10-Aug-14 29-Jun-15 30 Booksmart Booksmart Booksmart
;
RUN;
PROC SORT DATA=Cust_Movie_Data;
BY Customer Rental_Date;
RUN;
DATA Cust_Movie_Data_Switch;
DROP _:;
SET Cust_Movie_Data;
BY Customer Rental_Date;
ARRAY Prior_Movies [*] $32 _Prior_Movie1 _Prior_Movie2;
FORMAT _Prior_Rental_Date DATE9.;
FORMAT _Prior2_Rental_Date DATE9.;
FORMAT _Temp_Date DATE9.;
LENGTH _Prior_Movie1 $32;
LENGTH _Prior_Movie2 $32;
LENGTH _Prior2_Movie1 $32;
LENGTH _Prior2_Movie2 $32;
LENGTH _Temp1 $32;
LENGTH _Temp2 $32;
Switch_Status = 'No ';
_Prior_Customer = LAG(Customer);
_Prior2_Customer = LAG2(Customer);
_Prior_Rental_Date = LAG(Rental_Date);
_Prior2_Rental_Date = LAG2(Rental_Date);
_Prior_Movie1 = LAG(Movie1);
_Prior_Movie2 = LAG(Movie2);
_Prior2_Movie1 = LAG2(Movie1);
_Prior2_Movie2 = LAG2(Movie2);
_Prior_Rental_Date = LAG(Rental_Date);
_Temp1 = Prior_Movies[1];
_Temp2 = Prior_Movies[2];
IF FIRST.Customer THEN
DO;
CALL MISSING( _Prior_Customer
,_Prior_Rental_Date
,_Prior2_Rental_Date
,_Prior_Movie1
,_Prior_Movie2
,_Prior_Rental_Date);
END;
IF Rental_Date = _Prior_Rental_Date THEN
IF _Prior2_Customer = Customer THEN
DO;
_Prior_Rental_Date = _Prior2_Rental_Date;
_Prior_Movie1 = _Prior2_Movie1;
_Prior_Movie2 = _Prior2_Movie2;
END;
ELSE
DO;
CALL MISSING( _Prior_Rental_Date
,_Prior_Movie1
,_Prior_Movie2);
END;
IF NOT MISSING(_Prior_Rental_Date) THEN
_Temp_Date = INTNX('DAYS', _Prior_Rental_Date, Grace_Period);
IF NOT MISSING(_Prior_Rental_Date) AND
Rental_Date > INTNX('DAYS', _Prior_Rental_Date, Grace_Period) THEN
IF Movie1 NOT IN Prior_Movies OR
Movie2 NOT IN Prior_Movies THEN
DO;
Switch_Status = 'Yes';
END;
RUN;
hmmm interesting... I commented out the 3rd line:
DATA Cust_Movie_Data; INFILE DATALINES ; /* DSD DLM='09'x MISSOVER;*/
... and Cust_Movie_Data produced properly.
I am using SAS 9.4 (not SAS Studio, not Enterprise Guide).
@iced_tea wrote:
hmmm interesting... I commented out the 3rd line:
DATA Cust_Movie_Data; INFILE DATALINES ; /* DSD DLM='09'x MISSOVER;*/... and Cust_Movie_Data produced properly.
I am using SAS 9.4 (not SAS Studio, not Enterprise Guide).
Well, for your next SAS trivia challenge: You're using Display Manager. Display Manager is the Windows front end to SAS. SAS runs on mainframes, Unix, Linux, Posix (and probably others), so while a lot of people describe Display Manager as "SAS" or "Base SAS" really SAS is a much broader thing. OK, enough SAS trivia. 🙂
I think that makes sense that you would have to comment out that line. I think what's happening is that when I cut and paste into the code window that tabs are getting converted to spaces. When I copied your data over to my system, there were tabs in the data, so I added the DSD and DLM parameters.
Jim
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.