Hi, suppose I have the following Transaction table that records customers' spendings.
CustID# |
|
Home City |
Trip City |
Amt |
DateTime |
1 |
|
NY |
NY |
100 |
20161201095033 |
1 |
|
NY |
LA |
200 |
20161203103032 |
1 |
|
NY |
LA |
300 |
20161204092330 |
1 |
|
NY |
NY |
500 |
20161210172538 |
1 |
|
NY |
BOS |
100 |
20170105113032 |
1 |
|
NY |
LA |
200 |
20170108072330 |
1 |
|
NY |
NY |
300 |
20170108172538 |
I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.
Cust# |
Home City |
Trip City |
FirstDateTime |
LastDateTime |
TotalAmt |
1 |
NY |
LA |
20161203103032 |
20161204092330 |
500 |
1 |
NY |
BOS |
20170105113032 |
20170105113032 |
100 |
1 |
NY |
LA |
20170108072330 |
20170108072330 |
200 |
I don't know how to write a query to accomplish this requirement. Please help.
Thank you.
@wtay wrote:
Hi, suppose I have the following Transaction table that records customers' spendings.
CustID#
Home City
Trip City
Amt
DateTime
1
NY
NY
100
20161201095033
1
NY
LA
200
20161203103032
1
NY
LA
300
20161204092330
1
NY
NY
500
20161210172538
1
NY
BOS
100
20170105113032
1
NY
LA
200
20170108072330
1
NY
NY
300
20170108172538
I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.
Cust#
Home City
Trip City
FirstDateTime
LastDateTime
TotalAmt
1
NY
LA
20161203103032
20161204092330
500
1
NY
BOS
20170105113032
20170105113032
100
1
NY
LA
20170108072330
20170108072330
200
I don't know how to write a query to accomplish this requirement. Please help.
Thank you.
What have you tried, what do you have so far?
When you say "query" do you mean you want a Proc SQL solution?
Since your "trip" does not have an actual identifier adding that could be the first step and that may require datastep processing since order of the data appears to be needed to identify different "trips", ie LA, then BOS, then back to LA as a third trip.
Run a data step with
by cust_id trip_city;
Retain a variable for start_datetime and a variable for total_amount.
At first.trip_city, set the start_datetime and set the total_amount to zero.
Add amd to total_amount.
At last.trip_city, set the end_datetime and do an output.
Keep only the variables needed.
Thank you. I think there's just no way simply writing a Proc SQL to accomplish the task.
I am totally new to data step... relatively new to SAS. Could you write some data step code to show me so that I can learn from it?
Thanks.
SAS can do this pretty easily, although it's not considered a query:
proc summary data=have nway;
var Amt datetime;
class custid HomeCity TripCity;
where HomeCity ne TripCity;
output out=want (drop=_type_ _freq_) sum(Amt)=TotalAmt min(datetime) = FirstDateTime max(datetime) = LastDateTime;
run;
Unfortunately, it's not clear what your DATETIME variable actually contains, so computing a duration can only take place once we understand what is in that variable. But the above code should be fine whether DATETIME contains a 14-digit integer or whether it contains a true SAS datetime value.
proc sql;
select custID,
HomeCity,
TripCity,
min(DateTime) as FirstDateTime,
max(DateTime) as LastDateTime,
sum(Amt) as TotalAmt
from Transaction_Table
group by custID,
HomeCity,
TripCity;
quit;
Thanks for suggesting a solution. I actually started with something similar to yours, which is as follows:
proc sql;
create table TripDetails_Table AS
select CustID#,
Home City,
Trip City,
MIN(DateTime) AS FirstDateTime, // DateTime is YYYYMMDDHHMMSS
MAX(DateTime) AS LastDateTime,
SUM(Amt) AS TotalAmt
from Transaction Table
where Trip City ne Home City
group by CustID#, Trip City;
quit;
Given that CustID# and Home City is a 1-1 mapping, I suppose grouping by CustID#, Home City, Trip City is the same as grouping by Card#, Trip City, right?
Based on the above query, I think SAS will first eliminate the records whereby Home City = Trip City, then it will group the CustID# followed by Trip City together like this:
CustID# |
| Home City | Trip City | Amt | DateTime |
1 |
| NY | LA | 200 | 20161203103032 |
1 |
| NY | LA | 300 | 20161204092330 |
1 |
| NY | LA | 200 | 20170108072330 |
1 |
| NY | BOS | 100 | 20170105113032 |
|
|
|
|
|
|
Then, the result would be:
CustID# | Home City | Trip City | FirstDateTime | LastDateTime | TotalAmt |
1 | NY | LA | 20161203103032 | 20170108072330 | 700 |
1 | NY | BOS | 20170105113032 | 20170105113032 | 100 |
which is not what I want.
I want the following:
Cust# | Home City | Trip City | FirstDateTime | LastDateTime | TotalAmt |
1 | NY | LA | 20161203103032 | 20161204092330 | 500 |
1 | NY | BOS | 20170105113032 | 20170105113032 | 100 |
1 | NY | LA | 20170108072330 | 20170108072330 | 200 |
Here is one way to add a trip number.
data junk; input id homecity $ tripcity $; datalines; 1 NY NY 1 NY LA 1 NY LA 1 NY NY 1 NY BO 1 NY LA 1 NY Ny ; run; data temp; set Junk; by id; retain TripNum; Lagtrip=lag(TripCity); if first.id then tripNum=0; If LagTrip ne tripcity then Tripnum+1; if tripcity ne homecity then ActualTripNum=TripNum; drop lagtrip tripnum; run;
Then you would use group by ID, ActualTripNum in the query.
Assumes the data is sorted by Id Datetime.
Try this.....
proc sql;
create table WANT(drop=year) as
select
CUSTID,
HOME,
TRIP,
year(datetime_var) as year,
sum(amt) as tot_sum,
min(datetime_var) as firstdttm,
max(datetime_var) as lastdttm
from HAVE
where HOME ^= TRIP
group by
CUSTID, HOME, TRIP, calculated YEAR
order by
CUSTID, calculated firstdttm;
quit;
Hope this helps..
I tried and got exactly the same as your "WANT" output. However, i expect (not assuming) that you have successfully converted your datetime value correctly. I used the year function as i only used date (no time information - i'm lazy to infuse the time value). You can tweak this a little by using (year(datepart(datetime_var)) --where you extract date from the datetime value first then extract the year. The functions MIN and MAX will pick the correct value as records with the same date will be unique because of time information forces it to be unique (split seconds makes the BIG difference, if its there).
From the data you've shown, how do you know that which NY/LA data to group, besides the order they're in? Or is that all you have to go on....that seems a bit sloppy, what if there were back to back trips to the same place?
Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.
Will
@wtay wrote:
Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.
Will
That's nice for context, but doesn't explain your data.
Looking at your data, what are the exact rules for defining a single trip?
Is it based on the order you currently presented? Can you guarantee its always in the correct order? How did you create that order in the first place, is there some other variables you're using?
@wtay wrote:
Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.
Will
Did you look at my code example?
If you didn't like it because it did not have all of your variables then post the data in a format that I don't have to retype everything to create a working data set. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
@wtay wrote:
Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.
Will
I will expand on my previous post.
First, sort your existing dataset by id and datetime.
Then run a data step
data want;
set have;
with
by cust_id trip_city notsorted;
(the notsorted option for trip_city is essential)
Retain a variable for start_datetime and a variable for total_amount:
retain start_datetime total_amount;
At first.trip_city, set the start_datetime and set the total_amount to zero:
if first.trip_city
then do;
start_datetime = datetime;
total_amount = 0;
end;
Add amd to total_amount.
total_amount + amt;
At last.trip_city, set the end_datetime and do an output.
if last.trip_city
then do;
end_datetime = datetime;
output;
end;
Keep only the variables needed:
keep id trip_city start_datetime end_datetime total_amount;
Additionally, end the data step:
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.