BookmarkSubscribeRSS Feed
wtay
Calcite | Level 5

Hi, suppose I have the following Transaction table that records customers' spendings.  

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

NY

100

20161201095033

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

NY

500

20161210172538

1

 

NY

BOS

100

20170105113032

1

 

NY

LA

200

20170108072330

1

 

NY

NY

300

20170108172538

 

I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.

 

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

 

I don't know how to write a query to accomplish this requirement. Please help.

 

Thank you.

14 REPLIES 14
Reeza
Super User

@wtay wrote:

Hi, suppose I have the following Transaction table that records customers' spendings.  

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

NY

100

20161201095033

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

NY

500

20161210172538

1

 

NY

BOS

100

20170105113032

1

 

NY

LA

200

20170108072330

1

 

NY

NY

300

20170108172538

 

I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.

 

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

 

I don't know how to write a query to accomplish this requirement. Please help.

 

Thank you.


What have you tried, what do you have so far?

ballardw
Super User

When you say "query" do you mean you want a Proc SQL solution?

Since your "trip" does not have an actual identifier adding that could be the first step and that may require datastep processing since order of the data appears to be needed to identify different "trips", ie LA, then BOS, then back to LA as a third trip.

Kurt_Bremser
Super User

Run a data step with

by cust_id trip_city;

Retain a variable for start_datetime and a variable for total_amount.

At first.trip_city, set the start_datetime and set the total_amount to zero.

Add amd to total_amount.

At last.trip_city, set the end_datetime and do an output.

Keep only the variables needed.

wtay
Calcite | Level 5

Thank you. I think there's just no way simply writing a Proc SQL to accomplish the task.

 

I am totally new to data step... relatively new to SAS. Could you write some data step code to show me so that I can learn from it?

 

Thanks.

Astounding
PROC Star

SAS can do this pretty easily, although it's not considered a query:

 

proc summary data=have nway;

   var Amt datetime;

   class custid HomeCity TripCity;

   where HomeCity ne TripCity;

   output out=want (drop=_type_ _freq_) sum(Amt)=TotalAmt min(datetime) = FirstDateTime max(datetime) = LastDateTime;

run;

 

Unfortunately, it's not clear what your DATETIME variable actually contains, so computing a duration can only take place once we understand what is in that variable.  But the above code should be fine whether DATETIME contains a 14-digit integer or whether it contains a true SAS datetime value.

sotojcr
Obsidian | Level 7
proc sql;
select	custID,
		HomeCity,
		TripCity,
		min(DateTime)	as	FirstDateTime,
		max(DateTime)	as	LastDateTime,
		sum(Amt)		as	TotalAmt
from Transaction_Table
group by custID,
		HomeCity,
		TripCity;
quit;
wtay
Calcite | Level 5

Thanks for suggesting a solution. I actually started with something similar to yours, which is as follows:

 

proc sql;

create table TripDetails_Table AS

select   CustID#,

         Home City,

         Trip City,

         MIN(DateTime) AS FirstDateTime,    // DateTime is YYYYMMDDHHMMSS

         MAX(DateTime) AS LastDateTime,

         SUM(Amt) AS TotalAmt

from Transaction Table

where Trip City ne Home City

group by CustID#, Trip City;

quit;

 

Given that CustID# and Home City is a 1-1 mapping, I suppose grouping by CustID#, Home City, Trip City is the same as grouping by Card#, Trip City, right?

 

Based on the above query, I think SAS will first eliminate the records whereby Home City = Trip City, then it will group the CustID# followed by Trip City together like this:

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

LA

200

20170108072330

1

 

NY

BOS

100

20170105113032

 

 

 

 

 

 

 

Then, the result would be:

CustID#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20170108072330

700

1

NY

BOS

20170105113032

20170105113032

100

which is not what I want.

 

I want the following:

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

ballardw
Super User

Here is one way to add a trip number.

data junk;
   input id homecity $ tripcity $;
datalines;
1  NY  NY
1  NY LA
1  NY LA
1  NY NY
1  NY BO
1  NY LA
1  NY Ny
;
run;

data temp;
   set Junk;
   by  id;
   retain TripNum;
   Lagtrip=lag(TripCity);
   if first.id then tripNum=0;
   If LagTrip ne tripcity then Tripnum+1;
   if tripcity ne homecity then ActualTripNum=TripNum;
   drop lagtrip tripnum;
run;

Then you would use group by ID, ActualTripNum in the query.

 

Assumes the data is sorted by Id Datetime.

ShiroAmada
Lapis Lazuli | Level 10

Try this.....

 

proc sql;
  create table WANT(drop=year) as
select
  CUSTID,
  HOME,
TRIP,
year(datetime_var) as year,
sum(amt) as tot_sum,
min(datetime_var) as firstdttm,
max(datetime_var) as lastdttm
from HAVE
  where HOME ^= TRIP
group by
  CUSTID, HOME, TRIP, calculated YEAR
order by
  CUSTID, calculated firstdttm;
quit;

 

Hope this helps..

 

I tried and got exactly the same as your "WANT" output.  However, i expect (not assuming) that you have successfully converted your datetime value correctly.  I used the year function as i only used date (no time information - i'm lazy to infuse the time value).  You can tweak this a little by using (year(datepart(datetime_var)) --where you extract date from the datetime value first then extract the year. The functions MIN and MAX will pick the correct value as records with the same date will be unique because of time information forces it to be unique (split seconds makes the BIG difference, if its there).

Reeza
Super User

From the data you've shown, how do you know that which NY/LA data to group, besides the order they're in? Or is that all you have to go on....that seems a bit sloppy, what if there were back to back trips to the same place?

wtay
Calcite | Level 5

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will

Reeza
Super User

@wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


That's nice for context, but doesn't explain your data. 

Looking at your data, what are the exact rules for defining a single trip?

 

Is it based on the order you currently presented? Can you guarantee its always in the correct order? How did you create that order in the first place, is there some other variables you're using?

 

 

 

ballardw
Super User

@wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


Did you look at my code example?

 

If you didn't like it because it did not have all of your variables then post the data in a format that I don't have to retype everything to create a working data set. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Kurt_Bremser
Super User

@wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


I will expand on my previous post.

First, sort your existing dataset by id and datetime.

Then run a data step

data want;
set have;

with

by cust_id trip_city notsorted;

(the notsorted option for trip_city is essential)

Retain a variable for start_datetime and a variable for total_amount:

retain start_datetime total_amount;

At first.trip_city, set the start_datetime and set the total_amount to zero:

if first.trip_city
then do;
  start_datetime = datetime;
  total_amount = 0;
end;

Add amd to total_amount.

total_amount + amt;

At last.trip_city, set the end_datetime and do an output.

if last.trip_city
then do;
  end_datetime = datetime;
  output;
end;

Keep only the variables needed:

keep id trip_city start_datetime end_datetime total_amount;

Additionally, end the data step:

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1194 views
  • 0 likes
  • 7 in conversation