How to summarize data

Reply
New Contributor
Posts: 4

How to summarize data

[ Edited ]

Hi, suppose I have the following Transaction table that records customers' spendings.  

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

NY

100

20161201095033

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

NY

500

20161210172538

1

 

NY

BOS

100

20170105113032

1

 

NY

LA

200

20170108072330

1

 

NY

NY

300

20170108172538

 

I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.

 

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

 

I don't know how to write a query to accomplish this requirement. Please help.

 

Thank you.

Super User
Posts: 23,663

Re: Need help writing a query !!!


wtay wrote:

Hi, suppose I have the following Transaction table that records customers' spendings.  

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

NY

100

20161201095033

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

NY

500

20161210172538

1

 

NY

BOS

100

20170105113032

1

 

NY

LA

200

20170108072330

1

 

NY

NY

300

20170108172538

 

I want to find out trip details about each customer. In particular, when a customer is away from his/her home city, what is the total amount that he/she spent in the city, as well as the dates of the first and last transactions that were made in that city.

 

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

 

I don't know how to write a query to accomplish this requirement. Please help.

 

Thank you.


What have you tried, what do you have so far?

Super User
Posts: 13,498

Re: Need help writing a query !!!

When you say "query" do you mean you want a Proc SQL solution?

Since your "trip" does not have an actual identifier adding that could be the first step and that may require datastep processing since order of the data appears to be needed to identify different "trips", ie LA, then BOS, then back to LA as a third trip.

Super User
Posts: 10,209

Re: Need help writing a query !!!

Run a data step with

by cust_id trip_city;

Retain a variable for start_datetime and a variable for total_amount.

At first.trip_city, set the start_datetime and set the total_amount to zero.

Add amd to total_amount.

At last.trip_city, set the end_datetime and do an output.

Keep only the variables needed.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
New Contributor
Posts: 4

Re: Need help writing a query !!!

Posted in reply to KurtBremser

Thank you. I think there's just no way simply writing a Proc SQL to accomplish the task.

 

I am totally new to data step... relatively new to SAS. Could you write some data step code to show me so that I can learn from it?

 

Thanks.

Super User
Posts: 6,751

Re: How to summarize data

SAS can do this pretty easily, although it's not considered a query:

 

proc summary data=have nway;

   var Amt datetime;

   class custid HomeCity TripCity;

   where HomeCity ne TripCity;

   output out=want (drop=_type_ _freq_) sum(Amt)=TotalAmt min(datetime) = FirstDateTime max(datetime) = LastDateTime;

run;

 

Unfortunately, it's not clear what your DATETIME variable actually contains, so computing a duration can only take place once we understand what is in that variable.  But the above code should be fine whether DATETIME contains a 14-digit integer or whether it contains a true SAS datetime value.

New Contributor
Posts: 4

Re: How to summarize data

proc sql;
select	custID,
		HomeCity,
		TripCity,
		min(DateTime)	as	FirstDateTime,
		max(DateTime)	as	LastDateTime,
		sum(Amt)		as	TotalAmt
from Transaction_Table
group by custID,
		HomeCity,
		TripCity;
quit;
New Contributor
Posts: 4

Re: How to summarize data

Thanks for suggesting a solution. I actually started with something similar to yours, which is as follows:

 

proc sql;

create table TripDetails_Table AS

select   CustID#,

         Home City,

         Trip City,

         MIN(DateTime) AS FirstDateTime,    // DateTime is YYYYMMDDHHMMSS

         MAX(DateTime) AS LastDateTime,

         SUM(Amt) AS TotalAmt

from Transaction Table

where Trip City ne Home City

group by CustID#, Trip City;

quit;

 

Given that CustID# and Home City is a 1-1 mapping, I suppose grouping by CustID#, Home City, Trip City is the same as grouping by Card#, Trip City, right?

 

Based on the above query, I think SAS will first eliminate the records whereby Home City = Trip City, then it will group the CustID# followed by Trip City together like this:

 

CustID#

 

Home City

Trip City

 Amt

DateTime

1

 

NY

LA

200

20161203103032

1

 

NY

LA

300

20161204092330

1

 

NY

LA

200

20170108072330

1

 

NY

BOS

100

20170105113032

 

 

 

 

 

 

 

Then, the result would be:

CustID#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20170108072330

700

1

NY

BOS

20170105113032

20170105113032

100

which is not what I want.

 

I want the following:

Cust#   

Home City

Trip City

FirstDateTime

LastDateTime

TotalAmt

1

NY

LA

20161203103032

20161204092330

500

1

NY

BOS

20170105113032

20170105113032

100

1

NY

LA

20170108072330

20170108072330

200

Super User
Posts: 13,498

Re: How to summarize data

Here is one way to add a trip number.

data junk;
   input id homecity $ tripcity $;
datalines;
1  NY  NY
1  NY LA
1  NY LA
1  NY NY
1  NY BO
1  NY LA
1  NY Ny
;
run;

data temp;
   set Junk;
   by  id;
   retain TripNum;
   Lagtrip=lag(TripCity);
   if first.id then tripNum=0;
   If LagTrip ne tripcity then Tripnum+1;
   if tripcity ne homecity then ActualTripNum=TripNum;
   drop lagtrip tripnum;
run;

Then you would use group by ID, ActualTripNum in the query.

 

Assumes the data is sorted by Id Datetime.

Frequent Contributor
Posts: 116

Re: How to summarize data

[ Edited ]

Try this.....

 

proc sql;
  create table WANT(drop=year) as
select
  CUSTID,
  HOME,
TRIP,
year(datetime_var) as year,
sum(amt) as tot_sum,
min(datetime_var) as firstdttm,
max(datetime_var) as lastdttm
from HAVE
  where HOME ^= TRIP
group by
  CUSTID, HOME, TRIP, calculated YEAR
order by
  CUSTID, calculated firstdttm;
quit;

 

Hope this helps..

 

I tried and got exactly the same as your "WANT" output.  However, i expect (not assuming) that you have successfully converted your datetime value correctly.  I used the year function as i only used date (no time information - i'm lazy to infuse the time value).  You can tweak this a little by using (year(datepart(datetime_var)) --where you extract date from the datetime value first then extract the year. The functions MIN and MAX will pick the correct value as records with the same date will be unique because of time information forces it to be unique (split seconds makes the BIG difference, if its there).

Super User
Posts: 23,663

Re: How to summarize data

From the data you've shown, how do you know that which NY/LA data to group, besides the order they're in? Or is that all you have to go on....that seems a bit sloppy, what if there were back to back trips to the same place?

New Contributor
Posts: 4

Re: How to summarize data

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will

Super User
Posts: 23,663

Re: How to summarize data


wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


That's nice for context, but doesn't explain your data. 

Looking at your data, what are the exact rules for defining a single trip?

 

Is it based on the order you currently presented? Can you guarantee its always in the correct order? How did you create that order in the first place, is there some other variables you're using?

 

 

 

Super User
Posts: 13,498

Re: How to summarize data


wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


Did you look at my code example?

 

If you didn't like it because it did not have all of your variables then post the data in a format that I don't have to retype everything to create a working data set. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Super User
Posts: 10,209

Re: How to summarize data


wtay wrote:

Assuming that the customer spends money when he/she is back in the home city, I am interested in finding a trip/series of trips in the period when he/she is away from the Home City. During this period, he/she may travel to multiple cities and each travel will be considered a trip.

 

Will


I will expand on my previous post.

First, sort your existing dataset by id and datetime.

Then run a data step

data want;
set have;

with

by cust_id trip_city notsorted;

(the notsorted option for trip_city is essential)

Retain a variable for start_datetime and a variable for total_amount:

retain start_datetime total_amount;

At first.trip_city, set the start_datetime and set the total_amount to zero:

if first.trip_city
then do;
  start_datetime = datetime;
  total_amount = 0;
end;

Add amd to total_amount.

total_amount + amt;

At last.trip_city, set the end_datetime and do an output.

if last.trip_city
then do;
  end_datetime = datetime;
  output;
end;

Keep only the variables needed:

keep id trip_city start_datetime end_datetime total_amount;

Additionally, end the data step:

run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Ask a Question
Discussion stats
  • 14 replies
  • 251 views
  • 0 likes
  • 7 in conversation