06-20-2017 02:42 PM
I have a table with customer transaction history and I am trying to write a query which brings back these variables:
So, I start with a table (transaction) which looks like this:
And I am looking for a result which looks like this:
I have to query the data from an Oracle Environment. I was able to figure out how to write a query which pulls in the first purchase date (see below ) but I’m having trouble figuring how to incorporate/add/pull the second purchase date and the last purchase date variables.
proc sql; connect to oracle; create table want as select customer_id, datepart(First_purchase_Date) as First_purchase_Date format=date9. from connection to oracle ( select t.customer_id, t.transaction_date AS First_purchase_Date from transaction t inner join (select customer_id, min(transaction_date) as First_purchase_Date from sas_transaction where sales_credit in (1) and business_unit in (3) group by customer_id having max(case when sales_credit in (1) and transaction_date >= '01JAN2010' and transaction_date <= '20JUN2017' then 1 else 0 end)=1 ) f on (t.customer_id = f.customer_id and t.transaction_date = f.First_purchase_Date) where t.business_unit in (3) and t.sales_credit in (1) group by t.First_purchase_Date, t.transaction_date ); disconnect from oracle; quit;
Any assistance will be greatly aperciated.
06-20-2017 03:17 PM - edited 06-20-2017 03:40 PM
My experience with extracting data from relational databases like Oracle is that you are better off doing math like this (and also means, sums, counts, standard deviations, etc.) outside of SQL. Better off? Yes, in terms of speed, in terms of code simplicity, as well as probably for other reasons.
UNTESTED CODE (assumes data is sorted by customer and date)
data first second last; set oracle_extract; by customer date; if first.customer then purchase_count=0; purchase_count+1; if purchase_count=1 then output first; else if purchase_count=2 then output second; if last.customer then output last; run;
Once you have the three output data sets, FIRST, SECOND and LAST, it's easy to merge them to produce the final data set you described above.
06-20-2017 04:13 PM - edited 06-20-2017 07:57 PM
It is much better off doing this in datastep and code by @PaigeMiller is much cleaner and I think will be efficient too. But you may able to get the answer by sql with below code. Sorry I just saw that you are running this query in an explicit pass through and sql query I gave will not in explicit pass through as it has sas compoenents in it. Below query should work in explicit pass through.
data abc; input transaction_id transaction_date:mmddyy10.; format transaction_date date9.; datalines; 124 6/14/2017 656 6/14/2017 124 4/17/2017 656 7/23/2016 656 4/17/2016 124 3/15/2016 124 12/20/2014 656 4/16/2014 333 8/5/2013 454 11/5/2012 656 1/20/2011 454 10/22/2010 233 4/30/2010 ; proc sql; select distinct bde.transaction_id, first_transaction_date , second_transaction_date , last_transaction_date from (select a.transaction_id, first_transaction_date, last_transaction_date from (select transaction_id, transaction_date as first_transaction_date from abc group by transaction_id having transaction_date = min(transaction_date))a inner join (select transaction_id, transaction_date as last_transaction_date from abc group by transaction_id having transaction_date = max(transaction_date))b on a.transaction_id =b.transaction_id)bde left join (select a.transaction_id, min(a.transaction_date) as second_transaction_date from (select transaction_id, transaction_date from abc)a inner join (select transaction_id, min(transaction_date)as min_trasaction_date from abc b group by transaction_id)b on a.transaction_id = b.transaction_id and a.transaction_date > min_trasaction_date)cvcv on bde.transaction_id =cvcv.transaction_id;