BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kalai2008
Pyrite | Level 9


I have a huge dataset. Trying something like below.

I have 10 customers who gets repeated in 3 months. Trying to capture the repeated customers in each month and new customers.

For Jan (Cust 1,cust3, cust4 repeats in other months)--- Overlap

For Jan( Cust 2 and cust 10 are new)

 

Given Dataset:

 

Month Cust

jan cust1

jan cust2

jan cust3

jan cust4

jan cust5

feb cust1

feb cust3

feb cust4

mar cust3

mar cust4

mar cust10

 

Output want:

 


Month Total  Overlap   New

jan        5        3             2

feb       3         3              0

mar      3         2              1

 

Thanks for checking

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi  @Kalai2008  Pretty straight forward SQL

 



data have;
input month $ cust $;
cards;
jan cust1
jan cust2
jan cust3
jan cust4
jan cust5
feb cust1
feb cust3
feb cust4
mar cust3
mar cust4
mar cust10
;
proc sql;
create table want as
select  a.month,count(distinct b.cust) as overlap,(count(distinct a.cust)-calculated overlap) as New
from have a left join have b
on a.cust=b.cust and a.month ne b.month
group by a.month 
order by a.month;
quit;

View solution in original post

12 REPLIES 12
Reeza
Super User
How do you overlap with future dates? It makes sense to look backwards for overlap but how does looking forward help here? In January, you have 5 unique individuals, 3 purchased again later on but as of January they hadn't....and in March you look backwards not forwards.

Are you sure this is what you want?

Kalai2008
Pyrite | Level 9
Yes, since this is historical data.
Reeza
Super User
So you want to look both forward and backward in times for overlap?
Kalai2008
Pyrite | Level 9
yes Correct.
novinosrin
Tourmaline | Level 20

Hi  @Kalai2008  Pretty straight forward SQL

 



data have;
input month $ cust $;
cards;
jan cust1
jan cust2
jan cust3
jan cust4
jan cust5
feb cust1
feb cust3
feb cust4
mar cust3
mar cust4
mar cust10
;
proc sql;
create table want as
select  a.month,count(distinct b.cust) as overlap,(count(distinct a.cust)-calculated overlap) as New
from have a left join have b
on a.cust=b.cust and a.month ne b.month
group by a.month 
order by a.month;
quit;
Reeza
Super User
My concern with a full SQL approach with self joins is that it won't scale well with a huge data set....
novinosrin
Tourmaline | Level 20

I agree with you @Reeza . SQL is ready meals  and convenience at least for this solution.  But since we are doing an equi and non equi filter in the sub-setting pattern makes the SQL optimizer choose the SORT MERGE(Magic=102) JOIN algorithm  and optimizes.

 

Of course, I am certain a more programming solution will likely give you better performance but for a long and narrow dataset, I am holding faith that this approach should suffice, though your point is well taken 🙂

 

 

Kalai2008
Pyrite | Level 9
Thank you, performance is very slow and code is still running.
novinosrin
Tourmaline | Level 20

Hi @Kalai2008 Should you have enough memory , you could try HASH


data want ;
 if _n_=1 then do;
   if 0 then set have have(rename=(month=_month));
   dcl hash H (dataset:'have(rename=(month=_month))',multidata:'y') ;
   h.definekey  ("cust") ;
   h.definedata ("_month") ;
   h.definedone () ;
 end;
 do Total=1 by 1 until(last.month);
  set have;
  by month notsorted;
  do rc=h.find() by 0 while(rc=0);
   if month ne _month then do; Overlap=sum(Overlap,1);leave;end;
   rc=h.find_next();
  end;
 end;
 New=Total-Overlap;
 drop rc _month cust;
run;

 

Kalai2008
Pyrite | Level 9
Awesome ....Thank you. It worked..
Ksharp
Super User

Assuming the data has been sorted by month.

 


data have;
input month $ cust $;
cards;
jan cust1
jan cust2
jan cust3
jan cust4
jan cust5
feb cust1
feb cust3
feb cust4
mar cust3
mar cust4
mar cust10
;
proc sort data=have;by month;run;
proc sql;
create table cust as
 select distinct cust from have group by cust having count(distinct month)=1;
create table month as
 select month,count(distinct cust) as total from have group by month;
quit;

data temp;
 if _n_=1 then do;
  if 0 then set month;
  declare hash h(dataset:'cust',hashexp:20);
  h.definekey('cust');
  h.definedone();
 end;
set have;
by month;
if first.month then new=0;
if h.check()=0 then new+1;
if last.month;
keep month new;
run;

data want;
 merge month temp;
 by month ;
 overlap=total-new;
run;

 

 

Kalai2008
Pyrite | Level 9
Thank you.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 923 views
  • 6 likes
  • 4 in conversation