Solved: Re: Count new customers,were and stay,were left and back

Ronein · Posted 12-21-2020 08:00 AM

Hello

I have raw data sets in each month with list of customer Id.

I want to classify each customer in each data set to one of 3 groups:

1-How many customers in this file that have never been before in previous months

(Call it "New customers never been before")

2-How many customers in this file that have been in previous months but were not in last month(Call it "Were ,Left and Back customers")

3-How many customers in this file that have been in last month

(Call it "Were and Stay customers")

Then for each month I want to calculate 4 statistics:

1-How many customers in the file

2-How many customers belong to category "New customers never been before"

3-How many customers belong to category "Were Left and Back customers"

3-How many customers belong to category "Were and Stay customers"

4-How many customers left this month and never came back

Data t2004;
input ID;
cards;
1
2
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
6
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
1
;
Run;

For example :

for 2004:

1-How many customers in the file:2

2-How many customers belong to category "New customers never been before":0

3-How many customers belong to category "Were, Left and Back customers":0

3-How many customers belong to category "Were and Stay customers":0

for 2005:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2

for 2006:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2

for 2007:
1-How many customers in the file:5
2-How many customers belong to category "New customers never been before": 3
3-How many customers belong to category "Were, Left and Back customers": 0
3-How many customers belong to category "Were and Stay customers":2

for 2008:
1-How many customers in the file:4
2-How many customers belong to category "New customers never been before":
3-How many customers belong to category "Were, Left and Back customers": 1
3-How many customers belong to category "Were and Stay customers":3

CarmineVerrell · Posted 12-21-2020 08:41 AM

Sorry did not see the rest of the question at bottom of post. My example only does the calculation based on 2018.

View solution in original post

Ronein · Posted 12-21-2020 08:29 AM

I know to identify customer who "were and stay"


Data tbl1;
set t2004 t2005 t2006 t2007 t2008
indsname=ds
;
source=substr(scan(ds,2,'.'),2,4);
Run;
 
PROC SQL;
	create table Help1  as
	select  ID,
            count(*) as nr_appear
	from  tbl1
	group by ID 
;
QUIT;

PROC SQL;
	create table tbl  as
	select  a.*,b.nr_appear,case when b.nr_appear=5 then 'Was and stay' end as Ind	   
	from tbl1 as a
    left join  Help1 as b
	on a.ID=b.ID
	order by a.source
;
QUIT;

CarmineVerrell · Posted 12-21-2020 08:37 AM

/*Modified your code below to include ID 6 in T2004 as there are no examples 
of "Were left and Back Customers".*/
/**/
/*Please see below, for possible solution code. Hope this helps.*/
/**/
/*Carmine*/
/*If this is an acceptable solution, please let us know.*/


Data t2004;
input ID;
cards;
1
2
6
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
8
;
Run;


data newc(drop=stattxt) leftb(drop=stattxt) Stayc(drop=stattxt) stats;
	merge t2004(in=in2004)
	      t2005(in=in2005)
		  t2006(in=in2006)
	      t2007(in=in2007)
	      t2008(in=in2008);
	by id;
	length stattxt $40;
		if in2008 and sum(of in2004-in2007)=0 then do; stattxt="New customers never been before";output stats; output newc;end;
	    if in2008 and not in2007 and sum(of in2004-in2006)>0 then do;stattxt="Were Left and Back Customers";output stats; output leftb; end;
		if in2008 and in2007 then do; stattxt="Were and Stay Customers";output stats; output stayc; end;
run;

Proc Freq data=stats;
table stattxt;
run;

CarmineVerrell · Posted 12-21-2020 08:41 AM

Sorry did not see the rest of the question at bottom of post. My example only does the calculation based on 2018.

CarmineVerrell · Posted 12-21-2020 08:42 AM

Meant to say 2008.

Ronein · Posted 12-21-2020 09:01 AM

What is the way to do it for each month?

Can you please explain why in each block (IF do end) you write 2 output statements?

for example:

output stats;
output newc;

Ronein · Posted 12-21-2020 10:06 AM

Can I ask, If you use else IF instead of "IF" will it be also okay?

Why are you usinf multiple IF statements and not If and then else if..?

Ronein · Posted 12-21-2020 04:50 PM

Why are you using "BY ID"?
As I see we can SET the data sets without BY?
In this example is it essential to use BY?

andreas_lds · Posted 12-21-2020 09:16 AM

This is not the best solution, but i don't have the time to think about it in detail.

data work.all_years / view=work.all_years;
   length 
      _source $ 42 
      year 8
   ;
   
   set work.t: indsname=_source;
   
   year = input(compress(_source,, 'kd'), 4.);
run;

proc sort data=work.all_years out=work.sorted;
   by id year;
run;


data work.pre_want;
   set work.sorted;
   by id;
   
   length flag2-flag4 8; 
   
   flag2 = first.id;
   flag3 = (not first.id and year-1 > lag(year));
   flag4 = (not first.id and year-1 = lag(year));
run;


proc report data=work.pre_want;
   column year year=flag1 flag2 flag3 flag4;
   define year / group;
   define flag1 / n 'How many customers in the file';
   define flag2 / sum 'New customers never been before';
   define flag3 / sum "Were, Left and Back customers";
   define flag4 / sum "Were and Stay customers";
run;

Register Today!

SAS Training: Just a Click Away