BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I have raw data sets in each month with list of customer Id.

I want to classify each customer in each data set to one of 3 groups:

1-How many customers in this file that have never been before in previous months

(Call it "New customers never been before")

2-How many customers in this file that have been in previous months but were not in last month(Call it "Were ,Left and Back customers")

3-How many customers in this file that have been in last month

(Call it "Were and Stay customers")

 

Then for each month I want to calculate 4 statistics:

1-How many customers in the file

2-How many customers belong to category "New customers never been before"

3-How many customers belong to category "Were Left and Back customers"

3-How many customers belong to category "Were and Stay customers"

4-How many customers left this month and never came back

 

Data t2004;
input ID;
cards;
1
2
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
6
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
1
;
Run;

 

For example :

for 2004:

1-How many customers in the file:2 

2-How many customers belong to category "New customers never been before":0

3-How many customers belong to category "Were, Left and Back customers":0

3-How many customers belong to category "Were and Stay customers":0

 for 2005:
1-How many customers in the file:3 
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2

 for 2006:
1-How many customers in the file:3 
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2

 for 2007:
1-How many customers in the file:5 
2-How many customers belong to category "New customers never been before": 3
3-How many customers belong to category "Were, Left and Back customers": 0
3-How many customers belong to category "Were and Stay customers":2

 for 2008:
1-How many customers in the file:4 
2-How many customers belong to category "New customers never been before": 
3-How many customers belong to category "Were, Left and Back customers": 1 
3-How many customers belong to category "Were and Stay customers":3

1 ACCEPTED SOLUTION

Accepted Solutions
CarmineVerrell
SAS Employee
Sorry did not see the rest of the question at bottom of post. My example only does the calculation based on 2018.

View solution in original post

8 REPLIES 8
Ronein
Meteorite | Level 14

I know to identify customer who "were and stay"

 


Data tbl1;
set t2004 t2005 t2006 t2007 t2008
indsname=ds
;
source=substr(scan(ds,2,'.'),2,4);
Run;
 
PROC SQL;
	create table Help1  as
	select  ID,
            count(*) as nr_appear
	from  tbl1
	group by ID 
;
QUIT;

PROC SQL;
	create table tbl  as
	select  a.*,b.nr_appear,case when b.nr_appear=5 then 'Was and stay' end as Ind	   
	from tbl1 as a
    left join  Help1 as b
	on a.ID=b.ID
	order by a.source
;
QUIT;
CarmineVerrell
SAS Employee
/*Modified your code below to include ID 6 in T2004 as there are no examples 
of "Were left and Back Customers".*/
/**/
/*Please see below, for possible solution code. Hope this helps.*/
/**/
/*Carmine*/
/*If this is an acceptable solution, please let us know.*/


Data t2004; input ID; cards; 1 2 6 ; Run; Data t2005; input ID; cards; 1 2 3 ; Run; Data t2006; input ID; cards; 1 2 4 ; Run; Data t2007; input ID; cards; 2 4 5 7 ; Run; Data t2008; input ID; cards; 2 4 6 8 ; Run; data newc(drop=stattxt) leftb(drop=stattxt) Stayc(drop=stattxt) stats; merge t2004(in=in2004) t2005(in=in2005) t2006(in=in2006) t2007(in=in2007) t2008(in=in2008); by id; length stattxt $40; if in2008 and sum(of in2004-in2007)=0 then do; stattxt="New customers never been before";output stats; output newc;end; if in2008 and not in2007 and sum(of in2004-in2006)>0 then do;stattxt="Were Left and Back Customers";output stats; output leftb; end; if in2008 and in2007 then do; stattxt="Were and Stay Customers";output stats; output stayc; end; run; Proc Freq data=stats; table stattxt; run;
CarmineVerrell
SAS Employee
Sorry did not see the rest of the question at bottom of post. My example only does the calculation based on 2018.
Ronein
Meteorite | Level 14

What is the way to do it for each month?

Can you please explain why in each block (IF do end) you write 2 output statements?

for example:

output stats;
output newc;

Ronein
Meteorite | Level 14

Can I ask, If you use else IF instead of "IF"  will it be also okay?

Why are you usinf multiple IF statements and not If   and then else if..?

Ronein
Meteorite | Level 14
Why are you using "BY ID"?
As I see we can SET the data sets without BY?
In this example is it essential to use BY?
andreas_lds
Jade | Level 19

This is not the best solution, but i don't have the time to think about it in detail.

 

data work.all_years / view=work.all_years;
   length 
      _source $ 42 
      year 8
   ;
   
   set work.t: indsname=_source;
   
   year = input(compress(_source,, 'kd'), 4.);
run;

proc sort data=work.all_years out=work.sorted;
   by id year;
run;


data work.pre_want;
   set work.sorted;
   by id;
   
   length flag2-flag4 8; 
   
   flag2 = first.id;
   flag3 = (not first.id and year-1 > lag(year));
   flag4 = (not first.id and year-1 = lag(year));
run;


proc report data=work.pre_want;
   column year year=flag1 flag2 flag3 flag4;
   define year / group;
   define flag1 / n 'How many customers in the file';
   define flag2 / sum 'New customers never been before';
   define flag3 / sum "Were, Left and Back customers";
   define flag4 / sum "Were and Stay customers";
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1285 views
  • 3 likes
  • 3 in conversation