Hello
I have raw data sets in each month with list of customer Id.
I want to classify each customer in each data set to one of 3 groups:
1-How many customers in this file that have never been before in previous months
(Call it "New customers never been before")
2-How many customers in this file that have been in previous months but were not in last month(Call it "Were ,Left and Back customers")
3-How many customers in this file that have been in last month
(Call it "Were and Stay customers")
Then for each month I want to calculate 4 statistics:
1-How many customers in the file
2-How many customers belong to category "New customers never been before"
3-How many customers belong to category "Were Left and Back customers"
3-How many customers belong to category "Were and Stay customers"
4-How many customers left this month and never came back
Data t2004;
input ID;
cards;
1
2
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
6
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
1
;
Run;
For example :
for 2004:
1-How many customers in the file:2
2-How many customers belong to category "New customers never been before":0
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":0
for 2005:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2
for 2006:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2
for 2007:
1-How many customers in the file:5
2-How many customers belong to category "New customers never been before": 3
3-How many customers belong to category "Were, Left and Back customers": 0
3-How many customers belong to category "Were and Stay customers":2
for 2008:
1-How many customers in the file:4
2-How many customers belong to category "New customers never been before":
3-How many customers belong to category "Were, Left and Back customers": 1
3-How many customers belong to category "Were and Stay customers":3
I know to identify customer who "were and stay"
Data tbl1;
set t2004 t2005 t2006 t2007 t2008
indsname=ds
;
source=substr(scan(ds,2,'.'),2,4);
Run;
PROC SQL;
create table Help1 as
select ID,
count(*) as nr_appear
from tbl1
group by ID
;
QUIT;
PROC SQL;
create table tbl as
select a.*,b.nr_appear,case when b.nr_appear=5 then 'Was and stay' end as Ind
from tbl1 as a
left join Help1 as b
on a.ID=b.ID
order by a.source
;
QUIT;
/*Modified your code below to include ID 6 in T2004 as there are no examples
of "Were left and Back Customers".*/
/**/
/*Please see below, for possible solution code. Hope this helps.*/
/**/
/*Carmine*/
/*If this is an acceptable solution, please let us know.*/
Data t2004;
input ID;
cards;
1
2
6
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
8
;
Run;
data newc(drop=stattxt) leftb(drop=stattxt) Stayc(drop=stattxt) stats;
merge t2004(in=in2004)
t2005(in=in2005)
t2006(in=in2006)
t2007(in=in2007)
t2008(in=in2008);
by id;
length stattxt $40;
if in2008 and sum(of in2004-in2007)=0 then do; stattxt="New customers never been before";output stats; output newc;end;
if in2008 and not in2007 and sum(of in2004-in2006)>0 then do;stattxt="Were Left and Back Customers";output stats; output leftb; end;
if in2008 and in2007 then do; stattxt="Were and Stay Customers";output stats; output stayc; end;
run;
Proc Freq data=stats;
table stattxt;
run;
What is the way to do it for each month?
Can you please explain why in each block (IF do end) you write 2 output statements?
for example:
output stats;
output newc;
Can I ask, If you use else IF instead of "IF" will it be also okay?
Why are you usinf multiple IF statements and not If and then else if..?
This is not the best solution, but i don't have the time to think about it in detail.
data work.all_years / view=work.all_years;
length
_source $ 42
year 8
;
set work.t: indsname=_source;
year = input(compress(_source,, 'kd'), 4.);
run;
proc sort data=work.all_years out=work.sorted;
by id year;
run;
data work.pre_want;
set work.sorted;
by id;
length flag2-flag4 8;
flag2 = first.id;
flag3 = (not first.id and year-1 > lag(year));
flag4 = (not first.id and year-1 = lag(year));
run;
proc report data=work.pre_want;
column year year=flag1 flag2 flag3 flag4;
define year / group;
define flag1 / n 'How many customers in the file';
define flag2 / sum 'New customers never been before';
define flag3 / sum "Were, Left and Back customers";
define flag4 / sum "Were and Stay customers";
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.