- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I have raw data sets in each month with list of customer Id.
I want to classify each customer in each data set to one of 3 groups:
1-How many customers in this file that have never been before in previous months
(Call it "New customers never been before")
2-How many customers in this file that have been in previous months but were not in last month(Call it "Were ,Left and Back customers")
3-How many customers in this file that have been in last month
(Call it "Were and Stay customers")
Then for each month I want to calculate 4 statistics:
1-How many customers in the file
2-How many customers belong to category "New customers never been before"
3-How many customers belong to category "Were Left and Back customers"
3-How many customers belong to category "Were and Stay customers"
4-How many customers left this month and never came back
Data t2004;
input ID;
cards;
1
2
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
6
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
1
;
Run;
For example :
for 2004:
1-How many customers in the file:2
2-How many customers belong to category "New customers never been before":0
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":0
for 2005:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2
for 2006:
1-How many customers in the file:3
2-How many customers belong to category "New customers never been before":1
3-How many customers belong to category "Were, Left and Back customers":0
3-How many customers belong to category "Were and Stay customers":2
for 2007:
1-How many customers in the file:5
2-How many customers belong to category "New customers never been before": 3
3-How many customers belong to category "Were, Left and Back customers": 0
3-How many customers belong to category "Were and Stay customers":2
for 2008:
1-How many customers in the file:4
2-How many customers belong to category "New customers never been before":
3-How many customers belong to category "Were, Left and Back customers": 1
3-How many customers belong to category "Were and Stay customers":3
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I know to identify customer who "were and stay"
Data tbl1;
set t2004 t2005 t2006 t2007 t2008
indsname=ds
;
source=substr(scan(ds,2,'.'),2,4);
Run;
PROC SQL;
create table Help1 as
select ID,
count(*) as nr_appear
from tbl1
group by ID
;
QUIT;
PROC SQL;
create table tbl as
select a.*,b.nr_appear,case when b.nr_appear=5 then 'Was and stay' end as Ind
from tbl1 as a
left join Help1 as b
on a.ID=b.ID
order by a.source
;
QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/*Modified your code below to include ID 6 in T2004 as there are no examples of "Were left and Back Customers".*//**//*Please see below, for possible solution code. Hope this helps.*//**//*Carmine*//*If this is an acceptable solution, please let us know.*/Data t2004;
input ID;
cards;
1
2
6
;
Run;
Data t2005;
input ID;
cards;
1
2
3
;
Run;
Data t2006;
input ID;
cards;
1
2
4
;
Run;
Data t2007;
input ID;
cards;
2
4
5
7
;
Run;
Data t2008;
input ID;
cards;
2
4
6
8
;
Run;
data newc(drop=stattxt) leftb(drop=stattxt) Stayc(drop=stattxt) stats;
merge t2004(in=in2004)
t2005(in=in2005)
t2006(in=in2006)
t2007(in=in2007)
t2008(in=in2008);
by id;
length stattxt $40;
if in2008 and sum(of in2004-in2007)=0 then do; stattxt="New customers never been before";output stats; output newc;end;
if in2008 and not in2007 and sum(of in2004-in2006)>0 then do;stattxt="Were Left and Back Customers";output stats; output leftb; end;
if in2008 and in2007 then do; stattxt="Were and Stay Customers";output stats; output stayc; end;
run;
Proc Freq data=stats;
table stattxt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the way to do it for each month?
Can you please explain why in each block (IF do end) you write 2 output statements?
for example:
output stats;
output newc;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can I ask, If you use else IF instead of "IF" will it be also okay?
Why are you usinf multiple IF statements and not If and then else if..?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As I see we can SET the data sets without BY?
In this example is it essential to use BY?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is not the best solution, but i don't have the time to think about it in detail.
data work.all_years / view=work.all_years;
length
_source $ 42
year 8
;
set work.t: indsname=_source;
year = input(compress(_source,, 'kd'), 4.);
run;
proc sort data=work.all_years out=work.sorted;
by id year;
run;
data work.pre_want;
set work.sorted;
by id;
length flag2-flag4 8;
flag2 = first.id;
flag3 = (not first.id and year-1 > lag(year));
flag4 = (not first.id and year-1 = lag(year));
run;
proc report data=work.pre_want;
column year year=flag1 flag2 flag3 flag4;
define year / group;
define flag1 / n 'How many customers in the file';
define flag2 / sum 'New customers never been before';
define flag3 / sum "Were, Left and Back customers";
define flag4 / sum "Were and Stay customers";
run;