BookmarkSubscribeRSS Feed
KDang
Fluorite | Level 6
Hi,
I have a data set with a log of members activities, and their signup date
for example the data may look like this

id signupdate logindate activity
1 11/01/07 11/01/07 open
1 11/01/07 11/06/07 open
1 11/01/07 11/07/07 start
1 11/01/07 11/22/07 open
1 11/01/07 12/06/07 start
1 11/01/07 12/08/07 open
2 12/05/07 01/01/08 start
2 12/05/07 01/22/08 open
2 12/05/07 01/28/08 start
2 12/05/07 01/29/08 open

I'm trying to get a report that will track their activity by month since sign up
for example:
ID = 1
1st month:
Login: 4
Open: 3
Start: 1
2nd month:
Login: 2
Open: 1
Start: 1

ID = 2
1st month:
Login: 1
Open: 0
Start: 1
2nd month:
Login: 3
Open: 1
Start: 2

I can probably figure out the reporting aspect of it, any help in structuring my data set and functions that i can use to get to that point will be much appreciated!
Thanks you. Message was edited by: KDang
12 REPLIES 12
Cynthia_sas
SAS Super FREQ
Hi:
Should all these login dates be in 08 or 07 for ID# 2???
[pre]
2 Dec/05/07 Jan/01/08 start
2 Dec/05/07 Jan/22/07 open
2 Dec/05/07 Jan/28/07 start
2 Dec/05/07 Jan/29/07 open
[/pre]

Otherwise, it looks odd that they logged in in Jan, 07 when they signed up in Dec, 07.

cynthia
KDang
Fluorite | Level 6
Yes sorry 07 should be 08, i've make the edits above, same with the month formatting to MMDDYY
Daryl
SAS Employee
You could try PROC FREQ to get the counts you are looking for. Consider this example below. The first data step is clumsy; I couldn't find a SAS format to read in dates of the form MON/DD/YY. So I had to parse your dates to read them correctly.

[pre]
data test (keep=id signupdate logindate loginmmyy activity);
input id s_month $ 3-5 s_day 7-8 s_year 10-11 l_month $ 13-15 l_day 17-18 l_year 20-21 activity $;
format signupdate logindate mmddyy10.;
format loginmmyy monyy.;
signupdate_string = put(s_day,z2.) || upcase(s_month) || put(s_year,z2.);
signupdate = input(signupdate_string,date7.);
logindate_string = put(l_day,z2.) || upcase(l_month) || put(l_year,z2.);
logindate = input(logindate_string,date7.);
loginmmyy = logindate;
datalines;
1 Nov/01/07 Nov/01/07 open
1 Nov/01/07 Nov/06/07 open
1 Nov/01/07 Nov/07/07 start
1 Nov/01/07 Nov/22/07 open
1 Nov/01/07 Dec/06/07 start
1 Nov/01/07 Dec/08/07 open
2 Dec/05/07 Jan/01/08 start
2 Dec/05/07 Jan/22/07 open
2 Dec/05/07 Jan/28/07 start
2 Dec/05/07 Jan/29/07 open
;
run;


proc sort data=test;
by id loginmmyy;
run;


proc freq data=test noprint;
by id loginmmyy;
tables activity / out=output;
run;


proc print data=output noobs;
run;
[/pre]

Result:
[pre]
id loginmmyy activity COUNT PERCENT

1 NOV07 open 3 75.000
1 NOV07 start 1 25.000
1 DEC07 open 1 50.000
1 DEC07 start 1 50.000
2 JAN07 open 2 66.667
2 JAN07 start 1 33.333
2 JAN08 start 1 100.000
[/pre]
KDang
Fluorite | Level 6
Thanks for your reply Daryl, buts its not quite what im looking for.


I'm trying to create a report where i can look at activity by first/second/third etc. month since their sign up.

Since everyone signs up at different dates, they will have a different monthly anniversary, so I'm not sure how to structure my data to take that into account.

Thanks again for your help.
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Kdang,

Try the following code, where b is the data set containing id signupdate logindate activity:

/* Add Login Month */;
data r;
set b;
lm=MONTH(logindate);
run;
/* Count Start and Open Activities */;
proc SQL;
create table rr as
select id, lm, activity, Count(*) as n
from r
group by id, lm, activity
;quit;
/* Count Login Activities */;
proc SQL;
create table login as
select id, lm, "login" as activity , sum(n) as n
from rr
group by id, lm
;quit;
/* Combine results */;
data rrr;
set rr login;
run;
proc sort data=rrr;
by id lm activity;
run;

Sincerely,
SPR
Daryl
SAS Employee
OK, I understand. The report should be easy to create if you can derive a new column in your data that describes the month from sign-up-date.

id signupdate logindate activity month_from_signup
1 11/01/07 11/01/07 open 1
1 11/01/07 11/06/07 open 1
1 11/01/07 11/07/07 start 1
1 11/01/07 11/22/07 open 1
1 11/01/07 12/06/07 start 2
1 11/01/07 12/08/07 open 2
2 12/05/07 01/01/08 start 1
2 12/05/07 01/22/08 open 2
2 12/05/07 01/28/08 start 2
2 12/05/07 01/29/08 open 2
Daryl
SAS Employee
I've modified my example above to calculate the month from signup.

It's a rough calculation. Take the number of days ellapsed from sign up and divide it by the number of days in an average month. Add 1 so that the first month appears as month 1 instead of month 0.

[pre]
month_from_signup = int(1 + (logindate - signupdate) / (365.25/12));
[/pre]

The full code is here:
[pre]
data test (keep=id signupdate logindate month_from_signup activity);
input id s_month $ 3-5 s_day 7-8 s_year 10-11 l_month $ 13-15 l_day 17-18 l_year 20-21 activity $;
format signupdate logindate mmddyy10.;
signupdate_string = put(s_day,z2.) || upcase(s_month) || put(s_year,z2.);
signupdate = input(signupdate_string,date7.);
logindate_string = put(l_day,z2.) || upcase(l_month) || put(l_year,z2.);
logindate = input(logindate_string,date7.);
month_from_signup = int(1 + (logindate - signupdate) / (365.25/12));
datalines;
1 Nov/01/07 Nov/01/07 open
1 Nov/01/07 Nov/06/07 open
1 Nov/01/07 Nov/07/07 start
1 Nov/01/07 Nov/22/07 open
1 Nov/01/07 Dec/06/07 start
1 Nov/01/07 Dec/08/07 open
2 Dec/05/07 Jan/01/08 start
2 Dec/05/07 Jan/22/08 open
2 Dec/05/07 Jan/28/08 start
2 Dec/05/07 Jan/29/08 open
;
run;
proc sort data=test;
by id month_from_signup;
run;
proc freq data=test noprint;
by id month_from_signup;
tables activity / out=output;
run;
proc print data=output noobs;
run;
[/pre]
Result:
[pre]
month_
from_
id signup activity COUNT PERCENT

1 1 open 3 75.000
1 1 start 1 25.000
1 2 open 1 50.000
1 2 start 1 50.000
2 1 start 1 100.000
2 2 open 2 66.667
2 2 start 1 33.333

[/pre]
Getting closer?
KDang
Fluorite | Level 6
Yes getting much closer now i have the dataset that looks like the following

id signupdate logindate activity month_fr_signup
1 Nov/01/07 Nov/01/07 open 1
1 Nov/01/07 Nov/06/07 open 1
1 Nov/01/07 Nov/07/07 start 1
1 Nov/01/07 Nov/22/07 open 1
1 Nov/01/07 Dec/06/07 start 2
1 Nov/01/07 Dec/08/07 open 2
2 Dec/05/07 Jan/01/08 start 1
2 Dec/05/07 Jan/22/08 open 2
2 Dec/05/07 Jan/28/08 start 2
2 Dec/05/07 Jan/29/08 open 2

The final step is to calculate 3 variables: Activitycount Startcount Opencount
by id and by month_fr_signup
so it looks like the following:

id signupdate logindate activity month_fr_signup Activitycount Startcount Opencount
1 Nov/01/07 Nov/01/07 open 1 1 0 1
1 Nov/01/07 Nov/06/07 open 1 2 0 2
1 Nov/01/07 Nov/07/07 start 1 3 1 2
1 Nov/01/07 Nov/22/07 open 1 4 1 3
1 Nov/01/07 Dec/06/07 start 2 1 1 0
1 Nov/01/07 Dec/08/07 open 2 2 1 1
2 Dec/05/07 Jan/01/08 start 1 1 1 0
2 Dec/05/07 Jan/22/08 open 2 1 0 1
2 Dec/05/07 Jan/28/08 start 2 2 1 1
2 Dec/05/07 Jan/29/08 open 2 3 1 2
Daryl
SAS Employee
Add the following data step after the proc sort.

[pre]
data test;
set test;
by id month_from_signup;
retain activity_count start_count open_count;
if first.month_from_signup then do;
activity_count = 1;
start_count = 0;
open_count = 0;
end;
else do;
activity_count + 1;
end;
if activity = "start" then do;
start_count + 1;
end;
else if activity = "open" then do;
open_count + 1;
end;
run;
[/pre]
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Sorry, I did not realize that you need the month from signdate. In this case the first data step should be replaced with the code:

data r;
set b;
lm=(MONTH(logindate)+12*YEAR(logindate))-
(MONTH(signdate )+12*YEAR(signdate ));
run;

SPR
Ksharp
Super User
Hi. For your siduation that has lots of statistical estimator, The 'proc tabulate' is the best choice.And Cynthia@sas can make it very well.
[pre]
data temp;
input id signupdate : mmddyy8. logindate : mmddyy8. activity $;
format signupdate mmddyy8. logindate mmddyy8. ;
dif=intck('day',signupdate,logindate);
interval=floor(dif/30)+1;
drop dif ;
datalines;
1 11/01/07 11/01/07 open
1 11/01/07 11/06/07 open
1 11/01/07 11/07/07 start
1 11/01/07 11/22/07 open
1 11/01/07 12/06/07 start
1 11/01/07 12/08/07 open
2 12/05/07 01/01/08 start
2 12/05/07 01/22/08 open
2 12/05/07 01/28/08 start
2 12/05/07 01/29/08 open
;
run;

options missing=0;
ods html file='c:\tab.html' style=sasweb;
proc tabulate data=temp;
class id interval activity;
table id*interval='month',activity='Number of activity' all='Number of Login';
keylabel n=' ';
run;
ods html close;
[/pre]



Ksharp Message was edited by: Ksharp
KDang
Fluorite | Level 6
Thanks for the help all, ive gotten the problem solved now.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1523 views
  • 0 likes
  • 5 in conversation