I have a table where every row is a unique user, with columns containing their total logins and their time as an active user, in days.
I would like to create a new table that contains only users who have had at least 1 login every 30 days.
How do I go about this?
Here's what I'm thinking:
data want;
    set have;
    logins_days = logins / (days / 30); /* average logins per 30-day period */
run;

proc sql;
    create table want2 as
    select *
    from want
    where logins_days >= 1; /* at least 1 login per 30 days on average */
quit;
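For what it's worth, the rule itself can be sketched outside SAS. This Python sketch (illustration only; the sample values are taken from the table below) shows that `logins/(days/30) >= 1` is the same test as `days/logins <= 30`, and why zero-login users need an explicit guard:

```python
# A user qualifies when they average at least one login per 30 days:
# logins / (days / 30) >= 1, equivalently days / logins <= 30.
users = [
    {"logins": 20, "days": 262},   # ~13 days between logins -> qualifies
    {"logins": 2,  "days": 1047},  # ~524 days between logins -> does not
    {"logins": 0,  "days": 40},    # no logins at all -> must be excluded explicitly
]

def qualifies(u):
    if u["logins"] <= 0:           # guard against division by zero
        return False
    return u["days"] / u["logins"] <= 30

want = [u for u in users if qualifies(u)]
```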
logins | days
     2 | 1047
    20 |  262
     8 | 1095
    15 |  485
     0 |   40
    16 |  546
     3 | 1148
    20 |  532
     5 |  378
     3 |  908
    17 | 1153
     1 |   15
    13 |   90
     9 |  871
    11 |  271
     9 |  745
     1 |   25
    20 |  304
    14 |  875
    20 | 1121
    13 |  412
     5 |  309
    19 |  138
Sorry, but my SQL knowledge is limited, so I can't debug your code.
I am not sure that I understood your requirement fully, so here's a two-step solution. You may want to comment out the subsetting IF and the KEEP statement to inspect what the step does:
/* count the number of logins per year/month and user */
proc summary data=work.records nway;
    class userid login_datetime;
    format login_datetime dtmonyy7.;
    output out=work.counted (drop=_type_ rename=(_freq_=count));
run;

data work.want;
    set work.counted;
    by userid;
    retain consec_month last_login;
    login_date = intnx('month', datepart(login_datetime), 0, 'b');
    if first.userid then do;
        consec_month = 1;
    end;
    else do;
        if intck('month', last_login, login_date) = 1 then do;
            consec_month = consec_month + 1;
        end;
        else do;
            consec_month = 1; /* a gap restarts the run at the current month */
        end;
    end;
    last_login = login_date;
    if consec_month = 4; /* output the userid when a 4-month run is reached */
    keep userid;
run;
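To make the DATA-step logic above concrete, here is the same consecutive-month run counting sketched in Python (illustration only; it assumes one record per user per calendar month, as the PROC SUMMARY step produces):

```python
def max_consecutive_months(months):
    """months: a user's login months as sorted, de-duplicated (year, month) tuples."""
    best = streak = 1
    for prev, cur in zip(months, months[1:]):
        gap = (cur[0] - prev[0]) * 12 + (cur[1] - prev[1])
        if gap == 1:
            streak += 1          # adjacent month: the run continues
        else:
            streak = 1           # a gap restarts the run at the current month
        best = max(best, streak)
    return best

# Mar-Jun 2019 plus an isolated Sep 2019 -> longest run is 4 months
runs = max_consecutive_months([(2019, 3), (2019, 4), (2019, 5), (2019, 6), (2019, 9)])
```

Note the month arithmetic also handles year boundaries (Dec 2019 to Jan 2020 counts as a gap of 1), just as INTCK('month', ...) does.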
Strange logic. I would question it.
Since you have not posted data in usable form or shown the expected result, I can only post untested code:
data want;
    set have;
    if logins > 0; /* drops zero-login and missing users; otherwise days/0 yields a missing value, which would pass the next test */
    if days / logins <= 30;
run;
The data you've posted only allows determining the average login gap. It won't tell you whether someone logged in regularly for the first 100 days but then was inactive for the next 100 days.
data want;
    set have;
    login_avg_less30days_flg = (days / logins <= 30);
run;
Okay, I see your point.
I need help thinking this out. I have a table of users and their dates that they logged in to their online accounts from 2018-2020 – so a user can have more than one observation in the table.
How do I calculate/see the number of users who logged in at least once per month for 4 consecutive months from 1/1/2019 through 12/31/2020?
data records;
    input @1 userid $7.
          @8 login_datetime ANYDTDTM21.;
    format login_datetime DATETIME21.;
    datalines;
Z100003 10/29/2020:04:47:00PM
Z100001 10/30/2020:04:47:00PM
Z100001 10/31/2020:04:47:00PM
Z100009 11/01/2020:04:47:00PM
Z100005 11/02/2020:04:47:00PM
Z100010 11/03/2020:04:47:00PM
Z100012 11/04/2020:04:47:00PM
Z100003 11/05/2020:04:47:00PM
Z100008 11/06/2020:04:47:00PM
Z100013 11/07/2020:04:47:00PM
Z100006 11/08/2020:04:47:00PM
Z100013 11/09/2020:04:47:00PM
Z100011 11/10/2020:04:47:00PM
Z100006 11/11/2020:04:47:00PM
Z100006 11/12/2020:04:47:00PM
Z100013 11/13/2020:04:47:00PM
Z100007 11/14/2020:04:47:00PM
Z100010 11/15/2020:04:47:00PM
Z100008 11/16/2020:04:47:00PM
Z100003 11/17/2020:04:47:00PM
Z100007 11/18/2020:04:47:00PM
Z100010 11/19/2020:04:47:00PM
Z100012 11/20/2020:04:47:00PM
Z100001 11/21/2020:04:47:00PM
Z100003 11/22/2020:04:47:00PM
Z100003 11/23/2020:04:47:00PM
Z100010 11/24/2020:04:47:00PM
Z100010 11/25/2020:04:47:00PM
Z100004 11/26/2020:04:47:00PM
Z100009 11/27/2020:04:47:00PM
Z100005 11/28/2020:04:47:00PM
Z100011 11/29/2020:04:47:00PM
Z100000 11/30/2020:04:47:00PM
Z100013 12/01/2020:04:47:00PM
Z100001 12/02/2020:04:47:00PM
Z100001 12/03/2020:04:47:00PM
Z100002 12/04/2020:04:47:00PM
Z100012 12/05/2020:04:47:00PM
Z100013 12/06/2020:04:47:00PM
Z100006 12/07/2020:04:47:00PM
Z100000 12/08/2020:04:47:00PM
Z100010 12/09/2020:04:47:00PM
Z100002 12/10/2020:04:47:00PM
Z100011 03/12/2018:08:04:00AM
Z100013 03/12/2019:08:17:00AM
Z100006 03/13/2019:08:17:00AM
Z100003 03/14/2019:08:17:00AM
Z100006 03/15/2019:08:17:00AM
Z100007 03/16/2019:08:17:00AM
Z100008 03/17/2019:08:17:00AM
Z100004 03/18/2019:08:17:00AM
Z100006 03/19/2019:08:17:00AM
Z100009 03/20/2019:08:17:00AM
Z100011 03/21/2019:08:17:00AM
Z100011 03/22/2019:08:17:00AM
Z100013 03/23/2019:08:17:00AM
Z100008 03/24/2019:08:17:00AM
Z100009 03/25/2019:08:17:00AM
Z100005 03/26/2019:08:17:00AM
Z100008 03/27/2019:08:17:00AM
Z100010 03/28/2019:08:17:00AM
Z100004 03/29/2019:08:17:00AM
Z100009 03/30/2019:08:17:00AM
Z100012 03/31/2019:08:17:00AM
Z100002 04/01/2019:08:17:00AM
Z100011 04/02/2019:08:17:00AM
Z100006 04/03/2019:08:17:00AM
Z100002 04/04/2019:08:17:00AM
Z100002 04/05/2019:08:17:00AM
Z100005 04/06/2019:08:17:00AM
Z100008 04/07/2019:08:17:00AM
Z100001 04/08/2019:08:17:00AM
Z100001 04/09/2019:08:17:00AM
Z100003 04/10/2019:08:17:00AM
Z100005 04/11/2019:08:17:00AM
Z100004 04/12/2019:08:17:00AM
Z100004 04/13/2019:08:17:00AM
Z100004 04/14/2019:08:17:00AM
Z100004 04/15/2019:08:17:00AM
Z100000 04/16/2019:08:17:00AM
Z100002 04/17/2019:08:17:00AM
Z100007 04/18/2019:08:17:00AM
Z100011 04/19/2019:08:17:00AM
Z100002 04/20/2019:08:17:00AM
Z100004 04/21/2019:08:17:00AM
Z100001 04/22/2019:08:17:00AM
Z100011 04/23/2019:08:17:00AM
Z100006 04/24/2019:08:17:00AM
Z100010 04/25/2019:08:17:00AM
Z100001 04/26/2019:08:17:00AM
Z100007 04/27/2019:08:17:00AM
Z100002 04/28/2019:08:17:00AM
Z100000 04/29/2019:08:17:00AM
Z100011 04/30/2019:08:17:00AM
Z100012 05/01/2019:08:17:00AM
Z100012 05/02/2019:08:17:00AM
Z100003 05/03/2019:08:17:00AM
Z100004 05/04/2019:08:17:00AM
Z100003 05/05/2019:08:17:00AM
Z100009 05/06/2019:08:17:00AM
Z100002 05/07/2019:08:17:00AM
Z100010 05/08/2019:08:17:00AM
Z100005 05/09/2019:08:17:00AM
Z100011 05/10/2019:08:17:00AM
Z100003 05/11/2019:08:17:00AM
Z100010 05/12/2019:08:17:00AM
Z100011 05/13/2019:08:17:00AM
Z100001 05/14/2019:08:17:00AM
Z100012 05/15/2019:08:17:00AM
Z100004 05/16/2019:08:17:00AM
Z100009 05/17/2019:08:17:00AM
Z100007 05/18/2019:08:17:00AM
Z100005 05/19/2019:08:17:00AM
Z100003 05/20/2019:08:17:00AM
Z100004 05/21/2019:08:17:00AM
Z100009 05/22/2019:08:17:00AM
Z100006 05/23/2019:08:17:00AM
Z100006 05/24/2019:08:17:00AM
Z100003 05/25/2019:08:17:00AM
Z100007 05/26/2019:08:17:00AM
Z100003 05/27/2019:08:17:00AM
Z100010 05/28/2019:08:17:00AM
Z100007 05/29/2019:08:17:00AM
Z100004 05/30/2019:08:17:00AM
Z100006 05/31/2019:08:17:00AM
Z100013 06/01/2019:08:17:00AM
Z100000 06/02/2019:08:17:00AM
Z100000 06/03/2019:08:17:00AM
Z100001 06/04/2019:08:17:00AM
Z100012 06/05/2019:08:17:00AM
Z100008 06/06/2019:08:17:00AM
Z100003 06/07/2019:08:17:00AM
Z100011 06/08/2019:08:17:00AM
Z100003 06/09/2019:08:17:00AM
Z100009 06/10/2019:08:17:00AM
Z100007 06/11/2019:08:17:00AM
Z100006 06/12/2019:08:17:00AM
Z100001 06/13/2019:08:17:00AM
Z100011 06/14/2019:08:17:00AM
Z100003 06/15/2019:08:17:00AM
Z100005 06/16/2019:08:17:00AM
Z100010 06/17/2019:08:17:00AM
Z100003 06/18/2019:08:17:00AM
Z100013 06/19/2019:08:17:00AM
Z100007 06/20/2019:08:17:00AM
Z100009 06/21/2019:08:17:00AM
Z100010 06/22/2019:08:17:00AM
Z100008 06/23/2019:08:17:00AM
Z100011 06/24/2019:08:17:00AM
Z100006 06/25/2019:08:17:00AM
Z100000 06/26/2019:08:17:00AM
Z100000 06/27/2019:08:17:00AM
Z100013 06/28/2019:08:17:00AM
Z100009 06/29/2019:08:17:00AM
Z100010 06/30/2019:08:17:00AM
Z100012 07/01/2019:08:17:00AM
Z100013 07/02/2019:08:17:00AM
Z100001 07/03/2019:08:17:00AM
Z100003 07/04/2019:08:17:00AM
Z100000 07/05/2019:08:17:00AM
Z100008 07/06/2019:08:17:00AM
Z100005 07/07/2019:08:17:00AM
Z100008 07/08/2019:08:17:00AM
Z100002 07/09/2019:08:17:00AM
Z100005 07/10/2019:08:17:00AM
Z100003 07/11/2019:08:17:00AM
Z100003 07/12/2019:08:17:00AM
Z100006 07/13/2019:08:17:00AM
Z100012 07/14/2019:08:17:00AM
Z100005 07/15/2019:08:17:00AM
Z100004 07/16/2019:08:17:00AM
Z100005 07/17/2019:08:17:00AM
Z100002 07/18/2019:08:17:00AM
Z100000 07/19/2019:08:17:00AM
Z100005 07/20/2019:08:17:00AM
Z100011 07/21/2019:08:17:00AM
Z100002 07/22/2019:08:17:00AM
Z100008 07/23/2019:08:17:00AM
Z100009 07/24/2019:08:17:00AM
Z100011 07/25/2019:08:17:00AM
Z100013 07/26/2019:08:17:00AM
Z100001 07/27/2019:08:17:00AM
Z100007 07/28/2019:08:17:00AM
Z100010 07/29/2019:08:17:00AM
Z100005 07/30/2019:08:17:00AM
Z100002 07/31/2019:08:17:00AM
Z100011 08/01/2019:08:17:00AM
Z100012 08/02/2019:08:17:00AM
Z100005 08/03/2019:08:17:00AM
Z100003 08/04/2019:08:17:00AM
Z100007 08/05/2019:08:17:00AM
Z100002 08/06/2019:08:17:00AM
Z100001 08/07/2019:08:17:00AM
Z100003 08/08/2019:08:17:00AM
Z100007 08/09/2019:08:17:00AM
Z100004 08/10/2019:08:17:00AM
Z100012 08/11/2019:08:17:00AM
Z100004 08/12/2019:08:17:00AM
Z100005 08/13/2019:08:17:00AM
Z100012 08/14/2019:08:17:00AM
Z100009 08/15/2019:08:17:00AM
Z100005 08/16/2019:08:17:00AM
Z100009 08/17/2019:08:17:00AM
Z100004 08/18/2019:08:17:00AM
Z100013 08/19/2019:08:17:00AM
Z100009 08/20/2019:08:17:00AM
Z100006 08/21/2019:08:17:00AM
Z100012 08/22/2019:08:17:00AM
Z100006 08/23/2019:08:17:00AM
Z100012 08/24/2019:08:17:00AM
Z100008 08/25/2019:08:17:00AM
Z100001 08/26/2019:08:17:00AM
Z100005 08/27/2019:08:17:00AM
Z100007 08/28/2019:08:17:00AM
Z100003 08/29/2019:08:17:00AM
Z100013 08/30/2019:08:17:00AM
Z100007 08/31/2019:08:17:00AM
Z100000 09/01/2019:08:17:00AM
Z100004 09/02/2019:08:17:00AM
Z100008 09/03/2019:08:17:00AM
Z100010 09/04/2019:08:17:00AM
Z100008 09/05/2019:08:17:00AM
Z100001 09/06/2019:08:17:00AM
Z100010 09/07/2019:08:17:00AM
Z100009 09/08/2019:08:17:00AM
Z100000 09/09/2019:08:17:00AM
Z100011 09/10/2019:08:17:00AM
Z100001 09/11/2019:08:17:00AM
Z100008 09/12/2019:08:17:00AM
Z100004 09/13/2019:08:17:00AM
Z100009 09/14/2019:08:17:00AM
Z100004 09/15/2019:08:17:00AM
Z100008 09/16/2019:08:17:00AM
Z100012 09/17/2019:08:17:00AM
Z100002 09/18/2019:08:17:00AM
Z100007 09/19/2019:08:17:00AM
Z100005 09/20/2019:08:17:00AM
Z100008 09/21/2019:08:17:00AM
Z100010 09/22/2019:08:17:00AM
;
run;
proc print data=records;
run;
Can anyone confirm whether the following code correctly counts, per user, the logins that occurred in the previous 30 days?
proc sql;
    create table records2 as
    select *
    from records
    where datepart(login_datetime) between '01JAN2019'd and '31DEC2020'd;
quit;

data records3;
    set records2;
    date_30_days_before = intnx('day', datepart(login_datetime), -30, 'b');
    format date_30_days_before date9.;
run;

proc print data=records3;
run;
proc sql;
    create table consolidate_logins as
    select userid,
           login_datetime,
           date_30_days_before,
           count(*) as logins
    from records3
    group by userid,
             login_datetime,
             date_30_days_before;
quit;
proc sql;
    create table logins_month as
    select a.userid,
           a.login_datetime format=mmddyy10.,
           a.logins as logins_that_day,
           sum(b.logins) as logins_last_30_days /* logins on, or within 30 days before, the login date */
    from consolidate_logins as a
    left join consolidate_logins as b
        on a.userid = b.userid
        and b.value <= a.login_datetime
        and b.value >= a.date_30_days_before
    group by a.userid,
             a.login_datetime,
             a.logins;
quit;
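For clarity, here is what that self-join is trying to compute, sketched in Python (illustration only, with made-up dates): for each login, the number of the same user's logins on that day or within the 30 days before it.

```python
from datetime import date, timedelta

def logins_last_30_days(dates):
    """For each login date, count logins in the window [date - 30 days, date]."""
    dates = sorted(dates)
    counts = []
    for d in dates:
        window_start = d - timedelta(days=30)
        counts.append(sum(1 for x in dates if window_start <= x <= d))
    return counts

counts = logins_last_30_days([date(2019, 3, 1), date(2019, 3, 20), date(2019, 5, 1)])
```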
I couldn't check your code because I don't know what b.value is. This might get your answer? Use >= 4 in the HAVING clause to get 4 or more consecutive months.
proc sql number;
    create table have as
    select a.userid, a.login_datetime
    from records2 a
    join records2 b
        on a.userid = b.userid
        and b.login_datetime <= a.login_datetime
        and datepart(b.login_datetime) >= datepart(a.login_datetime) - 30
    order by 1, 2;
quit;

data temp;
    set have;
    format date date9.;
    date = datepart(login_datetime);
    if lag(userid) = userid and intck('month', lag(date), date) = 1 then do;
        lag_n = _n_ - 1;
        set have point=lag_n; output;
        set have point=_n_; output;
    end;
run;

proc sql;
    create table want as
    select count(distinct userid) as count
    from (select * from (select distinct * from temp)
          group by userid
          having count(*) >= 4);
quit;
So, I want to create a new table that lists the userids that had at least 1 login/month for at least 4 consecutive months.
Did you even try running my queries and examining the results?
Yes, I used your code and it produced 14.
(The code that I used I copied from https://stackoverflow.com/questions/42763358/sas-sql-find-3-or-more-occurences-within-a-given-timefr...
and it didn't produce what I was looking for. )
Weird, stackoverflow is usually pretty reliable. I usually post here if I get stuck. Where do you work that you use this type of data? I only know clinical data.
I'm a pharmacist by background but I work in healthcare research.
It looks like you solved what I was looking for: I wanted to create a new table that lists the userids that had ≥ 1 login/month for ≥ 4 consecutive months.
Thanks!
Now I have another table of userids called all_records (with just one observation per user). It contains users who activated their online account (and therefore includes the users from the previous table, records) as well as users who did not activate their account.
How do I create a new column in the table "all_records" that flags the userids produced from your code (work.want) (who met the criteria for having ≥ 1 login/month for ≥ 4 consecutive months)?
Your code produced this for work.want:

Obs    userid
  1    Z100001
  2    Z100003
  3    Z100005
  4    Z100006
  5    Z100007
  6    Z100009
  7    Z100010
  8    Z100011
  9    Z100012
/* activated_account yes=1 no=0 */
data all_records;
input @1 userid $7.
@8 activated_account;
datalines;
Z100000 1
Z100001 1
Z100002 1
Z100003 1
Z100004 1
Z100005 1
Z100006 1
Z100007 1
Z100008 1
Z100009 1
Z100010 1
Z100011 1
Z100012 1
Z100013 1
Z100014 0
Z100015 0
Z100016 0
Z100017 0
Z100018 0
Z100019 0
Z100020 0
;
run;
proc print data=all_records;
run;
I think I would just use a DATA step for that. Sort both tables by userid, and picture all_records as the left table and want as the right table. With the IN= flags, `if a;` keeps everything in the left dataset, and a second IF on `a and b` flags the records that are in both datasets. I think that's what you're asking?
proc sort data=want;
    by userid;
run;

proc sort data=all_records;
    by userid;
run;

data combine;
    merge all_records (in=a) want (in=b);
    by userid;
    if a; /* keep every userid from all_records */
    if a and b then flag = 1;
    else flag = 0;
run;
Don't forget by userid in the merge.
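That merge-with-IN=-flags pattern boils down to a membership test. In Python terms (illustration only, with a made-up subset of the sample ids):

```python
all_records = [("Z100000", 1), ("Z100001", 1), ("Z100014", 0)]  # (userid, activated_account)
want_ids = {"Z100001", "Z100003"}                               # userids meeting the criteria

# Keep every all_records row ("if a;") and flag membership in want ("a and b").
combined = [
    {"userid": uid, "activated_account": act, "flag": int(uid in want_ids)}
    for uid, act in all_records
]
```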