Hi,
Example: person 001 (in reality there are many person IDs) logs into a personal account online and then visits different websites. The data are in separate tables:
(1) login data
ID login_account
001 2010-01-01 08:00:01AM
001 2010-01-01 10:00:01AM
001 2010-01-02 09:00:00AM
001 2010-01-03 09:00:01AM
(2) Web A data
ID login_web_A
001 2010-01-01 08:01:00AM
001 2010-01-01 08:30:00AM
001 2010-01-01 08:45:00AM
001 2010-01-01 10:10:10AM
001 2010-01-02 09:10:00AM
001 2010-01-02 10:00:03AM
001 2010-01-02 11:00:10AM
001 2010-01-03 11:30:00AM
001 2010-01-03 11:33:00AM
001 2010-01-03 12:30:00AM
(3) Web B data
ID login_web_B
001 2010-01-01 08:06:00AM
001 2010-01-01 08:39:00AM
001 2010-01-01 08:47:00AM
001 2010-01-01 11:10:10AM
001 2010-01-02 09:30:00AM
001 2010-01-02 10:20:03AM
001 2010-01-02 11:20:10AM
001 2010-01-03 11:33:00AM
001 2010-01-03 11:56:00AM
001 2010-01-03 12:33:00AM
001 2010-01-03 11:34:00AM
001 2010-01-03 11:36:00AM
001 2010-01-03 12:55:00AM
How could I link these data sets based on person ID and account login time? Please note that there can be more than one visit to each website after a login.
Thanks!
What are you looking for as output? Based on the data sets provided, please show what you would expect as output and explain the logic, then we can help you with what type of approach would be required.
You could either concatenate the website tables and add a website ID column, or process each table separately, like this:
data login;
input ID time &:anydtdtm32.;
format time datetime19.;
datalines;
001 2010-01-01 08:00:01AM
001 2010-01-01 10:00:01AM
001 2010-01-02 09:00:00AM
001 2010-01-03 09:00:01AM
;
data web_a;
input ID time &:anydtdtm32.;
format time datetime19.;
datalines;
001 2010-01-01 08:01:00AM
001 2010-01-01 08:30:00AM
001 2010-01-01 08:45:00AM
001 2010-01-01 10:10:10AM
001 2010-01-02 09:10:00AM
001 2010-01-02 10:00:03AM
001 2010-01-02 11:00:10AM
001 2010-01-03 11:30:00AM
001 2010-01-03 11:33:00AM
001 2010-01-03 12:30:00AM
;
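proc sql;
/* For each web visit, keep the most recent login at or before that visit. */
create table web_a_login as
select
    a.*,
    b.time as login_time
from web_a as a left join
     login as b on a.id=b.id and b.time <= a.time
group by a.id, a.time
/* max(b.time) is remerged onto every row of the group, so only the latest login survives */
having b.time=max(b.time);
select * from web_a_login;
quit;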
Actually, this would be more efficient than my previous post:
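/* MERGE with a BY statement needs both tables sorted by id and time. */
proc sort data=web_a; by id time; run;
proc sort data=login; by id time; run;
data web_a_login;
    merge web_a(in=inWeb) login(in=inLogin);
    by id time;
    retain login_time;                          /* carry the latest login forward */
    if first.id then call missing(login_time);  /* reset for each person */
    if inLogin then login_time=time;
    if inWeb;                                   /* keep only the web visit rows */
    format login_time datetime19.;
run;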
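For the other option mentioned above (concatenating the website tables and adding a website ID), a minimal sketch could look like the following. It assumes web_b has the same layout as web_a. The names web_b, web_all, site and web_all_login are made up for illustration, and the sample rows are only a small subset of the posted data.

data login;
input ID time &:anydtdtm32.;
format time datetime19.;
datalines;
001 2010-01-01 08:00:01AM
001 2010-01-01 10:00:01AM
;
data web_a;
input ID time &:anydtdtm32.;
format time datetime19.;
datalines;
001 2010-01-01 08:01:00AM
001 2010-01-01 10:10:10AM
;
data web_b; /* assumed to look like web_a */
input ID time &:anydtdtm32.;
format time datetime19.;
datalines;
001 2010-01-01 08:06:00AM
001 2010-01-01 11:10:10AM
;
/* Stack the website tables and tag each row with its source (hypothetical site variable). */
data web_all;
    set web_a(in=fromA) web_b(in=fromB);
    length site $1;
    if fromA then site='A';
    else if fromB then site='B';
run;
/* The BY merge below needs both inputs sorted by id and time. */
proc sort data=web_all; by id time; run;
proc sort data=login; by id time; run;
/* Same carry-forward logic as above, applied once to all sites. */
data web_all_login;
    merge web_all(in=inWeb) login(in=inLogin);
    by id time;
    retain login_time;
    if first.id then call missing(login_time);
    if inLogin then login_time=time;
    if inWeb;
    format login_time datetime19.;
run;

Each row of web_all_login is then one web visit, with site saying which website it came from and login_time holding the latest login at or before that visit (missing if the person had not logged in yet).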