Bingo!
proc sort data=want;
by userid;
run;
proc sort data=allrecords;
by userid;
run;
data combine;
merge allrecords(in=a) want(in=b);
by userid;
if a;
if a and b then flag=1;
else flag=0;
run;
proc print data=combine;
run;
Here is an example of inexact matching using SQL. Could have used logic like this to find the logins within 30 days. I have SQL notes if anyone wants them. Personally, I am a huge fan of SQL and find it advantageous to know it. I think I got the logic wrong in my query but this would have been how to do it. It is nice to be able to think of more than one solution to programming problems.
TITLE 'Example of a match on inexact values
PROC SQL; SELECT a.pt_id, a.admdate LABEL="Admission", b.admdate LABEL="Re-admission within 35 days"
FROM ex.admits AS a, ex.admits AS b
WHERE a.pt_id = b.pt_id AND /* match on patient ID */ a.admdate < b.admdate AND /* only if admit before re-admit */ b.admdate <= (a.admdate+35); /* only if re-admit w/in 35 days */
I'm actually not getting what I believe to be correct from this.
The tables that I'm using have thousands of users. And for some reason when I merge the want table with the allrecords table, the resulting combine table has more distinct userids than the want table, which doesn't make sense. The numbers should be the same or less.
Well, the code I gave you was assuming that want and all_records are both at 1 row per userid. Are they both not at user-id level?
The final derived want dataset has one userid per row (no duplicate user id's), likewise all_records has 1 row per user id (no duplicates). Both of these tables have multiple columns but each table has only one column that contains userid's.
Yes, each userid only has one row associated with it per table.
This SQL code does the same thing as the data step + merge statement, but the SQL takes much longer to run. I'll dig back in at the code tomorrow.
proc sql noprint;
select s.*, (case when r.userid is null then 0 else 1 end) as flag
from allrecords s left join
want r
on r.userid = s.userid;
quit;
Everything comes out correctly with my real dataset with the following code:
proc summary data=work.records nway;
class userid login_datetime;
format login_datetime dtmonyy7.;
output out=work.counted(drop=_type_ rename=(_freq_ = count));
run;
For some reason SAS gives me duplicate userids when I use the keep userid;
statement with my real dataset.
But the following code appears to work and gives me the results that I got with the test data that I included in this post.
data work.want;
set work.counted;
by userid;
retain consec_month last_login;
login_date = intnx('month', datepart(login_datetime), 0, 'b');
if first.userid then do;
consec_month = 1;
end;
else do;
if intck('month', last_login, login_date) = 1 then do;
consec_month = consec_month + 1;
end;
else do;
consec_month = 0;
end;
end;
last_login = login_date;
if consec_month >= 4 ;
/* deleted keep userid; */
run;
proc print data=want;
run;
proc sql ;
create table userid_active as
select distinct(userid) as USER_ID
from want
;
quit;
@PharmlyDoc wrote:
Everything comes out correctly with my real dataset with the following code:
proc summary data=work.records nway; class userid login_datetime; format login_datetime dtmonyy7.; output out=work.counted(drop=_type_ rename=(_freq_ = count)); run;
For some reason SAS gives me duplicate userids when I use the
keep userid;
statement with my real dataset.But the following code appears to work and gives me the results that I got with the test data that I included in this post.
data work.want; set work.counted; by userid; retain consec_month last_login; login_date = intnx('month', datepart(login_datetime), 0, 'b'); if first.userid then do; consec_month = 1; end; else do; if intck('month', last_login, login_date) = 1 then do; consec_month = consec_month + 1; end; else do; consec_month = 0; end; end; last_login = login_date; if consec_month >= 4 ; /* deleted keep userid; */ run;
The duplicates are a result of the change in the subsetting-if you implemented.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.