BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PharmlyDoc
Quartz | Level 8

Bingo! 

 

proc sort data=want;
by userid;
run;

proc sort data=allrecords;
by userid;
run;

data combine;
merge allrecords(in=a) want(in=b);
by userid;
if a;
if a and b then flag=1;
else flag=0;
run;

proc print data=combine;
run;
tarheel13
Rhodochrosite | Level 12

Here is an example of inexact matching using SQL. Could have used logic like this to find the logins within 30 days. I have SQL notes if anyone wants them. Personally, I am a huge fan of SQL and find it advantageous to know it. I think I got the logic wrong in my query but this would have been how to do it. It is nice to be able to think of more than one solution to programming problems.

TITLE 'Example of a match on inexact values
PROC SQL; SELECT a.pt_id, a.admdate LABEL="Admission", b.admdate LABEL="Re-admission within 35 days"
FROM ex.admits AS a, ex.admits AS b
WHERE a.pt_id = b.pt_id AND /* match on patient ID */ a.admdate < b.admdate AND /* only if admit before re-admit */ b.admdate <= (a.admdate+35); /* only if re-admit w/in 35 days */
PharmlyDoc
Quartz | Level 8

I'm actually not getting what I believe to be correct from this. 

 

The tables that I'm using have thousands of users. And for some reason when I merge the want table with the allrecords table, the resulting combine table has more distinct userids than the want table, which doesn't make sense. The numbers should be the same or less. 

tarheel13
Rhodochrosite | Level 12

Well, the code I gave you was assuming that want and all_records are both at 1 row per userid. Are they both not at user-id level? 

PharmlyDoc
Quartz | Level 8

The final derived want dataset has one userid per row (no duplicate user id's), likewise all_records has 1 row per user id (no duplicates). Both of these tables have multiple columns but each table has only one column that contains userid's.

tarheel13
Rhodochrosite | Level 12
I was asking if it was 1 row per userid. If it’s not, then that might be why it’s at the wrong level.
PharmlyDoc
Quartz | Level 8

Yes, each userid only has one row associated with it per table. 

 

This SQL code does the same thing as the data step + merge statement, but the SQL takes much longer to run. I'll dig back in at the code tomorrow. 

proc sql noprint;
select s.*, (case when r.userid is null then 0 else 1 end) as flag
from allrecords s left join
want r
on r.userid = s.userid;
quit;
tarheel13
Rhodochrosite | Level 12
Well, we haven’t seen your program but I tested the proc sort and merge with the datasets in this thread and got the desired results you described. can help better if we see code or you post an example of output that isn’t what you expected.
PharmlyDoc
Quartz | Level 8

@andreas_lds 

@tarheel13 

 

Everything comes out correctly with my real dataset with the following code:

proc summary data=work.records nway;
   class userid login_datetime;
   format login_datetime dtmonyy7.;
   output out=work.counted(drop=_type_ rename=(_freq_ = count));
run;

 

For some reason SAS gives me duplicate userids when I use the  keep userid; statement  with my real dataset. 

But the following code appears to work and gives me the results that I got with the test data that I included in this post. 

data work.want;
   set work.counted;
   by userid;
   
   retain consec_month last_login;
   
   login_date = intnx('month', datepart(login_datetime), 0, 'b');
   
   if first.userid then do;
      consec_month = 1;
   end;
   else do; 
      if intck('month', last_login, login_date) = 1 then do;
         consec_month = consec_month + 1;
      end;
      else do;
         consec_month = 0;
      end;   
   end;
  
   last_login = login_date;
  
   if consec_month >= 4 ;
/* deleted  keep userid;  */
run;

proc print data=want;
run;

proc sql ;
  create table userid_active as
    select distinct(userid) as USER_ID
      from want
  ;
quit;

 

 

andreas_lds
Jade | Level 19

@PharmlyDoc wrote:

@andreas_lds 

@tarheel13 

 

Everything comes out correctly with my real dataset with the following code:

proc summary data=work.records nway;
   class userid login_datetime;
   format login_datetime dtmonyy7.;
   output out=work.counted(drop=_type_ rename=(_freq_ = count));
run;

 

For some reason SAS gives me duplicate userids when I use the  keep userid; statement  with my real dataset. 

But the following code appears to work and gives me the results that I got with the test data that I included in this post. 

data work.want;
   set work.counted;
   by userid;
   
   retain consec_month last_login;
   
   login_date = intnx('month', datepart(login_datetime), 0, 'b');
   
   if first.userid then do;
      consec_month = 1;
   end;
   else do; 
      if intck('month', last_login, login_date) = 1 then do;
         consec_month = consec_month + 1;
      end;
      else do;
         consec_month = 0;
      end;   
   end;
  
   last_login = login_date;
  
   if consec_month >= 4 ;
/* deleted  keep userid;  */
run;

 

 


The duplicates are a result of the change in the subsetting-if you implemented.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 1176 views
  • 3 likes
  • 4 in conversation