11-18-2015 03:46 PM
So I have a program where we randomly sample patients each month. I want to create a code where I can merge sample_patients1 (say this is the sample for month of April) merge it sample_pateients2 (this is for month of May) to ensure not same patients get sampled again in May. And if there are any duplicates then I want to know which patients are those.
FYI - I am merging the 2 datasets by using 'PatientFileNo' which is unique for each patient.
I am using this now but it is not working -
merge work.eligible (in=new) reference1 (in=ref keep=PRAMSFileNo);
if (new and (not ref));
11-18-2015 03:53 PM - edited 11-18-2015 03:56 PM
You can use by processing in a few more steps:
/*generate sample data*/ data class; set sashelp.class; run; data class2; set sashelp.class(obs=5); run; data merged; set class class2; run; /*Sort*/ proc sort data=merged; by name; run; /*ID*/ data duplicate unique; set merged; by name; if first.name and last.name then output unique; else output duplicate; run;
EDIT: This may work for your situation. I would also recommend NOT overwriting your dataset.
data work.eligible2 duplicate; merge work.eligible (in=new) reference1 (in=ref keep=PRAMSFileNo); by PRAMSFileNo; if first.pramsfileno and last.pramsfileno then output eligible2; else output duplicate; run;
11-18-2015 04:00 PM
What do you mean when you say it is not working? Are you getting errors, or not the desired result? You have an extra */ at the end of your code that woudl cause problems if you submitted it (an unclosed comment).
Generally it's a bad idea to overwrite a dataset (you are reading work.eligible and writing work.eligible), but it's allowable. I don't see anything wrong in the code. Below quick test looks like it's working as expected:
226 data eligible; 227 do id=1 to 5; 228 output; 229 end; 230 run; NOTE: The data set WORK.ELIGIBLE has 5 observations and 1 variables. 231 232 data reference1; 233 do id=1,3,5; 234 output; 235 end; 236 run; NOTE: The data set WORK.REFERENCE1 has 3 observations and 1 variables. 237 238 data eligible; 239 merge eligible (in=new) 240 reference1 (in=ref) 241 ; 242 by id; 243 if new=1 and ref=0; 244 run; NOTE: There were 5 observations read from the data set WORK.ELIGIBLE. NOTE: There were 3 observations read from the data set WORK.REFERENCE1. NOTE: The data set WORK.ELIGIBLE has 2 observations and 1 variables.
11-18-2015 05:59 PM
I would write the same thing like this to prevent error in syntax:
merge work.eligible (in=new)
reference1 (in=ref keep=PRAMSFileNo);
if new and not ref;
BUT, I think you would be better off giving example data and an example of the desired output. You can possibly learn a completely different and possibly better way to get what you are looking for.
11-19-2015 09:31 AM
Here's a way to display only the matches between sample_patients1 (eligible) and sample_patients2 (reference1) using the SQL procedure:
proc sql; create table eligible_duplicates as select A.*, B.PRAMSFileNo as Dup_PRAMSFileNo from eligible A inner join reference1 B on (A.PRAMSFileNo=B.PRAMSFileNo); quit;
11-19-2015 10:48 AM
When I have had somewhat similar sample management issues my approach has been to maintain a separate data table of the selected sample with the sample time frame (month and year perhaps). Then when the next round comes up start with the sample frame, exclude any records/subjects from the previously samples (proc sql with EXCEPT works well) and then APPEND the selected sample afterwards.
The append with the sample period means that I can easily include previously sampled if sufficient time has expired assuming that is allowed in the protocol for a study. Or if you have separate groups such as by facility or treatment groupings you include that information as well so that a patient that might have been sampled for group x but is also eligible for group y is available for sample in that part of the process.