BookmarkSubscribeRSS Feed
new_sas2015
Calcite | Level 5

Hello All,

 

So I have a program where we randomly sample patients each month. I want to create a code where I can merge sample_patients1 (say this is the sample for month of April) merge it sample_pateients2 (this is for month of May) to ensure not same patients get sampled again in May. And if there are any duplicates then I want to know which patients are those.

 

FYI - I am merging the 2 datasets by using 'PatientFileNo' which is unique for each patient.

 

I am using this now but it is not working -

data work.eligible;

merge work.eligible (in=new) reference1 (in=ref keep=PRAMSFileNo);

by PRAMSFileNo; 

if (new and (not ref));   

run; */  

5 REPLIES 5
Reeza
Super User

You can use by processing in a few more steps:

 

/*generate sample data*/
data class;
set sashelp.class;
run;
 
data class2;
set sashelp.class(obs=5);
run;
 
data merged;
set class class2;
run;
 
/*Sort*/
proc sort data=merged;
by name;
run;
 
/*ID*/
data duplicate unique;
set merged;
by name;
if first.name and last.name then output unique;
else output duplicate;
run;

 

EDIT: This may work for your situation. I would also recommend NOT overwriting your dataset.

 

data work.eligible2 duplicate;
merge work.eligible (in=new) reference1 (in=ref keep=PRAMSFileNo);
by PRAMSFileNo; 
if first.pramsfileno and last.pramsfileno then output eligible2;
else output duplicate;
run;

 

Quentin
Super User

What do you mean when you say it is not working?  Are you getting errors, or not the desired result?  You have an extra */ at the end of your code that woudl cause problems if you submitted it (an unclosed comment).

 

Generally it's a bad idea to overwrite a dataset (you are reading work.eligible and writing work.eligible), but it's allowable.  I don't see anything wrong in the code. Below quick test looks like it's working as expected:

 

226  data eligible;
227    do id=1 to 5;
228      output;
229    end;
230  run;

NOTE: The data set WORK.ELIGIBLE has 5 observations and 1 variables.

231
232  data reference1;
233    do id=1,3,5;
234      output;
235    end;
236  run;

NOTE: The data set WORK.REFERENCE1 has 3 observations and 1 variables.

237
238  data eligible;
239    merge eligible (in=new)
240          reference1 (in=ref)
241    ;
242    by id;
243    if new=1 and ref=0;
244  run;

NOTE: There were 5 observations read from the data set WORK.ELIGIBLE.
NOTE: There were 3 observations read from the data set WORK.REFERENCE1.
NOTE: The data set WORK.ELIGIBLE has 2 observations and 1 variables.

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Steelers_In_DC
Barite | Level 11

I would write the same thing like this to prevent error in syntax:

 

data new_eligible;
merge work.eligible (in=new)  
          reference1    (in=ref keep=PRAMSFileNo);
by PRAMSFileNo;
if new and not ref;   
run; 

 

BUT, I think you would be better off giving example data and an example of the desired output.  You can possibly learn a completely different and possibly better way to get what you are looking for.

dcruik
Lapis Lazuli | Level 10

Here's a way to display only the matches between sample_patients1 (eligible) and sample_patients2 (reference1) using the SQL procedure:

proc sql;
create table eligible_duplicates as
select A.*,
       B.PRAMSFileNo as Dup_PRAMSFileNo
from eligible A inner join reference1 B
on (A.PRAMSFileNo=B.PRAMSFileNo);
quit;
ballardw
Super User

When I have had somewhat similar sample management issues my approach has been to maintain a separate data table of the selected sample with the sample time frame (month and year perhaps). Then when the next round comes up start with the sample frame, exclude any records/subjects from the previously samples (proc sql with EXCEPT works well) and then APPEND the selected sample afterwards.

 

The append with the sample period means that I can easily include previously sampled if sufficient time has expired assuming that is allowed in the protocol for a study. Or if you have separate groups such as by facility or treatment groupings you include that information as well so that a patient that might have been sampled for group x but is also eligible for group y is available for sample in that part of the process.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1002 views
  • 0 likes
  • 6 in conversation