Hello!
I am currently working on a large database for a medical laboratory. I need to figure out a way to give individual patients a unique patient ID that remains the same even as new patients and existing patient visits and test results are entered (which is done daily). The patient ID needs to remain the same so individual patients can be looked up using the unique ID. Individual patients may have multiple visits entered over time and their ID needs to be matched to each new entry. I use patient name and birthday to assign individual patient IDs and I currently have unique IDs assigned using the code:
data full_db; set full_db;
by pt_name dob;
retain id 0;
if first.dob then id=id+1;
run;
I merge all the data together and then sort and assign IDs but the problem is, because it is sorted by name and DOB, each time I add more people to the database the IDs get reorganized and there is a new, but unique, patient ID for individuals. Is there any way to merge new data into the dataset and assign an ID matched to a known patient but at the same time assign a new ID to patients who have no previous entries? To complicate matters a bit more, I have to merge data coming in from 15 different streams.
Any advice would be greatly appreciated!
Thank you!
Without seeing a realistic (but small) example, its hard to write any code. All I can say is that whatever code you write should assign the IDs to new patients, not existing patients. There are many ways to do this, each time searching the database for whether or not this patient is an existing patient.
Thanks Paige-
So I currently have a dataset with 1.5 million (+) entries belonging to around 700,000 individual patients. Each entry consists of patient name, DOB, testing facility name, and test result. Each day I get a couple hundred new entries sent in batches from every testing location. For example:
Current DataSet:
Name: Master_List
Facility NAME DOB Patient ID
A Smith, John 1/2/92 000123
A Smith, John 1/2/92 000123
A Person, Beth 2/2/95 000124
A Patient, Abe 12/1/15 000125
A Patient, Kelly 2/6/15 000126
B Test, Person 12/12/18 000127
B Another, Person 1/1/99 000128
B Person, Beth 2/2/95 000124
C Test, Person 12/12/18 000127
I need to merge in data from the following data sets (none of them have patient IDs)
DataSet 1
Facility Name DOB
A Persons, Name 1/1/11
A Another, Person 1/1/99
A Athird, Person 1/1/11
DataSet 2
Facility Name DOB
B Smith, John 1/2/92
B Myname, IS 1/1/15
B The, Other 1/1/15
B Person, Bee 1/15/15
and so one and so forth for 13 other facilities..
How would I merge these datasets together, assign new patient IDs to the patients who have no previous entries in my master list and also match assign the correct Patient IDs to the patients who do have previous entries?
Try initializing the variable to the last used number.
%let lastid=0;
proc sql noprint;
select max(id) into :lastid trimmed from FULL_DB_OLD ;
quit;
data full_db_new ;
set full_db_old full_db_update;
by pt_name dob ;
retain new_id &lastid;
if first.dob and missing(id) then do;
new_id+1;
id=new_id;
end;
drop new_id;
run;
So the business key here is the combination of name and DoB?
If you don't have to use the id outside your system, you could use a hashed value of the concatenation of these two fileds.
The benefit of this is you dan't have to manage the id sequence.
pt_hash = md5(catx('|',pt_name,DoB));
Managing surrogate keys (automatic SAS code generation) is a standard functionality in Data Integration Studio, but it seems from your post you're not using that.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.