data test1;
infile datalines delimiter = "," truncover;
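/* The colon modifier reads each delimited field with the given informat;
   widths are set so values longer than 8 characters are not truncated. */
input
LN :$20.
FN :$20.
ID
M_LN :$20.
M_FN :$20.
M_DOB :date9.
Adress1 :$30.
Adress2 :$30.
city :$20.
;
format M_DOB date9.;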
datalines;
Doe, Emily, 1234, Lan, Jane, 01JAN1988, 1234BlueLan, Apt67, LA
Doe, John, 1235, Lan, Jane, 01JAN1988, 1234BlueLan, Apt67, LA
John, Mike, 2345, John, Karen, 19FEB1985, 234ORANGEPL, , SF
Garcia, Jose, 3456, Rosa, Maria, 23APR2001, 89APPLELN, APT34, Boston
Garcia, Victor, 3457, Rosa, Maria, 23APR2001, 89APPLELN, APT34, Boston
;
I have a very large dataset (about 1 million records); the above is just a simple dummy example.
Each record is a baby, and some babies are born to the same mother. Each baby has its own distinct ID, but there is no shared family ID.
I want to find the babies with the same mom and create a family ID based on the mom's shared information.
There can be as many as 10 shared data fields, but not all of them are populated, and they are not always useful.
What would be an effective way to do that? I'm also considering a macro if possible, but I want to write the repetitive code first so that I can fully understand the macro.
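To make the goal concrete, here is a minimal sketch of the simplest case only. It assumes the mother's last name, first name, and date of birth are all populated and spelled identically across siblings, which will not hold for every record in the real data. The dataset names sorted and families and the variable family_id are placeholders for illustration.

```sas
/* Sketch only: exact match on three mother fields (an assumption). */
proc sort data=test1 out=sorted;
  by M_LN M_FN M_DOB;
run;

data families;
  set sorted;
  by M_LN M_FN M_DOB;
  /* first.M_DOB is 1 on the first row of each M_LN/M_FN/M_DOB group.
     The sum statement retains family_id across rows, starting from 0. */
  if first.M_DOB then family_id + 1;
run;
```

Records with a missing key would all fall into one group under this approach, so they would need separate handling (for example, falling back to the address fields).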
Thank you for any help!