Did you happen to note how many variables are in the OUTCOV data when you ran this with 20000 records? Then consider what that count becomes with 1.7M records.
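To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the OUTCOV data holds a full matrix with one column per individual and that each value is an 8-byte numeric (the SAS default length); neither is confirmed by your post, and the function name is made up for illustration.

```python
def dense_matrix_gib(n_individuals: int, bytes_per_value: int = 8) -> float:
    """Approximate size in GiB of a dense n x n matrix of numeric values."""
    return n_individuals ** 2 * bytes_per_value / 2 ** 30


# Compare the trial run with the full data set.
for n in (20_000, 1_700_000):
    print(f"{n:>9,} individuals -> {dense_matrix_gib(n):,.1f} GiB")
```

Under those assumptions the 20000-record run is about 3 GiB, while 1.7M records comes to roughly 21 TiB, because the size grows with the square of the number of individuals (85 times the records, about 7,225 times the cells).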
You might consider, if practical, adding a generation or similar class variable to subset the data a bit.
If you have repeats of the same mother and father pair, reduce them to a single record. Your current code is basically assuming everyone is in the same generation, so repeated family members in the same generation don't add much information while adding many more columns. Those individuals would all have the same relationship to anyone outside the family.