Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How to predict date of birth using First name in SAS? Please help Thank you

Frequent Contributor
Posts: 95

Hi All

The date of birth field in our current database is missing for 25% of records. I would like to predict the missing values using first name, etc. I heard it gives a good prediction?

Could anyone help with this? How is it done in SAS? What modelling technique should we use here?

Your help would be much appreciated

Many Thanks

Super User
Posts: 7,403

Re: How to predict date of birth using First name in SAS? Please help Thank you

I have never heard of such a thing. There is an R package for guessing gender from a name, but I couldn't find anything on age, and I don't see how it would work anyway. If you must have an age, why not assign a random age within certain group ranges, or infer it from other data? For example, they had an "xyz procedure" on a given date, so they must have been over 18 at that point; or they got a credit card on a given date, which indicates they were at least 18 then.
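To make the "infer it from other data" idea concrete, here is a minimal sketch in Python (not SAS, and not from anyone's production code). It assumes each record carries a list of dated events with a minimum age attached; the function names and the event list are hypothetical. All it does is turn those events into a latest possible date of birth.

```python
from datetime import date

def latest_birth_date(event_date, min_age):
    """Latest date of birth for someone aged at least `min_age` on `event_date`."""
    try:
        return event_date.replace(year=event_date.year - min_age)
    except ValueError:
        # The event fell on 29 Feb and the target year is not a leap year.
        return event_date.replace(year=event_date.year - min_age, day=28)

def birth_date_upper_bound(events):
    """Tightest upper bound implied by (event_date, min_age) pairs, or None."""
    bounds = [latest_birth_date(d, age) for d, age in events]
    return min(bounds) if bounds else None

# Hypothetical record: a credit card opened in 2015 (age 18+)
# and a driving licence issued in 2010 (assumed age 16+).
events = [(date(2015, 6, 30), 18), (date(2010, 1, 15), 16)]
print(birth_date_upper_bound(events))
```

This gives a bound rather than a prediction, which is often all the downstream rule (for example "is this customer an adult?") really needs.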

SAS Employee
Posts: 106

Re: How to predict date of birth using First name in SAS? Please help Thank you

Are you using Enterprise Miner? If so, you can use the Impute node. Choose the Tree option:

    • Tree — Use the Tree setting to replace missing interval variable values with replacement values that are estimated by analyzing each input as a target. The remaining input and rejected variables are used as predictors. Use the Variables window to edit the status of the input variables. Variables that have a model role of target cannot be used to impute the data. Because the imputed value for each input variable is based on the other input variables, this imputation technique may be more accurate than simply using the variable mean or median to replace the missing tree values.          

Hope this helps!

Ray
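For readers without Enterprise Miner, the idea behind tree imputation can be sketched in a few lines of Python. This is a deliberately tiny stand-in, not what the Impute node does internally: it fits a one-split regression tree (a "stump") on the complete cases and fills each missing value with the mean of the leaf the record falls into. The predictor (year the account was opened) and the sample rows are made up for illustration.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1:]

def impute(rows, stump):
    """Replace a missing target (None) with the mean of the matching leaf."""
    threshold, lm, rm = stump
    return [(x, (lm if x < threshold else rm) if y is None else y) for x, y in rows]

# Made-up rows: (year account opened, birth year or None if missing).
rows = [(1990, 1965), (1992, 1968), (1995, None),
        (2010, 1988), (2012, 1990), (2011, None)]
complete = [(x, y) for x, y in rows if y is not None]
stump = fit_stump([x for x, _ in complete], [y for _, y in complete])
print(impute(rows, stump))
```

A real tree would use many inputs and several splits, but the mechanics are the same: the imputed value is only as good as the relationship between the inputs and the variable being filled in.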

Trusted Advisor
Posts: 1,615

Re: How to predict date of birth using First name in SAS? Please help Thank you

One of the frustrations of a question like this is that people will suggest methods to do this without stopping to mention that the idea of predicting a birth date based on first name seems to make no sense at all.

For any prediction method to work, there must be some sort of "correlation" between the input and the output. Maybe there is some correlation that I don't know about, but for now I would advise the original poster not to do this at all. The original poster did say "I would like to predict the missing using first name etc. I heard it gives a good prediction?" but unless you can give us a reference, I think you're wasting your time.
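One way to check for such a "correlation" before building anything is to measure, on the 75% of records where date of birth is present, how much of the variance in birth year is explained by first name. The sketch below (Python, with made-up data and a hypothetical function name) computes the correlation ratio, also known as eta squared: between-name variance divided by total variance.

```python
from collections import defaultdict
from statistics import mean

def correlation_ratio(names, years):
    """Share of the variance in `years` explained by grouping on `names` (0 to 1)."""
    groups = defaultdict(list)
    for name, year in zip(names, years):
        groups[name].append(year)
    grand_mean = mean(years)
    total = sum((y - grand_mean) ** 2 for y in years)
    if total == 0:
        return 0.0  # every birth year is identical, nothing to explain
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    return between / total

# Made-up example: two names whose bearers cluster in different decades.
names = ["Doris", "Doris", "Kayla", "Kayla"]
years = [1950, 1950, 1990, 1990]
print(correlation_ratio(names, years))
```

Be careful with rare names: a name that appears once "explains" its own birth year perfectly, so this number is inflated unless it is computed on held-out records or restricted to names with a reasonable count.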

Contributor dkb
Contributor
Posts: 53

Re: How to predict date of birth using First name in SAS? Please help Thank you

There's a Wolfram Alpha topic that's relevant: https://www.wolframalpha.com/input/?i=name+William&lk=3 . The "Estimated Current Age Distribution" plot is enlightening - it bears out PaigeMiller's advice - the slight correlation that's visible is far too loose to justify using this idea.

Super User
Posts: 10,500

Re: How to predict date of birth using First name in SAS? Please help Thank you

An added complication is that, at least in many areas of the United States, parents have been attempting to give children "unique" names. Below is a selected list of girls' names from a recent 10-year period. Careful reading will show that many of these names are roughly phonetically equivalent to relatively common names, Aeryka <=> Erica for example.

Acacia,Chili,Indigodawn,Mem' Ree,Secret-Destiny

Aeryka,Cinamin,Indyana,Memphis,Serendipity

Alaska,Clarity,Infiniti,Mesa,Shasta

Alastrionna,Clarixxa,Innocence,Miami,Shasta-Rain

Allyvia,Cloketta,Integrity,Mishalyn,Sha'uri

Alpine,Creedance,Isabow,Modiesty-Star,Shy

Ambrosia,Crimson,Itali,Monet,Sicily

Americus,Cymmetry,Ixzy,Montana,Silver

Amnesty,Cynica,Izrayelle,Mysticque,Sincere

Anakalia,Daiquiri,Jaazminh,Nature,Snow

Angelic,Daytona,Jetta,Nautica,Sonrisa

Aptisam,Dayzee,Jewleah,Navy,Soul

Aquilla,Dazsha,Jorja,Nirvana,Sparrow

Arbor,Denym,Jubilee,Normandie,Starlit

Arlington,Diamonique,Juniper,Northstar,Sublym

Ataree,Diligence,Jupiter,Noxx,Supernova

August Star,Divinity,Jurnee,Nutaliay Harmoney,Surreal

Auktober,Dorcas,Jynnjer,Octayvia,Symphony

Auroaramackay,Draekli,Kahlua,Olive,Tacheranai

Autumn Hunnie,Dream,Kalispell,Otila,Tanaquil

Autym,Dublin,Kanyon,Oyuky,Tehyanatane

Aveda,Dymand,Karizma,Pallas,Teighlor

Beatriz,Eboneyrose,Kaskade,Pandora,Tennessee

Beautifull,Ecstacy,Kazpyr,Payshence,Thyme

Belphoebe,Eeleceya,Kezzi,Peaches,Tottie

Berlyn,Elexious,Khlover,Pennilane Meadow,Tragen

Berthaalicia,England,Kiffin,Pepper,Tricity

Bicardi,Envy,Klowie,Perfect,Trulie

Blayde,Eos,Kozmo,Persephone,Trynadee

Blessin,Epiphany,Krickette,Phaedra,Tsunami

Blyss,Essence,Kronic,Poet,Tuesday-Rain

Boisen,Eternytie,Krymsun,Poppy,Tundra

Braenwynne,Fall,Kwincee,Prairie,Tyranny

Breeze,Fancee,Lala,Pranaleyadri,Ugonna

Bristol,Fashion,Lavender,Pranathi,Uneike

Brittanica,Fayble,Lectra,Promise,Utahnna

Brixx,Fayte,Lexington,Qatira,Vegas

Brizzbin,Fennel,Libbertie,Quietstorm,Velicydee

Brookenzie,Flossianna,Libertyann,Quimby,Viktoriya

Burgandee,Freedom,Licet,Rainger,Wajd

Byainett,Goldie-Moon,Little Summers,Ravenbella,Wrandie

Byrkli,Graceland,Lixy,Rebel-Ann,Wyntre

Cabella,Gyzzelle,London,Remedy,Xerenity

Cachet,Hadies,Lotus,Remmington,Yafa

California,Hailo,Love,Remzije,Ynfiniti

Calloway,Happy,Lux,Reverie,Yochabelle

Calypso,Harlequin,Lybburtie,Rhuivnyin,Zepplyn

Capreece,Heaven,Magnolia,Russia,Zipaya

Cascade,Hella,Maitre,Saig,Zipporah

Cassiopeia,Hermyanie,Mali,Saylor,Zoigh

Catalina,Hero,Malybu,Sã-kõ-yã,Zuzu

Cedar,Heziachiah,Manhattan,Saoirse,

Cedee,Holland,Maplejo,Sapphire,

Celestial,Honesty,Mavity,Sativalyn,

Celtic,Icelynn,Mayte,Season,

Charm,Immaculata,McCall,Seattle,

SAS Employee
Posts: 24

Re: How to predict date of birth using First name in SAS? Please help Thank you

Ray,

That is a great suggestion and a well-founded, scalable, and contemporary method for addressing missing values in a predictive model. The idea is that a decision tree will use patterns detected from *all* the variables - which may not be obvious to us, e.g. 2-way correlations - to predict the missing value for each observation.

Several other best practices for handling missing values include:

1. Simply leaving the missing values in the data and using a decision tree or an ensemble of decision trees (i.e. random forest and/or gradient boosting) as your final predictive model.

Decision trees handle missing values in at least two different ways:

--- In training they can group missing values in bins by themselves or along with other values of a variable, and use missing values to build the predictive model.

--- Surrogate rules: decision trees can use a variable like "State" to make a decision about a variable like "ZipCode" if it encounters a missing value for "ZipCode".

2. Impute the missing values however you like but retain a binary missing value indicator variable, so that missingness can be used to help make your final predictions.

Hope that helps.
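Point 2 above can be sketched as follows. This is a minimal Python illustration, not SAS code; median imputation is just one arbitrary choice of fill method, and the function name is hypothetical. The part that matters is the extra 0/1 column recording which values were originally missing.

```python
from statistics import median

def impute_with_indicator(values):
    """Return (filled_value, was_missing) pairs; None marks a missing input."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = median(observed)
    return [(fill if v is None else v, 1 if v is None else 0) for v in values]

# Birth years with one missing entry.
print(impute_with_indicator([1970, None, 1980, 1990]))
# [(1970, 0), (1980, 1), (1980, 0), (1990, 0)]
```

Both columns then go into the final model, so it can learn whether records with a missing date of birth behave differently from the rest.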

SAS Employee
Posts: 340

Re: How to predict date of birth using First name in SAS? Please help Thank you

You could use databases like this:

Years with Popular Baby Names Data | BabyCenter
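If you do get hold of per-year counts for a name from a source like that, a quick way to see how informative the name is would be to summarize the implied birth-year distribution. The Python sketch below assumes a simple mapping from year to number of babies given the name; the counts are invented for illustration and it ignores mortality and migration, so treat it as a rough check only.

```python
from math import sqrt

def birth_year_summary(counts_by_year):
    """Count-weighted mean and standard deviation of birth year for one name."""
    total = sum(counts_by_year.values())
    if total <= 0:
        raise ValueError("no births recorded for this name")
    mean_year = sum(y * c for y, c in counts_by_year.items()) / total
    variance = sum(c * (y - mean_year) ** 2 for y, c in counts_by_year.items()) / total
    return mean_year, sqrt(variance)

# Invented counts for a hypothetical name, by year of birth.
toy_counts = {1950: 1, 1990: 1}
print(birth_year_summary(toy_counts))  # (1970.0, 20.0)
```

A standard deviation of a decade or more would mean the name narrows the birth year very little, which is consistent with the doubts raised earlier in the thread.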
