
Conquer the Dating Scene with SAS

Started ‎04-03-2020 by
Modified ‎08-03-2021 by
Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

 

One of the few lighter moments of the current public health crisis occurred during one of the UK government’s daily coronavirus press conferences. One prominent journalist, Tom Newton-Dunn, asked about the lockdown and how it would affect two people who were boyfriend and girlfriend. He wanted to know if they could meet and be affectionate. The question was fielded by Dr Jenny Harries, the Deputy Chief Medical Officer. Her reply raised a few eyebrows and a few smiles. She said that ideally they should stay in their separate households but that they may wish to “test the strength of their relationship” by moving in together. This question of how the coronavirus crisis has affected the dating scene is interesting and will have long-term implications for many relationships.

 

In this edition of Free Data Friday we will be looking at data from Columbia University to see if we can find which attributes can improve our chances of success on the dating scene when life returns to normal. Between 2002 and 2004 two professors ran a series of speed dating events for students and recorded data about the students, how much weight they put on individual attributes of potential dating partners, and their decision on the night about those partners. Data Scientist Dr Keith McNulty has created a simplified version of the data, and that is the data set we will use.

 

Get the Data

 

The original data, along with a file containing metadata about the fields, can be downloaded here. However, as mentioned, we will be using a simplified version, which can be downloaded from Dr McNulty's GitHub repository. Dr McNulty has also performed an analysis of the data on his blog, but the code is in R, a language I have no experience with, so I was keen to see how easy it would be to perform a similar analysis with SAS. In particular, I wanted to see which attributes were most valued by participants and would be most likely to get you picked for a date.

 


Getting the Data Ready

 

As always, the first thing we want to do is import the data into SAS. I attempted to import the data using Proc Import and immediately hit a problem: the field holding income data contained some NA values, and these caused errors. I needed to edit the file to replace the NAs with blanks, which SAS would convert to missing values. To do this I used a technique borrowed from a SAS Community solution which you can find here.

 

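data _null_;
	/* dsd: comma-delimited, consecutive commas read as missing; truncover: do not read past a short line */
	infile "/folders/myshortcuts/Dropbox/speed_data_data.csv" dsd truncover;
	file "/folders/myshortcuts/Dropbox/speed_data_clean.csv" dsd;
	length word $200;
	
	/* copy the 15 fields on each line, blanking any field that contains NA */
	do i=1 to 15;
		input word @;	/* trailing @ holds the input line for the next field */
		if word="NA" then word=" ";
		put word @;	/* trailing @ holds the output line */
	end;
	put;	/* release the completed output line */
run;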

You can find an explanation of how the code works in the link.

 

Now that I have a "clean" CSV, I can run Proc Import to get the data into SAS:

 

filename reffile '/folders/myshortcuts/Dropbox/speed_data_clean.csv';

proc import datafile=reffile
	dbms=csv
	out=speed_dating;
	getnames=yes;
	guessingrows=4000;
run;

 

You will notice that I've used quite a large value for guessingrows - this is a habit of mine as I find it helps to avoid issues with fields which may be of mixed data types.
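Before modelling, it may be worth confirming that the import behaved as intended. The following is a minimal sketch, assuming only the speed_dating data set created above. Proc Contents lists each variable's type, and Proc Means with the N and NMISS statistics counts non-missing and missing values for every numeric variable, so the blanked-out NA values should appear as missing values in the income field.

```sas
/* List variables in file order with their types, to confirm that
   numeric fields were not imported as character */
proc contents data=speed_dating varnum;
run;

/* Count non-missing (N) and missing (NMISS) values for all numeric variables */
proc means data=speed_dating n nmiss;
run;
```

If the income field shows up as character in the Proc Contents listing, the NA replacement probably did not take effect and the clean-up step is worth revisiting.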

 

The Results

 

In order to carry out the analysis I'm going to use Proc Logistic to fit a simple binary logistic regression model.

 

proc logistic data=speed_dating;
	model dec (event='1') = attr sinc fun intel like met prob amb shar;	
run;

The model lists the variables to be used - dec is the outcome we want to investigate (the decision on whether the potential partner was a match) and the other variables are the scores on a scale of 1 to 10 of various characteristics ranging from attractiveness to ambition and shared interests. Notice that event='1' is expressed as a character value even though dec is a numeric variable.

 

There is quite a bit of information in the output so we'll just confine ourselves to looking at the analysis of maximum likelihood estimates.

 

Results1.png

 

If we look at the Estimate column we can see that, apart from the overall score (the 'like' variable), the most important attribute appears to be attractiveness. Strangely, ambition ('amb') and sincerity ('sinc') both decreased the likelihood of a match.
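To focus on that one table, ODS SELECT can restrict the output to the maximum likelihood estimates, whose ODS table name in Proc Logistic is ParameterEstimates. The sketch below also adds the STB option, which is not part of the original analysis. It requests standardized estimates, which may be a fairer basis for ranking attributes when the scores have different spreads.

```sas
/* Show only the Analysis of Maximum Likelihood Estimates table */
ods select ParameterEstimates;

proc logistic data=speed_dating;
	/* stb adds a column of standardized estimates to the table */
	model dec (event='1') = attr sinc fun intel like met prob amb shar / stb;
run;
```

The raw estimates are unchanged by STB; it only adds an extra column to compare against.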

 

This is promising, but I wanted to see if there were any differences between men and women, so I ran the model again for each gender. First, the women:

 

proc logistic data=speed_dating(where=(gender=0));
	model dec (event='1') = attr sinc fun intel like met prob amb shar;	
run;

Results2.png

 

Then the men:

 

proc logistic data=speed_dating(where=(gender=1));
	model dec (event='1') = attr sinc fun intel like met prob amb shar;	
run;

Results3.png

 

We can see that, true to stereotype, men look more for attractiveness in a partner than women do (an estimate of 0.5331 compared to 0.3164). Women look for intelligence and fun more than men do.
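Logistic regression estimates are on the log-odds scale, so exponentiating them gives odds ratios, which can be easier to interpret. Here is a minimal sketch using the two attractiveness estimates quoted above; the variable names or_men and or_women are purely illustrative.

```sas
data _null_;
	/* attractiveness estimates quoted above: men 0.5331, women 0.3164 */
	or_men   = exp(0.5331);
	or_women = exp(0.3164);

	/* write the odds ratios to the log with three decimal places */
	put "Odds ratio, men: "   or_men   6.3;
	put "Odds ratio, women: " or_women 6.3;
run;
```

This gives roughly 1.70 for men and 1.37 for women. In other words, each extra point on the attractiveness score multiplies the odds of a "yes" by about 1.7 for men and about 1.4 for women, holding the other scores fixed. Proc Logistic reports the same quantities in its Odds Ratio Estimates table.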

 

We can run this model for other categories, e.g. age, income and career, if we want to refine which attributes we would need to stress in order to improve our chances of a match.
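One way to do this without repeating the procedure for every subset is BY-group processing. The sketch below uses gender, since that is the grouping already shown above. Another categorical variable could be substituted in both BY statements, while a continuous one such as age would probably need to be banded into groups first. The output data set name speed_sorted is an arbitrary choice.

```sas
/* BY-group processing requires the data to be sorted by the grouping variable */
proc sort data=speed_dating out=speed_sorted;
	by gender;
run;

/* Fit the same model separately for each value of gender */
proc logistic data=speed_sorted;
	by gender;
	model dec (event='1') = attr sinc fun intel like met prob amb shar;
run;
```

This produces one set of results per BY group, equivalent to the separate where= runs shown earlier.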

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit this link to see all the Free Data Friday articles.

 

 

 
