
Examining Age Gaps in Hollywood Romances with SAS

Started ‎05-17-2019 by
Modified ‎08-04-2021 by
Views 1,811

SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

 

It’s often said that female actors in Hollywood have a harder time getting parts in films once they pass a certain age than their male contemporaries and that this is particularly true when the part involves a romantic relationship.

 

Lynn Fisher's website hollywoodagegap.com has some interesting graphics showing the relative ages of actors whose film characters are romantically involved. You can also download the data used to create these graphics from her GitHub repository for your own analysis. In this article we will look at this data to see not only whether the claim is true, but also whether there are any other interesting patterns.

 

Get the Data

 

You can download the data from the GitHub repository as a CSV file. I renamed the downloaded file to make it clearer what data the file held.

 


 

Get the Data Ready

 

Firstly, I imported the data using Proc Import and then used Proc Print and Proc Contents to examine it.

 

 

/* Point a fileref at the downloaded CSV; the quoted path must sit on one line */
filename csv
	"/folders/myshortcuts/Dropbox/Articles/SAS Communities Library/Hollywood Age Gaps/agegaps.csv"
	termstr=LF;


proc import datafile=csv
		    out=moviedata
		    dbms=CSV
		    replace;
run;


filename csv;

proc print data=moviedata(obs=10);
run;

proc contents data=moviedata order=varnum;
run;

 

I discovered a few issues with the data and the variable format:

 

  1. Although actor_1_gender is usually male, that is not always the case (e.g. the movie Love Actually has Laura Linney as actor_1 and Rodrigo Santoro as actor_2).
  2. While actor_1_age is usually greater than actor_2_age, that is not always true either.
  3. The variables actor_1_age, actor_2_age and age_difference were imported as character variables when they really should be numeric, so I’ll change that.
  4. There are some same-sex relationships in the file, which I will have to handle separately.
  5. For convenience, I wanted a flag variable showing whether the older actor was male.
  6. Finally, I wanted to divide the observations into age bands to remove a lot of the noise that would otherwise appear in the data, so I’ll create a custom format to help with that.
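
The first two issues and the same-sex pairings can be confirmed with a couple of quick queries. This is only a sketch, not part of the original program: it assumes the MOVIEDATA table exactly as imported above, with the age columns still stored as character values (hence the INPUT calls), and the column alias actor_1_younger is a name made up for this illustration.

```sas
/* Sketch only: assumes MOVIEDATA as imported above, with the    */
/* age columns still stored as character values.                 */

/* Issues 1 and 4: cross-tabulate the two gender columns.        */
proc freq data=moviedata;
	tables actor_1_gender*actor_2_gender / norow nocol nopercent;
run;

/* Issue 2: count pairings where actor 1 is the younger actor.   */
proc sql;
	select count(*) as actor_1_younger
	from moviedata
	where input(actor_1_age, 8.) < input(actor_2_age, 8.);
quit;
```

Cells on the diagonal of the cross-tabulation are the same-sex pairings. Any count in the cell where actor_1_gender is not "man" but actor_2_gender is shows that actor 1 is not always male.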

 

Here’s the code which accomplishes all this and splits the file into three files according to whether the pairing is different sex, same sex or is between two actors of identical age.

 

 

proc format;
	value  age_band
		Low-25='25 and under'
		26-35='26-35'
		36-45='36-45'
		46-55='46-55'
		56-65='56-65'
		66-75='66-75'
		76-high='76 plus';
run;

data moviedata;
	set moviedata(drop=actor_1_birthdate actor_2_birthdate director release_year);
	
	actor_1_age_new = input(actor_1_age, 8.);
	drop actor_1_age;
	rename actor_1_age_new=actor_1_age;
   
	actor_2_age_new = input(actor_2_age, 8.);
	drop actor_2_age;
	rename actor_2_age_new=actor_2_age;
	
	age_difference_new = input(age_difference, 8.);
	drop age_difference;
	rename age_difference_new=age_difference;
run;

data diff_sex same_sex same_age(drop=male_age female_age man_older);

	set moviedata;

	if actor_1_gender=actor_2_gender then output same_sex;
	else do;
		if actor_1_gender="man" then do;
			male_age=actor_1_age;
			female_age=actor_2_age;
		end;
		else do;
			male_age=actor_2_age;
			female_age=actor_1_age;
		end;
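		/* Equal-age pairings go to same_age and, because there is no ELSE,
		   are also written to diff_sex below with man_older=1. */
		if male_age=female_age then output same_age;
		if male_age>=female_age
			then man_older=1;
			else man_older=0;
		output diff_sex;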
	end;
run;
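
Before relying on the age_band format for grouping, it may be worth checking how it treats ages at the band boundaries. The step below is a minimal sketch that assumes the Proc Format step above has already been run in the same session. The ages in the DO list are made-up test values, not rows from the file.

```sas
/* Sketch only: the ages listed here are invented test values.   */
data _null_;
	do age = 24, 25, 26, 45, 46, 80;
		band = put(age, age_band.);   /* apply the custom format */
		put age= band=;               /* write the result to the log */
	end;
run;
```

The log should show 25 falling in '25 and under' and 26 in '26-35'. Because the ranges are written with whole-number end points, a non-integer age such as 25.5 would match no range, so the format is only safe for data like this where ages are whole years.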

 

 

The Results

 

Having reshaped the data into a form suitable for my analysis, I then used Proc SQL to create summary files holding the number of pairings and the average age difference between the older and younger actor. There is one file for pairings where the male actor is older, one where the female actor is older and one for same-sex relationships, each grouped by the age band of the older actor using the custom format age_band which I created earlier.

 

 

proc sql;
	create table diffstats_m as
	select distinct put(male_age,age_band.) as age_band,
	count(actor_1_name) as num_pairings,
	avg(age_difference) as actual_diff
	from diff_sex
	where man_older=1
	group by put(male_age,age_band.);
quit;

proc sql;
	create table diffstats_f as
	select distinct put(female_age,age_band.) as age_band,
	count(actor_1_name) as num_pairings,
	avg(age_difference) as actual_diff
	from diff_sex
	where man_older=0
	group by put(female_age,age_band.);
quit;

data same_sex;
	set same_sex;
	older_actor=largest(1,actor_1_age,actor_2_age);
	younger_actor=smallest(1,actor_2_age,actor_1_age);
run;

proc sql;
	create table diffstats_s as
	select distinct put(older_actor,age_band.) as age_band,
	count(actor_1_name) as num_pairings,
	avg(age_difference) as actual_diff
	from same_sex
	group by put(older_actor,age_band.);
quit;
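
As an aside, the same kind of summary can be produced without Proc SQL. The sketch below is an alternative, not the method used in this article. It uses Proc Means with a FORMAT statement so that the CLASS variable is grouped by age band rather than by single year of age. The output name DIFFSTATS_M_ALT is a hypothetical name chosen for this illustration.

```sas
/* Sketch only: an alternative way to build the male-older summary. */
/* DIFFSTATS_M_ALT is a made-up name used for this illustration.    */
proc means data=diff_sex nway noprint;
	where man_older=1;
	class male_age;
	format male_age age_band.;   /* group by band, not single year */
	var age_difference;
	output out=diffstats_m_alt(drop=_type_ _freq_)
		n=num_pairings
		mean=actual_diff;
run;
```

Two differences from the Proc SQL version are worth noting. Here num_pairings counts non-missing values of age_difference rather than of actor_1_name. The grouping column also stays as a formatted numeric male_age instead of a character age_band, so it could not be merged by age_band with the other summary files without further work.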

 

Having done that, I then merged the three files and used Proc SGPlot to create two graphs.

 

The first graph is a line chart which has three series.

 

 

data all_stats;
	merge diffstats_m(rename=(actual_diff=male_diff num_pairings=pairings_m))
		diffstats_f(rename=(actual_diff=female_diff num_pairings=pairings_f))
		diffstats_s(rename=(actual_diff=same_diff num_pairings=pairings_s));
	by age_band;
run;

title 'Average Age Differences in Hollywood Romances';
footnote j=l 'Data From: https://github.com/lynnandtonic/hollywood-age-gap';
proc sgplot data=all_stats;
	series x=age_band y=male_diff /smoothconnect lineattrs=(thickness=3
		pattern=SOLID) legendlabel='Avg Age Difference when Male Actor is Older';
	series x=age_band y=female_diff /smoothconnect lineattrs=(thickness=3
		pattern=SHORTDASH)
		legendlabel='Avg Age Difference when Female Actor is Older';
	series x=age_band y=same_diff /smoothconnect lineattrs=(thickness=3
		pattern=LONGDASH) legendlabel='Avg Age Difference in Same Sex Relationship';
	yaxis grid values=(0 to 60 by 2) valueshint label='Age Difference (Years)';
	xaxis label='Age Band of Older Actor';
run;

 

Here is the output of that first Proc SGPlot:

 

SGPlot Chart1 Age Gaps.png

 

 

From the chart, I can see three things:

 

  1. Whether the older actor is male or female, the average age gap to their on-screen partner rises as they move up through the age bands.
  2. This rise starts much earlier where the male actor is the older of the two. Once a male actor reaches his mid-thirties, he starts to be paired with much younger female actors. This isn’t true for female actors until they reach the 46-55 age band.
  3. Interestingly, although the sub-sample is fairly small, this pattern also seems to hold for on-screen same-sex relationships. The rise is steadier, but it follows a similar pattern to different-sex relationships, with older actors paired with much younger ones.
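
Since some of these averages rest on small sub-samples, it can help to read them next to the number of pairings behind each one. The step below is a minimal sketch that assumes the ALL_STATS table created by the merge above. The 5.1 display format is simply a choice made for this illustration.

```sas
/* Sketch only: list each average beside its number of pairings. */
proc print data=all_stats noobs;
	var age_band pairings_m male_diff
		pairings_f female_diff
		pairings_s same_diff;
	format male_diff female_diff same_diff 5.1;
run;
```

An age band with only a handful of pairings in a category deserves less weight when reading the line chart.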

The second graph is a bar chart showing the number of relationships by age band for each category. Notice how easy it was to add tooltips to the chart using the tip= and tiplabel= options.

 

 

ods graphics /imagemap=on;
title 'Number of Older Actor Relationships by Age Band/Sex';
footnote j=l 'Data From: https://github.com/lynnandtonic/hollywood-age-gap';
proc sgplot data=all_stats;
	vbar age_band /response=pairings_m dataskin=pressed
		legendlabel='Number of Relationships with Older Male Actors'
		tip=(pairings_m) tiplabel=('No of Pairings');
	vbar age_band /response=pairings_f dataskin=pressed
		legendlabel='Number of Relationships with Older Female Actors'
		tip=(pairings_f) tiplabel=('No of Pairings');
	vbar age_band /response=pairings_s dataskin=pressed
		legendlabel='Number of Same Sex Relationships'
		tip=(pairings_s) tiplabel=('No of Pairings');
	yaxis grid values=(0 to 400 by 50) valueshint
		label='Total Number of Pairings';
	xaxis label='Age Band of Older Actor';
run;

 

 

Here is the output of that second Proc SGPlot:

 

SGPlot Chart2 Age Gaps.png

 

We can see that the majority of pairings occur in the 36-45 age band, but in every band older male actors far outnumber older female actors.

 

In conclusion, it seems that the complaint from older female actors about the difficulty of getting romantic lead parts is justified, but when they do get the parts then, like the men, they are often paired with much younger actors. Perhaps most surprising are the same-sex relationship figures. It may be that, after all, Hollywood just loves its May to December romances.

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit this link to see all the Free Data Friday articles.

 

 
