SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
It’s often said that, once they pass a certain age, female actors in Hollywood have a harder time getting film roles than their male contemporaries, and that this is particularly true when the role involves a romantic relationship.
Lynn Fisher of the website hollywoodagegap.com has some interesting graphics showing the relative ages of actors whose characters in films are romantically involved. You can also download the data used to create these graphics from her GitHub repository and use it in your own analysis. In this article we will look at this data to see not only whether this claim is true, but also whether there are any other interesting patterns in the data.
You can download the data from the GitHub repository as a CSV file. I renamed the downloaded file agegaps.csv to make it clearer what data it held.
Firstly, I imported the data using Proc Import and then used Proc Print and Proc Contents to examine it.
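```sas
/* termstr=LF reads a file with Unix-style (LF-only) line endings */
filename csv "/folders/myshortcuts/Dropbox/Articles/ SAS Communities Library/Hollywood Age Gaps/agegaps.csv" termstr=LF;

proc import datafile=csv out=moviedata dbms=CSV replace;
run;

/* Clear the fileref */
filename csv;

/* Look at the first 10 rows, then the variables in file order */
proc print data=moviedata(obs=10);
run;

proc contents data=moviedata order=varnum;
run;
```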
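I discovered a few issues with the data and its variable formats: the two actor ages and the age difference had been imported as character rather than numeric variables; several variables (the two birth dates, the director and the release year) weren't needed for this analysis; and, because actor 1 is not always the man, I needed to work out which actor in each different-sex pairing was male.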
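Here’s the code that accomplishes all this. It also splits the data into three data sets: different-sex pairings (diff_sex), same-sex pairings (same_sex), and different-sex pairings where both actors are the same age (same_age). Note that the equal-age pairings are also kept in diff_sex, where they are counted as the man being older.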
```sas
/* Age bands used to group the older actor's age */
proc format;
  value age_band
    Low-25  = '25 and under'
    26-35   = '26-35'
    36-45   = '36-45'
    46-55   = '46-55'
    56-65   = '56-65'
    66-75   = '66-75'
    76-high = '76 plus';
run;

/* Drop unneeded columns and convert the character ages to numeric.  */
/* DROP uses the old names and RENAME gives the numeric copies them. */
data moviedata;
  set moviedata(drop=actor_1_birthdate actor_2_birthdate director release_year);
  actor_1_age_new = input(actor_1_age, 8.);
  drop actor_1_age;
  rename actor_1_age_new=actor_1_age;
  actor_2_age_new = input(actor_2_age, 8.);
  drop actor_2_age;
  rename actor_2_age_new=actor_2_age;
  age_difference_new = input(age_difference, 8.);
  drop age_difference;
  rename age_difference_new=age_difference;
run;

/* Split into different-sex, same-sex and equal-age pairings */
data diff_sex same_sex same_age(drop=male_age female_age man_older);
  set moviedata;
  if actor_1_gender=actor_2_gender then output same_sex;
  else do;
    /* Work out which actor is the man and which the woman */
    if actor_1_gender="man" then do;
      male_age=actor_1_age;
      female_age=actor_2_age;
    end;
    else do;
      male_age=actor_2_age;
      female_age=actor_1_age;
    end;
    /* Equal-age pairings go to same_age AND stay in diff_sex */
    if male_age=female_age then output same_age;
    /* Ties are counted as the man being older */
    if male_age>=female_age then man_older=1;
    else man_older=0;
    output diff_sex;
  end;
run;
```
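To see how the split behaves on each kind of pairing, here is a minimal sketch that runs the same IF/ELSE logic on three made-up rows. The data set names (demo_pairs, demo_diff, demo_same, demo_sameage) and all the values are hypothetical, not taken from the real data. It shows that an equal-age, different-sex pairing lands in both demo_sameage and demo_diff, and that it is flagged with man_older=1.

```sas
/* Three hypothetical pairings: woman older, same sex, equal ages */
data demo_pairs;
  input actor_1_gender $ actor_1_age actor_2_gender $ actor_2_age;
  datalines;
woman 40 man 30
man 35 man 50
man 33 woman 33
;
run;

/* Same splitting logic as the article's DATA step */
data demo_diff demo_same demo_sameage(drop=male_age female_age man_older);
  set demo_pairs;
  if actor_1_gender = actor_2_gender then output demo_same;
  else do;
    /* Work out which actor is the man */
    if actor_1_gender = "man" then do;
      male_age = actor_1_age;
      female_age = actor_2_age;
    end;
    else do;
      male_age = actor_2_age;
      female_age = actor_1_age;
    end;
    if male_age = female_age then output demo_sameage;
    /* A comparison evaluates to 1 or 0, so ties give man_older=1 */
    man_older = (male_age >= female_age);
    output demo_diff;
  end;
run;

proc print data=demo_diff;
run;
```

Writing man_older = (male_age >= female_age) is just a compact equivalent of the article's IF/ELSE pair, since a SAS comparison returns 1 when true and 0 when false.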
Having reshaped the data into a form suitable for my analysis, I then used Proc SQL to create three summary tables, each holding the number of pairings and the average age difference between the older and younger actor: one for pairings where the male actor is older, one where the female actor is older, and one for same-sex relationships. Each table is grouped by the age band of the older actor, using the custom age_band format I created earlier.
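```sas
/* Pairings where the man is older, grouped by his age band.     */
/* ORDER BY makes explicit the sort order that the later         */
/* MERGE ... BY age_band relies on.                              */
proc sql;
  create table diffstats_m as
    select distinct put(male_age, age_band.) as age_band,
           count(actor_1_name) as num_pairings,
           avg(age_difference) as actual_diff
      from diff_sex
      where man_older=1
      group by put(male_age, age_band.)
      order by age_band;
quit;

/* Pairings where the woman is older, grouped by her age band */
proc sql;
  create table diffstats_f as
    select distinct put(female_age, age_band.) as age_band,
           count(actor_1_name) as num_pairings,
           avg(age_difference) as actual_diff
      from diff_sex
      where man_older=0
      group by put(female_age, age_band.)
      order by age_band;
quit;

/* For same-sex pairings, find the older and younger actor's ages */
data same_sex;
  set same_sex;
  older_actor=largest(1, actor_1_age, actor_2_age);
  younger_actor=smallest(1, actor_2_age, actor_1_age);
run;

/* Same-sex pairings, grouped by the older actor's age band */
proc sql;
  create table diffstats_s as
    select distinct put(older_actor, age_band.) as age_band,
           count(actor_1_name) as num_pairings,
           avg(age_difference) as actual_diff
      from same_sex
      group by put(older_actor, age_band.)
      order by age_band;
quit;
```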
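As an alternative to Proc SQL, Proc Means can group by formatted values directly: attaching the format with a FORMAT statement makes the CLASS variable group on its formatted value, so each age band becomes one output row. This is a minimal sketch on made-up data; the data set names demo_m and demo_stats and their values are hypothetical, and the format is copied from the article so the example runs on its own.

```sas
/* Same age_band format as in the article */
proc format;
  value age_band
    low-25  = '25 and under'
    26-35   = '26-35'
    36-45   = '36-45'
    46-55   = '46-55'
    56-65   = '56-65'
    66-75   = '66-75'
    76-high = '76 plus';
run;

/* Hypothetical pairings where the man is older */
data demo_m;
  input male_age age_difference;
  datalines;
28 4
34 10
40 12
44 6
;
run;

/* CLASS groups on the formatted value of male_age;         */
/* NWAY keeps only the per-band rows (no overall total row) */
proc means data=demo_m noprint nway;
  class male_age;
  format male_age age_band.;
  var age_difference;
  output out=demo_stats(drop=_type_ _freq_) n=num_pairings mean=actual_diff;
run;

proc print data=demo_stats;
run;
```

One difference from the Proc SQL version: the output keeps male_age itself (displayed through the format) rather than a character age_band column, so it would need a put(male_age, age_band.) step before it could be merged BY a character band.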
Having done that, I merged the three summary tables by age band and used Proc SGPlot to create two graphs.
The first graph is a line chart with three series, one for each type of pairing.
```sas
data all_stats;
  merge diffstats_m(rename=(actual_diff=male_diff num_pairings=pairings_m))
        diffstats_f(rename=(actual_diff=female_diff num_pairings=pairings_f))
        diffstats_s(rename=(actual_diff=same_diff num_pairings=pairings_s));
  by age_band;
run;

title 'Average Age Differences in Hollywood Romances';
footnote j=l 'Data From: https://github.com/lynnandtonic/hollywood-age-gap';

proc sgplot data=all_stats;
  series x=age_band y=male_diff / smoothconnect
         lineattrs=(thickness=3 pattern=SOLID)
         legendlabel='Avg Age Difference when Male Actor is Older';
  series x=age_band y=female_diff / smoothconnect
         lineattrs=(thickness=3 pattern=SHORTDASH)
         legendlabel='Avg Age Difference when Female Actor is Older';
  series x=age_band y=same_diff / smoothconnect
         lineattrs=(thickness=3 pattern=LONGDASH)
         legendlabel='Avg Age Difference in Same Sex Relationship';
  yaxis grid values=(0 to 60 by 2) valueshint label='Age Difference (Years)';
  xaxis label='Age Band of Older Actor';
run;
```
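The MERGE with BY age_band only works if each summary table is sorted by age_band. Because every label begins with a two-digit age that rises from band to band (25, 26, 36 and so on), the ordinary character sort happens to put the labels in age order. Here is a quick sketch to confirm this, using a hypothetical demo_bands data set that holds some of the labels in scrambled order.

```sas
/* Hypothetical age-band labels in scrambled order */
data demo_bands;
  input age_band $ 1-12;
  datalines;
76 plus
36-45
25 and under
56-65
26-35
;
run;

/* A plain character sort, as a BY statement expects */
proc sort data=demo_bands;
  by age_band;
run;

proc print data=demo_bands;
run;
```

After sorting, the labels come out as 25 and under, 26-35, 36-45, 56-65, 76 plus. If you ever add a band whose label breaks this pattern (for example one starting with a single-digit age), the character order would no longer match the age order.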
Here is the output of that first Proc SGPlot:
From the chart, I can see three things:
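The second graph is a bar chart showing the number of relationships by age band for each category. Notice how easy it is to add tooltips to the chart: the IMAGEMAP=ON option on the ODS GRAPHICS statement, together with the TIP= and TIPLABEL= options on each VBAR statement, is all it takes (the tooltips appear when the chart is viewed as HTML output).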
```sas
ods graphics / imagemap=on;

title 'Number of Older Actor Relationships by Age Band/Sex';
footnote j=l 'Data From: https://github.com/lynnandtonic/hollywood-age-gap';

proc sgplot data=all_stats;
  vbar age_band / response=pairings_m dataskin=pressed
       legendlabel='Number of Relationships with Older Male Actors'
       tip=(pairings_m) tiplabel=('No of Pairings');
  vbar age_band / response=pairings_f dataskin=pressed
       legendlabel='Number of Relationships with Older Female Actors'
       tip=(pairings_f) tiplabel=('No of Pairings');
  vbar age_band / response=pairings_s dataskin=pressed
       legendlabel='Number of Same Sex Relationships'
       tip=(pairings_s) tiplabel=('No of Pairings');
  yaxis grid values=(0 to 400 by 50) valueshint label='Total Number of Pairings';
  xaxis label='Age Band of Older Actor';
run;
```
Here is the output of that second Proc SGPlot:
We can see that the largest number of pairings occurs in the 36-45 age band, but in every band older male actors far outnumber older female actors.
In conclusion, then, it seems that the complaint from older female actors about the difficulty of getting romantic lead parts is justified, but when they do get the parts they, like the men, are often paired with much younger actors. Perhaps most surprising are the same-sex relationship figures. It may be that, after all, Hollywood just loves its May to December romances.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.