SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
Fifty years ago this week, Neil Armstrong and Buzz Aldrin stepped out of the Eagle lunar module to become the first humans to walk on any astronomical body other than Earth. At that time, the existence of any planets outside our solar system was mere conjecture and travel to them the stuff of science fiction. However, in the decades since that epic feat, more and more of these planets (known as exoplanets) have been discovered, until today we know of more than 4,000.
Naturally, one of the first questions asked about these planets was, “Could life exist on them?” Considering their great distance from us, we know quite a lot about these planets, but we still can’t say for sure that life exists on any of them. We do know that on many of them life, certainly as we know it, cannot exist. Planet HD 189733b, for example, where temperatures reach over 900°C, wind speeds exceed 5,400 mph and it rains glass, is, to say the least, highly unlikely to be a suitable candidate for a life-bearing planet.
There is still hope, however, that some of the discovered exoplanets may harbour life. So in this edition of Free Data Friday, we will be looking at NASA data to determine which exoplanets may be in the “habitable zone” (not too hot and not too cold, also known as the Goldilocks zone) around their star.
NASA has a database of confirmed exoplanets hosted at the California Institute of Technology web site. There are a number of options for downloading the data and I chose to download a CSV file.
I opened the CSV file in Excel to manually remove the header rows which contain metadata for the columns. I then saved the edited file in XLSX format so that I could use the SAS XLSX engine to open it (I find this gives better results than Proc Import – particularly with large files).
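As an alternative to editing the file by hand, the metadata rows could be stripped in SAS itself. The following is only a sketch: it assumes the download is saved as composite.csv (a hypothetical name and location) and that every metadata row begins with a `#` character, which you should check against your own download. It falls back on Proc Import, so the caveat about large files still applies.

```sas
/* Hypothetical path: point this at wherever the CSV was saved */
filename raw "/folders/myshortcuts/Dropbox/composite.csv";
/* Temporary file to hold a copy without the metadata rows */
filename clean temp;

/* Copy every record except those starting with '#' (assumed metadata marker) */
data _null_;
    infile raw lrecl=32767 truncover;
    file clean lrecl=32767;
    input;
    if char(_infile_, 1) ne '#' then put _infile_;
run;

/* Read the cleaned copy, scanning all rows when guessing column types */
proc import datafile=clean out=work.composite dbms=csv replace;
    guessingrows=max;
run;
```

With this approach, later steps would read from work.composite rather than from the XLSX library.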
There are a number of ways of calculating likely habitability, and with so many variables and unknowns we have to make a choice about which method to use. I chose the method detailed in the Planetary Biology web site, which calculates the inner and outer bounds of the habitable zone.
Here is the data step code I used with comments showing which stage and step in the process the code relates to:
libname planets xlsx "/folders/myshortcuts/Dropbox/composite.xlsx";

/* Method from https://www.planetarybiology.com/calculating_habitable_zone.html */
data chz(keep=fpl_hostname fpl_letter fpl_name inner_bound outer_bound
              abs_mag fst_spt spectral_type fpl_smax in_zone fpl_discmethod);
    set planets.composite;

    /* Stage 1 - Calculate the host star's absolute luminosity based */
    /* on its apparent visual magnitude */

    /* Step 1 - Calculate the star's absolute visual magnitude */
    abs_mag=fst_optmag-5*log10(fst_dist/10);

    /* Step 2 - If we know the absolute magnitude calculate the star's */
    /* bolometric magnitude */
    if abs_mag ne . then do;
        spectral_type=substr(fst_spt,1,1);
        select(spectral_type);
            when("B") bolo_correction=-2;
            when("A") bolo_correction=-0.3;
            when("F") bolo_correction=-0.15;
            when("G") bolo_correction=-0.4;
            when("K") bolo_correction=-0.8;
            when("M") bolo_correction=-2;
            otherwise bolo_correction=.;
        end;

        /* Step 3 - Calculate the absolute luminosity of the star */
        if bolo_correction ne . then do;
            bolo_mag=abs_mag+bolo_correction;
            abs_lum=10**((bolo_mag-4.72)/-2.5);

            /* Stage 2 - Approximate the radii of the host star's */
            /* habitable zone */
            inner_bound=sqrt(abs_lum/1.1);
            outer_bound=sqrt(abs_lum/0.53);
        end;

        /* Determine whether or not the planet's orbit lies */
        /* inside the habitable zone */
        if fpl_smax > inner_bound and fpl_smax < outer_bound
            then in_zone=1;
        else in_zone=0;
    end;
run;
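Written out as formulas, the data step above computes the following. Here $m_V$ is `fst_optmag`, $d$ is `fst_dist` (the distance-modulus formula requires it in parsecs), $BC$ is the bolometric correction chosen by spectral class, and 4.72 is the value the code uses for the Sun's bolometric magnitude. The symbol names are shorthand introduced here, not taken from the data.

```latex
\begin{align*}
M_V &= m_V - 5\log_{10}\!\left(\frac{d}{10}\right) \\
M_{\mathrm{bol}} &= M_V + BC \\
\frac{L}{L_\odot} &= 10^{(M_{\mathrm{bol}} - 4.72)/(-2.5)} \\
r_{\mathrm{inner}} &= \sqrt{\frac{L/L_\odot}{1.1}}, \qquad
r_{\mathrm{outer}} = \sqrt{\frac{L/L_\odot}{0.53}}
\end{align*}
```

As a made-up illustration (not a star from the data), take a K-class star with $m_V = 6.5$ at $d = 10$ parsecs. Then $M_V = 6.5$, $M_{\mathrm{bol}} = 6.5 - 0.8 = 5.7$ and $L/L_\odot = 10^{-0.392} \approx 0.4055$, giving $r_{\mathrm{inner}} \approx 0.61$ and $r_{\mathrm{outer}} \approx 0.87$. The code would flag a planet of that star as in the zone if its `fpl_smax` lay between those two values.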
I then ran the following Proc SQL statement to create a data set of only planets inside the zone:
proc sql;
    create table in_zone as
    select fpl_name as planet_name,
           fpl_smax as orbit_semi_major_axis,
           inner_bound,
           outer_bound,
           fpl_discmethod as discovery_method
    from chz
    where in_zone;
quit;
This gave me a data set of 63 candidate planets looking like this:
In order to cross-check the results, I took a random sample of the planets listed and checked them against The Open Exoplanet Catalogue, which has a diagram for each planet showing its position in its star’s habitable zone. All of the planets sampled and selected by the data step were confirmed to be in the zone.
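One way to draw such a sample in SAS is Proc Surveyselect. This is a sketch rather than the exact step used here: the sample size of 10, the seed and the output data set name in_zone_sample are arbitrary choices for illustration.

```sas
/* Simple random sample (without replacement) of the in-zone planets */
proc surveyselect data=in_zone out=in_zone_sample
                  method=srs sampsize=10 seed=20190719;
run;

/* List the sampled planets for manual checking */
proc print data=in_zone_sample noobs;
run;
```

Fixing the seed means the same planets are drawn on each run, which makes the manual check repeatable.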
Of course, this method makes a number of assumptions, not least that the planet only orbits one star. Binary or greater systems would require a much more complex method of calculation for their habitable zones. We would also need to rerun the analysis periodically as new observations change the values used (this may partially explain any differences between various online lists of potentially habitable exoplanets).
So, what does the data tell us about exoplanets? Firstly, those in the habitable zone are very rare (about 1.5% of those discovered). Secondly, I decided to examine the method used to discover each planet. I ran the following Proc SQL statement to calculate the total number discovered by each method:
title 'Method of Discovery of all Exoplanets';
proc sql;
    select fpl_discmethod, count(fpl_discmethod) as count_method
    from chz
    group by fpl_discmethod
    order by count_method desc;
quit;
This gives the following results:
We can see that the Transit method has discovered by far the most exoplanets. However, all 63 of the planets selected by our code were discovered by the Radial Velocity method, which suggests that this may be the best method for discovering potentially habitable planets.
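These comparisons can also be made directly in SQL. The sketch below counts, for each discovery method, both the total number of planets and the number flagged as in the zone, then works out the overall share in the zone. It relies on the fact that a comparison such as in_zone = 1 evaluates to 1 or 0 in SAS, so summing it counts the matching rows. The title text and the column names n_planets, n_in_zone and pct_in_zone are illustrative.

```sas
title 'In-zone Planets by Method of Discovery';
proc sql;
    /* Totals and in-zone counts per discovery method */
    select fpl_discmethod,
           count(*) as n_planets,
           sum(in_zone = 1) as n_in_zone
    from chz
    group by fpl_discmethod
    order by n_planets desc;

    /* Overall share of planets that fall inside the zone */
    select sum(in_zone = 1) / count(*) as pct_in_zone format=percent8.1
    from chz;
quit;
title;
```

Planets whose in_zone value is missing, because the star's magnitude or distance is unknown, are counted as not in the zone here. The share is therefore a fraction of all planets in the file, not only of those that could be assessed.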
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
I wonder if there will be a 'flat earth' version of this that limits the search to no more than 10 feet. 🙂