
Join the Search for Habitable Planets with SAS

Started 07-19-2019 by
Modified 08-03-2021 by

SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.

[Figure: Looking for habitable planets]


Fifty years ago this week, Neil Armstrong and Buzz Aldrin stepped out of the Eagle lunar module to become the first humans to walk on any astronomical body other than Earth. At that time, the existence of any planets outside our solar system was mere conjecture and travel to them the stuff of science fiction. However, in the decades since that epic feat, more and more of these planets (known as exoplanets) have been discovered, until today we know of more than 4,000.


Naturally, one of the first questions asked about these planets was, “Could life exist on them?” Considering their great distance from us, we know quite a lot about these planets, but we still can’t say for sure that life exists on any of them. We do know that on many of them life, certainly as we know it, cannot exist. For example, planet HD 189733b, where temperatures exceed 900°C, wind speeds are in excess of 5,400 mph and it rains glass, is, to say the least, highly unlikely to be a suitable candidate for a life-bearing planet.


There is still hope, however, that some of the discovered exoplanets may harbour life. So in this edition of Free Data Friday, we will be looking at NASA data to determine which exoplanets may be in the “habitable zone” (not too hot and not too cold, also known as the Goldilocks zone) around their star.


Get the Data


NASA has a database of confirmed exoplanets, the NASA Exoplanet Archive, hosted on the California Institute of Technology web site. There are several options for downloading the data; I chose to download a CSV file.




Getting the Data Ready


I opened the CSV file in Excel to manually remove the header rows, which contain metadata for the columns. I then saved the edited file in XLSX format so that I could use the SAS XLSX engine to read it (I find this gives better results than Proc Import, particularly with large files).


The Results


There are a number of ways of calculating likely habitability, and with so many variables and unknowns we have to choose a method. I chose the method detailed on the Planetary Biology web site. It first estimates the host star's luminosity from its apparent magnitude, and then uses that luminosity to calculate the inner and outer bounds of the habitable zone.
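For reference, the chain of calculations can be summarised as below. This is my reading of the Planetary Biology approach. The Sun's bolometric magnitude of 4.72, the divisors 1.1 and 0.53, and the use of distance in parsecs are assumptions to verify against that site. The bolometric corrections $BC$ by spectral class are the values used in the data step below.

```latex
\begin{aligned}
M_V &= m_V - 5\log_{10}\!\left(\frac{d}{10\ \text{pc}}\right) && \text{(absolute visual magnitude)}\\
M_{\text{bol}} &= M_V + BC && \text{(bolometric magnitude)}\\
\frac{L_\star}{L_\odot} &= 10^{\left(M_{\text{bol}} - 4.72\right)/(-2.5)} && \text{(luminosity)}\\
r_{\text{inner}} &= \sqrt{\frac{L_\star/L_\odot}{1.1}}\ \text{AU}, \qquad
r_{\text{outer}} = \sqrt{\frac{L_\star/L_\odot}{0.53}}\ \text{AU} && \text{(zone boundaries)}
\end{aligned}
```

A planet is then treated as a candidate when its orbital semi-major axis lies strictly between $r_{\text{inner}}$ and $r_{\text{outer}}$.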


Here is the data step code I used, with comments showing which stage and step of the method each part relates to:



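libname planets xlsx "/folders/myshortcuts/Dropbox/composite.xlsx";

/* Method from the Planetary Biology web site */

data chz(keep=fpl_hostname fpl_letter fpl_name inner_bound outer_bound
			abs_mag fst_spt spectral_type fpl_smax in_zone fpl_discmethod);
	set planets.composite;
	length spectral_type $1;
/* 	Stage 1 - Calculate the host star's absolute luminosity based */
/* 	on its apparent visual magnitude */

/* 	Step 1 - Calculate the star's absolute visual magnitude. */
/* 	Assumed columns: fst_optmag = apparent V magnitude, */
/* 	fst_dist = distance in parsecs */
	if fst_optmag ne . and fst_dist > 0 then
		abs_mag = fst_optmag - 5*log10(fst_dist/10);
/* 	Step 2 - If we know the absolute magnitude, calculate the star's */
/* 	bolometric magnitude using a correction for its spectral class */
	if abs_mag ne . then do;
		spectral_type = upcase(substr(fst_spt, 1, 1));
		select(spectral_type);
			when("B") bolo_correction=-2;
			when("A") bolo_correction=-0.3;
			when("F") bolo_correction=-0.15;
			when("G") bolo_correction=-0.4;
			when("K") bolo_correction=-0.8;
			when("M") bolo_correction=-2;
			otherwise bolo_correction=.;
		end;
/* 		Step 3 - Calculate the absolute luminosity of the star */
/* 		(4.72 is taken as the Sun's bolometric magnitude) */
		if bolo_correction ne . then do;
			bolo_mag = abs_mag + bolo_correction;
			abs_luminosity = 10**((bolo_mag - 4.72) / (-2.5));

/* 			Stage 2 - Approximate the radii (in AU) of the host */
/* 			star's habitable zone */
			inner_bound = sqrt(abs_luminosity / 1.1);
			outer_bound = sqrt(abs_luminosity / 0.53);
		end;
	end;
/* 	Determine whether or not the planet's orbit lies */
/* 	inside the habitable zone */
	if fpl_smax > inner_bound and fpl_smax < outer_bound
		then in_zone=1;
		else in_zone=0;
run;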


I then ran the following Proc SQL statement to create a data set containing only the planets inside the zone:



proc sql;
	create table in_zone as
	select fpl_name as planet_name,
			fpl_smax as orbit_semi_major_axis,
			fpl_discmethod as discovery_method
	from chz
	where in_zone;
quit;


This gave me a data set of 63 candidate planets, which looks like this:



[Figure: Data Set - Habitable]


To cross-check the results, I took a random sample of the planets listed and checked them against The Open Exoplanet Catalogue, which has a diagram for each planet showing its position relative to its star’s habitable zone. All of the sampled planets were confirmed to be in the zone.


Of course, this method makes a number of assumptions, not least that the planet orbits only one star. Binary or multiple-star systems would require a much more complex calculation of their habitable zones. We would also need to rerun the analysis periodically as new observations change the values used (this may partly explain differences between the various online lists of potentially habitable exoplanets).


Habitable-zone planets are rare


So, what does the data tell us about exoplanets? Firstly, planets in the habitable zone are very rare (about 1.5% of those discovered). Secondly, I decided to examine the method used to discover each planet. I ran the following Proc SQL statement to count the number of planets discovered by each method:



title 'Method of Discovery of all Exoplanets';
proc sql;
	select fpl_discmethod, count(fpl_discmethod) as count_method
	from chz
	group by fpl_discmethod
	order by count_method desc;
quit;


This gives the following results:



[Figure: Exoplanet Count]


We can see that the Transit method has discovered by far the most exoplanets. However, all 63 of the planets selected by our code were discovered by the Radial Velocity method. This suggests that, so far, Radial Velocity has been the most effective method for finding potentially habitable planets.
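Both observations can be checked directly from the CHZ data set. The query below is a minimal sketch that assumes the CHZ data set created by the data step above is still in the WORK library. The ZONE_SHARE table name and the title text are my own illustrative choices, not part of the original analysis. The first query stores the overall share of in-zone planets; the second repeats the discovery-method count but restricts it to planets flagged as in the zone.

```sas
/* Sketch: assumes WORK.CHZ from the data step above exists.        */
/* ZONE_SHARE is a hypothetical table name chosen for this example. */
title 'Exoplanets in the Habitable Zone by Method of Discovery';
proc sql;
	/* Share of all planets flagged as inside the habitable zone */
	create table zone_share as
	select sum(in_zone) as n_in_zone,
			count(*) as n_total,
			calculated n_in_zone / calculated n_total
				as pct_in_zone format=percent8.1
	from chz;

	/* Discovery methods of the in-zone planets only */
	select fpl_discmethod, count(*) as count_method
	from chz
	where in_zone
	group by fpl_discmethod
	order by count_method desc;
quit;

title 'Share of Exoplanets in the Habitable Zone';
proc print data=zone_share noobs;
run;
```

Rerunning this after refreshing the data from NASA is a quick way to see whether the in-zone share or the mix of discovery methods has changed.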


Now It's Your Turn!


Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.


Visit [[this link]] to see all the Free Data Friday articles.


