Follow along with Dee Yan, our fictional data science intern, as she assumes the job of interim database administrator at the fictional aerospace startup, Red:4. She’ll learn PostgreSQL like we all do: on the job and under pressure.
You’ll start out with the basics: creating tables and importing data. Soon, however, you’ll be awash in glorious SQL and data from space (the NASA/JPL archives of the Cassini mission), creating functions, writing common table expressions, and calculating aggregates with window functions, all in the name of science, while trying to figure out if there’s life under the ice of a very curious little moon.
YOU'LL DIG IN TO SOME OF THE MOST AMAZING DATA OF OUR LIFETIME...
I won't waste your time with sleep-inducing demos and examples - we're going to hit the ground running by importing millions of records into PostgreSQL right from the command line and then we're going to interrogate them for correctness. From there we put our detective hats on and get to work.
WORKING WITH THE POSTGRESQL CLI
We don't have time for fluffy tooling! Yes, there are GUIs and visual tools out there, but plain SQL in PostgreSQL is a simple, precise way to describe exactly the tables and indexes you want.
IMPORTING DATA FROM MASSIVE CSV FILES
You'll import data like a pro, using the command line and a Makefile. There are GUIs you could use, but here at Red:4 we believe in keeping things simple and powerful.
WEEDING OUT THE INEVITABLE CRAP DATA
You will become "data minded". You'll work through a basic audit process on real, raw data from JPL. It doesn't matter where the data is from; it will always have errors.
TRIAGING AND SIZING UP WHAT THE DATA MEANS
You'll sleuth through raw Cassini data using basic queries. Pulling data in is only part of the process – looking for clues and understanding what you're seeing is the next step. To do this you'll use common table expressions, full-text search indexing and window functions.
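To give a feel for how a common table expression and a window function work together, here is a minimal sketch. It is not taken from the book: the table name `flyby_altitudes`, its columns and the four sample rows are all made up for illustration, and it uses Python's built-in `sqlite3` module as a stand-in so it runs without a server. The `WITH` clause and `lag() OVER (...)` are standard SQL that PostgreSQL also supports, though details such as column types may differ there.

```python
import sqlite3

# Made-up sample rows: (timestamp, altitude in km). NOT real Cassini data.
ROWS = [
    ("2000-01-01 00:00:00", 500.0),
    ("2000-01-01 00:10:00", 300.0),
    ("2000-01-01 00:20:00", 100.0),
    ("2000-01-01 00:30:00", 250.0),
]

# The CTE ("deltas") names an intermediate result. Inside it, the window
# function lag() looks at the previous row (ordered by time) so each row
# can be compared with the one before it.
QUERY = """
WITH deltas AS (
    SELECT
        time_stamp,
        altitude,
        altitude - lag(altitude) OVER (ORDER BY time_stamp) AS change
    FROM flyby_altitudes
)
SELECT time_stamp, altitude, change
FROM deltas
WHERE change < 0          -- keep only rows where the altitude is still dropping
ORDER BY time_stamp;
"""


def descending_rows(rows):
    """Load rows into an in-memory table and return those where altitude fell."""
    # Window functions need SQLite 3.25 or newer (bundled with recent Pythons).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE flyby_altitudes (time_stamp TEXT, altitude REAL)"
        )
        conn.executemany("INSERT INTO flyby_altitudes VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in descending_rows(ROWS):
        print(row)
```

The first row has no predecessor, so its `change` is NULL and the `WHERE` clause drops it. The last row is dropped too, because the altitude rises again there.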
OPTIMIZING QUERIES
You'll speed up slow queries with built-in analysis tools and objects. The Cassini data dump is gigantic, and sifting through the analysis records can be time-consuming! You'll use EXPLAIN and ANALYZE to figure out where to put your indexes and when it makes sense to build a materialized view, which stores a query's results on disk so they can be read back without rerunning the query.
VERIFYING WHAT WE HAVE USING SQL
NASA is a very thorough organization, but it's staffed by humans, and humans like spreadsheets, and spreadsheets destroy data. You'll use mathematical analysis to verify flyby altitudes and speeds using data the INMS (Ion and Neutral Mass Spectrometer) recorded during the 22 close encounters with Enceladus.