DS Capstone · SpaceX Falcon 9

Briefing

Mission Objective

🎯 First-Stage Booster Landing Prediction

Use machine learning to predict whether the Falcon 9 first-stage booster will land successfully after launch. Landing success is a key input when estimating the real cost of a mission: SpaceX advertises Falcon 9 launches at about $62M, largely because the first stage can be reused, while other providers charge upward of $165M per launch.

The full pipeline covers data collection from the SpaceX API and Wikipedia, data wrangling and feature engineering, SQL and visual EDA (one SQL notebook plus 18 visual notebooks), interactive maps, a Dash dashboard, and a comparison of four ML algorithms with hyperparameter tuning.

Launch Sequence

Project Phases

📡 01 — Data Collection

Data is pulled from the official SpaceX REST API (filtered to Falcon 9 launches) and scraped from Wikipedia tables with BeautifulSoup. Outputs: dataset_part_1.csv and wiki_launches.csv.

📂 01_dataCollection / 01_APICollection.ipynb · 02_WebScraping.ipynb
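
The Falcon 9 filter on the API data can be sketched as below. This is a minimal illustration on hand-written records, not the notebook's code: the field names `rocket_name`, `flight_number`, and `payload_mass_kg` and the sample values are assumptions, and the real records would come from an HTTP request to the SpaceX REST API (for example with Requests).

```python
# Hypothetical, already-parsed JSON records standing in for an API response.
# Field names and values are illustrative assumptions, not the real schema.
sample_launches = [
    {"flight_number": 1, "rocket_name": "Falcon 1", "payload_mass_kg": 20.0},
    {"flight_number": 6, "rocket_name": "Falcon 9", "payload_mass_kg": None},
    {"flight_number": 8, "rocket_name": "Falcon 9", "payload_mass_kg": 525.0},
]


def keep_falcon9(launches):
    """Return copies of the records whose rocket name is exactly 'Falcon 9'."""
    return [dict(rec) for rec in launches if rec.get("rocket_name") == "Falcon 9"]


falcon9_launches = keep_falcon9(sample_launches)
print(len(falcon9_launches), "Falcon 9 launches kept")
```

Copying each record with `dict(rec)` keeps the raw response untouched, which helps when the same payload is reused for other filters.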

🔧 02 — Data Wrangling

Null handling, categorical encoding, and engineering of the target variable Class (1 = successful landing, 0 = failure). Output: dataset_part_2.csv, ready for analysis.

📂 02_dataWrangling / 03_DataWrangling.ipynb
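
A rough sketch of the three wrangling steps with Pandas, on toy rows. The column names (`PayloadMass`, `Orbit`, `Outcome`), the outcome strings, and the rule "outcome starts with True means success" are assumptions made for illustration; the notebook's actual columns and rules may differ.

```python
import pandas as pd

# Toy rows; column names and outcome strings are illustrative assumptions.
df = pd.DataFrame(
    {
        "PayloadMass": [500.0, None, 3100.0, 4500.0],
        "Orbit": ["LEO", "GTO", "LEO", "ISS"],
        "Outcome": ["True ASDS", "False Ocean", "None None", "True RTLS"],
    }
)

# Null handling: replace missing payload mass with the column mean.
df["PayloadMass"] = df["PayloadMass"].fillna(df["PayloadMass"].mean())

# Target engineering: Class = 1 for a successful landing, 0 otherwise.
df["Class"] = df["Outcome"].str.startswith("True").astype(int)

# Categorical encoding: one-hot encode the orbit type.
encoded = pd.get_dummies(df, columns=["Orbit"])

print(encoded.head())
```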

📊 03 — Exploratory Data Analysis (EDA)

SQL EDA: queries to find success patterns, average payload mass, and outcomes by launch site.
Visual EDA: 18 notebooks using Seaborn and Matplotlib, covering flight number vs. site, orbit type, yearly success trends, payload ranges, and more.

📂 03_EDA / 04_SQL.ipynb · 05_01…18_*.ipynb
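
The kind of SQL queries described above (average payload mass, outcomes by launch site) might look like the sketch below. It runs against an in-memory SQLite database with made-up rows; the table name `launches`, its columns, and the site labels are assumptions, not the schema used in 04_SQL.ipynb.

```python
import sqlite3

# In-memory database with made-up rows (names and values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, payload_mass_kg REAL, class INTEGER)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("Site A", 2000.0, 1),
        ("Site A", 4000.0, 0),
        ("Site B", 3000.0, 1),
        ("Site B", 5000.0, 1),
    ],
)

# Average payload mass across all launches.
avg_payload = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches"
).fetchone()[0]

# Launch count and success rate per site (class is 1 for a successful landing).
by_site = conn.execute(
    """
    SELECT launch_site, COUNT(*) AS launches, AVG(class) AS success_rate
    FROM launches
    GROUP BY launch_site
    ORDER BY launch_site
    """
).fetchall()
conn.close()

print(avg_payload, by_site)
```

Because `class` is coded 0/1, `AVG(class)` gives the success rate directly without a separate `CASE` expression.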

🗺️ 04 — Interactive Maps & Dashboard

Folium maps: color-coded marker clusters (🟢 success / 🔴 failure), coastline proximity via the Haversine formula, and geospatial analysis of all 4 launch sites.
Dash App: interactive web application with site filters and dynamic performance charts.

📂 04_mapsDashboards / 06_InteractiveMapsFolium.ipynb · 07_LaunchSiteDashApp.py
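
For reference, the Haversine distance used for the proximity analysis can be computed as in this sketch. The function name `haversine_km`, the mean Earth radius of 6371 km, and the example coordinates are assumptions chosen for illustration; they are not taken from the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Placeholder coordinates for a launch site and a nearby coastline point.
site = (28.5, -80.6)
coast = (28.5, -80.5)
print(f"{haversine_km(*site, *coast):.2f} km")
```

The result is a straight great-circle distance on a spherical Earth, which is accurate enough for comparing a site's distance to coastlines, roads, or cities on a map.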

🤖 05 — Machine Learning

Feature standardization with StandardScaler, an 80/20 train-test split, and hyperparameter search via GridSearchCV. Four classifiers are evaluated and compared by accuracy, F1-score, and confusion matrices.

📂 05_machineLearning / 08_MLPrediction.ipynb · 09_MLComparison.ipynb · 10_confusionMatrix.ipynb
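
A minimal sketch of the scaling, splitting, and grid-search steps for one of the four classifiers. It uses synthetic data from `make_classification` as a stand-in for dataset_part_2.csv, and the parameter grid, the 5-fold cross-validation, and the choice to wrap the scaler in a pipeline are assumptions of this sketch, not necessarily how the notebooks are organized.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real features come from dataset_part_2.csv.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# 80/20 train-test split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling inside the pipeline means the scaler is refit on each CV fold.
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))

# Hypothetical grid; step names are the lowercased class names.
param_grid = {
    "decisiontreeclassifier__max_depth": [2, 4, 6],
    "decisiontreeclassifier__criterion": ["gini", "entropy"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Swapping `DecisionTreeClassifier` for another estimator (and adjusting the grid keys to match) gives the same comparison setup for the other three models.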

Telemetry

Classification Results

LOGISTIC REGRESSION: ~83%
SUPPORT VECTOR MACHINE: ~83%
DECISION TREE: ~89% ★ BEST MODEL
K-NEAREST NEIGHBORS: ~83%

Architecture

Repository Structure

DS_Capstone_Coursera_IBM/
├── 01_dataCollection/
│ ├── 01_APICollection.ipynb ← SpaceX REST API
│ └── 02_WebScraping.ipynb ← Wikipedia + BeautifulSoup
├── 02_dataWrangling/
│ └── 03_DataWrangling.ipynb ← cleaning + Class variable
├── 03_EDA/
│ ├── 04_SQL.ipynb ← SQL queries
│ └── 05_01…18_*.ipynb ← 18 visual analyses
├── 04_mapsDashboards/
│ ├── 06_InteractiveMapsFolium.ipynb
│ └── 07_LaunchSiteDashApp.py ← Dash web app
├── 05_machineLearning/
│ ├── 08_MLPrediction.ipynb
│ ├── 09_MLComparison.ipynb
│ └── 10_confusionMatrix.ipynb
├── data/
│ ├── dataset_part_1.csv ← raw API data
│ ├── dataset_part_2.csv ← cleaned + Class
│ └── wiki_launches.csv ← scraped data
├── examResults/
│ ├── examGrade.png ← grading screenshot
│ └── AI_GradingFeedback.pdf ← AI evaluation report
└── presentation/
  ├── DS_Capstone_Coursera.pdf
  └── DS_Capstone_Coursera.pptx

Onboard Systems

Tech Stack

Python 3 · Jupyter · Pandas · NumPy · Matplotlib · Seaborn · scikit-learn · SQL / SQLite · BeautifulSoup · Requests · Folium · Plotly Dash

Mission Debrief

Evaluation & Grading

COURSE GRADE

Official peer-reviewed score from the IBM Data Science Capstone on Coursera.

📋 Grade screenshot: examResults/examGrade.png

AI EVALUATION REPORT

Detailed AI-assisted feedback covering methodology, analysis quality, and findings.

🤖 AI feedback report: examResults/AI_GradingFeedback.pdf