MISSION · DS-CAPSTONE-IBM-COURSERA · 2026
IBM Data Science Professional Certificate · Capstone Project
Briefing
Mission Objective
🎯 First-Stage Booster Landing Prediction
Use machine learning to predict whether the Falcon 9's first-stage booster
will land successfully after a launch. Landing success is a critical factor in estimating the real cost of each mission
(~$62M with a reusable booster vs. ~$165M with a disposable one).
The full pipeline covers data collection from the SpaceX API and Wikipedia, wrangling and feature engineering,
SQL and visual EDA across 18+ notebooks, interactive maps, a live Dash dashboard,
and a four-algorithm ML comparison with hyperparameter tuning.
Launch Sequence
Project Phases
📡 01 — Data Collection
Data pulled from the official SpaceX REST API (filtered to Falcon 9 launches)
and Web Scraping of Wikipedia tables using BeautifulSoup.
Outputs: dataset_part_1.csv and wiki_launches.csv.
📂 01_dataCollection / 01_APICollection.ipynb · 02_WebScraping.ipynb
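The scraping half of this phase can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the HTML snippet is made-up sample data standing in for a Wikipedia launch table, and the `parse_launch_table` helper and its column names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one Wikipedia launch table (not real data).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Booster landing</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>Failure</td></tr>
  <tr><td>2</td><td>KSC</td><td>Success</td></tr>
</table>
"""


def parse_launch_table(html):
    """Return the first 'wikitable' in `html` as a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    rows = table.find_all("tr")
    # The first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) == len(headers):  # skip malformed or spanning rows
            records.append(dict(zip(headers, cells)))
    return records


launch_rows = parse_launch_table(SAMPLE_HTML)
```

Returning a list of dicts keeps the result easy to pass to `pandas.DataFrame(...)` before writing a CSV such as wiki_launches.csv.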
🔧 02 — Data Wrangling
Null handling, categorical encoding, and engineering of the target variable
Class (1 = successful landing, 0 = failure).
Output: dataset_part_2.csv, ready for analysis.
📂 02_dataWrangling / 03_DataWrangling.ipynb
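A rough sketch of the three wrangling steps with pandas is shown below. The sample rows, the column names (`Outcome`, `PayloadMass`, `Orbit`), and the rule "an outcome label starting with True counts as a landing" are assumptions made for illustration; the notebook may use different labels and a different fill strategy.

```python
import pandas as pd

# Hypothetical sample; the real dataset's columns and labels may differ.
df = pd.DataFrame({
    "Outcome": ["True ASDS", "False Ocean", "None None", "True RTLS"],
    "PayloadMass": [5000.0, None, 3000.0, 4000.0],
    "Orbit": ["LEO", "GTO", "LEO", "ISS"],
})

# Null handling: fill missing payload mass with the column mean.
df["PayloadMass"] = df["PayloadMass"].fillna(df["PayloadMass"].mean())

# Target variable: 1 when the outcome label starts with "True", else 0.
df["Class"] = df["Outcome"].str.startswith("True").astype(int)

# Categorical encoding: one indicator column per orbit type.
encoded = pd.get_dummies(df, columns=["Orbit"])
```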
📊 03 — Exploratory Data Analysis (EDA)
SQL EDA: queries to find success patterns, average payload mass, and outcomes by launch site.
Visual EDA: 18 notebooks using Seaborn and Matplotlib, covering flight number vs. site, orbit type,
yearly success trends, payload ranges, and more.
📂 03_EDA / 04_SQL.ipynb · 05_01…18_*.ipynb
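The kind of SQL query described above might look like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database purely for illustration; the database engine, the `launches` table, its column names, and the sample rows are all assumptions rather than the notebook's actual setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches ("
    "launch_site TEXT, payload_mass_kg REAL, landing_class INTEGER)"
)
# Made-up rows; site names are placeholders, not real launch sites.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("SITE_A", 2000.0, 0),
        ("SITE_A", 4000.0, 1),
        ("SITE_B", 6000.0, 1),
    ],
)

# Average payload mass and landing success rate per launch site.
query = """
SELECT launch_site,
       COUNT(*)             AS launches,
       AVG(payload_mass_kg) AS avg_payload_kg,
       AVG(landing_class)   AS success_rate
FROM launches
GROUP BY launch_site
ORDER BY launch_site
"""
site_summary = conn.execute(query).fetchall()
conn.close()
```

Because the target is coded 0/1, `AVG(landing_class)` gives the success rate directly.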
🗺️ 04 — Interactive Maps & Dashboard
Folium maps: color-coded marker clusters (🟢 success / 🔴 failure),
coastline proximity via the Haversine formula, and geospatial analysis of all 4 launch sites.
Dash App: interactive web application with site filters and dynamic performance charts.
📂 04_mapsDashboards / 06_InteractiveMapsFolium.ipynb · 07_LaunchSiteDashApp.py
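For reference, the Haversine distance used for the proximity measurements can be sketched as below. The function name and the 6371 km mean Earth radius are choices made here for illustration; the notebook's own helper may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling it with a launch site's coordinates and the nearest coastline point yields the distance that can be drawn on the Folium map.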
🤖 05 — Machine Learning
Feature normalization with StandardScaler, 80/20 train-test split, and hyperparameter search via GridSearchCV. Four classifiers evaluated and compared by accuracy, F1-score, and confusion matrices.
📂 05_machineLearning / 08_MLPrediction.ipynb · 09_MLComparison.ipynb · 10_confusionMatrix.ipynb
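The scaling, splitting, and search steps can be sketched for a single classifier as follows. This is an illustration under assumptions, not the project's code: the data is synthetic (`make_classification`), the `C` grid is arbitrary, and wrapping the scaler in a pipeline is a design choice made here so that scaling is refit inside each cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch feature matrix and Class labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 80/20 train-test split, stratified to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale features, then fit the classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Arbitrary example grid; make_pipeline names the step "logisticregression".
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best estimator on the held-out 20%.
test_accuracy = search.score(X_test, y_test)
```

The same pattern extends to the other classifiers by swapping the final pipeline step and its parameter grid, then comparing the held-out scores.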
Telemetry
Classification Results
LOGISTIC REGRESSION · ~83%
SUPPORT VECTOR MACHINE · ~83%
DECISION TREE · ~89% · ★ BEST MODEL
K-NEAREST NEIGHBORS · ~83%
Architecture
Repository Structure
Onboard Systems
Tech Stack
Mission Debrief
Evaluation & Grading
COURSE GRADE
Official peer-reviewed score from the IBM Data Science Capstone on Coursera.
📋 View Grade Screenshot →
AI EVALUATION REPORT
Detailed AI-assisted feedback covering methodology, analysis quality, and findings.
🤖 View AI Feedback Report →