Mohamed Yassine Hemissi

Student • Dreamer • Human


© 2026 Mohamed Yassine Hemissi. All rights reserved.


BreatheSafe: Air Quality Business Intelligence & Machine Learning Analysis

Flagship

An end-to-end analytics project combining data engineering, warehousing, business intelligence, and machine learning to study air quality, public health outcomes, and vulnerability patterns.

2025 · By Mohamed Yassine Hemissi · Updated January 13, 2025
Engineering · Academic/Community

Technologies Used

Data Engineering · Data Warehouse · Business Intelligence · Machine Learning · Predictive Modeling · Clustering

BreatheSafe Project

BreatheSafe was built as a full air-quality analytics system rather than a single model or dashboard. The broader goal was to connect emissions, AQI, and public-health data inside one platform that could support analysis, decision-making, and machine learning work.

Platform Scope

Our work covered four main layers:

  • collecting and organizing emissions and health data
  • building a structured warehouse for downstream analysis
  • creating BI dashboards for different decision-makers
  • developing supervised and unsupervised machine learning studies

At the data level, the project was designed as a real analytics backbone rather than an ad-hoc notebook workflow. The warehouse followed a constellation model centered around human-health and air-quality facts, connected to dimensions such as geography, time, transport, hospitalization, environmental conditions, and exercise.
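A constellation model means that several fact tables share the same dimension tables. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is a minimal illustration under stated assumptions, not the project's actual schema: every table name, column name, and sample row is hypothetical, and only two of the dimensions mentioned above (geography and time) are shown.

```python
import sqlite3

# Hypothetical table and column names; the source names only the subject areas.
SCHEMA = """
CREATE TABLE dim_geography (geo_id INTEGER PRIMARY KEY, region TEXT NOT NULL);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    year INTEGER NOT NULL,
    month INTEGER NOT NULL
);

-- Two fact tables reference the same dimensions. Sharing dimensions across
-- fact tables is what makes the design a constellation rather than one star.
CREATE TABLE fact_air_quality (
    geo_id INTEGER NOT NULL REFERENCES dim_geography(geo_id),
    time_id INTEGER NOT NULL REFERENCES dim_time(time_id),
    aqi REAL NOT NULL
);
CREATE TABLE fact_human_health (
    geo_id INTEGER NOT NULL REFERENCES dim_geography(geo_id),
    time_id INTEGER NOT NULL REFERENCES dim_time(time_id),
    hospitalizations INTEGER NOT NULL
);
"""


def build_warehouse():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def aqi_vs_hospitalizations(conn):
    """Join both fact tables through their shared dimensions."""
    query = """
        SELECT g.region, t.year, t.month, a.aqi, h.hospitalizations
        FROM fact_air_quality AS a
        JOIN fact_human_health AS h
          ON h.geo_id = a.geo_id AND h.time_id = a.time_id
        JOIN dim_geography AS g ON g.geo_id = a.geo_id
        JOIN dim_time AS t ON t.time_id = a.time_id
        ORDER BY g.region, t.year, t.month
    """
    return conn.execute(query).fetchall()


# Placeholder rows, invented purely to exercise the join.
conn = build_warehouse()
conn.execute("INSERT INTO dim_geography VALUES (1, 'Region A')")
conn.execute("INSERT INTO dim_time VALUES (1, 2020, 1)")
conn.execute("INSERT INTO fact_air_quality VALUES (1, 1, 87.5)")
conn.execute("INSERT INTO fact_human_health VALUES (1, 1, 12)")
rows = aqi_vs_hospitalizations(conn)
```

Because both fact tables key on the same geography and time dimensions, a health measure and an air-quality measure can be compared for the same place and period with a plain join.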

Business Intelligence Layer

The BI side was built to serve multiple audiences, including regional authorities, public-health actors, and automobile manufacturers.

The dashboards were designed as interactive decision-support tools and included views for:

  • overall navigation and project entry points
  • AQI analysis
  • air-quality analysis
  • public-health analysis
  • automobile manufacturer insights
  • prediction model results

The BI layer was meaningful because it framed the project around impact from the outset. The public-health views went beyond descriptive charts and focused on hospitalization, recovery, mortality, and disparities across populations.

My Part Within the Project

Within the broader team effort, my individual work focused on two machine learning objectives.

Ethnicity-Based Air Quality Disease Risk Prediction

The first objective studied the relationship between ethnicity, AQI, and mortality linked to air-pollution-related diseases.

This part was not limited to one model. I worked on both:

  • a classification pipeline for disease-risk categories
  • a regression attempt for life expectancy prediction

Among the classification models tested, the Decision Tree performed best, with the strongest balance across accuracy, precision, recall, and F1-score. The regression achieved only moderate explanatory power. That result was still useful: it did not solve the problem, but it exposed the limits of the available feature set and the room left for future improvement.
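For readers unfamiliar with the metrics named above, the following dependency-free sketch shows how accuracy, precision, recall, and F1-score can be computed for a multi-class classifier, along with R² as one common measure of a regression's explanatory power. Two assumptions are worth flagging: macro-averaging across classes and R² as the regression metric are illustrative choices, since the source does not say which averaging or which regression metric the report used. The function names and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption here) computes each metric per class
    and then takes the unweighted mean across classes.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy labels invented for illustration; they are not project results.
scores = macro_scores(
    ["low", "low", "high", "high"],
    ["low", "high", "high", "high"],
)
```

Comparing candidate models on all four classification scores, rather than on accuracy alone, is what makes a statement about "balance" across metrics possible. An R² well below 1 indicates that the features explain only part of the variance in the target.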

California AQI Monitoring Centers Clustering

The second objective focused on clustering AQI monitoring centers across California in order to reveal meaningful similarities between locations.

I started with PCA for dimensionality reduction and exploratory visualization, then compared hierarchical clustering and K-Means. K-Means emerged as the more practical solution, with four clusters producing acceptable quality according to the evaluation metrics used in the report.
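The K-Means step can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm with a silhouette score as the quality check, not the code used in the project. Several things here are assumptions: the silhouette score stands in for the unnamed evaluation metrics, the initial centroids are passed in explicitly to keep the example deterministic, the PCA step is omitted, and the two-dimensional toy points are invented rather than taken from real monitoring centers.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented toy data: four well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (0, 11), (10, 10), (10, 11)]
labels, centroids = k_means(points, [(0, 0), (10, 0), (0, 10), (10, 10)])
quality = silhouette(points, labels)
```

A silhouette value close to 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping ones. Running the same check for several values of k is one way to justify a choice such as four clusters.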

What mattered in this part was not only the metrics but also the interpretation. The clusters were read spatially across California, with visible differences between areas such as Los Angeles, the Bay Area, and the Central Valley. Those differences suggested hypotheses about density, urbanization, and environmental conditions.

Why This Project Matters

BreatheSafe stands out because it was not just a machine learning exercise. It connected engineering structure, business intelligence, and applied analysis into one system. The project forced us to think across layers: data pipelines, warehouse design, dashboard usability, prediction quality, and the public-health meaning behind the numbers.
