Hybrid Recommendation Engine for Optimal Neighborhood Selection
Project Overview
The Applied Data Science Capstone project centered on building a hybrid machine learning recommendation engine that identifies well-suited living environments for individual users. The system combines unsupervised and supervised learning in two stages. K-Means first clusters neighborhoods with similar profiles, and K-Nearest Neighbors (KNN) then generates ranked, personalized neighborhood recommendations from user preferences and lifestyle data.
Methodology
Data Sources
- Aggregated neighborhood-level data from Zillow, US Census, city datasets, and real estate APIs.
- Key features included affordability, crime, walkability, schools, transit, and amenities.
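As a rough illustration of how sources like these might be combined, the sketch below joins three small tables on a shared neighborhood key with pandas. The table names, column names, and values are hypothetical placeholders, not the project's actual data; the original work may have aggregated its sources differently.

```python
import pandas as pd

# Hypothetical per-source tables keyed by neighborhood name (illustrative values only).
housing = pd.DataFrame({
    "neighborhood": ["Downtown", "Midtown", "Oakwood"],
    "median_rent": [2400, 2000, 1600],
})
census = pd.DataFrame({
    "neighborhood": ["Downtown", "Midtown", "Oakwood"],
    "population": [18000, 25000, 12000],
})
city = pd.DataFrame({
    "neighborhood": ["Downtown", "Oakwood"],  # Midtown is missing on purpose
    "crime_rate": [3.1, 1.2],
})

# Left joins keep every neighborhood from the housing table,
# leaving NaN where a source has no matching row.
merged = (
    housing
    .merge(census, on="neighborhood", how="left")
    .merge(city, on="neighborhood", how="left")
)

# Count gaps so they can be imputed or dropped before modeling.
missing_crime = int(merged["crime_rate"].isna().sum())
print(merged)
print("Rows missing crime_rate:", missing_crime)
```

Left joins make coverage gaps visible as missing values, which is usually easier to audit than silently losing rows with an inner join.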
Feature Engineering
- Standardized and scaled all features for clustering.
- Applied Principal Component Analysis (PCA) to reduce feature dimensionality where appropriate.
- One-hot encoded categorical features (e.g., “Dog friendly”) for inclusion in the clustering and KNN stages.
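The feature engineering steps above can be sketched with a scikit-learn pipeline. This is a minimal example under stated assumptions: the column names, the toy values, and the choice of two principal components are hypothetical, and the project may have ordered or configured these steps differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are illustrative only.
neighborhoods = pd.DataFrame({
    "median_rent": [1800, 2400, 1500, 3100, 2000, 1700],
    "crime_rate": [3.2, 1.1, 4.5, 0.8, 2.0, 3.9],
    "walk_score": [72, 88, 45, 93, 60, 51],
    "dog_friendly": ["yes", "yes", "no", "yes", "no", "no"],
})

numeric_cols = ["median_rent", "crime_rate", "walk_score"]
categorical_cols = ["dog_friendly"]

# Standardize numeric columns and one-hot encode categorical ones.
encode_and_scale = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array so PCA can consume it
)

# Assumed setting: keep two principal components for this toy example.
preprocess = Pipeline([
    ("encode_and_scale", encode_and_scale),
    ("pca", PCA(n_components=2, random_state=0)),
])

features = preprocess.fit_transform(neighborhoods)
print(features.shape)  # (6, 2)
```

Wrapping the steps in a single `Pipeline` means the same fitted transformations can later be applied to new neighborhoods with `preprocess.transform(...)`.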
Hybrid Modeling Pipeline
- Stage 1: K-Means Clustering
  - Grouped neighborhoods into clusters with similar profiles (housing, safety, lifestyle).
  - Identified the macro-level fit for user preferences.
  - Compared each user only to neighborhoods in their best-fit cluster.
- Stage 2: K-Nearest Neighbors (KNN) Personalization
  - For a given user, used KNN to identify similar user-neighborhood preference pairs within the cluster.
  - Produced a personalized, ranked shortlist of ideal neighborhoods.
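The "cluster first, personalize second" flow can be sketched as follows. This is an illustrative sketch, not the project's actual code. It assumes user preferences are expressed as a vector in the same scaled feature space as the neighborhoods, uses synthetic data with two clusters, and uses scikit-learn's `NearestNeighbors` for the KNN stage. The `recommend` helper is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic, already-scaled neighborhood features: two well-separated groups.
group_a = rng.normal(loc=-2.0, scale=0.3, size=(10, 3))
group_b = rng.normal(loc=2.0, scale=0.3, size=(10, 3))
neighborhood_features = np.vstack([group_a, group_b])
neighborhood_ids = np.arange(len(neighborhood_features))

# Stage 1: segment neighborhoods into clusters (n_clusters=2 is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(neighborhood_features)


def recommend(user_vector, top_n=3):
    """Return (neighborhood_id, distance) pairs, closest first."""
    user_vector = np.asarray(user_vector, dtype=float).reshape(1, -1)

    # Find the user's best-fit cluster, then restrict candidates to it.
    best_cluster = kmeans.predict(user_vector)[0]
    in_cluster = kmeans.labels_ == best_cluster
    candidates = neighborhood_features[in_cluster]
    candidate_ids = neighborhood_ids[in_cluster]

    # Stage 2: rank the in-cluster neighborhoods by distance to the user.
    k = min(top_n, len(candidates))
    knn = NearestNeighbors(n_neighbors=k).fit(candidates)
    distances, indices = knn.kneighbors(user_vector)
    return [
        (int(candidate_ids[i]), float(d))
        for i, d in zip(indices[0], distances[0])
    ]


shortlist = recommend([2.1, 1.9, 2.0])
for neighborhood_id, distance in shortlist:
    print(neighborhood_id, round(distance, 3))
```

Restricting the KNN search to one cluster keeps the shortlist consistent with the user's macro-level profile, at the cost of never surfacing a close match that happens to sit just across a cluster boundary.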
Evaluation & Validation
- Qualitative validation against ground-truth preferences (benchmarking against known “best” neighborhoods for different user profiles).
- User-centric scoring: Recommendations checked through scenario testing across representative user profiles (e.g., single professionals, families, pet owners).
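One way to make scenario checks like these repeatable is a simple hit rate: the share of scenarios whose top-k shortlist contains at least one benchmark neighborhood. The sketch below is only an illustration. The validation described above was qualitative, and the metric, the `hit_rate_at_k` helper, and all scenario and neighborhood names here are hypothetical.

```python
def hit_rate_at_k(recommended, known_best, k=3):
    """Fraction of scenarios whose top-k list contains a benchmark neighborhood."""
    if not recommended:
        return 0.0
    hits = 0
    for scenario, ranked in recommended.items():
        benchmarks = set(known_best.get(scenario, ()))
        if set(ranked[:k]) & benchmarks:
            hits += 1
    return hits / len(recommended)


# Hypothetical ranked shortlists per user scenario.
recommended = {
    "single_professional": ["Downtown", "Midtown", "Riverside"],
    "family": ["Oakwood", "Lakeview", "Downtown"],
    "pet_owner": ["Midtown", "Downtown", "Harbor"],
}

# Hypothetical benchmark ("known best") neighborhoods per scenario.
known_best = {
    "single_professional": {"Downtown"},
    "family": {"Maple Heights", "Lakeview"},
    "pet_owner": {"Greenfield"},
}

score = hit_rate_at_k(recommended, known_best, k=3)
print(f"{score:.2f}")  # 0.67
```

Here two of the three scenarios include a benchmark neighborhood in their top three, so the hit rate is 2/3.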
Key Results
- The engine delivered actionable, interpretable neighborhood recommendations for a wide range of user profiles.
- The two-layer pipeline (“cluster first, personalize second”) improved both the relevance and diversity of recommendations compared to single-stage approaches.
- Visualization of clusters and ranked recommendations allowed clear communication to end-users and stakeholders.
Tools & Technologies
- Languages: Python
- Libraries: scikit-learn (KMeans, KNN), pandas, numpy, matplotlib, seaborn
- Deployment: Jupyter Notebooks, GitHub
GitHub Repository & Additional Links
Conclusion & Future Work
This hybrid recommendation system offers interpretable, actionable advice for users seeking an ideal place to live, combining data-driven neighborhood segmentation with personalized, scenario-based ranking. Future extensions could integrate real-time user feedback, additional lifestyle variables, and direct web-app deployment for broader accessibility.