Hybrid Recommendation Engine for Optimal Neighborhood Selection

Project Overview

This Applied Data Science Capstone project built a hybrid machine learning recommendation engine that identifies optimal living environments for users. The system combines unsupervised and supervised learning: neighborhoods were first clustered using K-Means, and K-Nearest Neighbors (KNN) was then used to generate ranked, personalized neighborhood recommendations based on user preferences and lifestyle data.

Methodology

Data Sources

  • Aggregated neighborhood-level data from Zillow, US Census, city datasets, and real estate APIs.
  • Key features included affordability, crime, walkability, schools, transit, and amenities.

Feature Engineering

  • Standardized all features to a common scale for clustering.
  • Applied Principal Component Analysis (PCA) to reduce feature dimensionality where appropriate.
  • One-hot encoded categorical features (e.g., “Dog friendly”) for inclusion in the clustering and KNN stages.
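
The feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the toy values, and the choice of two principal components are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative, not the project's schema.
neighborhoods = pd.DataFrame({
    "median_rent": [1200, 2500, 1800, 3100, 950, 2100],
    "crime_rate": [4.1, 2.0, 3.3, 1.2, 5.6, 2.8],
    "walk_score": [55, 88, 70, 92, 40, 76],
    "dog_friendly": ["yes", "no", "yes", "yes", "no", "yes"],
})

numeric_cols = ["median_rent", "crime_rate", "walk_score"]
categorical_cols = ["dog_friendly"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array so PCA can consume it
)

# Assumed choice: keep two components; a real project would tune this.
feature_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2, random_state=0)),
])

features = feature_pipeline.fit_transform(neighborhoods)
print(features.shape)
```

Wrapping the steps in a single pipeline means the same fitted transformations can later be applied to a user's preference vector with `feature_pipeline.transform`, which keeps users and neighborhoods in one feature space.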

Hybrid Modeling Pipeline

  • Stage 1: K-Means Clustering
    • Grouped neighborhoods into clusters with similar profiles (housing, safety, lifestyle).
    • Identified the macro-level fit for user preferences.
    • Compared users only to neighborhoods in their best-fit cluster.
  • Stage 2: K-Nearest Neighbors (KNN) Personalization
    • For a given user, identified similar user-neighborhood preference pairs within the cluster using KNN.
    • Produced a personalized, ranked shortlist of ideal neighborhoods.
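
A simplified sketch of the "cluster first, personalize second" idea is shown below. It assumes the user's preferences are expressed as a vector in the same scaled feature space as the neighborhoods, which may differ from how the project paired users and neighborhoods. The `recommend` helper, the synthetic data, and the parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def recommend(features, names, user_vector, n_clusters=3, top_n=3, seed=0):
    """Hypothetical helper: cluster first, then rank within the user's cluster."""
    user = np.asarray(user_vector, dtype=float).reshape(1, -1)

    # Stage 1: K-Means groups neighborhoods; the user is assigned to the
    # cluster whose centroid is closest to their preference vector.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    user_cluster = kmeans.predict(user)[0]
    member_idx = np.flatnonzero(kmeans.labels_ == user_cluster)

    # Stage 2: nearest-neighbor search restricted to that cluster's members.
    k = min(top_n, len(member_idx))
    knn = NearestNeighbors(n_neighbors=k).fit(features[member_idx])
    distances, local_idx = knn.kneighbors(user)

    # Map cluster-local indices back to neighborhood names, closest first.
    return [(names[member_idx[i]], float(d)) for i, d in zip(local_idx[0], distances[0])]


# Synthetic, well-separated groups standing in for scaled neighborhood features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
features = np.vstack([c + rng.normal(scale=0.3, size=(5, 2)) for c in centers])
names = [f"neighborhood_{i}" for i in range(len(features))]

ranked = recommend(features, names, user_vector=[4.8, 5.1])
for name, distance in ranked:
    print(f"{name}: distance {distance:.2f}")
```

Restricting the neighbor search to one cluster keeps the shortlist inside a single neighborhood profile, at the cost of never surfacing a close match that happens to sit just across a cluster boundary.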

Evaluation & Validation

  • Qualitative validation against ground-truth preferences (benchmarking against known “best” neighborhoods for different user profiles).
  • User-centric scoring: Recommendations were checked through scenario testing (e.g., single professionals, families, pet owners).
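
One way such scenario checks could be made repeatable is a simple hit rate: the fraction of user profiles whose top-k shortlist contains at least one known "best" neighborhood. The sketch below is an assumption about how this might be scored, not the project's evaluation code. The `hit_rate_at_k` function, the profile names, and the neighborhood names are all made up for illustration.

```python
def hit_rate_at_k(recommendations, known_best, k=3):
    """Share of profiles whose top-k list includes a known-best neighborhood."""
    hits = 0
    for profile, ranked in recommendations.items():
        # A profile counts as a hit if any top-k item is in its benchmark set.
        if set(ranked[:k]) & set(known_best.get(profile, ())):
            hits += 1
    return hits / len(recommendations)


# Hypothetical ranked output per scenario (best match first).
recommendations = {
    "single_professional": ["Downtown", "Midtown", "Harbor"],
    "family": ["Maple Park", "Oak Hill", "Riverside"],
    "pet_owner": ["Harbor", "Downtown", "Midtown"],
}

# Hypothetical benchmark: neighborhoods considered "best" for each scenario.
known_best = {
    "single_professional": {"Midtown"},
    "family": {"Oak Hill", "Elm Grove"},
    "pet_owner": {"Greenfield"},
}

score = hit_rate_at_k(recommendations, known_best, k=3)
print(f"hit rate @3: {score:.2f}")
```

A metric like this complements, rather than replaces, the qualitative review, since it only measures overlap with a hand-picked benchmark.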

Key Results

  • The engine delivered actionable, interpretable neighborhood recommendations for a wide range of user profiles.
  • The two-layer pipeline (“cluster first, personalize second”) improved both the relevance and diversity of recommendations compared to single-stage approaches.
  • Visualization of clusters and ranked recommendations allowed clear communication to end-users and stakeholders.

Tools & Technologies

  • Languages: Python
  • Libraries: scikit-learn (KMeans, KNN), pandas, numpy, matplotlib, seaborn
  • Deployment: Jupyter Notebooks, GitHub

Conclusion & Future Work

This hybrid recommendation system offers interpretable, actionable advice for users seeking an ideal place to live, combining data-driven neighborhood segmentation with personalized, scenario-based ranking. Future extensions could integrate real-time user feedback, additional lifestyle variables, and direct web-app deployment for broader accessibility.