Predicting House Prices With K-Nearest Neighbors

Introduction

In this blog post, we'll explore the K-Nearest Neighbors (KNN) algorithm by predicting house prices from a dataset of homes in a single city. KNN is a simple yet effective algorithm that is widely used for both classification and regression tasks across many fields.
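Before reaching for a library, it helps to see how little machinery KNN regression actually needs: to predict a value for a new point, find the k training points closest to it and average their targets. Here's a minimal sketch in plain NumPy, using made-up living-area/price pairs (not the dataset from this post):

import numpy as np

# Toy illustration of KNN regression: the predicted price is simply the
# average price of the k closest houses in feature space.
# These (living_area, price) pairs are invented for illustration only.
living_area = np.array([1000, 1200, 1500, 1800, 2200], dtype=float)
price = np.array([150_000, 180_000, 210_000, 250_000, 300_000], dtype=float)

def knn_predict(query, k=3):
    # In one dimension, Euclidean distance is just the absolute difference
    distances = np.abs(living_area - query)
    nearest = np.argsort(distances)[:k]   # indices of the k closest houses
    return price[nearest].mean()          # average their prices

# A 1,600 sq ft query averages the prices at 1500, 1800, and 1200 sq ft
print(knn_predict(1600, k=3))

This is exactly what KNeighborsRegressor does under the hood (plus efficient neighbor search and support for many features at once).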

Dataset

We'll be working with a fictional dataset containing various features of houses along with their prices. The features include:

  • Number of bedrooms (bedrooms)
  • Number of bathrooms (bathrooms)
  • Living area size in square feet (living_area)
  • Lot size in square feet (lot_size)
  • Year the house was built (year_built)
  • Presence of a garage (garage_present, 1 if present, 0 if not)

Getting Started

First, let's import the necessary libraries and load the dataset into a pandas DataFrame.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Read dataset from CSV file
housing = pd.read_csv('housing_prices.csv')
housing.head()

Preprocessing the Data

Next, we'll divide our dataset into input features X and the output target variable y which will be the housing prices.

X = housing.drop('price', axis=1)
y = housing['price']

Now, let's split the data into training and testing sets using sklearn's train_test_split function.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Training the KNN Model

We can now train the KNN model using KNeighborsRegressor from the sklearn.neighbors module. We'll set n_neighbors to 5, which means that the algorithm will consider the 5 nearest neighbors.

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)
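One caveat worth knowing: KNN is distance-based, so features with large numeric ranges (like lot_size) can dominate features with small ones (like bedrooms) when distances are computed. Standardizing the features usually helps. Here's a sketch using sklearn's Pipeline with synthetic stand-in data, since the housing_prices.csv file isn't included in this post:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the housing features (invented for illustration)
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(1, 6, 200).astype(float),   # bedrooms
    rng.integers(1, 4, 200).astype(float),   # bathrooms
    rng.uniform(800, 3500, 200),             # living_area
])
y = X[:, 2] * 150 + X[:, 0] * 10_000 + rng.normal(0, 5_000, 200)

# StandardScaler puts every feature on a comparable footing
# before KNeighborsRegressor computes distances
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:1]))

With the real dataset, you would simply fit this pipeline on X_train and y_train in place of the bare regressor.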

Predicting House Prices and Evaluating the Model

Now that our model is trained, let's predict house prices for the test set and evaluate the performance of our model using the root mean squared error (RMSE) metric and the R2 score.

y_pred = knn.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error:', rmse)
print('R2 Score:', r2)

Lower RMSE values indicate better performance. The R2 score indicates the proportion of the variance in the target variable that the model explains; it typically falls between 0 and 1, though it can be negative when a model performs worse than simply predicting the mean.

Conclusion

In this blog post, we used the K-Nearest Neighbors algorithm to predict housing prices using a simple dataset. Keep in mind that for more complex problems and larger datasets, you may achieve better results using other algorithms or feature engineering techniques.

You can also try experimenting with different values of n_neighbors or other parameters of KNeighborsRegressor to improve the performance of your model.
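One systematic way to do that experimentation is a cross-validated grid search over n_neighbors. Here's a sketch with synthetic stand-in data (the real CSV isn't included here); with the actual dataset you would fit the search on X_train and y_train instead:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic one-feature stand-in: price roughly proportional to living area
rng = np.random.default_rng(0)
X = rng.uniform(800, 3500, (200, 1))
y = X[:, 0] * 150 + rng.normal(0, 10_000, 200)

# Try several neighbor counts and keep the one with the best
# cross-validated RMSE (sklearn scores are "higher is better",
# hence the negated metric)
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={'n_neighbors': [3, 5, 7, 9, 11]},
    cv=5,
    scoring='neg_root_mean_squared_error',
)
search.fit(X, y)
print(search.best_params_)

Other parameters worth exploring the same way include weights ('uniform' vs. 'distance') and the distance metric itself.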