In this blog post, we'll be exploring the K-Nearest Neighbors (KNN) algorithm for predicting housing prices using a dataset of housing prices in a particular city. KNN is a simple, yet effective, algorithm that is widely used for classification and regression tasks in various fields.
We'll be working with a fictional dataset containing various features of houses along with their prices. The features include:
- `bedrooms`
- `bathrooms`
- `living_area`
- `lot_size`
- `year_built`
- `garage_present` (1 if present, 0 if not)

First, let's import the necessary libraries and load the dataset into a pandas DataFrame.
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Read dataset from CSV file
housing = pd.read_csv('housing_prices.csv')
housing.head()
```
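Since the dataset is fictional, you may not have a `housing_prices.csv` on hand. As a stand-in for following along, here's a minimal sketch that generates a synthetic DataFrame with the columns listed above; the value ranges and the toy pricing formula are assumptions, not part of the original dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Stand-in features matching the columns described above
housing = pd.DataFrame({
    'bedrooms': rng.integers(1, 6, n),
    'bathrooms': rng.integers(1, 4, n),
    'living_area': rng.integers(600, 4000, n),
    'lot_size': rng.integers(1000, 12000, n),
    'year_built': rng.integers(1950, 2023, n),
    'garage_present': rng.integers(0, 2, n),
})

# A made-up price: mostly driven by living area, plus noise
housing['price'] = (
    100 * housing['living_area']
    + 20000 * housing['bedrooms']
    + 15000 * housing['garage_present']
    + rng.normal(0, 25000, n)
).round(2)

housing.head()
```

Everything that follows works the same whether `housing` came from this sketch or from a real CSV.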
Next, we'll divide our dataset into the input features `X` and the target variable `y`, which holds the housing prices.
```python
X = housing.drop('price', axis=1)
y = housing['price']
```
Now, let's split the data into training and testing sets using sklearn's `train_test_split` function.
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
We can now train the KNN model using `KNeighborsRegressor` from the `sklearn.neighbors` module. We'll set `n_neighbors` to 5, which means the algorithm will base each prediction on the 5 nearest neighbors.
```python
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)
```
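It's worth seeing what "considering the nearest neighbors" actually means for regression: by default, `KNeighborsRegressor` predicts the average of the target values of the k nearest training points. A tiny sketch with made-up one-feature data makes this concrete:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up training set: one feature, two price clusters
X_small = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_small = np.array([100.0, 110.0, 120.0, 500.0, 510.0, 520.0])

model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_small, y_small)

# For a query at 2.1, the 3 nearest points are 1.0, 2.0, and 3.0,
# so the prediction is the mean of their targets: (100 + 110 + 120) / 3
pred = model.predict([[2.1]])[0]
print(pred)  # 110.0
```

With our housing data the same averaging happens, just over six features at once.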
Now that our model is trained, let's predict house prices for the test set and evaluate the performance of our model using the root mean squared error (RMSE) metric and the R² score.
```python
y_pred = knn.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error:', rmse)
print('R2 Score:', r2)
```
Lower RMSE values indicate better performance. The R² score, which is at most 1 (and can be negative for a model that performs worse than simply predicting the mean), indicates the proportion of the variance in the target variable that can be explained by the model.
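To get a feel for these two metrics, here's a quick example with hypothetical true and predicted prices (these numbers are illustrative, not output from the model above):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted prices for four houses
y_true = np.array([200_000, 250_000, 300_000, 350_000])
y_pred = np.array([210_000, 240_000, 310_000, 340_000])

# Every prediction is off by exactly $10,000, so RMSE is 10,000
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # 10000.0

# R² = 1 - (sum of squared errors) / (total variance around the mean)
r2 = r2_score(y_true, y_pred)
print(r2)  # 0.968
```

An RMSE of 10,000 here means predictions miss by about $10,000 on average, while the R² of 0.968 says the model accounts for most of the spread in prices.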
In this blog post, we used the K-Nearest Neighbors algorithm to predict housing prices using a simple dataset. Keep in mind that for more complex problems and larger datasets, you may achieve better results using other algorithms or feature engineering techniques.
You can even try experimenting with various values of `n_neighbors` or other parameters of `KNeighborsRegressor` to improve the performance of your model.
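One systematic way to experiment with `n_neighbors` is cross-validated grid search. Here's a minimal sketch using sklearn's `GridSearchCV`; the synthetic `X_train`/`y_train` data below is a stand-in, so substitute your own training split from earlier in the post.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Stand-in training data; replace with your own X_train / y_train
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(100, 2))
y_train = X_train[:, 0] * 30 + X_train[:, 1] * 10 + rng.normal(0, 1, 100)

# Try several neighbor counts with 5-fold cross-validation,
# scoring each candidate by (negated) RMSE
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={'n_neighbors': [3, 5, 7, 9, 11]},
    scoring='neg_root_mean_squared_error',
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

`search.best_estimator_` is then a `KNeighborsRegressor` refit on all the training data with the winning setting, ready to evaluate on the test set.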