Data Cleaning in SQL Server Management Studio

Data cleaning is often the most time-consuming step in the data analysis process, and one of the most important. It means detecting the incomplete, incorrect, inaccurate, or irrelevant records in a record set, table, or database, and then replacing, modifying, or deleting that dirty data.

Link to the code

https://github.com/rrizwan43/Data-Cleanig-with-SQL

Dataset Used

Nashville Housing dataset from Kaggle.

Tasks

1. Populate the property address.
2. Break out the address into individual columns (address, city, state).
3. Change Y and N to Yes and No in the “Sold as Vacant” field.
4. Remove duplicates.
5. Delete unused columns.
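
Tasks 1, 2, 3, and 5 might look roughly like the T-SQL below. This is a minimal sketch, not the code from the linked repository: the temp table #Housing, its sample rows, and every column name (UniqueID, ParcelID, PropertyAddress, OwnerAddress, SoldAsVacant, TaxDistrict, OwnerStreet, OwnerCity, OwnerState) are assumptions made for illustration. It also assumes that rows sharing a ParcelID share a property address, and that addresses are written as 'street, city, state' with no periods in them.

```sql
-- Hypothetical sample data; the table and column names are assumptions.
CREATE TABLE #Housing (
    UniqueID        INT PRIMARY KEY,
    ParcelID        VARCHAR(20),
    PropertyAddress VARCHAR(100),
    OwnerAddress    VARCHAR(100),
    SoldAsVacant    VARCHAR(3),
    TaxDistrict     VARCHAR(50),
    OwnerStreet     VARCHAR(100),  -- filled in by task 2
    OwnerCity       VARCHAR(50),   -- filled in by task 2
    OwnerState      VARCHAR(10)    -- filled in by task 2
);

INSERT INTO #Housing
    (UniqueID, ParcelID, PropertyAddress, OwnerAddress, SoldAsVacant, TaxDistrict)
VALUES
    (1, '001', '123 MAIN ST, NASHVILLE', '123 MAIN ST, NASHVILLE, TN', 'Y',  'URBAN'),
    (2, '001', NULL,                     '123 MAIN ST, NASHVILLE, TN', 'No', 'URBAN'),
    (3, '002', '9 OAK AVE, ANTIOCH',     '9 OAK AVE, ANTIOCH, TN',     'N',  'GENERAL');

-- Task 1: fill a missing property address from another row with the same ParcelID.
UPDATE a
SET    a.PropertyAddress = b.PropertyAddress
FROM   #Housing AS a
JOIN   #Housing AS b
       ON  a.ParcelID = b.ParcelID
       AND a.UniqueID <> b.UniqueID
WHERE  a.PropertyAddress IS NULL
  AND  b.PropertyAddress IS NOT NULL;

-- Task 2: split 'street, city, state' into three columns.
-- PARSENAME splits on periods and counts parts from the right, so the
-- commas are swapped for periods first.
UPDATE #Housing
SET    OwnerStreet = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 3)),
       OwnerCity   = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 2)),
       OwnerState  = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 1));

-- Task 3: normalize Y/N to Yes/No, leaving existing Yes/No values alone.
UPDATE #Housing
SET    SoldAsVacant = CASE SoldAsVacant
                          WHEN 'Y' THEN 'Yes'
                          WHEN 'N' THEN 'No'
                          ELSE SoldAsVacant
                      END;

-- Task 5: drop columns that are no longer needed.
ALTER TABLE #Housing DROP COLUMN OwnerAddress, TaxDistrict;

SELECT * FROM #Housing;
```

Running the sketch against a temp table keeps the experiment away from the real data; once the statements behave as expected, the same updates can be pointed at the actual table.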

Techniques

Here are some of the more advanced techniques I used in this project:
1. CTEs
2. Temp tables
3. Window functions
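
As an illustration of how these three techniques can combine, the sketch below removes duplicate rows. It is not taken from the linked repository: the temp table #Sales, its rows, and its column names are hypothetical, and treating rows with the same ParcelID, SaleDate, and SalePrice as duplicates is an assumption about what "duplicate" means here.

```sql
-- Temp table: a session-scoped working copy (hypothetical names and rows).
CREATE TABLE #Sales (
    UniqueID  INT,
    ParcelID  VARCHAR(20),
    SaleDate  DATE,
    SalePrice INT
);

INSERT INTO #Sales (UniqueID, ParcelID, SaleDate, SalePrice)
VALUES
    (1, '001', '2015-03-01', 250000),
    (2, '001', '2015-03-01', 250000),  -- same sale entered twice
    (3, '002', '2016-07-15', 180000);

-- CTE + window function: number the rows within each group of identical
-- sales, then delete every row after the first one in its group.
WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ParcelID, SaleDate, SalePrice
               ORDER BY UniqueID
           ) AS RowNum
    FROM #Sales
)
DELETE FROM Ranked
WHERE RowNum > 1;

-- Rows with UniqueID 1 and 3 remain.
SELECT UniqueID, ParcelID, SaleDate, SalePrice
FROM #Sales
ORDER BY UniqueID;
```

Replacing the DELETE with a SELECT * FROM Ranked WHERE RowNum > 1 first is a cheap way to preview which rows would be removed.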