Results of the RTDIP Data Quality Checker Project (Video and Report, AMOS Winter 2024/25)

This project is one of the Scrum projects with industry partners that ran as part of the AMOS Winter 2024/25 projects. Below you will find the demo video (you may also like the other videos) and the project summary, which details the final result of the project. We run these projects every semester, so please get in touch if you would like to propose one of your own!

Demo Video

Project Summary

Project Name: Real Time Data Ingestion Platform (RTDIP)
Project Mission: To support the advancement of the Real-Time Data Ingestion Platform (RTDIP) by contributing to the development of innovative, open-source components focused on ensuring data quality. The mission includes creating tools to detect missing data, outliers, duplicates, and irregularities in real-time data streams, while aligning with RTDIP’s development guidelines to promote robust, scalable, and collaborative solutions.
Industry Partner: Shell
Team Logo
Project Summary: This project integrates various Python packages to process sensor data in the format used by Shell. It is designed to handle large PySpark DataFrames efficiently and has been tested successfully on real datasets. A key focus is ease of use for all components, which cover both monitoring and preprocessing of data.

To meet the requirement of integrating our work into an existing open-source project, we held the code to high quality standards. All components therefore come with extensive documentation and high unit test coverage.

List Of Implemented Components

Data Manipulation

* Input Validator
* Z-Score Normalization
* Minmax Normalization
* Mean Normalization
* Denormalization
* Duplicate Detection
* Interval Filtering
* Missing Value Imputation
* Dimensionality Reduction
* K-Sigma Anomaly Detection
* Out of Range Value Filter
* Gaussian Smoothing
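
To give a feel for what two of the listed techniques do, here is a minimal sketch of Z-score normalization and K-sigma anomaly detection. It is written in plain Python on a list of floats rather than on a PySpark DataFrame, and the function names and sample readings are hypothetical; it is not the API of the RTDIP components.

```python
from statistics import mean, pstdev


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant series: treat every z-score as 0 (an assumption of this sketch).
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def k_sigma_anomalies(values, k=3.0):
    """Return the indices of values more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical sensor readings with one obvious spike at the end.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 85.0]
normalized = z_score_normalize(readings)
anomalies = k_sigma_anomalies(readings, k=2.0)
```

The example uses `k=2.0` because in a series this short a single extreme value inflates the standard deviation so much that no point can lie more than three standard deviations from the mean.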

Data Monitoring

* Check Value Ranges
* Flatline Detection
* Identify Missing Data
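
The monitoring checks can be illustrated in the same spirit. The sketch below shows one possible reading of flatline detection (a value repeating for several consecutive samples) and of missing-data identification (a gap between timestamps larger than the expected sampling interval). The function names, the `min_run` and `tolerance` parameters, and the sample data are assumptions made for illustration, not the RTDIP implementation.

```python
from datetime import datetime, timedelta


def find_flatlines(values, min_run=3):
    """Return inclusive (start, end) index pairs of runs where one value repeats at least min_run times."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (previous, next) timestamp pairs spaced further apart than expected_interval plus tolerance."""
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev > expected_interval + tolerance:
            gaps.append((prev, nxt))
    return gaps


# Hypothetical signal that is stuck at 5.0 for four samples.
values = [1.0, 1.2, 5.0, 5.0, 5.0, 5.0, 1.1, 1.3]
flat = find_flatlines(values, min_run=3)

# Hypothetical 10-second sampling with readings missing between 20 s and 50 s.
t0 = datetime(2025, 1, 1, 0, 0, 0)
timestamps = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 50, 60)]
gaps = find_gaps(timestamps, expected_interval=timedelta(seconds=10))
```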

Forecasts

* ARIMA
* Auto ARIMA
* Linear Regression
* K-Nearest Neighbors

Transformers

* OneHotEncoding
* Columns To Vector
* Polynomial Features

Project Illustration

Team Photo
Project Repository: https://github.com/amosproj/amos2024ws01-rtdip-data-quality-checker/