This repository now only contains multiple FTIR spectral datasets used for polymer classification and machine learning experiments.
The datasets are primarily focused on the following six commodity plastics:
- PET — Polyethylene Terephthalate
- HDPE — High-Density Polyethylene
- LDPE — Low-Density Polyethylene
- PP — Polypropylene
- PS — Polystyrene
- PVC — Polyvinyl Chloride
| Dataset | Samples | Description |
|---|---|---|
openspecy_polymer_dataset.csv |
8390 | Extracted from the OpenSpecy spectral library |
FTIR_PLASTIC_c4.csv |
3000 | Balanced FTIR plastics dataset |
FTIR_PLASTIC_c8.csv |
3000 | Balanced FTIR plastics dataset |
openspecy_polymer_dataset.csv
This dataset was generated from the OpenSpecy spectral library using custom building scripts.
The dataset contains:
- FTIR and related infrared spectra
- 1983 spectral features per sample
- 6 polymer classes
Each row corresponds to one spectrum.
8390 rows × 1984 columns
- Columns
V1→V1983: spectral intensity values - Column
label: polymer class label
| Polymer | Samples |
|---|---|
| HDPE | 1057 |
| LDPE | 867 |
| PET | 2267 |
| PP | 2451 |
| PS | 1491 |
| PVC | 257 |
The 1983 feature columns correspond to spectral intensity values measured across different wavenumbers.
Additional wavenumber metadata is stored separately in:
openspecy_wavenumbers.csv
Each feature index corresponds to a specific FTIR wavenumber.
FTIR_PLASTIC_c4.csv
FTIR_PLASTIC_c8.csv
These datasets are balanced FTIR polymer datasets containing 1000 samples for each plastic class.
Each sample stores:
- polymer metadata
- spectral wavelength values
- spectral intensity values
Each dataset contains:
3000 samples
with balanced polymer classes.
| Column | Description |
|---|---|
IDE |
Sample identifier |
Polymer |
Polymer class label |
Technic |
Spectroscopy technique |
Sample |
Sample description |
BR |
Metadata field |
RST |
Metadata field |
Data(x) |
Wavelength / wavenumber values |
Data(y) |
Spectral intensity values |
In FTIR spectroscopy:
Data(x)usually represents the wavenumber axis (cm⁻¹)Data(y)represents absorbance, transmittance, or intensity values
The exact interpretation depends on the original acquisition setup.
The system shall:
- Accept FTIR spectral data as input.
- Preprocess spectra into a standardized format suitable for machine learning.
- Normalize absorbance values before classification.
The system shall:
- Predict the plastic category from a given infrared spectrum.
- Support classification into the six defined plastic classes.
- Output the predicted class and confidence score.
The project shall implement and evaluate:
- A 1D Convolutional Neural Network (1D-CNN)
- A Random Forest classifier
The models shall be compared using:
- Classification accuracy
- Inference speed
- Robustness
- Suitability for real-time deployment
The system shall:
- Save trained models to disk.
- Reload trained models for future inference without retraining.
The system shall:
- Evaluate models using validation and test datasets.
- Produce accuracy metrics and classification reports.
- Support single-sample testing for inference verification.
The classifier should achieve high classification accuracy across all supported plastic categories.
(Done) Current experimental results using the 1D-CNN model have achieved near-perfect classification performance on the available dataset.
The system should support fast inference suitable for real-time or near real-time recycling machine operation.
(Need improvement) Inference time should remain sufficiently low to support conveyor-belt-based plastic sorting systems.
The architecture should support:
- Addition of new plastic classes
- Integration of future datasets
- Deployment to embedded or edge-computing systems
The system should maintain stable predictions under repeated testing and varying spectra conditions.
Recommended pipeline:
-
Train models on:
FTIR_PLASTIC_c4.csvFTIR_PLASTIC_c8.csv
-
Evaluate generalization on:
openspecy_polymer_dataset.csv
This helps test model robustness across:
- instruments
- preprocessing pipelines
- spectral variability
- real-world samples
-
Spectral preprocessing may still be required before training.
-
Recommended preprocessing:
- normalization
- baseline correction
- interpolation
- smoothing
- noise filtering
-
OpenSpecy data contains spectra from multiple sources and instruments, which may introduce distribution shifts.
Please refer to the original dataset and OpenSpecy licenses for redistribution and usage terms.