Table of Contents
- Introduction
- Why Datasets Matter in QML
- Classical vs Quantum Datasets
- Synthetic Datasets for Quantum ML
- Real-World Use Cases for Quantum Datasets
- Benchmarking in Classical ML vs QML
- Types of Quantum Datasets
- Quantum-Classical Hybrid Datasets
- Dataset Formats and Representations
- Encoding Datasets into Quantum Circuits
- Quantum Dataset Libraries and Platforms
- IBM Qiskit Datasets and qiskit-machine-learning
- PennyLane Datasets and QML Benchmarks
- TFQ Datasets and Integration
- Notable Quantum Benchmarks
- Quantum Dataset Generation Techniques
- Evaluation Metrics in QML Benchmarks
- Challenges in Dataset Standardization
- Open Source Quantum ML Datasets
- Conclusion
1. Introduction
Quantum machine learning (QML) requires appropriate datasets and benchmarks to compare models, evaluate algorithms, and validate performance. As the field evolves, the creation and standardization of quantum datasets are becoming increasingly important.
2. Why Datasets Matter in QML
- Provide ground truth for training and validation
- Enable reproducibility of experiments
- Support fair comparison between quantum and classical models
3. Classical vs Quantum Datasets
| Feature | Classical Dataset | Quantum Dataset |
|---|---|---|
| Format | Vectors, matrices | States, circuits, density matrices |
| Input size | MBs to GBs | Limited by qubit count |
| Access method | CSV, images, tensors | Qiskit, PennyLane objects |
4. Synthetic Datasets for Quantum ML
- Iris dataset (projected into quantum encodings)
- Parity classification
- Quantum state discrimination
- XOR problem in quantum space
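As a concrete example of the parity and XOR tasks listed above, the following minimal NumPy sketch generates a labeled bitstring dataset; the function name and sizes are illustrative choices, not a standard API. With `n_bits=2` the labels reduce to the classic XOR problem.

```python
import numpy as np

def make_parity_dataset(n_samples=200, n_bits=4, seed=0):
    """Generate random bitstrings labeled by their parity (XOR of all bits)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_bits))  # binary feature vectors
    y = X.sum(axis=1) % 2                             # label = parity of the bits
    return X, y

X, y = make_parity_dataset()
print(X[:3], y[:3])
```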
5. Real-World Use Cases for Quantum Datasets
- Quantum chemistry states
- Material simulations (e.g., lattice models)
- Financial time series encoded in qubit registers
6. Benchmarking in Classical ML vs QML
- MNIST, CIFAR-10 in classical ML
- No widely accepted standard yet in QML
- Most studies use simulated or re-encoded datasets
7. Types of Quantum Datasets
- Labeled qubit states
- Quantum circuits as data points
- Quantum trajectories and time evolution data
8. Quantum-Classical Hybrid Datasets
- Classical data encoded in quantum circuits (e.g., angle encoding)
- Used for hybrid models and transfer learning
9. Dataset Formats and Representations
- NumPy arrays for parameters
- Qiskit `QuantumCircuit` objects
- PennyLane templates with labels
10. Encoding Datasets into Quantum Circuits
- Angle Encoding: \( x_i \rightarrow R_Y(x_i) \)
- Amplitude Encoding: normalize the data vector and map it to state amplitudes
- Basis Encoding: map binary features to computational basis states (see the sketch after this list)
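To make the three strategies above concrete, here is a hedged Qiskit sketch that builds one small circuit per encoding; the feature values are arbitrary placeholders.

```python
import numpy as np
from qiskit import QuantumCircuit

x = np.array([0.8, 1.3, 2.1, 0.4])  # a classical feature vector

# Angle encoding: one qubit per feature, x_i -> RY(x_i)
angle_qc = QuantumCircuit(len(x))
for i, xi in enumerate(x):
    angle_qc.ry(xi, i)

# Amplitude encoding: normalize the vector and load it as state amplitudes
# (needs only log2(len(x)) qubits, here 2)
amp = x / np.linalg.norm(x)
amp_qc = QuantumCircuit(2)
amp_qc.initialize(amp, [0, 1])

# Basis encoding: a binary feature vector maps to a computational basis state
bits = [1, 0, 1, 1]
basis_qc = QuantumCircuit(len(bits))
for i, b in enumerate(bits):
    if b:
        basis_qc.x(i)
```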
11. Quantum Dataset Libraries and Platforms
- Qiskit’s dataset loaders in `qiskit_machine_learning.datasets`
- PennyLane’s `qml.data` module
- TFQ’s `tfq.datasets`
12. IBM Qiskit Datasets and qiskit-machine-learning
- Ad hoc dataset loaders (e.g., `ad_hoc_data`; see the usage sketch below)
- Iris, breast cancer, and quantum-enhanced classification tasks
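A minimal usage sketch of the `ad_hoc_data` loader, assuming a recent `qiskit-machine-learning` release (argument names and defaults have shifted across versions):

```python
from qiskit_machine_learning.datasets import ad_hoc_data

# Generate the "ad hoc" dataset used in quantum kernel experiments:
# 20 training and 5 test samples per class, 2 features, separation gap 0.3.
train_X, train_y, test_X, test_y = ad_hoc_data(
    training_size=20, test_size=5, n=2, gap=0.3, one_hot=False
)
print(train_X.shape, train_y.shape)  # e.g. (40, 2) (40,)
```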
13. PennyLane Datasets and QML Benchmarks
- `qml.data.load()` for downloading curated quantum datasets (e.g., the qchem and qspin collections); a loading sketch follows below
- Integration with PyTorch and TensorFlow
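The sketch below loads one entry from PennyLane’s curated quantum-chemistry collection via `qml.data.load`; the bond length, basis set, and attribute names are assumptions based on recent PennyLane releases and may differ in other versions.

```python
import pennylane as qml

# Download a curated quantum-chemistry dataset entry (requires network access;
# attribute names below are assumptions that may vary by dataset version).
[dataset] = qml.data.load("qchem", molname="H2", bondlength=1.1, basis="STO-3G")

H = dataset.hamiltonian        # molecular Hamiltonian as a PennyLane operator
print(H)
print(dataset.fci_energy)      # reference ground-state energy
```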
14. TFQ Datasets and Integration
- TFQ provides datasets in TensorFlow tensor format
- Supports quantum-enhanced layers on top of classical embeddings
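A possible loading pattern, assuming a recent TFQ release in which `tfq.datasets.tfi_chain` returns circuits, labels, Pauli sums, and metadata (the exact return signature and supported system sizes are fixed by the library and may differ):

```python
import cirq
import tensorflow_quantum as tfq

# Load the transverse-field Ising chain dataset shipped with TFQ.
qubits = cirq.GridQubit.rect(1, 4)
circuits, labels, pauli_sums, addinfo = tfq.datasets.tfi_chain(qubits)

# Convert the Cirq circuits to TensorFlow string tensors for use in a Keras model.
circuit_tensor = tfq.convert_to_tensor(circuits)
print(circuit_tensor.shape, labels.shape)
```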
15. Notable Quantum Benchmarks
- VQE on molecule datasets (H2, LiH, BeH2)
- QAOA on graph optimization
- Quantum kernel classification (synthetic vs noisy data)
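As one example of a quantum kernel benchmark on synthetic data, the hedged PennyLane sketch below builds a fidelity kernel from an angle-embedding circuit and feeds it to a classical SVM; the toy labeling rule and dataset size are arbitrary choices.

```python
import numpy as np
import pennylane as qml
from sklearn.svm import SVC

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def overlap(x1, x2):
    """|<phi(x2)|phi(x1)>|^2 via an embed / un-embed circuit."""
    qml.AngleEmbedding(x1, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def kernel(x1, x2):
    return overlap(x1, x2)[0]  # probability of returning to |00>

# Toy synthetic data: label by the sign of a simple nonlinear function.
rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(30, n_qubits))
y = np.sign(np.sin(X[:, 0]) - np.cos(X[:, 1]))

K = qml.kernels.kernel_matrix(X, X, kernel)       # precomputed Gram matrix
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))                            # training accuracy
```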
16. Quantum Dataset Generation Techniques
- Generate circuits with specific entanglement properties
- Simulate Hamiltonian dynamics
- Create oracle-based classification labels
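The sketch below illustrates the first of these techniques: it generates random Qiskit circuits and labels each one by whether the entanglement entropy of half the qubits exceeds a threshold. The function name and threshold are illustrative choices, not a standard API.

```python
import numpy as np
from qiskit.circuit.random import random_circuit
from qiskit.quantum_info import Statevector, partial_trace, entropy

def make_entanglement_dataset(n_samples=100, n_qubits=4, depth=3, threshold=0.5, seed=0):
    """Label random circuits by whether a half-system entanglement entropy exceeds a threshold."""
    rng = np.random.default_rng(seed)
    circuits, labels = [], []
    for _ in range(n_samples):
        qc = random_circuit(n_qubits, depth, seed=int(rng.integers(0, 10**6)))
        state = Statevector.from_instruction(qc)
        rho_a = partial_trace(state, list(range(n_qubits // 2)))  # trace out half the qubits
        s = entropy(rho_a, base=2)                                # entanglement entropy in bits
        circuits.append(qc)
        labels.append(int(s > threshold))
    return circuits, np.array(labels)

circs, labs = make_entanglement_dataset(n_samples=20)
print(labs)
```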
17. Evaluation Metrics in QML Benchmarks
- Accuracy, precision, recall (classification)
- Fidelity with target quantum states
- Cost function convergence and gradient norms
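For instance, fidelity with a target state can be computed directly from simulated statevectors; the following Qiskit sketch compares a deliberately miscalibrated preparation against an ideal Bell state.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector, state_fidelity

# Target: the Bell state prepared by an ideal circuit.
bell = QuantumCircuit(2)
bell.h(0)
bell.cx(0, 1)
target = Statevector.from_instruction(bell)

# "Learned" state: a slightly off rotation instead of a perfect Hadamard.
approx = QuantumCircuit(2)
approx.ry(1.4, 0)   # pi/2 would reproduce the target on |0>; 1.4 mimics a training error
approx.cx(0, 1)
prepared = Statevector.from_instruction(approx)

print(state_fidelity(prepared, target))  # fidelity in [0, 1]
```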
18. Challenges in Dataset Standardization
- Lack of large-scale quantum-native datasets
- Hardware dependence of results
- Reproducibility due to shot noise and backend drift
19. Open Source Quantum ML Datasets
- PennyLane QHack challenges
- QML community benchmarks on GitHub
- Synthetic generators like QSet and QData
20. Conclusion
Quantum datasets and benchmarks are crucial to the development and evaluation of QML models. As quantum hardware scales and software matures, more standardized and diverse datasets will become available, enabling meaningful comparisons and progress across the field.