Privacy Dataset Distillation for Medical Imaging (OCTMNIST)
Dataset distillation method balancing privacy protection with retention of key diagnostic features — achieving 47.25% accuracy under IPC=1 (one image per class), surpassing existing best results.
Problem
Medical image datasets contain sensitive patient information. Training on raw data risks privacy breaches, while naive anonymization destroys critical diagnostic features.
Action
Co-designed a distillation method introducing class-centric and covariance-matching constraints. Ran experiments across multiple images-per-class settings (IPC=1, 2, 5, 10). Benchmarked against the DC, DM, and DSA methods. Assessed privacy by quantifying structural similarity (SSIM) between original and distilled images.
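The two constraints can be illustrated with a minimal feature-space sketch. This is a hypothetical formulation (function names and the feature-dict layout are assumptions, not the published code): the class-centric term aligns per-class feature means, and the covariance-matching term aligns second-order feature statistics.

```python
import numpy as np

def distillation_loss(real_feats, syn_feats):
    """Sketch of class-centric + covariance-matching objectives.

    real_feats, syn_feats: dicts mapping class label -> (n, d) feature array
    (hypothetical interface for illustration only).
    """
    loss = 0.0
    for c in real_feats:
        r, s = real_feats[c], syn_feats[c]
        # Class-centric term: pull the synthetic class mean toward the real class mean.
        mean_term = np.sum((r.mean(axis=0) - s.mean(axis=0)) ** 2)
        # Covariance-matching term: align second-order feature statistics per class.
        cov_term = np.sum((np.cov(r, rowvar=False) - np.cov(s, rowvar=False)) ** 2)
        loss += mean_term + cov_term
    return loss
```

In an actual pipeline these terms would be computed on network embeddings and minimized with respect to the synthetic images by gradient descent; the sketch only shows the statistics being matched.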
Result
Achieved 47.25% accuracy under IPC=1, surpassing existing best results. Cross-class SSIM ranged from 0.68 to 0.75 and L2 distance from 0.59 to 0.65, indicating that distilled images do not structurally resemble any individual patient's scan — effective anonymization. Published at AASIP 2024.
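The SSIM-based privacy check can be sketched as follows. This is a simplified single-window SSIM over whole images (an assumption for brevity; library implementations such as scikit-image's compute SSIM over local sliding windows):

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM between two images using global statistics.

    A low SSIM between a distilled image and every original image
    suggests the distilled image does not reproduce patient-specific
    structure (the privacy criterion used in the writeup above).
    """
    c1 = (0.01 * data_range) ** 2  # standard SSIM stability constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score 1.0; the reported cross-class range of 0.68–0.75 sits well below that, which is the basis of the anonymization claim.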
Learnings
Deepened understanding of privacy-utility tradeoffs in dataset distillation and the role of structural constraints in preserving discriminative features while ensuring anonymization.