UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models and Beyond

Michigan State University, IBM Research, Cisco Research

UnlearnCanvas: An image gallery of the artistic styles in this dataset for benchmarking machine unlearning and beyond.

Abstract


The rapid advancement of diffusion models (DMs) has not only transformed various real-world industries but has also introduced negative societal concerns, including the generation of harmful content, copyright disputes, and the rise of stereotypes and biases. To mitigate these issues, machine unlearning (MU) has emerged as a potential solution, demonstrating its ability to remove undesired generative capabilities of DMs in various applications. However, by examining existing MU evaluation methods, we uncover several key challenges that can result in incomplete, inaccurate, or biased evaluations for MU in DMs. To address them, we enhance the evaluation metrics for MU, including the introduction of an often-overlooked retainability measurement for DMs post-unlearning. Additionally, we introduce UnlearnCanvas, a comprehensive high-resolution stylized image dataset that enables us to evaluate the unlearning of artistic painting styles in conjunction with associated image objects. We show that this dataset plays a pivotal role in establishing a standardized and automated evaluation framework for MU techniques on DMs, featuring 7 quantitative metrics to address various aspects of unlearning effectiveness. Through extensive experiments, we benchmark 5 state-of-the-art MU methods, revealing novel insights into their pros and cons, and the underlying unlearning mechanisms. Furthermore, we demonstrate the potential of UnlearnCanvas to benchmark other generative modeling tasks, such as style transfer.



Machine Unlearning for Diffusion Models




Figure: An illustration of machine unlearning for diffusion models. The pretrained model can generate different concepts across different domains. MU aims to erase the ability to generate a designated unlearning target concept while retaining the ability to generate the others.


To enhance the assessment of MU in DMs and establish a standardized evaluation framework, we propose to develop a new benchmark dataset, referred to as UnlearnCanvas, which is designed to evaluate the unlearning of artistic painting styles along with associated image objects.

What is Missing in Current Evaluation Systems?

Unresolved Challenges in MU Evaluation


We started by examining the existing MU evaluation methods and uncovered several key challenges that can result in incomplete, inaccurate, or biased evaluations for MU in DMs.


Table: An overview of the scope and evaluation methods of the existing MU methods for DMs.


Challenge I: The absence of a diverse yet cohesive unlearning target repository. The assessment of MU for DMs, in terms of both effectiveness (concept erasure performance) and retainability (preserved generation quality under non-forgotten concepts), is typically conducted using manually selected targets, often chosen from a limited pool of unlearning targets.


Challenge II: The lack of precision in evaluation. Artistic styles can be challenging to precisely define and distinguish, which makes quantifying their impact on unlearning effectiveness and retainability difficult.


Challenge III: The lack of a systematic study on the ‘retainability’ of DMs post-unlearning. As the table above indicates, a quantitative evaluation of the capacity of unlearned DMs to maintain image generation under innocent concepts, known as ‘retainability’, is notably lacking.

Our Proposal: UnlearnCanvas Dataset


UnlearnCanvas is created for ease of MU evaluation in DMs. Its construction involves two main steps: seed image collection and subsequent image stylization.


Figure: An illustration of the key steps when curating UnlearnCanvas and its key features.


Advantages of UnlearnCanvas.
A1: Style-object dual supervision enables a rich unlearning target bank for comprehensive evaluation.
A2: High stylistic consistency ensures precise style definitions and enables accurate quantitative evaluations.
A3: Enabling in-depth retainability analyses for MU evaluation.


Key features of UnlearnCanvas:
High-Resolution
Dual-Supervised
Balanced Dataset
High stylistic consistency within each style.
High stylistic distinction across different styles.

An Automated, Complete, and Accurate Evaluation Pipeline with UnlearnCanvas



Figure: An illustration of the evaluation pipeline with UnlearnCanvas.


We introduce the evaluation pipeline built on UnlearnCanvas, together with the MU methods benchmarked with it. The pipeline comprises four phases (I-IV) to evaluate unlearning effectiveness, retainability, generation quality, and efficiency.
Phase I: Testbed preparation. We commence by fine-tuning a specific DM, SD v1.5, on UnlearnCanvas for text-to-image generation, and a ViT-Large for the style and object classification applied after unlearning.
Phase II: Machine unlearning. We utilize the selected MU methods for benchmarking to update the DM acquired in Phase I, aiming to eliminate a designated unlearning target.
Phase III: Answer set generation. We utilize the unlearned model to generate a set of images conditioned on both the unlearning-related prompts and other innocent prompts. For comprehensive evaluation, three types of answer sets are generated: for unlearning effectiveness, in-domain retainability, and cross-domain retainability evaluation.
Phase IV: Answer set evaluation. The answer sets undergo style/object classification for unlearning performance assessment. This classification results in three quantitative metrics: unlearning accuracy, in-domain retaining accuracy, and cross-domain retaining accuracy.
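The scoring step of the last phase can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: it assumes that unlearning accuracy is the fraction of target-prompted images that the classifier no longer assigns to the unlearning target, and that both retaining accuracies are plain classification accuracies on the innocent prompts. The function names and the style/object labels in the example are hypothetical placeholders.

```python
def unlearning_accuracy(predicted_labels, target):
    """Fraction of target-prompted images NOT classified as the target.

    Assumption of this sketch: a higher value means the target was erased
    more effectively.
    """
    if not predicted_labels:
        raise ValueError("the unlearning-effectiveness answer set is empty")
    erased = sum(1 for label in predicted_labels if label != target)
    return erased / len(predicted_labels)


def retaining_accuracy(pairs):
    """Fraction of innocent-prompt images classified as the prompted concept.

    `pairs` holds (prompted_concept, predicted_concept) tuples. The same
    function serves in-domain (IRA) and cross-domain (CRA) answer sets.
    """
    if not pairs:
        raise ValueError("the retainability answer set is empty")
    correct = sum(1 for prompted, predicted in pairs if prompted == predicted)
    return correct / len(pairs)


# Hypothetical classifier outputs after unlearning the style "Sketch".
forget_predictions = ["Sketch", "Cartoon", "Cartoon", "Mosaic"]
in_domain_pairs = [("Cartoon", "Cartoon"), ("Mosaic", "Cartoon")]
cross_domain_pairs = [("Dogs", "Dogs"), ("Cats", "Cats"),
                      ("Birds", "Cats"), ("Towers", "Towers")]

ua = unlearning_accuracy(forget_predictions, "Sketch")
ira = retaining_accuracy(in_domain_pairs)
cra = retaining_accuracy(cross_domain_pairs)
print(f"UA={ua:.2f} IRA={ira:.2f} CRA={cra:.2f}")
```

Keeping the three answer sets as separate inputs mirrors the pipeline's split between effectiveness, in-domain retainability, and cross-domain retainability.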

Experiment Results with UnlearnCanvas


Overall Performance Assessment


Table: Performance overview of different MU methods evaluated with the UnlearnCanvas dataset. The performance assessment includes unlearning accuracy (UA), in-domain retain accuracy (IRA), cross-domain retain accuracy (CRA), and FID. Symbols ↑ or ↓ denote whether larger or smaller values represent better performance. Results are averaged over all the style and object unlearning cases. The best performance regarding each metric is highlighted in bold.



(1) Retainability is essential for comprehensive assessment of MU in DMs.
(2) CRA (cross-domain retaining accuracy) is harder to retain than IRA (in-domain retaining accuracy).


Figure: Performance visualization of different methods. For UA, IRA, and CRA, the results are averaged over all the style and object unlearning scenarios. The other metrics are inverted, since smaller values represent better performance for them. Results are normalized to 0% ∼ 100% per metric.
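The per-metric normalization described in this caption can be sketched as below. The page does not state the exact formula, so min-max scaling is an assumption of this sketch, as are the function name, the method names, and the numbers, which are made up for illustration only.

```python
def normalize_metric(values, higher_is_better=True):
    """Map one metric's raw values to 0..100 across methods.

    Assumption: min-max scaling per metric. Metrics where smaller is better
    (for example FID) are inverted so that 100 is always the best score.
    """
    lowest, highest = min(values.values()), max(values.values())
    if highest == lowest:
        # All methods tie on this metric; give every one the full score.
        return {method: 100.0 for method in values}
    normalized = {}
    for method, value in values.items():
        score = (value - lowest) / (highest - lowest)
        if not higher_is_better:
            score = 1.0 - score
        normalized[method] = 100.0 * score
    return normalized


# Made-up FID values for three hypothetical methods (lower is better).
fid = {"method_a": 50.0, "method_b": 100.0, "method_c": 75.0}
fid_scores = normalize_metric(fid, higher_is_better=False)
print(fid_scores)
```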


(3) One MU method can perform differently in different domains.
(4) No single method can excel in all aspects.



A Closer Look into Retainability: A Case Study on ESD.


Figure: Left: Heatmap visualization of the unlearning and retain accuracy of ESD on UnlearnCanvas. The x-axis shows the tested concepts for image generation using the unlearned model, while the y-axis indicates the unlearning target. Concept types are distinguished by color: styles in blue and objects in orange. The figure is separated into different regions to represent corresponding evaluation metrics and the unlearning scopes (A for UA, B for IRA, C for CRA; ‘1’ for style unlearning, ‘2’ for object unlearning). Diagonal regions (A1 and A2) indicate unlearning accuracy, and off-diagonal values (B1, B2, C1, and C2) represent retain accuracy. Higher values in lighter color denote better performance. The first row serves as a reference for comparison before unlearning. Zooming into the figure is recommended for detailed observation. Right: Representative cases illustrating each region with images generated before and after unlearning a specific concept.
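The region layout of the heatmap can be made concrete with a short sketch that assigns each (unlearning target, tested concept) cell to one of the six regions named in the caption and averages the values per region. The function names, concept labels, and accuracy values below are hypothetical placeholders, not data from the benchmark.

```python
from collections import defaultdict


def heatmap_region(target, tested, styles, objects):
    """Return the region (A1, B1, C1, A2, B2, C2) of one heatmap cell."""
    if target not in styles and target not in objects:
        raise ValueError(f"unknown unlearning target: {target}")
    if tested not in styles and tested not in objects:
        raise ValueError(f"unknown tested concept: {tested}")
    scope = "1" if target in styles else "2"  # 1: style, 2: object unlearning
    if tested == target:
        return "A" + scope  # diagonal cell: unlearning accuracy
    same_domain = (target in styles) == (tested in styles)
    return ("B" if same_domain else "C") + scope  # in- vs cross-domain


def region_means(cells, styles, objects):
    """Average the cell values that fall into each region."""
    grouped = defaultdict(list)
    for (target, tested), value in cells.items():
        grouped[heatmap_region(target, tested, styles, objects)].append(value)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


# Placeholder concepts and made-up accuracies for one style-unlearning row.
styles = {"Sketch", "Cartoon"}
objects = {"Dogs", "Cats"}
cells = {
    ("Sketch", "Sketch"): 0.9,
    ("Sketch", "Cartoon"): 0.8,
    ("Sketch", "Dogs"): 0.6,
    ("Sketch", "Cats"): 0.4,
}
means = region_means(cells, styles, objects)
print(means)
```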


(1) An MU method might demonstrate a preference within a specific domain but face challenges in others.
(2) Style/object unlearning is easier than retaining the generation performance of unlearned DMs conditioned on unlearning-unrelated prompts.



Figure: Heatmap visualization of SalUn.

(3) In comparison to ESD, SalUn demonstrates more consistent performance across different unlearning scenarios. However, SalUn does not achieve the same level of UA as ESD, as evidenced by the lighter diagonal values.



Understanding Unlearning Methods’ Behavior via Unlearning Directions.


Figure: Visualization of the unlearning directions of (a) ESD and (b) SalUn. This figure illustrates the conceptual shift of the generated images of an unlearned model conditioned on the unlearning target. Images generated by the post-unlearning models are classified and used to understand this shift. Edges leading from the object in the left column to the right signify that images generated conditioned on unlearning targets are instead classified as the shifted concepts after unlearning. This reveals the primary unlearning direction for each unlearning method. The most dominant unlearning direction for an object is visualized. Figure (c) provides visualizations of generated images using the prompt template ‘A painting of {object} in Sketch style.’, with {object} being each unlearning target.


Different unlearning methods display distinct unlearning behaviors. These unlearning directions are determined by connecting the unlearning target with the predicted label of the generated image from the unlearned DM conditioned on the unlearning target.
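A minimal sketch of how such a direction could be extracted from classifier outputs is given below. It assumes that predictions equal to the unlearning target count as failed unlearning rather than a shift and are therefore excluded; the function name and the labels are hypothetical placeholders.

```python
from collections import Counter


def dominant_directions(predictions):
    """Find the most frequent shifted label for each unlearning target.

    `predictions` maps an unlearning target to the labels predicted for
    images generated by the unlearned DM conditioned on that target.
    Returns {target: (shifted_label, share_of_all_images)} or None when no
    image was classified as anything other than the target.
    """
    directions = {}
    for target, labels in predictions.items():
        # Assumption: a prediction equal to the target is not a shift.
        shifted = Counter(label for label in labels if label != target)
        if shifted:
            label, count = shifted.most_common(1)[0]
            directions[target] = (label, count / len(labels))
        else:
            directions[target] = None
    return directions


# Placeholder predictions for two hypothetical object-unlearning runs.
predictions = {
    "Dogs": ["Cats", "Cats", "Dogs", "Horses"],
    "Trees": ["Trees", "Trees"],
}
directions = dominant_directions(predictions)
print(directions)
```

Each non-empty entry corresponds to one edge from an unlearning target to its shifted concept in the visualization.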

Paper


Y. Zhang, Y. Zhang, Y. Yao, J. Jia, J. Liu, X. Liu, S. Liu.
UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models
(hosted on arXiv)

BibTeX


@article{zhang2024unlearncanvas,
  title={UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models},
  author={Zhang, Yihua and Zhang, Yimeng and Yao, Yuguang and Jia, Jinghan and Liu, Jiancheng and Liu, Xiaoming and Liu, Sijia},
  journal={arXiv preprint arXiv:2402.11846},
  year={2024}
}