ShapeShift: Text-to-Mosaic Synthesis via Semantic Phase-Field Guidance

Vihaan Misra, Peter Schaldenbrand, Jean Oh

arXiv:2503.14720 · cs.CV · Published 2025-03-18 · Updated 2026-02-23

We present ShapeShift, a method for arranging rigid objects into configurations that visually convey semantic concepts specified in natural language. Pretrained diffusion models provide powerful semantic guidance, for example through Score Distillation Sampling, but enforcing physical validity poses a fundamental challenge. Naive overlap resolution disrupts semantic structure: separating overlapping shapes along geometrically optimal directions (minimum translation vectors) often destroys the very arrangements that make concepts recognizable. Our intuition is that diffusion model features encode not just what a concept looks like but also its geometric, directional structure (how it is oriented and shaped), which we leverage to make overlap resolution semantically aware. We introduce a deformable boundary, represented as a phase field, that expands anisotropically under the guidance of intermediate features from the diffusion model, creating space along semantically coherent directions. Experiments demonstrate that, by coupling semantic guidance with feasibility constraint resolution, ShapeShift produces arrangements that achieve both semantic clarity and overlap-free validity, significantly outperforming baselines that treat these objectives independently.
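To make the term "minimum translation vector" concrete, the sketch below computes it for two overlapping circles and contrasts it with the smallest separation along a fixed preferred direction. This is only an illustration of the naive baseline the abstract criticizes, not the ShapeShift method: the circle geometry, the function names `circle_mtv` and `directed_separation`, and the hand-picked direction (a stand-in for a semantically coherent direction) are all assumptions made for this example.

```python
import math

def circle_mtv(c1, r1, c2, r2):
    """Shortest translation of circle 2 that removes its overlap with circle 1.

    Returns (0.0, 0.0) if the circles do not overlap.
    """
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    dist = math.hypot(dx, dy)
    depth = r1 + r2 - dist            # penetration depth
    if depth <= 0:
        return (0.0, 0.0)
    if dist == 0:
        return (depth, 0.0)           # coincident centers: direction is arbitrary
    return (dx / dist * depth, dy / dist * depth)

def directed_separation(c1, r1, c2, r2, direction):
    """Smallest translation of circle 2 along `direction` that removes overlap.

    `direction` is a hypothetical preferred direction; it is normalized here.
    """
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    vx, vy = c2[0] - c1[0], c2[1] - c1[1]
    reach = r1 + r2
    if vx * vx + vy * vy >= reach * reach:
        return (0.0, 0.0)             # already separated
    # Solve |v + t*u|^2 = reach^2 for the positive root t.
    b = vx * ux + vy * uy
    t = -b + math.sqrt(b * b - (vx * vx + vy * vy) + reach * reach)
    return (t * ux, t * uy)

if __name__ == "__main__":
    # Two unit circles whose centers are 1 apart along the x-axis.
    print(circle_mtv((0, 0), 1, (1, 0), 1))
    print(directed_separation((0, 0), 1, (1, 0), 1, (0, 1)))
```

The minimum translation vector always pushes along the line between the two centers, whatever the layout is meant to depict. The directed variant moves the shape farther but keeps the motion along a chosen axis, which is the kind of trade-off the abstract describes when it speaks of creating space along semantically coherent directions.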

Topics: Generative Models & Discovery

Tags: diffusion-models

arXiv categories: cs.CV
