Versatile Diffusion (VD) is the first unified multi-flow multimodal diffusion framework. It natively supports image-to-text, image variation, text-to-image, and text variation, and can be extended to other applications such as semantic-style disentanglement, image-text dual-guided generation, and latent image-to-text-to-image editing. Future versions will support more modalities such as speech, music, video, and 3D.