This paper shows that high-resolution image diffusion can be trained directly in pixel space with a simple recipe: adapt the noise schedule, scale the right parts of the architecture, place dropout carefully, and downsample high-resolution feature maps.