Apple ($AAPL) researchers have released a new model that lets users describe in plain language what they want to change in a photo, without ever touching photo editing software, per The Verge.
The model, called MGIE, was developed in collaboration with the University of California, Santa Barbara, and can perform a range of image editing tasks using only text prompts.
MGIE, short for MLLM-Guided Image Editing, handles both simple and more nuanced edits: common operations such as resizing, flipping, and applying filters, as well as finer-grained changes like reshaping a specific object in a photo or making it brighter. The model combines two capabilities of multimodal large language models: understanding the user's prompt and "imagining" what the desired edit should look like. Asking for a bluer sky in a photograph, for instance, is interpreted as increasing the brightness of the sky region of the image.
To edit an image with MGIE, users simply describe the change they want in plain text. Asking the model to "make it more healthy" while editing a photo of a pepperoni pizza, for example, adds vegetable toppings. Likewise, the instruction "add more contrast to simulate more light" brightens a dark photo of tigers in the Sahara.
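To make the two-stage design concrete, here is a minimal sketch of the flow in Python. Everything in it is an illustrative assumption — the function names, the EditRequest type, and the hard-coded rewrites are stand-ins, not Apple's released API. MGIE's actual components are a multimodal LLM that interprets the prompt and a learned image editor that renders the change.

```python
# Minimal sketch of the MLLM-guided editing flow described above.
# Both stages are stand-ins: the names and signatures here are
# illustrative assumptions, not MGIE's real interface.

from dataclasses import dataclass


@dataclass
class EditRequest:
    image_path: str
    user_prompt: str  # terse instruction, e.g. "make it more healthy"


def derive_expressive_instruction(request: EditRequest) -> str:
    """Stage 1 (stand-in): the MLLM rewrites the terse prompt into an
    explicit, visually grounded instruction conditioned on the image."""
    # A real MLLM would inspect the image; this stub only mirrors the
    # examples mentioned in the article.
    rewrites = {
        "make it more healthy": "add vegetable toppings to the pizza",
        "make the sky more blue": "increase the brightness of the sky region",
    }
    return rewrites.get(request.user_prompt, request.user_prompt)


def apply_edit(image_path: str, instruction: str) -> str:
    """Stage 2 (stand-in): a diffusion-based editor would render the
    change; here we only report what would happen."""
    out_path = image_path.replace(".jpg", "_edited.jpg")
    print(f"Editing {image_path}: {instruction} -> {out_path}")
    return out_path


if __name__ == "__main__":
    req = EditRequest("pizza.jpg", "make it more healthy")
    explicit = derive_expressive_instruction(req)
    apply_edit(req.image_path, explicit)
```

In the real system, the first stage is learned jointly with the editor rather than being a lookup table; the stub above only mirrors the pizza and sky examples from the article.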
The researchers behind MGIE highlighted its capability to derive explicit, visually informed intentions from user input, thereby facilitating effective image editing. They conducted comprehensive evaluations across various editing scenarios, demonstrating that MGIE significantly enhances performance while maintaining competitive efficiency. Additionally, they expressed confidence in the potential of the MLLM-guided framework to advance future research in vision and language.
Although Apple has made MGIE available for download via GitHub and provided a web demo on Hugging Face Spaces, its long-term plans for the model remain undisclosed.
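Since the demo runs on Hugging Face Spaces, it can likely be driven programmatically with the gradio_client package that Gradio-hosted Spaces expose. The Space ID, argument order, and endpoint name below are placeholders, not the real demo's API — the actual values should be read off the demo's own API page:

```python
# Sketch: querying a Gradio-hosted demo from Python via gradio_client.
# The Space ID ("your-org/mgie-demo"), the argument order, and the
# api_name are placeholders, not the real MGIE demo's interface.
from gradio_client import Client, handle_file

client = Client("your-org/mgie-demo")  # placeholder Space ID
result = client.predict(
    handle_file("pizza.jpg"),          # input image (assumed parameter)
    "make it more healthy",            # text instruction (assumed parameter)
    api_name="/predict",               # assumed endpoint name
)
print(result)                          # path(s) to the edited output
```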
While platforms like OpenAI's DALL-E 3 and Adobe's Firefly offer similar capabilities, Apple has not historically been a prominent player in generative AI, unlike Microsoft, Meta, or Google. Apple CEO Tim Cook has, however, indicated that the company intends to bring more AI features to its devices in the coming year. In December, Apple researchers also introduced MLX, an open-source machine learning framework designed to streamline AI model training on Apple Silicon chips.