Architecture students bring new forms of human-machine interaction into the kitchen.
- MIT researchers unveil Kitchen Cosmo, a physical Large Language Object for AI-assisted cooking.
- Interactive prototype uses Vision Language Models to identify ingredients and generate customized recipes.
- Tactile design integrates physical dials and thermal printers to move AI beyond digital screens.
MIT Architecture students have bridged the gap between digital intelligence and the physical world with Kitchen Cosmo, a tactile interactive cooking assistant. The device belongs to a new category called Large Language Objects (LLOs): physical interfaces that ground the capabilities of a Large Language Model in real-world environments. Rather than relying on a screen, Kitchen Cosmo uses a hinged webcam to scan ingredients and interprets them through a Vision Language Model (VLM), a type of AI that processes text and visual information simultaneously.
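The project's source code has not been published, but the scan-and-interpret step can be sketched in a few lines of Python. The sketch below captures a frame from a webcam and asks a hosted VLM to list the visible ingredients; the OpenAI client, the gpt-4o model name, and the prompt wording are illustrative assumptions, not the project's actual stack.

```python
import base64

import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def capture_frame(device_index: int = 0) -> bytes:
    """Grab one frame from the webcam and encode it as JPEG bytes."""
    cam = cv2.VideoCapture(device_index)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("Could not read from webcam")
    ok, jpeg = cv2.imencode(".jpg", frame)
    return jpeg.tobytes()


def identify_ingredients(image_bytes: bytes) -> str:
    """Ask a vision language model to list the ingredients in the frame."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the project's actual model is not named
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the food ingredients visible in this photo, one per line."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(identify_ingredients(capture_frame()))
```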
The project, led by students Jacob Payne and Ayah Mahmoud, draws inspiration from the 1969 Honeywell 316 Kitchen Computer but adds modern generative capabilities. Users interact with the device through physical dials that set parameters such as dietary restrictions and "mood," a tactile approach intended to make AI a partner rather than an invisible tool. The model was fine-tuned to grasp culinary concepts such as regional spice profiles and cooking temperatures, which often trip up standard text-based models lacking "physical" common sense.
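How the dial readings reach the model is not detailed, but one plausible design quantizes each dial's position into a labeled setting and folds those settings into the recipe prompt. In this hypothetical sketch, dial positions arrive as normalized 0.0–1.0 readings (for example from an ADC wired to potentiometers); the option lists and prompt template are invented for illustration.

```python
# Hypothetical mapping from dial positions to prompt parameters.
DIETS = ["none", "vegetarian", "vegan", "gluten-free", "halal"]
MOODS = ["comforting", "adventurous", "light", "festive"]


def dial_to_choice(position: float, options: list[str]) -> str:
    """Quantize a continuous 0.0-1.0 dial reading into one labeled detent."""
    index = min(int(position * len(options)), len(options) - 1)
    return options[index]


def build_recipe_prompt(ingredients: list[str],
                        diet_dial: float,
                        mood_dial: float) -> str:
    """Combine scanned ingredients and dial settings into a single prompt."""
    diet = dial_to_choice(diet_dial, DIETS)
    mood = dial_to_choice(mood_dial, MOODS)
    return (
        f"Create a {mood} recipe using only: {', '.join(ingredients)}. "
        f"Dietary restriction: {diet}. "
        "Include regional spice suggestions and exact cooking temperatures."
    )


# Example: dials at 0.45 ("vegan") and 0.8 ("festive").
print(build_recipe_prompt(["tomatoes", "chickpeas", "basil"], 0.45, 0.8))
```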
By integrating a thermal printer, the device offers a tangible output that keeps the user focused on the kitchen counter. The creators are now exploring "learning modes" where the AI could demonstrate how to use specific kitchen tools. This shift toward physically situated AI highlights a growing trend where intelligence is no longer just a digital service, but a physical presence capable of contextual understanding and real-time interaction.
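The printing step itself is straightforward with an off-the-shelf ESC/POS library. The sketch below uses python-escpos; the USB vendor and product IDs are placeholders, since the project's actual printer hardware is not documented.

```python
from escpos.printer import Usb  # pip install python-escpos


def print_recipe(recipe_text: str) -> None:
    """Send the generated recipe to an ESC/POS thermal printer over USB."""
    # Placeholder vendor/product IDs for a generic ESC/POS printer;
    # substitute the IDs reported by your own device (e.g. via lsusb).
    printer = Usb(0x04B8, 0x0202)
    printer.text(recipe_text + "\n")
    printer.cut()
```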