Meta Unveils SAM 3.1 for Real-Time Video Segmentation
- Meta releases SAM 3.1 with object multiplexing for 2x faster video tracking
- New model processes 16 objects simultaneously, achieving 32 FPS on H100 GPUs
- Integration expands to Instagram video effects and Facebook Marketplace's View in Room
Meta’s AI research team has introduced SAM 3.1, a significant update to its Segment Anything Model focused on high-speed video processing and "promptable concept segmentation." By shifting from per-object processing to a "global reasoning" approach, the model can now track up to 16 distinct objects in a single forward pass.
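To make the shift concrete, here is a minimal Python sketch contrasting the old per-object pattern with a single batched pass. The decoder is a random-mask stub and the query format is an assumption; Meta has not published SAM 3.1's interface.

```python
import numpy as np

# Illustrative contrast between per-object decoding and a batched
# "object multiplexing" pass. decode_masks is a stand-in stub, not
# SAM 3.1's real architecture.

MAX_OBJECTS = 16  # per-pass object budget the article attributes to SAM 3.1

def decode_masks(frame: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Stub decoder: returns one boolean mask per query in a single call."""
    h, w, _ = frame.shape
    return np.random.rand(len(queries), h, w) > 0.5

def segment_per_object(frame, queries):
    # Old pattern: one decoder invocation per object (N passes).
    return np.stack([decode_masks(frame, q[None])[0] for q in queries])

def segment_multiplexed(frame, queries):
    # New pattern: all objects share one forward pass (1 pass, N <= 16).
    assert len(queries) <= MAX_OBJECTS
    return decode_masks(frame, queries)

frame = np.zeros((480, 640, 3), dtype=np.uint8)
queries = np.random.rand(16, 256)  # 16 object query embeddings (assumed shape)
masks = segment_multiplexed(frame, queries)
print(masks.shape)  # (16, 480, 640): one mask per tracked object
```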
This technical leap, dubbed object multiplexing, doubles throughput from 16 to 32 frames per second on high-end hardware such as the H100 GPU. The efficiency gain lets the model handle complex, crowded scenes in real time while lowering the barrier to high-performance applications on more accessible hardware. Beyond raw speed, SAM 3.1 lets users define objects with natural-language phrases like "the striped red umbrella" rather than choosing from a fixed list of categories.
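The natural-language prompting might look something like the sketch below. `ConceptSegmenter`, `find_all`, and the `Detection` fields are hypothetical placeholders rather than a published SAM 3.1 API; the point is that an open-ended noun phrase replaces a fixed category ID.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    mask_id: int   # which returned mask this instance corresponds to
    score: float   # how well the instance matches the phrase

class ConceptSegmenter:
    """Hypothetical stand-in for a text-promptable segmentation model."""

    def find_all(self, frame, phrase: str) -> list[Detection]:
        # A real model would embed the phrase, match it against image
        # features, and return every matching instance. Stubbed here.
        return [Detection(mask_id=0, score=0.93)]

segmenter = ConceptSegmenter()
# Unlike a fixed label set (e.g. "umbrella" = class 27), the phrase can
# carry attributes, and every matching instance comes back, not just one.
hits = segmenter.find_all(frame=None, phrase="the striped red umbrella")
print(hits)
```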
The update also powers new consumer features. Instagram’s "Edits" app will soon let creators apply visual effects to specific people or objects with a single tap. Meanwhile, Facebook Marketplace is using the model’s 3D companion release, SAM 3D, for a "View in Room" feature that helps shoppers visualize how furniture would fit in their own living spaces.
To build the massive dataset required for such precision, Meta developed a "data engine" that pairs human reviewers with AI annotators powered by Llama models. This hybrid system speeds up data labeling by up to five times, allowing the team to curate over four million unique visual concepts.
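A rough sketch of how such a human-in-the-loop engine can multiply annotator throughput: the AI labels everything, and people review only the proposals it is unsure about. The confidence gate, function names, and review policy here are illustrative assumptions, not Meta's documented pipeline.

```python
import random

CONFIDENCE_GATE = 0.6  # assumed threshold: proposals above it skip review

def ai_propose(item: str) -> tuple[str, float]:
    """Stand-in for a Llama-based annotator returning (label, confidence)."""
    return f"concept-for-{item}", random.uniform(0.5, 1.0)

def human_review(item: str, label: str) -> str:
    """Stand-in for a human reviewer confirming or correcting a label."""
    return label

def annotate(items: list[str]) -> dict[str, str]:
    labels, reviewed = {}, 0
    for item in items:
        label, confidence = ai_propose(item)
        if confidence < CONFIDENCE_GATE:       # only uncertain proposals
            label = human_review(item, label)  # ...reach a human
            reviewed += 1
        labels[item] = label
    # With these toy numbers humans touch roughly 1 in 5 items, the kind
    # of ratio behind the "up to five times" labeling speedup cited above.
    print(f"human-reviewed {reviewed}/{len(items)} items")
    return labels

annotate([f"img_{i}" for i in range(1000)])
```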