AWS and TGS Accelerate Seismic AI Training 36x
- TGS and AWS optimize seismic foundation models, slashing training time from six months to five days
- Context parallelism techniques expand the model’s analytical field of view by 4.5x for larger 3D volumes
- Amazon SageMaker HyperPod enables near-linear scaling across 128 NVIDIA H200 GPUs using DeepSpeed ZeRO-2
Geoscience data leader TGS has achieved a major breakthrough in subsurface analysis by modernizing its AI training infrastructure on AWS. By leveraging Amazon SageMaker HyperPod, the team optimized its Vision Transformer (ViT) architecture, which analyzes complex 3D seismic data to locate energy resources. This infrastructure overhaul cut a grueling six-month training cycle to just five days, a 36x reduction in training time that enables weekly model iterations instead of twice-yearly updates.
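To make the tokenization concrete, here is a minimal sketch, assuming PyTorch, of how a ViT-style model can turn a 3D seismic volume into a token sequence. The patch size, embedding width, and class name are illustrative assumptions, not TGS's actual architecture.

```python
# A ViT-style 3D patch embedding: a Conv3d with stride equal to the patch
# size splits the volume into non-overlapping 3D patches and projects each
# patch to one embedding vector (one token).
import torch
import torch.nn as nn

class SeismicPatchEmbed3D(nn.Module):
    def __init__(self, patch=16, dim=768):
        super().__init__()
        self.proj = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)

    def forward(self, volume):               # (batch, 1, D, H, W)
        x = self.proj(volume)                # (batch, dim, D/p, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (batch, num_tokens, dim)

# A 256^3 volume with 16^3 patches yields 16^3 = 4096 tokens; the token
# count grows cubically with volume size, which is why long-context
# techniques matter for basin-scale inputs.
tokens = SeismicPatchEmbed3D()(torch.randn(1, 1, 256, 256, 256))
print(tokens.shape)  # torch.Size([1, 4096, 768])
```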
The collaboration centered on overcoming the data bottleneck inherent in massive 3D volumes. Instead of using traditional file systems, the team implemented a direct streaming pipeline from Amazon S3, achieving aggregate throughput of up to 80 GBps. To manage memory at this scale, they used DeepSpeed ZeRO-2 to partition optimizer states and gradients across 128 NVIDIA H200 GPUs, maintaining near-linear scaling efficiency while minimizing per-GPU memory overhead. Both techniques are sketched below.
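First, a minimal sketch of the direct-streaming idea, assuming boto3 and PyTorch: tiles are fetched straight from S3 inside an IterableDataset, so no intermediate POSIX file system sits between object storage and the GPUs. The bucket name, key list, and .npy tile format are hypothetical placeholders, not TGS's pipeline.

```python
import io

import boto3
import numpy as np
import torch
from torch.utils.data import DataLoader, IterableDataset

class S3SeismicStream(IterableDataset):
    """Streams serialized 3D seismic tiles directly from an S3 bucket."""

    def __init__(self, bucket, keys):
        self.bucket, self.keys = bucket, keys

    def __iter__(self):
        # Shard the key list across DataLoader workers so each tile is
        # fetched exactly once and parallel reads add up in aggregate.
        info = torch.utils.data.get_worker_info()
        keys = self.keys if info is None else self.keys[info.id::info.num_workers]
        s3 = boto3.client("s3")  # one client per worker process
        for key in keys:
            body = s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()
            yield torch.from_numpy(np.load(io.BytesIO(body)))  # .npy tiles

# Many concurrent workers (and, in practice, many nodes) reading in
# parallel is what pushes aggregate throughput toward the figure cited.
loader = DataLoader(S3SeismicStream("seismic-tiles", ["vol_000.npy"]),
                    num_workers=4, batch_size=None)
```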
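And a minimal sketch of a ZeRO Stage 2 setup of the kind described above, assuming the deepspeed Python package: optimizer states and gradients are partitioned across the data-parallel ranks so each GPU holds only a shard. The model stand-in, batch sizes, and precision choice are assumptions, not TGS's configuration.

```python
import deepspeed
import torch.nn as nn

model = nn.Transformer()  # stand-in for the ViT described in the article

ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # shard optimizer states + gradients
        "overlap_comm": True,          # overlap reduction with backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wires up the ZeRO partitioning; launched with the
# deepspeed CLI, it spans all ranks (e.g. 128 GPUs across 16 nodes).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```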
Perhaps the most impressive technical feat is the 4.5x expansion of the model’s context window. Using Ring Attention (a method in which GPUs pass key/value blocks around a circular chain), the model can now process over 1.1 million tokens. This expanded field of view allows the AI to perceive both fine-scale fractures and basin-wide geological patterns simultaneously, providing unprecedented clarity for energy exploration.
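The mechanics are easier to see in a minimal single-process sketch, assuming PyTorch: the query block stays put while key/value blocks make one hop per step, and an online softmax folds each incoming block into running statistics so the full attention matrix is never materialized. On a real cluster each block would live on its own GPU and rotate via point-to-point communication; everything below is illustrative, not TGS's implementation.

```python
import torch

def ring_attention(q, k, v, num_chunks):
    """softmax(q k^T / sqrt(d)) v computed by streaming k/v in blocks,
    folding each block into running max/sum statistics (online softmax)."""
    scale = q.shape[-1] ** -0.5
    acc = torch.zeros_like(q)                             # weighted value sum
    row_max = torch.full((q.shape[0], 1), float("-inf"))  # running row max
    row_sum = torch.zeros(q.shape[0], 1)                  # running denominator

    # Each loop iteration stands in for one hop of k/v blocks around the ring.
    for k_blk, v_blk in zip(k.chunk(num_chunks), v.chunk(num_chunks)):
        scores = (q @ k_blk.T) * scale
        new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)  # rescale old statistics
        p = torch.exp(scores - new_max)
        acc = acc * correction + p @ v_blk
        row_sum = row_sum * correction + p.sum(-1, keepdim=True)
        row_max = new_max
    return acc / row_sum

# Sanity check against dense attention on random data.
torch.manual_seed(0)
q, k, v = (torch.randn(64, 32) for _ in range(3))
dense = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v, num_chunks=8), dense, atol=1e-5)
```

Splitting the sequence this way keeps per-GPU memory roughly flat while the total context grows with the number of devices in the ring, which is what makes million-token inputs tractable.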