dropbox.tech | ksl
Dropbox’s ML team published a detailed technical walkthrough of how they deploy quantized models across Dash, their AI-powered assistant handling search, document understanding, and speech processing. The piece covers the full landscape, from symmetric and asymmetric linear quantization to the newer MXFP and NVFP4 formats, which let Tensor Cores operate directly on packed low-bit data. What stands out is the honesty about gaps: FP4 framework support is still patchy, pre-quantized models are scarce, and portability across GPU architectures remains painful. More infrastructure teams are quietly publishing production-focused quantization guides like this one, which suggests that the real bottleneck in AI deployment has shifted away from model quality and toward serving economics.
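For readers new to the terms, the schemes the post names can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Dropbox's code: the function names are hypothetical, everything runs on Python lists, and the FP4 part is simplified. It keeps each block's scale as an ordinary float. In the real formats, MXFP4 shares a power-of-two scale across 32-element blocks, and NVFP4 stores an FP8 (E4M3) scale per 16-element block. The packing of two 4-bit codes per byte, which is what Tensor Cores actually consume, is omitted.

```python
"""Sketch only: symmetric vs asymmetric linear quantization, plus a simplified
block-scaled FP4 (E2M1) fake-quantizer in the spirit of MXFP4/NVFP4.
All function names are hypothetical; real formats store the block scale in a
low-bit type and pack 4-bit codes, which this sketch does not do."""

from typing import List, Tuple

# Non-negative magnitudes representable by FP4 E2M1 (sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_symmetric(xs: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Signed ints in [-qmax, qmax]; 0.0 always maps to 0, no zero point needed."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax > 0 else 1.0  # avoid dividing by zero on all-zero input
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale


def quantize_asymmetric(xs: List[float], bits: int = 8) -> Tuple[List[int], float, int]:
    """Unsigned ints in [0, 2^bits - 1] using a scale plus an integer zero point."""
    qmin, qmax = 0, 2 ** bits - 1
    # Widen the range to include 0.0 so that zero stays exactly representable.
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int = 0) -> List[float]:
    """Map integer codes back to floats (zero_point=0 gives the symmetric case)."""
    return [(v - zero_point) * scale for v in q]


def quantize_fp4_blocks(xs: List[float], block_size: int = 16) -> List[float]:
    """Round each value to the E2M1 grid using one scale per block.

    Returns the dequantized values. Ties go to the smaller magnitude here,
    which is a simplification of real round-to-nearest-even hardware behavior.
    """
    out: List[float] = []
    for start in range(0, len(xs), block_size):
        block = xs[start:start + block_size]
        amax = max(abs(x) for x in block)
        scale = amax / 6.0 if amax > 0 else 1.0  # 6.0 is the largest E2M1 magnitude
        for x in block:
            mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
            out.append((mag if x >= 0 else -mag) * scale)
    return out


if __name__ == "__main__":
    weights = [-1.2, -0.3, 0.0, 0.45, 0.9, 2.4]
    q, s = quantize_symmetric(weights)
    print("symmetric:", q, dequantize(q, s))
    q, s, zp = quantize_asymmetric(weights)
    print("asymmetric:", q, "zero_point =", zp, dequantize(q, s, zp))
    print("fp4 blocks:", quantize_fp4_blocks(weights))
```

Symmetric mode spends no bits on an offset, which suits roughly zero-centered weights. Asymmetric mode adds a zero point so that a skewed range, such as post-ReLU activations, can use the full integer range. The block-scaled variant shows why small blocks matter: a single outlier inflates the scale only for its own block, not for the whole tensor.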
