This document defines x86 extensions for accelerating compute tasks, initially focusing on matrix multiplication kernels and the reduced-precision data formats that are important to ML workloads.
The ACE extensions define matrix multiplication primitives that augment AVX and scalar code with new capabilities, adding:
ACE tightly integrates AVX vectors with ACE tile registers, combining tile processing operations of high compute density with the comprehensive data processing features of AVX.
In addition to matrix acceleration, a number of dedicated format conversion operations are provided under the AVX10 framework.
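To make the two ingredients above concrete, the following scalar C sketch models a reduced-precision format conversion and a small matrix multiplication that accumulates in FP32. It is a reference model only and does not describe any ACE or AVX10 instruction. The choice of BF16 as the example format, the round-to-nearest-even policy, the simplified NaN handling, and the function names `fp32_to_bf16`, `bf16_to_fp32`, and `tile_matmul_bf16` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: FP32 -> BF16 with round-to-nearest-even.
 * BF16 keeps the upper 16 bits of the FP32 encoding.
 * NaN handling is simplified (assumption): NaNs are returned quieted. */
uint16_t fp32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* type-pun without UB */
    if ((bits & 0x7fffffffu) > 0x7f800000u)      /* input is a NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Hypothetical helper: BF16 -> FP32 is exact, the low 16 bits become zero. */
float bf16_to_fp32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical reference kernel: C[m][n] += A[m][k] * B[k][n].
 * Inputs are row-major BF16, the accumulator is row-major FP32. */
void tile_matmul_bf16(float *c, const uint16_t *a, const uint16_t *b,
                      int m, int n, int k)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                c[i * n + j] += bf16_to_fp32(a[i * k + p]) *
                                bf16_to_fp32(b[p * n + j]);
}
```

Accumulating in a wider format than the inputs is a common design choice for ML kernels, since it limits the error that builds up over long dot products; whether a given ACE operation behaves this way is determined by its definition in this document, not by the sketch.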