oneDNN

Introduction

oneDNN is Intel's open-source, cross-platform performance library for deep learning; its documentation lists the primitives it supports. oneDNN has been integrated into DeepRec and is enabled through build options: adding --config=mkl_threadpool to the compile command turns on oneDNN-accelerated operator computation. Adding --config=opt additionally enables --copt=-march=native, which further accelerates computation on CPUs that support AVX-512, such as Skylake, Cascade Lake, and Ice Lake.
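As a concrete illustration, a DeepRec build command with oneDNN enabled might look like the sketch below; the pip-package target shown is the standard DeepRec/TensorFlow build target, so adjust it to your own workflow:

```bash
# Build DeepRec with oneDNN in threadpool mode plus native CPU optimizations.
# Note: --config=opt implies --copt=-march=native, so build on a machine of
# the same CPU generation (e.g. AVX-512 capable) as the deployment target.
bazel build --config=opt --config=mkl_threadpool \
    //tensorflow/tools/pip_package:build_pip_package
```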

Tips: the library was originally released as MKL-DNN, renamed DNNL, and finally renamed oneDNN. TensorFlow initially used MKL-DNN to accelerate operator computation; over subsequent releases oneDNN gradually took its place, but the MKL-prefixed macro definitions were retained.

Macro definitions for oneDNN in DeepRec (a runtime usage example follows the table):

| Macro Definition | Values (bold = default) | Explanation |
| --- | --- | --- |
| TF_MKL_PRIMITIVE_ONLY_FOR_RECO | **1/true**, 0/false | 1: replace only the operators used in recommendation models with their oneDNN counterparts; 0: replace all operators that oneDNN supports. |
| TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE | **1/true**, 0/false | 1: reduce main-memory use by releasing cached primitives; 0: do not release primitives. |
| TF_DISABLE_MKL | **0**, 1 | 0: enable oneDNN (MKL); 1: disable oneDNN (MKL). |
| TF_MKL_NUM_INTRAOP | Integer, e.g. 14; not set by default | Integer: the number of intra-op threads used by oneDNN. Not set: oneDNN uses at most the number of TensorFlow intra-op threads. |
| ONEDNN_VERBOSE | **0**, 1, 2 | Verbosity of oneDNN primitive logging: 0 = no output, 1 = log primitive execution, 2 = additionally log primitive creation. |
| DNNL_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest instruction set (ISA) oneDNN may use (for oneDNN versions earlier than 2.5.0). |
| ONEDNN_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest instruction set (ISA) oneDNN may use (for oneDNN 2.5.0 and later). |
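The settings above are read from environment variables at process startup, so they can be exported before launching a job. A minimal sketch, assuming a hypothetical training script train.py:

```bash
# Only swap in oneDNN versions of recommendation-model operators,
# release cached primitives to save memory, and cap oneDNN's intra-op threads.
export TF_MKL_PRIMITIVE_ONLY_FOR_RECO=1
export TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE=1
export TF_MKL_NUM_INTRAOP=14
# Limit oneDNN (>= 2.5.0) to AVX512_CORE_BF16 and lower instruction sets.
export ONEDNN_MAX_CPU_ISA=AVX512_CORE_BF16
python train.py
```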

Primitives supported by oneDNN (in the post-ops column, Scale and Zero denote the output-scale and zero-point attributes):

| Primitive | Available Data Types | Available Post-Ops |
| --- | --- | --- |
| Matrix Multiplication | f32, bf16, f16, u8, s8 | Scale, Zero, Eltwise, Sum, Binary |
| Inner Product | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Layer Normalization | f32, bf16, f16 | / |
| Batch Normalization | f32, bf16, f16, s8 | Eltwise |
| Local Response Normalization (LRN) | f32, bf16, f16 | / |
| Binary (+, -, *, /, >, <, min, max, …) | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Eltwise (relu, gelu, tanh, linear, …) | f32, s32, bf16, f16, u8, s8 | Binary |
| PReLU | f32, s32, bf16, s8, u8 | / |
| Sum | f32, s32, bf16, f16, u8, s8 | / |
| Reduction | f32, bf16, u8, s8 | Eltwise, Sum, Binary |
| Softmax | f32, bf16, f16 | / |
| LogSoftmax | f32, bf16 | / |
| Reorder | f32, s32, bf16, f16, u8, s8 | Scale, Sum |
| Concat | f32, s32, bf16, f16, u8, s8 | / |
| Convolution | f32, bf16, f16, u8, s8 | Scale, Zero, Eltwise, Sum, Binary |
| Pooling | f32, s32, bf16, f16, u8, s8 | Binary |
| RNN (LSTM, GRU, Vanilla RNN, …) | f32, bf16, f16, u8, s8 | / |
| Resampling | f32, s32, bf16, f16, s8, u8 | Eltwise, Sum, Binary |
| Shuffle | f32, s32, bf16, s8, u8 | / |