oneDNN

Introduction

oneDNN is Intel's open-source, cross-platform performance library for deep learning; its documentation lists the primitives it supports. oneDNN has been integrated into DeepRec and is enabled through build options: adding --config=mkl_threadpool to the compile command turns on oneDNN-accelerated operator computation. Adding --config=opt additionally enables --copt=-march=native, which further accelerates computation on CPUs that support AVX-512, such as Skylake, Cascade Lake, and Ice Lake.
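As a concrete illustration, a DeepRec build command with oneDNN enabled might look like the sketch below; the pip-package target shown is the standard DeepRec/TensorFlow build target, so adjust it to your own workflow:

```bash
# Build DeepRec with oneDNN in threadpool mode plus native CPU optimizations.
# Note: --config=opt implies --copt=-march=native, so build on a machine of
# the same CPU generation (e.g. AVX-512 capable) as the deployment target.
bazel build --config=opt --config=mkl_threadpool \
    //tensorflow/tools/pip_package:build_pip_package
```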

Tips: the library was originally released as MKL-DNN, renamed DNNL, and finally renamed oneDNN. TensorFlow initially used MKL-DNN to accelerate operator computation; over subsequent releases oneDNN gradually took its place, but the MKL-prefixed macro definitions were retained.

Macro definitions for oneDNN in DeepRec (a runtime usage example follows the table):

| Macro Definition | Values (bold = default) | Explanation |
| --- | --- | --- |
| TF_MKL_PRIMITIVE_ONLY_FOR_RECO | **1/true**, 0/false | 1: replace only the operators used in recommendation models with their oneDNN counterparts; 0: replace all operators that oneDNN supports. |
| TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE | **1/true**, 0/false | 1: reduce main-memory use by releasing cached primitives; 0: do not release primitives. |
| TF_DISABLE_MKL | **0**, 1 | 0: enable oneDNN (MKL); 1: disable oneDNN (MKL). |
| TF_MKL_NUM_INTRAOP | Integer, e.g. 14; not set by default | Integer: the number of intra-op threads used by oneDNN. Not set: oneDNN uses at most the number of TensorFlow intra-op threads. |
| ONEDNN_VERBOSE | **0**, 1, 2 | Verbosity of oneDNN primitive logging: 0 = no output, 1 = log primitive execution, 2 = additionally log primitive creation. |
| DNNL_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest instruction set (ISA) oneDNN may use (for oneDNN versions earlier than 2.5.0). |
| ONEDNN_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest instruction set (ISA) oneDNN may use (for oneDNN 2.5.0 and later). |
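The settings above are read from environment variables at process startup, so they can be exported before launching a job. A minimal sketch, assuming a hypothetical training script train.py:

```bash
# Only swap in oneDNN versions of recommendation-model operators,
# release cached primitives to save memory, and cap oneDNN's intra-op threads.
export TF_MKL_PRIMITIVE_ONLY_FOR_RECO=1
export TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE=1
export TF_MKL_NUM_INTRAOP=14
# Limit oneDNN (>= 2.5.0) to AVX512_CORE_BF16 and lower instruction sets.
export ONEDNN_MAX_CPU_ISA=AVX512_CORE_BF16
python train.py
```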

Primitives supported by oneDNN (in the post-ops column, Scale and Zero denote the output-scale and zero-point attributes):

| Primitive | Available Data Types | Available Post-Ops |
| --- | --- | --- |
| Matrix Multiplication | f32, bf16, f16, u8, s8 | Scale, Zero, Eltwise, Sum, Binary |
| Inner Product | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Layer Normalization | f32, bf16, f16 | / |
| Batch Normalization | f32, bf16, f16, s8 | Eltwise |
| Local Response Normalization (LRN) | f32, bf16, f16 | / |
| Binary (+, -, *, /, >, <, min, max, …) | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Eltwise (relu, gelu, tanh, linear, …) | f32, s32, bf16, f16, u8, s8 | Binary |
| PReLU | f32, s32, bf16, s8, u8 | / |
| Sum | f32, s32, bf16, f16, u8, s8 | / |
| Reduction | f32, bf16, u8, s8 | Eltwise, Sum, Binary |
| Softmax | f32, bf16, f16 | / |
| LogSoftmax | f32, bf16 | / |
| Reorder | f32, s32, bf16, f16, u8, s8 | Scale, Sum |
| Concat | f32, s32, bf16, f16, u8, s8 | / |
| Convolution | f32, bf16, f16, u8, s8 | Scale, Zero, Eltwise, Sum, Binary |
| Pooling | f32, s32, bf16, f16, u8, s8 | Binary |
| RNN (LSTM, GRU, Vanilla RNN, …) | f32, bf16, f16, u8, s8 | / |
| Resampling | f32, s32, bf16, f16, s8, u8 | Eltwise, Sum, Binary |
| Shuffle | f32, s32, bf16, s8, u8 | / |