Question

How do I set up TensorFlow for an RTX 3070 on Windows?

2021-01-26

python windows tensorflow

I am on Windows 10 and trying to get my TensorFlow scripts working with my new RTX 3070 GPU. I previously ran them on a GTX 980.

TensorFlow installed from binary (pip3 install tensorflow)
Tried the latest stable release, v2.4.0-49-g85c8b2a817f 2.4.1, and also a nightly build (see below)
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
CUDA/cuDNN version: cuda_11.2.0_460.89_win10 \ cudnn-11.1-v8.0.5.39
GPU model and memory: seems to be recognized correctly by TF - GeForce RTX 3070, compute capability: 8.6, core clock: 1.725GHz, core count: 46, device memory size: 8.00GiB, device memory bandwidth: 417.29GiB/s
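
For anyone comparing setups, here is a minimal sketch of how the same details can be read from Python rather than copied by hand. It assumes a TensorFlow release that provides `tf.sysconfig.get_build_info()` (the 2.4 and 2.5 builds discussed here should). `format_build_info` is a hypothetical helper name, and the dictionary keys are read with defaults because they may be absent, for example on CPU-only builds.

```python
def format_build_info(build_info, gpu_names):
    """Summarise the CUDA/cuDNN versions a TensorFlow wheel was built against."""
    lines = [
        "CUDA build:    %s" % build_info.get("is_cuda_build", "unknown"),
        "CUDA version:  %s" % build_info.get("cuda_version", "unknown"),
        "cuDNN version: %s" % build_info.get("cudnn_version", "unknown"),
        "Visible GPUs:  %s" % (", ".join(gpu_names) if gpu_names else "none"),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this interpreter")
    else:
        # Versions the wheel was compiled against, not the ones found on Path.
        gpus = tf.config.list_physical_devices("GPU")
        print("TensorFlow:    %s" % tf.__version__)
        print(format_build_info(tf.sysconfig.get_build_info(),
                                [gpu.name for gpu in gpus]))
```

The reported versions describe what the wheel was built against, so comparing them with the installed CUDA and cuDNN packages is a quick way to spot a mismatch.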

Current behavior

The following error occurs:

2021-01-25 21:36:01.042433: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/500
2021-01-25 21:36:03.304809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-25 21:36:03.880223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-25 21:36:03.911531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-25 21:36:04.515409: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-25 21:36:04.515498: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-25 21:36:04.515607: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 110, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 560, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:    Fail to find the dnn implementation.
     [[{{node CudnnRNN}}]]
     [[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_8782]
Function call stack:
train_function -> train_function -> train_function

tf_2.4.1_issue_on_3070.txt

I also tried the latest nightly build, 2.5.0.dev20210125, which fails with:

2021-01-25 21:31:05.429799: E tensorflow/stream_executor/dnn.cc:618] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1975): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-01-25 21:31:05.430291: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1926 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 108, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1134, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 818, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2994, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1939, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 569, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError:    Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
     [[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
     [[Adam/gradients/PartitionedCall_2]] [Op:__inference_train_function_8936]
Function call stack:
train_function -> train_function -> train_function

tf_nightly_issue_on_3070.txt

Standalone code to reproduce the issue

import datetime
import os

import pandas as pd
from numpy import reshape

import tensorflow as tf

EPOCHS = 500
BATCH_SIZE = 256
TEST_SET_RATIO = 0.2

LEARNING_RATE = 0.001
DECAY = 3e-5
LOSS_FUNC = 'categorical_crossentropy'
DROPOUT = 0.2
OUTPUT_PATH = "e:\\ml"

RNN_SEQ_LEN = 128  # number of RNN/LSTM sequence features
L_AMOUNT = 2  # number of labels

MIN_ACC_TO_SAVE_MODEL = 0.6


def create_model():
    new_model = tf.keras.models.Sequential()

    # NETWORK INPUT
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, input_shape=TR_FEATURES.shape[1:], return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT / 2))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    # NETWORK OUTPUT
    new_model.add(tf.keras.layers.Dense(L_AMOUNT, activation=tf.keras.activations.softmax))

    opt = tf.keras.optimizers.Adam(LEARNING_RATE, decay=DECAY)
    new_model.compile(optimizer=opt,
                      loss=LOSS_FUNC,
                      metrics=['accuracy'])

    print(new_model.summary())
    return new_model


class CustomModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    def __init__(self, fp, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', **kwargs):
        super().__init__(fp, monitor, verbose, save_best_only, save_weights_only, mode, save_freq, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        print("\n-------------------------------------------------------------------------------------------------------")
        print(f"epoch: {epoch}, training_acc: {round(float(logs['accuracy']), 4)}, validation_acc: {round(float(logs['val_accuracy']), 4)}")
        print("-------------------------------------------------------------------------------------------------------\n")

        if MIN_ACC_TO_SAVE_MODEL <= logs['accuracy']:
            super().on_epoch_end(epoch, logs)


if __name__ == '__main__':
    data_filename = 'train_2020-02-07_pp_x128_3_2_all.csv'
    print("Loading data file: %s" % data_filename)
    dataset = pd.read_csv(data_filename, delimiter=',', header=None)
    dataset = dataset.drop(columns=[0, 1, 2, 3, 4, 5, 6]).values  # drop columns with additional information

    test_set_size = int(len(dataset) * TEST_SET_RATIO)
    print("Test set split at: %d" % test_set_size)

    train_data = dataset[:-test_set_size]
    test_data = dataset[-test_set_size:]  # use most recent data for validation (extract before shuffle)

    TR_F = train_data[:, 0:RNN_SEQ_LEN]
    TS_F = test_data[:, 0:RNN_SEQ_LEN]

    TR_L = train_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TS_L = test_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]

    TR_FEATURES = reshape(TR_F, (len(TR_F), RNN_SEQ_LEN, 1))
    TS_FEATURES = reshape(TS_F, (len(TS_F), RNN_SEQ_LEN, 1))

    model = create_model()

    TRAINING_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model_name = "sscce_%s" % TRAINING_TIMESTAMP
    os.mkdir("%s\\models\\%s" % (OUTPUT_PATH, model_name))
    filepath = "%s\\models\\%s\\%s--{epoch:02d}-{val_accuracy:.3f}.model" % (OUTPUT_PATH, model_name, model_name)
    checkpoint = CustomModelCheckpoint(filepath,
                                       monitor='val_accuracy',
                                       verbose=1,
                                       save_best_only=True,
                                       mode='max')

    log_dir = "%s\\logs\\fit\\%s.model" % (OUTPUT_PATH, model_name)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=0)

    model.fit(x=TR_FEATURES,
              y=TR_L,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              shuffle=True,
              validation_data=(TS_FEATURES, TS_L),
              callbacks=[checkpoint, tensorboard])

Sample data file: input_data.zip

Other info / logs

I also added the CUDA 11.0 installation directory to the path, because without it I get errors such as:

2021-01-25 21:44:15.989317: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

Full Windows system Path:

Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;C:\cudnn-11.1-v8.0.5.39\bin;C:\Python36\Scripts\;C:\Python36;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\Docker\Docker\Resources\bin;c:\Java\jdk1.8.0_144_x86;C:\gradle-6.0.1\bin;C:\SVN\bin;C:\MinGW\bin;C:\WinAVR-20100110\;c:\avrdude\;c:\Android\sdk\platform-tools;C:\adb\;C:\TortoiseGit\bin;C:\Git4Windows\cmd;c:\sqlite-tools-win32-x86-3130000\;C:\WINDOWS\System32;C:\WINDOWS;C:\WINDOWS\System32\wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files (x86)\Windows Live\Shared;C:\WINDOWS\system32;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\OpenSSH\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.0\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
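
With two CUDA toolkits and a separate cuDNN directory on the same Path, it can help to see which directory actually supplies each DLL. The sketch below uses only the standard library. `dirs_containing` is a hypothetical helper, and the DLL names are the ones that appear in the log lines quoted in this question.

```python
import os


def dirs_containing(dll_name, path_value, sep=";"):
    """Return the Path entries, in search order, that contain dll_name."""
    hits = []
    for entry in path_value.split(sep):
        entry = entry.strip()
        if entry and os.path.isfile(os.path.join(entry, dll_name)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # DLL names taken from the TensorFlow log output shown in the question.
    for dll in ("cublas64_11.dll", "cudnn64_8.dll", "cusolver64_10.dll"):
        found = dirs_containing(dll, os.environ.get("PATH", ""), os.pathsep)
        print("%-20s %s" % (dll, found if found else "NOT FOUND"))
```

If a DLL is listed under more than one directory, the first entry is the one Windows would normally pick up from Path.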

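I tried different combinations of CUDA, cuDNN and TensorFlow, but in practice only the Windows NVIDIA GPU driver bundled with cuda_11.2.0_460.89_win10 is recent enough to support the RTX 30xx series. Still, there is no cuDNN release built specifically for CUDA 11.2 yet... maybe that is the problem...

Any ideas on how to get them working together?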

Answer 1

To double-check, I rolled back to CUDA 11.0 with the matching cuDNN 8.0.2 and TensorFlow 2.4.1. This combination:

cudnn-11.0-windows-x64-v8.0.2.39.zip
cuda_11.0.2_451.48_win10.exe
latest stable tensorflow 2.4.1
NVIDIA GPU driver updated to 461.40, because the 451.48 driver packaged with the CUDA installer above does not work with the RTX 3070

... gives:

2021-02-04 19:36:59.700433: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-02-04 19:36:59.700523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-02-04 19:36:59.700630: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.

It finally started working with the latest cudnn-11.2-windows-x64-v8.1.0.77.zip, released recently, together with the 2.5 nightly build, but apparently only in combination with cuda_11.2.0_460.89_win10.exe.

Answer 2

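I had a similar problem while using TF 2.4, CUDA 11.0 and cuDNN 8.0. I don't know why simple networks worked with that configuration while more complex ones did not. Apparently my simple networks did not use cuDNN?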
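
One possible explanation, offered as an assumption rather than something this answer confirms: the Keras documentation lists conditions under which an LSTM layer runs on the fused cuDNN kernel, and layers that miss any of them fall back to a generic implementation that never calls `CudnnRNN`. The sketch below checks a layer config dictionary (the kind returned by `layer.get_config()`) against the configurable part of those conditions. `lstm_can_use_cudnn` is a hypothetical helper, and it does not check the documented masking and eager-execution requirements.

```python
def lstm_can_use_cudnn(config):
    """Check an LSTM layer config against the documented cuDNN-kernel criteria."""
    return bool(
        config.get("activation") == "tanh"
        and config.get("recurrent_activation") == "sigmoid"
        and config.get("recurrent_dropout", 0.0) == 0
        and not config.get("unroll", False)
        and config.get("use_bias", True)
    )


# Default arguments of tf.keras.layers.LSTM, as used by the script in the question.
default_lstm = {"activation": "tanh", "recurrent_activation": "sigmoid",
                "recurrent_dropout": 0.0, "unroll": False, "use_bias": True}

print(lstm_can_use_cudnn(default_lstm))                               # True
print(lstm_can_use_cudnn(dict(default_lstm, recurrent_dropout=0.2)))  # False
```

The LSTM layers in the question use the defaults, which is consistent with the failing node being `CudnnRNN`.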

In any case, everything works after upgrading to TF 2.5, CUDA 11.2 and cuDNN 8.1. In the future, it is best to check the compatible library versions on tensorflow.org.
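
As a rough illustration of that check, the sketch below hard-codes the two combinations named in this thread (TF 2.4 with CUDA 11.0 and cuDNN 8.0, TF 2.5 with CUDA 11.2 and cuDNN 8.1) and compares an installed stack against them. The `TESTED` table and the `mismatches` helper are hypothetical, and the table is deliberately incomplete. The tested build configurations published on tensorflow.org remain the authoritative source.

```python
# Version combinations named in this thread; not a complete compatibility table.
TESTED = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def mismatches(tf_version, cuda, cudnn):
    """Return human-readable mismatches; an empty list means the stack matches."""
    key = ".".join(tf_version.split(".")[:2])  # "2.4.1" -> "2.4"
    expected = TESTED.get(key)
    if expected is None:
        return ["no entry for TensorFlow %s in this table" % key]
    problems = []
    if not cuda.startswith(expected["cuda"]):
        problems.append("CUDA %s installed, TensorFlow %s expects %s"
                        % (cuda, key, expected["cuda"]))
    if not cudnn.startswith(expected["cudnn"]):
        problems.append("cuDNN %s installed, TensorFlow %s expects %s"
                        % (cudnn, key, expected["cudnn"]))
    return problems


# The stack from the question, then the stack that worked in the answers.
print(mismatches("2.4.1", "11.2.0", "8.0.5"))
print(mismatches("2.5.0", "11.2.0", "8.1.0"))  # []
```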