How do I set up TensorFlow for an RTX 3070 on Windows?
I'm on Windows 10 and trying to get my TensorFlow script working with my new RTX 3070 GPU. I previously ran it on a GTX 980.
- TensorFlow installed from binary (pip3 install tensorflow): tried the latest stable release, v2.4.0-49-g85c8b2a817f 2.4.1, and also a nightly build (see below)
- Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
- CUDA/cuDNN version: cuda_11.2.0_460.89_win10\cudnn-11.1-v8.0.5.39
- GPU model and memory: appears to be detected correctly by TF - GeForce RTX 3070, compute capability 8.6, core clock 1.725GHz, core count 46, device memory size 8.00GiB, device memory bandwidth 417.29GiB/s
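The details in the list above can also be collected from inside Python. The following is a minimal sketch, assuming TensorFlow 2.3 or later (where `tf.sysconfig.get_build_info()` is available). The function name `describe_tf_build` is made up for illustration, and the `cuda_version` and `cudnn_version` keys are only expected on GPU builds, so they are read with `.get`.

```python
def describe_tf_build():
    """Return the TensorFlow version, the CUDA/cuDNN versions it was built for, and visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter.
        return {"tensorflow": None}

    # Build-time metadata: which CUDA/cuDNN versions this wheel was compiled against.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.version.VERSION,
        "git_version": tf.version.GIT_VERSION,
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in describe_tf_build().items():
        print(f"{key}: {value}")
```

Comparing `built_for_cuda` and `built_for_cudnn` with what is actually installed is a quick way to spot a mismatch before training starts.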
Current behavior
The following error appears:
2021-01-25 21:36:01.042433: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/500
2021-01-25 21:36:03.304809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-25 21:36:03.880223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-25 21:36:03.911531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-25 21:36:04.515409: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-25 21:36:04.515498: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-25 21:36:04.515607: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 110, in <module>
callbacks=[checkpoint, tensorboard])
File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2943, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1919, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 560, in call
ctx=ctx)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
[[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_8782]
Function call stack:
train_function -> train_function -> train_function
I also tried the latest nightly (2.4.02.5.0.dev20210125), which fails with a different error:
2021-01-25 21:31:05.429799: E tensorflow/stream_executor/dnn.cc:618] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1975): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-01-25 21:31:05.430291: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1926 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128]
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 108, in <module>
callbacks=[checkpoint, tensorboard])
File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1134, in fit
tmp_logs = self.train_function(iterator)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 818, in __call__
result = self._call(*args, **kwds)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 846, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2994, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1939, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 569, in call
ctx=ctx)
File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128]
[[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
[[Adam/gradients/PartitionedCall_2]] [Op:__inference_train_function_8936]
Function call stack:
train_function -> train_function -> train_function
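The "model config" in the error above can be decoded to see which layer is failing. The sketch below is only an illustration: `parse_model_config` is a hypothetical helper, and it assumes the logged `rnn_mode` integer follows the same ordering as cuDNN's `cudnnRNNMode_t` enum (0 = ReLU RNN, 1 = tanh RNN, 2 = LSTM, 3 = GRU).

```python
import re

# Assumption: the logged rnn_mode integer uses the cudnnRNNMode_t ordering.
RNN_MODES = {0: "RNN_RELU", 1: "RNN_TANH", 2: "LSTM", 3: "GRU"}

# Field names exactly as they appear in the error message.
FIELDS = ["num_layers", "input_size", "num_units", "dir_count",
          "max_seq_length", "batch_size", "cell_num_units"]


def parse_model_config(message):
    """Extract the RNN mode and shape parameters from a 'Failed to call ThenRnn...' message."""
    mode = re.search(r"rnn_direction_mode\]:\s*(\d+),\s*(\d+),\s*(\d+)", message)
    dims = re.search(r"cell_num_units\]:\s*\[([\d,\s]+)\]", message)
    if not mode or not dims:
        raise ValueError("message does not contain a model config")
    values = (int(v) for v in dims.group(1).split(","))
    config = dict(zip(FIELDS, values))
    config["rnn_mode"] = RNN_MODES.get(int(mode.group(1)), "unknown")
    return config
```

Applied to the message above, this yields an LSTM with input_size 1, num_units 128, max_seq_length 128 and batch_size 256, which lines up with RNN_SEQ_LEN = 128, BATCH_SIZE = 256 and the (len, RNN_SEQ_LEN, 1) reshape in the script that follows.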
Standalone code to reproduce the issue
import datetime
import os
import pandas as pd
from numpy import reshape
import tensorflow as tf
EPOCHS = 500
BATCH_SIZE = 256
TEST_SET_RATIO = 0.2
LEARNING_RATE = 0.001
DECAY = 3e-5
LOSS_FUNC = 'categorical_crossentropy'
DROPOUT = 0.2
OUTPUT_PATH = "e:\\ml"
RNN_SEQ_LEN = 128 # number of RNN/LSTM sequence features
L_AMOUNT = 2 # number of labels
MIN_ACC_TO_SAVE_MODEL = 0.6

def create_model():
    new_model = tf.keras.models.Sequential()
    # NETWORK INPUT
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, input_shape=TR_FEATURES.shape[1:], return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT / 2))
    new_model.add(tf.keras.layers.BatchNormalization())
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())
    # NETWORK OUTPUT
    new_model.add(tf.keras.layers.Dense(L_AMOUNT, activation=tf.keras.activations.softmax))
    opt = tf.keras.optimizers.Adam(LEARNING_RATE, decay=DECAY)
    new_model.compile(optimizer=opt,
                      loss=LOSS_FUNC,
                      metrics=['accuracy'])
    print(new_model.summary())
    return new_model

class CustomModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    def __init__(self, fp, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', **kwargs):
        super().__init__(fp, monitor, verbose, save_best_only, save_weights_only, mode, save_freq, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        print("\n-------------------------------------------------------------------------------------------------------")
        print(f"epoch: {epoch}, training_acc: {round(float(logs['accuracy']), 4)}, validation_acc: {round(float(logs['val_accuracy']), 4)}")
        print("-------------------------------------------------------------------------------------------------------\n")
        if MIN_ACC_TO_SAVE_MODEL <= logs['accuracy']:
            super().on_epoch_end(epoch, logs)

if __name__ == '__main__':
    data_filename = 'train_2020-02-07_pp_x128_3_2_all.csv'
    print("Loading data file: %s" % data_filename)
    dataset = pd.read_csv(data_filename, delimiter=',', header=None)
    dataset = dataset.drop(columns=[0, 1, 2, 3, 4, 5, 6]).values  # drop columns with additional information
    test_set_size = int(len(dataset) * TEST_SET_RATIO)
    print("Test set split at: %d" % test_set_size)
    train_data = dataset[:-test_set_size]
    test_data = dataset[-test_set_size:]  # use most recent data for validation (extract before shuffle)
    TR_F = train_data[:, 0:RNN_SEQ_LEN]
    TS_F = test_data[:, 0:RNN_SEQ_LEN]
    TR_L = train_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TS_L = test_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TR_FEATURES = reshape(TR_F, (len(TR_F), RNN_SEQ_LEN, 1))
    TS_FEATURES = reshape(TS_F, (len(TS_F), RNN_SEQ_LEN, 1))
    model = create_model()
    TRAINING_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model_name = "sscce_%s" % TRAINING_TIMESTAMP
    os.mkdir("%s\\models\\%s" % (OUTPUT_PATH, model_name))
    filepath = "%s\\models\\%s\\%s--{epoch:02d}-{val_accuracy:.3f}.model" % (OUTPUT_PATH, model_name, model_name)
    checkpoint = CustomModelCheckpoint(filepath,
                                       monitor='val_accuracy',
                                       verbose=1,
                                       save_best_only=True,
                                       mode='max')
    log_dir = "%s\\logs\\fit\\%s.model" % (OUTPUT_PATH, model_name)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=0)
    model.fit(x=TR_FEATURES,
              y=TR_L,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              shuffle=True,
              validation_data=(TS_FEATURES, TS_L),
              callbacks=[checkpoint, tensorboard])
Example data file: input_data.zip
Other info / logs
The CUDA 11.0 bin directory is also on the PATH, because without it errors such as this one appear:
2021-01-25 21:44:15.989317: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
Full Windows system PATH:
Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;C:\cudnn-11.1-v8.0.5.39\bin;C:\Python36\Scripts\;C:\Python36;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\Docker\Docker\Resources\bin;c:\Java\jdk1.8.0_144_x86;C:\gradle-6.0.1\bin;C:\SVN\bin;C:\MinGW\bin;C:\WinAVR-20100110\;c:\avrdude\;c:\Android\sdk\platform-tools;C:\adb\;C:\TortoiseGit\bin;C:\Git4Windows\cmd;c:\sqlite-tools-win32-x86-3130000\;C:\WINDOWS\System32;C:\WINDOWS;C:\WINDOWS\System32\wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files (x86)\Windows Live\Shared;C:\WINDOWS\system32;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\OpenSSH\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.0\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
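With a PATH this long, it helps to check which directory (if any) actually provides each library TensorFlow tries to load. Below is a minimal sketch using only the standard library. The DLL names are the ones that appear in the logs in this post, and `locate_dlls` is a hypothetical helper name.

```python
import os

# DLL names taken from the log lines quoted in this post.
REQUIRED_DLLS = ["cublas64_11.dll", "cublasLt64_11.dll", "cudnn64_8.dll", "cusolver64_10.dll"]


def locate_dlls(dll_names, path_value=None):
    """Map each DLL name to the first PATH directory that contains it, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Skip empty entries produced by stray or trailing separators.
    directories = [d for d in path_value.split(os.pathsep) if d]
    found = {}
    for name in dll_names:
        found[name] = next(
            (d for d in directories if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


if __name__ == "__main__":
    for name, directory in locate_dlls(REQUIRED_DLLS).items():
        print(f"{name}: {directory or 'NOT FOUND on PATH'}")
```

Because the first matching directory wins, the output also shows which of several installed CUDA versions a given DLL would be picked up from.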
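I tried different combinations of cuda/cudnn/tensorflow, but in practice only the Windows NVIDIA GPU driver bundled with cuda_11.2.0_460.89_win10 is recent enough to support the RTX 30xx series.
Still, there is currently no cudnn release that specifically targets CUDA 11.2... maybe that is the problem...
Any ideas on how to make them work together?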
Just to double-check, I rolled back to CUDA 11.0 with the matching CUDNN 8.0.2 and tensorflow 2.4.1. This combination:
- cudnn-11.0-windows-x64-v8.0.2.39.zip
- cuda_11.0.2_451.48_win10.exe
- latest stable tensorflow 2.4.1
- NVIDIA GPU driver updated to 461.40, because the 451.48 driver packaged with the CUDA installer above won't work with the RTX 3070
... gives:
2021-02-04 19:36:59.700433: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-02-04 19:36:59.700523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-02-04 19:36:59.700630: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
It finally started working with 2.5-nightly and the recently released cudnn-11.2-windows-x64-v8.1.0.77.zip, but apparently only together with cuda_11.2.0_460.89_win10.exe.
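I had a similar problem, previously using TF 2.4, CUDA 11.0 and CuDNN 8.0. I don't know why a simple network worked with that configuration while a more complex one did not. Perhaps my simple network wasn't using CuDNN at all?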
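One plausible explanation: a Keras LSTM layer only dispatches to the cuDNN kernel when its arguments meet the conditions listed in the `tf.keras.layers.LSTM` documentation, and a network without recurrent layers never reaches the CudnnRNN op that fails in the logs above. The sketch below restates the argument-level conditions as a standalone check. It is an illustration, not TensorFlow's internal logic, and the function name `lstm_can_use_cudnn` is hypothetical. The documentation additionally requires eager execution in the outermost context and, if masking is used, strictly right-padded inputs, which this check does not cover.

```python
def lstm_can_use_cudnn(activation="tanh", recurrent_activation="sigmoid",
                       recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Return True if the layer arguments satisfy the documented cuDNN LSTM conditions."""
    return bool(
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )


if __name__ == "__main__":
    # Default arguments, as used by the LSTM layers in the question's script.
    print(lstm_can_use_cudnn())
    # Any recurrent dropout forces the generic (non-cuDNN) implementation.
    print(lstm_can_use_cudnn(recurrent_dropout=0.2))
```

The LSTM layers in the question use the default arguments, so they qualify for the cuDNN path, which is consistent with the CudnnRNN node named in the tracebacks.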
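In any case, everything works after upgrading to TF 2.5, CUDA 11.2 and CuDNN 8.1. In future, it is best to check the compatible library versions on tensorflow.org before installing.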
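As a rough illustration of that check, the two version combinations mentioned in this thread can be kept in a small lookup table and compared with what is installed. This is a sketch only: the table covers just TF 2.4 and 2.5, the helper names are made up, and the tested-configurations table on the TensorFlow site remains the authoritative source.

```python
# CUDA/cuDNN versions paired with each TensorFlow release in this thread.
TESTED_CONFIGS = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def major_minor(version):
    """Reduce a version string such as '2.4.1' or '2.5.0.dev20210125' to 'major.minor'."""
    return ".".join(version.split(".")[:2])


def check_combo(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatches between installed versions and the table (empty if none)."""
    expected = TESTED_CONFIGS.get(major_minor(tf_version))
    if expected is None:
        return [f"no table entry for TensorFlow {tf_version}"]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append(f"CUDA {cuda_version} installed, {expected['cuda']} expected")
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append(f"cuDNN {cudnn_version} installed, {expected['cudnn']} expected")
    return problems


if __name__ == "__main__":
    # The first setup from the question: TF 2.4.1 with CUDA 11.2.0 and cuDNN 8.0.5.
    print(check_combo("2.4.1", "11.2.0", "8.0.5"))
    # The setup that finally worked.
    print(check_combo("2.5.0.dev20210125", "11.2.0", "8.1.0"))
```

Matching the table is necessary but not always sufficient: the rollback to TF 2.4.1 with CUDA 11.0 and cuDNN 8.0.2 described above passes this check and still failed on the RTX 3070.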