Resource localhost/total/N10tensorflow3VarE does not exist

The error message is as follows:

NotFoundError: 2 root error(s) found.
(0) Not found: Resource localhost/total/N10tensorflow3VarE does not exist.
[[{{node metrics/accuracy/AssignAddVariableOp}}]]
[[metrics/precision/Mean/_87]]
(1) Not found: Resource localhost/total/N10tensorflow3VarE does not exist.
[[{{node metrics/accuracy/AssignAddVariableOp}}]]
0 successful operations.
0 derived errors ignored.

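Solution:

The most common cause is that the GPU is already occupied by another process, so there is not enough GPU memory left. Run `nvitop -m` to find the PID of the process holding the GPU, then terminate it with `sudo kill <PID>`. If the error still appears after that, apply the settings in part 2 as well.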
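
If nvitop is not installed, the same information can be read from `nvidia-smi`. The following is a minimal sketch, not part of the original fix: the helper names `parse_gpu_processes` and `list_gpu_processes` are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports memory in MiB when called with `nounits`.

```python
import shutil
import subprocess


def parse_gpu_processes(csv_text):
    """Parse CSV rows of the form 'pid, process_name, used_memory'."""
    processes = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or malformed rows
        # Split from both ends so a comma inside the process name is tolerated.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        pid, name, used = pid.strip(), name.strip(), used.strip()
        if not pid.isdigit():
            continue
        processes.append({
            "pid": int(pid),
            "name": name,
            # nvidia-smi may print "[N/A]" when the value is unavailable.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return processes


def list_gpu_processes():
    """Return the compute processes currently holding GPU memory."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools available on this machine
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_processes(result.stdout)
```

Calling `list_gpu_processes()` returns one dictionary per process; the `pid` values are what you would pass to `sudo kill`.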

## part 1
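Whenever keras and its submodules can be imported from TensorFlow, import them from there rather than from the standalone keras package.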
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D, Dropout, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import LearningRateScheduler
import tensorflow.compat.v1 as tf

from sklearn.model_selection import train_test_split

## part 2
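import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.85  # this process may use at most 85% of the visible GPU's memory
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of claiming it all up front
sess = tf.Session(config=config)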
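
For code written against the TensorFlow 2.x API rather than `tensorflow.compat.v1` sessions, on-demand allocation can be requested through `tf.config` instead. This is a sketch of that alternative, not what the post above uses: the function name `enable_memory_growth` is hypothetical, and it assumes TensorFlow 2.1 or later and that it runs before any operation has touched the GPU.

```python
import tensorflow as tf


def enable_memory_growth():
    """Request on-demand GPU memory allocation; return the number of GPUs configured."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print(f"memory growth enabled on {configured} GPU(s)")
```

On a machine without a visible GPU the loop does nothing and the function returns 0, so the snippet is safe to leave in place for CPU-only runs.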
