
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize,

In TensorFlow/Keras, when running the code from https://github.com/pierluigiferrari/ssd_keras, specifically the evaluator ssd300_evaluation, I received this error:

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

This is very similar to the unsolved question "Google Colab Error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize".

When the issue occurs, I'm running:

Python: 3.6.4

TensorFlow version: 1.12.0

Keras version: 2.2.4

CUDA: 10.0

cuDNN: 7.4.1.5

GPU: NVIDIA GeForce GTX 1080

I also ran:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session() as sess:
    print(sess.run(c))

It ran with no errors or issues.

The minimal example is:

from keras import backend as K
from keras.models import load_model
from keras.optimizers import Adam
from scipy.misc import imread
import numpy as np
from matplotlib import pyplot as plt

from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization
from data_generator.object_detection_2d_data_generator import DataGenerator
from eval_utils.average_precision_evaluator import Evaluator
import tensorflow as tf
%matplotlib inline
import keras
keras.__version__



# Set a few configuration parameters.
img_height = 300
img_width = 300
n_classes = 20
model_mode = 'inference'


K.clear_session() # Clear previous models from memory.

model = ssd_300(image_size=(img_height, img_width, 3),
                n_classes=n_classes,
                mode=model_mode,
                l2_regularization=0.0005,
                scales=[0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05], # The scales for MS COCO are [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05]
                aspect_ratios_per_layer=[[1.0, 2.0, 0.5],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5],
                                         [1.0, 2.0, 0.5]],
                two_boxes_for_ar1=True,
                steps=[8, 16, 32, 64, 100, 300],
                offsets=[0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
                clip_boxes=False,
                variances=[0.1, 0.1, 0.2, 0.2],
                normalize_coords=True,
                subtract_mean=[123, 117, 104],
                swap_channels=[2, 1, 0],
                confidence_thresh=0.01,
                iou_threshold=0.45,
                top_k=200,
                nms_max_output_size=400)

# 2: Load the trained weights into the model.

# TODO: Set the path of the trained weights.
weights_path = 'C:/Users/USAgData/TF SSD Keras/weights/VGG_VOC0712Plus_SSD_300x300_iter_240000.h5'

model.load_weights(weights_path, by_name=True)

# 3: Compile the model so that Keras won't complain the next time you load it.

adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)

model.compile(optimizer=adam, loss=ssd_loss.compute_loss)


dataset = DataGenerator()

# TODO: Set the paths to the dataset here.
dataset_dir = "C:/Users/USAgData/TF SSD Keras/VOC/VOCtest_06-Nov-2007/VOCdevkit/VOC2007/"
Pascal_VOC_dataset_images_dir = dataset_dir + 'JPEGImages'
Pascal_VOC_dataset_annotations_dir = dataset_dir + 'Annotations/'
Pascal_VOC_dataset_image_set_filename = dataset_dir + 'ImageSets/Main/test.txt'

# The XML parser needs to know what object class names to look for and in which order to map them to integers.
classes = ['background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat',
           'chair', 'cow', 'diningtable', 'dog',
           'horse', 'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor']

dataset.parse_xml(images_dirs=[Pascal_VOC_dataset_images_dir],
                  image_set_filenames=[Pascal_VOC_dataset_image_set_filename],
                  annotations_dirs=[Pascal_VOC_dataset_annotations_dir],
                  classes=classes,
                  include_classes='all',
                  exclude_truncated=False,
                  exclude_difficult=False,
                  ret=False)



evaluator = Evaluator(model=model,
                      n_classes=n_classes,
                      data_generator=dataset,
                      model_mode=model_mode)



results = evaluator(img_height=img_height,
                    img_width=img_width,
                    batch_size=8,
                    data_generator_mode='resize',
                    round_confidences=False,
                    matching_iou_threshold=0.5,
                    border_pixels='include',
                    sorting_algorithm='quicksort',
                    average_precision_mode='sample',
                    num_recall_points=11,
                    ignore_neutral_boxes=True,
                    return_precisions=True,
                    return_recalls=True,
                    return_average_precisions=True,
                    verbose=True)

If you are using Conda environments: in my case the issue was solved by installing only tensorflow-gpu, and not cudatoolkit or cuDNN, because tensorflow-gpu already installs them (see this answer). Note, though, that newer conda tensorflow-gpu versions may not install cudatoolkit or cuDNN; the workaround is to install a lower version of tensorflow-gpu and then upgrade it with pip (see this answer).

waterproof

I've seen this error message for three different reasons, with different solutions:

1. You have cache issues

I regularly work around this error by shutting down my Python process, removing the ~/.nv directory (on Linux, rm -rf ~/.nv), and restarting the Python process. I don't know exactly why this works; it's probably at least partly related to the second reason.
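
If you prefer to do the same cleanup from Python (for example in a maintenance script), here is a minimal sketch. It assumes, as described above for Linux, that the cache is the `.nv` directory in your home folder; `clear_nv_cache` is a hypothetical helper name, and it should only be run while no GPU-using Python process is alive.

```python
import shutil
from pathlib import Path


def clear_nv_cache(home=None):
    """Delete the ~/.nv cache directory, like `rm -rf ~/.nv`.

    Returns True if a directory was removed, False if there was none.
    Only run this while no TensorFlow/Python process is using the GPU.
    """
    home = Path(home) if home is not None else Path.home()
    cache_dir = home / ".nv"
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

After calling `clear_nv_cache()`, start a fresh Python process before importing TensorFlow again.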

2. You're out of memory

The error can also show up if you run out of graphics card RAM. With an nvidia GPU you can check graphics card memory usage with nvidia-smi. This will give you a readout of how much GPU RAM you have in use (something like 6025MiB / 6086MiB if you're almost at the limit) as well as a list of what processes are using GPU RAM.
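
To check this from a script rather than by eye, something along these lines could work. It is a sketch that assumes nvidia-smi is on your PATH and relies on its `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` options, which print one `used, total` line (in MiB) per GPU; the helper names are hypothetical.

```python
import subprocess


def parse_memory_csv(text):
    """Parse nvidia-smi CSV output ("used, total" per line, in MiB)
    into a list of (used_mib, total_mib) tuples, one per GPU."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def gpu_memory_usage():
    """Run nvidia-smi and return per-GPU (used_mib, total_mib) readings."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return parse_memory_csv(result.stdout)
```

On a nearly full card, `gpu_memory_usage()` would return something like `[(6025, 6086)]`; a used value close to the total suggests the cuDNN failure is a memory problem.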

If you've run out of RAM, you'll need to restart the process (which should free up the RAM) and then take a less memory-intensive approach. A few options are:

reducing your batch size

using a simpler model

using less data

limiting TensorFlow's GPU memory fraction. For example, the following makes sure TensorFlow uses at most 90% of your GPU RAM:

import keras
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # 0.6 sometimes works better for folks
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

This can slow down your model evaluation if not used together with the items above, presumably since the large data set will have to be swapped in and out to fit into the small amount of memory you've allocated.

A second option is to have TensorFlow start out using only a minimum amount of memory and then allocate more as needed (documented here):

import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
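
The variable is consulted when TensorFlow sets up the GPU, so the safest place for it is before `import tensorflow`. A small sketch that also warns when it may be set too late (the helper name and the warning are my own additions, not part of TensorFlow):

```python
import os
import sys
import warnings


def enable_tf_allow_growth():
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    TF_FORCE_GPU_ALLOW_GROWTH is read when TensorFlow initializes the GPU,
    so call this before `import tensorflow`.
    """
    if "tensorflow" in sys.modules:
        warnings.warn("tensorflow is already imported; if the GPU has been "
                      "initialized, TF_FORCE_GPU_ALLOW_GROWTH will be ignored.")
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


enable_tf_allow_growth()
# import tensorflow as tf  # import TensorFlow only after this point
```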

3. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.

If you've never had similar models working, you're not running out of VRAM and your cache is clean, I'd go back and set up CUDA + TensorFlow using the best available installation guide - I have had the most success with following the instructions at https://www.tensorflow.org/install/gpu rather than those on the NVIDIA / CUDA site. Lambda Stack is also a good way to go.
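
Before reinstalling, it can help to compare your versions with the table of tested build configurations at https://www.tensorflow.org/install/source#gpu. The sketch below hard-codes a small excerpt of that table (Linux GPU builds, only versions that come up in this thread; double-check the page itself) and compares version prefixes. `TESTED_GPU_CONFIGS` and `matches_tested_config` are hypothetical names.

```python
# Excerpt of TensorFlow's tested GPU build configurations (Linux).
# tensorflow(-gpu) version -> (CUDA, cuDNN)
TESTED_GPU_CONFIGS = {
    "1.8.0": ("9", "7"),
    "1.12.0": ("9", "7"),
    "1.13.1": ("10.0", "7.4"),
    "2.0.0": ("10.0", "7.4"),
    "2.2.0": ("10.1", "7.6"),
    "2.4.0": ("11.0", "8.0"),
}


def _same_prefix(have, want):
    """True if `have` starts with the dotted components of `want`,
    e.g. "7.4.1.5" matches "7.4" and "10.0.130" matches "10.0"."""
    want_parts = want.split(".")
    return have.split(".")[:len(want_parts)] == want_parts


def matches_tested_config(tf_version, cuda_version, cudnn_version):
    """Compare installed CUDA/cuDNN against the tested config for tf_version.

    Raises KeyError for TensorFlow versions missing from the excerpt.
    """
    want_cuda, want_cudnn = TESTED_GPU_CONFIGS[tf_version]
    return (_same_prefix(cuda_version, want_cuda)
            and _same_prefix(cudnn_version, want_cudnn))
```

For the setup in the question (TensorFlow 1.12.0, CUDA 10.0, cuDNN 7.4.1.5) the check returns False, since the table lists CUDA 9 for 1.12.0.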


I'm upvoting this answer since for me, I was out of memory only.
In my case, it was incompatible versions. The instructions at tensorflow.org/install/gpu are accurate if you pay close attention to operators like = or >=. Originally I assumed "equal or newer", but with TensorFlow 2.2 (which seemingly needs to be treated like 2.1), you need exactly CUDA 10.1 and a cuDNN >= 7.6 that is compatible with CUDA 10.1 (currently, that's only 7.6.5, and there are two different builds of it, for CUDA 10.2 and 10.1).
It was memory for me as well. Thanks for the in depth explanation.
In my case it was out of memory, and your code with 0.6 worked for me [per_process_gpu_memory_fraction = 0.6]. Thanks
I was out of memory the whole time. A background process was hogging up all of my GPU memory. Cross checked the process ids with htop and nvidia-smi
Bensuperpc

I had the same issue and solved it with this:

import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

or

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

The first solution fixed it like magic, though it probably doesn't address the root cause of the issue.
This seems like a very common problem at the moment, I found similar solutions on GitHub and Medium. Worked for me too, so presumably a problem with the current TF or CuDNN versions, rather than incorrect installations. It was specifically a problem with CNN layers, regardless of size. Other operations/layers are okay.
1st solution works great for me too.
Thanks! This solution worked for me too. I had first used the recipe from the top-voted answer here (except for the reinstallation), but it didn't work. I think it would be a great idea to consolidate all the measures described in this thread into one recipe.
gatefun

I had this error and I fixed it by uninstalling all CUDA and cuDNN versions from my system. Then I installed CUDA Toolkit 9.0 (without any patches) and cuDNN v7.4.1 for CUDA 9.0.


You can also downgrade the TensorFlow version
I got the same error. The reason is a mismatch between your CUDA/cuDNN version and your TensorFlow version. There are two ways to solve it: either downgrade your TensorFlow version (pip install --upgrade tensorflow-gpu==1.8.0), or follow the steps at tensorflow.org/install/gpu. Tip: choose your Ubuntu version and follow the steps. :-)
For me, it was a mismatch between CUDA and cuDNN. Replacing cuDNN libraries with a matching version solved the issue.
This is not the actual solution; it just happened to work for you. Look at stackoverflow.com/questions/53698035/… for the actual solution.
How can I download CUDA Toolkit 9.0 for Windows 10?
Rheatey Bash

I also had the same issue with TensorFlow 2.4, CUDA 11.0 and cuDNN v8.0.4, and wasted almost 2 to 3 days on it. The problem was simply a version mismatch: I had installed CUDA 11.0 Update 1, assuming the update would work just as well, but that was the culprit. I uninstalled CUDA 11.0 Update 1 and installed CUDA 11.0 without the update. Here is the list of versions that worked for TensorFlow 2.4 on an RTX 2060 6GB GPU:

cuDNN v8.0.4 for CUDA 11.0 (select your OS and download)

CUDA Toolkit 11.0 (select your OS)

The required hardware and software are listed here.

I also had to do this

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU') 
tf.config.experimental.set_memory_growth(physical_devices[0], True)

to avoid this error:

2020-12-23 21:54:14.971709: I tensorflow/stream_executor/stream.cc:1404] [stream=000001E69C1DA210,impl=000001E6A9F88E20] did not wait for [stream=000001E69C1DA180,impl=000001E6A9F88730]
2020-12-23 21:54:15.211338: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
[I 21:54:16.071 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 8b907ea5-33f1-4b2a-96cc-4a7a4c885d74 restarted
kernel 8b907ea5-33f1-4b2a-96cc-4a7a4c885d74 restarted

These are some of the errors I was getting:

Type 1

UnpicklingError: invalid load key, 'H'.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-2-f049ceaad66a> in <module>

Type 2


InternalError: Blas GEMM launch failed : a.shape=(15, 768), b.shape=(768, 768), m=15, n=768, k=768 [Op:MatMul]

During handling of the above exception, another exception occurred:

Type 3

failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.534375: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.534683: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.534923: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.539327: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.539523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-23 21:31:04.539665: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops_fused_impl.h:697 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.


Works like a charm. Thanks
Gahan

Keras is included in TensorFlow 2.0 and above. So:

remove import keras, and

replace every from keras.module.module import class statement with from tensorflow.keras.module.module import class
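
For a larger code base such as the ssd_keras repository, the replacement can be scripted. This is a rough regex-based sketch (hypothetical helper, handles only simple one-line imports). Instead of deleting `import keras` outright, it rewrites it to `from tensorflow import keras`, so later references such as `keras.__version__` keep working.

```python
import re

# "import keras" on its own line (not "import keras.backend" etc.).
_IMPORT_KERAS = re.compile(r"^([ \t]*)import keras[ \t]*$", re.MULTILINE)
# "from keras import ..." or "from keras.<module> import ..."; requiring a
# "." or whitespace after "keras" skips packages such as "keras_layers".
_FROM_KERAS = re.compile(r"^([ \t]*)from keras(\.|[ \t])", re.MULTILINE)


def port_keras_imports(source):
    """Rewrite standalone-Keras import lines to their tf.keras equivalents."""
    source = _IMPORT_KERAS.sub(r"\1from tensorflow import keras", source)
    return _FROM_KERAS.sub(r"\1from tensorflow.keras\2", source)
```

Review the result by hand: standalone Keras and tf.keras are not always drop-in compatible.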

Your GPU memory may also be full, so set allow_growth = True in the GPU options. This API is deprecated now, but putting the code snippet below right after your imports may solve your problem.

import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
sess = tf.compat.v1.Session(config=config)
set_session(sess)

Thanks for the perfect answer! It helps me a lot.
Mainak Dutta

The problem is that TensorFlow 1.10.x and newer are incompatible with cuDNN 7.0.5 and CUDA 9.0. The easiest fix is to downgrade TensorFlow to 1.8.0:

pip install --upgrade tensorflow-gpu==1.8.0


Ralph Bisschops

This is a follow up to https://stackoverflow.com/a/56511889/2037998 point 2.

2. You're out of memory

I used the following code to limit the GPU RAM usage:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 4 GB (1024*4 MB) of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=(1024*4))])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

This code sample comes from TensorFlow: Use a GPU: Limiting GPU memory growth. Put this code before any other TF/Keras code you are using.

Note: The application might still use a bit more GPU RAM than the number above.

Note 2: If the system also runs other applications (such as a UI), they can consume GPU RAM as well (Xorg, Firefox, ..., sometimes up to 1 GB of GPU RAM combined).
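
Putting the two notes together, one way to choose `memory_limit` is to start from the card's total memory, subtract what other applications use, and keep a safety margin because TensorFlow may exceed the limit slightly. A sketch: the 1024 MB default for other applications comes from Note 2, the 0.9 margin is an arbitrary assumption, and `memory_limit_mb` is a hypothetical helper.

```python
def memory_limit_mb(total_mb, reserved_for_others_mb=1024, safety_margin=0.9):
    """Suggest a value for VirtualDeviceConfiguration(memory_limit=...).

    Subtracts the memory assumed to be used by other applications
    (Xorg, a browser, ...) and keeps only `safety_margin` of the rest,
    since TensorFlow may use a bit more than the configured limit.
    """
    available = max(total_mb - reserved_for_others_mb, 0)
    return int(available * safety_margin)
```

For an 8 GB card, `memory_limit_mb(8192)` gives 6451, which could replace the hard-coded `1024*4` above.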


Vidit Varshney

I got the same error. The reason is a mismatch between your CUDA/cuDNN version and your TensorFlow version. There are two ways to solve it:

Either downgrade your TensorFlow version (pip install --upgrade tensorflow-gpu==1.8.0), or follow the steps here. Tip: choose your Ubuntu version and follow the steps. :-)


Gangadhar S

I had this same issue with an RTX 2080. The following code worked for me:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Haziq Sheikh

I was having the same issue, but adding these lines of code at the start solved my problem:

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

This works with TensorFlow 2.


This didn't work for me with tensorflow-gpu 2.2, CUDA 10.2 and cuDNN 7.4.2 on CentOS 7; the error asks me to install cuDNN 7.6.4.
@MonaJalal You can either downgrade TensorFlow or upgrade cuDNN for compatibility; check this link: tensorflow.org/install/source#gpu
Karthikeyan Sise

Just add

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

add from tensorflow.compat.v1 import ConfigProto
RadV

I had this problem after upgrading to TF2.0. The following started giving error:

   outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

I am using Ubuntu 16.04.6 LTS (Azure Data Science VM) and TensorFlow 2.0. I upgraded following the instructions on this TensorFlow GPU instructions page, which resolved the issue for me. By the way, it's a bunch of apt-get updates/installs, and I executed all of them.


I did the same for Ubuntu 18.04 and everything is working fine now. But when I run nvidia-smi in the terminal, it shows CUDA 10.2, while here it says that TensorFlow 2.0 is compatible with CUDA 10.0. I don't understand how everything is working. The output of which nvcc in the terminal is /usr/local/cuda-10.0/bin/nvcc
So I think there are 2 independent CUDAs, one for the nvidia driver and another one for the base environment.
I think so. I did not look closely at the CUDA version displayed, and my environment has changed, so I can't check anymore. Interesting info. Thank you.
Emrullah Çelik

I had the same problem. I am using a conda environment, so my packages are managed automatically by conda. I solved the problem by constraining the memory allocation of TensorFlow 2 (Python 3.x):

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

This solved my problem. However, it limits the memory a lot: when I simultaneously ran

nvidia-smi

I saw that usage was about 700 MB. To see more options, you can inspect the code samples on TensorFlow's website:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

In my case the code snip above solved the problem perfectly.

Note: I didn't try installing TensorFlow with pip; this worked with conda-installed TensorFlow.

Ubuntu: 18.04

python: 3.8.5

tensorflow: 2.2.0

cudnn : 7.6.5

cudatoolkit : 10.1.243


Laurin Herbsthofer

As already observed by Anurag Bhalekar above, this can be fixed with a dirty workaround: set up and run a model in your code before loading an old model with load_model() from Keras. It seems this correctly initializes cuDNN, which can then be used for load_model().

In my case, I am using Spyder IDE to run all my python scripts. Specifically, I set up, train and save a CNN in one script. After that, another script loads the saved model for visualization. If I open Spyder and directly run the visualization script to load an old, saved model, I get the same error as mentioned above. I was still able to load the model and to modify it, but when I tried to create a prediction, I got the error.

However, if I first run my training script in a Spyder instance and then run the visualization script in the same Spyder instance, it works fine without any errors:

#training a model correctly initializes cuDNN
model=Sequential()
model.add(Conv2D(32,...))
model.add(Dense(num_classes,...))
model.compile(...)
model.fit() #this all works fine

Then afterwards, the following code including load_model() works fine:

#this script relies on cuDNN already being initialized by the script above
from keras.models import load_model
model = load_model(modelPath) #works
model = Model(inputs=model.inputs, outputs=model.layers[1].output) #works
feature_maps = model.predict(img) #produces the error only if the first piece of code is not run

I could not figure out why this is or how to solve the problem in a different way, but for me, training a small working keras model before using load_model() is a quick and dirty fix that does not require any reinstallation of cuDNN or otherwise.


abdul

I was facing the same issue. I think the GPU was not able to load all the data at once; I resolved it by reducing the batch size.


Paktalin

I was struggling with this problem for a week. The reason was very silly: I used high-res photos for training.

Hopefully, this will save someone's time :)
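
The last two answers come down to the same arithmetic: the memory a batch needs grows linearly with the batch size and with the number of pixels per image. A back-of-the-envelope sketch for the input tensor alone, assuming float32 values (`input_batch_bytes` is a hypothetical helper; a CNN's intermediate activations are usually much larger and scale the same way):

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes for one input batch of shape (batch, height, width, channels).

    bytes_per_value=4 assumes float32 inputs.
    """
    return batch_size * height * width * channels * bytes_per_value


for h, w in [(300, 300), (3000, 4000)]:
    size = input_batch_bytes(8, h, w)
    print(f"batch of 8 at {h}x{w}: {size / 1e6:,.1f} MB for the input alone")
```

A batch of 8 images at 300x300 needs about 8.6 MB for its input, whereas 8 photos at 3000x4000 need about 1.15 GB before a single convolution runs, which is why shrinking either the batch size or the image resolution helps.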


kHarshit

The problem can also occur if you have incompatible versions of cuDNN, which could be the case if you installed TensorFlow with conda, since conda also installs CUDA and cuDNN when installing TensorFlow.

The solution is to install the Tensorflow with pip, and install CUDA and cuDNN separately without conda e.g. if you have CUDA 10.0.130 and cuDNN 7.4.1 (tested configurations), then

pip install tensorflow-gpu==1.13.1

AndrewPt

1) Close all other notebooks that use the GPU.

2) TF 2.0 needs the cuDNN SDK (>= 7.4.1).

Extract it and add the path to its 'bin' folder to "Environment Variables / System variables / Path", e.g. "D:\Programs\x64\Nvidia\cudnn\bin".
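
To confirm the folder actually made it onto the search path, you can check from the same Python environment that runs TensorFlow. A sketch; `dir_on_path` is a hypothetical helper and the cuDNN folder is the example path from this answer. Environment-variable changes only reach processes started afterwards, so restart Jupyter or the terminal after editing Path.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    )


# Example location from the answer; adjust to wherever you extracted cuDNN.
CUDNN_BIN = r"D:\Programs\x64\Nvidia\cudnn\bin"
print(CUDNN_BIN, "on PATH:", dir_on_path(CUDNN_BIN))
```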


Vasco Cansado Carvalho

I had the same problem but with a simpler solution than the others posted here. I have both CUDA 10.0 and 10.2 installed but I only had cuDNN for 10.2 and this version [at the time of this post] is not compatible with TensorFlow GPU. I just installed the cuDNN for CUDA 10.0 and now everything runs fine!


Sivakumar D

Workaround: I did a fresh install of TF 2.0 and ran a simple MNIST tutorial, and it was fine. Then I opened another notebook, tried to run it, and encountered this issue. I exited all notebooks, restarted Jupyter, opened only one notebook, and it ran successfully. The issue seems to be either memory or running more than one notebook on the GPU.

Thanks


BenedictGrain

I had the same problem as you; my config was TensorFlow 1.13.1, CUDA 10.0 and cuDNN 7.6.4. I tried changing cuDNN to version 7.4.2 and, luckily, that solved the problem.


DEEPAK S.V.

Enabling memory growth on GPU at the start of my code solved the problem:

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Num GPUs Available: 1

Reference: https://deeplizard.com/learn/video/OO4HD-1wRN8


高鵬翔

At the start of your notebook or script, add the following lines:

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')

tf.config.experimental.set_memory_growth(physical_devices[0], True)

Jensun Ravichandran

I had a similar problem. TensorFlow complained that it expected a certain version of cuDNN but found a different one. So I downloaded the version it expected from https://developer.nvidia.com/rdp/cudnn-archive and installed it. It now works.


dpacman

If you installed tensorflow-gpu using conda, uninstall the cudnn and cudatoolkit packages that were installed along with it, then re-run the notebook.

NOTE: Uninstalling these two packages the normal way in conda would force a chain of other packages to be uninstalled as well. So use the following commands to uninstall only these packages:

(1) To remove cudatoolkit

conda remove --force cudatoolkit

(2) To remove cudnn

conda remove --force cudnn

Now run Tensorflow, it should work!


J B

Without any rep I can't add this as a comment to the two existing answers above from Anurag and Obnebion, nor can I upvote them, so I'm posting a new answer even though that seems to break the guidelines. Anyway, I originally had the problem that the other answers on this page address, and fixed it, but then re-encountered the same message later when I started to use checkpoint callbacks. At that point, only the Anurag/Obnebion answer was relevant. It turns out I had originally been saving the model as .json and the weights separately as .h5, then using model_from_json along with a separate model.load_weights to get the weights back. That worked (I have CUDA 10.2 and TensorFlow 2.x). It only broke when I switched to the all-in-one save/load_model from the checkpoint callback. This is the small change I made to keras.callbacks.ModelCheckpoint in the _save_model method:

if self.save_weights_only:
    self.model.save_weights(filepath, overwrite=True)
else:
    # Save the architecture as JSON and the weights as .h5 separately,
    # instead of the all-in-one self.model.save(filepath, overwrite=True).
    model_json = self.model.to_json()
    with open(filepath + '.json', 'w') as fb:
        fb.write(model_json)
    self.model.save_weights(filepath + '.h5', overwrite=True)
    # Kludge: also pickle the training history (requires `import pickle`).
    with open(filepath + '-hist.pickle', 'wb') as fb:
        trainhistory = {"history": self.model.history.history,
                        "params": self.model.history.params}
        pickle.dump(trainhistory, fb)
    # self.model.save(filepath, overwrite=True)

The history pickle dump is just a kludge for yet another question on stack overflow, what happens to the history object when you exit early from a Checkpoint callback. Well you can see in the _save_model method there is a line which pulls the loss monitor array out of the logs dict... but never writes it to a file! So I just put in the kludge accordingly. Most people don't recommend using pickles like this. My code is just a hack so it doesn't matter.
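
Reading the history back later mirrors the `pickle.dump` call in the patched method. A minimal sketch, assuming the `filepath + '-hist.pickle'` naming used above (`load_train_history` is a hypothetical helper); only unpickle files you wrote yourself, since loading a pickle can execute arbitrary code.

```python
import pickle


def load_train_history(filepath):
    """Return the {"history": ..., "params": ...} dict saved next to a checkpoint."""
    with open(filepath + "-hist.pickle", "rb") as fb:
        return pickle.load(fb)
```

`load_train_history(filepath)["history"]` then maps metric names such as `"loss"` to their per-epoch values.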


Anurag Bhalekar

It seems like the libraries need some warm up. This isn't an effective solution for production but you can at least carry on with other bugs...

from keras.models import Sequential
import numpy as np
from keras.layers import Dense
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))  # input layer
model.add(Dense(222,activation='relu'))                     #hidden layer
model.add(Dense(100,activation='relu'))   
model.add(Dense(50,activation='relu'))   
model.add(Dense(10,activation='sigmoid'))   
model.compile(optimizer="adam",loss='categorical_crossentropy',metrics=["accuracy"])
x_train = np.reshape(x_train,(60000,784))/255
x_test = np.reshape(x_test,(10000,784))/255
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train) 
y_test = np_utils.to_categorical(y_test)
model.fit(x_train[:1000],y_train[:1000],epochs=1,batch_size=32)

k.akash

Just install TensorFlow with GPU support using this command: pip install tensorflow. You don't need to install the GPU package separately; if you do, there is a high chance the versions will mismatch.

Note, though, that for releases 1.15 and older, the CPU and GPU packages are separate.


Ivan

I struggled with this for a while working on an AWS Ubuntu instance.

Then, I found the solution, which was quite simple in this case.

Do not install tensorflow-gpu with pip (pip install tensorflow-gpu), but with conda (conda install tensorflow-gpu) so that it is in the conda environment and it installs the cudatoolkit and the cudnn in the right environment.

That worked for me, saved my day, and hope it helps somebody else.

See the original solution here from learnermaxRL: https://github.com/tensorflow/tensorflow/issues/24828#issuecomment-453727142


future

If you are a Chinese user, make sure your working path does not contain Chinese characters, and try making your batch_size smaller and smaller. Thanks!
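
A quick way to check the first point is to look for non-ASCII characters in the working directory. A sketch; it flags any non-ASCII path component, which is broader than just Chinese characters, and `non_ascii_path_parts` is a hypothetical helper.

```python
import os


def non_ascii_path_parts(path):
    """Return the components of `path` that contain non-ASCII characters."""
    parts = os.path.normpath(path).split(os.sep)
    return [part for part in parts if any(ord(ch) > 127 for ch in part)]


bad_parts = non_ascii_path_parts(os.getcwd())
if bad_parts:
    print("Working directory has non-ASCII components:", bad_parts)
```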

