
How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function?

I'm trying to implement the binary classification example using the IMDb dataset in Google Colab. I have implemented this model before. But when I tried to do it again after a few days, it returned a value error: 'Object arrays cannot be loaded when allow_pickle=False' for the load_data() function.

I have already tried solving this, referring to an existing answer for a similar problem: How to fix 'Object arrays cannot be loaded when allow_pickle=False' in the sketch_rnn algorithm. But it turns out that just adding an allow_pickle argument isn't sufficient.

My code:

from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

The error:

ValueError                                Traceback (most recent call last)
<ipython-input-1-2ab3902db485> in <module>()
      1 from keras.datasets import imdb
----> 2 (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

2 frames
/usr/local/lib/python3.6/dist-packages/keras/datasets/imdb.py in load_data(path, num_words, skip_top, maxlen, seed, start_char, oov_char, index_from, **kwargs)
     57                     file_hash='599dadb1135973df5b59232a0e9a887c')
     58     with np.load(path) as f:
---> 59         x_train, labels_train = f['x_train'], f['y_train']
     60         x_test, labels_test = f['x_test'], f['y_test']
     61 

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in __getitem__(self, key)
    260                 return format.read_array(bytes,
    261                                          allow_pickle=self.allow_pickle,
--> 262                                          pickle_kwargs=self.pickle_kwargs)
    263             else:
    264                 return self.zip.read(key)

/usr/local/lib/python3.6/dist-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
    690         # The array contained Python objects. We need to unpickle the data.
    691         if not allow_pickle:
--> 692             raise ValueError("Object arrays cannot be loaded when "
    693                              "allow_pickle=False")
    694         if pickle_kwargs is None:

ValueError: Object arrays cannot be loaded when allow_pickle=False
What does this error mean?
@CharlieParker Apparently a parameter has been added to the numpy.load() function. Previously it was np.load(path); now it is np.load(path, allow_pickle=...), and by default allow_pickle is False.
Thanks! But does that mean that numpy now pickles things for me without my permission when saving?! Weird! I looked at the np.savez docs but there was no reference to pickling, so I have no idea how it even knew in the first place that the things I was saving were PyTorch objects and not only numpy... If you know what's going on, share with us :)
My belief after running into the same problem is that it depends entirely on what you are saving to the .npz. If you are saving built-in types, there is no pickling. However, if you write a Python object, python/numpy will pickle it (i.e. serialize it). I imagine this opens up a security risk, so later versions of numpy stopped allowing it by default... just a hunch though.
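A small sketch (file name arbitrary) confirming the hunch above: plain numeric arrays round-trip without pickling, while a Python dict is wrapped in a 0-d object array and pickled on save, so loading it needs allow_pickle=True:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(path, nums=np.arange(3), meta={"key": "value"})

with np.load(path) as f:            # allow_pickle defaults to False
    nums = f["nums"]                # fine: plain int array, never pickled
    try:
        f["meta"]                   # pickled object -> ValueError
    except ValueError:
        pass

with np.load(path, allow_pickle=True) as f:
    meta = f["meta"].item()         # unwrap the 0-d object array back to a dict
```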

Matthew Kerian

Here's a trick to force imdb.load_data to allow pickling: in your notebook, replace this line:

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

by this:

import numpy as np
# save np.load
np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old
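If you use this trick often, a context-manager variant (a sketch, not part of numpy or Keras; the function name is made up) keeps the patch scoped, so np.load is restored even if load_data raises:

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def np_load_allow_pickle():
    """Temporarily make np.load pass allow_pickle=True."""
    original = np.load
    # force allow_pickle=True regardless of what the caller passes
    np.load = lambda *a, **k: original(*a, **{**k, "allow_pickle": True})
    try:
        yield
    finally:
        np.load = original  # restored even if the body raises

# usage sketch:
# with np_load_allow_pickle():
#     (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
```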

I suggest adding "import numpy as np" at the beginning. Numpy may be imported under a different name, or not imported at all...
It helped me a lot.
Getting error TypeError: <lambda>() got multiple values for keyword argument 'allow_pickle'
The problem of multiple values for keyword argument has been addressed in stackoverflow.com/a/58586450/5214998
Tirth Patel

This issue is still open on the Keras git. I hope it gets solved as soon as possible. Until then, try downgrading your numpy version to 1.16.1. It seems to solve the problem.

!pip install numpy==1.16.1
import numpy as np

This version of numpy has the default value of allow_pickle as True.


I would use the solution from MappaGnosis rather than downgrading the numpy version: for me, futzing around with the version dance is a last resort!
1.16.4 has the issue as well
Thanks @kensai. Does anyone know if this got solved in numpy 1.17 ?
This problem is still present in numpy 1.18. I had to switch to numpy 1.16.1, and it is solved now. Thank you.
Nothing much changed from 1.16 to 1.17. This is the most helpful answer.
Madhuparna Bhowmik

I just used allow_pickle = True as an argument to np.load() and it worked for me.

np.load(path, allow_pickle=True)


I am observing that allowing pickle changes the array: comparing the .npy array before saving with the one after loading throws an exception when asserting equality with np.array_equal.
MappaGnosis

Following this issue on GitHub, the official solution is to edit the imdb.py file. This fix worked well for me without the need to downgrade numpy. Find the imdb.py file at tensorflow/python/keras/datasets/imdb.py (full path for me was: C:\Anaconda\Lib\site-packages\tensorflow\python\keras\datasets\imdb.py - other installs will be different) and change line 85 as per the diff:

-  with np.load(path) as f:
+  with np.load(path, allow_pickle=True) as f:

The reason for the change is security: to prevent the Python equivalent of an SQL injection via a pickled file. The change above will ONLY affect the imdb data, and you therefore retain the security elsewhere (by not downgrading numpy).


As I said, I'm using Colab; how can I make changes to the imdb.py file?
This is not a Colab issue as IMDB is downloaded locally the first time you reference it. So, there will be a local copy somewhere on your computer (try the suggested paths above - or, if you set a directory for Colab, try there first) and simply open the imdb.py file in any IDE or even a text editor to make the change (I used Notepad ++ to edit the imdb.py file which was downloaded when working in Jupyter - so a very similar environment to Colab!).
The solution that worked for me is np.load(data_path, encoding='latin1', allow_pickle=True)
This is the solution I use, as messing around with versions (especially of numpy), as in the accepted answer, is something I try to avoid. This is also more pythonic as it explicitly just fixes the problem. (Note also the newest versions of Keras, at github, actually incorporate this fix)
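If you are unsure where imdb.py lives in your install (Colab included), Python can report the file behind any import. module_path below is a hypothetical helper, demonstrated on numpy since it is installed everywhere:

```python
import importlib

def module_path(name):
    """Return the source file of whichever module `name` resolves to."""
    return importlib.import_module(name).__file__

# On your machine:
#   print(module_path("keras.datasets.imdb"))
# and open that file to apply the one-line allow_pickle change.
print(module_path("numpy"))
```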
Hossein

In my case it worked with:

np.load(path, allow_pickle=True)

Gustavo Mirapalheta

I think the answer from cheez (https://stackoverflow.com/users/122933/cheez) is the easiest and most effective one. I'd like to elaborate on it a little so that it does not modify the numpy function for the whole session.

My suggestion is below. I'm using it to download the reuters dataset from keras, which shows the same kind of error:

old = np.load
np.load = lambda *a,**k: old(*a,**k,allow_pickle=True)

from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

np.load = old
del(old)

Can you explain more on what's happening here?
I was not able to load the Keras datasets. I searched the internet and found solutions saying I should edit the imdb.py file; others pointed to changes in the numpy installation (like here) or to changing Tensorflow to a development version. Then I came across cheez's solution. IMHO that was the easiest and most effective one.
@Kanad - lambda is an anonymous function. Gustavo created an augmented version of np.load, used the augmented version, then restored the default.
Brayan Armando Yaquian Gonzale

You can try changing the flag's value

np.load(training_image_names_array,allow_pickle=True)

Great, it's working. This should be the accepted answer.
Farid Khafizov

None of the solutions listed above worked for me: I run Anaconda with Python 3.7.3. What worked for me was:

run "conda install numpy==1.16.1" from Anaconda powershell

close and reopen the notebook


Thanks, that's what I was searching for. By the way, it looks like 1.16.2 is the newest version where allow_pickle=True is the default value.
Yasser Albarbay

On Jupyter Notebook, using

np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

worked fine, but the problem appears when you use this method in Spyder (you have to restart the kernel every time, or you will get an error like):

TypeError: <lambda>() got multiple values for keyword argument 'allow_pickle'

I solved this issue using the solution here:


Leonhard Rathnak

Find the path to imdb.py, then just add the flag to np.load(path, ...flag...):

    def load_data(.......):
    .......................................
    .......................................
    - with np.load(path) as f:
    + with np.load(path,allow_pickle=True) as f:

Otabek

Use this

 from tensorflow.keras.datasets import imdb

instead of this

 from keras.datasets import imdb

ReimuChan

It worked for me:

        np_load_old = np.load
        np.load = lambda *a, **k: np_load_old(*a, allow_pickle=True, **k)
        (x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
        np.load = np_load_old

Add some context explaining why your solution works. (From Review)
SidK

What I have found is that TensorFlow 2.0 (I am using 2.0.0-alpha0) is not compatible with the latest versions of numpy, i.e. v1.17.0 (and possibly v1.16.5+). As soon as TF2 is imported, it throws a huge list of FutureWarnings that look something like this:

FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.

This also resulted in the allow_pickle error when trying to load the imdb dataset from keras.

I tried to use the following solution, which worked just fine, but I had to apply it in every single project where I was importing TF2 or tf.keras.

np_load_old = np.load
np.load = lambda *a, **k: np_load_old(*a, allow_pickle=True, **k)

The easiest solution I found was to either install numpy 1.16.1 globally, or use compatible versions of tensorflow and numpy in a virtual environment.

My goal with this answer is to point out that it's not just a problem with imdb.load_data, but a larger problem caused by the incompatibility of the TF2 and numpy versions, which may result in many other hidden bugs or issues.
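To see whether your environment matches the incompatible pair described above, a quick version check (plain prints, no patching) is enough:

```python
import numpy as np

print("numpy     :", np.__version__)
try:
    import tensorflow as tf
    print("tensorflow:", tf.__version__)
except ImportError:
    print("tensorflow: not installed")
```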


Sajad Norouzi

The answer from @cheez sometimes doesn't work, recursively calling the function again and again. To solve this problem you should keep a separate copy of the function; you can do this using functools.partial, so the final code is:

import numpy as np
from functools import partial
from keras.datasets import imdb

# save np.load
np_load_old = partial(np.load)

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old

Pmpr.ir

I ended up here, tried the approaches above, and could not figure it out.

I was actually working on pre-given code where

pickle.load(path)

was used so i replaced it with

np.load(path, allow_pickle=True)

Carlos S Traynor

The error can also occur if you try to save a Python list of numpy arrays with np.save and load it with np.load. I am only mentioning it for the sake of googlers, so they can check that this is not their issue. Using allow_pickle=True also fixes the problem if a list is indeed what you meant to save and load.
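A minimal sketch of that case (the file name is arbitrary): a list of unequal-length arrays becomes an object array, which np.save pickles, so loading it needs allow_pickle=True, and comparison must be done element by element:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "ragged.npy")
ragged = [np.arange(2), np.arange(3)]           # unequal lengths -> object dtype
np.save(path, np.array(ragged, dtype=object))   # pickled on save

loaded = np.load(path, allow_pickle=True)       # fails without the flag

# compare element by element rather than on the outer object array:
ok = all(np.array_equal(a, b) for a, b in zip(ragged, loaded))
print(ok)
```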


Shaina Raza

This error occurs when you have an older version of torch, like 1.6.0 with torchvision==0.7.0. You can check your torch version with this command:

import torch
print(torch.__version__)

This error is already resolved in newer versions of torch.

You can remove the error by making the following change in np.load():

np.load(somepath, allow_pickle=True)

The allow_pickle=True will solve it.


Mustafa Sakhai

[Fast Solution] I got it working by setting "allow_pickle" when calling np.load:

labels = np.load("Labels",allow_pickle=True)


Ivan Borshchov

There are a lot of answers, but to really understand the issue I recommend just trying this simple example:

a=np.array([[1, 2, 3], [4, 5, 6]])
# Object array
b={'data':'somet',
   'data_2':'defin'}
#Save arrays into file
np.savez('/content/123.npz', a=a, b=b)
#Load file into data variable
data = np.load('/content/123.npz')
print(data['b'])

This simple example already reproduces the error: the dictionary was serialized (pickled) into the npz file.

Now just try replacing the np.load line with:

data = np.load('/content/123.npz',allow_pickle=True)

And it works! Source of example: fix object arrays cannot be loaded when allow_pickle=False


Wissam

Yes, installing a previous version of numpy solved the problem.

For those who use the PyCharm IDE:

In my IDE (PyCharm), under File -> Settings -> Project Interpreter, I found my numpy version to be 1.16.3, so I reverted to 1.16.1: click +, type numpy in the search box, tick "Specify version" (1.16.1), and choose "Install Package".


jww

I don't usually post to these things, but this was super annoying. The confusion comes from the fact that some of the Keras imdb.py files have already been updated from:

with np.load(path) as f:

to the version with allow_pickle=True. Make sure to check the imdb.py file to see whether this change was already implemented. If it has been, the following works fine:

from keras.datasets import imdb
(train_text, train_labels), (test_text, test_labels) = imdb.load_data(num_words=10000)
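To check whether your installed copy already contains the fix without opening an editor, a crude source scan works. mentions_allow_pickle is a hypothetical helper, demonstrated here on np.load (whose signature carries the flag) since keras may not be installed everywhere:

```python
import inspect
import numpy as np

def mentions_allow_pickle(obj):
    """Crude check: does the object's source code reference allow_pickle?"""
    return "allow_pickle" in inspect.getsource(obj)

# On your machine, check the Keras loader itself, e.g.:
#   from keras.datasets import imdb
#   print(mentions_allow_pickle(imdb.load_data))
print(mentions_allow_pickle(np.load))
```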

Nasif Imtiaz Ohi

The easiest way is to edit imdb.py, passing allow_pickle=True to np.load on the line where imdb.py throws the error.


Mudasir Habib

I was facing the same issue; here is the line from the error:

File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 260, in __getitem__

So I solved the issue by updating the npyio.py file. Line 196 of npyio.py assigns the value of allow_pickle, so I updated that line to:

self.allow_pickle = True

Sabito 錆兎 stands with Ukraine

Instead of

from keras.datasets import imdb

use

from tensorflow.keras.datasets import imdb

top_words = 10000
((x_train, y_train), (x_test, y_test)) = imdb.load_data(num_words=top_words, seed=21)

Joocheol Kim

Tensorflow has a fix in tf-nightly version.

!pip install tf-nightly

The current version is '2.0.0-dev20190511'.


Prometheus

If you are loading a compressed storage file, such as the npz format, then the code below will work:

np.load(path, allow_pickle=True)

Make sure that when specifying the path you surround it with single quotes, and that allow_pickle=True is not inside any quotes.


Fenil

Changing to this line of code worked for me and solved the error.

data_dict = np.load(data_path, encoding='latin1', allow_pickle=True).item()

Do check that your numpy module is imported properly; this replaces the deprecated call.


Welcome to Stack Overflow. This question is over 3 years old and has many existing answers, including an accepted answer with a score of over 150 points. Are you entirely sure that this answer introduces something new? If so, can you edit to clarify? Which line should be modified, what is the specific modification (not just the whole line, but specifically what are you changing), and why? See How to Answer.
Yes, basically it occurs due to a deprecated version of tensorflow. So if someone is working in an editor such as VS Code rather than a notebook, as mentioned in that popular answer, replacing the line of code I provided solves the allow_pickle issue. Thanks!
