ChatGPT解决这个技术问题 Extra ChatGPT

How to write a multidimensional array to a text file?

In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?

This array consist of four 11 x 14 arrays, so I should format it with a nice newline, to make the reading of the file easier on others.

Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error:

TypeError: float argument required, not numpy.ndarray

I assume that this is because the function doesn't work with multidimensional arrays? Any solutions as I would like them within one file?


C
Community

If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to it's full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. This (a 2D array) works fine

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    
    # Iterating through a ndimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I'm just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print new_data.shape

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))
    
# Just to check that they're the same...
assert np.all(new_data == data)

There's a much easier solution now to this here problem: yourStrArray = np.array([str(val) for val in yourMulDArray],dtype='string'); np.savetxt('YourTextFile.txt',yourStrArray,fmt='%s')
@GregKramida and how do you recover the array?
@Juanlu001: I know that numpy.loadtxt(...) also accepts a dtype argument, which can be set to np.string_. I would give that a shot, first and formost. There is also a numpy.fromstring(...) for parsing arrays from strings.
Hey what if I need to store an image array? How would we resize that if the image size is lets say, 512 x 512?
z
zyy

I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that's not a primary concern, just pickle it.

To save it:

import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
output = open('data.pkl', 'wb')
pickle.dump(my_data, output)
output.close()

To read it back:

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

pkl_file.close()

You might not need pprint in order to print the dictionary.
a
aseagram

If you don't need a human-readable output, another option you could try is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.

Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably some other programs/languages as well.

Here is an example:

import numpy as np
import scipy.io

# Some test data
x = np.arange(200).reshape((4,5,10))

# Specify the filename of the .mat file
matfile = 'test_mat.mat'

# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')

# For the above line, I specified the kwarg oned_as since python (2.7 with 
# numpy 1.6.1) throws a FutureWarning.  Here, this isn't really necessary 
# since oned_as is a kwarg for dealing with 1-D arrays.

# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)

# And just to check if the data is the same:
assert np.all(x == matdata['out'])

If you forget the key that the array is named in the .mat file, you can always do:

print matdata.keys()

And of course you can store many arrays using many more keys.

So yes – it won't be readable with your eyes, but only takes 2 lines to write and read the data, which I think is a fair trade-off.

Take a look at the docs for scipy.io.savemat and scipy.io.loadmat and also this tutorial page: scipy.io File IO Tutorial


a
atomh33ls

ndarray.tofile() should also work

e.g. if your array is called a:

a.tofile('yourfile.txt',sep=" ",format="%s")

Not sure how to get newline formatting though.

Edit (credit Kevin J. Black's comment here):

Since version 1.5.0, np.tofile() takes an optional parameter newline='\n' to allow multi-line output. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html


But is there a way to create original array from the texfile?
tofile doesn't have newline='\n'.
N
Nico Schlömer

File I/O can often be a bottleneck in codes. That's why it's important to know that ASCII I/O is always magnitudes slower that binary I/O. I've compared some of the suggested solutions with perfplot:

https://i.stack.imgur.com/rRTpy.png

Code to reproduce the plot:

import json
import pickle

import numpy as np
import perfplot
import scipy.io


def numpy_save(data):
    np.save("test.dat", data)


def numpy_savetxt(data):
    np.savetxt("test.txt", data)


def numpy_savetxt_fmt(data):
    np.savetxt("test.txt", data, fmt="%-7.2f")


def pickle_dump(data):
    with open("data.pkl", "wb") as f:
        pickle.dump(data, f)


def scipy_savemat(data):
    scipy.io.savemat("test.dat", mdict={"out": data})


def numpy_tofile(data):
    data.tofile("test.txt", sep=" ", format="%s")


def json_dump(data):
    with open("test.json", "w") as f:
        json.dump(data.tolist(), f)


perfplot.save(
    "out.png",
    setup=np.random.rand,
    n_range=[2 ** k for k in range(20)],
    kernels=[
        numpy_save,
        numpy_savetxt,
        numpy_savetxt_fmt,
        pickle_dump,
        scipy_savemat,
        numpy_tofile,
        json_dump,
    ],
    equality_check=None,
)

s
shantanu pathak

You can also store NumPy multidimensional array data in .npy file type(it's a binary file).

Use numpy save() function to store data to a file:

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) #shape (3x3)
np.save('filename.npy', a)

Unload using numpy load() function:

b = np.load('filename.npy')

Sometimes I see an extension .npz. When should I use that?
R
Ronny Brendel

There exist special libraries to do just that. (Plus wrappers for python)

netCDF4: http://www.unidata.ucar.edu/software/netcdf/

netCDF4 Python interface: http://www.unidata.ucar.edu/software/netcdf/software.html#Python

HDF5: http://www.hdfgroup.org/HDF5/

hope this helps


j
jwueller

You can simply traverse the array in three nested loops and write their values to your file. For reading, you simply use the same exact loop construction. You will get the values in exactly the right order to fill your arrays correctly again.


B
BennyD

I have a way to do it using a simply filename.write() operation. It works fine for me, but I'm dealing with arrays having ~1500 data elements.

I basically just have for loops to iterate through the file and write it to the output destination line-by-line in a csv style output.

import numpy as np

trial = np.genfromtxt("/extension/file.txt", dtype = str, delimiter = ",")

with open("/extension/file.txt", "w") as f:
    for x in xrange(len(trial[:,1])):
        for y in range(num_of_columns):
            if y < num_of_columns-2:
                f.write(trial[x][y] + ",")
            elif y == num_of_columns-1:
                f.write(trial[x][y])
        f.write("\n")

The if and elif statement are used to add commas between the data elements. For whatever reason, these get stripped out when reading the file in as an nd array. My goal was to output the file as a csv, so this method helps to handle that.

Hope this helps!


A
Andrew Fan

Pickle is best for these cases. Suppose you have a ndarray named x_train. You can dump it into a file and revert it back using the following command:

import pickle

###Load into file
with open("myfile.pkl","wb") as f:
    pickle.dump(x_train,f)

###Extract from file
with open("myfile.pkl","rb") as f:
    x_temp = pickle.load(f)

k
kenorb

Use JSON module for multidimensional arrays, e.g.

import json
with open(filename, 'w') as f:
   json.dump(myndarray.tolist(), f)

k
kenorb

Write to a file with Python's print():

import numpy as np
import sys

stdout_sys = sys.stdout
np.set_printoptions(precision=8) # Sets number of digits of precision.
np.set_printoptions(suppress=True) # Suppress scientific notations.
np.set_printoptions(threshold=sys.maxsize) # Prints the whole arrays.
with open('myfile.txt', 'w') as f:
    sys.stdout = f
    print(nparr)
    sys.stdout = stdout_sys

Use set_printoptions() to customize how the objects are displayed.


J
JH S

If your array is numpy.array or torch.tensor and dimension is under 4. Use this code.

# from util.npa2csv import Visualarr; Visualarr(x)
import numpy as np
import torch

def Visualarr(arr, out = 'array_out.txt'):
    dim = arr.ndim 
    if isinstance(arr, np.ndarray):
        # (#Images, #Chennels, #Row, #Column)
        if dim == 4:
            arr = arr.transpose(3,2,0,1)
        if dim == 3:
            arr = arr.transpose(2,0,1)

    if isinstance(arr, torch.Tensor):
        arr = arr.numpy()
    
    
    with open(out, 'w') as outfile:    
        outfile.write('# Array shape: {0}\n'.format(arr.shape))
        
        if dim == 1 or dim == 2:
            np.savetxt(outfile, arr, fmt='%-7.3f')

        elif dim == 3:
            for i, arr2d in enumerate(arr):
                outfile.write('# {0}-th channel\n'.format(i))
                np.savetxt(outfile, arr2d, fmt='%-7.3f')

        elif dim == 4:
            for j, arr3d in enumerate(arr):
                outfile.write('\n# {0}-th Image\n'.format(j))
                for i, arr2d in enumerate(arr3d):
                    outfile.write('# {0}-th channel\n'.format(i))
                    np.savetxt(outfile, arr2d, fmt='%-7.3f')

        else:
            print("Out of dimension!")

    

def test_va():
    arr = np.random.rand(4,2)
    tens = torch.rand(2,5,6,3)
    Visualarr(arr)

test_va()