
Pretty-print an entire Pandas Series / DataFrame

I work with Series and DataFrames on the terminal a lot. The default __repr__ for a Series returns a reduced sample, with some head and tail values, but the rest missing.

Is there a builtin way to pretty-print the entire Series / DataFrame? Ideally, it would support proper alignment, perhaps borders between columns, and maybe even color-coding for the different columns.

The reduced output is due to the default options, which you can change with, for example, pd.set_option('display.max_rows', 1000). The colouring is something else; I assume you are talking about colouring the HTML repr output. I don't think that is built in at all.
@EdChum: thanks, I knew about this display.max_rows, the problem is that most of the time I do want output to be truncated. It is only occasionally that I wish to see the full output. I could set the option to a very high value, use the default __repr__, then revert the value, but that seems a bit cumbersome, and I might as well write my own pretty-print function in that case.
@EdChum: regarding colors - this is a color terminal, so it would be nice to have each row printed in a different color, to easily distinguish values from each other. Pandas works well with ipython, which uses advanced terminal features - including color - so I was wondering if Pandas had some coloring capabilities itself.
I use Pandas in IPython Notebook rather than IPython as a terminal shell. I don't see any option in set_option that supports colouring; it may be something that could be done as a plugin that applies some CSS or output formatting. That is the only way I think you could achieve this.
Colouring the output, like the tibble data structure in R, which colours negative values red, would be a nice plugin for pandas.

harmonica141

You can also use the option_context, with one or more options:

with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df)

This will automatically return the options to their previous values.

If you are working in a Jupyter notebook, using display(df) instead of print(df) will use Jupyter's rich display logic.
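
As a rough sketch of the restore behaviour described above, the snippet below widens the display inside the block and then checks that the earlier value is back afterwards. The frame `df` and the width of 200 are illustrative assumptions, not values from the answer.

```python
import pandas as pd

# Illustrative data (an assumption): 100 rows, more than the default row limit.
df = pd.DataFrame({"a": range(100), "b": range(100, 200)})

before = pd.get_option("display.max_rows")

with pd.option_context(
    "display.max_rows", None,     # None means "no limit"
    "display.max_columns", None,
    "display.width", 200,         # assumed value; pick one that suits your terminal
):
    full_text = repr(df)          # the same text that print(df) would show
    print(full_text)

# Outside the block the previous setting is in place again.
after = pd.get_option("display.max_rows")
print(before == after)
```

Capturing `repr(df)` inside the block is only there to make the effect easy to inspect; in everyday use a plain `print(df)` inside the `with` is enough.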


Thank you! Note that setting the max values to None turns them off. Using the with pd.option_context() option documents what is going on very clearly and explicitly, and makes it clear how to achieve other changes in output formatting that may be desired, using e.g. precision, max_colwidth, expand_frame_repr, colheader_justify, date_yearfirst, encoding, and many many more: pandas.pydata.org/pandas-docs/stable/options.html
For anyone who wonders: when using Jupyter, use display(df) instead of print(df)
If the DataFrame is really large, it might make sense to write it as a .csv temporarily and use Jupyter Lab's fast csv viewer
To avoid wrapping columns below each other you can also add ..., 'display.width', 100, ... (with an appropriate value) to the context-manager.
Can anyone explain why Pandas syntax is always so haphazard? Why can it not be pd.option_context(display_max_rows=None)? Or pd.option_context({'display.max_rows': None}) or some other reasonable syntax?
Andrey Shokhin

No need to hack settings. There is a simple way:

print(df.to_string())
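
The same call is available on a Series. A minimal sketch, assuming an illustrative 200-element Series named `s`, that compares the default repr with the `to_string()` output:

```python
import pandas as pd

s = pd.Series(range(200), name="value")  # illustrative data (an assumption)

short = repr(s)       # default repr: head and tail only
full = s.to_string()  # every row, regardless of display.max_rows

# Compare how many lines each form produces.
print(len(short.splitlines()), len(full.splitlines()))
```

Because `to_string()` returns an ordinary `str`, the result can also be written to a file or passed to a pager instead of printed.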

How many columns do you have? I've checked with 1300 columns and it works fine: from itertools import combinations; from string import ascii_letters; df = pd.DataFrame(data=[[0]*1326], index=[0], columns=[(a+b) for a,b in combinations(ascii_letters, 2)])
Using the with pd.option_context() option documents what is going on much more clearly and explicitly, and makes it clear how to achieve other changes in output formatting that may be desired, using e.g. precision, max_colwidth, expand_frame_repr, colheader_justify, date_yearfirst, encoding, and many many more: pandas.pydata.org/pandas-docs/stable/options.html
I do prefer the other answers because this looks weird in my example if I have a lot of columns and my screen is not wide enough to display them. Column names and data will do separate line breaks, so it's not easy to see which data belongs to which column name anymore.
The asker requested a "pretty-print" solution. This is not it. If this were used within Jupyter Notebook, the built-in pretty display wouldn't be used at all. It's better to use pd.set_option('display.max_rows', None) just before printing df.
This may not be the solution to the question asked, but it is exactly what I was looking for to just view the df and move on.
Donald Duck

Sure, if this comes up a lot, make a function like this one. You can even configure it to load every time you start IPython: https://ipython.org/ipython-doc/1/config/overview.html

def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')
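
One possible variant of this helper, sketched here with pd.option_context instead of set_option/reset_option, so that whatever value was in effect before the call is restored afterwards (reset_option goes back to the pandas default instead). The value 20 in the usage lines is only an illustration.

```python
import pandas as pd


def print_full(x):
    """Print a Series or DataFrame in full, then restore the caller's options."""
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        print(x)


# Usage: a non-default value set earlier in the session survives the call.
pd.set_option("display.max_rows", 20)
print_full(pd.Series(range(50)))
kept = pd.get_option("display.max_rows")
print(kept)  # 20

pd.reset_option("display.max_rows")  # tidy up after the demonstration
```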

As for coloring, getting too elaborate with colors sounds counterproductive to me, but I agree something like bootstrap's .table-striped would be nice. You could always create an issue to suggest this feature.
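
For a terminal, a striped effect can be approximated by hand. The sketch below is not a pandas feature: `striped` is a hypothetical helper that wraps every other data row of `to_string()` in an ANSI colour code. It assumes the terminal understands ANSI escapes and that the frame prints with a single header line (no named index or column MultiIndex).

```python
import pandas as pd

CYAN = "\033[36m"   # ANSI escape: cyan foreground
RESET = "\033[0m"   # ANSI escape: back to the default colour


def striped(df: pd.DataFrame, color: str = CYAN) -> str:
    """Return df.to_string() with every second data row coloured."""
    header, *rows = df.to_string().splitlines()
    out = [header]
    for i, row in enumerate(rows):
        # Rows at odd positions get the colour, the others stay plain.
        out.append(f"{color}{row}{RESET}" if i % 2 else row)
    return "\n".join(out)


# Illustrative data (an assumption).
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune", "Kiev"], "n": [1, 2, 3, 4]})
text = striped(df)
print(text)
```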


The link is dead. Perhaps it should be ipython.org/ipython-doc/dev/config/intro.html?
It would be great, if someone, anyone, even the author maybe, could verify and fix the link and flag these comments as obsolete.
This is bad, as it assumes that the option was set to default before the printing operation which is not necessarily the case and might therefore lead to unexpected behavior. Using the option context in conjunction with the with statement is the more robust option and will revert to anything that was set before.
Doing it like this prints without any table formatting. Is it possible to format the output as it would usually appear when calling 'df' at the end of a cell?
lucidyan

After importing pandas, as an alternative to using the context manager, set such options for displaying entire dataframes:

pd.set_option('display.max_columns', None)  # or 1000
pd.set_option('display.max_rows', None)  # or 1000
pd.set_option('display.max_colwidth', None)  # or 199

For full list of useful options, see:

pd.describe_option('display')
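
Because set_option changes the setting for the rest of the session, it can help to know how to inspect and undo it. A short sketch using pd.get_option and pd.reset_option:

```python
import pandas as pd

pd.set_option("display.max_rows", None)
print(pd.get_option("display.max_rows"))  # None

# Put this one option back to its pandas default.
pd.reset_option("display.max_rows")
current = pd.get_option("display.max_rows")
print(current)
```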

Thanks for adding this. "None" is way better than the actual length of each single dataframe if you want to display more than one dataframe.
@Corrumpo For some options you should use -1 int value instead of None, if you want full representation
Prefixing display. in the option name doesn't seem to be necessary. For example, set_option('max_columns') works equally well.
minus 1 does not work anymore. None does the job.
Urda

Use the tabulate package:

pip install tabulate

And consider the following example usage:

import pandas as pd
from io import StringIO
from tabulate import tabulate

c = """Chromosome Start End
chr1 3 6
chr1 5 7
chr1 8 9"""

df = pd.read_table(StringIO(c), sep=r"\s+", header=0)  # raw string avoids an invalid-escape warning

print(tabulate(df, headers='keys', tablefmt='psql'))

+----+--------------+---------+-------+
|    | Chromosome   |   Start |   End |
|----+--------------+---------+-------|
|  0 | chr1         |       3 |     6 |
|  1 | chr1         |       5 |     7 |
|  2 | chr1         |       8 |     9 |
+----+--------------+---------+-------+

tabulate goes haywire when printing a pd.Series.
@eliu Thanks for the info. You always have pd_series.to_frame()
Asclepius

Using pd.options.display

This answer is a variation of the prior answer by lucidyan. It makes the code more readable by avoiding the use of set_option.

After importing pandas, as an alternative to using the context manager, set such options for displaying large dataframes:

def set_pandas_display_options() -> None:
    """Set pandas display options."""
    # Ref: https://stackoverflow.com/a/52432757/
    display = pd.options.display

    display.max_columns = 1000
    display.max_rows = 1000
    display.max_colwidth = 199
    display.width = 1000
    # display.precision = 2  # set as needed

set_pandas_display_options()

After this, you can use either display(df) or just df if using a notebook, otherwise print(df).

Using to_string

Pandas 0.25.3 does have DataFrame.to_string and Series.to_string methods which accept formatting options.
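
A small sketch of two of those formatting options, `index` and `float_format`, on an illustrative frame (the data is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.23456, 22.5, 3.0]})

text = df.to_string(
    index=False,                   # leave out the index column
    float_format="{:.2f}".format,  # callable applied to each float: two decimals
)
print(text)
```

Other parameters such as `columns`, `header`, `na_rep` and `justify` are listed in the method's documentation.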

Using to_markdown

If what you need is markdown output, Pandas 1.0.0 has DataFrame.to_markdown and Series.to_markdown methods.

Using to_html

If what you need is HTML output, Pandas 0.25.3 does have a DataFrame.to_html method but not a Series.to_html. Note that a Series can be converted to a DataFrame.


Yes, this appears to be a more elegant way of displaying in Jupyter than set_option. Is there a way to left-align the displayed output? The rows of a displayed dataframe are right-aligned by default.
Additional tip: you may need to use print(...). Examples: print(df.to_string()) or print(df.to_markdown())
R Kisyula

If you are using IPython Notebook (Jupyter), you can use HTML:

from IPython.display import HTML, display  # public import path; also brings in display
display(HTML(df.to_html()))

please show the output for comparison with other solutions, Tnx.
Beware of trying to show a big DataFrame with this. You might run out of memory and never be able to open your notebook again unless you edit the raw code in your .ipynb file. True story ;)
This is the best option for me. The table is displayed in full with coloring. Nice one!
Liang Zulin

Try this

# pd.set_option('display.height', 1000)  # 'display.height' no longer exists in current pandas; 'display.max_rows' covers it
pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)

AKW

Scripts

Nobody has proposed this simple plain-text solution:

from pprint import pprint

pprint(s.to_dict())

which produces results like the following:

{'% Diabetes': 0.06365372374283895,
 '% Obesity': 0.06365372374283895,
 '% Bachelors': 0.0,
 '% Poverty': 0.09548058561425843,
 '% Driving Deaths': 1.1775938892425206,
 '% Excessive Drinking': 0.06365372374283895}

Jupyter Notebooks

Additionally, when using Jupyter notebooks, this is a great solution.

Note: pd.Series() has no .to_html() so it must be converted to pd.DataFrame()

from IPython.display import display, HTML

display(HTML(s.to_frame().to_html()))

which produces results like the following:

https://i.stack.imgur.com/QMmSK.png


data princess

datascroller was created in part to solve this problem.

pip install datascroller

It loads the dataframe into a terminal view you can "scroll" with your mouse or arrow keys, kind of like an Excel workbook at the terminal that supports querying, highlighting, etc.

import pandas as pd
from datascroller import scroll

# Call `scroll` with a Pandas DataFrame as the sole argument:
my_df = pd.read_csv('<path to your csv>')
scroll(my_df)

Disclosure: I am one of the authors of datascroller


Giorgos Myrianthous

You can set expand_frame_repr to False:

display.expand_frame_repr : boolean Whether to print out the full DataFrame repr for wide DataFrames across multiple lines, max_columns is still respected, but the output will wrap-around across multiple “pages” if its width exceeds display.width. [default: True]

pd.set_option('expand_frame_repr', False)
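
To see what the option changes, the sketch below builds an illustrative 30-column frame (an assumption) and compares the repr with the option on and off, at a fixed display.width of 80 and with column truncation disabled:

```python
import pandas as pd

# Illustrative wide frame (an assumption): 30 columns, 3 rows.
df = pd.DataFrame([[0] * 30] * 3, columns=[f"col_{i:02d}" for i in range(30)])

with pd.option_context("display.max_columns", None, "display.width", 80):
    with pd.option_context("display.expand_frame_repr", True):
        wrapped = repr(df)    # columns are split into several stacked blocks
    with pd.option_context("display.expand_frame_repr", False):
        unwrapped = repr(df)  # one long line per row

# The wrapped form uses more lines than the unwrapped one.
print(len(wrapped.splitlines()), len(unwrapped.splitlines()))
```

With the option off, each row stays on one line, so the terminal (or a pager such as `less -S`) decides how to handle lines wider than the screen.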

For more details read How to Pretty-Print Pandas DataFrames and Series


Abhinav Ravi

You can achieve this with the method below. Just pass the total number of columns in the DataFrame as the value of

'display.max_columns'

For example:

df = pd.DataFrame(..)  # placeholder: build your DataFrame here
with pd.option_context('display.max_rows', None, 'display.max_columns', df.shape[1]):
    print(df)

JSVJ

Try using the display() function. It automatically adds horizontal and vertical scroll bars, and it lets you show different datasets easily without resorting to print().

display(dataframe)

display() also supports proper alignment.

However, if you want to make the dataset look nicer, you can check pd.option_context(). It has a lot of options for showing the dataframe clearly.

Note: I am using Jupyter Notebooks.