
What are data classes and how are they different from common classes?

With PEP 557, data classes were introduced into the Python standard library.

They make use of the @dataclass decorator and they are supposed to be "mutable namedtuples with default" but I'm not really sure I understand what this actually means and how they are different from common classes.

What exactly are python data classes and when is it best to use them?

Given the extensive content of the PEP, what else could you want to know? namedtuples are immutable and can't have default values for the attributes, whereas data classes are mutable and can have them.
@jonrsharpe Seems reasonable to me that there should be a stackoverflow thread on the subject. Stackoverflow is meant to be an encyclopedia in Q&A format, no? The answer is never "just look on this other website." There shouldn't have been downvotes here.
There are five threads on how to append an item to a list. One question on @dataclass won't cause the site to disintegrate.
@jonrsharpe namedtuples CAN have default values. Have a look here: stackoverflow.com/questions/11351032/…

aerin

Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you make a data class.

What the dataclasses module does is to make it easier to create data classes. It takes care of a lot of boilerplate for you.

This is especially useful when your data class must be hashable, because that requires a __hash__ method as well as an __eq__ method. If you also add a custom __repr__ method for ease of debugging, the class can become quite verbose:

class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
            self, 
            name: str, 
            unit_price: float,
            quantity_on_hand: int = 0
        ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
    
    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})')

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) == 
            (other.name, other.unit_price, other.quantity_on_hand))

With dataclasses you can reduce it to:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.

namedtuple classes are also data classes, but are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured such that they can fill the same role as a namedtuple class.
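To make the contrast with namedtuple concrete, here is a minimal sketch. The PointNT and PointDC names are made up for illustration and are not from the answer above, and the defaults argument to namedtuple assumes Python 3.7+:

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical example types, not part of the original answer.
PointNT = namedtuple("PointNT", ["x", "y"], defaults=[0])  # y defaults to 0


@dataclass
class PointDC:
    x: int
    y: int = 0


nt = PointNT(1)
dc = PointDC(1)

# A namedtuple is a sequence: it unpacks and compares equal to a plain tuple.
x, y = nt
nt_equals_tuple = nt == (1, 0)

# A dataclass instance is not a sequence and only compares equal
# to instances of its own class.
dc_equals_tuple = dc == (1, 0)

# Dataclass attributes can be reassigned by default...
dc.y = 5

# ...whereas namedtuple attributes cannot.
try:
    nt.y = 5
    nt_is_mutable = True
except AttributeError:
    nt_is_mutable = False
```

Adding frozen=True to the decorator would make PointDC reject the assignment as well, which is one way a dataclass can fill the namedtuple role.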

The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).

If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions, see the solutions for day 7, day 8, day 11 and day 20.

If you want to use the dataclasses module in Python versions < 3.7, you can install the backported module (requires 3.6) or use the attrs project mentioned above.


In the first example do you intentionally hide class members with instance members of the same names? Please help understanding this idiom.
@VladimirLenin: there are no class attributes, there are only type annotations. See PEP 526, specifically the Class and instance variable annotations section.
@Bananach: the @dataclass generates roughly the same __init__ method, with a quantity_on_hand keyword argument with default value. When you create an instance, it'll set the quantity_on_hand instance attribute, always. So my first, non-dataclass example uses the same pattern to echo what the dataclass generated code will do.
@Bananach: so in the first example, we could just omit setting an instance attribute and not shadow the class attribute, it is redundant setting it anyway in that sense, but dataclasses do set it.
@user2853437 your use case isn’t really supported by dataclasses; perhaps you would be better off using dataclasses’ bigger cousin, attrs. That project supports per-field converters that let you normalise field values. If you want to stick with dataclasses, then yes, do normalisation in the __post_init__ method.
pdaawr

Overview

The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.

What exactly are python data classes and when is it best to use them?

code generators: generate boilerplate code; you can choose to implement special methods in a regular class or have a dataclass implement them automatically.

data containers: structures that hold data (e.g. tuples and dicts), often with dotted attribute access, such as classes, namedtuple and others.

"mutable namedtuples with default[s]"

Here is what the latter phrase means:

mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).

namedtuple: you have dotted, attribute access like a namedtuple or a regular class.

default: you can assign default values to attributes.

Compared to common classes, you primarily save on typing boilerplate code.

Features

This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).

What you get

Here are features you get by default from dataclasses.

Attributes + Representation + Comparison

import dataclasses


@dataclasses.dataclass
#@dataclasses.dataclass()                                       # alternative
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

These defaults are provided by automatically setting the following keywords to True:

@dataclasses.dataclass(init=True, repr=True, eq=True)

What you can turn on

Additional features are available if the appropriate keywords are set to True.

Order

@dataclasses.dataclass(order=True)
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

The ordering methods are now implemented (overloading operators: < > <= >=), similarly to functools.total_ordering with stronger equality tests.
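As a rough illustration of what order=True buys you, the sketch below sorts a few Color instances. The sample values are arbitrary; the comparison behaves as if each instance were the tuple of its fields in definition order:

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


colors = [Color(0, 50, 0), Color(), Color(0, 0, 255)]

# Instances compare as if they were the tuples (r, g, b).
ordered = sorted(colors)

# Comparing against a different type is rejected rather than guessed at.
try:
    Color() < (0, 0, 0)
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False
```

Here ordered ends up as Color(0, 0, 0), Color(0, 0, 255), Color(0, 50, 0), and the comparison with a plain tuple raises TypeError.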

Hashable, Mutable

@dataclasses.dataclass(unsafe_hash=True)                        # override base `__hash__`
class Color:
    ...

Although the object is potentially mutable (possibly undesired), a hash is implemented.

Hashable, Immutable

@dataclasses.dataclass(frozen=True)                             # with `eq=True` (default), `__hash__` is generated
class Color:
    ...

A hash is now implemented and changing the object or assigning to attributes is disallowed.

Overall, the object is hashable if either unsafe_hash=True or frozen=True.

See also the original hashing logic table with more details.
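The hashing rules can be checked with a short sketch. PlainColor and FrozenColor are names introduced here only to keep the two variants side by side:

```python
import dataclasses


@dataclasses.dataclass
class PlainColor:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass(frozen=True)
class FrozenColor:
    r: int = 0
    g: int = 0
    b: int = 0


# With the defaults (eq=True, frozen=False), __hash__ is set to None.
try:
    hash(PlainColor())
    plain_is_hashable = True
except TypeError:
    plain_is_hashable = False

# Frozen instances hash by field values, so they work as dict keys.
names = {FrozenColor(255, 0, 0): "red"}
lookup = names[FrozenColor(255, 0, 0)]

# Assigning to a field of a frozen instance raises FrozenInstanceError,
# which is a subclass of AttributeError.
try:
    FrozenColor().r = 10
    frozen_is_mutable = True
except dataclasses.FrozenInstanceError:
    frozen_is_mutable = False
```

So a plain dataclass is unhashable out of the box, while the frozen one can be used in sets and as a dict key.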

What you don't get

To get the following features, special methods must be manually implemented:

Unpacking

@dataclasses.dataclass
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
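With that __iter__ in place, instances can be unpacked like a tuple. A minimal usage sketch (the class is repeated so the snippet runs on its own):

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        # Yield the field values in definition order.
        yield from dataclasses.astuple(self)


r, g, b = Color(128, 0, 255)
as_list = list(Color())
```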

Optimization

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "b", "g"]
    r : int
    g : int
    b : int

The size reported for the class is now reduced:

>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888

In some circumstances, __slots__ also improves the speed of creating instances and accessing attributes. Note that fields listed in a hand-written __slots__ cannot have default values; assigning one raises a ValueError.

See more on slots in this blog post.
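The effects of a hand-written __slots__ can be sketched as follows. BrokenColor is a hypothetical class that exists only to trigger the error:

```python
import dataclasses


@dataclasses.dataclass
class SlottedColor:
    __slots__ = ("r", "g", "b")
    r: int
    g: int
    b: int


color = SlottedColor(128, 0, 255)

# Slotted instances carry no per-instance __dict__ ...
has_instance_dict = hasattr(color, "__dict__")

# ... so attributes outside __slots__ cannot be added.
try:
    color.alpha = 0.5
    can_add_attribute = True
except AttributeError:
    can_add_attribute = False

# A default value becomes a class attribute, which clashes with the slot name.
try:
    @dataclasses.dataclass
    class BrokenColor:
        __slots__ = ("r",)
        r: int = 0
    default_allowed = True
except ValueError:
    default_allowed = False
```

On Python 3.10 and later, the decorator also accepts slots=True, which generates __slots__ for you and permits default values.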

Summary Table

+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
|       Feature        |       Keyword        |                      Example                       |           Implement in a Class          |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes           |  init                |  Color().r -> 0                                    |  __init__                               |
| Representation       |  repr                |  Color() -> Color(r=0, g=0, b=0)                   |  __repr__                               |
| Comparison*          |  eq                  |  Color() == Color(0, 0, 0) -> True                 |  __eq__                                 |
|                      |                      |                                                    |                                         |
| Order                |  order               |  sorted([Color(0, 50, 0), Color()]) -> ...         |  __lt__, __le__, __gt__, __ge__         |
| Hashable             |  unsafe_hash/frozen  |  {Color(), Color()} -> {Color(r=0, g=0, b=0)}      |  __hash__                               |
| Immutable            |  frozen + eq         |  Color().r = 10 -> FrozenInstanceError             |  __setattr__, __delattr__               |
|                      |                      |                                                    |                                         |
| Unpacking+           |  -                   |  r, g, b = Color()                                 |   __iter__                              |
| Optimization+        |  -                   |  sys.getsizeof(SlottedColor) -> 888                |  __slots__                              |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+

+These methods are not automatically generated and require manual implementation in a dataclass.

* __ne__ is not needed and thus not implemented.

Additional features

Post-initialization

@dataclasses.dataclass
class RGBA:
    r : int = 0
    g : int = 0
    b : int = 0
    a : float = 1.0

    def __post_init__(self):
        self.a : int =  int(self.a * 255)


RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)

Inheritance

@dataclasses.dataclass
class RGBA(Color):
    a : int = 0

Conversions

Convert a dataclass to a tuple or a dict, recursively:

>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
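The "recursively" part matters once dataclasses are nested. In this sketch, Pixel is a hypothetical class added only to hold a Color:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class Pixel:  # hypothetical container, not from the answer
    x: int
    y: int
    color: Color


pixel = Pixel(3, 4, Color(128, 0, 255))

# Nested dataclass instances are converted too.
as_dict = dataclasses.asdict(pixel)
as_tuple = dataclasses.astuple(pixel)

# The conversion is one-way: rebuilding from the dict leaves `color` a dict.
rebuilt = Pixel(**as_dict)
color_is_dict = isinstance(rebuilt.color, dict)
```

The last two lines hint at why nested dataclasses can get complicated: nothing converts the inner dict back into a Color for you.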

Limitations

Lacks mechanisms to handle starred arguments

Working with nested dataclasses can be complicated

References

R. Hettinger's talk on Dataclasses: The code generator to end all code generators

T. Hunner's talk on Easier Classes: Python Classes Without All the Cruft

Python's documentation on hashing details

Real Python's guide on The Ultimate Guide to Data Classes in Python 3.7

A. Shaw's blog post on A brief tour of Python 3.7 data classes

E. Smith's github repository on dataclasses


I would give two likes if it were possible. Very nice answer @pylang. I tip my hat to you Sir/Madam ;)
This is a much better answer than the accepted one. Bravo!
I do love these micro-blog length extended responses. Well formatted, sectioned into digestible headings, code snippets and a references section.
any idea why duck typing / type inference e.g. @dataclasses.dataclass class RGB(r=255,g=0,b=0) were not supported? For the basic struct type that shorthand is important to me
Would it make sense to use a @dataclass while turning all features off? And with what exactly would you end up with?
Mahmoud Hanafy

From the PEP specification:

A class decorator is provided which inspects a class definition for variables with type annotations as defined in PEP 526, "Syntax for Variable Annotations". In this document, such variables are called fields. Using these fields, the decorator adds generated method definitions to the class to support instance initialization, a repr, comparison methods, and optionally other methods as described in the Specification section. Such a class is called a Data Class, but there's really nothing special about the class: the decorator adds generated methods to the class and returns the same class it was given.

The @dataclass decorator adds methods to the class that you'd otherwise define yourself, like __init__, __repr__, __eq__ and, with order=True, __lt__ and __gt__.
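The PEP's point that the decorator "returns the same class it was given" can be checked directly. A small sketch, with Point as a made-up example class and default decorator options assumed:

```python
from dataclasses import dataclass


class Point:  # hypothetical example class
    x: int
    y: int = 0


names_before = set(vars(Point))

# Apply the decorator by hand instead of with the @ syntax.
Decorated = dataclass(Point)

# With default options, the very same class object comes back, modified in place.
same_class = Decorated is Point

# Names that the decorator added to the class namespace.
generated = sorted(set(vars(Point)) - names_before)

p = Point(1)
```

The exact contents of generated vary by Python version, but they include __init__, __repr__ and __eq__.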


prosti

Consider this simple class Foo

from dataclasses import dataclass

@dataclass
class Foo:
    def bar(self):
        pass

Here is the dir() built-in comparison. On the left-hand side is the Foo without the @dataclass decorator, and on the right is with the @dataclass decorator.

https://i.stack.imgur.com/pvmn9.png

Here is another diff, after using the inspect module for comparison.

https://i.stack.imgur.com/ARkeJ.png
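A similar comparison can be sketched in code. PlainFoo is a name introduced here for the undecorated variant, and the exact list of added names depends on the Python version:

```python
from dataclasses import dataclass


class PlainFoo:
    def bar(self):
        pass


@dataclass
class Foo:
    def bar(self):
        pass


# Names visible on the decorated class but not on the plain one.
added_names = sorted(set(dir(Foo)) - set(dir(PlainFoo)))

# dir() hides methods that merely override ones inherited from object,
# so check the class namespace for those.
overridden = sorted(
    name
    for name in ("__init__", "__repr__", "__eq__")
    if name in vars(Foo) and name not in vars(PlainFoo)
)
```

added_names contains bookkeeping attributes such as __dataclass_fields__ and __dataclass_params__, while overridden shows that __init__, __repr__ and __eq__ are now defined on Foo itself.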