
How to query as GROUP BY in Django?

I query a model:

Members.objects.all()

And it returns:

Eric, Salesman, X-Shop
Freddie, Manager, X2-Shop
Teddy, Salesman, X2-Shop
Sean, Manager, X2-Shop

What I want is to know the best Django way to fire a group_by query to my database, like:

Members.objects.all().group_by('designation')

Which doesn't work, of course. I know we can do some tricks on django/db/models/query.py, but I am just curious to know how to do it without patching.


Flimm

If you mean to do aggregation you can use the aggregation features of the ORM:

from django.db.models import Count
result = (Members.objects
    .values('designation')
    .annotate(dcount=Count('designation'))
    .order_by()
)

This results in a query similar to

SELECT designation, COUNT(designation) AS dcount
FROM members GROUP BY designation

and the output would be of the form

[{'designation': 'Salesman', 'dcount': 2}, 
 {'designation': 'Manager', 'dcount': 2}]

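If you leave out order_by(), you may get incorrect results: a default ordering on the model (Meta.ordering) or an earlier order_by() call can add its fields to the GROUP BY clause and split the groups. Calling order_by() with no arguments clears any such ordering.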
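
To see why a stray ordering column matters, here is a small sketch that runs the grouped SQL directly with Python's built-in sqlite3 module rather than through Django. The table layout (name, designation, shop) and the rows are assumptions taken from the question; the second query only imitates what can happen when an ordering field such as name ends up in the GROUP BY clause.

```python
import sqlite3

# In-memory stand-in for the members table from the question (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, designation TEXT, shop TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        ("Eric", "Salesman", "X-Shop"),
        ("Freddie", "Manager", "X2-Shop"),
        ("Teddy", "Salesman", "X2-Shop"),
        ("Sean", "Manager", "X2-Shop"),
    ],
)

# Grouping by designation only: one row per designation.
grouped = conn.execute(
    "SELECT designation, COUNT(designation) AS dcount "
    "FROM members GROUP BY designation ORDER BY designation"
).fetchall()

# An extra column in GROUP BY (as a leftover ordering might add) splits the groups.
split = conn.execute(
    "SELECT designation, COUNT(designation) AS dcount "
    "FROM members GROUP BY designation, name ORDER BY designation, name"
).fetchall()

print(grouped)
print(split)
```

With the extra column, every member forms its own group, so each count drops to 1.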

If you want to include multiple fields in the results, just add them as arguments to values, for example:

    .values('designation', 'first_name', 'last_name')

References:

Django documentation: values(), annotate(), and Count

Django documentation: Aggregation, and in particular the section entitled Interaction with default ordering or order_by()


@Harry: You can chain it. Something like: Members.objects.filter(date=some_date).values('designation').annotate(dcount=Count('designation'))
I have a question: this query only returns designation and dcount. What if I want to get other values from the table too?
Note that if your sorting is a field other than designation, it will not work without resetting the sort. See stackoverflow.com/a/1341667/202137
@Gidgidonihah True, the example should read Members.objects.order_by('designation').values('designation').annotate(dcount=Count('designation'))
3 revs, 3 users 82%

An easy solution, but not the proper way, is to use raw SQL:

results = Members.objects.raw('SELECT * FROM myapp_members GROUP BY designation')

Another solution is to use the group_by property:

from django.db.models import QuerySet

query = Members.objects.all().query
query.group_by = ['designation']
results = QuerySet(query=query, model=Members)

You can now iterate over the results variable to retrieve your results. Note that group_by is not documented and may change in future versions of Django.

And... why do you want to use group_by? If you don't need aggregation, you can use order_by to achieve a similar result.


Can you please tell me how to do it using order_by??
Hi, if you are not using aggregation you could emulate group_by by using an order_by and eliminating the entries you don't need. Of course, this is an emulation and is only usable when there is not a lot of data. Since he didn't mention aggregation, I thought it could be a solution.
Hey, this is great. Can you please explain how to use execute_sql? It doesn't appear to work.
Note this no longer works on Django 1.9. stackoverflow.com/questions/35558120/…
This is kind of a hack-ish way to use the ORM. You shouldn't have to instantiate new querysets passing in old ones manually.
inostia

You can also use the regroup template tag to group by attributes. From the docs:

cities = [
    {'name': 'Mumbai', 'population': '19,000,000', 'country': 'India'},
    {'name': 'Calcutta', 'population': '15,000,000', 'country': 'India'},
    {'name': 'New York', 'population': '20,000,000', 'country': 'USA'},
    {'name': 'Chicago', 'population': '7,000,000', 'country': 'USA'},
    {'name': 'Tokyo', 'population': '33,000,000', 'country': 'Japan'},
]

...

{% regroup cities by country as countries_list %}

<ul>
    {% for country in countries_list %}
        <li>{{ country.grouper }}
            <ul>
            {% for city in country.list %}
                <li>{{ city.name }}: {{ city.population }}</li>
            {% endfor %}
            </ul>
        </li>
    {% endfor %}
</ul>

Looks like this:

India
    Mumbai: 19,000,000
    Calcutta: 15,000,000
USA
    New York: 20,000,000
    Chicago: 7,000,000
Japan
    Tokyo: 33,000,000

It also works on QuerySets I believe.

source: https://docs.djangoproject.com/en/2.1/ref/templates/builtins/#regroup

edit: note the regroup tag does not work as you would expect it to if your list of dictionaries is not key-sorted. It works iteratively. So sort your list (or query set) by the key of the grouper before passing it to the regroup tag.
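
The effect of that iterative grouping can be sketched in plain Python with itertools.groupby, which likewise only merges consecutive items. This is an illustration of the behaviour described in the edit note, not the template tag itself; the shortened cities list and the group_names helper are made up for the example.

```python
from itertools import groupby

# Deliberately not sorted by country: the two Indian cities are not adjacent.
cities = [
    {"name": "Mumbai", "country": "India"},
    {"name": "New York", "country": "USA"},
    {"name": "Calcutta", "country": "India"},
]


def group_names(rows):
    """Group consecutive rows by country and collect the city names."""
    return [
        (country, [row["name"] for row in group])
        for country, group in groupby(rows, key=lambda row: row["country"])
    ]


unsorted_groups = group_names(cities)
sorted_groups = group_names(sorted(cities, key=lambda row: row["country"]))

print(unsorted_groups)  # India shows up twice
print(sorted_groups)    # one group per country
```

With a queryset, ordering by the grouping field before passing it to the template plays the same role as the sorted() call here.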


This is perfect! I've searched a lot for a simple way to do this. And it works on querysets as well, that's how I used it.
This is totally wrong if you read a big set of data from the database and then only use the aggregated values.
@SławomirLenart sure, this might not be as efficient as a straight DB query. But for simple use cases it can be a nice solution
This will work if the result is shown in a template. But for a JsonResponse or another indirect response, this solution will not work.
@Willysatrionugroho if you wanted to do it in a view, for example, stackoverflow.com/questions/477820/… might work for you
Luis Masuelli

Django does not support free-form GROUP BY queries. I learned that the hard way. The ORM is not designed to support what you want to do without custom SQL. You are limited to:

Raw SQL (i.e. MyModel.objects.raw())

cursor.execute() statements (and hand-made parsing of the result).

.annotate() (the GROUP BY is performed on the child model, in examples like aggregating lines_count=Count('lines')).

On a queryset qs you can set qs.query.group_by = ['field1', 'field2', ...], but it is risky if you don't know which query you are editing, and there is no guarantee that it will work without breaking the internals of the QuerySet object. Besides, it is an internal (undocumented) API that you should not access directly, since the code may stop being compatible with future Django versions.
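
As a rough sketch of the cursor route with hand-made parsing, the helper below turns rows from any DB-API cursor into dictionaries. It uses the standard library's sqlite3 as a stand-in so it runs on its own; in a Django project the cursor would presumably come from django.db.connection.cursor() instead. The table, its columns and the helper name dictfetchall are assumptions for illustration.

```python
import sqlite3


def dictfetchall(cursor):
    """Return all remaining rows of a DB-API cursor as dicts keyed by column name."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database; table and column names are assumed for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, designation TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("Eric", "Salesman"), ("Freddie", "Manager"), ("Teddy", "Salesman"), ("Sean", "Manager")],
)

cursor = conn.cursor()
cursor.execute(
    "SELECT designation, COUNT(*) AS dcount "
    "FROM members GROUP BY designation ORDER BY designation"
)
rows = dictfetchall(cursor)
print(rows)
```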


indeed you are limited not only in free group-by, so try SQLAlchemy instead of Django ORM.
Risadinha

The following module allows you to group Django models and still work with a QuerySet in the result: https://github.com/kako-nawao/django-group-by

For example:

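from django.db.models import (
    Avg, Count, DecimalField, ForeignKey, Model, QuerySet, TextField)
from django.views.generic import ListView
from django_group_by import GroupByMixin

class BookQuerySet(QuerySet, GroupByMixin):
    pass

class Book(Model):
    title = TextField(...)
    author = ForeignKey(User, ...)
    shop = ForeignKey(Shop, ...)
    price = DecimalField(...)

    # Attach the custom QuerySet so that Book.objects.group_by() is available.
    objects = BookQuerySet.as_manager()

class GroupedBookListView(PaginationMixin, ListView):
    template_name = 'book/books.html'
    model = Book
    paginate_by = 100

    def get_queryset(self):
        # Order by the grouped fields (the model has 'title', not 'name').
        return Book.objects.group_by('title', 'author').annotate(
            shop_count=Count('shop'), price_avg=Avg('price')).order_by(
            'title', 'author').distinct()

    def get_context_data(self, **kwargs):
        return super().get_context_data(total_count=self.get_queryset().count(), **kwargs)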

'book/books.html'

<ul>
{% for book in object_list %}
    <li>
        <h2>{{ book.title }}</h2>
        <p>{{ book.author.last_name }}, {{ book.author.first_name }}</p>
        <p>{{ book.shop_count }}</p>
        <p>{{ book.price_avg }}</p>
    </li>
{% endfor %}
</ul>

The difference from basic Django annotate/aggregate queries is that you can use the attributes of a related field, e.g. book.author.last_name.

If you need the PKs of the instances that have been grouped together, add the following annotation:

.annotate(pks=ArrayAgg('id'))

NOTE: ArrayAgg is a Postgres-specific function (django.contrib.postgres.aggregates.ArrayAgg), available from Django 1.9 onwards: https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/aggregates/#arrayagg


This django-group-by is an alternative to the values method. It's for different purpose I think.
@LShi It's not an alternative to values, of course not. values is an SQL select while group_by is an SQL group by (as the name indicates...). Why the downvote? We are using such code in production to implement complex group_by statements.
Its doc says group_by "behaves mostly like the values method, but with one difference..." The doc doesn't mention SQL GROUP BY and the use case it provides doesn't suggest it has anything to do with SQL GROUP BY. I will draw back the down-vote when someone has made this clear, but that doc is really misleading.
After reading the doc for values, I found I missed that values itself works like a GROUP BY. It's my fault. I think it's simpler to use itertools.groupby than this django-group-by when values is insufficient.
It is impossible to do the group by from above with a simple values call -with or without annotate and without fetching everything from the database. Your suggestion of itertools.groupby works for small datasets but not for several thousands of datasets that you probably want to page. Of course, at that point you'll have to think about a special search index that contains prepared (already grouped) data, anyway.
ralfzen

You could also use Python's built-in itertools.groupby directly:

from itertools import groupby

designation_key_func = lambda member: member.designation
# groupby() only merges consecutive items, so order by the grouping key first.
queryset = Members.objects.order_by("designation")

for designation, member_group in groupby(queryset, designation_key_func):
    print(f"{designation} : {list(member_group)}")

No raw SQL, subqueries, third-party libraries or template tags are needed, and in my eyes it is Pythonic and explicit.
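
Because itertools.groupby only merges adjacent items, the input has to be sorted by the key first. A possible alternative that needs no sorting is to collect the groups into a dictionary in a single pass. The Member namedtuple below is a hypothetical stand-in for model instances so the sketch runs without a database. Like the groupby version, it still loads every row into Python, so it suits small result sets.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for model instances returned by a queryset.
Member = namedtuple("Member", ["name", "designation"])

members = [
    Member("Eric", "Salesman"),
    Member("Freddie", "Manager"),
    Member("Teddy", "Salesman"),
    Member("Sean", "Manager"),
]

# One pass over the rows; the input order does not matter.
groups = defaultdict(list)
for member in members:
    groups[member.designation].append(member.name)

for designation, names in groups.items():
    print(f"{designation} : {names}")
```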


what about the performance??
djvg

The documentation says that you can use values() to group the queryset.

class Travel(models.Model):
    interest = models.ForeignKey(Interest, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    time = models.DateTimeField(auto_now_add=True)

# Find the travel and group by the interest:

>>> Travel.objects.values('interest').annotate(Count('user'))
<QuerySet [{'interest': 5, 'user__count': 2}, {'interest': 6, 'user__count': 1}]>
# the interest (id=5) has been visited 2 times,
# and the interest (id=6) has been visited only once.

>>> Travel.objects.values('interest').annotate(Count('user', distinct=True)) 
<QuerySet [{'interest': 5, 'user__count': 1}, {'interest': 6, 'user__count': 1}]>
# the interest (id=5) has been visited by only one person
# (but this person visited it 2 times)
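
The difference between the two counts can be reproduced in plain SQL. The sketch below uses sqlite3 with an assumed travel table (interest_id, user_id) holding three visits that match the output above; it shows COUNT versus COUNT(DISTINCT ...), which is roughly what distinct=True asks the database for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table: one row per visit.
conn.execute("CREATE TABLE travel (interest_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO travel VALUES (?, ?)",
    [(5, 1), (5, 1), (6, 2)],  # user 1 visited interest 5 twice
)

# Counts every visit per interest.
visits = conn.execute(
    "SELECT interest_id, COUNT(user_id) FROM travel "
    "GROUP BY interest_id ORDER BY interest_id"
).fetchall()

# Counts each visitor once per interest.
visitors = conn.execute(
    "SELECT interest_id, COUNT(DISTINCT user_id) FROM travel "
    "GROUP BY interest_id ORDER BY interest_id"
).fetchall()

print(visits)
print(visitors)
```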

You can find all the books and group them by name using this code:

Book.objects.values('name').annotate(Count('id')).order_by() # ensure you add the order_by()

You can find a cheat sheet here.


Why do you need group_by() to return the right result?
Van Gale

You need to do custom SQL as exemplified in this snippet:

Custom SQL via subquery

Or in a custom manager as shown in the online Django docs:

Adding extra Manager methods


Kind of a roundabout solution. I would have used it if I had some extended use for it, but here I just need the number of members per designation, that's all.
No problem. I thought about mentioning 1.1 aggregation features but made the assumption you were using the release version :)
It's all about using raw queries, which show the weakness of Django's ORM.
rumbarum

This is a little complex, but it gets the asker what they expected with only one DB hit.

from django.db.models import OuterRef, Subquery

member_qs = Members.objects.filter(
    pk__in=Members.objects.values('designation')
    .distinct()
    .annotate(
        pk=Subquery(
            Members.objects.filter(designation=OuterRef("designation"))
            .order_by("pk")  # you can set other column, e.g. -pk, create_date...
            .values("pk")[:1]
        )
    )
    .values_list("pk", flat=True)
)

Raekkeri

If, in other words, you need to just "remove duplicates" based on some field, and otherwise just to query the ORM objects as they are, I came up with the following workaround:

from django.db.models import OuterRef, Exists

qs = Members.objects.all()
# A row is a duplicate if an earlier row (lower id) has the same designation.
qs = qs.annotate(is_duplicate=Exists(
    Members.objects.filter(
        id__lt=OuterRef('id'),
        designation=OuterRef('designation'))))
qs = qs.filter(is_duplicate=False)

So, basically we're just annotating the is_duplicate value by using some convenient filtering (which might vary based on your model and requirements), and then simply using that field to filter out the duplicates.
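
The idea behind the annotation can be sketched as plain SQL with NOT EXISTS. The example below runs against an assumed members table in sqlite3; the exact SQL Django generates may well differ, so treat it only as an illustration of the "no earlier row with the same designation" filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema mirroring the question's data, with explicit ids.
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, designation TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        (1, "Eric", "Salesman"),
        (2, "Freddie", "Manager"),
        (3, "Teddy", "Salesman"),
        (4, "Sean", "Manager"),
    ],
)

# Keep a row only if no row with a lower id shares its designation.
first_per_designation = conn.execute(
    "SELECT m.id, m.name, m.designation FROM members AS m "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM members AS d "
    "  WHERE d.id < m.id AND d.designation = m.designation"
    ") ORDER BY m.id"
).fetchall()

print(first_per_designation)
```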


Flimm

If you want the model objects, and not just plain values or dictionaries, you can do something like this:

members = Member.objects.filter(foobar=True)
designations = Designation.objects.filter(member__in=members).order_by('pk').distinct()

Replace member__in with the lowercase version of your model name, followed by __in. For example, if your model name is Car, use car__in.


Özer

For some reason, the above mentioned solutions did not work for me. This is what worked:

from django.db.models import Count

dupes_query = MyModel.objects.all().values('my_field').annotate(
    count=Count('id')
).order_by('-count').filter(count__gt=1)
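
Filtering on an annotated count after values() corresponds to a HAVING clause in SQL. As a sketch of that shape, here is the query written by hand against an assumed items table in sqlite3; the table name, the alias n and the sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table standing in for MyModel.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, my_field TEXT)")
conn.executemany(
    "INSERT INTO items (my_field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# Values that occur more than once, most frequent first.
dupes = conn.execute(
    "SELECT my_field, COUNT(id) AS n FROM items "
    "GROUP BY my_field HAVING COUNT(id) > 1 ORDER BY n DESC"
).fetchall()

print(dupes)
```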

I hope it helps.


Kiran S youtube channel
from django.db.models import Sum
Members.objects.annotate(total=Sum('designation'))

First you need to import Sum, then pass the field name to it as a string.