Consider simple Django models Event
and Participant
:
class Event(models.Model):
title = models.CharField(max_length=100)
class Participant(models.Model):
event = models.ForeignKey(Event, db_index=True)
is_paid = models.BooleanField(default=False, db_index=True)
It's easy to annotate events query with total number of participants:
events = Event.objects.all().annotate(participants=models.Count('participant'))
How to annotate with count of participants filtered by is_paid=True
?
I need to query all events regardless of number of participants, e.g. I don't need to filter by annotated result. If there are 0
participants, that's ok, I just need 0
in annotated value.
The example from documentation doesn't work here, because it excludes objects from query instead of annotating them with 0
.
Update. Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0,
output_field=models.IntegerField()
)))
Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below.
Update 3. For Django 3.x please check this answer below.
Conditional aggregation in Django 2.0 allows you to further reduce the amount of faff this has been in the past. This will also use Postgres' filter
logic, which is somewhat faster than a sum-case (I've seen numbers like 20-30% bandied around).
Anyway, in your case, we're looking at something as simple as:
from django.db.models import Q, Count
events = Event.objects.annotate(
paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)
There's a separate section in the docs about filtering on annotations. It's the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.
Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0, output_field=models.IntegerField()
)))
Count
(instead of Sum
) I guess we should set default=None
(if not using the django 2 filter
argument).
UPDATE
The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.
Event.objects.annotate(
num_paid_participants=Subquery(
Participant.objects.filter(
is_paid=True,
event=OuterRef('pk')
).values('event')
.annotate(cnt=Count('pk'))
.values('cnt'),
output_field=models.IntegerField()
)
)
I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).
For older version, the same can be achieved using .extra
Event.objects.extra(select={'num_paid_participants': "\
SELECT COUNT(*) \
FROM `myapp_participant` \
WHERE `myapp_participant`.`is_paid` = 1 AND \
`myapp_participant`.`event_id` = `myapp_event`.`id`"
})
.extra
, as I prefer to avoid SQL in Django :) I'll update the question.
Django 1.8.2
, so i guess you are with that version and that's why its working for you. You can read more about that here and here
None
too. My solution was to use Coalesce
(from django.db.models.functions import Coalesce
). You use it like this: Coalesce(Subquery(...), 0)
. There may be a better approach, though.
I would suggest to use the .values
method of your Participant
queryset instead.
For short, what you want to do is given by:
Participant.objects\
.filter(is_paid=True)\
.values('event')\
.distinct()\
.annotate(models.Count('id'))
A complete example is as follow:
Create 2 Events: event1 = Event.objects.create(title='event1')
event2 = Event.objects.create(title='event2')
Add Participants to them: part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))\
for _ in range(10)]
part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))\
for _ in range(50)]
Group all Participants by their event field: Participant.objects.values('event')
>
The only drawback is that you have to retrieve the Event
afterwards as you only have the id
from the method above.
What result I am looking for:
People (assignee) who have tasks added to a report. - Total Unique count of People
People who have tasks added to a report but, for task whose billability is more than 0 only.
In general, I would have to use two different queries:
Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()
But I want both in one query. Hence:
Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))
Result:
<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>
For Django 3.x just write filter after the annotate:
User.objects.values('user_id')
.annotate(xyz=models.Count('likes'))
.filter(xyz__gt=100)
In above xyz is not the model field in User Model and here we are filtering the users who have likes (or xyz) more than 100.
Success story sharing
aggregate
usage is shown. Have you already tested such queries? (I haven't and I want to believe! :)