python – django多对多字段:仅预取主键

我正在尝试优化Django应用程序的数据库查询.这是一个简化的例子:

class Label(models.Model):
    name = models.CharField(max_length=200)
    # ... many other fields ...

class Thing(models.Model):
    name = models.CharField(max_length=200)
    labels = models.ManyToManyField(Label)

我有一个函数可以获取所有标签和事物并将它们放入JSON数据结构中,其中Things使用它们的ID(主键)引用标签.像这样的东西:

{
    'labels': [
        { 'id': 123, 'name': 'label foo' },
        ...
    ],
    'things': [
        { 'id': 45, 'name': 'thing bar', 'labels': [ 123, ... ] },
        ...
    ]
}

使用Django获取此类数据结构的最有效方法是什么?假设我有L标签和T物品,平均物品有x标签.

方法1:

data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in Thing.objects.all()]

这使得(11T)数据库查询,因为model_to_dict(thing)需要单独获取每个Thing的标签.

方法2:

data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in
                    Thing.objects.prefetch_related('labels').all()]

这仅使(111)数据库查询,因为现在获取的东西在单个附加查询中预取了其标签.

这仍然不能令人满意. prefetch_related(‘labels’)将获取同一Label的许多副本,而我只需要它们的id.有没有办法只预取标签的ID?我尝试了prefetch_related(‘labels__id’),但是没有用.我也担心因为T很大(数百),prefetch_related(‘labels’)导致带有大IN子句的SQL查询. L小得多(<10),所以我可以这样做: 方法3:

data = {}
data['labels'] = [model_to_dict(label) for label in
                    Label.objects.prefetch_related('thing_set').all()]
things = list(Thing.objects.all())
# plug in label ids by hand, and also fetch things that have zero labels
# somehow

这会导致较小的IN子句,但仍然不能令人满意,因为如果Thing有多个标签,prefetch_related(‘thing_set’)会获取重复的东西.

摘要:

Label和Thing由ManyToManyField连接.无论如何,我正在获取所有标签和物品.那么我如何有效地获取他们的多对多关系呢?

我知道了.感谢ilvar,他对这个问题的评论指向了 through tables.

If you don’t specify an explicit through model, there is still an
implicit through model class you can use to directly access the table
created to hold the association. It has three fields to link the
models.

长话短说:

# Fetch all labels and things:
labels = list(Label.objects.all())
things = list(Thing.objects.all())
# Fetch all label-thing pairs:
labels_of = defaultdict(lambda: [])
for pair in Thing.labels.through.objects.filter(label__in=labels):
    labels_of[pair.thing_id].append(pair.label_id)
# Put everything together:
data = {}
data['labels'] = [model_to_dict(label) for label in labels]
data['things'] = []
for thing in things:
    thing_dict = model_to_dict(thing, exclude='labels')
    thing_dict['labels'] = labels_of[thing.id]
    data['things'].append(thing_dict)

这会产生(111)查询,并且不会重复获取任何内容.我也可以将第一个for循环更改为:

for pair in Thing.labels.through.objects.filter(thing__in=things):

如果我有比标签更多的标签,这将导致具有较小的IN子句的查询.

Django-debug-toolbar的debugsqlshell管理命令非常适合实际查看一段代码正在进行的查询.

相关文章
相关标签/搜索