
I'm working on a multi-tenanted application in which some users can define their own data fields (via the admin) to collect additional data in forms and report on the data. The latter bit makes JSONField not a great option, so instead I have the following solution:

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True

Note how CustomDataField has a ForeignKey to Site - each Site will have a different set of custom data fields, but use the same database. Then the various concrete data fields can be defined as:

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together=(('user','custom_field'),)

This leads to the following use:

custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site) #probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, value='Libra')
user.custom_data.add(user_sign) #actually, what does this even do?

But this feels very clunky, particularly with the need to manually create the related data and associate it with the concrete model. Is there a better approach?

Options that have been pre-emptively discarded:

Custom SQL to modify tables on-the-fly. Partly because this won't scale and partly because it's too much of a hack.

Schema-less solutions like NoSQL. I have nothing against them, but they're still not a good fit. Ultimately this data is typed, and the possibility exists of using a third-party reporting application.

JSONField, as listed above, as it's not going to work well with queries.

Pre-emptively, this is not any of these questions: stackoverflow.com/questions/7801729/… stackoverflow.com/questions/2854656/…

39 revs, 2 users 88%

As of today, there are four available approaches, two of them requiring a certain storage backend:

Django-eav (the original package is no longer maintained but has some thriving forks)

This solution is based on the Entity Attribute Value data model: essentially, it uses several tables to store dynamic attributes of objects. The great parts about this solution are that it:

uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;

allows you to effectively attach/detach dynamic attribute storage to a Django model with simple commands like:

eav.unregister(Encounter)
eav.register(Patient)

integrates nicely with the Django admin;

is, at the same time, really powerful.

Downsides:

Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.

Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.

You will need to select one of the forks, since the official package is no longer maintained and there is no clear leader.
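To make the EAV pattern itself concrete, independent of django-eav, here is a minimal sketch using only Python's built-in sqlite3 module. The table names (entity, attribute, attr_value) and the helpers load_attributes and find_entities are hypothetical and chosen for illustration; they are not django-eav's schema or API. The sketch shows the two downsides listed above: rows have to be merged back into key-value pairs by hand, and integrity relies on a multi-column unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE attr_value (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        val TEXT,
        -- the multi-column unique key: one value per (entity, attribute)
        UNIQUE (entity_id, attribute_id)
    );
""")
conn.execute("INSERT INTO entity (id, name) VALUES (1, 'Bob')")
conn.executemany(
    "INSERT INTO attribute (id, name) VALUES (?, ?)",
    [(1, "age"), (2, "city")],
)
conn.executemany(
    "INSERT INTO attr_value (entity_id, attribute_id, val) VALUES (?, ?, ?)",
    [(1, 1, "12"), (1, 2, "New York")],
)


def load_attributes(conn, entity_id):
    """Merge the per-attribute rows of one entity into a single dict."""
    rows = conn.execute(
        "SELECT a.name, v.val FROM attr_value v "
        "JOIN attribute a ON a.id = v.attribute_id "
        "WHERE v.entity_id = ?",
        (entity_id,),
    ).fetchall()
    return dict(rows)


def find_entities(conn, attribute, pattern):
    """Return names of entities whose attribute value matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT e.name FROM entity e "
        "JOIN attr_value v ON v.entity_id = e.id "
        "JOIN attribute a ON a.id = v.attribute_id "
        "WHERE a.name = ? AND v.val LIKE ? ORDER BY e.name",
        (attribute, pattern),
    ).fetchall()
    return [name for (name,) in rows]


bob = load_attributes(conn, 1)                 # all dynamic attributes of Bob
matches = find_entities(conn, "city", "%Y%")   # entities whose city contains 'Y'
```

Every value is stored as text in this sketch, so the age comes back as a string; a real EAV implementation would typically keep one value column per datatype.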
The usage of django-eav is pretty straightforward:

import eav
from django.db.models import Q
from eav.models import Attribute, EnumGroup, EnumValue
from app.models import Patient, Encounter

eav.register(Encounter)
eav.register(Patient)
Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)

yes = EnumValue.objects.create(value='yes')
no = EnumValue.objects.create(value='no')
unknown = EnumValue.objects.create(value='unknown')
ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
ynu.enums.add(yes)
ynu.enums.add(no)
ynu.enums.add(unknown)

Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM, enum_group=ynu)

# When you register a model within EAV,
# you can access all of its EAV attributes:
Patient.objects.create(name='Bob', eav__age=12, eav__fever=no, eav__city='New York', eav__country='USA')

# You can filter queries based on their EAV fields:
query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
query2 = Q(eav__city__contains='Y') | Q(eav__fever=no)

Hstore, JSON or JSONB fields in PostgreSQL

PostgreSQL supports several more complex data types. Most were originally supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

HStoreField:

Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

This approach is good in the sense that it lets you have the best of both worlds: dynamic fields and a relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.
#app/models.py
from django.contrib.postgres.fields import HStoreField
from django.db import models

class Something(models.Model):
    name = models.CharField(max_length=32)
    data = HStoreField(db_index=True)

In Django's shell you can use it like this:

>>> instance = Something.objects.create(name='something', data={'a': '1', 'b': '2'})
>>> instance.data['a']
'1'
>>> empty = Something.objects.create(name='empty')
>>> empty.data
{}
>>> empty.data['a'] = '1'
>>> empty.save()
>>> Something.objects.get(name='something').data['a']
'1'

You can issue indexed queries against hstore fields:

# equivalence
Something.objects.filter(data={'a': '1', 'b': '2'})

# subset by key/value mapping
Something.objects.filter(data__a='1')

# subset by list of keys
Something.objects.filter(data__has_keys=['a', 'b'])

# subset by single key
Something.objects.filter(data__has_key='a')

JSONField:

JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, and also tend to be faster and (for JSONB) more compact than Hstore. Several packages implement JSON/JSONB fields, including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

#app/models.py
from django.contrib.postgres.fields import JSONField
from django.db import models

class Something(models.Model):
    name = models.CharField(max_length=32)
    data = JSONField(db_index=True)

Creating in the shell:

>>> instance = Something.objects.create(name='something', data={'a': 1, 'b': 2, 'nested': {'c': 3}})

Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manual creation (or a scripted migration).
>>> Something.objects.filter(data__a=1)
>>> Something.objects.filter(data__nested__c=3)
>>> Something.objects.filter(data__has_key='a')

Django MongoDB

Or other NoSQL Django adaptations: with them you can have fully dynamic models.

NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible. For example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField, among other things.

Check out this Django MongoDB example:

from djangotoolbox.fields import DictField

class Image(models.Model):
    exif = DictField()
...

>>> image = Image.objects.create(exif=get_exif_data(...))
>>> image.exif
{u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}

You can even create embedded lists of any Django models:

class Container(models.Model):
    stuff = ListField(EmbeddedModelField())

class FooModel(models.Model):
    foo = models.IntegerField()

class BarModel(models.Model):
    bar = models.CharField()
...

>>> Container.objects.create(stuff=[FooModel(foo=42), BarModel(bar='spam')])

Django-mutant: Dynamic models based on syncdb and South-hooks

Django-mutant implements fully dynamic Foreign Key and m2m fields. It is inspired by the incredible but somewhat hackish solutions by Will Hardy and Michael Hall.

All of these are based on Django South hooks, which, according to Will Hardy's talk at DjangoCon 2011 (watch it!), are nevertheless robust and tested in production (relevant source code).

Michael Hall was the first to implement this.

Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the stability of the application suffer under heavy use? These are the questions to be considered. You need to be sure to maintain a proper lock in order to allow simultaneous database-altering requests.
If you are using Michael Hall's lib, your code will look like this:

from dynamo import models

test_app, created = models.DynamicApp.objects.get_or_create(
    name='dynamo'
)
test, created = models.DynamicModel.objects.get_or_create(
    name='Test',
    verbose_name='Test Model',
    app=test_app
)
foo, created = models.DynamicModelField.objects.get_or_create(
    name='foo',
    verbose_name='Foo Field',
    model=test,
    field_type='dynamiccharfield',
    null=True,
    blank=True,
    unique=False,
    help_text='Test field for Foo',
)
bar, created = models.DynamicModelField.objects.get_or_create(
    name='bar',
    verbose_name='Bar Field',
    model=test,
    field_type='dynamicintegerfield',
    null=True,
    blank=True,
    unique=False,
    help_text='Test field for Bar',
)
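The answer warns that simultaneous schema-altering requests need a proper lock. As a rough sketch of that idea only, and not how django-mutant or dynamo implement it, the following uses the standard library's sqlite3 and threading modules. The helper add_column_once and the patient table are hypothetical. Several threads race to add the same column, and the lock ensures that only one ALTER TABLE runs.

```python
import sqlite3
import threading

# Hypothetical process-wide lock that serialises all schema changes.
_schema_lock = threading.Lock()


def add_column_once(conn, table, column, sql_type="TEXT"):
    """Add `column` to `table` unless it already exists; return True if added.

    Identifiers are interpolated directly into the SQL, so `table` and
    `column` must come from trusted code, never from user input.
    """
    with _schema_lock:
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if column in existing:
            return False
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
        conn.commit()
        return True


# One shared connection; every access below happens while holding the lock
# or after all worker threads have finished.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")

results = []


def worker():
    results.append(add_column_once(conn, "patient", "zodiac"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

columns = [row[1] for row in conn.execute("PRAGMA table_info(patient)")]
```

A threading.Lock only protects a single process. With several application instances sharing one database, the lock would have to live somewhere shared, such as in the database itself.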


This topic was recently talked about at DjangoCon 2013 Europe: slideshare.net/schacki/… and youtube.com/watch?v=67wcGdk4aCc
It may also be worth noting that using django-pgjson on Postgres >= 9.2 allows direct use of PostgreSQL's json field. On Django >= 1.7, the filter API for queries is relatively sane. Postgres >= 9.4 also allows jsonb fields with better indexes for faster queries.
Updated today to note Django's adoption of HStoreField and JSONField into contrib. It includes some form widgets which aren't awesome, but do work if you need to tweak data in the admin.
Simon Charette

I've been working on pushing the django-dynamo idea further. The project is still undocumented but you can read the code at https://github.com/charettes/django-mutant.

Actually, FK and M2M fields (see contrib.related) also work, and it's even possible to define wrappers for your own custom fields.

There's also support for model options such as unique_together and ordering, plus model bases, so you can subclass proxy models, abstract models or mixins.

I'm currently working on a lock mechanism that is not in-memory, to make sure model definitions can be shared across multiple running Django instances while preventing them from using obsolete definitions.

The project is still very alpha, but it's a cornerstone technology for one of my projects, so I'll have to take it to production-ready. The big plan is to support django-nonrel as well, so we can leverage the MongoDB driver.


Hi, Simon! I've included a link to your project in my wiki answer just after you've created it on github. :))) Nice to see you on stackoverflow!
GDorn

Further research reveals that this is a somewhat special case of the Entity Attribute Value design pattern, which has been implemented for Django by a couple of packages.

First, there's the original eav-django project, which is on PyPI.

Second, there's a more recent fork of the first project, django-eav, which is primarily a refactor to allow use of EAV with Django's own models or models in third-party apps.


I will include it in the wiki.
I would argue the other way around, that EAV is a special case of dynamic modeling. It's heavily used in the "semantic web" community where it's called a "triple" or "quad" if it includes a unique ID. However, it's unlikely to ever be as efficient as a mechanism that can dynamically create and modify SQL tables.
@GDorn is eav-django your first choice? I mean, which option above did you choose?
@Moreno The right choice is going to depend very heavily on your specific use case. I've used both EAV and JSONFields for different reasons. The latter is directly supported by Django now, so for a new project I'd use that first unless I had a specific need to be able to query on the EAV table. Note that you can query on JSONFields as well.