ChatGPT解决这个技术问题 Extra ChatGPT

Find MongoDB records where array field is not empty

All of my records have a field called "pictures". This field is an array of strings.

I now want the newest 10 records where this array IS NOT empty.

I've googled around, but strangely enough I haven't found much on this. I've read into the $where option, but I was wondering how slow that is to native functions, and if there is a better solution.

And even then, that does not work:

ME.find({$where: 'this.pictures.length > 0'}).sort('-created').limit(10).execFind()

Returns nothing. Leaving this.pictures without the length bit does work, but then it also returns empty records, of course.


r
rpatel

If you also have documents that don't have the key, you can use:

ME.find({ pictures: { $exists: true, $not: {$size: 0} } })

MongoDB doesn't use indexes if $size is involved, so here is a better solution:

ME.find({ pictures: { $exists: true, $ne: [] } })

If your property can have invalid values (like null boolean or others) , then you an add an additional check using $types as proposed in this answer:

With mongo >= 3.2:

ME.find({ pictures: { $exists: true, $type: 'array', $ne: [] } })

With mongo < 3.2:

ME.find({ pictures: { $exists: true, $type: 4, $ne: [] } })

Since the MongoDB 2.6 release, you can compare with the operator $gt, but this could lead to unexpected results (you can find a detailed explanation in this answer):

ME.find({ pictures: { $gt: [] } })

To me that's the correct approach, since it makes sure the array exists and is not empty.
CAREFUL, ME.find({ pictures: { $gt: [] } }) IS DANGEROUS, even in newer MongoDB versions. If you have an index on your list field and that index is utilized during the query, you will get unexpected results. For example: db.doc.find({'nums': { $gt: [] }}).hint({ _id: 1 }).count() returns the right number, while db.doc.find({'nums': { $gt: [] }}).hint({ nums: 1 }).count() returns 0.
See my detailed answer below to learn why this might not work for you: stackoverflow.com/a/42601244/1579058
@wojcikstefan's comment needs to be upvoted to prevent people from using the last suggestion which, indeed, under certain circumstances does not return matching documents.
Using $exists returns documents where pictures is null, undefined, etc. Here is a better solution
b
bgran

After some more looking, especially in the mongodb documents, and puzzling bits together, this was the answer:

ME.find({pictures: {$exists: true, $not: {$size: 0}}})

This doesn't work. I don't know if this previously worked, but this will also return objects which dont have the 'pictures' key.
Unbelievable how this answer has 63 upvotes, when in fact what @rdsoze said is true - the query will also return records that don't have the pictures field.
Be careful, mongoDB won't use indexes if $size is involved link. Would be better to include {$ne:[]} and possibly {$ne:null}.
@rdsoze the very first line of the question states "All of my records have a field called "pictures". This field is an array". What's more, this is a perfectly realistic and common scenario. This answer isn't wrong, it works for the question precisely as written, and criticising or downvoting it for the fact that it doesn't solve a different problem is silly.
@Cec All the documentation says is that if you use $size in the query, it won't use any index to give you faster results. So if you have an index on that field and you want to use it, stick to other approaches like {$ne:[]}, if that works for you, that will use your index.
D
Dan Dascalescu

This might also work for you:

ME.find({'pictures.0': {$exists: true}});

Nice! This also lets you check for a minimum size. Do you know if arrays are always indexed sequentially? Would there ever be a case where pictures.2 exists but pictures.1 does not?
The $exists operator is a boolean, not an offset. @tenbatsu should be using true instead of 1.
@anushr Would there ever be a case where pictures.2 exists but pictures.1 does not? Yes, that case could be happens.
@TheBndr That could only happen if pictures is a sub-doc, not an array. e.g. pictures: {'2': 123}
This is nice and intuitive, but beware if performance is important - it'll do a full collection scan even if you have an index on pictures.
w
wojcikstefan

You care about two things when querying - accuracy and performance. With that in mind, I tested a few different approaches in MongoDB v3.0.14.

TL;DR db.doc.find({ nums: { $gt: -Infinity }}) is the fastest and most reliable (at least in the MongoDB version I tested).

EDIT: This no longer works in MongoDB v3.6! See the comments under this post for a potential solution.

Setup

I inserted 1k docs w/o a list field, 1k docs with an empty list, and 5 docs with a non-empty list.

for (var i = 0; i < 1000; i++) { db.doc.insert({}); }
for (var i = 0; i < 1000; i++) { db.doc.insert({ nums: [] }); }
for (var i = 0; i < 5; i++) { db.doc.insert({ nums: [1, 2, 3] }); }
db.doc.createIndex({ nums: 1 });

I recognize this isn't enough of a scale to take performance as seriously as I am in the tests below, but it's enough to present the correctness of various queries and behavior of chosen query plans.

Tests

db.doc.find({'nums': {'$exists': true}}) returns wrong results (for what we're trying to accomplish).

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': {'$exists': true}}).count()
1005

--

db.doc.find({'nums.0': {'$exists': true}}) returns correct results, but it's also slow using a full collection scan (notice COLLSCAN stage in the explanation).

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': {'$exists': true}}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': {'$exists': true}}).explain()
{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "test.doc",
    "indexFilterSet": false,
    "parsedQuery": {
      "nums.0": {
        "$exists": true
      }
    },
    "winningPlan": {
      "stage": "COLLSCAN",
      "filter": {
        "nums.0": {
          "$exists": true
        }
      },
      "direction": "forward"
    },
    "rejectedPlans": [ ]
  },
  "serverInfo": {
    "host": "MacBook-Pro",
    "port": 27017,
    "version": "3.0.14",
    "gitVersion": "08352afcca24bfc145240a0fac9d28b978ab77f3"
  },
  "ok": 1
}

--

db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}}) returns wrong results. That's because of an invalid index scan advancing no documents. It will likely be accurate but slow without the index.

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}}).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}}).explain('executionStats').executionStats.executionStages
{
  "stage": "KEEP_MUTATIONS",
  "nReturned": 0,
  "executionTimeMillisEstimate": 0,
  "works": 2,
  "advanced": 0,
  "needTime": 0,
  "needFetch": 0,
  "saveState": 0,
  "restoreState": 0,
  "isEOF": 1,
  "invalidates": 0,
  "inputStage": {
    "stage": "FETCH",
    "filter": {
      "$and": [
        {
          "nums": {
            "$gt": {
              "$size": 0
            }
          }
        },
        {
          "nums": {
            "$exists": true
          }
        }
      ]
    },
    "nReturned": 0,
    "executionTimeMillisEstimate": 0,
    "works": 1,
    "advanced": 0,
    "needTime": 0,
    "needFetch": 0,
    "saveState": 0,
    "restoreState": 0,
    "isEOF": 1,
    "invalidates": 0,
    "docsExamined": 0,
    "alreadyHasObj": 0,
    "inputStage": {
      "stage": "IXSCAN",
      "nReturned": 0,
      "executionTimeMillisEstimate": 0,
      "works": 1,
      "advanced": 0,
      "needTime": 0,
      "needFetch": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "invalidates": 0,
      "keyPattern": {
        "nums": 1
      },
      "indexName": "nums_1",
      "isMultiKey": true,
      "direction": "forward",
      "indexBounds": {
        "nums": [
          "({ $size: 0.0 }, [])"
        ]
      },
      "keysExamined": 0,
      "dupsTested": 0,
      "dupsDropped": 0,
      "seenInvalidated": 0,
      "matchTested": 0
    }
  }
}

--

db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}}) returns correct results, but the performance is bad. It technically does an index scan, but then it still advances all the docs and then has to filter through them).

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}}).explain('executionStats').executionStats.executionStages
{
  "stage": "KEEP_MUTATIONS",
  "nReturned": 5,
  "executionTimeMillisEstimate": 0,
  "works": 2016,
  "advanced": 5,
  "needTime": 2010,
  "needFetch": 0,
  "saveState": 15,
  "restoreState": 15,
  "isEOF": 1,
  "invalidates": 0,
  "inputStage": {
    "stage": "FETCH",
    "filter": {
      "$and": [
        {
          "nums": {
            "$exists": true
          }
        },
        {
          "$not": {
            "nums": {
              "$size": 0
            }
          }
        }
      ]
    },
    "nReturned": 5,
    "executionTimeMillisEstimate": 0,
    "works": 2016,
    "advanced": 5,
    "needTime": 2010,
    "needFetch": 0,
    "saveState": 15,
    "restoreState": 15,
    "isEOF": 1,
    "invalidates": 0,
    "docsExamined": 2005,
    "alreadyHasObj": 0,
    "inputStage": {
      "stage": "IXSCAN",
      "nReturned": 2005,
      "executionTimeMillisEstimate": 0,
      "works": 2015,
      "advanced": 2005,
      "needTime": 10,
      "needFetch": 0,
      "saveState": 15,
      "restoreState": 15,
      "isEOF": 1,
      "invalidates": 0,
      "keyPattern": {
        "nums": 1
      },
      "indexName": "nums_1",
      "isMultiKey": true,
      "direction": "forward",
      "indexBounds": {
        "nums": [
          "[MinKey, MaxKey]"
        ]
      },
      "keysExamined": 2015,
      "dupsTested": 2015,
      "dupsDropped": 10,
      "seenInvalidated": 0,
      "matchTested": 0
    }
  }
}

--

db.doc.find({'nums': { $exists: true, $ne: [] }}) returns correct results and is slightly faster, but the performance is still not ideal. It uses IXSCAN which only advances docs with an existing list field, but then has to filter out the empty lists one by one.

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $ne: [] }}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $ne: [] }}).explain('executionStats').executionStats.executionStages
{
  "stage": "KEEP_MUTATIONS",
  "nReturned": 5,
  "executionTimeMillisEstimate": 0,
  "works": 1018,
  "advanced": 5,
  "needTime": 1011,
  "needFetch": 0,
  "saveState": 15,
  "restoreState": 15,
  "isEOF": 1,
  "invalidates": 0,
  "inputStage": {
    "stage": "FETCH",
    "filter": {
      "$and": [
        {
          "$not": {
            "nums": {
              "$eq": [ ]
            }
          }
        },
        {
          "nums": {
            "$exists": true
          }
        }
      ]
    },
    "nReturned": 5,
    "executionTimeMillisEstimate": 0,
    "works": 1017,
    "advanced": 5,
    "needTime": 1011,
    "needFetch": 0,
    "saveState": 15,
    "restoreState": 15,
    "isEOF": 1,
    "invalidates": 0,
    "docsExamined": 1005,
    "alreadyHasObj": 0,
    "inputStage": {
      "stage": "IXSCAN",
      "nReturned": 1005,
      "executionTimeMillisEstimate": 0,
      "works": 1016,
      "advanced": 1005,
      "needTime": 11,
      "needFetch": 0,
      "saveState": 15,
      "restoreState": 15,
      "isEOF": 1,
      "invalidates": 0,
      "keyPattern": {
        "nums": 1
      },
      "indexName": "nums_1",
      "isMultiKey": true,
      "direction": "forward",
      "indexBounds": {
        "nums": [
          "[MinKey, undefined)",
          "(undefined, [])",
          "([], MaxKey]"
        ]
      },
      "keysExamined": 1016,
      "dupsTested": 1015,
      "dupsDropped": 10,
      "seenInvalidated": 0,
      "matchTested": 0
    }
  }
}

--

db.doc.find({'nums': { $gt: [] }}) IS DANGEROUS BECAUSE DEPENDING ON THE INDEX USED IT MIGHT GIVE UNEXPECTED RESULTS. That's because of an invalid index scan which advances no documents.

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).hint({ nums: 1 }).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).hint({ _id: 1 }).count()
5

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).explain('executionStats').executionStats.executionStages
{
  "stage": "KEEP_MUTATIONS",
  "nReturned": 0,
  "executionTimeMillisEstimate": 0,
  "works": 1,
  "advanced": 0,
  "needTime": 0,
  "needFetch": 0,
  "saveState": 0,
  "restoreState": 0,
  "isEOF": 1,
  "invalidates": 0,
  "inputStage": {
    "stage": "FETCH",
    "filter": {
      "nums": {
        "$gt": [ ]
      }
    },
    "nReturned": 0,
    "executionTimeMillisEstimate": 0,
    "works": 1,
    "advanced": 0,
    "needTime": 0,
    "needFetch": 0,
    "saveState": 0,
    "restoreState": 0,
    "isEOF": 1,
    "invalidates": 0,
    "docsExamined": 0,
    "alreadyHasObj": 0,
    "inputStage": {
      "stage": "IXSCAN",
      "nReturned": 0,
      "executionTimeMillisEstimate": 0,
      "works": 1,
      "advanced": 0,
      "needTime": 0,
      "needFetch": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "invalidates": 0,
      "keyPattern": {
        "nums": 1
      },
      "indexName": "nums_1",
      "isMultiKey": true,
      "direction": "forward",
      "indexBounds": {
        "nums": [
          "([], BinData(0, ))"
        ]
      },
      "keysExamined": 0,
      "dupsTested": 0,
      "dupsDropped": 0,
      "seenInvalidated": 0,
      "matchTested": 0
    }
  }
}

--

db.doc.find({'nums.0’: { $gt: -Infinity }}) returns correct results, but has bad performance (uses a full collection scan).

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': { $gt: -Infinity }}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': { $gt: -Infinity }}).explain('executionStats').executionStats.executionStages
{
  "stage": "COLLSCAN",
  "filter": {
    "nums.0": {
      "$gt": -Infinity
    }
  },
  "nReturned": 5,
  "executionTimeMillisEstimate": 0,
  "works": 2007,
  "advanced": 5,
  "needTime": 2001,
  "needFetch": 0,
  "saveState": 15,
  "restoreState": 15,
  "isEOF": 1,
  "invalidates": 0,
  "direction": "forward",
  "docsExamined": 2005
}

--

db.doc.find({'nums': { $gt: -Infinity }}) surprisingly, this works very well! It gives the right results and it's fast, advancing 5 docs from the index scan phase.

MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: -Infinity }}).explain('executionStats').executionStats.executionStages
{
  "stage": "FETCH",
  "nReturned": 5,
  "executionTimeMillisEstimate": 0,
  "works": 16,
  "advanced": 5,
  "needTime": 10,
  "needFetch": 0,
  "saveState": 0,
  "restoreState": 0,
  "isEOF": 1,
  "invalidates": 0,
  "docsExamined": 5,
  "alreadyHasObj": 0,
  "inputStage": {
    "stage": "IXSCAN",
    "nReturned": 5,
    "executionTimeMillisEstimate": 0,
    "works": 15,
    "advanced": 5,
    "needTime": 10,
    "needFetch": 0,
    "saveState": 0,
    "restoreState": 0,
    "isEOF": 1,
    "invalidates": 0,
    "keyPattern": {
      "nums": 1
    },
    "indexName": "nums_1",
    "isMultiKey": true,
    "direction": "forward",
    "indexBounds": {
      "nums": [
        "(-inf.0, inf.0]"
      ]
    },
    "keysExamined": 15,
    "dupsTested": 15,
    "dupsDropped": 10,
    "seenInvalidated": 0,
    "matchTested": 0
  }
}

Thanks for your very detailed answer @wojcikstefan. Unfortunately, your suggested solution doesn't seem to work in my case. I have a MongoDB 3.6.4 collection with 2m documents, most of them having a seen_events String array, which is also indexed. Searching with { $gt: -Infinity }, I immediately get 0 documents. Using { $exists: true, $ne: [] } I get the more likely 1,2m docs, with a lot of time being wasted in the FETCH stage: gist.github.com/N-Coder/b9e89a925e895c605d84bfeed648d82c
It seems that you're right @Ncode - this no longer works in MongoDB v3.6 :( I played around with it for a few minutes and here's what I found: 1. db.test_collection.find({"seen_events.0": {$exists: true}}) is bad because it uses a collection scan. 2. db.test_collection.find({seen_events: {$exists: true, $ne: []}}) is bad because its IXSCAN matches all documents and then the filtering is performed in the slow FETCH phase. 3. Same goes for db.test_collection.find({seen_events: {$exists: true, $not: {$size: 0}}}). 4. All the other queries return invalid results.
@NCode found a solution! If you're certain that all non-empty seen_events contain strings, you can use this: db.test_collection.find({seen_events: {$gt: ''}}).count(). To confirm it performs well, check out db.test_collection.find({seen_events: {$gt: ''}}).explain(true).executionStats. You can probably enforce that seen events are strings via schema validation: docs.mongodb.com/manual/core/schema-validation
Thanks! All existing values are strings so I'll try that out. There's also a bug discussing this problem in the MongoDB bugtracker: jira.mongodb.org/browse/SERVER-26655
Using mongodb shell and explain this approach gave me the most efficient query, great work and thank you! Here is the best way I figured out how to express it so far using c# driver: var builder = Builders.Filter; var filter = builder.Gt("myarray", Decimal128.NegativeInfinity); myObjects = col.findOne(filter).ToList();
J
JohnnyHK

Starting with the 2.6 release, another way to do this is to compare the field to an empty array:

ME.find({pictures: {$gt: []}})

Testing it out in the shell:

> db.ME.insert([
{pictures: [1,2,3]},
{pictures: []},
{pictures: ['']},
{pictures: [0]},
{pictures: 1},
{foobar: 1}
])

> db.ME.find({pictures: {$gt: []}})
{ "_id": ObjectId("54d4d9ff96340090b6c1c4a7"), "pictures": [ 1, 2, 3 ] }
{ "_id": ObjectId("54d4d9ff96340090b6c1c4a9"), "pictures": [ "" ] }
{ "_id": ObjectId("54d4d9ff96340090b6c1c4aa"), "pictures": [ 0 ] }

So it properly includes the docs where pictures has at least one array element, and excludes the docs where pictures is either an empty array, not an array, or missing.


CAREFUL this answer might give you trouble if you try to use indexes. Doing db.ME.createIndex({ pictures: 1 }) and then db.ME.find({pictures: {$gt: []}}) will return zero results, at least in MongoDB v3.0.14
@wojcikstefan Good catch. Need to take a fresh look at this.
S
SC1000

Retrieve all and only the documents where 'pictures' is an array and is not empty

ME.find({pictures: {$type: 'array', $ne: []}})

If using a MongoDb version prior to 3.2, use $type: 4 instead of $type: 'array'. Notice that this solution doesn't even use $size, so there's no problem with indexes ("Queries cannot use indexes for the $size portion of a query")

Other solutions, including these (accepted answer):

ME.find({ pictures: { $exists: true, $not: {$size: 0} } }); ME.find({ pictures: { $exists: true, $ne: [] } })

are wrong because they return documents even if, for example, 'pictures' is null, undefined, 0, etc.


Q
Quentin Hayot

You can use any of the following to achieve this. Both also take care of not returning a result for objects that don't have the requested key in them:

db.video.find({pictures: {$exists: true, $gt: {$size: 0}}})
db.video.find({comments: {$exists: true, $not: {$size: 0}}})

B
Banane
db.find({ pictures: { $elemMatch: { $exists: true } } })

$elemMatch matches documents that contain an array field with at least one element that matches the specified query.

So you're matching all arrays with at least an element.


A
Andres Moreno

Use the $elemMatch operator: according to the documentation

The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.

$elemMatches makes sure that the value is an array and that it is not empty. So the query would be something like

ME.find({ pictures: { $elemMatch: {$exists: true }}})

PS A variant of this code is found in MongoDB University's M121 course.


P
Prabhat Yadav
{ $where: "this.pictures.length > 1" }

use the $where and pass the this.field_name.length which return the size of array field and check it by comparing with number. if any array have any value than array size must be at least 1. so all the array field have length more than one, it means it have some data in that array


D
Dave

This also works:

db.getCollection('collectionName').find({'arrayName': {$elemMatch:{}}})

E
Eat at Joes

You can also use the helper method Exists over the Mongo operator $exists

ME.find()
    .exists('pictures')
    .where('pictures').ne([])
    .sort('-created')
    .limit(10)
    .exec(function(err, results){
        ...
    });

L
Luis Fletes
ME.find({pictures: {$exists: true}}) 

Simple as that, this worked for me.


Question wasn't if not null. It was if it is empty.