ChatGPT解决这个技术问题 Extra ChatGPT

DynamoDB vs MongoDB NoSQL [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Want to improve this question? Update the question so it can be answered with facts and citations by editing this post. Closed 3 years ago. Improve this question

I'm trying to figure out what can I use for a future project, we plan to store about 500k records per month in the first year and maybe more for the next years this is a vertical application so there's no need to use a database for this, that's the reason why I decided to choose a NoSQL data storage.

The first option that came to my mind was mongo DB since is a very mature product with a lot of support from the community but on the other hand, we got a brand new product that offers a managed service at top performance, I'll develop this application but there's no maintenance plan (at least for now) so I think that will be a huge advantage since amazon provides an elastic way to scale.

My major concern is about the query structure, I haven't looked at the dynamo DB query capabilities yet but since is a k/v data storage I feel that this could be more limited than mongo DB.

If someone had the experience of moving a project from MongoDB to DynamoDB, any advice will be totally appreciated.

If you want advice on query structure I would suggest providing an example of your schema along with your use cases for accessing data. Without these it is hard to make a judgement on fit.
Indeed, how you're querying the data could dramatically influence the backend db selection. How hierarchical would be my #1 question.
I'm surprised this question hasn't already been closed by ranking SO people. Usually questions that seek advice get closed because they're not asking for help with a very specific problem.

J
Justin Johnson

I know this is old, but it still comes up when you search for the comparison. We were using Mongo, have moved almost entirely to Dynamo, which is our first choice now. Not because it has more features, it doesn't. Mongo has a better query language, you can index within a structure, there's lots of little things. The superiority of Dynamo is in what the OP stated in his comment: it's easy. You don't have to take care of any servers. When you start to set up a Mongo sharded solution, it gets complicated. You can go to one of the hosting companies, but that's not cheap either. With Dynamo, if you need more throughput, you just click a button. You can write scripts to scale automatically. When it's time to upgrade Dynamo, it's done for you. That is all a lot of precious stress and time not spent. If you don't have dedicated ops people, Dynamo is excellent.

So we are now going on Dynamo by default. Mongo maybe, if the data structure is complicated enough to warrant it, but then we'd probably go back to a SQL database. Dynamo is obtuse, you really need to think about how you're going to build it, and likely you'll use Redis in Elasticcache to make it work for complex stuff. But it sure is nice to not have to take care of it. You code. That's it.


If one has to compare database to database, one must compare database features only. Hosted solution is not a database feature. If you're looking for a hosted MongoDB, go for MongoHQ and they do all the grunt work which you may want to avoid while focusing on your core work.
It's true, though the initial cost comparison we did showed dynamo being a pretty good deal. The other issue is that if you do have to upsize/downsize dynamo, it's a click of a button. If you have to add disk or resize a mongo server, there is downtime involved, whether you have to do it, or someone else.
@Kabeer I 100% agree with you technically, but in the real world the entire package matters to make a business decision. Ultimately, this is a business decision.
M
Maziyar

With 500k documents, there is no reason to scale whatsoever. A typical laptop with an SSD and 8GB of ram can easily do 10s of millions of records, so if you are trying to pick because of scaling your choice doesn't really matter. I would suggest you pick what you like the most, and perhaps where you can find the most online support with.


yeah my mayor concern is about scaling up and the maintenance over the time to be honest personally I feel that mongoDB can do the job I'm just thinking about in terms of mid and long term maintenance
Derick, another major factor in scale is utilization, not just doc count or db size. @jack don't "feel" but rely on testing, including the platform and hardware of final deployment; a week spent stuffing a couple db variants with data and benchmarking should lead to informed decisions saving lots of pain.
Providing a professional product/service goes far beyond what a simple "this can do that" solution. Just because a cheapo machine can run Linux, MongoDB and millions of records for almost no money doesn't equal great performance in the real world. 500K records (with a SIMPLE schema) would probably be a good candidate for DynamoDB simply because the OP would have no maintenance cost (for hardware at least) and the monthly charge would probably be far less than the cost of a server over the course of a year or two.
P
Paul

For quick overview comparisons, I really like this website, that has many comparison pages, eg AWS DynamoDB vs MongoDB; http://db-engines.com/en/system/Amazon+DynamoDB%3BMongoDB


thanks for the link! I have never been before to the db-engines.com. Great site!
D
Deemoe

Short answer: Start with SQL and add NoSQL only when/if needed. (unless you don't need anything beyond very simple queries)

My personal experience: I haven't used MongoDB for queries but as of April 2015 DynamoDB is still very crippled when it comes to anything beyond the most basic key/value queries. I love it for the basic stuff but if you want query language then look to a real SQL database solution.

In DynamoDB you can query on a hash or on a hash and range key, and you can have multiple secondary global indexes. I'm doing queries on a single table with 4 possible filter parameters and sorting the results, this is supported (barely) through the use of the global secondary indexes with filter expressions. The problem comes in when you try to get the total results matching the filter, you can't just search for the first 10 items matching the filter, but rather it checks 10 items and you may get 0 valid results forcing you to keep re-scanning from the continue key - pain in the neck and consumes too much of your table read quota for a simple scenario.

To be specific about the limit problem with filters in the query, this is from the docs (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#ScanQueryLimit):

In a response, DynamoDB returns all the matching results within
the scope of the Limit value. For example, if you issue a Query 
or a Scan request with a Limit value of 6 and without a filter
expression, the operation returns the first six items in the 
table that match the request parameters. If you also supply a
FilterExpression, the operation returns the items within the 
first six items in the table that match the filter requirements.

My conclusion is that queries involving FilterExpressions are only usable on very rare occasions and are not scalable because each query can easily read most or all of your of your table which consumes far too many DynamoDB read units. Once you use too many read units you'll get throttled and see poor performance.

Expert opinion: In the AWS summit on Apr 9, 2015 Brett Hollman, Manager, Solutions Architecture, AWS in his talk on scalling to your first 10 million users advocates starting with a SQL database and then using NoSQL only when and if it makes sense. Because sooner or later you'll probably need a SQL server somewhere in your stack. His slides are here: http://www.slideshare.net/AmazonWebServices/deep-dive-scaling-up-to-your-first-10-million-users See slide 28.


You should really check out how easy is to integrate cloudsearch with dynamodb streams and lambda for reach full text or location based queries.
Choose your database according to your needs. This is not a choice between SQL and noSQL, but between documents-oriented DB, graph-oriented DB, key-value DB, RDMBS.... There is no golden choice, and SQL is certainly not.
S
Steffan Perry

We chose a combination of Mongo/Dynamo for a healthcare product. Basically mongo allows better searching, but the hosted Dynamo is great because its HIPAA compliant without any extra work. So we host the mongo portion with no personal data on a standard setup and allow amazon to deal with the HIPAA portion in terms of infrastructure. We can query certain items from mongo which bring up documents with pointers (ID's) of the relatable Dynamo document.

The main reason we chose to do this using mongo instead of hosting the entire application on dynamo was for 2 reasons. First, we needed to preform location based searches which mongo is great at and at the time, Dynamo was not, but they do have an option now.

Secondly was that some documents were unstructured and we did not know ahead of time what the data would be, so for example lets say user a inputs a document in the "form" collection like this: {"username": "user1", "email": "me@me.com"}. And another user puts this in the same collection {"phone": "813-555-3333", "location": [28.1234,-83.2342]}. With mongo we can search any of these dynamic and unknown fields at any time, with Dynamo, you could do this but would have to make a index every time a new field was added that you wanted searchable. So if you have never had a phone field in your Dynamo document before and then all of the sudden, some one adds it, its completely unsearchable.

Now this brings up another point in which you have mentioned. Sometimes choosing the right solution for the job does not always mean choosing the best product for the job. For example you may have a client who needs and will use the system you created for 10+ years. Going with a SaaS/IaaS solution that is good enough to get the job done may be a better option as you can rely on amazon to have up-kept and maintained their systems over the long haul.


R
Rahul Kumar

I have worked on both and kind of fan of both.

But you need to understand when to use what and for what purpose.

I don't think It's a great idea to move all your database to DynamoDB, reason being querying is difficult except on primary and secondary keys, Indexing is limited and scanning in DynamoDB is painful.

I would go for a hybrid sort of DB, where extensive query-able data should be there is MongoDB, with all it's feature you would never feel constrained to provide enhancements or modifications.

DynamoDB is lightning fast (faster than MongoDB) so DynamoDB is often used as an alternative to sessions in scalable applications. DynamoDB best practices also suggests that if there are plenty of data which are less being used, move it to other table.

So suppose you have a articles or feeds. People are more likely to look for last week stuff or this month's stuff. chances are really rare for people to visit two year old data. For these purposes DynamoDB prefers to have data stored by month or years in different tables.

DynamoDB is seemlessly scalable, something you will have to do manually in MongoDB. however you would lose on performance of DynamoDB, if you don't understand about throughput partition and how scaling works behind the scene.

DynamoDB should be used where speed is critical, MongoDB on the other hand has too many hands and features, something DynamoDB lacks.

for example, you can have a replica set of MongoDB in such a way that one of the replica holds data instance of 8(or whatever) hours old. Really useful, if you messed up something big time in your DB and want to get the data as it is before.

That's my opinion though.


And a combination of Redis and MongoDB? That's awesome, I think.
I guess so, I don't have a hands on experience on Redis but for sure it is widely used because of it's performance, in memory DBs almost always better perform than disk based DBs. So I think data which needs to be accessed on huge demand and high frequency should go to Redis. On the other hand for large lethargic data MongoDB should be used.
A
Andrew Smith

Bear in mind, I've only experimented with MongoDB...

From what I've read, DynamoDB has come a long way in terms of features. It used to be a super-basic key-value store with extremely limited storage and querying capabilities. It has since grown, now supporting bigger document sizes + JSON support and global secondary indices. The gap between what DynamoDB and MongoDB offers in terms of features grows smaller with every month. The new features of DynamoDB are expanded on here.

Much of the MongoDB vs. DynamoDB comparisons are out of date due to the recent addition of DynamoDB features. However, this post offers some other convincing points to choose DynamoDB, namely that it's simple, low maintenance, and often low cost. Another discussion here of database choices was interesting to read, though slightly old.

My takeaway: if you're doing serious database queries or working in languages not supported by DynamoDB, use MongoDB. Otherwise, stick with DynamoDB.