Saturday, March 6, 2021

MongoDB: Is it good to store all the data in a single collection?

MongoDB models are usually meant to hold data and its relationships together, since MongoDB doesn't provide SQL-style JOINs ($lookup is the closest thing to a join, but it is costly and best avoided).


That's why there is a huge emphasis on denormalization in MongoDB data modeling. Storing related data together has two benefits:


You don't have to join collections; you can get the data in a single query.

Since MongoDB updates a single document atomically, you can update the comments and the article in one go, without worrying about transactions and rollbacks.
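
To make the second benefit concrete, here is a minimal sketch of one update that changes an article's title and appends a comment at the same time. The helper name `buildArticleUpdate` and the sample ids are made up for illustration, and the commented-out call assumes an `articles` collection with an embedded `comments` array. `$set`, `$push` and `updateOne` are standard MongoDB operators and methods.

```javascript
// Hypothetical helper: builds the arguments for one updateOne() call.
function buildArticleUpdate(articleId, newTitle, comment) {
  return {
    filter: { _id: articleId },
    update: {
      $set: { title: newTitle },    // change a field of the article
      $push: { comments: comment }, // append to the embedded array
    },
  };
}

const { filter, update } = buildArticleUpdate("article-1", "New title", {
  _id: "comment-1",
  text: "Nice post",
  user: { name: "Alice", id: "user-1" },
});

// In mongosh this would be sent as one write:
// db.articles.updateOne(filter, update);
console.log(JSON.stringify(update));
```

Because both changes target the same document, they are applied together or not at all.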

So you would almost certainly want to put the comments inside the articles collection. It would look something like this:



articles: [
  {
    _id: uid,
    owner: userId,
    title: string,
    text: text,
    comments: [
      {
        _id: commentId,
        text: text,
        user: {
          name: string,
          id: uid
        }
      }
    ]
  }
]


Before we agree to it, let us look at the drawbacks of the above approach.


There is a limit of 16 MB per document. That is huge, but if the text of your article is large and the article also has a large number of comments, the document may cross 16 MB.
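
A rough back-of-the-envelope check for this limit can be sketched as follows. It uses the length of the JSON form as a stand-in for the real BSON size, which is only an approximation. The function names and sample sizes are assumptions for illustration, and `Buffer` assumes Node.js.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Approximation only: UTF-8 length of the JSON form, not the real BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// How many comments of a given average size fit next to the article body?
function maxEmbeddedComments(articleBytes, avgCommentBytes) {
  return Math.floor((MAX_DOC_BYTES - articleBytes) / avgCommentBytes);
}

const sampleComment = {
  _id: "c1",
  text: "x".repeat(500),
  user: { name: "Alice", id: "u1" },
};
const commentBytes = approxBytes(sampleComment);
console.log(commentBytes, maxEmbeddedComments(1024 * 1024, commentBytes));
```

For example, with a 1 MiB article and comments of about 1 KiB each, roughly 15,360 comments fit before the document hits the limit.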


Everywhere you fetch an article for other purposes, you would have to exclude the comments field; otherwise the query would be heavy and slow.
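
Excluding the embedded array is done with a projection. Here is a minimal sketch, assuming the embedded schema above. The helper `articleListQuery` and the sample owner id are made up for illustration, while `{ comments: 0 }` is the standard exclusion projection.

```javascript
// Hypothetical helper: arguments for listing a user's articles without comments.
function articleListQuery(ownerId) {
  return {
    filter: { owner: ownerId },
    projection: { comments: 0 }, // 0 = leave this field out of the result
  };
}

const listQuery = articleListQuery("user-1");

// mongosh:        db.articles.find(listQuery.filter, listQuery.projection)
// Node.js driver: articles.find(listQuery.filter, { projection: listQuery.projection })
console.log(JSON.stringify(listQuery));
```

The catch is that every such query has to remember the projection.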


If you have to run aggregations, you might again hit memory limits whenever the aggregation involves the comments in one way or another.
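
As an illustration of why such aggregations get heavy, here is a sketch of a pipeline that counts comments per user over the embedded schema. The pipeline and its name are hypothetical; `$project`, `$unwind` and `$group` are standard aggregation stages.

```javascript
// Hypothetical pipeline: number of comments per user across all articles.
const commentsPerUser = [
  { $project: { comments: 1 } }, // drop title/text early to keep documents small
  { $unwind: "$comments" },      // one output document per embedded comment
  { $group: { _id: "$comments.user.id", total: { $sum: 1 } } },
];

// mongosh: db.articles.aggregate(commentsPerUser)
console.log(commentsPerUser.length);
```

`$unwind` emits one document per comment, so an article with many comments is multiplied before the grouping stage can reduce the data again.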


These are serious problems and we cannot ignore them. So let us consider keeping the comments in a separate collection and see what we would lose.


First of all, comments and articles are linked but are different entities, so you may never need to update both in the same operation.


Secondly, you would have to load the comments separately. That makes sense in the normal use case and is how most applications work anyway, so it is not an issue either.


So in my opinion the clear winner is having two separate collections:


articles: [
  {
    _id: uid,
    owner: userId,
    title: string,
    text: text,
  }
],
comments: [
  {
    // single comment
    articleId: uid,
    text: text,
    user: {
      name: string,
      id: uid
    }
  }
]
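
With this layout, the comments of an article are loaded with their own query, typically one page at a time. Here is a minimal sketch, assuming an index on `articleId`. The helper name, the page size and the choice of sorting by `_id` are illustrative assumptions, not part of the schema above.

```javascript
// Hypothetical helper: query parameters for one page of an article's comments.
function commentsPageQuery(articleId, page, pageSize = 20) {
  return {
    filter: { articleId: articleId },
    sort: { _id: 1 },      // assumes auto-generated ObjectIds, which sort roughly by creation time
    skip: page * pageSize, // page is zero-based
    limit: pageSize,
  };
}

const q = commentsPageQuery("article-1", 2);

// mongosh equivalent:
// db.comments.createIndex({ articleId: 1 })
// db.comments.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
console.log(q.skip, q.limit);
```

Each comment is a small document of its own, so the number of comments per article no longer affects the size of any single document.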


If you choose the two-collection approach, you wouldn't want to go the comment_2 way, for the same reason as before: what if a single article has a huge number of comments?


references:

https://stackoverflow.com/questions/41408248/mongodb-store-all-related-data-in-one-collection-or-abstract-pieces-of-data-fro
