Are you a candidate? Complete list of MongoDB interview questions 👇

Index

Basic

  1. Why is MongoDB not chosen for a 32-bit system?
  2. In which format MongoDB represents document structure?
  3. Are null values permitted?
  4. On a high level, compare SQL databases and MongoDB.
  5. How does Scale-Out occur in MongoDB?
  6. Does MongoDB write to disk automatically or slowly?
  7. How to query MongoDB with %like%?
  8. How does MongoDB provide concurrency?
  9. “When you add more slaves to a replica set, both writes and reads become faster.” Is this claim correct or incorrect? Why is this so?
  10. What are the similarities and differences between sharding and replication in MongoDB?
  11. What is the difference between a secondary and a slave?
  12. What is the significance of a covered query?
  13. Define the Aggregation pipeline.
  14. When to use MongoDB?
  15. When to use - embedded or referenced relationship?
  16. Can you configure the cache size for WiredTiger?
  17. Define the application-level Encryption.
  18. What are 32-bit nuances?
  19. While creating Schema in MongoDB what are the points need to be taken in consideration?
  20. Does an update fsync to the disk immediately?
  21. What are some features of MongoDB?
  22. How can concurrency affect replica sets primary?
  23. Which are the two storage engines used by MongoDB?
  24. What will happen when you remove a document from database in MongoDB? Does MongoDB remove it from disk?
  25. Is it essential for MongoDB to have a ton of RAM?
  26. Is an object attribute deleted from the store as it is removed?
  27. Explain relationships in MongoDB?
  28. When do we use a namespace in MongoDB?
  29. How can applications access real-time data changes in MongoDB?
  30. How many indexes does MongoDB create by default for a new collection?
  31. Can you configure the cache size for MMAPv1?
  32. Define MapReduce.
  33. Describe oplog?
  34. Do the MongoDB databases have schema?
  35. Explain the situation when an index does not fit into RAM.
  36. Explain what is GridFS in MongoDB?
  37. What exactly do you mean by NoSQL databases? Is MongoDB a NoSQL database? Please elaborate.
  38. When and to what degree does data become multi-slice?
  39. Why use MongoDB?
  40. What will have to do if a shard is down or slow and you do a query?
  41. Why MongoDB is known as best NoSQL database?
  42. What type of data is stored by MongoDB?
  43. When should we embed one document within another in MongoDB?
  44. What are the data types in MongoDB?
  45. What are MongoDB’s data models?
  46. What is a Collection in MongoDB?
  47. In MongoDB, how are constraints managed?
  48. What is a Document in MongoDB?
  49. What is Splitting in MongoDB?
  50. What is sharding in MongoDB?
  51. What is the use of the dot notation in MongoDB?
  52. Why are MongoDB data files large in size?
  53. What is the use of the db command?
  54. Is MongoDB better than other SQL databases? If yes then how?
  55. Define Auditing.
  56. How does MongoDB do text search?
  57. Can journaling features be used to perform safe hot backups?
  58. What are the main distinctions between BSON documents in MongoDB and JSON documents in general?
  59. MongoDB is referred to as a schema-less database. Is this true? How can you make a schema in MongoDB?
  60. How does MongoDB handle transactions or locking?
  61. What is Vertical Scaling?
  62. What are Databases in MongoDB?
  63. In MongoDB, how can CRUD operations be performed?
  64. State the difference between find() and limit() method.
  65. To do safe backups what is the feature in MongoDB that you can use?
  66. What type of DBMS is MongoDB?
  67. Define Horizontal Scaling.
  68. Can one MongoDB operation lock more than one databases? If yes, how?

Intermediate

  1. What is upsert operation in MongoDB?
  2. Explain Single Field Indexing?
  3. Explain Compound Indexing?
  4. Explain the process of Sharding.
  5. How is Querying done in MongoDB?
  6. Define MongoDB Projection.
  7. What are the components of the Sharded cluster?
  8. Describe oplog.
  9. Explain the term “Indexing” in MongoDB?
  10. Is there an "upsert" option in the mongodb insert command?
  11. When is MultiKey Indexing used?
  12. When and how is Text Indexing used?
  13. What are the types of Indexes available in MongoDB?

Advanced

  1. What are MongoDB Charts?
  2. What are the aggregate functions of MongoDB?
  3. How do we find array elements with multiple criteria?
  4. What are the pros and cons of normalizing data in a MongoDB database?
  5. Should I normalize my data before storing it in MongoDB?
  6. What do you mean by Transactions?
  7. Explain the concept of pipeline in the MongoDB aggregation framework.
  8. How do we control the MongoDB Performance?
  9. What is a Replica Set in MongoDB?
  10. What is the difference between the $all operator and the $in operator?
  11. Explain the concept of pipeline in the MongoDB aggregation framework.
  12. What is the Aggregation Framework in MongoDB?
  13. How MongoDB supports ACID transactions and locking functionalities?
  14. Assume there is a collection named users that looks like the one below. How can you get all houses in the “Rabia” neighborhood? [ { "_id" : ObjectId("5d011c94ee66e13d34c7c388"), "userName" : "kevin", "email" : "[email protected]", "password" : "affdsg342", "houses" : [ { "name" : "Big Villa", "neighborhood" : "Zew Ine" }, { "name" : "Small Villa", "neighborhood" : "Rabia" } ] }, { "_id" : ObjectId("5d011c94ee66e13d34c7c387"), "userName" : "sherif", "email" : "[email protected]", "password" : "67834783ujk", "houses" : [ { "name" : "New Mansion", "neighborhood" : "Nasr City" }, { "name" : "Old Villa", "neighborhood" : "Rabia" } ] } ]
  15. How can we sort the user-defined function? For example, x and y are integers, and how do we calculate “x-y”?
  16. What is the difference between the save and insert commands in MongoDB, and when do they act similarly?
  17. When a “moveChunk” fails, is it required to clean up partly moved docs?
  18. Assume there is a document with nested arrays that looks like the one below. How can you insert a “room” that has the name “Room 44” and size of “50” for a particular “house” that belongs to this user? { "_id": "682263", "userName" : "sherif", "email": "[email protected]", "password": "67834783ujk", "houses": [ { "_id": "2178123", "name": "New Mansion", "rooms": [ { "name": "4th bedroom", "size": "12" }, { "name": "kitchen", "size": "100" } ] } ] }
  19. A staple feature of relational database systems is the JOIN clause. What is the equivalent in MongoDB, and does it have any known limitations?
  20. Explain the Replication Architecture in MongoDB?
  21. Could you catch how the two queries are different? dealers.find({ "$and": [ { "length": { "$gt": 2000 } }, { "cars.weight": { "$gte": 800 } } ] }); and dealers.find({ "length": { "$gt": 2000 }, "cars.weight": { "$gte": 800 } });
  22. When is the SET Modifier used in MongoDB?
  23. How does MongoDB ensure high availability?
  24. What are some utilities for backup and restore in MongoDB?


The Questions
0. Why is MongoDB not chosen for a 32-bit system?

MongoDB uses memory-mapped files. When running a 32-bit build of MongoDB, the total storage size for the server, including data and indexes, is limited to 2 GB. For this reason, we do not deploy MongoDB to production on 32-bit machines.

If we’re running a 64-bit build of MongoDB, there’s virtually no limit to the storage size. For production deployments, 64-bit builds and operating systems are strongly recommended.

1. In which format MongoDB represents document structure?

MongoDB uses BSON to represent document structures.

2. Are null values permitted?

Yes, but only as the value of a field within a document. A bare null is not a document, so it cannot be inserted into a collection on its own. An empty document, {}, can be inserted.

3. On a high level, compare SQL databases and MongoDB.

SQL databases store data in tables made up of rows and columns. This data follows a predefined schema, which does not adapt easily to today’s rapidly growing real-world applications. MongoDB, on the other hand, uses a flexible document model that can be changed and extended quickly.

4. How does Scale-Out occur in MongoDB?

The document-oriented data model of MongoDB makes it easier to split data across multiple servers. MongoDB balances data and load across the cluster and redistributes documents automatically.

The mongos acts as a query router, providing an interface between client applications and the sharded cluster.

Config servers store metadata and configuration settings for the cluster. MongoDB uses the config servers to manage distributed locks. Each sharded cluster must have its own config servers.

5. Does MongoDB write to disk automatically or slowly?

MongoDB writes data to disk lazily. Updates are written to the journal promptly, but writing the data from the journal to the data files on disk happens lazily.

6. How to query MongoDB with %like%?
db.users.find({name: /a/})  //like '%a%'
db.users.find({name: /^pa/}) //like 'pa%'
db.users.find({name: /ro$/}) //like '%ro'

Or using the $regex operator (the same filter also works in Mongoose):

db.users.find({'name': {'$regex': 'sometext'}})
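
If the pattern arrives as a SQL-style LIKE string, it has to be translated into a regular expression first. The following is a minimal sketch in plain JavaScript; likeToRegExp is a hypothetical helper, not a MongoDB function, and it assumes only the % and _ wildcards need support.

```javascript
// Hypothetical helper (not part of MongoDB): turn a SQL LIKE pattern into a RegExp.
function likeToRegExp(pattern) {
  // Escape regex metacharacters so literal text stays literal.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Translate the SQL wildcards: % matches any run of characters, _ exactly one.
  const body = escaped.replace(/%/g, ".*").replace(/_/g, ".");
  // Anchor both ends so 'pa%' means "starts with pa".
  return new RegExp("^" + body + "$");
}

// Sample data, made up for the illustration.
const names = ["paul", "pedro", "marco", "ana"];
const startsWithPa = names.filter((n) => likeToRegExp("pa%").test(n));
const endsWithRo = names.filter((n) => likeToRegExp("%ro").test(n));
const containsA = names.filter((n) => likeToRegExp("%a%").test(n));

console.log(startsWithPa); // [ 'paul' ]
console.log(endsWithRo);   // [ 'pedro' ]
console.log(containsA);    // [ 'paul', 'marco', 'ana' ]
```

The resulting RegExp could then be used as the value in a filter such as {name: likeToRegExp("pa%")}, in the same position as the regex literals above.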
7. How does MongoDB provide concurrency?

MongoDB uses reader-writer locks that allow concurrent readers shared access to a resource, such as a database or collection, but give exclusive access to a single write operation.

8. “When you add more slaves to a replica set, both writes and reads become faster.” Is this claim correct or incorrect? Why is this so?

False. All write operations are only performed on the master. Read operations, on the other hand, may be performed on any instance — slave or master. As more slaves are added to a replica set, only reads get faster.

9. What are the similarities and differences between sharding and replication in MongoDB?

Both sharding and replication require the use of several instances to host the database. Replicas are MongoDB instances that contain identical data, hence the name. To maximize redundancy and availability, we use replicas. In contrast, for sharding, each shard instance has data that is distinct from its neighbors. For horizontal scaling, we use sharding.

10. What is the difference between a secondary and a slave?

A secondary is a node/member that applies operations from the current primary. It does this by tailing the replication oplog (local.oplog.rs). Replication from primary to secondary is asynchronous; however, the secondary tries to stay as close to the primary as possible (on a LAN, this is often within a few milliseconds).

11. What is the significance of a covered query?

Because the index contains all of the fields the query needs, MongoDB can match the query conditions and return the result fields from the index alone, without having to look into the documents. Since indexes are held in RAM or stored sequentially on disk, such access is much faster.

12. Define the Aggregation pipeline.

The aggregation pipeline is a framework for performing aggregation tasks. The pipeline is used to transform documents into aggregated results.
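
To make the idea of "stages" concrete, here is a rough in-memory sketch in plain JavaScript. The match, group, and sort helpers and the orders data are hypothetical stand-ins invented for this illustration; they only imitate what stages such as $match, $group, and $sort do and are not the MongoDB API.

```javascript
// In-memory stand-ins for pipeline stages; hypothetical helpers, not the MongoDB API.
const match = (predicate) => (docs) => docs.filter(predicate);
const group = (keyOf, field) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    totals.set(key, (totals.get(key) || 0) + doc[field]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};
const sort = (compare) => (docs) => [...docs].sort(compare);

// Run the stages in order, feeding each stage's output into the next.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Sample documents, made up for the illustration.
const orders = [
  { customer: "a", status: "paid", amount: 10 },
  { customer: "b", status: "paid", amount: 25 },
  { customer: "a", status: "open", amount: 99 },
  { customer: "a", status: "paid", amount: 5 },
];

const totalsByCustomer = runPipeline(orders, [
  match((o) => o.status === "paid"),   // keep only paid orders
  group((o) => o.customer, "amount"),  // sum the amount per customer
  sort((x, y) => y.total - x.total),   // largest total first
]);

console.log(totalsByCustomer); // [ { _id: 'b', total: 25 }, { _id: 'a', total: 15 } ]
```

Each stage receives the documents produced by the previous one, which is the property that gives the pipeline its name.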

13. When to use MongoDB?

You should use MongoDB when you are building internet and business applications that need to evolve quickly and scale elegantly. MongoDB is popular with developers of all kinds who are building scalable applications using agile methodologies. MongoDB is a great choice if one needs to:

  • Support a rapid iterative development.
  • Scale to high levels of read and write traffic: MongoDB supports horizontal scaling through Sharding, distributing data across several machines, and facilitating high throughput operations with large sets of data.
  • Scale your data repository to a massive size.
  • Evolve the type of deployment as the business changes.
  • Store, manage and search data with text, geospatial, or time-series dimensions.
14. When to use - embedded or referenced relationship?

In general,

  • embed is good if you have one-to-one or one-to-many relationships between entities, and
  • reference is good if you have many-to-many relationships.
15. Can you configure the cache size for WiredTiger?

For the WiredTiger storage engine, you can specify the maximum size of the cache that WiredTiger will use for all data. This can be done using storage.wiredTiger.engineConfig.cacheSizeGB option.

16. Define the application-level Encryption.

The application-level encryption provides encryption on a per-field or per-document basis within the application layer.

17. What are 32-bit nuances?

With journaling, there is additional memory-mapped file activity. This further restricts the already limited database size of 32-bit builds. Journaling is therefore disabled by default on 32-bit systems.

18. While creating Schema in MongoDB what are the points need to be taken in consideration?

The points to take into consideration are:

  • Design your schema according to user requirements
  • Combine objects into one document if you use them together; otherwise, separate them
  • Do joins on write, not on read
  • Optimize your schema for the most frequent use cases
  • Do complex aggregation in the schema
19. Does an update fsync to the disk immediately?

No, it does not. By default, disk writes are lazy. A write may not reach the disk for several seconds. For example, if the database receives a thousand increments to an object in one second, the object may be flushed to disk only once (note that fsync options are available both at the command line and via getLastError).

20. What are some features of MongoDB?
  • Indexing: It supports generic secondary indexes and provides unique, compound, geospatial, and full-text indexing capabilities as well.
  • Aggregation: It provides an aggregation framework based on the concept of data processing pipelines.
  • Special collection and index types: It supports time-to-live (TTL) collections for data that should expire at a certain time
  • File storage: It supports an easy-to-use protocol for storing large files and file metadata.
  • Sharding: Sharding is the process of splitting data up across machines.
21. How can concurrency affect replica sets primary?

In replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary's oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection's database and the local database.

22. Which are the two storage engines used by MongoDB?

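MongoDB has used two storage engines: MMAPv1 and WiredTiger. WiredTiger has been the default since MongoDB 3.2, and MMAPv1 was removed in MongoDB 4.2.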

23. What will happen when you remove a document from database in MongoDB? Does MongoDB remove it from disk?

Yes. If you remove a document from database, MongoDB will remove it from disk too.

24. Is it essential for MongoDB to have a ton of RAM?

No, it does not. MongoDB does not need a ton of RAM to operate. It can operate on very little RAM because it dynamically allocates and deallocates RAM based on the needs of the processes.

25. Is an object attribute deleted from the store as it is removed?

Yes, you can erase the attribute and then re-save() the object.

26. Explain relationships in MongoDB?

Relationships in MongoDB specify how one or more documents are related to each other. They can be modelled either by embedding or by referencing. These relationships can take the following forms:

  • One to One
  • One to Many
  • Many to Many
27. When do we use a namespace in MongoDB?

A namespace is the concatenation of the database name and the collection name, joined by a dot (<database>.<collection>).

28. How can applications access real-time data changes in MongoDB?

Applications can access real-time data changes using change streams, which let them subscribe to collection operations such as insert, update, and delete.

29. How many indexes does MongoDB create by default for a new collection?

By default MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field.

30. Can you configure the cache size for MMAPv1?

No. MMAPv1 does not allow configuring the cache size.

31. Define MapReduce.

MapReduce is a generic multi-phase data aggregation model that is used for processing large quantities of data.
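
The two phases can be sketched in a few lines of plain JavaScript. This is an in-memory teaching sketch under simplifying assumptions, not MongoDB's implementation; the mapReduce function, its emit callback signature, and the posts data are all hypothetical names chosen for this example.

```javascript
// Hypothetical in-memory map-reduce; a teaching sketch, not MongoDB's implementation.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // Map phase: each document emits zero or more (key, value) pairs.
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: collapse every key's list of values into a single result.
  const results = {};
  for (const [key, values] of groups) results[key] = reduce(key, values);
  return results;
}

// Sample documents, made up for the illustration.
const posts = [
  { tags: ["mongodb", "nosql"] },
  { tags: ["mongodb"] },
  { tags: [] },
];

const tagCounts = mapReduce(
  posts,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)), // emit one count per tag
  (key, values) => values.reduce((sum, v) => sum + v, 0)  // add the counts up
);

console.log(tagCounts); // { mongodb: 2, nosql: 1 }
```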

32. Describe oplog?

The operations log (oplog) is a special capped collection that keeps a rolling record of all operations that modify the data in your databases. MongoDB first applies database operations on the primary and then records them in the primary’s oplog. The secondary members then copy and apply these operations asynchronously.

33. Do the MongoDB databases have schema?

Yes. MongoDB databases have dynamic schema. There is no need to define the structure to create collections.

34. Explain the situation when an index does not fit into RAM.

When an index is too large to fit into RAM, MongoDB must read the index from disk, which is much slower than reading from RAM. Queries are fastest when the server has enough RAM to hold the indexes along with the rest of the working set.

35. Explain what is GridFS in MongoDB?

GridFS is used for storing and retrieving large files such as images, video files, and audio files. By default, it uses two collections, fs.files and fs.chunks, to store the file’s metadata and the chunks.

36. What exactly do you mean by NoSQL databases? Is MongoDB a NoSQL database? Please elaborate.

The internet now carries huge amounts of data, users, and complexity, and it grows more complex by the day. NoSQL databases are a response to these demands. A NoSQL database is not a traditional relational database management system (RDBMS). NoSQL is commonly read as “Not Only SQL”. NoSQL databases can store and query many kinds of unstructured, messy, and complex data, and they represent a different way of thinking about a database. MongoDB is a NoSQL database.

37. When and to what degree does data become multi-slice?

Sharding in MongoDB operates on a collection. Initially, all documents of a collection are kept in a single chunk. Only when there is more than one chunk can the data be spread across multiple shards (slices); a chunk is split once it grows to the chunk size, which is 64 MB here.

38. Why use MongoDB?
  • MongoDB supports field, range-based, and string pattern matching queries for searching the data in the database
  • MongoDB supports primary and secondary indexes on any field
  • MongoDB basically uses JavaScript objects in place of procedures
  • MongoDB uses a dynamic database schema
  • MongoDB is very easy to scale up or down
  • MongoDB has inbuilt support for data partitioning (Sharding).
39. What will have to do if a shard is down or slow and you do a query?

If a shard is down, the query returns an error unless you set the partial query option. If a shard is merely slow, mongos waits for it to respond.

40. Why MongoDB is known as best NoSQL database?

MongoDB is often regarded as the best NoSQL database because it is:

  • Document Oriented

  • Rich Query language

  • High Performance

  • Highly Available

  • Easily Scalable

41. What type of data is stored by MongoDB?

MongoDB stores data in the form of documents, which are JSON-like field and value pairs.

42. When should we embed one document within another in MongoDB?

You should consider embedding documents for:

  • “Contains” relationships between entities
  • One-to-many relationships
  • Performance reasons
43. What are the data types in MongoDB?

MongoDB supports a wide range of data types as values in documents. Documents in MongoDB are similar to objects in JavaScript. Along with JSON’s essential key/value–pair nature, MongoDB adds support for a number of additional data types. The common data types in MongoDB are:

  • Null {"x" : null}
  • Boolean {"x" : true}
  • Number {"x" : 4}
  • String {"x" : "foobar"}
  • Date {"x" : new Date()}
  • Regular expression {"x" : /foobar/i}
  • Array {"x" : ["a", "b", "c"]}
  • Embedded document {"x" : {"foo" : "bar"}}
  • Object ID {"x" : ObjectId()}
  • Binary data: a string of arbitrary bytes
  • Code {"x" : function() { /* ... */ }}
44. What are MongoDB’s data models?

The structure of documents is central to data modeling. Related data in MongoDB may be embedded in a single document structure (the embedded data model). Alternatively, relationships between data may be stored as references from one document to another; this is called the normalized data model.

45. What is a Collection in MongoDB?

A collection in MongoDB is a group of documents. If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table. Documents within a single collection can have any number of different “shapes”, i.e. collections have dynamic schemas. For example, both of the following documents could be stored in a single collection:

{"greeting" : "Hello world!", "views": 3}
{"signoff": "Good bye"}
46. In MongoDB, how are constraints managed?

Starting with MongoDB 3.2, you can add a document validator to collections. Unique indexes can also be created using db.collection.createIndex({ "key": 1 }, { unique: true });

47. What is a Document in MongoDB?

A Document in MongoDB is an ordered set of keys with associated values. It is represented by a map, hash, or dictionary. In JavaScript, documents are represented as objects: {"greeting" : "Hello world!"}

Complex documents will contain multiple key/value pairs: {"greeting" : "Hello world!", "views" : 3}

48. What is Splitting in MongoDB?

Splitting is a background process that is used to keep chunks from growing too large.

49. What is sharding in MongoDB?

Sharding is the procedure of storing data records across multiple machines. It is MongoDB’s approach to meeting the demands of data growth. It is the horizontal partitioning of data in a database or search engine. Each partition is referred to as a shard or database shard.

50. What is the use of the dot notation in MongoDB?

MongoDB uses the dot notation to access the elements of an array and the fields of an embedded document.
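
The path-walking idea behind dot notation can be sketched in plain JavaScript. getByPath is a hypothetical helper written for this illustration, and it covers only the simple case of explicit field names and array indexes; MongoDB's own matching also lets a path such as houses.name apply to every element of an array, which this sketch does not attempt.

```javascript
// Hypothetical helper: resolve a dot-notation path against a plain object.
function getByPath(doc, path) {
  return path.split(".").reduce((value, key) => {
    // Stop descending as soon as a step is missing.
    if (value === null || value === undefined) return undefined;
    // Array indexes arrive as strings ("0"), which JavaScript accepts as keys.
    return value[key];
  }, doc);
}

// Sample document, made up for the illustration.
const user = {
  userName: "sherif",
  houses: [
    { name: "New Mansion", rooms: [{ name: "kitchen", size: 100 }] },
  ],
};

console.log(getByPath(user, "houses.0.name"));         // New Mansion
console.log(getByPath(user, "houses.0.rooms.0.size")); // 100
console.log(getByPath(user, "houses.3.name"));         // undefined
```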

51. Why are MongoDB data files large in size?

MongoDB preallocates data files to reserve space and avoid file system fragmentation when setting up the server. That is why MongoDB data files are large in size.

52. What is the use of the db command?

The db command gives the name of the currently selected database.

53. Is MongoDB better than other SQL databases? If yes then how?

MongoDB can be a better fit than SQL databases when you need a highly flexible and scalable document structure.

For example:

  • One document in MongoDB can have five fields while another in the same collection has ten.
  • MongoDB can be faster than SQL databases for some workloads, thanks to its indexing and storage techniques.
54. Define Auditing.

Auditing provides administrators with the ability to verify that the implemented security policies are controlling the activity in the system.

55. How does MongoDB do text search?

Text search can be done using a text index. Example –

db.collection_name.createIndex({ <field>: "text" });

56. Can journaling features be used to perform safe hot backups?

Yes.

57. What are the main distinctions between BSON documents in MongoDB and JSON documents in general?

JSON (JavaScript Object Notation), like XML, is a human-readable data exchange standard, and it has become the most widely adopted one on the web. JSON supports data types such as booleans, numbers, strings, and arrays. BSON, on the other hand, is the binary encoding MongoDB uses to store its documents. It is similar to JSON but extends it with additional data types, such as Date. Unlike JSON objects, BSON documents are ordered. BSON is designed to be compact and easy to traverse, and because it is binary, it is also faster to encode and decode.

58. MongoDB is referred to as a schema-less database. Is this true? How can you make a schema in MongoDB?

Since BSON is a schema-free data format, it is more accurate to say that MongoDB has a dynamically typed schema. To start creating a schema, create and insert a document. When a document is inserted into the database, the corresponding collection is created.

59. How does MongoDB handle transactions or locking?

MongoDB does not use traditional locking or complex transactions with rollback, since it is designed to be lightweight, fast, and predictable in its performance. It can be thought of as analogous to the MySQL MyISAM autocommit model. Keeping transaction support simple improves performance, particularly in a system that runs across several servers. (Multi-document ACID transactions were added in MongoDB 4.0.)

60. What is Vertical Scaling?

Vertical scaling adds more CPU and storage resources to increase capacity.

61. What are Databases in MongoDB?

MongoDB groups collections into databases. A single MongoDB instance can host several databases, each grouping together collections. Some reserved database names are admin, local, and config.

62. In MongoDB, how can CRUD operations be performed?
  • C – stands for create – db.collection.insert()

  • R – stands for read – db.collection.find()

  • U – stands for update – db.collection.update()

  • D – stands for delete – db.collection.remove({"fieldname" : "value"})

63. State the difference between find() and limit() method.

find() – returns the documents in a collection. With a projection as its second argument, it displays only selected fields rather than all the data of a document. For example, if your document has 4 fields but you want to show only one, set the required field to 1 (or set the unwanted fields to 0). Syntax –

db.COLLECTION_NAME.find({}, {KEY: 1});

limit() – limits the number of records fetched. For example, if you have 7 documents but want to display only the first 4 documents in a collection, use limit. Syntax –

db.COLLECTION_NAME.find().limit(NUMBER);

64. To do safe backups what is the feature in MongoDB that you can use?

Journaling is the feature in MongoDB that you can use to do safe backups.

65. What type of DBMS is MongoDB?

MongoDB is a document-oriented DBMS.

66. Define Horizontal Scaling.

Horizontal scaling divides the dataset and distributes data over multiple servers, or shards.

67. Can one MongoDB operation lock more than one databases? If yes, how?

Yes. Operations such as copyDatabase() and repairDatabase() can lock more than one database.

0. What is upsert operation in MongoDB?

An upsert in MongoDB saves a document into a collection. If a document matches the query criteria, MongoDB performs an update; otherwise it inserts a new document into the collection.

Upserts are useful when importing data from an external source: existing documents are updated if matched, and new documents are inserted otherwise.

Example: Upsert option set for update

This operation first searches for the document and, if it is not present, inserts the new document into the database.

> db.car.update(
...    { name: "Qualis" },
...    {
...       name: "Qualis",
...       speed: 50
...    },
...    { upsert: true }
... )
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 1,
    "nModified" : 0,
    "_id" : ObjectId("548d3a955a5072e76925dc1c")
})

MongoDB checks whether a car named "Qualis" exists and, if not, inserts a document with the name "Qualis" and speed 50. An nUpserted value of 1 indicates that a new document was inserted.
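
The update-or-insert decision can be mimicked on a plain array. The sketch below is a simplified illustration in JavaScript, not the driver's behaviour: upsert is a hypothetical function, it matches on exact field equality only, it does not generate or preserve an _id, and it always reports nModified as 1 on a match.

```javascript
// Hypothetical in-memory collection; it mimics upsert semantics, not the real driver.
function upsert(collection, filter, replacement) {
  const matches = (doc) => Object.keys(filter).every((k) => doc[k] === filter[k]);
  const index = collection.findIndex(matches);
  if (index === -1) {
    // No match: insert a new document.
    collection.push({ ...replacement });
    return { nMatched: 0, nUpserted: 1, nModified: 0 };
  }
  // Match: replace the first matching document in place.
  collection[index] = { ...replacement };
  return { nMatched: 1, nUpserted: 0, nModified: 1 };
}

const cars = [];
const first = upsert(cars, { name: "Qualis" }, { name: "Qualis", speed: 50 });
const second = upsert(cars, { name: "Qualis" }, { name: "Qualis", speed: 60 });

console.log(first);  // { nMatched: 0, nUpserted: 1, nModified: 0 }
console.log(second); // { nMatched: 1, nUpserted: 0, nModified: 1 }
console.log(cars);   // [ { name: 'Qualis', speed: 60 } ]
```

The first call behaves like an insert and the second like an update, so the collection still holds a single "Qualis" document afterwards.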

1. Explain Single Field Indexing?

MongoDB supports user-defined indexes like single field index. A single field index is used to create an index on the single field of a document. With single field index, MongoDB can traverse in ascending and descending order. By default, each collection has a single field index automatically created on the _id field, the primary key.

Example

{
  "_id": 1,
  "person": { name: "Alex", surname: "K" },
  "age": 29,
  "city": "New York"
}

We can define a single field index on the age field.

db.people.createIndex( {age : 1} ) // creates an ascending index

db.people.createIndex( {age : -1} ) // creates a descending index

With this kind of index we can improve all the queries that find documents with a condition on the age field, like the following:

db.people.find( { age : 20 } )
db.people.find( { "person.name" : "Alex", age : 30 } )
db.people.find( { age : { $gt : 25} } )

2. Explain Compound Indexing?

A compound index is an index on multiple fields. Using the same people collection we can create a compound index combining the city, age, and person.surname fields.

db.people.createIndex( {city: 1, age: 1, "person.surname": 1 } )

In this case, we have created a compound index where the first entry is the value of the city field, the second is the value of the age field, and the third is the value of person.surname. All the fields here are defined in ascending order.

Queries such as the following can benefit from the index:

db.people.find( { city: "Miami", age: { $gt: 50 } } )
db.people.find( { city: "Boston" } )
db.people.find( { city: "Atlanta", age: {$lt: 25}, "person.surname": "Green" } )
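
What these queries have in common is that they filter on a leading run of the indexed fields (an index prefix). The plain JavaScript sketch below checks that property for a query; usablePrefixLength is a hypothetical helper and a deliberate simplification, since the real query planner weighs more than field order.

```javascript
// Hypothetical check: how many leading index fields does a query filter on?
function usablePrefixLength(indexFields, query) {
  let length = 0;
  for (const field of indexFields) {
    if (!(field in query)) break; // a gap ends the usable prefix
    length += 1;
  }
  return length;
}

// Field order of the compound index created above.
const indexFields = ["city", "age", "person.surname"];

const cityOnly = usablePrefixLength(indexFields, { city: "Boston" });
const cityAndAge = usablePrefixLength(indexFields, { city: "Miami", age: { $gt: 50 } });
const ageOnly = usablePrefixLength(indexFields, { age: { $lt: 25 } });

console.log(cityOnly);   // 1
console.log(cityAndAge); // 2
console.log(ageOnly);    // 0, because the query skips the leading city field
```

A result of 0 suggests the query does not start at the front of the index, so it is unlikely to use this index efficiently.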

3. Explain the process of Sharding.

Sharding is the process of splitting data up across machines. We also use the term “partitioning” sometimes to describe this concept. We can store more data and handle more load without requiring larger or more powerful machines, by putting a subset of data on each machine. MongoDB’s sharding allows you to create a cluster of many machines (shards) and break up a collection across them, putting a subset of data on each shard. This allows your application to grow beyond the resource limits of a standalone server or replica set.
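
As a rough illustration of how a subset of data ends up on each shard, the sketch below routes a shard key value to a shard under range-based partitioning. It is plain JavaScript with made-up values: the chunks table, the shard names, and routeToShard are hypothetical and stand in for metadata that a real cluster manages itself.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Find the shard whose chunk range contains the given shard key value.
function routeToShard(shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

console.log(routeToShard(42));     // shardA
console.log(routeToShard(1000));   // shardB
console.log(routeToShard(999999)); // shardC
```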

4. How is Querying done in MongoDB?

The find method is used to perform queries in MongoDB. Querying returns a subset of documents in a collection, from no documents at all to the entire collection. Which documents get returned is determined by the first argument to find, which is a document specifying the query criteria.

For example, if we want to match a string, such as a "username" key with the value "bob", we pass that key/value pair as the query document:

>db.users.find({"username" : "bob"})

5. Define MongoDB Projection.

Projection is used to select only the necessary data. It does not select the whole data of a document.
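
The effect of an inclusion projection can be sketched on a plain object. The project function below is a hypothetical helper in JavaScript, not MongoDB's implementation; it handles only top-level fields flagged with 1 and the usual rule that _id is returned unless it is explicitly set to 0.

```javascript
// Hypothetical helper: keep only the fields flagged with 1 (inclusion projection).
function project(doc, projection) {
  const result = {};
  // Like MongoDB, include _id unless it is explicitly excluded with 0.
  if (projection._id !== 0 && "_id" in doc) result._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (flag === 1 && field in doc) result[field] = doc[field];
  }
  return result;
}

// Sample document, made up for the illustration.
const person = { _id: 1, name: "Alex", age: 29, city: "New York" };

const nameOnly = project(person, { name: 1, _id: 0 });
const ageWithId = project(person, { age: 1 });

console.log(nameOnly);  // { name: 'Alex' }
console.log(ageWithId); // { _id: 1, age: 29 }
```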

6. What are the components of the Sharded cluster?

The sharded cluster has the following components:

  • Shards
  • Query routers
  • Config servers

7. Describe oplog.

The operations log (oplog) is a special capped collection that keeps a rolling record of all operations that modify the data in your databases. MongoDB first applies database operations on the primary and then records them in the primary’s oplog. The secondary members then copy and apply these operations asynchronously.

8. Explain the term “Indexing” in MongoDB?

In MongoDB, indexes help resolve queries efficiently. An index stores a small part of the data set in a form that is easy to traverse: the value of a specific field or set of fields, ordered by the value of the field as specified in the index. MongoDB’s indexes work almost identically to typical relational database indexes.

An index looks like an ordered list with references to the content. This in turn allows MongoDB to run queries orders of magnitude faster. To create an index, use the createIndex collection method.

For example:

>db.users.createIndex({"username": 1})
>db.users.find({"username": "user101"}).explain("executionStats")

Here, executionStats mode helps us understand the effect of using an index to satisfy queries.

9. Is there an "upsert" option in the mongodb insert command?

The db.collection.insert() method has no upsert option; it always inserts a new document into the collection. An upsert is only possible with db.collection.update() and db.collection.save().

10. When is MultiKey Indexing used?

This is the index type for arrays. When creating an index on an array, MongoDB will create an index entry for every element.

Example

{
   "_id": 1,
   "person": { name: "John", surname: "Brown" },
   "age": 34,
   "city": "New York",
   "hobbies": [ "music", "gardening", "skiing" ]
 }

The multikey index can be created as:

db.people.createIndex( { hobbies: 1 } )

Queries such as these next examples will use the index:

db.people.find( { hobbies: "music" } )
db.people.find( { hobbies: { $all: [ "music", "gardening" ] } } )
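To make the "one index entry per array element" idea concrete, here is a minimal in-memory sketch. The buildMultikeyIndex helper and the second sample document are hypothetical; MongoDB's real index is a B-tree, so this only illustrates which entries exist, not how they are stored.

```javascript
// Hypothetical model of a multikey index on "hobbies":
// every array element gets its own entry pointing back to the document.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) entries.push({ key: value, _id: doc._id });
  }
  // keep entries ordered by key, as an ascending index would
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const people = [
  { _id: 1, hobbies: ["music", "gardening", "skiing"] },
  { _id: 2, hobbies: ["music"] },
];
const index = buildMultikeyIndex(people, "hobbies");
const musicIds = index.filter((e) => e.key === "music").map((e) => e._id);
console.log(index.length, musicIds);
```

Looking up "music" touches only the two matching entries rather than scanning every document.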

11. When and how is Text Indexing used?

A text index supports searching for string content in a collection. Text indexes do not store language-specific stop words (e.g. "the", "a", "or"), and they stem the remaining words so that only root words are stored.

Example

Let's insert some sample documents.

var entries = db.getSiblingDB("blogs").entries;
entries.insert( {
  title : "my blog post",
  text : "i am writing a blog. yay",
  site: "home",
  language: "english" });
entries.insert( {
  title : "my 2nd post",
  text : "this is a new blog i am typing. yay",
  site: "work",
  language: "english" });
entries.insert( {
  title : "knives are Fun",
  text : "this is a new blog i am writing. yay",
  site: "home",
  language: "english" });

Let's create the text index.

var entries = db.getSiblingDB("blogs").entries;
entries.createIndex({title: "text", text: "text"}, { weights: {
    title: 10,
    text: 5
  },
  name: "TextIndex",
  default_language: "english",
  language_override: "language" });

Queries such as these next examples will use the index:

var entries = db.getSiblingDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"})

12. What are the types of Indexes available in MongoDB?

MongoDB supports the following types of indexes for running queries.

  1. Single Field Index: It sorts and indexes over a single field.
  2. Compound Index: It indexes multiple fields together.
  3. Multikey Index: It indexes array data.
  4. Geospatial Index: It is useful for querying location data.
  5. Text Index: It indexes string content for text search.
  6. Hashed Index: It indexes the hash of a field's value.

0. What are MongoDB Charts?

MongoDB Charts is a new, integrated tool in MongoDB for data visualization.

MongoDB Charts offers the best way to create visualizations using data from a MongoDB database. It allows users to perform quick data representation from a database without writing code in a programming language such as Java or Python.

The two different implementations of MongoDB Charts are:

  • MongoDB Charts PaaS (Platform as a Service)
  • MongoDB Charts Server

1. What are the aggregate functions of MongoDB?

The aggregation framework provides the following accumulator operators, among others:

  • $avg
  • $sum
  • $min
  • $max
  • $first
  • $push
  • $addToSet
  • $last

2. How do we find array elements with multiple criteria?

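We use the $elemMatch operator, which matches documents in which at least one array element satisfies all of the given criteria. For example, if we have the below documents:

{ _id: 1, numbers: [1000, -1000] }
{ _id: 2, numbers: [500] }

When we execute the following command, no documents are returned, because neither array contains a single element that is both greater than -10 and less than 10:

db.example.find( { numbers: { $elemMatch: { $gt: -10, $lt: 10 } } } );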
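The difference between using $elemMatch and leaving it out can be sketched in plain JavaScript. This is only an in-memory model of the matching rules for the two documents above; the variable names are illustrative.

```javascript
// In-memory model of the two query forms on the sample documents.
const docs = [
  { _id: 1, numbers: [1000, -1000] },
  { _id: 2, numbers: [500] },
];

// { numbers: { $elemMatch: { $gt: -10, $lt: 10 } } }
// one single element must satisfy both conditions
const withElemMatch = docs.filter((d) =>
  d.numbers.some((n) => n > -10 && n < 10)
);

// { numbers: { $gt: -10, $lt: 10 } }
// each condition may be satisfied by a different element
const withoutElemMatch = docs.filter(
  (d) => d.numbers.some((n) => n > -10) && d.numbers.some((n) => n < 10)
);

console.log(withElemMatch.map((d) => d._id));    // []
console.log(withoutElemMatch.map((d) => d._id)); // [ 1 ]
```

Without $elemMatch, document 1 matches because 1000 satisfies the first condition and -1000 satisfies the second.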

3. What are the pros and cons of normalizing data in a MongoDB database?

Just like in traditional RDBMSes, updating documents is fast for normalized data and relatively slower for denormalized data. On the other hand, reading documents is fast in denormalized data and slower for normalized data. Denormalized data is harder to keep in sync and takes up more space.

Note that in MongoDB, denormalized data is a more common expectation. This is because RDBMSes have inherent support for normalization and allow data to be managed as a separate concern, whereas NoSQL DBMSes like MongoDB do not inherently support normalization.

Instead, normalization requires that client applications carefully maintain integrity themselves. To help with this, it’s possible to run audits to ensure that app data conforms to expected patterns of referential integrity.

4. Should I normalize my data before storing it in MongoDB?

Data used by multiple documents can either be embedded (denormalized) or referenced (normalized). Normalization means splitting tables into multiple smaller ones to reduce data redundancy (1NF, 2NF, 3NF), at the cost of a more complex schema.

MongoDB encourages the opposite of what we do with SQL. In MongoDB, data normalization is not required. Instead, we often denormalize related data and fit it into a single document with embedded subdocuments.

Example: Let's say we have three tables

  • Table - 1 : ColumnA, ColumnB (primary key)
  • Table - 2 : ColumnC (Foreign key), ColumnD (primary key)
  • Table - 3 : ColumnE (foreign key), ColumnF

In this case, the MongoDB document structure could be as follows.

{
    ColumnA : ValueA,
    ColumnB : ValueB,
    Subset1 : [{
       ColumnC : ValueC,
       ColumnD : ValueD,
       Subset2 : [{
           ColumnE : ValueE,
           ColumnF : ValueF
       }]
    }]
}

5. What do you mean by Transactions?

A transaction is a logical unit of processing in a database that includes one or more database operations, which can be read or write operations. Transactions provide a useful feature in MongoDB to ensure consistency.

MongoDB provides two APIs to use transactions.

  • Core API: It uses a syntax similar to that of relational databases (e.g., start_transaction and commit_transaction).
  • Call-back API: This is the recommended approach to using transactions. It starts a transaction, executes the specified operations, and commits (or aborts on error). It also automatically incorporates error handling logic for "TransientTransactionError" and "UnknownTransactionCommitResult".

6. Explain the concept of pipeline in the MongoDB aggregation framework.

An individual stage of an aggregation pipeline is a data processing unit. It takes in a stream of input documents one at a time, processes each document one at a time, and produces an output stream of documents one at a time.

7. How do we control the MongoDB Performance?

We can control the MongoDB Performance by:

  • Monitoring locking performance
  • Identifying the number of connections
  • Database Profiling
  • Full Time Diagnostic Data Capture

8. What is a Replica Set in MongoDB?

To keep identical copies of your data on multiple servers, we use replication. It is recommended for all production deployments. Use replication to keep your application running and your data safe, even if something happens to one or more of your servers.

Such replication can be created by a replica set with MongoDB. A replica set is a group of servers with one primary, the server taking writes, and multiple secondaries, servers that keep copies of the primary’s data. If the primary crashes, the secondaries can elect a new primary from amongst themselves.

9. What is the difference between the $all operator and the $in operator?

Both the $all operator and the $in operator filter documents based on the values in an array field. Let us assume we have the following documents in a collection.

[
    {
        "name": "Youssef",
        "sports": [
            "Boxing",
            "Wrestling",
            "Football"
        ]
    },
    {
        "name": "Kevin",
        "sports": [
            "Wrestling",
            "Football"
        ]
    },
    {
        "name": "Eva",
        "sports": [
            "Boxing",
            "Football"
        ]
    }
]

Using $all as shown below will return only the first two documents:

db.users.find({
    sports: {
        $all: ["Wrestling", "Football"]
    }
})

Using $in will return all three documents:

db.users.find({
    sports: {
        $in: ["Wrestling", "Football"]
    }
})

The $all operator is stricter than the $in operator. $all is comparable to an AND conditional, and likewise $in resembles an OR conditional. That is to say, $all retrieves documents that satisfy all conditions in the query array, whereas $in retrieves documents that meet any condition in the query array.
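The AND/OR analogy can be checked with a small in-memory model of the three documents above. This is plain JavaScript rather than a MongoDB query; every and some stand in for $all and $in respectively.

```javascript
const users = [
  { name: "Youssef", sports: ["Boxing", "Wrestling", "Football"] },
  { name: "Kevin", sports: ["Wrestling", "Football"] },
  { name: "Eva", sports: ["Boxing", "Football"] },
];
const wanted = ["Wrestling", "Football"];

// $all: every wanted value must be present in the array (AND)
const matchAll = users.filter((u) => wanted.every((s) => u.sports.includes(s)));

// $in: at least one wanted value must be present in the array (OR)
const matchIn = users.filter((u) => wanted.some((s) => u.sports.includes(s)));

console.log(matchAll.map((u) => u.name)); // [ 'Youssef', 'Kevin' ]
console.log(matchIn.map((u) => u.name));  // [ 'Youssef', 'Kevin', 'Eva' ]
```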

11. What is the Aggregation Framework in MongoDB?

  • The aggregation framework is a set of analytics tools within MongoDB that allow you to do analytics on documents in one or more collections.
  • The aggregation framework is based on the concept of a pipeline. With an aggregation pipeline, we take input from a MongoDB collection and pass the documents from that collection through one or more stages, each of which performs a different operation on its inputs. Each stage takes as input whatever the stage before it produced as output. The inputs and outputs for all stages are streams of documents.

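The pipeline idea, where each stage consumes the previous stage's output, can be sketched as function composition in plain JavaScript. The match, project, and runPipeline helpers and the orders data are hypothetical; they only imitate what $match and $project stages do to a stream of documents.

```javascript
// Hypothetical stage helpers: each returns a function from documents to documents.
const match = (pred) => (docs) => docs.filter(pred);
const project = (fn) => (docs) => docs.map(fn);

// Feed the output of each stage into the next one, in order.
const runPipeline = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

const orders = [
  { item: "pen", qty: 5, price: 2 },
  { item: "book", qty: 1, price: 12 },
  { item: "ink", qty: 0, price: 7 },
];

const result = runPipeline(orders, [
  match((d) => d.qty > 0),                                    // like $match
  project((d) => ({ item: d.item, total: d.qty * d.price })), // like $project
]);
console.log(result);
```

Putting the filtering stage first means later stages process fewer documents, which is also the usual advice for real pipelines.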
12. How does MongoDB support ACID transactions and locking functionalities?

ACID means that any update is:

  • Atomic: it either fully completes or it does not
  • Consistent: no reader will see a "partially applied" update
  • Isolated: no reader will see a "dirty" read
  • Durable: once acknowledged, it survives failures (with the appropriate write concern)

MongoDB has always supported ACID transactions on a single document and, when leveraging the document model appropriately, many applications don't need ACID guarantees across multiple documents.

MongoDB is a document-based NoSQL database with a flexible schema. Transactions should not be used for every write operation, since they incur a greater performance cost than single-document writes. With a document-based structure and a denormalized data model, the need for transactions is minimized: because MongoDB allows document embedding, related data can often be updated in a single atomic write without a transaction.

MongoDB version 4.0 provides multi-document transaction support for replica set deployments only; version 4.2 extends that support to sharded deployments.

Example: Multi-Document ACID Transactions in MongoDB

A multi-document transaction groups several operations that must be applied together, all or nothing, and in isolation from other operations. For example, the transaction below adds one user and updates another user with an age field.

session = db.getMongo().startSession()
session.startTransaction()

   // run the operations through the session so that they join the transaction
   users = session.getDatabase(db.getName()).users
   users.insertOne({_id: 6, name: "John"})
   users.updateOne({_id: 3}, {$set: {age: 26}})

session.commitTransaction()

Transactions can be applied to operations against multiple documents contained in one or many collections or databases. The transaction feature does not impact performance for workloads that do not require it. Until the transaction is committed, uncommitted writes are neither replicated to the secondary nodes nor readable outside the transaction.

13. Assume there is a collection named users that looks like the one below. How can you get all houses in the “Rabia” neighborhood?

[
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c388"),
        "userName" : "kevin",
        "email" : "[email protected]",
        "password" : "affdsg342",
        "houses" : [
            { "name" : "Big Villa", "neighborhood" : "Zew Ine" },
            { "name" : "Small Villa", "neighborhood" : "Rabia" }
        ]
    },
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c387"),
        "userName" : "sherif",
        "email" : "[email protected]",
        "password" : "67834783ujk",
        "houses" : [
            { "name" : "New Mansion", "neighborhood" : "Nasr City" },
            { "name" : "Old Villa", "neighborhood" : "Rabia" }
        ]
    }
]

Use the $filter aggregation operator. The query is:

db.users.aggregate([
    { $match: { 'houses.neighborhood': 'Rabia' } },
    {
        $project: {
            filteredHouses: {   // This is just an alias 
                $filter: {
                    input: '$houses', // The field name we are checking
                    as: 'houseAlias', // just an alias
                    cond: { $eq: ['$$houseAlias.neighborhood', 'Rabia'] }
                }
            },
            _id: 0
        }
    }
])

The first stage in the pipeline, {$match: {'houses.neighborhood': 'Rabia'}}, returns all documents that have at least one house in the “Rabia” neighborhood. Here that is the whole collection, because both users have one house in that neighborhood.

This is the return for the first stage in the pipeline:

[
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c388"),
        "userName" : "kevin",
        "email" : "[email protected]",
        "password" : "affdsg342",
        "houses" : [
            {
                "name" : "Big Villa",
                "neighborhood" : "Zew Ine"
            },
            {
                "name" : "Small Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c387"),
        "userName" : "sherif",
        "email" : "[email protected]",
        "password" : "67834783ujk",
        "houses" : [
            {
                "name" : "New Mansion",
                "neighborhood" : "Nasr City"
            },
            {
                "name" : "Old Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },
]

We do not want to display other user details nor display houses other than those in Rabia, so we will use the $filter operator inside the $project operator:

{
    $project: {
        filteredHouses: {   // This is just an alias 
            $filter: {
                input: '$houses', // The field name we check
                as: 'houseAlias', // just an alias
                cond: { $eq: ['$$houseAlias.neighborhood', 'Rabia'] }
            }
        },
        _id: 0
    }
}

The $$ prefix is required on houseAlias because it refers to the variable defined by as, whereas a single $ would refer to a field of the document.

Here is the result we obtain at the end of the pipeline:

[
    {
        "filteredHouses" : [
            {
                "name" : "Old Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },
    {
        "filteredHouses" : [
            {
                "name" : "Small Villa",
                "neighborhood" : "Rabia"
            }
        ]
    }
]

14. How can we sort using a user-defined function? For example, if x and y are integers, how do we sort by calculating “x-y”?

We pass sort a comparator function that calculates x-y. The following code runs the sort on the server (note that db.eval was deprecated in MongoDB 3.0 and removed in MongoDB 4.2):

db.eval(function() {
    return db.scratch.find().toArray().sort(function(doc1, doc2) {
        return doc1.a - doc2.a
    })
});

Versus the equivalent client-side sort:

db.scratch.find().toArray().sort(function(doc1, doc2) {
    return doc1.a - doc2.a
});

With the aggregation pipeline, it is also possible to compute a value in a $project or $addFields stage and then sort on it with the $sort stage.

15. What is the difference between the save and insert commands in MongoDB, and when do they act similarly?

Whether we provide an _id determines the expected result for both of these commands. Here is the expected outcome for each case.

  • save command while providing an _id: In this case, the newly provided document replaces the document found with a matching _id.
  • save command while not providing an _id: Inserts a new document.
  • insert command while providing an _id: Inserts a new document, or gives an E11000 duplicate key error listing the collection, index, and duplicate key if a document with that _id already exists.
  • insert command while not providing an _id: Inserts a new document.

As you can see, the insert and save commands are guaranteed to act the same way only when we do not provide an _id.

For example, the commands below would give us the same result:

db.cars.save({motor:"6-cylinder",color:"black"})
db.cars.insert({motor:"6-cylinder",color:"black"})
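The four cases can be modelled with a small in-memory class. This is a hypothetical sketch, not a MongoDB API: the Collection class keys documents by _id in a Map and generates numeric ids as a simplification, where MongoDB would generate an ObjectId.

```javascript
// Hypothetical in-memory collection keyed by _id.
class Collection {
  constructor() {
    this.docs = new Map();
    this.nextId = 1; // simplification: MongoDB would generate an ObjectId
  }
  insert(doc) {
    const _id = doc._id ?? this.nextId++;
    if (this.docs.has(_id)) throw new Error("E11000 duplicate key error");
    this.docs.set(_id, { ...doc, _id });
    return _id;
  }
  save(doc) {
    const _id = doc._id ?? this.nextId++;
    this.docs.set(_id, { ...doc, _id }); // replaces any existing document
    return _id;
  }
}

const cars = new Collection();
cars.insert({ _id: "a", color: "black" });
cars.save({ _id: "a", color: "red" }); // same _id: save replaces the document

let duplicateRejected = false;
try {
  cars.insert({ _id: "a", color: "blue" }); // same _id: insert is rejected
} catch (e) {
  duplicateRejected = true;
}

cars.insert({ motor: "6-cylinder" }); // no _id: both simply add a document
cars.save({ motor: "6-cylinder" });
console.log(cars.docs.size, cars.docs.get("a").color, duplicateRejected);
```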

16. When a “moveChunk” fails, is it required to clean up partly moved docs?

No, it is not required to clean up the partly moved docs because chunk moves are deterministic and consistent. The move will be retried, and when it finishes, the data will be only on the new shard.

17. Assume there is a document with nested arrays that looks like the one below. How can you insert a “room” that has the name “Room 44” and size of “50” for a particular “house” that belongs to this user?

{
    "_id": "682263",
    "userName" : "sherif",
    "email": "[email protected]",
    "password": "67834783ujk",
    "houses": [
        {
            "_id": "2178123",
            "name": "New Mansion",
            "rooms": [
                { "name": "4th bedroom", "size": "12" },
                { "name": "kitchen", "size": "100" }
            ]
        }
    ]
}

We can do so with the following code, commented inline:

db.users.update(
    { 
        "_id": "682263",           // the user's _id is stored as a plain string
        "houses._id":"2178123"     // identify the id for the house that we want to update
    },
    { "$push":   
        {
            "houses.$.rooms":      // identify the array we want to push items into
                {                  
                    "name": "Room 44",      // this is the payload that needs to be pushed 
                    "size": "50"
                }
        }
    }
)

18. A staple feature of relational database systems is the JOIN clause. What is the equivalent in MongoDB, and does it have any known limitations?

The $lookup aggregation stage is the equivalent of JOIN.

Here is an example that chains two lookups in MongoDB.

Assume we have three collections (authors, authorInfo, and userRole) with the following data:

// authors collection

[
    {
        "_id" : ObjectId("5d0127aaee66e13d34c7c389"),
        "address" : "32 Makram Ebeid Street",
        "isActive" : true,
        "authorId" : "121"
    }
]

// authorInfo collection

[
    {
        "_id" : ObjectId("5d0f726bac65f929d0fa98b2"),
        "authorId" : "121",
        "description" : "A description"
    }
]

// userRole collection

[
    {
        "_id" : ObjectId("5d012a08ee66e13d34c7c38f"),
        "userId" : "121",
        "role" : "manager"
    }
]

What if we want to join the authors from all three collections? In the SQL world, a JOIN query for this might look like:

SELECT a._id, a.address, b.description, c.role
  FROM authors a
  INNER JOIN "authorInfo" b ON b."authorId" = a."authorId"
  INNER JOIN "userRole" c ON c."userId" = a."authorId"

But in MongoDB, here is the equivalent query:

db.authors.aggregate([

    // Join with authorInfo collection
    {
        $lookup:{
            from: "authorInfo",       // connecting authorInfo collection
            localField: "authorId",   // name of field in the authors collection
            foreignField: "authorId", // name of field in the authorInfo collection
            as: "authorInfoAlias"     // any alias
        }
    },
    {   $unwind:"$authorInfoAlias" }, // use the alias here
    
    // Join with userRole collection
    {
        $lookup:{
            from: "userRole", 
            localField: "authorId", 
            foreignField: "userId",
            as: "authorRoleAlias"
        }
    },
    {   $unwind:"$authorRoleAlias" },
    {   
        $project: {                                          // Just projecting our data.
            _id : 1,
            address : 1,
            description : "$authorInfoAlias.description",
            role : "$authorRoleAlias.role",
        } 
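    }
])

The $ prefix marks a field path, so "$authorInfoAlias.description" refers to the description field inside the joined authorInfoAlias document.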

The result of this query is the following:

[
    {
        "_id" : ObjectId("5d0127aaee66e13d34c7c389"),
        "address" : "32 Makram Ebeid Street",
        "description" : "A description",
        "role" : "manager"
    }
]

A known limitation of $lookup concerns sharding: in versions before MongoDB 5.1, the collection named in from cannot be sharded.

It’s worth noting that, instead of looking for a direct equivalent to JOIN, a more common approach among MongoDB developers is to simply denormalize the data, precluding the need for a JOIN equivalent.
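For intuition, the $lookup plus $unwind combination used above behaves like an inner join, which can be modelled in plain JavaScript. The lookupUnwind helper is hypothetical and the _id values are shortened placeholders; it only mirrors the matching logic, not MongoDB's execution.

```javascript
const authors = [{ _id: "a1", address: "32 Makram Ebeid Street", authorId: "121" }];
const authorInfo = [{ authorId: "121", description: "A description" }];
const userRole = [{ userId: "121", role: "manager" }];

// Like $lookup followed by $unwind: one output document per matching
// foreign document, and no output when nothing matches.
function lookupUnwind(docs, from, localField, foreignField, as) {
  return docs.flatMap((doc) =>
    from
      .filter((f) => f[foreignField] === doc[localField])
      .map((f) => ({ ...doc, [as]: f }))
  );
}

const withInfo = lookupUnwind(authors, authorInfo, "authorId", "authorId", "authorInfoAlias");
const withRole = lookupUnwind(withInfo, userRole, "authorId", "userId", "authorRoleAlias");

// Like the final $project stage
const joined = withRole.map((d) => ({
  _id: d._id,
  address: d.address,
  description: d.authorInfoAlias.description,
  role: d.authorRoleAlias.role,
}));
console.log(joined);
```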

19. Explain the Replication Architecture in MongoDB?

Consider a simple replica set cluster with only three server nodes: one primary node and two secondary nodes.

The PRIMARY database is the only active replica set member that receives write operations from database clients. The PRIMARY database saves data changes in the Oplog. Changes saved in the Oplog are sequential, that is, saved in the order that they are received and executed.

The SECONDARY database queries the PRIMARY database for new changes in the Oplog. If there are any changes, the Oplog entries are copied from PRIMARY to SECONDARY as soon as they are created on the PRIMARY node. Then, the SECONDARY database applies the changes from the Oplog to its own datafiles. Oplog entries are applied in the same order they were inserted in the log. As a result, datafiles on SECONDARY are kept in sync with changes on PRIMARY.

Usually, SECONDARY databases copy data changes directly from PRIMARY. Sometimes a SECONDARY database can replicate data from another SECONDARY. This type of replication is called Chained Replication because it is a two-step replication process. Chained replication is useful in certain replication topologies, and it is enabled by default in MongoDB.
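The flow described above, including chained replication, can be imitated with a toy model. The Member class and its methods are hypothetical and greatly simplified: each member keeps an ordered oplog, and a secondary replays the entries it has not applied yet from whichever member it syncs from.

```javascript
// Hypothetical toy model of oplog-based replication.
class Member {
  constructor() {
    this.data = {};
    this.oplog = [];
    this.applied = 0; // number of entries already copied from the sync source
  }
  apply(op) {
    this.data[op.key] = op.value;
  }
  write(key, value) { // only called on the primary
    const op = { key, value };
    this.apply(op);
    this.oplog.push(op);
  }
  syncFrom(source) { // called on a secondary
    for (const op of source.oplog.slice(this.applied)) {
      this.apply(op);      // applied in the order they were logged
      this.oplog.push(op); // keeps its own oplog, so others can sync from it
      this.applied++;
    }
  }
}

const primary = new Member();
const secondary = new Member();
const chained = new Member();

primary.write("x", 1);
primary.write("x", 2);
secondary.syncFrom(primary);
chained.syncFrom(secondary); // chained replication: secondary to secondary
primary.write("y", 3);
secondary.syncFrom(primary); // only the new entry is copied

console.log(secondary.data, chained.data);
```

The chained member lags behind until its next sync, which matches the asynchronous nature of the real mechanism.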

20. Could you catch how the two queries are different?

dealers.find({ "$and": [ { "length": { "$gt": 2000 } }, { "cars.weight": { "$gte": 800 } } ] });

and

dealers.find({ "length": { "$gt": 2000 }, "cars.weight": { "$gte": 800 } });

Actually, they return exactly the same results. MongoDB implicitly applies the $and operator to comma-separated conditions in a query document. Which one to use is more a matter of preference than of best practice.

21. When is the SET Modifier used in MongoDB?

The $set modifier sets the value of a field, and creates the field if it does not yet exist. This makes it useful for updating schemas or adding user-defined keys.

Example:

> db.users.findOne()
{
    "_id" : ObjectId("4b253b067525f35f94b60a31"),
    "name" : "alice",
    "age" : 23,
    "sex" : "female",
    "location" : "India"
}

To add a field to this, we use “$set”:

> db.users.updateOne({"_id" : ObjectId("4b253b067525f35f94b60a31")},
... {"$set" : {"favorite book" : "Start with Why"}})

22. How does MongoDB ensure high availability?

High Availability (HA) refers to the improvement of system and app availability by minimizing the downtime caused by routine maintenance operations (planned) and sudden system crashes (unplanned).

Replica Set

The replica set mechanism of MongoDB has two main purposes:

  • One is data redundancy for failure recovery. When the hardware fails, or the node is down for other reasons, you can use a replica for recovery.
  • The other is read-write splitting. Read requests can be routed to the replicas to reduce the read pressure on the primary node.

MongoDB automatically maintains replica sets, multiple copies of data that are distributed across servers, racks and data centers. Replica sets help prevent database downtime using native replication and automatic failover.

A replica set consists of multiple replica set members. At any given time, one member acts as the primary member, and the other members act as secondary members. If the primary member fails for any reason (e.g., hardware failure), one of the secondary members is automatically elected to primary and begins to process all reads and writes.

23. What are some utilities for backup and restore in MongoDB?

The mongo shell does not include functions for exporting, importing, backup, or restore. However, MongoDB provides several command-line utilities for getting data in or out of the database in bulk, so that no scripting work or complex GUIs are needed. These utilities are:

  • mongoimport
  • mongoexport
  • mongodump
  • mongorestore