MongoDB Comprehensive Guide | Generated by AI
MongoDB is a popular open-source NoSQL document database designed for modern application development. It stores data in flexible, JSON-like documents called BSON (Binary JSON), making it ideal for handling unstructured or semi-structured data. Unlike traditional relational databases, MongoDB uses a schema-less approach, allowing collections (groups of documents) to have varying structures. It’s highly scalable, supports horizontal scaling via sharding, and provides high availability through replication. MongoDB can be deployed on-premises, in the cloud via MongoDB Atlas (a managed service), or in hybrid environments. This guide covers everything from basics to advanced topics, with examples using the MongoDB Shell (mongosh).
Introduction
MongoDB excels in scenarios requiring rapid development, flexible data models, and high performance. Key features include:
- Document Model: Data as self-contained documents with nested structures.
- Query Language: Rich queries using a syntax similar to JavaScript objects.
- Scalability: Built-in support for distributed systems.
- Ecosystem: Integrates with languages like Python, Node.js, Java via official drivers.
It’s used by companies like Adobe, eBay, and Forbes for applications involving big data, real-time analytics, and content management.
Installation
MongoDB offers Community (free, open-source) and Enterprise editions. Installation varies by platform; always download from the official site for security.
Windows
- Download the MSI installer from the MongoDB Download Center.
- Run the installer, select “Complete” setup, and include MongoDB Compass (GUI tool).
- Add MongoDB’s
bin
directory (e.g.,C:\Program Files\MongoDB\Server\8.0\bin
) to your PATH. - Create a data directory:
mkdir -p C:\data\db
. - Start the server:
mongod.exe --dbpath C:\data\db
.
Supported: Windows 11, Server 2022/2019.
macOS
- Use Homebrew:
brew tap mongodb/brew && brew install mongodb-community
. - Or download the TGZ archive, extract, and add to PATH.
- Create data directory:
mkdir -p /data/db
. - Start:
mongod --dbpath /data/db
(or usebrew services start mongodb/brew/mongodb-community
).
Supported: macOS 11–14 (x86_64 and arm64).
Linux
- For Ubuntu/Debian: Add MongoDB repo key and list, then
apt-get install -y mongodb-org
. - For RHEL/CentOS: Use yum/dnf with the repo file.
- Create data directory:
sudo mkdir -p /data/db && sudo chown -R $USER /data/db
. - Start:
sudo systemctl start mongod
.
Supported: Ubuntu 24.04, RHEL 9+, Debian 12, Amazon Linux 2023, etc. Use XFS/EXT4 filesystems; avoid 32-bit.
Cloud (MongoDB Atlas)
- Sign up at mongodb.com/atlas.
- Create a free cluster via UI or CLI:
atlas clusters create <name> --provider AWS --region us-east-1 --tier M0
. - Whitelist your IP:
atlas network-access create <IP>
. - Get connection string and connect:
mongosh "mongodb+srv://<user>:<pass>@cluster0.abcde.mongodb.net/"
.
Atlas handles backups, scaling, and monitoring automatically.
Core Concepts
Databases
Containers for collections, logically separating data. Create implicitly on first use: use mydb
. Switch with use mydb
. List: show dbs
.
Collections
Groups of documents, like tables but schema-flexible. Create implicitly: db.mycollection.insertOne({})
. List: show collections
.
Documents
Basic units: BSON objects with key-value pairs. Example:
{ "_id": ObjectId("..."), "name": "John", "age": 30, "address": { "city": "NYC", "zip": 10001 } }
Supports arrays, nested objects, and types like dates, binaries.
BSON
Binary format for efficient storage/networking. Extends JSON with types like ObjectId, Date, Binary.
Namespaces
Unique identifiers: database.collection
(e.g., mydb.orders
).
Example setup:
use test
db.orders.insertMany([
{ item: "almonds", price: 12, quantity: 2 },
{ item: "pecans", price: 20, quantity: 1 }
])
CRUD Operations
Use db.collection.method()
in mongosh. Transactions via sessions for multi-document ACID.
Create (Insert)
- Single:
db.users.insertOne({ name: "Alice", email: "alice@example.com" })
- Multiple:
db.users.insertMany([{ name: "Bob" }, { name: "Charlie" }])
Returns inserted IDs.
Read (Find)
- All:
db.users.find()
- Filtered:
db.users.find({ age: { $gt: 25 } })
- Pretty print:
.pretty()
- Limit/sort:
db.users.find().limit(5).sort({ age: -1 })
Update
- Single:
db.users.updateOne({ name: "Alice" }, { $set: { age: 31 } })
- Multiple:
db.users.updateMany({ age: { $lt: 20 } }, { $set: { status: "minor" } })
- Increment:
{ $inc: { score: 10 } }
Delete
- Single:
db.users.deleteOne({ name: "Bob" })
- Multiple:
db.users.deleteMany({ status: "inactive" })
- Drop collection:
db.users.drop()
Querying and Indexing
Querying
Use predicates for conditions. Supports equality, ranges, logical ops.
- Basic:
db.inventory.find({ status: "A" })
(SQL equiv:WHERE status = 'A'
) - $in:
db.inventory.find({ status: { $in: ["A", "D"] } })
- $lt/$gt:
db.inventory.find({ qty: { $lt: 30 } })
- $or:
db.inventory.find({ $or: [{ status: "A" }, { qty: { $lt: 30 } }] })
- Regex:
db.inventory.find({ item: /^p/ })
(starts with “p”) - Embedded:
db.users.find({ "address.city": "NYC" })
Projection (select fields): db.users.find({ age: { $gt: 25 } }, { name: 1, _id: 0 })
Indexing
Improves query speed by avoiding full scans. B-tree based.
- Types: Single-field (
db.users.createIndex({ name: 1 })
), Compound ({ name: 1, age: -1 }
), Unique ({ email: 1 }
). - Benefits: Faster equality/range queries, sorted results.
- Creation:
db.users.createIndex({ age: 1 })
(ascending). - View:
db.users.getIndexes()
- Drop:
db.users.dropIndex("age_1")
Use Atlas Performance Advisor for recommendations. Trade-off: Slower writes.
Aggregation Framework
Processes data through stages in a pipeline. Like SQL GROUP BY but more powerful.
- Basic:
db.orders.aggregate([ { $match: { price: { $lt: 15 } } } ])
- Stages:
$match
(filter),$group
(aggregate:{ $sum: "$price" }
),$sort
,$lookup
(join:{ from: "inventory", localField: "item", foreignField: "sku", as: "stock" }
),$project
(reshape). - Example (join and sort):
db.orders.aggregate([ { $match: { price: { $lt: 15 } } }, { $lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "inventory_docs" } }, { $sort: { price: 1 } } ])
Expressions:
{ $add: [ "$price", 10 ] }
. Preview in Atlas UI.
Schema Design
MongoDB’s flexibility avoids rigid schemas but requires thoughtful design for performance.
- Principles: Model for access patterns (reads/writes), use indexes, keep working set in RAM.
- Embedding: Denormalize related data in one document for atomic reads/writes. E.g., embed comments in posts. Pros: Fast queries. Cons: Duplication, large docs.
- Referencing: Normalize with IDs. E.g.,
posts
refsusers
viauserId
. Use$lookup
for joins. Pros: Less duplication. Cons: Multiple queries. - Patterns: One-to-few (embed), one-to-many (reference or embed array), many-to-many (reference).
- Validation: Enforce with
db.createCollection("users", { validator: { $jsonSchema: { ... } } })
.
Consider duplication trade-offs and atomicity (document-level only).
Replication and Sharding
Replication
Provides redundancy/high availability via replica sets (group of mongod
instances).
- Components: Primary (writes), Secondaries (replicate via oplog, reads optional), Arbiter (votes, no data).
- Deployment: Init with
rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "host1:27017" }] })
. Add members:rs.add("host2:27017")
. - Elections: If primary fails, secondary elects in ~10-12s.
- Read Preference:
primary
,secondary
(may lag). - Use for failover, backups. Enable flow control to manage lag.
Sharding
Horizontal scaling: Distribute data across shards.
- Components: Shards (replica sets), Mongos (routers), Config servers (metadata).
- Shard Key: Field(s) for partitioning (e.g., hashed for even distribution). Create index first.
- Setup: Enable sharding
sh.enableSharding("mydb")
, shard collectionsh.shardCollection("mydb.users", { userId: "hashed" })
. - Balancer: Migrates chunks for even load. Zones for geo-locality.
- Strategies: Hashed (uniform), Ranged (targeted queries).
Connect via mongos; supports transactions.
Security
Secure deployments with layered protections.
- Authentication: SCRAM, LDAP, OIDC, X.509. Create users:
db.createUser({ user: "admin", pwd: "pass", roles: ["root"] })
. - Authorization: Role-Based Access Control (RBAC). Built-in roles: read, readWrite, dbAdmin.
- Encryption: TLS/SSL for transit, Encryption at Rest (EAR) via AWS KMS/Google Cloud KMS/Azure Key Vault. Client-Side Field Level Encryption (CSFLE) for sensitive fields.
- Network: IP access lists, VPC peering in Atlas.
- Auditing: Log operations.
Enable auth on startup: --auth
. Use Atlas for built-in security.
Best Practices
- Production Setup: Run as service (systemctl/brew). Separate data/journal/logs on SSDs. Use WiredTiger engine (default).
- Monitoring:
mongostat
,mongotop
, Atlas charts. Watch connections (connPoolStats
), cache evictions, I/O (iostat
). - Backups:
mongodump
/mongorestore
, or Atlas snapshots. Logical (JSON) vs physical (files). - Performance: Index wisely, limit projections, avoid large arrays. Set
ulimit -n 64000
, disable swap (vm.swappiness=0
), tune TCP keepalive. - Scaling: Start with replica sets; shard at >100GB or high throughput.
- Data Import:
mongoimport --db test --collection users --file users.json
. - Tools: MongoDB Compass (GUI), mongosh (shell), drivers for apps.
- Common Pitfalls: Over-embedding (doc size limit 16MB), ignoring indexes, not handling replication lag.
For development, use Atlas M0 free tier. Test workloads with mongoperf
.
References
- MongoDB Manual
- MongoDB Installation
- MongoDB Core Concepts
- MongoDB CRUD
- MongoDB Querying
- MongoDB Indexes
- MongoDB Aggregation
- MongoDB Schema Design
- MongoDB Replication
- MongoDB Sharding
- MongoDB Security
- MongoDB Production Notes