Before looking at the Aggregation Framework, it helps to understand what MongoDB is, so let’s take a quick look at MongoDB.
MongoDB is a NoSQL document database that stores data in BSON (Binary JSON) format. Its Community Edition is free to use and its source code is publicly available. It is designed to provide high performance, scalability, and ease of use for modern web applications. MongoDB is known for its flexible document-oriented data model and its consistency and reliability, making it a popular choice for businesses of all sizes.
MongoDB allows developers to store and retrieve data in a flexible and scalable way. One of the key features of MongoDB is its support for aggregation (which we are going to talk about in this article), which enables users to process large amounts of data and return computed results.
What is an aggregation framework?
The Aggregation Framework in MongoDB is a pipeline-based mechanism for processing and transforming data stored in the database. It provides a way to perform operations on collections of documents, such as grouping, filtering, transforming, and summarizing data, to produce an aggregated result. The Aggregation Framework supports a wide range of operations and can be used to perform complex data processing and analytics tasks in an efficient and scalable manner. The Aggregation Framework is a crucial feature of MongoDB and is commonly used for business intelligence and data analysis.
Aggregation in MongoDB can be performed using the aggregate() method, which takes an array of aggregation pipeline stages as its argument. Each stage in the pipeline transforms the data in some way, allowing users to perform operations such as filtering, grouping, and summarizing data. The results of each stage are passed to the next stage in the pipeline until the final result is computed.
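Because each stage only sees what the previous stage produced, the order of stages changes the result. The sketch below is a pure-Python mental model of that hand-off, not how MongoDB executes pipelines; the function names and sample documents are made up for illustration.

```python
from functools import reduce

# Hypothetical sample documents standing in for a collection.
scores = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 9},
    {"name": "c", "score": 5},
]

def sort_desc(documents):
    """Sorting stage: highest score first."""
    return sorted(documents, key=lambda d: d["score"], reverse=True)

def first_two(documents):
    """Limiting stage: keep only the first two documents it receives."""
    return documents[:2]

def run_stages(documents, stages):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda current, stage: stage(current), stages, documents)

# Same stages, different order, different result.
sorted_then_limited = run_stages(scores, [sort_desc, first_two])
limited_then_sorted = run_stages(scores, [first_two, sort_desc])

print([d["name"] for d in sorted_then_limited])
print([d["name"] for d in limited_then_sorted])
```

Sorting first and then limiting gives the two highest scores overall, while limiting first only sorts whichever two documents happened to arrive first.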
Some common use cases for MongoDB aggregation include:
- Data analysis: You can use aggregation to summarize large amounts of data and identify trends and patterns.
- Reporting: You can use aggregation to produce reports that summarize data in various ways.
- Data cleaning: You can use aggregation to clean and transform stored data, for example before writing it to another collection.
To get started with MongoDB aggregation, you’ll need to have a basic understanding of the aggregation pipeline stages, including $match, $group, $sort, $skip, and $limit. These stages allow you to filter and manipulate data in a variety of ways, and you can combine them to perform complex aggregations.
$match: Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
$group: The $group stage separates documents into groups according to a “group key”. The output is one document for each unique group key.
$sort: Sorts all input documents and returns them to the pipeline in sorted order.
$skip: Skips over the specified number of documents that pass into the stage and passes the remaining documents to the next stage in the pipeline.
$limit: Limits the number of documents passed to the next stage in the pipeline.
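To make these five descriptions concrete, here is a small pure-Python imitation that applies simplified versions of the stages to a list of dictionaries. It is only a sketch: it supports equality-only $match, a single "$field" group key with $sum accumulators, and single-field $sort. The helper names apply_stage and run_pipeline and the sample data are invented for illustration.

```python
def apply_stage(documents, stage):
    """Apply one simplified stage (a single-key dict) to a list of dicts."""
    name, spec = next(iter(stage.items()))
    if name == "$match":
        # Equality conditions only, e.g. {"region": "North"}.
        return [d for d in documents
                if all(d.get(k) == v for k, v in spec.items())]
    if name == "$group":
        key_field = spec["_id"][1:]            # "$region" -> "region"
        groups = {}
        for d in documents:
            out = groups.setdefault(d[key_field], {"_id": d[key_field]})
            for out_field, acc in spec.items():
                if out_field == "_id":
                    continue
                source = acc["$sum"][1:]       # "$sales" -> "sales"
                out[out_field] = out.get(out_field, 0) + d[source]
        return list(groups.values())           # one document per group key
    if name == "$sort":
        field, direction = next(iter(spec.items()))
        return sorted(documents, key=lambda d: d[field],
                      reverse=(direction == -1))
    if name == "$skip":
        return documents[spec:]
    if name == "$limit":
        return documents[:spec]
    raise ValueError(f"unsupported stage: {name}")

def run_pipeline(documents, pipeline):
    """Pass the documents through every stage in order."""
    for stage in pipeline:
        documents = apply_stage(documents, stage)
    return documents

# Hypothetical sample data.
sales = [
    {"region": "North", "sales": 100},
    {"region": "South", "sales": 50},
    {"region": "North", "sales": 25},
    {"region": "East", "sales": 70},
]

demo_pipeline = [
    {"$group": {"_id": "$region", "total_sales": {"$sum": "$sales"}}},
    {"$sort": {"total_sales": -1}},
    {"$skip": 1},
    {"$limit": 2},
]

top = run_pipeline(sales, demo_pipeline)
print(top)
```

Here grouping yields three documents (North, East, South), sorting puts North first, $skip drops it, and $limit keeps the remaining two.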
    import pymongo

    # Connect to the MongoDB database
    client = pymongo.MongoClient("mongodb://localhost:27017/")
    db = client["testdb"]
    collection = db["sales"]

    # Define the aggregation pipeline
    pipeline = [
        {"$match": {"region": "North"}},
        {"$group": {"_id": "$region", "total_sales": {"$sum": "$sales"}}},
        {"$sort": {"total_sales": -1}},
        {"$skip": 5},
        {"$limit": 10},
    ]

    # Execute the aggregation pipeline
    result = list(collection.aggregate(pipeline))

    # Print the result
    for r in result:
        print(r)
In the above code, we first connect to the MongoDB database using the PyMongo library. Then, we define the aggregation pipeline, which includes the $match, $group, $sort, $skip, and $limit stages. Finally, we execute the aggregation pipeline using the aggregate() method on the collection, and the result is stored in a list.
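The intent of this code is to return up to 10 grouped totals, sorted by total sales in descending order, after skipping the first 5. As written, though, the result will be empty: $match keeps only the “North” region, so $group produces a single document, and $skip then discards it. The $skip and $limit stages become useful when grouping produces many documents, for example when grouping by a field with many distinct values.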
Here’s what the code does:
- The $match stage filters the documents in the collection on one condition: region must be “North”.
- The $group stage groups the filtered documents by region and calculates the total sales for each group using the $sum operator.
- The $sort stage sorts the grouped documents by the total_sales field in descending order.
- The $skip stage skips the first 5 documents.
- The $limit stage limits the result to at most 10 documents.
The final result of the code is a list of dictionaries, with each dictionary representing a document in the result. The dictionary contains two fields: _id and total_sales. The _id field represents the region field, and the total_sales field represents the total sales for each region.
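Since the example groups by the same field it filters on, a variation that groups by another field gives $skip and $limit more to do. The sketch below assumes each sales document also has a product field, which is not part of the original example, and uses a hypothetical helper named build_sales_pipeline. It only builds the pipeline list, so it can be inspected without a running database.

```python
def build_sales_pipeline(region, page, page_size):
    """Build a pipeline ranking products by total sales within one region.

    "product" is an assumed field name; adjust it to the real schema.
    `page` is zero-based.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return [
        {"$match": {"region": region}},
        {"$group": {"_id": "$product", "total_sales": {"$sum": "$sales"}}},
        # Tie-break on _id so that paging stays stable between calls.
        {"$sort": {"total_sales": -1, "_id": 1}},
        {"$skip": page * page_size},
        {"$limit": page_size},
    ]

paged_pipeline = build_sales_pipeline("North", page=1, page_size=10)
for stage in paged_pipeline:
    print(stage)
```

The resulting list can be passed to collection.aggregate() in the same way as the pipeline in the earlier code.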
Apart from the above operators, the aggregation framework also has:
- $project: Reshapes the documents in the pipeline by adding, removing, or modifying fields.
- $unwind: Flattens an array field in a document into multiple documents, one for each array item.
- $lookup: Performs a left outer join between two collections, adding data from a related collection to documents in the pipeline.
- $redact: Removes data from documents based on conditions expressed in a security-redaction expression.
- $out: Writes the results of the pipeline to a collection, creating it if it does not exist and replacing its contents if it does. It must be the last stage in the pipeline.
- $facet: Performs multiple aggregations within a single pipeline and returns a single document in which each field holds the array of results from one sub-pipeline.
- $bucket: Groups values into “buckets” based on specified boundaries.
- $geoNear: Finds the documents in a collection closest to a specified location, ordered from nearest to farthest.
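As a rough sketch of how a few of these stages combine, the pipeline below joins orders to customers with $lookup, flattens arrays with $unwind, and reshapes the output with $project. The collection name customers and the fields customer_id, items, name, price, and quantity are all assumptions made for this illustration. A tiny pure-Python unwind function is included only to show what $unwind does to an array field.

```python
# Hypothetical pipeline, intended to run against an assumed "orders" collection.
order_report_pipeline = [
    {"$lookup": {
        "from": "customers",          # assumed related collection
        "localField": "customer_id",  # assumed field on each order
        "foreignField": "_id",
        "as": "customer",
    }},
    # $lookup stores its matches in an array, so flatten it to one customer.
    {"$unwind": "$customer"},
    # One output document per item in the order.
    {"$unwind": "$items"},
    {"$project": {
        "_id": 0,
        "customer_name": "$customer.name",
        "item": "$items.name",
        "line_total": {"$multiply": ["$items.price", "$items.quantity"]},
    }},
]

def unwind(documents, field):
    """Pure-Python imitation of $unwind for a top-level array field.

    Like the default $unwind behaviour, documents whose array is missing,
    null, or empty produce no output.
    """
    out = []
    for d in documents:
        for item in d.get(field) or []:
            out.append({**d, field: item})
    return out

orders = [
    {"_id": 1, "items": ["pen", "ink"]},
    {"_id": 2, "items": []},
]
flattened = unwind(orders, "items")
print(flattened)
```

With a PyMongo database handle such as the db object from the earlier code, this pipeline could be run as db["orders"].aggregate(order_report_pipeline), provided the assumed collections and fields exist.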
You can read more about these stages in the official MongoDB documentation on aggregation pipeline stages.
In conclusion, the MongoDB aggregation framework is a powerful tool for processing and transforming data stored in MongoDB collections. It provides a wide range of stages and operators for filtering, grouping, calculating, and reshaping data. Whether you need simple operations like filtering and sorting, or more complex tasks like performing left outer joins, aggregating data into buckets, or writing the results to a new collection, the aggregation framework provides the tools you need to get the job done.
And that’s it! Happy Coding!! 😎