Skip to main content

Joins in NoSQL world

                            : Joins in NoSQL world:

Here in NoSQL world, we have $lookup aggregate function for performing joins and it performs left outer equi-join functionality. This aggregate function is only available from MongoDB 3.2 versions only. The Left outer equi-join produces a result set that contains data for all documents from the left table (collection) together with data from the right table (collection) for documents where there is a match with documents from the left table (collection). See the diagram below                    

$lookup – Left Outer Equi-Joins

Below diagram illustrates the syntax for performing the join:

1)   leftCollection is the collection that the aggregation is being performed on and is the left collection in the join

2)   from identifies the collection that it will be joined with – the right collection (rightCollection in this case)

3)   localField specifies the key from the original/left collection – leftVal

4)   foreignField specifies the key from the right collection – rightVal

5)   as indicates that the data from the right collection should be embedded within the resulting documents as an array called embeddedData

MongoDB's Aggregation Framework:

The Aggregation Framework is a pipeline for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results. The pipeline consists of stages; each stage transforms the documents as they pass through.

In general, each successive stage reduces the volume of data; removing information that is not needed and combining other data to produce summarized results.

Below diagram shows a conceptual model for the Aggregation Framework pipeline. This is what is happening at each stage:

·       On the left-hand side/start of the pipeline is the original collection contents – each record (document) containing several shapes (keys), each with a particular color (value)

·       The $match stage filters out any documents that don't contain a red diamond

·       The $project stage adds a new “square” attribute with a value computed from the value (color) of the snowflake and triangle attributes

·       The $lookup stage (new in 3.2 - more details later) performs a left-outer join with another collection, with the star being the comparison key. This creates new documents which contain everything from the previous stage but augmented with data from any document from the second collection containing a matching-colored star (i.e., the blue and yellow stars had matching “lookup” values, whereas the red star had none).

·       Finally, the $group stage groups the data by the color of the square and produces statistics (sum, average and standard deviation) for each group.


This is the full set of aggregation stages:

·       $match – Filter documents

·       $geoNear – Sort documents based on geographic proximity

·       $project – Reshape documents (remove or rename keys or add new data based on calculations on the existing data)

·       $lookup – Available in 3.2 – Left-outer joins

·       $unwind – Expand documents (for example create multiple documents where each contains one element from an array from the original document)

·       $group – Summarize documents

·       $sample – Randomly selects a subset of documents

·       $sort – Order documents

·       $skip – Jump over several documents

·       $limit – Limit number of documents

·       $redact – Restrict sensitive content from documents

·       $out – Available in 3.2* – store the results in a new collection

New Aggregation Operators in MongoDB 3.2:

There are operators used within each stage and this set is being extended in MongoDB 3.2 to include:

1.    Array operations

-         $slice, $arrayElemAt, $concatArrays, $isArray, $filter, $min, $max, $avg and $sum (some of these were previously available in a $group stage but not in $project

2.    Standard Deviations

-         $stdDevSamp (based on a sample) and $stdDevPop (based on the complete population)

3.    Square Root

-         $sqrt

4.    Absolute (make +ve) value

-         $abs

5.    Rounding numbers

-         $trunc, $ceil, $floor

6.    Logarithms

-         $log, $log10, $ln

7.    Raise to power

-         $pow

8.    Natural Exponent

-         $exp

Join Operations:

SQL

NoSQL

where

$match

group by

$group

having

$match

Select

$project

order by

$sort

limit

$limit

sum

$sum

count()

$sum or $sortbycount

join

$lookup

select into new_table

$out

merge into table

$merge

union all

$unionwith

$lookup field information-

Field

Description

 

 

 

 

localField

Specifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, the $lookup treats the field as having a value of null for matching purposes.

 

from

Specifies the collection in the same database to perform the join with. The from collection cannot be sharded.

 

 

 

 

foreignField

Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.

 

 

 

as

Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten.

Here are sample queries which shows difference between SQL Join query vs NoSQL Join query:

SQL Table Representation:

Customers Table:

+---------+------+----------+--------+
| cust_id | age  | location | gender |
+---------+------+----------+--------+
|    1000 |   42 | Austin   | female |
+---------+------+----------+--------++----------+------------+--------+---------+

Orders Table:
| order_id | date       | amount | cust_id |
+----------+------------+--------+---------+
|        1 | 2020-10-01 |  27.40 |    1001 |
+----------+------------+--------+---------+

NoSQL(MongoDB) Collections Representation:

Customers Collection:
{
 "_id" : ObjectId("600e120b44284c416405dd7e"),
 "cust_id" : "1000",
 "age" : 42,
 "location" : "Austin",
 "gender" : "Female"
}
Orders Collection:
{
 "_id" : ObjectId("600e141d44e046eb7c92c4fe"),
 "order_id" : "1",
 "date" : "2020-10-01",
 "amount" : 27.4,
 "cust_id" : "1001"
}

Query1: Display orders made by customers who are older than 40

NoSQL (MongoDB):
Ø  db.customer.aggregate([
{ $match: { age: {$gt:40} }},
{ $lookup: { from: "orders",
            localField: "cust_id",
            foreignField: "cust_id",
            as: "orders_docs" }}
]).pretty()
 
Ø  Result Displays as below:
{
 "_id" : ObjectId("600e120b44284c416405dd7e"),
 "cust_id" : "1000",
 "age" : 42,
 "location" : "Austin",
 "gender" : "Female",
 "orders_docs" : [
     {
        "_id" : ObjectId("600e141d44e046eb7c92c4ff"),
        "order_id" : "2",
        "date" : "2020-10-01",
        "amount" : 36.2,
        "cust_id" : "1000"
     },
     {
        "_id" : ObjectId("600e157c44e046eb7c92c50a"),
        "order_id" : "13",
        "date" : "2020-10-03",
        "amount" : 46.1,
        "cust_id" : "1000"
     }
 ]
}
SQL:
select orders.* 
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> where customer.age > 40;
Ø  Result Displays as below:
+----------+------------+--------+---------+
| order_id | date       | amount | cust_id |
+----------+------------+--------+---------+
|        2 | 2020-10-01 |  36.20 |    1000 |
|       13 | 2020-10-03 |  46.10 |    1000 |
+----------+------------+--------+---------+
 

Query2: Display the average order amount, of customers from each location

NoSQL (MongDB):
Ø  db.customer.aggregate([
{ $lookup: { from: "orders",
           localField: "cust_id",
           foreignField: "cust_id",
           as: "orders_docs" }},
{ $group: { _id: "$location",
          avg_amount: { $avg: "$amount" }}}
])
Ø  Result Displays as below:
{ "_id" : "Houston", "avg_amount" : 44.450000 }
{ "_id" : "Dallas", "avg_amount" : 34.591667 }
{ "_id" : "Austin", "avg_amount" : 33.333333 }
SQL:
Ø  select customer.location, avg(orders.amount) as avg_amount
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> group by customer.location;
Ø  Result Displays as below:
+----------+------------+
| location | avg_amount |
+----------+------------+
| Austin   |  33.333333 |
| Dallas   |  34.591667 |
| Houston  |  44.450000 |
+----------+------------+
 

Note:

Ø  $avg – returns the average value of the numeric value. $avg ignores non-numeric values. It is available in the following stages

-         $group

-         $project

-         $addField

-         $set

-         $replaceRoot

-         $replaceWith

-         $match

Ø  We group the document based on the location by selecting “$location” as id.

Query3: For each location, calculate the average order amount of the customers who are younger than 30 yrs.

NOTE: In this query we use “$match” stage, to specify the condition along with the field name to filtered.

NoSQL (MongoDB):
Ø  db.customer.aggregate([
{ $match: { age: {$lt: 30} }},
{ $lookup: { from: "orders",
           localField: "cust_id",
           foreignField: "cust_id",
           as: "orders_docs" }},
{ $group: { _id: "$location",
           avg_amount: { $avg: "$amount" }}}
])
Ø  Result Displays as below:
{ "_id" : "Houston", "avg_amount" : 35.625000 }
{ "_id" : "Dallas", "avg_amount" : 34.591667 }
{ "_id" : "Austin", "avg_amount" : 36.000000 }
 
SQL:
Ø  select customer.location, avg(orders.amount) as avg_amount
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> where customer.age < 30
    -> group by customer.location;
Ø  Result Displays as below:
+----------+------------+
| location | avg_amount |
+----------+------------+
| Austin   |  36.000000 |
| Dallas   |  34.591667 |
| Houston  |  35.625000 |
+----------+------------+
 

Reference Links:

Ø  https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Ø  https://docs.mongodb.com/manual/reference/sql-aggregation-comparison/ 

Comments

Popular posts from this blog

SQL Real Time Interview Questions with solutions

SQL_Query : In this MySQL challenge, your query should return the names of the people who are reported to (excluding null values), the number of members that report to them, and the average age of those members as an integer. The rows should be ordered by the names in alphabetical order. Your output should look like the  following table . Solution : SELECT  ReportsTo ,   count ( ReportsTo )   as  Members ,   round ( avg ( Age ),   0 )   as   'Average Age'     FROM  maintable_MS9H0 Where  ReportsTo  is   not   null Group   by  ReportsTo order   by  ReportsTo or SELECT  ReportsTo , COUNT ( FirstName )   as  Members , ROUND ( AVG ( Age ))   as   "Average Age"   FROM  maintable_MS9H0  WHERE  ReportsTo  IS   NOT   NULL   GROUP   BY  ReportsTo SQL_Query: In this MySQL challenge, the ta...

Defect Triaging and Priority vs Severity

 I fount the following sites explained clearly about defect triaging and felt worth to share it my blog to help others.  This question was been asked in many interviews as well "What is defect triaging? And how do you handle it?". Defect Triaging Meeting:  http://www.testingdiaries.com/defect-triage-meeting/#:~:text=Defect%20Triage%20Meetings%20are%20project,are%20defined%20for%20the%20bugs. Defect Triage:  https://www.guru99.com/bug-defect-triage.html#:~:text=Defect%20triage%20is%20a%20process,and%20priority%20of%20new%20defects. Priority Vs Severity:  https://www.guru99.com/defect-severity-in-software-testing.html https://www.softwaretestinghelp.com/how-to-set-defect-priority-and-severity-with-defect-triage-process/

NoSQL Document Database Queries

  -           To check the current database list cmd> show dbs; -           To drop database cmd> db.dropdatabase( ); -           To create collection cmd> db.createCollection(“collection name”); or Use <collection name> -           To insert a document into the collection cmd> db.collectionname.insert([{}]); //insert many documents or one document           db.collectionname.insertOne({}); //inserts one document           db.collectionname.insertMany([{},{}]); //insert many documents -           To Print data in Json format cmd> db.collectionname.find( ).forEach(printjson); -        ...