In this article, we will take a look at distributed database management systems (Distributed DBMS) and their advantages and disadvantages along with examples.
Table of Contents:
- What is Distributed DBMS
- How is data stored in a distributed database
- Types of Distributed DBMS
- Advantages of Distributed DBMS
- Disadvantages of Distributed DBMS
- Examples of Distributed DBMS
What is Distributed DBMS?
To understand what a Distributed Database Management System is, we must first recap what a database is, what a Database Management System is, and how the two differ.
Database
A database is a collection of data that is organized so that it can be easily accessed and modified.
Database Management System (DBMS)
A DBMS or Database Management System is the software that controls and manages databases. It allows us to perform various operations on the data very easily.
Distributed Database
In a distributed database, the data is spread or replicated among several databases which are physically separate from each other. These databases are connected through a network so that they appear as a single database to the user.
In other words, a distributed database is a database in which the storage devices are not all attached to a common CPU. Data may be stored at multiple sites that are separate from each other.
Distributed DBMS (DDBMS)
The DDBMS software manages the distributed database so that it appears as one single database to the users. Each database in the distributed system has its own DBMS software.
DDBMS consists of a single logical database that is split into multiple fragments. Each fragment is stored in one or more computers which are under the control of a separate DBMS and connected by a network.
With a DDBMS, a single query is run on multiple local databases and the results are merged, providing the illusion of a single database.
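To make the idea concrete, here is a minimal sketch of one query being sent to several local databases and the partial results being merged. The site names, the sample rows and the helper functions `run_local` and `run_distributed` are all hypothetical and exist only for illustration. A real DDBMS would also handle networking, query planning and failures.

```python
# Hypothetical local databases: each site holds its own fragment of one logical table.
SITES = {
    "site_a": [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}],
    "site_b": [{"id": 3, "name": "Cara"}],
    "site_c": [{"id": 4, "name": "Dev"}, {"id": 5, "name": "Eli"}],
}


def run_local(site, predicate):
    """Run the query against the data stored at one site only."""
    return [row for row in SITES[site] if predicate(row)]


def run_distributed(predicate):
    """Send the same query to every site and merge the partial results."""
    merged = []
    for site in SITES:
        merged.extend(run_local(site, predicate))
    # Sort so the caller sees one consistent result, as if from a single table.
    return sorted(merged, key=lambda row: row["id"])


# The caller writes one query and never mentions the individual sites.
result = run_distributed(lambda row: row["id"] % 2 == 1)
print([row["name"] for row in result])  # ['Ana', 'Cara', 'Eli']
```

The caller only sees `run_distributed`, which is what gives the impression of a single database.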
How is data stored in a Distributed Database
There are mainly 3 ways of storing the data in a distributed database.
- Data Replication
- Data Fragmentation
- Hybrid
Data Replication
The same data is stored at more than one site. This improves the availability of the data as even if one site goes down, the data will still be available on the other sites.
Replication can also improve performance: data can be read from a nearby copy, which gives faster access and reduces the amount of data that has to travel over the network.
However, replication does have the disadvantage of requiring more space to store duplicate data and when one table is updated all the copies of it must also be updated to maintain consistency.
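The trade-off can be sketched with a toy key-value store that keeps a full copy of the data at every site. The class `ReplicatedStore`, its methods and the site names are hypothetical. The sketch also assumes the simplest possible policy, which is to refuse a write unless every replica can be updated. Real systems use more elaborate protocols.

```python
class ReplicatedStore:
    """Toy key-value store that keeps a full copy of the data at every site."""

    def __init__(self, site_names):
        self.replicas = {name: {} for name in site_names}
        self.down = set()  # names of sites that are currently unreachable

    def write(self, key, value):
        # Simplifying assumption: reject the write unless every copy can be
        # updated, so that the replicas never disagree with each other.
        if self.down:
            raise RuntimeError(f"cannot update replicas at: {sorted(self.down)}")
        for data in self.replicas.values():
            data[key] = value

    def read(self, key):
        # Any reachable replica can serve the read.
        for name, data in self.replicas.items():
            if name not in self.down:
                return data[key]
        raise RuntimeError("no replica available")


store = ReplicatedStore(["site_a", "site_b", "site_c"])
store.write("price:42", 9.99)      # stored three times, once per site
store.down.add("site_a")           # simulate a site failure
print(store.read("price:42"))      # 9.99, served by a surviving site
```

Reads survive the failure of a site, but the value occupies three times the space and every write has to reach all three copies.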
Data Fragmentation
The system fragments the relation into smaller parts and each part is stored at a different site. Usually, fragments are stored at sites where they are most accessed.
When a query is made, first the local database in the site is checked. If the data is not available on the local site then all the other sites are checked.
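The local-first lookup described above can be sketched as follows. The function `lookup`, the site names and the keys are hypothetical, and each site is modelled as a plain dictionary rather than a real database.

```python
def lookup(key, local_site, sites):
    """Return (value, answering_site), checking the local site before remote ones."""
    if key in sites[local_site]:
        return sites[local_site][key], local_site
    # Not stored locally: fall back to asking the other sites.
    for name, data in sites.items():
        if name != local_site and key in data:
            return data[key], name
    raise KeyError(key)


sites = {
    "london": {"cust:1": "Ana"},
    "paris": {"cust:2": "Ben"},
}

print(lookup("cust:1", "london", sites))  # ('Ana', 'london'), answered locally
print(lookup("cust:2", "london", sites))  # ('Ben', 'paris'), needed a remote site
```

The second call is the slower path, since it has to contact another site over the network.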
Fragmentation can be done based on rows of a relation (Horizontal Fragmentation) or based on the columns of a relation (Vertical Fragmentation).
Fragmentation allows for local optimization and security but suffers from inconsistent access speed.
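As an illustration, the sketch below splits a small relation both ways and then rebuilds it. The sample `customers` rows and the helpers `horizontal_fragments` and `vertical_fragment` are hypothetical. Note that each vertical fragment keeps the key column so that the original rows can be rejoined.

```python
customers = [
    {"id": 1, "name": "Ana", "city": "Austin", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "city": "Boston", "email": "ben@example.com"},
    {"id": 3, "name": "Cara", "city": "Austin", "email": "cara@example.com"},
]


def horizontal_fragments(rows, column):
    """Split by rows: group whole rows according to the value of one column."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, columns, key="id"):
    """Split by columns: keep the key plus the chosen columns of every row."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


by_city = horizontal_fragments(customers, "city")        # one fragment per city
profile = vertical_fragment(customers, ["name", "city"])
contact = vertical_fragment(customers, ["email"])

# Horizontal fragments are recombined with a union of their rows.
union = sorted((r for frag in by_city.values() for r in frag), key=lambda r: r["id"])

# Vertical fragments are recombined with a join on the shared key.
contact_by_id = {r["id"]: r for r in contact}
joined = [{**p, **contact_by_id[p["id"]]} for p in profile]

print(union == customers, joined == customers)  # True True
```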
Hybrid Storage
Hybrid data storage combines both data replication and fragmentation to get the benefits of both models.
Types of Distributed DBMS
Distributed DBMS can be classified into Homogeneous DDBMS and Heterogeneous DDBMS.
Homogeneous DDBMS
In a Homogeneous DDBMS, all sites have a similar database schema and run similar database management software. Each site is aware of the other sites and can communicate with them directly.
Homogeneous DDBMS are much simpler to design and manage, and adding a new site is easier. They can also make better use of the parallel processing capabilities of multiple sites.
Heterogeneous DDBMS
In reality, multiple already existing databases may be linked to construct a distributed database. These databases may run different database management software and have a different schema. Such systems are called Heterogeneous DDBMS.
Users can submit queries in the language of the DBMS used at their local site. The system hides the existence of the other databases from the user and so provides transparency. Designing a heterogeneous DDBMS is much harder, since the system must provide interoperability between databases that may differ in both database management software and hardware.
Few, if any, currently available DDBMS products provide full support for Heterogeneous DDBMSs.
Advantages of Distributed DBMS
- Reliability: Data may be replicated in several sites so that the failure of a single site does not make the data inaccessible.
- Information Sharing: Users in one site can access the data present in other sites.
- Faster data processing: A distributed database allows for the processing of data at several sites simultaneously.
- Faster data access: In a distributed system, the data is usually stored at the site where the demand for it is the greatest. This can lead to faster access of the data and better performance.
- Autonomy: Each site retains some level of control over its data, unlike a central database.
- Modularity: New sites can be added and removed when required, thus improving flexibility.
Disadvantages of Distributed DBMS
- Complexity: The design and management of a Distributed DBMS are very complex, especially for a heterogeneous DDBMS, since its sites can run different software.
- Increased Storage: Data may be replicated at several sites, which leads to increased storage requirements.
- Difficulty in maintaining integrity: Integrity refers to the consistency of data. When the data is replicated at multiple sites, all of the copies need to be updated if a change is made to one.
- Communication costs: The need for the sites to communicate with each other adds more complexity and cost.
- Security: Since data is stored at multiple sites, the security risk increases.
Examples of Distributed DBMS
A Real Life Example
Consider a company like Walmart which has branches all over the USA. Each branch stores information about the customers, products and purchases in that branch. The schema can look something like this:
Customers(ID, Name, Email, Address, Phone No)
Products(ID, Name, Category, Price)
Purchases(CustomerID, ProductID, Date)
Suppose the CEO wants to know the number of purchases across the whole USA. In the manual approach, we would have to log in to each branch, run a query to get the count of purchases there, and then combine the results. This can be very time-consuming.
But if the system is a distributed database, we can get the count of all purchases by using a single query.
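The sketch below imitates this with Python's built-in `sqlite3` module, using one in-memory database per branch. The branch names, the sample rows and the helpers `make_branch` and `total_purchases` are made up for illustration. In a real DDBMS the user would simply issue `SELECT COUNT(*) FROM Purchases` and the system would do the fan-out and the summing internally.

```python
import sqlite3


def make_branch(purchases):
    """Create an in-memory database standing in for one branch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Purchases (CustomerID INTEGER, ProductID INTEGER, Date TEXT)"
    )
    conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?)", purchases)
    return conn


# Hypothetical branches with a few sample purchases each.
branches = {
    "austin": make_branch([(1, 10, "2024-01-05"), (2, 11, "2024-01-06")]),
    "boston": make_branch([(3, 10, "2024-01-05")]),
    "denver": make_branch(
        [(4, 12, "2024-01-07"), (5, 10, "2024-01-08"), (4, 11, "2024-01-09")]
    ),
}


def total_purchases(branches):
    """Run the same count at every branch and add up the partial counts."""
    query = "SELECT COUNT(*) FROM Purchases"
    return sum(conn.execute(query).fetchone()[0] for conn in branches.values())


print(total_purchases(branches))  # 6
```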
Distributed DBMS Software
Many distributed DBMS products are available, both open-source and commercial. Some of the popular ones are Apache Cassandra, Apache HBase, Amazon SimpleDB, Clusterpoint and FoundationDB.
Apache Cassandra
Apache Cassandra is an open-source, NoSQL, distributed database management system that provides high scalability and high availability.
Apache HBase
Apache HBase is also an open-source, NoSQL, distributed database management system used for Big Data storage. It is written in Java.
Amazon SimpleDB
Amazon SimpleDB is a NoSQL, distributed database which is available as a part of Amazon Web Services. It is written in Erlang.
Clusterpoint
Clusterpoint is a schema-less database that allows the distribution of data. It also provides cross-platform support.
FoundationDB
FoundationDB is a NoSQL, distributed database that stores ordered key-value pairs. It was originally developed by the company FoundationDB, which Apple acquired in 2015, and it was later released as open source.
With this article at OpenGenus, you should now have a good understanding of Distributed Database Management Systems (Distributed DBMS).