B-Tree (Multiway Tree)

Distribute Education like Computer Geek

Multi-Way Tree

A multi-way tree is a tree where each node has more than 1 key and each node has more than two children.

A 4-way tree or order-4 tree has 3 keys in each node and 4 ways, or children.

Mullti 4 way tree
Mullti 4 way tree
Ques- Why do we use multiway trees?

Suppose you are using an AVL tree or a red-black tree, which are self-balancing binary search trees, and their time complexity of searching, insertion, and deletion is O(h). where h is the height of the tree. If h increases then time complexity will also increase and if h decreases then time complexity also decreases.

Accessing main memory takes about 100 times more time than the time of an arithmetic operation, and accessing disk memory takes about 10000 times more time than the time of an arithmetic operation. This means that we have to reduce the memory access and keep the input numbers in the cache memory, which is the fastest memory. Storage is less in cache memory, so each and every number cannot come in it if we are accessing a very huge random file.

Suppose the height of an AVL tree or a red-black tree is 25, which means there are many numbers in it. Each number is stored in a separate memory block, so if we need to access all 25 memory blocks for data, it will take the longest possible amount of time, or worst-case time.

That’s why Bayer and McCreight designed multi-way tree to reduce the height and time complexity of the tree.

A multi-way tree has two types. The first is a B tree, and the second is a B+ tree.

 

B Tree

When working at Boeing Research Labs, Rudolf Bayer and Edward M. McCreight developed the B-tree to effectively manage the index pages for huge random-access files.

Neither Bayer nor McCreight clarified what the “B” stands for, Bayer, balanced, or broad. So those who say b is for balanced tree, they say it wrong.

B trees are also called height-balanced m-way trees. It is also called a “broad tree” because it’s a fat tree with a short height. The searching, insertion, and deletion operations have a time complexity of O(h), where h is the height of the tree. So, the b tree has a low height and it is an advantage for us to run the operations in a very less time.

Knuth’s define that a B-tree of order m or known as m-way tree is a tree that matches the following constraints. 

  1. Each node n has at most m-1 keys and at most m children. Root node has two or more children.
  2. All keys in a node must be in non-decreasing order key1 ≤ key2 ≤ key3 ≤ … ≤ keyn-1 ≤ keyn.
  3. Each internal node must have at least ⌈m/2⌉ children.
  4. All leaf nodes are at the same level.

We have understood the order of the tree. Now we talk about the degree of a node in the B tree.

The degree of a node is determined by the number of its children. If a node has degree 2, then it has two children.

The maximum and minimum keys that a node can hold are bound. The minimum degree of the B-tree, which is a fixed integer with a value greater than two (t ≥ 2), can be used to represent these bounds:

  1. Each node must have t-1 keys, except the root. This means that every internal node except for the root has t or more children. The root of the tree must include at least one key if it is not empty.
  2. Each node can hold a maximum of 2t-1 keys. An internal node can therefore have a maximum of 2t children. If a node has exactly 2t-1 keys, we say it is full.

Note – Some students want to insert the keys into the b-tree if they have the order, but if the question has only the minimum degree, they can’t create the tree. I’ll explain the relationship between order and degree in the insertion part.

Insertion into a b-tree
  1. Always insert the key into a leaf node.
  2. Median key is moved up to its parent’s node during splitting.

Show the result of inserting the following items in an initially empty B-tree of degree t= 2.

Input is –
F, S, Q, K, C, L, H, T, V, W, M, R, N, P, A, B, X, Y, D, Z, E

Ans-

Minimum degree (t) = 2

Maximum degree = 2t -1 = 2(2) – 1 = 3

It means that a node has 3 keys and at most 4 children. So, the order is 4.

Insert F, S, Q

4 way B tree

The tree has 3 keys and 4 children.

Insert K

The node is completely full, so before inserting K, we have to split the keys. Q is the median will be the root and F and S are its left child and right child.

After splitting, insert K in leaf.

Btree Insert K

Insert C

C comes before F. So, C is the left key to the F in the leaf.

Btree

 
Insert L

L comes after the K key, but the left child, CFK, is full. So, before inserting L, we have to split the keys. F is the median will be the root and C and K are the left and right child of F.

After splitting, insert L in leaf.

Insert L in b tree

 
Insert H, T, V

H comes before K. So, H is the left key to the K in the leaf.

T and V comes after S, So T and V are right keys to the S.

insert H, T, V in b tree

 
Insert W

W comes after V key, but the right child STV is full. So, before inserting W, we have to split the keys. T is the median will be the root. S and V are the left and right child of T.

After splitting, insert W into the leaf.

insert w in B tree

 
Insert M

M comes after L key, but the child HKL is full. So, before inserting M we have to split the keys. K is the median will be the root but root FQT is also full. So, Q is the median will be the new root and F and T are the left and right child of Q.

After splitting, insert K after F, and H and L are the left and right children of K.

After splitting, insert M in the leaf.

insert m in b tree

 
Insert R, N

R comes before S. So, R is the left key to the S in leaf.

N comes after M. So, N is the right key to the M in the leaf.

insert r, n in b tree

 
Insert P

P comes after the N key, but the child LMN is full. So, before inserting P, we have to split the keys. M, the median, will be moved up, and L and N are the left and right children of M, respectively.

After splitting, insert P in the leaf.

insert p in B tree

 
Insert A, B, X

A and B come before C. So, A and B left key to the C in leaf.

X comes after W. So, X is the right key to the W in leaf.

insert A, B, X in B tree

 
Insert Y

Y comes after X, but the child VWX is full. So, before inserting Y, we have to split the keys. W is the median that will be moved up, and V and X are the left and right children of W.

After splitting, insert Y into the leaf.

insert y in B tree

 
Insert D, Z, E

D comes after C, but the child ABC is full. So, before inserting D, we have to split the keys. B is the median, which will be moved up, but root FKM is also full. So, K is the median and will be the root, and F and M are the left and right children of K.

After splitting, insert B before F, and A and C are the left and right children of B.

After splitting, insert D in the leaf.

Z comes after Y. So, Z is the right key to the Y in leaf.

E comes after D. So, E is right key to the D in leaf.

insert D, Z, E in B tree

 
Algorithm of Insertion in B tree
InsertionBTree(n, k)
  1. If tree has no node
  2. then insert a key in new node as the root.
  3. else search a vacant position in leaf node to add the key in sorted order.
  4. If leaf node is completely full
  5. Split the node by median key as a parent of left and right keys child.
  6. Repeat the splitting until the node is completely full.

Deletion in a B-Tree

Deletion is a very complex operation in b tree. When deleting, two properties must always be considered.

  1. A node should have at least minimum degree t.
  2. All non-leaf node child pointers must have children.

Input is

Deletion in B tree

There are many scenarios of deletion

CASE 1
  • If deleted key is the leaf node,
  • Keys in the leaf node > minimum keys require for b tree

Then this is a straightforward deletion. No further operations are required.

Delete D

btree delete D

 
CASE 2
  • If deleted key is the leaf node,
  • Keys in the leaf node = minimum keys require for b tree

Then after deletion node will violate the property of B tree.

Solution – The deleted node can borrow a key from the left sibling. If left sibling does not have more keys then you can borrow from right sibling.

If H is deleted, the left sibling’s biggest number E will take its parent F place, and F will replace H.

 
Delete H

btree delete H

 
Delete L

If L is deleted, the right sibling’s smallest number N will take its parent M place, and M will replace L.

btree delete L

 
CASE 3
  • If deleted key is the leaf node,
  • Keys in the leaf node = minimum keys require for b tree
  • Can not borrow the key from left and right sibling because they only have minimum number of keys.

Then after deletion node will violate the property of B tree.

Solution – Merge the siblings and the minimum of parent node and replace this with deleted node.

 
Delete C

btree delete C

 
CASE 4
  • If deleted key is an internal node.
  • Find the immediate predecessor of the deleted key and replace it.
 
Delete T

btree delete T

S is the immediate predecessor of T, Hence S replaces T.

  • If immediate predecessor has minimum key, then find the immediate successor of deleted key and replace it.
 
Delete W

btree delete W

X is the immediate successor of W, Hence X replaces W.

 
Delete X

btree delete X

Y is the immediate successor of X, Hence Y replaces X.

  • If immediate predecessor and immediate successor have minimum key only, then merge the predecessor and successor.
 
Delete Y

btree delete Y

Key Y predecessor V and Successor Z are merged.

 
CASE 5
  • If deleted key is Root node.
  • Find the immediate predecessor of deleted key and replace it with the deleted root node.
  • Now we have to follow Case 1, Case 2 and Case3, as the immediate predecessor of deleted root node is in the leaf node of the tree.
 
Delete K

btree delete K

In Case 2, the deleted or replaced node can borrow a key from the left sibling.

btree delete K

 
Algorithm of deletion in b-tree
DeletionBTree(X,N,M)

Key to be deleted is X

Node which contains deleted key is N

Minimum degree is M

1. If Parent(N) = Nil //Root

        If Left(N) = Right(N) = Nil      //Root has no children

            Delete X

        Else Left(N) = Right(N) = Non-Nil   //Root have children

            Delete X and take the key from predecessor or successor.

2. Else, Parent(N) = Non-Nil

        If Left(N) = Right(N) = Nil      //Leaf node

            If N > M, delete X.

        Else If, N = M, delete X and take the key from predecessor or successor.

           If Predecessor = Successor = M, delete X and merge left sibling and minimum key of parent’s node.

        Else, Left(N) = Right(N) = Non-Nil      //Internal node

            Delete X and take the key from predecessor or successor.

                 If Predecessor = successor = M, merge them. 

Time & Space Complexity of B-Tree
  1. In searching, the time complexity is O(logn).
  2. In insertion, the time complexity is O(logn).
  3. In deletion, the time complexity is O(logn).
  4. Space complexity is O(n).
 
Advantages of B-Tree
  1. M-way tree is the index pages for huge random-access files.
  2. Searching, Insertion and deletion operations are fast.
  3. It can arrange large number of records in the less height of tree.
  4. The tree has sorted order of keys.
 
Disadvantages of B-Tree
  1. Insertion & deletion operations are complex.
  2. When stored in RAM, B-trees are not as efficient as other balanced trees.
Applications of B-Tree
  1. B-trees are used in databases and file systems to efficiently store and retrieve large amounts of data, optimizing disk access and minimizing search time.
  2. They are employed in search engines to index web pages and quickly retrieve relevant results, improving the speed and efficiency of searches.
  3. B-trees are used in file transfer protocols to organize and navigate directory structures, ensuring fast file access and transfer.
  4. They find applications in database systems for indexing and querying multidimensional data, supporting efficient range searches and spatial queries.
  5. B-trees are used in network routers to store routing tables and efficiently route data packets through complex networks, ensuring reliable and fast data transmission.
  6. They are employed in transaction processing systems to manage large volumes of concurrent transactions, ensuring data integrity and high throughput.
  7. B-trees find applications in file compression algorithms to organize and compress data efficiently, reducing storage space and improving file transfer speeds.
  8. They are used in version control systems to manage revisions and branches of code repositories, facilitating efficient code collaboration and history tracking.

Test Yourself

Q1- Explain the concept of a B-Tree and its advantages over other data structures.

B-Tree is a balanced m-way tree that efficiently manages index pages for large random-access files. 

It has the following advantages:

   – It has a low height, enabling faster search, insertion, and deletion operations (O(logn)).

   – The tree maintains sorted order of keys, making range searches easier.

   – B-Trees are well-suited for databases and file systems as they reduce the number of disk accesses.

   – It can handle a large number of records while keeping the height of the tree small.

Q2- How is the height of a B-Tree minimized, and why is it important in reducing time complexity?

The height of a B-Tree is minimized by keeping a high branching factor (order) in each node. With a higher branching factor, each node can hold more keys and have more children, leading to a shorter height. A shorter height means faster access to data, reducing the time complexity of operations like search, insertion, and deletion.

Q3- Explain the complexities involved in deletion from a B-Tree and how different cases are handled.

Deletion in a B-Tree is complex and involves several cases

1. Deleting a key from a leaf node with enough keysSimple deletion without affecting the tree balance.

 2. Deleting a key from a leaf node with minimum keys: Borrowing from siblings or merging nodes to maintain the B-Tree properties.

 3. Deleting an internal node: Replacing with the immediate predecessor or successor, and handling potential underflows.

4. Deleting the root node: Replacing the root and then applying the above cases as needed.

Q4- Discuss the advantages and disadvantages of using B-Trees for large-scale data storage.

Advantages of B-Trees

   – Efficient management of index pages for large files and databases.

   – Faster search, insertion, and deletion operations due to a low tree height.

   – Sorted order of keys simplifies range searches.

   – Reduced disk accesses due to a balanced structure.

Disadvantages of B-Trees

   – Complex insertion and deletion operations.

   – Less efficient than other balanced trees in RAM storage.

Q5- Compare and contrast B-Trees with AVL trees and red-black trees.

– B-Trees are m-way trees that have a higher branching factor, while AVL and red-black trees are binary search trees.

– B-Trees are designed for efficient disk access, while AVL and red-black trees are optimized for in-memory operations.

– B-Trees have a lower height, leading to faster search, insertion, and deletion, but AVL and red-black trees maintain stricter balance, leading to slightly higher heights.

– AVL and red-black trees have logarithmic time complexities for search, insertion, and deletion, similar to B-Trees.

Q6- Can a B-Tree have a degree or branching factor of 1? Explain the consequences.

No, a B-Tree cannot have a degree or branching factor of 1. The minimum branching factor (degree) of a B-Tree is 2. If a B-Tree had a branching factor of 1, each node could have only one child, making it equivalent to a linked list. In such a scenario, the B-Tree would lose its balancing property and become inefficient, defeating the purpose of using a B-Tree.

Q7- How does the B-Tree’s height affect the time complexity of search, insertion, and deletion operations?

The height of a B-Tree directly affects the time complexity of its operations.

A shorter height means faster access to data and, consequently, faster search, insertion, and deletion operations. As the height of the B-Tree increases, the time complexity of these operations also increases, making the tree less efficient.

The goal is to keep the height as low as possible to achieve optimal performance.

Q8- Can a B-Tree have a branching factor of 5? Justify your answer.

Yes, a B-Tree can have a branching factor of 5. The branching factor of a B-Tree determines the maximum number of keys a node can hold and the maximum number of children it can have. As long as the branching factor is greater than or equal to 2, and the B-Tree properties are maintained (e.g., minimum keys in each node, keys in non-decreasing order), it can be considered a valid B-Tree.

Q9- Explain the role of the “minimum degree” in the insertion and deletion operations of a B-Tree.

The “minimum degree” of a B-Tree, denoted by “t,” represents the minimum number of keys a non-root node can hold.

It also determines the maximum number of keys a node can hold (2t – 1) and the maximum number of children (2t).

In insertion, the minimum degree ensures that nodes are split when they become full, and new keys are inserted appropriately to maintain the B-Tree properties.

In deletion, the minimum degree helps in determining the cases for handling underflows and merging nodes when necessary.

Q10- What is the time complexity of searching, insertion, and deletion in a B-Tree?
  1. O(1)
  2. O(log n)
  3. O(n)
  4. O(n log n)

Ans – (2)

Explanation – The time complexity of searching, insertion, and deletion in a B-Tree is O(log n). This is because B-Trees are balanced m-way trees, and their height remains logarithmic with respect to the number of elements (n) in the tree. As each level divides the number of elements by the branching factor, the height of the tree remains log(n) with base m.

Q11- B-Trees are well-suited for managing:
  1. Linked list
  2. Stacks
  3. Queues
  4. Random-access files

Ans – (4)

Explanation – B-Trees are well-suited for managing random-access files. Their low height and high branching factor make them efficient for disk-based data storage and retrieval, particularly for databases and file systems.

Q12- What is the minimum branching factor (degree) for a valid B-Tree?
  1. 1
  2. 2
  3. 3
  4. 4

Ans – (2)

Explanation –  The minimum branching factor (degree) for a valid B-Tree is 2. This means each internal node must have at least two children, and each node, except the root, must have at least two keys.

Q13- The height of a B-Tree is minimized by:
  1. Increasing the branching factor
  2. Decreasing the branching factor
  3. Randomly rearranging nodes
  4. None of the above

Ans – (1)

Explanation – The height of a B-Tree is minimized by increasing the branching factor (degree) of the tree. With a higher branching factor, each node can hold more keys and have more children, resulting in a shorter height and faster operations.

Q14- When inserting keys into a B-Tree, the nodes are split when:
  1. The parent node becomes full
  2. The root node becomes full
  3. The leaf node becomes full
  4. None of the above

Ans – (3)

Explanation – When inserting keys into a B-Tree, the nodes are split when the leaf node becomes full. The splitting process ensures that the B-Tree maintains its balanced structure and the keys are inserted in sorted order.

BOOKS

Algorithm T H CoremanAlgorithm by Udit AgarwalAlgorithm Ellis HorowitzData Structure & Algorithm Made Easy Narasimha