Subset Sum Problem

Distribute Education like Computer Geek

Introduction

The Subset Sum Problem (SSP) is a classic problem in computer science and combinatorial optimization. It belongs to the category of decision problems, where the task is to determine whether there exists a subset of numbers in a given set that adds up to a particular value.

This problem is particularly famous because it is a special case of the more general Knapsack Problem and is known for being NP-complete, meaning it is at least as hard as the hardest problems in the class NP.

.

Problem Definition

Determine whether there exists any non-empty subset of 𝑆 such that the sum of its elements equals 𝑇.

In other words, the goal is to find any subset 𝑆′⊂𝑆 where the sum of the elements in 𝑆’ is exactly 𝑇.

Input

  • A finite set 𝑆 of 𝑛 integers
  • An integer 𝑇, the target sum

Output

  • True if there exists a subset of 𝑆 whose sum is equal to 𝑇.
  • False otherwise

Subset Sum Problem

Subset Sum Problem can be solved by Naïve method, Recursive method, Dynamic Programming and Backtracking.

Here, naïve method, recursive method and dynamic programming is discussed here.

.

Example

Question – Check whether the subset S make a sum equal to T or not?
Let S = {2, 4, 5, 7}
1. T = 8
2. T = 9

.
Answer – With using naive approach, we can say that
1. T is not equal to 8, because there is no subset of the set S that can create the sum 8.
2. T is equal to 9, because the subsets {4, 5} and {2, 7} can create the sum 9.

However, the naive method is quite complex. It’s feasible for simpler and smaller inputs, but not for larger ones due to its complexity of n.2n, which can complicate matters significantly.

When employing this approach to solve the Subset Sum problem for a set 𝑆 with 5 elements {0, 2, 4, 5, 7}, the method would indeed consider 25=32 different possible combinations of these elements.

For each subset, we might need to sum up to 𝑛 elements. On average, the size of subsets considered is about 𝑛/2 if we consider all possible subsets uniformly. So, the final time complexity is O(n.2n).

Each combination represents a potential subset of 𝑆, and each subset’s sum is calculated to see if it matches a specific target sum. Here’s a clearer representation of how the naive method evaluates each subset

  • {0, 0, 0, 0, 0} – corresponds to taking no elements, resulting in a sum of 0.
  • {0, 0, 0, 0, 1} – corresponds to taking just the last element (7), so the subset sum is 7.
  • {0, 0, 0, 1, 0} – takes only the fourth element (5), making the subset sum 5.
  • {0, 0, 0, 1, 1} – takes the fourth and fifth elements (5 and 7), resulting in a sum of 12.
  • … and so forth, up to
  • {1, 1, 1, 1, 1} – takes all elements, adding up to 18 (0+2+4+5+7).

.

Recursive Algorithm

In the recursive approach, at each step, we have two choices, include the current element in the subset or exclude it. This results in a binary tree-like recursive exploration of all possible subsets.

Let’s denote the number of elements in the input set as n.

In the worst-case scenario, the recursion explores all possible subsets, which is 2n, because at each step, there are two recursive calls – one including the current element and one excluding it.

Therefore, the time complexity of the recursive algorithm for the Subset Sum Problem is O(2n).

In 1974, Horowitz and Sahni published an improved exponential-time approach that runs in time O(2n/2. (n/2)) but takes significantly more space O(2n/2).

In 1981, Schroeppel and Shamir published an approach based on Horowitz and Sanhi that required comparable runtime O(2n/2. (n/4)) but significantly less space O(2n/4).

.

Historical Context

The phrase “Subset Sum Problem” may have originated with the advent of complexity theory and computational complexity classes in the 1970s.

Richard Karp’s influential 1972 paper, “Reducibility Among Combinatorial Problems,” included the Subset Sum Problem as one of 21 NP-complete problems, which was a crucial milestone in bringing it into the spotlight of computational theory.

In the early 1970s, academics like as Stephen Cook and Leonid Levin developed and formalized complexity theory and NP-completeness, which helped to create and name norms for many computer issues, including the Subset Sum Problem.

.

Dynamic Programming comes into play

Dynamic programming offers a more efficient solution compared to naive method and recursive method, especially for larger input sizes, making it a powerful technique for solving the Subset Sum Problem.

The dynamic programming solution to Subset Sum runs in pseudo-polynomial time. This complexity is considered pseudo-polynomial because while it is polynomial in the numeric value of the target sum t and the number n, it is not polynomial in the size of the input if t is large. This is because the input size in bits is proportional to log(t), and the complexity O(n * t) can be exponential in terms of the number of bits used to represent t.

The Subset Sum problem is NP-complete, a classification in computational complexity theory. This means two things –

NP (Non-deterministic Polynomial time) – It is easy to verify a correct solution to the problem in polynomial time.

NP-complete – It is as hard as the hardest problems in NP, and no polynomial-time algorithm is known for solving all NP-complete problems efficiently.

.

Algorithm for SSP by Dynamic Programming
  1. Create a 2D array (n+1)(t+1) where Array[i][j] will be True if a subset of the first i items from S can sum to j. Otherwise, it will be False.
  2. Array[0][0] is True because a subset of zero elements can sum to 0.
  3. Array[i][0] is True for all i, because with any number of items you can always form the sum 0 by not taking any items.
  4. Array[0][j] is False for all j>0, because no subset of zero items can sum to a positive j.
  5. Iterate over each item i from 1 to n (number of elements in S).
  6. For each item, iterate over possible sums j from 1 to T.
  7. Set Array[i][j] to True if either Array[i-1][j] is True (not taking the current item) or Array[i-1][j – S[i-1]] is True (taking the current item, hence subtracting its value from j).
  8. The value at Array[n][T] will indicate whether a subset of S can sum to T.

.

Example

We use the same question that we took earlier in the  method.

Let S = {2, 4, 5, 7} and T = 9

The length of S is n = 4

Step 1 – Create a 2D array of (n+1)(t+1).

 0123456789
0          
2          
4          
5          
7          

.

Step 2 – True – 1, False – 0

  • Array[0][0] is True because a subset of zero elements can sum to 0.
  • Array[i][0] is True for all i, because with any number of items you can always form the sum 0 by not taking any items.
  • Array[0][j] is False for all j>0, because no subset of zero items can sum to a positive j.
 

j ->

0123456789

i=0

{0}1000000000

i=1

{2}

1

         

i=2

{4}

1

         

i=3

{5}

1

         

i=4

{7}

1

         

.

Step 3 – The current item is 2, So it cannot be equal to 1. So, it is false.

 

j ->0123456789

i=0

{0}100000000

0

i=1

{2}1

0

        

i=2

{4}

1

         

i=3

{5}

1

         
i=4{7}

1

         

.

Step 4 – The current item is 2 and the sum is also 2. So, it is true.

 

j ->0123456789

i=0

{0}1000000000

i=1

{2}10

1

       

i=2

{4}1         

i=3

{5}

1

         
i=4{7}

1

         

.

Step 5 – We only have the number 2 available, so it’s not possible for it to be greater than 2. Therefore, sums from 3 to 9 are impossible and would be false.

 j ->0123456789

i=0

{0}1000000000

i=1

{2}1010000000

i=2

{4}

1

         

i=3

{5}

1

         

i=4

{7}

1

         

.

Step 6 – We have the numbers 2 and 4, with 4 being greater than 2. Therefore, for the sum equal to 4, the value can be true, but for sums less than 4, the values remain the same as in the previous row.

 

j ->

012345678

9

i=0

{0}1000000000

i=1

{2}1010000000

i=2

{4}10101     
i=3{5}1         

i=4

{7}

1

         

.

Step 7 – We have the numbers 2 and 4. Their combined sum is 6, which means it’s not possible to reach a sum of 5, but it is possible to achieve a sum of 6.

 j ->0123456789
i=0{0}100000000

0

i=1

{2}1010000000
i=2{4}1010101   

i=3

{5}1         
i=4{7}

1

         

.

Step 8 – We only have the number 2 and 4 available, so it’s not possible for it to be greater than 6. Therefore, sums from 7 to 9 are impossible and would be false.

 

j ->

0123456789

i=0

{0}100000000

0

i=1{2}101000000

0

i=2

{4}101010100

0

i=3

{5}

1

         

i=4

{7}

1

         

.

Step 9 – We have the numbers 2, 4 and 5 available, with 5 being greater than 2 and 4. Therefore, for the sum equal to 5, the value can be true, but for sums less than 5, the values remain the same as in the previous row.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}101011    
i=4{7}1         

.

Step 10 – In the previous row, sum = 6 is true, then in this current row, the sum = 6 has to be true.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}1010111   
i=4{7}1         

.

Step 11 – The number 5 + 2 is 7. So, the sum = 7 is true.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}1010111

1

  
i=4{7}1         

.

Step 12 – We have numbers 2, 4 and 5. It cannot be equal to 8. So, sum = 8 is false.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}10101111

0

 
i=4{7}1         

.

Step 13 – The number 5 + 4 is 9. So, the sum = 9 is true.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}101011110

1

i=4{7}1         

.

Step 14 – We have the numbers 2, 4, 5 and 7 available, with 7 being greater than 2, 4 and 5. Therefore, for the sum equal to 7, the value can be true, but for sums less than 7, the values remain the same as in the previous row.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}101011110

1

i=4{7}10101111  

.

Step 15 – We have numbers 2, 4, 5 and 7. It cannot be equal to 8. So, sum = 8 is false.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}101011110

1

i=4{7}101011110 

.

Step 16 – We can sum the numbers 4 and 5 or we can sum the numbers 2 and 7, their sum = 9. So, it is true.

 j ->0123456789

i=0

{0}1000000000
i=1{2}101000000

0

i=2

{4}1010101000
i=3{5}101011110

1

i=4{7}101011110

1

The algorithm will find that {4,5} is a subset of Set S that adds up to 9. Thus, the output will be True.

.

Source Code of Subset Sum Problem

Time Complexity

The time complexity of the Subset Sum Problem when solved using dynamic programming is best described as pseudo-polynomial and is 𝑂(𝑛×𝑇), where 𝑛 is the number of elements in the set 𝑆 and 𝑇 is the target sum.

Although the time complexity is polynomial in terms of 𝑛 and 𝑇, it is important to note that if 𝑇 is large, the algorithm may take a long time to execute. That’s why the time complexity is said to be pseudo-polynomial.

.

Space complexity

The dynamic programming solution to the Subset Sum Problem uses a 2D table (often implemented as a 2D array or list of lists) to store the results of subproblems.

Since there are n+1 rows (one for each item plus an additional row for the case with zero items) and T+1 columns (one for each possible sum from 0 to T), the total number of entries in the table is (n+1)(T+1).

So, the space complexity is O(nxT).

.

Applications

Subset Sum Problem (SSP) is np-complete, so it can work in cryptography techniques. Some cryptographic systems rely on the hardness of problems like the Subset Sum for security. Breaking these systems can sometimes be reduced to solving instances of the Subset Sum Problem.

  1. The Merkle-Hellman knapsack cryptosystem, an early public key system, was based on a variant of the knapsack problem, which is closely related to the Subset Sum Problem.
  2. Certain types of data compression algorithms, where you need to fit maximum relevant data into a limited amount of space, can also be framed as problems similar to SSP.
  3. Allocating fixed time slots or resources to a set of tasks such that their total time or resource usage meets but does not exceed the available limit.

Test Yourself

Q1- Explain why the Subset Sum Problem is considered NP-complete.

The Subset Sum Problem is considered NP-complete because it satisfies two important criteria

NP (Nondeterministic Polynomial time) – This means that given a solution, it can be verified in polynomial time.

For the Subset Sum Problem, if someone claims to have found a subset that sums up to the target value, you can easily verify this by adding up the elements of the subset, which takes polynomial time proportional to the size of the subset.

NP-hardness – This means that the problem is at least as hard as the hardest problems in NP. If we could solve the Subset Sum Problem efficiently (in polynomial time), then we could solve all other problems in NP efficiently as well. This is because any problem in NP can be reduced to the Subset Sum Problem in polynomial time.

Q2- Why is the dynamic programming solution to the Subset Sum Problem considered pseudo-polynomial?

While the dynamic programming solution’s time complexity is polynomial in terms of the input size and target sum, it’s pseudo-polynomial because it can be exponential in the number of bits used to represent the target sum.

Q3- What role does the Subset Sum Problem play in cryptography?

The Subset Sum Problem’s hardness is utilized in cryptographic systems for security. Breaking these systems often involves solving instances of the Subset Sum Problem.

Q4- Why is the Subset Sum Problem considered a decision problem?

The Subset Sum Problem asks whether a subset of a given set sums up to a specific target value, making it a decision problem where the answer is either “yes” or “no“.

Q5- What is the time complexity of the brute-force approach to solving the Subset Sum Problem?
   1. O(n * T)
   2. O(T)
   3. O(n.2n)
   4. O(n2)

Ans – (3)

Explanation – The brute-force method involves examining all subsets of a set, and there are n.2n possible subsets.

Q6- Which method is known for improving the time complexity of solving SSP but increases space complexity?
   1. Horowitz and Sahni method
   2. Dynamic programming
   3. Schroeppel and Shamir method
   4. Recursive algorithm

Ans – (1)

Explanation – This method splits the problem into smaller parts, improving time complexity at the cost of higher space usage.

Q7- The dynamic programming solution to SSP has a space complexity of
   1. O(n + T)
   2. O(n)
   3. O(T)
   4. O(n * T)

Ans – (4)

Explanation – The space complexity is due to the 2D array that stores boolean values for subsets up to n and sums up to T.

Q8- Which problem is a variation of the Subset Sum Problem that involves partitioning a set into two subsets with equal sums?
   1. Travelling salesman problem
   2. Partition problem
   3. 0/1 Knapsack problem
   4. Coin change problem

Ans – (2)

Explanation – The partition problem is a specific case of the SSP where the target is half the total sum of the set.

Q9- In the context of complexity theory, SSP being NP-complete implies that
   1. It has a polynomial-time solution
   2. It is solvable in exponential time only
   3. It is as hard as the hardest problems in NP
   4. It cannot be solved

Ans – (3)

Explanation – NP-complete means a problem is both in NP and as challenging as any problem in NP.

Q10- What is the primary goal of the Subset Sum Problem?
  1. Maximizing profit
  2. Minimizing subset size
  3. Achieving a target sum
  4. Sorting the input set

Ans – (3)

Explanation – The Subset Sum Problem aims to find a subset of a given set that sums up to a specific target value.

BOOKS

Algorithm T H CoremanAlgorithm by Udit AgarwalAlgorithm Ellis HorowitzData Structure & Algorithm Made Easy Narasimha