1. Question / 实践题目
2. Analysis / 问题描述
The task is to find the median of the union of two sequences in a non-descending order with the same length, with a \(O(\log N)\) time complexity.
The inputs contain 3 lines. The length \(N\) (\(0<N\leq100000\)) of the sequences on the first line, two \(N\)-length non-descending sequences on the second line and the three line respectively, with their elements of int type separated by spaces.
The output is the median.
And here‘s how we can solve the problem.
Without the requirement of the \(O(\log N)\) time complexity, the easiest way to solve the problem is to merge two linear lists together, and calculate their median with the equation \(\lfloor\frac{N+1}{2}\rfloor\).
Taking the time complexity \(O(\log N)\) into account, we have to find another way. For the requirement to find the MEDIAN, that is, the MIDDLE element of the union, and the \(\log N\) time complexity, the binary search which is also about the MIDDLE and the \(\log N\) is likely to be a way to figure out the task.
Different from the original binary search, which processes a sequence each time, we could process the two sequences together in each recursion, to meet the \(\log N\) requirement.
3. Algorithm / 算法描述
Assume that the first sequence is seqA and the second seqB. Using the binary search, the key algorithm to find the median may be:
Calculate the median of seqA (medianA)
Calculate the median of seqB (medianB)
if the first element is greater than the last element in two sequences respectively
return the smaller one in the first elements
else
if medianA == medianB
the median is medianA (or medianB)
else if the sequence has a greater median
take its left part (including the median) as the new sequence
else if the sequence has a smaller median
take its right part (including the median) as the new sequence
Concretely, let‘s take sample 1 for an example.
The original state of the two sequences of sample 1 is:
seqA: 1 3 5 7 9
seqB: 2 3 4 5 6
And the medianA is \(5\), the medianB is \(4\). Since \(5 > 4\), we take the left part of seqA and the right part of the other‘s.
seqA: [1 3 5] 7 9
seqB: 2 3 [4 5 6]
↓
seqA_1: 1 3 5
seqB_1: 4 5 6
Then calculate two medians again and use the same strategy, we get
seqA_1: 1 [3 5]
seqB_1: [4 5] 6
↓
seqA_2: 3 5
seqB_2: 4 5
And then
seqA_2: 3 [5]
seqB_2: [4] 5
↓
seqA_3: 5
seqB_3: 4
So far, for seqA_3:
aLeft == aRight
and for seqB_3:
bLeft == bRight
Thus the smaller element in {4, 5}, that is, 4, is returned as the median.
The strategy above works well for sequences with an odd number length, but that‘s not the case for sequences with an even number length. Concretely, for the second sample, the original state of the 2 sequences is:
seqA: -100 -10 1 1 1 1
seqB: -50 0 2 3 4 5
And after the first recursion, the two subsequences will be like:
seqA_1: 1 1 1 1
seqB_1: -50 0 2
The two sequences don‘t have the same length, and thus the strategy cannot be used. To avoid this problem, we can omit the smaller median of the original sequence, which won‘t affect the result. And instead the subsequences will be like:
seqA_1: 1 1 1
seqB_1: -50 0 2
Since the length of the two sequences are the same now, we can continue the process.
Therefore, our algorithm now will be:
Calculate the median of seqA (medianA)
Calculate the median of seqB (medianB)
if the first element is greater than the last element in two sequences respectively
return the smaller one in the first elements
else
if medianA == medianB
the median is medianA (or medianB)
else if the sequence has a greater median
take its left part (including the median) as the new sequence
else if the sequence has a smaller median
if the sequence has an odd length
take its right part (including the median) as the new sequence
else if the sequence has an even length
take its right part (excluding the median) as the new sequence
And below is the implementation of the algorithm written in c++.
The recursion part:
int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
// calculate the 2 medians of the 2 sequences
int medianA = (aRight + aLeft) / 2;
int medianB = (bRight + bLeft) / 2;
if (a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight])
return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
else
{
// find the median recursively
if (a[medianA] == b[medianB])
return a[medianA];
else if (a[medianA] < b[medianB])
{
if ((aRight - aLeft + 1) % 2 == 1) // an odd length
return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
else // an even length
return biSearchMedian(a, medianA + 1, aRight, b, bLeft, medianB);
}
else
{
if ((bRight - bLeft + 1) % 2 == 1) // an odd length
return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
else // an even length
return biSearchMedian(a, aLeft, medianA, b, medianB + 1, bRight);
}
}
}
And the completed code:
#include<iostream>
using namespace std;
int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
// calculate the 2 medians of the 2 sequences
int medianA = (aRight + aLeft) / 2;
int medianB = (bRight + bLeft) / 2;
if (a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight])
return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
else
{
// find the median recursively
if (a[medianA] == b[medianB])
return a[medianA];
else if (a[medianA] < b[medianB])
{
if ((aRight - aLeft + 1) % 2 == 1) // an odd length
return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
else // an even length
return biSearchMedian(a, medianA + 1, aRight, b, bLeft, medianB);
}
else
{
if ((bRight - bLeft + 1) % 2 == 1) // an odd length
return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
else // an even length
return biSearchMedian(a, aLeft, medianA, b, medianB + 1, bRight);
}
}
}
int a[100000];
int b[100000];
int main(void)
{
int n; // the length
cin >> n;
// receive two sequences
for (int i = 0; i < n; i++)
cin >> a[i];
for (int i = 0; i < n; i++)
cin >> b[i];
// calculate the median using the binary search algorithm
int median = biSearchMedian(
a, 0, n - 1,
b, 0, n - 1);
// output the median
cout << median;
return 0;
}
4. T(n) and S(n) / 算法时间及空间复杂度分析(要有分析过程)
Like the binary search, the time complexity of the algorithm is \(O(\log N)\) and space complexity \(O(N)\).
Time complexity:
Divide: Split the sequences into 2 halves, thus \(O(1)\).
Conquer: The same as the binary search, thus \(T(\frac{N}{2})\).
Merge: Not needed here.
Thus
\[ \begin{align} T(N) &= T_{conquer} + T_{divide + merge} \ &= T(\frac{N}{2}) + O(1) \ &= O(N^{\log1}) + O(1) \ &= O(N^0) + O(N^0) \ &= O(N^0 \log N) \ &= O(\log N) \end{align} \]
Space complexity:
Since the algorithm does not use extra space, the space complexity
\[
\begin{align}
S(N) &= O(1)
\end{align}
\]
The same as the binary search algorithm
5. Experience / 心得体会(对本次实践收获及疑惑进行总结)
Originally we concatenated two sequences together and calculated the median with the equation \(\lfloor\frac{N+1}{2}\rfloor\). However, this solution has a time complexity of \(O(N)\), which is greater than the requirement.
Including the question discussed here, I‘ve encountered many questions whose solution is the binary search. When there‘s info about half, middle, median and \(\log N\) in the question, we can try using the binary search algorithm, which is a good example of the divide-and-conquer strategy.
原文:https://www.cnblogs.com/Chunngai/p/11565544.html