ChatGPT解决这个技术问题 Extra ChatGPT

Finding the median of an unsorted array

To find the median of an unsorted array, we can make a min-heap in O(nlogn) time for n elements, and then we can extract one by one n/2 elements to get the median. But this approach would take O(nlogn) time.

Can we do the same by some method in O(n) time? If we can, then please tell or suggest some method.

Keep in mind that if it takes O(nlogn) then you might as well just sort the array and divide the index by 2.
building heap takes O(n) time not O(nlogn)
@JerryGoyal, If you have all elements at the same time, then building a heap takes O(n). But if you have stream of elements then, it takes O(nlogn). Its like pushing one element at time, and n times. So, I guess he means stream of elements here.

K
Katamaritaco

You can use the Median of Medians algorithm to find median of an unsorted array in linear time.


@KevinKostlan It actually isn't approximate, it's the real median and it finds it in linear time. Notice that after finding the median of medians (which is guaranteed to be greater than at least 30% of the elements and smaller than at least 30% of the elements) you partition the the array using that pivot. Then you recurse (if necessary) into one of those array which is at most %70 the size of the original array in order to find the real median (or in the general case the k-statistic).
@dcmm88: Please read [en.wikipedia.org/wiki/Median_of_medians]. In linear time, the best you can get is a good guess. (The moment you recurse you're no longer O(n)/linear - by definition.)
@AlanK the wikipedia page you linked specifically say that it is. en.wikipedia.org/wiki/…
@dcmm88 Read the first sentence of the article again. MoM is O(n) and approximate. When you prescribe recursive repetition of a linear operation over (subsets of) a dataset to obtain a "real median", you are specifying a new algorithm, with greater time complexity, by definition.
@AlanK excuse me, I misinterpreted the reply. I thought approximate was referring to the complexity, not the accuracy. Nevertheless, you can still use median of medians to find the true median in O(n), it's just that the wikipedia page doesn't explain this. I hinted at the solution in my previous reply, and you can find a more detailed explanation here, from stackoverflow.com/a/251884/3272850. Basically since you are recursing into a set that is 70% size of previous step, you get a geometric series that sums to some constant times O(n).
r
rkachach

I have already upvoted the @dasblinkenlight answer since the Median of Medians algorithm in fact solves this problem in O(n) time. I only want to add that this problem could be solved in O(n) time by using heaps also. Building a heap could be done in O(n) time by using the bottom-up. Take a look to the following article for a detailed explanation Heap sort

Supposing that your array has N elements, you have to build two heaps: A MaxHeap that contains the first N/2 elements (or (N/2)+1 if N is odd) and a MinHeap that contains the remaining elements. If N is odd then your median is the maximum element of MaxHeap (O(1) by getting the max). If N is even, then your median is (MaxHeap.max()+MinHeap.min())/2 this takes O(1) also. Thus, the real cost of the whole operation is the heaps building operation which is O(n).

BTW this MaxHeap/MinHeap algorithm works also when you don't know the number of the array elements beforehand (if you have to resolve the same problem for a stream of integers for e.g). You can see more details about how to resolve this problem in the following article Median Of integer streams


Why does this work? Suppose your array is [3, 2, 1]. We would then put the first 2 in a max heap: [3, 2], thus 3 would be the root, so that 2, its child must be smaller than it. And, we would have [1] in the min heap. According to this algorithm, we would then choose the max (root), of the maxHeap as our median. Wouldn't this give us 3?
It's O(n^2) time worse case, not O(n). When referring to an algorithm's Big O complexity, without specifying the case, it's typically assumed that you're referring to the worse time.
Yeah the answer given is wrong, he said first n/2 elements needs to be added that's not true, in reality you have to add first n/2 (or n/2 +1 if n is odd) smallest element in Max heap and rest in Min heap hence it would ensure correct answer. Follow the link he provided below "Median of integer stream"
B
BrokenGlass

Quickselect works in O(n), this is also used in the partition step of Quicksort.


I don't think quickselect would necessarily give the median in ONLY ONE run. It depends on your pivot choice.
Unfortunately, quickselect to find median will take O(n^2) in worst case. This occurs when we reduce the array by just 1 element in each iteration of QuickSelect. Consider an already sorted array and we always choose right most element as pivot. I know this is bit foolish to do so but this is how worst cases are.
@VishalSahu you are wrong. Quickselect runs in O(n), because it always chooses a good pivot
Quickselect is between O(n) and O(n^2).
r
royhowie

The quick select algorithm can find the k-th smallest element of an array in linear (O(n)) running time. Here is an implementation in python:

import random

def partition(L, v):
    smaller = []
    bigger = []
    for val in L:
        if val < v: smaller += [val]
        if val > v: bigger += [val]
    return (smaller, [v], bigger)

def top_k(L, k):
    v = L[random.randrange(len(L))]
    (left, middle, right) = partition(L, v)
    # middle used below (in place of [v]) for clarity
    if len(left) == k:   return left
    if len(left)+1 == k: return left + middle
    if len(left) > k:    return top_k(left, k)
    return left + middle + top_k(right, k - len(left) - len(middle))

def median(L):
    n = len(L)
    l = top_k(L, n / 2 + 1)
    return max(l)

How is this linear? If I understand correctly this implementation is O(n^2) in worst case.
@akki It's "expected value" linear time because of the randomness. The intuition is that the random index will, on average, split the list into a list of 1/4 size and of 3/4 size.
A
AlanK

The answer is "No, one can't find the median of an arbitrary, unsorted dataset in linear time". The best one can do as a general rule (as far as I know) is Median of Medians (to get a decent start), followed by Quickselect. Ref: [https://en.wikipedia.org/wiki/Median_of_medians][1]


I
Imposter

It can be done using Quickselect Algorithm in O(n), do refer to Kth order statistics (randomized algorithms).


A
Adam Gawne-Cain

As wikipedia says, Median-of-Medians is theoretically o(N), but it is not used in practice because the overhead of finding "good" pivots makes it too slow.
http://en.wikipedia.org/wiki/Selection_algorithm

Here is Java source for a Quickselect algorithm to find the k'th element in an array:

/**
 * Returns position of k'th largest element of sub-list.
 * 
 * @param list list to search, whose sub-list may be shuffled before
 *            returning
 * @param lo first element of sub-list in list
 * @param hi just after last element of sub-list in list
 * @param k
 * @return position of k'th largest element of (possibly shuffled) sub-list.
 */
static int select(double[] list, int lo, int hi, int k) {
    int n = hi - lo;
    if (n < 2)
        return lo;

    double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot

    // Triage list to [<pivot][=pivot][>pivot]
    int nLess = 0, nSame = 0, nMore = 0;
    int lo3 = lo;
    int hi3 = hi;
    while (lo3 < hi3) {
        double e = list[lo3];
        int cmp = compare(e, pivot);
        if (cmp < 0) {
            nLess++;
            lo3++;
        } else if (cmp > 0) {
            swap(list, lo3, --hi3);
            if (nSame > 0)
                swap(list, hi3, hi3 + nSame);
            nMore++;
        } else {
            nSame++;
            swap(list, lo3, --hi3);
        }
    }
    assert (nSame > 0);
    assert (nLess + nSame + nMore == n);
    assert (list[lo + nLess] == pivot);
    assert (list[hi - nMore - 1] == pivot);
    if (k >= n - nMore)
        return select(list, hi - nMore, hi, k - nLess - nSame);
    else if (k < nLess)
        return select(list, lo, lo + nLess, k);
    return lo + k;
}

I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].

In practice, you can expect the above code to be o(N).


h
hrithik maheshw

Let the problem be: finding the Kth largest element in an unsorted array.

Divide the array into n/5 groups where each group consisting of 5 elements.

Now a1,a2,a3....a(n/5) represent the medians of each group.

x = Median of the elements a1,a2,.....a(n/5).

Now if k

else if k>n/2 then we can remove the smallest ,2nd smallest and 3rd smallest element of the group whose median is smaller than the x. We can now call the function of again with 7n/10 elements and finding the (k-3n/10)th largest value.

Time Complexity Analysis: T(n) time complexity to find the kth largest in an array of size n.

T(n) = T(n/5) + T(7n/10) + O(n)

if you solve this you will find out that T(n) is actually O(n)

n/5 + 7n/10 = 9n/10 < n


T
Tosa Logitech

Notice that building a heap takes O(n) actually not O(nlogn), you can check this using amortized analysis or simply check in Youtube. Extract-Min takes O(logn), therefore, extracting n/2 will take (nlogn/2) = O(nlogn) amortized time.

About your question, you can simply check at Median of Medians.


F
Filippo Teodoro

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

Example 1:

Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

Code:

class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        merged_array = sorted(nums1 + nums2)
        if len(merged_array) % 2 == 0:
            index = int(len(merged_array)/2)
            output =  (merged_array[index - 1] +  merged_array[index])/2
        else: 
            index = int(len(merged_array)/2)
            output = merged_array[index]
        return output

While code-only answers (reasonably) are frowned upon here, generally: what question does this answer, where do self,nums1/nums2 come from, why would the elements be ints?
This does not answer Can we [find the median of an unsorted array] by some method in O(n) time?