Air
2020年12月19日

堆(heap)

堆是计算机里经常使用的数据结构，它经常用于实现优先级队列，具有优先权限的任务会排在最前头，并会得到优先处理的权力。在排序算法里，专门有一种算法，是基于堆来排序的，也就是我们常说的堆排序。堆能够将实数对半分，其中，根元素的值就是分割点，这种特性可以用于解决：获得某个数据集合里前M大（小）的数据集合。

本文将通过以下几方面来介绍Heap：

定义
实现和操作
空间复杂度和时间复杂度

定义

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete[1] tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C.[2] The node at the "top" of the heap (with no parents) is called the root node.

以上信息来自wiki，它告诉我们，堆是一种基于树的数据结构，这棵树具有完整节点的特性(Complete Tree)。除此之外，这棵树还满足一些特性，比如，对于大根堆（Max Heap），每个父节点的值比它的所有孩子节点的值大；对于小根堆（Min Heap），每个父节点的值比它的所有孩子节点的值小。其中，没有父节点的节点被称为根节点（Root）。

实现堆的一种方法是借助Binary Tree-它规定了，每一个父节点最多只有2个孩子节点，如下图所示：

上图是一个大根堆，每一个父节点最多只有2个孩子节点，这种类型的堆就是我们常说的Binary Heap。它的根节点，也就是图中的100，比所有孩子节点19和36大，以此同时，其它父节点，比如19，比它的孩子节点17和3大。除此之外，所有节点均是从上到下，从左到右依次排布，中间没有空隙（比如17和25之间一定会存在一个节点3），这种特性就是我们之前提到的完整节点的特性。

如果你将上面的大根堆平铺到实数轴上，你会惊奇的发现，根节点是一个分割点，这个大根堆上的所有数值，均小于或等于根节点100。同理，如果你将一个小根堆平铺到实数轴上，那么该小根堆上的所有节点均会大于或等于根节点。这种特性使得堆能够将数据对半分割。

实现和操作

Binary Heap的实现一般是基于一块连续的存储，也就是一个数组。数组的特性我们在之前的内容里也介绍了，如果你不清楚数组有哪些特性，那么你需要滑动到最上面回顾和学习。

如果是基于数组来实现的Binary Heap，则需要建立一些规则：

给定一个节点的下标 i，则该节点的左孩子和右孩子的下标分别是：2 * i + 1 和 2 * i + 2
给定一个节点的下标 i，则该节点的父亲节点的下标是：（i - 1） / 2，并且向下取整，由于根节点没有父亲节点，因此根节点的父亲节点的下标是-1

选择堆的数据存储之后，接下来，我们要对堆上的数据进行一系列的操作，常见的关于堆的操作至少需要实现以下几个：

给定一个数组来建立堆
取出并移除根元素
向已有的堆里新添一个元素
堆排序

以下Python代码实现了一个堆：

class Heap:
    def __init__(self, arr_data, min_heap = True):
        self.arr_data = arr_data
        self.compare_func = (lambda parent, child : parent > child) if min_heap else (lambda parent, child : parent < child)
        for index, _ in enumerate(self.arr_data):
            self._heap_up(index)

    def _heap_up(self, start_index):
        tmp_index = start_index
        while True:
            parent_index = int((tmp_index -1) / 2)
            if tmp_index == 0:
                break
            if self.compare_func(self.arr_data[parent_index], self.arr_data[tmp_index]):
                self.arr_data[parent_index], self.arr_data[tmp_index] = self.arr_data[tmp_index], self.arr_data[parent_index]
            else:
                break
            tmp_index = parent_index

    def push(self, val):
        self.arr_data.append(val)
        self._heap_up(len(self.arr_data) - 1)

    def pop(self):
        if len(self.arr_data) == 0:
            return (False, None)
        val = self.arr_data[0]
        self.arr_data[0] = self.arr_data[len(self.arr_data) - 1]
        self.arr_data.pop()
        self._heap_down()
        return (True, val)

    def _heap_down(self):
        total_len = len(self.arr_data)
        parent_index = 0
        while True:
            left_child_index = parent_index * 2 + 1
            right_child_index = parent_index * 2 + 2

            # left child and right child are None
            if left_child_index >= total_len:
                break
            # has left child and right child is None
            if left_child_index < total_len and right_child_index == total_len:
                if self.compare_func(self.arr_data[parent_index], self.arr_data[left_child_index]):
                    self.arr_data[left_child_index], self.arr_data[parent_index] = self.arr_data[parent_index], self.arr_data[left_child_index]
                    parent_index = left_child_index
                else:
                    break
            # has left child and has right child
            else:
                child_index = right_child_index if self.compare_func(self.arr_data[left_child_index], self.arr_data[right_child_index]) else left_child_index
                if self.compare_func(self.arr_data[parent_index], self.arr_data[child_index]):
                    self.arr_data[child_index], self.arr_data[parent_index] = self.arr_data[parent_index], self.arr_data[child_index]
                    parent_index = child_index
                else:
                    break

    def sort(self):
        result = []
        while len(self.arr_data) != 0:
            val = self.pop()
            result.append(val[1])
        return result

    def replace_root(self, val):
        if self.arr_data:
            self.arr_data[0] = val
            self._heap_down()
            return True
        else:
            return False

该堆的用法如下：

定义一个小根堆

# define a min heap
minHeap = Heap([2,10,1,5,8,22])
print(minHeap.sort())

输出结果为：

[1, 2, 5, 8, 10, 22]

定义一个大根堆

# define a max heap
maxHeap = Heap([2,10,1,5,8,22], False)
print(maxHeap.sort())

输出结果为：

[22, 10, 8, 5, 2, 1]

空间复杂度和时间复杂度

之前我们定义了一个堆，并且实现了4个方法，接下来，让我们看看每一种方法它们的空间复杂度和时间复杂度是多少。

__init__用于创建一个堆，它的空间复杂度是O(1)，而时间复杂度是O(nlog(n))
push用于往堆中增加一个元素，它的空间复杂度是O(1)，而时间复杂度是O(log(n))
pop用于取出堆中的根节点，它的空间复杂度是O(1)，而时间复杂度是O(log(n))
sort用于排序，它的空间复杂度是O(1)，而时间复杂度是O(nlog(n))

{{ c.user.name }}
- {{ c.display_datetime }}
{{ c.content }}

请登录购买（{{ selected_gold.price }}元），即可解锁剩余教程

点击购买

{{ s.title }}
{{ a.title }}
{{ a.title }}

{{ title }}

堆(heap)

定义

实现和操作

空间复杂度和时间复杂度

{{ c.user.name }}

{{ title }}

{{ selected_gold.title }}

{{ title }}

堆(heap)

定义

实现和操作

空间复杂度和时间复杂度

创建新文章

删除文章或章节

{{ title }}