从内存中丢弃一个数据块,如果可以,可能把数据块放到磁盘上。 当内存到达限制并且需要释放空间时调用。 /** * Drop a block from memory, possibly putting it on disk if applicable. Called when the memory * store reaches its limit and needs to free up

 每个结点(driver和执行器)上运行的管理器,这个管理器提供本地和远程放置和获取数据块的接口。 /** * Manager running on every node (driver and executors) which provides interfaces for putting and * retrieving blocks both locally and remotely i

MemoryManager是一个实施在执行器和存储之间怎么分配内存的内存管理器,它的子类是UnifiedMemoryManager。 在这里,执行内存指在shfuules,joins, sorts 和aggregations中用来计算的内存,而存储内存指的是用来缓存和在集群中传播内部数据。 每个JVM中只有一个MemoryManager。 /** * An abstract memory man

object UnifiedMemoryManager { // Set aside a fixed amount of memory for non-storage, non-execution purposes. // This serves a function similar to `spark.memory.fraction`, but guarantees that we r

/** * Implements policies and bookkeeping for sharing an adjustable-sized pool of memory between tasks. * * Tries to ensure that each task gets a reasonable share of memory, instead of some task ra

RedirectableOutputStream把一个输出流重新定向为另一个输出流。使用方式如下: /** * A wrapper which allows an open [[OutputStream]] to be redirected to a different sink. */ private[storage] class RedirectableOutputStream exten

ChunkedByteBufferOutputStream把输入的数据分块存储。 /** * An OutputStream that writes to fixed-size chunks of byte arrays. * * @param chunkSize size of each chunk, in bytes. */ private[spark] class ChunkedBy

StorageMemoryPool是MemoryPool的子类,MemoryPool参见MemoryPool 源代码分析, 构造参数多了一个MemoryMode,poolName根据MemoryMode生成。 /** * Performs bookkeeping for managing an adjustable-size pool of memory that is used for sto

MemoryPool管理大小可调的内存区域。 这个类被MemoryManager作为内部使用。 参数是一个MemoryManager实例的一个锁,用于同步。我们特意擦除类型信息,变为Object类型,避免程序错误,因为这个对象应当仅作为同步用。 _poolSize 变量保存当前内存区域的大小。memoryUsed 函数用于已经使用的内存,在子类实现。 /** * Manages bookkeep

DiskStore 存储BlockManager的数据块到磁盘。以getBytes为例,如果文件小于2m,则一次读到内存里,否则用内存映射文件,代码如下: def getBytes(blockId: BlockId): ChunkedByteBuffer = { val file = diskManager.getFile(blockId.name) val channel = n

1 2 3 4 5 6 7 8 9