Cache Manager Summary

About FileObject
 
A file can have multiple file objects (FOs); each FO represents one open instance.
For FOs representing the same stream, their
SectionObjectPointers values are identical,
and their FsContext values are identical too.
 
 
FsContext is the same for all FOs on a given file.
FsContext2 holds per-user-handle context; metadata streams have no user handle context.
 
SectionObjectPointers: the single-instancing pointers for a stream.
    DataSection: non-NULL if a mapped data section has been created.
    SharedCacheMap: non-NULL if this stream has been set up by the cache manager.
    ImageSection: present only for executable files.
 
PrivateCacheMap --- per-handle Cc context
                                    (readahead) that also serves as a reference from this file object to the shared cache map.
   
Single Instancing and Metadata
Filesystems use streams to represent metadata, but those streams are invisible to users.
Directories require a level of indirection to avoid single instancing exposing the data.
 
Filesystems create a second, internal "stream" file object
-- the user's file object has NULL members in its SectionObjectPointers
-- stream file objects have no FsContext2 (user handle context)

All metadata streams are built like this (MFTs, FATs, etc.)
FsContext2 == NULL plays an important role in how Cc treats these streams,
which we'll discuss later.
 
 
View Management
 
 
A shared cache map has an array of Virtual Address Control Block (VACB) pointers which record the base cache address of each view;
the array is promoted to a sparse form for files > 32MB
 
Access interfaces map File+FileOffset to a cache address.
Taking a view miss results in
a new mapping, possibly unmapping an unreferenced
view in another file (views are recycled LRU).
 
 
Since a view is fixed size, mapping across a view is impossible; Cc returns one address
 
 
 
Fixed size means no fragmentation …
 
 
 
Interface Summary
 
 
    File objects start out unadorned
    CcInitializeCacheMap to initiate caching via Cc on a file object
  sets up the Shared/Private Cache Maps and Mm state if necessary
    Access methods (Copy, Mdl, Mapping/Pinning)
    Maintenance functions
    CcUninitializeCacheMap to terminate caching on a file object
  tears down the Shared/Private Cache Maps
  Mm lives on; its data section is the cache!
 
The Cache Manager Doesn’t Stand Alone
 
 
    Cc is an extension of either Mm or the FS, depending on how you look at it
    Cc is intimately tied into the filesystem model
    Understanding Cc means we have to take a slight detour to mention some concepts filesystem folks think are interesting. Raise your hand if you’re a filesystem person :-)
 
 
The Slight Filesystem Digression
    Three basic types of IO in NT: cached, noncached and “paging”
    Paging IO is simply IO generated by Mm: flushing or faulting
  the data section implies the file is big enough
  can never extend a file
    A filesystem will re-enter itself on the same callstack as Mm dispatches cache pagefaults
    This makes things exciting! (ERESOURCEs)
 
 
The Three File Sizes
 
    FileSize: how big the file looks to the user
  1 byte, 102 bytes, 1040592 bytes
    AllocationSize: how much backing store is allocated on the volume
  a multiple of cluster size, which is 2^n * sector size
  ... a more practical definition shortly
    ValidDataLength: how much of the file has been written by the user; in cache, zeros are seen beyond it (some OSes use sparse allocation)
    ValidDataLength <= FileSize <= AllocationSize
 
 
    Why not use Fast IO all the time?
  file locks
  oplocks
  extending files (and so forth)
 
 
Pagefault Cluster Hints
    Taking a pagefault can result in Mm opportunistically bringing in surrounding pages (up to 7/15 depending)
    Since Cc takes pagefaults on streams, but knows a lot about which pages are useful, Mm provides a hinting mechanism in the TLS
  MmSetPageFaultReadAhead()
    Not exposed to usermode …
 
 
 
Readahead
 
    CcScheduleReadAhead detects patterns on a handle and schedules readahead into the next suspected ranges
  Regular motion, backwards and forwards, with gaps
  Private Cache Map contains the per-handle info
  Called by CcCopyRead and CcMdlRead
    Readahead granularity (64KB) controls the scheduling trigger points and length
  Small IOs: don’t want readahead every 4KB
  Large IOs: ya get what ya need (up to 8MB, thanks to Jim Gray)
    CcPerformReadAhead maps and touch-faults pages in a Cc worker thread; it will use the new Mm prefetch APIs in a future release
 
 
Unmap Behind
 
    Recall how views are managed (misses)
    On view miss, Cc will unmap two views behind the current (missed) view before mapping
    Unmapped valid pages go to the standby list in LRU order and can be soft-faulted. In practice, this is where much of the actual cache is as of Windows 2000.
    Unmap-behind logic is the default because large file read/write operations cause huge swings in working set. Mm’s working set trim falls down at the speed a disk can produce pages, so Cc must help.
 
 
Write Throttling
    Avoids out-of-memory problems by delaying writes to the cache
  Filling memory faster than writeback speed is not useful; we may as well run into the limit sooner
    The throttle limit is twofold
  CcDirtyPageThreshold: dynamic, but ~1500 on all current machines (small, but see above)
  MmAvailablePages & the pagefile page backlog
    CcCanIWrite checks whether a write is OK, optionally blocking; it also serves as the restart test
    CcDeferWrite sets up a callback for when the write should be allowed (async case)
    The !defwrites debugger extension triages and shows the state of the throttle
 
 
Writing Cached Data
 
    There are three basic sets of threads involved, only one of which is Cc’s
  Mm’s modified page writer
    the paging file
  Mm’s mapped page writer
    almost anything else
  Cc’s lazy writer pool
    executing in the kernel critical work queue
    writes data produced through Cc interfaces
 
 
 
The Lazy Writer
    The name is misleading; it’s really a delayed writer
    All files with dirty data are queued onto CcDirtySharedCacheMapList
    Work queueing: CcLazyWriteScan()
  Once per second, queues work to arrive at writing 1/8th of the dirty data, given current dirty and production rates
  Fairness considerations are interesting
    CcLazyWriterCursor is rotated around the list, pointing at the next file to operate on (fairness)
  16th-pass rule for user and metadata streams
    Work issuing: CcWriteBehind()
  Uses a special mode of CcFlushCache() which flushes front to back (hotspots: fairness again)
 
 
 
 
Letting the Filesystem Into The Cache
 
    Two distinct access interfaces
  Map: given File+FileOffset, return a cache address
  Pin: same, but acquires synchronization; this is a range lock on the stream
    The lazy writer acquires synchronization, allowing it to serialize metadata production with metadata writing
    Pinning also allows setting a log sequence number (LSN) on the update, for transactional filesystems
  The FS receives an LSN callback from the lazy writer prior to a range flush
 
 
Remember FsContext2?
 
    Synchronization in the Pin interfaces requires that Cc be the writer of the data
    Mm provides a method to turn off the mapped page writer for a stream: MmDisableModifiedWriteOfSection()
  confusing name, I know (the modified writer is not involved)
    This serves as the trigger for Cc to perform synchronization on write
 
 
BCBs and Lies Thereof
 
    Mapping and Pinning interfaces return opaque Buffer Control Block (BCB) pointers
    Unpin receives BCBs to indicate regions
    BCBs for Map interfaces are usually VACB pointers
    BCBs for Pin interfaces are pointers to a real BCB structure in Cc, which references a VACB for the cache address
 
 
Cache Manager Summary
A virtual block cache for files, not a logical block cache for disks
The memory manager is the ACTUAL cache manager
Cache manager context is integrated into file objects
The cache manager manages views of files in kernel virtual address space
I/O has a special fast path for cached accesses
The lazy writer periodically flushes dirty data to disk
Filesystems need two interfaces to Cc: map and pin