Transcription

Boki: Stateful Serverless Computing with Shared Logs

Zhipeng Jia, The University of Texas at Austin
Emmett Witchel, The University of Texas at Austin and Katana Graph

Abstract

Boki is a new serverless runtime that exports a shared log API to serverless functions. Boki shared logs enable stateful serverless applications to manage their state with durability, consistency, and fault tolerance. Boki shared logs achieve high throughput and low latency. The key enabler is the metalog, a novel mechanism that allows Boki to address ordering, consistency and fault tolerance independently. The metalog orders shared log records with high throughput and it provides read consistency while allowing service providers to optimize the write and read path of the shared log in different ways. To demonstrate the value of shared logs for stateful serverless applications, we build Boki support libraries that implement fault-tolerant workflows, durable object storage, and message queues. Our evaluation shows that shared logs can speed up important serverless workloads by up to 4.7×.

CCS Concepts: • Information systems → Distributed storage; • Computer systems organization → Dependable and fault-tolerant systems and networks; Cloud computing.

Keywords: Serverless computing, function-as-a-service, shared log, consistency

ACM Reference Format: Zhipeng Jia and Emmett Witchel. 2021. Boki: Stateful Serverless Computing with Shared Logs. In ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP ’21), October 26–29, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 17 pages.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). SOSP ’21, October 26–29, 2021, Virtual Event, Germany. © 2021 Copyright held by the owner/author(s). ACM ISBN 77132.34835411

1 Introduction

Serverless computing has become increasingly popular for building scalable cloud applications. Its function-as-a-service (FaaS) paradigm empowers diverse applications including video processing [21, 32], data analytics [39, 47], machine learning [27, 51], distributed compilation [31], transactional workflows [56], and interactive microservices [38].

One key challenge in the current serverless paradigm is the mismatch between the stateless nature of serverless functions and the stateful applications built with them [36, 48, 52, 59]. Serverless applications are often composed of multiple functions, where application state is shared. However, managing shared state using current options, e.g., cloud databases or object stores, struggles to achieve strong consistency and fault tolerance while maintaining high performance and scalability [50, 56].

The shared log [23, 30, 55] is a popular approach for building storage systems that can simultaneously achieve scalability, strong consistency, and fault tolerance [7, 22, 24, 26, 35, 41, 54, 55]. A shared log offers a simple abstraction: a totally ordered log that can be accessed and appended concurrently. While simple, a shared log can efficiently support state machine replication [49], the well-understood approach for building fault-tolerant stateful services [24, 55]. The shared log API also frees distributed applications from the burden of managing the details of fault-tolerant consensus, because the consensus protocol is hidden behind the API [22]. Providing shared logs to serverless functions can address the dual challenges of consistency and fault tolerance (§ 2.1).

We present Boki (meaning bookkeeping in Japanese), a FaaS runtime that exports the shared log API to functions for storing shared state. Boki realizes the shared log API with a LogBook abstraction, where each function invocation is associated with a LogBook (§ 3). For a Boki application, its functions share a LogBook, allowing them to share and coordinate updates to state. In Boki, LogBooks enable stateful serverless applications to manage their state with durability, consistency, and fault tolerance.

The shared log API is simple to use and applicable to diverse applications [22, 24, 25, 55], so the challenge of Boki is to achieve high performance and strong consistency while conforming to the serverless environment (§ 2.2). Data locality is one challenge for serverless storage, because disaggregated storage is strongly preferred in the serverless environment [36, 48, 52]. Boki separates the read and write path, where read locality is optimized with a cache on function nodes and writes are optimized with scale-out bandwidth. Boki will scatter writes over variable numbers of shards while providing consistent reads and fault tolerance. In Boki, high performance, read consistency and fault tolerance are achieved by a single log-based mechanism, the metalog.

The metalog defines a total order of Boki’s internal state that applications can use to enforce consistency when they need it. For example, monotonic reads are enforced by tracking metalog positions. The metalog contains metadata that totally orders a log’s data records. Because Boki uses a compact format for the metalog, durability and consensus are

vital, but high data throughput is not. Therefore Boki stores and updates metalogs using a simple primary-driven design.

Boki handles machine failures by reconfiguration, similar to previous shared log systems [22, 30, 55]. Because the metalog controls Boki’s internal state transitions, sealing the metalog (making it no longer writable) pauses state transitions. Therefore, Boki implements reconfiguration by sealing the metalog, changing the system configuration, and starting a new metalog.

Boki’s metalog allows easy adoption of state-of-the-art techniques from previous shared log designs because it makes log ordering, consistency, and fault tolerance into independent modules (§ 4.1). Boki adapts ordering from Scalog [30] and fault tolerance from Delos’s [22] sealing protocol. Another benefit of the metalog is that it decouples read consistency from data placement, enabling indices and caches for log records to be co-located with functions. Without interfering with read consistency, cloud providers can build simple caches which increase data locality when scheduling functions on nodes where their data is likely to be cached.

We implement Boki’s shared log designs on top of Nightcore [38], a FaaS runtime optimized for microservices. Nightcore has no specialized mechanism for state management, which Boki provides, while Nightcore’s design for I/O efficiency benefits Boki. Boki achieves append throughput of 1.2M Ops/s within a single LogBook, while maintaining a p99 latency of 6.4 ms. With LogBook engines co-located with functions, Boki achieves a read latency of 121 μs for best-case LogBook reads.

To make writing Boki applications easier, we build support libraries on top of the LogBook API aimed at three different serverless use cases: fault-tolerant workflows, durable object storage, and serverless message queues. Boki support libraries leverage techniques from Beldi [56], Tango [24], and vCorfu [55], while adapting them for the LogBook API. Boki and its support libraries are open source on GitHub: ut-osa/boki.

This paper makes the following contributions.
• Boki is a FaaS runtime that exports a LogBook API for stateful serverless applications to manage their state with durability, consistency, and fault tolerance.
• Boki proposes a unified mechanism, the metalog, to address log ordering, read consistency, and fault tolerance. The metalog decouples the read and write path of LogBooks, letting Boki achieve high throughput and low latency.
• We build Boki support libraries that use the LogBook API to demonstrate the value of shared logs for stateful serverless applications. The libraries implement fault-tolerant workflows (BokiFlow), durable object storage (BokiStore), and serverless message queues (BokiQueue).
• Our evaluation shows: BokiFlow executes workflows 4.3–4.7× faster than Beldi [56]; BokiStore achieves 1.18–1.25× higher throughput than MongoDB, while executing transactions 1.5–2.3× faster; BokiQueue achieves 2.14× higher throughput and up to 15× lower latency than Amazon SQS [2], while achieving 1.23× higher throughput and up to 2.0× lower latency than Apache Pulsar [3].

2 Background and Motivation

Serverless functions, or function as a service (FaaS) [4, 6], allow developers to upload simple functions to the cloud provider which are invoked on demand. The cloud provider manages the execution environment of serverless functions.

State management remains a major challenge in the current FaaS paradigm [36, 48, 52, 59]. Because of the stateless nature of serverless functions, current serverless applications rely on cloud storage services (e.g., Amazon S3 and DynamoDB) to manage their state. However, current cloud storage cannot simultaneously provide low latency, low cost, and high throughput [42, 47]. Relying on cloud storage also complicates data consistency in stateful workflows [56], because functions in a workflow could fail midway, leaving inconsistent workflow state stored in the database.

2.1 Shared Log Approach for Stateful Serverless

In the current FaaS paradigm, stateful applications struggle to achieve fault tolerance and strong consistency of their critical state. For example, consider a travel reservation app built with serverless functions. This app has a function for booking hotels and another function for booking flights. When processing a travel reservation request, both functions are invoked, but both functions can fail during execution, leaving inconsistent state. Using current approaches for state management such as cloud object stores or even cloud databases, it is difficult to ensure the consistency of the reservation state given the failure model [56].

The success of log-based approaches for data consistency and fault tolerance motivates the usage of shared logs for stateful FaaS. For example, Olive [50] proposes a client library interacting with cloud storage, where a write-ahead redo log is used to achieve exactly-once semantics in the face of failures. Beldi [56] extends Olive’s log-based techniques for transactional serverless workflows. State machine replication (SMR) [49] is another general approach for fault tolerance, where application state is replicated across servers by a command log. The command log is traditionally backed by consensus algorithms [45, 46, 53]. However, recent studies demonstrate that a shared log can provide an efficient abstraction to support SMR-based data structures [24, 55] and protocols [22, 25]. Boki provides shared logs to serverless functions, so that Boki’s applications can leverage well-understood log-based mechanisms to efficiently achieve data consistency and fault tolerance.

By examining demands in serverless computing, we identify three important cases where shared logs provide a solution. Boki provides support libraries for these use cases (§ 5).

Fault-tolerant workflows. Workflows orchestrating stateful functions create new challenges for fault tolerance and

transactional state updates. Beldi [56] addresses these challenges via logging workflow steps. Beldi builds an atomic logging layer on top of DynamoDB. We adapt Beldi’s techniques to the LogBook API without building an extra logging layer.

Durable object storage. Previous studies like Tango [24] and vCorfu [55] demonstrate that shared logs can support high-level data structures (i.e., objects) that are consistent, durable, and scalable. Motivated by Cloudflare’s Durable Objects [17], we build a library for stateful functions to create durable JSON objects. Our object library is more powerful than Cloudflare’s because it supports transactions across objects, using techniques from Tango [24].

Serverless message queues. One constraint in the current FaaS paradigm is that functions cannot directly communicate with each other via traditional approaches [31], e.g., network sockets. Shared logs can naturally be used to build message queues [30] that offer indirect communication and coordination among functions. We build a queue library that provides shared queues among serverless functions.

2.2 Technical Challenges for Serverless Shared Logs

While prior shared log designs [22, 23, 30, 55] provide inspiration, the serverless environment creates new challenges.

Elasticity and data locality. Serverless computing strongly benefits from disaggregation [20, 34], which offers elasticity. However, current serverless platforms choose physical disaggregation, which reduces data locality [36, 52]. Boki achieves both elasticity and data locality, by decoupling the read and the write paths for log data and co-locating read components with functions.

Resource efficiency. Boki aims to support a high density of LogBooks efficiently, so it multiplexes many LogBooks on a single physical log. Multiplexing LogBooks can address performance problems that arise from a skewed distribution of LogBook sizes. But this approach creates a challenge for LogBook reads: how to locate the records of a LogBook. Boki proposes a log index to address this issue, with the metalog providing the mechanism for read consistency (§ 4.4).

The ephemeral nature of FaaS. Shared logs are used for building high-level data structures via state machine replication (SMR) [24, 55]. To allow fast reads, clients keep in-memory copies of the state machines, e.g., Tango [24] has local views for its SMR-based objects. However, serverless functions are ephemeral: their in-memory state is not guaranteed to be preserved between invocations. This limitation forces functions to replay the full log when accessing an SMR-based object. Boki introduces auxiliary data (§ 3) to enable optimizations like local views in Tango (§ 5.4). Auxiliary data are designed as cache storage on a per-log-record basis, while their relaxed durability and consistency guarantees allow a simple and efficient mechanism to manage their storage (§ 4.4).

    struct LogRecord {
      uint64_t seqnum; string data;
      vector<tag_t> tags; string auxdata;
    };
    // Append a new log record.
    status_t logAppend(vector<tag_t> tags, string data,
                       uint64_t* seqnum);
    // Read the next/previous record whose
    // seqnum >= min_seqnum, or <= max_seqnum.
    status_t logReadNext(uint64_t min_seqnum, tag_t tag,
                         LogRecord* record);
    status_t logReadPrev(uint64_t max_seqnum, tag_t tag,
                         LogRecord* record);
    // Alias of logReadPrev(kMaxSeqNum, tag, record).
    status_t logCheckTail(tag_t tag, LogRecord* record);
    // Trim log records until seqnum.
    status_t logTrim(uint64_t seqnum, tag_t tag);
    // Set auxiliary data for the record of seqnum.
    status_t logSetAuxData(uint64_t seqnum, string auxdata);

Figure 1. Boki’s LogBook API (§ 3).

3 Boki’s LogBook API

Boki provides a LogBook abstraction for serverless functions to access shared logs. Boki maintains many independent LogBooks used by different serverless applications. In Boki, each function invocation is associated with one LogBook, whose book_id is specified when invoking the function. A LogBook can be shared with multiple function invocations, so that applications can share state among their function instances. Like previous shared log systems [22, 23, 30, 55], Boki exposes append, read, and trim APIs for writing, reading, and deleting log records. Figure 1 lists Boki’s LogBook API.

Read consistency. LogBook guarantees monotonic reads and read-your-writes when reading records. These guarantees imply a function has a monotonically increasing view of the log tail. Moreover, a child function inherits its parent function’s view of the log tail, if the two functions share the same LogBook. This property is important for serverless applications that compose multiple functions (§ 4.4).

Sequence numbers (seqnum). The logAppend API returns a unique seqnum for the newly appended log record. The seqnums determine the relative order of records within a LogBook. They are monotonically increasing but not guaranteed to be consecutive. Boki’s logReadNext and logReadPrev APIs enable bidirectional log traversals, by providing lower and upper bounds for seqnums (§ 4.2).

Log tags. Every log record has a set of tags, which is specified in logAppend. Log tags enable selective reads and trims, where only records with the given tag are considered (see the tag parameter in the logReadNext, logReadPrev, and logTrim APIs). Records with the same tag form abstract streams within a single LogBook. Having sub-streams in a shared log for selective reads is important for reducing log replay overheads, a technique used in Tango [24] and vCorfu [55] (§ 4.4).
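To make the semantics of seqnums and tags concrete, the following is a minimal single-process sketch of the read and append calls. It is not Boki's implementation: `MockLogBook` is a hypothetical name, the methods return `bool` or the seqnum directly instead of the paper's `status_t`, `logTrim` is omitted, and the seqnum stride of 2 is an arbitrary choice that only mimics the statement that seqnums increase but need not be consecutive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

using tag_t = std::uint64_t;
constexpr std::uint64_t kMaxSeqNum = std::numeric_limits<std::uint64_t>::max();

struct LogRecord {
    std::uint64_t seqnum;
    std::string data;
    std::vector<tag_t> tags;
    std::string auxdata;
};

// Hypothetical in-memory stand-in for a single LogBook (illustration only).
class MockLogBook {
public:
    // Appends a record under every given tag and returns its seqnum.
    std::uint64_t logAppend(std::vector<tag_t> tags, std::string data) {
        const std::uint64_t seqnum = next_seqnum_;
        next_seqnum_ += 2;  // assumed stride: increasing, not consecutive
        for (tag_t tag : tags) index_[tag].push_back(seqnum);
        LogRecord& rec = records_[seqnum];
        rec.seqnum = seqnum;
        rec.data = std::move(data);
        rec.tags = std::move(tags);
        return seqnum;
    }

    // First record carrying `tag` whose seqnum >= min_seqnum; false if none.
    bool logReadNext(std::uint64_t min_seqnum, tag_t tag, LogRecord* record) const {
        assert(record != nullptr);
        auto row = index_.find(tag);
        if (row == index_.end()) return false;
        const std::vector<std::uint64_t>& seqnums = row->second;
        auto it = std::lower_bound(seqnums.begin(), seqnums.end(), min_seqnum);
        if (it == seqnums.end()) return false;
        *record = records_.at(*it);
        return true;
    }

    // Last record carrying `tag` whose seqnum <= max_seqnum; false if none.
    bool logReadPrev(std::uint64_t max_seqnum, tag_t tag, LogRecord* record) const {
        assert(record != nullptr);
        auto row = index_.find(tag);
        if (row == index_.end()) return false;
        const std::vector<std::uint64_t>& seqnums = row->second;
        auto it = std::upper_bound(seqnums.begin(), seqnums.end(), max_seqnum);
        if (it == seqnums.begin()) return false;
        *record = records_.at(*(it - 1));
        return true;
    }

    // Alias of logReadPrev(kMaxSeqNum, tag, record), as in Figure 1.
    bool logCheckTail(tag_t tag, LogRecord* record) const {
        return logReadPrev(kMaxSeqNum, tag, record);
    }

    // Attaches cache-like auxiliary data to an existing record.
    bool logSetAuxData(std::uint64_t seqnum, std::string auxdata) {
        auto it = records_.find(seqnum);
        if (it == records_.end()) return false;
        it->second.auxdata = std::move(auxdata);
        return true;
    }

private:
    std::uint64_t next_seqnum_ = 1;
    std::map<std::uint64_t, LogRecord> records_;          // seqnum -> record
    std::map<tag_t, std::vector<std::uint64_t>> index_;  // tag -> sorted seqnums
};
```

In this sketch each tag owns a sorted row of seqnums, so a selective read reduces to a binary search over that row. A record appended with tags {1, 2} is visible in both sub-streams, while a reader of tag 1 never touches records tagged only with 2.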

Auxiliary data. LogBook’s auxiliary data is designed as per-log-record cache storage, which is set by the logSetAuxData API. Log reads may return auxiliary data along with normal data if found. Auxiliary data can cache object views in a shared-log-based object storage. These object views can significantly reduce log replay overheads (§ 5.4).

As auxiliary data is designed to be used only as a cache, Boki does not guarantee its durability, but provides best-effort support. Moreover, Boki does not maintain the consistency of auxiliary data, i.e., Boki trusts applications to provide consistent auxiliary data for the same log record. Relaxing durability and consistency allows Boki to have a simple yet efficient backend for storing auxiliary data (§ 4.4).

4 Boki Design

Boki’s design combines a FaaS system with shared log storage. Boki internally stores multiple independent, totally ordered logs. User-facing LogBooks are multiplexed onto internal physical logs for better resource efficiency (§ 2.2). A Boki physical log has an associated metalog, playing the central role in ordering, consistency, and fault tolerance.

4.1 Metalog is “the Answer to Everything” in Boki

Every shared log system must answer three questions because it stores log records across a group of machines. The first is how to determine the global total order of log records. The second is how to ensure read consistency as the data are physically distributed. The third is how to tolerate machine failures. Table 1 shows different mechanisms used by previous shared log systems to address these three issues, whereas in Boki, the metalog provides the single solution to all of them.

Table 1. Comparison between vCorfu [55], Scalog [30], and Boki. Boki’s metalog provides a unified approach for log ordering, read consistency, and fault tolerance (§ 4.1).

            Ordering log records        Read consistency             Failure handling
    vCorfu  A dedicated sequencer       Stream replicas              Hole-filling protocol
    Scalog  Paxos and aggregators       Sharding policy              Paxos
    Boki    Appending metalog entries   Tracking metalog positions   Sealing the metalog

In Boki, every physical log has a single associated metalog, to record its internal state transitions. Boki sequencers append to the metalog, while all other components subscribe to it. In particular, appending, reading, and sealing the metalog provide mechanisms for log ordering, read consistency, and fault tolerance:
• Log ordering. The primary sequencer appends metalog entries to decide the total order for new records, using Scalog’s [30] high-throughput ordering protocol (§ 4.3).
• Read consistency. Different LogBook engines update their log indices independently; however, read consistency is enforced by comparing metalog positions (§ 4.4).
• Fault tolerance. Boki is reconfigured by sealing metalogs, because a sealed metalog pauses state transitions for the associated log. When all current metalogs are sealed, a new configuration can be safely installed (§ 4.5).

The metalog is backed by a primary-driven protocol. Every Boki metalog is stored by $n_{meta}$ sequencers ($n_{meta}$ is 3 in the prototype). One of the $n_{meta}$ sequencers is configured as primary, and only the primary sequencer can append to the metalog. To append a new metalog entry, the primary sequencer sends the entry to all secondary sequencers for replication. Once acknowledged by a quorum, the new metalog entry is successfully appended. The primary sequencer always waits for the previous entry to be acknowledged by a quorum before issuing the next one. Sequencers propagate appended metalog entries to other Boki components that subscribe to the metalog.

4.2 Architecture

Figure 2 depicts Boki’s architecture, which is based on Nightcore [38], a state-of-the-art FaaS system for microservices. In Nightcore’s design, there is a gateway for receiving function requests and multiple function nodes for running serverless functions. On each function node, an engine process communicates with the Nightcore runtime within function containers via low-latency message channels.

Boki extends Nightcore’s architecture by adding components for storing, ordering, and reading logs. Boki also has a control plane for storing configuration metadata and handling component failures.

Storage nodes. Boki stores log records on dedicated storage nodes. Boki’s physical logs are sharded, and each log shard is stored on $n_{data}$ storage nodes ($n_{data}$ equals 3 in the prototype). Individual storage nodes contain different shards from the same log, and/or shards from different logs, depending on how Boki is configured.

Sequencer nodes. Sequencer nodes run Boki sequencers that store and update metalogs using a primary-driven protocol (see § 4.1). Sequencers append new metalog entries to order physical log records as detailed in § 4.3. Similar to storage nodes, individual sequencer nodes can be configured to back different metalogs.

LogBook engines. In Nightcore, the engine processes running on function nodes are responsible for dispatching function requests. Boki extends Nightcore’s engine by adding a new component serving LogBook calls. We refer to the new part as the LogBook engine, to distinguish it from the part serving function requests.

LogBook API requests are forwarded to LogBook engines by Boki’s runtime, which is linked with user supplied function

code. LogBook engines maintain indices for physical logs, in order to efficiently serve LogBook reads (detailed in § 4.4). LogBook engines subscribe to the metalog, and incrementally update their indices in accordance with the metalog. LogBook engines also cache log records for faster reads, using their unique sequence numbers as keys. Co-locating LogBook engines with functions means that, in the best case, LogBook reads can be served without leaving the function node.

Figure 2. Architecture of Boki (§ 4.2), where red arrows show the workflow of log appends (§ 4.3). Components shown: the gateway receiving function requests; function nodes, each with function containers (Fn code and runtime) and an engine (function engine, LogBook engine, log index, record cache); storage nodes, whose record stores hold records of physical logs; sequencer nodes (one primary, the others secondary), which store and update metalogs; and the control plane (ZooKeeper and the controller). Numbered steps of a log append: ① replicate log record, ② report progress, ③ append metalog, ④ propagate metalog.

Control plane. Boki’s control plane uses ZooKeeper [37] for storing its configuration. Boki’s configuration includes (1) the set of storage nodes, sequencers, and indices constituting each physical log; (2) addresses of gateway, function, storage, and sequencer nodes; (3) parameters of consistent hashing [40] used for the mapping between LogBooks and physical logs. Every Boki node maintains a ZooKeeper session to keep synchronized with the current configuration. ZooKeeper sessions are also used to detect failures of Boki nodes.

Boki’s controller (see the control plane in Figure 2) is responsible for global reconfiguration. Reconfiguration happens when node failures are detected, or when instructed by the administrator to scale the system, e.g., by changing the number of physical logs (see § 7.1 for reconfiguration latency measurements). We define the duration between consecutive reconfigurations as a term. Terms have a monotonically increasing term_id.

Structure of sequence numbers (seqnum). In Boki, every log record has a unique seqnum. The seqnum, from higher to lower bits, is (term_id, log_id, pos), where log_id identifies the physical log and pos is the record’s position in the physical log. Seqnums in this structure determine a total order within a LogBook, which is in accordance with the chronological order of terms and the total order of the underlying physical log. Note, however, that this structure cannot guarantee that seqnums within a LogBook are consecutive, because its records can be physically interspersed with those of other LogBooks.

4.3 Workflow of Log Appends

When appending to a LogBook (shown by the red arrows in Figure 2), the new record is appended to the associated physical log. For simplicity, in this section, the term log always refers to physical logs.

Records in a Boki log are sharded, and each shard is replicated on $n_{data}$ storage nodes. Within a Boki log, each function node controls a shard. For a function node, its LogBook engine maintains a counter for numbering records from its own shard. On receiving a logAppend call, the LogBook engine assigns the counter’s current value as the local_id of the new record.

The LogBook engine replicates a new record to all storage nodes backing its shard (① in Figure 2). Storage nodes then need to inform the sequencers which records they have stored. The monotonic nature of local_id enables a compact progress vector, $v$. Suppose the log has $M$ shards. We use a vector $v$ of length $M$ to represent a set of log records. The set consists of, for all shards $j$, records with local_id $< v_j$. If shard $j$ is not assigned to this node, we set the $j$-th element of its progress vector to $\infty$. Every storage node maintains its progress vector, and periodically communicates it to the primary sequencer (② in Figure 2).

By taking the element-wise minimum of progress vectors from all storage nodes, the primary sequencer computes the global progress vector. Based on the definition of progress vectors, we can see the global progress vector represents the set of log records that are fully replicated. Finally, the primary sequencer periodically appends the latest global progress vector to the metalog (③ in Figure 2), which effectively orders log records across shards.

Figure 3. An example showing how the metalog determines the total order of records across shards. Each metalog entry is a vector, whose elements correspond to shards. The example uses shards a, b, and c and the metalog entries (2, 1, 1), (3, 1, 3), (5, 3, 4), and (5, 4, 6). In the figure, log records between two red lines form a delta set, which is defined by two consecutive vectors in the metalog (§ 4.3).
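The aggregation of progress vectors into a global progress vector can be sketched as follows. This is an illustrative sketch rather than Boki's code: `ProgressVector`, `kNotStored`, `globalProgress`, and `covers` are hypothetical names, and the largest 64-bit value stands in for the infinite entry that a storage node reports for shards it does not store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using ProgressVector = std::vector<std::uint64_t>;

// Stand-in for the "infinity" entry of a shard that a node does not store.
constexpr std::uint64_t kNotStored = std::numeric_limits<std::uint64_t>::max();

// Element-wise minimum over the progress vectors reported by storage nodes.
// Entry j of the result bounds the records of shard j held by every replica.
ProgressVector globalProgress(const std::vector<ProgressVector>& reports,
                              std::size_t num_shards) {
    ProgressVector global(num_shards, kNotStored);
    for (const ProgressVector& v : reports) {
        assert(v.size() == num_shards);
        for (std::size_t j = 0; j < num_shards; ++j) {
            global[j] = std::min(global[j], v[j]);
        }
    }
    return global;
}

// A record (shard, local_id) is in the set represented by v
// exactly when local_id < v[shard].
bool covers(const ProgressVector& v, std::size_t shard, std::uint64_t local_id) {
    return local_id < v[shard];
}
```

As a small usage example with two shards and three storage nodes reporting (3, 2), (2, ∞), and (∞, 4), the element-wise minimum is (2, 2): records 0 and 1 of each shard are stored on every node responsible for that shard. Using the maximum integer for unassigned shards makes those entries neutral under `std::min`, which is why a node's missing shards never hold back the global vector. A shard that no node reports would remain at `kNotStored` in this sketch, a case a real sequencer would need to treat separately.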

Figure 4. Workflow of LogBook reads (§ 4.4), illustrated with logReadNext(book_id = 3, min_seqnum = 8, tag = 2): ① locate a LogBook engine that stores the index for the physical log backing book_id = 3; ② query the index row (book_id, tag) = (3, 2), here [3, 6, 7, 9, 10, ...], to find the metadata of the result record (seqnum = 9 in this case); ③ check if the record is cached; ④ if not cached, read it from storage nodes.

Figure 5. Consistency checks by comparing metalog positions (§ 4.4). Log indices make progress independently. If a function reads from a log index whose progress is behind the function’s metalog position, it could see stale state. For example, function h has already seen record X, so it cannot perform future log reads through index A.

We now explain how the total order is determined by the metalog. Consider a newly appended global progress vector, denoted by $v_i$. By comparing it with the previous vector in the metalog (denoted by $v_{i-1}$), we can define the delta set of log records between these two vectors: for all shards $j$, records satisfying $v_{i-1}^{j} \le \text{local\_id} < v_{i}^{j}$, where $v_{i}^{j}$ is the element of $v_i$ for shard $j$. This delta set exactly covers log records that are added to the total order by the new metalog entry $v_i$. Records within a delta set are ordered by (shard, local_id). Figure 3 shows an example of a metalog and its corresponding total order. In this figure, the records between two consecutive red lines form a delta set.

The LogBook engine initiating the append operation learns about its completion by its subscription to the metalog (④ in Figure 2).
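The delta-set rule can be checked against the metalog entries of Figure 3. The sketch below expands a sequence of metalog vectors into the total order they define. The names `RecordId`, `totalOrder`, and `describe` are hypothetical, and the sketch assumes that the vector preceding the first metalog entry is all zeros, which the text does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using ProgressVector = std::vector<std::uint64_t>;

struct RecordId {
    std::size_t shard;       // 0 = shard a, 1 = shard b, ...
    std::uint64_t local_id;  // per-shard counter value
};

// Expands consecutive metalog entries into the total order of records.
// Each entry contributes its delta set, ordered by (shard, local_id).
std::vector<RecordId> totalOrder(const std::vector<ProgressVector>& metalog) {
    std::vector<RecordId> order;
    if (metalog.empty()) return order;
    ProgressVector prev(metalog.front().size(), 0);  // assumed initial vector
    for (const ProgressVector& cur : metalog) {
        assert(cur.size() == prev.size());
        for (std::size_t j = 0; j < cur.size(); ++j) {
            assert(cur[j] >= prev[j]);  // progress never moves backwards
            for (std::uint64_t id = prev[j]; id < cur[j]; ++id) {
                order.push_back(RecordId{j, id});
            }
        }
        prev = cur;
    }
    return order;
}

// Renders records in the figure's notation, e.g. "0a 1a 0b".
std::string describe(const std::vector<RecordId>& order) {
    std::string out;
    for (const RecordId& r : order) {
        if (!out.empty()) out += ' ';
        out += std::to_string(r.local_id);
        out += static_cast<char>('a' + r.shard);
    }
    return out;
}
```

For the entries (2, 1, 1), (3, 1, 3), (5, 3, 4), and (5, 4, 6), the four delta sets are {0a, 1a, 0b, 0c}, {2a, 1c, 2c}, {3a, 4a, 1b, 2b, 3c}, and {3b, 4c, 5c}. A record's position in the concatenated order is the kind of final log position from which a sequence number could then be constructed.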
The metalog allows the LogBook engine to compute the final position of the new record in the log, used to construct the sequence number returned by logAppend.

4.4

[...] Figure 1) allow selective reads by log tags (tags are specified by users in logAppend). Both APIs seek records sequentially by providing bounds for seqnums, e.g., logReadNext finds the first record whose seqnum ≥ min_seqnum. Putting them together, Boki’s log index groups records by (book_id, tag). For each (book_id, tag), it builds an index row as an array of records, sorted by their seqnums. Figure 4 depicts the workflow of LogBook reads using the index.

Read consistency. The consistency of Boki’s log reads is determined by the log index. The log index is used to find the seqnum of the result record. The seqnum uniquely identifies a log record, while both data and metadata (i.e., tags) of a log record are immutable after they are appended.

The challenge of enforcing read consistency comes from multiple copies of the log index, which are maintained by different LogBook engines. Keeping these copies consistent makes the system vulnerable to “slowdown cascades” [19, 44], i.e., the slowdown of a single node can prevent the whole system from making progr
