
Transcription
API Protocol Compliance in Object-Oriented SoftwareKevin Bierhoff23 April 2009CMU-ISR-09-108Institute for Software ResearchSchool of Computer ScienceCarnegie Mellon UniversityPittsburgh, PA 15213Thesis Committee:Jonathan Aldrich, chairDavid GarlanPeter LeeAlex Aiken (Stanford University)Submitted in partial fulfillment of the requirementsfor the degree of Doctor of PhilosophyCopyright c 2008-2009 Kevin BierhoffThis work was supported in part by DARPA grant #HR00110710019, NSF grants CCF-0811592 and CCF-0546550, ArmyResearch Office grant #DAAD19-02-1-0389 entitled “Perpetually Available and Secure Information Systems”, the Departmentof Defense, NASA cooperative agreements NCC-2-1298 and NNA05CS30A, and the Software Industry Center at CMU and itssponsors, especially the Alfred P. Sloan Foundation. The views and conclusions of this dissertation are those of the author and donot necessarily reflect the views of any funding agencies or the U.S. government.
Keywords: Typestate, access permissions, state refinement, Plural.
AbstractModern software development is highly reliant on reusable APIs. APIs often define usage protocols thatAPI clients must follow in order for code implementing the API to work correctly. Loosely speaking, APIprotocols define legal sequences of method calls on objects. In this work, protocols are defined based ontypestates (Strom and Yemini, 1986; DeLine and Fähndrich, 2004b). Typestates leverage the familiar intuition of abstract state machines to define usage protocols.The goal of this work is to give developers comprehensive help in defining and following API protocols in object-oriented software. Two key technical contributions enable the proposed approach: (1) Objectstate spaces are defined with hierarchical state refinements. Hierarchical state spaces make specificationsmore succinct, elegantly deal with subtyping, express uncertainty, and enable more precise reasoning aboutaliasing. (2) A novel abstraction, called access permissions, combines typestate and aliasing information.Access permissions capture developers’ design intent regarding API protocols and enable sound modularverification of API protocol compliance while allowing a great deal of flexibility in aliasing objects.This dissertation demonstrates that typestate-based protocols with state refinement and access permissions can be used for automated, static, modular enforcement of API protocols in practical object-orientedsoftware. Formal and empirical results show that the presented approach captures common API protocolssuccinctly, allows sound modular checking of protocol compliance in object-oriented code, can be automated in tools for mainstream programming languages that impose low annotation burden on developers,and can check API protocols in off-the-shelf software with higher precision than previous approaches.This work puts automatic API protocol compliance checking within reach of being used in practice. Itwill enable rapid and correct use of APIs during initial construction and ensure that API clients and implementations remain consistent with the specified protocol during maintenance tasks.
For my parents,who have given me ability and encouragement to pursue an academic career,who have always supported me in my quest,and who have been there for me whenever I needed their help.Ich liebe Euch sehr.
AcknowledgmentsThis work would not have been possible without the help, encouragement, and support of many people.I thank my advisor, Jonathan Aldrich, for his continuous advice and support all along. I also thankthe other members of my thesis committee, David Garlan, Peter Lee, and Alex Aiken, for their advice andengagement.Furthermore, I am deeply indebted to the many people who over the years have provided feedback, input,and encouragement. Thanks to all of you. I want to thank Nels Beckman in particular for being both a greatcollaborator and friend, and my friends George Fairbanks and Ciera Jaspan for all their help and support.In addition to my committee members, I was fortunate to get advice from many members of the CarnegieMellon faculty, including Bob Harper, Jim Herbsleb, Mary Shaw, and Frank Pfenning, as well as several“outsiders” including John Boyland, Manuel Fähndrich, Viktor Kuncak, and Rustan Leino. Many others atCarnegie Mellon have challenged, helped, and encouraged me over the years, including Sebastian Boßung,Kevin Bowers, Shang-Wen Cheng, Neel Krishnaswami, Donna Malayeri, Sven Stork, Joshua Sunshine,William Lovas, Tom Murphy, Michael Richter, Rob Simmons, the members of my research group and theSoftware Systems Study Group, and too many more to mention here. You all have made this work into whatit is.Last but not least I want to thank my family, friends, and fellow students for their support and for makingmy years at Carnegie Mellon a fun and enriching experience. I thank Maja for her patience with me and herinvaluable support. Brianne, Ciera, Dominique, Dorothee, Edith, Elke, George, Ivonne, Kai, Lucia, Hans,Maja, Marc, Michael, Nels, Owen, Robert, Saul, Sebastian, Tian, Tom, Verena, William, Wolfgang, and allthe others that I am forgetting: thank you for a wonderful time. I hope that you all continue to be part of mylife.vii
viiiACKNOWLEDGMENTS
Contents123Introduction1.1 APIs and Object Protocols . . . .1.1.1 Helping Developers . . . .1.1.2 Challenges . . . . . . . .1.2 This Dissertation . . . . . . . . .1.2.1 Expressiveness . . . . . .1.2.2 Subtyping and Inheritance1.2.3 Aliasing Flexibility . . . .1.2.4 Contributions . . . . . . .1.3 Potential Impact . . . . . . . . . .1.4 Thesis Outline . . . . . . . . . . .1. 1. 2. 4. 7. 7. 8. 8. 10. 12. 12Thesis and Hypotheses2.1 Thesis . . . . . . . . . . . . . . . . . . . . . .2.2 Hypotheses . . . . . . . . . . . . . . . . . . .2.2.1 Capture Common Protocols Succinctly2.2.2 Sound Modular Checking . . . . . . .2.2.3 Automation . . . . . . . . . . . . . . .2.2.4 Practical Checking . . . . . . . . . . .15151516161717Approach: Access Permissions3.1 Java Iterators . . . . . . . . . . . . . .3.1.1 Specification Goals . . . . . . .3.1.2 State Machine Protocol . . . . .3.1.3 Iterator Interface Specification .3.1.4 Creating and Disposing Iterators3.1.5 Client Verification . . . . . . .3.1.6 Modifying Iterators . . . . . . .3.2 Java Stream Implementations . . . . . .3.2.1 Stream Pipes . . . . . . . . . .3.2.2 Buffered Input Streams . . . . .3.3 Summary . . . . . . . . . . . . . . . .191919202022252526273236.ix.
x456CONTENTSType System4.1 Formal Language . . . . . . . . . . . . . . . . . . . . .4.1.1 Syntax . . . . . . . . . . . . . . . . . . . . . .4.1.2 State Spaces . . . . . . . . . . . . . . . . . . .4.1.3 Access Permissions . . . . . . . . . . . . . . . .4.1.4 Permission-Based Specifications . . . . . . . . .4.1.5 Handling Inheritance . . . . . . . . . . . . . . .4.1.6 Behavioral Subtyping . . . . . . . . . . . . . . .4.2 Modular Typestate Verification . . . . . . . . . . . . . .4.2.1 Permission Tracking . . . . . . . . . . . . . . .4.2.2 Packing and Unpacking . . . . . . . . . . . . .4.2.3 Calling Methods . . . . . . . . . . . . . . . . .4.2.4 Field Assignments . . . . . . . . . . . . . . . .4.2.5 Permission Reasoning with Splitting and Joining4.2.6 Soundness . . . . . . . . . . . . . . . . . . . .4.2.7 Example . . . . . . . . . . . . . . . . . . . . .Polymorphic Permission Inference5.1 Inference System . . . . . . . . . . .5.1.1 Syntax . . . . . . . . . . . .5.1.2 Typechecking Rules . . . . .5.1.3 Proof Rules . . . . . . . . . .5.1.4 Soundness and Completeness5.2 Solving Constraints . . . . . . . . . .5.2.1 Checking Method Definitions5.2.2 Constraint Satisfiability . . . .Plural: Java Tooling6.1 Developer Annotations . . . . . . . . . . . . . . .6.2 Background . . . . . . . . . . . . . . . . . . . . .6.2.1 Eclipse . . . . . . . . . . . . . . . . . . .6.2.2 Crystal . . . . . . . . . . . . . . . . . . .6.3 Flow Analysis for Local Permission Inference . . .6.3.1 Annotations Make Analysis Modular . . .6.3.2 Tuple Lattice . . . . . . . . . . . . . . . .6.3.3 Comparing and Joining Permission Tuples6.3.4 Composing Tuples . . . . . . . . . . . . .6.3.5 Local Must Alias Analysis . . . . . . . . .6.3.6 Dynamic State Tests . . . . . . . . . . . .6.3.7 API Implementation Checking . . . . . . .6.3.8 Error Reporting . . . . . . . . . . . . . . .6.4 Extensions . . . . . . . . . . . . . . . . . . . . . .6.4.1 Immutable Permissions . . . . . . . . . . .6.4.2 Borrowing . . . . . . . . . . . . . . . . 65.6767696969707071727475757676767677
CONTENTS6.56.678xi6.4.3 Method Cases . . . . . . . . . . . . . . . . . . .6.4.4 Marker States . . . . . . . . . . . . . . . . . . .6.4.5 Dependent Objects . . . . . . . . . . . . . . . .6.4.6 Concrete Predicates . . . . . . . . . . . . . . . .6.4.7 Permissions Allowing Dispatch And Field AccessDealing with Java . . . . . . . . . . . . . . . . . . . . .6.5.1 Constructors . . . . . . . . . . . . . . . . . . .6.5.2 Private Methods . . . . . . . . . . . . . . . . .6.5.3 Static Fields (Globals) . . . . . . . . . . . . . .6.5.4 Arrays . . . . . . . . . . . . . . . . . . . . . . .6.5.5 Future Work . . . . . . . . . . . . . . . . . . .Use Cases . . . . . . . . . . . . . . . . . . . . . . . . .Evaluation7.1 Specifying APIs . . . . . . . . . . . . . . . . .7.1.1 Java Database Connectivity . . . . . .7.1.2 Java Collections Framework . . . . . .7.1.3 Other Java Standard Libraries . . . . .7.2 Beehive: Verifying an Intermediary Library . .7.2.1 Checked Java Standard Library APIs .7.2.2 Implementing an Iterator . . . . . . . .7.2.3 Formalizing Beehive Client Obligations7.2.4 Overhead: Annotations in Beehive . . .7.2.5 Analysis Precision . . . . . . . . . . .7.3 Iterators in PMD: Scalability and Precision . .7.3.1 Iterator Usage Protocol . . . . . . . . .7.3.2 Concurrent Modifications . . . . . . .7.3.3 Comparison to Tracematches . . . . . .7.4 Discussion: Lessons Learned . . . . . . . . . .7.4.1 APIs . . . . . . . . . . . . . . . . . .7.4.2 API Client Code . . . . . . . . . . . .7.4.3 Plural Tool Usage . . . . . . . . . . . .Related Work8.1 Static Protocol Analyses . . . . . .8.1.1 Modular Analyses . . . . .8.1.2 Whole-Program Analyses .8.2 Protocols at Runtime . . . . . . . .8.3 Verifying Component Compositions8.4 Fractional Permission Inference . .8.5 Comprehensive Program Verification8.6 Protocol Inference . . . . . . . . . .
xii9CONTENTSConclusions9.1 Validation of Hypotheses . . . . . . . . . . . .9.1.1 Capture Common Protocols Succinctly9.1.2 Sound Modular Checking . . . . . . .9.1.3 Automation . . . . . . . . . . . . . . .9.1.4 Practical Checking . . . . . . . . . . .9.1.5 Thesis . . . . . . . . . . . . . . . . . .9.2 Concerns . . . . . . . . . . . . . . . . . . . .9.2.1 Variations in Development Practices . .9.2.2 Applicability to Frameworks . . . . . .9.2.3 Expression Cost . . . . . . . . . . . .9.2.4 Adoptability . . . . . . . . . . . . . .9.2.5 Tool Performance . . . . . . . . . . . .9.2.6 Concurrency . . . . . . . . . . . . . .9.3 Discussion . . . . . . . . . . . . . . . . . . . .9.3.1 Pay-off and Incrementality . . . . . . .9.3.2 Essential Features . . . . . . . . . . . .9.3.3 Advice to API Designers . . . . . . . .9.3.4 Effect on Software Engineering Practice9.4 Contributions . . . . . . . . . . . . . . . . . .9.5 Future Work . . . . . . . . . . . . . . . . . . .9.6 Summary . . . . . . . . . . . . . . . . . . . 28129130131131133
List of Figures1.1Simplified JDBC result set protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.13.23.33.43.53.63.73.83.93.103.113.12Read-only iterator state machine protocol . . . . . . . . . . . . . . . . . . . . . . .Simple Iterator client with concurrent modification error . . . . . . . . . . . . . .Read-only Iterator and partial Collection interface specification . . . . . . . . .Verifying a simple Iterator client . . . . . . . . . . . . . . . . . . . . . . . . . .Modifying iterator state machine protocol . . . . . . . . . . . . . . . . . . . . . . .Java modifying Iterator specification . . . . . . . . . . . . . . . . . . . . . . . .PipedInputStream’s state space (inside open) . . . . . . . . . . . . . . . . . . . .Java PipedOutputStream (simplified) . . . . . . . . . . . . . . . . . . . . . . . . .Java PipedInputStream (simplified) . . . . . . . . . . . . . . . . . . . . . . . . .Java FilterInputStream forwards all calls to underlying InputStream (simplified)Frames of a BufferedInputStream instance in state filled . . . . . . . . . . . . . .BufferedInputStream caches characters from FilterInputStream base class . .94.104.114.124.13Core language syntax . . . . . . . . . . . . . . . . . . . . . . . . . . .State space judgments . . . . . . . . . . . . . . . . . . . . . . . . . . .Permission-based specifications . . . . . . . . . . . . . . . . . . . . .Fraction typing and well–formed permissions . . . . . . . . . . . . . .Term typechecking . . . . . . . . . . . . . . . . . . . . . . . . . . . .Permission checking for expressions (part 1) and declarations . . . . . .Protocol verification helper judgments . . . . . . . . . . . . . . . . . .Invariant construction . . . . . . . . . . . . . . . . . . . . . . . . . . .Permission checking for expressions (part 2) . . . . . . . . . . . . . . .Permission purification . . . . . . . . . . . . . . . . . . . . . . . . . .Affine logic for permission reasoning . . . . . . . . . . . . . . . . . .Splitting and merging of access permissions . . . . . . . . . . . . . . .Fragment of BufferedInputStream from Figure 3.12 in core tax for permission inference system . . . . . . . . . . . . . . .Typechecking rules for permission inference . . . . . . . . . . . . .Helper judgments for permission inference . . . . . . . . . . . . . .Proof rules for deriving constraints for atomic permission predicates.57596061xiii.3
xivLIST OF FIGURES5.55.65.7Proof rules for linear logic formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Context proof rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64Well-defined methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656.16.26.3Simplified ResultSet specification in Plural . . . . . . . . . . . . . . . . . . . . . . . . . 68Simple ResultSet client with error in else branch that is detected by Plural. . . . . . . . . . 71Overriding a method requiring a full “dispatch-and-fields” permission . . . . . . . . . . . . 817.17.27.37.47.57.67.77.8Simplified JDBC Connection interface specification . . . . . . . . . . . . . . .Fragment of JDBC Statement interface specification . . . . . . . . . . . . . .JDBC ResultSet interface specification (fragment). . . . . . . . . . . . . . .Method cases allow iterators to be read-only or modifiable at the client’s choiceList interface that optionally maintains permissions for contained elements . . .Java exceptions have an initialization protocol! . . . . . . . . . . . . . . . . .Beehive’s iterator over the rows of a result set (constructor omitted) . . . . . .Simplified annotations for checking iterator protocol . . . . . . . . . . . . . .86878991929497102
List of Tables1.1Access permission taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107.17.27.37.4Specified JDBC interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Beehive classes checked with Plural . . . . . . . . . . . . . . . . . . . . . . . . . . . . .PMD annotations for preventing concurrent modifications . . . . . . . . . . . . . . . . . .Plural’s performance on false positives in checking concurrent modifications reported byBodden et al. (2008) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Overhead and precision in Beehive and PMD case studies . . . . . . . . . . . . . . . . . .7.5xv. 86. 98. 104. 108. 112
xviLIST OF TABLES
Chapter 1Introduction1.1APIs and Object ProtocolsModern software development is highly reliant on reusable APIs (Application Programming Interfaces).Many programming languages in industrial use such as Java or C# include enormous “standard” librariesfor everyday tasks, and organizations often use additional libraries and frameworks that were developedin-house, in the open-source community, or by third parties. For example, the Java Database ConnectivityAPI (JDBC) defines canonical interfaces for accessing relational databases from Java programs using SQLqueries which developers can use without having to worry about the details of accessing the particulardatabase they employ.Reusable APIs often define usage protocols. Loosely speaking, usage protocols impose constraints onthe order in which events are allowed to occur. For example, a database connection can only be used toexecute SQL queries while it is open. Running an SQL query will return a “result set” object that can beused to iterate over the rows (tuples) of the query result using the next operation (Figure 1.1). This operationwill return true if the result set was advanced to a valid row and false otherwise. Column values can beretrieved from valid rows, but attempts to retrieve column values in the end state violate the protocol forResultSet. Furthermore, result sets can only be used until the connection they were created on is closed.1It has been a long-standing research challenge to ensure statically (before a program ever runs) that APIprotocols are followed in client programs using an API. A long-overlooked but related problem is to makesure that the protocol being checked is consistent with what the API implementation, such as implementations of JDBC interfaces for a particular database, does. Both of these challenges are complicated by objectaliasing (objects being referenced and possibly updated from multiple places in the program)—the hallmarkfeature of imperative languages like C and Java.The goal of this work is to give developers comprehensive help in following API protocols as well as toallow them to correctly and concisely formalize protocols for their own code. Our studies of APIs, primarilyfrom the Java standard library, show that challenges in this area revolve around expressiveness, subtyping,inheritance, and aliasing. These challenges will be described in more detail below.This dissertation proposes a set of techniques that address these challenges. The techniques can formalize and statically enforce API protocols in common object-oriented programs. Case studies show that these1This description is simplified: connections and result sets are related through intermediary “statement” objects (Section 7.1.1).1
2CHAPTER 1. INTRODUCTIONtechniques can express interesting and relevant protocols of APIs defined in the Java standard library, andthat the tools we developed can automatically verify compliance to these protocols in open-source softwarecodebases.1.1.1Helping DevelopersCurrently, it is a painstaking and time-consuming undertaking for a developer to gain familiarity with anAPI. Developers typically create a series of (hopefully small) programs that exercise the API to understandhow it works, for instance by invoking operations whose names promise to be relevant for the task at hand.If the developer is fortunate then the API implementation will throw meaningful runtime exceptions whenprotocols are violated. But at other times, API invocations may simply not produce the expected result,i.e., they silently fail, or protocol violations may corrupt internal data structures of the API implementation, leading to subtle errors later (when these data structures are accessed again), or even security flawsexploitable by an attacker. In these cases, developers may be forced to use the debugger to investigate theAPI internals. If available, the developer can also read documentation, example code, online forums, or theAPI implementation code, which may accelerate or decelerate the process depending on the quality of theseartifacts.2 What we need is a situation where developers can write code against an unfamiliar API and toolswill point out when they start violating protocols, allowing developers to focus on the task they are trying toaccomplish.Even familiar API protocols can be difficult to follow in all cases. Protocols of real APIs are complexand easily violated. In particular, if objects created through the API depend on each other in some way, arestored in fields—or are otherwise shared—then ensuring compliance to API protocols becomes a non-localproblem that is challenging even for expert developers. Moreover, if a program behaves as expected, there isno guarantee that API protocols will be followed on all paths through the program. This reduces developers’confidence in their code, especially during maintenance tasks, which can lead to long testing cycles andhigh costs to fix bugs late in the development process. We would like to quickly check that protocols areconsistently followed when code (or its protocol) is written or changed.Producing reusable APIs is time-consuming as well because protocols have to be correctly documentedand implementations have to comply with their documented protocols. In order to avoid erratic behavior atruntime, API implementations have to protect themselves against protocol violations by clients, requiringconsistent and expensive runtime checks on method arguments. If data structures are shared between APIclients and implementation then API implementation code has to be prepared for client modifications ofthese data structures. Reentrancy and potential overriding further exacerbate the challenges for API developers. Here, we would like to check that “bad states” are avoided and implementations are consistent withdocumented protocols. Moreover, if clients are trustworthy, we could potentially speed up programs byremoving defensive runtime checks.Using this work. For the purpose of this work, we roughly distinguish API designers, API client developers, and API implementers. Reflecting upon the scenarios described above, our goals include aiding APIdesigners in documenting API protocols and client developers in understanding APIs; to assure to client2The author had an experience with the Eclipse IDE’s “builder” infrastructure where (a) names used in the API did not reflecttheir meaning, (b) the documentation did not state important rules, and (c) sample code included with Eclipse was faulty (fixedupon request by the author with bug 263807). Several hours of using the debugger finally shed light on the problem.
1.1. APIS AND OBJECT PROTOCOLS3openvalidunreadwasNull()next() / truegetInt( )close()endreadclosednext() / false(a) Protocol visualized similar to a Statechart using state refinementnext() / return trueRefinedstatewasNull()unreadreadgetInt( )endnext() /return false(b) Flat state machineclosedclose()Atomicstateoperation( ) /return value(c) LegendFigure 1.1: Simplified JDBC result set protocol. Return values for next are included to show the possibleoutcomes of dynamic state tests.developers that API protocols are followed in their code; and to help API implementers create dependableAPI implementations. In order to be applicable to large codebases, we would like to, as much as possible,check protocols automatically (instead of relying on manual proofs) and in a way suitable for interactive use(in order to support developers while they write or maintain code). The work presented in this dissertationaccomplishes these goals as follows: Documentation and understanding. API designers can use hierarchical state machines, such as theone illustrated in Figure 1.1(a), to succinctly describe API protocols. Additionally, they use “accesspermissions” to describe how API methods will access and use method parameters. Hierarchical statemachines and access permissions also address a number of other concerns and are described in moredetail below.Figure 6.1 shows a description of the ResultSet protocol discussed in this chapter using the annotations defined by our tool, Plural. There, Java annotations @Full and @Pure indicate modifying andreading access to the receiver object, respectively, and specify states required or ensured by the annotated methods, directly encoding the protocol shown in Figure 1.1(a). The conventional Java typesignature for ResultSet is unchanged by these annotations.These annotations allow API designers to concisely and formally document their protocols; they canalso help developers using an API understand how the API works, even without tool involvement. Automation, assurance, and interactivity. Our tool, Plural, automatically checks at compile-timewhether a given API client program complies with API protocols described as above. Plural works
4CHAPTER 1. INTRODUCTIONwith conventional Java code, and similar tools could be developed for other programming languages.Plural determines for every variable at every program point what is known about the state of thereferenced object. This information is known as the object’s current “typestate” (Strom and Yemini,1986) and allows assuring that API protocols are followed.Plural may need additional annotations for method parameters and instance fields to check individualmethods in a API client program independently from the rest of the program. Our case studies showthat this approach requires only about one developer-provided annotation per method in an API clientprogram. These additional annotations enable quick checking of individual methods (currently around100ms) and allows making assumptions about unfinished or unavailable parts of the program.API client developers can use Plural interactively while they write code in their development environments. Figure 6.2 shows a small ResultSet client with an annotation for a method parameter.This annotation allows making all calls on the annotated parameter, rs, that are legal according to itsprotocol (Figure 6.1). Plural also detects a protocol violation. When the developer fixes the problemshe can re-run Plural to assure that the modified code respects all API protocols. Dependability. Separately, developers of API implementations can use Plural to make sure that theircode is consistent with the published protocol (see Section 3.2).This work takes the point of view that a protocol is part of an API’s interface that should be capturedas part of that interface an
API clients must follow in order for code implementing the API to work correctly. Loosely speaking, API protocols define legal sequences of method calls on objects. In this work, protocols are defined based on typestates (Strom and Yemini, 1986; DeLine and Fähndrich, 2004b). Typestates leverage the familiar intu-