Received February 4, 2020, accepted March 11, 2020, date of publication March 23, 2020, date of current version April 6, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2982619

Crowdsourcing in Software Development: Empirical Support for Configuring Contests

STAMATIA BIBI 1, IOANNIS ZOZAS 1, APOSTOLOS AMPATZOGLOU 2, PANAGIOTIS G. SARIGIANNIDIS 1, GEORGE KALAMPOKIS 1, AND IOANNIS STAMELOS 3

1 Department of Electrical and Computer Engineering, University of Western Macedonia, 50100 Kozani, Greece
2 Department of Applied Informatics, University of Macedonia, 54636 Thessaloniki, Greece
3 Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

Corresponding author: Stamatia Bibi ([email protected])

This work was supported by the European Union under the Erasmus KA2 grants and the project ARRANGE-ICT, Grant 2018-1-BG-01-KA203-048023.

ABSTRACT Despite the extensive adoption of crowdsourcing for the timely, cost-effective, and high-quality completion of software development tasks, a large number of crowdsourced challenges are not able to acquire a winning solution on time and within the desired cost and quality thresholds. A possible reason for this is that we currently lack a systematic approach that would aid software managers during the process of designing software development tasks that will be crowdsourced. This paper attempts to extend the current knowledge on designing crowdsourced software development tasks by empirically answering the following management questions: (a) what type of projects should be crowdsourced; (b) why should one crowdsource—in terms of acquired benefits; (c) where should one crowdsource—in terms of application domain; (d) when to crowdsource—referring to the time period of the year; (e) who will win or participate in the contest; and (f) how to crowdsource (define contest duration, prize, type of contest, etc.) to acquire the maximum benefits—depending on the goal of crowdsourcing.
To answer the aforementioned questions, we have performed a case study on 2,209 software development tasks crowdsourced through the TopCoder platform. The results suggest that there are significant differences in the level to which crowdsourcing goals are reached across different software development activities. Based on this observation, we suggest that software managers should prioritize the goals of crowdsourcing, decide carefully upon the activity to be crowdsourced, and then define the settings of the task.

INDEX TERMS Crowdsourcing, software development, success factors, crowd factors, cost, duration.

I. INTRODUCTION
The term crowdsourcing combines two words: crowd and outsourcing. Formally, Howe [12]—who first coined the term in 2006—defined crowdsourcing as ''the act of taking a job traditionally performed by a designated agent and outsourcing it to an undefined, generally large group of people in the form of an open call''. In this context, crowdsourcing is a new business model that enables co-creation between a ''provider'', who sets the details of the problem, the ''supplier''/''crowd'', who suggests a solution, and the ''host'', who offers the crowdsourcing platform, enabled by Web 2.0 [32], presenting the problem formulated by the provider to the crowd. Crowdsourced Software Engineering (CSE) has gained increasing interest from both industry and academia, appearing to be a promising approach for completing specific software engineering tasks. CSE as a means to leverage the power of the crowd for task completion is not a new research topic, considering that the development paradigm of Open Source Software [17] has been popular since the late 90s; as a formulated contest-based software development process, however, it has been an emerging trend only in recent years [5].

The associate editor coordinating the review of this manuscript and approving it for publication was Chintan Amrit.
According to LaToza and Van Der Hoek [17], ''contests'' as a crowdsourcing model are similar to the outsourcing model, in the sense that a client requests work and pays for its completion, but differ in that they treat the crowd as contestants rather than collaborators.

This work is licensed under a Creative Commons Attribution 4.0 License.

When formulating a contest, the provider suggests the task to

be crowdsourced, which may cover requirements, architecture, user interface design, implementation, and testing, and which can be completed in a number of days. Each contestant (the crowd) provides a competing solution, from which a winning one is selected and awarded a prize. Current evidence suggests that crowdsourcing contests are more successful when associated with the completion of micro-tasks [37], [7], yet there are some examples where large, innovative projects are successfully crowdsourced [12].

Despite the dominant feeling that crowdsourcing is a low-risk alternative to outsourcing tasks, the possibility of failure is non-negligible [20]. According to Dahlander and Piezunka [5], only 10% of crowdsourced contests are able to attract the desired amount of contributions, while in 50% of the contests there is no contribution at all. The success of crowdsourced software development is closely related to the reliability of the participating crowd-workers [1] and their experience [37], and therefore it is important for contest providers to attract suitable contributors [7]. According to Weidema et al. [37], many participants find the crowdsourced tasks to be difficult despite the fact that they are of limited scope. A possible reason for this perception might be that the tasks crowdsourced are poorly oriented [10], or even the fact that the tasks are not suitable for being crowdsourced [5]. Another problem that crowdsourcing contests deal with is the quality of the submitted solutions [37] and the fact that the overall costs of contests are underestimated [10]. Summarizing, crowdsourcing contests, apart from failing directly (i.e., not being able to acquire a winning solution), very frequently fail indirectly.
In particular, if a solution comes late, it might not be relevant or useful for the contest provider; if it is of low quality, it might not be ready to be brought to market; and if it is overpriced, the decision not to build the product in-house might no longer be beneficial.

Therefore, it is of paramount importance to carefully make decisions during the planning phase of a crowdsourcing project [14]. Focusing on the decision phase, and inspired by the 5W 1H model [11] for improving project management efficiency, in this paper we aim to assist a contest provider in clearly answering the following questions:

What to crowd-source? To answer this question, we will carefully examine the different tasks that can be crowdsourced and their performance with respect to solution acquisition, the required time, the cost, and the quality of the solution acquired. Traditional software engineering activities that can be considered for crowdsourcing are design, development, and testing tasks. More specialized activities, such as cognitive tasks (i.e., Artificial Intelligence solutions), can also be considered. Such an answer will help the provider decide upon the specific task that can be assigned for crowdsourcing.

Where to crowd-source? This question extends the previous one, emphasizing the application domain of the software engineering challenges that can be crowdsourced. Our interest here is to find the types of applications that are more likely to be successful when assigned to an online community. Such application domains can be scientific applications, media applications, civil engineering applications, or even business applications, among others. Similarly, this question will help the provider realize whether the application domain of the task planned to be crowd-sourced has the potential to be successful.

Why to crowd-source?
The answer to this question is related to the goals that a provider expects to achieve when adopting crowd-sourcing. Apart from the delivery of the end product/artifact, crowdsourcing goals may include cost savings, quick solution acquisition (time reduction), solution diversity (more than one possible solution), and increased quality. A provider needs to clarify which of these goals are the most important in each contest, and focus accordingly, so as to carefully set the configuration parameters of the contest.

When to crowd-source? For this question we will examine the distributions of challenges over the calendar year and draw conclusions on the activity of each month, the completion of successful contests, and the crowd participation. Our target is to help the contest provider define the optimal time period to launch a new challenge, so as to maximize the likelihood of achieving the crowdsourcing goals.

Who is the crowd? The answer to this question is related to the profiles of the winners and the participants of the contest, their competencies, and their relative performance. The profiles of participants and the ''quitters'' of the contests (low reliability) will also be analyzed. Such information will be useful for contest providers to better orient the contests based on the experience and expertise of the community of ''crowd workers'' they are expecting.

How to crowd-source? For this question we examine the relevant factors that orient a crowdsourcing contest, including the configuration parameters in the problem statement of the challenge. Such parameters can be the type of the contest, the prize, the duration, and the number of winners. Obviously, all the 5W questions are related to the current question, and thus a contest provider needs to carefully answer all of them to be able to set up a new contest that has the potential to succeed.
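The configuration parameters listed under the ''How'' question can be pictured as a small record that a provider fills in before launching a contest. The following sketch is purely illustrative; the field names, example values, and validation rules are our own assumptions and do not correspond to actual TopCoder API fields.

```python
from dataclasses import dataclass

@dataclass
class ContestConfig:
    """Illustrative contest settings a provider fixes up front (hypothetical fields)."""
    task_type: str      # e.g. design, development, testing, cognitive ("What")
    domain: str         # application domain ("Where")
    launch_month: int   # 1-12 ("When")
    duration_days: int  # contest duration ("How")
    prize_usd: float    # total prize ("How")
    num_winners: int    # number of awarded submissions ("How")

def basic_checks(cfg: ContestConfig) -> list:
    """Flag obviously inconsistent settings before launching."""
    issues = []
    if cfg.duration_days <= 0:
        issues.append("duration must be positive")
    if cfg.prize_usd <= 0:
        issues.append("prize must be positive")
    if cfg.num_winners < 1:
        issues.append("at least one winner is required")
    if not 1 <= cfg.launch_month <= 12:
        issues.append("launch month must be 1-12")
    return issues

cfg = ContestConfig("development", "business", 3, 7, 1500.0, 2)
print(basic_checks(cfg))  # → []
```

A real contest definition would of course carry many more fields (specification text, required technologies, review process), but even this minimal record makes explicit that every 5W question feeds into the ''How'' configuration.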
The decision-making phase is further exemplified in Fig. 1. This study explores the ''How'' question by taking into account the ''Why'' question, in the sense that a manager should consider the level to which crowdsourcing goals are achieved across varying software development tasks (''What''), different application domains (''Where''), time periods during which the contest will be announced (''When''), and the profiles of ''crowd'' workers (''Who'').

For the purpose of this study we have performed an exploratory case study on 2,209 projects crowdsourced through the popular TopCoder platform during 2018. The data involve a variety of different software development challenges that

were crowdsourced through different types of contests and awarded specific monetary rewards.

FIGURE 1. Decision making phase for crowdsourcing software.

The challenges were further classified into different types of tasks and application domains. We investigated whether there are statistically significant differences among the success indicators of different types of challenges, and examined the distribution of contest types and the level to which each type is preferred when different challenges are crowdsourced. All the results were summarized in evidence-based models that provide assistance to contest providers when deciding upon the activity to be crowdsourced and the settings of the challenge, with respect to the crowdsourcing goals.

The rest of the paper is organized as follows: In Section II we present an overview of related work; in Section III we present the case study design; in Section IV we provide the results, organized by research question, and discuss them in Section V. In Section VI we present the threats to validity of our study, and in Section VII we conclude the paper.

II. RELATED WORK
Crowdsourced Software Engineering (CSE), as a development model, has been an emerging trend in the last decade [2], facing new types of management challenges that determine its operational success. In this section we refer to the different CSE management challenges that have been tackled by the related work, taking into consideration the perspective of the 5W 1H model. In particular, we present works related to the questions:

Why to crowdsource?
What to crowdsource?
Where to crowdsource?
Who will be the crowd?
When to crowdsource?
How to crowdsource?

A. WHY TO CROWDSOURCE?
According to Stol et al.
[28], the main benefits acquired from crowdsourcing software are:

Cost reduction, achieved by exploiting lower development costs in certain regions [28] and by offering new types of compensation models, such as experience gain and recognition-based systems [16], instead of typical monetary prizes. Another reason that supports the claims of cost reduction is that the crowdsourcing model is a form of outsourcing and therefore inherits the typical benefits of outsourcing, such as the avoidance of insourcing costs (i.e., hiring overheads, know-how acquisition, application of new processes) [18], [28], [31].

Time reduction, achieved through the parallelization of decomposed tasks [17], [27] and the potential to access a pool of experienced developers [14], [16], [31] that can increase productivity and accelerate development. Additionally, according to Stol et al. [28], motivated geographically distributed workers are willing to work during weekends, exploiting the different time zones so as to achieve a fully productive 24-hour day.

Higher quality is also a potential benefit acquired from crowdsourcing [4]. According to LaToza and Van Der Hoek [17], crowdsourcing, as a process, promotes the generation of alternative solutions to a given problem.

From these solutions the provider may select, or even combine, the most appropriate ones, obtaining in the end higher quality solutions. This argument is further supported by Stol et al. [28], who emphasize that broad participation of the crowd provides access to experienced developers who self-select the task based on their skills and therefore attempt to offer the highest quality solution in order to win.

Crowd innovation is the last but most important motivation for crowdsourcing tasks. The ''wisdom of the crowd'' [12], the democratization of participation [17], and the open creativity that often outmatches the fixed mindset within individual companies [28], [30] are the main drivers for selecting and designing crowdsourced software development contests. Crowd participation appears to be among the most important goals of crowdsourcing, since by engaging the right ''crowd'' the contest provider can obtain all the aforementioned benefits regarding time and cost reduction along with quality assurance. Solution diversity obtained through broad participation is another benefit stemming from crowdsourcing [38].

B. WHAT TO CROWDSOURCE?
Sarı et al. [26] mention that crowdsourcing has been applied to various process areas in software engineering, with coding tasks being the most popular. Task selection among the software development activities to be crowdsourced is a great challenge for stakeholders, as micro-tasks are easier to complete according to LaToza and Van Der Hoek [17], but bigger tasks may, under the right environment, best leverage the power of the crowd. Stol et al. [28] mention that effective task decomposition can help the organization make the most of crowdsourced development.

Thuan et al.
[34] offer a theoretical framework to support decision making regarding the type of task to be crowdsourced, considering four types of tasks based on their properties: internet tasks, interactive tasks, sensitive tasks, and partitioned tasks, and concluding that internet and interactive tasks are easier to crowdsource. As for task management, Dissanayake et al. [6] explored task division practices in team-based competitions by analyzing data from the Kaggle platform, finding that the team leader's social capital and the team experts' intellectual capital affect the performance of a team, an effect that is accelerated in contests that are less competitive. However, in this study the authors do not refer to the specific types of tasks crowdsourced, but rather examine the efficacy of task division practices. Similarly, Yu et al. [40] explored task assignment and division in collaborative crowdsourced development.

The level of success of crowdsourced micro-task completion, and in particular software interface design tasks, was studied in [37]. The authors conducted experiments with Amazon Mechanical Turk workers and noted that it is feasible for the crowd to generate a large number of alternative solutions, though their quality is highly differentiated. Yang et al. [39] proposed a methodology, based on ranking, to recommend tasks to crowd workers taking into consideration, among other factors, the average submission quality on similar tasks and the overall submission rate of each crowd worker.

We can observe that, despite the fact that task selection and decomposition is considered a very important success factor when crowdsourcing software [6], [10], [17], [18], [31], [33], we cannot find any study that compares the efficiency of crowdsourcing different types of software engineering tasks in terms of the success indicators described in Section II-A.

C.
WHERE TO CROWDSOURCE?
The ''Where'' question may refer to two things: (a) the platform on which a software development contest will be hosted, and (b) the application domain where the solution derived from the crowdsourced contest is deployed. Regarding the first interpretation, according to Mao et al. [21] and Wu et al. [38] there are several online platforms that can currently host software development challenges. Several of these platforms support all types of software development tasks¹ (TopCoder, Bountify), while others support specific tasks² such as testing and mobile development (uTest, TestBirds). According to Mao et al. [21], TopCoder is a pioneer in practicing crowdsourced software engineering and the dominant platform when selecting the medium to host contests. Additionally, this platform is used in the majority of research studies performed for determining success factors of crowdsourced software development [1], [2], [19], [36], [38], [39]. However, we were not able to identify any study that compares the efficacy of different platforms. Regarding the second interpretation of the ''Where'' question, we were not able to find empirical evidence on the application domains that are most popular for crowdsourcing contests.

D. WHO WILL BE THE CROWD?
Several studies have explored the profile and the characteristics of the ''crowd'' participating in software development contests. Crowd motivation and incentives are recognized as the most important factors, from the crowd perspective, affecting CSE success [17], [18], [22]. A contributor can be highly motivated to participate in a contest because of the recognition received [19], the monetary prize [21], or the acquisition of experience [29].
The crowd size necessary to tackle a problem is identified by LaToza and Van Der Hoek [17] as an important dimension of CSE. According to Tajedin and Nevo [33], the size of the crowd also has a positive effect on the crowd composition that affects CSE success, considering the fact that when more people are attracted to a project, the chances of receiving innovative, diversified solutions increase. Crowd reliability is also recorded as an important factor by Mehta [22] and Yang et al. [39], referring to the level to which registered contributors submit, in the end, a solution and whether this solution is of the expected quality. Crowd experience is part

¹ www.bountify.co
² www.appstori.com

of CSE success according to Li [19], as it is a factor that can affect the quality of the end-product solution.

E. WHEN TO CROWDSOURCE?
Numerous studies in the literature mention the importance of selecting the right period to crowdsource a project [30]. Stol et al. [30] point out that providers need to schedule a contest during the right period so as to ensure that a sufficient number of workers are available when needed. While there may be extensive expertise within the crowd, it might not be available at the moment when it is needed [14]. Despite the aforementioned, we were not able to find any case studies answering the ''When'' question directly. Li et al. [19] approximated time by introducing a variable representing the contest platform ''traffic'', measured as the number of other projects posted while the present project is being crowdsourced. Li et al. [19] examined the influence of platform traffic on the final quality of the crowdsourced project. They concluded that the quality of a crowdsourced project increases when it is posted in a prosperous period (i.e., when a number of other challenges are communicated through the platform).

F. HOW TO CROWDSOURCE?
In this section, we present an overview of the experimental research performed on how to crowdsource software development. We mainly refer to the settings and environmental parameters that orient a contest and affect its performance in terms of the quality of the solution acquired and the crowd participation and engagement. We should mention that we were not able to find any studies examining the contest parameters that affect CSE success in terms of cost and time.

Archak [2] was one of the first to explore efficient crowdsourcing mechanisms with respect to quality factors. The findings of his study highlighted the influence of specific factors, such as payment and project requirements, on the final quality of the delivered project.
Li [19] tested 23 software quality factors based on platform and project nature and performed an experimental analysis on crowdsourced reusable projects, concluding that the four aspects that affect CSE quality and should be considered when designing contests are: the time period when a project is posted, the size of the project, the participation of experienced developers, and the level to which the design documents of the project are of high quality. Sohibani et al. [27] recorded quality factors based on the findings from questionnaires answered by participants of five different crowdsourcing platforms (YouTube, Amazon Mechanical Turk, Wikipedia, Rally Fighter, and KickStarter), likewise noting that crowd workers' experience is very important. Wang et al. [36] suggested a new quality metric for assessing crowdsourced software development projects, the effort level. The effort level is calculated from five parameters: duration, payment, specification length and number of links, and technology requirements.

Crowd participation was found to be influenced by the clarity of the description of the associated tasks [35]. Tasks with an unclear objective description, without specifying required technologies or environment setup instructions, discourage developers from selecting them. Crowd participation was also examined by Stol et al. [28], who concluded that the duration and the prize of contests do not significantly affect crowd participation. On the other hand, the number of competitions that run in parallel within a project has a significant negative effect on the crowd's interest in a competition. Alelyani and Yang [1] explored crowd reliability by investigating possible connections between the nature of the crowd and crowdsourced tasks.
Data regarding workforce reliability (in terms of complete task submission and thus participation), task registration speed, task completion duration, skills (based on the programming languages known), and challenge types (based on the nature and rewards of each contest) were analyzed and identified as catalyst factors for success. Similarly, Dwarakanath et al. [8] assessed crowd trustworthiness based on submission quality, timeliness, and ownership, concluding that task requirements, user efficacy, and reputation strongly influence the trustworthiness of the crowd. Karim et al. [13] proposed a recommendation system that can help mainly crowd workers (and, as a side effect, providers) take on the appropriate tasks based on technology requirements and their skills. The aforementioned factors are also explored by Saremi et al. [25], who examined team reliability and velocity.

G. CONTRIBUTIONS OF THE STUDY
A summary of the studies experimenting on the success of crowdsourced development is presented in Table 1. For instance, study [18] investigates whether ''Quality'' is a parameter that affects ''why crowdsourcing is performed'' and ''how the contest shall be set up''. The sign ''X'' corresponds to an investigation performed by the current study. Based on Table 1, and the aforementioned discussion, we can observe that the majority of these studies focus on contestants' behavior and quality assessment. In this study we go beyond the current literature, since this study:

Focuses on management issues that can be controlled and monitored early. In particular, we focus on aspects that are formulated early, while designing and setting up the crowdsourced activity. Special emphasis is placed on factors that can affect the success of the CSE operation.

Provides a model that can aid contest providers in deciding upon important aspects of challenges' settings, based on the specific goal of crowdsourcing.
Investigates the duration and cost of crowdsourcing challenges as factors for answering all 5W 1H management questions.

Examines the ''where to crowdsource'' question from the perspective of the application domain.

Examines the relation between the time period when a contest is announced and the potential success in terms of cost, duration, solution acquisition, and quality.
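The kind of comparison the last contribution implies — checking whether a success indicator such as solution acquisition differs significantly across task types — can be illustrated with a Pearson chi-squared test over a contingency table of solved versus unsolved contests. The counts below are invented for illustration only; they are not the study's data.

```python
# Hypothetical counts of contests that did / did not acquire a winning
# solution, broken down by task type (illustrative numbers, not study data).
counts = {                     # task type: (solved, unsolved)
    "design":      (120, 30),
    "development": (200, 100),
    "testing":     (90, 60),
}

def chi_squared(table):
    """Pearson chi-squared statistic for an r x 2 contingency table."""
    rows = list(table.values())
    col = [sum(r[j] for r in rows) for j in (0, 1)]  # column totals
    total = sum(col)
    stat = 0.0
    for r in rows:
        n = sum(r)  # row total
        for j in (0, 1):
            expected = n * col[j] / total
            stat += (r[j] - expected) ** 2 / expected
    return stat

# With 3 task types and 2 outcomes there are (3-1)*(2-1) = 2 degrees of
# freedom; the 0.05 critical value is 5.99, so this statistic would indicate
# a significant difference in solution-acquisition rates.
print(round(chi_squared(counts), 2))  # → 14.63
```

In practice one would use a library routine (e.g., a chi-squared contingency test from a statistics package) rather than hand-rolling the statistic, but the computation above makes the expected-versus-observed logic explicit.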

TABLE 1. Case studies on CSE challenges.

III. STUDY DESIGN
In this section we present the protocol of our case study, which has been designed according to the guidelines of Runeson et al. [23]. The main goal of the case study is to provide empirical evidence on the success potential of crowdsourcing software engineering tasks. The factors of interest include the parameters that the contest provider should configure while setting up a new contest, whereas a CSE project is considered successful if it is able: (a) to acquire a winning solution, (b) to retain the cost under acceptable thresholds, (c) to maximize quality, and (d) to minimize the duration of the contest. To achieve this goal, we conducted a case study on the 2,209 contests performed in the TopCoder platform during 2018. The reason for conducting a case study is that our goal is to investigate the phenomenon of crowdsourcing in its real context. The cases and units of analysis in this study are presented in Section III-B.

A. OBJECTIVES AND RESEARCH QUESTIONS
The overall goal of this case study, formulated according to the Goal-Question-Metric approach [3], is to analyze data from crowdsourced software development; for the purpose of overcoming software management challenges; with respect to achieving contest success in terms of solution acquisition, cost, quality, and duration; from the viewpoint of the contest provider; in the context of applying the crowdsourcing software development approach to design, development, quality assurance, and cognitive tasks. In order to achieve the aforementioned goal, we set six research questions driven by the 5W 1H model defined in Section I:

[RQ1] Why should a contest provider set up a competition?
RQ1 focuses on the goals of crowdsourcing.
Based on the literature, software development crowdsourcing, apart from the delivery of the product per se, provides additional benefits, e.g., decreased development cost, decreased development time, and increased quality. Therefore, in this research question we investigate why contest providers crowdsource, with respect to the success indicators: delivery of a solution, reduced costs, time efficiency, and quality. We answer this RQ by investigating the values of the aforementioned success indicators per type of software engineering challenge.

[RQ2] What types of tasks should be crowdsourced?
RQ2 aims to investigate the types of crowdsourced software engineering tasks that are the most common in the TopCoder platform and the level to which these tasks achieve the crowdsourcing goals regarding solution delivery, reduced costs, time efficiency, quality, and overall success. In this RQ we differentiate from RQ1 by examining whether the performance of each contest, with respect to the four success indicators, is within the success thresholds defined in Section III-D. Being aware of this information, the contest provider can have access to the accumulated experience derived from the TopCoder community regarding the specific types of tasks.

[RQ3] Where can the crowdsourced projects be exploited?
RQ3 digs further into the findings of RQ1 by placing emphasis on the application domain (e.g., business applications, scientific applications, etc.). The benefit obtained from answering this research question is the same as in RQ1; we note that the application domain cannot be obtained automatically from the TopCoder API, but was extracted manually.

[RQ4] Who is going to participate in and win the competition?
RQ4 focuses on investigating the profile of the competition participants along with the winner(s) profile. In particular, we obtain information on the experience of the participants/winners and their reliability.
This information can be of paramount importance for contest providers: e.g., if the nature of the contest is relevant only to highly experienced developers who demand high prize rates, then the contest should not be configured with a low prize.

[RQ5] When is the right time to crowdsource a project?
RQ5 attempts to investigate trends on which periods of the year are the most active for competitions in terms of opening new contests, submitti
