LNCS 10027

Pekka Abrahamsson · Andreas Jedlitschka
Anh Nguyen Duc · Michael Felderer
Sousuke Amasaki · Tommi Mikkonen (Eds.)

Product-Focused
Software Process
Improvement
17th International Conference, PROFES 2016
Trondheim, Norway, November 22–24, 2016
Proceedings

123

Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany

10027

More information about this series at http://www.springer.com/series/7408

Pekka Abrahamsson Andreas Jedlitschka
Anh Nguyen Duc Michael Felderer
Sousuke Amasaki Tommi Mikkonen (Eds.)
•

•

•

Product-Focused
Software Process
Improvement
17th International Conference, PROFES 2016
Trondheim, Norway, November 22–24, 2016
Proceedings

123

Editors
Pekka Abrahamsson
Norwegian University of Science
and Technology
Trondheim
Norway
Andreas Jedlitschka
Fraunhofer Institute for Experimental
Software Engineering
Kaiserslautern
Germany
Anh Nguyen Duc
Norwegian University of Science
and Technology
Trondheim
Norway

Michael Felderer
University of Innsbruck
Innsbruck
Austria
Sousuke Amasaki
Okayama Prefectural University
Soja
Japan
Tommi Mikkonen
Tampere University of Technology
Tampere
Finland

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-49093-9
ISBN 978-3-319-49094-6 (eBook)
DOI 10.1007/978-3-319-49094-6
Library of Congress Control Number: 2016955998
LNCS Sublibrary: SL2 – Programming and Software Engineering
© Springer International Publishing AG 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The 17th International Conference on Product-Focused Software Process Improvement
(PROFES 2016) brought together software researchers and industrial practitioners to
Trondheim in Norway, during November 22–24, 2016. The hosting organization was
the Department of Computer and Information Science (IDI) from the Norwegian
University of Science and Technology. It is notable that this was the ﬁrst time the
PROFES conference was held in Norway, which is a country known for its advanced
IT services and infrastructure. Norway is quickly becoming the hotspot of ICT
development and innovations in Scandinavia. In the spirit of the PROFES conference
series, PROFES 2016 focused on how the challenges of improving software development within the different practice areas such as requirements, design, construction,
testing, maintenance, process, methods, management, etc. The conference has always
encouraged submissions of research papers based on empirical evidence ranging from
controlled experiments to case studies and from quantitative to qualitative studies.
This year we received 82 submissions of which 24 were selected as full papers and
21 as short papers. The scientiﬁc works were strictly scrutinized by international
Program Committee members. Scientiﬁc papers in the PROFES conference received
three or four reviews each.
As a novelty for 2016, we introduced a new track for the conference called “Radical
Challenge Track.” The intent was to draw ideas from the scientiﬁc and professional
software communities who are searching ways to build the next paradigm for software
development. The contributions in such a track are less scientiﬁc in their nature but are
argued in a compelling way. We wanted to provide an opportunity for the community
to present ideas that generate discussion and have the capacity to push the ﬁeld forward
in an unexpected way. The submissions were still to be peer-reviewed to ensure the
quality and they were to be included in the technical conference proceedings. Both full
papers and short papers on radical challenges were welcomed. However, it appears that
the community is not ready for such track because we did not receive any submissions
under the heading of radical challenges. We were able to attract a few papers dealing
with the future of computing and were thus able to conclude the conference with these
presentations and a panel dealing with issues of tomorrow. We will continue to
motivate the community and especially the more senior scientists to begin undertaking
ambitious endeavors dealing with tomorrow’s challenges. We are well aware of the fact
the digitalization process, Internet of Things, and big data require a lot of software to
run them effectively and such issues should be raised under discussion sooner rather
than later.
Another topic in the academic circles in the past years, which also includes the
PROFES community, is the participation of the industrial community in the discussions, presentations, and experience sharing. Since 10–15 years ago, the development
has actually been heading in the opposite direction. Industrial practitioners appear to
organize themselves organically among certain topics quite effectively and are less

VI

Preface

keen in participating in classic scientiﬁc conferences. PROFES 2016 recognized this
and organized tutorials on topics and themes of industrial interest. There were nine
tutorials held on topics such as regulated software development, DevOps, lean startups, innovation, and software security. Scientiﬁc software engineering research needs
industrial attention to survive and prosper.
The keynote speakers this year were of high quality. Mikko Terho is the CTO for
Mobile Software and the site manager of Huawei’s R&D center in Finland. As one
of the founding board members of Symbian, at the time the leading open mobile
operating system, Mikko Terho has had a signiﬁcant inﬂuence on, and made a substantial contribution to, the development of the mobile industry as a whole. He was one
of the few persons who were appointed as Nokia Fellow. Dag Sjøberg is Professor in
Software Engineering and works at the Department of Informatics in University of
Oslo in Norway. He is one the brightest software researchers around and has had a
signiﬁcant impact in the ﬁeld. Dag Sjøberg sees that software systems form the
foundation of the economic, political, social, cultural, and scientiﬁc spheres of modern
information society. Such systems are, for example, crucial to solving global humanitarian and environmental problems.
We are thankful for having had the opportunity to organize PROFES 2016 in
Trondheim. The Program Committee members and reviewers provided excellent
support in reviewing the papers. We are also grateful to the authors, presenters, and
session chairs for their time and effort that made PROFES 2016 a success. We are
especially thankful to Prof. Letizia Jaccheri, the head of the Department of Computer
and Information Science at NTNU, for providing the conference with the ﬁnancial
backing and helping in the organization. We would like to thank the PROFES steering
group members and organizations (University of Oulu, VTT Technical Research
Centre of Finland and Fraunhofer IESE) for the guidance and support in the organization process. Finally, we would like to thank the NTNU IDI’s student and staff
volunteers for making PROFES 2016 an experience that will live in the memory of the
participants for years to come.
September 2016

Pekka Abrahamsson
Andreas Jedlitschka
Anh Nguyen Duc
Michael Felderer
Sousuke Amasaki
Tommi Mikkonen

Organization

PROFES 2016 was hosted by the Norwegian
University of Science and Technology, Department
of Computer and Information Science.

General Chair
Tommi Mikkonen

Tampere University of Technology, Finland

Program Co-chairs
Pekka Abrahamsson
Andreas Jedlitschka

Norwegian University of Science and Technology, Norway
Fraunhofer IESE, Germany

Organization Chair
Jingyue Li

Norwegian University of Science and Technology, Norway

Proceedings Chair
Anh Nguyen Duc

Norwegian University of Science and Technology, Norway

Short Paper Co-chairs
Michael Felderer
Sousuke Amasaki

Innsbruck University, Austria
Okayama University, Japan

Publicity and Social Media Co-chairs
Daniel Méndez
Daniel Graziotin

Technische Universität München, Germany
University of Stuttgart, Germany

Poster Co-chairs
Tanja Suomalainen
Masud Fazal-Baqaie

VTT Technical Research Center of Finland
S&N CQM, Germany

VIII

Organization

Design Chair
Juhani Risku

Norwegian University of Science and Technology, Norway

PhD Symposium Co-chairs
Xiaofeng Wang
John Noll

University of Bolzano, Italy
The Irish Software Research Center, Ireland

Workshop and Tutorial Co-chairs
Daniela S. Cruzes
Sabrina Marczak

SINTEF, Norway
PUCRS, Porto Alegre, Brazil

Program Committee
Andreas Birk
Anh Nguyen Duc
Andreas Jedlitschka
Barbara Russo
Bruno Rossi
Daniel Rodriguez
Daniel Méndez
Fernández
Daniel Graziotin
Davide Falessi
Dietmar Pfahl
Dietmar Winkler
Frank Houdek
Hironori Washizaki
Jens Heidrich
Jingyue Li
Jonas Eckhardt
Jürgen Münch
Klaus Schmid
Kurt Schneider
Lech Madeyski
Luigi Buglione
Marco Kuhrmann
Marco Torchiano
Marcus Ciolkowski
Maria Teresa
Baldassarre

SWPM, Germany
Norwegian University of Science and Technology, Norway
Fraunhofer Institute for Experimental Software Engineering,
Germany
Free University of Bolzano/Bozen, Italy
Masaryk University, Czech Republic
The University of Alcalá, Spain
Technische Universität München, Germany
University of Stuttgart, Germany
Cal Poly, USA
University of Tartu, Estonia
Vienna University of Technology, Austria
Ulm University, Germany
Waseda University, Japan
Fraunhofer Institute for Experimental Software Engineering,
Germany
Norwegian University of Science and Technology (NTNU),
Norway
Technische Universität München, Germany
University of Helsinki, Finland
University of Hildesheim, Germany
Leibniz Universität Hannover, Germany
Wroclaw University of Technology, Poland
Engineering.IT/ETS, Canada
University of Southern Denmark, Denmark
Politecnico di Torino, Italy
QAware GmbH, Germany
Università degli Studi di Bari A. Moro, Italy

Organization

Maurizio Morisio
Masud Fazal-Baqaie
Maya Daneva
Michael Felderer
Noriko Hanakawa
Oscar Dieste
Paolo Panaroni
Reinhold Plösch
Rini van Solingen
Risto Nevalainen
Silvia Abrahão
Sousuke Amasaki
Tomi Mannisto
Yoshiki Higo

IX

Politecnico di Torino, Italy
S&N CQM, Germany
University of Twente, The Netherlands
University of Innsbruck, Austria
Hannan University, Japan
Universidad Politécnica de Madrid, Spain
INTECS, Rome, Italy
Johannes Kepler Universität Linz, Austria
Delft University of Technology, The Netherlands
FiSMA Association, Finland
Okayama Prefectural University, Japan
University of Helsinki, Finland
Osaka University, Japan

Main Sponsors
Department of Computer and Information
Science, Norwegian University of Science and
Technology

Norwegian Research Council

City of Trondheim

Contents

Keynotes
The Relationship Between Software Process, Context and Outcome . . . . . . .
Dag I.K. Sjøberg

3

Early Phases in Software Engineering
Eight Paths of Innovations in a Lean Startup Manner: A Case Study . . . . . . .
Mikko Raatikainen, Marko Komssi, Harri Kiljander, Laura Hokkanen,
Jukka Märijärvi, and Omar Mohout

15

On the Distinction of Functional and Quality Requirements in Practice . . . . .
Jonas Eckhardt, Andreas Vogelsang, and Daniel Méndez Fernández

31

A Survey on Software Release Planning Models . . . . . . . . . . . . . . . . . . . . .
David Ameller, Carles Farré, Xavier Franch, and Guillem Rufian

48

Organizational Models
A Power Perspective on Software Ecosystem Partnerships . . . . . . . . . . . . . .
George Valença, Carina Alves, and Slinger Jansen
No More Bosses? A Multi-case Study on the Emerging
Use of Non-hierarchical Principles in Large-Scale Software Development . . .
Helena Holmström Olsson and Jan Bosch
Supporting Management of Hybrid OSS Communnities - A Stakeholder
Analysis Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hanna Mäenpää, Tero Kojo, Myriam Munezero, Fabian Fagerholm,
Terhi Kilamo, Mikko Nurminen, and Tomi Männistö

69

86

102

Architecture
A Process Framework for Designing Software Reference Architectures
for Providing Tools as a Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Muhammad Aufeef Chauhan, Muhammad Ali Babar,
and Christian W. Probst
Should We Adopt a New Version of a Standard? – A Method and Its
Evaluation on AUTOSAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Corrado Motta, Darko Durisic, and Miroslaw Staron

111

127

XII

Contents

Choreography Modelling Language for the Embedded Systems Domain:
Empirical Evaluation and Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . .
Nebojša Taušan, Jari Lehto, Jouni Markkula, Pasi Kuvaja,
and Markku Oivo

144

Methods and Tools
An ISO 26262 Compliant Design Flow and Tool for Automotive
Multicore Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Maria Trei, Salome Maro, Jan-Philipp Steghöfer,
and Thomas Peikenkamp
Evaluating a GUI Development Tool for Internet of Things and Android . . . .
Björn A. Johnsson, Martin Höst, and Boris Magnusson
Application of GQM+Strategies in a Multi-industry State-Owned Company:
An Experience Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Gustavo López, Brenda Aymerich, Diana Garbanzo, and Alexia Pacheco

163

181

198

Verification and Validation
Is Mutation Testing Ready to Be Adopted Industry-Wide? . . . . . . . . . . . . . .
Jakub Možucha and Bruno Rossi
An Effective Verification Strategy for Testing Distributed
Automotive Embedded Software Functions: A Case Study . . . . . . . . . . . . . .
Annapurna Chunduri, Robert Feldt, and Mikael Adenmark
Problems and Solutions in Mobile Application Testing . . . . . . . . . . . . . . . .
Triin Samuel and Dietmar Pfahl
Cost-Benefit Analysis of Using Dependency Knowledge
at Integration Testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sahar Tahvili, Markus Bohlin, Mehrdad Saadatmand, Stig Larsson,
Wasif Afzal, and Daniel Sundmark
Using Surveys and Web-Scraping to Select Tools for Software
Testing Consultancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Päivi Raulamo-Jurvanen, Kari Kakkonen, and Mika Mäntylä
On the Need for a New Generation of Code Review Tools . . . . . . . . . . . . .
Tobias Baum and Kurt Schneider

217

233
249

268

285
301

Contents

XIII

Process Improvement
GQM+Strategies and IDEAL: A Combination of Approaches to Achieve
Continuous SPI: An Experience Report in a Large Multi-industry
State-Owned Company . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Gustavo López, Alexia Pacheco, Francisco Cocozza, Diana Garbanzo,
Brenda Aymerich, and Gabriela Marín
On the Role of Software Quality Management in Software Process
Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jan Wiedemann Jacobsen, Marco Kuhrmann, Jürgen Münch,
Philipp Diebold, and Michael Felderer
Transitioning Towards Continuous Experimentation in a Large Software
Product and Service Development Organisation – A Case Study . . . . . . . . . .
Sezin Gizem Yaman, Fabian Fagerholm, Myriam Munezero,
Jürgen Münch, Mika Aaltola, Christina Palmu, and Tomi Männistö
Why Do We Do Software Process Improvement? Study on Commonly
Used Goals in Practice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Anna Schmitt and Philipp Diebold
Developing Processes to Increase Technical Debt Visibility
and Manageability – An Action Research Study in Industry . . . . . . . . . . . . .
Jesse Yli-Huumo, Andrey Maglyas, Kari Smolander, Johan Haller,
and Hannu Törnroos
Applying Social Network Analysis and Centrality Measures to Improve
Information Flow Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Stephan Kiesling, Jil Klünder, Diana Fischer, Kurt Schneider,
and Kai Fischbach
Design of Project Management Capabilities . . . . . . . . . . . . . . . . . . . . . . . .
Solvita Berzisa and Jānis Grabis

311

327

344

360

368

379

387

Speed and Agility in System Engineering
Relationship of DevOps to Agile, Lean and Continuous Deployment:
A Multivocal Literature Review Study . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lucy Ellen Lwakatare, Pasi Kuvaja, and Markku Oivo

399

Agile Practices, Collaboration and Experience: An Empirical Study About
the Effect of Experience in Agile Software Development . . . . . . . . . . . . . . .
Martin Kropp, Andreas Meier, and Robert Biddle

416

A Multiple Case Study on the Architect’s Role in Scrum . . . . . . . . . . . . . . .
Matthias Galster, Samuil Angelov, Marcel Meesters,
and Philipp Diebold

432

XIV

Contents

Continuous Integration Applied to Software-Intensive Embedded
Systems – Problems and Experiences. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Torvald Mårtensson, Daniel Ståhl, and Jan Bosch

448

Exploring Norms in Agile Software Teams . . . . . . . . . . . . . . . . . . . . . . . .
Viktoria Stray, Tor Erlend Fægri, and Nils Brede Moe

458

Forces that Prevent Agile Adoption in the Automotive Domain . . . . . . . . . .
Philipp Hohl, Jürgen Münch, Kurt Schneider, and Michael Stupperich

468

Exploring IoT User Dimensions: A Multi-case Study on User Interactions
in ‘Internet of Things’ Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Helena H. Olsson, Jan Bosch, and Brian Katumba

477

Requirements and Quality
An Industrial Case Study on Measuring the Quality of the Requirements
Scoping Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Krzysztof Wnuk, Markus Borg, and Sardar Muhammad Sulaman
Quality Rule Violations in SharePoint Applications: An Empirical Study
in Industry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Apostolos Ampatzoglou, Paris Avgeriou, Thom Koenders,
Pascal van Alphen, and Ioannis Stamelos
Quality Assurance of Requirements Artifacts in Practice: A Case Study
and a Process Proposal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Henning Femmer, Benedikt Hauptmann, Sebastian Eder,
and Dagmar Moser
Commodity Eats Innovation for Breakfast: A Model for Differentiating
Feature Realization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Aleksander Fabijan, Helena Holmström Olsson, and Jan Bosch

487

495

506

517

Process and Repository Mining
PROMOTE: A Process Mining Tool for Embedded System Development . . .
Arttu Leppäkoski and Timo D. Hämäläinen
Evaluation of Kano-like Models Defined for Using Data Extracted
from Online Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Huishi Yin and Dietmar Pfahl
Log File Analyzing in Intelligent Transportation Systems Development . . . . .
Esa Heikkinen and Timo D. Hämäläinen

529

539
550

Contents

On the Effectiveness of Vector-Based Approach for Supporting
Simultaneous Editing of Software Clones . . . . . . . . . . . . . . . . . . . . . . . . . .
Seiya Numata, Norihiro Yoshida, Eunjong Choi, and Katsuro Inoue

XV

560

Business Value and Benefits
The Developers Dilemma: Perfect Product Development or Fast Business
Validation?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Henri Terho, Sampo Suonsyrjä, and Kari Systä

571

Workshop-Based Corporate Foresight Process: A Case Study . . . . . . . . . . . .
Leila Saari, Tanja Suomalainen, Raija Kuusela,
and Tapio Hämeen-Anttila

580

DevOps Adoption Benefits and Challenges in Practice: A Case Study . . . . . .
Leah Riungu-Kalliosaari, Simo Mäkinen, Lucy Ellen Lwakatare,
Juha Tiihonen, and Tomi Männistö

590

Towards Continuous Customer Satisfaction and Experience Management:
A Measurement Framework Design Case in Wireless B2B Industry . . . . . . .
Petri Kettunen, Mikko Ämmälä, Tanja Sauvola, Susanna Teppola,
Jari Partanen, and Simo Rontti

598

Emerging Research Topics
Gamification of Software Testing - An MLR . . . . . . . . . . . . . . . . . . . . . . .
Mika V. Mäntylä and Kari Smolander

611

Internationally Distributed Software Development: On the Impact
of Distance Based on a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Harri Sten, Hannu Jaakkola, and Kari Systä

615

Using Scrum to Develop a Formal Model – An Experience Report . . . . . . . .
Marta Olszewska, Sergey Ostroumov, and Marina Waldén

621

Towards Better Selection Between Moving Windows and Growing
Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sousuke Amasaki and Chris Lokan

627

Assessing the Behavior of Software Analysis Tools. . . . . . . . . . . . . . . . . . .
Lerina Aversano, Carmine Grasso, Pasquale Grasso,
and Maria Tortorella
Driving Academic Spin-off by Software Development Process:
A Case Study in Federal Institute of Rio Grande do Norte - Brazil . . . . . . . .
Claudia M.F.A. Ribeiro, Fellipe A. Aleixo, and Marília A. Freire

631

636

XVI

Contents

Future of Computing
The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites
for Continuous Software Engineering. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Teemu Karvonen, Tanja Suomalainen, Marko Juntunen, Tanja Sauvola,
Pasi Kuvaja, and Markku Oivo
Software Development in the Post-PC Era: Towards Software Development
as a Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sami Alajrami, Alexander Romanovsky, and Barbara Gallina

643

662

Invited Papers
The Origins of Design Thinking and the Relevance in Software Innovations . . .
Matilde Bisballe Jensen, Federico Lozano, and Martin Steinert

675

Playing Protection Poker for Practical Software Security . . . . . . . . . . . . . . .
Martin Gilje Jaatun and Inger Anne Tøndel

679

Exploring Expectations About Risk-Based Testing: Towards Increasing
Effectiveness and Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Michael Felderer and Rudolf Ramler

683

2nd International Workshop on Human Factors in Software
Development Processes
Human Factors in Software Development Processes: Measuring System
Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Silvia Abrahao, Maria Teresa Baldassarre, Danilo Caivano,
Yvonne Dittrich, Rosa Lanzilotti, and Antonio Piccinno
Gamification and Functional Prototyping to Support Motivation Towards
Software Process Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mercedes Ruiz, Manuel Trinidad, and Alejandro Calderón

691

697

Exploring Mobile User Experience Through Code Quality Metrics . . . . . . . .
Gerardo Canfora, Andrea Di Sorbo, Francesco Mercaldo,
and Corrado Aaron Visaggio

705

Early Usability in Model-Driven Game Development . . . . . . . . . . . . . . . . .
Silvia Abrahão, Emilio Insfran, José Ángel Carsí, and Adrián Fernandez

713

What Aspects of Context Should Be Described in Case Studies About
Software Teams? Preliminary Results from a Mapping Study . . . . . . . . . . . .
Maria Teresa Baldassarre, César França, and Fabio Q.B. da Silva

723

Contents

Miscommunication in Software Projects: Early Recognition
Through Tendency Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fabian Kortum, Jil Klünder, and Kurt Schneider

XVII

731

Doctoral Symposium
A Research Proposal: Tracking Open Source Software Evolution
for the Characterization of Its Evolutionary Behavior. . . . . . . . . . . . . . . . . .
Munish Saini and Kuljit Kaur Chahal
Transition from Plan-Driven to Agile: An Action Research . . . . . . . . . . . . .
Mohammad Abdur Razzak
Software Product Innovation Through Startup Experimentation
in Large Companies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Henry Edison

741
746

751

Tutorials
Tutorials at PROFES 2016 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Daniela S. Cruzes and Sabrina Markzac

759

Continuous Experimentation: Accelerating Innovation Through Highly
Effective Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jürgen Münch

761

Integrating Agile Development with Process Standards Like ASPICE
and ISO 26262 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Even-André Karlsson

763

Architecture Evaluation - Threat or Opportunity? . . . . . . . . . . . . . . . . . . . .
Even-André Karlsson

765

SafeScrum Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Geir Kjetil Hanssen, Thor Myklebust, Tor Stålhane, and Børge Haugset

767

Creating Champions and Battling Dragons – How to Create a DevOps
Culture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Pål Thomassen and Ingrid Sorgendal

770

Lean Startups in Established Companies: How to Make it Really Happen
and How to Avoid Common Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Nils Brede Moe and Tone Merethe Aasen

772

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

775

Keynotes

The Relationship Between Software Process,
Context and Outcome
Dag I.K. Sjøberg1,2 ✉
(

1

)

Department of Informatics, University of Oslo, Oslo, Norway
dagsj@ifi.uio.no
2
SINTEF ICT, Trondheim, Norway

Abstract. Most practitioners and researchers agree that when developing soft‐
ware, process aﬀects product, and the usefulness of a process depends on the
context. However, which processes are most useful for a speciﬁc company or
project is generally unknown. When studying the relation between context,
process and product, one challenge is that experiments often lack realism, which
makes the transfer of results to industry diﬃcult. In contrast, most of the important
factors vary beyond the researcher’s control in case studies, which makes it diﬃ‐
cult to identify cause and eﬀect relationships. This paper reports a study where
realism was combined with control over certain context and process factors. Four
companies developed the same system, and the price varied by a factor of six.
Certain patterns of relationships were expected (expensive company, low cost,
schedule overrun); others were unexpected (cheap company, maintainable system
because of small code). The community needs to identify the most important
relationships among process, context and outcome.
Keywords: Software process improvement · Controlled multiple-case study ·
Software industry · Theory · Software engineering folklore · Measurement

1

Introduction

I am regularly contacted by various organizations for help regarding their software
processes. They range from small, private companies to large, public sector agencies,
some of whose projects failed to the order of hundreds of millions of euros. These
organizations are not interested in general, overall principles regarding process; they are
interested in what would work for them. They expect us, as researchers in the ﬁeld, to
know “what works for whom, where, when, and why” [1].
In a few cases, I have immediate suggestions for improvement, such as introducing
automated testing if the number of defects is out of control. But in most cases, I cannot
propose anything without working with the organization for some length of time. Iden‐
tifying and measuring the factors that should be taken into account when proposing
process changes are far from trivial; the software engineering literature does not give
much help in concrete settings. The current body of knowledge is mostly too general or
too speciﬁc; see Fig. 1.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 3–11, 2016.
DOI: 10.1007/978-3-319-49094-6_1

4

D.I.K. Sjøberg

Fig. 1. Useful theories and patterns

Software engineering folklore guides processes to some extent. For example, a
collection of “laws” and principles that have emerged over the years can be found in
[2–4]. One of the laws stated by Endres and Rombach [4], attributed to the work by
Boehm [5], is: “Errors are most frequent during the requirements and design activities
and are more expensive the later they are removed.” This “law” encourages processes
that emphasize the analysis and design phases. However, what does this mean in prac‐
tice? Should one spend, say, 10 to 30 percent of the total eﬀort of a development project
in these phases? Because of varying contexts, software engineering folklore is often
contradictory. For example, much eﬀort has been devoted to developing process models
that include a large number of activities, practices and roles together with formal,
detailed project documents. In contrast to such heavy processes are the light processes
recommended in agile development.
Few empirical studies in software engineering discuss the contexts to which the
results may be generalizable. Experiments in software engineering generally have few
subjects and almost all of them use convenience sampling [6]. Most case studies are of
single cases, and few attempt to generalize the results through theories with a welldeﬁned scope of validity [7]. Surveys collect people’s subjective opinions, which are
based on knowledge and experience gained in speciﬁc contexts. The results of surveys
also need to be related to theories to become generally useful.
Nevertheless, a premise in software engineering is that there is a relationship between
software processes and success of a project or task. The success is typically described
in terms of the quality of the delivered software, how long it takes to develop it and how
much it costs. It is also commonly agreed that this relationship is moderated by the
context of the processes, as illustrated in Fig. 2. It is reasonable to assume that an optimal
process varies with context; for example, a small team may not beneﬁt from activities
designed to help large teams.

The Relationship Between Software Process, Context and Outcome

5

Fig. 2. Relationship between process, context and outcome

The ideal model would be a deterministic one, in which a set of given context and
outcome parameters would determine the optimal process, and a given process and
context would determine the outcome. However, it is unlikely that we will manage to
develop such models given that software development is a mostly human activity and
we are unable to describe human behaviour deterministically in general, even though
certain theories describe human behaviour in speciﬁc situations, for example, the pros‐
pect theory [8].
Although we are far from a scenario where we can fully determine which process
gives which result in a speciﬁc context, our community can improve in identifying
patterns and proposing theories for the relationships among process, context and
outcome. This paper reports on a study that is an ongoing attempt in that direction.

2

Design of Study

The empirical software engineering community conducts both controlled experiments,
which focus on cause and eﬀect, and case studies, which focus on realism. How to
identify cause–eﬀect relationships in realistic settings is a challenge. What if we hire
several companies to develop the same system and see what happens? Some years ago,
our research group had such an opportunity. We needed a web application to store
information about all the empirical studies of the group. We developed a requirement
speciﬁcation and sent a call for tender to 81 consultancy companies and received bids
from 35 of them. A study of this bidding process was reported in [9].
The striking diﬀerence in the bids, given that we provided a well-deﬁned 11-page
requirement speciﬁcation, led us to use price as the selection criteria. We wanted to study
the eﬀect of price on process and outcome. Thus, in four price segments, we selected
the company that appeared most likely to develop a good system based on the quality
of the bid documents. The companies are named A to D in this paper, in the order of bid
price; see Fig. 3.

6

D.I.K. Sjøberg

Fig. 3. Four out of 35 companies selected for development

The data sources in this study are comprehensive. They include daily time sheets on
tasks and subtasks of each developer, weekly snapshots of all documents including
source code produced during the projects, full history provided by the conﬁguration
management and issue tracker tools and other information collected from defect logs,
e-mail communication and team interviews.
From this study, we published an investigation on reproducibility and variability in
software engineering [10] and a study of eﬀort estimation based on use case points [11].
The code developed by the four companies has also been used in follow-up studies on
maintenance metrics [12] and eﬀects of code smells [13]. (In this paper, the companies
are named according to bid price, while in the papers already published, the order was
alphabetic. Company C is now Company A, Company A is now Company B, and
Company B is now Company C. Company D remains Company D.)
A detailed investigation of the eﬀect of process and context has not yet been
published. Initial results are reported here.

3

Context

We controlled parts of the context to make them the same for all the companies; other
context factors were speciﬁc to each company. The controlled ones included:
•
•
•
•
•
•

Requirement speciﬁcation
Application domain (web document management)
Functional size of the system (57 unadjusted use case points [11])
Low complexity of system
Customer (our research department)
Programming language (Java, Javascript and SQL)

The Relationship Between Software Process, Context and Outcome

7

• Tools (IDE: Netbeans or Eclipse, Build & Deploy: Ant, Conﬁguration management:
CVS; note that these tools were selected by the companies themselves but they
happened to be the same by accident)
• Team composition (1 project manager and 2 developers, except in Company B, which
had 1 developer and 1 interaction designer)
• Uniform interaction between development team and customer (e.g., use of same issue
tracker, acceptance tests by the same customer team)
• Intermediate skill level of the developers
Regarding skill level, we selected the developers on the basis of their CVs. All of
them had at least three years of formal education in programming and three years of
industrial experience with the technology to be used. Ideally, we should have tested the
developers using a validated skill evaluation instrument [14], but in the absence of such
an instrument at that time, the developers were tested for their Java skills by taking part
in a one-day exercise in which they performed the same Java programming tasks used
in a former experiment [15]. Their performance was thus compared with that of 77 other
Java programmers. Similarly, the developers were tested for their design skills by taking
part in a half-day UML exercise where they performed the same tasks used by 28 persons
in a former experiment on use cases and class diagrams [16]. We did not observe any
clear relationship between the skills of the team and project outcome.
Table 1 shows context factors that varied among the companies. Some factors were
speciﬁc to the development organization; others were speciﬁc to this development
project. Note that the bid by Company D, of 69,000 euros, shown in Fig. 3, was nego‐
tiated down to the 56,000 euros, shown in Table 1.
Table 1. Varying context
Aspect
Development
organization

Project

Variable
Size (#
employees)
Nationality
Ownership

Company A
Appr. 8

Company B
Appr. 100

Company C
Appr. 25

Domestic
By employees

Domestic
Private

Domestic
By employees

Location

Bergen

Oslo

Oslo

Process
models
Firm price
Agreed time
schedule
Estimated
effort
Allocation
Co-location

Light

Intermediate

Intermediate

Company D
Appr. 13,000
worldwide
International
Listed
exchanges
Oslo, 20
countries
Heavy

8,750
41 days

20,000
55 days

45,380
73 days

56,000
62 days

100 hours

220 hours

341 hours

650 hours

Part-time
No

Part-time
No

Part-time
No

Full-time
Yes

8

D.I.K. Sjøberg

The table shows several internal relationships among the factors. Company A is small
and can therefore only run fairly small projects with small teams. Their organizationlevel process models are therefore light. The low price oﬀered to build the system is
followed by an expectation of a short lead time, a low number of eﬀort hours, and the
need for the developers to work on several projects in parallel. At the other end, Company
D is large and has a heavy organization-level process model. The high bid allows higher
estimated eﬀort and allows developers to work full time on this project.

4

Processes

As an example of process data, Fig. 4 shows the number of hours the companies spent
on various development activities. Note the one-to-one correspondence between the
eﬀort spent on the activities “Implementation” and “Analysis and Design”. There is no
indication here that much eﬀort spent on analysis and design reduces the eﬀort needed
on implementation, or vice versa. Remember that the amount of functionality is ﬁxed.
Figure 5 shows the hours spent on the activities as the projects were running.

Fig. 4. Eﬀort spent on various activities

The Relationship Between Software Process, Context and Outcome

9

Fig. 5. Eﬀort on activities along the way

5

Outcome

The outcome of a process or project may be measured along many dimensions. For the
four systems, we assessed reliability, usability and maintainability. Reliability was
measured by investigating the defects found over a period of two years when the systems
were operational [10]. The usability was measured through a dedicated experiment [17]
and maintainability in a follow-up experiment [13]. Table 2 shows the results and also
includes measurements of eﬀort and lead time.
Table 2. Outcome variables
Aspect
System

Effort
Lead time

Variable
Reliability
Usability
Maintainability
Actual effort
Overrun effort
Actual lead time
Overrun lead time

Company A
Poor
Fair
Good
315 hours
215 %
79 days
93 %

Company B
Good
Good
Fair
562 hours
155 %
87 days
58 %

Company C
Good
Fair
Poor
894 hours
74 %
90 days
23 %

Company D
Fair
Good
Fair
796 hours
22 %
65 days
5%

10

6

D.I.K. Sjøberg

Relationships

Did the rather extreme price diﬀerences, of a factor from 1 to 6, lead to corresponding
diﬀerences in outcome? Generally not. Company A had given the lowest bid and
accordingly spent the least eﬀort on the whole development, particularly on analysis and
design and on testing. In a sense, this company developed a “quick and dirty” solution,
but the small size of their Java code led to the most maintainable system. The low number
of lines of Java code trumped other maintainability metrics [12]. On the other hand, the
low focus on testing resulted in the least reliable system and required us as a customer
to spend much more eﬀort on testing than we did for the other companies. In total, we
spent almost twice the number of eﬀort hours on Company A as we did on the other
companies, which to some extent reduces the cost savings of hiring Company A.
Furthermore, we were a competent customer; an incompetent customer might have
resulted in a failed project.
Company B scored best on the system quality dimensions on average. Given the next
lowest price, one may consider their project as the best value for money.
Company C over-designed their system, which resulted in excess code size, which
in turn resulted in poor maintainability. But they scored top on reliability.
Company D had relatively heavy processes and a highly competent project manager.
The developers worked full-time and were co-located. Their project seemed to have full
control all the way and resulted in the lowest lead time and very little overrun.
We have observed many other relationships among context, process and outcome,
but much analysis remains. We hope to reveal interesting patterns that may shed new
light on existing theories or be the basis for new theories.

References
1. Dybå, T., Sjøberg, D.I.K., Cruzes, D.S.: What works for whom, where, when, and why? On
the role of context in empirical software engineering. In: ACM-IEEE International
Symposium on Empirical Software Engineering and Measurement. ACM (2012)
2. Brooks Jr., F.P.: The Mythical Man-Month: Essays on Software Engineering. AddisonWesley Publishing Company, Reading (1975)
3. Glass, R.L.: Facts and Fallacies of Software Engineering. Addison-Wesley Professional,
Reading (2002)
4. Endres, A., Dieter Rombach, H.: A Handbook of Software and Systems Engineering:
Empirical Observations, Laws, and Theories. Pearson Education, New York (2003)
5. Boehm, B.W., McClean, R.K., Urfrig, D.E.: Some experience with automated aids to the
design of large-scale reliable software. IEEE Trans. Softw. Eng. 1, 125–133 (1975)
6. Sjøberg, D.I.K., Hannay, J.E., Hansen, O., Kampenes, V.B., Karahasanovic, A., Liborg, N.K.,
Rekdal, A.C.: A survey of controlled experiments in software engineering. IEEE Trans. Softw.
Eng. 31(9), 733–753 (2005)
7. Sjøberg, D.I.K., Dybå, T., Anda, B.C., Hannay, J.E.: Building theories in software
engineering. In: Shull, F., et al. (eds.) Guide to Advanced Empirical Software Engineering,
pp. 312–336. Springer, London (2008)
8. Kahneman, D., Tversky, A.: Prospect theory: an analysis of decision under risk. Econometrica
47(2), 263–291 (1979)

The Relationship Between Software Process, Context and Outcome

11

9. Jørgensen, M., Carelius, G.J.: An empirical study of software project bidding. IEEE Trans.
Softw. Eng. 30(12), 953–969 (2004)
10. Anda, B.C.D., Sjøberg, D.I.K., Mockus, A.: Variability and reproducibility in software
engineering: A study of four companies that developed the same system. IEEE Trans. Softw.
Eng. 35(3), 407–429 (2009)
11. Anda, B., Benestad, H.C., Hove, S.E.: A multiple-case study of software eﬀort estimation
based on use case points. In: International Symposium on Empirical Software Engineering,
pp. 407–416 (2005)
12. Sjøberg, D.I.K., Anda, B.C., Mockus, A.: Questioning software maintenance metrics: a
comparative case study. In: ACM-IEEE International Symposium on Empirical Software
Engineering and Measurement (2012)
13. Sjøberg, D.I.K., Yamashita, A., Anda, B.C., Mockus, A., Dybå, T.: Quantifying the eﬀect of
code smells on maintenance eﬀort. IEEE Trans. Softw. Eng. 39(8), 1144–1156 (2013)
14. Bergersen, G.R., Sjøberg, D.I.K., Dybå, T.: Construction and validation of an instrument for
measuring programming skill. IEEE Trans. Softw. Eng. 40(12), 1163–1184 (2014)
15. Arisholm, E., Sjøberg, D.I.K.: Evaluating the eﬀect of a delegated versus centralized control
style on the maintainability of object-oriented software. IEEE Trans. Softw. Eng. 30(8), 521–
534 (2004)
16. Anda, B., Sjøberg, D.I.K.: Investigating the role of use cases in the construction of class
diagrams. Empirical Softw. Eng. 10(3), 285–309 (2005)
17. Følstad, A., Anda, B.C.D., Sjøberg, D.I.K.: The usability inspection performance of workdomain experts: An empirical study. Interact. Comput. 22(2), 75–87 (2010)

Early Phases in Software Engineering

Eight Paths of Innovations in a Lean Startup
Manner: A Case Study
Mikko Raatikainen1(B) , Marko Komssi2 , Harri Kiljander2 , Laura Hokkanen3 ,
Jukka Märijärvi4 , and Omar Mohout5
1

3

Aalto University, Espoo, Finland
mikko.raatikainen@aalto.fi
2
F-Secure Inc., Helsinki, Finland
{marko.komssi,harri.kiljander}@f-secure.com
Tampere University of Technology, Tampere, Finland
laura.hokkanen@tut.fi
4
Landon Ltd., Helsinki, Finland
jukka.marijarvi@landon.fi
5
Sirris, Heverlee, Belgium
omar.mohout@sirris.be

Abstract. Software companies face high pressure to develop innovative
products and services at increasing speed. However, a traditional new
product development (NPD) process is not always a suﬃcient means
for doing this. We report experiences from an explorative multiple case
study covering eight cases from four companies of diﬀerent sizes and
business characteristics. Each case aimed to streamline the development
of a product or service innovation in a Lean startup manner as an alternative to the traditional NPD. We present eight life-cycle paths that
together exemplify the use of the ﬁve organizational alternatives, such
as internal startup and company subsidiary. Driving force to choose the
organizational alternative is novel business endeavor rather than being
depended on the company. Using even multiple organizational alternatives is possible during the innovation life-cycle as long as the speed and
independence for the innovation is achieved.
Keywords: Internal startup · Lean startup · Subsidiary · Institutional
entrepreneurship · Intrapreneurship · Case study · Industrial experience

1

Introduction

Software companies face high pressure to develop innovative products and services at increasing speed. The traditional new product development (NPD)
process, such as presented by Cooper [1], seeks to enable the creation of new
innovative products. However, the NPD process is fuzzy and diﬃcult from the
beginning [2]. In particular, NPD is a complex higher-order capability that
involves multiple organizational functions, capabilities, and competencies [3].
However, forming a traditional NPD program may not be always a suﬃcient
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 15–30, 2016.
DOI: 10.1007/978-3-319-49094-6 2

16

M. Raatikainen et al.

to take care of the full life-cycle of developing and commercializing a new software product — creating new innovations may require more ﬂexibility in terms
of experimentation and learning from failure than is characteristic of the traditional NPD process [4]. For example, fast-paced, revolutionary, or disruptive
innovations could be done diﬀerently, as exempliﬁed by startups.
Recently, new means have emerged to reshape the work done in the traditional NPD programs. For example, Customer development [5] and Lean
startup [4] are methods that try to streamline the innovation development
process. However, in order to be able to gain the beneﬁts of such methods, software companies need to establish ways in which to carry out the Lean startup
process. Diﬀerent types of company ventures — seeking to eﬃciently capture
and monetize new innovations outside established business lines — can be used
for this purpose [6]: For instance, the software company can incubate the idea in
a separate strategic unit and, later, form an internal startup or even a separate
subsidiary to eﬀectively search for new business [7].
However, the experiences on diﬀerent types of company ventures are still
scarce. The research problem addressed in this paper is the structures and lifecycles of means of going beyond the traditional NPD. To shed light on the topic
from the industrial perspective, we carried out an explorative case study about
NPD programs that decided adopt also the other than traditional NPD structures
over their life-cycles. Based on the eight cases, we describe and synthesize (1) the
structures that were established, (2) the points in the life-cycles when decisions
were made about the structures, and (3) the justiﬁcations for the structures.
The paper is organized as follows. Section 2 describes the life-cycle of an innovation program as background. Section 3 outlines the research method. Section 4
provides the case accounts. Section 5 analyses the ﬁndings. Section 6 provides
discussion in the light of related work. Section 7 concludes.

2

Background: The Life-Cycle of an Innovation

For the scope of an innovation project, the four stages can be mapped and
summarized as follows (Fig. 1):
Idea stage: The innovation program should focus on gaining a detailed understanding of the problem or need that it wants to tackle. At the end of this stage,
the startup should have a holistic understanding of the problem domain, and a
minimal viable product (MVP) or concept to initiate concept validation with real
customers and users. Ideas are handled as hypotheses that need to be validated
using the MVP to collect customer feedback. The primary goal is learning.
Problem/solution ﬁt: In this stage, the innovation program should focus on
further developing the concept as an optimal solution for the ﬁrst lead users and
customers. The MVP oﬀers a path of very rapid iteration of customer requirements followed by testing and validation. This stage is called ‘going from 0 to 1’
by Peter Thiel [9]. Experimenting with (innovative ways of) customer acquisition
is the second key activity of this stage.

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

17

Fig. 1. Four stages of the startup life-cycle [8].

Product/market ﬁt: Once the optimal solution for the lead users is ready
and the innovation program has been able to acquire new customers and users,
its focus should move to customer retention and further generation of the business model. In this stage, the innovation program should particularly focus on
retention, the business model, and a pricing strategy.
Scaling: When the innovation program has found a scalable business model,
the focus should shift to actual scaling. In this stage, the innovation program
should focus on accelerating the business. The acceleration typically requires
large investments in marketing and business development. This stage is called
‘going from 1 to n’ by Thiel [9].
The four stages form a value chain: No idea creates value until you embody
it in a product or service; no product or service captures value until you embody
it in a business model and pricing strategy; and no business model becomes
sustainable until you ﬁgure out distribution.
The lengths of these phases vary largely, and the stages can partially overlap
and are not clear-cut. It is also not possible to set a deadline for product/market
ﬁt. By moving from one stage to another, the risks are reduced because assumptions are validated. This is of paramount importance, because markets are simply
unpredictable for innovative products and services. According to our experience
from the ﬁeld, the main focus of each innovation program typically follows these
stages over the startup life-cycle. Furthermore, they are similar to other life-cycle
models presented in [4,10,11]. If the project does not succeed, it will face its end
and discontinue: That can take place during any of these phases.

3

Research Method

The research follows the explorative case study methodology, meaning a study
about a contemporary phenomena in industry [12]. As the research topic is

18

M. Raatikainen et al.

emergent, the selection of cases was done opportunistically and purposefully on
the basis of innovation programs that the authors were familiar with in order to
explore diﬀerent kinds of cases. The main criterion for selecting the cases was
that they did not follow a traditional NPD program but started to apply some
other kind of organizational alternative during the life-cycle. However, the idea
stage started within an existing company rather than as an independent startup in each case. The study was carried out to investigate the following cases:
F-Secure’s F-SOS, DF-Data, Freedome, Key, Lokki and Sense; Qentinel’s Quality
Intelligence (QI); Aptual’s Johku; and OP Financial Group’s Pivo. These are all
Finnish companies developing software-intensive solutions.
The collected data was qualitative and relied mostly on interviews. The Johku
and QI cases were each based on a one-hour interview with the CEO of the company. During the interview, the interviewee was asked to describe the life-cycle of
the case. Additional clariﬁcation questions were asked during the interviews. Two
researchers conducted the interviews and took notes. Sense, Freedome, KEY and
Lokki are based on the interview with the responsible vice president at F-Secure
as well as interviews with two representatives from each of these cases. Pivo
is based on an interview with the head of New Business Development who has
been responsible for Pivo from the start. All these interviews were carried out by
researchers who did not work in the case company. F-SOS is based on a previous
case study by one of the authors [13,14] complemented with re-analysis and an
additional informal interview with a key person in F-SOS. Moreover, two of the
authors have been a part of or had a supporting role in the ﬁve cases (Lokki,
Key, Freedome, Sense, and DF-Data). These authors’ participatory role in these
programs were used to enrich the data. Furthermore, the authors have gained
additional informal knowledge about the cases through personal collaboration.
Based on the acquired data, case accounts to summarize each case were
constructed. The accounts were read for corrections by at least one representative
of each case: Typically, the interviewee or alternatively someone equally familiar
with the case read the account.
We analyzed each of the cases along the life-cycle as introduced in the previous section as follows. For each case, we identiﬁed the organizational alternatives
that the case applied. Furthermore, we analyzed the decisions and justiﬁcations
to understand the rationale for each decision. The analysis was carried out in
a bottom-up manner, so that diﬀerent organizational alternatives emerged and
were formulated as we analyzed the cases. As diﬀerent organizational alternatives
emerged over the life-cycle, we were able to illustrate and synthesize the paths
that the cases had taken in a two-dimensional space. Finally, we cross-analyzed
and synthesized the rationales, decision points along the paths and beneﬁts over
diﬀerent cases.

4

Case Descriptions

F-Secure F-SOS. F-Secure Corporation is an Internet and cyber security company. F-Secure founded a company subsidiary called F-SOS to establish a new

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

19

business concept and model. F-SOS used F-Secure’s existing technology without developing the core technology further and even without direct access to
the technology. Rather, the key idea was to build new service-oriented business
and service concept based on the existing technology. The ﬁrst release was not
especially successful, but a year later the second solution’s release showed major
success. After being in operation for one year, the F-SOS service and solution
were merged back into the F-Secure mother company and the solution became
strategically important, and eventually the most important of F-Secure’s business areas for several years. The key idea of the solution was to build a Software as
a Service (SaaS) business model around F-Secure’s existing technology. Around
the same time, F-Secure reshaped its business by discontinuing other product
oﬀerings. Later, F-SOS operated inside F-Secure as its own unit and the crossfunctional F-SOS team tried to establish other similar SaaS-type solutions that
turned out to be moderate successes, at best.
F-Secure DF-Data. F-Secure has a subsidiary called DF-Data for testing and
validating any new kinds of product or business concepts. Although products and
services are developed and marketed under this incubating subsidiary’s brand,
these products are using the subsidiary as a test bed without generating any
revenue. Overall, DF-Data operates as an incubator, for example to explore
and test social media marketing practices; to test beta products; and to test
concept products and technologies. All this is accomplished without the risk of
sacriﬁcing the mother companys brand name. The company has been used as
the publisher of beta versions for several products, such as FS Cloud and FS
Protection. Product concepts include Secure Selﬁe Camera, Funny Hat Stickers,
and Snapwallet: Photo Safe, that are available as apps for Android smartphones.
F-Secure Lokki. Lokki was the ﬁrst internal startup that F-Secure established
within the business organization. The company strategy renewal in 2012 identiﬁed Family Protection to be a prospective new security product and business
area. A new family location-sharing service concept was developed by a small
concept design team. The company leadership team decided to productize the
concept and bring it to the market with a rapid schedule, and the guidance
from the company CEO to the concept creation team leader was to “work like
a startup!”. The Lokki service reached some tens of thousands of users through
moderate marketing eﬀorts but it was eventually ramped down as it did not
ﬁt well enough within the company’s strategy framework. The learnings of the
internal startup way of working, together with some of the software features
are being deployed and further developed as part of current F-Secure consumer
security and privacy products and services. The Lokki service itself was open
sourced and Lokki is currently managed by University of Helsinki.
F-Secure Freedome. The second F-Secure internal startup, called Freedome,
was founded within the company strategy unit. The startup’s product leveraged

20

M. Raatikainen et al.

some existing technologies, while some parts were developed from scratch. After
the commercial launch of the MVP, the internal startup was seen as a great
success in terms of the deﬁned objectives such as the number of downloads and
new users, as well as positive reviews. The internal startup was later integrated
into the F-Secure consumer business organization. More recently, the product
has been adapted for business customers and has been subject to other kinds
of market and channel extensions, e.g. from B2C to B2B. However, the scaling of the business from consumers to B2B customers has not been completely
straightforward due to diﬀerent customer needs and expectations.
F-Secure KEY. The idea of the third internal startup, KEY, was incubated as
a technology proof-of-concept within the company strategy unit. After a strategic
decision, the internal startup was transfered to the consumer security business
organization. The initial hypothesis for the product was to compete in the direct
consumer business (B2C), but after this was found to be an uphill battle against
the dominant incumbent players, the more lucrative B2B2C channel was chosen
as the primary business opportunity. At the moment, various business models
are being experimented with, ranging from a standalone product to using the
product as a tactical add-on in a larger security service bundle.
F-Secure SENSE. By the strategic decision of the company management,
the fourth internal startup, called Sense, was founded at F-Secure. A signiﬁcant
investment is being made to develop a security software and hardware solution
to protect new kinds of IoT (Internet of Things) devices in smart homes. The
hardware product developed in-house is the ﬁrst of its kind for F-Secure. However, an essential element of the total product has been developed using existing
F-Secure software security technology. The product is currently available for preordering and the internal startup team is currently ﬁnalising the MVP to start
commercial deliveries.
Qentinel Quality Intelligence (QI). Qentinel is a quality assurance company
and most of its services are based on knowledge-intensive consulting. Around
10 years ago, Qentinel started to plan to shift its business toward a more sophisticated understanding of quality through the concept of value, resulting in a
traditional NPD program (2007–2008) that eventually became known as QI. QI
was a drastic change for Qentinel because the objective was to develop the company’s own products and services with protected IPR that was diﬀerent from
Qentinel’s existing consultancy-based business model. However, the project was
not a business success at ﬁrst, and the existing business of Qentinel continued to
be successful, which hindered the concept development of the new business. In
2008 the strategy work of Qentinel resulted in establishing runtime quality monitoring in information systems as a new area as one of its three strategic areas. A
company specialized in the technical monitoring of IT services was acquired in
2011. This technology-based service became a key part of the QI oﬀering, and at

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

21

the same time was expected to ﬁnance the development of the other parts of QI.
However, the QI project was still relatively time-consuming and used resources
from the other proﬁtable consultancy-based business with constant cash ﬂow.
The organizational culture hindered rather than boosted the development of QI.
To clarify the role of QI, Qentinel ended up turning QI into an internal startup
staﬀed with dedicated people. This somewhat clariﬁed QI’s identity. Less than a
year later, Qentinel decided to restructure its organization. As part of this, the
QI internal startup was moved to a separate corporate subsidiary. Although the
move caused some uncertainties at ﬁrst, things started to go better. QI obtained
independence without the burden of the old organization and the QI employees
realized that they had the sole responsibility for acquiring new prospects and
customers. At the moment, QI is growing but its business is not yet making
proﬁt because of the heavy development investments.
Aptual Johku. Aptual is a small company that focuses on creating better
marketing communications and exploring new frontiers for its customers. The
company has historically carried out customer-speciﬁc projects that it has further developed and commercialized. This has resulted in a set of small solutions
that have each had a good problem/solution ﬁt for a single customer. However,
a poor product/market ﬁt has required a signiﬁcant amount of customer-speciﬁc
work with the next customers for scaling. Therefore, Aptual decided to narrow
down its number of solutions to three. One of these three was Johku, which
is today a software-based solution for travel service providers, such as cottage
renters. The intial version of Johku was developed as a typical Aptual NPD
project. However, Aptual carried out diﬀerent kinds of analysis about the market, resulting in a decision that the value proposition of Johku needed reshaping and sharpening as a part of a signiﬁcant development project. As a result,
Aptual decided to establish an internal startup for Johku that was ﬁnanced by
the two other revenue-generating solutions. The essence of the internal startup
was to make an explicit internal investment and clarify the role of Johku as
an upfront investment in development rather than trying to productize existing
projects. The internal startup developed an MVP and started to search for product/market ﬁt. A startup that takes care of Johku was spun oﬀ very recently.
While the startup considers the existing MVP to be ready for scaling, validation
from larger markets is lacking.
OP Pivo. The OP Financial Group is the largest of the major players that
dominate the Finnish banking market. Three to four years ago, the company
recognized the importance of mobile payments but the topic did not ﬁnd a clear
owner in the organization. A team familiar with mobile payments was established
and they started working on the problem. A new research and development unit
was set up in the city of Oulu, far from the headquarters, because there was
a lot of experience available due to the citys past life as one of Nokia’s major
sites in Finland. Most people were new hires, but the owner of the problem was
an OP Financial Group veteran, bringing in a wealth of banking experience.

22

M. Raatikainen et al.

The team started using the Lean startup paradigm relatively strictly, and was a
an internal startup inside the OP Financial Group. The Lean startup methodology was found to be successful, but it also became clear that if one follows
the model strictly, it is a very disciplined model and one needs to be prepared
for it. Initially the team created a set of assumptions, which looking back were
all wrong, and they were changed later in the project. However, fast learning
is key in the setup – being initially wrong does not matter as the Lean startup
methodology tests the assumptions early and then brings in end-user feedback
to re-direct the project. Currently, Pivo is its own company with its own management, being separated from the OP Financial Group so that the app can be
used by customers from any banks.

5

Analysis

Decisions need to be taken at the company level regarding the types of innovation
programs to be used. In particular, the decision-makers need to understand that
a traditional NPD program is only one option, while other alternatives exist. In
this section, we analyze the organizational alternatives to the traditional NPD
program that were identiﬁed from the cases. As another orthogonal concern,
we use the life-cycle phases of an innovation program from the stage of idea
incubation to that of business scaling, and map the path through the phases and
structures of the cases. The key decisions and rationales are also analyzed along
these paths. Finally, we highlight the key beneﬁts of applying other structures
than the traditional NPD program that have emerged from the cases.
5.1

Organizational Alternatives

The following structures were observed in the cases. The diﬀerence between the
alternatives is that the relative independence from the parent company increases
as we progress through the list. The cases that apply each structure is summarized in Table 1.
– NPD (new product development) project. This is broadly any kind of
traditional innovation or development project or programs for products or services within a company relying on established structures. It typically includes
Table 1. The structures and supporting cases.
Structure

Case

NPD

(All)

Internal startup

Freedome, Key, Lokki, Sense, Johku, Pivo

Company subsidiary

QI, F-SOS, Johku, Pivo

Company startup

Johku

Incubating subsidiary DF-Data

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

–

–

–

–

23

the company’s strategic work and follows the company’s established practices
and structures. We diﬀerentiate an NPD because all cases were selected so
that the start was in an NPD.
Internal startup. Internal startups take place within a company but work
much more independently than an NPD, even entirely independently. As a
results, internal startups have diﬀerent levels of freedom from the companys
standard policies.
Company subsidiary. A company can found a child company to take care
of an innovation program. A subsidiary is fully owned by the originating company. Such a subsidiary has more freedom, responsibilities and ﬁnancial incentives than an internal startup.
Company startup. Innovation programs can also take place in more independent startups of which the originating company has no full control. However,
often the originating company or its owners have inﬂuence over or ownership
of the company startup, and a company startup still has ties to the originating company: For example, employees can have working contracts in both
companies or work through subcontracting arrangements.
Incubating subsidiary. A variant of the company subsidiary is an incubatory subsidiary. The subsidiary is not founded for one particular innovation
program, but the same subsidiary exists continuously and a series of innovation
projects are carried out within it.

5.2

Rationales and Success Factors

There were varying rationales for establishing an alternative structure to the NPD
program, but one common characteristic was that something novel or a signiﬁcant
change to the existing business was being planned. F-SOS developed a new SaaSbased business model. Sense had some new features such as including hardware
and targeting new kinds of devices, although its main functionality was basically
the same as other security products. DF-Data focused on diﬀerent kinds of new
concepts. Sense, Freedome, Key and Lokki each targeted new markets, although
they were strategically in the same security business area. In general, these three
products were considered internally as completely new oﬀerings rather than features of existing products, even though some existing technology was utilized. The
Aptual’s internal startup Johku was a means to make an investment decision and
to give Johku the independence, freedom and resources to develop the new solution as well as a change in business model. Similarly, gradually moving QI through
the phases of traditional NPD program, internal startup, and subsidiary seems to
have been a successful business model transformation and service development
path. Nevertheless, founding a subsidiary does not necessarily explain the recent
success of QI fully. During the transformation, the QI oﬀering has also matured,
the market seems to be more ready, and the technologies used within the major
digitalization trend have shaped the business environment.
When establishing any of the alternatives to a traditional NPD program, a
certain level of independence was looked for, and the independence seems to have
been a prerequisite and a success factor in all cases. The structural alternatives

24

M. Raatikainen et al.

are not better or worse per se, but rather a means to an end. Independence needs
to be realized by a clear mandate to be able to work toward the objectives. Independence also minimizes the disturbances of the existing organizational culture,
resulting in less unnecessary interaction.
One means of establishing an organizational structure as well as a factor
positively aﬀecting success seems to be the use of external people such as new
recruits in these organizational alternatives. The external people are not tied
to the old organizational culture and business models. For example, F-SOS had
many new recruits in the team and likewise Pivo recruited most of the team
from outside of OP group.
There is a risk, especially in establishing an internal startup, that it may
remain too close to the existing business areas whereas a corporate subsidiary,
being relatively more independent, can be a means for assuring independence.
Business lines do not seem to natively support novel or disruptive ideas that can
even be destructive to their current business or ideas beyond their current goals.
This is exempliﬁed by the QI case, where the organization did not initially realize the value of QI. Established business organizations also tend to be primarily focused on the short term or quarterly business priorities, not allowing a new
business or product entry to be developed and iterated suﬃciently, and a revenue
stream may be expected too soon, when the new product is still going through the
Problem/Solution Fit phase, as with Lokki. Business lines seem to focus on evolutionary ideas and concepts rather than revolutionary ones. As the CEO of Qentinel
stated, “The more successful your business is, the more likely your transformation
is to fail”. If the business is successful, there is less pressure to change or invent
something new, and the organization may not necessarily want a change. However,
there is then a risk that a competitor can establish a new business and take over the
market. Thus, it seems that a strategic or research unit is a good choice for internal
startups when business lines are inhibiting rather than catalyzing structures. The
more radical or disruptive the innovation is, the better a corporate subsidiary or
other means of being made independent can be, as indicated by F-SOS and QI.
Even in more general terms, independence seems to be a good indication for good
operations or freedom from existing hindering or controlling practices.
5.3

The Diﬀerent Paths

Combining the organizational alternatives described above and the life-cycle
phases results in a two-dimensional space through which each innovation program takes its path. This space and the paths of each program within the space
are illustrated in Fig. 2. The horizontal axis shows the diﬀerent phases as well as
the progress but does not indicate the absolute length of the phases, e.g. in terms
of months, but is roughly propositional in relation to the phase length, such as
whether transformation was carried out at the beginning or end of the phase.
For example, the life-cycle of Lokki and QI are presented in this same ﬁgure
but the life-cycles were roughly one and eight years, respectively. The vertical
axis lists the diﬀerent organizational alternatives as described above. The vertical position of diﬀerent paths within each box has no meaning. Due to the case

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

25

Fig. 2. The paths that diﬀerent cases have taken through organizational alternatives
over the life-cycle of the cases.

selection, all paths start as a traditional NPD program. Some of the cases apply
several structures. Direct vertical lines over structures is used for illustrative
purposes for changes while horizontal lines indicate that a structure is applied;
for example, F-SOS that became a subsidiary from NPD without being internal
startup in between. The end of a line as an arrow indicates that particular cases
are still in progress, while Lokki has been discontinued at F-Secure and the code
has been open sourced.
In the following, we elaborate key points of the phases and decision points
along the paths.
Origins of the idea. As all cases represent an alternative means for structuring
an NPD, the ideation phase was typically carried out as part of the strategic work
within the company in the form of an NPD program or a pre-phase for such a
program. After the idea had reached a certain degree of maturity, an alternative
structure to the NPD program was established. An incubating subsidiary was
also used in DF-data, but even then, the ideas followed the company’s strategic
initiatives due to its close links to the main company. QI relied partially on M&A
for part of the technology, but even then the strategic work was the basis for the
idea that the M&A decision supported.
When to make the change. It is feasible to even incubate the idea for a long
time as a traditional NPD program but the idea phase seems to be a good point
at which to make a change. That is, all of the cases resulted in an organizational
alternative other than a traditional NPD program during the idea phase through
the establishment of an internal startup or a subsidiary. However, the length of
ideation and the maturity of the idea diﬀered largely. For example, especially QI

26

M. Raatikainen et al.

but also Freedome and Key were ideated and were a part of strategic process
for a long time as a traditional NPD program. On the one hand, the QI case
indicates that delaying the decision does hinder the progress. On the other hand,
it is unclear whether too immature a concept can survive on its own too early.
In contrast, Lokki was established as an internal startup relatively quickly as
the result of a strategic initiative.
Multiple changes. Although making several changes is not necessarily the
optimal strategy, multiple changes can be a less risky method and suitable for
certain situations. QI started as an NPD followed by an internal startup and
ﬁnally ending up becoming a subsidiary; and Johku started similarly as an
NPD followed by an internal startup but was ﬁnally spun oﬀ to an independent startup. The development of QI might have been slowed down as an NPD
and internal startup, but the slow progress allowed the idea to mature and was
not in the end that delaying to the progress. For Johku, the multiple changes
were justiﬁed by investment decisions and the changes do not seem to have had
too much of an eﬀect. Moreover, unlike in the other cases, the leaders of QI and
Johku were also the CEOs of their parent companies, and therefore, had more
power to make decisions and changes.
The end of the life-cycle. The structural alternatives can be temporary.
An internal startup in particular typically has a limited lifetime. At the end
of the life-cycle, the developed innovation can be brought back into the main
company or a spinoﬀ company can be established, for example. However, only
some of the studied cases are at the end of their life-cycle: F-SOS was turned
into a scalable business within the existing business line, but F-SOS has later
been able to repeat the success only moderately; Lokki was discontinued in FSecure during the internal startup stage on the basis of a strategic decision and
open sourced; DF-Data products have not been discontinued but not all of them
are being developed actively and some are being used for other purposes such
as testing and learning new marketing strategies; and Key and Freedome have
been brought back to the business lines and are searching for a scalable business
model within them.
It seems that after the end of the life-cycle of the alternative structures,
being too closely related to business lines can be problematic, especially for
novel innovations. A reason for this could be that, especially if the resulting
innovation is not mature enough, it does not ﬁt into the daily routines and
objectives of the business line. In fact, one important consideration during the
end of the life-cycle is whether the appropriate home for the developed concepts
lies in the existing business lines, or is it worth establishing a new business line
for the new innovation or a subsidiary to take care of the business, as is the case
with QI and Johku.
The length of the life-cycle. The length of the life-cycle and phases can vary
largely. Too fast progress in terms of ﬁnding solutions before fully understanding

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

27

the problem or aiming at scaling before appropriate market ﬁt that is, rushing
to the next life-cycle phase too early seems in general to not be recommended, as
for any innovation program. For example, a too early search for a scalable model
and merging into a business line might have taken place in Freedome before a
proper market ﬁt was found, whereas the long idea stage of QI matured and
helped to clarify its value proposition. In particular, at the end of the life-cycle
of an organizational alternative such as an internal startup, it seems that the
solution should have completed the objectives of the phases properly in addition
to ﬁnding a relevant home.
In general, it seems that internal startups should continue until at least the
product/market ﬁt stage has been reached properly, while corporate subsidiaries
and other more independent entities should continue at least until the scaling
stage.
5.4

Beneﬁts

Eﬀects to the innovation program. The experience from the cases show
that, one the one hand, the alternatives to NPD, such as internal startups and
subsidiaries, can be more undisciplined as well as radical and quicker to test
something new. The independence of an innovation program means minimizing
the disturbances from, e.g., other employees, management, and even customers.
The existence of fewer such interfaces mean faster speed that then enables fast
changes and faster learning. On the other hand, compared to startups that do
not have any back-up from the originating company, they can be ﬁnancially in
a better position to not rush, for example starting scaling too early, which adds
interfaces and naturally slows the team down.
Eﬀects to the existing business. A successful innovation can even be a boost
to the existing business. In the case of QI, the main business ended up with
a new business area to complement their current business. F-SOS drastically
changed the business models of F-Secure toward the SaaS model, even in other
business lines. Although not everything results in ﬂourishing new business, the
cases show several other beneﬁts and usages. Lokki continues its life as open
source. Funny Hat Stickers from DF-Data was ﬁrst developed as a funny add-on
to cloud services, but once the F-Secure cloud services had been discontinued,
there was no strategic use for Funny Hats. However, the Funny Hat Stickers app
had users and worked on its own. Thus, Funny Hat Stickers was changed to be
a testbed for trying new things such as learning and testing the usage of social
media channels for various purposes.

6

Discussion

In recent years, new methods and models have been introduced to tackle the
challenge of bringing new products to market in a timely manner. While digital

28

M. Raatikainen et al.

products can be distributed globally by even the smallest startups, competition to ﬁnd scalable business models fast enough is getting tighter. Customer
development [5] and lean startup [4] seek to answer to this need where methods
like the NPD stage gate model [1] or the new concept model for Fuzzy Front
End (FFE) [15] fall short. Emphasizing interaction and testing hypotheses with
real customers to gain validation before investing heavily in development, Lean
startup takes advantage of global digital markets as a test bench for new product
innovations. However, these experimental methods also need speciﬁc attention
to avoid biases that result in false validation [16]. Moreover, utilizing such methods in established companies often requires more freedom for the executing team
in order for them to be able to react quickly to the insights gained. Company
ventures can enable the development of new business with more freedom while
fostering entrepreneurial culture [17] and also enable the recruitment of talent
to complement the current skillset inside the company [6].
This paper introduced four alternative organizational structures for traditional NPD: Internal startup, company subsidiary, company startup, and incubating subsidiary. While none of the study’s alternatives are novel, the ﬁndings
from the selection and usage of each alternative oﬀer practical insights from the
Finnish companies. In fact, one of the alternatives, internal startup, was used in
six out of eight of the study’s cases during the life-cycle of new business endeavors. Indeed, there is a growing interest in initiating internal startups, or at least
adopting the principles of the Lean startup method in software companies. To
our knowledge, however, internal startups have not been widely studied (with
that term) in the software business research ﬁeld. On the other hand, the term
and idea of the intrapreneur have been used in large software companies for over
thirty years [18]. Edison et al. [19] have recently studied internal startups in
one of the companies that is also part of our study. Their study focused on the
characteristics and implementations of internal startups, whereas our ﬁndings
emphasize that the internal startup seems to be a suitable but only a temporary
organizational structure in the early and middle phases of NPD.
Previous work exists on the factors that determine whether new business
opportunities are exploited by starting a new venture for an employer (i.e.,
nascent intrapreneurship) or independently (i.e., nascent entrepreneurship) [20].
In our study, the former refers to internal startups and the latter to company
spinoﬀs, such as subsidiaries and startups. Individual, organizational, and product characteristics all aﬀect the decision to exploit an opportunity via intrapreneurship or entrepreneurship [20]. Our ﬁndings highlight that the decision does
not have to be one or the other. A software company can also begin a business
endeavor with intrapreneurship and move to entrepreneurship later. Actually,
one of the cases shows that the order can also be the other way around. This
implies that decision-makers need to continuously evaluate, not just at the beginning, the alternatives for organizational structures over the life-cycle of NPD.
In the four cases of our study, spinoﬀs (either company subsidiaries or startups) were created. Interestingly, all of them have been successful business-wise
so far. Based on our ﬁndings, however, we cannot draw the conclusion that

Eight Paths of Innovations in a Lean Startup Manner: A Case Study

29

company subsidiaries or startups are always a better choice business-wise than,
for instance, internal startups. However, Rice et al. [6] and Mazur [21] conclude
that depending on business goals and available funding, spinning out can bring
advantages in terms of more ﬂexible funding possibilities as well as increased
motivation through increased freedom, responsibility, and personal risk. In addition, it can be beneﬁcial to recruit experienced venture capitalists and other
experts to advise the new spinoﬀ [6]. Chammanur and Yan also claim that
spinoﬀs are associated with an increase in long-term operating performance [22].
According to them, however, an incumbent management team is required to give
up control to a rival management team in the case of a spinoﬀ, which may motivate the incumbent management team to work harder to avoid loss of control.
Interestingly, however, companies are announcing increasing numbers of new but
smaller and smaller spinoﬀs [23]. Indeed, a small spinoﬀ will not take a lot of
power from the incumbent management team. Our study’s ﬁndings also showed
that the parent company’s CEO can become the leader of the new spinoﬀ.
Our ﬁndings are exploratory and preliminary representing cases only in one
country. Further studies on the same topic in diﬀerent countries are recommended. Our ﬁndings also raise further research topics. First, what are the business performance diﬀerences between internal startups and company spinoﬀs
over the NPD life-cycle? Secondly, if internal startups (or Lean startups inside
corporations) are becoming increasingly popular, how can we better integrate
the strengths of spinoﬀs, such as ﬁnancial freedom and incentive mechanisms,
into those internal startups?

7

Conclusions

We presented an explorative case study of innovation programs applying different organizational alternatives than traditional new product development
(NPD). We outlined the life-cycle of such innovation programs along their path
through the life-cycles of ideation, problem-solution, product/market ﬁt and
scaling the business. Along this path, we showed how an innovation program
approach can apply diﬀerent structures: internal startup, component subsidiary,
company startup, or incubating subsidiary. Finally, we elaborated on the nature
of diﬀerent decision points, changes, and prevailing mechanisms.
Acknowledgments. The authors would like to extend their gratitude to the participants of the study. They also acknowledge the ﬁnancial support of TEKES as part of
the Accelerate program and Need for Speed (N4S) program of Dimecc.

References
1. Cooper, R.G.: The new product process: a decision guide for management. J. Mark.
Manag. 3(3), 238–255 (1988)
2. Wowak, K.D., Craighead, C.W., Ketchen, D.J., Hult, G.T.M.: Toward a “theoretical toolbox” for the supplier-enabled fuzzy front end of the new product development process. J. Supply Chain Manag. 52, 66–81 (2015)

30

M. Raatikainen et al.

3. Bendoly, E., Bharadwaj, A., Bharadwaj, S.: Complementary drivers of new product development performance: cross-functional coordination, information system
capability, and intelligence quality. Prod. Oper. Manag. 21(4), 653–667 (2012)
4. Ries, E.: The Lean Startup. Crown Business, New York (2011)
5. Blank, S.: The Four Steps to the Epiphany. K&S Ranch, Pescadero (2013)
6. Rice, M.P., OConnor, G.C., Leifer, R., McDermott, C.M., Standish-Kuon, T.: Corporate venture capital models for promoting radical innovation. J. Mark. Theor.
Pract. 8(3), 1–10 (2000)
7. Fosfuri, A., Rønde, T.: Leveraging resistance to change and the skunk works model
of innovation. J. Econ. Behav. Organ. 72(1), 274–289 (2009)
8. Mohout, O.: Startup master class ii: Exodus – problem-solution ﬁt (2015). http://
www.slideshare.net/omohout/exodus-problem-solution
9. Thiel, P., Masters, B.: Zero to One: Notes on Startups, or How to Build the Future.
Crown Business, New York (2014)
10. Cooper, R.G.: What’s next?: after stage-gate. Res. Technol. Manag. 57(1), 20–31
(2014)
11. Overall, J., Wise, S.: An s-curve model of the start-up life cycle through the lens
of customer development. J. Private Equity 18(2), 23–34 (2015)
12. Yin, R.K.: Case Study Research, 2nd edn. Sage, Thousand Oaks (1994)
13. Komssi, M., Kauppinen, M., Heiskari, J., Ropponen, M.: Transforming a software
product company into a service business: case study at F-Secure. In: IEEE International Computer Software and Applications Conference, pp. 61–66 (2009)
14. Komssi, M., Kauppinen, M., Ropponen, M., Palomäki, P.: Transformations of
a solution strategy: a case study. In: Regnell, B., Weerd, I., Troyer, O. (eds.)
ICSOB 2011. LNBIP, vol. 80, pp. 140–153. Springer, Heidelberg (2011). doi:10.
1007/978-3-642-21544-5 12
15. Koen, P.A.: The fuzzy front end for incremental, platform and breakthrough products and services. In: PDMA Handbook, pp. 81–91 (2004)
16. York, J.L., Danes, J.E.: Customer development, innovation, and decision-making
biases in the lean startup. J. Small Bus. Strategy 24(2), 21 (2014)
17. Hass, B.H.: Intrapreneurship and corporate venturing in the media business: a
theoretical framework and examples from the german publishing industry. J. Media
Bus. Stud. 8(1), 47–68 (2011)
18. Pinchot, G.: Introducing the ‘intrapreneur’: successful innovators in large companies sometimes function as in-house entrepreneurs, running projects as independent
innovators would. IEEE Spectr. 22(4), 74–79 (1985)
19. Edison, H., Wang, X., Abrahamsson, P.: Lean startup: why large software companies should care. In: Scientiﬁc Workshop Proceedings of the XP2015 (2015)
20. Parker, S.C.: Intrapreneurship or entrepreneurship? J. Bus. Ventur. 26(1), 19–34
(2011)
21. Mazur, M.: Creating m&a opportunities through corporate spin-oﬀs. J. Appl. Corp.
Finan. 27(3), 137–143 (2015)
22. Chemmanur, T.J., Yan, A.: A theory of corporate spin-oﬀs. J. Finan. Econ. 72(2),
259–290 (2004)
23. Zenner, M., Junek, E., Chivukula, R.: Shrinking to grow: evolving trends in corporate spin-oﬀs. J. Appl. Corp. Finan. 27(3), 131–136 (2015)

On the Distinction of Functional and Quality
Requirements in Practice
Jonas Eckhardt1(B) , Andreas Vogelsang2 , and Daniel Méndez Fernández1
1

Technical University of Munich, Munich, Germany
{eckharjo,mendezfe}@in.tum.de
2
Technische Universität Berlin, Berlin, Germany
andreas.vogelsang@tu-berlin.de

Abstract. Requirements are often divided into functional requirements
(FRs) and quality requirements (QRs). However, we still have little
knowledge about to which extent this distinction makes sense from a
practical perspective. In this paper, we report on a survey we conducted
with 103 practitioners to explore whether and, if so, why they handle
requirements labeled as FRs diﬀerently from those labeled as QRs. We
additionally asked for consequences of this distinction w.r.t. the development process. Our results indicate that the development process for
requirements of the two classes strongly diﬀers (e.g., in testing). We
identiﬁed a number of reasons why practitioners do (or do not) distinguish between QRs and FRs in their documentation and we analyzed
both problems and beneﬁts that arise from that. We found, for instance,
that many reasons are based on expectations rather than on evidence.
Those expectations are, in fact, not reﬂected in speciﬁc negative or positive consequences per se. It therefore seems more important that the
decision whether to make an explicit distinction or not should be made
consciously such that people are also aware of the risks that this distinction bears so that they may take appropriate countermeasures.
Keywords: Quality requirements

1

· Functional requirements · Survey

Introduction

In literature (e.g., [9,13–15,18]), requirements are often categorized in functional
requirements (FRs), quality requirements (QRs), and constraints. FRs are characterized as “things the product must do” contrasting QRs as “qualities the
product must have” and constraints as “organizational or technological requirement”. Although this categorization is common sense to some degree, there are
still debates about the precision of the categories (e.g., [7]). There are other academic groups that suggest to rather distinguish between behavior (e.g., response
times) and representation (e.g., programming languages) [3].
In a previously conducted study [6], we analyzed 11 requirements speciﬁcations from industrial environments with a particular focus on requirements
labeled as “quality”. We found out that (i) there is a distinction between QRs
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 31–47, 2016.
DOI: 10.1007/978-3-319-49094-6 3

32

J. Eckhardt et al.

and FRs in the documentations, and that (ii) many requirements labeled as
QR actually describe system behavior and, thus, could also be labeled as FR.
However, our previous investigation focused on analyzing artifacts after the fact
and we still have little knowledge about what diﬀerence it makes in a development process if a requirement is labeled as FR or as QR and what the resulting
consequences are. In response to this question, we conducted a survey with 103
practitioners which we report in this paper.
In particular, we contribute: (i) a quantiﬁcation of company practices regarding the style of documenting functional and quality requirements, (ii) a list of
reasons why practitioner do or do not document FRs and QRs separately, (iii)
a list of consequences for the two styles of documentation that helps engineers
to make conscious decisions.

2

Research Objective

The goal of this study is to understand whether practitioners consider productrelated requirements labeled as FR diﬀerently from those labeled as QR. We
are further interested in the reasons for this distinction and the resulting consequences for the development process. We derive the following research questions:
RQ1: Do practitioners handle FRs and QRs diﬀerently? In this RQ,
we want to analyze whether QRs are documented in practice, whether there
is a distinction in the documentation, and whether this distinction makes a
diﬀerence in the development process. To this end, we formulate the following
sub-RQs:
RQ1.1 Do practitioners diﬀerentiate between QRs and FRs in the
documentation? We want to know whether the accepted categorization
of product-related requirements as FRs or QRs is reﬂected in the style of
documentation as used in practice.
RQ1.2 To what extent do development activities for QRs diﬀer from
activities for FRs? A possible consequence of a requirement categorization is that diﬀerent categories of requirements are handle diﬀerently in
the development process. We want to investigate whether this is the case
in practice and how this is inﬂuenced by the style of documentation.
RQ2: What are reasons for distinguishing or not distinguishing
between QRs and FRs in the documentation? While categorizations
only provide deﬁnitions, we are interested in the underlying reasons that lead
practitioners to distinguish or not distinguish between QRs and FRs in the
documentation.
RQ3: What are positive and negative consequences of distinguishing
or not distinguishing QRs and FRs in the documentation? A decision for or against a separate documentation may have positive or negative
consequences that practitioners should be aware of.

On the Distinction of Functional and Quality Requirements in Practice

3

33

Research Methodology

Our goal was to reach out to a broad spectrum of practitioners and capture their
perceptions of their own project environments. To this end, we used (online)
survey research as our main vehicle. We intentionally designed the survey such
that respondents required as little eﬀort as possible to complete it; we kept
the number of questions at a minimum, the instrument was self-contained and
it included all relevant information. We further limited the response types to
numerical, Likert-scale, and short free form answers as suggested by Kitchenham
and Pﬂeeger [10]. As a validation of our instrument and its alignment with the
audience, we piloted the survey with three practitioners, who completed the
survey and afterwards participated in an interview, where questions and answers
where checked for misunderstandings.
3.1

Subject Selection

We deliberately targeted practitioners who work with requirements. This
includes practitioners who write requirements (e.g., requirements engineers)
but also practitioners whose work is based on requirements (e.g., developers or
testers), and also practitioners who manage projects or requirements. Our survey
was further conducted anonymously. Since we were not able to exactly control
who is answering the survey, it was especially important to follow Kitchenham
and Pﬂeeger’s [10] advice on the need to understand whether the respondents
had enough knowledge to answer the questions in an appropriate manner. For
this, we excluded data from respondents who answered that they do not use
requirements speciﬁcations at all, or respondents who stated that they did not
know how requirements are handled in their company. We ﬁnally oﬀered respondents the chance to leave an email address if they were interested in the results
of the survey.
3.2

Data Collection and Instrument

We started our data collection on February 4th, 2016 and closed the survey on
February 22nd, 2016. For inviting practitioners to participate, we did not select a
speciﬁc closed group of practitioners but, instead, contacted as many practitioners as possible via the authors’ personal contacts from previous collaborations,
via public mailing lists such as RE-online, and via social networks. In the following, we introduce the main elements of our instrument used. The full instrument
can be taken from our online material1 .
Demographics: We collected a set of demographic data from the respondents
to interpret and triangulate the data with respect to diﬀerent contexts of the
respondents. The demographic data included the role of the participant, the
experience, the company’s size, the typical project size, the geographical distribution of project members, the paradigm of their applied development process
1

http://www4.in.tum.de/∼eckharjo/SurveyResults.zip.

34

J. Eckhardt et al.
All
Documenting QRs
Distinction btw.
QRs and FRs

Not documenting
QRs

No distinction

Fig. 1. Categorization of respondents by their style of documenting QRs.

(on a scale from agile to plan-driven), the industrial sector, the type of developed
systems, and the role of the requirements speciﬁcation within the company. To
better understand the participant’s focus and project context, we additionally
asked respondents for the importance of diﬀerent types of QRs in their projects.
The respondents were asked to assess the importance of quality factors2 taken
from ISO/IEC 25010 [8] for their typical projects on a 5-point Likert scale.
Practices of Handling QRs: As a ﬁrst step towards comparing diﬀerent practices for handling QRs, we asked the respondents how strongly development
activities diﬀer between QRs and FRs in the phases requirements engineering,
architecture/design, implementation, and testing. As a follow up, we provided a
free form text ﬁeld and asked the respondents to explain the diﬀerences in detail.
We were especially interested in the question whether it makes a diﬀerence
for the development process if project participants distinguish between QRs and
FRs and how this distinction is documented. Therefore, we asked the respondents two conditional questions. First, we asked whether QRs are explicitly documented in their projects. If this was the case, we asked whether the respondents
explicitly distinguish between QRs and FRs in the documentation, i.e. whether
they are labeled diﬀerently (e.g., some requirements are labeled as performance
or maintainability) or documented in diﬀerent sections (e.g., special sections for
performance or maintainability). The answers to these questions categorize the
responses into three groups (see also Fig. 1).
Problems/Beneﬁts of Current Practices: Given the categorization into the
three groups, we asked our respondents for speciﬁc reasons why they do or do
not distinguish between QRs and FRs. Additionally, we asked for beneﬁts and
problems that arise from the way they consider QRs (i.e., not documenting QRs,
mixing QRs and FRs in the documentation, or distinguishing between QRs and
FRs in the documentation). For these questions, we provided free form text ﬁelds
to be ﬁlled out by the respondents.
3.3

Data Analysis

Our data analysis constitutes a mix of descriptive statistics and qualitative text
analysis. To answer RQ1, we analyzed in particular the answers that the respon2

These were functional suitability, performance/eﬃciency, compatibility, usability,
reliability, security, maintainability, and portability.

On the Distinction of Functional and Quality Requirements in Practice

35

dents provided for the following survey questions: (i) Are QRs documented in
your typical projects, (ii) In the documentation (e.g., in a requirements speciﬁcation), do you distinguish between QRs and FRs, (iii) Considering the following
phases, how much do the activities for handling QRs diﬀer from those for FRs,
and (iv) Considering your work, for what activities does it make a diﬀerence if
you consider an QRs vs. an FR. For RQ1.1 and RQ1.2 we analyzed the results of
the ﬁrst, second, and third question, respectively. As the answers for the fourth
question are open, we analyzed the answers in detail to provide more insights in
the activities and the diﬀerences.
To answer RQ2 and RQ3, we analyzed the data our respondents provided for
the following survey questions: (i) Are there speciﬁc reasons why you do (or do
not) distinguish between QRs and FRs in the documentation, (ii) Do you experience negative consequences in your current work that result from distinguishing
(not distinguishing) between QRs and FRs in the documentation, and (iii) Do
you experience positive consequences in your current work that result from distinguishing (not distinguishing) between QRs and FRs in the documentation. The
answers to the questions are free text answers. To analyze the results, we coded
the provided answers in pairs of researchers to assemble a conceptual model of
reasons and consequences for distinguishing between QRs and FRs in practice.
The qualitative coding technique was chosen as recommended by (Straussian)
Grounded Theory [16], but diﬀers in that the central categories were previously
deﬁned following our research questions. To visualize our results from the text
analysis, we used cause-eﬀect diagrams (also known as Ishikawa diagrams).

4
4.1

Study Results
Sample Characterization

In total, 283 people clicked on the link to our survey, 172 started the survey
(61 %), and 109 completed it (39 %). From these 109 respondents, we excluded 6
as they matched our exclusion criteria. The respondents seem quite experienced
as 93 % stated that they have more than 3 years of experience with requirements, 5 % one to three years, and only 2 % with less than a year. Furthermore,
a majority of the respondents work in large companies: 57 % work in companies
with more than 2000 employees, 25 % in companies with 250–2000 employees,
and 17 % in companies with less than 250 employees. However, typical projects
of the respondents showed a variety of small to large projects: 24 % stated that
in a usual project in their company up to 10 people are involved, 46 % that
11–50 people are involved, 24 % that more than 50 people are involved, and
6 % did not know. Most of the respondents (59 %) answered that their team
is distributed over multiple locations in more than one country, 23 % that the
team is distributed over multiple locations but in one country, and 17 % that
all team members are in one location. The employed process paradigm is balanced between agile and plan-driven: 41 % of the respondents answered that
their development process is rather agile, 21 % that it is rather plan-driven, 37 %
that it is mixed, and 1 % did not know. The type of systems the respondents

36

J. Eckhardt et al.

develop is quite balanced (except for consumer software): 24 % develop embedded systems, 37 % business information systems, 5 % consumer software, and
34 % hybrid systems. Most of the respondents use requirements speciﬁcations
for in-house development (57 %), 23 % create them and an external company is
responsible for the development, and 19 % are subcontractors using requirements
speciﬁcations (e.g., as basis for development or testing).
4.2

RQ1: Handling of QRs in Practice

RQ1.1: Do practitioners diﬀerentiate between QRs and FRs in the
documentation? 88 % of the respondents answered that they document QRs
in their projects, while 12 % answered that they do not document QRs at all.
We contextualized this distribution w.r.t. the process paradigm the respondents
use in their projects. Figure 2a shows that all respondents with a plan-driven
process document QRs, while in agile processes only 77 % document QRs.
100%

100%

75%

75%

50%

50%

25%

25%

0%

0%

Agile

Mixed

Document QRs

Plan-driven
No QRs

(a) Do you document QRs?

Agile

Mixed

Distinction

Plan-driven
No distinction

(b) Do you distinguish btw. QR & FR?

Fig. 2. Relation between process paradigm and the style of documenting QRs.

From the respondents who document QRs (91 in total), 85 % answered that
they distinguish between QRs and FRs in the documentation and 15 % answered
that they do not. We also contextualized this distribution w.r.t. the process
paradigm. Figure 2b shows that a higher percentage of the respondents in agile
processes distinguish between QRs and FRs compared with respondents in plandriven processes. As a second contextualization, we analyzed the importance
of quality factors w.r.t. the style of documentation. Figure 3 shows how the
respondents ranked the importance of diﬀerent quality factors for their daily
work on a ﬁve point Likert scale. Reliability and Performance/Eﬃciency, for
example, stand out as they are considered more important by participants who
do not distinguish between QRs and FRs.
RQ1.2: To what extent do development activities for QRs diﬀer from
activities for FRs? Figure 4 shows how the respondents ranked the diﬀerence in
the phases requirements engineering, architecture/design, implementation, and
testing on a three point Likert scale. As a contextualization, we analyzed whether

On the Distinction of Functional and Quality Requirements in Practice
Distinction
Portability
Compatibility
Maintainability
Usability
Reliability
Perf./Eff.
Security
Funct. Suit.

9%

0%
Very Important

No distinction

32%
27% 10%
19%
22%
32%
34% 9%
29% 8%
23%
39%
30%
40%
23%
39%
43% 16%
40%
48% 12%
47%
38% 13%
35%6%
53%
25%
Important

50%

37

75%

Moder. Important

100% 0%

21%
21% 7% 14%
36%
36%
36% 14% 14%
36% 14%
29%
21%
29% 7%
29%
36%
29% 7%
64%
64% 14% 14% 7%
57%
29% 14%
57%
29% 7%7%
50%
100%
75%
25%

Slightly Important

Not Important

Don't Know

Fig. 3. Relation btw. importance of quality attributes and style of documentation.

there is a diﬀerence in how respondents rank the diﬀerence in the development
phases w.r.t. whether they do or do not distinguish between QRs and FRs. The
ﬁgure shows that the phase architecture/design was reported to diﬀer stronger
by respondents who distinguish between QRs and FRs.

Distinction

RE
Impl
Arch/Design
Testing

23%
27%
48%
52%
0%

25%

No distinction

47%
26%
43% 13% 17%
31% 8%13%
31%12%

50%

Differs strongly

75%

14%

100% 0%

Differs slightly

43% 7%
43% 14%7%
43%
21% 14%
57%
21%
21%

36%
36%

21%
25%

50%

Does not differ at all

75%

100%

Don't Know

Fig. 4. Relation between process diﬀerences and style of documentation.

To further detail this response, Table 1 shows exemplary statements that
respondents gave explaining the diﬀerences in the development activities.
According to the answers, there is a diﬀerent maturity of the processes for treating FRs vs. QRs (see Statement A). Furthermore, when it comes to project
planning, FRs are planned in detail but QRs are considered in an unplanned
way and only documented on a high-level (see Statement B). In testing, there
are approaches for deriving test cases from FRs but none for deriving them from
QRs (see Statement C). Moreover, diﬀerent stakeholders are involved in testing
QRs vs. FRs (see Statement D). In architecture and design, QRs need to be
considered early in the project as they have a high impact on the architecture.
In contrast to this, it is suﬃcient to consider FRs at an abstract level in early
stages (see Statement E). In the implementation, QRs need to be monitored
continuously, whereas FRs can be implemented successively (see Statement F).
In requirements engineering, FRs are more ﬁxed than QRs as QRs can be negotiated with the customer while FRs usually cannot (see Statement G).

38

4.3

J. Eckhardt et al.

RQ2: Reasons for Distinguishing QRs and FRs

Figures 5 and 6 show the cause-eﬀect diagrams for the reasons for and consequences of (not) distinguishing between QRs and FRs in practice. On the lefthand side of the diagrams, the mentioned reasons for distinguishing (Fig. 5) or
not distinguishing (Fig. 6) between QRs and FRs are indicated. On the righthand side of the diagrams, the mentioned consequences of the decision are shown.
The upper part contains the positive consequences while the lower part contains
the negative consequences. The diﬀerent entries of the diagrams (e.g., QRs have
diﬀerent nature in Fig. 5) correspond to codes that we identiﬁed in the data and
their number of occurrences. Furthermore, we structured the codes in categories
that are represented by the arcs in the diagram.
Reasons for Distinguishing QRs and FRs: The left-hand side of Fig. 5
shows the resulting reasons for distinguishing between QRs and FRs. In total,
49 out of the 77 respondents (64 %) that distinguish between QRs and FRs provided an answer to this open question. We identiﬁed 24 codes in the answers
for this question. For clarity, we only show codes that occurred at least twice in
Table 1. Exemplary answers about diﬀerences in the development process.
# Phase

Answer

A. General “[QRs] are usually treated less transparent: not clearly documented, not
explicitly tested, but somehow considered in RE, design and coding as
common sense background, e.g., in terms of [QRs] considering IT
security, performance or reliability.”
B. General “FRs are documented and planned in high detail [...] Working on
[QRs] are often unplanned activities and only high level documented.”
C. Test

“Test cases for FR[s] can quite easily [be] derived from functional
models or textual requirements [... but there is no] method for deriving
test cases from [QRs].”

D. Test

“Test planning, preparation and execution for [QRs] are handled by
diﬀerent stakeholders ([QRs] are [. . . ] strongly architecture related) and
personnel (performance and load tests are performed by specialists
usually not part of the project team).”

E. Arch.

“[QRs] are often architectural drivers and therefore have to be
evaluated and considered very early in the project when deﬁning the
architecture. Whereas in an early stage of the project a more abstract
view on the functional requirements is suﬃcient.”

F. Impl.

“[QRs] require continuous monitoring, as achievements (e.g.,
performance) may degrade during implementation.”

G. RE

“[In contrast to FRs,] [QRs] can be negotiated, if they are technically
not reachable.”

Fig. 5. Reasons for and consequences of distinguishing between QRs and FRs (Condensed version containing codes that occurred at least
twice. The comprehensive diagram containing all codes is available at http://www4.in.tum.de/∼eckharjo/DistinctionFishbone.pdf). The
left-hand side shows the mentioned reasons and the right-hand side the mentioned consequences. The upper part of the right-hand side
contains the positive consequences while the lower part contains the negative consequences.

On the Distinction of Functional and Quality Requirements in Practice
39

Fig. 6. Reasons for and consequences of not distinguishing between QRs and FRs. The left-hand side shows the mentioned reasons and
the right-hand side the mentioned consequences. The upper part of the right-hand side contains the positive consequences while the lower
part contains the negative consequences.

40
J. Eckhardt et al.

On the Distinction of Functional and Quality Requirements in Practice

41

Fig. 5.3 Reasons that we coded as QRs have diﬀerent nature, Company Practice,
and QRs are cross-functional occur frequently in the category General & Project
Organization. Furthermore, in the category Design & Implementation the reason Inﬂuence the architecture and in the category Validation & Veriﬁcation the
reason QRs require diﬀerent veriﬁcation methods also occur often.
Reasons for Not Distinguishing QRs and FRs: The left-hand side of Fig. 6
shows the mentioned reasons for not distinguishing between QRs and FRs. In
total, 7 out of the 14 respondents (50 %) who do not distinguish between QRs
and FRs provided an answer to this open question. We identiﬁed 8 codes in the
answers for this question. Figure 5 shows all identiﬁed codes, which all occurred
only once in the data (except for There is no diﬀerence).
4.4

RQ3: Beneﬁts and Problems

Beneﬁts and Problems of Distinguishing QRs and FRs: The right-hand
side of Fig. 5 shows the consequences of distinguishing between QRs and FRs.
The upper part shows the positive consequences while the lower part shows negative consequences. In total, 45 out of the 77 respondents (58 %) that distinguish
between QRs and FRs provided answers to the open question about positive consequences. Regarding negative consequences, 16 out of the 77 respondents (21 %)
provided answers. We identiﬁed 35 codes in the answers for positive consequences
and 13 in the answers for negative consequences. As shown in the diagram, the
code that we identiﬁed most in the mentioned beneﬁts is Find information in one
place in the category General & Project Organization. In this category, there are
also other beneﬁts that occurred frequently (e.g., structuredness of the process,
completeness of the requirements, separation of concerns, and increasing the
awareness of QRs). We coded the beneﬁt Increased awareness of QRs also three
times in the category implementation. For validation and veriﬁcation, the most
frequent beneﬁts are Focused Tests and Explicit QRs Tests. The code that we
identiﬁed most in the mentioned problems is Traceability becomes expensive. Further problems that were mentioned are that QRs are neglected or forgotten, that
the distinction between QRs and FRs is unclear and that the distinction results
in a weak user acceptance. Moreover, in the category Validation & Veriﬁcation,
the problem Missing testability was mentioned.
Beneﬁts and Problems of Not Distinguishing QRs and FRs: The righthand side of Fig. 6 shows the consequences of not distinguishing between QRs
and FRs. The upper part shows the positive consequences while the lower part
shows negative consequences. In total, 9 out of the 14 respondents (64 %) that
distinguish between QRs and FRs provided answers to the open question about
positive consequences. Regarding negative consequences, 5 out of the 14 respondents (36 %) provided answers. We identiﬁed 7 codes in the answers for positive
consequences and 6 in the answers for negative consequences.
3

The complete diagram including all codes is available at http://www4.in.tum.de/
∼eckharjo/DistinctionFishbone.pdf.

42

5

J. Eckhardt et al.

Discussion

From the results presented in the previous section, we conclude that practitioners are split into two groups; one advocating a distinction between QRs and
FRs and one advising against it. Interestingly, the respondents stated contrary
reasons as arguments for or against a distinction (e.g., “Both are requirements”
vs. “We distinguish them because they are diﬀerent”). Similarly, we found the
same beneﬁts stated by respondents of both parties: “If you distinguish, then
QRs are considered better” vs. “As soon as QRs are treated equally to FRs it is a
clear win-win situation such that QRs get the same attention.” Additionally, our
results indicate that it is not clear to practitioners what the diﬀerence between
both classes of requirements actually is, even though they stated reasons, beneﬁts, and problems of a distinction: “Most people have problems to distinguish
between them, so they mix” or “[Not distinguishing] avoids unnecessary confusion at the requirements authors’ side. Adding the distinction QR/FR would
require additional training, QS, etc. without adding value to the projects”. Some
respondents see this as a reasons why they do not distinguish between them:
“[. . . ]There is just no real guideline how to do it”.
The most prevalent reasons for distinguishing between QRs and FRs are in
line with those that are often found in literature (e.g., QRs have a diﬀerent nature
and are cross-functional, inﬂuence on architecture, require diﬀerent veriﬁcation
methods). However, we cannot underpin any of those reasons with negative consequences in the cases where QRs and FRs were not distinguished. Therefore,
we conclude that there seems to be confusion about this topic in practice and
handling QRs seems to be driven by expectations rather than by evidence.
In the following, we will detail and discuss some conﬂicting or even contradictory statements. We believe that these are topics that need to be investigated
further in the future, or, in case of a clear scientiﬁc position about a topic, we
need to invest more into the dissemination of the results into practice.
QR Testing – A Double-edged Sword: One of the top reasons mentioned
for distinguishing QRs and FRs was the need for diﬀerent veriﬁcation methods
(especially w.r.t. testing). Figure 4 also shows that testing is the activity that
diﬀers most for QRs and FRs. When considering consequences of distinguishing
between QRs and FRs in testing, we found both positive and negative. While
some respondents said that a distinction leads to more focused and specialized
tests for speciﬁc QRs, some also stated that a distinction leads to the fact that
some QRs are not tested at all. For example, “Performance tests are recognized
as [a] key success factor by project managers” vs. “Main issue is how to handle
the [QR] tests before product release”. On the other hand, respondents who do
not distinguish between QRs and FRs also reported positive and negative consequences regarding testing: “[. . . ]the mapping [of FRs to QRs] should ensure
that this testing also covers [QRs]” vs. “[When not distinguishing,] corresponding
V&V suﬀers”. We conclude from this that distinguishing QRs and FRs supports
the awareness for specialized tests of important QRs but, simultaneously, bears
the risk of neglecting tests for less important QRs.

On the Distinction of Functional and Quality Requirements in Practice

43

Company Practice – Never Change a Running Game: Another commonly stated reason for distinguishing between QRs and FRs is that this is
common practice in the company or that this is required by customers. However,
these reasons were almost never questioned or justiﬁed. For example, “[. . . ]Our
speciﬁcation template prescribes a structuring w.r.t. [QRs] and FRs” or “[we distinguish] as requested by the customer”. Additionally, the respondents did not
mention any positive or negative consequences that result from complying with
customer constraints. We consider this as a sign of inadvertent handling of this
topic. It would be interesting to ask customers to explicitly state reasons why
they request a distinction of QRs and FRs.
QRs – Drivers for the Architecture: Several respondents stated that the
architecture of a system is speciﬁcally inﬂuenced by QRs. For example, “[QRs]
are often architectural drivers and therefore have to be evaluated and considered
very early in the project when deﬁning the architecture”. This was often used as
an argument to distinguish between QRs and FRs: “The separation allows architects to get a quick (and in-depth) understanding of the QRs without needing to
know all the functional requirements”. FRs, on the contrary, were considered to
be more local and do not need to be ﬁxed at the beginning of the project: “[It
is] easier to ﬁnd[. . . ]special FRs for developing a single use case” or “[. . . ]in
an early stage of the project a more abstract view on the functional requirements is suﬃcient”. Surprisingly, some respondents stated that it has a positive
impact for the implementation when QRs and FRs are not strictly distinguished:
“[QRs] and FRs are handled as features. They are not separated, which avoids
the redesigns e.g., due to performance problems” and “[When not distinguishing,] we have much more freedom during the implementation iterations[. . . ]to
ﬁnd solutions that ﬁt the customers’ expectations and the possibilities that come
with the architecture and technology we use”.
Awareness Matters: It seems that an increased awareness for QRs was considered as one of the most prominent beneﬁts. Both parties claimed this as a
beneﬁt of distinguishing respectively not distinguishing between QRs and FRs:
“[Distinction] ensures that [QRs] are also in the focus” vs. “[Not distinguishing]
helps keeping the team aware that the device does not only need to have certain
features, but that these features also need to work e.g., at a high temperature”.
It seems that awareness can be increased with both strategies. The crucial point
seems to be that there is a clear and explicit relation between FRs and QRs,
which leads to the following observation.
Tracing – The Good, the Bad, and the Ugly: One trade-oﬀ that we found
in the data is an inherent challenge that does not seem to be resolved in practice.
Some respondents stated that a distinction between QRs and FRs is beneﬁcial
because it keeps associated information in one place and, thus, supports diﬀerent viewpoints on the requirements: “People who are particularly concerned with
QRs, such as architects and performance testers, ﬁnd relevant information in
one place” and “As most [QRs] apply across components, they are more easily retrieved in a separate speciﬁcation”. However, this beneﬁt also comes with

44

J. Eckhardt et al.

clear disadvantages considering tracing and the risk of forgetting requirements:
“Consistent documentation of relationships between FRs and [QRs] is diﬃcult”
and “The development team needs to be fully aware about all sources for requirements. Ostrich strategy causes a high yield of trouble”. Respondents who do
not distinguish reported on beneﬁts regarding the cohesiveness of their speciﬁcations: “Some documents beneﬁt from this, as they turn more cohesive” or
“[. . . ]the feature is really ready if installed and not only 80 %”.

6

Limitations and Threats to Validity

We now discuss the threats to validity and mitigation measures we applied.
Participant Selection: One limitation in the study is the missing lack of control over the respondents given that we distributed the survey invitation over
various networks. Apart from an unknown response rate, this means that we
cannot control how representative the responses are. We removed those respondents from the population that stated that they do not deal with requirements.
Also, although the introductory texts explicitly stated that the survey is aimed
at addressing practitioners perspective, we cannot guarantee that all the views
taken really result from practitioners.
Survey Research: Further threats to the validity result from the nature of
survey research. We cannot control on which basis the respondents provide their
answers, the respondents might be biased, and there exists the possibility that
they have misinterpreted some of the questions or even the concept of QR/NFR.
We reduced the ﬁrst threat by asking questions to characterize the context of the
respondents. We cannot mitigate the second threat, but reduced it by conducting
the survey anonymously. We minimized the third threat by conducting a pilot
phase in which we tested the instrument used and the data analysis techniques
applied.
Subjectivity of Coding: A further major threat to validity, however, arises
from the data analysis, i.e., the coding process, because coding is a creative
task. Subjective views of the coders, such as experiences and expectations, might
have inﬂuenced the way we coded the free text statements. A threat arises from
the fact that we cannot validate our results with the respondents given the
anonymous nature of our survey. We minimized this threat by coding in pairs
(researcher triangulation).
Representativeness of the Codes: Finally, one limitation stems from the
result set itself and its expressiveness. Our focus was to collect and code practitioners experiences on how they consider QRs. We quantiﬁed the results to get
an overview of whether certain codes dominate others. However, a potentially
high frequency of codes does still not allow for conclusions on the criticality of
those codes. In particular, the fact that we got more answers about reason for
and consequences of a distinction between QRs and FR than for no distinction
might have distorted our interpretation of the results.

On the Distinction of Functional and Quality Requirements in Practice

7

45

Related Work

The literature on categorizations of requirements is very extensive. Major contributions address categorizing non-functional requirements (e.g., [5,7,13]), of
which most rely on quality (deﬁnition) models (a detailed discussion can be
found in [6]). Pohl [13], for instance, discusses the misleading use of the term
“non-functional” and argues to use “quality requirements” for product-related
NFRs that are not constraints. Glinz [7] performs a comprehensive review on
the existing deﬁnitions of NFRs, analyzes problems with these deﬁnitions, and
proposes a deﬁnition on his own. Mairiza et al. [11] perform a literature review
on QRs, investigating the notion of QRs in the software engineering literature
to increase the understanding of this complex and multifaceted phenomenon.
They found 114 diﬀerent QR classes. Contributions such as those have fostered
valuable discussions on the fuzzy terminology used and the concepts applied, but
they did not focus on the implications of these categorizations on development
processes in practice.
Chung and Nixon [4] investigate how practitioners handle QRs. They argue
that QRs are often retroﬁtted in the development process or pursued in parallel
with, but separately from, functional design and that an ad hoc development
process often makes it hard to detect defects early. They perform three experimental studies on how well a given framework [12] can be used to systematically
deal with QRs. Svensson et al. [17] perform an interview study on how QRs are
used in practice. Based on their interviews, they found that there is no QRspeciﬁc elicitation, documentation, and analysis, that QRs are often not quantiﬁed and, thus, diﬃcult to test, and that there is only an implicit management
of QRs with little or no consequence analysis. Furthermore, they found that at
the project level, QRs are not taken into consideration during product planning
(and are thereby not included as hard requirements in the projects) and they
conclude that the realization of QRs is a reactive rather than proactive eﬀort.
Borg et al. [2] analyze via interviews how QRs are handled in two Swedish
software development organizations. They found that QRs are diﬃcult to elicit
because of a focus on FRs, they are often described vaguely, are often not suﬃciently considered and prioritized, and they are sometimes even ignored. Furthermore, they state that most types of QRs are diﬃcult to test properly due to their
nature, and when expressed in non-measurable terms, testing is time-consuming
or even impossible. Ameller et al. [1] perform an empirical study based on interviews around the question How do software architects deal with QRs in practice?
They found that QRs were often not documented, and even when documented,
the documentation was not always precise and usually became desynchronized.
In all of the investigations, FRs and QRs are treated separately, and the
investigations take an observational perspective on how practitioners deal with
QRs in that context. The goal of our study is to analyze whether practitioners
handle FRs and QRs diﬀerently, which reasons motivate the way they consider
QRs, and what consequences this has on the development process.

46

8

J. Eckhardt et al.

Conclusions

In this paper, we reported on a survey conducted with 103 practitioners to
explore whether and, if so, why they handle requirements labeled as “functional”
diﬀerently from those labeled as “quality” as well as to disclose resulting consequences for the development process. Our results indicate that practitioners
document QRs and most of them do make an explicit distinction between QRs
and FRs in the documentation. Furthermore, our data suggests that the development process strongly diﬀers depending on a distinction between QRs and
FRs, especially in interconnected activities such as testing. The rationale of
practitioners is that QRs are diﬀerent to FRs, i.e. they are of diﬀerent nature,
are cross-functional, strongly inﬂuence the architecture, and require diﬀerent
veriﬁcation methods. In our previous study [6], we found, however, that many
requirements labeled as “quality” might as well be categorized as “functional”
and prior to the study presented here, we had the simple speculation that if a
blurry distinction determines how the following development activities are performed, we should ﬁnd problems that arise because the activities do not really
ﬁt the corresponding requirements. Still, our results indicate that the question
whether to make a distinction or not is without a direct linkage to negative or
positive consequences per se. Therefore, we argue that the decision whether to
make an explicit distinction should be made consciously such that people are
aware of the risks that this distinction bears so that they may take countermeasures.
Acknowledgements. We would like to thank M. Broy, K. Beckers, J. Mund, S.
Smith-Eckhardt, and M. Glinz for their helpful comments and suggestions.

References
1. Ameller, D., Ayala, C., Cabot, J., Franch, X.: How do software architects consider
non-functional requirements: an exploratory study. In: 20th IEEE International
Requirements Engineering Conference (RE) (2012)
2. Borg, A., Yong, A., Carlshamre, P., Sandahl, K.: The bad conscience of requirements engineering: an investigation in real-world treatment of non-functional
requirements. In: 3rd Conference on Software Engineering Research and Practice
in Sweden (SERPS) (2003)
3. Broy, M.: Rethinking nonfunctional software requirements: a novel approach categorizing system and software requirements. In: Software Technology: 10 Years of
Innovation in IEEE Computer. John Wiley & Sons (2016)
4. Chung, L., Nixon, B.A.: Dealing with non-functional requirements: three experimental studies of a process-oriented approach. In: 17th International Conference
on Software Engineering (ICSE) (1995)
5. Chung, L., do Prado Leite, J.C.S.: On non-functional requirements in software
engineering. In: Borgida, A.T., Chaudhri, V.K., Giorgini, P., Yu, E.S. (eds.) Conceptual Modeling: Foundations and Applications. LNCS, vol. 5600, pp. 363–379.
Springer, Heidelberg (2009). doi:10.1007/978-3-642-02463-4 19

On the Distinction of Functional and Quality Requirements in Practice

47

6. Eckhardt, J., Vogelsang, A., Méndez Fernández, D.: Are non-functional requirements really non-functional? An investigation of non-functional requirements in
practice. In: 38th International Conference on Software Engineering (ICSE) (2016)
7. Glinz, M.: On non-functional requirements. In: 15th IEEE International Requirements Engineering Conference (RE) (2007)
8. ISO/IEC: Systems and software quality requirements and evaluation (SQuaRE).
ISO/IEC 25010, Geneva, Switzerland (2011)
9. ISO/IEC/IEEE: Systems and software engineering – Life cycle processes – Requirements engineering. ISO/IEC/IEEE 29148:2011(E), Geneva, Switzerland (2011)
10. Kitchenham, B.A., Pﬂeeger, S.L.: Personal opinion surveys. In: Guide to Advanced
Empirical Software Engineering. Springer, London (2008)
11. Mairiza, D., Zowghi, D., Nurmuliani, N.: An investigation into the notion of nonfunctional requirements. In: 25th ACM Symposium on Applied Computing (2010)
12. Mylopoulos, J., Chung, L., Nixon, B.: Representing and using nonfunctional
requirements: a process-oriented approach. Trans. Softw. Eng. 18, 483–497 (1992)
13. Pohl, K.: Requirements Engineering: Fundamentals, Principles, and Techniques.
Springer, Heidelberg (2010)
14. Robertson, S., Robertson, J.: Mastering the Requirements Process: Getting
Requirements Right. Addison-Wesley (2012)
15. Sommerville, I., Kotonya, G.: Requirements Engineering: Processes and Techniques. John Wiley & Sons Inc., Hoboken (1998)
16. Stol, K., Raph, P., Fitzgerald, B.: Grounded theory in software engineering
research: a critical review and guidelines. In: 38th International Conference on
Software Engineering (ICSE) (2016)
17. Berntsson Svensson, R., Gorschek, T., Regnell, B.: Quality requirements in practice: an interview study in requirements engineering for embedded systems. In:
Glinz, M., Heymans, P. (eds.) REFSQ 2009. LNCS, vol. 5512, pp. 218–232.
Springer, Heidelberg (2009). doi:10.1007/978-3-642-02050-6 19
18. Van Lamsweerde, A.: Goal-oriented requirements engineering: a guided tour. In:
5th IEEE International Symposium on Requirements Engineering (2001)

A Survey on Software Release Planning Models
David Ameller, Carles Farré(&), Xavier Franch, and Guillem Ruﬁan
Universitat Politècnica de Catalunya, Barcelona, Spain
{dameller,farre,franch}@essi.upc.edu,
guillemon13@gmail.com

Abstract. Software release planning (SRP) is the problem of selecting which
features or requirements will be included in the next release or releases. It is a
crucial step in software development, which happens to be extremely complex
given the need to reconcile multiple decision making criteria, (e.g., business
value, effort and cost), while considering several constraints (e.g., feature
precedencies, resource availability). For this reason, several SRP models have
been proposed in the literature. The objective of this paper is to provide an
updated review of SRP approaches reported in the literature.
Keywords: Software release planning  Next release problem  Resource
optimization  Software evolution  Literature survey  State of the art

1 Introduction
According to Lehman’s First Law of software evolution [6], software must be continually adapted or it becomes progressively less satisfactory to be used in its environment. Software changes are usually not implemented all together but incrementally
[5]. Major enhancements are planned and incorporated, together with other minor
changes, in each new release or upgrade. According to Sommerville [12], the main
activities in an evolution loop are feedback, impact analysis, release planning, system
implementation, and system release.
Software Release Planning (SRP) is the problem of ﬁnding the best combination of
features to implement in a sequence of releases. SRP seeks to maximize business value
and stakeholder satisfaction without neglecting the constraints imposed by the availability of adequate resources and the existence of dependencies between features,
among other constraints [11]. There are several factors that make SRP a computationally complex problem: the number of features and their interdependencies; the
number of stakeholders involved, their different levels of priority, and their conflicting
interests; the variety of variables to be considered (e.g., business value, effort, cost);
and the uncertainty and incompleteness of the available information [10].
Given its importance, many approaches to SRP have been proposed. Svahnberg
et al. [14] presented a comprehensive survey of SRP models formulated until 2008.
These models were analysed under different perspectives (inputs considered, industrial
application, etc.). Since then, other proposals have been formulated.
The goal of this paper is to update the results of the survey [14] by considering
these recent approaches to SRP. For attaining this goal, we have searched, analysed and
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 48–65, 2016.
DOI: 10.1007/978-3-319-49094-6_4

A Survey on Software Release Planning Models

49

discussed the different SRP models that have been proposed in the scientiﬁc literature
related to this question by performing a snowballing-based literature review.
Following the four perspectives for describing a goal in the Goal-Question-Metric
(GQM) approach [2], the goal of our study is deﬁned as follows:
•
•
•
•

Purpose: ﬁnd and characterize
Issue: the proposed models in the academic literature since 2009
Object: for software release planning
Viewpoint: from the viewpoint of project managers and software developers.

The rest of the paper is organized as follows. In Sect. 2, we introduce the research
method followed in the paper, with special attention to the snowballing procedure and
the research questions. In Sect. 3, we extract the results of the surveyed papers in
relation to the research questions. In Sect. 4, we analyse the results and provide the
most relevant observations from it. Finally, in Sect. 5 we present the conclusions and
future work.

2 Research Method
As stated above, we built on top of the knowledge gained in the Systematic Literature
Review (SLR) conducted by Svahnberg et al. [14] that surveyed the SRP approaches
proposed until 2008. We proceeded according to the following steps:
• Our research questions were based on those in [14].
• Given its high citation count and prominent venue of publication, we assumed that
any sound primary study on SRP models published in the period 2011–2015 has
cited [14]. Therefore, we based our search of primary studies on forward snowballing from this paper.
• For the period 2009–2010, we checked the references appearing in the papers found
in the forward snowballing. In addition, this reference analysis (backward snowballing) was used also to check any further relevant work in the period 2011–2015.
• Finally, we conducted an expert consultation step and added a few extra references
that were considered relevant.
The rest of the section presents in detail the research method applied under these
general guidelines.
2.1

Research Questions

In this study, we kept two of the research questions (RQs) from the SLR presented in
[14] while discarding the other two that were more speciﬁc and related to speciﬁc
interests of the authors. We also decided to decompose the two remaining RQs into
sub-RQs to structure the data collection and analysis.
The most fundamental RQ of our study is: What SRP models have been presented
since 2009? (RQ1). We decompose it into four sub-questions:

50

•
•
•
•

D. Ameller et al.

RQ1.1.
RQ1.2.
RQ1.3.
RQ1.4.

What
What
What
What

are
are
are
are

the
the
the
the

main motivations for the models?
inputs processed by the models?
outputs generated by the models?
algorithms or techniques applied by the models?

We also want to know to what extent have the SRP models surveyed in RQ1 been
validated? (RQ2). We have decomposed this RQ into:
• RQ2.1. Are the models supported by tools?
• RQ2.2. How has been industry involved in the models?
• RQ2.3. What are the major threats identiﬁed on the models?

2.2

Selection of Studies

As mentioned above, we combined forward and backward snowballing using [14] as
starting point plus an additional ﬁnal check with experts. Snowballing refers to using
the reference list of a given paper (backward snowballing) or the citations to the paper
(forward snowballing) to identify additional literature [16].
We deﬁned the following inclusion criteria to select the relevant studies:
1. The paper is a full research paper published in any of: JCR-indexed journal
belonging to Q1–Q3 quartiles; proceedings of one of the following main software
engineering or software evolution conferences: ICSE, ESEC/FSE, ESEM, HICSS,
ICSM, ICSME, CSMR; or any JCR-indexed journal, CORE A or B conference/
workshop proceedings if at least one of the authors has an industry afﬁliation. In
addition, we included PROFES because it is the most recurrent venue in the papers
found by [14], together with IEEE Software.
2. The paper describes an SRP model.
The rationale about the restricted selection of venues in the ﬁrst criterion is that the
most relevant studies should be found published in the most renowned journals and
conferences. The reason for which the selection of these venues is relaxed for studies
with authors having industrial afﬁliation is that we want to ﬁnd as many as possible
proposals where a full-scale industrial validation of the proposed SRP model is conducted. We think that, in line with the conclusions and recommendations given in [14],
industrial validation is a key issue when assessing an SRP model.
The process was implemented in three different iterations as depicted in Fig. 1. For
the ﬁrst iteration, we used both Scopus and Google Scholar to ﬁnd the references that
cite directly [14]. This search was performed on 2 October 2015 and we retrieved 56
references from Scopus, while Google Scholar provided us 101. All the references
returned by Scopus, except two, were also in the set provided by Google Scholar. Most
of the 47 Google Scholar references not included in Scopus were excluded later on (see
application of inclusion criteria below), but a few of them were considered. These ones
were references to papers published in major journals as “online ﬁrst”, without being
assigned yet to a concrete journal issue.

A Survey on Software Release Planning Models

51

Fig. 1. Papers selected in each iteration.

From the grand total of 103 studies, 81 studies were excluded by the ﬁrst inclusion
criterion. The titles and abstracts of the remaining 22 studies were read to apply the
second inclusion criterion. In the cases where that information was not sufﬁcient to
make a decision, the full text was also considered [16]. This resulted in the selection of
9 relevant papers listed in Table 1 (see the References section for full details).
Table 1. Papers selected from the forward snowballing iteration.
Ref.
M1
M2
M3
M4

Year
2012
2011
2013
2012

M5

2013

M6

2013

M7

2015

M8
M9

2014
2014

Paper title
A hybrid release planning method and its empirical justiﬁcation
Quantitative release planning in extreme programming
Multi-sprint planning and smooth replanning: An optimization model
Solving the large scale next release problem with a backbone-based
multilevel algorithm
Analyzing an industrial strategic release planning process - a case study at
Roche diagnostics
Continuous release planning in a large-scale scrum development organization
at Ericsson
Software requirements prioritization and selection using linguistic tools and
constraint solvers — a controlled experiment
Industrial evaluation of the impact of quality-driven release planning
Theme-based product release planning: an analytical approach

For the second iteration, we used the 22 papers from the ﬁrst iteration (i.e., the
selected papers before applying the second inclusion criterion) and retrieved all their
references. We retrieved 647 references overall, without taking into account any
inclusion criteria. These references included papers, books, manifestos and programs.
For the second iteration we used the same inclusion criteria, with the exception that
from the 647 references we only considered those that were published after 2008. Then
for the remaining 240 papers we applied the ﬁrst inclusion criterion. Finally, the titles
and abstracts of the 125 remaining papers were read to apply the second inclusion

52

D. Ameller et al.
Table 2. Papers selected from the backward snowballing iteration.

Ref.
M10

Year
2009

M11

2010

M12

2011

M13

2015

M14

2010

M15

2014

Paper title
Software project planning for robustness and completion time in the
presence of uncertainty using multi-objective search based software
engineering
Rigorous support for flexible planning of product release - a stakeholder
centric approach and its initial evaluation
Conceptual scheduling model and optimized release scheduling for agile
environments
Differential evolution with Pareto tournament for the multi-objective next
release problem
An integrated approach for requirement selection and scheduling in software
release planning
Bi-objective genetic search for release planning in support of themes

criterion. This resulted in the selection of 6 relevant papers listed in Table 2 (see the
Reference section for full details).
For the third iteration, we asked for additional references to experts participating in
the SUPERSEDE H2020 European project (www.supersede.eu). We received 5 proposals of papers which were considered relevant in the context of this project. From
these 5 references, we discarded 3 of them: Ruhe’s book because it was not providing
new models but a compilation of knowledge [10], a paper by Durillo et al. [4] (see
justiﬁcation below) and a tool demo report since it was only 4 LNCS pages long [1].
The two remaining papers are listed in Table 3.
As a result, we selected 17 papers from this period starting in 2009 until date. This
number is quite aligned with the total number of papers surveyed in [14], a total of 28
papers found in the period 1997–2008.
Table 3. Papers selected after recommendation in the SUPERSEDE project.
Ref.
M16

Year
2016

M17

2010

Paper title
Risk-aware multi-stakeholder next release planning using multi-objective
optimization
Analytical product release planning

It is worth to mention that the second inclusion criterion left out some papers that
are relevant to the SRP ﬁeld but are not directly related to the goals of this research. We
highlight four categories of papers that have not been included:
• Papers that present some experimentation (through benchmarks) on advanced
algorithms that are applied to SRP (e.g., Durillo et al. [4] performing a sensitivity
analysis of three genetic algorithms or Luna et al. [9] which surveys different
metaheuristics for solving a multi-objective based formulation of the problem).

A Survey on Software Release Planning Models

53

• Papers that assess the quality of SRP techniques or practices (e.g. Lindgren et al. [8]
proposing a capability maturity model for release planning or Didar-al-Alam et al.
[3] identifying release readiness improvement factors).
• Papers that focus on activities that may or must take place during SRP but do not
focus on this particular context (e.g., Lehtola et al. [7] analysing requirements
prioritization).
• Papers that add a small delta to previous publications (e.g., Szõke [15] adding
assignment of features to distributed teams which does not change signiﬁcantly the
model proposed by the same author in another journal paper selected in our study).

2.3

Data Extraction and Analysis

For the data extraction we used a set of variables associated to each research question
and the metadata used for the selection criteria (see Table 4). For the analysis we
extracted data and categories as a basis for our results, and we revised the relevant
papers for each particular topic to provide valuable information.
Table 4. Data extraction criteria
Metadata

Article title
Authors’ name
Journal/conference
Retrieval search query
Date of publication
RQ1 Context What is the main motivation for the model
Models What are the inputs that the model processes
What are the algorithms/techniques used for computation
What are the outputs that the model produces
RQ2 Tool
Tool availability
Study
Study from academia or industry
Description of threats to validity
Adoption Model proposed in literature or in industry
Model validated in academy or in industry
Model adopted in academy or in industry

2.4

Threats to Validity

Construct Validity. The selection of primary studies followed a strict protocol; however, the use of snowballing has some inherent, well-known limitations. The most
important threat is that snowballing narrows the search scope to the referenced papers,
therefore some papers may be left out. We consider that this threat was mitigated by the
fact that we used an SLR published in a main software engineering journal as departing
paper because such SLRs are normally cited by many researchers. A second mitigation
action was to include a third iteration based on experts’ opinion.

54

D. Ameller et al.

Internal Validity. Each paper was analysed in depth by one researcher of this study;
however, it is known that it is easy to have different views or interpretations on the
same paper depending, e.g., on the research background or past experiences in similar
studies. Therefore, papers were checked by a second researcher when doubts arose.
External Validity. As in most literature reviews, this study does not aim to generalize
results because there is no statistical basis to claim that the selected papers are a
representative sample of the population (i.e., all the papers ever published about SRP).
Therefore, any claim made in this study is limited to the set of studied papers.
Moreover, the study only covers the works published in the literature, any model made
available by other means (e.g., commercial tools) has not been considered.
Conclusion Validity. We deﬁned a precise protocol of the steps to be followed,
however as in any other literature review, we relied on the result of search engines
which may offer different results in the future. Therefore, a replication of this study
could lead to different selection of primary studies, and thus to different results.

3 Results
This section summarizes the results of the analysis of the 17 selected SRP references in
relation to the RQs formulated in Sect. 2.2.
3.1

RQ1. What SRP Models Have Been Presented Since 2009?

Eleven [M2–M4, M6, M7, M10, M12–M16] out of the 17 papers that we have
examined propose new models. The 6 remaining models [M1, M5, M8, M9, M11,
M17] are extensions of the EVOLVE II model (being itself an extension of EVOLVE)
and its implementation as a commercial tool, ReleasePlanner. Given this fact, we
analyse separately the two families in the RQ. They are summarized in Tables 5 and 6
whose contents will be detailed in the rest of this subsection. Table 6 focuses on the
customization that the different approaches propose on EVOLVE II.
Table 5. Summary of responses to RQ1: new SRP models
Ref

Motivation

M2

Planning in
agile

M3

Planning in
agile

Input
factors
RD • EC
• SiF •
VF
RD • EC
• VF •
RF

Output

Technique

Next release •
User stories • 3
priority levels
Multi-release •
User stories

Nested knapsack problem,
solved with branch and bound
Generalized knapsack problem,
solved with branch and cut
(continued)

A Survey on Software Release Planning Models

55

Table 5. (continued)
Ref

Motivation

M4

Scalability

M6

Scalability

M7

Scalability

M10

Robustness and
completion
time
Planning in
agile

M12

M13

M14

M15

M16

Development
cost and
customer
satisfaction
Time schedule
+diversity of
precedences
Grouping
features by
themes
Risk
assessment and
stakeholder
prioritization

Input
factors
RD •
B&CC •
SiF
B&CC •
VF • RF
SiF

RD • RC

Output

Technique

Next release •
Requirements

Backbone-based multilevel
algorithms

Next release •
Features
Next release •
Requirements •
Prioritized list
Tasks assigned
to developers

Human assessment
Satisﬁability modulo theory
supported by NLP and AHP
Genetic algorithm

Multi-release •
Features

0-1 multiple knapsack problem,
solved with branch and bound

Next release •
Requirements

Evolutionary algorithm

RD • RC
• TiC

Next release •
Requirements •
Time schedule

RD • RC
• EC •
SiF • VF
B&CC •
SiF

Next release •
Features

Knapsack problem
+resource-constrained
scheduling problem, solved with
branch and bound
Evolutionary algorithm

RD • RC
• EC •
TiC •
SiF
B&CC •
SiF

Next release •
Requirements

Satisﬁability modulo theory

Table 6. Summary of responses to RQ1: EVOLVE II-based models
Ref

Motivation

M1

Dealing with
complex
constraints
Scalability

M5

Input
factors
RD • TeC •
RC • SiF

Output

Technique

Multi-release
• Features

ReleasePlanner+Constraint
Programming Solver

RD • RC •
SiF

Multi-release
• Features

ReleasePlanner
(continued)

56

D. Ameller et al.
Table 6. (continued)

Ref

Motivation

M8

Quality aspects

M9

Grouping features
by themes
Active stakeholder
involvement

M11

M17

Introducing data
analysis

Input
factors
RD • QC •
RC • SiF
RD • RC •
SiF
RC • SiF

RD •
B&CC •
SiF

Output

Technique

Multi-release
• Features
Multi-release
• Features
Multi-release
• Features

ReleasePlanner

Multi-release
• Features

Graph clustering
+ReleasePlanner
ReleasePlanner
+Weighting-Based
techniques
Crowdsourcing+Data
Analysis+ReleasePlanner

RQ1.1. What are the main motivations for the models?
Among the 11 new models, we have identiﬁed a ﬁrst group [M2, M3, M12] whose
main concern is to address SRP in agile contexts, characterized by the uncertainty
related to working with estimations (e.g., project velocity) and the flexibility required
by agile projects (e.g., continuously changing customer needs, especially when the
model is intended to plan more than one iteration as [M3, M12] do). Other approaches
do also propose a solution in an agile context (e.g. [M14]) but the problems they face
are not inherent to this paradigm (e.g., different type of requirement precedencies).
A second group of new models seems more concerned in proposing solutions that
scale up in the presence of large sets of requirements ([M4, M6, M7]).
Another group [M10, M13] emphasise the contradictory nature of some SRP
objectives: robustness (deﬁned as a solution that satisﬁes expectations of project
manager) vs. completion time [M10] and development cost vs. customer satisfaction
[M13].
For the remaining new models, the motivations differ: grouping related features into
themes to be scheduled preferable together [M15], considering different sorts of
dependencies between requirements [M14], taking into account the requirements
inherent risks [M16] or tightly integrating time [M14] into the solution.
For the 6 models that extend EVOLVE II/ReleasePlanner, the motivations are also
different in each case. The resulting extensions, thus, are orthogonal and complementary with respect to the others. In [M1], for example, there is the need of dealing
with requirements selection constraints that are more complex and richer than the ones
that ReleasePlanner accepted as input. In [M5], the original motivation is to apply
ReleasePlanner in an industrial case study and, as a result, a new extension is proposed
to address the problem of feature generation. In [M8, M9] the proposed extension of
EVOLVE II is driven by the need of dealing with quality aspects and features grouped
into themes, respectively. In [M11] the main aim is to promote and support the active
participation of the stakeholders in the planning process. Finally, in [M17] the motivation is to deﬁne a generic framework to solve different sorts of release planning

A Survey on Software Release Planning Models

57

problems by applying analytical methods on a diversity of data available from internal
and external sources of information.
RQ1.2. What are the inputs processed by the models?
Here we replicate the analysis of input factors done in [14] by applying the same
taxonomy of requirements selection factors. The taxonomy is as follows:
• Hard Constraints: include those factors that may restrict the order and time when
certain features or requirements can be implemented. They are classiﬁed as:
– Technical Constraints: Requirements Dependencies (RD), Quality Constraints
(QC), and Other Technical Constraints (TeC).
– Other hard constraints: Budget & Cost Constraints (BCC), Resource Constraints
(RC), Effort Constraints (EC), Time Constraints (TiC).
• Soft Factors: include those factors that are more difﬁcult to estimate and provide
exact numbers on. They are classiﬁed as: Stakeholders’ Influence Factors (SiF),
Value Factors (VF), Risk Factors (RF), and Resource Consumption Factors (RCF).
In Table 7 the factors appear in their original formulation and, between parenthesis,
their mapping to the taxonomy of requirements selection factors.
Table 7. Requirements selection factors per SRP model
Ref
Factors in model
New models
M2
Technical precedencies (RD) • Story sizes+Velocity estimate (EC) • Preference
(customers wish) precedencies (SiF) • Story & theme values (VF)
M3
Correlation & Precedence among stories (RD) • Story efforts (EC) • Story values
(VF) • Story risks (RF)
M4
Requirements dependencies (RD) • Requirement costs (B&CC) • Customer
satisfactions (SiF)
M6
Feasibilities (B&CC) • Proﬁtabilities (VF) • Risks (RF)
M7
Keyword prioritization by user (SiF) • Pairwise comparisons between requirements by
user (SiF)
M10
Task dependencies (RD) • Available resources/staff (RC) • Different skills (RC).
M12
Feature dependencies (RD) • Resource capacities (RC) • Business priorities and
efforts (EC) • Iteration-delivery times (TiC) • Requirement priorities (SiF)
M13
Requirement costs (B&CC) • Client values (SiF) • Client weight factors (SiF)
M14
Task dependencies (RD) • Available employees (RC) • Employee salaries and
dedications (RC) • Deadline of the project (TiC)
M15
Task dependencies (RD) • Available resources (RC) • Task efforts (EC) • Stakeholder
values (SiF) • Theme & feature values (VF)
M16
Requirement costs (B&CC) • Stakeholder values (SiF)
EVOLVE-based models
M1
Coupling and precedence dependencies (RD) • Constraints solvable by CP solvers
(TeC) • Resource constraints (RC) • Stakeholder scores to features (SiF)
(continued)

58

D. Ameller et al.
Table 7. (continued)

EVOLVE-based models
M5
Coupling and precedence dependencies (RD) • Resource constraints (RC) •
Stakeholder scores to features (SiF)
M8
Coupling and precedence dependencies (RD) • Quality constraints (QC) • Resource
constraints (RC) • Stakeholder scores to features (SiF)
M9
Coupling and precedence dependencies (RD) • Resource constraints (RC) •
Stakeholder scores to features (SiF)
M11
Resource constraints (RC) • Stakeholder scores to features (SiF) • Stakeholder
pre-selected features (SiF)
M17
Advanced feature dependencies (RD) • Budget capacities (B&CC) • Stakeholders’
cost predictions (SiF) • Stakeholders’ willingness to pay (SiF)

The input factors considered in EVOLVE II/ReleasePlanner are coupling and
precedence constraints between features (RD), resource constraints (RC), as well as the
stakeholder scores to features, given for a flexible number of criteria (SiF) [M1]. Some
of the models that extend EVOLVE II/ReleasePlanner do not require extra input
[M5, M9]. In addition, [M1] accepts as input any constraint that can be also expressed
as an input of a Constraint Programming Solver (e.g., a constraint expressing mutual
exclusion between features, or productivity investments). In the case of [M8], some
quality aspects are added to the model. In [M11], the model adds a pre-selection of
candidate features from stakeholders. Finally, [M17] presents a case study where the
decision of what to release considers advanced feature dependencies, stakeholders’
predictions on feature costs, budget capacities over different time periods, and the price
that stakeholders would pay for each feature.
RQ1.3. What are the outputs generated by the models?
In the case of the new SRP models, we report three main categories of outputs.
Eight approaches [M2, M4, M6, M7, M13–M16] produce as output the list of
stories/requirements/features to be included in the next release. Moreover, in [M2] this
list is divided into three groups with different priority: must-have, should-have and
could-have stories. Going further, [M7] produces a list of prioritized requirements for
the next release. In [M14] the output combines the list of next-released requirements
with a schedule of the relative time at which these tasks should be performed by the
development teams. Two approaches [M3, M12] produce a multi-release plan, more
speciﬁcally an assignment of user stories or features to consecutive sprints. One
approach [M10] does not produce release plans but an assignment of different tasks to
different developers taking into account the speciﬁed constraints (i.e., the focus is only
on the operation release planning).
The models that extend EVOLVE II/ReleasePlanner produce the same output than
the original model/tool: an assignment of features to the releases in which they have to
be implemented.

A Survey on Software Release Planning Models

59

RQ1.4. What are the algorithms or techniques applied by the models?
In spite of their diversity, most of the approaches build on top of a very simple
formulation of the SRP problem: calculate an assignment from a feature set {f(1), …, f
(N)} to a release plan x = (x(1), …, x(N)) such x(j) = k means that f(j) is offered at
release k. The solution is required to maximize the value of some utility or objective
function. Except for naïve assumptions, the problem becomes a multi-objective
problem and the approaches differ mainly in the techniques proposed to solve this NP
problem.
Some approaches [M2, M3, M12, M14] formulate the problem as an instance of the
knapsack problem, using some kind of branch and bound algorithm to solve it. The
number of releases to plan and the type of constraints managed conﬁgure the exact type
of knapsack (nested, generalized, …). [M14] combines this solution with additional
techniques from resource-constrained project scheduling problem solving.
Other family of approaches use optimization-related techniques. Satisﬁability
modulo theory is among the preferred ones [M7, M16]. In addition, [M7] uses natural
language processing (NLP) for extracting information from requirements, and the
analytic hierarchy process (AHP) to ﬁne-tune the prioritization of requirements. Evolutionary algorithms [M13, M15] and genetic algorithms [M10] are also used to
implement the multi-objective problem. Finally, a backbone-based multilevel algorithm
was tune to the SRP problem in [M4].
On a completely different setting, [M6], which is based on a large-scale industrial
case at Ericsson for agile processes, presents a model which does not use any particular
model. In fact, one of the motivations of the case study is to overcome the limitations
posed by model-based approaches, especially in relation with the assumptions for their
application. Therefore, all the release planning is integrated in the traditional agile
lightweight agile process.
On the other hand, the EVOLVE-based models are implemented using the
ReleasePlanner tool. From these approaches:
• [M5, M8] use the tool as it is.
• [M9, M11] complement the tool with some extra functionalities. For instance,
[M11] includes a weighting-based technique to consider the influences that every
stakeholder has in each iteration requirements.
• [M1, M9] integrate ReleasePlanner with some other software component. In [M1],
ReleasePlanner is used to generate a ﬁrst solution that feeds a constraint programming solver to ﬁnd the best solution with an enlarged set of constraints.
Conversely, [M9] uses the output of a graph clustering algorithm to feed
ReleasePlanner.
• [M17] embeds ReleasePlanner into a more complex system which seeks maximizing a utility function.

3.2

RQ2. To What Extent Have the SRP Models Surveyed in RQ1 Been
Validated?

To answer RQ2, following we summarize our ﬁndings for the three sub-RQs.

60

D. Ameller et al.

RQ2.1. Are the models supported by tools?
Nearly half of the works found in this state of the art mention some kind of tool, but
is worth to differentiate those works that use a tool just to validate their approach (i.e. a
prototype or just an ad-hoc solution speciﬁc for the paper) from those that are presenting a ready-to-use tool; in this second case, the most remarkable case is
ReleasePlanner, mentioned in [M1, M5, M8, M9, M11, M17] (the models based on
EVOLVE-II).
The papers that use a prototype or ad-hoc solution mention the following technologies: CP-Solver [M1], LP-Solve (an OSS linear programing tool) [M2], and
CPLEX [M3]. In general, we can see that all the academic contributions use problem
solvers to determine what features will be implemented in the next release.
The rest of papers ([M4, M6, M10, M12–M16]) did not report any kind of tool.
RQ2.2. How has been industry involved in the models?
All selected papers are academic works (i.e., all or most authors have an academic
afﬁliation). In 4 cases [M5, M6, M8, M10] there was one author from the industry. It is
worth noting that these works were the only ones that provide real case studies as part
of their contribution. The rest of works were validated using experiments with the
exception of [M9], which had a case study (using students).
All the proposed models except one were originated in academic research. The
exception is an approach proposed by Ericsson [M6], which is also the only one that is
being adopted by the industry (by the same company).
RQ2.3. What are the major threats identiﬁed on the models?
We found 5 papers [M5, M6, M7, M8, M10] with a wide analysis of the threats to
validity (i.e., including internal, external, construct and conclusion validity threats). In
5 cases [M1, M4, M9, M11, M15] there were some threats explained but without any
kind of structure and in 7 cases [M2, M3, M12–M14, M16, M17] there was no mention
of threats to validity.
From those that mention the threats to validity, the most recurrent ones are: lack of
testing in an industrial setting, the difﬁculty to generalize the results (e.g., due to the
different skills of the testing participants, variability, and different domain applications), and that some formalizations do not contemplate all the possible dependencies.

4 Discussion
In this section discuss the most remarkable observations coming from the analysis of
the results.
4.1

RQ1. What SRP Models Have Been Presented Since 2009?

One family of models prevails in the ﬁeld, namely those coming from the
EVOLVE-II/ReleasePlanner proposal. This fact is not surprising due to the high
prevalence of the EVOLVE family in SRP before 2010 as reported in [14].

A Survey on Software Release Planning Models

61

RQ1.1. What are the main motivations for the models?
Pursuing scale. In contrast with the results reported in [14], we have found more
proposals aimed to scale in presence of large sets of requirements, which is a necessary
step towards a full industry application. Unfortunately, the theoretical approaches have
not been complemented with a proper validation in true industrial settings (see below).
RQ1.2. What are the inputs processed by the models?
Incomplete input factors. In a recent survey conducted through interviews with
the three companies participating in the SUPERSEDE project [13], any SRP model
should be able to deal with the following input factors in order to ﬁt industry needs:
• For each requirement: the required effort to implement it (measured in person hours
or similar), the developer skills required to implement it, its deadline (optional), its
dependencies with respect to other requirements and its priority or its business value
assessed by its stakeholders.
• For each release: its deadline and the list of available developers, for whom it is
necessary to know:
– his/her skills (to be matched with the ones required to implement requirements)
– the amount of effort (measured in hours per week or similar) that s/he can invest
in the release.
While requirements dependencies are taken into account by almost all the studies,
other constraints like time (deadlines) are only addressed by two papers [M12, M14],
and only one other paper [M10] take into account the different (developer) skills
required and available to implement the requirements. On the contrary, soft factors like
stakeholder consideration have been considered by a great share of approaches.
RQ1.3. What are the outputs generated by the models?
Simple outputs. Most of the SRP surveyed models produce a “binary” yes/no
result that simply tells which requirements should be implemented for the next release
(s). Only in the case of the models presented in [M10, M14], a richer type of results, in
terms of requirement implementation scheduling and resource (developers) allocation,
is provided. Clearly, this latter type of results is the one that matches better with the
needs of software companies, in accordance with the input factors that they think that
should be taken into account as reported above.
RQ1.4. What are the algorithms or techniques applied by the models?
SRP as a multi-objective problem. Most approaches recognize the existence of
different and often conflicting objectives that need to be reconciled in the planning of
releases. Solutions aim at reducing the inherent NP nature of the algorithms into
linear-time implementations still able to ﬁnd an optimal release plan. Experimentation
is a key instrument for these works in order to demonstrate the accuracy and efﬁciency
of the proposed technique.

62

4.2

D. Ameller et al.

RQ2. To What Extent Have the SRP Models Surveyed in RQ1 Been
Validated?

RQ2.1. Are the models supported by tools?
Lack of ready-to-market tools. Only one tool (among the mentioned in the
selected papers) can be considered a ready-to-market tool, namely ReleasePlanner. In
fact, the product, while created in an academic context, is the core business of a spin-off
company created around it. On the positive side, this commercial nature may be an
indicator that there is a market share for this kind of tools. However, it is true that other
tools are available in the market (e.g., Tempo Planner, a plug-in for JIRA) but none of
them got their way into the selected papers. Another important observation is that 8 of
the selected papers (out of 17) that did not mention any tool support; this circumstance
makes very difﬁcult to assess the suitability of these approaches.
RQ2.2. How has been industry involved in the models?
Scarce industrial contributions. Very few authors from the surveyed papers came
from industry, in most cases as providers of a case study. Only in one single case the
industrial authors were providing the SRP model. On the other hand, as commented
before, there are other commercialized tools without representation in the academic
literature. This situation may imply that the observations made in this paper are slightly
deviated towards the academic perspective because there is a part of the big picture
underrepresented in the academic literature.
RQ2.3. What are the major threats identiﬁed on the models?
Non-optimal consideration of threats to validity. For a literature survey focused
on journals and main conferences, it is a bit surprising to ﬁnd as much as 7 papers
(from 17) with no mention at all at threats to validity. The absence of their analysis
hampers the applicability of the presented models.
4.3

SRP Field Evolution

It is interesting to analyse the evolution of the SRP ﬁeld by comparing the results from
[14] and ours. As a ﬁrst observation already stated in Sect. 4, the EVOLVE family of
SRP models keeps its prevalence. In our literature review, we have found 6 papers out
of 17 (35.3 %), less than the 16/28 = 57.1 % found by [14] but still the biggest share by
far. No other method has shown prevalence in the ﬁeld since 2009. In particular, in the
case of papers not belonging to the EVOLVE family, none of the authors of models
reviewed by [14] appear in the new models that we have studied, except for the case of
[M14]. This means that although these researchers have made interesting proposals
from a research point of view, the transferability to industry is not reported.
Svahnberg et al. [14] found that most models focused on a limited set of input
factors, mainly hard constraints. Only a 57.1 % of the reviewed models considered soft
factors. In our study, 15 out of the 17 models (88.2 %) do include soft factors.
Noteworthy, soft-constraints that implied some kind of stakeholder involvement (SiF
constraints) are considered in 13 out of the 17 models (76.5 %) that we have found,

A Survey on Software Release Planning Models

63

whereas that in [14] only 7 out of 28 (25 %) took them into account. An interpretation
to this observation is that the proposals until 2008 needed to focus on algorithms able
to solve such hard constraints in a comprehensive and efﬁcient manner, while newer
proposals could build on top of these results and focus on the business-related issues
pointed out by soft factors.
As for model validation, we have already stated in Sect. 4 that industry involvement in the validation of the SRP models has decreased in the proposals found by our
study compared to those in [14]: while in [14] the amount of models validated in the
industry was 56 %, in the last years the participation of the industry has decreased to
23.5 % (4/17). This can be somewhat compensated by the fact that we have found
more models, in proportion, considering larger sets of requirements, which is a factor
supporting transferability of the models to industry.

5 Conclusions
In this paper, we have presented the state of the art on SRP models in the period
2009-2016. We have investigated two research questions analysing the characteristics
of these models and their validation state. We have used the results of a previous
systematic literature review published in 2010 [14] as main reference to our research
methodology. The main results (detailed in Sect. 4) show some progress with respect
the previous proposals in the period 1997–2008 surveyed in [14], in particular:
• Special attention to the scalability of the models (cf. RQ1.1).
• Increasing emphasis on soft factors like consideration of stakeholders and business
value (RQ1.4).
On the contrary, some other observations make evident that SRP scientiﬁc proposals have not yet reached the maturity required by industrial contexts:
• Incomplete input factors considered (RQ1.2) and simple output produced (RQ1.3).
• Proof-of-concept tool support, except for the case of the EVOLVE-ReleasePlanner
family of proposals (RQ2.1).
• Poor validation due to scarce industry validation (RQ2.2) and non-optimal consideration of threats to validity (RQ2.3).
We may conclude that the current state of the art claims for an increasing effort in
making SRP models closer to industry requirements like those surveyed in [13].
Acknowledgements. This work is a result of the SUPERSEDE project, funded by the EU’s
H2020 Programme under the agreement number 644018.

References
1.
2.

Aydemir, F.B., Mekuria, D.N., Giorgini, P., Mylopoulos, J.: Next Release Tool. ER 2015
Basili, V., Caldiera, G., Rombach, D.: Goal/question/metric paradigm. In: Encyclopedia of
Software Engineering, vol. 1, Wiley, New York (1994)

64
3.

4.
5.
6.
7.

8.

9.

10.
11.
12.
13.
14.

15.

16.

D. Ameller et al.
Didar-Al-Alam, S,M., Shahnewaz, S,M., Pfahl, D., Ruhe, G.: Analysis and improvement
of release readiness – a genetic optimization approach. In: Jedlitschka, A., Kuvaja, P.,
Kuhrmann, M., Männistö, T., Münch, J., Raatikainen, M. (eds.) PROFES 2014. LNCS,
vol. 8892, pp. 164–177. Springer, Heidelberg (2014). doi:10.1007/978-3-319-13835-0_12
Durillo, J.J., Zhang, Y., Alba, E., Harman, M., Nebro, A.J.: A study of the bi-objective
next release problem. Empir. Softw. Eng. 16(1), 29–60 (2011)
Grubb P, Takang AA. Software maintenance: concepts and practice. World Scientiﬁc,
2003
Lehman, M.M.: On understanding laws, evolution, and conservation in the large-program
life cycle. J. Syst. Softw. 1, 213–221 (1979)
Lehtola, L., Kauppinen, M., Kujala, S.: Requirements prioritization challenges in practice.
In: Bomarius, F., Iida, H. (eds.) PROFES 2004. LNCS, vol. 3009, pp. 497–508. Springer,
Heidelberg (2004). doi:10.1007/978-3-540-24659-6_36
Lindgren, M., Land, R., Norström, C., Wall, A.: Towards a capability model for the
software release planning process –- based on a multiple industrial case study. In:
Jedlitschka, A., Salo, O. (eds.) PROFES 2008. LNCS, vol. 5089, pp. 117–132. Springer,
Heidelberg (2008)
Luna, F., González-Álvarez, D.L., Chicano, F., Vega-Rodríguez, M.A.: The software
project scheduling problem: a scalability analysis of multi-objective metaheuristics. Appl.
Softw. Comput. 15, 136–148 (2014)
Ruhe, G.: Product Release Planning: Methods, Tools and Applications. CRC Press, Boca
Raton (2010)
Ruhe, G., Saliu, M.O.: The art and science of software release planning. IEEE Softw. 22
(6), 47–53 (2005)
Sommerville, I.: Software Engineering, 10th edn. Pearson (2015)
Stade, M., Seyff, N., Perini, A., Marco, J., Nadal, S., Franch, X.: D3.1: Requirements for
Methods and Tools. SUPERSEDE EU project deliverables (2015)
Svahnberg, M., Gorschek, T., Feldt, R., Torkar, R., Saleem, S.B., Shaﬁque, M.U.: A
systematic review on strategic release planning models. Inf. Softw. Technol. 52(3), 237–
248 (2010)
Szőke, Á.: Optimized feature distribution in distributed agile environments. In: Ali Babar,
M., Vierimaa, M., Oivo, M. (eds.) PROFES 2010. LNCS, vol. 6156, pp. 62–76. Springer,
Heidelberg (2010). doi:10.1007/978-3-642-13792-1_7
Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in
software engineering. In: EASE 2014

Primary Studies Found in the Literature Survey
M1. Przepiora, M., Karimpour, R., Ruhe, G.: A hybrid release planning method and its
empirical justiﬁcation. In: ESEM 2012
M2. van Valkenhoef, G., Tervonen, T., de Brock, B., Postmus, D.: Quantitative release
planning in extreme programming. Inf. Softw. Technol. 53(11), 1227–1235 (2011)
M3. Golfarelli, M., Rizzi, S., Turricchia, E.: Multi-sprint planning and smooth replanning: an
optimization model. J. Syst. Softw. 86(9), 2357–2370 (2013)
M4. Xuan, J., Jiang, H., Ren, Z., Luo, Z.: Solving the large scale next release problem with a
backbone-based multilevel algorithm. IEEE Trans. Softw. Eng. 38(5), 1195–1212 (2012)
M5. Zorn-Pauli, G., Paech, B., Beck, T., Karey, H., Ruhe, G.: Analyzing an industrial strategic
release planning process–a case study at Roche diagnostics. In: REFSQ 2013

A Survey on Software Release Planning Models

65

M6. Heikkilä, V.T., Paasivaara, M., Lassenius, C., Engblom, C.: Continuous release planning
in a large-scale scrum development organization at ericsson. In: Baumeister, H., Weber, B.
(eds.) XP 2013. LNBIP, vol. 149, pp. 195–209. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-38314-4_14
M7. McZara, J., Sarkani, S., Holzer, T., Eveleigh, T.: Software requirements prioritization and
selection using linguistic tools and constraint solvers—a controlled experiment. Empir.
Softw. Eng. 20(6), 1721–1761 (2015)
M8. Felderer, M., Beer, A., Ho, J., Ruhe, G.: Industrial evaluation of the impact of
quality-driven release planning. In: ESEM 2014
M9. Agarwal, N., Karimpour, R., Ruhe, G.: Theme-based product release planning: an
analytical approach. In: HICSS 2014
M10. Gueorguiev, S., Harman, M., Antoniol, G.: Software project planning for robustness and
completion time in the presence of uncertainty using multi objective search based software
engineering. In: GECCO 2009
M11. Heikkilä, V., Jadallah, A., Rautiainen, K., Ruhe, G.: Rigorous support for flexible planning
of product releases - A stakeholder-centric approach and its initial evaluation. In: HICSS
2010
M12. Szõke, A.: Conceptual scheduling model and optimized release scheduling for agile
environments. Inf. Softw. Technol. 53(6), 574–591 (2011)
M13. Chaves-González, J.M., Pérez-Toledano, M.A.: Differential evolution with Pareto
tournament for the multi-objective next release problem. Appl. Math. Comp. 252, 1–13
(2015)
M14. Li, C., van den Akker, M., Brinkkemper, S., Diepen, G.: An integrated approach for
requirement selection and scheduling in software release planning. Requir. Eng. 15(4),
375–396 (2010)
M15. Karim, M.R., Ruhe, G.: Bi-objective genetic search for release planning in support of
themes. In: Goues, C., Yoo, S. (eds.) SSBSE 2014. LNCS, vol. 8636, pp. 123–137.
Springer, Heidelberg (2014). doi:10.1007/978-3-319-09940-8_9
M16. Pitangueira, A.M., Tonella, P., Susi, A., Maciel, R.S., Barros, M.: Risk-aware
multi-stakeholder next release planning using multi-objective optimization. In: Daneva,
M., Pastor, O. (eds.) REFSQ 2016. LNCS, vol. 9619, pp. 3–18. Springer, Heidelberg
(2016). doi:10.1007/978-3-319-30282-9_1
M17. Nayebi, M., Ruhe, G.: Analytical product release planning. In: The Art and Science of
Analyzing Software Data. Morgan Kaufmann (2015)

Organizational Models

A Power Perspective on Software Ecosystem Partnerships
George Valença1 ✉ , Carina Alves1, and Slinger Jansen2
(

1

)

Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
{gavs,cfa}@cin.ufpe.br
2
Department of Information and Computing Sciences,
Utrecht University, Utrecht, The Netherlands
fs.jansen@cs.uu.nl

Abstract. To address today’s market demands for continuous enhancement of
systems capabilities, software producing organisations have increasingly formed
or joined software ecosystems. In these complex and networked settings, they
deﬁne partnerships to complement each other’s features, acquire new skills,
divide R&D costs and share customers. Such business model entails mutual
dependence on companies for survival and eﬀectiveness. It creates a ﬂow of
inﬂuence among them and makes the ecosystem resemble power-law distribu‐
tions. Drawing on established concepts from Social and Behavioural Sciences,
we performed an exploratory case study of six software companies to investigate
their power-dependence relationships in an ecosystem environment. A prime
result of this research is showing that it is possible to understand how power and
dependence inﬂuence the behaviour and coordination of partner ﬁrms within a
software ecosystem.
Keywords: Software ecosystem · Open business model · Partnerships · Power ·
Dependence · Case study · Software product management

1

Introduction

Software companies have increasingly recognised that product and process innovation
alone are no longer suﬃcient to successfully market their technology. To stay compet‐
itive in current fast-moving economy, they have adopted innovative business models by
changing the dominant logic of doing business [26]. The emergence of cloud platforms,
the explosion of data and the development of new avenues for information have led these
ﬁrms to gradually build proprietary software ecosystems around their products. In
parallel, free software is a promising platform for open-source software ecosystems,
which are leveraged by active collaboration of community developers.
These ecosystems result from the strengthening of multiple, bi-lateral alliances
among complementors. As competitors in the IT industry and collaborators in joint
development initiatives, software ﬁrms have embraced coopetition, i.e. relationships
between companies that cooperate in some activities while compete in others. They are
constantly facing the interplay of power and dependence, which are driving forces of
their partnerships. The power exerted by companies in business-to-business contexts
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 69–85, 2016.
DOI: 10.1007/978-3-319-49094-6_5

70

G. Valença et al.

inﬂuences business models for value co-creation as well as market dominance. Hence,
power is a critical aspect of interﬁrm relationships in business networks [19].
This paper reports on an exploratory case study that analysed power-dependence
relationships between partners in proprietary software ecosystems. We interpret this
phenomenon in light of established work from Social and Behavioural Sciences theo‐
rists, using a conceptual framework of multiple facets of power. Furthermore, we discuss
our ﬁndings from the perspective of ecosystem governance and health [1].
Our contribution is twofold. First, we describe diﬀerent power types and a structured
way of modelling power directions in a dyad. We then analyse the power ﬂows in the
partnerships and their outcomes in the ecosystems. Second, we oﬀer empirically
grounded knowledge and raise a theoretical discussion that is relevant for future research
on software ecosystems [9], as a growing ﬁeld in software engineering (SE).
The rest of the paper is structured as follows: in Sect. 2, we present the research
fundamentals. In Sect. 3, we detail the research design. Section 4 describes the results,
while Sect. 5 discusses these ﬁndings. Finally, in Sect. 6, we conclude the paper, with
implications and future work of this research.

2

Background

2.1 Software Ecosystems
The notion of software ecosystem adopts concepts from business and biological ecosys‐
tems to analyse the dynamics of today’s software industry [2]. For Jansen and colleagues
[13], a software ecosystem consists in a set of businesses functioning as a unit and
interacting with a shared market for software and services, together with relationships
among them. It represents a disruptive open business model strategy that proposes novel
ways for a central ﬁrm to collaborate with partners, and for them to create and capture
value from the network [26].
According to Campbell and Ahmed [3], one can describe a software ecosystem from
a business, architecture and social dimension. The business dimension comprises factors
such as vision, innovation and strategic planning. It involves the deﬁnition of business
strategies (e.g. proﬁt and revenue models) and partnership model (e.g. membership
models serving participants). The architecture or technical dimension focuses on tech‐
nological aspects of the ecosystem. It is concerned with products from third parties,
generally developed and integrated through a common platform. Finally, the social
dimension consists in the relationships among software ﬁrms/developers in the associ‐
ated social ecosystem. It includes the motivations to build alliances, rules for social
interaction, and opportunities to show and enhance actor’s capabilities [17].
There are two main types of software ecosystems: proprietary and open source [17].
In a proprietary ecosystem, the source code and other artefacts produced are protected
and new players usually need to be certiﬁed to join the network. It is the case of iOS
ecosystem, in which external developers guarantee a steady ﬂow of apps for Apple’s
iPhone, or SAP ecosystem, in which a thriving community of resellers enables SAP to
be Europe’s largest software company. In its turn, an open-source ecosystem has a
generally ﬂexible certiﬁcation criteria and actors who participate independently from

A Power Perspective on Software Ecosystem Partnerships

71

receiving revenues from their activity. Android is an open source ecosystem, with
participants that develop apps or plug-ins for the software platform.
Actors in software ecosystems play diﬀerent roles, with speciﬁc duties. The keystone
or orchestrator is a company, community or independent entity responsible for running
a technological platform, creating and applying rules (e.g. quality standards), and
managing the participation of actors. Niche player is a company, person or entity that
complements the platform by developing speciﬁc features that customers require. Valueadded reseller (VAR) is a company that makes proﬁt from selling ecosystem products.
Finally, users acquire and use an ecosystem solution or service to carry out their business
or perform personal activities [9, 17].
The keystone is the main responsible to orchestrate ecosystem members and coor‐
dinate development eﬀorts on top of the software platform. Moreover, this ﬁrm is
responsible for software ecosystem governance, which involves the creation of proce‐
dures to control, maintain, or change the ecosystem [1]. It includes business and tech‐
nical aspects such as management of the platform and interfaces, deﬁnition of a sustain‐
able business model, and development of partners [25]. A successful governance
strategy leads to a healthy software ecosystem. It brings a growing number of opportu‐
nities for participants and a great ability to innovate by transforming inputs into new
products and services with lower costs [1, 10, 12].
2.2 Power and Dependence
According to Emerson [8], power is not a property of an actor or group. It derives from
the existence of a relationship, which implies on specifying over what/whom power is
exercised. The power of an actor A is thought to be the inverse of the dependence of an
actor B, bringing the idea of a power-dependence relationship: the power of A over B
is equal to, and based upon, the dependence of B upon A.
Lawler [16] considers power as a structurally based capability of an actor. Hence,
an actor can rely on a power capability (PC) to exert inﬂuence over another party. This
PC can be positioned according to French and Raven taxonomy [20], which is formed
by ﬁve core power types or bases: coercive power, expert power, legitimate power,
referent power and reward power. We depict these forms of power in Table 1.
Table 1. Main types of power [20].
Power type
Description
Coercive (CO) It is based on B’s perception that A has the ability to mediate punishments for
him
Expert (EX)
It is based on B’s perception that A has some special knowledge or expertise.
Obs. informational power can be seen as a speciﬁc type of expert power that
involves the control of relevant information
Legitimate
It is based on B’s perception that A has the right to prescribe behaviour for him
(LE)
Referent (RF) It is based on the identiﬁcation of B with A, i.e. feeling of oneness of B with A
or desire for such identity. If A is an attractive group, B will desire to join in
Reward (RW) It is the perception that A has ability to provide rewards and beneﬁts for B

72

G. Valença et al.

The total amount of power in a relationship is not ﬁxed, but variable [16]. It means
that power is a contingent and dynamic construct, which is constantly negotiated in the
course of a relationship [15]. Hence, there may occur shifts of existing power, e.g. one
party gains power while the other’s power remains constant. Such dynamism results
from changes in an actor’s sources of power. They represent tangible or intangible
resources or outcomes that he exploits to aﬀect the behaviour of another actor [6]. Any
change in the availability or demand for such power sources may aﬀect power distri‐
bution in a dyad (i.e. interaction between a pair of ﬁrms).
The analysis of power is a means to explain behaviour and balance a dyad. When
mutual dependence diﬀers, there is a power advantage for one party, e.g. if A acts as
less dependent and more powerful, B may comply with requests from A, since B is less
able to resist. By tactically manipulating his PCs, an actor can obtain a power advantage
and rebalance a relationship. Too imbalanced dyads may be dysfunctional, since a
powerful actor may pursue short-term exclusive interests. This actor may also appro‐
priate a larger portion of overall beneﬁts accruing from the dyad [4]. In addition, certain
forms of power may not sustain long-term development of the relationship, as there shall
be undesirable exchange conditions and levels of uncertainty for one party.
Emerson’s deﬁnition is a common operationalisation of power in studies on interorganisational settings, with increased academic interest in recent years [18]. These
studies primarily draw on the power base theory from French and Raven, which was
originally presented in the book ‘Studies in Social Power’, in 1959, and is one of the
most adopted conceptualisations of power [7]. These authors underpin the conceptual
framework that we use in this paper to analyse power-dependence relationships between
studied companies in a software ecosystem.

3

Research Method

This research analyses the interplay of power and dependence in software ecosystems.
We translate this goal in the research question: how power and dependence manifest in
partnerships between companies participating in a software ecosystem? To answer it,
we performed an exploratory case study, which is appropriate when there is little
evidence about a phenomenon and researchers seek new hypotheses [22]. The qualitative
data collected enabled us to examine speciﬁc aspects of the phenomenon, such as situa‐
tions when a ﬁrm gains power, and decisions that reduce the dependence of a partner.
The study involved six companies that participate in a software ecosystem.
3.1 Data Collection
We collected empirical data through semi-structured interviews, which provided an indepth understanding of the exercise of power and the consequent notion of dependence
in an ecosystem. The interview protocol1 covered partnership strategies, technical and
social issues. We guaranteed that the same basic structure was followed in each
1

The interview protocol is available via this link https://goo.gl/LbvSLL.

A Power Perspective on Software Ecosystem Partnerships

73

interview, although we asked new questions according to interviewees discourse. To
map other partnerships and ecosystems, we used a snowball sampling and asked inter‐
viewees to recommend other ﬁrms and participants based on their expertise.
The case study started with the CRM Software Company, where we interviewed the
CEO and a developer. This company is a Microsoft VAR and has partnerships with the
software ﬁrms here named as Data Integration Company, E-mail Marketing System
Company, Financials Software Company and Insurance Software Company. We also
interviewed the CEOs of the Insurance Software Company and Financials Software
Company to enrich partnerships information provided by the CRM Software Company.
In addition to the interviews, we adopted the analysis method from Romano and collea‐
gues [21] to examine web-based qualitative data. We retrieved data from the websites
of these ﬁrms to analyse their product portfolio, partners, marketplace and news pages.
We also searched IT news portals, since most of these ﬁrms have an international oper‐
ation that makes them subject of evaluations from such websites.
3.2 Data Analysis
Initially, one researcher generated the interview transcripts and another researcher veri‐
ﬁed them to validate the text, clarify interviewees’ expressions and discuss the ﬁndings.
This procedure turned the ﬁndings more concrete by reducing misunderstandings. In
addition to interviews data, we collected evidence from ﬁrms’ portals and news websites.
We adopted Thematic Analysis (TA), which is one of the most common methods for
synthesising evidence in SE and particularly useful in case studies [5]. TA aims to iden‐
tify, analyse and report patterns within data. This coding procedure generated themes
and sub-themes related to the ecosystem scenario, such as ‘technological platform’ and
‘software product management’. This structure helped us to organise the data set in rich
detail, preparing it for a conceptual analysis.
In this subsequent step, we relied on our theoretical framework, without which TA
would have limited interpretative strength. We used an abductive reasoning and adopted
established theories from Social and Behavioural Sciences (c.f. Sect. 2) to describe our
ﬁndings. We considered Emerson’s statement that power resides in the other’s depend‐
ency [8] to examine the power-dependence relationships, here represented by the part‐
nerships. We identiﬁed the power capabilities (PC) [16] held by partners, which enabled
us to denote situations of power exercise in the ecosystem. We just considered PCs
identiﬁed by our data analysis, since we are not performing a general investigation of
the companies in their segments. It means we only listed PCs supported by collected
evidence. This decision increases the validity of our study, since we kept a clear chain
of evidence while drawing our conclusions. We labelled each PC with the code Power‐
Type_CompanyCode_Number, where PowerType means the form of power exercised
by the ﬁrm (cf. Table 1), CompanyCode indicates the ﬁrm (CA - CRM Software
Company, CB - Data Integration Company, CC - E-mail Marketing System Company,
CD - Financials Software Company, CE - Insurance Software Company and CF Microsoft) and Number is the number of the PC (01, 02, and so on). For instance,
LE_CA_02 means the second power capability of legitimate power type exercised by
the CRM Software Company over an ecosystem partner.

74

G. Valença et al.

We created schemes to represent the use of power capabilities by companies in a
software ecosystem. A directed arrow indicates an activity that expresses a form of
power exercised by a ﬁrm in a given situation of the partnership. In addition, the schemes
indicate the correspondent source(s) of power used by partner companies.
3.3 Case Companies
The CRM Software Company (CA) was founded in mid 90 s. It has 30 employees and
serves about 400 customers from IT and ﬁnancials markets in Benelux. The ﬁrm’s busi‐
ness model completely depends on Microsoft: it is a VAR of Dynamics CRM. It adapts
the product to diﬀerent verticals by oﬀering templates (functional modules), toolbox
(solution to build such templates) and connectors (solution that enables data exchange
between Dynamics CRM and other systems). It has focused on software integration via
connectors to act as niche player in multiple software ecosystems.
The Data Integration Company (CB) was founded in 1995 and is a leading provider
of CRM integration solutions worldwide. The company built an ecosystem around an
integration platform, which has over 1.200 partners. Integration providers get access to
12.000 clients from diverse markets after joining the partner program as VARs or system
integrators. They build their solutions on the platform or sell existing ones. Its Microsoft
partner program provides standard connectors for Microsoft Dynamics.
The E-mail Marketing System Company (CC) is a 10-year-old company with 40
employees. The company provides solutions for the marketing domain and has over
2.700 customers. Based on its e-mail marketing system, it raised an ecosystem of over
130 partners. It particularly invests in integration partners, e.g. ﬁrms that develop
connectors that allow clients to send newsletters from their ERPs or CRM systems.
The Financials Software Company (CD) has existed for more than 30 years. It is a
market leader for cloud accounting in Benelux, with 500 employees and a portfolio of
products for ﬁnancial, accountancy and related domains. It has an expanding software
ecosystem around its solutions, with more than 160.000 clients. A third party can join
this network after becoming an app centre partner. Generally, partners build extensions
in the form of connectors that the ﬁrm will later oﬀer in its online marketplace.
The Insurance Software Company (CE) is a 30-year-old ﬁrm with 180 employees.
It provides a cloud-based and modular SOA insurance system, with policy and claim
handling modules, for Netherlands, UK, Belgium and South Africa markets. It has
gradually created an ecosystem as a means to go forward in Netherlands competitive
software industry. It focuses on its competencies and relies on small implementation
partners who provide complementary features and consultants who sell its products.
Finally, Microsoft (CF) was founded in 1975 and it is the world’s largest software
vendor in terms of revenue. Its Microsoft Dynamics solution consists in a line of ERP
and CRM applications. Dynamics CRM focuses on sales, marketing, and service sectors,
relying on an ecosystem of 640.000 partners who use a .NET-based framework to build
customisations. Microsoft certiﬁes these ﬁrms as VARs and enables them to access
cutting-edge technologies and potentially reach a base of 40.000 customers.

A Power Perspective on Software Ecosystem Partnerships

4

75

Results

This section answers the research question by describing how power and dependence
manifest in partnerships within a software ecosystem setting. Our analysis takes the
perspective of the CRM Software Company as VAR of Dynamics CRM ecosystem and
niche player of other ecosystems, in which it provides connectors for Dynamics CRM.
We adopt the three-dimensional view of software ecosystems from Campbell and
Ahmed [3] to describe power distribution in the partnerships and introduce the power
capabilities of the companies based on French and Raven power forms [20].
4.1 Business Dimension
The business dimension involves the creation of an ecosystem vision, which generally
consists of disseminating product and platform goals to inspire participants to follow
them. This dimension embraces the deﬁnition of an innovation strategy to support the
continuous improvement of processes and products. It also includes the creation of a
strategic plan to understand how, when and who will perform the goals [3, 23]. In
particular, the keystone may open up governance policies and allow the community to
inﬂuence them. In this case, this ﬁrm gives power to partners as they start to participate
in ecosystem decision-making process [14].
The studied networks are in an expanding phase and their initial keystones (Data
Integration Company, E-mail Marketing System Company, Financials Software
Company, Insurance Software Company and Microsoft) are in charge of ecosystem
governance. It diﬀers from open ecosystems, where this duty is generally shared in a
committee (e.g. product managers, partners and users). Keystones deﬁne how much
power is left to members and how much they keep for themselves. Frequently, studied
ﬁrms not just provide partners with a comprehensive view of the platform and product
roadmaps, but also share decision rights. For instance, the Data Integration Company
has a voting system for partners to make trade-oﬀ decisions that a product manager does.
They register feature requests to improve integration tools and assess the value of other
partners’ ideas. The keystone then shows its dependence on partners to fulﬁl untapped
needs and leverage ecosystem innovation, as perceived in the arguments of the Finan‐
cials Software Company CEO: “to what extent I want to be a certain product (or) leave
it to others? I said (this) when he (marketing director) asked me how to build this
ecosystem: ‘create space; if (partners) don’t have space, they gonna suﬀocate and there
is no money for anybody; nobody is gonna work with (the ecosystem)”.
Partners such as the CRM Software Company obtain the legitimate power to inﬂu‐
ence management plans of the ecosystem (LE_CA_01). By sharing this power with
external actors, studied keystones allow partners to adjust ecosystem focus, e.g. tech‐
nologies to develop, features to include in future release. Hence, keystones beneﬁt from
the convergence of development eﬀorts in the network. Figure 1 shows this power.

76

G. Valença et al.

Fig. 1. Legitimate power (LE) of the CRM Software Company over keystones

4.2 Social Dimension
The social dimension of an ecosystem involves the factors promotion, utilitarianism and
knowledge sharing. Other aspects related to this perspective are recognition from peers,
reputation, learning and sense of code ownership, for instance [2, 3]. The keystone must
explore these factors and not simply open a platform to obtain extensions from third
parties. The success of a software ecosystem depends on how adequately a ﬁrm engages
with other peers and creates a collaborative and innovative environment.
By structuring a software ecosystem and positioning itself as a keystone, a ﬁrm
reinforces its expert power, i.e. superior abilities or information. The recognition of
such expertise by external actors may generate a feeling of membership and desire to
join the network. The reliance of the CRM Software Company on the vast know-how
and robust products of Microsoft (EX_CF_01) motivated the ﬁrm to deﬁne Dynamics
CRM as a foundation for its solutions. It also trusts the platform and tools of the Data
Integration Company (EX_CB_01) to develop connectors for SaaS, cloud or hybrid
scenarios, as cited by the CRM Software Company CEO: “They (Data Integration
Company) are a market leader in connectors”. Partners’ knowledge is critical for the
ﬁrm, which previously had to develop connectors from scratch, using web services.
Despite the expertise of the E-mail Marketing System Company, Financials Software
Company and Insurance Software Company, respectively, in e-mail marketing
(EX_CC_01), ﬁnances control (EX_CD_01) and policy/claim handling (EX_CE_01),
they depend on the CRM Software Company to extend their product portfolio. The CEO
of the Insurance Software Company CEO explained the importance of complementors
to fuel the ecosystem with speciﬁc features: “we have organisations such as CRM Soft‐
ware Company around us that provide additional functionality to our own (, which is)
not that special need that we want to develop ourselves”. The need for the expertise of
partners was also mentioned by the Financials Software Company CEO: “these are the
areas that we don’t cover with our own product – this is where we are going to ﬁnd
apps; you won’t be able to do it without an emergent ecosystem”.
The dependence of these companies gives power to the CRM Software Company.
Hence, this ﬁrm expresses its expert power on CRM (EX_CA_01). It links Dynamics
CRM with other solutions via connectors, resells this product and enables clients to
optimise it with a toolbox and templates. “For certain verticals you have to adapt it
(Dynamics CRM); we got 4 templates (and) developed an editor where you can build
templates; there is a need for CRM and they (partners) don’t want to build it themselves;

A Power Perspective on Software Ecosystem Partnerships

77

we build this connection; we got expertise in this product; it is integration; connector
sales”, detailed the CRM Software Company CEO.
This context describes the role of expert power in a partnership, which involves the
knowledge a ﬁrm has in a given domain, product or technology, as shown in Fig. 2.

Fig. 2. Expert power (EX) of the CRM Software Company and keystones

Microsoft and other keystones may directly promote ecosystem participants via an
associate model, which enables them to obtain and manage partners. In a proprietary
ecosystem, a partnership model deﬁnes roles and duties that external actors may play in
the network while they manipulate software artefacts and information [24]. This instru‐
ment specially states the beneﬁts resulting from the adoption of the platform and presents
strategies to generate value from the partnership. By creating incentives, the keystone
highlights ecosystem utility and fosters partners’ engagement. In their turn, partners
enable the keystone to oﬀer complements and access new technologies [23].
We perceived that all ﬁrms have the ability to generate advantages for a partner,
which denotes their reward power. Microsoft beneﬁts the CRM Software Company
with Dynamics CRM VAR certiﬁcation (RW_CF_01), which involves technical support
in product deployment and maintenance, and commercial support via licensing, presales and marketing actions. Moreover, it lists the solutions of the CRM Software
Company at Microsoft Pinpoint, a wide marketplace for clients to search applications
and services based on Microsoft technologies. “This one (Dynamics CRM ecosystem)
has 40.000 customers”, explained the CEO of the CRM Software Company.
Microsoft ecosystem also oﬀers a key asset to partners: Microsoft’s strong image as
one of the world’s most valuable and successful ﬁrms. “CRM is a bigger ecosystem; if
they (partners) say ‘this is CRM Software Company add-on’, no one knows the ﬁrm; if
they (clients) see Microsoft logo, they click and buy”, argued the CEO of the CRM
Software Company. This niche player recognises Microsoft’s referent power
(RF_CF_01), using such reputation to promote its sales (Fig. 3). This visibility is far
more relevant than that of small-to-medium partners of the CRM Software Company
such as the E-mail Marketing System Company or Insurance Software Company.
An indirect beneﬁt from Microsoft VAR certiﬁcation is enabling the CRM Software
Company to reach other networks. The ﬁrm was certiﬁed as system integrator in the
software ecosystem of the Data Integration Company, which has a Microsoft partner
program (RW_CB_01). This keystone provides partners with an API and integration
tools as well as developers to support connectors’ construction. In addition, the Data
Integration Company enables partners to access an online marketplace and promotes

78

G. Valença et al.

Fig. 3. Referent power (RF) of Microsoft over the CRM Software Company

their expertise by publishing customer stories involving the extensions. The CRM Soft‐
ware Company developer reinforced the success of this ecosystem: “(It) is a new plat‐
form; there are lots of partners developing connectors for it”.
The E-mail Marketing System Company grants an integration partner certiﬁcation
(RW_CC_01) to the CRM Software Company. It does campaigns about the CRM Soft‐
ware Company connector on its website, where customers can buy it and receive further
assistance. Similarly, the Financials Software Company certiﬁes the CRM Software
Company as app centre partner (RW_CD_01). The beneﬁts include free access to APIs,
participation in workshops and developer resources. It also publishes partners extensions
at an apps centre, enabling them obtain new customers: “they have business apps from
third parties and we are one of them; (its) online (platform) is an accountancy program
(with) 160.000 customers”, described the CRM Software Company CEO. In addition,
the Financials Software Company organises business events to foster interaction among
niche players.
In Fig. 4, we represent the dynamics of reward power in the relationship between
keystones and the CRM Software Company in the software ecosystems.

Fig. 4. Reward power (RW) of keystones over the CRM Software Company

The Insurance Software Company presents partners’ products to potential clients in
pre-sales (RW_CE_01), as cited by its CEO: “in our portfolio, these products are lined
up just as our own products are; we sell (CRM Software Company) eﬃciency, compli‐
ance or commercial possibilities”. The partner receives a purchase order, with an agree‐
ment per client, or it may be hired as contractor in joint sales. In this speciﬁc relationship,
the CRM Software Company oﬀers a kickback fee (RW_CA_01) once there are recom‐
mendations/potential sales generated by this partner. The CEO of the Insurance Software
Company described this agreement: “if a partner does something on his own, he
provides a kickback and you get a small percentage of (the business deal)”. Figure 5
shows reward power forces in the relationship between the Insurance Software Company
and the CRM Software Company.

A Power Perspective on Software Ecosystem Partnerships

79

Fig. 5. Reward power (RW) of the CRM and Insurance Software Companies

The studied keystones obtain reward power by increasing the dependence of niche
players on opportunities accruing from ecosystem customer base. The CRM Software
Company CEO described this context: “you are depending but on the other side you
are using big marketing machine ecosystems of very big companies; (and) the marketing
is done by those players; we are here on their website”. In its turn, the CRM Software
Company gains power once creating a dependence in the Insurance Software Company
with respect to monetary payment for deals this partner promotes.
However, the Insurance Software Company has the right to cease all opportunities
oﬀered to a partner if his extensions do not satisfy acceptance criteria. Since these
features may pose a risk to the image of the system and aﬀect company’s reputation, the
partner can be removed from the ecosystem. The CEO of the Insurance Software
Company draws an analogy to explain this practice: “the quality of our partners is a risk
to our brand; if you have a low battery quality, you have an issue (for the whole car);
all cars (have) rubbish (and) there is only one part that has been replaced, which is
rubbish; it has been a natural situation”. Such careful quality control may involve the
substitution of partners who oﬀer low quality features. This penalty denotes the coercive
power of the Insurance Software Company (CO_CE_01) (Fig. 6).

Fig. 6. Coercive power (CO) of Insurance Software Company over CRM Software Company

4.3 Technical Dimension
It is imperative that the keystone coordinates the contributions of multiple and varied
actors from a technical perspective. Hence, the technical dimension embraces software
platform management in terms of domain engineering, products’ commonalities and
variabilities, among other issues. Such open software enterprise model also requires
changes in software product management processes [14], e.g. on a tactical-operational
level, the keystone shall inform partners about policies related to quality requirements,
certiﬁcation and intellectual property (IP) of products in the ecosystem.

80

G. Valença et al.

Such rules that guide the partnerships involve implicit and explicit rights of the
companies, which conﬁgure their legitimate power in the ecosystem. The ownership
of Dynamics CRM provides Microsoft with full control of changes over system func‐
tionality. Thereby, value-added resellers cannot change core features of Dynamics CRM
(LE_CF_01). This means that the CRM Software Company can only develop exten‐
sions. Frequently, the company has to explain to clients that their customisation requests
cannot be satisﬁed. “We are only partners; we can’t just change (Dynamics) CRM; (it)
is a Microsoft product”, argued the CEO of the CRM Software Company.
The expertise of the CRM Software Company in CRM domain enables it to deter‐
mine how each connector will be built (LE_CA_02), as indicated by its CEO: “the whole
thing about connectors is deﬁned by us”. The ﬁrm is in charge of requirements and
technologies speciﬁcation, whereas partners generally have short inﬂuence on connec‐
tors’ development due to their usual lack of knowledge in CRM. “The [CRM Software
Company] owner is very convincing saying ‘we are the specialists, we will dictate what
the requirements are’; Insurance Software Company isn’t a CRM specialist”, cited the
developer of the CRM Software Company.
In particular, the E-mail Marketing System Company obtained the right to specify
the scope of the connector since it paid the CRM Software Company to develop it
(LE_CC_01). It demanded the CRM Software Company to follow a requirements docu‐
ment. “They had a document (with) how the connector should work; it was mainly a
connector that was placed (by us) into their requirements to get it working”, described
the CRM Software Company developer.
Although the CRM Software Company uses the platforms and app stores of partner
ecosystems to build and oﬀer the connectors, it owns the intellectual property of connec‐
tors. Thereby, it has the prerogative to control the evolution of connectors (LE_CA_03),
as argued by the CEO: “the plan is to phase it (E-Mail Marketing System Company
connector) out, because we want to bring down our portfolio; we have a lot of little
products and we want to focus on a couple of things”.
While the CRM Software Company deﬁnes how connectors are built and maintained,
the Data Integration Company demands that the ﬁnal version of the connectors go
through a certiﬁcation procedure (LE_CB_01). This prerogative stems from the Data
Integration Company ownership of the technology and marketplace used by the CRM
Software Company. It also results from the fact that connectors will be available in other
ﬁrms’ sales channels, carrying the mark of the Data Integration Company.
Similarly, the Financials Software Company controls the submission of partners’
extensions to the apps centre via a lengthy quality review (LE_CD_01). “They (partners)
submit the application referral, (which) is reviewed by market and tech departments;
we request them to do a demo (to) see how it works (and) publish (in the) apps centre”,
detailed the Financials Software Company CEO.
The Insurance Software Company imposes a code restriction to partners. It has the
right to deﬁne degrees of access to source code of its system (LE_CE_01). Hence, the
CRM Software Company only deals with the system via interfaces: “so far we have not
let partners into our code base; they can adjust or add codes; change parameterisation;
(but) not customise”, cited the Insurance Software Company CEO. It denotes the legit‐
imate power of the ﬁrm to manage system architecture in the ecosystem.

A Power Perspective on Software Ecosystem Partnerships

81

In Fig. 7, we represent the interplay of legitimate power forces among studied soft‐
ware companies. For instance, it shows that the CRM Software Company cannot apply
its power to deﬁne connector requirements (LE_CA_02) over the E-mail Marketing
System Company. This is a right of this partner due to his payment for the connector
(LE_CC_01). Therefore, the E-mail Marketing System Company supersedes the right
of the CRM Software Company to deﬁne development details.

Fig. 7. Legitimate power (LE) of keystones and CRM Software Company

5

Discussion

The previous section described the forms of power and respective sources used by the
companies in investigated partnerships. The representation of power enabled us to
understand their relationships from multiple views, such as knowledge recognition and
use (expert power), rights and roles deﬁnition (legitimate power), and beneﬁts sharing
(reward power). This analysis of what directs power conﬁguration is a relevant input
to propose eﬀective business strategies and deﬁne suitable governance mechanisms in
a software ecosystem environment. Our ﬁndings reveal that certain power capabilities
allow partnerships to ﬂourish. The keystones’ strategy of sharing decisions provides
niche players with the legitimate power to inﬂuence changes in product and platform
plans [1]. In turn, this strategy fosters innovation in the ecosystem. The keystone ﬁrms
could also nurture value creation by exercising a speciﬁc reward power: support part‐
ners with low implementation costs and even provide ﬁnancial resources [25]. In addi‐
tion, external players would have an important incentive to join and stay in the network.
By adopting this strategy, keystones would reinforce their role and power position in
the expanding software ecosystems.
The legitimate power of keystones and niche players to control the platform and
complementary products reﬂect their rights, as rules that govern partnerships. For
instance, keystones have the right to perform quality evaluations of extensions built by

82

G. Valença et al.

partners, who accept this rule due to their dependence on the ecosystem. However,
keystones must ensure that these standards and certiﬁcations do not decrease the produc‐
tivity of niche players. It is not the case of the Data Integration Company, whose legit‐
imate power to review partners’ connectors implies on a lengthy certiﬁcation process.
Similarly, the Insurance Software Company has the power to restrict the access to core
code of the insurance system, which constraints the development of complements and
may aﬀect ecosystem productivity [12].
If certain exercises of legitimate power may cause conﬂicting relationships, the use
of coercive power can compromise the success of the business model adopted in the
ecosystem. The coercive behaviour of a keystone may promote the migration of a niche
player to another network where he could obtain similar opportunities but with less
intimidating rules. The survival rate of participants then decreases, aﬀecting ecosystem
robustness [10]. We observed that the Insurance Software Company simply substitutes
a partner once the quality of his features may threaten the reputation of the ﬁrm and
hamper the development of its referent power. Instead of exercising such negative form
of power over partners, it should deﬁne mechanisms for them to properly develop
complements (e.g. open platform, with published interfaces, or integration services) and
guarantee their approval (e.g. quality requirements for a release).
Studied keystones also invest in the use of reward power. They provide a ﬁrm such
as the CRM Software Company with a partner certiﬁcation. This often involves the
promotion of partner products in online sales channels or via recommendations to
clients. Such capability provides partners with business prospects, increasing their
market share. It reveals a governance mechanism to attract and retain ﬁrms in the
network over time by oﬀering beneﬁts that are critical for small players. A speciﬁc
reward power of the Financials Software Company is a key asset for a software
ecosystem to thrive: it creates incentives for partners to close new deals via workshops
and business events for ecosystem members. This keystone increases the connections
among participants once it fosters networking. Thereby, it strengthens the structure of
the network, which contributes to the robustness of the ecosystem [10].
By oﬀering a technological platform, keystones enable third parties to fuel the
ecosystem with additional functionality. Through this strategy, partners can show their
expert power in the network. In particular, keystones such as the Data Integration
Company and E-mail Marketing System Company reinforce the expertise of niche
players by publishing successful cases about their complements. They promote the
knowledge available in the ecosystem and diﬀuse innovation among members [10].
Big players such as Microsoft naturally hold referent power. The strong business
reputation is a signiﬁcant power capability for a keystone as it creates a feeling of respect
and admiration for the ﬁrm. To develop their referent power, other keystones may high‐
light their growing position in the market as well as the attractiveness of their platforms
and economics of their network [11]. Such promotion strategies rely on external actors
recognising the status of both the ﬁrm and the ecosystem.
In Table 2, we provide an overview of our case study analysis. It presents the
different types of power and their observed outcomes in a software ecosystem.

A Power Perspective on Software Ecosystem Partnerships

83

Table 2. Forms of power used by companies and outcomes on the software ecosystems.
Power type Outcomes
Coercive
It fosters partners’ migration to other ecosystems due to tensions entailed by
negative attitudes of the keystone, which may also directly remove members from
community
Expert
It raises the trust of partners and external actors on the expert company. It enables
the ﬁrm to enter niche markets, build complements and create value for the
ecosystem
Legitimate It represents rules that guide partnerships, delineating ﬁrms’ rights and roles. Once
there is a keystone, this ﬁrm formally deﬁnes such rules, as governance
mechanisms
Referent
It provides the company with strong respect and reputation, increasing its ability
to attract external actors interested in gaining visibility and opportunities
Reward
It enables partners to fulﬁl each other’s business and ﬁnancial expectations. It helps
value co-creation in the ecosystem

6

Conclusion and Future Work

The in-depth analysis of power-dependence dyads is a useful lens for researchers to
explore ecosystem partnerships. From the view of practitioners, it is a valuable tool for
ﬁrms to have insights on how to aﬀect the resources ﬂow, obtain a higher position in the
ecosystem or manage the degree of dependence on competitors. By analysing power
distribution, keystones can also deﬁne governance strategies that enable them to use
existing power to eﬀectively manage the ecosystem [11].
Studied keystones foster ecosystem success by leveraging complementors: they exalt
partners’ expert power and exercise reward power by raising business in the network.
Besides, they often do not use the inﬂuence resulting from their status to apply coercive
power. They also avoid using power to take actions solely in their favour and get a bigger
slice of the pie, which could harm ecosystem performance.
The ﬁndings represent our interpretation of partners’ reality. To support credibility,
we used multiple sources of data and discussion of results among authors. Together with
details about data collection and analysis, these strategies also ensured reliability.
In future studies, we plan to (i) increase the number of participants per company (e.g.
obtain more input from technical staﬀ about power in partnerships) and (ii) verify our
ﬁndings in ﬁnal interviews (respondent validation). Our broad goal is to develop a
substantive theory to describe power exercise by software ecosystem partners.
Acknowledgement. We thank Yang Zherui Ryan for his constructive feedback on early versions
of this paper.

84

G. Valença et al.

References
1. Alves, C., Oliveira, J., Jansen, S.: Software ecosystem governance – a systematic literature
review and research agenda. Inf. Softw. Technol. (In submission)
2. Barbosa, O., Pereira, R., Alves, C., Werner, C., Jansen, S.: A systematic mapping study on
software ecosystems from a three-dimensional perspective. In: Jansen, S., Cusumano, M. A.,
Brinkkemper, S. (eds.) Software Ecosystems: Analyzing and Managing Business Networks
in the Software Industry, pp. 59–81. Edward Elgar Publishing (2013)
3. Campbell, P.R.J., Ahmed, F.: A three-dimensional view of software ecosystems. In: Fourth
European Conference on Software Architecture, pp. 81–84 (2010)
4. Casciaro, T., Piskorski, M.J.: Power imbalance, mutual dependence, and constraint
absorption: a closer look at resource dependence theory. Adm. Sci. Q. 50(2), 167–199 (2005)
5. Cruzes, D.S., Dybå, T.: Recommended steps for thematic synthesis in software engineering.
In: International Symposium on Empirical Software Engineering and Measurement, pp. 275–
284 (2011)
6. Dahl, R.A.: The concept of power. Behav. Sci. 2(3), 201–215 (1957)
7. Elias, S.: Fifty years of inﬂuence in the workplace: the evolution of the French and Raven
power taxonomy. J. Manage. Hist. 14(3), 267–283 (2008)
8. Emerson, R.M.: Power-dependence relations. Am. Sociol. Rev. 27(1), 31–41 (1962)
9. Hanssen, G., Dybå, T.: Theoretical foundations of software ecosystems. In: 4th International
Workshop on Software Ecosystems, pp. 6–17 (2012)
10. Hartigh, E., Tol, M., Visscher, W.: The health measurement of a business ecosystem. In:
Annual Meeting of European Chaos and Complexity in Org. Network, pp. 1–39 (2006)
11. Hurni, T., Huber, T.: The interplay of power and trust in platform ecosystems of the enterprise
application software industry. In: European Conference on Information Systems (2014)
12. Iansiti, M., Levien, R.: Strategy as ecology. Harvard Bus. Rev. 82(3), 68–81 (2004)
13. Jansen, S., Finkelstein, A., Brinkkemper, S.: A sense of community: a research agenda for
software ecosystems. In: 31st International Conference on Software Engineering, pp. 187–
190 (2009)
14. Jansen, S., Brinkkemper, S., Souer, J., Luinenburg, L.: Shades of gray: Opening up a software
producing organization with the open software enterprise model. J. Syst. Softw. 85(7), 1495–
1510 (2012)
15. Lacoste, S., Johnsen, R.E.: Supplier-customer relationships: a case study of power dynamics.
J. Purchasing Supply Manage. 21(4), 229–240 (2015)
16. Lawler, E.: Power processes in bargaining. Sociol. Q. 33(1), 17–34 (1992)
17. Manikas, K., Hansen, K.M.: Software ecosystems - A systematic literature review. J. Syst.
Softw. 86(5), 1294–1306 (2013)
18. Meehan, J., Wright, G.H.: The origins of power in buyer–seller relationships. Ind. Mark.
Manage. 41(4), 669–679 (2012)
19. Olsen, P.I., Prenkert, F., Hoholm, T., Harrison, D.: The dynamics of networked power in a
concentrated business network. J. Bus. Res. 67(12), 2579–2589 (2014)
20. Raven, B.H.: The bases of power: origins and recent developments. J. Soc. Issues 49(4), 227–
251 (1993)
21. Romano, N.C., Donovan, C., Chen, H., Nunamaker, J.F.: A methodology for analyzing webbased qualitative data. J. Manage. Inf. Syst. 19(4), 213–246 (2003)
22. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software
engineering. Empir. Softw. Eng. 14(2), 131–164 (2009)

A Power Perspective on Software Ecosystem Partnerships

85

23. Santos, R.P., Werner, C.: Treating business dimension in software ecosystems. In:
International Conference on Management of Emergent Digital EcoSystems, pp. 197–201
(2011)
24. Van Angeren, J., Kabbedijk, J., Jansen, S., Popp, K.M.: A survey of associate models used
within large software ecosystems. In: 3rd International Workshop on Software Ecosystems,
pp. 27–39 (2011)
25. Van Angeren, J., Alves, C., Jansen, S.: Can we ask you to collaborate? Analyzing app
developer relationships in commercial platform ecosystems. J. Syst. Softw. 113, 430–445
(2016)
26. Weiblen, T.: Opening up the business model: Business model innovation through
collaboration. Ph.D. Thesis. University of St. Gallen, Bamberg (2015)

No More Bosses?
A Multi-case Study on the Emerging
Use of Non-hierarchical Principles in Large-Scale
Software Development
Helena Holmström Olsson1(&) and Jan Bosch2
1

Department of Computer Science, Malmö University, Nordenskiöldsgatan 1,
211 19 Malmö, Sweden
helena.holmstrom.olsson@mah.se
2
Department of Computer Science and Engineering, Chalmers University
of Technology, Hörselgången 11, 412 96 Göteborg, Sweden
jan.bosch@chalmers.se

Abstract. Organizations are increasingly adopting alternative organizational
models to circumvent the challenges of traditional hierarchies. In these alternative models, organizations have leaders instead of the traditional boss and
teams operate using self-management and peer-to-peer advice processes.
Although the adoption of these models have primarily been seen in smaller
companies and startups, examples of long-established organizations that have
adopted these models to restructure themselves and move away from their traditionally slow hierarchies are starting to appear. In this paper, we explore how
seven large software-intensive companies in the embedded systems domain are
adopting principles of non-hierarchical organizations in order to increase
empowerment. Based on our empirical ﬁndings, we provide recommendations
for how to manage this transformation and we develop a model that outlines the
steps that companies typically take when transforming from hierarchical towards
empowered organizations.
Keywords: Empowered organizations
Self-management  Autonomy


Non-hierarchical

principles


1 Introduction
In most organizations, formal structures and hierarchical layers deﬁne tasks, roles and
behaviors [1, 2]. In such organizations, job descriptions outline the responsibility and
the authority that is associated with each person and each role and there is a clear
separation between different teams, departments and divisions in the organization.
Typically, hierarchical organizations have a top-down management structure and are
characterized by centralized decision-making and strategy [2]. While hierarchical
organizations beneﬁt from advantages related to e.g. efﬁciency and control, they suffer
from being slow and less responsive to rapidly changing market and customer needs. In
addition, they are often found less motivating and engaging for employees, as they tend
to restrict individual and team autonomy.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 86–101, 2016.
DOI: 10.1007/978-3-319-49094-6_6

No More Bosses?

87

Recently, a number of alternative organizational models have started to appear as a
response to the many challenges identiﬁed with traditional hierarchies, and to better
address the rapid changes in today’s market environment. Also, and as a major characteristic, these models recognize the increasing need for organizations to attract, retain
and advance top talent by supporting employees in adopting the critical new behaviors
necessary to keep pace. Therefore, and instead of hierarchies and managers as central
decision-makers, these models focus on self-management, autonomy and empowerment and rather than managers they advocate leaders with the role to support and guide
the organization. Based on a number of recent examples [2–7], it is argued that
companies adopting non-hierarchical principles to increase empowerment early will
signiﬁcantly improve their competitiveness.
However, although some examples exist, non-hierarchical principles and initiatives
to increase empowerment have been challenging to adopt in larger companies due to
existing structures and formal hierarchies. Typically, these organizations are large,
globally distributed and producing systems of high complexity that need to satisfy strict
certiﬁcation and regulatory demands. As a result, the organizational structures tend to
be highly hierarchical and centralized.
In this paper, we report on recent attempts to adopt non-hierarchical principles to
increase empowerment also in large organizations. We explore how seven large
software-intensive companies in the embedded systems domain are adopting principles
of non-hierarchical organizations in order to increase empowerment. In our study, we
provide qualitative empirical evidence on situations in which the organizations we
studied were able to by-pass formal hierarchies by adopting alternative ways-of-working
that allowed for outcomes that the traditional structures do not support.
The contribution of the paper is twofold. First, we present qualitative empirical
evidence on how large software-intensive companies are adopting non-hierarchical
principles. Based on our empirical ﬁndings, we present four areas in which the companies adopted alternative ways-of-working and how these allowed them to act more
rapidly and with increased engagement and autonomy by teams. In addition, we provide a number of recommendations that help facilitate the adoption of non-hierarchical
ways-of-working in large organizations. Second, we develop a model that identiﬁes the
steps that companies typically take when transforming from hierarchical towards
empowered organizations.
The continuation of the paper is organized as follows. In the background section,
we describe the characteristics, the strengths and the weaknesses of hierarchical
organizations as outlined in literature. Also, we identify and describe a number of new
organizational models that have recently emerged as a response to the weaknesses
experienced in hierarchical organizations. In Sect. 3, we present the case companies
involved in the study and we describe the research method that was chosen for the
study. In Sect. 4, we present the ﬁndings from the empirical study and we outline four
areas in which the case companies adopted alternative ways-of-working characterized y
non-hierarchical principles. In Sect. 5, we discuss our ﬁndings and we develop a model
that show the steps that companies typically take when moving towards empowered
organizations. Finally, in Sect. 6, we conclude the paper.

88

H.H. Olsson and J. Bosch

2 Background
2.1

Hierarchical Organizations

Traditionally, and as can be seen in most larger companies including the ones we
studied, organizational structures are formed around hierarchies [1, 2, 7–9]. In such
hierarchies, people and roles are typically deﬁned by job descriptions that outline the
responsibility and the authority that is associated with each person and role. While the
advantage of this is that each person in such an organization has a well-deﬁned task and
clear boundaries within which to operate, it comes with the drawback of having each
person doing exactly what they are asked for instead of being encouraged to explore
new interests and take on more challenging tasks. Also, job descriptions are often
imprecise as they are rarely updated. Therefore, the risk is that they reflect an outdated
and illusionary understanding of an organization rather than an accurate and valid one.
Moreover, hierarchical organizations have strong top-down management with delegation of authority being rare. Managers limit delegation of decision-making and even
when they do their decisions ultimately trump. As a result, the organizational structure
is mandated from the top and rarely revisited as that could reveal re-organization
opportunities that might threaten existing power structures. Finally, hierarchical
organizations are often characterized by implicit rules and “shadow beliefs” that slow
down change and favor people and assumptions that are well known and well established in the organization. In Table 1, we summarize the strengths and the weaknesses
that are typically associated with hierarchical organizations, and as described in previous literature [1, 2, 7–9].
Table 1. Strengths and weaknesses of hierarchical organizations.
Strengths:
• Effective for scaling
• Efﬁcient for controlling many people from a
central position
• Efﬁcient for repeatable tasks and replicable
processes
• Harmonization of processes
• Globalization
• Handles low complexity situations well

Weaknesses:
• Slow decision making processes
• Power driven by position rather than by
capability
• Tendency to be internally focused
• Easily gravitates to politics
• Highly resistant to changes
• Challenged by high-complexity
situations

As can be seen in the table, hierarchical organizations scale well, they are efﬁcient
for managing large groups of people and they support the execution of repeatable tasks
in large organizations even if geographically distributed. Also, they are regarded
superior for tasks and situations of low complexity as they advocate stability and rigor
rather then flexibility and elasticity. However, and as mentioned already in beginning of
this section, hierarchies suffer from a number of weaknesses that make them less suited
for certain business environments. They are considered to slow down decision-making
processes, as it is typically time consuming to coordinate within an hierarchy that
involve a number of layers of people and roles. Also, they easily gravitate towards

No More Bosses?

89

internal politics and they favor people and opinions that are already well established.
Finally, and as the main drawback for organizations that operate in turbulent and fast
changing business environments, they are less suited for highly complex situations
where there are no predeﬁned rules or known variables.
2.2

New Organizational Models

In response to the weaknesses that have been identiﬁed in relation to hierarchical
organizations, a number of new organizational models have emerged. Instead of managers, top-down decision-making and formal reporting structures these models
emphasize self-management, peer-to-peer advice processes and informal, temporary
structures that support the task or situation at hand. Examples of such organizational
models are e.g. agile development [3, 4, 10], holistic/authentic organizations [2],
holocracy [5], and exponential organizations [7]. Although different in some aspects,
they all emphasize principles of empowerment and self-management, wholeness and
evolutionary purpose. This means that they advocate organizations with coordination
mechanisms and natural leadership instead of managers with positional power, temporary hierarchies instead of formal ones and open and accepting cultures in which
people act according to whom they are instead of what is expected in order to “to try to
ﬁt in”. Also, all models emphasize the wisdom of the crowds instead of central
decision-making. Opposite to hierarchical organizations, these models encourage people to shoulder one or more roles, independent on place in the organization, and they
provide support for coordination of work between people. Throughout the organization,
everyone has the authority and autonomy to make decisions that pertain to their role or
roles. All relevant stakeholders need to be asked for advice before any decision is made,
but this is not to be confused with consensus. Any individual can decide to disregard the
advice given as long as all stakeholders that are affected by the decision have been asked
for their input. In such organizations, roles, activities and agreements evolve constantly
in mutual agreement and agreements are entered voluntarily.
In looking more closely at the concept of empowerment, empowered organizations
are described as having fewer levels of rank and hierarchy, making them flat, lean and
nimble [11]. Typically, decisions are made at every level within the organization
instead of handed down from the top as orders to be followed, and all employees are
encouraged to expand their expertise in various roles rather than climb a ladder of titles.
Empowered organizations are driven by teamwork [3, 4, 12]. Employees proven to be
capable are made responsible for making decisions that impact the company, and are
held accountable for the results of their decisions. In larger empowered organizations,
employees form teams to control various aspects of the organization. Employees may
move around to different teams over time, which can increases their expertise in various
roles and their value as employees overall. Instead of the traditional management role,
empowered organizations have ‘leaders’ that guide the direction of the company by
enabling employees to create, to take risks and to work interdependently, and as such
uncover the direction for the company as an organization as part of its evolving
purpose. Hence, a leader is less a person of authority and more a person of support.
It should be noted that organizational models that emphasize empowerment is not a
new phenomenon. Already in the early 1980’s, Henry Mintzberg presented ﬁve

90

H.H. Olsson and J. Bosch

organizational types for designing effective organizations and shows how most organizations are typically hybrids of two or more of these types [8, 9]. In his research,
Mintzberg focuses on collaboration mechanisms, power distribution and structural
issues and identiﬁes mutual adjustment as one of the coordination mechanisms that
allow individuals to coordinate their own work. The adhocracy type is an example of
an organizational moel focused on empowerment.
Table 2. Organizational models that emphasize empowerment, self-management and
autonomy.
Organizational model:
Agile development

Holistic/authentic
organizations

Holocracy

Exponential
organizations

Characteristics:
• Empowered teams
• Voluntary commitment
• Coordination through communication (daily standup meetings)
• Close and frequent customer collaboration (short development
sprints)
• Team mission is to do “right” by the customer
• Self-managed teams
• Wholeness as in “be yourself at work”
• Evolutionary purpose
• Organizations as increasingly driven by autonomous teams
• Peer-to-peer review and advice processes as the basis for
decision-making
• Roles are deﬁned around the work, not around people, and are
updated regularly
• People ﬁll several roles and are associated with several tasks
• Authority is distributed to teams and roles and decisions are made
locally
• The organizational structure is regularly updated via small and
frequent iterations and every team is self-managed
• Everyone is bound by the same rules – the CEO included. • These
rules are transparent and known by everyone in the organization
• Strives for nimbleness by accessing, renting or sharing of people
and assets rather than owning them (“staff on demand”)
• Engagement through digital reputation systems and incentive prizes
• Organizational metrics are transparent and visible to everyone in
the organization
• Emphasizes risk-taking and failure to learn rapidly (lean startup)
• Self-managed and multi-disciplinary teams and individuals that
operate with decentralized authority

In Table 2, we provide an overview of some of the new organizational models that
have recently gained increasing interest in management literature as well as among
practitioners in the ﬁeld [2–5, 7]. While we are aware that the models we outline are not
the complete set, and that there exist variations of each of these, we believe that the
ones we identify reflect the main characteristics and therefore, provide a solid background for understanding the fundamentals of the many new organizational models that
are currently gaining momentum.

No More Bosses?

91

While the organizational models identiﬁed above are indeed gaining momentum,
they have so far been challenging to apply in large and distributed organizations with
multiple teams working on a broad product portfolio. Today, most successful examples
originate in smaller companies or in startups where non-hierarchical principles and a
culture of empowerment can be more easily implemented from the start, and where
communication and coordination is by nature less complex. However, there are
examples of long-established organizations that have restructured themselves with great
success [2, 7, 11], and there are several aspects of these models that help larger organizations move away from their traditionally slow hierarchies towards more rapid and
competitive ways-of-working. In what follows, we report on seven companies in the
embedded systems domain that have managed to adopt principles of empowerment. As
can be seen in our research, they did so in relation to certain tasks and they were able to
accelerate performance by bypassing formal organizational structures. Below, we outline the case companies involved in our study and the research method that we choose.

3 Research Method
The research reported in this paper builds on on-going multi case study research [13,
14] in close collaboration with seven companies in the embedded systems domain. The
project was initiated in July 2015 and is on-going. In the sections below, we provide a
short description of each company and the roles that we met with so far, and we
describe the research design that was chosen.
3.1

Case Companies

See Table 3.
Table 3. The seven case companies and the roles involved in the study.
Company A

Company B

Company C

Company D

A provider of communication systems and equipment for mobile and ﬁxed
network operators. For the purpose of this study, we met with ﬁve people with
expertise in product and project management, change management, software
development and methods and tools.
Offers network cameras and camera applications for professional IP video
surveillance. For the purpose of this study, we met with four people with
expertise in software architecture, product and project management and
methods and tools.
A manufacturer and supplier of transport solutions for commercial use. For the
purpose of this study, we met with two people with expertise in technical lead
within software engineering and management.
An automotive telematics service provider providing manufacturers of cars
and commercial vehicles with complete and customized telematics services to
end-customers. For the purpose of this study, we met with two people with
expertise in product and project management and technical management and
connectivity.
(continued)

92

H.H. Olsson and J. Bosch
Table 3. (continued)

Company E

Company F

Company G

3.2

A software company specializing in navigational information, operations
management and optimization solutions for airlines. For the purpose of this
study, we met with three people with expertise in software architecture,
software development and product and project management.
A manufacturer producing pumps for heating, air conditioning and for water
supply. For the purpose of this study, we met with three people with expertise
in product and project management, solutions and services development and
emerging technologies and connectivity.
A company that serves the global market with products, services and solutions
from military defense to civil security. For the purpose of this study, we met
with two people with expertise in software development, architecture and
product and project management.

Case Study Design

The ﬁndings reported in this paper are based on longitudinal multi case study research
in seven software-intensive companies in the embedded systems domain. The research
project was initiated in July 2015 and is on going. So far, we have conducted a number
of workshops and focus groups in all companies. Also, we have developed and piloted
an on-line survey to be used in our future research.
In total, we have met with sixteen people representing the seven companies and we
have organized six cross-company workshops and two reporting workshops. At each
workshop, the researchers introduced a selected topic by giving a short presentation.
Examples of topics are emerging organizational models, empowerment and
self-management, non-hierarchical principles in software development and
non-hierarchical principles for business decisions. To prepare for each workshop, the
company representatives got a few questions related to the selected topic, and after the
introduction and presentation by the two researchers each company was asked to
present and share a few examples/cases in relation to the questions. In this way, the
workshops became a mix of research presentations, company presentations and group
discussions focusing on the selected questions and topics. During the workshops, one
of the researchers took notes to document the discussions and the answers to the
pre-deﬁned questions. Also, each company provided a presentation slide deck that they
shared with the researchers and that summarized the examples/cases they presented.
Finally, many illustrations were made on a white board during the discussions and all
these were captured as input to the empirical data collection process.
During analysis, the workshop notes, the company presentations and the graphical
illustrations were used as the basis for interpretation and coding of the data. The data
was coded and analyzed following the conventional qualitative content analysis
approach [14] where we derived the codes directly from the text data. As soon as any
questions or potential misunderstandings occurred, we veriﬁed the information with the
representatives from the companies.
To strengthen the construct validity of our study [15], we continuously validated
our interpretations and ﬁndings with the company representatives. Also, we combined

No More Bosses?

93

data from different sources and we were two researchers who independently could
assess the data, as well as together discuss our interpretations. The results of the study
cannot directly translate to other companies. However, and considering the external
validity of our study, the case companies represent current state-of-practice of
large-scale software development and therefore, we believe that the results we present
are valid also for other large-scale software development companies.

4 Findings
In this section, we present our empirical ﬁndings. Our ﬁndings build on group discussions and company presentations from the cross-company workshops that we
organized for the purpose of this research. First, we provide an overall description of
the companies and the business domain in which they operate. This description serves
as a basis for understanding their current ways-of-working and the type of products and
systems they provide. Although the companies operate in different domains and
therefore embrace different characteristics, they have a number of similarities in relation to the way they are organized, the organizational principles they apply and the
restrictions and regulations they face.
Second, we present examples of situations in which the companies have moved
away from their hierarchical and formal ways-of-working and where they have instead
adopted non-hierarchical ways-of-working to achieve outcomes that were considered
difﬁcult to achieve within the traditional organizational structures. These examples
reflect attempts to increase empowerment and to adopt principles that emphasize
informal rather than formal structures and autonomous rather than hierarchical
ways-of-working.
4.1

Case Company Contexts and Current Ways-of-Working

The seven companies involved in this study are all large companies within the
embedded systems domain. This domain has gone through signiﬁcant changes over the
last decade. Whereas this domain was originally driven by mechanics and hardware,
now software makes up the key differentiation in virtually any industry, ranging from
telecommunications to automotive and from aeronautics to defense. For instance, some
reports claim that modern high-end cars have up to 100 million lines of code [16]. In
addition, embedded systems typically have signiﬁcant security and safety requirements
that limit the freedom that these companies have to adopt continuous deployment and
other modern software development technologies. Finally, as embedded systems often
are highly interconnected in order to deliver the desired product functionality, the
complexity of the system is not only due to size but also due to the high connectivity
between different parts of the system.
The characteristics of software-intensive embedded systems have signiﬁcant
implications on the organizational approaches that these companies typically employ.
As these organizations are large, globally distributed and have to satisfy certiﬁcation and
regulatory demands, the organizational structure tends to be highly hierarchical and

94

H.H. Olsson and J. Bosch

centralized. Over the last decades, this structure has evolved as the most effective to
deliver large and complex systems while satisfying the requirements of all stakeholders.
However, with the emergence of modern collaboration technologies, automated ways of
ensuring regulatory and certiﬁcation requirements as well as the increasing need for
continuous deployment for differentiation purposes, also embedded systems companies
are now experiencing the limits of hierarchical organizations and have started to
experiment with principles reflecting self-management and empowerment.
4.2

Adoption of Non-hierarchical and Empowered Ways-of-Working

Below, we present examples that illustrate situations in which the companies moved
away from their formal and typically hierarchical ways-of-working and where they
instead adopted non-hierarchical ways-of-working reflecting empowerment. We present our ﬁndings by categorizing them into four areas within which all companies
report on new ways-of-working. We summarize our ﬁndings in Table 4.
Table 4. Summary of the four areas in which the case companies adopted alternative
ways-of-working.

Area:
New
strategic
focus
areas/innovations
Customer/research
collaborations

Adoption of new
development
practices
Competence
development
new core areas

in

Principles of empowerment:
• Self-managed teams
• Voluntarily engagement
• Advice and peer-to-peer processes
• Formal organization as a “guiding star”
• Purpose and vision-driven initiatives
• Direct customer communication
• Exploratory experimentation
• Voluntarily engagement
• Autonomous teams that run in parallel with the
formal organization
• Rapid feedback cycles
• Bottom-up adoption of new practices
• Cross-functional teams
• Teams decide on methods and tools
• Short iterations with a “build-test-refine” approach

Area 1: New Strategic Focus Areas/Innovations
In all case companies, strategic focus areas are typically put together and presented by
top management. Often, they consist of a list of ideas that are identiﬁed as critical for
the coming years. These ideas are then shared with the organization to be evaluated and
“tested” on people in order to understand whether they are indeed the right things to
focus on. Common examples of focus areas in the companies are continuous integration, continuous deployment and development and adoption of new technologies etc.

No More Bosses?

95

Typically, strategic focus areas involve technical development and new technology as
well as adoption of new ways-of-working and process initiatives. As a common
characteristic, they don’t involve speciﬁc funding but rather they have synergies that
integrate well with the existing formal organization. When it comes to involving in
such an initiative the companies use a self-management approach in that they don’t
assign the traditional roles. Instead, role assignment is flexible and people choose in
what initiative they want to involve. No speciﬁc requirements are deﬁned for the
involvement. Instead, people engage based on their own interest and in order to beneﬁt
from synergies in relation to their formally assigned tasks. When starting a strategic
focus area or innovation initiative, the companies identify key people that can work as
ambassadors and that can help drive the initiative in the different units. By having a
good mapping and alignment between the formal organization and the new initiatives
commitment from key people is ensured. Also, the mapping and alignment is critical
for securing an impact on the organization as a whole. From a management perspective,
only brief guidelines are provided in terms of ‘how’ to address and solve a speciﬁc
challenge in order not to restrict creativity in the units.
During the initiative, those involved meet with, and report to, the formal organization on a regular basis. However, the formal reporting structure is only used as a
“guiding star” that helps setting direction and steer the initiative towards a goal. It is not
used for guiding ways-of-working or detailing what methods and processes to use. In
all case companies, people report on a wide variety of methods and processes and the
opportunity to more flexibly adapt to the task at hand. Throughout the initiative, an
advice process is used to invite the larger organization to workshops and presentation
sessions where they get the opportunity to provide feedback on the initiative and the
focus and direction of the work. In this way, the team gets ideas and comments that can
be used to structure the work and to understand the direction in which the larger
organization is aiming in the coming years.
Area 2: Customer/Research Collaborations
A common area in which principles of non-hierarchical and autonomous
ways-of-working are seen in all companies, are customer and research collaborations.
The case companies report on examples where individual engineers shortcuts the
formal hierarchies and instead initiates direct customer contact in order to accelerate
and achieve a certain outcome. Typically, this happens when there is a purpose and
vision-driven initiative rather than requirements-driven initiatives. The purpose and
vision driven initiatives are more open in scope, they often involve stakeholders that
the companies don’t typically collaborate with and they require alternative
ways-of-working to be successful. In such initiatives, the time people put in is “synergy
funded” which means that they don’t get time allocated. Instead, the problem they
solve and the effort they put into this generate advantages and beneﬁts in other tasks
and in their formal assignments.
In some of the companies, these initiatives are used, as examples for customers in
order to show them that the organization is capable of doing new things in new ways and
things that might not be recognized in the formal organization and its processes. As such,
innovative customer collaborations are used to show other customers that the organization manages to explore new opportunities by using alternative ways-of-working at the

96

H.H. Olsson and J. Bosch

same time as it is able to fulﬁll the requirements in relation to the traditional tasks.
Internally, the companies ﬁnd the opportunity for people to work on things where there
are no formal requirements and “no right way” very motivating. In similar, the companies use alternative ways-of-working in collaborations with different research teams.
The companies in our study are all involved in several larger research collaborations in
which different academic partners are involved. In such collaborations people involve
when they ﬁnd it interesting and if they ﬁnd it beneﬁcial to their work. In some cases, the
companies agree more formally on time and resources, but often there is flexibility and
people decide themselves. The goal is to create a situation where people can choose to be
part of projects that they pick themselves and where they feel they get support by the
larger organization without having to put the traditional structures and processes in place.
Area 3: Adoption of New Development Practices
All the case companies are large and successful software-intensive companies in the
embedded systems domain. If not market leading, they have strong positions in their
business ecosystems and they manage to stay competitive despite ﬁerce business
environments and disruptive technologies. One of the reasons for this is the ambition to
accelerate the adoption of new development practices. All companies have continuous
integration practices in place, and some of the companies are implementing continuous
deployment. In order to drive the adoption of new development practices, several of the
companies report on the potential to have speciﬁc and “non-traditional” initiatives as a
means to accelerate the process. As an example, one of the companies started a project
with the aim to re-architect the platform to reduce its complexity. But rather than
viewing this as the only goal, the responsible team recognized the potential to also have
this project help drive continuous deployment in the organization. What typically
happens, and as experienced in several of the companies, is that an initiative that aims
to improve product and process performance often beneﬁt from alternative
ways-of-working where the responsible team can operate outside, or in parallel, with
the formal organizational structure to increase speed. In the case of the platform initiative mentioned above, it became a larger initiative that aligned well with the organizational goals and that added beneﬁts to a number of roles and units, but that was run
in parallel with the formal organization and outside the traditional unit boundaries. This
allows for cross-functional and innovative ways-of-working where teams can operate
more autonomously and where they can drive the process. As recognized in one of the
companies, this reflects a bottom-up approach to process improvement and as a result
the new practices are better manifested and aligned with the organization in which they
are intended to be used.
Area 4: Competence Development in New Core Areas
The majority of the case companies use alternative ways-of-working when learning
skills that are outside their core practices. For example, one of the companies in the
automotive domain is experiencing a strong need to catch up when it comes to software
development as they are traditionally a mechanic company. To do so, they apply more
flexible ways-of-working that allow for cross-functional teams to work together and
they let the teams decide what methods and what tools they want to use when practicing the new skills. Still, the teams interface with the formal organization by using
standardized exchange formats, but these are meant to facilitate knowledge sharing and

No More Bosses?

97

not to restrict team creativity. In this area, the other companies describe similar initiatives where they are able to have teams explore new ways-of-working while at the
same time contribute to the larger goals and initiatives of the formal organization. Since
the focus is on competence development for areas that are regarded important for future
development, they are prioritized by the organization and the teams that choose to work
on these enjoy the opportunity of being more flexible and explorative. In most cases,
the teams work in a “build-measure-learn” fashion [17] instead of the traditional
“specify-understand-agree-build-test” model, and some of the companies explore agile
contracting as part of the initiatives.

5 Discussion
In this section, we discuss and summarize our ﬁndings and we present a number of
recommendations that facilitate the adoption of empowerment in large organizations. In
addition, we present a model in which we outline the transformation towards
empowered organizations by identifying the different steps that companies typically
take when moving from traditional towards more empowered ways-of-working.
5.1

Towards Empowerment in Large-Scale Software Development

Traditionally, and as recognized in previous research, hierarchical organizations have
severe challenges meeting rapidly changing markets and customer needs [2, 5, 7]. Due
to formal structures and communication and coordination overhead between the different organizational layers hierarchical organizations often struggle to innovate. As a
result, these organizations run the risk of being disrupted by competitors that are able to
act nimble and adopt more effective ways-of-working. This was recognized already in
the beginning of the 1980’s when a number of organizational structures were presented
in order to increase the understanding for the many different ways in which organizations operate [1, 8, 9]. Also, the introduction of the agile development methods
signiﬁcantly increased the emphasis on flexibility and speed as organizational principles [10, 12]. More recently, and in response to the challenges that hierarchical
organizations face, a number of alternative organizational models have started to
emerge. Typically, these models originate from successful startups, from online
companies or from smaller companies that have managed to grow based on highly
engaged and skilled teams that operate independently and that spend little, if any, time
navigating through administrational, managerial and political layers [11]. While traditional hierarchies support repetitive tasks, control of large groups of people and
scaling of processes, these alternative models emphasize empowerment and
self-management and they seek to facilitate autonomy to increase speed and responsiveness throughout the organization.
Although very attractive, the adoption of alternative organizational models in
large-scale companies is challenging and so far examples of such adoption are few.
However, and as reported in this paper, attempts to facilitate empowerment are
emerging also in larger organizations and we are starting to discern situations in which

98

H.H. Olsson and J. Bosch

also these companies move away from their traditionally slow hierarchies towards more
empowered ways-of-working. In our study, we see how the companies have adopted
principles that helped them increase flexibility and responsiveness, and that encouraged
individuals and teams to take responsibility, to accelerate innovation and to improve
creativity. Based on our empirical research and the lessons learnt in the case companies, we summarize our ﬁndings by presenting a number of recommendations. The
recommendations reflect considerations that should be taken to facilitate the adoption
of empowerment in large software-intensive organizations.
• Recommendation 1: Non-hierarchical ways-of-working and initiatives that challenge traditional organizational structures need to align well with the overall
business goals in order to have an impact and to attract long-term commitment of
key people in the organization.
• Recommendation 2: Non-hierarchical ways-of-working and initiatives that
emphasize informal and emergent structures need to align with, contribute to and
help accelerate already existing formal and approved structures.
• Recommendation 3: Non-hierarchical ways-of-working and initiatives that challenge the traditional organizational structures should aim at strengthening customer
relationships and collaborations by shortening feedback loops and allowing rapid
communication and coordination channels.
• Recommendation 4: In moving towards non-hierarchical and autonomous waysof-working, large organizations need to balance bottom-up autonomy and nonhierarchical problem solving with top-down strategy and central decision-making.
• Recommendation 5: In moving towards non-hierarchical and autonomous
ways-of-working, large organizations need to balance long-term and strategic
decision-making made centrally within existing management, with short-term and
tactic implementations made locally within self-organizing teams.
5.2

Transforming Towards Empowered Organizations

During our research, and supported by previous research, we see that companies evolve
their software development practices over time [17]. Similarly, companies evolve their
organizational practices and their ways-of-working. Typically, and in relation to the
focus of this paper, there is a pattern that companies typically follow as they evolve and
as they transform towards becoming empowered organizations. We illustrate this
evolution in Fig. 1 by outlining the steps from traditional organizations characterized
by formal hierarchies to empowered organizations characterized by decentralized
decision-making and leaders instead of managers.
To further detail our model, and to recognize the different organizational levels at
which principles of empowerment can be adopted, we outline each step in relation to
four organizational functions that are found in most organizations (Table 5). We refer to
these levels as ‘local’ as principles characterizing the culture between multiple R&D
teams, ‘inter-team’ as in principles characterizing the culture between departments such
as e.g. R&D and Product Mgmt. (PdM), ‘General Mgmt.’ as in principles characterizing
the culture between high-level divisions and, ﬁnally, ‘Culture’ as in principles characterizing the culture in the entire company.

No More Bosses?

99

Fig. 1. From traditional towards empowered organizations.
Table 5. Summary of each step in the model (Traditional – Empowered), and how different
organizational functions operate at each of these steps.

Finally, it should be noted that although the principles of empowered organizations
are both attractive and rewarding, the adoption of these depend on the type and mission
of a company. In organizations that operate in domains that are characterized by highly
standardized and repetitive ways of doing things, a hierarchical organizational model is
typically the best form. In such organizations, the short-term can be to stay hierarchical
while the long-term goal should probably be to automate manual practices rather than
focusing on the transformation towards more empowered ways-of-working.

6 Conclusion
In this paper, we explore how seven large software-intensive companies in the
embedded systems domain are adopting principles of non-hierarchical organizations in
order to increase team autonomy. Typically, such principles are seen in smaller companies and in startups but so far they have been challenging to adopt in larger companies due to formal structures and existing hierarchies. However, and as reported in
this paper, attempts to facilitate empowerment and self-management of teams are
rapidly emerging also in larger organizations. In our study, we provide empirical
evidence on situations in which the organizations we studied were able to accelerate
innovation and by-pass formal hierarchies by adopting alternative ways-of-working

100

H.H. Olsson and J. Bosch

that allow outcomes that the traditional structures do not support. Also, we show how
non-hierarchical principles help large-scale software development companies to act
more rapidly and with increased engagement and autonomy by teams. We provide
recommendations for how to facilitate the adoption of non-hierarchical and autonomous ways-of-working and we develop a model that outlines the typical evolution path
when transforming towards autonomous organizations.
While our research is limited to seven software-intensive companies in the embedded
systems domain we have good reasons to believe that the challenges they face, and the
opportunities they currently explore, are valid also for other similar companies. As any
hierarchical organization has severe challenges in meeting rapidly changing markets and
customer needs, and as alternative organizational models are appearing to address these
challenges, we foresee a future in which companies that manage to adopt this paradigm
shift early will signiﬁcantly improve their competitiveness.

References
1. Malone, T.W.: Modeling coordination in organizations and markets. Manage. Sci. 33(10),
1317–1332 (1987)
2. Laloux, F.: Reinventing organizations. Nelson Parker, Brussels (2014)
3. Larman, C., Vodde, B.: Practices for Scaling Lean & Agile Development: Large, Multisite,
and Offshore Product Development with Large-Scale Scrum. Addison-Wesley, Upper
Saddle River (2010)
4. Lefﬁngwell, D.: Scaling Software Agility: Best Practices for Large Enterprises.
Addison-Wesley, Upper Saddle River (2007)
5. Robertson, B.J.: Holacracy: The New Management System for a Rapidly Changing World.
Henry Holt and Company, New York (2015)
6. Endenburg, G.: Sociocracy. The Organization of Decision-Making ‘No Objection’ as the
Principle of Sociocracy. Eburon, Delft (1998)
7. Ismail, S., Malone, M.S., Van Guest, Y.: Exponential Organizations: Why New Organizations are Ten Times Better, Faster, and Cheaper than Yours (and What to Do About It).
Diversion Books, New York (2014)
8. Mintzberg, H.: Structure in ﬁves: a synthesis of the research on organisation design. Manage.
Sci. 26(3), 322–341 (1980)
9. Mintzberg, H.: Designing Effective Organizations. Prentice-Hall, New Jersey (1983)
10. Eckstein, J.: Agile Software Development in the Large: Diving into the Deep. Dorset House,
New York (2004)
11. http://smallbusiness.chron.com/structure-empowered-organization-44194.html. Accessed 23
June 2016
12. Highsmith, J., Cockburn, A.: Agile software development: the business of innovation. In:
Software Management, pp. 120–122 (2001)
13. Yin, R.K.: Case study research. Design and Methods, 3rd edn. Sage, London (2003)
14. Maxwell, J.A.: Qualitative Research Design: An Interactive Approach, 2nd edn. SAGE
Publications, Thousands Oaks (2005)
15. Adler, P.A., Adler, P.: Observational Techniques. In: Denzin, N.K., Lincoln, Y. (eds.)
Handbook of Qualitative Research, pp. 377–393. Sage, Thousand Oaks (1994)

No More Bosses?

101

16. Ebert, C., Jones, C.: Embedded software: facts, ﬁgures, and future. Computer 42(4), 42–52
(2009)
17. Ries, E.: The Lean Startup: How Constant Innovation Creates Radically Successful
Businesses. Penguin Group, London (2011)
18. Olsson, H.H., Alahyari, H., Bosch, J.: Climbing the “Stairway to Heaven”: a multiple-case
study exploring barriers in the transition from agile development towards continuous
deployment of software. In: Proceedings of the 38th Euromicro Conference on Software
Engineering and Advanced Applications, Cesme, Izmir, Turkey, 5–7 September 2012

Supporting Management of Hybrid OSS
Communnities - A Stakeholder
Analysis Approach
Hanna Mäenpää1(B) , Tero Kojo2 , Myriam Munezero1 , Fabian Fagerholm1 ,
Terhi Kilamo3 , Mikko Nurminen3 , and Tomi Männistö1
1

University of Helsinki, Helsinki, Finland
{hanna.maenpaa,myriam.munezero,fabian.fagerholm,
tomi.mannisto}@cs.helsinki.fi
2
The Qt Company, Espoo, Finland
tero.kojo@qt.io
3
Tampere Technical University, Tampere, Finland
{terhi.kilamo,mikko.nurminen}@tut.fi

Abstract. In Hybrid Open Source Software projects, independent and
commercially oriented stakeholders collaborate using freely accessible
tools and development processes. Here, contributors can enter and leave
the community ﬂexibly, which poses a challenge for community managers
in ensuring the sustainability of the community. This short paper reports
initial results from an industrial case study of the “Qt” Open Source Software project. We present a visual stakeholder analysis approach, building
on data from the three systems that provide for the Qt project’s complete software development workﬂow. This overview, augmented with
information about the stakeholders’ organizational aﬃliations, proved to
help the project’s community manager in ﬁnding potential for encouraging contributors and to identify issues that can potentially be detrimental
for the community.

Keywords: Hybrid open source
Stakeholder identiﬁcation

1

·

Community management

·

Introduction

Using Open Source Software (OSS) has proven eﬀective for companies as it
removes many barriers to code reuse and modiﬁcation. By collaborating with
OSS communities, companies can beneﬁt from the knowledge and work of external developers and ﬂexibly integrate their input to the company’s internal development processes. OSS communities provide a low cost means for testing the
quality of products which reduces the time to market and cost of software releases
[5,12]. In hybrid OSS projects a mix of independent and commercially oriented
stakeholders build software collaboratively. While independent developers are
inspired by intrinsic reasons such as improving their skills or supporting an
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 102–108, 2016.
DOI: 10.1007/978-3-319-49094-6 7

Supporting Management of Hybrid OSS Communnities

103

ideology [6,11], companies focus on achieving business goals. In some hybrid
communities (including our case community, Qt) a community manager works
to ensure that the community is healthy and heading in a sustainable direction.
As the scope of OSS projects is typically large, knowledge about stakeholders’
needs has to be synthesized manually from various sources. This encumbers the
work of the community manager signiﬁcantly.
Previous attempts to identify stakeholders and their roles have focused on
single data sources such as code repositories [7,10], bug databases [3] and mailing lists [2,8]. Fewer address the complete software development process from
requirements engineering to delivery of the code. The contribution of this short
paper is to report experiences of the latter strategy from the practical viewpoint
of a community manager’s work.

2

Case Qt

Qt is an open source software that is used to create platform-independent applications for Android, iOS and Windows operating system environments. Its users
include independent application developers and companies from a wide variety of industries including electronics, automotives, defense and media. The Qt
software’s hybrid licensing model oﬀers both an Open Source1 and commercial
options [4]. The chargeable licenses allow for making applications proprietary,
to access new software components and to receive varying levels of support from
hosting organization: The Qt Company. Development of the Qt software is driven
by a versatile, open community for which the company provides software development tools. The company holds decision power for timing and packaging of
releases and employs an internal software development unit to strengthen the
community driven development process. To advance interoperability and dissemination of the software, the company hosts a partner consortium. The community
manager describes dynamics of the ecosystem as:
“Adoption of the Qt software by developers leads to increased participation in
community activities. This disseminates knowledge about the software, as information is shared among developers, leading to a stronger, healthier community.
This translates into improvements in the product itself. From a commercial perspective, an active community presents more potential to existing stakeholders
and builds attractiveness of the product and its surrounding business ecosystem.”
He continues:
“A high-performing community member is very active on multiple platforms
or extremely active on one of those. Some contributors shine out of the usage
statistics of distinct tools but without looking at multiple platforms the best do
not stand out.”
1

GPL, LGPL v2.1, LGPL v3.

104

H. Mäenpää et al.

The manager hopes to get a data-centric, visual overview of all the stakeholders who develop the software. With this, he hopes to streamline his work
process and to identify individuals that are entering the community, have potential for increasing contributions and those that have previously been active, but
are fading away.

3

Research Approach

This study highlights relevant issues for community management in its real context [9,13]. By using data from tools that provide for the project’s complete
development workﬂow, we hoped to reveal those stakeholders that are active in
multiple roles. A random three-month period was chosen because it represents a
typical release span of the software. A sample was extracted from the task management system Jira, code review tool Gerrit and source code repository Git in
June 2015, focusing on the currently latest release of the software (version 5.5).
We yielded records of 248 unique work requests, 2194 code review actions
and 3363 increments to the software code. As the data sources were separate,
cross-sample matching was required to eliminate false identities that were due
to variations in spellings and the several email addresses persons had used in
registering to the systems. 284 unique stakeholders from 139 organizational aﬃliations were discovered and network visualizations with force-directed layout [1]
were created in July 2015 using the Gephi tool. Three semi-structured interviews with the community manager followed within the timespan of 6 months to
evaluate beneﬁts and deﬁciencies of the approach.

4

Results

A developer could have: (A) created a work task, (B) being assigned to one,
(C) performed a code review or (D) authored a code increment. 12 diﬀerent
combinations of these roles (referred to later as “activity groups”) were found.
Figure 1 overviews the 284 individuals who are represented as small nodes while
the large central nodes (A,B,C,D) represent types of the activities they had
performed. Table 1 displays sizes of the activity groups.
A total of 139 aﬃliations of developers were grouped into four groups: employees of the Qt Company2 , members of its partner consortium3 , employees of
commercial companies4 and independent individuals5 . Shares of these aﬃliation types in each activity group are presented in Fig. 2, highlighting the host
company’s inﬂuence on diﬀerent stages of the workﬂow.

2
3
4
5

theqtcompany.com, digia.com, qt-project.org.
kde.org, kdab.org, redhat.org.
jolla.com, mrisoftware.com, basyscom.com.
gmail.com, yeandex.ru, hotmail.com.

Supporting Management of Hybrid OSS Communnities

105

Fig. 1. Developers associated with 1–2 activities (left) and 3–4 activities (right). (A)
Creating a work task (B) being assigned for work (C) performing a code review (D)
authoring a code increment.

Fig. 2. Distribution of aﬃliations in each activity combination group. Column height
is related to group size.

4.1

The Manager’s Viewpoint

The community manager (Later, CM) was provided with interactive network
visualizations where names of developers were distinguishable. At the same time
Fig. 2 provided shares of diﬀerent aﬃliations among each of the activity groups.
The following subsections summarize the observations.
Company Involvement. One of the beneﬁts of the OSS development approach is that by combining expertise and knowledge of versatile stakeholders, a

106

H. Mäenpää et al.

Table 1. Number of developers that have performed in an activity or a combination
of activities. # = number of individuals.
Description

#

%

Person that has performed one activity N = 185 (65 %)
A

Created work task

50

18 %

B

Assigned for work

6

2%

C

Performed a code review

8

D

Authored code increment

121

3%
43 %

Two activities N = 39 (14 %)
A,B

Created, Assigned

4

1%

A,C

Created, Reviewed

0

0%

A,D

Created, Authored

14

5%

B,C

Assigned, Reviewed

0

0%

B,D

Assigned, Authored

6

2%

C,D

Reviewed, Authored

15

5%

0

0%

Three activities N = 35 (23 %)
A,B,C

Created, Assigned, Reviewed

A,B,D

Created, Assigned, Authored

5

2%

A,C,D

Created, Reviewed, Authored

7

2%

C,B,D

Assigned, Reviewed, Authored

23

8%

A,B,C,D Created, Assigned, Reviewed, Authored 25

9%

Four activities N = 25 (9 %)
Total of 284 100 %

better quality of software products can be expected. The CM was delighted to
see that a mix of company external stakeholders dominated both reporting new
work issues (A) and delivering source code increments (D). As groups (ABCD),
(BCD,ABD,ACD) and (CD) consisted of the developers who actively run the
project, the CM identiﬁed that having stakeholders from the partner organizations complemented the host company’s dominance.
Potential for Growth. Acquiring new contributors and increasing the involvement of existing ones are essential for ensuring sustainability of OSS communities. The CM found value in seeing names of individuals in the visualization, as
contacting people for rewarding or encouragement could then be a next step. He
also saw the potential for identifying and following entry paths of new developers.
As reporting defects (A) is often the ﬁrst thing a person can do, combinations
(AD) and (ABD) could reﬂect the increasing or decreasing of their involvement.
As the combination groups were small compared to (A), a need for encouraging the current stakeholders was identiﬁed. This was also the case for (D) as

Supporting Management of Hybrid OSS Communnities

107

delegating work for others while working on the source code (AD) would be
desired.
Code reviewers (C) are the most fundamentally integrated to the community
since this task requires deep expertise, which is gained through personal learning
and peer-to-peer mentoring. 55 % of developers that performed code reviews with
any other combination were employees of the host company. Increasing the share
of active code reviewers from outside the host company was thus desired.
Alarm Signals. A healthy hybrid OSS community consists of both commercial
and non-aligned stakeholders. If the overall number of independent developers in
any activity group would suddenly drop, a conﬂict between community-driven
and commercial motivations would be evident. Here, a rapid investigation into
the reasons and planning an intervention would be required. Secondly, some
activity groups reﬂect the way the software development process is organized
and how it is performing. The lack of group (BD) was mentioned as indicative of
the Gerrit tools ﬁtness to support coordination of the code review process. Also,
if the number of people performing only work coordination (AB) would grow, a
management overhead would be evident.

5

Limitations

External validity of the empirical evaluation is compromised by the singlecase study design and involvement of only one interviewee. However, as hybrid
OSS projects typically form around business-driven, platform-like products, the
ecosystem itself can be considered representative. The limited time span of the
sampling does not allow continuous analysis, which is particularly interesting for
understanding development of the community. However, we hope the results give
a way for future research on community management in hybrid environments.

6

Conclusion

This single case study explored ways of supporting management of hybrid OSS
communities. Based on an initial visualization approach, several indicators that
contribute to the health of the community could be observed by overviewing
stakeholders that actively participate in tasks related to work coordination, code
review and code delivery of the project. Based on these, the community manager was able to pinpoint individuals whose contributions to the project could be
encouraged and to contemplate how the eﬀectiveness of these actions could be
evaluated. Knowledge of stakeholders’ aﬃliations helped in evaluating the inﬂuence of diﬀerent organizations on the development process. The manager also
reported increased awareness of phenomena that might be indicative of problems
in the organization of the community-driven software development process.
As future work, we have identiﬁed several places of improvement. These
include using a longer time span for sampling, which would reveal the the in-

108

H. Mäenpää et al.

and outﬂow of developers and help to identify those who are the most dedicated
to the community. Highlighting the frequency of contribution would also add
to the value of the visualization and to help identify high performers on single
platforms. Finally, combining the approach with software process metrics, e.g.
throughput of work requests can bring new, meaningful insight on the phenomena that prevail in hybrid OSS environments.
Acknowledgments. This research is funded by DIMECC’s Need4Speed program
(http://www.n4s.ﬁ/) and the Finnish Funding Agency for Innovation Tekes (http://
www.tekes.ﬁ/en/tekes/).

References
1. Adamic, L.A., Baeza-Yates, R.A., Counts, S. (eds.): Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain,
July 17–21, 2011. The AAAI Press (2011)
2. Concas, G., Lisci, M., Pinna, S., Porruvecchio, G., Uras, S.: Analysing the social
networks constituted by open source communities. In: International Electronic
Conference on Computer Science, vol. 1060, pp. 147–150 (2008)
3. Crowston, K., Howison, J.: Hierarchy and centralization in free and open source
software team communications. Knowl. Technol. Policy 18(4), 65–85 (2006)
4. Digia annual report (2014). http://vuosikertomus2014.digia.com/ﬁlebank/
132-Digia-Annual-Report-2014.pdf. Accessed 08 Sep 2015
5. Free software deﬁnition (2015). http://www.gnu.org/philosophy/free-sw.html.
Accessed 18 Sep 2015
6. Lakhani, K., Wolf, R.G.: Why hackers do what they do: understanding motivation
and eﬀort in free/open source software projects. In: MIT Sloan Working Paper No.
4425–03 (2003)
7. Long, Y., Siau, K.: Social network structures in open source software development
teams. J. Database Manage. 18(2), 25–40 (2007)
8. Oezbek, C., Prechelt, L., Thiel, F.: The onion has cancer: some social network
analysis visualizations of open source project communication. In: Proceedings of
the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source
Software Research and Development (FLOSS 2010), pp. 5–10 (2010)
9. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research
in software engineering. Empirical Softw. Eng. 14(2), 131–164 (2009)
10. Shen, C., Monge, P.: Who connects with whom? A social network analysis of
an online open source software community. First Monday 16(6) (2011). http://
ﬁrstmonday.org/ojs/index.php/fm/article/view/3551
11. Von Krogh, G., Haeﬂiger, S., Spaeth, S., Wallin, M.W.: Carrots and rainbows:
motivation and social practice in open source software development. MIS Q. 36(2),
649–676 (2012)
12. Watson, R.T., Boudreau, M.-C., York, P.T., Greiner, M.E., Wynn Jr., D.: The
business of open source. Commun. ACM 51(4), 41–46 (2008)
13. Yin, R.K.: Case Study Research: Design and Methods. Sage publications, Thousand Oaks (2014)

Architecture

A Process Framework for Designing Software Reference
Architectures for Providing Tools as a Service
Muhammad Aufeef Chauhan1,3 ✉ , Muhammad Ali Babar1,2, and Christian W. Probst3
(

)

1

3

CREST-Centre for Research on Engineering Software Technologies,
Software and Systems Section, IT University of Copenhagen, Copenhagen, Denmark
muac@itu.dk
2
CREST-Centre for Research on Engineering Software Technologies,
The University of Adelaide, Adelaide, Australia
ali.babar@adelaide.edu.au
Formal Methods Section, Department of Applied Mathematics and Computer Science,
Technical University of Denmark, Kongens Lyngby, Denmark
cwpr@dtu.dk
Abstract. Software Reference Architecture (SRA), which is a generic architec‐
ture solution for a speciﬁc type of software systems, provides foundation for the
design of concrete architectures in terms of architecture design guidelines and
architecture elements. The complexity and size of certain types of software
systems need customized and systematic SRA design and evaluation methods. In
this paper, we present a software Reference Architecture Design process Frame‐
work (RADeF) that can be used for analysis, design and evaluation of the SRA
for provisioning of Tools as a Service as part of a cloud-enabled workSPACE
(TSPACE). The framework is based on the state of the art results from literature
and our experiences with designing software architectures for cloud-based
systems. We have applied RADeF SRA design two types of TSPACE: software
architecting TSPACE and software implementation TSPACE. The presented
framework emphasizes on keeping the conceptual meta-model of the domain
under investigation at the core of SRA design strategy and use it as a guiding tool
for design, evaluation, implementation and evolution of the SRA. The framework
also emphasizes to consider the nature of the tools to be provisioned and under‐
lying cloud platforms to be used while designing SRA. The framework recom‐
mends adoption of the multi-faceted approach for evaluation of SRA and quan‐
tiﬁable measurement scheme to evaluate quality of the SRA. We foresee that
RADeF can facilitate software architects and researchers during design, applica‐
tion and evaluation of a SRA and its instantiations into concrete software systems.
Keywords: Cloud Computing · Software Reference Architecture (SRA) · Tools
as a Service (TaaS) · Architecture Design · Architecture evaluation

1

Introduction

A Software Reference Architecture (SRA) provides an abstraction for designing and
reasoning about a concrete software architecture of a speciﬁc system domain [1, 2].
Whilst a concrete architecture is designed for a speciﬁc project according to well-deﬁned
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 111–126, 2016.
DOI: 10.1007/978-3-319-49094-6_8

112

M.A. Chauhan et al.

business goals and requirements, a SRA usually aims to address generic business goals
and domain requirements. A SRA consists of not only details on architecture compo‐
nents and its view, but also encompasses best practices for describing the architecture
and the process guidelines for analysis, design and development of the architecture [3].
Though describing stakeholders concerns in terms of architecture view points and
presenting the details of a SRA using multiple views [4] is important, it is equally
important to describe the design-time and run-time quality characteristics of a SRA and
the use of appropriate architecture styles and patterns [5]. A SRA is primarily designed
for two main reasons: (i) to standardize existing available concrete architectures or (ii)
to propose a preliminary SRA that can facilitate concrete architecture design for a
speciﬁc domain. Whilst a SRA standardization eﬀort focuses on extracting reusable
architecture elements from a number of concrete architectures, a SRA preliminary prop‐
osition focuses on recommendations for SRA documentation, guidelines for SRA design
and evaluation as well as SRA adoption and evolution.
In this paper, we present a software Reference Architecture Design process Frame‐
work (RADeF) for designing cloud-based systems in general and cloud-based Tools as
a service workSPACE (TSPACE) in particular. RADeF reports a set of key speciﬁca‐
tions and SRA design guidelines. Whilst cloud-based systems provision on-demand
computing as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Soft‐
ware as a Service (SaaS) [6], a TSPACE is characterized by as an activity or a task
speciﬁc selection and on-demand provisioning of Tools as a Service (TaaS) as part of
an integrated cloud-enabled workspace [6]. We assert that designing and evaluating a
TSPACE SRA is more challenging than SRAs of general cloud-based systems because
of the involvement of diversiﬁed tools and tenants with varying functional requirements
and quality concerns. For example, performance and scalability can be more important
for tenants and users of software development and testing TSPACE, whereas integration
can be more important for tenant and users of architecture analysis and design TSPACE.
Furthermore, instantiation of a TSPACE SRA for diﬀerent domains can require custom‐
ization (e.g. addition of new components or only selecting a subset of a SRA), which
requires a mechanism that can be used to analyze quality and completeness of the
instantiated architectures. Although there have been attempts to provide a systematic
approaches for reference architecture design, documentation and evaluation [2, 3, 7], to
the best of our knowledge, there has been a little work done on providing a process
framework for SRA design given the speciﬁc needs of SRA design and evaluation of
the TSPACE. Our work reported in this paper aims to address the following research
objectives:
• Provide a systematic approach that can lead to a SRA’s design elements identiﬁca‐
tion, requirements analysis and detailed design.
• Provide insight to the speciﬁc needs of TSPACE SRA evaluation and instantiation
into concrete architectures.
• Demonstrate application of RADeF on SRAs of software architecture design and
implementation TSPACEs.
The organization of the paper is as follows. Section 2 provides the details of RADeF.
Section 3 describes the results of the case studies of applying RADeF for describing and

A Process Framework for Designing SRAs for TaaS

113

implementing TSPACE. Section 4 provides an overview of the related work and
Sect. 5 concludes the paper by sharing lessons learned and experiences.

2

Reference Architecture Design Process Framework (RADeF)

A SRA is expected to provide guidance for designing and evaluating a concrete archi‐
tecture. A SRA description usually includes reusable solutions in terms of architectural
goals, architectural styles, design patterns, design principles and decision and guidelines
for initiating a SRA. That is why it is important that a SRA description includes as much
details as possible. It is also important to have a clearly described process that can be
used to design and evaluate a SRA [2]. In this section, we describe RADeF, the process
that can lead to TSPACE SRA development, evaluation and implementation. We also
discuss important factors that should be considered at each stage of TSPACE SRA
design. A pictorial representation of RADeF is presented in Fig. 1. RADeF is an iterative
process framework and information produced in preceding stages is used as input for
the proceeding stages of the process and as shown in Fig. 1.

Fig. 1. A process framework for designing a Software Reference Architecture

2.1 Identiﬁcation of a Reference Architecture’s Concepts and Elements
First step in designing a SRA of a cloud-based TSPACE is to identify the concepts and
elements that constitute TSPACE. A SRA consists of not only SRA requirements and

114

M.A. Chauhan et al.

SRA views, but it also provides guidelines for SRA evaluation. A generic view of the
SRA elements is depicted in Fig. 2(a). The required concepts an elements are identiﬁed
through a high-level analysis of a particular domain. TSPACE SRA elements can be
classiﬁed into: Tenants, Tools, Provisioning Infrastructure, Artifacts, Context and Inte‐
gration Methods. Each of the elements is tailored and extended with respect to the
domain requirements for which the SRA is to be enacted.
Participants’ Roles: End users, Requirements Analysts and Software Architects.
Artifact(s) Consumed: Business Requirements.
Artifact(s) Produced: High-level relationship models for TSPACE concepts and
elements.

Fig. 2. Software Reference Architecture elements

2.2 Reﬁnement of Domain Element and Relationships Modeling
The activities identiﬁed for this stage are aimed at reﬁning the identiﬁed elements in
previous stage, establishing the hierarchical structure of TSPACE elements and
modeling relations among the elements. Domain models are considered the main sources
of the information for this stage. The domain models can provide standardizations for
elements, their hierarchical structures and the relationships among the elements.
However the domain models need to be extended in order to cover all the dimensions
of TSPACE including the tools, the development processes which govern the provi‐
sioning and usage of the tools, data integration and exchange formats among the tools,
and any additional functional aspects that are required by TSPACE in a speciﬁc domain.
Figure 2(b) shows TSPACE elements and relationships among the elements. The arti‐
facts that are produced at this stage, serve as a foundation for the detailed requirements
analysis and architecture design of the components that are responsible for tools
bundling and integration in the TSPACE.

A Process Framework for Designing SRAs for TaaS

115

Participants’ Roles: Business Analyst and Software Architect.
Artifact(s) Consumed: Documentation approaches, documentation templates and
architecture design abstractions.
Artifact(s) Produced: TSPACE conceptual models that consists of concepts and
elements that encompass TSPACE and relationship among the concepts and models.
2.3 Functional Demarcation Between the Reference Architecture Elements
and the Tools to Be Provisioned
This stage of the activities deals with demarcation of functional requirements to be
handled by a SRA and functional requirements for which TSPACE can rely on the tools
(that can be provisioned by TSPACE). The artifacts that are produced at this stage
provide a foundation for TSPACE functional requirements. The high-level architecture
design with speciﬁc focus on the identiﬁcation of components responsible for the
TSPACE features.
Participants’ Roles: Requirements Analyst, Business Analyst and Software Architect.
Artifact(s) Consumed: Domain models.
Artifact(s) Produced: Documents describing functional demarcation of TSPACE and
encompassing tools.
2.4 Requirements Identiﬁcation and Classiﬁcation
The TSPACE SRA requirements can be classiﬁed into service model, integration and
quality requirements as discussed below.
Service Model Requirements: This task aims at identifying the requirements for tools
bundling, provisioning and enactment. For example, one of the primary objectives for
providing a software architecting TSPACE is to provide the bundled suite of tools on
demand as part of a TSPACE. It is critical to determine bundling and provisioning
constraints and parameters. The tools bundling mechanism should be ﬂexible enough
to cater integration needs of diﬀerent types of the tools to be used in a particular domain.
In certain cases, there can also be some constraints with respect to the underlying
virtualized infrastructure (e.g., IaaS cloud virtual machines) that can host the tools to
enable their operations within acceptable runtime quality parameters (e.g., performance,
scalability and reliability). The artifacts that are produced in this activity provide guide‐
lines to identify integration needs of the tools in a TSPACE and guide the SRA analysis
and design process.
Integration Requirements: Integration requirements focus on integration needs of the
tools that can be provisioned in TSPACE. With reference to software architecting
domain, the integration mechanism should be ﬂexible enough to accommodate diﬀerent
proprietary and standardized formats as well as support integration among heteroge‐
neous types of tools (e.g. desktop-based, web-based and cloud-based tools). The tools
that are provisioned in a TSPACE instance can vary and the integration mechanism
should be ﬂexible enough to adapt to the tools’ integration requirements of the

116

M.A. Chauhan et al.

provisioned tools. The integration mechanism should also support workspace require‐
ments, such as awareness of the operations that are performed on the artifacts as a result
of the users’ activities [8]. The artifacts that are produced at this stage guide the reference
architecture design and analysis process of integration.
SRA Quality Requirements: The SPACE is aimed at providing a bundled suite of tools
following a service model. As a result, the TSPACE SRA needs to incorporate archi‐
tecture quality requirements of cloud-enables services based system such as scalability
[9], multi-tenancy [10] and dynamic provisioning [11]. The activities that are performed
at this stage aim to identify important quality characteristics with reference to the domain
in which the TSPACE is to be used. For the software architecting domain, scalability,
multi-tenancy and dynamic provisioning are important. For another domain such as
software testing, elasticity [12] and reliability [13] can be important. The artifacts that
are produced as a result of this activity provide a foundation for runtime architecture
quality requirements of TSPACE.
Participants’ Roles: Business Analyst and Software Architect.
Artifact(s) Consumed: Design time constrains and tools bundling constraints,
TSPACE functional boundaries, required activities and tasks, and tools enactment/
provisioning parameters and constraints, Collaboration and integration models.
Artifact(s) Produced: Integration and collaboration models. Design time constrains,
tools bundling constraints and tools’ provisioning/enactment parameters. TSPACE
runtime architecture quality requirements.
2.5 Impact of Potential Cloud Hosting Environments on the Domain
The suitability of the underlying IaaS or PaaS platform can impact the way a reference
architecture is designed. E.g. PaaS environments can be a suitable choice for testing
domains in which autonomous scalability of the resources is more important. Whilst
IaaS environments can be suitable for hosting tools implemented using diﬀerent tech‐
nologies as IaaS clouds provide customizable hosting environments.
Participants’ Roles: Software Architect.
Artifact(s) Consumed: List of potential cloud hosting environments.
Artifact(s) Produced: Selected cloud hosting environments.
2.6 Reference Architecture Documentation, Analysis and Design
This stage of the activities focus on analyzing architecture documentation approaches
and preliminary analysis of the maturity of the domain for which a SRA is designed.
The analysis of the documentation approaches determines the most appropriate strat‐
egies for capturing the architecture of the domain for which TSPACE is designed. A
comprehensive analysis of the SRA documentation approaches is reported in [2, 3].
Angelov et al. have recommended that a reference architecture documentation include
information about the context, goals and design decisions. The context dimension covers
the purpose, the organization(s) that is (are) developing a reference architecture and its

A Process Framework for Designing SRAs for TaaS

117

maturity stage (e.g., preliminary or classic) [2]. The goal dimension encompasses busi‐
ness goals and quality attributes as well as the purpose of deﬁning a reference architec‐
ture (e.g., to standardize concrete architecture or to facilitate design of concrete archi‐
tecture). The design dimension elaborates whether a SRA is concrete or abstract and
whether the SRA has been described using formal, semiformal or informal approaches.
Avgeriou et al. [3] propose that a SRA description should have three main elements: (i)
description of the approach used to document a SRA, (ii) guidelines on instantiation of
a SRA and (iii) evaluation of a SRA corresponding to desired functional requirements
and quality attributes. The outcome of this activity determines the approach used for
describing a SRA, the level of abstractions to be covered in the SRA documentation,
the objectives and the selection of the approaches for evaluation and instantiation of a
SRA. Outcome of this activity has impact on all the proceeding stages of the reference
architecture design process. A summary of a SRA design dimensions is shown in
Fig. 2(a).
A SRA design should be based on reference models and architecture styles and patterns
[14, 15]. If a TSPACE SRA is to be used for mission-critical and safety-critical tools, then
it is also important to have metrics that can be used to measure runtime quality parameters
of an architecture. An empirical investigation of the SRAs have revealed the absence of
important views [4] in a SRA and the details of the supporting algorithms and formaliza‐
tion to achieve the required functionality of the reference architecture [4] impact a SRA’s
adoption and applicability. Hence, a SRA should encompass all the important views
according to some well-known approaches such as 4 + 1 view model [4].
Participants’ Roles: Software Architect.
Artifact(s) Consumed: Architecture documentation templates and models.
Artifact(s) Produced: SRA documentation approaches used, ﬁlled templates, details
of the abstractions to be used, evaluation and initialization approaches and views.
2.7 Evaluating a Reference Architecture
Evaluation of a SRA is an important step for analyzing its feasibility and applicability.
Diﬀerent considerations for evaluating a reference architecture have been proposed [3,
7, 16]. Avgeriou et al. [3] have proposed to evaluate a SRA using scenarios and proto‐
typing. Scenarios based approaches enable an implementation-independent evaluation.
The evaluation scenarios need to be focused on important design time and runtime
qualities of the architecture. The prototyping helps analyze the suitability of the imple‐
mentation decisions such as platform choices and programming languages for the design
decisions incorporated in a SRA. Angelov et al. [7, 16] have argued that straightforward
adoption of architecture evaluation methods such as Architecture Tradeoﬀ Analysis
Methods (ATAM) [17] and Software Architecture Analysis Methods (SAAM) [18] is
not feasible because: gathering all the stakeholders and generating scenarios for a SRA
evaluation may not be possible, there can be a signiﬁcantly large diversity of stake‐
holders and the levels of abstractions in the designed components can be quite high.
Hence, it is important to identify the most relevant architecture requirements by

118

M.A. Chauhan et al.

involving domain experts or domain models and then preparing scenarios by involving
a SRA’s potential users [7, 16].
Other than the above-mentioned challenges, a TSPACE SRA evaluation activity has
some additional complexities. For example, a TSPACE provision the tools for
performing the diﬀerent activities; hence there is a need for tools integration and work‐
space speciﬁc functions in a *aaS model. An evaluation activity focuses on the parts of
a SRA that are embodied by TSPACE boundaries rather than by the tools to be provi‐
sioned. Some of the key quality characteristics are inherited from *aaS model for eval‐
uating a TSPACE SRA’s abilities of on-demand provisioning of tools in a particular
domain, whose quality attributes should drive the evaluation activities. Hence, the eval‐
uation activity should focus on identifying and analyzing the relevant quality attributes
for the given domain. Moreover, as the SRA’s elements (i.e. components or services)
and design decisions collectively constitute to SRA quality, traditional architecture
analysis and evaluation methods such as utility tree [17] from ATAM are not suﬃcient
because these are unable to quantify architecture quality. We advocate for leveraging
an new approach inspired from attack-defense trees [19] to enhance the utility tree for
analysis of the completeness of a SRA. Figure 3(a) shows the structure of the enhances
utility tree. Sub-nodes of the utility tree corresponding to each quality can be assigned
with three types of operators: logical OR operator which identiﬁes that opting any of
the branch can achieve a quality attribute, logical AND operator that indicates that opting
all of the branches will be essential to meet a quality criteria, and a Seq-AND (sequential
AND) operator indicates that the design decisions corresponding to the branches need
to be executed in a speciﬁc sequence in order to achieve the corresponding quality char‐
acteristic. In some cases, it might be required to analyze overall quality and completeness
of the SRA. For this purpose, the probability values for the eﬀectiveness of the design
decisions can be assigned to each branch of the quality attribute nodes (such that
maximum probability of all design decisions corresponding to each quality attribute do
not exceed one). When probability values are used, OR operator takes minimum, AND
takes sum and SeqAND takes sum of the probability values of all the branches of a
quality attribute sub-tree. Finally, to evaluate the tools bundling and integration
approaches, a prototype based evaluation is considered more eﬀective. That means a
TSPACE SRA prototype can play a critical role for the SRA evaluation and the tools
that are selected for provisioning using the prototype can help to cover the most critical

Fig. 3. SRA utility tree template and models

A Process Framework for Designing SRAs for TaaS

119

evaluation scenarios. The outcome of evaluation activity can trigger modiﬁcation in the
artifacts that were generated in previous stages as depicted in Fig. 1.
Participants’ Roles: User, Requirement Analyst, Business Analyst and Software
Architect
Artifact(s) Consumed: TSPACE Software Reference Architecture.
Artifact(s) Produced: Evaluation results.
2.8 Reference Architecture Instantiation and Implementation
As a SRA provides a generic architecture solution for a speciﬁc domain, its instantiations
can require appropriate tailoring, sometimes signiﬁcant. As a result, some of the compo‐
nents can be excluded from the instantiated architecture and some additional compo‐
nents can be incorporated. The enhanced utility tree (Fig. 3(a)) presented in Sect. 2.7
can facilitate the analysis and quantiﬁcation of the concrete architecture.
Participants’ Roles: Business Analyst, Software Architect and Developers.
Artifact(s) Consumed: Evaluated TSPACE Software Reference Architecture.
Artifact(s) Produced: Instantiated system.

3

Two Cases of Applying RADeF

We have followed RADeF to support the design of a SRA for two types of TSPACE:
software architecting tools domain and software implementation tools domain. The two
case studies of applying RADeF for analysis, design, evaluation and implementation of
the TSPACE aimed at provisioning integrated suite of tools for the domains. The tools
commonly used for software architecture design and software implementation were
selected for the case studies and TSPACE was designed by following RADeF steps. In
this section, we provide the insight gained from our experiences from applying RADeF.
First and second stage of RADeF is to identify concepts and elements of a SRA and
establish relationships between the elements. The generic model presented in Fig. 2(b)
provides a foundation for TSPACE elements identiﬁcation and relationship modeling.
Though the generic model needs to be extended to cater the needs of a speciﬁc type of
tools and the operations that can be performed using the tools. Figure 3(b) shows the
extensions to the generic model for software architecting and software implementation
domain. The tools used for software architecting have diﬀerent types of the artifacts,
e.g., architecture knowledge artifacts, design decision artifacts and architecture design
diagrams. Since software architecture artifacts can be at diﬀerent levels of abstractions,
and there is no need to exchange complete artifacts (although selected information
exchange is required) among architecting tools, which can be integrated through
semantic integration technologies. We have leveraged IEEE 1471-2000 [20] and
ISO/IEC/IEEE 42010:2011 [21] to build the semantic integration model for SRA of the
architecting tools. Figure 3(b) shows a high-level view of the elements of the semantic
model (the details can be found in [22]). The software implementation need to exchange
the artifacts for collaborative work. For example, in a scenario where a UML modeling

120

M.A. Chauhan et al.

tool is used to design class diagrams, the code skeleton generated using the UML
modeling tool (forward engineering) has to be used as input by Integrated Development
Environments (IDEs). For example, process-oriented tools bundling requires processcentric integration. At this stage the SRA integration models are produced that provide
foundations for the detailed architecture design.
Functional demarcation between the requirements to be incorporated by a SRA and
the requirements to be incorporated by the provisioned tools is an important step for the
requirements identiﬁcation. As in the case studies, our focus was on providing software
architecting and development tools, the SRAs focused on tools provisioning, tools inte‐
gration and awareness of the operations that are performed on the artifacts using the
tools. Whereas, individual tools were responsible for providing support for speciﬁc
activities such as architecture knowledge management, architecture design decision
management, architecture design and software implementation. Table 1 shows details
of the SRA and the tools’ requirements classiﬁcation. The details of the requirements
can be found in [22, 23]. Multi-tenancy and scalability are domain speciﬁc quality
requirements to support a large number of tenants [24]. Analysis and identiﬁcation of
cloud hosting environments for software architecting and implementation domains
requires using IaaS cloud because of heterogeneity of the tools. A combination of
desktop and web-based tools are used for software architecting and implementation. The
IaaS provides ﬂexibility to host the existing tools by conﬁguring the virtual machine
templates.

Functional Demarcation

Table 1. Functional demarcation and requirements

Tools
Requirements

SRA
Requirements

Architecting

Implementation
Functional

Quality
Domain

Quality

Knowledge management, design decision management, architecture modeling.
Software development, unit testing.
Autonomous provisioning, semantic
integration, process centric integration,
awareness of the operations.
Flexibility, interoperability, completeness and adaptability.
Multi-tenancy, scalability

For the TSPACE SRAs detailed design, we have used a layered architecture [5] and
a view-based approach [4] to represent diﬀerent parts of the SRA. A layered architecture
can facilitate easy modiﬁability of a TSPACE SRA, whose diﬀerent dimensions can be
represented using a view-based approach. The TSPACE meta-model (Fig. 2(b)) and the
detailed models (Fig. 3(b)) produced in the second stage of RADeF are used as a foun‐
dation for the detailed design. Table 2 shows the key architecture design decisions for
software architecting and software implementation of a TSPACE SRA design. We have
reported the details on the architecture views and design decisions in [6, 23].

A Process Framework for Designing SRAs for TaaS

121

Table 2. Decisions for software architecting (Arc.) and implementation (Impl.) case studies

Architecture Design Decisions

Case Study
Arc. Impl.

Service Oriented and REST Architecture
Centralized Repository to have common semantic integration models
Use of pipes and filter patterns to support multi-tenancy and easy
scalability
Tenant specific integration, information discovery and awareness
rules
Process-centric integration
IaaS cloud for hosting tools

As discussed earlier, the inclusion of heterogeneous tools producing and consuming
artifacts at diﬀerent levels of abstractions makes the evaluation of a TSPACE SRA a
challenging activity. We have adopted multi-faceted approach to evaluate the TSPACE
SRAs for the reported case studies. (i) We evaluated the TSPACE SRAs and their
respective implementations for functional completeness corresponding to the functional
and quality requirements. (ii) We implemented the prototype systems for TSPACE
SRAs using Amazon IaaS cloud1. Interface modules of TSPACE have been imple‐
mented using Service Oriented Architecture (SOA) [25] and REST [26] architecture
styles using JavaEE service technologies (JAX-RS2, JAX-WS3) for enabling easy inter‐
operability of diﬀerent types of tools with the systems. The semantic integration has
been implemented using Apache Jena Framework4. The process-centric integration has
been implemented using jBPM5 process workﬂow engine. (iii) We used quantitative
architecture evaluation approach that is presented in Sect. 2.7, which is based upon utility
tree of ATAM, but can quantiﬁably measures the TSPACE SRA’s quality. The evalu‐
ation was carried out by six potential stakeholders, who had experiences (of architecting
and implementation) of software development tools, process-based applications, cloudbased systems and collaborative software development systems.
A subset of the enhanced utility tree (described in Sect. 2.7) constructed in the eval‐
uation session is presented in Fig. 4. The participants of the evaluation session were
asked to assign each of the design decisions with values 0, 0.25, 0.50, 0.75 or 1.00. Then
the average of the value score was taken for each of the design decisions to be assigned
to a speciﬁc quality attribute on a utility tree branch. In case, if there were more than
one design decisions corresponding to a speciﬁc quality attribute, an average was divided
by the total number of design decisions to keep the maximum probability value under
1 corresponding to each of the quality attributes. If some of the design decisions are
important than others, then weighted averages can be used. Whilst we considered all of

1
2
3
4
5

http://aws.amazon.com/.
http://jax-rs-spec.java.net/.
https://jax-ws.java.net/.
https://jena.apache.org/.
http://www.jbpm.org/.

122

M.A. Chauhan et al.

the design decisions of the equal importance, the enhanced utility tree branches corre‐
sponding to each of the quality attributes (and sub attributes) had either AND, OR and
SeqAND operators (as discussed in Sect. 2.7). The evaluation participants found the
proposed operators (that were assigned to the enhanced utility tree) helpful to quantify
the architectural quality of the TSPACE SRA. Figure 4 shows the evaluation results
corresponding to four key quality attributes of the TSPACE SRAs for software archi‐
tecting and implementation TSPACE. An average of the quality score (average of the
score given by the six evaluators) is shown in the ﬁgure corresponding to each of the
design decisions of the quality attributes. Sum and Min functions (as described in
Sect. 2.7) are used to calculate the aggregated quality score of the reference architectures.

Fig. 4. SRA evaluation utility tree

4

Related Work

Given the increasing importance of SRAs for guiding the designing and evaluating of
concrete architectures in diﬀerent domains, several researchers have attempted to
provide a set of standardized activities and frameworks for designing and documenting
reference architectures. One of the most comprehensive and detailed guidelines have
been reported by Angelov et al. [1, 2, 7, 16]. Their work provides a classiﬁcation tech‐
nique of the reference architectures based upon the domain-speciﬁc maturity and how

A Process Framework for Designing SRAs for TaaS

123

the reference architectures are designed. For the mature domains, the aim of a reference
architecture is to provide the standardization of the architectures, whereas, for the
emerging domains, the purpose is to facilitate the design of concrete architectures in
multiple organizations. Some of the problems associated with designing a reference
architecture are missing design methods, challenges in deﬁning non-functional require‐
ments, problems with selecting appropriate views, lack of suitable architecture docu‐
mentation methods and relatively little support for evaluating the reference architectures
[1]. In our TSPACE SRA design process, we have explicitly catered all of the abovementioned challenges to support the process of designing the reference architecture and
have explicit stages for design and documentation methods, deﬁne non-functional
requirements, select appropriate views and choose appropriate evaluation strategies.
Avgeriou [3] suggests representing a reference architecture using multiple view‐
points of Rational Uniﬁed Process (RUP) including logical viewpoint, deployment
viewpoint, implementation viewpoint and data viewpoint. Avgeriou has emphasized
that the reference architecture should be evaluated using both scenario-based and archi‐
tecture prototype-based evaluation with respect to development-time and run-time qual‐
ities [3]. Nakagawa et al. [27] have proposed the use of ontologies to identify diﬀerent
components of the reference architecture. Fernandez et al. [28] have described the key
documentation elements of a software reference architecture. The documentation
elements include technical design, architecture knowledge and experiences and manage‐
ment documentation. For TSPACE SRA, we have described the details about the tech‐
nical design and architecture knowledge. However, the management of the documen‐
tation (during applications of the software reference architecture in diﬀerent setting) is
out of the scope of this work.

5

Discussion and Conclusions

The cloud-enabled tools not only need to be compliant with speciﬁc quality requirements
but also need to provide support for diﬀerent activities, just like desktop-based tools.
Whilst tools in every domain have their speciﬁc challenges, there are some generic
characteristics that tools in every domain need to address. We share our experience from
diﬀerent activities of designing the TSPACE SRA.
Adoption of Appropriate Methodology to Formalize relations among TSPACE
Elements: To establish relationship among the artifacts that are maintained by cloudbased tools with other tools is a critical characteristic and can play a signiﬁcant role in
cloud-based tools adoption. Hence, it is important to identify the integration needs for
the tools to be provisioned from a cloud-enabled workspace. Our experience has shown
that an ontology driven semantic model can provide support for tools selection, relating
diﬀerent artifacts with each other even though the artifacts are maintained by using
diﬀerent proprietary data structures, and awareness needs in a cloud based workspace.
As diﬀerent tools have diﬀerent requirements for integration, there is a need to have
appropriate semantic integration models corresponding to the artifacts’ formats used by
the tools.

124

M.A. Chauhan et al.

Incorporating Workﬂows with Tools Provisioning: In some cases, the tools that are
provisioned as part of a tools suite need to exchange information according to project
speciﬁc development processes (e.g., to manage collaboration in distributed architecture
evaluation processes [29]). In such cases, the integration support for the tools needs to
be complemented by a workﬂow based process on the cloud so that artifacts among the
tools can be exchanged according to the speciﬁc software development processes.
Quality of Individual Tools in TSPACE: In our proposed TSPACE SRAs, we have
considered each of the provisioned tools as a black box and have not considered the
management of quality characteristics of each individually provisioned tool during the
lifecycle of a TSPACE instance. However, for certain tools that produce executable
artifacts, e.g. model driven tools used to generate code, may require extra computing,
memory or other resources during their life cycle depending on the tasks to be executed.
In such cases, a TSPACE for the tools needs to incorporate the metrics and corresponding
prediction models so that additional resource needs can be predicted and resources can
be acquired according to the needs of a speciﬁc task.
Impact of software reference models: Availability of standardization models for
respective domain impact the reference architecture design process. Whist designing
TSPACE software reference architecture for software architecting domain, we have
leveraged IEEE 1471-2000 [20] and ISO/IEC/IEEE 42010:2011 [21] architecture docu‐
mentation models as a baseline for the identiﬁcation of the TSPACE architecture
elements and the TSPACE ontology meta-model design. The meta-model has been
further enhanced by analyzing architecting TSPACE requirements. The incorporation
of standardized domain model in the reference architecture design ensures the applica‐
bility of reference architecture for broader range of tools. Unavailability of the stand‐
ardization models for the respective domain or not using them during the reference
architecture design can negatively impact the applicability of a reference architecture.
Selection of Appropriate underlying IaaS Clouds and Cloud Deployment Models: As
tools in a TSPACE SRA are considered as black box, the tenant speciﬁc constrains on
artifacts’ storage location are applied onto the tools that are provisioned on the location
that is compliant with the constraints (in our prototype implementation, we have used
Amazon EC2 location speciﬁc provisioning features). However, for more complex use
cases, where location constraints on the artifacts can change during their lifecycle,
Virtual Machines (VMs) hosting the tools might need to be migrated from one location
to another. In such cases, the capability of underlying IaaS to support VMs migration
would play a critical role. Hence, IaaS cloud selection and selection of cloud deployment
model (e.g., public, private or hybrid) should be carefully made. A cloud environment
that supports the desired features should be selected.
Multi-facet approach for TSPACE SRA Evaluation: Considering a generic nature of
TSPACE SRA and a broad range of potential stakeholders, multiple architecture eval‐
uation techniques need to be adopted for evaluating a reference architecture from
diﬀerent perspectives. We have evaluated the TSPACE software reference architecture
using scenario-based evaluation approaches [18], architecture tradeoﬀ analysis method

A Process Framework for Designing SRAs for TaaS

125

[17] and a prototype implementation of the reference architecture. Scenario-based eval‐
uation approaches can help evaluate completeness of a SRA with respect to reference
architecture objectives and requirements. Architecture tradeoﬀ analysis method enables
the identiﬁcation of strong and week points of a SRA. A prototype is a viable way to
demonstrate the feasibility of a SRA. The proposed TSPACE SRA evaluation method‐
ology facilitates the quality score of not only the SRAs but also their concrete repre‐
sentations. For example, if a concrete implementation of the SRA corresponding to
evaluation tree presented in Fig. 4 adopts diﬀerent parts of the design decisions and
corresponding components for diﬀerent tenants, the quality of the instantiated architec‐
ture and corresponding system can be computed on the ﬂy, especially for SaaS based
systems.
In future, we intend to apply RADeF on software reference architecture design and
analysis of other types of cloud-based systems. We also intend to carry out empirical
evaluations on our proposed quantiﬁcation mechanism for SRA evaluation utility tree
to analyze its impact on long-term management of the software reference architectures.

References
1. Angelov, S., Trienekens, J., Kusters, R.: Software reference architectures - exploring their
usage and design in practice. In: Drira, K. (ed.) ECSA 2013. LNCS, vol. 7957, pp. 17–24.
Springer, Heidelberg (2013). doi:10.1007/978-3-642-39031-9_2
2. Angelov, S., Grefen, P., Greefhorst, D.: A framework for analysis and design of software
reference architectures. Inf. Softw. Technol. 54(4), 417–431 (2012)
3. Avgeriou, P.: Describing, instantiating and evaluating a reference architecture: a case study.
Enterp. Archit. J., 24 (2003)
4. Kruchten, P.B.: The 4 + 1 view model of architecture. IEEE Softw. 12(6), 42–50 (1995)
5. Buschmann, F., et al.: Pattern-Oriented Software Architecture: A System of Patterns, p. 457.
John Wiley & Sons Inc., New York (1996)
6. Chauhan, M.A., Ali Babar, M., Sheng, Q.Z.: A reference architecture for a cloud-based tools
as a service workspace, In: 2015 IEEE Conference on Service Computing (SCC). IEEE, New
York (2015)
7. Angelov, S., Trienekens, J.J.M., Grefen, P.: Towards a method for the evaluation of reference
architectures: experiences from a case. In: Morrison, R., Balasubramaniam, D., Falkner, K.
(eds.) ECSA 2008. LNCS, vol. 5292, pp. 225–240. Springer, Heidelberg (2008). doi:
10.1007/978-3-540-88030-1_17
8. Dourish, P., Bellotti, V.: Awareness and coordination in shared workspaces. In: Proceedings
of the 1992 ACM Conference on Computer-Supported Cooperative Work, pp. 107–114.
ACM, Toronto (1992)
9. Sodhi, B., Prabhakar, T.V.: Application architecture considerations for cloud platforms. In:
2011 Third International Conference on Communication Systems and Networks
(COMSNETS), p. 1–4. IEEE (2011)
10. Domingo, E.J., et al.: CLOUDIO: a cloud computing-oriented multi-tenant architecture for
business information systems. In: 2010 IEEE 3rd International Conference on Cloud
Computing (CLOUD), pp. 532–533. IEEE (2010)
11. Calheiros, R.N., et al.: The Aneka platform and QoS-driven resource provisioning for elastic
applications on hybrid Clouds. Future Gener. Comput. Syst. 28(6), 861–870 (2012)

126

M.A. Chauhan et al.

12. Han, R., et al.: Enabling cost-aware and adaptive elasticity of multi-tier cloud applications.
Future Gener. Comput. Syst. 32, 82–98 (2014)
13. Brandic, I., Music, D., Dustdar, S.: Service mediation and negotiation bootstrapping as ﬁrst
achievements towards self-adaptable grid and cloud services. In: Proceedings of the 6th
International Conference Industry Session on Grids Meets Autonomic Computing, pp. 1–8.
ACM, Barcelona (2009)
14. Bass, L., Clements, P., Kazman, R.: Software Architecture in Practice, p. 640. AddisonWesley Professional, Boston (2012)
15. Avgeriou, P., Zdun, U.: Architectural patterns revisited–a pattern (2005)
16. Angelov, S., Grefen, P.: An e-contracting reference architecture. J. Syst. Softw. 81(11), 1816–
1844 (2008)
17. Kazman, R., et al.: The architecture tradeoﬀ analysis method. In: 1998 Proceedings of the
Fourth IEEE International Conference on Engineering of Complex Computer Systems,
ICECCS 1998 (1998)
18. Kazman, R., et al.: SAAM: a method for analyzing the properties of software architectures.
In: 1994 Proceedings of the 16th International Conference on Software Engineering, ICSE-16
(1994)
19. Kordy, B., et al.: Attack–defense trees. J. Logic Comput., exs029 (2012)
20. IEEE Recommended Practice for Architectural Description of Software-Intensive Systems.
IEEE Std 1471-2000, pp. i–23 (2000)
21. ISO/IEC/IEEE Systems and software engineering – Architecture description. ISO/IEC/IEEE
42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000), pp. 1–46 (2011)
22. Chauhan, M.A.: Foundations for Tools as a Service Workspace: A Reference Architecture.
Ph.D. Dissertation, IT University of Copenhagen, Denmark (ITU-DS; No. 118) (2016)
23. Chauhan, M.A., Babar, M.A.: PTaaS: Platform for Providing Software Developing
Applications and Tools as a Service. Technical report TR-2014-176 (2014). https://
pure.itu.dk/ws/ﬁles/74130379/TR_2014_176.pdf
24. Azeez, A., et al.: Multi-tenant SOA middleware for cloud computing. In: 2010 IEEE 3rd
International Conference on Cloud Computing (CLOUD), pp. 458–465. IEEE (2010)
25. Huhns, M.N., Singh, M.P.: Service-oriented computing: key concepts and principles. IEEE
Internet Comput. 9(1), 75–81 (2005)
26. Fielding, R.T.: Architectural Styles and the Design of Network-Based Software Architectures,
p. 162. University of California, Irvine (2000)
27. Nakagawa, E.Y., Barbosa, E.F., Maldonado, J.C.: Exploring ontologies to support the
establishment of reference architectures: an example on software testing. In: Joint Working
IEEE/IFIP Conference on Software Architecture, 2009 & European Conference on Software
Architecture, WICSA/ECSA 2009. IEEE (2009)
28. Martínez-Fernández, S., et al.: Artifacts of software reference architectures: a case study. In:
Proceedings of the 18th International Conference on Evaluation and Assessment in Software
Engineering. ACM (2014)
29. Ali Babar, M.: A framework for groupware-supported software architecture evaluation
process in global software development. J. Softw. Evol. Process 24(2), 207–229 (2012)

Should We Adopt a New Version of a Standard?
– A Method and Its Evaluation on AUTOSAR
Corrado Motta1(B) , Darko Durisic1(B) , and Miroslaw Staron2
1

2

Volvo Car Group, Gothenburg, Sweden
{Corrado.Motta,Darko.Durisic}@volvocars.com
Chalmers University of Gothenburg, Gothenburg, Sweden
Miroslaw.Staron@cse.gu.se

Abstract. The development of large software systems is usually based
on a number of industrial standards that deﬁne a set of features and their
requirements. In order to use new features speciﬁed in the standards, new
releases of the standards need to be adopted together with their requirements. This requires a thorough impact analysis of the changes in the
requirements that can be time-consuming considering their potentially
high number. In order to facilitate the adoption of new releases of industrial standards in large software systems, we present a method based
on both quantitative and qualitative analysis of requirements evolution.
The method is evaluated in a case study of AUTOSAR - a standard used
in the development of automotive software systems in cooperation with
Volvo Car Group. The evaluation results show that the use of the proposed method can identify the most unstable AUTOSAR speciﬁcations
and their requirements whose changes may have a signiﬁcant impact on
the automotive systems. This knowledge can increase the speed of adoption of new AUTOSAR releases by automotive vendors.

Keywords: Requirement evolution

1

· Metrics · Industrial standards

Introduction

Analyzing the evolution of system requirements is an important and inevitable
phase in the development of large software systems [8], especially for OEM (Original Equipment Manufacturers) that base their development on industrial standards. This is because features speciﬁed in the standards and their requirements
are usually driven by a number of competitive companies. The process of updating one system with new standardized features brings a series of advantages,
such as making use of a number of standardized requirements that are proved
to be valid in practice and buying cheaper oﬀ-the-shelf software packages from
software vendors. However, it also brings new challenges such as working with
requirements not written by OEMs and dealing with their fast evolution.
For this reason, the process of analyzing the evolution of standardized requirements without a suitable methodology and tool support can be time-consuming
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 127–143, 2016.
DOI: 10.1007/978-3-319-49094-6 9

128

C. Motta et al.

and can require signiﬁcant engineering eﬀort. Furthermore, adopting new standardized features in the development projects without having a clear knowledge
whether the standardized requirements related to them can be fulﬁlled in practice can lead to the introduction of new faults into the system. Therefore, dealing
with the evolution of standardized requirements is one of the primary objectives
of large companies in order to be able to update their systems faster and cheaper.
Although several solutions have been proposed mostly originating from the
academia, the evolution of standardized requirements is still considered one of
the most challenging practical problems in the development of large software systems [8]. The objective of this work is to deﬁne a suitable methodology for eﬃciently analyzing the evolution of standardized requirements as well as improving
the process of updating large software system with new standardized features.
We aim to provide an answer to the following research question: How can we
assure eﬃcient adoption of new releases of standards in the development of large
software systems by analyzing the evolution of standardized requirements?
As outcome of our work, this paper presents a method, named SREA (Standardized Requirements Evolution Assessment), that consists of four steps that
are based on both quantitative and qualitative analysis of requirements evolution. All steps of the method can and should be performed automatically with
the help of a software tool in order to reduce the time of analysis. Despite the
fact that we focus on the analysis of standardized requirements where the proposed method is particularly helpful as certain requirements changes are driven
by other companies, the method can also be applied on the evolution of company
internal requirements as part of the common requirements engineering process.
In order to evaluate the proposed method, we applied it in the automotive
domain in a case study of AUTOSAR (Automotive Open System Architecture)
standard [3], that speciﬁes a reference architecture and methodology for the
development of automotive software systems. In particular, AUTOSAR provides
a set of standardized requirements for the design of the automotive architectures
(i.e., language for the architectural models), that consists of a number of Electronic Control Units (ECUs) responsible for executing software functionalities,
and requirements for the ECU middleware. The rest of the functional ECU
requirements are left to be deﬁned by each OEM. As AUTOSAR represents a
big industrial standard that counts more than 150 partners and 21.000 requirements, we believe it qualiﬁes as a valid case for evaluating the SREA method.
The study is conducted in collaboration with Volvo Car Group (VCG) whose
engineers helped us to understand and prioritize AUTOSAR releases and their
speciﬁcations for the evaluation of the proposed method. The results of our evaluation show the importance of performing automated analysis of requirements
evolution. In particular, they show that the use of SREA method could help
automotive engineers in analyzing the evolution of the AUTOSAR requirements
faster by providing the engineers with information such as which speciﬁcations
are unstable (e.g., their requirements change with every release), which requirements are changed and the actual content of the changed requirements.

Should We Adopt a New Version of a Standard?

129

Knowledge about the unstable speciﬁcation can be useful for making strategic decisions about the set of standardized features that are mature enough to
be implemented in the system. Knowledge about the changed requirements has
a potential of saving lots of time spent on reading tents of AUTOSAR speciﬁcation with thousands of requirements manually. For example, in all speciﬁcation
with design requirements between AUTOSAR releases 4.2.1 and 4.2.2, we identiﬁed 1563 requirements and presented relevant changes (additions, removals and
modiﬁcations) for only 172.
The rest of the paper is organized as follows: Sect. 2 presents the related
work; Sect. 3 provides the background of AUTOSAR as our unit of analysis;
Sect. 4 describes the research methodology we used during this project; Sect. 5
deﬁnes the proposed SREA method; Sect. 6 presents the results of evaluation of
the SREA method on AUTOSAR; Sect. 7 discusses the validation of the proposed
method and provides recommendations; ﬁnally, Sect. 8 concludes our work and
describes our plans for future work.

2

Related Work

Several studies of requirements evolution are related to our study. In particular, Wang et al. [15] provide a general method for studying changes in the
requirements in order to ﬁnd relations between them and the number of software defects. The method relies on the quantitative analysis. Although their
approach was considered as a starting point for our work, our focus was on the
impact of requirements changes on the system under development.
Similar but more exhaustive studies were conducted in the avionics context
by Anderson et al. [1,2] who show how to conduct an empirical analysis of
requirements evolution starting from a general point and moving to a productoriented one. They perform the following two steps: (i) collecting information
from the avionics domain using the Requirement Maturity Index (RMI ), Historical Requirement Maturity Index (HRMI ), and Requirement Stability Index
(RSI ) metrics and (ii) reﬁning the information gathered in the ﬁrst step using
the qualitative approach. We rely on a similar work-ﬂow based on the RMI
metric that shows the stability of requirements changes in relation to the past
releases. We did not consider RSI and HRMI metrics since a great number of
requirements changes can lead to a non-meaningful (e.g., negative) number.
For the quantitative analysis of requirements evolution, we considered the
study of Shi et al. [13] that aims to identify requirements that are most likely to
be changed in the future using a number of metrics, e.g., Sequence, Frequency and
Lifecycle. However, these metrics cannot be used for identifying the requirement
speciﬁcations that are mostly aﬀected by changes, that is one of our major goals.
For eﬃciently studying the evolution of system requirements, Nurmuliani et
al. [11] provided a taxonomy of changes for categorizing diﬀerent types of requirements changes, reasons for their change and the origin of changes. This study
inspired us to deﬁne a taxonomy of changes for the AUTOSAR requirements.
Additionally, Stark g. et al. [14] proposed a method for analyzing the evolution of requirements in two steps. We adopted the ﬁrst step called micro analysis

130

C. Motta et al.

in order to get a preliminary view on the requirements architecture, structure of
the requirements speciﬁcations and possible types of requirements changes.
Finally, in order to understand the development of automotive software systems based on AUTOSAR, several automotive papers were considered. Two
studies of Durisic et al. [6,7] were useful for improving our general knowledge of
AUTOSAR, its architecture, methodology and complexity. More generally, the
paper from Broy et al. [4] was useful for identifying trends in the evolution of the
automotive software, even though they do not explain how it can be measured.
Although there is a signiﬁcant number of methods related to the analysis of
requirement evolution, the majority of them are not applied in a real industrial
context in which large software systems are developed. This paper aims to ﬁll
this gap by combining the existing studies with our considerations and metrics in
order to develop an eﬃcient method for analyzing the evolution of standardized
requirements that can facilitate this activity in the real industrial contexts.

3

Case Study Evaluation Context

In the automotive domain, OEMs are usually responsible only for the design
of automotive systems while the actual implementation of software and hardware components is done by a hierarchy of suppliers. In order to standardize the
methodology of work in such a distributed environment and a reference architecture for the distributed realization of the automotive systems using a number
of ECUs, AUTOSAR standard has been introduced. The proposed ECU architecture is based on the following common three-layer architecture:
1. Application software layer that consists of a number of software components
responsible for certain vehicle functionalities.
2. ECU middleware layer (a.k.a. basic software - BSW) that consists of a number
of BSW modules responsible for, e.g., signaling and diagnostic services.
3. ECU hardware layer responsible for executing the allocated software components and BSW modules.
Since ECU basic software does not contribute to the realization of high level
car functionalities that could make a competitive advantage for one OEM (e.g.,
autonomous drive), it is completely standardized by AUTOSAR that provides a
set of requirements speciﬁcations for each BSW module. On the other hand, the
functionality of the application layer is generally not standardized, although there
are some predeﬁned software components. However, AUTOSAR provides a metamodel followed by a set of design requirements that deﬁne the language for the
architectural models of the application layer in order to facilitate the exchange
of models between OEMs and suppliers [6]. This meta-model and design requirements serve as basis for the development of AUTOSAR modeling tools.
Based on the described methodology, we can distinguish between the following three types of requirements in the AUTOSAR based development process:

Should We Adopt a New Version of a Standard?

131

1. Functional requirements for the application software speciﬁed by OEMs.
2. Design requirements for the system models standardized by AUTOSAR.
3. BSW requirements for each BSW modules standardized by AUTOSAR.
In this paper, we focus on the last two types that are standardized by
AUTOSAR. Design requirements are described in the speciﬁcations called “templates” (TPS ) and they can be either speciﬁcation items or constraints (checked
by modeling tools). BSW requirements are described in the software requirement
speciﬁcations SWS. An example for each type is provided below:
– Specification item example: The 1:n multicast routing is supported with
the deﬁnition of several IPduMappings classes.
– Constraint example: The value of windowSize shall be greater or equal 1.
– BSW requirement example: CAN module shall allow that Multiplexed
Transmission functionality is conﬁgurable (ON — OFF) at pre-compile time.
Apart from these types of requirements, AUTOSAR builds a requirements
traceability hierarchy starting from the explained requirements until the general
AUTOSAR features and objectives. These requirements are, together with the
OEM-speciﬁc functional requirements, not considered in our analysis as they do
not produce a direct impact on the development of automotive systems.
All AUTOSAR requirements speciﬁcations including the meta-model are
released simultaneously. There are three types of AUTOSAR releases:
1. Major release: ﬁrst digit change, contains backwards incompatible features.
2. Minor release: second digit change, contains backwards compatible features.
3. Revision: third digit change, contains bugﬁxes only.

4

Research Methodology

In order to provide the answer to the research question addressed in this study
that is presented in the introduction, we developed a method named SREA that
aims to reduce the costs and time associated with the process of analyzing the
evolution of standardized requirements and evaluated it in a case study [12] of
the AUTOSAR requirements, in collaboration with VCG.
We deﬁned the SREA method by relying both on the existing and novel
approaches. First, we conducted a literature review on the existing approaches
for monitoring the evolution of system requirements using the snowball method
[16]. In order to identify the starting set of papers, we searched for papers mentioning requirements evolution and requirements volatility keywords in their title,
abstract and keywords sections using Google Scholar, IEEE Xplore and Scopus
databases. We selected papers [2,9,13] as the starting point and continued to
look for references and citation in these papers. We performed three iterations
in total and analyzed in details 23 out of 42 relevant papers. Second, we improved
our understanding of the case study context in semi-structured interviews, workshops and meetings with AUTOSAR experts from VCG. These two steps served
as input for deﬁning the SREA method.

132

C. Motta et al.

Finally, we evaluated the method using AUTOSAR requirements as a unit
of analysis from the chosen set of AUTOSAR releases. We chose all AUTOSAR
releases from the latest major release 4.0.1 until the latest revision 4.2.2. We
decided not to consider previous AUTOSAR releases as their speciﬁcation are not
structured in the same way which makes them hard to be analyzed automatically.
Nevertheless, AUTOSAR has a lot of signiﬁcant changes in the last major release.
Finally for the detailed analysis of changes between two releases, we decided to
focus on the changes between 4.0.3 and 4.2.2 releases.
In order to be able to cope with the size of AUTOSAR that counts more
than 21.000 requirements in its latest release (4.2.2 ), we developed a conﬁgurable software tool for gathering data and calculating and presenting the results
to the AUTOSAR engineers at VCG. The tool compares diﬀerent versions of
the AUTOSAR requirements speciﬁcations in PDF and creates a structured
report. The report presents the following information for the analyzed speciﬁcations: types of requirement changes in all analyzed releases, change history
of each requirement and the number of requirements, cumulative number of
requirements, number of changes and cumulative number of changes for each
AUTOSAR release.
Finally in order to validate the proposed SREA method, we distributed a
survey with 10 questions to six AUTOSAR experts at VCG. The questions
were based on both quantitative and qualitative results of the method applied
on AUTOSAR requirements, e.q., which AUTOSAR speciﬁcations are mostly
unstable, and they aimed to assess whether the results of the method are in line
with the expectation of the experts who participated in the development of the
AUTOSAR standard. The experts were not aware of the method results.

5

The SREA method

The SREA method we propose in this paper consists of the following steps:
1. Deﬁne taxonomy of requirement changes in order to design the right metrics
for performing the quantitative analysis in the next step.
2. Perform quantitative analysis of evolution of the requirements speciﬁcations
in order to be able to correctly prioritize them in the next step.
3. Prioritize individual requirements speciﬁcations in order to select groups of
speciﬁcations for the qualitative analysis in the next step.
4. Perform qualitative analysis of changes in order to accurately assess their
impact on the system under development.
Step 1: Define taxonomy of changes: The ﬁrst step of the SREA method
aims to deﬁne the taxonomy of changes by:
1. Deﬁning which types of changes shall be considered, e.g., added requirements,
in order to deﬁne the metric for each type.
2. Deﬁning the metrics for diﬀerent types of changes, e.g., NoA as the number
of added requirements, in order to calculate the total number of changes.

Should We Adopt a New Version of a Standard?

133

3. Deﬁning the total number of changes, i.e., NoC, as the (weighted) sum of
results of the previous metrics, in order to perform quantitative analysis.
4. Deﬁning the taxonomy of modiﬁcations (which modiﬁcations shall be considered), e.g., requirements title, in order to perform qualitative analysis.
First three points aim to deﬁne the types of changes that shall be considered in the requirements evolution analysis, as not all changes have impact on
the system under development (e.g., split requirements with low probability of
occurrence). We speciﬁed the following metrics for each type of identiﬁed change:
1.
2.
3.
4.
5.
6.

NoA for the number of added requirements.
NoS for the number of split requirements.
NoU for the number of merged requirements.
NoD for the number of deleted requirements.
NoM for the number of modiﬁed requirements.
NoC for the total number of changed requirements.

The NoA, NoS, NoU, NoD and NoM are simple metrics that are calculated
by counting the number of occurrences of each type of change. The NoC metrics
is calculated as the sum of the results of other metrics, as shown in Formula 1:
N oC = a ∗ N oA + b ∗ N oS + c ∗ N oU + d ∗ N oD + e ∗ N oM

(1)

The coeﬃcients a, b, c, d and e are there to indicate which simple metrics
shall be considered in the NoC metric, i.e., value 0 means that this particular
type shall not be considered whilst value 1 means that it shall be considered.
The last point aims to deﬁne the taxonomy of requirements modiﬁcations
that shall be considered in the analysis. Requirements that are added, split,
merged and deleted are usually easily detectable based on their unique IDs or
names. However, requirements that are modiﬁed require checking whether their
content was changed. In practice, not all modiﬁcations to the content of the
requirements are relevant, e.g., ﬁxing spelling mistakes does not require eﬀort
for fulﬁlling the analyzed requirements. There are no general rules for deciding
which modiﬁcation are not relevant, so they have to be deﬁned. Table 1 shows
the taxonomy of the general types of modiﬁcation we encountered in our study.
Additionally, it is advisable to implement the taxonomy of modiﬁcations in
the tool responsible for calculating the metrics in a conﬁgurable way so that the
inclusion/exclusion of each type could be done automatically.
Based on the deﬁned taxonomy of changes, we can exclude from both quantitative and qualitative analysis all types of changes in the requirements that do
not aﬀect their semantics. This in turn is very valuable for the engineers analyzing the evolution of requirements as they are presented with the precise measure
of requirements change considering only those requirements that actually require
certain eﬀort to be fulﬁlled by the system under development.
In order to successfully perform points 1–4 from the ﬁrst step for a speciﬁc
industrial case, a preliminary analysis of the requirements behavior shall be conducted. We propose the micro analysis method [14] that enables a comparison

134

C. Motta et al.
Table 1. Taxonomy of modiﬁcations

Types of modiﬁcations

Description

Grammar and spelling corrections

Grammar and spelling improved in a new release.

Encoding modification

The speciﬁcations can be encoded in diﬀerent
ways and the output could slightly change.

Format modification

The format can change, e.g., how requirements
are structured in tables or text.

Change in the technical term name Changes in the name of, e.g., an API or a class.
Title modification

The title of a requirement can change.

Content modification

Modiﬁcation in the content of a requirement.

Reference modification

Change of requirement’s traceability reference

of two diﬀerent versions of one signiﬁcant (in terms of number of requirements
and their scope) requirement speciﬁcation for each category of requirements,
e.g., functional requirements, design requirements, etc. This comparison aims
to identify both diﬀerent types of changes and diﬀerent types of modiﬁcations.
The outcome of the micro analysis should be a table with one row for each
requirement changed and one column for each type of change encountered in the
analysis, i.e., deleted, split, modiﬁed, merged and added requirements. Depending on the results of the micro analysis, we could then decide not to consider
certain types of changes if their occurrence is insigniﬁcant for the analysis.
Step 2: Perform quantitative analysis of requirements evolution: In this
step we aim to quantitatively analyze the evolution of standardized requirements
in order to identify speciﬁcations that are mostly aﬀected by changes. Our qualitative analysis relies on the NoC (Number of Changes) metric that counts the
number of added, deleted, split, merged and modiﬁed requirements according
to the deﬁned taxonomy of changes. We analyze the evolution of requirement
speciﬁcations in two ways: (i) by calculating the NoC and (ii) by considering
the percentage of changes, based on the RMI metric that is derived from NoC.
The ﬁrst one gives an overview of the amount of changes and which speciﬁcations contain the biggest number of changed requirements. The other one does
that same taking also the total number of requirements into account. For example, although a speciﬁcation with one thousand changes has a signiﬁcant NoC,
it could be quite stable, i.e., with a low percentage, if it contains ten thousands
requirements. For this reason, we aim to assess the stability of each requirement
by measuring RMI in the way deﬁned in Formula 2.
RM I =

Rt − N oC
Rt

(2)

Rt represents the total number of requirements for a speciﬁc version while
NoC represents the total number of changed requirements between this version
and the previous one. Substructing the results of the RMI metric from Rt can

Should We Adopt a New Version of a Standard?

135

be used for calculating the percentage of changed requirements. Note that the
percentage could exceed 100 % in some cases because RMI considers all types of
changes including merged and deleted requirements. For example, one speciﬁcation could have 199 NoD, 20 NoA, and 47 NoM, hence 266 NoC from one version
to another. However it could have just 200 requirements in the last version.
Step 3: Prioritize individual requirements specifications: The third step
of the proposed method aims to collect a group of speciﬁcations based on the
results of the previous step and the importance of each speciﬁcation for the
system under development. One speciﬁcation is usually considered important if
its requirements are needed for adopting a speciﬁc standardized feature in the
system. The set of prioritized speciﬁcations are then grouped according to their
semantics (e.g., relevant standardized features) in order to serve as input to the
next step of the method. We do not specify the number of speciﬁcations that
should be prioritized and grouped because it depends on the needs, e.g., adopting
one small standardized feature can aﬀect only a few requirements speciﬁcations
while adopting an entire new release of a standard may aﬀect many.
Step 4: Perform qualitative analysis of changes: The last step of the
SREA method is focused on the qualitative analysis of changes in the prioritized
group of requirement speciﬁcations. The analysis of the actual changes in the
requirements is done by comparing their content (i.e., whether their textual
representation is the same or not) between diﬀerent releases of the standard.
The outcome of this step is a report for each prioritized speciﬁcation or for a
group of speciﬁcations related to the analyzed feature. In order to increase its
readability, we propose to structure the report in the following way:
1. Table of Contents contains a list of sections and subsections of the report.
2. General Data contains the results of the NoC metric, for each type of change,
and RMI metrics, calculated again for the prioritized speciﬁcations.
3. List of Changed Requirements contains the list of all types of changes considered by specifying the ReqId, title and content of each requirement.
4. Detail of Modiﬁed Requirements contains the comparison between the content
of all modiﬁed requirements emphasizing (e.g., bold or coloring) the modiﬁed
text according to the taxonomy deﬁned in step 1.
Since the main goal of SREA is to increase the speed of analysis of standardized requirements evolution, automated tool support for performing both
quantitative and qualitative analysis described in the steps above is an important part of the method. This tool should also be able to generate the ﬁnal report
based on the conﬁgured taxonomy of changes, as already explained.

6

Evaluation of the SREA Method on AUTOSAR

In this section, we show partial results from the evaluation of the SREA method
on the evolution of AUTOSAR requirements for a speciﬁc objective: to facilitate updates of the AUTOSAR modeling tools with new releases of AUTOSAR.

136

C. Motta et al.

This implies focusing on the analysis of AUTOSAR speciﬁcations containing design
requirements. We organize this section according to the steps of the method.
Step 1: Define taxonomy of changes: We initially performed micro analysis
using two AUTOSAR speciﬁcations: AUTOSAR TPS SystemTemplate, containing design requirements, and AUTOSAR SWS Com, containing BSW requirements. Figure 1 shows an extract of the results for SWS COM.

Fig. 1. Short extract of the micro analysis

Based on the complete results of the micro analysis of the SWS COM and
TPS SystemTemplate speciﬁcations, we concluded that the number of merged
and split requirements is very low and therefore insigniﬁcant for our study. Therefore, we decided not to consider split and merged requirements because of their
low probability of appearance and increased diﬃculty in detection. The decision
not to consider these types of requirement changes resulted in the deﬁnition of
the NoC metric we used for AUTOSAR evaluation presented in Formula 3.
N oC = 1 ∗ N oA + 0 ∗ N oS + 0 ∗ N oU + 1 ∗ N oD + 1 ∗ N oM

(3)

Based on the taxonomy of modiﬁcations, and in the discussion with the engineers from VCG, we decided to consider only the following types of modiﬁcation
for the quantitative analysis performed in step 2: Change in the technical term
name, Title modiﬁcation and Content modiﬁcation.
Step 2: Perform quantitative analysis of requirements evolution: The
results of the quantitative analysis of the AUTOSAR requirements evolution
are presented in Fig. 2. We considered all minor releases and revisions of the
AUTOSAR major release 4 (i.e., releases from 4.0.1 to 4.2.2 ).
We can observe the evolution of the AUTOSAR standard in Fig. 2 from two
diﬀerent perspectives: (i) the NoC across consequently releases (left chart) and
(ii) the total number of requirements (right chart). Based on these charts, we can
see that AUTOSAR is continuously changing through its releases. Furthermore,
we can also see that AUTOSAR is continuously growing: In R401 AUTOSAR
had 14.000 requirements whilst in the last version (R422 ) it counts more than
21.000 requirements. Finally, we can see that minor releases of AUTOSAR (R411
and R421 ) bring more changes to the requirements than revisions.
In order to analyze the speciﬁcations that are mostly aﬀected by the evolution
of the AUTOSAR standard, we sorted them based on the results of the NoC and

Should We Adopt a New Version of a Standard?

137

Fig. 2. Results of the quantitative analysis of the AUTOSAR requirements evolution

Fig. 3. AUTOSAR speciﬁcations ranked according to the NoC and RMI

RMI metrics calculated between the chosen AUTOSAR releases 4.0.3 and 4.2.2.
Figure 3 shows the ﬁrst 7 (over 84) speciﬁcations.
The results of the NoC metric show that SWS RunTimeEnvironment and
SWS DiagEventManager speciﬁcations are mostly aﬀected by the changes (more
than 800 changes). Although they have a signiﬁcant NoC, we can not directly
conclude that these speciﬁcations are also the most unstable ones. To assess this,
we ranked them by the results of the RMI metric in order to investigate the relations between the two lists. We can see that SWS SynchTimeBaseManager and
SWS SocketAdaptor speciﬁcations have the highest RMI value. By combining
the two lists, we can also see that SWS SocketAdaptor and SWS SystemTemplate
are the only two speciﬁcations that are considered highly aﬀected by the evolution based on the results of both metrics. Generally, all the speciﬁcation that
have high values of NoC and RMI are good candidates to be analyzed ﬁrst,
depending on their importance for the actual system under development.
Step 3: Prioritize individual requirements specifications: In this step,
we focused on the main objective of this analysis: facilitate the updates of the
AUTOSAR modeling tools based on a new release of AUTOSAR. Therefore,
we asked the AUTOSAR engineers from VCG which speciﬁcations are considered the most important for the analysis of impact on the AUTOSAR modeling
tools. They agreed on the following three speciﬁcations: TPS SystemTemplate,
TPS SWComponentTemplate and TPS ECUConﬁguration.

138

C. Motta et al.

Based on this prioritization and the outcome of the previous step that identiﬁed speciﬁcations mostly aﬀected by the changes, we decided to focus on the
qualitative analysis of TPS SystemTemplate and TPS SWCTemplate in the ﬁrst
phase. The TPS SystemTemplate speciﬁcation contains general design requirements on how the system shall be designed, e.g., description of ECUs connected
with electronic buses and transmission of signal on the electronic buses. The
TPS SWComponentTemplate speciﬁcation contains general design requirements
on how software components should be designed with their data interfaces.
Step 4: Perform qualitative analysis of changes: In the last step we performed qualitative analysis on the prioritized speciﬁcations by running the tool
we implemented and by providing a detailed report. In this report, we ﬁrst show
the results of all proposed metrics, as presented in Table 2.
Table 2. Report - results of the metrics applied on the chosen speciﬁcations
Metric TPS SystemTemplate TPS SWCTemplate Total
NoA

472

374

846

NoD

4

24

28

NoM

13

228

241

NoC

489

626

1115

RMI

0,01

0,41

0,28

The only signiﬁcant diﬀerence in results between the two speciﬁcations is
for the NoM metric. TPS SystemTemplate counts only 13 modiﬁcations whilst
TPS SWComponentTemplate counts 228 modiﬁcations. Nevertheless, the RMI
metric indicates higher stability of the TPS SWComponentTemplate (0,41) in
comparison to the TPS SystemTemplate (0,01). This is because the number of
requirements in the last version of the TPS SWComponentTemplate is much
higher than in the last version of the TPS SystemTemplate (1069 compared to
494) with much smaller diﬀerence in the NoC value (626 compared to 489).
After showing the results of all metrics, we list in the report all added,
removed and modiﬁed requirements emphasizing the textual modiﬁcations in
the modiﬁed requirements in bold. The example of the presentation of one modiﬁed speciﬁcation items in the report is shown in Fig. 4.
One interesting discovery that we came across by analyzing changes in the
AUTOSAR requirements was the fact that change in the name of certain technical
terms (e.g., API or meta-class names) can have a signiﬁcant impact on the results
of the metrics. For example, we located an unexpected increase in the results of
the NoM metric between AUTOSAR releases 4.2.1 and 4.2.2. After investigating
the causes of this, we found that AUTOSAR renamed one requirement speciﬁcation from SWS DevelopmentErrorTracer to SWS DefaultErrorTracer. As a consequence of this renaming, all requirements that contained the word development
(in this context) have also been renamed to default.

Should We Adopt a New Version of a Standard?

5.54

139

[TPS SWCT 01209]

———–content———–
previous version:
ClientServerAnnotation
the clientserverannotation can be used to provide more information with
respect to the operation of the port.
current version
ClientServerAnnotation
the clientserverannotation can be used to provide more information with
respect to the clientserveroperation of the portprototype.
——————————————–

Fig. 4. Example of modiﬁcation of a speciﬁcation item

In order not to invalidate the results by this change that has no impact on the
semantics of the analyzed requirements, we provided an option in our tool for the
engineers to specify which changes in the names of the technical terms shall be
ignored (e.g., every change of a single word from development to default). Using
this option, engineers can add these types of modiﬁcations to the conﬁguration
ﬁle every time they encounter them and run the tool again. We discovered that
ignoring these types of replacements can signiﬁcantly decrease the total NoM
and therefore the size of the report providing more accurate results to the users.
For example, excluding the development to default replacement reduced the total
NoM between the two analyzed releases by 27 %.

7

Discussion

We validated the proposed SREA method by distributing a survey to the
AUTOSAR experts at VCG in which we asked them a number of questions
related to the evolution of the AUTOSAR requirements speciﬁcations, e.g.,
which speciﬁcations they think are mostly aﬀected by the changes between two
AUTOSAR releases. Our goal was to assess whether the results of our method
meet their expectations. We concluded that the results of the SREA method
fully met the answers from the VCG experts in 66 % of questions and were signiﬁcantly diﬀerent in just 1 %. In 17 % of questions we did not get an answer
from the experts and in 16 % of questions the answer was slightly diﬀerent than
the one provided by our method, e.g., the experts indicated that the second most
aﬀected document according to our method was mostly aﬀected by the changes.
More details about the validation including survey questions and answer can be
found in [10].

140

C. Motta et al.

Based on the results of the validation, we concluded that the proposed
method can indeed be used for analyzing the evolution of standardized requirements of AUTOSAR and identifying the requirements speciﬁcations, together
with the actual requirements, that are mostly aﬀected by the changes. However,
additional validation of the true beneﬁts of the proposed method and the tool
in reducing the amount of time spent analyzing the evolution of standardized
system requirements is yet to be performed, as we explained in the future work.
In order to assess diﬀerent threats to validity for our study, we followed Cook
and Campbell’s list of threats [5], i.e., threats to internal, external, construct and
conclusion validity. Due to space limitation, we describe in this paper the most
important threats to internal validity, that concerns accuracy of our results, and
external validity, that concerns generalization of our results.
The most important threat to the internal validity of our study is related
to what is considered a requirement in the AUTOSAR speciﬁcations. According
to AUTOSAR, not only speciﬁcation items and constraints described in this
paper can be considered as requirements, but also plain text written in the
speciﬁcations as it is mandatory to be followed when developing AUTOSAR
compliant systems. However, we realized during our micro analysis that most
of the important statements are part of speciﬁcation items (and constraints),
whilst the remaining text usually represents examples, rationales, and ﬁgures.
For this reason, we believe that the internal validity of our results is still high
as we managed to analyze the most important content of the requirements.
The most important threat to the external validity of our study is related
to the generalizability of our results to systems that are developed based on
other industrial standards and their requirements. Although we cannot claim
that the SREA method can provide equally good results in other domain without
evaluating it in additional case studies, we believe that the this is likely due to the
fact that we designed the steps of the method considering the existing literature
and studies performed and validated in diﬀerent domains (e.g., avionics).
Finally, we can recommend to other companies who would like to analyze the
evolution of standardized system requirements to start by deﬁning the taxonomy
of possible requirements modiﬁcations and the types of changes according to their
knowledge about the analyzed standard. We believe that the other steps of the
method are applicable to other contexts/domains as well.

8

Conclusion

In this paper, we present and evaluate the method named SREA that can be
used to facilitate the process of adopting new releases of industrial standards and
their features. The method is based on the quantitative and qualitative analysis
of evolution of the standardized system requirements and is able to:
– Identify, based on the NoC metric, requirements speciﬁcations that are mostly
changed in the new release of a standard, indicating that they should be
considered ﬁrst in the analysis of impact of adopting the new release.

Should We Adopt a New Version of a Standard?

141

– Identify, based on the RMI metric, requirements speciﬁcations that are
mostly unstable during the evolution of one standard, indicating that features
described in these speciﬁcations may contain defects.
– Present the actual content of added/removed and modiﬁed requirements in the
concrete speciﬁcations of one release of a standard to the engineers performing
the analysis, thus signiﬁcantly reducing the time of analysis.
We apply and validate the SREA method on the case of AUTOSAR standard
by developing the software tool that implements the method. We used the proposed method and the tool to study and assess the impact of AUTOSAR requirements evolution on the automotive software systems based on AUTOSAR. Our
results show that the requirements standardized by AUTOSAR and their evolution should be analyzed and measured in a structured and automated way, i.e.,
by following a clearly deﬁned method supported by a software tool to automate
the process of gathering results. This approach helped automotive engineers from
Volvo Car Group to faster assess the impact of AUTOSAR design requirements
changes, related to a set of new AUTOSAR features, on the AUTOSAR modeling tools used in the development process.
In particular, we show that by applying the SREA method to diﬀerent versions of the AUTOSAR standard, it is possible to identify the most important
requirements speciﬁcations to be analyzed in the ﬁrst phase. As an example,
we showed the analysis of the two speciﬁcations - TPS SystemTemplate and
TPS SWComponentTemplate - shall be done ﬁrst in order to assess the impact
of switching from the AUTOSAR release 4.0.3 to release 4.2.2 on the used
AUTOSAR modeling tools. We also show that the method is able to provide a
report containing only relevant information on the added, removed and modiﬁed
requirements to the automotive engineers in order to increase the speed of analysis. For example in case of the TPS SystemTemplate, the report contained 269
pages of relevant information about the changes whilst the same speciﬁcation
provided by AUTOSAR have around 1500 pages. Without the proposed method,
these pages would need to be compared manually between the analyzed releases.
The information provided by the SREA method can therefore help organizations responsible for managing large software systems in understanding which
areas of the system will be mostly aﬀected by the changes in the standardized
requirements and therefore faster adopt new releases of the standard.
We identiﬁed several potential areas of interest for further work. Since
requirements evolution is today considered to be a challenging task for both
industry and academia, this study shall be considered as a ﬁrst step in deﬁning
the methodology for performing this task. There are several interesting ways for
improving and/or extending the method we propose, in particular:
1. Calculate the actual engineering eﬀort that is saved by using the SREA
method. This would increase the validity of the presented results.
2. Apply SREA to other industries that develop system based on standards.
This would increase the generalizability of the presented results.

142

C. Motta et al.

3. Extend the proposed method to provide eﬀort estimates for adopting new
standardized features in the development projects. This would additionally
help companies in allocating resources for supporting speciﬁc features.
4. Extend the proposed method to include a model for estimating the number
of changes to the requirements speciﬁcations that will occur in the future
releases of the standard. This would help standardization organizations in
allocating resources for working with the most critical speciﬁcations early.
The tool we used for the analysis of AUTOSAR requirements can be downloaded from here: https://www.chalmers.se/en/projects/Documents/SREA.zip
Acknowledgment. The authors would like to thank Swedish Governmental Agency
for Innovation Systems (VINNOVA) for funding this research (grant no. 2013-02630)
and the AUTOSAR team at Volvo Car Group for contributing to the work.

References
1. Anderson, S., Felici, M.: Controlling requirements evolution: an avionics case study.
In: Koornneef, F., Meulen, M. (eds.) SAFECOMP 2000. LNCS, vol. 1943, pp. 361–
370. Springer, Heidelberg (2000). doi:10.1007/3-540-40891-6 31
2. Anderson, S., Felici, M.: Requirements evolution from process to product oriented
management. In: Bomarius, F., Komi-Sirviö, S. (eds.) PROFES 2001. LNCS, vol.
2188, pp. 27–41. Springer, Heidelberg (2001). doi:10.1007/3-540-44813-6 6
3. AUTOSAR, www.autosar.org: Automotive Open System Architecture (2003)
4. Broy, M., Kruger, I., Pretschner, A., Salzmann, C.: Engineering automotive software. Proc. IEEE 95(2), 356 (2007)
5. Cook, T., Campbell, D.: Quasi-Experimentation: Design & Analysis Issues for Field
Settings. Houghton Miﬄin, Boston (1979)
6. Durisic, D., Staron, M., Tichy, M.: ARCA - Automated analysis of AUTOSAR
meta-model changes. In: International Workshop on Modelling in Software Engineering (2015)
7. Durisic, D., Staron, M., Tichy, M., Hansson, J.: Evolution of long-term industrial
meta-models - a case study of AUTOSAR. In: Euromicro Conference on Software
Engineering and Advanced Applications, pp. 141–148 (2014)
8. Ernst, N., Borgida, A., Jureta, J., Mylopoulos, J.: An overview of requirements
evolution. In: Mens, T., Serebrenik, A., Cleve, A. (eds.) Evolving Software Systems,
pp. 3–32. Springer, Heidelberg (2014)
9. Li, J., Zhang, H., Zhu, L., Jeﬀery, R., Wang, Q., Li, M.: Preliminary results of a systematic review on requirements evolution. In: Proceedings of the IEEE Conference
on Evaluation Assessment in Software Engineering, pp. 12–21 (2012)
10. Motta, C.: Analyzing the Evolution of System Requirements. Chalmers — University of Gothenburg (2016)
11. Nurmuliani, N., Zowghi, D., Fowell, S.: Analysis of requirements volatility during software development life cycle. In: Proceedings of the Australian Software
Engineering Conference, pp. 28–37 (2004)
12. Runeson, P., Host, M.: Guidelines for conducting and reporting case study research
in software engineering. In: Proceedings of the Conference on Empirical Software
Engineering, pp. 131–164 (2009)

Should We Adopt a New Version of a Standard?

143

13. Shi, L., Wang, Q., Li, M.: Learning from evolution history to predict future requirement changes. In: Proceedings of the International Conference on Requirements
Engineering, pp. 135–144 (2013)
14. Stark, G., Skillicorn, A., Smeele, R.: A micro and macro based examination of the
eﬀects of requirements changes on aerospace software maintenance. In: Proceedings
of the IEEE Conference on Aerospace, pp. 165–172 (1998)
15. Wang, H., Li, J., Wang, Q., Wang, Y.: Quantitative analysis of requirements evolution across multiple versions of an industrial software product. In: Proceedings
of the 17th Conference on Asia-Paciﬁc Software Engineering, pp. 43–49 (2010)
16. Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference
on Evaluation and Assessment in Software Engineering (2014)

Choreography Modelling Language
for the Embedded Systems Domain
Empirical Evaluation and Lessons Learned
Nebojša Taušan1(B) , Jari Lehto2 , Jouni Markkula1 ,
Pasi Kuvaja1 , and Markku Oivo1
1

M3S Research Group, University of Oulu, Oulu, Finland
{nebojsa.tausan,jouni.markkula,pasi.kuvaja,markku.oivo}@oulu.fi
2
Research and Development, Management and Automation,
NOKIA, Espoo, Finland
jari.lehto@nokia.com

Abstract. Choreography, as a service-oriented architecture-speciﬁc
viewpoint, is increasingly present in the embedded systems domain.
Existing languages for choreography modelling, however, are insuﬃciently expressive to capture the complexities that are typical in the
embedded systems domain. To address this, a new language for choreography modelling was designed. This study presents an empirical evaluation of the language and ﬁndings based on the evaluation. The empirical
evaluation was conducted with experts from four software companies and
two university research groups. Data were collected using focus group
method and analysed with template-based thematic analysis. The ﬁndings of the evaluation revealed (a) software testing and protocol development as areas in which the new language can be applied, (b) design
requirements for the language improvement, and (c) practical challenges
regarding the use of the language. For practitioners, the ﬁndings conﬁrmed the applicability of choreography modelling in protocol development and that users’ level of expertise has a signiﬁcant inﬂuence on the
introduction of the language into practice. For researchers, the ﬁndings
revealed how choreography can by used beyond its original purpose in the
testing phase and identiﬁed new aspects that can be considered during
choreography modelling language design.
Keywords: Choreography
systems

1

·

Focus group

·

Evaluation

·

Embedded

Introduction

Service-oriented architecture is increasingly used in embedded systems (ES)
development as an approach to manage the growing complexity of ES [13,21].
Systems based on service-oriented architecture consist of distributed services as
their main building blocks and rely on service interactions to realise systemlevel goals. There are two viewpoints on service interactions: choreography and
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 144–159, 2016.
DOI: 10.1007/978-3-319-49094-6 10

Choreography Modelling Language for the Embedded Systems Domain

145

orchestration [18]. Choreography implies decentralised coordination and focuses
on the ﬂow of service interactions from a neutral point of view. Orchestration
implies centralised coordination and focuses on the ﬂow of tasks from the point
of view of a single participant.
Although orchestration is more frequently studied, several studies indicate that there is increasing interest in choreography among practitioners and
researchers of SOA-based ES. For example, Kaur et al. [8] suggest that choreography is a more suitable viewpoint for capturing the behaviour of large-scale
control systems. Modelling and execution of service choreography were chosen
over orchestration as an approach to broaden the industry manufacturing system with features of event-driven architecture in [24]. Following this approach,
an execution of choreography with strong real-time requirements in the context
of manufacturing was implemented in [11]. Specifying a choreography viewpoint
in ES, however, is a challenging task for two reasons. First, general purpose modelling languages used to specify service interactions are not expressive enough
to capture the complexities that are typical in the ES domain [1,7,14]. Second, custom-developed languages for choreography modelling in ES are tightlycoupled with the speciﬁc development area they aim to support [28]. Once used
in the development process, which emphasises other development areas, these
custom-developed languages become only partially applicable.
To address these problems, one of the goals of the AMALTHEA international
research project [20], of which this research is part, was to design a choreography
modelling language that is applicable in ES development. To design the language,
the authors conducted several studies focusing on challenges in software architecture development [25], the inﬂuence of middleware features on choreography
language [26] and the design requirements for choreography modelling in ES [28]
within this project. Based on the ﬁndings from these studies the authors designed
a new choreography modelling language (CML) that is suﬃciently expressive for
use in the ES domain.
This study builds on the previously mentioned studies. Its goal is to learn
about the applicability of the designed CML in the ES domain using experts’
evaluations. Based on this goal, the following research question was derived:
RQ: To what extent is the designed CML applicable in the ES domain from
experts point of view?
This research question was answered using a focus group for expert evaluation. Findings based on the evaluation complete the study goal by revealing how
experts see the applicability of the CML in the ES domain and by highlighting
directions for further improvements.
The presentation of the CML1 evaluation consists of the following sections.
Section 2 presents previous evaluations of the choreography modelling languages.
Section 3 presents the way in which focus group approach was adopted to evaluate the CML, while Sect. 4 presents the research ﬁndings. The lessons learned
from these ﬁndings are presented in Sect. 5. Techniques for ensuring the trustworthiness of our ﬁndings are presented in Sect. 6. Section 7 concludes the study.
1

The CML and its editor are publicly accessible and can be obtained from [3].

146

2

N. Taušan et al.

Background

Several studies have focused on evaluating choreography modelling languages.
The support for basic and complex control structures oﬀered by three existing
languages for choreography modelling was evaluated in [2]. The study characterised the oﬀered support for control structures, and indicated potential
improvements for the languages. A sub-set of language constructs for choreography modelling from the Business Process Model and Notation (BPMN,
bpmn.org) was evaluated using the extended semiotic quality framework in [4].
The study concluded that, despite several limitations, the BPMN is appropriate for capturing choreography. Future choreography modelling language evaluations can beneﬁt from the quality model for choreography presented in [16].
This quality model focuses on external quality attributes such as the functionality, eﬃciency and security and represents a step toward a full quality model for
choreography speciﬁcations.
The focus of the study presented in this article is the evaluation of the
designed CML (See Footnote 1), which was intended to address the limited
expressiveness and partial applicability of existing choreography modelling languages used in the ES domain. This evaluation diﬀers from the previously mentioned evaluations since (a) it represents an empirical evaluation that is conducted with experts, and (b) it considers the needs of the ES domain. The CML
that is evaluated is designed based on the knowledge from our analysis of the
state of the art and practice. This knowledge is presented in [28] and includes
a set of design requirements, language implementation technologies, and reasons for selecting BPMN as an existing choreography modelling language that
is suitable for redesign according to the design requirements.

3

Research Design

The goal of this study is to evaluate the applicability of the CML that was
designed for ES during the AMALTHEA research project, and for this purpose
we chose the focus group (FG) method [15]. The main reason for choosing the
FG method is that it is a well-accepted method to collect data in the software
engineering ﬁeld [12] and a suitable method to obtain initial evaluations about
new artefact designs [10,29]. The FG method utilises FG inquiry sessions as
the main method of data collection and allows various data analysis methods
to be used to produce the ﬁndings. In this study, FG inquiry was supported by
the aﬃnity grouping technique [6], while the data analysis relied on thematic
analysis with templates [9] and was supported by frequent collaboration with
experts. A summary of these methods is presented below.
Focus group inquiry referes to a session in which a moderator asks a group of
people about their opinions, attitudes and beliefs about the phenomenon under
study [15]. During an inquiry, a moderator encourages participants to discuss
and, if needed, align their opinions. This type of inquiry, therefore, produces
qualitative data from both participants’ answers and their discussions.

Choreography Modelling Language for the Embedded Systems Domain

147

The aﬃnity grouping technique is a brainstorming method suitable for generating ideas about a phenomenon under study. Once ideas are collected, participants can easily organise them into meaningful categories based on similarities
between the ideas [6].
The thematic analysis with templates (TAT) method was used to analyse the
experts’ feedback [9]. TAT is a variant of thematic analysis method, which is a
qualitative method used for identifying patterns, themes, and their interpretations within collected data [17]. What diﬀerentiates TAT from the conventional
thematic analysis is the use of a template during analysis. The template represents an initial set of themes that will be studied in the collected data. This set of
themes is developed based on researchers experience and exploration of the phenomenon under study. The set is a ﬂexible structure that evolves as the analysis
progresses. According to King [9], TAT is less time-consuming and more ﬂexible
than other variants of thematic analysis. In additon to these reasons, TAT was
chosen for this study since it is a suitable technique for analysing FG data that
is collected during the design of artefacts [29].
Collaboration with experts refers to meetings, workshops, demonstrations,
and email conversations during which the experts and researchers participating
in this study discussed various issues and aspects of the CML.
Figure 1 summarises the organisation of these methods in this study. The
methods and ﬁndings are presented as rectangles grouped based on whether
they are used for data collection, data analysis, or representation of the study
ﬁndings. The dotted arrows indicate the relation between methods and between
methods and ﬁndings, while solid arrows represent the order in which these
methods were utilised.

Fig. 1. Research design

Evaluation of the CML using the FG approach follows the steps proposed
in [10]: (a) deﬁnition of the research problem, (b) planning the FG session, (c)
selecting participants, (d) conducting the FG session, (e) analysing the data,
and (f) reporting the results.
Deﬁnition of the research problem. The CML evaluated in this study
is designed for ES development and, was based on experts’ needs and existing knowledge found in the literature [27]. The research problem in this study

148

N. Taušan et al.

addresses the lack of knowledge and understanding about how experts see the
expressiveness of the CML and its applicability in ES development. By understanding experts’ viewpoints on the use of CML for ES, additional reﬁnements
can be made and more focused evaluations in the ﬁeld can be performed, thus
leading to better acceptance of the CML by the ES development community.
Planning the FG session. Two FG sessions were planned. They consisted
of three phases: opening, central discussion and the closing. The opening phase
included an introduction to the general rules, study objectives and participants’
backgrounds. In the central discussion and closing phases, experts’ feedback was
elicited. To elicit feedback, the moderator asked questions recommended in [10].
These questions, which were adopted for our study, are as follows: (1) Is the CML
comprehensible? (2) How can it be deployed into a working context? (3) What
are the potential problems in using or understanding the CML? The ﬁrst two
questions were asked during the central discussion phase while the third question
during the closing phase. We planned to use the aﬃnity grouping technique to
support the closing phase. The planned duration of the FG sessions was ninety
minutes.
Selecting participants. Experts who participated in the FG sessions were
from two large software development industries (LI), two small and medium sized
enterprises (SME) and two university research groups (EDU). Two LIs are global
vendors with thousands of employees whose product portfolios include various
hardware environments, many types of embedded and non-embedded software
and, domain-speciﬁc languages (DSL) and tools. These LIs also participated in
the AMALTHEA project. One SME focuses on the development of software tools
and solutions for systems integration, which can be used for the development
and integration of ES. The other SME focuses on modelling solutions based
on open-source technology. The two research groups (EDU) closly cooperate
with industries regarding modelling and methodological improvements. Table 1
summarises the companies at which the experts are employed.
Table 1. Companies and their characteristics
ID Size

Country Domain

Product

A

LI

Finland

Telecom

Hardware, ES and non-ES

B

SME Finland

SW Products

System integration solutions

C

LI

Tools and DSLs ES, tools, modelsand DSLs

D

SME France

Tools and DSLs Models, tools and methods for ES development

E

EDU France

Modelling

F

EDU Belgium Modelling

France

Nine experts from the presented organisations participated in the FG sessions. All participants were software engineers with more than ten years of experience in software development. These participants were suitable for this inquiry

Choreography Modelling Language for the Embedded Systems Domain

149

due to their expertise in models, modelling languages and modelling technologies. Experts from companies that participated in the AMALTHEA research
project were also involved in the development of the CML under evaluation.
The experts’ backgrounds are presented in Table 2.
Table 2. Focus group participants and their characteristics
Expert Involvement in CML design Role

Role description

1

1 year

Software architect

Design and management

2

3 years

Process improvement
engineer

Product creation,
improvement and renewal

3

1 year

4

Minimal

5

Minimal

Test architect

Testing improvement

6

Not involved

Software architect

Models, tools and method
engineering

7

Not involved

8

Not involved

Researcher and
designer

Model design and
innovation

9

Not involved

Conducting the FG sessions. The FG sessions took place during the winter
and spring of 2015. Prior to the sessions, all experts received materials describing
the CML to study and participated in a workshop in which the developed CML
was presented and discussed. For six FG participants, this workshop was their
ﬁrst exposure to the CML. As planned, the FG sessions consisted of opening,
central discussion and closing phases.
In the ﬁrst phase, the moderator introduced FG participants to the objective
and general rules of the discussion and the experts presented their backgrounds.
In the second part of the session, the moderator asked experts to discuss the
comprehensibility of the CML and its deployment in their work contexts. During
the last part of the session, experts discussed potential problems with using or
understanding the CML. The aﬃnity grouping technique was employed in this
phase. In short, experts were asked to write down problems with the CML.
Then, researchers formed categories based on the similarities between the listed
problems and used these categories during the analysis. The ﬁrst FG sessions
lasted eighty minutes, and the second session lasted ﬁfty minutes. The sessions
were audio-recorded and transcribed.
Data analysis. We adopted TAT in this study and implemented it in three
phases: (a) template development, (b) coding, and (c) interpretation. Template
development. TAT starts with the development of a template that consists of
the initial themes that are considered relevant to evaluation of the CML. These
themes were created based on collaboration with industry partners on CML
design, relevant literature, and questions asked during the FG sessions. The
initial set of themes is presented in the ﬁrst column of Table 3.

150

N. Taušan et al.
Table 3. Initial and ﬁnal set of themes

Initial theme

Final set of themes

CML in practice 1. CML utilization

Final set of sub-themes (codes)
(a) software testing practice
(b) protocol development

Drawbacks

2. Drawbacks

(a) communication layer
(b) real-time execution
(c) Data visualization
(d) Interaction visualization

Challenges

3. Practical challenges of using the CML (a) abstraction level
(b) interactions
(c) integration
(d) task automation

Comprehension

4. Comprehension

No sub-themes

Benefits

Merged with 1.a and 1.b sub-codes

No sub-themes

Coding. After developing the template, researchers encoded the FG transcripts. During the encoding process, researchers assigned the codes to themes
to further reﬁne them. This process created hierarchical structures that formed
the basis for the interpretation of the themes. Final set of themes and sub-themes
is presented in the second column of Table 3.
Comparison of the initial and ﬁnal template reveals the template evolution
during the analysis process. The following changes occurred: (a) the text coded
under the beneﬁts theme was merged with the software testing and protocol
development sub-codes since beneﬁts are related to those development areas;
(b) comprehensibility remained a top-level topic but did not receive further
characterisation since all the experts agreed that the CML is comprehensible and
did not discuss the topic in more detail; (c) a new theme, practical challenges
of using the CML, was added; and (d) text under the challenges theme was
rearranged under several sub-codes.
Interpretation. Researchers interpreted the derived themes, and experts veriﬁed these interpretations during workshops and meetings (collaboration). During both the interpretation and the coding processes, researchers used the NVivo
[19] tool, which is a software package that automates a number of qualitative
research-related tasks.
Reporting of the results. The themes in the ﬁnal version of the template
(presented in Table 3) represent a structured view of the experts’ evaluations of
how the designed CML helps solve the study problem. The ﬁrst theme (ID 1)
shows that the CML’s expressiveness is suﬃcient for its application in the two
identiﬁed areas of ES development. The second theme (ID 2) consists of identiﬁed drawbacks, correction of which can increase the expressiveness of the CML.
The third theme (ID 3) reveals practical challenges to using the CML. Addressing these challenges can contribute to better acceptance of the CML by future
users, thus contributing to its wider applicability. The fourth topic (ID 4) reveals

Choreography Modelling Language for the Embedded Systems Domain

151

that experts see the CML as comprehensible, which can also contribute to better
acceptance by future users. Since the group did not discuss the CML’s comprehensibility in more detail during the FG sessions, this theme was not interpreted
further as part of the research ﬁndings.

4

Research Findings

The research ﬁndings consist of three parts. The ﬁrst part reveals how the CML
can be used as part of software testing and protocol development practices within
one of the participating companies. The second part lists the design requirements
considered important for choreography modelling in ES that the CML does not
address. The third part describes and clariﬁes the challenges experts identiﬁed
regarding CML use in practice.
4.1

CML in Practice

Software testing. Software testign practices rely on techniques such as unit
testing, functional testing, regression testing, and debugging. The management
of testing practices and techniques is organised with regard to the way in which
user requirements are structured. Analysis of the relation between testing practices and the requirement structure revealed an area in which choreography
modelling can contribute to more informative and understandable test speciﬁcations and to shortening of the testing time. This area is highlighted in Fig. 2
and explained further below.
The company uses ‘request’ to denote what customers wish to have in the
product and ‘requirements’ to express how the request aﬀects the product. Upon
receiving a new customer request, requirement engineers generate a hierarchical
structure consisting of requirements and sub-requirements. A top requirement is
broken down to sub-requirements until clear acceptance criteria can be deﬁned
for those sub-requirements. Deﬁned acceptance criteria are one of the main indicators of the completeness of a user request speciﬁcation and are important in
the testing phase.
In the testing phase, the derived (sub-) requirements and their corresponding
acceptance criteria are used for veriﬁcation of new or modiﬁed software functionalities based on those (sub-) requirements. The UML activity diagrams (http://
uml.org/) present the steps for the verifying sub-requirement acceptance criteria.
In practice, steps in one activity diagram often rely on the steps in other activity
diagrams. This is especially the case when the verifying sub-requirements which
are derived from a common requirement.
Figure 2 presents one such exemplary case in which one requirement, two
sub-requirements and their acceptance criteria are derived from a customer
request. In the testing phase, the software functionality created based on the customer request is veriﬁed against the acceptance criteria. Two activity diagrams
(labelled A and B) represent this process. The veriﬁcation processes, however,
are not isolated from each other. This means that the veriﬁcation steps, which

152

N. Taušan et al.

Fig. 2. Choreography modelling area in testing

are presented as ovals, need to interact with steps from other processes in order
to advance the veriﬁcation process.
Figure 2 presents the interaction between the veriﬁcation steps with dashed
arrows. These interactions represent the areas experts identiﬁed as suitable for
choreography modelling and thus can be speciﬁed using the CML. Once speciﬁed, the choreography scenario supplements the appropriate testing speciﬁcation by showing the veriﬁcation of acceptance criteria from the higher level of
abstraction. To represent this, Fig. 2 relates the choreography with the requirement from which the two sub-requirements are derived. Compared to veriﬁcation
of individual sub-requirements, verifying the software functionality against the
requirements is expected to shorten the length of testing by reducing the number
of test cases required to guarantee product quality.
Protocol development. Analysis of the FG discussion and characterisation
of the ES product developed by one of the companies revealed that protocol
development is an area that is suitable for CML use. The ES developed by
the company can be characterised as a large-scale distributed system in which
communication between distributed parts (e.g. components built by autonomous
development teams or companies) occurs by passing messages. The companys
current practice is that the rules that deﬁne the communication dialogs are hardcoded (or implicit) in software components. This way, upon receiving a message,
each component has to analyse whether the message is correct and required by
that particular component. Since the components participate in a large number
of diﬀerent dialogs, hard-coding the communication rules for each dialog makes
the component prone to error, requiring demanding maintenance work, while

Choreography Modelling Language for the Embedded Systems Domain

153

the analysis of each received message leads to degradation of the components’
performance.
To address this problem, the experts proposed making the dialog between
communicating entities more explicit by developing a communication protocol.
Communication protocol, according to [23], speciﬁes the interactions between
communicating entities by deﬁning acceptable dialogs between them and, therefore, enabling independent implementation and maintenance of the system parts
while retaining the correctness of their integration within the system. From the
company’s perspective, there are at least two expected beneﬁts of protocol development. The ﬁrst is the removal of hard-coded rules from the components’ code
and their allocation in the protocol. The second is the possibility to centrally
manage and monitor protocol enactment during ES execution by implementing or using existing monitoring services. Experts expect that these beneﬁts
can reduce the amount of maintenance required and improve the performance
of the components. Since the CMLs syntax natively supports the speciﬁcation
and timely ordering of interactions, experts saw choreography and the CML as
suitable tools for specifying communication protocols.
4.2

Design Requirements for the CML

Analysis of the FG data showed that some of the experts needs are not supported
with the CML. This study presents these needs using a structure similar that
used in our previous study [28]. This structure consists of the context and problem description. The context describes the details related to experts’ needs while
the problem description identiﬁes the implications of those needs for CML. The
context and problem description result in the identiﬁcation of design requirement
(DR).
Diﬀerentiation of the communication layers. Context: Systems built by
FG participants can be characterised as large, distributed, message-based systems. These systems consist of several autonomous parts, which are often distributed and communicate via message exchange. Communication or message
exchange can occur on diﬀerent communication layers depending on the purpose of the communication. Examples of diﬀerent communication layers in the
telecom domain include the exchange of messages for delivering user data delivery (data plane), for deﬁning the routes for data delivery (control plane) and for
conﬁguring system (management plane). Problem: The choreography speciﬁed
using the CML does not diﬀerentiate interactions in terms of communication
layer. To address this, the following design requirement is derived: DR1: The
CML should diﬀerentiate the communication layer on top of which the interactions being modelled are occurring.
Real-time execution. Context: Parts of the ES built in the experts’ companies have various real-time (RT) requirements. Our previous study identiﬁed the
need to capture RT information with the CML [28], but this analysis revealed
that additional RT data need to be supported. For example, the experts highlighted that synchronisation problems between participants can occur in con-

154

N. Taušan et al.

tinuous and discrete time paradigms. Problem: RT information in the CML is
too generic and insuﬃcient for practical use in the ES development process.
To overcome this, additional data that is speciﬁc to the application domain or
industry branch under consideration must supplement the CML. Accordingly,
the following requirement is derived: DR2: The CML should capture real-time
information that is speciﬁc to the application domain or industry branch under
consideration.
Visualisation of data. Context: The ES built in one of the companies supports
multiple communication technologies. Each of these technologies deﬁnes their
own message and data types, that are included in the message being exchanged
in the choreography scenario. Problem: The CML includes data that are suﬃcient
only for generic representation of the data that are included in the message. The
CML does not support the data needed to diﬀerentiate message payloads. This
need motivated the following design requirement: DR3: The CML should include
the information needed to visualise the data that is included in a message.
Visualisation of interaction. Context: The number of interactions in ES tends
to increase until visualising them in the choreography scenario becomes diﬃcult
or even impossible. In these cases, the added value of the choreography scenario
becomes questionable because the sizes of these models increase (rather than
decrease) developers’ cognitive burden. Problem: The CML does not contain
data to support the clustering or selective presentation of interactions on the
diagrams. The following design requirement addresses this need: DR4: The CML
should support deﬁnition of criteria for clustering interactions.
4.3

Practical Challenges of Using the CML

During the FG sessions, the experts strongly emphasised the changes in the
development organisations that were caused by the introduction of the CML.
According to their claims, successful integration of the CML into existing work
practices is strongly related to users’ understanding of the language and the
value it brings to the company. To contribute to better understanding, experts
identiﬁed and clariﬁed practical challenges regarding CML use.
How to keep the scenario on the right abstraction level? This question
summarises the experts’ concerns about whether use of the CML brings value
to the development process. Experts noted that, if it includes too many details,
the model becomes a burden and decreases the value of the speciﬁcation. CML
model developers should be guided to retain the value of the speciﬁcation at
the optimal level of abstraction. Two criteria determine the level of detail in
a speciﬁcation. The ﬁrst criterion is users’ expertise. Users use speciﬁcations to
implement ES and users with more expertise and knowledge about the ES under
development may require fewer details compared to novice users, who usually
require more details. Consequently, the level of detail in a speciﬁcation should
be aligned with the users’ level of expertise. The Second criterion is customer
demand. Customers can demand that the speciﬁcation include various points of

Choreography Modelling Language for the Embedded Systems Domain

155

view at various levels of detail. These demands, for example, can be related to
compliance wiht standards.
How to impose an exact CML semantic? The need to ensure an exact
semantic in CML constructs was strongly emphasised by the experts for two reasons. First, an exact semantic helps users perform their work in a way that leaves
less space for individual interpretations and the development of workarounds.
Second, it allows the development of compilers and simulators that can automatically verify various properties of the speciﬁed models. One approach to imposing
an exact semantic is to clearly deﬁne the construct based on the application context and consider end users terminology, metaphors and ways of working. Once
deﬁned, the semantic of the CML should be disseminated to end users through
training, manuals and the tools built-in help system.
How to enable users to work only with meaningful interactions? During the FG sessions, experts pointed out the possibility of encountering a large
number of interaction steps in a speciﬁcation and proposed the need to develop
a mechanism to enable them to work only with interactions that are meaningful from their perspective. One way to address this is to categorise interactions
based on the needs of diﬀerent stakeholders. The CML editor can later highlight
the speciﬁcation details that are relevant to the users roles and hide the details
that are not.
How to ensure integration with other models? Experts raised concerns
regarding the integration of the CML with changing technologies and the other
models that are used in the development process. The CML is built on Eclipse
technologies, which allow easy modiﬁcation of both the CMLs meta-model and
the visual representation of its language constructs in the CML editor [28]. To
address this, individuals responsible for maintaining the CML should be familiar
with the ﬂexibility of these technologies and trained to make necessary modiﬁcations to the CML.
Which tasks are automated by the CML editor? During the FG sessions,
the experts expressed concerns regarding the automation of tasks such as detection of faults, interaction clustering and predictive fault correction. The current
CML editor does not support the automation of these tasks, but they will be
supported in future versions of the editor. To aid acceptance of the CML in
organisations, these features of the editor need to be clearly communicated to
future users.

5

Lessons Learned

The goal of this study was to learn abuot the applicability of the CML in ES by
analysing experts’ opinions. The analysis indicated how the CML can be used in
practice, additional design requirements and potential challenges during its use.
Four lessons learned were derived based on these ﬁndings.
The ﬁrst lesson is that choreography modelling can be used beyond its original purpose. The typical understanding of choreography is that it captures the

156

N. Taušan et al.

interaction between participants services (which are autonomous units that oﬀer
distinct functionalities for its users) from a global or neutral point of view. This
study revealed its potential to capture the interactions between veriﬁcation steps
in the ES testing phase. In the testing phase, instead of presenting the sequence
of service interactions needed to fulﬁl system goals, choreography models are
related with the veriﬁcation of the requirements that are being tested. To the
authors knowledge, this is the ﬁrst study to present the use of choreography in
this context.
The second lesson is that choreography can be used in protocol development. Choreography modelling was used to capture the communication protocols
among devices for ocean observation, as in [5]. This study conﬁrms that choreography modelling can be used for protocol development in large-scale distributed
ES.
The third lesson is that four design requirements can be derived from the
analysis of the experts opinions. These requirements complement the requirements that were identiﬁed in our previous study [28] and highlight the aspects
of ES development that should be considered when designing a CML for ES.
Consequently, these requirements were seen as additions to the knowledge base
of modelling language design.
The fourth lesson is that alignment between developers level of expertise
and the level of detail in a modelling language is important for the languages
acceptance in practice. For the modelling language to be accepted and used by
its potential users, company managers should consider the users level of expertise
and balance the level of detail in the modelling language accordingly. A balanced
level of detail in a modelling language prevents ambiguous interpretation of
models without imposing an additional burden on model developers. In this
way, the developed models convey the necessary information to its users, thus
facilitating communication among them.

6

Trustworthiness of the Study

This study used TAT, a qualitative method for data analysis, and relied on the
techniques for ensuring the trustworthiness of qualitative studies proposed by
Shenton [22]. These techniques address the study’s credibility, transferability,
dependability and conﬁrmability.
Credibility of the study was ensured by applying the following techniques.
Adaption of well-established research methods. This study relied on FG and aﬃnity grouping for data collection and on TAT for data analysis. These data collection methods are well known and accepted in the software engineering ﬁeld,
and the questions asked during the FG session are recommended for collecting
experts’ feedback on novel artefacts. The TAT was chosen due to its ﬂexibility
and suitability for analysis of FG data collected during the development of novel
ICT concepts.
The development of early familiarity with the culture of participating organisations. The authors have much experience cooperating with the companies

Choreography Modelling Language for the Embedded Systems Domain

157

whose experts participated in the FG sessions. The AMALTHEA project is one
of several research projects during which the authors and experts built mutual
understanding and trust and learned about each others organisations.
Sampling. The FG participants were chosen for their experience in software
engineering, modelling, modelling technologies and languages, and their familiarity with the work of AMALTHEA.
Frequent debrieﬁng sessions. During the analysis of the collected data, the
experts and researchers frequently collaborated to validate and clarify the concepts that were not discussed in depth during the FG sessions. Before taking
their current form, the study ﬁndings underwent a process of reﬁnement during
which the researchers presented and discussed the ﬁndings with the experts.
Thorough description of the phenomenon under study. This study is part
of a larger research project that aims to design a CML. Intermediary studies
conducted as the CML design progressed describe various aspects of the CML
design, which is the phenomenon considered by this study.
Transferability of the study was ensured by providing a description of the context in which the study was executed. These data include a number of companies
and participants, participants’ backgrounds and involvement with the CML, data
collection and analysis methods, the sessions’ length and the period of time over
which the data was collected. Based on these data, readers can determine the
extent to which these study ﬁndings are applicable in their contexts.
Dependability of the study was ensured by describing the way in which
research methods were used for data collection and analysis. These descriptions,
together with other published work on the CML design, are seen as suﬃcient for
readers to develop an understanding of how the study was conducted.
Conﬁrmability of the study was ensured using two techniques. The ﬁrst technique is thorough methodological description, which, like previous criteria, allows
the study ﬁndings to be scrutinised. The second technique is recognition of the
shortcomings of the study. The main shortcoming is that the experts who were
involved in the CML design could provide biased evaluations during the FG
sessions. To a large extent, FG sessions with experts who were not involved in
the CML design mitigated risks related to this shortcoming. Additionally, only
a single expert who was involved in all of the CML development phases also
evaluated the CML. Other experts in CML design were involved with limited
tasks such as providing company-speciﬁc documentation and explaining their
company’s ES development process. Consequently, these experts were seen as
suitable participants in FGs.

7

Conclusion

In this study, the CML developed for the ES domain was evaluated by collecting and analysing experts’ opinions. The lessons learned from the evaluation
are considered relevant for both practitioners and researchers. For practitioners
involved in protocol development, the lessons learned conﬁrm that choreography modelling is a useful technique and indicate that its use may reduce the

158

N. Taušan et al.

maintenance burden and improve performance. For practitioners interested in
introducing a modelling language in their practice, the lessons learned indicate
that alignment between users expertise and the level of detail in language constructs is important for future users to accept the language.
The lessons learned are of interest to researchers due to this study’s unique
ﬁnding about the adoption of choreography modelling in ES testing and the
design requirements it identiﬁed. The use of choreography to capture the
sequence of interactions between the veriﬁcation steps of ES testing shows that it
can be used beyond its original purpose, and further research should investigate
its use in this and other similar contexts. The identiﬁed design requirements
complement the requirements that were identiﬁed in our previous study and
represent the knowledge that is relevant for designing CMLs for the ES domain.
Acknowledgments. The authors are grateful to FG participants for their time and
eﬀort and to the AMALTHEA partners for their cooperation. This study was supported
by ITEA2 and Tekes, the Finnish Funding Agency for Technology and Innovation.

References
1. Bond, G., Cheung, E., Fikouras, I., Levenshteyn, R.: Uniﬁed telecom and web
services composition: problem deﬁnition and future directions. In: Proceedings of
the 3rd International Conference on Principles, Systems and Applications of IP
Telecommunications, p. 13. ACM (2009)
2. Cambronero, M.E., Dı́az, G., Martı́nez, E., Valero, V.: A Comparative Study
between WSCI, WS-CDL, and OWL-S. In: IEEE International Conference on
e-Business Engineering, ICEBE 2009, pp. 377–382. IEEE (2009)
3. Choreography Modelling Language and Editor (publicly accessible version) (2015).
http://www.oulu.ﬁ/sites/default/ﬁles/content/cml.zip
4. Cortes-Cornax, M., Dupuy-Chessa, S., Rieu, D., Dumas, M.: Evaluating choreographies in BPMN 2.0 using an extended quality framework. In: Dijkman, R., Hofstetter, J., Koehler, J. (eds.) BPMN 2011. LNBIP, vol. 95, pp. 103–117. Springer,
Heidelberg (2011). doi:10.1007/978-3-642-25160-3 8
5. Hu, R., Neykova, R., Yoshida, N., Demangeon, R., Honda, K.: Practical interruptible conversations. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174,
pp. 130–148. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40787-1 8
6. Hut, P.M.: Aﬃnity Diagram - Kawakita Jiro or KJ Method (2015)
7. Iwai, A., Oohashi, N., Kelly, S.: Experiences with automotive service modeling. In:
Proceedings of the 10th Workshop on Domain-Speciﬁc Modeling. ACM (2010)
8. Kaur, N., McLeod, C.S., Jain, A., Harrison, R., Ahmad, B., Colombo, A.W., Delsing, J.: Design and simulation of a SOA-based system of systems for automation
in the residential sector. In: 2013 IEEE International Conference on Industrial
Technology (ICIT), pp. 1976–1981. IEEE (2013)
9. King, N.: Using templates in the thematic analysis of texts. In: Cassell, C., Symon,
G. (eds.) Essential Guide to Qualitative Methods in Organizational Research, pp.
256–270. Sage, London (2004)
10. Kontio, J., Lehtola, L., Bragge, J.: Using the focus group method in software
engineering: obtaining practitioner and user experiences. In: Proceedings of the
2004 International Symposium on Empirical Software Engineering, ISESE 2004,
pp. 271–280. IEEE (2004)

Choreography Modelling Language for the Embedded Systems Domain

159

11. Kothmayr, T., Kemper, A., Scholz, A., Heuer, J.: Schedule-based service choreographies for real-time control loops. In: 20th Conference on Emerging Technologies &
Factory Automation (ETFA), pp. 1–8. IEEE, Luxembourg (2015)
12. Lethbridge, T.C., Sim, S.E., Singer, J.: Studying software engineers: data collection
techniques for software ﬁeld studies. Empirical Softw. Eng. 10(3), 311–341 (2005)
13. Lewis, G., Morris, E., Simanta, S., Smith, D.: Service orientation and systems of
systems. IEEE Softw. 28(1), 58–63 (2011)
14. Lin, L., Lin, P.: Orchestration in Web Services and real-time communications.
IEEE Commun. Mag. 45(7), 44–50 (2007)
15. Litosseliti, L.: Using focus groups in research. A&C Black (2003)
16. Mancioppi, M., Perepletchikov, M., Ryan, C., Heuvel, W.-J., Papazoglou, M.P.:
Towards a quality model for choreography. In: Dan, A., Gittler, F., Toumani, F.
(eds.) ICSOC/ServiceWave-2009. LNCS, vol. 6275, pp. 435–444. Springer, Heidelberg (2010). doi:10.1007/978-3-642-16132-2 41
17. Miles, M.B., Huberman, A.M.: Qualitative Data Analysis: An Expanded Sourcebook. Sage, Thousand Oaks (1994)
18. Peltz, C.: Web services orchestration and choreography. Computer 36(10), 46–52
(2003)
19. QSR-International: NVivo 10 research software for analysis and insight (2014)
20. AMALTHEA. www.amalthea-project.org (2014). Accessed 25 May 2014
21. Scholz, A., Gaponova, I., Sommer, S., Kemper, A., Knoll, A., Buckl, C., Heuer, J.,
Schmitt, A.: SOA-service oriented architectures adapted for embedded networks.
In: 7th IEEE International Conference on Industrial Informatics (INDIN 2009),
pp. 599–605. IEEE, Cardiﬀ (2009)
22. Shenton, A.K.: Strategies for ensuring trustworthiness in qualitative research
projects. Educ. Inf. 22(2), 63–75 (2004)
23. Sridhar, T.: Designing Embedded Communications Software. CRC Press (2003)
24. Starke, G., Kunkel, T., Hahn, D.: Flexible collaboration and control of heterogeneous mechatronic devices and systems by means of an event-driven, SOA-based
automation concept. In: 2013 IEEE International Conference on Industrial Technology (ICIT), pp. 1982–1987. IEEE (2013)
25. Taušan, N., Aaramaa, S., Lehto, J., Kuvaja, P., Markkula, J., Oivo, M.: Customized choreography and requirement template models as a means for addressing
software architects challenges. In: The Ninth International Conference on Software
Engineering Advances, ICSEA 2014. IARIA XPS Press, Nice (2014)
26. Taušan, N., Lehto, J., Kuvaja, P., Markkula, J., Oivo, M.: Comparative inﬂuence
evaluation of middleware features on choreography DSL. In: The Eighth International Conference on Software Engineering Advances, ICSEA 2013, pp. 184–193.
IARIA XPS Press (2013)
27. Taušan, N., Markkula, J., Kuvaja, P., Oivo, M.: Choreography in Embedded Systems Domain: A Systematic Literature Review (submitted for publication, 2016)
28. Taušan, N., Markkula, J., Kuvaja, P., Oivo, M.: Choreography modelling in embedded systems domain -requirements and implementation technologies-. In: 4th International Conference on Model-Driven Engineering and Software Development,
MODELSWARD 2016. Scitepress, Rome (2016)
29. Tremblay, M.C., Hevner, A.R., Berndt, D.J.: Focus groups for artifact reﬁnement and evaluation in design research. Commun. Assoc. Inf. Syst. 26 (2010).
Article 27

Methods and Tools

An ISO 26262 Compliant Design Flow and Tool
for Automotive Multicore Systems
Maria Trei1 , Salome Maro2(B) , Jan-Philipp Steghöfer2(B) ,
and Thomas Peikenkamp1
1

2

OFFIS e.V., Eschwerweg 2, 26121 Oldenburg, Germany
{maria.trei,peikenkamp}@offis.de
Chalmers | University of Gothenburg, Gothenburg, Sweden
{salome.maro,jan-philipp.steghofer}@gu.se

Abstract. Model-based design processes in the automotive industry
must support standards like ISO 26262. Especially for smaller suppliers developing software for OEMs, large-scale methodologies like
AUTOSAR are impractical. Instead, smaller, focused processes that still
allow ISO 26262 compliance are required. In addition, the steps in the
process must be well-supported by the development tool-chain, in particular when developing complex multicore systems. In this paper, we
show such a process based on existing design ﬂows and the current state
of an automotive modelling tool. We structure the design ﬂow to ensure
compliance with the ISO 26262, where necessary complementing it with
required steps to ensure safety. Furthermore, supporting tools extending
the modelling tool are discussed. As a result, the presented design ﬂow
covers all development phases.

1

Introduction

With the development of new functions that are needed for new car generations—
in particular in the context of autonomous driving functions—a massive performance increase of electric and electronic (E/E) systems is needed. The needed
performance boosts are demonstrated not only by new car generations, but also
several research projects—backed up by key industrial partners—that investigate
how to use current multicore technologies. Projects like AMALTHEA4public and
ARAMIS address concrete challenges imposed by exploiting multicore technology in model-based design processes.
These processes are very much focused on providing functional aspects of
multicore systems. However, the domains in which the ﬁnal system is deployed
and the complexity of multicore development make it necessary to put a particular focus on safety aspects [12]. We address this need by presenting a safetyoriented design workﬂow, where state-of-the-art modelling and analysis concepts
The work has been partially funded by the German Ministry for Education and
Research (BMBF) under the funding ID 01IS14029H (AMALTHEA4public) and ID
01IS15031H (ASSUME) and Vinnova AMALTHEA4public.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 163–180, 2016.
DOI: 10.1007/978-3-319-49094-6 11

164

M. Trei et al.

for the development of multicore systems are structured to support the fulﬁlment
of ISO 26262 requirements. The underlying work was carried out in two phases
that are also reﬂected in the structure of this paper: First, an analysis was carried
out, identifying design steps applied in current industrial automotive processes
and the design concepts involved in these design steps. The second phase was
an analysis of the ISO 26262, identifying requirements that aﬀect the execution
of these design steps, including the relevant information to be included in the
previously identiﬁed design concepts.
This paper is structured as follows: Sect. 2 introduces ISO 26262, method
engineering, and related work. In Sect. 3, we discuss the AMALTHEA platform
and design ﬂow which is extended with concepts from ISO 26262 in Sect. 4. We
conclude the paper with a summary and an outlook on future work.

2

Background and Related Work

The
international
standard
ISO 26262 (“Road vehicles –
Functional safety”) focuses on
safety issues in the development
of E/E systems of passenger cars
with a maximum gross vehicle
mass up to 3500 kg. It describes
the safety lifecycle of automotive
E/E systems, including management, development, production,
operation, service and decommissioning. A central element is the
hazard analysis and risk assessment in the beginning of the
safety lifecycle, where Automotive
Safety Integrity Levels (ASILs)
Fig. 1. V-Model of ISO 26262
are deﬁned. Based on this classiﬁcation, requirements for avoidance of residual risk are deﬁned and validation methods are recommended or prescribed. The procedure of how to develop the system under development (SUD)
is given by a V-model, that divides engineering processes into phases of system,
hardware, and software development based on the safety concept as shown in
Fig. 1. This paper addresses the phases shown in bold, where the clauses highlighted with black boxes are already partially supported by the design ﬂow. The
organisation of requirements and their correspondence to elements of the SUD
is prescribed to allow validation of the design and veriﬁcation against the safety
concept.
Since ISO 26262 mostly focuses on safety aspects of the development, additional method content must be provided for the functional part. Such adaptation
can be done in a systematic fashion using approaches such as situational method

An ISO 26262 Compliant Design Flow and Tool

165

engineering (SME) [9] or process lines [13]. The former approach uses standardised fragments to compose a process based on an analysis of the needs of a
speciﬁc team and project. The latter extends SME by making the variability
of the fragments explicit and using variability modelling primitives provided by
process modelling tools. The need for tailoring ISO 26262 to a company’s process
guidelines has been acknowledged [8].
The authors of [7] model ASPICE, IEC 61508, and ISO 26262 in a process
line with the aim to allow companies to derive processes conforming to one or
more of these safety standards. Functional aspects are not modelled explicitly.
Our work, in comparison, aims at combining a design ﬂow for the functional
aspects of the system with one for the safety aspects. We envision its use in the
context of SME with speciﬁc adaptations for teams and projects.
The migration towards an ISO 26262 compliant process is the focus of [4].
Importantly, the authors identify the need to integrate the practices of the standard with the ones of existing processes at the companies. This is in line with
what we have done by studying the existing process and deriving the superset
of activities that are now combined with safety practices from ISO 26262.

3

The AMALTHEA Platform and Design Flow

AMALTHEA and AMALTHEA4public are ITEA funded projects running since
2011. They concentrate on software development for multi- and many core systems. One of their outcomes is the AMALTHEA platform [1,14] (APP4MC),
a tool solution to aid the design of many and multi-core systems. The design
ﬂow has been developed in this context as the result of a survey of current
development activities in the projects.
3.1

AMALTHEA Platform

Overview. The AMALTHEA platform is an open source tool platform developed
mainly for engineering embedded systems in the automotive area. It supports
many design activities as well as complex partitioning and mapping for embedded systems. Further, the platform assists engineering processes with the ability
of handling product lines and executing simulation and validation tools. It is continuously extended to integrate more facets of system engineering processes such
as veriﬁcation, safety, and formal validation of timing requirements. Conformity
to ISO 26262 is one topic to be addressed. A ﬁrst step was the identiﬁcation of
correlations and needs between both the standard and the meta-model of the
platform in form of a gap analysis [3]. The platform supports an iterative workﬂow to accommodate rapid prototyping and feedback from validation activities.
Engineers are supported by the AMALTHEA platform in many steps ranging
from component modelling to partitioning and mapping activities. The trace
model, e.g., a result of the simulation of the system, provides helpful information
about the execution times of speciﬁc system parts as a basis for the improvement
of the existing architecture.

166

M. Trei et al.

Data Models. APP4MC uses a top-level System Model that is subdivided into
more ﬁne-grained sub-models. They allow modelling based on the platform to
derive the description of hard- and software used in the overall system down
to the lowest level of abstraction. We will give a brief overview of the most
important features.
The SUD can be represented from the system, software and hardware point
of view. The hardware model is composed of (hardware) systems, ECUs, microcontrollers and cores. Networks and memories can be deﬁned. Similarly, software
can be subdivided into processes, tasks, runnables, and Interrupt Service Routines (ISRs) using the software model. The technical architecture in terms of
components and their interfaces is part of the components model. Requirements
on the behaviour of the system can be described by the constraints model, which
consists of constraints on timing, data age and runnable sequencing or grouping, aspects particularly important in multicore development. This also includes
requirements of the dynamical architecture, whereas interactions of software and
hardware are covered by the property constraints model. The partitioning model
describes how the software is broken down into runnables which are in turn
aggregated in tasks [10]. The ﬁnal allocation of software to hardware is given by
the mapping model. The system model is completed by a stimuli model, common model, event model, configuration model and the OS model. The trace model
contains information about the simulation/execution of the software on the hardware.
3.2

AMALTHEA Design Flow

Methodology. The design steps are the result of an extensive data collection
process in which the project partners in AMALTHEA4public provided detailed
information about the concrete steps they follow when developing multicore
embedded software, especially in the automotive domain. Data was collected
from ﬁve industrial partners and ﬁve academic partners. While the academic
partners provided information mostly from their experience with diﬀerent industrial partners they cooperate with in research projects, the industrial partners
contributed with their concrete practical experience. The overall design steps,
along with a rationale and additional information are published in [2]. The
design steps are mainly concerned with development of functionality. Interestingly, safety aspects were not mentioned by any of the partners during the data
collection.
Design Steps. Table 1 gives an overview of the design steps that are the result of
the analysis. Note that no order in which these steps are carried out is implied
since deﬁning a concrete development process with a concrete lifecycle is generally company speciﬁc. A wide variety of lifecycles can be applied, including
the V-model that is implied by ISO 26262 and AUTOSAR. While some steps
are carried out sequentially, others can be done in parallel. The dependencies
between the design steps at times imply an iterative approach where they are

An ISO 26262 Compliant Design Flow and Tool

167

repeated or at least revisited after other work has been performed. Such an approach is common in iterative-incremental lifecycles. The identiﬁed steps cover
most aspects of a traditional software development eﬀort, starting from contract
negotiation and scope identiﬁcation and ending at the delivery of the software
(with the exception of software maintenance). Some steps and circumstances
are speciﬁc in the context of the automotive domain, e.g., the diﬀerentiation
of system and software. Another re-occurring theme is product line issues and
variants, even though this theme will not be regarded in detail in the context of
this paper.
Table 1. Overview of the identiﬁed Design Steps
DS 1: System Requirements Engineering

DS 7: Variant Configuration

DS 2: System Architecture Design

DS 8: Implementation

DS 3: Software Requirements Engineering DS 9: Validation and Testing
DS 4: Derivation of Product Variants

DS 10: System Integration

DS 5: Definition of Software Architecture DS 11: Handover
DS 6: Behaviour Modelling

Some of the design steps we elicited, such as Requirement Engineering and
Architecture Design, can be found in nearly all development processes in a similar
form. However, there are speciﬁc steps steps that are only relevant in multicore
development: Partitioning, Task Creation and Target Mapping are, e.g., part of
DS 10: System Integration. We discuss these design steps in more detail in
Sect. 4.3.

4

Analysis of Compliance Towards ISO 26262

Based on the phases and clauses of ISO 26262 as well as the identiﬁed design
steps, we analyse the recommended design ﬂow to satisfy compliance with the
standard and which support APP4MC provides. Further, we will show necessary
extensions and the steps to achieve them. This will be presented constructively
so as to show how the elicited design ﬂow can be extended and how the platform
capabilities tie into the ﬂow.
The resulting extended design ﬂow as shown in Fig. 2 represents the functional design steps outside the “V” and is enriched by six design steps
related to safety inside the “V”. The top of the given V-model stands for the
vehicle/system-part of the V-model of ISO 26262 while the bottom represents
the software part. As our focus lies on the design process derived for software
development, we will only brieﬂy relate to hardware development in the following subsections. However, many design steps, e.g., partitioning or deﬁnition of
property constraints, require the formulation of hardware assumptions that are
needed during software development.

168

M. Trei et al.

Fig. 2. Overview of the extended Design Steps

Brieﬂy summarized, the new design ﬂow lifts the development lifecycle up to
the vehicle level, where in DS 0 functional requirements are deﬁned, which is
important to initiate the safety lifecycle and therefore to support the concept
phase (DS A) of ISO 26262. Safety requirements are introduced at system and
software level (DS B and DS C). Integration and testing at software level for
safety-related software elements is described by DS D, where the integration into
the system and the item is given by DS E. DS F covers integration and validation
activities at vehicle level. Since validation is against safety goals which are part
of the concept phase at vehicle and item level, DS F can be seen as system and
vehicle part. Before discussing our analysis of the diﬀerent ISO 26262 phases, we
ﬁrst give an overview of common models that are used in all the development
phases.
4.1

Generic Models Addressed in All Development Phases

ISO 26262 describes the development of an item based on the functional behaviour which is intended by each element of the item on the one hand, and the
hazardous events concerning these elements which might lead to the violation
of safety requirements on the other hand. As the diﬀerent levels of abstraction
during this development process lead to diﬀerent features of safety requirements
it is necessary to deﬁne the intended behaviour using concepts of certain levels
of abstraction. Analogously, occurring errors of elements of the item have to be
modelled at every level.
The overall functional system behaviour, as well as the speciﬁc aspects of the
functional behaviour of the hardware and the software are captured in a Behaviour Model. The functional requirements on the vehicle level are broken down
into respective parts for all levels and corresponding behaviour and thus deﬁned
on each level. Depending on the level, the speciﬁc models can take diﬀerent forms
such as algorithms, interaction protocols, state machines, etc. The model is captured in DS 6: Behaviour Modelling which is performed for each of the levels.

An ISO 26262 Compliant Design Flow and Tool

169

Further, it is necessary to introduce modelling of errors to analyse how faults of
elements of the item aﬀect their behaviour. On the modelling side, this requires
a model that allows to perform safety activities related to the development of
such a safety concept. To this end, we are using a generic error model that has
been developed in the ITEA project SAFE [6] shown in Fig. 3.
Basically it captures failure behaviour, in
particular how internal and external failures are
propagated through the functions or components of the system. Once these propagations
have been identiﬁed, safety requirements can
then be derived and allocated to the architectural elements. Using this allocation scheme,
the corresponding Automotive Safety Integrity
Level (ASIL) can be derived for each of the allo- Fig. 3. Error model from SAFE
cated requirements. The ASIL describes the risk project
associated with each requirement based on the severity of a hazard, the likelihood
of its occurrence, and how well the hazard can be controlled. This error model
will be used at vehicle level to support the deﬁnition of the functional safety
concept (ISO 26262-3:2011, 8), and at system level for deﬁnition and allocation
of technical safety requirements (ISO 26262-4:2011, 6–7). Reﬁnement of technical safety requirements to software and hardware safety requirements entails
to also introduce error models at software and hardware level. Error models on
system, hardware and software level have to include faults related to multicore
scheduling, which are much more complex than faults occurring in a singlecore
system. This is due to the fact that the scheduling does not only rely on the
basis of timing and priorities of tasks, but also on the commonly shared memory
available, temperature behaviour of the system and, going into the structure of
the hardware a little deeper, shared buses or power supply.
4.2

Concept Phase (ISO 26262 Part 3)

Processes provided by ISO 26262. The safety lifecycle given by the international
standard ISO 26262 starts with the concept phase in ISO 26262-3:2011, which
consists of the clauses given in the ﬁrst column of Table 2.
Description of the SUD, called item in terms of ISO 26262, takes place at the
vehicle level and includes dependencies on, and interaction with, other items and
the environment of the item. Functional and non-functional requirements need to
Table 2. Concept phase
Clause

Supporting DS

Support by APP4MC

5: Item definition

DS 0, DS A, DS 6

(Property) Constraint Model, Component
Model

6: Initiation of the safety lifecycle

DS A

ProR, Papyrus, Yakindu statecharts

7: Hazard analysis and risk assessment

DS A

Papyrus, Yakindu statecharts

8: Functional safety concept

DS A

ProR

170

M. Trei et al.

be deﬁned, as well as constraints given by the environment and other items. Note
that at this point not only safety-related requirements are considered. Operating
modes, interfaces and boundaries on them have to be speciﬁed to enable hazard
analysis and risk assessment on the one hand and the development of the item
with its intended behaviour on the other hand. Based on the item deﬁnition,
the safety lifecycle is initiated in ISO 26262-3:2011, 6, which means that the
development category of all parts of the item is analysed.
Processes during the hazard analysis and risk assessment of ISO 262623:2011, 7, systematically describe possible hazardous events and their consequences. They need to be classiﬁed using metrics for the severity of potential
harm, the probability of exposure of operational situations, and the controllability of each hazardous event. As a result the ASIL can be calculated for each
hazardous event. A safety goal is derived for each hazardous event, inheriting
its ASIL. These safety goals serve as top-level safety requirements, from which
further functional safety requirements will be derived, in particular those characterizing the Functional safety concept.
The deﬁnition of a functional safety concept requires to analyse how component faults can contribute to the identiﬁed risks. On the modelling side, this
requires a model that allows to perform safety activities related to the development of such a safety concept (ISO 26262-3:2011, 8). To this end, we are using
the generic error model introduced in the previous section and shown in Fig. 3.
Basically it is able to capture failure behaviour, in particular how internal and
external failures are propagated through the components of the system. Once
these propagations have been identiﬁed, functional safety requirements can then
be derived and allocated to the architecture according to ISO 26262-3:2011, 8.4.2
and 8.4.3. Using this allocation scheme, the corresponding ASIL can be derived
for each of the allocated requirements.
Activities in the Design Flow. We extend the elicited design steps by introducing DS 0: Functional Requirements Engineering. In this step the functional
requirements of the item at vehicle level will be collected. They provide a basis
for the functional design of the system, software and hardware on the one hand
and the initiation of the safety lifecycle on the other hand. As our design ﬂow
does not include the development of hardware, but has to represent all important information about the hardware available, if appropriate, it is recommended
to deﬁne hardware properties already at this early stage of development. This
can be, e.g., the intended number of available cores, the size of memory, and
the interaction with other systems. The higher complexity of multicore systems
compared to singlecore architectures may aﬀect the exposure of certain hardware failures, which has to be considered both at system and at software level
again. Further, the activities given by clauses 6–8 and discussed above need to be
represented. For this, we introduce DS A: Derivation of the Functional safety
concept, where the hazard analysis and the risk assessment take place to deﬁne
the corresponding safety goals as top-level safety requirements.

An ISO 26262 Compliant Design Flow and Tool

171

Support by APP4MC. As shown in Table 2, APP4MC can only support the Item
deﬁnition clause by allowing the deﬁnition of the system through the Component Model and constraints through the Property Constraint Model. Other open
source eclipse-based requirements management tools such as ProR1 can be used
in the Item deﬁnition clause to describe the requirements of the item as text.
Papyrus2 and Yakindu statechart tools3 can be used to describe the relationship
between the item and the environment through component models. Description
of the safety lifecycle, hazard analysis and risk assessment and functional safety
concept can be done by text in ProR. However one needs to add safety related
attributes such as implementation categories, safety goals and hazardous events
to the requirements.
4.3

Product Development at the System Level (ISO 26262 Part 4)

Processes provided by ISO 26262. Development of the product on the system
level deﬁnes how both hardware and software capabilities form the overall system. The division into seven clauses according to ISO 26262 is shown in Table 3.
In clauses 5–7, requirements for the system to be developed as well as its overall structure are deﬁned. Then, hardware and software development takes place
(cf. Sects. 4.4 and 4.5), before clauses 8–11 manage the integration of hardware
and software components into the overall system, further veriﬁcation activities
and, later on, the release for production.
Central work products of the former steps are the Technical safety requirements specification (ISO 26262-4:2011, 6.4) and the Technical safety concept
(ISO 26262-4:2011, 7.4.) Both contain the set of safety-related requirements that
are used to implement functional safety requirements. This hierarchical structure of requirements must be traceable back to the top-level safety goals and to
the corresponding system elements. Basic information about the intended use
of particular software and hardware resources shall be considered during these
steps. Safety mechanisms qualiﬁed to prevent parts of the system from failing
need to be described in terms of the underlying system architecture. Technical safety requirements inherit the ASIL of the corresponding functional safety
requirements following the rules for ASIL decomposition given in ISO 262629:2011.
The System design specification and the Hardware-software interface specification (ISO 26262-4:2011, 7.4) consist of the overall system architecture and a ﬁrst
view on the separation into hardware and software. The system design shall be
based on the functional concept, in particular not just the safety-related parts
and the technical safety concept. If technical safety requirements are allocated
to certain parts of the system design, their ASIL shall be attached to these
parts respecting amongst others the criteria for coexistence given in ISO 262629:2011. Note that the hardware-software interface speciﬁcation is going to be
reﬁned during hardware and software development.
1
2
3

http://eclipse.org/rmf/pror/.
https://eclipse.org/papyrus/.
https://marketplace.eclipse.org/content/yakindu-statechart-tools.

172

M. Trei et al.
Table 3. Phase 4: product development at the system level

Clause

Supporting DS Supported by APP4MC

5: Initiation of product
DS B
development at the system level

✗

6: Specification of the
technical safety requirements

DS 1, DS B

(Property) Constraint Model, Component Model

7: System design

DS 2, DS 6

Component Model

8: Item integration and testing

DS 10

Mapping Model, (Property) Constraints Model, OS Model, Trace
Model, Stimuli Model, Event Model,
Configuration Model, Hardware Model

9: Safety validation

DS E

✗

10: Functional safety assessment DS F

✗

11: Release for production

✗

DS 11

Other work products concerned with planning integration, veriﬁcation and
validation activities must also be considered. The relevance of veriﬁcation even
this early on during system development must be noted.
Execution of integration, veriﬁcation and validation is part of ISO 262624:2011, 8–11. Integration takes place on diﬀerent levels, where, roughly described,
ﬁrst hardware and software is integrated to systems, second systems are integrated to the item, and third the item is integrated to the vehicle. The compliance
of the ASILs of diﬀerent objects must be analysed, as well as the correctness of
the functionality of the system design. Together, these steps form the work products Integration testing specifications and Integration testing reports. Clauses 9–10
describe in more detail which methods are recommended for certain veriﬁcation
and validation processes.
Activities in the Design Flow. The activities in the design ﬂow on the system
level are concerned with requirements engineering, overall system design, and
system integration. We combine these aspects concerning the functional system
with safety-speciﬁc aspects by introducing new, safety-focused activities.
DS 1: System Requirements Engineering is concerned with the elicitation
and deﬁnition of system requirements as well as of platform and product requirements in case a product line approach is used. The system requirements can be
captured in diﬀerent ways and describe the system as a black-box, focusing
mainly on what the system does (goals and scenarios), who the users are and
what other systems will it interact with [5]. Model-based systems engineering
methods provide a set of partial models to capture system requirements. Examples of the partial models are environment models which describe the SUD in its
context, application scenarios which capture the diﬀerent use cases of the SUD
and requirements models which capture additional functional and non-functional
requirements. The relevant artefacts created in this step are the System Model
(preliminary) and the System Requirements Specification Document.

An ISO 26262 Compliant Design Flow and Tool

173

There is a strong correspondence between the described artefacts of the
design ﬂow and the Technical Safety Requirements Specification and the Technical Safety Concept of ISO 26262, even if safety is not yet regarded. System
requirements with an impact on safety questions of the SUD should be treated
separately from system requirements as system safety requirements with necessary attributes to correlate to technical safety requirements of ISO 26262. This
includes amongst others the introduction of safety mechanisms and ASILs for
these requirements. We therefore introduce DS B: System Safety Requirements
Engineering which consists of at least the following sub-steps:
1. planning veriﬁcation/validation activities at system level;
2. deﬁnition of system safety requirements based on results of DS 1 and DS A,
including safety-related assumptions caused by the multicore structure of the
SUD.
DS 2: System Architecture Design addresses the design of the overall system
architecture according to the elicited system requirements. This system architecture consists of several partial models that describe the system, sub-systems,
their respective structure and behaviour, and the relations and interactions of
the system with the environment. The relevant outcome of this step is the System
Architecture that contains the architecture of the entire SUD. Along with the
functional behaviour on the system level deﬁned in DS 6: Behaviour Modelling,
this complies in many points with the system design specification of ISO 26262.
Accordingly, traceability of diﬀerent system elements and underlying (safety)
requirements should be enabled in this design step to ensure that safety-related
sub-systems with their corresponding ASIL can be identiﬁed. Safety-related
interference must be represented to allow validating if criteria for coexistence
are met by objects deﬁned during the system design. This includes, for instance,
the problem of space and time partitioning, which describes the concurrency of
resources and is therefore related to hardware and software development, but has
to be regarded at system level. Therefore the behaviour modelling also relates
to ISO 26262’s hardware/ software interface specification as part of the system,
software, and hardware design.
After the hardware and software development has been performed, the next
relevant step is DS 10: System Integration in which executable tasks from
the system are created, partitioned and mapped to the target hardware. This is
where multicore development diﬀers from singlecore development: multiple tasks
can run at the same time and eﬃciency and consistency must be guaranteed.
Therefore creation, partitioning and mapping of tasks as discussed below are
essential activities.
Task Creation. In this step, tasks that contain a set of Runnables are created
from the software model. Task creation also takes into account the constraint
model in order to decide which Runnables can be grouped together. The results
of this step are stored in an augmented Software Model. At this point, the rate
at which the tasks are activated, e.g., periodic, single, or sporadic activation is
determined and stored in a Stimulation model.

174

M. Trei et al.

Partitioning. In this step, tasks are identiﬁed to derive possible partitions that
can be executed in parallel. Partitioning includes possibilities to group Runnables
by their activation reference and group independent sets of Runnables to come
up with graph structures that have the most eﬃcient potential when running
tasks in parallel. This step leads to a Partitioned Software Model and a Constraint
Model.
Target Mapping. The aim of this step is to ﬁnd a valid and optimal distribution of software elements to hardware components. Data from Software and
Hardware Models, as well as the tasks activation from the Stimulation Model are
used to calculate such a distribution. Additionally, a Property Constraints Model
may be included during the mapping process which is used to narrow down the
solution space, e.g., some tasks may require the target platform to have a certain amount of memory. The results of this step are stored in a Mapping Model.
Moreover, a preliminary OS Model is generated, which contains a scheduler for
each of the cores of the hardware platform.
The artefacts of DS 10 correspond to inputs of the hardware-software interface specification (HSI) of the ISO 26262 system level, which is already supported
by DS B. But the elicited design steps do not contain explicit support for veriﬁcation of safety system requirements after the integration. This will be corrected by
introduction of DS E: Safety Validation and DS F: Functional Safety Assessment. DS E consists of testing activities to verify that the safety requirements
are satisﬁed and DS F assesses the functional safety concept deﬁned in DS A
and its implementation.
Finally, Design Step 11: Handover handles acceptance testing, delivery of
the product and sign-oﬀ. All acceptance tests must pass before the product can
be delivered to the customer. Including safety-related tests, this corresponds in
many points to the Release for production given by ISO 26262-4:2011, 11.
Support by APP4MC. APP4MC provides a Hardware Model which is dedicated
model that supports the mapping process. The model allows for deﬁnition of the
available hardware by specifying the number of cores, the speed, memory and
other hardware related properties [11]. This model can then be used together
with the software model and the constraint model to create an optimal distribution of the software to a speciﬁc hardware platform. The property constraints
model also oﬀers the possibility to deﬁne safety-related constraints, e.g., to separate safety-critical from non-safety-critical software running on diﬀerent hardware components. This could be necessary in a development process to guarantee
freedom of interference of certain artefacts. Another use case of separation related
to safety is the principle of redundancy, which means that some safety-critical
parts of a system are implemented twice to prevent the single-point-failure for a
certain part.
4.4

Product Development at the Hardware Level (ISO 26262
Part 5)

Processes provided by ISO 26262. Due to space constraints, we restrict development activities on the hardware level to the derivation of hardware safety

An ISO 26262 Compliant Design Flow and Tool

175

requirements (ISO 26262-5:2011, 6) as shown in Table 4 and do not dive into the
hardware design or its evaluation (ISO 26262-5:2011, 7–9). We cannot drop this
clause since hardware safety requirements have to be consistent with the technical safety concept and the system design speciﬁcation (ISO 26262-5:2011, 6.1),
imposing non-trivial relationships between the hardware and system level on
one side, and between hardware and software level on the other side. The hardware elements relevant for the hardware-software interface have already been
identiﬁed as shown in Sect. 4.3.
Activities in the Design Flow. Description of required hardware and its components can be captured in the Variant Model and the Hardware Model derived
during DS 4: Derivation of Product Variants and reﬁned in DS 6: Behaviour
Model. An important aspects to be considered when designing a multicore systems is the distribution of safety-related software to certain hardware elements
such as ECUs, cores, or memory. Depending on such a classiﬁcation, hardware
can then also be categorised as safety-related or not which aﬀects to what extent
testing activities are required. This again emphasizes why traceability of all kinds
of information is very important in the development of safety-critical systems.
The elicited design ﬂow did not contain an explicit step for deﬁnition of hardware
safety requirements. Therefore we suggest that this is addressed as early as in
DS B: System Safety Requirements Engineering. When system safety requirements are deﬁned, hardware safety requirements should be deﬁned as well. These
can later be reﬁned when the actual design and development of the hardware
takes place.
Support by APP4MC. Table 4 shows the tools available in APP4MC, that can
be used to support clause (ISO 26262-5:2011, 6). All necessary elements for
a description of the hardware-software interface are available in the Hardware
model, including descriptions for ECU, Microcontoller, and Core and their associated memory access and communication characteristics. Since hardware safety
requirements have to address the eﬀectiveness of safety mechanisms (ISO 262625:2011, 6.4.2), an error model used to represent hardware failures is required for
a precise description of these. Although it has the same role as the error model
on concept level shown in Fig. 3, this model has to be rich enough to allow for the
(later) evaluation, integration and testing activities (ISO 26262-5:2011, 7–10).
Due to this detail in the hardware safety requirements it is possible to evaluate
the impact of hardware failures on system level (ISO 26262-5:2011, 7.4.4) even
before the hardware is available. The support by APP4MC for the deﬁnition of
hardware safety requirements is given by the usage of ProR on the one hand
and the Property Constraints Model on the other hand, where requirements and
constraints on hardware can be deﬁned.
4.5

Product Development at the Software Level (ISO 26262 Part 6)

Processes provided by ISO 26262. ISO 26262 provides the clauses shown in
Table 5. Software safety requirements based on the technical safety concept speciﬁed at system level need to be deﬁned. The HSI is updated to accord with these

176

M. Trei et al.
Table 4. Product development at the hardware level

Clause

Supporting DS Support by APP4MC

6: Specification of hardware safety DS B
requirements

Hardware Model, (Property) Constraint Model, ProR

requirements before regarding a ﬁrst version of the overall software architecture
in ISO 26262-6:2011, 7. The Software architectural design specification describes
software components and their interactions, i.e., their hierarchical structure and
interfaces, and properties that inﬂuence their implementation such as scheduling properties (ISO 26262-6:2011, 7.4). Based on this, the Software unit design
specification is derived (ISO 26262-6:2011, 8.4). The next step is to generate
source code, which is followed by veriﬁcation and testing activities (ISO 262626:2011, 8–9). The goal of these activities is to ensure that the implementation
satisﬁes requirements on the software units. These requirements are not limited
to safety-related parts of the software, even if testing activities in ISO 26262
only focus on them. Integration of the software elements to the embedded software in ISO 26262-6:2011, 10, consists of regarding the implementation results
with respect to the software architectural design speciﬁcation, whereas satisfaction of software safety requirements is part of ISO 26262-6:2011, 11. Relations
concerning the hardware-software interface must be considered.
Table 5. Phase 6: product development at the software level
Clause

Supporting DS

Supported by APP4MC

5: Initiation of product development at the
software level

DS C

✗

6: Specification of software safety
requirements

DS 3, DS 4, DS C Software Model, (Property) Constraint
Model

7: Software architectural design
specification

DS 5, DS 6, DS 7

8: Software unit design and implementation DS 8

Software Model, Component Model
Software Model

9: Software unit testing

DS 9

Trace Model

10: Software integration and testing

DS 9

✗

11: Verification of software safety
requirements

DS D

✗

Activities in the Design Flow. The activities in the functional design ﬂow cover
all aspects of the software development lifecycle, starting from requirements, to
analysis and design, to coding, and validation. DS 3: Software Requirements
Engineering addresses the elicitation of requirements pertaining to the software
part of the system. Sources for requirements are system requirements and customers or potential users of the software. Software units shall be described so
that a partitioning can be executed to prepare the software for mapping to
the hardware. For multicore development, it is very important to describe any

An ISO 26262 Compliant Design Flow and Tool

177

kind of requirements on relations between software units, such as sequencing
and dependent deadlines. This step is usually done in several iterations. The
collected requirements are recorded in the Software Requirements Specification
Document. Acceptance Tests are deﬁned based on the requirements to validate
that the agreed-upon system is being built.
As for the system level, requirements engineering at the software level needs
to be extended with respect to safety to comply with ISO 26262. We introduce
DS C: Software Safety Requirements Engineering which, similar to DS B,
consists of
1. planning veriﬁcation/ validation activities at software level,
2. the deﬁnition of software safety requirements based on DS B (and the system
design), and the
3. validation of the software safety requirements against the software requirements (i.e., the part that is not safety-related).
Also, it must be ensured that the software safety requirements are correct, complete, and consistent with respect to the safety goals and the system design.
DS 5: Definition of Software Architecture can produce design artefacts
such as component models which contain all the software components and their
dependencies and interaction models to describe communication between these
components. The architecture of a software can be described using more than
one model in order to capture diﬀerent perspectives of the software or to further
reﬁne it into a lower abstraction level. All models are captured in the Software
Architecture Document. The development of software is further supported by DS
6: Behaviour Modelling, where the behaviour of software components can be
speciﬁed, such as the communication between components. An iterative review
process ensures consistency and completeness of the architecture.
In Design Step 8: Implementation, the required code is produced, tests are
developed and executed, the software is integrated and the code is reviewed.
The main resulting artefacts are Source Code and diﬀerent sets of Tests (unit,
component, integration) as well as the Integrated Software and its Documentation.
The ﬁnal activity in this phase is Design Step 9: Validation and Testing.
It involves testing of software components to validate if they are working as
desired, i.e., according to the speciﬁed requirements. For software components
that will interact with hardware components, simulations are run in order to ﬁx
as much defects as possible before the component can be tested on the actual
hardware. Deployable Control Software is a packaged integrated software that is
ready to be deployed on a speciﬁc hardware.
In parallel testing methods given by ISO 26262 should be introduced to
support the veriﬁcation of the embedded software against the software safety
requirements. Depending on the ASIL of each safety goal, diﬀerent technologies
and testing environments are feasible, as e.g. fault injection tests or interface
tests. We deﬁne DS D: Verification of software safety requirements to be the
design step supporting these activities.

178

M. Trei et al.

Support by APP4MC. The platform provides the Software Model that can be
used to model the runnables, tasks and processes that make up the software.
For software architecture speciﬁcation, the Component Model can be used, as
well as the Property Constraint Model to deﬁne software constraints. Additionally, the platform provides a Trace model to assist software testing. The trace
model gives details on time consumed by tasks to allow reﬁnement of the model
to get the most eﬃcient one.
4.6

Tool Support

As can be seen in the tables that show tool support for the design steps in
APP4MC, mainly parts introduced by the clauses related to safety assessment
and veriﬁcation in ISO 26262:2011 are missing natively. Therefore, it is necessary
to accompany APP4MC with one or more additional tools that can ﬁll those
gaps. Where open source eclipse tools are available(e.g., ProR and Papyrus),
we have proposed their use in combination with APP4MC. However, not all
phased can be supported by open source tools, for instance for functional safety
activities a commercial tool like medini analyse4 which supports Item Deﬁnition (ISO 26262-3:2011, 5; DS 0, DS A, DS 6), Initiation of the Safety Lifecycle
(ISO 26262-3:2011, 6; DS A), Hazard Analysis and Risk Assessment (ISO 262623:2011, 7; DS A), and Functional Safety Concept (ISO 26262-3:2011, 8; DS A)
may be used. This covers the entire concept phase. Validation and assessment
capabilities, supporting Safety Validation (ISO 26262-4:2011, 9; DS E), Functional Safety Assessment (ISO 26262-4:2011, 10; DS F), and Veriﬁcation of Software Safety Requirements (ISO 26262-6:2011, 11; DS D), can be provided by,
e.g., BTC EmbeddedValidator5 .
In addition, APP4MC interfaces with a number of requirements management and modelling tools that are commonly used in the industry. Some like
IBM Rational DOORS are integrated through the use of OSLC adapters or
other means, while many such as the diﬀerent modelling environments integrate

Fig. 4. Tools providing support for clauses not covered directly by APP4MC. The star
indicates which tools are shipped with the distribution of the platform.
4
5

http://www.kpit.com/engineering/products/medini-functional-safety-tool.
www.btc-es.de/index.php?idcatside=40&lang=2.

An ISO 26262 Compliant Design Flow and Tool

179

seamlessly into the Eclipse environment provided by APP4MC. If a direct integration or the use of standards like OSLC is not feasible, the export and import
capabilities of APP4MC and the tools must be used. This potentially introduces synchronisation issues, however. For design steps that are not repeated
very often and occur towards the end of the development cycle—such as DS D:
Verification of software safety requirements—this problem is negligible since
exported artefacts do not need to be synchronised with the tools where they
were originally created. An overview of the tools that are currently in use to
support development with APP4MC is shown in Fig. 4.

5

Summary

An eﬃcient way of working in an ISO 26262 compliant fashion with eﬀective tool
support is vital to maintain the relationship between OEMs and their suppliers.
In this paper we have shown how a design ﬂow elicited from actual development
practices at such companies can be extended for ISO 26262 compliance and how
it is supported by APP4MC. Where necessary (e.g., concept phase), missing
modelling concepts (e.g., error models) have been identiﬁed. Other tools (both
open source and commercial) that can be used in combination with the APP4MC
to support safety activities in the ISO 26262 have also been suggested.
Future work will include the deployment and validation of the design ﬂow
in the companies that are part of AMALTHEA4public. In addition, we aim to
strengthen the interface between systems and software engineering by reﬁning
the design steps that regard the exchange of information between these levels. Issues of traceability and cross-company information exchange will also be
regarded.
We proposed extensions of the platform either through provision of own tools
or recommendation of tools compatible with APP4MC to support the additional
design steps. However, the beneﬁt of using other external tools needs to be
analysed, especially concerning our safety extensions for which commercial tools
exist that allow ISO 26262 compliant design in some of the deﬁned steps. But
even commercial tools do not allow to follow our design ﬂow in all aspects, so
we have to investigate which design steps are supported by external tools, and
how they can get integrated with APP4MC. Even if some tools oﬀer interfaces
via OSLC, there is still the need for rich traceability between diﬀerent tools
to support the exchange of information between diﬀerent companies as well as
change management. This leads to the implementation of tool adapters and
traceability tools across the whole AMALTHEA toolchain.

180

M. Trei et al.

References
1. Amalthea Project. http://www.amalthea-project.org/. Online; Accessed 16 Mar
2007
2. Amalthea4Public Project. D1.1: Analysis of Necessary Design Steps. Technical report, ITEA (2015). https://itea3.org/project/workpackage/document/
download/2347/13017-AMALTHEA4public-WP-1-D11:AnalysisofNecessary
DesignSteps.pdf
3. Amalthea4Public Project. D4.1: Gap analysis against ISO 26262. Technical report,
ITEA (2015). https://itea3.org/project/workpackage/document/download/2232/
13017-AMALTHEA4public-WP-4-13017-AMALTHEA4public-WP-4-d41Gap
analysisagainstISO26262.pdf
4. Born, M., Favaro, J., Kath, O., Application of ISO DIS 26262 in practice. In:
1st Workshop on Critical Automotive Applications: Robustness & Safety, pp. 3–6.
ACM (2010)
5. Braun, P., Broy, M., Houdek, F., Kirchmayr, M., Müuller, M., Penzenstadler,
B., Pohl, K., Weyer, T.: Guiding requirements engineering for software-intensive
embedded systems in the automotive industry. Comput. Sci. Res. Dev. 9(1), 21–43
(2014)
6. Cuenot, P., Peikenkamp, T., Wenzel, T., Khalil, M., Rudolph, A., Lucas, J.,
Voget, S., Ross, H., Eckel, A., Biendl, E., Adler, N., Otten, S., Buch, S.: Methodology and application rules documentation. Technical report, ITEA (2014).
https://itea3.org/project/workpackage/document/download/1629/10039-SAFEWP-6-SAFED6b.pdf
7. Gallina, B., Kashiyarandi, S., Martin, H., Bramberger, R.: Modeling a safety- and
automotive-oriented process line to enable reuse and ﬂexible process derivation.
In: COMPSACW, pp. 504–509, July 2014
8. Hamann, R., Sauler, J., Kriso, S., Grote, W., Mössinger, J.: Application of ISO
26262 in distributed development ISO 26262 in reality. Technical report, SAE Technical Paper (2009)
9. Henderson-Sellers, B., Ralyté, J.: Situational method engineering: State-of-the-art
review. J. Univ. Comput. Sci. 16(3), 424–478 (2010)
10. Höttger, R., Krawczyk, L., Igel, B.: Model-based automotive partitioning and mapping for embedded multicore systems. Int. J. Comput. Control, Quantum Inf. Eng.
9(1), 268–274 (2015)
11. Krawczyk, L., Kamsties, E.: Hardware models for automated partitioning and mapping in multi-core systems using mathematical algorithms. Int. J. Comput. 12(4),
340–347 (2014)
12. Parkinson, P.: Safety, security and multicore. In: Dale, C., Anderson, T. (eds.)
Advances in Systems Safety, pp. 215–232. Springer, London (2011)
13. Ternité, T.: Process lines: a product line approach designed for process model
development. In: SEAA 2009, pp. 173–180. IEEE (2009)
14. Wolﬀ, C., Krawczyk, L., et al.: Amalthea - tailoring tools to projects in automotive
software development. In: IDAACS, vol. 2, pp. 515–520, September 2015

Evaluating a GUI Development Tool
for Internet of Things and Android
Björn A. Johnsson(B) , Martin Höst, and Boris Magnusson
Department of Computer Science, Lund University, Lund, Sweden
{bjorn a.johnsson,martin.host,boris.magnusson}@cs.lth.se

Abstract. In the emerging ﬁeld of Internet of Things (IoT), where computerized devices are combined in creative new ways, there is a need to
create Graphical User Interfaces (GUIs) for the systems being built, e.g.
in the form of Android “apps”. This is generally a complicated, time
consuming task. We report from a controlled experiment that evaluates a new approach for building GUIs that aims to make it easier for
more people to build quality GUIs. A tool supporting the new approach
(GPE) was compared to Android Studio (AS), the industry standard
for Android development. Our data analysis shows that GPE is more
eﬀective than AS when performing certain tasks, with no measurable
degrade in the quality of the produced GUIs. We conclude that nonAndroid developers and Android developers alike should consider using
GPE rather than AS when building Android GUIs for PalCom based
IoT systems.

1

Introduction

In this paper we report from a controlled experiment that compares two tools
supporting diﬀerent approaches for building Graphical User Interfaces (GUIs) for
Android “apps”. The purpose is to evaluate the eﬃciency of a novel approach
to GUI construction supported by the new tool. We compare it to Android
Studio [1], the industry standard for Android development, in the context of the
Internet of Things (IoT) framework PalCom [9,12,15].
In the emerging ﬁeld of IoT, our computerized devices are being combined in
creative new ways. Many IoT technologies aim to support end-users in combining
and conﬁguring their own devices into useful systems, a sentiment which we
share. However, from what we have seen for PalCom, it is still mostly developers
that create these types of systems. Although ubiquitous systems typically work
silently in the background [16], doing what we have instructed them to do, there
is often a need to control them in diﬀerent ways, hence the need for GUIs. The
tool we evaluate in this paper aims to make it easier for more people to build
quality GUIs. Our technology is platform independent – the created GUIs can be
interpreted on any platform. By comparing to an Android speciﬁc alternative, we
highlight the relative eﬃciency of our tool for that platform. In projects that use
PalCom, these ﬁndings should prove useful to developers and project managers
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 181–197, 2016.
DOI: 10.1007/978-3-319-49094-6 12

182

B.A. Johnsson et al.

when deciding on which tool to use for GUI creation – directly if the target
platform is Android and indicatively for other platforms.
With an expected increase in adoption of end-user composition of IoT systems, we recognize the need for end-users to tailor GUIs for controlling the
systems they build. With this in mind, we designed our approach to not require
program code to be written – it could eventually be used by non-programmers.

2

Background

The PalCom framework [9,12,15] provides a unique approach to building IoT
systems. PalCom supports a model based on commands for communicating
between services hosted on devices, rather than the commonly used model based
on communicating byte streams between endpoints e.g. TCP/IP. By supporting
this higher level view on communication, PalCom facilitates discovery on both
the device and service levels. Automatic routing between devices on heterogeneous networks for both discovery information and commands between services
is also supported. Services are put together to form systems with assemblies, a
mechanism which can bridge functionality between otherwise incompatible services, e.g. from diﬀerent manufacturers. This is enabled by the meta-level information in the protocol that all services implement, which includes speciﬁcation of
the commands the service can send and receive, and a natural language description. This information also enables a PalCom “browser” to facilitate a mode of
direct interaction with services through a crude GUI created from interpreting
the meta information.
The GUI language PML [5] provides additional information that is used to
render GUIs for PalCom services that are more intuitive and aesthetically pleasing than the ones automatically generated by interpreting the meta information
alone. It also facilitates the construction of GUIs that mix commands from more
than one service. The Graphical PML Editor (GPE) enables users to interactively build GUIs for PalCom services [6]. It produces platform independent GUI
descriptions that can be interpreted on any platform. Since the editor interprets
the meta information, the user can start from services and their commands and
represent service input commands as e.g. buttons and service output commands
as e.g. text boxes. In other words, the user starts by identifying what she wants
to do, after which she gets suggestions for graphical components that can achieve
that functionality. Because this approach puts focus on presenting functionality
in a GUI rather than attaching functionality to manually added components,
there is no need for programming.
Alternative technologies include Android Studio [1], which provides a graphical editor in which a GUI can be composed out of graphical components that
are placed on a canvas. Their properties (size, color, etc.) can be edited to ﬁt the
user’s needs. Other examples of such tools are often found as part of Integrated
Development Environments (IDEs) such as Swing GUI Builder in NetBeans,
Interface Builder in Xcode, or Qt Creator. These tools allow the user to create
the code that make up the resulting GUI without explicit coding, thus speeding

Evaluating a GUI Development Tool for IoT and Android

183

up development – a notion which we adapted in our own solution. Unlike our
solution however, in similar tools the user still needs to implement GUI behavior, typically by writing “glue code” linking the graphical components to the
functionality e.g. in a local model or by triggering some communication. The
requirement to provide this code makes it impossible for a non-programmer to
construct GUIs.

3

Related Work

Although the use of a graphical editor simpliﬁes the construction of GUIs, they
are still complicated to develop in general and require experienced developers.
Already in [13] an attempt was made to simplify the design of GUIs in C, by
developing a simpliﬁed toolkit. It was found that college juniors could use it
within less than three hours. In [7] an adaptive interface is introduced and evaluated. The results highlight the importance of developing simpliﬁed development
tools, as in line with this research. Furthermore, in [10] the authors state that
GUIs can be made easier and more economical to create by reducing the amount
of code that has to be written. We therefore see the result of completely eliminating programming from GUI development as an important step in making the
GUI creation process more eﬃcient and more inclusive of end-users.
End-user composition is a well covered area of research. In e.g. [3,11] the
authors present systems that enable end-users to compose systems based on
everyday metaphors. This research is on the same level as PalCom assemblies,
with no support for GUIs. The idea of a GUI language based around a component/service architecture like PalCom is not entirely novel; the languages presented in e.g. [2,8] share many similarities with PML. However, most of these
types of languages are not augmented with a graphical editor, and haven’t solved
the many challenges of development eﬃciency and end-user composition. Our
research relates to classical User Interface Management Systems (UIMSs) in
that the underlying functionality of our apps – PalCom services – is clearly separated from the GUI [14]. It is this concept that allows us to eliminate the need
to write program code.

4
4.1

Experiment Planning
Goals

The object of study for our experiment is the new approach to GUI development
provided by PML through the Graphical PML Editor (GPE). The purpose is
to evaluate the performance of the new development approach in relation to the
traditional one oﬀered in Android Studio (AS), i.e. the industry standard for
developing Android applications. During the evaluation, the quality focus is on
the eﬃciency – in terms of subject productivity – of the tested tools (approaches).
Productivity is measured as the mean number of time units per experiment task
solved. We perform the evaluation from the perspective of us as researchers, to

184

B.A. Johnsson et al.

determine if the subjectively experienced performance gains of the new approach
are statistically signiﬁcant. The subjects of the experiment are engineering students at Lund University (Sect. 4.2). The subjects solve GUI development tasks
in the context of a problem created speciﬁcally for this experiment (Sect. 4.4).
The goal of the experiment is summarized as follows:
Analyze the approach to GUI development of PML (through GPE)
for the purpose of evaluation
with respect to eﬃciency (subject productivity)
from the point of view of the researchers
in the context of software developers solving GUI development tasks for a
designed problem.
Based on the goal, we consider the following research questions:
RQ1 Which tool is most eﬃcient?
RQ2 Is the quality of produced solutions the same for both tools?
RQ3 What are the users’ impressions from using the tools?
4.2

Participants

For our experiment, we are interested in investigating what eﬀects the treatments
have on the productivity of software developers. We therefore select our subjects
from the population of engineering students at Lund University that have gotten
good grades in programming courses. Even though interest in the experiment was
high, it proved diﬃcult to ﬁnd a session date that matched everyone’s schedule.
Because of this, we had to expand on the number of sessions considered. We also
had to broaden the type of students that we invited. Students registered interest
in an online form (Sect. 4.3). Invitations were sent by email in three waves to:
1. members of Code@LTH1 , and typical 3rd year Computer Science and Engineering (CSE) students with good grades.
2. typical 2nd year CSE students with good grades.
3. engineering students, typically 2nd year, with an interest for Computer Science courses, good grades, and experience with Android Studio.
As a last measure in response to the lack of registered applicants with Android
development experience, we also invited staﬀ from our department; two volunteered and were assigned to the AS treatment. Because of how invitations
recipients were selected, we know all subjects had a common base of programming knowledge. All applicants that registered were included in the experiment,
to a total of 24 subjects. Any applicant that reported previous experience with
Android Studio (12 in total) was selected for that treatment. The remaining 12
were assigned to the GPE treatment.
The subjects committed to a half-day of experimentation time by registering
interest for participation. In doing so, they gave consent for the experiment
1

Student driven recreational programming community, http://www.codeatlth.org.

Evaluating a GUI Development Tool for IoT and Android

185

administrators to handle their data in a conﬁdential manner. The subjects were
paid (on the level of lab assistants) to participate in the experiment. To avoid
non-serious applicants, no concrete ﬁgure was speciﬁed in the invitation, we only
mentioned a “symbolic” monetary compensation.
4.3

Experimental Material

To register interest for participating in our experiment, subjects were instructed
to ﬁll out an online form. The form included questions to collect name and contact information, which of several dates they could attend, previous programming knowledge (multiple choice, four levels), and whether they would consider
using their own computers during the experiment.
The functionality needed to complete the experiment tasks (Sect. 4.4) was
provided to the subjects on a virtual machine acting as server. The server
was represented as a PalCom device hosting four services: PatientService,
LoginService, AssessmentService, and ChecklistService. The services
handle identiﬁcation and information about (ﬁctive) staﬀ and patients, and medical records for patients. Each participants was issued a printed compendium
including general experiment instructions, the description for a warmup task
(Sect. 4.4), and the speciﬁcations for the four PalCom services. Subjects in the
AS group were instructed to run Android Studio version 2.1.x when solving
the tasks. In the GPE group, the Graphical PML Editor was used instead.
Both groups tested their solutions on Android tablets from Sony, model Xperia
Tablet Z2 (SGP521), running Android version 6.0.1. This was at the time the
most up-to-date (major) release. The PML description interpreter (renderer) for
Android was installed on the tablets of the GPE group. Digital experimental
material was handed out on USB ﬂash drives containing: copy of compendium
(PDF); Graphical PML Editor (GPE group only); source ﬁles for 0th iteration
(Sect. 4.4); PalCom browser, for exploring the server; and a time tracker software
(Sect. 4.7).
To capture the subjects’ impressions of the experimental material and the
applied treatments, a second online form was digitally distributed at the end of
the experiment sessions. Questions included: whether suﬃcient information was
provided to complete the tasks; whether they would consider using their tool;
how conﬁdent they were in the correctness of their solutions; general comments
and comments on individual task; and whether they would like to take part of
the results of the experiment, i.e. this paper.
4.4

Tasks

One “warmup” task and eight proper tasks were prepared for the experiment.
The purpose of the warmup task was to allow participants to get acquainted with
their tool; development time was not recorded. The subjects started working
from a small 0th iteration, i.e. a base application with minimal graphical design
and functionality implemented. The tasks were performed in order, each adding
or changing functionality and/or design from the previous tasks. The tasks were

186

B.A. Johnsson et al.

Table 1. Functional requirements of task 3; includes requirements from tasks 1 and 2.
ID No. Description
Application
A.1

Login screen is opened upon application startup

Login Screen
L.1

Text box for password entry hides its content e.g. as asterisks

L.2

Text boxes for username and password entry are cleared upon successful login

L.3

Main screen is opened upon successful login

L.4

List of patients is requested from server upon successful login

Main Screen
M.1

No patient is selected after opening (successful login)

M.2

Patient selection list shows patient names only (no IDs)

M.3

Selecting a patient causes its information to be display

M.4

Selecting no patient clears the patient information section

M.5

Full name of logged in staﬀ member is displayed

M.6

Logout button opens Login screen

designed to mimic the evolution of software in real world development scenarios. For the experiment problem, we drew inspiration from the results of the
itACiH [4] project, where an Android application to support nurses in Hospital
Based Home Care (HBHC) was developed. No natural order of increasing diﬃculty per task was planned. Instead, tasks were allowed to organically evolve the
application. However, we strived to have the tasks cover a wide range of typical
GUI development duties (e.g. adding new screens) and graphical components
(dropdown lists, popup dialogs, etc.). Task descriptions were presented in both
text – primarily describing functional requirements – and images – mockups of
the screens, describing graphical requirements.
The tasks will not be listed in full because the page limitation of this paper
does not allow it. They can be made available (in Swedish) upon request. Table 1
lists the functional requirements of task 3, which build upon the those of tasks
1 and 2. After having completed the warmup task, the app allows users to enter
a patient’s ID into a text box and press a button to request that patient’s
information from the server; the reply is promptly displayed in the GUI. We
summarize the descriptions of the three subsequent tasks as follows:
Task 1 The patient selection mechanism is replaced; the user selects patients by
name from a drop-down list populated by content received from the server.
Requirements: M.2, M.3, M.4.
Task 2 A simple login screen is added. The content of two text boxes (username/password) is sent to the server for authentication when a button is
pressed. On positive server reply, the main screen is opened. Requirements:
A.1, L.1, L.2, L.3, L.4, M.1.

Evaluating a GUI Development Tool for IoT and Android

187

Task 3 The currently logged in user’s name, a product logotype and a logout
button is added to the main screen. Requirements: M.5, M.6.
4.5

Parameters and Hypotheses

The experiment has a single independent variable: GUI Development Approach
(GDA). This is measured on a nominal scale by categorization based on tool as
either ‘GPE’ for the Graphical PML Editor or ‘AS’ for Android Studio. We also
identify the following variables:
– Experience of Android development (AXP) is measured on an ordinal scale
by introducing a classiﬁcation with four classes based on prior experience: no
prior experience, casually tested tools or novice amateur developer, educated
and/or advanced amateur developer, professional.
– General programming experience (GXP) is measured on a scale that is analogous to that of AXP.
The experiment factor is GDA, which has two treatments. GXP is used to
verify subject selection, and AXP is used to determine treatment assignment
(Sect. 4.2). The dependent variables for the experiment are:
– Tool eﬃciency, measured on a ratio scale as mean development time (µT ) per
task, in minutes.
– Quality of produced solutions, measured on a ratio scale as the mean number of
speciﬁcation deviations (µD ) for a set of tasks. The deviations are unweighted,
i.e. all deviations are considered equal in terms of severity. Two types of deviations are considered: functional and graphical.
– Experiment impressions, measured on an ordinal scale (1–5) as the mean value
of the answers (µA ) to two questions: (1) “Would you consider using [tool]
to develop Android apps in the context of PalCom systems?”, and (2) “How
conﬁdent are you that your submissions are compliant with speciﬁcation?”.
Our primary dependent variable is µT , with µD and µA serving as variables to
strengthen conclusions drawn from µT .
We formulate three hypotheses based on our research questions. Informally,
we hypothesize that the mean development time of the GPE group will be less
than that of the AS group (RQ1). Regarding solution quality we do not expect
that any treatment will outperform the other (RQ2). Likewise, we expect both
groups to consider using their tool, and for the conﬁdence in solution quality
to be comparable (RQ3). Formally, we deﬁne A = {1, 2, . . . , 8} as the set of all
experiment tasks. Then, for all a ∈ A for some A ⊂ A
H01 : µT (‘GPE’, a) = µT (‘AS’, a)
Ha1 : µT (‘GPE’, a) < µT (‘AS’, a)

(1)

188

B.A. Johnsson et al.

where µT (t, a) refers to the mean development time of task a for treatment t ∈ T
with T = {‘AS’, ‘GPE’}. Furthermore, for all d ∈ D and some a ∈ A
H02 : µD (‘GPE’, d, a) = µD (‘AS’, d, a)
Ha2 : µD (‘GPE’, d, a) = µD (‘AS’, d, a)

(2)

where µD (t, d, a) refers to the mean number of speciﬁcation deviations of type
d for task a and treatment t ∈ T . D = {F, G} refers to functional and graphical
deviations respectively. Finally, for all q ∈ {1, 2}
H03 : µA (‘GPE’, q) = µA (‘AS’, q)
Ha3 : µA (‘GPE’, q) = µA (‘AS’, q)

(3)

where µA (t, q) refers to the mean value of the answer to question q for the group
with treatment t ∈ T .
4.6

Design

The experiment design for the stated hypotheses is of the standard type “one
factor with two treatments” [17]. The factor is GDA and its treatments are GPE
and AS. Our main hypothesis (hypothesis 1) states that development time will
be lower for treatment GPE than AS. We have two auxiliary hypothesis (hyp. 2
and 3) to strengthen the validity of hypothesis 1. The dependent variables ‘tool
eﬃciency’ and ‘solution quality’ are measured on ratio scales and are tested with
(non-parametric) Mann-Whitney-Wilcoxon (MWW) tests in R. We favor MWW
tests over t-tests since we cannot assume that the measured data will follow the
normal distribution. The ‘experiment impressions’ variable is measured on an
ordinal scale and is also tested with MWW tests.
4.7

Procedure

Data was collected across 6 distinct experiment sessions, over a period of 6 weeks.
4 sessions included participants for both treatments, and 2 covered only the AS
treatment. Each session was preceded by a registration phase, and succeeded by
a followup phase.
Before attending an experiment session, the subjects had to register. The
registration form served a double purpose: it was used to collect data regarding
programming knowledge, i.e. independent variables AXP and GXP. The data
was recorded in spreadsheets.
All sessions were scheduled for an afternoon. After an initial hour of introduction and training, the participants worked independently on solving tasks for the
remaining three hours. This time restriction was not ﬁrmly enforced; the subjects
were granted additional time to ﬁnish started tasks. The sessions were monitored,
and were held in seminar rooms at our department. Upon arrival, subjects were
assigned a desk where equipment and documents had been laid out.

Evaluating a GUI Development Tool for IoT and Android

189

The introductory part of the sessions included an overview of the experiment
(goals, etc.), an introduction to our technologies (PalCom, PML), training, and
information about practical matters. The GPE group, having never used the
Graphical PML Editor before, got approximately 20 min of training for the tool.
Additionally, the warmup task was considered part of the training; the subjects
were allowed to ask any questions during this task. For the subsequent tasks
1–8, a judgment call on the part of the experiment instructor(s) was done on
whether to answer the question, or refer to the manual. In general, only questions
resulting from technical/practical problems with the tool were answered. The AS
group was expected to have previous experience with their tool; no training was
provided. Instead, the participants were given a “tour” of the 0th iteration’s
Android Studio project while the GPE group was receiving training. Intricacies
regarding the project’s connection to the PalCom world were discussed. They
were encouraged to ask questions about this during the warmup task.
During the active phase of the sessions, participants were responsible for
recording development times for individual tasks (data for dependent variable
µT ). A custom built time tracker tool was provided for this purpose. The tool
ensured task ordering by making the task descriptions available only after the
preceding tasks had been completed. Before proceeding to a new task, the time
tracker prompted (mandatory) the subjects to upload the ﬁles for their current
solution (data for dependent variable µD ). The subjects could “pause” a task
in the tool’s GUI, e.g. when going to the bathroom. The severity of forgetting
to “unpause” was emphasized, and features of the tool were implemented to
minimize this risk. The development time data and solution ﬁles were collected
on USB ﬂash drives at the end of each session.
No restrictions on material was made during the experiment sessions. Subjects had free access to the internet.
After the conclusion of the active phase of the sessions, experiment impressions were collected in an online form. The data (dependent variable µA ) was
recorded in a spreadsheet. The participants were encouraged to ﬁll out the form
immediately following the experiment session, in order for the experience to still
be fresh in their minds. They were allowed to be anonymous.
4.8

Analysis Procedure

To analyze development time (µT ) we extracted the data collected by the time
tracker, which was stored on USB ﬂash drives (one per subject) in JSON format. The data was loaded into an custom Java program, outputting a commaseparated values (CSV) ﬁle containing for each subject: subject ID, and start/
stop/total times per task. We processed this ﬁle in R. Data for the AS group
was divided into tiers based on subject performance in task 1. Hypothesis 1 was
tested with Mann-Whitney-Wilcoxon (MWW) tests for the diﬀerent groups and
tiers, and for individual tasks depending on data availability.
We analyzed the quality of submitted solutions (µD ) by extracting the solution data that was stored on USB ﬂash drives by the time tracker tool. For the
AS group, this data consisted of Android Studio projects that could be loaded

B.A. Johnsson et al.
190

00:00

1 1: 8m

3 1: 16m

2 1: 13m

2: 10m

3: 11m

2: 23m

2: 13m

2: 18m

4: 21m
3: 20m

2: 21m

4 1: 18m
5 1: 22m
2: 25m

2: 40m

2: 18m

2: 30m

2: 24m

6 1: 23m
7 1: 27m
8 1: 28m
9 1: 30m
10 1: 30m
11 1: 36m
12 1: 61m

13 1: 36m
14 1: 39m
15 1: 43m
16 1: 54m
17 1: 106m
18 1: 108m
19 1: 111m
20 1: 149m
21 1: >51m
22 1: >110m
23 1: >138m
24 1: >146m

00:00

5: 30m
4: 17m

3: 14m

3: 25m

3: 27m

3: 25m

3: 25m
3: 30m

2: 45m

2: 34m

2: 36m

2: 47m

01:00

4: 18m

5: 41m

6: 19m

5: 30m

3: 17m

5: 47m

5: 29m

6: 18m

5: 43m

02:00

6: >42m

6: 25m
4: >26m
4: 29m

02:00

5: 35m

2: 32m

2: >49m

4: 27m

4: 17m

5: 36m

5: 27m

5: 18m

3: 13m

4: 13m

3: 26m

4: >15m

4: 21m

4: 22m

3: 26m

3: 14m

3: 35m

4: 27m

4: 38m

2: 26m

4: 31m

3: 22m

2: 77m

01:00

6: 34m

5: 25m

5: 31m

3: 46m

03:00

03:00

Fig. 1. Development times sorted by task 1, excluding warmup and pauses. Unﬁnished
tasks are slanted where the experiment ended.

Evaluating a GUI Development Tool for IoT and Android

191

into Android Studio to allow for installation and review on an Android tablet.
For the GPE group, the extracted ﬁles were PML descriptions (XML ﬁles) that
could be installed, interpreted and reviewed on an Android tablet. Solutions
from all subjects in the GPE group were reviewed, but we only considered the
top performing AS tier. We reviewed the solutions for the highest numbered
task that all subjects in this selection had completed. Review results, per subject, were manually recorded in a spreadsheet. Functional requirements (Table 1)
were recorded as either ‘passed’ or ‘failed’. Deviations from the graphical design
(screen mockups) were counted by visual comparison. Standard graphical deviations were identiﬁed, but ultimately the comparison was made on case-by-case
basis, relying on the judgment of the reviewer (one person). The data from
the spreadsheet was downloaded as a CSV ﬁle and processed in R. We tested
hypothesis 2 with MWW tests for the diﬀerent groups and the identiﬁed highest
numbered task.
The results of the post-experiment survey were stored in a spreadsheet and
were used to analyze participant experiment impressions (µA ). We downloaded
the data as a CSV ﬁle and processed it in R. As before, we included the data
from all subjects in the GPE group, but only the data from the top performing
AS tier. Hypothesis 3 was tested with MWW tests for the diﬀerent groups and
the two questions related to µA .

5
5.1

Analysis
Data Set Preparation

At the end of the experiments, some participants mistakenly submitted the solution for the task they were working on. As these solution were at best partially
complete, their end-times are not valid. Before processing the JSON ﬁles of these
subjects, the erroneous end-times were manually removed. Furthermore, to avoid
clutter in the graphics (Fig. 1), unﬁnished tasks that the subjects had worked
on for less than 15 min were also removed.
In the post-experiment survey form, subjects were allowed to identify themselves by text. Some subjects wrote their names, while others entered their
assigned subject ID. Before downloading the data from the sheet where the
results were recorded, all entries were mapped to subject ID to enable proper
analysis.
5.2

Descriptive Statistics and Hypothesis Testing

Figure 1 gives an overview of the development times per task for all participants,
from all experiment sessions. The participants are sorted based on the development time for task 1. The GPE group consists of subjects 1–12. For the AS
group, we group participants into three tiers based on performance:
Top tier completed the ﬁrst task in less than 60 min (subjects 13–16).
Middle tier completed the ﬁrst task in more than 60 min (subjects 17–20).

192

B.A. Johnsson et al.
Table 2. Top tier: One-sided MWW test, AS>GPE, for development time.
a µT (‘AS’, a) µT (‘GPE’, a) p-value
1 43.0

26.0

0.0105

2 48.5

24.4

0.0090

3 24.8

23.1

0.5725

Bottom tier failed to complete the ﬁrst task (subjects 21–24).
The results from the bottom tier cannot be analysed, and are discarded. We
proceed to analyze GPE vs. AS for the middle tier and GPE vs. AS for the top
tier separately.
For the middle tier, we analyze development time for task 1: µT (‘AS’, 1) =
118.5 min. For the GPE group we get µT (‘GPE’, 1) = 26.0 min. Comparing
the groups using a one-sided Mann-Whitney-Wilcoxon (MWW) test, assuming
AS>GPE, we get a p-value of 0.0021. The null hypothesis H01 is rejected for
the middle tier with A = {1}, i.e. for task 1.
For the top tier we analyze development time, task solution quality, and the
values of the answered survey questions. We start by analyzing development
times for tasks 1–3; the results for tasks 4–6 cannot be analysed due to a lack of
data points in the top tier AS group. The results of comparing the groups using
a one-sided MWW test, assuming AS>GPE, are reported in Table 2. The box
plot in Fig. 2 shows the data being analysed. We reject the null hypothesis H01
for the top tier and A = {1, 2}, but not for A = {3}.
Next, we analyze the data produced from reviewing the submitted task solutions. We reviewed the solutions for task 3 – the highest numbered task solved
by all subjects in both the GPE group and the top tier AS group. The results of
comparing the counts of functional (F) speciﬁcation deviations using a two-sided
MWW test are reported in Table 3. R refers to all requirements listed in Table 1
(‘ID No.’). For the top tier AS group 4/4 solutions satisﬁed requirement M.1,
Task 1

Development time (minutes)

80

Task 2

Task 3

60

40

20

AS

GPE

AS

GPE

AS

GPE

Fig. 2. Box plot of top tier development time data for tasks 1, 2 and 3.

Evaluating a GUI Development Tool for IoT and Android

193

Table 3. Top tier: Two-sided MWW test for functional speciﬁcation deviations.
Requirements

µD (‘AS’, F, 3) µD (‘GPE’, F, 3) p-value

R = Table 1

0.250

R‘ = R\{M.1} 0.250

1.08

0.0201

0.166

0.7883

Table 4. Top tier: Two-sided MWW test for answers to survey questions.
q Description

µA (‘AS’, q) µA (‘GPE’, q) p-value

1 Would consider using tool 4.8

4.2

0.163

2 Confidence in submissions 4.2

3.8

0.140

while only 1/12 solutions from the GPE group did the same. We refer to Sect. 6.1
for a discussion, and run the test again with R deﬁned as all requirements of
R except for M.1. Analyzing the data for the number of graphical (G) speciﬁcation deviations we ﬁnd that µD (‘AS’, G, 3) = 1.75 and µD (‘GPE’, G, 3) = 2.08.
Comparing the groups using a two-sided MWW test we get a p-value of 0.8521.
We cannot reject the null hypothesis H02 for the top tier for either p = F or
p = G when a = 3 and when considering R rather than R.
Finally, we analyze the data collected from the participant survey at the end
of experiment sessions. 100 % of the participants answered the survey. Table 4
compares the answers of the top tier AS group and the GPE group using a
two-sided MWW test. The two questions are described in Sect. 4.5. Both were
answered on an ordinal scale of [1, 5]. We calculate mean values to provide a
sense of how the subjects answered. We cannot reject the null hypothesis H03
for the top tier for either question 1 or 2.

6
6.1

Discussion
Evaluation of Results and Implications

Based on informal interviews with the participants of the AS group, we concluded
that the scale in the registration form for classifying AXP was unsuccessful. Some
subjects were too modest, registering in a lower category than appropriate, while
others misinterpreted the classiﬁcation and mistakenly put themselves in a higher
category than appropriate. Because of this, we grouped subjects in the AS group
based on their performance during the experiment. Three tiers were identiﬁed.
We discarded the data of the bottom tier, as we considered their level of AXP
to be too low, and hence not applicable for our experiment.
We perceive the middle tier subjects as good programmers with limited AXP.
Their results are hence indicative of what happens if good non-Android developers are to create Android applications. This has relevance in practice: after a
PalCom system has been created, typically some form of GUI has to be created
to interact with it, not uncommonly on Android devices. As hypothesized, the

194

B.A. Johnsson et al.

mean development time for the middle AS tier was longer than for the GPE
group: 118.5 vs. 26.0 min for task 1. Hence, we reject H01 for the middle tier
and task 1. In the choice between GPE and AS, the results tell us that nonAndroid developers should use GPE rather than AS when creating GUIs for
PalCom systems.
The subjects of the top AS tier are perceived as good programmers with
a fair amount of AXP. Their results are interpreted as the expected outcome
of using average Android developers to create Android applications. The mean
development time for the top AS tier was, for tasks 1 and 2, longer than for
the GPE group: 43.0 vs. 26.0 min and 48.5 vs. 24.4 min, respectively. Hence, we
reject H01 for the top tier and tasks 1 and 2. For task 3, however, we cannot
reject the null hypothesis. From the description of task 3, we observe that the
task had a clear focus on graphical design: changing screen title, adding a logo,
etc. As opposed to tasks 1 and 2, few functional matters were covered. The
novelty of GPE is to change how functionality is speciﬁed, thus eliminating the
need for glue code. Hence, we argue that it is reasonable for a task where little
glue code would have to be written in AS to not be signiﬁcantly faster to develop
in GPE. It appears that even Android developers can beneﬁt from using GPE
rather than AS when creating GUIs for PalCom systems.
Analyzing the quality of the solutions for task 3 from the top tier AS and GPE
groups, we found no diﬀerence in terms of graphical speciﬁcation deviations: we
could not reject H02 for graphical deviations and task 3. However, it appears
that the GPE group deviated more from the functional speciﬁcation. Looking
at the data, we found that almost all GPE subjects failed to satisfy requirement
M.1 (Table 1). This cannot be a coincidence, and we do not believe that so
many subjects would deliberately submit faulty solutions, hoping we would not
notice the infraction. Instead, we believe that the subjects simply missed the
error since it is intricate to test and hence easy to miss. The reason that no
AS subject failed to satisfy the same requirement is that no additional technical
eﬀort (development time) was needed on their part; AS guarantees by default
that requirement M.1 is satisﬁed. We therefore think it is relevant to test for
functional deviations again, excluding M.1. In doing so, we could not reject H02
for functional deviations and task 3. In practice, it could be argued that the
GPE group has produced solutions with no diﬀerence in quality compared to
the top tier AS group, in less time.
For experiment impressions, we found no diﬀerences between the groups,
as hypothesized. We could not reject H03 for either question. The results tell
us that both groups would consider using their tool and that the subject have
conﬁdence in their submitted solutions. The latter ﬁts well with our analysis of
solution quality. The former is interesting, as the subjects of the GPE group had
an agreeable experience (mean 4.2 on a 1–5 scale) using our tool.
6.2

Threats to Validity

Here we discuss threats to the validity [17] of the experiment and ﬁndings.

Evaluating a GUI Development Tool for IoT and Android

195

Instrumentation: As mentioned earlier, the scale in the registration form for
classifying AXP was unsuccessful due to bad instrumentation. We should have
made clearer categories to choose from. This was solved in analysis by dividing
the subject of the AS group into three tiers. Since the tasks were created specifically for the experiment, low quality in descriptions could have aﬀected results
for one or both treatments. However, in the post-experiment survey, subjects in
both treatment groups reported that they had adequate information to solve the
tasks.
History: Originally, the experiment was planned for just two sessions on consecutive days to minimize the eﬀect of external factors such as environment and
student schedule. To attract enough subjects, we opted for more sessions over a
longer period of time. Although not formally analyzed, no single session stands
out in terms of inferior results as a consequence of experiment date. We considered varying times of day as a possible threat, and hence scheduled all the
sessions in the afternoon.
Compensatory Rivalry: In all communication with subjects (invitations, experiment introduction, etc.) we deliberately expressed ourselves in the most neutral
way possible about both treatments, while at the same time not hiding the purpose of the experiment. We wanted to avoid having either group feel like the
underdog, which could have aﬀected outcome.
Group Stress: We identiﬁed a possible threat in that faster subjects could have a
stressful eﬀect on slower subjects, causing more error to be made and obstructing progress. To minimize this eﬀect, we distributed task digitally, one-by-one.
Handing out all tasks in the beginning of the experiment was considered, but that
could have compromised task ordering, e.g. by subjects accidentally skipping a
task.
Interaction of Selection and Treatments: Having subjects that are not representative of the population we want to generalize for is a threat to external
validity. The context of this experiment are software developers. While our subjects – students – are not yet fully trained industrial developers, we believe that
our subject selection process has ensured enough programming experience for a
valid generalization.
Professionals: We cannot claim that we drafted the best possible Android developers for our experiment, not even in the top tier. If we had hired trained Android
consultants, the development times of the AS group would probably have been
shorter. However, similar logic can be applied to the GPE group: had we spent
more than 20 min training the subject we would have gotten shorter development
times, even more so had we drafted experienced GPE developers. Therefore, we
believe that our comparative analysis is valid. It is however a challenge to identify
representative subjects in an experiment like this.
Interaction of Setting and Treatment: A threat when using “toy” problems is that
the experimental material might not be representative of industrial standard.
We handled this threat by designing the experiment tasks to mimic a typical

196

B.A. Johnsson et al.

industry development scenario (Sect. 4.4). The idea was to represent the most
general use cases in order to not favor one treatment over the other. We ensured
a valid setting by comparing GPE to the most up-to-date version of the market
leading tool for Android development (AS).
Reliability of Measures: Since subjects self-monitored development time, there is
a certain level of unreliability in the collected data. One possible threat was that
the subjects would forget to “unpause” the time tracker tool, thus corrupting
the data. We took measures to prevent this in the software, and more than
once per session emphasized the severity of forgetting to unpause. Experiment
instructors also checked in with participants periodically during the experiments.
No incidents were reported or observed. Another threat is subjects deliberately
pausing the time tracker to improve their recorded time. However, we do not see
what their motivation for doing so would be as the results are anonymous, and
again, the sessions were monitored by the experiment instructors.

7

Conclusions

Our analysis shows that for good programmers with limited experience of using
AS, the gains in terms of user productivity are signiﬁcant when using GPE
instead of AS. For programmers with a fair amount of experience of using AS the
gains are not as pronounced, but still present. The size of the gain also depends
on the type of task being solved – GPE is more eﬀective when performing certain
tasks, but analysis is inconclusive for others. We found no statistical diﬀerence
between the treatments in terms of the quality of the solutions produced during
the experiment. Furthermore, we found that the GPE group would consider
using our tool in the given context. We conclude that in the choice between GPE
and AS, non-Android developers and Android developers alike should consider
using GPE rather than AS when creating Android GUIs for PalCom systems. In
practice, this is relevant because typically when a developer creates a PalCom
system, a GUI – not uncommonly for Android – has to be produced as well.
With a basis for how GPE performs against the industry standard tool when
used by programmers, in future research we are interested in seeing how GPE
performs when used by non-programmers in comparison to programmers. This
could be investigated in an experiment similar to this, although the functionality
domain would have to be adjusted to be relevant to non-programmers. One such
domain could be home automation.
Acknowledgements. We thank Gunnar Weibull, M.Sc., for his work on implementing the Graphical PML Editor and for agreeing to train experiment participants in
how to use the tool. We also thank Mia Månsson, M.Sc., for being an enthusiastic ﬁrst
user of the tool and for providing valuable insights on how to improve it. Furthermore,
we thank all experiments subjects for taking the time to participate in our experiment. This research was founded by the Swedish research fund VINNOVA through its
program on Challenge Driven Innovation, and by FORTE on a grant for Application
Support in Healthcare (LUC3).

Evaluating a GUI Development Tool for IoT and Android

197

References
1. Android Studio: The oﬃcial IDE for Android (2016). https://developer.android.
com/studio/index.html
2. Bishop, J.: Multi-platform user interface construction: a challenge for software
engineering-in-the-small. In: Proceedings of the 28th International Conference on
Software Engineering, pp. 751–760. ACM (2006)
3. Humble, J., Crabtree, A., Hemmings, T., Åkesson, K.-P., Koleva, B., Rodden,
T., Hansson, P.: “Playing with the bits” user-conﬁguration of ubiquitous domestic environments. In: Dey, A.K., Schmidt, A., McCarthy, J.F. (eds.) UbiComp
2003. LNCS, vol. 2864, pp. 256–263. Springer, Heidelberg (2003). doi:10.1007/
978-3-540-39653-6 20
4. Johnsson, B.A., Magnusson, B.: Supporting collaborative healthcare using PalCom - the itACiH system. In: 2016 IEEE International Conference on Pervasive
Computing and Communication Workshops (PerCom Workshops), pp. 1–6. IEEE,
March 2016
5. Johnsson, B.A.: PalCom meets the end-user: enabling interaction with PalCombased systems. Licentiate Thesis 2014(2), 1–89 (2014). Lund University
6. Johnsson, B.A., Weibull, G.: End-user composition of graphical user interfaces for
PalCom systems. Procedia Comput. Sci. 94, 224–231 (2016)
7. Kalverkamp, M., Gorldt, C.: IoT service development via adaptive interfaces:
improving utilization of cyber-physical systems by competence based user interfaces. In: 2014 International ICE Conference on Engineering, Technology and Innovation (ICE), pp. 1–8, June 2014
8. Luyten, K., Vandervelpen, C., Coninx, K.: Migratable user interface descriptions
in component-based development. In: Forbrig, P., Limbourg, Q., Vanderdonckt, J.,
Urban, B. (eds.) DSV-IS 2002. LNCS, vol. 2545, pp. 44–58. Springer, Heidelberg
(2002). doi:10.1007/3-540-36235-5 4
9. Magnusson, B., Johnsson, B.A.: Some like it hot: automating an electric kettle
using PalCom. In: Proceedings of the 2013 ACM Conference on Pervasive and
Ubiquitous Computing Adjunct Publication, pp. 63–66. ACM (2013)
10. Myers, B.A.: User interface software tools. ACM Trans. Comput. Hum. Interact.
(TOCHI) 2(1), 64–103 (1995)
11. Newman, M.W.: Now we’re cooking: recipes for end-user service composition in
the digital home. In: CHI 2006 Workshop: IT@Home (2006)
12. Nordahl, M., Magnusson, B.: A lightweight data interchange format for internet of
things with applications in the PalCom middleware framework. J. Ambient Intell.
Humanized Comput. 7(4), 523–532 (2016)
13. Pausch, R., Conway, M., Deline, R.: Lessons learned from SUIT, the simple user
interface toolkit. ACM Trans. Inf. Syst. 10(4), 320–344 (1992)
14. Rosenberg, J., Hill, R., Miller, J., Schulert, A., Shewmake, D.: UIMSs: threat or
menace?. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 197–200. ACM (1988)
15. Svensson Fors, D., Magnusson, B., Gestegård Robertz, S., Hedin, G., NilssonNyman, E.: Ad-hoc composition of pervasive services in the PalCom architecture.
In: Proceedings of the 2009 International Conference on Pervasive Services, pp.
83–92. ACM (2009)
16. Weiser, M.: The computer for the 21st century. Sci. Am. 265(3), 94–104 (1991)
17. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering. Springer Science & Business Media, Heidelberg
(2012)

Application of GQM+Strategies in a Multi-industry
State-Owned Company
An Experience Report
Gustavo López ✉ , Brenda Aymerich, Diana Garbanzo, and Alexia Pacheco
(

)

Research Center for Communication and Information Technologies (CITIC),
University of Costa Rica, San José, Costa Rica
{gustavo.lopez_h,brenda.aymerich,diana.garbanzo,
alexia.pacheco}@ucr.ac.cr

Abstract. Technological applications have an increasingly important role in
most companies. Investment in Information Technologies (IT) is also growing in
most of them. The need to align IT-related goals with the company’s strategic
goals becomes imperative. GQM+Strategies is an approach to align organiza‐
tional goals, strategies, and measurements at diﬀerent levels of an organization.
This paper describes experiences learned from a GQM+Strategies implementa‐
tion at a large multi-industry state-owned company. The implementation was
conducted by an academic research team joined by representatives of the
company. Results showed an improved alignment and integration of diﬀerent
goals. Moreover, a holistic goal visualization was achieved, even though the
company works in two diﬀerent industries. As a state-owned company, external
pressures force continuous planning. Sometimes, inadvertently designing those
plans is the only goal achieved (i.e., plans are never executed). Using GQM
+
Strategies, the research team leveraged the separation of goals and strategies,
allowing identiﬁcation of redundancies and replicated eﬀorts across the company.
The implementation of the GQM+Strategies approach in such a complex context
was very valuable. However, it also required a large amount of eﬀort from the
researchers and company representatives.
Keywords: GQM+Strategies · Goal-oriented measurement · Strategic alignment ·
Software process improvement · IT strategy · Experience report

1

Introduction

GQM+Strategies is a measurement approach that extends the GQM paradigm [1, 2], it
promotes the creation of measurement programs that ensure alignment between goals
in diﬀerent levels of a company.
This paper describes experiences gathered while applying GQM+Strategies in a
complex context (i.e., a large multi-industry state-owned company). The company has
an intrinsically hierarchical structure. Moreover, industry segregation promoted a silo
mentality within the company.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 198–214, 2016.
DOI: 10.1007/978-3-319-49094-6_13

Application of GQM+Strategies in a Multi-industry State-Owned Company

199

The implementation of GQM+Strategies in this context was triggered by a Software
Process Improvement (SPI) initiative in the company’s IT department. This department
serves both industries and an administrative and ﬁnance management. Moreover, it is
in charge of promoting IT related controls and standards.
The main beneﬁts of GQM+Strategies and similar approaches are: an explicit
description of goals, strategies, context factors and metrics that allow the organizational
alignment; a transparent instrument to improve communication both within the organ‐
ization and to third parties; and measurement data that allows information-based deci‐
sion-making [3].
This research has two goals: one industry-related and one academic. The industryrelated goal was to align the diﬀerent levels of a large multi-industry state-owned
company and to help this company in their SPI eﬀorts. The academic goal was to assess
the applicability of GQM+Strategies in such a complex context.
GQM+Strategies demonstrated that several eﬀorts within the company were repli‐
cated and some goals were promoted by both industries in which the company operates.
Moreover, the alignment allowed to reduce segregation and provided a holistic vision
of the company’s IT roadmap. Senior management saw potential on the GQM+Strategies
Grid. They started to call it the company’s value map.
The rest of the paper is structured as follows. Section 2 introduces the theoretical
background, delving into GQM+Strategies deﬁnition and similar experience reports of
case studies (i.e., related work). Section 3 describes our implementation of GQM+Strat‐
egies including: Application context, research approach, execution and results.
Section 4 describes the main lessons learned in this research. Section 5 describes similar
experience reports and compares those experiences with the ones described on this
paper. Finally, Sect. 6 presents some ﬁnal remarks.

2

Background

2.1 GQM+Strategies
Goal-Question-Metric plus Strategies approach (GQM+Strategies® [4, 5]) was deﬁned
as an extension of the GQM paradigm [1, 2]. GQM is a measurement system that sets
rules for interpretation of measurement data in three levels: Conceptual, operational and
quantitative. The GQM process identiﬁes goals, derives questions and speciﬁes
metrics [1].
GQM+Strategies proposes the alignment of the traditional GQM paradigm with
organizational goals in three levels: Business, software and project. Moreover, GQM
+
Strategies describes an iterative process to achieve organizational goals alignment
through measurement [3] and proposes that constant updates are necessary to reﬂect
organizational, departmental or team goal changes [6]. Figure 1 shows the phases of
GQM+Strategies and its sub-activities.

200

G. López et al.

Fig. 1. GQM+Strategies ﬂow diagram. Based on phases reported on [3]

GQM+Strategies recommends the use of a Grid that links goals to strategies, ques‐
tions, and metrics, easing communication of common goals in the organization. The
Grid facilitates traceability of goal-related data and integration of the measurement
program across diﬀerent levels within the organization [7, 8].
GQM+Strategies considers context factors and assumptions that are made during the
implementation of measurement programs. Therefore, it allows inspection of erroneous
assumptions or context changes over time. GQM+Strategies helps organizations to
deﬁne what, why and how to measure, and interpret those measurements [6].

3

GQM+Strategies Implementation

3.1 Context
The company in which GQM+Strategies was implemented has several distinct charac‐
teristics: (1) It is a large organization (more than 15.000 employees on the payroll),
(2) has more than 60 years of existence, (3) it is a multi-industry company, and (4) it is

Application of GQM+Strategies in a Multi-industry State-Owned Company

201

a state-owned company. Furthermore, one of the industries in which the company oper‐
ates recently transitioned from a monopoly to a competitive environment. Figure 2
shows a simpliﬁed company organigram.

Fig. 2. Company organigram

Figure 3 illustrates the organization within the IT department. The IT department is
part of the administrative and ﬁnance management, there are also two managements at
the same level, one for each industry in which the company operates.

Fig. 3. Organization of the IT department

Industries 1 and 2 have diﬀerent dynamics, one is a very volatile (i.e., requirement
change rates are very high), the other is very structured and passive. The IT department
is in charge of IT across the company. However, each management has its local IT (selffunded) therefore a silo mentality has been established (i.e., departments or groups
within an organization do not want to share resources (e.g., information, knowledge,
funding, among others) with other individuals in the same organization.
The traditional approach towards measurement in the organization has been balanced
scorecards [9] and Key Performance Indicators (KPI). Therefore, concepts such as
learning and growth, business process, customer or ﬁnancial perspective, and strategy
mapping are deeply rooted in the company.

202

G. López et al.

3.2 Research Approach
The GQM+Strategies implementation reported in this paper was performed collabora‐
tively between the University of Costa Rica research team and the company represen‐
tatives. The research team comprised twelve members: Four PhDs, one PhD-student
researcher, three MSc (one researcher and two industry practitioners), and four BSc,
during a two-year period (i.e., approximately four at any given time). The company
representatives comprised: Three permanent members (all seniors at the organization).
The main positions of the company representatives were: Company’s IT department
manager, IT department’s strategy division manager, and another strategy division
member. However, the application of GQM+Strategies incorporated the vision of over
12 people working in the IT department and its divisions.
GQM+Strategies implementation started from senior management’s desire of imple‐
menting a software factory (i.e., commitment to long-term, integrated eﬀorts to enhance
software operations [10]). The research team performed a diagnosis focused on: The
company’s organizational scheme, software development culture and processes, human
resources and technology used in the development process and other regulations.
The diagnosis was implemented through interviews (15 interviews/71 people), site
visits (5 visits) and an online survey (281 participants). All these activities were carried
with people from the IT department and the local ITs (Fig. 2). From the diagnosis 44
ﬁndings were reported. Findings can be categorized as: General, business vision,
requirement management, quality assurance, release management, change management
and conﬁguration, project management, architecture, support, outsourcing management,
IT governance, technology, people, and culture.
In several of these ﬁndings the measurement process is mentioned as a requirement
before addressing the issues reported (i.e., measurement process must be implemented
before ﬁxing the problem), in order to ensure improvement and measure progress. This
situation was crucial to boost GQM+Strategies implementation in the organization and
convince senior management that a measurement approach that also aligns business
goals would be very beneﬁcial for the IT department and the company.
3.3 GQM+Strategies Execution and Results
The application of GQM+Strategies consisted mainly of a set of planning meetings with
the company representatives. In the next subsections we will describe how we
approached each phase of GQM+Strategies in the context of this implementation, the
main results and diﬃculties found along the way.
Initialize
According to [3], required inputs for this phase include a motivational talk and tuto‐
rial. We conducted three sessions of motivational talks (due to the number of people
that participated). In these sessions, the IT department manager and the company’s
training manager introduced the planned eﬀorts. Three cases were presented to motivate
participants: A former software engineer at Google talked about working culture, a
former software testing engineer at Microsoft Corp talked about the importance of high

Application of GQM+Strategies in a Multi-industry State-Owned Company

203

quality software products and a former software developer at Costa Rica’s Central Bank
talked about software improvement processes in public institutions.
Commitment was a given, since the process originated from the organization. More‐
over, the IT department manager was fully involved at this stage. A schedule of activities
was prepared and the budget was calculated. At this stage training people is a task that
should be executed. However, only two work sessions were conducted due to represen‐
tatives’ insuﬃcient time. Table 1 shows the training sessions, their coverage and dates.
Table 1. Joint events for sensitization and training, all sessions were conducted in 2015

The original GQM+Strategies scope was the IT department, leaving out the local
ITs. However, once the process started, provided documentations included IT-related
goals of the company’s three managements (See Fig. 2).
Characterize the Environment
In this phase, while assessing the applicability of GQM+Strategies, we discovered
that IT department’s strategy division head compared the approach with balanced score‐
cards (due to their focus on measurement), this was a diﬃcult to overcome problem,
because balanced scorecards have been used for many years and their promotors did not
want to change the way measurements were executed.
Since our application contexts is a public state-owned company, they are asked to
provide detailed documentation of goals and investments. Therefore, there was a high
availability of organizational structure, process, goals, and future plans documentation.
At a certain point there was too much documentation. Later on, we will discuss on how
plans for each management had duplications and ambiguities that made the GQM
+
Strategies reﬁnement process very diﬃcult.
Deﬁne Goals, Strategies and Measurement
We asked the company representatives to provide information regarding current
goals, plans and measurements. Delivered documentation included: Current measure‐
ments for the business operative plan, 2016 IT plan, IT strategic framework (2015-2018)
at company level, IT strategic framework for the administration and ﬁnance management
(2015), IT strategic framework for the Industry 1 management (2014-2018), IT strategic
framework for the Industry 2 management (2016), and IT Strategic framework, general
strategic documentation (2014-2018), proposed metric for ITIL (2011), IT government
statute, and the administration and ﬁnance management strategy (2015-2018).
The research team read and interpreted all the provided documentation and extracted
IT-related goals. Figure 4 provides a structural overview of the GQM+Strategies Grid
derived from the documental revision. Due to conﬁdentiality reasons of the case
company, the complete list of goals and strategies are not mentioned.

204

G. López et al.

Fig. 4. Structural overview of the GQM+Strategies grid (ﬁrst version)

In the ﬁrst version of the GQM+Strategies Grid, no diﬀerence was made between
goals and strategies, everything was considered a goal since there were no further details
on how the statements were going to be achieved.
A change in mental models was required to understand the GQM+Strategies proposal
and diﬀerentiate between goals and strategies. This eﬀort was very valuable to assess
the implementability of many proposed goals in the organization.
In order to move from the ﬁrst version of the GQM+Strategies Grid (Fig. 4) to the
ﬁnal version (Fig. 5), reﬁnement events were carried with company representatives.

Application of GQM+Strategies in a Multi-industry State-Owned Company

205

They were asked to assess the Grid’s vision according with the scope of the
implementation and the reviewed documents.

Fig. 5. Structural overview of the GQM+Strategies grid (ﬁnal version)

In Fig. 4, the left most boxes represent the company’s strategic goals. They were
considered and incorporated in the Grid because one of the documents stated that the
goal of IT in the company was to support all the company’s strategic goals.
Figure 4 contains twelve company strategic goals (non IT-related), seven high level
goals (IT-related), and 193 lower level goals. Some of these goals were similar from

206

G. López et al.

each other. Nonetheless, the research team decided to keep them all, in order to provide
evidence of the duplicated goals found across documents.
The ﬁrst reﬁnement session was conducted with the version of the GQM+Strategies
Grid created from the goal extraction of the documentation. In this ﬁrst reﬁnement
session, both the director of the IT department’s strategy division and a senior employee
of that division participated. The GQM+Strategies grid was introduced and they started
to deﬁne changes that should be incorporated.
Table 2 shows the GQM+Strategies Grid reﬁnement sessions conducted during this
GQM+Strategies implementation. After each reﬁnement session a new version of the
GQM+Strategies Grid was generated and used in the next reﬁnement session. This did
not occur in the sessions conducted on November 11, all these sessions were conducted
with the same version of the grid and a set of changes was prepared to be incorporated
after these three sessions.
Table 2. GQM+Strategies Grid reﬁnement events carried during 2015

Even though the research team tried enforced the company representatives to make
strategy decisions that would allow later strategy execution, this was not possible. In the
context of a large state-owned company, individuals are not used to make decisions
alone.
Once the GQM+Strategies Grid was ready (i.e., all apparent goals were included),
the IT plan for 2016 was released and approved by the board of the company (i.e., it was
engraved in stone). This document was facilitated to the research team in early 2016;
therefore, the GQM+Strategies Grid was reviewed and reﬁned. Again, no diﬀerence was
found between goals and strategies in the IT plan. A session was conducted to identify
how the IT plan and the GQM+Strategies Grid could be aligned and the representative’s
answer was that the Grid was the one that required to be changed, because the IT plan
was approved by the company’s board of directors.
The deﬁned solution was to implement a mapping tool (initially an Excel document)
to translate IT plan language into goals and strategies and incorporate them in the GQM
+
Strategies Grid. At this point, the GQM+Strategies Grid was transferred to be used
internally in the company.
Plan Grid Implementation
In this phase, the research team decided to adapt an iterative approach toward
improvement [11]. An improvement backlog was developed based on the GQM+Strat‐
egies Grid strategies, and an iterative improvement process began. In order to assure

Application of GQM+Strategies in a Multi-industry State-Owned Company

207

that measurements are aligned between them, the research team proposed measures for
each goal incorporated in the GQM+Strategies Grid ﬁnal version.
The GQM+Strategies Grid showed in Fig. 5 is the Grid in its current state. The Grid
contains two (non IT-related) company strategic goals that are directly mapped with
nine IT-related high level goals. Sixty-two low level goals were ﬁnally incorporated into
the GQM+Strategies Grid. Furthermore, 149 strategies were linked with those goals.
The analyze outcomes and package improvement phases of GQM+Strategies are still to
be implemented after the strategy execution and measurement implementation.
In the next section, we describe some lessons learned of this GQM+Strategies imple‐
mentation. We acknowledge that GQM+Strategies is not a novel approach (i.e., it has
more than 6 years being applied); however, we did not found evidence (published in
academic literature) of the application of this approach in such a complex context: Large
multi-industry state-owned company. Therefore, we believe that the insights gathered
from this experience could be beneﬁcial for further implementations and researches.

4

Lessons Learned

The context in which GQM+Strategies was implemented had traditionally used balanced
scorecards as the main measurement approach. Some issues were found in the imple‐
mentation of balanced scorecards that reduced their value for the organization:
• Metrics were created to be centralized and generalizable. Lower levels of the
company were not given the change to develop their own measurements.
By applying GQM+Strategies, we did not only allow lower levels to design meas‐
urements, but also to incorporate their goals into the GQM+Strategies Grid.
• The company worked with silo mentality, from managements to working depart‐
ments, all areas work separately and do not share information. There was an
entrenched lack of information integration.
By reviewing documentation provided by all management oﬃces and both the IT
department and local ITs, we took a step forward towards data integration in the
company. GQM+Strategies, and specially the Grid provided so much value in the eyes
of the strategy division director that he started calling it the company’s value map.
• Balanced scorecards were developed in diﬀerent levels of the organization; however,
they were not fully integrated.
As we mentioned, information was not integrated horizontally. Furthermore, the
measurement process carried did not provide a vertical integration. Therefore, lower
level measurements were not always proper to assess higher level goals. In some cases
balanced scorecards used diﬀerent indicators to measure similar things in diﬀerent levels
of the organization.
Delving into the GQM+Strategies implementation, we also faced some challenges
and we learned some lessons while dealing with those challenges:
• A ﬂexible tool to manage the GQM+Strategies Grid

208

G. López et al.

We developed the visual representation of the GQM+Strategies grid using Micro‐
soft’s Visio. In cases in which the grid is small, there are no major challenges to manage
changes. However, in a Grid with over 200 components, changes were diﬃcult to deal
with.
A ﬂexible tool designed for business and information management purposes (instead
of a diagramming tool) would have been very beneﬁcial for the process. During the
implementation of GQM+Strategies, the company in which the approach was imple‐
mented bought a tool called MindManager, the ﬁrst activity they performed as soon as
the grid was delivered, was to change it from Visio to MindManager due to the ﬂexibility
it provided. We acknowledge that our use of Microsoft’s Visio was not the best, since
we used a basic diagram, however, the use of a specialized tool made the process much
easier.
• During the implementation of GQM+Strategies, a clear segmentation between the
research team and the company representatives was made. The ﬁrst version of the
GQM+Strategies Grid was created by the research team by reviewing documentation.
This approach was beneﬁcial to avoid bias in the Grid; however, it also increased the
cost and time required to ﬁnalize the ﬁrst version of the GQM+Strategies Grid.
By creating the Grid from an outsider perspective, the research team was capable of
identifying a large number of goals in the reviewed documentation. All these goals were
incorporated in the ﬁrst version of the GQM+Strategies Grid. However, once this version
was ready to be assessed, we suﬀered from lack of participant identiﬁcation and commit‐
ment in the reﬁnement sessions. Moreover, we believe that they did not understand the
process.
Through training and iterations in the reﬁnement process, this issue was overcome.
However, the costs increased. If the company representatives had participated in the
process since the beginning this issue would not have appeared, but the unbiased
perspective could not have been achieved.
• An automation tool could have been used to keep track of changes
During this implementation we used manual changelogs (to keep track of changes
in the grid). A better tool to manage the GQM+Strategies Grid could have provided the
mechanisms to automate tracking changes and avoid the costs of manually tracking
them.
• The company is undertaking a change process. However, deep changes in the organ‐
izational structure of the company destabilize these eﬀorts. We faced problems to
maintain sponsorship during GQM+Strategies execution. The director of the IT
department strategy division changed during the implementation. The new director
saw potential in the results of the process and promoted it. This could have been the
other way around, and sponsorship could have been lost.
The GQM+Strategies Grid allowed us to clearly distinguish the organizational
changes over time. Excess of documents over the years were proof of those changes. It
is hard to assess if GQM+Strategies should be implemented while changes so deep are
occurring in a company. Furthermore, cultural characteristics were against the

Application of GQM+Strategies in a Multi-industry State-Owned Company

209

incorporation of practices that make visible information within the organization. There‐
fore, we faced resistance from some employees.
In order to face some of the main issues found during our GQM+Strategies imple‐
mentation we used some mechanisms including: A changelog to keep track of changes
between reﬁnement sessions. This was necessary due to the complexity of the GQM
+
Strategies Grid initial version.
The changelog is composed of all the changes that arise during the reﬁnement
session. The choices of changes proposed to the participants were: Merge, eliminate,
modify and create. Merge was used to unify two or more goals attending the same issue
or executing similar tasks. Eliminate was available to discard goals that were not consid‐
ered IT-related. Modify was used to adjust the original texts to either goal or strategy
syntax. Create was used when a strategy was devised and it had no associated goal (i.e.,
the goal was created based on the strategy).
All this was conducted to obtain the ﬁrst stable version of the GQM+Strategies Grid,
after the reﬁnement sessions. However, we also see ﬁt to use the changelog as further
reference. For instance, once a goal is achieved, it might be changed or discarded from
the GQM+Strategies Grid, without trace of it ever being there. We believe that change‐
logs or version control might be implemented to provide the company with a memory
of its goals.
To implement this proposal, the deﬁnition proposed by Münch et al. [12] might be
used to characterize each goal, denoting the changes it has undergone: New goal,
discarded goal, unchanged goal, revised goal, split goals, merged goal, established goal
and linked goal.
The IT plan provided by the company for this GQM+Strategies implementation
considers goals to have diﬀerent maturity levels, including: Foundation establishment,
positioning and consolidation/operation. These phases can be mapped to the GQM
+
Strategies Grid as it changes over time.
The changelog allowed us to eﬀectively track and document the changes made after
a reﬁnement session. Also, we used them as a reference before we started a new reﬁne‐
ment session. To show to the interviewee the evolution of the Grid and the most recent
modiﬁcations.
Finally, as it was expected, visualizing such a large GQM+Strategies Grid was almost
impossible. Navigation between goals was also a diﬃcult task. Figure 6 shows the ﬁnal
version of the developed GQM+Strategies Grid.

Fig. 6. GQM+Strategies grid traditional visualization

As it is evident, the traditional hierarchical visualization of the GQM+Strategies Grid
does not work with so many goals and strategies. It is diﬃcult to understand the context

210

G. López et al.

of each goal and navigate between them. We created a poster version of the grid (Fig. 7),
dividing each branch at the higher IT-related goals. A highlighted section of the poster
shows the highest level relationships. This visualization was very helpful for the
company representatives, easing navigation and overall understanding of the GQM
+
Strategies results.

Fig. 7. GQM+Strategies grid visualized as a poster

5

Discussion and Similar Experience Reports

Several experience reports and case studies have been reported in literature addressing
the implementation of GQM+Strategies in diﬀerent contexts. This section describes
some of them, their similarities and diﬀerences with the experience described in this
paper.
In 2010, Mandić et al. [13] proposed an approach to assess GQM+Strategies’ prac‐
tical value and applied it in a Finnish ICT company. Authors concluded that GQM
+
Strategies had practical value for the company. However, authors only assessed a small
department in the company. In this paper, only one department was assessed. Nonethe‐
less, this department serves the whole company and shares goals three diﬀerent mana‐
gerial oﬃces.
Also in 2010, Sarcia [14] reported that in order to apply GQM+Strategies in nonsoftware development domains in required to be generalized (i.e., converted into a
context free approach). Author assessed the application of GQM+Strategies in a military
training domain in Italy. This work was an ongoing experience and no further details
were found in literature. Even though the experience described in this paper was gathered

Application of GQM+Strategies in a Multi-industry State-Owned Company

211

from the application of GQM+Strategies in an IT Department, one of the industries in
which the company works is not related to software development. Some of the high level
goals (for this industry) had nothing to do with software development. Therefore, this
experiences support the ﬁnding that GQM+Strategies is applicable and introduces busi‐
ness value in non-software development domains.
In 2011, Kaneko et al. [15] presented results and experiences from applying the GQM
+
Strategies approach at the Japan Aerospace Exploration Agency’s software research
department. The eﬀorts were joined with IESE’s experts. Three top-level goals are
reported; 20 lower level goals are also identiﬁable in their GQM+Strategies Grid.
Reported lessons learned include: (1) Reuse of previously developed measurement
models is important to reduce eﬀorts and (2) GQM+Strategies Grid can contribute to
prioritize high level goals by assessing which ones are more feasible. In this experience,
consultants had to force a separation between the previously used measurement model
and GQM+Strategies, because management started to fall back into their traditional
measurement practices, and this was considered a threat for GQM+Strategies imple‐
mentation and the SPI initiative as a whole.
Also during 2011, Trendowicz, Heidrich, and Shintani [16], reported lessons learned
from a GQM+Strategies implementation at the Japanese Information-technology
Promotion Agency. Authors present an interesting perspective on how GQM+Strategies
allows project evaluation regarding their contribution towards higher-level goals, and
how clear rationale can be applied for rejecting or approving project proposals based on
their contribution. To assess this perspective, the implementation of GQM+Strategies
should be ingrained in the company and its practices. An evaluation of this contribution
in the context of application presented in this paper will require more time and further
analysis.
In 2013, Münch et al. [17] reported experiences on a ﬁve-month application of GQM
+
Strategies in an industrial company. Main lessons learned reported include: (1) The
necessity of company representatives to have knowledge of GQM+Strategies and how
to apply it, (2) ﬁnd a suitable entry point for the approach implementation is crucial. In
this case, the eﬀorts presented in this paper are very similar to the ones described by
Münch et al. [17]. However, the experience time period was larger due to the complexity
of the company and the amount of information available that had to be revised. More‐
over, the entry point was a key aspect to boost GQM+Strategies implementation. The
sponsorship of senior management and participation of representatives with the proper
knowledge enhanced sped up the GQM+Strategies implementation.
In 2013 Basili, Lampasona, and Ramírez [18], describe an overview of the applica‐
tion of GQM+Strategies in ECOPETROL (oil and gas industry). Their main goals were
for alignment and formalization of goals, providing strategies with visibility, and
deﬁning operational measurement and interpretation of goals. Authors point the eﬀort
required to implement GQM+Strategies (i.e., the GQM+Strategies Grid requires the
collaboration of several people representing diﬀerent levels of the organization and is a
non-trivial mental activity). This experience report is very similar to the one conducted
at ECOPETRO. Diﬀerent perspectives were incorporated, not only due to consultant
request but also because the company representatives advised that other perspectives
should be considered to create the GQM+Strategies Grid.

212

G. López et al.

In 2014, Petersen et al. [19] proposed an elicitation instrument to gather stakeholder
perspectives and incorporate it into the GQM+Strategies approach. In their paper,
authors explain the case study performed at Ericsson AB (Telecommunications
industry). We tried to apply this elicitation instrument; however, it was too structured
for the ﬁrst implementation of GQM+Strategies. This is the most similar case to the one
presented in this paper, due to industry similarity. However, Petersen et al. [19], do not
delve in the speciﬁc results of the implementations. Therefore, a comparison is diﬃcult
to conduct.
In this section we presented experience reports, case studies of GQM+Strategies
conducted from 2010 to 2014. Those reports were compared with the one described in
this paper.

6

Conclusions

In this paper we presented our experiences of applying the GQM+Strategies approach
in a large multi-industry state-owned company. The application was triggered by an
improvement initiative to implement a software factory.
GQM+Strategies was implemented by a research team jointly with company repre‐
sentatives. The approach was followed as proposed in [3]. The research team managed
the process and company representatives worked as stakeholders.
The GQM+Strategies Grid was ﬁrstly derived from a documental review of the
company’s information. Over 10 large (100 page or more) documents were reviewed
and goals or action plans were extracted. With the list of goals, the research team
conducted a process to unify them in one single representation. The ﬁrst version of the
GQM+Strategies Grid did not diﬀerentiate between goals and strategies.
Following a proposal to operationalize GQM+Strategies [19], we conducted reﬁne‐
ment sessions. In each reﬁnement session one company representative’s vision was
incorporated, and doubts emerged from the process were revised and solved. The
research team implemented a changelog to keep track of changes in order to modify the
Grid and use an improved version in the next reﬁnement session. We observed the
following key beneﬁts and advantages of applying GQM+Strategies in such a complex
context:
• GQM+Strategies helped the company to get a complete vision of their goals and
provided them with a tool to visualize those goals integrally.
• The approach implementation made visible the beneﬁts provided by operative and
low level tasks both to immediately above goals and to the company’s objectives.
• As a state-owned company, they suﬀer external pressures to plan their activities. Such
pressures sometimes force them to plan in advance for several years. GQM+Strat‐
egies provided a tool to enforce alignment between plans (goals) and to deﬁne strat‐
egies to achieve those goals. Moreover, it allowed the identiﬁcation of redundancies
and replicated eﬀorts.
• As a multi-industry company, they adopted a silo mentality. GQM+Strategies
promoted the uniﬁcation of the IT department goals for both industries and the
administrative and ﬁnance management. Therefore, the segregation was reduced.

Application of GQM+Strategies in a Multi-industry State-Owned Company

213

Moreover, the research team speciﬁcally recorded which industry or management
proposed the goals allowing the IT department to prioritize goals that have impact
into more than one management.
One of the key lessons from our application GQM+Strategies is that goal very diﬃ‐
cult and institutions that ﬁrst try to implement the approach do not distinguish between
goals and strategies. In general, the GQM+Strategies approach was very valuable in the
context in which it was applied. However, the eﬀorts invested were large. We expect
for the results to have an impact as large as the eﬀort invested to implement it.
One of the key lessons gathered from this eﬀort was the help that GQM+Strategies
implementation provided to visualize redundant goals within the company. This visi‐
bility allowed the research team to point out deﬁciencies in the budged investment and
the company representatives to convince internal management that a change was neces‐
sary to avoid unnecessary expenses. Even though this is not a direct beneﬁt from GQM
+
Strategies researchers could not achieved this result without following this approach.
Acknowledgments. This work was partially supported by CITIC at University of Costa Rica.
Grant No. 834-B4-412. We would also like to thank all the consultants that participated in this
research, including but not limited to: Dr. Marcelo Jenkins (current minister of Science and
Technology of Costa Rica), Dra. Alexandra Martínez, Dr. Carlos Castro, Mag. Mauricio Arroyo,
Mag. Marcela Chacón, and Mag. Francisco Cocozza.

References
1. Basili, V.R.: Software Modeling and Measurement: The Goal/Question/Metric Paradigm,
Maryland, USA (1992)
2. Basili, V.R., Caldiera, G., Rombach, H.D.: The goal question metric approach. In:
Encyclopedia of Software Engineering. Wiley Publishing, Inc. (1994)
3. Basili, V.R., Trendowicz, A., Kowalczyk, M., Heidrich, J., Seaman, C., Lindvall, M., Munch,
J.: Aligning Organizations Through Measurement - The GQM+Strategies Approach. Springer,
Switzerland (2014)
4. Fraunhofer USA Inc.: Fraunhofer Center for Experimental Software Engineering. http://
www.fc-md.umd.edu/
5. Fraunhofer-Gesellschaft: Fraunhofer Institute for Experimental Software Engineering. http://
www.iese.fraunhofer.de/en.html
6. Janes, A., Succi, G.: The GQM+Strategies approach. In: Lean Software Development in
Action, pp. 151–170. Springer, Heidelberg (2014)
7. Mandic, V., Basili, V.R., Oivo, M., Harjumaa, L., Markkula, J.: Utilizing GQM+Strategies
for an organization-wide earned value analysis. In: EUROMICRO-SEAA, pp. 255–258. IEEE
Computer Society (2010)
8. Basili, V.R., Lindvall, M., Regardie, M., Seaman, C., Heidrich, J., Rombach, D., Trendowicz,
A., Münch, J., Rombach, D., Trendowicz, A.: Linking software development and business
strategy through measurement. Comput. (Long Beach Calif.) 43, 57–65 (2010)
9. Kaplan, R.S.: Conceptual foundations of the balanced scorecard. In: Handbooks of
Management Accounting Research, Vol. 3, pp. 1253–1269 (2008)

214

G. López et al.

10. Aaen, I., Botcher, P., Mathiassen, L.: The software factory: contributions and illusions. In:
Proceedings of the Twentieth Information Systems Research Seminar in Scandinavia, pp.
736–750. Euro-Arab Management School (1997)
11. Cohn, M.: Succeeding with Agile: Software Development Using Scrum. Addison-Wesley
Professional, Reading (2009)
12. Münch, J., Fagerholm, F., Kettunen, P., Pagels, M., Partanen, J.: The eﬀects of GQM
+
Strategies on organizational alignment. In: DASMA Software Metric Congress (2013)
13. Mandić, V., Harjumaa, L., Markkula, J., Oivo, M.: Early empirical assessment of the practical
value of GQM+Strategies. In: Münch, J., Yang, Y., Schäfer, W. (eds.) ICSP 2010. LNCS, vol.
6195, pp. 14–25. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14347-2_3
14. Sarcia, S.A.: Is GQM+Strategies really applicable as is to non-software development domains?
In: ACM-IEEE International Symposium on Empirical Software Engineering and
Measurement, pp. 1–4 (2010)
15. Kaneko, T., Katahira, M., Miyamoto, Y., Kowalczyk, M.: Application of GQM+Strategies®
in the Japanese space industry. In: International Conference on Software Process and Product
Measurement, pp. 221–226. IEEE Computer Society (2011)
16. Trendowicz, A., Heidrich, J., Shintani, K.: Aligning software projects with business
objectives. In: International Conference on Software Process and Product Measurement, pp.
142–150. IEEE Computer Society (2011)
17. Munch, J., Fagerholm, F., Kettunen, P., Pagels, M., Partanen, J.: Experiences and insights
from applying GQM+Strategies in a systems product development organisation. In:
Euromicro Conference Series on Software Engineering and Advanced Applications, pp. 70–
77 (2013)
18. Basili, V., Lampasona, C., Ocampo Ramírez, A.E.: Aligning corporate and IT goals and
strategies in the oil and gas industry. In: Heidrich, J., Oivo, M., Jedlitschka, A., Baldassarre,
M.T. (eds.) PROFES 2013. LNCS, vol. 7983, pp. 184–198. Springer, Heidelberg (2013). doi:
10.1007/978-3-642-39259-7_16
19. Petersen, K., Gencel, C., Asghari, N., Betz, S.: An elicitation instrument for operationalising
GQM+Strategies (GQM+S-EI). Empir. Softw. Eng. 20(4), 968–1005 (2015)

Verification and Validation

Is Mutation Testing Ready to Be Adopted
Industry-Wide?
Jakub Možucha and Bruno Rossi(B)
Faculty of Informatics, Masaryk University, Brno, Czech Republic
{jmozucha,brossi}@mail.muni.cz

Abstract. Mutation Testing has a long research history as a way to
improve the quality of software tests. However, it has not yet reached
wide consensus for industry-wide adoption, mainly due to missing clear
beneﬁts and computational complexity for the application to large systems. In this paper, we investigate the current state of mutation testing support for Java Virtual Machine (JVM) environments. By running
an experimental evaluation, we found out that while default conﬁgurations are unbearable for larger projects, using strategies such as selective
operators, second order mutation and multi-threading can increase the
applicability of the approach. However, there is a trade-oﬀ in terms of
quality of the achieved results of the mutation analysis process that needs
to be taken into account.
Keywords: Software mutation testing · Experimentation · Equivalent
mutants · Selective mutation operators · Cost-reduction strategies

1

Introduction

Large amount of resources are wasted yearly due to bugs introduced in software systems, making the testing process one of the critical phases of software
development [2]. A recent research reported the cost of software debugging up to
a yearly $312 Billion, with developers utilizing 50 % of their allocated time to
ﬁnd and ﬁx software bugs [1]. Software Engineering is for long time striving
to ﬁnd ways to reduce such ineﬃciencies, with the constant challenge to build
more robust software. Mutation Testing is one such ways, representing a powerful technique to evaluate and improve the quality of software tests written by
developers [7,14].
The main idea behind Mutation testing is to create many modiﬁed copies
of the original program called mutants — each mutant with a single variation
from the original program. All mutants are then tested by test suites to get the
percentage of mutants failing the tests. It has been proven that mutation testing
can bring several beneﬁts to complement the applied testing practices, e.g. for
test cases prioritization [6].
However, mutation testing has been often reported to struggle to be introduced in to real-world industrial contexts [8,11,15]. So why is mutation testing
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 217–232, 2016.
DOI: 10.1007/978-3-319-49094-6 14

218

J. Možucha and B. Rossi

not widely adopted within industry? According to Madeyski et al. [9], mainly due
to (a) performance reasons, (b) the equivalent mutants problem — syntactically
but not semantically equal mutants — and (c) missing integration tools. In our
view, the biggest drawback of mutation testing — its great computational costs
— prevented until recently to include mutation testing into the development
cycle of most companies. This resulted in development of many techniques to
reduce the costs of mutation testing. Furthermore, another perceived drawback
might be that the advantages of running mutation testing might not be fully
clear as opposed to other simpler testing approaches.
Problem. The applicability of Mutation Testing to real-world project is far from
reaching consensus [4,8,9,11,15]. While it seems that improvements have been
done in tools integration, performance and equivalent mutants concerns still
remain the most relevant issues, and call for further analyses.
Contribution. We report on an experiment addressed at understanding the current performance of Mutation Testing in Java Virtual Machine (JVM) environments, based on our previous experience on empirical studies [16] and the needs
for more industry-academia cooperation [3]. With the collaboration of an industrial partner, we are in particular looking at diﬀerent strategies that can reduce
runtime overhead of Mutation Testing. Among the results, we provide indications about selective operators eﬃciency for Mutation Testing and their impact
on performance. Practitioners can gain more insights about the performance /
quality trade-oﬀs in running mutation testing by evaluating several cost reduction strategies on a typical set of projects. Such information can be relevant
for the integration in their own software development process. Furthermore, we
make available the experimental package for replications.
The paper is structured as follows. Section 2 reports about the background
on mutation testing. In Sect. 3, we refer about the experimental evaluation,
describing the experimental design, choices made, results, and threats to validity.
Section 4 provides related works that evaluated mutation testing in an experimental setting. Section 5 provides the discussions and Sect. 6 the conclusions.

2

Mutation Testing Background

Mutation Testing has undergone several decades of extensive research. First
formed and published by DeMillo et al. in a 1971’s seminal paper [5], Mutation
Testing was introduced as a technique that can help programmers to improve the
tests quality signiﬁcantly. The core of Mutation Analysis is creating and killing
mutants. Each mutant is a copy of original source code modiﬁed (mutated ) with
a single change (mutation). These mutations are done based on set of predeﬁned
syntactic rules called mutation operators. Traditional mutation operators consist
of statement deletions (e.g. removing a break in a loop), statement modiﬁcations (e.g. replacing a while with do-while), Boolean expression modiﬁcation
(e.g. switching a != logical operator to ==), or variables/constants replacements.
These mutation operators can be considered to be traditional mutation operators and are mostly language independent. There are also language-dependent

Is Mutation Testing Ready to Be Adopted Industry-Wide?

219

mutation operators that are used to mutate language-speciﬁc constructs, taking
into account aspects such as encapsulation, inheritance and polymorphism.
Tests are then executed on the mutants and the failure of mutants is expected.
When the tests fail, the mutant is considered killed and no further tests are
needed to be run using this mutant. For example, the original Java code in
Algorithm 1 is mutated using a mutation operator, which replaces == with !=
and produces the mutant in Algorithm 2.
Algorithm 1. Original Code
if (a == b) then
// do something
else
// do something

Algorithm 2. Mutated Code
if (a != b) then
// do something
else
// do something

If any mutant does not cause the tests to fail, it is considered live. This
can have two meanings: that the tests are not sensitive enough to catch the
modiﬁed code or that this mutant is equivalent. An equivalent mutant is syntactically diﬀerent from the original, but its semantics are the same and therefore
it is impossible for tests to detect them. The ﬁnal indication of tests quality is
mutation score, that is the percentage of non-equivalent mutants killed by the
test data or in other terms, the number of killed mutants over the number of
non-equivalent mutants generated.
A test data is considered mutation-adequate [12] if its mutation score is
100 %. The higher mutation score is, the more faults were discovered — therefore
the better the test process. This process leads to iterative improvement of testing,
moreover, inspecting live mutants can lead to discovery and resolution of other
source code issues. The most serious problem with equivalent mutants is the
distortion of the mutation score: software tools include them in the computation,
as their accurate detection is undecidable and can only be performed by manual
inspection [9].
Mutation testing is a powerful technique, but has great computational costs.
In fact, these costs have prevented mutation testing to be used in a practical way
for many years, despite the relatively long history of mutation testing related
research. In general, these are the most expensive phases:
1. Mutant Generation – aside from great computational costs, also the memory consumption is considerably high in this phase. Mutation operators have
to be applied on the original code and mutants have to be stored;
2. Mutant Compilation – phase in which the generated mutants have to be
compiled. This phase can be very costly for larger programs;
3. Execution of tests – for every mutant, the tests have to be executed until
they are not killed. Most costly are live mutants, because every test has to
be run on them;
There are several approaches that have been proposed to reduce the
equivalent mutation problem ([9] provides an extensive review in the area).

220

J. Možucha and B. Rossi

At the same time, the problem is very often linked to performance optimization,
as less equivalent mutants generated lead to a reduction in the three phases of
mutant generation, mutant compilation, and tests execution. For this reason,
various cost reducing strategies were developed.
In this paper we look into several of these strategies and their applicability to
improve the performance for industrial applicability. A ﬁrst strategy is the Selective Mutation technique — the idea is to use only the mutation operators that
produce statistically less equivalent mutants than others. This approach allows
to reduce not only equivalent mutants, but also to improve the performance.
The aim of Selective Mutation is to achieve maximum coverage by generating
the least number of mutants. Complexity reduction of mutants generation is
from quadratic (O(Ref s ∗ V ars)) to linear (O(Ref s)) [13], while retaining much
of eﬀectiveness of non-selective mutation.
Another strategy we adopt in the current paper is Higher Order Mutation
(HOM). Taking into account the original mutants we discussed so far as First
Order Mutants (FOMs), the technique creates mutants with more than a single
mutation, referred as higher order mutants as combination of several FOMs [6].
We look in particular at four diﬀerent algorithms (Last2First, DiﬀerentOperators, ClosePair and RandomMix ) implemented in Judy [9,10] to combine FOMs
into Second Order Mutants (SOMs) — in which two mutants are combined:
– Last2First – the ﬁrst mutant in the list of FOMs is combined with the last
mutant in the list, the second mutant with next to last and so on;
– DiﬀerentOperators – only FOMs generated by diﬀerent mutation operators
are combined;
– ClosePair – two neighboring FOMs from the list are combined;
– RandomMix – any two mutants from the FOMs are combined;
All these strategies use a list of ﬁrst order mutants (FOM) and should generate at least 50 % less mutants, with impact on the ﬁnal mutation score [9].

3

Experimental Evaluation

We designed an exploratory experiment aimed at getting insights about the current applicability of mutation testing in industrial context (summary of the overall process, Fig. 1). We run in parallel a literature review (1) and an exploratory
analysis about the usage of the tools for Mutation Testing (5). The selection
of the tools for the experimentation (2), as well as the experimental units (3)
were done based on the criteria of the company. We designed the research questions (6) and created the experimental design (7) based on the results from the
exploratory analysis, taking both into account the company’s needs and theoretical constraints and aspects worth investigation from theory. Experiments were
then run (8) and results provided to the industrial partners for knowledge transfer and identiﬁcation of future works. Based on an exploratory pre-experiment
phase, we set the following research questions:

Is Mutation Testing Ready to Be Adopted Industry-Wide?

221

Fig. 1. Research process workﬂow

– RQ1. What is the performance of Mutation Testing by taking into account
standard conﬁgurations (i.e. no selective operators and no mutation strategy)?
– RQ2. What is the impact of Selective Operators on Mutation Testing eﬃciency & performance?
– RQ3. What is the impact of Second Order Mutation strategies on Mutation
Testing eﬃciency & performance?
Given the selection of the tools for mutation testing supported by the industrial partner, we looked speciﬁcally at three areas of experiments according to
the three research questions set:
EXP1. Mutation Operators Eﬃciency Experiments. Looking at the selection of the most eﬃcient mutation operators that then be evaluated in the performance experiments;
EXP2. Performance Experiments & Concurrency Experiments. Looking at the single-thread and multi-threading performance of the tools with standard conﬁgurations and according to diﬀerent selective operator strategies;
EXP3. SOM Experiments. Evaluating the impact of diﬀerent Second Order
mutation strategies (Last2First, DiﬀerentOperators, ClosePair, RandomMix ).
We run an initial review of the tools available for mutation testing in JVM
environments, omitting experimental tools. We overall considered seven tools:
MuJava, PITest, Javalanche, Judy, Jumble, Jester, MAJOR, that we compared
according to several characteristics (Table 1). To speed-up the mutation generation process, it is now a standard the support of byte-code mutation — mutants
are applied at the byte code level. The industrial partner involved in the experimentation considered PITest and Judy more relevant for a series of reasons, in
particular the availability of plugins and general easiness of integration, as well
as the open source license and support for Java 8. Both software were used to
run the experiments.

222

J. Možucha and B. Rossi
Table 1. JVM mutation testing tools
MuJava

PITest

Javalanche

Judy

Jumble

State

active

active

–

active

active

Jester MAJOR
–

active

License

Apache 2.0

Apache 2.0

LGPL

BSD

GPL

open

?

Java vers.

7

5 – 8

5, 6

7

Unit test.
framework

jUnit4

jUnit4, TestNG6 jUnit4

6 – 8

6 –8

6

jUnit4

jUnit4

jUnit3 jUnit4

Production GUI, Eclipse 3rd part. plugins Eclipse plugin
tools

Cmd

Cmd

Cmd

Automated Only classes
class/test
selection

Yes

Only tests Yes

Yes

Yes

Ant
Yes

Mutation
Operators

method class method

method concurr meth. class meth

meth

method
self-def.

Byte-code
mutation

Yes

Yes

–

Yes

Yes

Yes

Yes

For experimentation, we used as experimental projects libraries suggested
by the industrial partner (Table 2): various Apache Commons projects and the
JodaTime library. The selection was done so that the chosen projects contain
possibly the most diﬀerent distribution of size types, tests duration and test
coverage. However, we couldn’t include larger projects due to the high complexity
of running mutation tests. From an initial list, we had to discard other projects
that had either no tests (Apache Commons Daemon) or in which test cases were
failing with either PITest or Judy (Apache Commons Compression, BeanUtils).
We included in Table 2 Mutation Size, a metric that can give indication of
the complexity of the mutation process. While previous research has proved
that the number of mutants is proportional to the number of variable references
times the number of data objects (O(Refs * Vars)) [12]. However, it is uneasy to
determine the number of variable references for such large projects. Taking into
account that modern mutation testing tools are creating mutants and running
tests based on code coverage, Mutation Size is computed as coverage times size
of project, as a measure of the complexity of the mutation testing process M S =
coverage ∗ KLOC.
3.1

Experimental Procedure

The ﬁrst set of experiments was done with both tools using their default settings.
The only modiﬁcation was the conﬁguration to the same number of threads, as
Judy runs by default in parallel, while PITest uses only one thread.
After experiments with default settings, the experiments on mutation operators eﬃciency were done for each tool. The aim of these tests was to reduce
the number of active operators selecting only the most eﬃcient operators — the
ones that produce mutants that are not so easy-to-kill.
After the selection of mutation operators, another set of performance tests
was done using only the the selected operators. Concurrency tests were done on
various number of used threads, comparing time and memory usage (PITest).

Is Mutation Testing Ready to Be Adopted Industry-Wide?

223

Table 2. Projects considered for the experimental evaluation. NCL = Number of
Classes, KLOC = Lines of Code (thousands), NTCL = Number of Test Classes,
TKLOC = Test Lines of Code (thousands), TT = Test Time, CC = Code Coverage,
MS = Mutation Size
Project

Ver.

Apache Commons Chain

1.2

55

9.852

37

7.398

1.17 66.68 %

6.57

Apache Commons CLI

1.3.1

23

6.161

25

5.214

1.39 96.38 %

5.94

Apache Commons Codec

1.10

60

15.869

55

15.042

Apache Commons CSV

1.2

11

3.515

15

3.821

1.76 94.00 %

3.30

Apache Commnos DbUtils

1.6

30

7.611

26

4.453

1.36 57.71 %

4.39

Apache Commons Digester

3.2

168

23.125 101

14.220

2.31 72.49 % 16.76

Apache Commons Lang

1.2

133

68.684 148

55.467

16.51 93.80 % 64.43

Apache Commons Validator 3.4

62

16.516

77

14.117

3.42 77.61 % 12.82

70.593 158

72.423

5.02 90.18 % 63.66

Joda Time

NCL KLOC NTCL TK-LOC TT

1.5.0 166

CC

MS

5.31 95.01 % 15.08

Second Order strategies were then evaluated in terms of time performance and
mutation score (Judy).
All tests were run remotely on 4 x Intel Xeon 3.35 GHz CPUs, 16 GB RAM.
Every test result is an average of at least 10 iterations, with code coverage
computed using the Cobertura tool. The iterations of tests were launched using
simple bash scripts, which also automated renaming and moving of output ﬁles
into speciﬁed folders. Every run of tests was launched under a modiﬁed version
of the open-source memusg.sh1 program, which measures the average and peak
memory usage of all processes invoked by the current terminal session and subsessions2 . The versions of the two tools used were PITest 1.1.9 and Judy 2 (release
from 13.5.2015).
Initial Performance Evaluation. The initial evaluation was run with default
settings using one thread with default operators (Table 3). By default Judy has
active all 56 mutation operators, while PITest 7 out of 16. Missing values in the
table indicate a failure to complete the testing process.
Looking at the time performance in relation with mutation size (MS), introduced in the previous section to characterize the projects, we found a positive
correlation (Spearman’s Rank-Order, 0.85, p=0.0037, two-tailed). The experiments showed that with one exception, PITest is always faster to generate
mutants than Judy, which generated more mutants. Similarly, when comparing
how many mutants per second were generated, PITest generated mutants faster
than Judy. In our experiments, Judy was not able to ﬁnish mutation analysis for
larger projects (in particular Lang and Joda Time, that have the highest Mutation Size among the considered projects). When comparing time per number of
mutants, Judy is generally faster for all tested projects using the default settings
1
2

https://gist.github.com/netj/526585.
the experimental package is available at https://goo.gl/5GPdQv.

224

J. Možucha and B. Rossi

tests. When considering the tested projects metrics, Judy is faster for smaller
projects. However, for bigger projects or for projects with higher line coverage
or longer tests run, the performance is rapidly lower.
Comparing average memory consumption, the same pattern applies as for
comparison of tests duration. Judy consumes less memory for very small projects,
but PITest shows better results for medium and bigger projects. Similarly, the
peak of memory consumption is normally lower for Judy, but for big or better
covered projects, the memory usage peak for Judy is a lot higher than for PITest.
Table 3. Run-time performance - default settings one thread - values in () are by using
selective operators.
Project

Gen.Time (sec)

Total Time (sec)

Peak Memory (MB)

PITest

Judy

PITest

Judy

PITest

Judy

Commons Chain

1.1 (1.4)

3.18 (2.0) 33.5 (30.2)

5.67 (2.6)

956 (1587)

302 (240)

Commons CLI

1.3 (1.4)

12.8 (6.3) 45.8 (42.4)

228.5 (46.9) 1764 (1743) 4505 (3677)

Commons Codec

5.7 (6.2)

— (28.3)

247.6 (278.9) — (2225.5)

3028 (3061) — (4055)

Commons CSV

1.9 (1.6)

2.5 (1.4)

48.1 (44.1)

10.5 (4)

1648 (1654) 900 (351)

Commons DbUtils

1.6 (1.0)

6.2 (47.7) 34.2 (12)

45.6 (78.7)

784 (399)

Commons Digester

3.9 (3.0)

13.6 (8.2) 258.8 (120.1) 38.7 (20.2)

Commons Lang

21.1 (20.3) — (—)

Commons Validator 3.5 (3.5)
Joda Time

943.6 (907.9) — (—)

1441 (4653)

2678 (2540) 1509 (2071)
3825 (3600) — (—)

13.9 (5.1) 207.9 (148.2) 135.5 (29.7) 1309 (1365) 1638 (609)

28.3 (28.3) — (—)

638.8 (546)

— (—)

3857 (4267) — (—)

Time required for Mutation Testing is positively correlated with Mutation Size
(LOCS*Coverage). It can be used as initial measure of complexity. Missing
tests or tests failures (for analysis tool) hinder the possibility to apply MT.

Mutation Operators Eﬃciency Results. The procedure of selection of the
most eﬃcient operators needs some further clariﬁcation. The strong mutation
operators are those whose mutants are not easy to be killed. It would be very
diﬃcult to create tests that would kill 100 % of selective mutants. Therefore, we
adopted a diﬀerent approach by deﬁning some thresholds to deﬁne the selective
operators:
1. Run tests on all projects with all stable mutation operators (stable operators
— not causing unrecoverable crashes during mutation);
2. Find most the populous (generating the highest number of mutants) mutation
operators;
3. Exclude the operator if:
– Mutation score of mutants created by the operators is higher than the
average mutation score on all the tested projects;
– The mutation operator belongs to the most populous operators and the
score of mutants created is higher than the average of 80 % for all the
tested projects;

Is Mutation Testing Ready to Be Adopted Industry-Wide?

225

Table 4. Eﬃciency of PITest operators
Op erator

#Mut %>avg

Op erator

#Mut % >avg

INLINE CONSTS
NEGATE CONDITIONALS (D)
RETURN VALS (D)
REMOVE CONDITIONALS EQ IF
MATH (D)
REMOVE CONDITIONALS ORD ELSE
CONDITIONALS BOUNDARY (D)
REMOVE CONDITIONALS ORD IF

12455
11087
10457
8335
3457
2752
2752
2752

VOID METHOD CALLS (D)
INCREMENTS (D)
INVERT NEGS
REMOVE CONDITIONALS EQ ELSE
NON VOID METHOD CALLS
CONSTRUCTOR CALLS
EXPERIMENTAL MEMBER VARIABLE
EXPERIMENTAL SWITCH

2653
1128
71

56
100
89
100
78
89
22
67

33
100
100

Table 5. Eﬃciency of Judy operators

Non-excluded operators were considered selective operators and were active for
the selective mutation performance tests. Tables 4 and 5 are sorted by the most
populous operators from all projects (# Mut) with indication of the percentage
of mutation score of the operator being higher than average mutation score
(% > avg)3 . The (D) at the end of some operator names for PITest means
that the operator is active by default. The red-painted operators are unstable
ones, yellow are excluded operators and green are the selected operators for
the Selective Operators experiments. The Judy operators that generated many
mutants from which none were killed were considered as unstable ones.
Out of the total 16 PITest mutation operators, 5 were selected for selective
mutation including the most populous operator INLINE CONSTS, causing that
the total number of generated mutants during selective mutation was almost
the same as during the mutation using default PITest operators. For selective
mutation using Judy, 28 out of 56 mutation operators were selected and the
number of generated mutants was reduced signiﬁcantly.
3

description of operators can be found at http://pitest.org/quickstart/mutators/ and
http://mutationtesting.org/judy/documentation/.

226

J. Možucha and B. Rossi

The selected operators can be used to evaluate the number of mutated classes
vs the mutation score (Fig. 2a,b). The % of mutated classes refers to the number
of mutated classes over the total projects’ classes. The comparison of mutation score showed that the mutation score of PITest selective mutation is always
lower than mutation score of default operators. This can mean that default operators are either too easy-to-be-killed, or that selected operators produced more
equivalent mutants. Comparing selective vs non-selective strategies for mutation
score by running a Wilcoxon Signed-Rank Test showed signiﬁcant diﬀerences
(p = 0.0012 < 0.05, two-tailed, N = 15).

(a) PITest

(b) Judy

(c) PITest

(d) Judy

Fig. 2. Default vs selective mutation per mutated classes and mutation score

The total time of mutation analysis showed the real advantage of selective
mutation also for PITest tests ((Fig. 2c,d and Table 3). Except of one tested
project, all other were done faster using selective mutation. Comparing selective
vs non-selective strategies by running a Wilcoxon Signed-Rank Test showed signiﬁcant diﬀerences (p = 0.0096 < 0.05, two-tailed, N=16) in duration time. To
note also that selective operators allowed Judy to provide results on the Apache
Commons Codec project (with mutation size of 15.08).

Is Mutation Testing Ready to Be Adopted Industry-Wide?

227

EXP1. Using Selective Operators can bring beneﬁts in terms of runtime performance, however, at the expense of lower mutation score. Selective Operators
can also help in running Mutation Testing on some projects.

(a) Mutation Testing time vs # threads

(b) Average Memory vs # threads

Fig. 3. Concurrency experiments results

Performance and Concurrency Results. The results of the concurrency
experiments showed that using two or three threads can result in considerable
reduction of time compared to memory consumption increase (Fig. 3). The average memory consumption is rising for all testing projects almost linearly (ﬁtted
regression up to 7 threads, avgmem=660.42+148.55*#threads, adj R2 =0.22),
while time reduction is less than linear with the number of threads (ﬁtted regression up to 7 threads, time=262.81-21.71*#threads, adj R2 =0.20).
Looking at the combined eﬀect of decrease in time and increase in memory
consumption, we considered Δtime vs Δavgmemory (Fig. 4). In this case, time
reduces less than linearly than the increase in memory (ﬁtted regression up to 7
threads, time=13.61-0.2656*avgmem,adj R2 =0.45), so using more threads might
increase consistently memory usage without larger beneﬁts on time reduction.
EXP2. Up to 2–3 threads can bring high beneﬁts in terms of runtime performance. Change in average memory consumption grows more than linearly
compared to reduction in performance when increasing the number of threads.

SOM Experiments Results. We next looked at the Higher Order Mutation
Testing strategy for the Judy project, in particular the four diﬀerent algorithms
(Last2First DiﬀerentOperators, ClosePair and RandomMix ) to combine ﬁrst
order mutants (FOMs) into second order mutants (SOMs) implemented in Judy
[9,10]. Also in this case, we were interested in performance changes and quality of mutation score. In running the experiment, we noticed that the number

228

J. Možucha and B. Rossi

Fig. 4. Delta time vs delta average memory

(a) Mutation Score

(b) Total Time

Fig. 5. Application of diﬀerent SOM strategies vs FOM

of generated mutants was reduced at least by 50 % for some of the strategies
and projects. Mutation score for all SOM strategies was higher than for FOM
(Fig. 5a), while total time was generally lower for the three strategies in comparison with FOM (Fig. 5b). Running a Friedman non-parametric test for the differences across groups yielded signiﬁcant results (0.05, two-tailed) for generated
mutants (p = 0.0089), mutation score (p = 0.0031), and total time (p = 0.0009).
However, like mentioned, the improvement of mutation score might be due to
the inclusion of less equivalent mutants by applying such strategies.
When comparing individual SOM strategies, the ClosePair strategy gave the
lowest mutation score, while Last2First and RandomMix produced very similar
results for most of tested projects. This can be caused by the fact that neighboring mutants from the list of FOMs combined same type of mutants and the
highest number of equivalent SOM mutants were generated using this strategy.

Is Mutation Testing Ready to Be Adopted Industry-Wide?

229

EXP3. SOM strategies improve the results in terms of mutation score and
in terms of generated mutants, having positive beneﬁts on the performance.
However, manual inspection is needed to understand how many equivalent
mutants are generated.

Threats to Validity. We have several threats to validity to report [17].
For internal validity, measurements performed were averaged over several
runs to reduce the impact of external concurring for resources. One of the main
issues is the reliability of mutation score as quality indicator. The score always
includes equivalent mutants as the automated detection is undecidable [9], and
the only way to discover them is by manual inspection — unfeasible for larger
projects. In fact, two projects with the same mutation score might be quite different depending on the number of equivalent mutants. We also used thresholds
for the deﬁnition of the selective operators, and some sensitivity analysis can be
more appropriate to deﬁne the best ranges.
Related to external validity, we cannot ensure that results generalize to other
projects. However, we selected 9 heterogeneous projects in terms of size, code
coverage. More insights will be given by testing on even larger software projects
of industrial partners. Furthermore, the package of the current experiments will
be available to increase external validity by means of replications.
For conclusion validity, we applied several statistical tests and simple linear regression in diﬀerent parts of the experiment. We always used the nonparametric version of the tests, without normality distribution assumption, and
we believe to have met other assumptions (type of variables, relationships among
measurements) to apply each test.

4

Related Works

There are several related works about experimental evaluations of tools for automated mutation testing in JVM environments.
One of the ﬁrst experimental evaluations [13] was done on the mutation
operators of the Mothra project omitting two, four and six of the most populous
mutation operators. The test cases killed 100 % of mutants generated by selective
mutation. These test cases were then run on non-selective mutants. These test
cases killed almost 100 % percent of mutants. Out of 22 mutation operators
used by Mothra, 5 were key operators that provide almost the same coverage as
non-selective mutation [14].
Madeyski et al. provided an experimental evaluation comparing the performance of generating mutants between Judy and MuJava on various Apache
Commons libraries [10]. From the experiments, Judy was able to generate at
least ten time more mutants per second as MuJava.
In 2011, the applicability of mutation testing was examined in Nica et al.
[11]. The selected tools were MuJava, Jumble and Javalanche, focusing on the
performance of generating mutants. The only tool able to generate mutants

230

J. Možucha and B. Rossi

was MuJava generating about 123 class-level mutants and 30,947 method level
mutants in approx. 6 hours. Jumble and Javalanche showed conﬁguration difﬁculties and low performance. The main conclusion was that mutation testing
was too slow to be used in real world software projects.
In 2013, Delahaye et al. compared Javalanche, Judy, Jumble, MAJOR and
PITest on several sample projects [4]. The results showed that Jumble, MAJOR
and PITest were able to ﬁnish the analysis for every project, while Judy generated the highest number of mutants and Javalanche the lowest number of
mutants. The research indicated that mutation testing tools still need a lot of
improvements to be usable in real world projects.
In 2015, Rani et al. compared MuJava, Judy, Jumble, Jester and PITest in an
experimental evaluation [15]. The experiments were run on set of short programs
(17-111 LOC). The research showed that all the tools produced almost the same
average mutation scores except of PITest, which produced 25 % higher score than
the rest of the tools. One of the conclusions was that a new mutation system for
Java should be created, with faster generation and execution of mutants.
In 2016 Klischies et al. run an experimentation considering PITest on several
Apache Commons projects. As metric for the experiments authors use the inverse
of mutation score, as an indication of the goodness of the mutation operator set.
They overall considered Mutation Testing applicable to real world projects with
a low number of equivalent mutants, inspected manually, on the set of projects
that were considered. However, strong concerns remained for the applicability
to larger projects and in case code coverage within projects is too low, making
the whole mutation analysis less eﬀective [8].
Our work is diﬀerent from the aforementioned set of related works as we
focus on the selection of the best mutation operators and mutation strategies for
improvements in performance on a set of medium sized projects. We can directly
compare the SOM experiments results with Madeyski et al. [9], getting the same
results in terms of increase of mutation score and performance improvement.

5

Discussion

There are several ﬁndings about the application of Mutation Testing that we put
forward in the current paper. The general performance of Mutation Testing is
impacted by Mutation Size, that is the size of the project and the code coverage
level (RQ1 ). When taking into account the applicability of Mutation Testing,
is appropriate to consider Mutation Size (LOCs size and code coverage) as an
indication of the time required. This can be a good indicator to use by analogy for the application to other projects. A good strategy for the application
of Mutation Testing is, in fact, to ﬁrst increase code coverage to good levels, as
having lower code coverage levels cannot tell much about the quality of tests.
Clearly, larger coverage impacts on the execution of the tests, while mutant generation and mutant compilations stay the same. Taking into account multiple
threads, time reduction decreases less than linearly with the increase of memory consumption. Mutation Testing can be optimized by looking at points in

Is Mutation Testing Ready to Be Adopted Industry-Wide?

231

which parallelization does not bring enough incremental beneﬁts. For the set of
projects considered, 2–3 threads are eﬀective numbers for performance / memory
resources optimization.
Identifying the most eﬃcient operators and applying selective operators
improves the results in terms of runtime performance at the expense of lower
mutation score and lower number of mutated classes (RQ2 ). This is a strategy
that can be applied to extend the applicability of Mutation Testing to allow to
run the approach to wider set of projects. In the selective strategy we looked at
the eﬃciency of the operators in terms of killed mutants, but other approaches
may look at the operators that generate more mutants. Based on the results,
we believe that this set of strategies can help to apply Mutation Testing within
industrial contexts, as default conﬁgurations can lead to a larger overhead in
running the process. However, practitioners would need to ﬁne-tune the Mutation Testing environments according to the speciﬁc projects needs. We included
a list of selected operators eﬃciency based on the overall set of projects, that
can give indications for application to other projects.
We looked at the impact of Second Order Mutation to recombine First Order
Mutants and reduce in this way the number of mutants RQ3. All diﬀerent substrategies considered (Last2First DiﬀerentOperators, ClosePair, RandomMix )
improve in terms of time required to run mutation testing, with higher mutation score than considering the initial mutants. However, while improvements in
time are due to the lower number of generated mutants, mutation score can be
inﬂuenced by equivalent mutants, as such manual inspection would be suggested
to look for the eﬀect on each considered project.

6

Conclusion

Mutation Testing is still an evolving testing methodology that can bring great
beneﬁts to software development. With increasing computational resources, it
can reach wider adoption within industry, aiding to build more robust software.
However, there are still aspects that hinder its usage, namely the computational
complexity, equivalent mutants and possible lack of integration tools [9].
In this paper, we looked at the current support of Mutation Testing in JVM
environments, with an experimental evaluation based on industrial partner’s
needs. We focused on various aspects of performance, evaluating diﬀerent strategies that can be applied to reduce the time needed for mutation analysis. We
evaluated how selective operators and second order mutants can be beneﬁcial for
the mutation testing process, allowing to reduce runtime overhead. Based on the
results, we believe that Mutation Testing is mature enough to be more widely
adopted. In our case, the experimental results have been useful for knowledge
transfer in an industrial cooperation, with future works aimed at exploring the
experimented approaches on the company’s source code repositories.
Acknowledgments. We are grateful to the developers of both PITest and Judy for
feedback provided in the usage of the tools. In case of Judy, the SOM experiments have
been possible with a newer version provided by the developers.

232

J. Možucha and B. Rossi

References
1. CJBS Insight: Cambridge university study states software bugs cost economy $312
billion per year. http://insight.jbs.cam.ac.uk/2013/ﬁnancial-content-cambridgeuniversity-study-states-software-bugs-cost-economy-312-billion-per-year/
2. Crispin, L., Gregory, J.: Agile Testing: A Practical Guide for Testers and Agile
Teams. Pearson Education, Boston (2009)
3. Dedı́k, V., Rossi, B.: Automated bug triaging in an industrial context. In: 42nd
EUROMICRO Conference on Software Engineering and Advanced Applications,
pp. 363–367. IEEE (2016)
4. Delahaye, M., Du Bousquet, L.: A comparison of mutation analysis tools for Java.
In: 13th International Conference on Quality Software (QSIC), pp. 187–195. IEEE
(2013)
5. DeMillo, R.A., Lipton, R.J., Sayward, F.G.: Hints on test data selection: help for
the practicing programmer. Comput. 11(4), 34–41 (1978)
6. Jia, Y., Harman, M.: Higher order mutation testing. Inf. Softw. Technol. 51(10),
1379–1393 (2009)
7. Jia, Y., Harman, M.: An analysis and survey of the development of mutation
testing. IEEE Trans. Softw. Eng. 37(5), 649–678 (2011)
8. Klischies, D., Fögen, K.: An analysis of current mutation testing techniques applied
to real world examples. In: Full-scale Software Engineering/Current Trends in
Release Engineering, p. 13 (2016)
9. Madeyski, L., Orzeszyna, W., Torkar, R., Jozala, M.: Overcoming the equivalent
mutant problem: a systematic literature review and a comparative experiment of
second order mutation. IEEE Trans. Softw. Eng. 40(1), 23–42 (2014)
10. Madeyski, L., Radyk, N.: Judy-a mutation testing tool for Java. IET Softw. 4(1),
32–42 (2010)
11. Nica, S., Ramler, R., Wotawa, F.: Is mutation testing scalable for real-world software projects. In: VALID Third International Conference on Advances in System
Testing and Validation Lifecycle, Barcelona, Spain (2011)
12. Oﬀutt, A.J., Lee, A., Rothermel, G., Untch, R.H., Zapf, C.: An experimental determination of suﬃcient mutant operators. ACM Trans. Softw. Eng. Methodol. 5(2),
99–118 (1996)
13. Oﬀutt, A.J., Rothermel, G., Zapf, C.: An experimental evaluation of selective mutation. In: Proceedings of the 15th International Conference on Software Engineering
ICSE 1993, pp. 100–107. IEEE Computer Society Press, Los Alamitos (1993)
14. Oﬀutt, A.J., Untch, R.H.: Mutation 2000: uniting the orthogonal. In: Wong, W.E.
(ed.) Mutation Testing for the New Century, pp. 34–44. Kluwer Academic Publishers, Norwell (2001)
15. Rani, S., Suri, B., Khatri, S.K.: Experimental comparison of automated mutation
testing tools for Java. In: 2015 4th International Conference on Reliability, Infocom
Technologies and Optimization (ICRITO), pp. 1–6. IEEE (2015)
16. Roy, N.K.S., Rossi, B.: Towards an improvement of bug severity classiﬁcation. In:
40th EUROMICRO Conference on Software Engineering and Advanced Applications, pp. 269–276. IEEE (2014)
17. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering. Springer, Heidelberg (2012)

An Eﬀective Veriﬁcation Strategy for Testing
Distributed Automotive Embedded Software
Functions: A Case Study
Annapurna Chunduri1(B) , Robert Feldt1 , and Mikael Adenmark2
1

Department of Software Engineering, Blekinge Tekniska Högskola,
SE-371 79 Karlskrona, Sweden
anuchunduri11@gmail.com, robert.feldt@bth.se
2
Department of Systems and Integration Test, Scania AB,
SE-151 87 Södertälje, Sweden
mikael.adenmark@scania.com

Abstract. Integration testing of automotive embedded software functions that are distributed across several Electronic Control Unit (ECU)
system software modules is a complex and challenging task in today’s
automotive industry. They neither have inﬁnite resources, nor have the
time to carry out exhaustive testing of these functions. On the other
hand, the traditional approach of implementing an ad-hoc selection of
test scenarios based on the testers’ experience typically leads to both test
gaps and test redundancies. Here, we address this challenge by proposing
a veriﬁcation strategy that enhances the process in order to identify and
mitigate such gaps and redundancies in automotive system software testing. This helps increase test coverage by taking more data-driven decisions for integration testing of the functions. The strategy was developed
in a case study at a Swedish automotive company that involved multiple
data collection steps. After static validation of the proposed strategy it
was evaluated on one distributed automotive software function, the Fuel
Level Display, and found to be both feasible and eﬀective.
Keywords: Veriﬁcation strategy · Distributed automotive embedded
software · Test coverage · Test process improvement · Case study

1

Introduction

The automotive industry has been signiﬁcantly aﬀected by the industrial software revolution over the past decade. The share and importance of software
within a vehicle is growing steadily. Early on it was anticipated that 90 % of all
future automotive innovations will be driven by software [10] and since then it
has become clear that the industry is increasingly software-centric.
An automotive vehicle consists of Electronic Control Unit (ECU) systems,
which are essentially embedded microcontrollers with corresponding software
components. These ECU systems interact in order to execute the desired functionality in the vehicle like controlling the engine and operating air bags.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 233–248, 2016.
DOI: 10.1007/978-3-319-49094-6 15

234

A. Chunduri et al.

In the past, each single ECU system had a single dedicated function. Hence,
execution of a function required the software within only one of the ECU systems to execute independently. Nowadays, the functions are being designed to be
realized through the interaction among diﬀerent sub-functions and ECU systems
via multiple Control Area Networks (CANs) [11]. Such cross-functionality i.e., a
function distributed across multiple ECU systems and consequently across several system software modules is a common and complex phenomenon in today’s
automobile vehicles.
In the automotive industry, the standard V-model is most widely used for
the engineering processes of embedded system software development. A typical
V-model implemented by Scania [1], a major manufacturer of commercial heavy
vehicles in the European automotive industry, can be seen in Fig. 1. Similarly,
across the automotive industry, testing of the embedded system software functions occurs at diﬀerent test levels. Here, the term ‘test level’ is used to indicate
the test focus. “Each test level describes an area of test responsibility” [1]. For
instance, at the system test level, for each individual ECU system of the vehicle,
the software modules are mounted on the corresponding ECU system hardware
like the engine and the gearbox and tested independently. At the vehicle integration test level, the software modules corresponding to all the ECU systems
which make up the vehicle are tested together in their actual operational vehicle
environment. The testing across the test levels is performed either as Hardware
in the Loop (HIL) testing, where individual ECU systems or the entire vehicle
is executed under simulated environments [12], or using real ECU systems or
vehicles. The exact number of test levels and terminology used to describe each
test level may diﬀer from one automotive company to another. But what remains
the fundamental similarity in the concept of testing is that, at each test level,
the test strategy adopted aims to address and test the system software behavior
at a diﬀerent level of abstraction and provides a diﬀerent degree of coverage of
the object under test [21].

Fig. 1. The V-model implemented at Scania [1] to test their distributed embedded
software functions

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

235

The distributed nature of functions across several embedded system software
modules makes integration testing of the automotive system software a complex
and challenging task [17]. Exhaustive integration testing of the distributed functions using the numerous variants across the software modules is not feasible since
the automotive industry neither has inﬁnite resources nor has the time to carry
out such exhaustive testing [3]. On the other hand, going by the traditional approach of implementing an ad-hoc selection of test scenarios based on the testers
experience typically leads to both test gaps and test redundancies across the test
levels [8]. Hence, there is a pressing need for a feasible and eﬀective veriﬁcation
strategy for integration testing of the distributed automotive embedded software
functions that would help improve test coverage while reducing test redundancies
and test gaps across the test levels. A review of research in test, veriﬁcation, and
validation in the automotive domain has revealed that research focusing on studying such challenges of high-level integration testing of the distributed automotive
embedded software functions like in [11,27], is limited. More over, research proposing and validating strategies and solutions to cope with such critical integration
test challenges, like in [3], has been found to be sparse.

2

Research Design

The research was conducted in association with Scania, a major manufacturer
of commercial heavy vehicles in the European automotive industry. This section
provides an overview of the research design. It is based on the guidelines for
conducting and reporting case study research in [20].
2.1

Research Questions

The goal of the research was to initially identify the challenges in automotive system software testing of the distributed functions. Thereafter, the research aimed
to propose and evaluate a veriﬁcation strategy which was based on addressing a
suitable subset of the identiﬁed challenges. Here, the evaluation of the proposed
strategy was in terms of its eﬀectiveness and feasibility. Initially, the strategy was
evaluated based on static comparitive analysis using scientiﬁc literature knowledge and industry expert opinion. Thereafter, the strategy was evaluated based
on its implementation on a legacy distributed automotive software function- Fuel
Level Display. Hence, in addition to having a realistic real world setting where
the proposed strategy was studied, the analysis has been extended towards investigating the perceived eﬀectiveness and feasibility of the proposed strategy based
on static data collected from industry experts and scientiﬁc literature.
Based on the above mentioned aims of the study, the research questions (RQ)
formulated are as follows.
RQ1: What are the major challenges with the current approach to test distributed software functions across test levels at the automotive case company?
RQ2: What is the feasibility and eﬀectiveness of implementation of the proposed

236

A. Chunduri et al.

process enhancement based veriﬁcation strategy according to (a) scientiﬁc literature knowledge, (b) industry expert opinion and (c) Historic data in a real
world setting?
2.2

Case Selection

Scania was chosen to be a suitable case for conducting the research and answering
the research questions. Scania is present in more than 100 countries and has been
successfully running for 250 years with over 45,000 employees. It represents a real
world setting of an automotive industry that is facing software-related challenges
in several fronts including the area of integration testing which is the focus of
the current research. Hence, this single-case study has been deemed appropriate
due to its representitive nature [31] of the automotive industry companies that
are eﬀected by the advent of software in the ﬁeld of integration testing.
2.3

Data Collection Procedure

One of the data collection methods was through conducting interviews of diﬀerent practitioners involved in the software testing activities at the case company.
A non-probabilistic judgement based sampling technique [14] was employed for
selection of interview subjects. Practitioners who were involved in, and had adequate knowledge and experience in testing were chosen. The identiﬁed practitioners included a total of 13 test engineers from diﬀerent departments with varied
experience belonging to 3 categories based on test levels, which are, System
Test Engineers, Function Test Engineers, and Lab and Vehicle Integration Test
Engineers. The interviews conducted were face-to-face, semi-structured interviews with open-ended and close-ended questions. Apart from interviews, data
from relevant artifacts in databases and archived documents in the form of test
reports, test results, test cases, function requirements speciﬁcation and function architecture design documents among others have been used as alternative
sources of data to achieve data triangulation [20].
2.4

Data Analysis Procedure

The data collection methods facilitated collection of both qualitative (predominant) and quantitative data through interviews, archived documents and databases. Strauss and Corbin’s Grounded Theory (GT) approach [26], a commonly
used method that facilitates the analysis of qualitative and quantitative data was
employed to answer the research questions. The three steps of coding process open coding, axial coding and selective coding- were performed. The theory so
generated was conﬁrmed through the hypothesis conﬁrmation method of triangulation. In addition to data traingulation, user-group triangulation [30] based
on selection of interview subjects from multiple categories and multiple subjects from each category, and evaluator traingulation [30] based on reviewing
the results obtained with several researchers working within similar areas of
research was employed. Hence, the ﬁnal output of data analysis presents results
that are both grounded and supported by a body of evidence.

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

2.5

237

Validity Procedure

The proposed strategy addresses process-based challenges. The validity of the
strategy was evalutated based on its eﬀectiveness of obtaining the desired
results and its feasibility for implementation with optimum cost and eﬀort. To
assess these aspects of the strategy, a static comparitive analysis of the processenhacement approach proposed in this strategy with alternative process enhancement approaches was performed. This analysis was based on scientiﬁc literature
knowledge and industry expert opinion. Here, scientiﬁc literature knowledge was
captured by conducting a literature review as per the guidelines presented in
[19]. To capture industry expert opinions, six practitioners from Scania with
vast experience and high knowledge in the relevant areas were carefully chosen. The evaluation study was conducted based on an assesment of the cost,
in terms of time and people eﬀort in implementing the strategy, certainty of
Return on Investment (ROI) in terms of the likelihood that the implementation
of the strategy would bring the desired results, and value of the ROI in terms of
the extent to which the results obtained on implementing the strategy solve the
challenges at hand. There after, historic data pertaining to testing a distributed
automotive software function - Fuel Level Display across the diﬀerent test levels
for three test rounds spread over three months was collected. This historic data
was then used to implement the proposed strategy and analyse the results to
assess its feasibility and eﬀectiveness in a real world setting.

3

Challenges in Automotive System Software Testing

The challenges, in the form of issues, pertaining to the approach implemented to
test distributed automotive software functions across the test levels at the case
company have been summarized and presented in Table 1 to answer research
question RQ1. It was identiﬁed that the challenges found through the case study
were similar to the challenges presented in exisiting scientiﬁc literature [11,27].
The study of the interdependent relationships that exist among the identiﬁed
issues helped establish that there is a very intricate relationship that can be
represented through a multi-way dependency among the identiﬁed set of people,
process and technology issues. For instance, let us consider the complex relationship between one identiﬁed set of people, process and technology issues presented
in Fig. 2. Hence, it was inferred from the results obtained on performing several
steps of data analysis that, an intricate set of people, process and technology
issues formed the source for the fundamental issues for integration testing of the
distributed software functions at the case company.
A further study of the areas of people, process and technology and the relationships that exist between the issues within these areas in the current context
was performed. It was then identiﬁed based on literature [18] and on case study
results (refer to Fig. 2), that process issues lie at the core of the three identiﬁed issue areas. Technology is adopted to ﬁt the process established, and people
knowledge and eﬀort is built around the process. Hence, addressing the process

238

A. Chunduri et al.

Table 1. Identiﬁed challenges in the form of issues with current approach to test
distributed embedded software functions at Scania
SNo Issue category

Description of issues

1

People Issues

Ambiguity of test role at function test level and a
lack of bigger picture of function test eﬀort distribution across test levels
Lack of appropriate means to share knowledge pertaining to test results across the test levels

2

Process Issues

Incomplete, vague and untestable higher level function requirements speciﬁcation
Variant-focused testing is highly dependent on each
individual tester’s knowledge with lack of means to
handle function variants through simplistic representation
Lack of adequate traceability across requirements
and test results of the diﬀerent test levels

3

Technology Issues

Lack of tool support to assist in test management, requirements management and change management
Complexity of having test artifacts (test results,
test reports, test cases, etc.) and function and system requirements spread across multiple locations
with limited interoperability

4

Fundamental Test Issues No comprehensive function test coverage information across test levels
Ambiguous and inaccurate test reports generated
at vehicle integration test level
Lack of test report generation at the function test
level
Lack of appropriate means to identify where or
whether redundant testing and risky test gaps are
present across the test levels

Fig. 2. An instance of the identiﬁed people, process and technology issue relationship

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

239

issues was deduced to be suitable to help pave the way for studying and addressing the people and technology issues that persist around the established process.
Thus, the veriﬁcation strategy proposed in this research aims to address the
process issues in order to resolve the fundamental test issues identiﬁed in the
approach for testing distributed embedded software functions at the automotive
case company.

4

Process Enhancement Based Veriﬁcation Strategy

The veriﬁcation strategy proposes to enhance the function requirements speciﬁcation by implementing a semi-formal scenario-based modelling that provides
a means to capture complete and testable requirements with adequate function
variant information. Thereafter, the strategy proposes to adopt a multi-level
reuse concept of combining test results from the system, function and vehicle
integration test levels by establishing appropriate traceability across the requirements and among the test results. Here, test levels with a focus on requirementsbased test coverage have been considered to be within the scope. The obtained
comprehensive test coverage information can then be used to identify test gaps
and test redundancies and take more data-driven decisions to enhance integration testing and thereby improve test coverage of the distributed software
functions.
4.1

Steps for Implementation of the Verification Strategy

The steps for implementation of the proposed veriﬁcation strategy are presented
in Fig. 3. A brief description of these steps is discussed below.

Fig. 3. Steps for implementation of proposed process enhancement based veriﬁcation
strategy

240

A. Chunduri et al.

STEP 1. Enhancing Function Requirements Specification
The ﬁrst step deals with adopting a semi-formal scenario-based modelling of the
distributed software function requirements. It initially involves the identiﬁcation
of function use cases and use case variants. Here, each use case of a function
describes one of its unique behaviour and use case variants are all possible contexts in which the function use case takes a diﬀerent sequence of actions to
execute the same behaviour described by the use case. For example, let us consider the function Fuel Level Display, for which two use cases are activation and
deactivation of the feature of displaying low fuel level warning on the Instrument
Cluster (display component) of the vehicle, when the activation and deactivation
conditions are met. Here, for one of the use cases- activation of low fuel level
warning- its variants can be identiﬁed to be all those contexts in which diﬀerent
steps are executed to reach the same outcome of displaying low fuel level warning
when activation conditions are met. This here would mean that if activation of
low fuel level warning is executed diﬀerently in vehicles with liquid engine and in
vehicles with gas engine, then the use case variants include vehicles with liquid
engine and vehicles with gas engine.
The next step is to generate a Use Case Description (UCD) for each use case
of the function and a Use Case Tree (UCT) for each function. A UCD is a stepby-step description of the interaction between the participating systems required
to execute the use case in its operational environment, written in structured
natural language. There are several diﬀerent templates proposed in literature
for generating UCDs. One such template is presented by Somé [25] which is
the inspiration for the current template proposed. The template is tailored and
enhanced to be ﬁt the automotive industry. A UCT is a graphical tree-structure
representation of all possible paths containing unique combination of steps from
the UCD for the execution of the use case.
Thereafter, all possible UCT paths, termed ‘scenarios’, are to be identiﬁed
and corresponding Message Sequence Charts (MSCs) [25] are to be generated
such that
(a) Nodes common to all variants are covered at least once for each variant, and
(b) Nodes speciﬁc to a particular variant are covered at least once for that variant.
Here, a scenario is one single sequence of interaction between the involved systems to execute a use case of the function. Adequate notes should be used within
the MSCs to capture the important information of the UCD within the scenario
models to enhance their understandability. This representation can hence be
used to provide a comprehensive view of the function as a set of complete and
testable function requirement scenarios with adequate variant information.
STEP 2. Establishing Traceability across Test Levels
This step deals with establishing traceability to obtain comprehensive test coverage information for the functions across the System, Function and Vehicle
Integration test levels.
Initially, the system test level is taken into consideration. Here, there is a
need to establish traces between the function scenarios involving multiple ECU

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

241

systems and system level requirements of each involved ECU system. There
after, the system level requirements test coverage pertaining to the mapped
requirements are to be captured against the function scenarios. This provides
an understanding of how much of the function is tested at the system test level
across all the relevant systems. To establish traces between the function scenario
and system level requirements, each function scenario is taken and broken down
to the system view. Thus, each scenario has a corresponding set of system views
of all the systems that contribute to its realisation. Here, each system view
represents a set of all relevant input and output combinations pertaining to that
system’s role in the scenario.
The next sub-step is to consider the function test level. The test focus at this
level is on testing the overall function requirements. Therefore, the test results
should be mapped to the corresponding scenarios. Here, the scenarios and their
expected outcomes represent the function requirements that are to be tested.
The last sub-step is to consider the vehicle integration test level. At this test
level the test focus is on interface communication across the systems involved in
the execution of the function scenarios. Therefore, at this test level, similar to the
function test level, the test results are to be mapped against the function scenarios. Finally, as a result of implementation of this step, the obtained comprehensive
function test coverage information across test levels is to be presented.
STEP 3. Comprehensive Function Test Coverage Information
The comprehensive function test coverage information so obtained can then be
used within the last step of the strategy to identify risky test gaps, avoidable test
redundancies and make a more data-driven decision during integration testing
of the distributed embedded software functions to enhance their test coverage.

5

Validation Results and Discussion

The results of the study to validate the feasibility and eﬀectiveness of the proposed veriﬁcation strategy to answer research question RQ2 are presented and
discussed in this section.
5.1

Scientific Literature and Industry Expert Opinion

As presented in Sect. 4, the proposed strategy is a semi-formal process enhancement approach (Approach 2) to solving the fundamental issues in integration
testing of the distributed software functions. Hence, a static comparitive analysis to its alternatives - informal and formal process enhancement approaches
(Approaches 1 and 3) was conducted to validate its relative feasibility and eﬀectiveness. For the three alternative process enhancement approaches, the cost,
the certainty of ROI and the value of the ROI was assessed using a relative
ordinal scale of Highest(H), Medium(M) and Least(L). The results so obtained
are presented in Table 2.

242

A. Chunduri et al.

Table 2. Comprehensive results of comparative analysis of alternative process enhancement approaches
Approach Estimation based on Scientific Literature Estimation based on Industry Expert
Opinion
Cost Certainty of ROI Value of ROI

Cost Certainty of ROI Value of ROI

(1)

L

M

L

L

H

L

(2)

M

H

M

M

M

H

(3)

H

L

H

H

L

M

It has been identiﬁed based on scientiﬁc literature knowledge that, informal
approaches using natural language for the concerned process enhancement areas
(refer to Table 1) present ease of adoption and lack of need of any special skills
which implies low cost of implementation [9]. Yet, such natural language textual
requirements and requirements based testing, irrespective of the level of abstraction of the system or software, have been found in ample of cases to still be prone
to misinterpretation, ambiguity, inconsistencies and inaccuracy [5]. Moreover, it
has been found through experience and reported in literature that in the automotive industry, textual natural language requirements and requirements based
testing are “only part of the game” [29]. The industry identiﬁes the need for
the textual requirements at each level to be supported with adequate ways to
capture, model and present various other complex attributes of the industry
and thereby facilitate enhanced testing. A similar opinion has been found to
prevail among the industry experts interviewed. It is evident that majority of
them believe the ROI on such an investment will relatively add least value to
the industry in terms of solving the test process issues at hand.
While formal methods have been found to be one of the best options for model
based requirements speciﬁcation, testing and aligning these two disciplines for
safety critical systems like the systems within automotive industry [4], it has
also been found to be diﬃcult for practitioners to implement. It requires high
experience in representing requirements in mathematical and logical notations
[5,22]. This makes it is a highly cost-intensive solution. Such is also the opinion
of the experts interviewed, who believe that formal approaches might lead to
relatively highest cost of implementation with least certainty of ROI.
On the other hand, focusing on semi-formal methods through use cases and
scenarios as a means to model and test the functional behaviour of a software
or a system has been found to have their own challenges like the challenge
of generating executable test cases for enhancing eﬃciency of testing [2]. But
exploring semi-formal Model Based Testing (MBT) methods and addressing the
challenges within it has been found to generate solutions that are more cost
eﬀective and scalable as compared to the formal MBT methods. This makes it
an interesting area of research with high focus from the industry for whom cost
eﬀectiveness and feasibility of solutions are critical [4]. A similar estimation of
relatively high value of implementing this approach has been found to be the

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

243

opinion of industry experts. While it might cost the industry considerable time
and eﬀort for its implementation, majority believe it is a relatively most feasible
approach that will likely produce relatively most eﬀective results.
5.2

Historic Data

The strategy was implemented for Fuel Level Display function which is distributed across six systems, with two function variants and six function scenarios.
The comprehensive function test coverage information across the test levels was
captured for three test rounds (refer to Fig. 4). This helped identify the following:
– It was evident that most of the test eﬀort across the test levels was focused
on liquid engine trucks. This exposes the risky test gap in testing the function
adequately across test levels for its implementation in gas engine variant.
– There is redundant testing of the main function scenarios (SCN), SCN1 and
SCN5, at the function and vehicle integration test levels. On obtaining comprehensive function test coverage information across test levels, such redundancies can be avoided by ensuring the function and vehicle integration test
levels cover diﬀerent scenarios of the function.
– On analysing the system level function test coverage information, it was identiﬁed that most of the test gaps at the system level lie in three of the six
systems involved.
– With this data, any change can be traced from the system it is pertaining to,
to the appropriate function scenarios that it eﬀects. This reduces the eﬀort in
analyzing the change impact at the vehicle integration test level.
Besides successfully obtaining the desired results, two critical gaps which have
the capability to potentially hamper the eﬀectiveness of the strategy were also identiﬁed. These include missing system-level requirements and missing test reports at
the function test level. Another critical result obtained was the identiﬁcation of a
potential error in the function design for one of its variant which was unnoticed
previously. This opens up the interesting possibility to more extensively study the

Fig. 4. Comprehensive Fuel Level Display function scenario test coverage across the
test levels for three test rounds

244

A. Chunduri et al.

eﬀectiveness of the proposed strategy for enhancement of distributed embedded
software function design through scenario-based requirements speciﬁcation.
While the implementation of the strategy in a real world setting helped study
its practical feasibility and eﬀectiveness, it also helped identify that its eﬃciency
will likely reach its highest potential with the incorporation of suitable tool support and right people knowledge at steps of the strategy where maximum eﬀort
in terms of time and cost was invested. These steps include manual generation of
the function scenario models and gathering information spread across multiple
data sources to establish the desired traces across the test levels. Hence, these
aspects that account to maximum eﬀort in implementation present areas where
the eﬃciency of the strategy can most likely be further enhanced. The current
study therefore sets the process focused platform for moving towards a comprehensive solution for the issues with the current approach to test distributed
embedded software functions in the automotive industry.
5.3

Threats to Validity

Refering to Runeson et al. [20], a description of the validity threats, along with
the counter measures adopted to reduce them to the best of the researchers’
ability are presented in this section.
Construct Validity. Construct validity deals with whether the operational
measures studied accurately represent what is being investigated. In the conducted research, one construct validity threat was a possibility of designing interview questions for the case study which may be misinterpreted by the interviewees and lead to collecting irrelevant data. This risk was controlled by reviewing
and revising the interview questionnaire and by performing simultaneous data
collection and data analysis so the questionnaire can be readjusted if deemed
necessary. Moreover, traingulation methods were employed to ensure the validity of what was being studied.
Repeatability and Reliability. Repeatability deals with whether the research
conducted is repeatable, hence making it reliable. There was a risk that the case
study conducted will not be adequately documented to make it repeatable. This
risk was controlled by using the case study protocol to design and well document
the research. Moreover, the actions undertaken throughout the research were
reviewed and discussed among the researchers. Such auditing has been found to
help mitigate this risk [31].
Internal Validity. Internal validity deals with how causal relationships are
examined and relevant conclusions are drawn. In the conducted research, there
is a possible risk of poor data analysis leading to incorrect conclusions. This
risk was countered by discussing the results in length among the researchers and
other senior practitioners at the case company. Moreover, the results of data
analysis were backed with adequate literature support to validate the conclusions
drawn. However, there still exists a possible internal validity threat that exists in
this research due to the consideration of limited test data for drawing conclusions

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

245

from the implementation of the proposed strategy. This threat can be reduced
further by focusing future work on studying the results of implementation of the
veriﬁcation strategy on more extensive test data spread across several test rounds.
External Validity/Generalisability. External validity deals with the extent
to which the ﬁndings can be generalized. Since the research was conducted within
a single automotive company, the ﬁndings may not hold in other cases. This risk
was reduced to a great extent by explaining in suﬃcient detail the context within
which the study was conducted in Sect. 2.2. Another major external validity
threat lies in the implementation of the proposed strategy for only one distributed embedded software function. These threats exist and can be mitigated
by focusing future work on implementating the strategy to a larger number of
more complex distributed functions and studying its feasibility and eﬀectiveness
within other automotive companies.

6

Related Work

Modelling requirements for veriﬁcation is not a new concept. It has been adopted
for various veriﬁcation activities, most popularly for testing and is termed MBT.
MBT has been found to help signiﬁcantly enhance the testing process [4]. A
semi-formal approach, based on the use of scenarios for modelling requirements
within MBT is being increasingly studied in literature [2,15,28]. A large part
of the research that focuses on using such semi-formal model-based approach
to aligning requirements speciﬁcation and testing through traceability has been
found to involve proposing solutions but little research focus is on evaluating
and validating these proposed solutions [4].
Tsai et al. in [28] explore scenario-based traceability in order to select test
cases for impact analysis and regression testing of changes. The authors propose
the use of scenarios in regression testing of system software function changes
for reduced test eﬀort through improved impact analysis. While fundamentally
it is similar to the work in the current research in terms of exploring scenarios
for improved testing, it diﬀers in aspects where the current research aims to
capture scenarios and propose their suitability in reducing test gaps and test
redundancies and improving test eﬀort distribution across multiple test levels of
the automotive industry V-model.
Nebut et al. in [15] present work on deriving function tests from use cases
in the form of test scenarios through test objectives. The major focus of the
presented research, similar to [28], is on the automation of the process of using
scenario-based requirements speciﬁcation and testing. Where as, in the current
research, the focus is on studying and validating the eﬀectiveness of using scenarios for testing and traceability in the automotive industry.
Within the context of embedded system software, speciﬁcally in the automotive industry, most previous related work in MBT and subsequent study on
traceability and alignment of the disciplines of requirements speciﬁcation and
testing is based on formal methods [13,24]. While some studies focus on the

246

A. Chunduri et al.

concept of test scenarios as a means of managing the requirements of the increasingly complex automotive embedded software, they take a formalized approach
to the generation and use of test scenarios like in [7].
Moreover, most of the testing focus through such formal modelling methods
is concentrated at the system integration and lower test levels of the automotive
V-model [23]. While some apply their approaches to distributed functions at the
vehicle integration test level like in [24], focus on this area in general is limited.
MBT at the vehicle integration test level of the automotive test process and more
speciﬁcally a semi-formal scenario-based MBT with established traces to lowerlevel system requirements and testing is unknown today but with great potential
for improvement [6]. Such a multi-level reuse of test eﬀort has been recognized for
test eﬀort reduction during test case generation [16]. Within the current thesis
research, this concept of multi-level reuse is dealt with a broader perspective of
reducing test eﬀort by providing a means to take more data-driven decisions to
improve test coverage by identifying and mitigating test redundancies and test
gaps. Such comprehensive test strategies to cope with the complex aspects of the
automotive software testing process like [3], are limited for integration testing of
the distributed software functions.

7

Summary and Conclusion

With the advent of software for the realization of several functions of a vehicle, there has been and continues to be several challenges that the automotive
industry faces to which it was unfamiliar a few decades ago [10,12,29]. One such
identiﬁed facet of the software-related challenges is in the area of system software
testing [11,27]. The automotive embedded software functions are increasingly
distributed in nature, implying that the software modules for the realization of
the functions are distributed across several ECU systems of the vehicle. This
adds to the complexity of integration testing of the distributed software functions [27]. Research focusing on studying the challenges in integration testing
of the distributed software functions [11,27] and possible eﬀective and feasible
strategies to tackle these challenges [3] within the automotive industry is limited. Hence, the current research contributes to the area of integration testing
of distributed automotive embedded software functions.
The scenario-based function requirements speciﬁcation proposed as part of
the process enhancement based veriﬁcation strategy in this research has been
found to provide an eﬀective and feasible means to capture test-driven requirements in the form of function scenarios. This was found to make them suitable
for establishing the desired traceability. Thereafter, the stategy proposed in this
research presents a manner in which the requirements and testing data across
test levels can be comprehensively tied together to aid in eﬀectively performing
the integration testing of the distributed embedded software functions within the
automotive industry. Implementation of the proposed strategy in a real world
setting has given promising results. The broad concept of multi-level reuse of
test results across the test levels of the automotive V-model was found to provide an eﬀective and feasible means of capturing comprehensive test coverage

An Eﬀecective Automotive Cross-Functional Veriﬁcation Strategy

247

information pertaining to the distributed software functions. Consequently, it
has been identiﬁed that there is a critical link between requirements and testing
that needs to be established across all test levels for implementing the multi-level
reuse concept. Such traceability links provide a means to assess the eﬀectiveness
of the current approach to test the distributed software functions by identifying
and mitigating test redundancies and test gaps across the test levels. It hence
helps take more data-driven decisions for integration testing of the distributed
software functions and thereby enhance function test coverage.

References
1. Adenmark, M.: Scania Test Levels, Scania Internal Document (REST08012) (2008)
2. Arnold, D., Corriveau, J.P., Shi, W.: Modeling and validating requirements using
executable contracts and scenarios. In: Proceedings of the 8th ACIS International Conference on Software Engineering Research, Management and Applications (SERA), pp. 311–320. IEEE (2010)
3. Barhate, S.S.: Eﬀective test strategy for testing automotive software. In: International Conference on Industrial Instrumentation and Control (ICIC), pp. 645–649.
IEEE (2015)
4. Barmi, Z.A., Ebrahimi, A.H., Feldt, R.: Alignment of requirements speciﬁcation
and testing: a systematic mapping study. In: Proceedings of the 4th IEEE International Conference on Software Testing, Veriﬁcation and Validation Workshops
(ICSTW), pp. 476–485 (2011)
5. Von der Beeck, M., Margaria, T., Steﬀen, B.: A formal requirements engineering
method for speciﬁcation, synthesis, and veriﬁcation. In: Proceedings of the 8th
Conference on Software Engineering Environments, pp. 131–144. IEEE (1997)
6. Bringmann, E., Kramer, A.: Model-based testing of automotive systems. In: 1st
IEEE International Conference on Software Testing, Veriﬁcation, and Validation,
pp. 485–493 (2008)
7. Conrad, M., Fey, I., Sadeghipour, S.: Systematic model-based testing of embedded
automotive software. Electronic Notes Theor. Comput. Sci. 111, 13–26 (2005)
8. Dhadyalla, G., Kumari, N., Snell, T.: Combinatorial testing for an automotive
hybrid electric vehicle control system: a case study. In: IEEE 7th International
Conference on Software Testing, Veriﬁcation and Validation Workshops (ICSTW),
pp. 51–57. IEEE (2014)
9. Ferrari, A., dell’Orletta, F., Spagnolo, G.O., Gnesi, S.: Measuring and improving
the completeness of natural language requirements. In: Salinesi, C., Weerd, I. (eds.)
REFSQ 2014. LNCS, vol. 8396, pp. 23–38. Springer, Heidelberg (2014). doi:10.
1007/978-3-319-05843-6 3
10. Grimm, K.: Software technology in an automotive company: major challenges. In:
Proceedings of the 25th International Conference on Software Engineering, pp.
498–503. IEEE Computer Society (2003)
11. Kasoju, A., Petersen, K., Mäntylä, M.V.: Analyzing an automotive testing process
with evidence-based software engineering. Inf. Softw. Technol. 55(7), 1237–1259
(2013)
12. Broy, M.: Challenges in automotive software engineering. In: Proceedings of the
28th International Conference on Software Engineering, pp. 33–42. ACM (2006)

248

A. Chunduri et al.

13. Marinescu, R., Saadatmand, M., Bucaioni, A., Seceleanu, C., Pettersson, P.: A
model-based testing framework for automotive embedded systems. In: Proceedings
of the 40th EUROMICRO Conference on Software Engineering and Advanced
Applications (SEAA), pp. 38–47. IEEE (2014)
14. Marshall, M.N.: Sampling for qualitative research. Family Pract. 13(6), 522–526
(1996)
15. Nebut, C., Fleurey, F., Le Traon, Y., Jezequel, J.M.: Automatic test generation: a
use case driven approach. IEEE Trans. Softw. Eng. 32(3), 140–155 (2006)
16. Perez, A.M., Kaiser, S.: Integrating test levels for embedded systems. In: Proceedings of the Testing: Academic and Industrial Conference - Practice and Research
Techniques (TAIC PART), pp. 184–193. IEEE, September 2009
17. Praprotnik, O., Gartner, M., Zauner, M., Horauer, M.: A test suite for system
tests of distributed automotive electronics. In: 2nd International Conference on
Advances in Circuits, Electronics and Micro-electronics (CENICS), pp. 67–70.
IEEE (2009)
18. Radeka, K.: The toyota product development system: integrating people, process
and technology by James M. Morgan and Jeﬀrey K. Liker. J. Prod. Innov. Manage
24(3), 276–278 (2007)
19. Rowley, J., Slack, F.: Conducting a literature review. Manage. Res. News 27(6),
31–39 (2004)
20. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research
in software engineering. Empir. Softw. Eng. 14(2), 131–164 (2008)
21. Saglietti, F.: Testing for dependable embedded software. In: 36th EUROMICRO
Conference on Software Engineering and Advanced Applications (SEAA), pp. 409–
416. IEEE (2010)
22. Shah, U.S., Jinwala, D.C.: Resolving ambiguities in natural language software
requirements: a comprehensive survey. ACM SIGSOFT Softw. Eng. Notes 40(5),
1–7 (2015)
23. Shokry, H., Hinchey, M.: Model-based veriﬁcation of embedded software. IEEE
Comput. 42(4), 53–59 (2009)
24. Siegl, S., Hielscher, K.S., German, R., Berger, C.: Formal speciﬁcation and systematic model-driven testing of embedded automotive systems. In: Proceedings of
the Europe Conference & Exhibition on Design, Automation & Test (DATE), pp.
1–6. IEEE (2011)
25. Somé, S.S.: Supporting use case based requirements engineering. Inf. Softw. Technol. 48(1), 43–58 (2006)
26. Strauss, A., Corbin, J.: Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Sage Publications, Beverly Hills (1990)
27. Sundmark, D., Petersen, K., Larsson, S.: An exploratory case study of testing in
an automotive electrical system release process. In: 6th IEEE International Symposium on Industrial Embedded Systems (SIES), pp. 166–175. IEEE (2011)
28. Tsai, W.T., Bai, X., Paul, R., Yu, L.: Scenario-based functional regression testing.
In: Proceedings of the 25th Annual International Computer Software and Applications Conference (COMPSAC), pp. 496–501. IEEE (2001)
29. Weber, M., Weisbrod, J.: Requirements engineering in automotive development Experiences and challenges. In: Proceedings of the IEEE Joint International Conference on Requirements Engineering, pp. 331–340 (2002)
30. Wilson, C.E.: Triangulation: the explicit use of multiple methods, measures, and
approaches for determining core issues in product development. Interactions 13(6),
46–47 (2006)
31. Yin, R.K.: Case Study Research: Design and Methods. Sage Publications (2013)

Problems and Solutions in Mobile
Application Testing
Triin Samuel and Dietmar Pfahl(&)
Institute of Computer Science, University of Tartu, Tartu, Estonia
{triin.samuel,dietmar.pfahl}@ut.ee

Abstract. In recent years the amount of literature published about mobile
application testing has signiﬁcantly grown. However, it is unclear to what
degree stated problems and proposed solutions are relevant to industry. To shed
light on this issue, we conducted a literature survey to provide an overview of
what current scientiﬁc literature considers problems and potential solutions in
mobile application testing, and how often proposed solutions were reportedly
evaluated in industry. Then we conducted a case study involving six software
companies in Estonia to ﬁnd out which of the problems are considered relevant
by professionals, and which of the proposed solutions are considered novel and
applicable. In total, we identiﬁed 49 potential problems or challenges in the
mobile application testing domain and 39 potential solutions, some of which
were implemented software tools while others were just theoretical concepts.
Although some of the solutions were reportedly applied in practice, in most
cases the literature did not give much information on the actual usage in industry
of the proposed solutions. The case study revealed that while the relevance of
each identiﬁed problem was highly variable from one company to another, there
are some key problems that are generally considered vital both by research and
industry. Regarding solution proposals, it turned out they are often described too
much on the conceptual level or are too unrelated to the most urgent test-related
problems of our case companies to be of interest to them.

1 Introduction
In the recent years, mobile devices have grown from futile entertainment gadgets to
popular and ever-present media with a wide range of uses from social applications to
business, medicine and others. This has brought the importance of testing mobile
applications into highlight. As mentioned by various researchers [1–4] mobile applications have some unique qualities that demand new or modiﬁed testing approaches to
ensure effectiveness and efﬁciency. Accordingly, the number of scientiﬁc papers
written about mobile application testing is steadily increasing. However, it is uncertain
whether the proposed solutions are usable in industry and whether the problems
mentioned in industry are actually relevant in real mobile application development and
testing. In order to ﬁnd answers to these questions, we decided to carry out a literature
survey and followed by a case study to assess the practical relevance of the information
collected from literature. The exact research questions are as follows:

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 249–267, 2016.
DOI: 10.1007/978-3-319-49094-6_16

250

T. Samuel and D. Pfahl

• RQ1: What are the problems speciﬁc to testing of mobile applications as opposed to
conventional applications, according to scientiﬁc literature?
• RQ2: What are the solutions (methods, tools) proposed by literature, if any?
• RQ3: According to literature, to what extent are these solutions used in industry?
• RQ4: Does industry consider the solutions proposed in the literature as relevant?
• RQ5: Does industry consider the solutions proposed in the literature as useful?
The rest of this paper is structured as follows. After a brief overview of the topic
given in Sect. 2, the methodology is introduced in Sect. 3. Section 4 covers the results
of the literature survey presenting answers to research questions RQ1 to RQ3. More
speciﬁcally, it presents what scientiﬁc literature considers being problems in mobile
application testing, which solutions are proposed, and how much the solutions are used
in industry. Section 5 discusses the results of interviews conducted within six companies in order to evaluate how relevant industry considers the problems and solutions
mentioned in literature. This addresses research questions RQ4 and RQ5. Section 6
provides a brief discussion of the limitations of the literature survey and case study.
The paper ends with a summary of the results and conclusions.

2 Background
The ﬁrst device that could be considered a smartphone was IBM Simon [5] released in
1994. Smartphones as we know them today started gaining mainstream popularity only
in year 2007 when the ﬁrst iPhone was released [6]. Since then, smartphone sales have
skyrocketed [7]. What initially were thought to be just enhanced phones and entertainment devices have now developed into a wide range of different devices capable of
performing business tasks, simplifying everyday life and enabling users to be continually connected to their work, social circles and service providers [1, 8]. Mobile
devices are challenging conventional computers [9]. Consequently, the criticality of
mobile applications has signiﬁcantly increased [1, 3]. This has forced developers to
focus more on the quality of their applications.
Testing of mobile applications incorporates many of the problems inherent to
software testing in general. However, mobile devices also have qualities that differentiate them from conventional computers and therefore create testing challenges that
are either unique to or more relevant in the case of mobile applications.
The three dominating mobile operating systems (OS) are Android, iOS and Windows Phone. According to net market share [10], Android was the most popular OS in
the ﬁrst quarter of 2016 with a 60 % market share. iOS followed with 32 %. Windows
Phone was third with 3 %, followed by Java ME having 2 %.
Android is a free open-source operating system based on the Linux kernel and was
released by Google in 2008. Android applications are normally developed in Java,
compiled to Java bytecode and then to Dalvik bytecode to be run on Dalvik virtual
ma-chine (DVM), with most of the code interpreted during runtime. From version 5.0,
DVM has been replaced by Android Runtime (ART) that compiles the application to
machine code during installation. Therefore, even though Android applications are
commonly developed in Java, they cannot be run on Java Virtual Machine.

Problems and Solutions in Mobile Application Testing

251

The second most popular operating system is iOS, a proprietary, closed source
operating system released by Apple in 2007. The iOS operating system can be used
only on Apple devices. Applications for iOS are normally developed either in Swift or
Objective-C. The core of iOS is based on Darwin, a Unix operating system also used
for Apple OS X, and Cocoa Touch is used for the user interface.
Windows Phone (previously Windows Mobile, now Windows 10 Mobile) is a
proprietary closed-source operating system developed by Microsoft and released in
2010. Applications for Windows mobile devices can be developed in various languages
like C#, .NET, C++ and HTML5. The latest mobile operating system from Microsoft
was released as Windows 10 Mobile, reflecting Microsoft’s intention to essentially
merge the desktop and mobile versions of Windows [11] so that same apps could be
run on both of them.

3 Methodology
In this section, we describe our research methods, i.e., how we conducted our literature
survey and case study. We used the guidelines provided in [12, 13] as an orientation
but didn’t follow all of them to the letter.
3.1

Literature Survey

In order to get familiar with the literature, we ﬁrst conducted an informal search in the
ACM Digital Library. We searched for articles related to problems in mobile application testing published in 2007 or later because that was the year when the ﬁrst
iPhone, as well as the ﬁrst alpha version of Android was released. Since we did not use
any additional ﬁltering, we got an excessive amount of results which we ordered based
on relevance. We skimmed through the most relevant search results and manually
chose 26 articles that seemed relevant to the question in hand by title.
Through reading the initial papers, we learned some additional keywords and
search criteria that could be used. We also noticed that most of the results were
conference papers and papers that mentioned problems usually also discussed solutions
to them. We then conducted a second, more formal and structured search for journal
articles. Since some relevant papers might not be indexed by the ACM Digital Library,
we conducted the second search in the following four databases:
•
•
•
•

ACM Digital Library
SpringerLink (Computer science)
Scopus (Computer science)
ScienceDirect (Computer science)

We applied the following inclusion criteria:
• Only journal articles
• Published 2007 or later
• Full-text is available in the database

252

T. Samuel and D. Pfahl

We applied the following exclusion criteria:
• Papers that were mainly about hardware-related, low-level communication or network issues, as opposed to end-user mobile applications.
• Papers to which we do not have full text access rights.
• Articles that do not analyse or make new contributions to the testing process itself;
for example, if the paper was about developing a non-testing-related mobile
application and at the end it was tested just to prove that the application works, then
the article is not really about testing, even though it features it.
• Papers that are about mobile web application testing. Since web applications run in
a browser or in a browser-like program, they don’t inherit many of the challenges
that native mobile applications have and are often more similar to web applications
meant for desktop devices than to native mobile applications [14].
• Testing techniques that are not meant for consumer-oriented mobile applications.
The exact search queries can be found in Appendix I of [15]. The second, formal,
search yielded 374 results, 355 of which were unique. Out of these, 84 were left after
manual ﬁltering based on the title. Therefore, the total set of abstracts to read was 26 +
84 = 110. Based on the abstract, 57 papers were discarded, which results in a set of 53
papers to read. While reading we discarded two papers because one had low relevance
and one was superseded by another more recent publication. This left us with 50
publications.
Extracting Problems and Solutions. For each of the selected publications, we
highlighted problems and solutions mentioned. If a solution was proposed in the paper,
we assigned an approximate category to it and wrote the most important keywords
concerning the solution to the front page. After reading all publications, we went
through all the highlighted parts concerning RQ1 and wrote out all the found problems.
Researchers rarely used the word ‘problem’, but often highlighted ‘challenges’ to
justify the necessity of the solution they were going to propose. Offering a solution
clearly shows that they considered the ‘challenge’ something that needed to be solved,
so we counted these as problems. Some problems were also collected from general
discussion parts of the papers.
After extracting a list of problems, we went through the papers again to write
summaries of the proposed solutions (RQ2). The solutions were based mostly on the
highlighted parts and the keywords we had written on the papers while reading, but
details often needed to be clariﬁed from other parts of the paper.
Assessing Whether Proposed Solutions were Evaluated in Industry. After we had
extracted the list of solutions from the selected papers, we analysed the papers once
again with the goal to ﬁnd evidence that proposed solutions had actually been evaluated
or applied in industry (RQ3).

Problems and Solutions in Mobile Application Testing

3.2

253

Case Study

Selection of Industry Professionals. We compiled a set of 23 potentially relevant
companies based on a Google search and our existing knowledge about software
companies in Estonia. Then we explored web sites of the companies and selected those
which fulﬁlled the following criteria:
• Operate in Estonia
• Deal with testing of native mobile applications. If a company develops native
mobile applications, then we assumed that testing is done unless the home page
hints that it is outsourced
• Are not a one-person company
• Seem professional enough to pay attention to the testing process
This restricted the list to seven companies, which we contacted. Five of the contacted
companies replied and were willing to participate. In addition to these, one of the
chosen companies put us in contact with a very suitable, but less known company that
we were not aware of, which also agreed to participate. This resulted in a total of 6
companies to interview. We then asked our contacts to suggest interviewees involved
in testing native mobile applications. In two cases we used pre-existing in-company
contacts to ﬁnd a suitable person in the company to interview.
Participating Companies. The following companies were selected:
• Fob Solutions: a mobile-oriented quality assurance company that on the side also
provides development of web and native mobile applications. Fob Solutions has
about 20 testers and some developers who work with Android, iOS and Windows
Phone. We talked to the head of quality assurance.
• Testlio: a company that provides a community-based testing service. Testlio
manages the testing process and prepares everything necessary, but actual testing is
performed by a network of approximately 200 freelance testers who are not
employees of Testlio. Testlio works with Android, iOS, Windows Phone and to a
lesser degree BlackBerry. In Testlio testing is performed manually and the company
doesn’t diagnose the found problems. The company does have its own platform to
facilitate testing but it mostly has management functionalities, not test running or
generation. We interviewed a QA manager that we knew prior to the interview.
• TestDevLab: a quality assurance company that in addition to the more common
testing services also provides battery, penetration and data usage testing. About 50
people are involved in Android, iOS and Windows Phone applications testing in
TestDevLab. TestDevLab QA engineers are not oriented to a certain platform,
therefore my interviewee had worked with different platforms (web, iOS, Android)
in different projects. TestDevLab owns a test automation tool called Apimation1.
The company has its headquarter in Latvia but it is common for their employees to

1

https://apimation.com.

254

T. Samuel and D. Pfahl

temporarily move to where the client is located. Therefore, we got a chance to talk
to one of their QA engineers who lives in Estonia.
• Wazombi: a company focused on providing end-to-end solutions where everything
from electrical engineering to UI design is done in one house. Wazombi works with
Android and iOS, but as learned from the interview, most of their Android applications are not Java-based. Instead, Xamarin and C# are used. Xamarin also constitutes the only test generation tool mentioned by case study participants. Since the
company is more oriented on development, it has only one person speciﬁcally
assigned to mobile application testing, whom we interviewed.
• MoonCascade: a company that mainly provides mobile, responsive web and
back-end development. From mobile platforms, Android, iOS and Windows Phone
are used. There are four people working at mobile application testing. Some testing
frameworks like Appium and Selendroid are used for test running. We interviewed
the lead of the quality assurance team.
• Mobi Lab: a mobile application design and development company, formerly a part
of current parent company Mobi Solutions. They work with Android, iOS and
Windows Phone. We interviewed the only dedicated tester, but developers are also
responsible for testing the applications that they are making.
Producing a Problem-Solution Matrix. Explaining every solution proposed in the
literature to our interviewees would have resulted in unrealistically long and inefﬁcient
interviews. Therefore, we planned to only present solutions to problems that interviewees previously identiﬁed as being highly relevant to them. Since we did not have
any information about the perceived relevance of each problem prior to the interview, it
was not possible to choose the set of solutions to explain beforehand. We needed a
mapping of problems and solutions that we could use during the interview to choose
which solutions to explain.
In order to ﬁnd out which problems a given solution solves, we used the knowledge
of the solutions that we had gained from reading the papers, as well as the problems
and challenges that researchers presented as justiﬁcations for their solution. For each
problem-solution combination there were 4 options:
•
•
•
•

‘Y’ - a proposed solution signiﬁcantly contributes to solving the given problem
‘Partly’ - partly solves the problem
‘Maybe’ - might be useful, but more information is needed to know
Blank - a proposed solution does not address this problem

This resulted in the problem-solution matrix described in Sect. 5. Having this
matrix handy allowed us to present exactly those solutions that were related to the most
urgent problems highlighted by the interviewee, no matter which problem that was.
The Interview Process. The interview structure was as follows:
1. We introduced the research problem and collected some general information about
the company. This information included the number of employees involved in
testing mobile applications, whether the company is oriented at testing or development, mobile platforms the company works with, and experience with using or

Problems and Solutions in Mobile Application Testing

255

developing automated solutions for mobile application testing. In addition to this,
before showing the list of problems acquired from literature, we asked whether the
interviewee sees any notable challenges in mobile application testing.
2. We presented the list of testing problems found from literature and asked the
interviewee to rate the relevance of each problem for their actual mobile application
testing. The answers were given on a multiple choice scale that also included
options for “N/A” and “Already solved”. The questionnaire can be found in
Appendix II of [15].
3. We looked at those problems the interviewee considered important (marked as
“Deﬁnitely”) and used the Problem-Solution mapping presented in Sect. 5 to extract
the set of corresponding solutions proposed in literature. Thereafter, we introduced
these solution ideas to the professional and asked which of them seem potentially
useful in practice. Since the respondents were only interested in practically applicable solutions and time was scarce, we omitted articles that were very general or
too theoretical.
The time planned for each interview was 1.5 h. The ﬁrst interview part took about
10 min while the duration of the second part was dependent on how fast the interviewee ﬁlled out the questionnaire, averaging at about 30 min. Duration of the third
part was affected by how many problems the interviewee considered relevant in the
questionnaire. Two respondents ﬁlled out the questionnaire very fast, which resulted in
these interviews taking only 1 h. One interview was extended to 2 h because there were
many potentially relevant solutions to present and discuss.

4 Results from the Literature Survey
The results from the literature survey were used to answer research questions RQ1-3:
• RQ1: What are the problems speciﬁc to testing of mobile applications as opposed to
conventional applications, according to scientiﬁc literature?
• RQ2: What are the solutions (methods, tools) proposed by literature, if any?
• RQ3: According to literature, to what extent are these solutions used in industry?

4.1

Findings Related to RQ1

In this sub-section, we give an overview of problems and challenges (in the following
subsumed under the term ‘problems’) that are speciﬁc to or especially relevant in the
testing of mobile applications (cf. Table 1). We grouped problems according to their
core causes. In reality, however, each problem can have more than one cause and, thus,
the grouping shown in Table 1 should be taken as an approximation made in an effort
to simplify reading. A detailed description of each problem can be found in [15]. In
total 49 problems were identiﬁed.

256

T. Samuel and D. Pfahl
Table 1. List of identiﬁed problems grouped by core cause

Core cause
Fragmentation (large variety of platforms
with different operating systems, hardware,
storage, and screen sizes)
External software dependencies
Frequent external communication

Variable users and usage contexts

Fast evolution
Limited resources
Novelty

Limitations related to platform
implementation.
Other problems

4.2

Problems with references
P1 [3, 18, 20, 22–25] - P2 [1] - P3 [23, 26] P4 [27, 28] - P5 [23] - P6 [22, 26] - P7 [22]
P8 [1, 29] - P9 [30] - P10 [31]
P11 [1, 9, 16, 19] - P12 [19, 24, 29, 31–35] P13 [1, 23, 24, 36] - P14 [1, 37] - P15 [38] P16 [9, 21, 28, 32] - P17 [9, 39] - P18 [39] P19 [39]
P20 [40, 41] - P21 [21, 42] - P22 [21, 28, 43–
45] - P23 [26] - P24 [35, 43] - P25 [21, 43,
46] - P26 [33, 45] - P27 [43]
P28 [16, 32] - P29 [18, 21, 22, 47] - P30 [9,
46]
P31 [24, 35] - P32 [35] - P33 [1] - P34 [46]
P35 [32, 41, 48–50] - P36 [21, 32, 46, 51] P37 [36] - P38 [41, 51] - P39 [9, 21, 32, 43,
52]
P40 [23, 29, 39, 48] - P41 [48] - P42 [19, 53]
- P43 [54] - P44 [1, 24] - P45 [55] - P46 [32,
56] - P47 [35]
P48 [9, 16] - P49 [9]

Findings Related to RQ2

In this sub-section, we present the tools and methods proposed in the literature for
solving the problems described in the previous sub-section (cf. Table 2). The solutions
Table 2. List of identiﬁed solutions grouped by type
Type
Theoretical
General tools &
methods
GUI-based testing
Record-and-replay
Model-based
Model-learning
Search-based
Performance testing
Reliability testing
Compatibility
Usability and user
testing
Security testing

Solutions with references
S1 [4] - S2 [21] - S3 [27] - S4 [47] -S5 [33] -S6 [57] - S7 [23]
S8 [35] - S9 [17] - S10 [29] - S11 [31, 34, 48] - S12 [50] - S13
[26]
S14 [52, 58] - S15 [16, 59]
S16 [56]
S17 [40] - S18 [30]
S19 [55, 60] - S20 [61] - S21 [24]
S22 [61–64]
S23 [36]
S24 [38] - S25 [19] - S26 [39]
S27 [22] - S28 [20] - S29 [18]
S30 [65] - S31 [32] - S32 [54] - S33 [28] - S34 [46] - S35 [45]
S36 [37] - S37 [66] - S38 [25] - S39 [49]

Problems and Solutions in Mobile Application Testing

257

are grouped by type. A detailed description of each solution can be found in [15]. In
total, 39 solutions were identiﬁed.
4.3

Findings Related to RQ3

In this sub-section, we discuss to what extent proposed solutions (methods and tools)
have reportedly been used in industry.
Most of the solutions listed in Sect. 4.2 were evaluated either on one or a few
applications familiar to the researchers or on a more representative set of applications
acquired from app stores. However, in both cases the evaluation was performed by the
researchers themselves, usually in a controlled environment. Only one paper explicitly
mentioned that their proposed solution was used in a company, i.e. Swisscom (S13).
Also some proposed tools were evaluated on published apps (S16, S19, S21, S25, S37,
S39) and one was partly tested on apps currently under development (S39).
However, publications were (co-)authored by individuals with a company afﬁliation. This applies to S1 (both authors afﬁliated with Microsoft), S5 (both authors
afﬁliated with Nokia Research Center), S6 (one of the authors afﬁliated with Fujitsu
Laboratories), S8 (all authors afﬁliated with Microsoft Research), S10 & S11 (one of
the authors afﬁliated with NASA Ames Research Center), S25 (three out of four authors
afﬁliated with Microsoft Research), S29 (ﬁrst author is one of the founders of the
TestDroid testing platform), S30 (one of the two authors afﬁliated with Ericsson
Research), S35 (one of the authors afﬁliated with Telecom Italia). Even though in these
cases it was not mentioned in the paper whether the proposed solution was evaluated in
industry, it is highly probable that some of them are used in the companies to which
(some of) the authors are afﬁliated. It is worth noting, however, that most of the
afﬁliations are with research units within large corporations. Therefore, results reported
in the related papers might not have been applied in the business units of these companies, and they might not be suitable for problems in the in the rest of the industry, i.e.
in small and mid-sized companies.

5 Results from the Case Study
The results from the case study were used to answer research questions RQ4-5:
• RQ4: Does industry consider the solutions proposed in the literature as relevant?
• RQ5: Does industry consider the solutions proposed in the literature as useful?
In order to make the interviews conducted during the case study more efﬁcient, we
developed a problem-solution mapping based on the results from the literature survey.
For easier viewing, we made the original ﬁle available on Dropbox2. Note that the
columns in grey indicate solutions that were not presented in detail to the interviewees
because they related to problems that were not of high priority for the interviewees.

2

https://www.dropbox.com/s/ia8vgjr7a8ppxkr/Problem-solution%20matrix.ods?dl=0.

258

T. Samuel and D. Pfahl

5.1

Findings Related to RQ4

Five of the six interviewed companies said that fragmentation was a signiﬁcant
problem in mobile application testing even before seeing the list of problems proposed
in the literature. Testlio was the only company that didn’t consider fragmentation as a
signiﬁcant problem because their community-based approach already ensures a high
number of different platform, OS version, device, and screen size combinations. The
interviewee from Testlio mentioned two challenges in mobile application testing.
Firstly, there is a lack of ﬁne-grained tools that testers could use to record GUI
interactions leading to a fault. The ideal approach would be able to capture videos,
screenshots with click positions and have better logs than the current approaches. The
second problem was applications that need to be tested in very speciﬁc geographical
locations, especially on iOS where location information is more difﬁcult to mock than
on Android.
The questionnaire answers provided by all of the participating companies are listed
in Table 3. Questions corresponding to each question number can be found in
Appendix II of [15]. Basically, interviewees were asked to tell for each problem P1 to
P49 mentioned in the literature whether it is also a problem for them. The answer
choices were: ‘Deﬁnitely’ – ‘Maybe’ – ‘Probably not’ – ‘Deﬁnitely not’, plus the
answer options ‘n/a’ (not applicable) and ‘Solved’ (in case the problem existed but has
been solved in the meanwhile). The ﬁrst column shows the problem ID together with
an indicator representing the relevance of the problem for the case companies. The
symbol ‘++’ indicates that at least four companies found the problem relevant (at least
four times ‘Deﬁnitely’), ‘+’ indicates that two or three companies found the problem
relevant (two or three times ‘Deﬁnitely’), and ‘−’ indicates that no company felt
strongly that the problem is relevant (none of the companies stated ‘Deﬁnitely’).
The responses from the three companies that mainly focus on testing are displayed
on the left while companies whose main area of business is mobile application
development are displayed in the right half of Table 3.
The surveyed companies didn’t agree on the relevance of any of the listed problems. However, some patterns can be pointed out. 18 of 49 listed problems were not
considered relevant by all case companies. For example, testing inter-application
communications (P14) and more sophisticated testing techniques like simulating
external dependencies (P10), automatic page-load detection (P40) and ensuring completely clean application restart (P43) were never mentioned to be deﬁnitely relevant.
None of the companies considered the lack of testing methods, tools or theory a
signiﬁcant problem (P33–P35, P37). Modelling applications before testing was not
popular and some companies mentioned that testing on all devices is not needed
because a set of supported devices is chosen before development. It can be argued that
this practice of choosing a set of supported devices itself shows that testing an
application on all potentially suitable devices is too difﬁcult, expensive or
time-consuming.
Some problems not considered highly relevant by companies mainly focusing on
development were still considered potentially problematic by testing companies. These
included acquiring a mental model of a complex application (P9), the unpredictability
of external dependencies during testing (P12), ignoring unexpected user behavior

Problems and Solutions in Mobile Application Testing
Table 3. Relevance of problems to our case companies
Problem Testing companies
P1 ++
P2 +
P3 +
P4
P5 +
P6 ++
P7 +
P8 +
P9
P10 −
P11
P12 +
P13 +
P14 −
P15 −
P16 +
P17 −
P18
P19 +
P20 +
P21 +
P22
P23 +
P24
P25
P26
P27 +
P28 +
P29 −
P30 −
P31
P32
P33 −
P34 −
P35 −
P36 −
P37 −
P38 +
P39 −
P40 −
P41 −
P42 +
P43 −
P44
P45 −
P46 −
P47 −
P48
P49 +

Development companies

Fob Solutions Testlio

TestDevLab

Wazombi

MoonCascade Mobi Lab

Deﬁnitely
Solved
Deﬁnitely
Maybe
Maybe
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Probably not
Maybe
Probably not
Deﬁnitely
Deﬁnitely not
Probably not
Maybe
N/A
Maybe
Probably not
Probably not
Deﬁnitely
Maybe
Maybe
Deﬁnitely
Maybe
Probably not
Probably not
Deﬁnitely
Deﬁnitely
Probably not
Probably not
Maybe
Deﬁnitely not
Probably not
Probably not
Maybe
Solved
Solved
N/A
N/A
Probably not
Solved
N/A
Probably not
Solved
N/A
Deﬁnitely not
Deﬁnitely not
N/A
N/A

Deﬁnitely
Probably not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely
Deﬁnitely
Deﬁnitely not
Deﬁnitely
Deﬁnitely
Probably not
Deﬁnitely
Deﬁnitely
Deﬁnitely
Probably not
Deﬁnitely not
Probably not
Maybe
Deﬁnitely
Deﬁnitely not
Probably not
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Probably not
Deﬁnitely not
Probably not
Maybe
Deﬁnitely
Maybe
Maybe
Maybe
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Probably not
Probably not
Deﬁnitely
N/A
Deﬁnitely not
Deﬁnitely not
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
N/A
Deﬁnitely not
Probably not
Probably not
Deﬁnitely not

Maybe
Deﬁnitely
Maybe
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Solved
Probably not
Deﬁnitely not
Maybe
Solved
Probably not
Solved
Probably not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Maybe
Probably not
Probably not
Solved
Solved
Deﬁnitely
Probably not
Probably not
Deﬁnitely not
Probably not
Probably not
Maybe
Deﬁnitely not
Maybe
Deﬁnitely not
Solved
Solved
Deﬁnitely not
Solved
Probably not
Deﬁnitely
N/A
Deﬁnitely not
Solved
Deﬁnitely not
Deﬁnitely not
Solved
Solved
Probably not
Solved
Maybe
Deﬁnitely

Deﬁnitely
Deﬁnitely
N/A
Probably not
Maybe
Deﬁnitely
Deﬁnitely
Deﬁnitely
Deﬁnitely not
Probably not
Maybe
Probably not
Deﬁnitely
Maybe
Probably not
Maybe
Maybe
Maybe
Deﬁnitely
Deﬁnitely
Probably not
Deﬁnitely
Deﬁnitely
Deﬁnitely
Maybe
Probably not
Deﬁnitely
Probably not
Maybe
Deﬁnitely not
Deﬁnitely
Probably not
Probably not
Maybe
Probably not
Solved
Solved
Deﬁnitely not
N/A
Probably not
Solved
Deﬁnitely
Solved
Solved
Solved
Maybe
Solved
Probably not
Deﬁnitely

Solved
N/A
N/A
Solved
Solved
Solved
N/A
Deﬁnitely
Maybe
Maybe
N/A
Maybe
N/A
Maybe
Solved
Deﬁnitely
Maybe
N/A
N/A
Deﬁnitely
Deﬁnitely
Maybe
N/A
Solved
Deﬁnitely
Deﬁnitely
Deﬁnitely
Maybe
N/A
N/A
Maybe
Maybe
N/A
Maybe
N/A
Maybe
Maybe
N/A
N/A
N/A
Maybe
N/A
Probably not
Deﬁnitely
N/A
Probably not
Maybe
Deﬁnitely
Maybe

Deﬁnitely
Maybe
Deﬁnitely
Deﬁnitely not
Deﬁnitely
Deﬁnitely
Deﬁnitely
Probably not
Deﬁnitely not
Deﬁnitely not
Probably not
Deﬁnitely not
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Deﬁnitely
Deﬁnitely not
Deﬁnitely not
Deﬁnitely
Maybe
Maybe
Probably not
Maybe
Probably not
Probably not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Maybe
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
Deﬁnitely not
N/A
Probably not
Probably not
Solved
Probably not
Solved
Solved
Probably not
Deﬁnitely not
Probably not
Deﬁnitely not
Deﬁnitely not

259

260

T. Samuel and D. Pfahl

(P15), users’ variable mobile device usage experience (P21), insufﬁcient OS failure
logging (P30) and the usability and accessibility aspects of complex input mechanisms
(P32). We suppose that testing companies do testing more thoroughly or are just more
aware of their testing processes. In addition to this, if testing is performed by developers or at least in the same company, then the people doing the testing probably have a
better overview of how the application is intended to function.
Since Testlio was the only company that actively uses a community-based testing
approach as opposed to just testing in-house, different problems are sometimes considered relevant by them. Notably, fragmentation (P1) and the large number of test
devices to buy (P6) that were considered problems by most companies are not a
problem for Testlio because their testers use personal devices for testing. On the other
hand, they are subject to some challenges that are not relevant for any other companies.
For example, since their testers are working remotely, they need more advanced UI
recording tools (P44) than companies that perform testing locally. The large number of
test devices (P6) is also not a problem for Wazombi who mainly provides end-to-end
services that include both hardware and software development.
The lack of design principles (P36) was considered already solved by guidelines
provided by mobile operating systems and cross-platform principles were not considered necessary. The problem of not being able to modify a mobile application after
installing (P41) was considered solved by either automatic updates provided by app
stores or specialized software that can be embedded into applications for A/B testing.
The fact that testing is expected to be faster for mobile applications was not considered
a big obstacle because mobile applications on average were said to contain less
functionality than desktop applications.
5.2

Findings Related to RQ5

In this sub-section we present the results regarding the extent to which industry professionals interviewed in our case study consider the solutions provided in literature
potentially useful in practice (cf. Table 4).
11 solutions were presented to industry professionals based on the problems that
they considered relevant. None of the solutions were uniformly accepted by the
companies, although solution S26 (An approach for amplifying exception handling
code) was considered useful by all the companies that found it applicable.
Solution S8 (MobiBug) was presented to all companies. Respondents from Testlio,
TestDevLab and Mobi Lab considered it potentially useful. Wazombi commented that
since even devices of the same model don’t function completely identically, a model
that assumes they do might be inaccurate. MoonCascade said that nowadays OS
built-in logging is already more ﬁne-grained than stated in the article and 3rd party
libraries for monitoring fault conﬁgurations exist, there-fore this solution already exists.
None of the interviewees considered solution S9 (iTest) a useful innovation. Some
mentioned that a solution of this kind already exists. Others were skeptical of whether
this approach would work well because people rarely give any feedback when things
work (Mobi Lab) and it is difﬁcult to ensure a full variety of user proﬁles in registered
testers (Testlio). One company expressed that the success of this approach highly

Problems and Solutions in Mobile Application Testing

261

depends on the tester incentive mechanism. Therefore, the technical solution alone does
not bring much value.
Solution S10 (Symbolic execution of Android apps) was presented to two companies. TestDevLab considered it potentially useful while Wazombi said the solution
would not applicable for them because it can only handle applications written in Java.
Solution S11 (JPF-Android) is not applicable to Wazombi whose applications are
not Java-based. TestDevLab was hesitant about whether this would work and
MoonCascade said the tool would be useful if it could emulate drivers of all kinds of
sensors and developers manage to keep the tool up to date with new OS versions.
Most of the companies liked the concept of solution S16 (VALERA) and thought it
would be useful. Wazombi was more skeptical due to the fact that VALERA does not
record memory operations.
Solution S21 (Tool with 2 approaches for automated model-based testing) was only
presented to the Testlio representative who found it useful.
Solution S23 (Unit-testing performance) was presented to Mobi Lab and TestDevLab. Mobi Lab found it useful while the latter commented that the duration of
method execution can depend on things outside the developer’s control, e.g. network
conditions, therefore duration of execution cannot be accredited to just performance.
TestDevLab considered solution S25 (VanarSena) useful. Mobi Lab said that it is
already used for Windows Phone applications.
Solution S26 (Approach for amplifying exception-handling code) was presented to
4 companies. For Wazombi, this solution wasn’t applicable due to being Java-based,
but the others considered it useful.
Regarding solution S28 (Knowledge base for compatibility testing) Mobi Lab
thought it could work and Wazombi said it could work partly, for API version based
problems. MoonCascade was skeptical about how an appropriate level of granularity
could be set for recording results – if every combination of application version, device,
OS version, etc. would be recorded separately then very few queries would get a reply
from the database while in other cases there is a high probability of over-generalization.
TestDevLab said that a solution like this is probably already integrated to some testing
software.
Solution S29 (TestDroid) was not very well-received. Sob Solutions and Mobi Lab
said that this solution already exists. TestDroid itself is available online and is not the
only cloud-based testing platform. Testlio said that in principle the approach is plausible while TestDevLab thought it might be useful only for small teams that do not
have access to an extensive set of test devices. MoonCascade was also of the opinion
that for companies of signiﬁcant size, it is better to have their own set of devices as
cloud-based solutions are expensive, unreliable and do not have support for various test
styles and frameworks.
In total, there were only 3 solutions that were considered relevant by all companies
to which they were presented and who found them applicable, i.e. S10 (2 companies),
S21 (1 company), and S26 (3 out of 4 companies). None of the respondents considered
S9 a good solution and most were skeptical about S28 and S29.
Upon hearing the solution concepts, many interviewees expressed that the general
concept of the solution is familiar to them or already exists. However, they had not
marked the corresponding problems as ‘Solved’ in the questionnaire part of the

262

T. Samuel and D. Pfahl

interview. This implies that the concepts they already knew either do not fully solve the
proposed problem or the professionals have not thought about using this concept to
solve the given problem. The latter is compatible with our general observation that
companies seem to consider the new challenges of mobile application testing inevitable
and thus do not think about the possibility to eliminate them with the help of new
methods and tools. In that sense, help from the scientiﬁc community could actually
help if they were considered more seriously.
Several proposed solutions were considered to be either too theoretical, general or
relating to problems that were already solved by the case companies. The latter point
can to some degree attributed to the fact that we included articles published from 2007,
i.e. almost 10 years ago, in the literature study. However, the two least supported
solution concepts, TestDroid and iTest, were published in 2014 and 2012, respectively.
Therefore, either the ﬁeld of mobile application testing is developing so fast that only
papers published less than two years ago provide practical value for companies or
research is sometimes detached from the current problems in industry.
Another observation was that additional attention could be paid to the fact that
companies use different tools for developing and testing mobile applications. For
example, not all Android applications are developed in Java and cloud-testing platforms would be more useful if they supported different testing frameworks.

6 Discussion of Limitations
In the following, we summarize the limitations of our literature survey and case study.
6.1

Limitations of the Literature Survey

The literature study was performed by one person (the ﬁrst author) within a limited
amount of time (four weeks). Due to this resource limitation a systematic literature
study following the guidelines deﬁned in [12] to the letter was not viable. Therefore, it
is possible that some relevant papers were either not found or ﬁltered out incorrectly
when applying the deﬁned search strings and inclusion/exclusion criteria. Also, there
was not enough time for conducting a thorough quality assessment of the included
papers. Additionally, since the information was extracted from papers by just one
person and without previously specifying what constitutes a problem or solution, the
analysis is bound to be somewhat subjective, although we tried to mitigate this problem
by having another person (the second author) review the results of the paper identiﬁcation and selection. The limitation applies to the linking of problems to their potential
solutions in the problem-solution matrix. Lastly, since papers from 2007 to 2016 were
used in the study, it is possible that some of the problems mentioned in the literature
have been solved and thus the problems have become obsolete. Also, Android was
signiﬁcantly more represented than other platforms in the set of found papers and
therefore many of the found problems and solutions concern mobile applications on the
Android platform.

Problems and Solutions in Mobile Application Testing

263

Table 4. Relevance of solutions
Solution Company
Fob Solutions
S8
No
S9
Exists
S10
S11
S16
S21
S23
S25
S26
S28
S29
Exists

6.2

Testlio Wazombi TestDevLab
Yes
No
Yes
Partly No
No
n/a
Yes
n/a
Maybe
Yes
No
Yes
Yes
No
Yes
n/a
Yes
Partly
Exists
Yes
Partly

MoonCascade Mobi Lab
Exists
Yes
Exists
No
Maybe
Yes

Yes

Yes
No
No

Yes
Exists
Yes
Yes
Exists

Limitations of the Case Study

In our case study, we used [13] as a guideline but due to time and resource constraints,
we didn’t follow all recommendations to the letter. While both testing and development
oriented companies in Estonia were included in the study, the initial list of companies
was compiled mostly opportunistically, i.e. where the ﬁrst author had some previous
knowledge and (in some cases) personal contacts. Therefore, it is likely that the participants of this study are not fully representative for all companies in Estonia doing
mobile application testing. During the interviews, participants were asked to assess the
potential suitability of some solutions proposed in literature. Since it would be
unreasonable to expect participants to read the relevant scientiﬁc articles, we shortly
explained each solution concept that the interviewees were asked to assess on the spot.
As a result, our personal bias and the quality of our explanations might have had an
effect on the perceived usefulness of the solutions. Due to time constraints not all
potential solutions were presented. In each case, the decision of which solutions to
present was made based on the prioritization of problems and using the
problem-solution matrix. This creates the possibility that the set of solutions proposed
to the case companies might not have been complete.

7 Summary and Conclusion
We conducted a two-staged study involving a literature survey and a case study to ﬁnd
answers to ﬁve research questions concerning challenges and solutions of mobile
application testing as seen by researchers and industry.
In the attempt to answer RQ1 and RQ2, 49 problems and 39 potential solutions
were extracted in our literature survey. These lists answer research question 1 and 2,
respectively. For RQ3, the result is less clear. Even though only one paper speciﬁed
that the proposed solution is already used in industry, it is likely that some of the others

264

T. Samuel and D. Pfahl

are as well, considering that many authors were associated with companies active in the
industry. Therefore, it can be said that the solutions are used in industry, but the extent
of this usage cannot be adequately determined just based on scientiﬁc litera-ture. For
RQ4, none of the problems mentioned in literature were considered uniform-ly relevant
by all industry professionals. However, most companies considered frag-mentation a
serious problem and usually mentioned it before being handed the ques-tionnaire.
Many of the problems mentioned in literature were not considered im-portant by our
case companies. Regarding RQ5, many of the solutions proposed in literature were too
general, too little evaluated, or too little related to the most relevant problems, to be
explained to and discussed with the professionals in sufﬁcient detail. And of those that
were presented and discussed, only a subset was uniformly consid-ered useful while
others were said to already exist (i.e., are already implemented) or have signiﬁcant
shortcomings (and thus would not be considered for implementation.
In conclusion, research literature is addressing some problems that are considered
very important by our case companies. However, there seems to be room for making
research more useful for industry since many of the currently proposed solutions are
considered as too much conceptual and too little practical by professionals.
Acknowledgements. We would like to thank Fob Solutions, Testlio, Mobi Lab, Wazombi,
TestDevLab and MoonCascade. This research was supported by the Estonian Research Council.

References
1. Muccini, H., Di Francesco, A., Esposito, P.: Software testing of mobile applications:
challenges and future research directions. In: Proceedings of AST 2012, Piscataway, NJ,
USA (2012)
2. Paul, S.: Role of mobile handhelds in redeﬁning how we work, live and experience the world
around us: some challenges and opportunities. In: Proceedings of SIGCOMM 2010, New
Delhi, India (2010)
3. Wasserman, A.I.: Software engineering issues for mobile application development. In:
Software Engineering Issues for Mobile Application Development, Santa Fe, New Mexico,
USA (2010)
4. Santos, A., Correia, I.: Mobile testing in software industry using agile: challenges and
opportunities. In: Proceedings of ICST 2015, Graz, Austria (2015)
5. N. T., Did you know what was the ﬁrst smartphone ever? PhoneArena, 31 July 2014. http://
www.phonearena.com/news/Did-you-know-what-was-the-ﬁrst-smartphone-ever_id58842.
Accessed 10 May 2016
6. Apple Inc., Apple Reinvents the Phone with iPhone. http://www.apple.com/pr/library/2007/
01/09Apple-Reinvents-the-Phone-with-iPhone.html. Accessed 12 May 2016
7. Statista Inc., Number of smartphones sold to end users worldwide from 2007 to 2015 (in
million units). http://www.statista.com/statistics/263437/global-smartphone-sales-to-endusers-since-2007/. Accessed 14 May 2016
8. Martinie, C., Palanque, P.: Design, development and evaluation challenges for future mobile
user interfaces in safety-critical contexts. In: Proceedings of the 2015 Workshop on Future
Mobile User Interfaces, Florence, Italy (2015)

Problems and Solutions in Mobile Application Testing

265

9. Corral, L., Sillitti, A., Succi, G.: Software assurance practices for mobile applications.
Computing 97(10), 1001–1022 (2015)
10. Net Applications, Mobile/Tablet Operating System Market Share January–March 2016.
https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=
8&qpcustomd=1&qpsp=68&qpnp=1&qptimeframe=Q&qpmr=10&qpdt=0&qpct=3. Accessed 12 May 2016
11. Albanesius, C.: Nadella Raises Eyebrows With Plans to ‘Streamline’ Windows, PC
Magazine, 23 July 2014. http://www.pcmag.com/article2/0,2817,2461253,00.asp. Accessed
13 May 2016
12. Kitchenham, B.A., Charters, S.: Guidelines for performing systematic literature reviews in
software engineering. Technical report EBSE-2007-01, School of Computer Science and
Mathematics, Keele University (2007)
13. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in
software engineering. Empirical Softw. Eng. 14(2), 131–164 (2009)
14. Charland, A., Leroux, B.: Mobile application development: web vs. native. Commun. ACM
54(5), 49–53 (2011)
15. Samuel, T.: Problems and solutions in mobile application testing, MSc thesis, University of
Tartu, Estonia (2016). https://comserv.cs.ut.ee/ati_thesis/datasheet.php?id=54422&year=
2016
16. Zhifang, L., Bin, L., Xiaopeng, G.: Test automation on mobile device. In: Proceedings of the
5th Workshop on Automation of Software Test, Cape Town, South Africa (2010)
17. Yan, M., Sun, H., Liu, X.: ITest: testing software with mobile crowdsourcing. In:
Proceedings of CrowdSoft 2014, Hong Kong, China (2014)
18. Kaasila, J., Ferreira, D., Kostakos, V., Ojala, T.: Testdroid: automated remote UI testing on
android. In: Proceedings of 11th International Conference on Mobile and Ubiquitous
Multimedia, Ulm, Germany (2012)
19. Ravindranath, L., Nath, S., Padhye, J., Balakrishnan, H.: Automatic and scalable fault
detection for mobile applications. In: Proceeidngs of MobiSys 2014, Bretton Woods, New
Hampshire, USA (2014)
20. Ham, H., Park, Y.: Designing knowledge base mobile application compatibility test system
for android fragmentation. Intl. J. Softw. Eng. Appl. 8(1), 303–314 (2014)
21. Tang, L., Yu, Z., Zhou, X., Wang, H., Becker, C.: Supporting rapid design and evaluation of
pervasive applications: challenges and solutions. Pers. Ubiquit. Comput. 15(3), 253–269
(2011)
22. Galindo, J.A., Turner, H., Benavides, D., White, J.: Testing variability-intensive systems
using automated analysis: an application to Android. Softw. Qual. J. 42(2), 365–405 (2014)
23. Baride, S., Dutta, K.: A cloud based software testing paradigm for mobile applications.
SIGSOFT Softw. Eng. Notes 36(3), 1–4 (2011)
24. Azim, T., Neamtiu, I.: Targeted and depth-ﬁrst exploration for systematic testing of Android
apps. ACM SIGPLAN Not. 48(10), 641–660 (2013)
25. Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D.,
McDaniel, P.: FLOWDROID: precise context, flow, ﬁeld, object-sensitive and
lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Notes 49(6), 259–269
(2014)
26. Haller, K.: Mobile testing. SIGSOFT Softw. Eng. Notes 38(6), 1–8 (2013)
27. Bastien, J.C.: Usability testing: a review of some methodological and technical aspects of the
method. Intl. J. Med. Inf. 79(4), e18–e23 (2010)
28. Ma, X., Yan, B., Chen, G., Zhang, C., Huang, K., Drury, J., Wang, L.: Design and
implementation of a toolkit for usability testing of mobile apps. Mob. Netw. Appl. 18(1),
81–97 (2013)

266

T. Samuel and D. Pfahl

29. Mirzaei, N., Malek, S., Pasareanu, C.S., Esfahani, N., Mahmood, R.: Testing Android apps
through symbolic execution. SIGSOFT Softw. Eng. Notes 37(6), 1–5 (2012)
30. Kim, H.-K.: Hybrid model based testing for mobile applications. Intl. J. Softw. Eng. Appl. 7
(3), 223–238 (2013)
31. van der Merwe, H., Tkachuk, O., van der Merwe, B., Visser, W.: Generation of library
models for veriﬁcation of android applications. SIGSOFT Softw. Eng. Notes 40(1), 1–5
(2015)
32. Hussain, A., Hashim, N., Nordin, N., Tahir, H.: A metric-based evaluation model for
applications on mobile phones. J. Inf. Commun. Technol. 12(1), 55–71 (2013)
33. Koivisto, E.M.I., Suomela, R.: Using prototypes in early pervasive game development. In:
Proceedings of ACM SIGGRAPH Symposium on Video Games, San Diego, California,
USA (2007)
34. van der Merwe, H., Tkachuk, O., Nel, S., van der Merwe, B., Visser, W.: Environment
modeling using runtime values for JPF-Android. SIGSOFT Softw. Eng. Notes 40(6), 1–5
(2015)
35. Agarwal, S., Mahajan, R., Zheng, A., Bahl, V.: Diagnosing mobile applications in the wild.
In: Proceedings of ACM SIGCOMM Workshop on Hot Topics in Networks, Monterey,
California (2010)
36. Kim, H., Choi, B., Yoon, S.: Performance testing based on test-driven development for
mobile applications. In: Proceedings of ICUIMC 2009, Suwon, South Korea, (2009)
37. Ceccato, M., Avancini, A.: Security testing of the communication among Android
applications. In: Proceedings of AST 2013, San Francisco, CA, USA (2013)
38. Adamsen, C.Q., Mezzetti, G., Moller, A.: Systematic execution of android test suites in
adverse conditions. In: Proceedings of ISSTA 2015, Baltimore, MD, USA (2015)
39. Zhang, P., Elbaum, S.: Amplifying tests to validate exception handling code: an extended
study in the mobile application domain. ACM Trans. Softw. Eng. Methodol. 23(4), 32:1–
32:28 (2014)
40. De Cleva Farto, G., Endo, A.: Evaluating the model-based testing approach in the context of
mobile applications. Electron. Notes Theor. Comput. Sci. 314, 3–21 (2015)
41. Zapata, B.C., Fernandez-Aleman, J.L., Idri, A., Toval, A.: Empirical studies on usability of
mHealth apps: a systematic literature review. J. Med. Syst. 39(2), 1–19 (2015)
42. Diewald, S., Geilhof, B., Siegrist, M., Lindemann, P., Koelle, M., Halle, M., Kranz, M.:
Mobile AgeCI: potential challenges in the development and evaluation of mobile
applications for elderly people. In: Computer Aided Systems Theory – EUROCAST
2015, Las Palmas, Spain (2015)
43. Oulasvirta, A.: Rethinking experimental designs for ﬁeld evaluations. IEEE Pervasive
Comput. 11(4), 60–67 (2012)
44. Biel, B., Grill, T., Gruhn, V.: Exploring the beneﬁts of the combination of a software
architecture analysis and a usability evaluation of a mobile application. J. Syst. Softw. 83
(11), 2031–2044 (2010)
45. Rapp, A., Cena, F., Gena, C., Marcengo, A., Console, L.: Using game mechanics for ﬁeld
evaluation of prototype social applications: a novel methodology. Behav. Inf. Technol. 35
(3), 184–195 (2015)
46. Billi, M., Burzagli, L., Catarci, T., Santucci, G., Bertini, E., Gabbanini, F., Palchetti, E.: A
uniﬁed methodology for the evaluation of accessibility and usability of mobile applications.
Univ. Access Inf. Soc. 9(4), 337–356 (2010)
47. Nascimento, L.H.D., Machado, P.D.: An experimental evaluation of approaches to feature
testing in the mobile phone applications domain. In: Proceedings of DOSTA 2007: in
Conjunction with the 6th ESEC/FSE Joint Meeting, Dubrovnik, Croatia (2007)

Problems and Solutions in Mobile Application Testing

267

48. van der Merwe, H., van der Merwe, B., Visser, W.: Verifying android applications using
Java pathﬁnder. SIGSOFT Softw. Eng. Notes 37(6), 1–5 (2012)
49. Salva, S., Zaﬁmiharisoa, S.R.: APSET, an Android aPplication SEcurity Testing tool for
detecting intent-based vulnerabilities. Intl. J. Softw. Tools Technol. Transfer 17, 201–221
(2015)
50. Aranha, E., Borba, P.: Estimating manual test execution effort and capacity based on
execution points. Intl. J. Comput. Appl. 31(3), 167–172 (2009)
51. Serra, L.C., Carvalho, L.P., Ferreira, L.P., Vaz, J.B.S., Freire, A.P.: Accessibility evaluation
of e-government mobile applications in Brazil. Procedia Comp. Sci. 37, 348–357 (2015)
52. Morgado, I.C., Paiva, A.C.R.: Test patterns for android mobile applications. In: Proceedings
of the 20th European Conference on Pattern Languages of Programs, Kaufbeuren, Germany
(2015)
53. Wang, X.S., Balasubramanian, A., Krishnamurthy, A., Wetherall, D.: Demystifying page
load performance with WProf. In: Proceedings of the 10th USENIX Conference on
Networked Systems Design and Implementation, Lombard, IL (2013)
54. Hutflesz, P., Holzmann, C.: Multivariate testing of native mobile applications. In:
Proceedings of MoMM 2014, Kaohsiung, Taiwan (2014)
55. Choi, W., Necula, G., Sen, K.: Guided GUI testing of Android apps with minimal restart and
approximate learning. ACM SIGPLAN Not. 48(10), 623–639 (2013)
56. Hu, Y., Azim, T., Neamtiu, I.: Versatile yet lightweight record-and-replay for Android. In:
Proceedings of 2015 ACM SIGPLAN International Conference on Object-Oriented
Programming, Systems, Languages, and Applications, vol. 50(10), pp. 349–366 (2015)
57. Gao, J., Tsai, W.-T., Paul, R., Bai, X., Uehara, T.: Mobile testing-as-a-service (MTaaS) infrastructures, issues, solutions and needs. In: Proceedings of 2014 IEEE 15th International
Symposium on High-Assurance Systems Engineering, Miami, FL, USA (2014)
58. Costa, P., Paiva, A.C.R., Nabuco, M.: Pattern based GUI testing for mobile applications. In:
9th International Conference on the Quality of Information and Communications,
Guimaraes, Portugal (2014)
59. Bo, J., Xiang, L., Xiaopeng, G.: MobileTest: a tool supporting automatic black box test for
software on smart mobile devices. In: Proceedings of AST 2007, Washington, DC, USA
(2007)
60. Google, Performance focus. http://developer.android.com/about/versions/lollipop.html#Perf.
Accessed 8 Apr 2016
61. Amalﬁtano, D., Fasolino, A.R., Tramontana, P., Ta, B.D., Memon, A.M.: MobiGUITAR:
automated model-based testing of mobile apps. IEEE Softw. 32(5), 53–59 (2015)
62. Amalﬁtano, D., Amatucci, N., Fasolino, A.R., Tramontana, P.: AGRippin: a novel search
based testing technique for android applications. In: Proceedings of 3rd International
Workshop on Software Development Lifecycle for Mobile, Bergamo, Italy (2015)
63. Amalﬁtano, D., Amatucci, N., Fasolino, A.R., Tramontana, P., Kowalczyk, E., Memon, A.
M.: Exploiting the saturation effect in automatic random testing of android applications. In:
Proceedings of the 2nd ACM International Conference on Mobile Software Engineering and
Systems, Florence, Italy (2015)
64. Amalﬁtano, D., Fasolino, A.R., Tramontana, P., De Carmine, S., Memon, A.M.: Using GUI
ripping for automated testing of android applications. In: Proceedings of ASE 2012, Essen,
Germany (2012)
65. Bergvall-Kareborn, B., Larsson, S.: A case study of real-world testing. In: Proceedings of the
7th International Conference on Mobile and Ubiquitous Multimedia, Umeå, Sweden (2008)
66. Guo, C., Xu, J., Yang, H., Zeng, Y., Xing, S.: An automated testing approach for
inter-application security in android. In: Proceedings of AST 2014, Hyderabad, India (2014)

Cost-Benefit Analysis of Using Dependency
Knowledge at Integration Testing
Sahar Tahvili1,2(B) , Markus Bohlin1(B) , Mehrdad Saadatmand1,2 ,
Stig Larsson1 , Wasif Afzal2 , and Daniel Sundmark2
1

SICS Swedish ICT, Västerås, Sweden
{sahart,markus.bohlin,mehrdad,stig.larsson}@sics.se
2
Mälardalen University, Västerås, Sweden
{wasif.afzal,daniel.sundmark}@mdh.se

Abstract. In software system development, testing can take considerable time and resources, and there are numerous examples in the literature of how to improve the testing process. In particular, methods
for selection and prioritization of test cases can play a critical role in
eﬃcient use of testing resources. This paper focuses on the problem of
selection and ordering of integration-level test cases. Integration testing
is performed to evaluate the correctness of several units in composition.
Further, for reasons of both eﬀectiveness and safety, many embedded
systems are still tested manually. To this end, we propose a process, supported by an online decision support system, for ordering and selection
of test cases based on the test result of previously executed test cases. To
analyze the economic eﬃciency of such a system, a customized return on
investment (ROI) metric tailored for system integration testing is introduced. Using data collected from the development process of a large-scale
safety-critical embedded system, we perform Monte Carlo simulations to
evaluate the expected ROI of three variants of the proposed new process.
The results show that our proposed decision support system is beneﬁcial
in terms of ROI at system integration testing and thus qualiﬁes as an
important element in improving the integration testing process.
Keywords: Process improvement · Software testing · Decision support
system · Integration testing · Test case selection · Prioritization · Optimization · Return on investment

1

Introduction

The software testing process is typically performed at various integration levels, such as unit, integration, system and acceptance level testing. At all levels,
software testing suﬀers from time and budget limitations. Improving the testing
process is thus essential from both product quality and economic perspectives.
Towards this goal, application of more eﬃcient testing techniques as well as
automating diﬀerent steps of the testing process (e.g., test case generation, test
execution etc.) can be considered. For test execution, the decision of which test
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 268–284, 2016.
DOI: 10.1007/978-3-319-49094-6 17

Cost-Beneﬁt Analysis of Using Dependency Knowledge

269

cases to select and the order in which they are executed can play an important
role in improving test eﬃciency. In our previous work [1], we introduced a technique based on dependencies between test cases and their execution results at
runtime. The technique dynamically selects test cases to execute by avoiding
redundant test cases. In our technique, identiﬁed dependencies among test cases
give partial information on the verdict of a test case from the verdict of another
one. In this paper, we present a cost-beneﬁt analysis and a return on investment
(ROI) evaluation of the dependency-based test selection proposed in [1]. The
analysis is conducted by means of a case study of the integration testing process
in a large organization developing embedded software for trains. In particular,
we analyze various costs that are required to introduce our decision support system (DSS) and compare these costs to the achieved cost reductions enabled by
its application. To improve the robustness of the analysis, stochastic simulation
of tens of thousands possible outcomes have been performed. In summary, the
paper makes the following contributions:
– A high-level cost estimation model, based on Monte-Carlo simulation, for the
evaluation of integration test-case prioritization with test-case dependencies.
The model is generic and can be used to analyze integration testing for a wide
range of systems that exhibit test case dependencies.
– An application of the cost estimation model in an industrial case study at Bombardier Transportation (BT) where three alternatives for process improvement
are compared to the baseline test execution order.
– A sensitivity analysis for the model parameter values in the case study.
Through the analysis, various scenarios have been identiﬁed where the application of the proposed DSS can be deemed as either cost beneﬁcial or not.
The remainder of this paper is structured as follows. Section 2 presents the
background while Sect. 3 provides a description of the DSS for test case prioritization. Section 4 describes a generic economic model. Section 5 provides a case
study of a safety-critical train control management subsystem, and gives a comparison with the currently used test case execution order. In Sect. 6, the results
and limitations are discussed and ﬁnally Sect. 7 concludes the paper.

2

Background

Numerous techniques for test case selection and prioritization have been proposed in the last decade [2–4]. Most of the proposed techniques for ordering
test cases are oﬄine, meaning that the order is decided before execution while
the current execution results do not play a part in prioritizing or selecting test
cases to execute. Furthermore, only few of these techniques are multi-objective
whereby a reasonable trade-oﬀ is reached among multiple, potentially competing, criteria. The number of test cases that are required for testing a system
depends on several factors, including the size of the system under test and its
complexity. Executing a large number of test cases can be expensive in terms of
eﬀort and wall-clock time. Moreover, selecting too few test cases for execution

270

S. Tahvili et al.

might leave a large number of faults undiscovered. The mentioned limiting factors (allocated budget and time constraints) emphasize the importance of test
case prioritization in order to identify test cases that enable earlier detection of
faults while respecting such constraints. While this has been the target of test
selection and prioritization research for a long time, it is surprising how only few
approaches actually take into account the speciﬁcs of integration testing, such
as dependency information between test cases.
Exploiting dependencies in test cases have recently received much attention
(See e.g., [5,6]) but not for test cases written in natural language, which is the
only available format of test cases in our context. Furthermore, little research has
been done in the context of embedded system development in real, industrial context, where integration of subsystems is one of the most diﬃcult and fault-prone
task. Lastly, managing the complexity of integration testing requires online decision support for test professionals as well as trading between multiple criteria;
incorporating such aspects in a tool or a framework is lacking in current research.
The cost of quality is typically broken down into two components: conformance and nonconformance costs [7]. The conformance costs are prevention and
appraisal costs. Prevention costs include money invested in activities such as
training, requirements and code reviews. Appraisal costs include money spent
on testing such as test planning, test case development and test case execution. The non-conformance costs include internal and external failures. The cost
of internal failure include cost of test case failure and the cost of bug ﬁxing.
The cost of external failure include cost incurred when a customer ﬁnds a failure [8]. This division of cost of quality is also a basis for some well-known quality
cost models such as Prevention-Appraisal-Failure (PAF) model [9] and Crosby’s
model [10]. While general in nature, such quality cost models have been used
for ﬁnding cost of software quality too, see e.g., [11–13]. Software testing is one
important determinant of software quality and smart software managers consider
the cost incurred in test related activities (i.e., appraisal cost) as an investment
in quality [8]. However, very few economic cost models of software testing exist,
especially metrics for calculating the return on testing investment are not wellresearched. It is also not clear how the existing software test process improvement
approaches [14] cater for software testing economics. One reason for this lack of
attention of economics in software quality in general is given by Wagner [15].
According to him, empirical knowledge in the area is hampered by diﬃculties
in cost data gathering from companies since it is considered as sensitive. Nikolik [16] proposes a set of test case based economic metrics such as test case cost,
test case value and return on testing investment. A test cost model to compare
regression test strategies is presented by Leung and White [17]. They distinguish
between two cost types: direct and indirect costs. Direct costs include time for
all those activities that a tester performs. This includes system analysis cost,
test selection cost, test execution cost and result analysis cost. Indirect costs
include test tool development cost, test management cost and cost of storing
test-related information. A test cost model inspired by PAF model is also presented by Black [18] while several cost factors for ROI calculation for automated

Cost-Beneﬁt Analysis of Using Dependency Knowledge

271

test tools are given in other studies [19–21]. Some other related work is done by
Felderer et al. [22,23] where they develop a generic decision support procedure
for model-based testing in an industrial project and compare estimated costs
and beneﬁts throughout all phases of the test process.

3

Decision Support System for Test Case Prioritization

In this section we outline our proposed DSS, which prioritizes and selects integration test cases based on analysis of test case dependencies. Although not the
focus of this paper, the DSS is also capable of performing multi-criteria decision
analysis. The details of the approach can be found in [1]. In essence, the DSS
provides an optimized order for execution of test cases by taking into account
the execution result of a test case, its dependency relations and various test
case properties. The steps performed in the DSS can be categorized into an
oﬄine and online phase: The oﬄine phase produces an order for execution of
test cases based on diﬀerent test case properties (e.g., fault detection probability, execution time, cost, requirement coverage, etc.) while in the online phase,
the pass or fail verdict of executed test cases is taken into account in order to
identify and exclude upcoming test cases based on knowledge of dependencies
between executed and scheduled test cases. The following deﬁnition of result
dependency for integration test cases, ﬁrst introduced in [1], constitute the basis
of the dependency-based prioritization considered in this paper:
Definition. For two test cases A and B, B is dependent on A if, from the failure
of A, it can be inferred that B will also fail.
In industrial practice, such dependencies may exist e.g., whenever a subsystem uses the result of another subsystem. During testing, the dependency
may manifest whenever a test case B, dependent on test case A, is scheduled
for execution before a component, tested by A, has been fully and adequately
implemented and tested. By delaying the execution of B until A has passed, we
ensure that the prerequisites for testing B are met. For instance, if the power
system in a train fails to work, the lighting and air conditioning systems will not
function either.
3.1

Architecture and Process of DSS

In this section, we give the basic architecture and process for the decision support
system [1]. We use the term ‘decision support system’ to emphasize that it can
be instantiated in contexts similar to ours, i.e., test cases written in natural
language, meant for testing of integration of subsystems in embedded system
development.
In Fig. 1, we describe the steps of the semi-automated decision support system
for optimizing integration test selection. New test cases are continuously collected
in a test pool (1) as they are developed. The test cases are initially not ordered
and are candidates for prioritization. As a preparation for the prioritization of
the test cases, the values for a selected set of criteria need to be determined

272

S. Tahvili et al.

2

Criteria Determination

T C1 T C8 T C2 TC6
1

T Cn T C5

Non-Ordered
Test Cases

3

Prioritization

Ordered
Test Cases
4

Execution

5

Re-consider
for execution

TC4 T C7

T Cn
T C1

TC4 T C7 T C3

6

Monitor

λ0

T C2

Stop

7

Fig. 1. Architecture of the proposed online DSS.

(2) for the new test cases. To prioritize among the test cases, the DSS expects
the use of a multi-criteria decision making technique (3, prioritization) [24].
Once prioritized, the test cases are in this step executed (preferably) according
to the recommended order (4). The result of executing each test case could
either be Pass or Fail . We have previously in [1] shown that by detecting the
dependency between test cases, we are able to avoid the redundant executions.
When a test case fails during the execution, all its dependent test cases should
be disabled for execution. The failed test cases from the previous step enter a
queue for troubleshooting. The failed test cases will be reconsidered for execution
once the reason of their failure is resolved (5). The results of each test are
monitored (6) to enable (re-)evaluation of the executability condition (see [1])
of the test cases that are dependent on it. This will determine if the dependent
test case should be selected for execution or not. Furthermore, the completeness
of the testing process will be monitored through the metric fault failure rate
(see [25]), denoted by λ in Fig. 1. This metric is the proportion of the failed test
cases to the total number of executed test cases. The goal is to reach successful
execution of maximum test cases to be able to ﬁnish the current test execution
cycle. The current test execution cycle will stop (7) once the fault failure rate
becomes 0. The steps in the DSS are performed in the integration phase for
each iteration/release of the system. This will ensure that the relevant test are
executed in a suitable sequence, and that the tests resources are used in an
optimal way.

4

Economic Model

In this section we describe an economic model for the cost-beneﬁt analysis of a software integration testing process where test cases are delayed until their execution
requirements are fulﬁlled. The model is independent on the speciﬁc multi-criteria
decision making technique used and of the speciﬁc system under test.
The purpose of the economic model is to adequately capture the costs and
beneﬁts which are directly related to the adoption of a DSS-supported testing
process aiding in the prioritization of test cases. In industry, it is common that

Cost-Beneﬁt Analysis of Using Dependency Knowledge

273

time and cost estimates for software development processes only exist as estimates or averages, if at all. Finally, for an analysis to be useful in practice, the
analysis model should be reasonably lightweight and contain only the absolutely
necessary parameters. Using the proposed economic model, a stochastic ROI
analysis can then be obtained by Monte Carlo simulation. A stochastic analysis
avoids the sensitivity of traditional ROI analyses by considering a large number
of possible parameter values, thereby oﬀsetting some of the disadvantages in
being forced to use less reliable parameter value estimates.
In this paper, we use a cost model with the following parameters:
1. A one-time ﬁxed DSS implementation cost, corresponding to a ﬁxed-price
contract negotiated beforehand,
2. Variable costs for DSS training, on a per person-days of DSS usage basis,
3. Variable costs for DSS maintenance, on a per person-days of DSS usage basis,
4. Variable costs for (a) DSS data collection, (b) test planning, on a per test-case
basis, and
5. Variable costs for (a) test-case execution (per executed test-case) and (b)
troubleshooting (per failed test case).
We make the following simplifying assumptions on the testing process to be
analyzed:
(a) A test case is executed at most once in each release cycle,
(b) If a test case fails, it is delayed until the next release cycle,
(c) Reliability of an already implemented and properly maintained system grows
according to a simpliﬁed Goel-Okumoto model [26], and
(d) Test execution and troubleshooting eﬀort is independent of each other and
between test cases.
In the model, we only include the costs and beneﬁts which are aﬀected by
using the DSS, and hence, do not need to consider other eﬀorts such as the eﬀort
of testing for test cases that pass or that fail for other reasons than dependency.
The following cost model is used for the approach in each release cycle t:


Ct = CtI + dt · CtT + CtM + nt · CtD + CtP + γt · λt · nt · CtE + CtB , (1)
where C I is the implementation cost, dt is the number of person-days in t, C T is
the training cost, C M is the maintenance cost, nt is the number of test cases, C D
is the data collection cost, C P is the test order planning cost, including possible
preprocessing, test case data input, DSS runtime, post-processing and test case
prioritization output, λt is the fraction of failed test cases in the release cycle, γt is
the fraction of test cases that failed due to a fail-based-on-fail dependency (out of
B
the failed test cases), C E is the average test execution
 and C is the average
 E cost,
B
troubleshooting cost. The last term, γt ·λt ·nt · Ct + Ct , calculates the cost for
unnecessarily running test cases which will surely fail due to dependencies. This
is the only runtime cost we need to consider when comparing a DSS-supported
process and the baseline process without DSS support, as the other costs for
running test cases will remain the same.

274

S. Tahvili et al.

Over the course of time, the maintenance cost for a deployed system with a
small number of adaptations will be approximately proportional to the failure
intensity of the system. In this paper, we therefore assume that the DSS software
reliability grows according to a simpliﬁed Goel-Okumoto model (see [26]), i.e.,
that the failure intensity decrease exponentially as λ(d) = λ0 e−σd , where λ(d)
is the failure intensity at time d, λ0 is the initial failure intensity, and σ is the
rate of reduction. Further, for any release cycle t, each test case belonging to t
is assumed to be tested exactly once within t. It is therefore logical to assume
that there is no decrease in the test case failure rate during any single release
cycle. Under these assumptions, the expected maintenance cost in release cycle
t can be calculated as follows.
CtM = C0M · Dt · e−σDt ,

(2)

t
where C0M is the initial maintenance cost and Dt = i=1 di is the total project
duration at release cycle t, where di is the duration of a single cycle i.
Apart from the implementation cost C I , there are other unrelated infrastructure costs which are not aﬀected by the process change and can therefore be
disregarded from. Likewise, full staﬀ costs are not included, as the team size
remains constant, and we instead focus on measuring the savings in work eﬀort
cost from a change in process. In the model in Eq. (1), the savings of a process
change taking test case dependencies into account can be measured as a difference in costs, under the assumption that all other costs are equal. As each
integration test case is normally executed once in each release cycle, after each
cycle there is a set of failed test cases that needs to be retested in the next
cycle. In this paper, we are interested in estimating the economic beneﬁts of
delaying test cases whose testability depend on the correctness of other parts of
the system. In other words, we are interested in estimating the number of test
cases which fail solely due to dependencies. For these test cases, we can save
executing and troubleshooting eﬀorts. If γt · λt number of the test cases fail due
to dependencies, then, from Eq. (1), we have that by delaying the execution of
such test cases, the saving (i.e. beneﬁt) of changing the process can be at most:


(3)
Bt = γt · λt · nt CtE + CtB .
The estimate is an upper bound as in reality we may not be able to capture
all dependencies in our analysis. Further, there is a possibility that the analysis
falsely identiﬁes dependencies which do not exist. The eﬀect of delaying the corresponding test cases to the next phase is to delay the project slightly; however,
this eﬀect is likely small and we therefore disregard from it in this paper.
4.1

Return on Investment Analysis

A Return on Investment (ROI) analysis represents a widely used approach for
measuring and evaluating the value of a new process and product technology
[27]. In this study we consider all costs directly related to the process change
to be part of the investment cost. If we also assume that the sets of test cases

Cost-Beneﬁt Analysis of Using Dependency Knowledge

275

to execute for all release cycles are disjoint, we can calculate the total costs
and beneﬁts by adding the costs and beneﬁts for each release cycle. We use the
following ROI model based on the net present value of future cash ﬂows until
time T and an interest rate r.

T
Bt (1 + r)t
Rt = t=0
−1
(4)

T
t
t=0 Ct (1 + r)
We assume that the implementation cost is paid upfront, so that CtI = 0
when t ≥ 1, and that there are no other beneﬁts or costs at time t = 0. In
other words, B0 = 0, C0 = C0I and, consequently, R0 = −1. The interest rate r
is used to discount future cash ﬂows, and is typically the weighted average cost
of capital for the ﬁrm, i.e., the minimum rate of return that investors expect to
provide the needed capital.

5

Case Study

In order to analyze the economic feasibility of our approach, we carried out a case
study at Bombardier Transportation (BT) in Sweden, inspired by the guidelines
of Runeson and Höst [28] and speciﬁcally the way guidelines are followed in the
paper by Engström et al. [4]. We investigated the software/hardware integration
testing process for the train control management subsystem (TCMS) in the
Trenitalia Frecciarossa 1000, a non-articulated high-speed trainset. The process
aims to identify faults at the interface of software and hardware. The case study
spanned six releases of 13 major and 46 minor function groups of the TCMS
during a time period of 2.5 y, which involved in total 12 developers and testers for
a total testing time of 4440 h. The testing process is divided into diﬀerent levels
of integration, following a variation of the conventional V-model. The integration
tests are performed manually in both a simulated environment and in a lab in the
presence of diﬀerent equipment such as motors, gearboxes and related electronics.
The testing for each release have a speciﬁc focus, and therefore, there are only
minor overlaps between the test cases in diﬀerent releases. Each test case has
a speciﬁcation in free-text form, and contain information (managed using IBM
Rational DOORS) on the (1) test result, (2) execution date and time, (3) tester
ID, and (4) testing level.
The test result is one of the following: (a) Failed, (i.e., all steps in the test
case failed), (b) Not Run, (i.e., the test case could be not executed), (c) Partly
Pass, (i.e., some of the steps in the test case passed, but not all), and (d) Pass
(i.e., all steps in the test case passed).
According to the test policy in eﬀect, all failed test cases (including “Not
Run” and “Partly Pass” test cases) should be retested in the next release. Furthermore, each of these initiates a troubleshooting process that incurs cost and
eﬀort. In the rest of this paper, we therefore use the term failed to mean any
test verdict except “Pass”. The objective of the case study is to analyze the
improvement potential for the integration testing process at BT from decreasing

276

S. Tahvili et al.

the number of unsuccessful test executions using knowledge of test-case dependencies. The chosen method is to estimate the economic eﬀect on BT in the form
of earned ROI using Monte-Carlo simulation. We answer the following research
question in this case study:
RQ: What is the economic eﬀect of introducing a DSS for reducing the number
of unsuccessful integration test executions based on dependency knowledge?
The data collection for the case study was done through both expert judgment, inspection of documentation and a series of semi-structured interviews.
The initial parameter value estimates for C I , C T and C M were made by the
author team, as it was judged that the members of the team, having deployed
several decision support systems in the past (see e.g. [29,30]), possessed the necessary experience for this evaluation. Likewise, C P was estimated by the research
team through multiple meetings and re-evaluations. The documentation consists
of the test case speciﬁcation and verdict records in DOORS. In particular, the
fault failure rate (λ) was calculated directly by counting the corresponding test
case reports, and the fraction of dependent test cases (γ) was estimated through
manual inspection of the comments in the same set of reports. Finally, a series
of semi-structured interviews were conducted to both estimate the parameter
values for the testing process itself, and to cross-validate the full set of parameter values already identiﬁed. The interview series were made with two testers
(T1 & T2), a developer (D), a technical project leader (PL), a department manager (DM) and an independent researcher (R) in veriﬁcation and validation. The
composition and main purpose of the interviews are shown in Table 1. The ﬁnal
parameter values can be found later in this paper in Tables 2 and 3.
5.1

Test Case Execution Results

To estimate the number of result dependencies between test cases, we performed
a preliminary analysis of the results for 4578 test cases. The analysis was based
on an early discussion with testing experts, in which test result patterns that
Table 1. Series of interviews to establish parameter values
#

T1 T2 D PL DM R Main purpose

1

×

Estimate C D from dependency questionnaire.

2, 3 ×

×

4

×

5, 6 ×

×

7

Identify criteria for dependencies. Validate C D .

×

Validate dependencies.
× ×

Validate number of dependencies (γ). Estimate C E .

×

Estimate C B .

8

×

×

9

×

×

Validate C E , C B , C I , C T , C M and C P .
× Validate C I , C T and C M

Cost-Beneﬁt Analysis of Using Dependency Knowledge

277

Table 2. Quantitative numbers on various collected parameters per release. Note that
the γ rate is reported as a fraction of the fault failure rate (λ).
Parameter

Release number
1
2
3

4

Working Days (d)

62

89

Test cases (n)

321

1465 630

Fault failure rate (λ)

0.545 0.327 0.460 0.513 0.346 0.246 0.379

168

5

6

65

127

44

419

1458 285

Total
555
4578

Fail based on fail. rate (γ) 0.411 0.267 0.393 0.753 0.630 0.457 0.475

were likely to indicate a dependency were identiﬁed. The patterns have previously been independently cross-validated on a smaller set of 12 test cases by
other test experts at BT (see [1]). We classiﬁed the test results using the same
patterns, resulting in 823 possible dependency failures out of 1734 failed test
cases, resulting in a total estimate of γ ≈ 0.476. In the semi-structured interviews, two testers independently estimated that approximately 45 % of the failed
test cases were caused by dependencies, which is close to our estimate. Table 2
shows the full results for the six release cycles.
5.2

DSS Alternatives Under Study

We analyzed three diﬀerent DSS variants, which all prioritize the test cases
by aligning them with the identiﬁed dependencies but vary in the amount of
automation they oﬀer. The goal was to identify the tool-supported process
change which is most cost-eﬀective (as measured by the ROI metric) within
a reasonable time horizon. The following DSS variants were considered:
– Manual version: prioritization and selection of test cases in the level of
integration testing manually. In this version, a questionnaire on the time for
test execution, troubleshooting and set of dependencies to other test cases, is
sent to the testing experts. To be manageable for an individual, the questionnaire is partitioned into smaller parts according to the subsystem breakdown.
To increase precision and decrease recall, it is preferable that several experts
answers the same questionnaire; however, the exact number should be decided
based on the experience level of the testers. One of the commercially and publicly available toolboxes for multi-criteria decision analysis (such as FAHP or
TOPSIS) are then used for prioritization of test cases. Data is fed manually
into and out of the DSS, and a spreadsheet is used to ﬁlter, prioritize and keep
track of the runtime test case pool.
– Prototype version: Dependencies are collected as in the manual version.
However, the DSS is custom-made to read the input in a suitable format,
automatically prioritize, ﬁlter and keep track of the runtime test case pool,
and can output a testing protocol outline, suitable for the manual integration
testing process.

278

S. Tahvili et al.

– Automated version: in addition to the prototype version, the DSS detects
the dependencies automatically by utilizing a publically-available toolbox
(such as Parser [31]). The criteria determination step (in Fig. 1) would be
applied on the test cases by utilizing some learning algorithms (for example a
counting algorithm for calculating the number of test steps in a test case for
estimating the execution time for a test case).
As explained earlier in Sect. 4, we divide the total cost needed for software
testing into ﬁxed (one-time cost) and variable cost. The ﬁxed cost includes the
DSS cost for three versions which includes implementation, maintenance and
training costs.
The variable cost contains execution cost and also troubleshooting cost for
the failed test cases. The variable cost changes in proportion to the number of
executed test cases and the number of failed test cases per project.
5.3

ROI Analysis Using Monte-Carlo Simulation

The three version of the DSS were evaluated on the six release cycles described
before. As many other mathematical methods, ROI analyses are sensitive to
small changes in the input parameter values. As an eﬀect, the calculated ROI
can ﬂuctuate depending on the varying time estimates. For this reason we chose
to both evaluate the ROI model above using Monte Carlo simulation, as detailed
below, and to perform sensitivity analysis by varying the expected value of some
of the time estimates, as detailed in the results section. The parameters for the
three versions are shown in Table 3.
The focus on initial analysis means that estimation eﬀorts should be kept
low. For this reason, a single-parameter Rayleigh distribution, which is the basis
in the Putnam model (see [32,33]), was chosen for the distribution of eﬀort for
software testing and implementation tasks. Test-case failures were sampled from
a Poisson distribution.
Table 3. DSS-speciﬁc model parameters and distributions
Param. Comment

Distr.

γt

Failed TC rate

Constant See Table 2.

λt

Failed dep. TC rate

Poisson

CE

TC execution time, per TC

Rayleigh 2

2

2

C

B

Distribution param.
Manual Semi Auto
See Table 2.

TC troubleshooting time, per TC Rayleigh 4

4

4

CI

Total implementation time

Rayleigh 120

825

1650

CT

Training time, per year

Rayleigh 540

360

360

Maintenance time, per year

Rayleigh 40

C

M

165

330

CD

DSS data collection time, per TC Rayleigh 69.2

69.2

0.00

CP

DSS run time, per TC

5.37

0.00

Rayleigh 32.3

Cost-Beneﬁt Analysis of Using Dependency Knowledge

279

Fig. 2. Expected ROI for the three DSS versions. The vertical dotted line indicate the
end of the six release cycles; later cycles are simulated using repeated data.

The three diﬀerent DSS versions were simulated by sampling 100 000 values
for each data point using the parameters in Tables 2 and 3. The mean cumulative
results are shown in Fig. 2 for the studied project at BT. In the experiments,
utilizing all three versions of DSS resulted in a positive ROI at the end of the
six release cycles. Moreover, the maximum value of ROI was found for the manual DSS version, where the maintenance and implementation costs are low as
compared to the other versions. By evaluating in total three identical projects
in sequence, thereby simulating one team of developers working on three larger
projects over the course of almost seven years, it can be noted that the prototype
tool is the most promising from an expected ROI perspective. The initial implementation eﬀort and the continued maintenance costs of the automatic version
makes it less promising even after seven years of deployment.
5.4

Sensitivity Analysis

To increase the validity of the study, we also performed a sensitivity analysis by
varying key parameters. In particular, the ROI analysis is sensitive to changes
in the ﬁxed and variable costs for the DSS. For this purpose, we varied the
implementation costs, i.e., the ﬁxed up-front cost for the DSS variant, and the
maintenance costs, i.e., a critical part of the variable costs. The evaluation was
performed using the same parameters as in the previous section, with the exception that the time horizon was ﬁxed to the six development cycles in the case
study project, with a duration of approximately 2.2 calendar years. The results
of the experiments are shown in Fig. 3. Each data point shows the mean result
of 100 000 Monte-carol simulations, each consisting of six release cycles.
As can be seen, when increasing the implementation cost, the relationship
between the three variants are the same. The manual variant is proﬁtable even
up to an implementation cost factor of 16. If the implementation factor is below
0.5, the prototype tool is the most proﬁtable. It is worth noting that going to the
left from the normative case, the ROI of the manual variant changes little, which
is due to the fact that the implementation costs for the manual variant is small

280

S. Tahvili et al.

Fig. 3. Sensitivity analysis results for DSS costs.

Fig. 4. Sensitivity analysis results when varying test case failure rates (λ and γ).

compared to the other variants. There is no change in the relationship between
the variants when varying the maintenance costs by equal factors (Fig. 3b). Further, changing the cost factor of any single variant by less than 8 still maintains
this relationship.
Figure 4 shows the results from varying λ, the fault failure rate, and γ, the
fail-based-on-fail rate, using a coeﬃcient between 0.0 and 5.0 individually for
each release cycle. As can be seen, for both λ and γ the average ROI is above
zero when the coeﬃcient is above 0.5, corresponding to average rates of 0.189
and 0.237, respectively. Further, it can be observed that, for the 6 release cycles,
the relationship between the three DSS variants are the same for all coeﬃcients.
The tailing-oﬀ eﬀect which can be observed for higher coeﬃcients occurs because
for each individual release cycle, there is a natural upper limit of 1.0 for both
rates, which is enforced after sampling.

6

Discussion and Threats to Validity

In this paper, we have analysed the cost-beneﬁt of utilizing a dependency-based
DSS in integration-level testing. It is clear that our results are based on simulation and it will be more convincing to do a similar investigation on real data from
an actual project. This is one of our future goals with this direction of research.

Cost-Beneﬁt Analysis of Using Dependency Knowledge

281

Our simulation results indicate that a DSS that considers dependencies in prioritizing test cases for integration level testing may be economically beneﬁcial,
but that the extent to which the method should be automated highly depends
on the characteristics of the context in which the DSS should be introduced. It
should be noted that prioritization based on test dependencies is just one way
to reduce eﬀort of integration testing, and the DSS is not inherently limited
to dependency-based prioritization. There are a number of other techniques to
reduce eﬀort such as system design planning and scheduling implementation.
In fact, prioritization of the systems and sub-systems under test can reduce
the degree of dependency between test cases. Another point to consider is earlier fault detection [34] which can be enabled by prioritization based on fault
detection probability. It should further be noted that the costs in our model,
as well as the calculated values of ROI are based on estimates. Consequently,
the experimental uncertainty is inherent in the study design. We have tried to
mitigate this uncertainty by dividing the costs into smaller units. For example,
design cost and implementation cost for a test case are considered as two separated costs. By further deﬁning the execution cost as a separate cost in our
cost model, we account for the fact that a test case can be designed and created
one time, but can be executed more than one time. We also separate the staﬀ
(testers and developers) cost between test case implementation and execution
cost. In addition, the gathered data from BT is another source of uncertainty. As
discussed in Sect. 5.4, by performing various sensitivity analyses we have identiﬁed and evaluated how diﬀerent factors can impact the cost-beneﬁt and ROI of
the application of the DSS. In terms of validity threats, in the economic model,
we assume that if test case A fails then running B (which depends on A) after
A yields no additional information. This assumption could be invalid in other
systems, where execution and failure result of test case B could still provide
additional information, which in turn may aﬀect the results of the analysis. Similarly, there are other assumptions related to our work (see Sect. 4), which can be
invalid in the real world. Further, in our study we detect dependencies between
test cases from test result reports to measure the percentage of fail-based-fail in
the analyzed project. Considering the fact that the identiﬁcation of dependencies
by testers to some extent is subjective, other studies using other subjects may
result in diﬀerences in dependency identiﬁcation, and estimated gains from the
DSS could thus be aﬀected.

7

Conclusion and Future Work

In this paper, we introduced and assessed cost and beneﬁts of applying a decision
support system (DSS) for reducing the eﬀorts at integration level testing. We
identiﬁed the cost factors relevant in integration testing and provided a cost
estimation model for calculation of return on investment based on which we
evaluated where the use of our proposed DSS will be economically beneﬁcial
and result in cost reductions. The proposed decision support system has been
applied to an industrial case study of a safety-critical train control subsystem

282

S. Tahvili et al.

and a brief analysis of the result was given. The results of the BT case study
indicate that utilizing the proposed DSS can reduce test execution eﬀorts and
achieve a positive value for ROI for a system of that size and complexity, where
higher test execution eﬃciency was enabled by identifying and avoiding test
redundancies based on their dependencies. Moreover, by applying the proposed
DSS, we can detect the hidden faults in the system under test earlier and fault
failure rate increases with time, as we have demonstrated in [24]. The ROI of the
proposed DSS increases in situations where the number of test cases is large, the
system under test is complex consisting of dependent modules and subsystems,
and there exist additional limitations, such as budget, deadline, etc. The level
of ROI depend on various cost factors, such as the cost for implementing and
maintaining the tool, personnel training, execution time of the decision making
algorithms, total number, size and execution time of test cases. This leads also
to the conclusion that for a system which is small in the sense that the number of
test cases is very small and there are no extra limitations for performing testing
activities, a high ROI value may not be achieved by using the DSS.
Acknowledgements. This work was supported by VINNOVA grant 2014-03397
through the IMPRINT project and the Swedish Knowledge Foundation (KKS) grant
20130085 through the TOCSYC project and the ITS-EASY industrial research school.
Special thanks to Johan Zetterqvist, Ola Sellin and Mahdi Sarabi at Bombardier Transportation, Västerås-Sweden.

References
1. Tahvili, S., Saadatmand, M., Larsson, S., Afzal, W., Bohlin, M., Sudmark, D.,
Dynamic integration test selection based on test case dependencies. In: The 11th
Workshop on Testing: Academia-Industry Collaboration, Practice and Research
Techniques (TAIC PART) (2016)
2. Yoo, S., Harman, M., Regression testing minimization, selection, prioritization: a
survey. Softw. Test. Veriﬁcation Reliab. 22(2), 67–120 (2012)
3. Catal, C., Mishra, D., Test case prioritization: A systematic mapping study. Soft.
Qual. Journal, 2013
4. Engström, E., Runeson, P., Ljung, A.: Improving regression testing transparency
and eﬃciency with history-based prioritization-an industrial case study, pp. 367–
376 (2011)
5. Bell, J.: Detecting, isolating, and enforcing dependencies among and within test
cases. In: 22nd International Symposium on Foundations of Software Engineering
(2014)
6. Zhang, S., Jalali, D., Wuttke, J., Mucslu, K., Lam, W., Ernst, M., Notkin, D.:
Empirically revisiting the test independence assumption. In: International Symposium on Software Testing and Analysis (2014)
7. Campanella, J.: Principles of quality costs: principles, implementation and use.
ASQ Quality Press (1999)
8. Black, R.: What it managers should know about testing: How to analyze the return
on the testing investment (2004)

Cost-Beneﬁt Analysis of Using Dependency Knowledge

283

9. British Standards Institution. Guide to the economics of quality. Proc. cost model.
B.S. (Series). BSI (1992)
10. Crosby, P.: Quality is free: the art of making quality certain. Penguin (1980)
11. Slaughter, S., Harter, D., Krishnan, M.: Evaluating the cost of software quality.
Commun. ACM 41(8), 67–73 (1998)
12. Krasner, H.: Using the cost of quality approach for software. Crosstalk J. Def.
Softw. Eng. 11, 6–11 (1998)
13. Boehm, B., Huang, L., Jain, A., Madachy, R.: The R.O.I of software dependability:
The iDAVE model. IEEE Softw. 21(3), 54–61 (2004)
14. Afzal, W., Alone, S., Glocksien, K., Torkar, R.: Software test process improvement
approaches: a systematic literature review and an industrial case study. J. Syst.
Softw. 111, 1–33 (2016)
15. Wagner. S. Software product quality control. Springer, 2013
16. Nikolik, B.: Software quality assurance economics. Info. Softw. Technol
17. Leung, H., White, L.: A cost model to compare regression test strategies. In: Proceedings of the 1991 Conference on Software Maintenance (1991)
18. Black, R.: Managing the Testing Process: Practical Tools and Techniques for Managing Hardware and Software Testing. Wiley Publishing (2009)
19. Münch, S., Brandstetter, P., Clevermann, K., Kieckhoefel, O., Reiner Schäfer, E.:
The return on investment of test automation. Pharmaceutical Eng
20. Hayduk, B.: Maximizing ROI and Avoiding the Pitfalls of Test Automation. RealTime Technology Solutions Inc. (2009)
21. Hoﬀman, D.: Cost beneﬁts analysis of test automation. Software Quality Methods
LLC (1999)
22. Mohacsi, S., Felderer, M., Beer, A.: Estimating the cost, beneﬁt of model-based
testing: a decision support procedure for the application of model-based testing
in industry. In: Proceedings of the 2015 41st Euromicro Conference on Software
Engineering and Advanced Applications, SEAA ’15 (2015)
23. Felderer, M., Beer, A.: Estimating the return on investment of defect taxonomy
supported system testing in industrial projects. In: Proceedings of the 2012 38th
Euromicro Conference on Software Engineering and Advanced Applications, SEAA
’12 (2012)
24. Tahvili, S., Afzal, W., Saadatmand, M., Bohlin, M., Sundmark, D., Larsson, S.:
Towards earlier fault detection by value-driven prioritization of test cases using
ftopsis. In: Proceedings of the 13th International Conference on Information Technology: New Generations (2016)
25. Debroy, V., Wong, W.: On the estimation of adequate test set size using fault
failure rates. J. Syst. Softw. 84, 587–602 (2011)
26. Musa, J., Okumoto, K.: A logarithmic poisson execution time model for software
reliability measurement. In: Proceedings of the 7th International Conference on
Software engineering
27. Rico, D.: ROI of Software Process Improvement. J Ross Publishing (2004)
28. Runeson, P., Höst, M., Rainer, A., Regnell, R.: Case Study Research in Software
Engineering. WILEY (2012)
29. Bohlin, M., Wärja, M.: Maintenance optimization with duration-dependent costs.
Ann. Oper. Res. 224(1), 1–23 (2015)
30. Bohlin, M., Holst, A., Ekman, J., Sellin, O., Lindström, B., Larsen, S.: Statistical
anomaly detection for train ﬂeets. In: Proceedings of the 21st Innovative Applications of Artiﬁcial Intelligence Conference (2012)

284

S. Tahvili et al.

31. Marneﬀe, M., Manning, C.: Stanford typed dependencies manual. Technical report,
Stanford University (2008)
32. Putnam, L.A.: general empirical solution to the macro software sizing, estimating
problem. IEEE Trans. Softw. Eng. 4(4), 345 (1978)
33. Putnam, L.: A macro estimating methodology for software development (1976)
34. Hunt, B., Abolfotouh, T., Carpenter, J., Gioia, R.: Software test costs and ROI
issues. University Lecture (2014)

Using Surveys and Web-Scraping to Select Tools
for Software Testing Consultancy
Päivi Raulamo-Jurvanen1 ✉ , Kari Kakkonen2, and Mika Mäntylä1
(

)

1

M3S (M-Group), ITEE University of Oulu, Oulu, Finland
{paivi.raulamo-jurvanen,mika.mantyla}@oulu.fi
2
Knowit Oy, Helsinki, Finland
kari.kakkonen@knowit.fi

Abstract. We analyzed ﬁndings from data collected utilizing surveys and Webscraping, to support Knowit Oy, a software testing consultation company, in the
process of selecting the right tools for software testing & test automation. We
conducted two surveys (2013 & 2016) among (mostly Finnish) software profes‐
sionals to acquire criteria and a list of tools used for software testing in industry.
Considering all our data sources Selenium was the most popular pure tool, while
Robot Framework was the most referenced tool (latter survey). According to the
surveys Jenkins and Sikuli have the highest increase in popularity (or familiarity).
Top referred criteria for selection were usability, functionality, maintainability
and available support for a tool. While Knowit considers it best to utilize tradi‐
tional surveys, Web-scraping is seen as cost eﬀective support for such instru‐
ments. To get comprehensive picture and to gain knowledge of the tools in
markets multiple sources should be used.
Keywords: Test automation · Software testing tool · Software test automation
tool · Tool support · Selection criteria

1

Introduction

Software test automation is tool-oriented domain and integral to frequent testing as part
of continuous delivery and rapid releases. A recent online survey reported test automa‐
tion to be a key factor with software quality and R&D cost saving from the viewpoint
of management [19]. In another recent survey by ISTQB [5] test automation was ranked
as the main area of improvement opportunities in testing activities. In addition to that,
test tool/automation consultation was ranked as the service most required from external
providers. The results from a survey by Capgemini, Sogeti and HP (the World Quality
Report, WQR 2015-2016) [1] highlight that investing in test automation is a must to
keep up with the ever increasing demand for velocity. The ﬁndings claimed that 40 %
of the respondents (IT leaders of mobile technologies) reported lack of right tools for
their testing activities. Overall, these sources highlight the importance of test automation
which cannot succeed without proper tools.
Selecting tools for software testing is a diﬃcult practical problem as there are
numerous software testing tools available. The exact number of tools is unknown as
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 285–300, 2016.
DOI: 10.1007/978-3-319-49094-6_18

286

P. Raulamo-Jurvanen et al.

what constitutes as a test tool is diﬃcult to deﬁne, e.g. many people use Excel to manage
test cases. The high number of tools is well reﬂected for example in the web-site listing
tools for pair-wise testing <http://www.pairwise.org/tools.asp>, a technique to generate
minimum number of test cases covering all combinations of two test inputs. The website has listed 41 tools for pair-wise testing, aka combinatorial testing, alone. Given the
number of tools available only for this test input generation technique, we can estimate
that there has to be hundreds, if not thousands, of testing tools available.
In this paper, we study the knowledge acquisition phase of the test tool selection
process within a consulting company (Knowit Oy). This is our initial work on test auto‐
mation tool selection and thus we provide only initial answers to following Research
Questions: (RQ1) Why is selecting tools important for Knowit? (RQ2) What is the most
popular software test tool nowadays, in comparison to year 2013? (RQ3) How does
Web-scraping compare with traditional surveys? (RQ4) What criteria people ﬁnd
important when selecting tools for software testing or test automation?

2

Prior Work

Test automation consultants Graham and Fewster [2] have studied experiences of test
automation with industrial cases over a long period of time. They emphasized that “there
is no such thing as the perfect tool, but there are many tools that would be adequate for
a given situation”, it is the preferences that drive the decisions. They claim that the tool
must be appropriate for a job. A tool may be inadequate in some context but suitable in
another and several tools may have to be used to accomplish the goals [2].
An online survey published in March 2016 [19] focused on the tools used in test
automation, in companies of less and more than 100 employees. The ﬁndings from a
total of 644 software professionals indicated organizations to use more than a single test
automation solution, open source tools being popular, especially in smaller organiza‐
tions. The amount of tools is thought to be high due to reasons like application
complexity, multiple platforms or lack of required functionalities. The study anticipated
possible disappearance of commercial tools in favor of open source tools in the near
future. The most used tools were, in the order of preference, for smaller size organiza‐
tions: Selenium (42 %), internal tools (20 %), Junit (12 %), Android SKD (8 %), Appium
(7 %), JMeter (3 %), Watir (3 %), Pytest (3 %) and Selendroid (2 %). For larger organ‐
izations the tools most used were: Selenium (29 %), internal tools (24 %), Junit (12 %),
Appium (8 %), Microsoft (8 %, in general), QTP (7 %), Selendroid (5 %), TestComplete
(5 %) and JMeter (2 %) [19].
Past work on software testing tool selection in general by Poston and Sexton [13]
perceived systematic data collection method, preferably with forms or checklists, to be
the secret for selecting appropriate testing tools. Although several surveys of software
testing have been conducted, e.g. [3, 8, 9, 15], those typically do not cover the actual
tools used. There are however studies that focus on a few speciﬁc tools e.g. comparing
TestComplete and QTP on characteristics [7], acknowledging the need for evaluation
of tools [12], comparing Selenium, QTP and TestComplete (eventually concluding the
best tool being QTP) [6] or comparing a few web-service tools [4].

Using Surveys and Web-Scraping to Select Tools

287

To summarize, according to the authors’ best knowledge peer-reviewed literature is
missing surveys that would focus on the tools by actually naming them. Tools are
essential for our trade. Academics need tools for teaching and practitioners for their
business. Often tools are listed in requirements of job ads making tool knowledge essen‐
tial for students graduating from universities.

3

Case Context and the Problem (RQ1)

In this section, we ﬁrst describe the context of our work and the particular problem we
are trying to solve. We use a checklist by Petersen and Wohlin [11] to describe our
context in Table 1. Evidence-based software engineering can exploit the context descrip‐
tion if that is done as completely and accurately as possible for the targeted object of
study. Next, we describe the problem with an informal question answer format.
• Why is selecting tools important for Knowit? A software development and testing
project success is built on people, processes and tools. It is important to be able to
recommend and help to choose a set of tools that is eﬀective and eﬃcient in tasks
and ﬁt the context in question.
• Why are tool surveys conducted? Test tools get more visibility in the industry.
Surveys provide understanding about tools on the rise and tools on the decline. There
are excellent newcomers to the tool scene, there are changes in product portfolios
and features of existing tools sets. Identifying tools gaining market share at a given
time helps to steer for the next good tool.
• What is to be gained by surveys for tool selection? Tool selection surveys collect and
distribute the collective information from people who have invested time in choosing
and using a tool. Such knowledge can make the tool comparison and selection process
more eﬃcient.
• What does tool selection mean to the business? The business of the customer or end
user of software development process gains eﬃciency and eﬀectiveness using the
tool. A suitable test automation tool will impact the project velocity positively
(enabling e.g. faster time to market). The eﬃciency provided by two diﬀerent tools
can be signiﬁcant. For the business of a consultancy company, the tool selection is
an essential part of the service oﬀering, a must-have service although a minor one if
calculated in turnover. More importantly, the consultancy company wants to provide
tools bringing the best eﬃciency. In the end, both consultancy and customer always
share the same common goal of customer business success.
• What are the experienced diﬃculties? Typical challenge is the willingness of unen‐
lightened stakeholders to use a good tool for a purpose other than the tool was orig‐
inally designed for. That may prevent achievement of the expected results for the
tool adoption. Another diﬃculty is comparison of tools that are similar on paper, e.g.
“test management” tools, some of which work on cloud and some with native client.
In such case it is essential to understand the really important characteristics of the
tools.

288

P. Raulamo-Jurvanen et al.
Table 1. Case context with the framework by [11]

Object of study
Product
People
Practices, tools & techniques
Processes

4

Tool selection and process acquiring related knowledge.
Service oﬀering provided by the software testing consultancy
company.
Technology consultants, Customer consultants, Tool owner in
the customer organization.
Partner discussions and information, trade fairs, crosscustomer recommendation.
Software development, Software Testing, Training &
Deployment. Technology, Partner, Portfolio and Project
management.

Research Methods

This section describes research methods used for gathering knowledge about the testing
tools.
4.1 Surveys
We present results of two diﬀerent surveys that collected information about test tool
usage, mainly in Finnish software industry. A survey can be thought as a vehicle to
harness the “wisdom of the crowds” for tool selection process. The concept embodies
the idea of collective opinion (or intelligence), that under the right circumstances a group
can be smarter than a single individual [14].
The ﬁrst tool survey, Survey 1, was conducted in 2013 (as a thesis work for Master
degree of Business Informatics at Metropolia) by Knowit employee Minna Tiitinen with
Kari Kakkonen as a tutor [16]. This survey oﬀers historical perspective on how Knowit
has utilized surveys. The survey was distributed in public email list of Finnish profes‐
sional testing society (TestausOSY) and also to the partners of Knowit, receiving 107
answers.
In 2016, University of Oulu and Knowit jointly conducted a tool survey, Survey 2,
to ﬁnd out (1) the criteria people used (or preferred) for tool selection and (2) tools that
were used by software industry. The survey was targeted to software professionals and
links to the questionnaires were provided in Facebook, LinkedIn and Twitter to selected
groups (mainly Finnish software testing related groups) and sent to email lists of Finnish
professional testing society (TestausOSY) by the partner organization, Knowit. The total
of 58 answers had a clear bias in favor of Finnish respondents (51).
The questionnaires for the two surveys were rather diﬀerent of nature; ﬁrst, by the
number of questions and second, by the design of the questions. The questionnaire for
Survey 1 included 61 questions in total (9 questions for the background information
about the respondents and 52 questions for the tools). For Survey 2, the total number of
questions was only 8 (5 and 3 questions for background information and tools, respec‐
tively). Survey 1 provided multiple choice questions with predeﬁned lists of tools (and
an option to add tools), the criteria for the predeﬁned lists of tools being ISTQB tool

Using Surveys and Web-Scraping to Select Tools

289

classiﬁcation and (biased) commonness of the tools. For Survey 2, the intention was not
to steer respondents’ opinions or tool choices by providing free text ﬁelds only. The
original questionnaires are available for Survey 1 as Appendix 1 (pg. 113) in [16] and
for Survey 2 from http://goo.gl/MjPFCr.
4.2 Web-Scraping
Since surveys in general require eﬀort to create, administer and respond and may suﬀer
from low response rates and respondent bias, we utilized Web-scraping that could
support or even possibly substitute surveys. Web-scraping is an approach to fetch
content from the internet, a technique to access web-pages and extract a structured view
of the required data [10]. However, there are both legal and ethical issues involved in
this matter, e.g. the Terms of Service (ToS) for any service may clearly prohibit data
scraping from the website or the usage of scraped data may violate a website owner’s
copyrights. Sometimes a service may provide a public API to access some data, the
quality and quantity of which may be lower than (or not as up-to-date as) data acquired
by disruptive web scraping. (Some services may oﬀer free services for limited access
and require a paid fee for more frequent or massive usage).
We utilized Web-scraping to collect wider views of the Top 15 tools of Survey 2.
Data collected included number of Wikipedia page views, number of Google hits (using
a particular search string), number of StackOverﬂow questions and view counts for
those, and number of Twitter tweets (see Tables 3 and 4 – please note, rows in both
tables are sorted by column “Rank” of Table 4). The time period used for the searches
was intentionally set on three months (January 1st – March 31st 2016) to provide some
variation and recent body to the content. The data was collected on April 20th and 21st
2016.
Wikipedia is web-based encyclopedia with openly editable content, the English
version of which alone contains over 5 million articles. To get the trends of (user created)
page views for Wikipedia articles (available in en.wikipedia) we utilized the Pageview
API1 in RStudio/R (required R packages ‘httr’ and ‘jsonlite’).
Google Search is claimed to be the most used web search engine on the WWW. ToS
of Google strictly deny any access to their services via “using a method other than the
interface and the instructions that we provide”2. Thus, the Google hits were collected
manually using Firefox browser and search string “<toolname> and ‘software testing’
and tool”.
For StackOverﬂow questions the data was fetched from StackExchange Data
Explorer3 (open source tool) using the provided SQL-query editor. StackExchange data
explorer provides libraries of “high-quality questions and answers” and allows to fetch
and download data from diﬀerent sites, of which StackOverﬂow, language independent
site for programmers, is one. The actual data fetched from StackExchange included title

1
2
3

https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI.
https://www.google.com/intl/en/policies/terms/?fg=1.
http://data.stackexchange.com/.

290

P. Raulamo-Jurvanen et al.

of the question, number of views and creation date for each question (not the full body
of the posts).
Twitter is a popular, online social networking service to communicate via short 140character messages (“tweets”). The existing Twitter API has limitations to fetching
tweets (e.g. for the time period or tweets per day). Thus, an open source project
GetOldTweets-java (v1.2.0), written by Henrique Jeﬀerson4, was utilized, allowing to
get the tweets for the tools for the deﬁned observation period. The names of the tools
were used as hashtags when searching for the posted tweets.

5

Results

First, the results of Survey 1 are summarized only (as from 2013). Then the results of
Survey 2 are presented and contrasted with those of Survey 1. Finally, the results from
Web-scraping are presented.
5.1 Survey 1
The results of the Survey 1 (107 respondents) showed that agile processes and tools
adapted to them were on the rise. Most companies seemed to use both commercial (88 %)
and open source tools (60 %), and even proprietary tools (48 %). Unsurprisingly, open
source tools dominated in small companies and commercial tools in large companies.
The diﬀerent ways how tools were acquired in companies of diﬀerent size are shown in
Table 2.
Table 2. Types of tools & Company sizes (Survey 1)
1–10
1
1
2

Bought
Open source
Proprietary
tools
Cloud service 0
Rented
0
Other
0

11–50 51–100
5
4
5
5
2
2
2
0
1

2
1
2

101–500
22
12
13
3
3
2

501–1000
14
6
7
2
1
1

Over 1000
46
27
24
6
5
1

Sum %
92
40
56
24
50
22
15
10
7

7
4
3

In general, in overall analysis of all tools, HP and Atlassian were the most popular
commercial vendors (having tools for diﬀerent testing activities) while Selenium and
Robot Framework were the most popular open source tools. For diﬀerent test activities,
mostly tools were used for test execution, test case and defect management and reporting.
Excel was widely used on the side of the more sophisticated tools. The test execution
tools used in the Survey 1 are shown in Fig. 1. The most popular test execution tool was
Selenium with 45 % of the respondents using it. QTP (nowadays replaced by UFT) and
Robot Framework seemed almost level while the rest of the tools were used by a small
4

https://github.com/Jeﬀerson-Henrique/GetOldTweets-java/.

Using Surveys and Web-Scraping to Select Tools

291

number of respondents only. (A category not shown in the ﬁgure was “Other” (21 %)
which included tools referenced just once by respondents to the option “Other” for the
question of test execution tools).

Fig. 1. Test execution tools used (Survey 1)

Some of the actions taken based on the results of the survey emphasized the need
(1) to gain tool expertise in choosing tools (not just take the ﬁrst one) and (2) to look
more into the most popular tools in the survey. At the time the survey highlighted the
unanticipated importance of Quality Center (HP). Today, the distribution of tools oﬀered
by Knowit to the customers has slightly changed (i.e. somewhat more Atlassian & Robot
Framework (Knowit is one of the founding members of Robot Framework Foundation)
and somewhat less HP). However, tool changes are quite expensive investments into
learning, migrating data etc., let alone the actual tool selection and implementation.
Thus, tool choices are only questioned every 3–4 years or so. One has to use a tool for
some time to gain beneﬁts of it. Also, as tool integration provides extra eﬃciencies,
intent has been increased to integrate commercial and open source tools. The percentages
of the adoption of the tools in Survey 1, as a comparison to the Top 15 tools of Survey
2, are listed in Table 3 (column “2013 Survey, Usage %”).
5.2 Survey 2
Regarding the expertise of the respondents (58), the average years in software industry
was 15.88 (median 15), while maximum years was 43 and minimum 3. Thus, the group
of respondents was rather mature in years spent in software industry. The number of
those having been in software industry for ten years or more was 47 (82 %).

292

P. Raulamo-Jurvanen et al.

Criteria (RQ4)
The respondents were requested to describe important criteria when selecting a tool
for software testing or software test automation. They were requested to describe in their
own words what matters in general (e.g. regardless of technique, testing area or tool),
what would be good to know, or take into account in advance. The respondents were
expected to provide short, accurate descriptions of the features or characteristics they
value in such tools. The question was intentionally left open: “What are important
criteria when selecting a tool for Software Testing/Test Automation? (E.g. What features
or characteristics do you value in a tool? Or in your opinion, what would be good to
know in advance or matters to you in general?)”. The fundamental purpose of the ques‐
tion was to collect data as a basis for further studies.
We assume that general requirements for software testing tools (e.g. costs, possible
licensing model or developer support, to mention a few) are rather similar (even globally)
despite physical location. It is notable that some criteria are always more important to
some users than to others and not all criteria work for all even though we can observe
general trends. Each and every software project is unique and must choose the criteria
and how to apply those in their context.
The criteria were coded in NVivo, ﬁrst by qualitative coding by topics appearing in
the responses, i.e. open coding and axial coding. Later we mapped our codes to the ISO/
IEC 25010 quality model5. As software testing tools are software too, ISO/IEC 25010
quality model can be used to represent the desired characteristics proposed by our
respondents. However, we added “Support” and “Costs” to the categories since those
were not included in the model and our respondents frequently brought up those topics.
The references to the categories from the quality model are shown in Fig. 2.

Fig. 2. Survey 2 Criteria categorized using ISO25010 and with “Support” and “Cost” categories

5

http://iso25000.com/index.php/en/iso-25000-standards/iso-25010.

Using Surveys and Web-Scraping to Select Tools

293

Clearly the issue valued by the respondents was “Usability” with references to
“Operability” (43), “Learnability” (13), “User Interface Aesthetics” (3), “Accessibility”
(1) and “Appropriateness Recognizability” (1). “Functional Suitability” had references
to “Functional Appropriateness” (24), “Functional Correctness” (17) and “Functional
Completeness” (11). “Maintainability” included references to “Modiﬁability” (20),
“Modularity” (8), “Reusability” (5) and “Analyzability” (2). None of the respondents
referenced the sub-category of “Testability” which seems rather natural.
“Portability” was referenced as “Installability” (18) and “Adaptability” (10), but
none of the respondents referenced “Replaceability”. “Compatibility” with sub-catego‐
ries “Co-existence” (6) and “Interoperability” (20), however, this issue about compati‐
bility (with tools/systems/platform/integration) is somewhat ambiguous since respond‐
ents were not always speciﬁc with their wording. “Reliability” included “Maturity” (8),
“Fault Tolerance” (3) and “Availability” (1). “Performance eﬃciency” had reference to
“Time Behavior” (4), “Resource Utilization” (3) and “Capacity” (1). For “Security”,
there was one reference for “Integrity”.
“Support” was added with sub-categories “General Tool Support” (12), “Popularity”
(9), “Future of the Tool” (6, we interpreted that the future is mainly of interest because
support and new versions of the tool are needed in the future) and “Vendor Independ‐
ence” (1). For “Costs” there were references for “Price” (11), “License model” (11),
“Acquisition costs” (3), “Operating Costs” (3) and “Free” (3).
Open Source software becoming more common may have an impact on the concern
for price or costs in general. However, when respondents were referring to “Open Source
software” (or “open source interfaces”) we interpreted those to refer to licensing model
although we were not certain whether they were referring to importance of related costs,
licensing model or modiﬁability of the software, or all of those. Adoption of a software
testing or software test automation tool is expected to be a sustainable investment. Since
adoption of free tools brings on additional costs, mostly resource related, there are
expectations for a lifetime and development of those tools.
“Usability” and “Functional Suitability” of a tool were of high importance. The
respondents valued maintainability, i.e. ability to conﬁgure or modify the tool according
to their needs. “Support” was a topic of its own in the responses. The respondents seemed
to prefer tools that were mature, i.e. had been adopted by fellow professionals already
for some period of time and had support available in various software forums, e.g.
StackOverﬂow.
Tools Used by the Respondents
The question querying about the tools used resulted in 164 diﬀerent tools. The ques‐
tion for tools was intentionally open, not to give bias in favor of any tool: “List tools for
Software Testing/Software Test Automation you have been using yourself or tools which
have been utilized in your organization. (For non-public, self-made tools you may write
“Inhouse tool for doing X”)”. The number of tools, considered as identiﬁable software
testing or test automation tools, was 133 (excluding tools e.g. like Excel, Word, WinSCP,
PuTTY or Cygwin).

294

P. Raulamo-Jurvanen et al.

Table 3. Top 15 tools of Survey 2 (#1): Usage % (ﬁgures for Survey 1, (2013) included for
reference), Wikipedia page views & Google Hits (rankings for columns included)

To support the goals of the partner organization of having up-to-date sales oﬀering
of the most popular testing tools, we analyzed the top 15 tools based on the number of
responses. The list of Top 15 tools includes diﬀerent types and range of tools. The top
four referenced tools are all open source tools and only ﬁve of the Top 15 tools were
commercial. Open source tools seemed popular among the respondents of this survey,
too. Nearly 70 % of the respondents (column “2016 Survey, Usage %”, Table 3) had
Table 4. Top 15 tools of Survey 2 (#2): StackOverﬂow questions & view counts, average views
per question, Twitter tweets, total sum of rankings and ranking

Using Surveys and Web-Scraping to Select Tools

295

some experience with the most used tool, Robot Framework. The respondents were
mainly from Finland, thus the popularity (of familiarity) of Robot Framework in
comparison to other tools in understandable. Interestingly, the Top 5 tools within
included tools for diﬀerent purposes, e.g. for acceptance testing (Robot Framework),
web application testing (Selenium), continuous integration (Jenkins) and cross-platform
functional testing (UFT/QTP & SoapUI). Tables 3 and 4 show the details of the Top 15
tools and related data.
Contrasting Survey 2 with Survey 1 (RQ2)
Looking at the new results, (although two surveys were somewhat diﬀerent of nature)
it seems that the market has partially changed and partially stayed the same over the
three years. The same tools seem to dominate, with just slight changes in market share
(e.g. Robot Framework & Selenium). One could see somewhat more mobile-suitable
tools, e.g. Appium, in the results as an expected development. In particular, Jenkins
(continuous build and test management tool) and Sikuli (visual GUI testing tool) have
appeared as new tools in comparison to Survey 1. Both surveys indicated a strong pref‐
erence for using supportive tools (e.g. Oﬃce tools). According to Survey 1 those tools
were mainly used for reporting and documentation purposes. Thus, it is unsurprising
that reporting features were also listed as one of the important supportive features for
test automation in Survey 2.
In both surveys, we also requested the respondents to describe important criteria
when selecting a tool (for software testing or test automation). Qualitative coding of the
criteria of Survey 2 indicated two clearly important categories, the possibility to modify
the tool (to the needs of the organization) and usability of the tool. The respondents
seemed to value features such as (in the order of preference) ease and intuitiveness of
use, compatibility of the tool with the existing system, applicability (to tasks, methods
and processes), reporting features as well as price. Cost related issues were not the
topmost interest for the respondents of Survey 2 (although related to the concept of open
source tools).
In Survey 1 the functionalities reported to aﬀect the acquisition of software testing
tools the most were (in the order of preference) price, ease of use, functionality, manage‐
ment and compatibility with other applications. The diﬀerences in conducting the
surveys may aﬀect the results (e.g. possibly having options to choose from vs. free text
questions as in Survey 1 & 2, respectively).
The Top 15 list of tools from Survey 2 has not yet aﬀected the preferences of Knowit,
but it supports their strategy by substantiating the general evidence of the rising trend
of open source tools in the market. For example, the company has already adopted
Jenkins as part of their oﬀering despite the fact that it did not appear in the 2013 survey.
From a viewpoint of a consulting company, tailored sets of tools are required to serve
diﬀerent types of needs of customers. The results from both surveys, in general, seem
to correlate with the tools used by the wide customer-base (mainly Finnish companies)
of Knowit.

296

P. Raulamo-Jurvanen et al.

5.3 Web-Scraping of Top 15 Tools
From the data, see Tables 3 and 4, it seems that Python was clearly the tool being most
“popular” considering any indicator. This is understandable: Python appeared about 25
years ago, thus the tool is expected to have far more software enthusiasts than any other
tool in the list. We are aware Python is not a pure software testing tool, but mainly a
programming language, supporting many testing related activities.
Surprisingly, the diﬀerences in the number of hits for the search terms were not as
diverse among the tools as the case with Wikipedia page views. (Four tools having most
hits shared the same level in hundreds of thousands). As a disclaimer, the hits were not
analyzed any further to reveal whether those links were actually truly relevant to the
original search terms or not. In fact, a hit is a request to a webserver for a ﬁle. Since
Web pages may contain several ﬁles and images, loading a web page does not always
equal to one [18]. Thus, it is claimed that a hit is an inaccurate measure of popularity or
traﬃc of a website, page views providing a more accurate measure [17]. All Top 15
tools but two, Sikuli & Appium, had an article (in English) in Wikipedia. UFT shared
an article with QTP (QTP being deprecated and replaced by UFT).
The commercial tools seem to have a trend of having less questions in StackOverﬂow
than open source (free) tools. That may be due to a fact that commercial software vendors
often provide dedicated customer services for their paying customers in need of help.
With open source tools, people rely on the help of their fellow colleagues or forums of
software professionals (or software enthusiasts). Also, the user manuals (e.g. online
services) for commercial tools may be of higher quality and richer content than those
for open source tools (if manuals exist). Interestingly, the diﬀerences in the average
numbers of view counts per question for tools are not that big. Also, there seems to be
a diﬀerence between posted tweets for commercial and open source tools compared with
the number of StackOverﬂow questions. The ﬁve commercial tools are among the top
eight tools for the number of tweets. However, the tweets may, due to the nature of the
Twitter – list of short messages, possibly connected with hashtag(s), contain totally
irrelevant content. Furthermore, since the length of a tweet is limited the tweets with
“valid” content are not expected to include comprehensive discussions or descriptions
but rather opinions or short comments like user tips, promotions, job advertisements or
release notes.
Contrasting Web-scarping method with Surveys (RQ3)
Utilizing information in the web by diﬀerent searches has for long been an important
method for ﬁnding information about tools for Knowit. For a few decades there have
been many websites collecting “the most used tools” or “the best tools” for the help of
others looking for such tools. However, these are often rather biased geographically (e.g.
US-based perspective only) and do not really show how common some tool is.
Web-scraping provides a rather quick way to acquire large amounts of data in
comparison to surveys. However, selection of relevant sources for data can be diﬃcult
as well as ﬁnding the suitable methods to process and analyze the data, in order to provide
useful or meaningful information. As experienced in this study, the Top 15 tools are
quite diﬀerent by characteristics and purpose. Thus, the popularity or familiarity of the
tools, based on data from Survey 2, are not expected to be comparable, as such. Utilizing

Using Surveys and Web-Scraping to Select Tools

297

Web-scraping as a big-data style approach, as presented in this paper, gives some of the
power of surveys into utilizing the web as a source for information. This makes the
popularity of tools more evident. Still, surveys are irreplaceable in giving voice to the
people themselves, especially in speciﬁc market sectors and contexts – and Webscraping should be considered only as a good add-on source to surveys.
Contrasting Web-scraping results with Survey (RQ3)
Robot Framework was the most popular tool referenced by the respondents of Survey
2, however, just third but last (among the Top 15 tools) according to the ranking based
on the results from Web-scraping (see Tables 3 and 4). Please note, the total ranking is
counted from all ranked columns, taking account the number of references in Survey 2,
Wikipedia page views, Google hits, StackOverﬂow questions, view count of those and
average number of views per question as well as Twitter tweets. Popularity of Selenium
and Jenkins was evident in the results of both Web-scraping and Survey 2 (although the
latter was not referenced in Survey 1). Surprisingly, TFS seems to be widely adopted
(based on the number of Wikipedia page views, StackOverﬂow questions, view count
of those and even tweets) although that was not the case in Survey 2 (or Survey 1).
Popularity, as being widely adopted, is diﬃcult to generalize from data from a survey
since such results are always biased by the size and the origin of the sample, like in the
case of Robot Framework. Web-scraping provides a wider, quantitative perspective to
the tool scene. However, in our surveys the background information about respondents
serves as anchor for positioning the results to more concrete contexts.

6

Discussion

Our data could not conﬁrm the growth of the number of the internal tools in relation to
open source frameworks, as reported by [19]. Only about 7 % of the respondents of
Survey 2 reported having been using an inhouse tool for some speciﬁc purpose. (It may
also be that those tools have not been reported accurately or have not been considered
as tools related to software testing). Interestingly, the most popular tool, Selenium,
reported by [19] was the second most popular in our survey, after Robot Framework
(particularly popular in Finland due to its origin). Four and ﬁve out of nine diﬀerent
tools reported for small and large size organizations (respectively) were also in the list
of Top 15 tools in our Survey 2 [19]. However, the order of the tools was otherwise
diﬀering.
When considering popularity or familiarity of software testing or test automation
tools, sources like StackOverﬂow or social media like Twitter are actually rather good
sources of information. On the other hand, comparing commercial and open source tools
that way does not seem very appropriate since most likely majority of questions or
problems faced with commercial tools may have been handled via oﬃcial (private)
customer support channels, not via developer forums. Also, groups of software profes‐
sionals may have formed self-suﬃcient support networks within their organizations
only.
As with criteria provided by the respondents, ease of use, compatibility and applic‐
ability of the tools are important. Some software professionals may have been using

298

P. Raulamo-Jurvanen et al.

some tool for years, may be well familiar with functionalities, pros and cons of such
tools, but not aware of existing or new tools that could actually be more suitable (or
supportive) for their purposes (e.g. considering costs or eﬀectiveness). Some commer‐
cial tools provide a period of free trial but companies may not have the resources to
share for trying out diﬀerent choices or combinations of compatible tools. This is where
the wisdom of the crowds could be applied to.
6.1 Limitations
There are several limitations aﬀecting this study. Firstly, the surveys were targeted to
(mainly Finnish) selected groups of software professionals, thus the two sets of respond‐
ents were rather small and expected to be biased (although experienced based on work
history). Secondly, the list of tools analyzed is based on experiences of those small
groups of respondents. The tools were not only related to software testing or software
test automation but also more general and supportive for the process. Thirdly, compar‐
ison of popularity or familiarity of such tools brought up issues like commercialism and
concept of open source software, diﬀerent characteristics and purposes for the tools,
diﬀerent contexts for utilizing those tools and compatibility, just to mention a few.
Furthermore, the surveys were not identical as the ﬁrst survey used a list of preselected
tool options while the latter survey was implemented as open text-ﬁelds. The question
that remains is whether some less used tools are left out if only the main tools are
mentioned (e.g. a tool set of a typical project may include 10–20 tools).

7

Conclusions and Future Work

In this paper, we make three contributions. First, we describe the test tool selection
problem in a software testing consultancy. Having the right tools is critical for testing
consultants when they oﬀer suitable services for their clients (RQ1). Second, we
presented results of two surveys conducted in 2016 and 2013 (the responses mainly from
Finland). Surveys are a good source of professional knowledge that can be applied,
within the limits of a known context, as such. We found that among our respondents’
the tools which have gained popularity are Robot Framework, Jenkins, and Sikuli while
Selenium has maintained its high popularity (RQ2). Third, we present Web-scraping as
a method that may provide additional quantitative (or qualitative) support for the tool
selection process (RQ3).
According to our results, from both surveys and Web-scraping, the tools that ranked
the most popular (based on the references of 2016 and metrics from Web-scraping,
Tables 3 and 4) were Python, Selenium, Jenkins, TFS and Junit (in that particular order).
Python was included in the list of Top 15 tools since many of the respondents of Survey
2 had listed it as an important instrument for test automation and thus is clearly of interest
in global scale, too.
Results provided by the respondents regarding important criteria for tool selection
were surprisingly alike (RQ4). However, when mapping our open coded criteria results
to the ISO/IEC 25010 quality model, there were two evident categories that could not

Using Surveys and Web-Scraping to Select Tools

299

be directly mapped to the model, namely “Support” and “Costs”. “Support” (including
“General Tool Support”, “Popularity”, “Future of the Tool” and “Vendor Independ‐
ence”) or “Costs” (including “Price”, “License model”, “Acquisition costs”, “Operating
costs” or “Free”) are in fact characteristics that may be critical in providing value to
stated and implied needs of stakeholders.
The results highlight that local preferences may diﬀer from global preferences
(considering e.g. Robot Framework), but some tools like Selenium and Junit stand out
as popular tools based on our surveys and Web-scraping as well as [19]. Web-scraping
is seen as cost eﬀective support for traditional surveys. Utilizing multiple sources
enables getting a comprehensive picture of the tools in markets.
As future work, based on this study, another survey is planned to be conducted
applying acquired criteria for related tools. The idea is to support the process of selecting
the right tools by acquiring and comparing software professionals’ knowledge or
perceptions of characteristics of selected tools.
Acknowledgments. Our thanks to all respondents for their contribution to this survey.

References
1. Capgemini Consulting: World Quality Report 2015–2016. (2015). https://
www.capgemini.com/thought-leadership/world-quality-report-2015-16
2. Fewster, M., Graham, D.: Software Test Automation: Eﬀective Use of Test Execution Tools.
ACM Press/Addison-Wesley Publishing Co (1999)
3. Garousi, V., Zhi, J.: A survey of software testing practices in Canada. J. Syst. Softw. 86(5),
1354–1376 (2013)
4. Hussain, S., Wang, Z., Toure, I. K., Diop, A.: Web service testing tools: A comparative study.
arXiv preprint arXiv:1306.4063 (2013)
5. ISTQB (International Software Testing Qualiﬁcations Board): ISTQB® Worldwide Software
Testing Practices Report 2015–2016 (2016). http://www.istqb.org/references/surveys/istqbworldwide-software-testing-practices-report.html
6. Kaur, H., Gupta, G.: Comparative study of automated testing tools: selenium, quick test
professional and testcomplete. Int. J. Eng. Res. Appl. 3(5), 1739–1743 (2013)
7. Kaur, M., Kumari, R.: Comparative study of automated testing tools: testcomplete and
quicktest pro. Int. J. Comput. Appl. 24(1), 1–7 (2011)
8. Michael, J.B., Bossuyt, B.J., Snyder, B.B.: Metrics for measuring the eﬀectiveness of
software-testing tools. In: 13th International Symposium on Software Reliability Engineering,
pp. 117–128. IEEE (2003)
9. Ng, S., Murnane, T., Reed, K., Grant, D., Chen, T.: A preliminary survey on software testing
practices in Australia. In: 2004 Australian Software Engineering Conference, pp. 116–125.
IEEE (2004)
10. Pan, A., Raposo, J., Álvarez, M., Hidalgo, J., Viña, Á.: Semi-automatic wrapper generation
for commercial web sources. In: Rolland, C., Brinkkemper, S., Saeki, M. (eds.). IFIP, vol.
103, pp. 265–283. Springer, Heidelberg (2002). doi:10.1007/978-0-387-35614-3_16
11. Petersen, K., Wohlin, C.: Context in industrial software engineering research. In: 3rd
International Symposium on Empirical Software Engineering and Measurement, pp. 401–404.
IEEE (2009)

300

P. Raulamo-Jurvanen et al.

12. Portillo-Rodríguez, J., Vizcaíno, A., Piattini, M., Beecham, S.: Tools used in global software
engineering: a systematic mapping review. Inf. Softw. Technol. 54(7), 663–685 (2012)
13. Poston, R.M., Sexton, M.P.: Evaluating and selecting testing tools. Software 9(3), 33–42,
IEEE (1992)
14. Surowiecki, J.: The Wisdom of Crowds. Anchor, New York (2005)
15. Tassey, G.: The economic impacts of inadequate infrastructure for software testing. National
Institute of Standards and Technology, RTI Project, 7007.011 (2002)
16. Tiitinen, M., Kakkonen, K.: Software Testing Tools Research in Finland, Knowit Oy
(2013). http://www.knowit.fi/Documents/Testausty%C3%B6kalututkimus_Posteri_A3_
Tiitinen_V11_ENG.pdf
17. Wikipedia Page view. https://en.wikipedia.org/wiki/Page_view
18. Wikipedia article traﬃc. https://en.wikipedia.org/wiki/Wikipedia_article_traﬃc
19. Yehezkel, S.: Test Automation Survey (2016). http://blog.testproject.io/2016/03/16/testautomation-survey-2016/

On the Need for a New Generation of Code
Review Tools
Tobias Baum(B) and Kurt Schneider
FG Software Engineering, Leibniz Universität Hannover, Hannover, Germany
{tobias.baum,kurt.schneider}@inf.uni-hannover.de

Abstract. Tool support for change-based code review is gaining
widespread acceptance in the industry. This indicates that the current
generation of tools is well-aligned to current code review practices. Nevertheless, we believe that further improvements in code review tooling can
lead to increased review eﬃciency and eﬀectiveness. In this paper, we combine results from a qualitative study and results from the literature to substantiate this claim. We derive promising improvement areas and provide
an overview of existing research in these areas. A common attribute of
these improvements is that they trade ﬂexibility for reviewer support. As
ﬂexibility is one of the main characteristics of the current generation of
code review tools in Hedberg’s classiﬁcation of review tool generations, we
regard these coming tools as part of a new generation of code review tools.

Keywords: Code reviews
support

1

· Code inspections and walkthroughs · Tool

Introduction

Code review is a well-established method of software quality assurance. In recent
years, change-based review has become the dominant style of code review in
industry [5,24]. Its main characteristics are the use of code changes performed
in a unit of work, e.g. a user story, to determine the scope of the review, and
the replacement of management intervention through conventions or rules for
many decisions [5]. Change-based review is supported by tools, in some cases
specialized code review tools, in other cases general-purpose tools like “diﬀ”.
Looking at their widespread adoption, these tools seem to address the current
needs of the industry. Nevertheless, we believe there is still room for improvement, not least because these tools do not fully incorporate existing research
results. The purpose of this article is to derive and collect promising ideas to
improve code review eﬀectiveness and eﬃciency through code review tools in the
context of industrial software development. This is done from the point of view
of researchers developing these tools and based on results published in the literature and on interviews with software development professionals. This article can
be used to guide and direct future research as well as tool development eﬀorts.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 301–308, 2016.
DOI: 10.1007/978-3-319-49094-6 19

302

2

T. Baum and K. Schneider

Methodology

The style of this article is largely deductive. We extract well-founded hypotheses from existing research and combine them to derive and evaluate improvement opportunities for industrial code review. While there are notable diﬀerences between classic Fagan Inspection and modern change-based code review,
many important aspects are similar [5,24]. We therefore also include results
from research on classic inspections as evidence, as long as we believe them
to be applicable. Further experiments could be conducted to ascertain these
assumptions. A limitation of this part of our analysis is that we did not perform
a systematic literature review (SLR) in the narrow sense of the term. An SLR
could have further reduced the risk to miss relevant publications.
We recently performed a study based on semi-structured interviews with 24
software professionals from 19 companies [5,6]. To some extent, these interviews
concerned the way in which reviewers work and which problems they perceive.
These interviews form the second pillar of our argumentation, in addition to the
literature. All interviews were recorded and later transcribed. Most interviews
were conducted by the ﬁrst author. Some interviews were performed by another
researcher to reduce the risk of bias. Our sample has a focus on small and medium
standard software development companies and in-house IT departments from
Germany, but we included contrasting cases for all main factors. The interviewees
are mostly software developers and team or project leads, as the development
teams were responsible for code reviews in the sampled cases. The interviews
were conducted between September 2014 and May 2015. Further methodological
details on the interviews can be found in our related articles [5,6]. The main
study [6] followed “Grounded Theory” methodology, but the results presented
in the current article are not a grounded theory. We cite many statements from
the interviews as examples for certain points. The subscripts at these citations
denote the interviewee ID from [6].
To assess the current state of code review tools, we combined information
from our interviews and from the websites of the respective tools. To a limited
degree, we also executed and tried some of the tools.

3

What Do We Know About Code Reviews?

A lot of research has been done on code reviews and inspections, and still many
questions could not be answered conclusively. But some results are relatively
well supported and a subset of these will form the foundation of our discussion:
The ﬁrst such result concerns which factors have a major and which only
a minor inﬂuence on the eﬀectiveness and eﬃciency of reviews. When analyzing experimental data, Porter et al. “found that [reviewers, authors, and code
units] were responsible for much more variation in defect detection than was
process structure”, and they “conclude that better defect detection techniques,
not better process structures, are the key to improving inspection eﬀectiveness.”
[21]. A similar conclusion is reached by Sauer et al., who identify “individuals’

On the Need for a New Generation of Code Review Tools

303

task expertise as the primary driver of review performance” based on theoretical
considerations. Correlations between the (inspection) expertise of the reviewer
and the number of found defects have also been reported by Rigby [23] and by
Biﬄ and Halling [8], just to name a few. We conclude that the major factors
inﬂuencing code review eﬀectiveness and eﬃciency are the reviewer, its relation
to the artifact under review and the way in which it performs the checking.
The second important result is about the role of understanding the artifact
under review. In their study based on interviews with developers at Microsoft,
Bacchelli and Bird found that “[m]any interviewees eventually acknowledged
that understanding is their main challenge when doing code reviews” [2], which
conﬁrmed earlier results from Tao et al. [26]. Further support for a positive
correlation between code understanding and review eﬀectiveness comes from
experiments by Dunsmore, Roper and Wood [12]. Our interview results fully
support these ﬁndings, e.g.: “I have to understand what the other developer
thought at that time. And for that you look very closely at the code, and then
things that should or could be done better somehow come up automatically”3 .

4

The Problem of Large Changes

In our interviews, we asked about problems hampering review eﬀectiveness. One
of the most common themes was the diﬃculty to understand and review large
changesets: “Smaller commits are generally not a problem. But these monster
commits are always . . . not liked very much by the reviewers.”5 “What sometimes
impedes me is when the ticket is just too big.”7 “When you have such a big pile
to review the motivation is not very high and you probably don’t approach the
review with the needed quality in mind.”12
The conclusion that large changesets are problematic can also be derived from
other research results: There is evidence that the review eﬀectiveness greatly
decreases when the review rate (checked lines of code/time for checking) is outside the optimal interval (see e.g. [15]). There is also evidence that concentration and therefore review eﬀectiveness fades after some time of reviewing [19,22].
Combining these values leads to an upper limit on the maximal size of an artifact
that can be reviewed eﬀectively in a single session.
Given the problems with the review of large changes, many teams resort to
the frequent review of small changes [23]. Up to a certain point, this is a good
thing to do, but there are also arguments in favor of larger changes and reviews:
The change under review should be self-contained, it should fulﬁll certain quality
criteria before central check-in (at least to be compilable) and reviewing very
small changes can lead to high overhead and duplicate work [26]. So instead of
forcing every change to be very small, we argue to make the review of larger
changes more eﬀective and let changes stay at their “smallest natural size”.

5

Tool Support to the Rescue

We substantiated in the previous sections that to increase the eﬀectiveness and
eﬃciency of code reviews for defect detection, we should focus on the reviewer

304

T. Baum and K. Schneider

Existing findings
- Code review effectiveness and
efficiency depend mainly on the
work of the reviewer and its fit
to the artifact under review.
- Understanding the review artifact
is the most important aspect of
reviewing code.
- Change-based code review is the
dominating style of code review
in practice.
- The review of large changes is
the most significant challenge in
change-based code review.

General improvement options

Possible tool features

Choose the best reviewer for
the job

Reviewer recommendation

Avoid large changesets
Shrink the changeset to be

=> reviewed

Help the reviewer to understand large changes
Decrease the need to understand the change

Semantic Diffs
Refactoring detection
Identification of low risk
change fragments
IDE-like navigation support
Summarizing/Visualizing the
change
Guiding the reviewer
(Integration of static analysis)

Fig. 1. Overview of argumentation and tool features

and how to help her/him understand large code changes better. We believe
that improved tool support provides a lot of opportunities in this regard, and
will give examples in the following subsections. Additionally, an overview of
our argumentation is shown in Fig. 1. The subsections correspond to the most
important inﬂuencing factors, deduced from the results mentioned so far:
–
–
–
–

Choose the best reviewer for the job (Sect. 5.1)
Shrink the size of the changeset that has to be reviewed (Sect. 5.2)
Help the reviewer to understand large changesets (Sect. 5.3)
Decrease the need to understand the change (Sect. 5.4)

5.1

Reviewer Recommendation

In recent years, there have been a number of studies on “reviewer recommendation”, i.e. on ﬁnding the best reviewer(s) for a given change (e.g. [3,29]). While
this promises a large eﬀect in theory, there are several problems reducing the
beneﬁt, especially in smaller teams. The most obvious is that in a small team, it
is often fairly easy to see who is a good reviewer for a change, so that computer
support does not provide large gains. In some other cases, the reviewer for a certain module is ﬁxed [5], so there is no choice at all. Additionally, always choosing
the best reviewer can lead to a high review load for experienced developers, and
a high workload has a negative impact on review quality [7]. Therefore reviewer
recommendation has to move from determining local optima for every single
review to a more global optimization of reviewer assignment.
5.2

Reducing Changeset Size

Given large changesets with singular changes of varying relevance for the review
goals, reviewers try to manually pick the relevant subset. This is seen as hard and
error-prone: “After some time you get a feeling which ﬁles are relevant and which
are not, but it’s hard to ﬁlter them out. And when I don’t look at them there
might be some change in there that was relevant, anyway. That’s problematic.”8

On the Need for a New Generation of Code Review Tools

305

An important special case is systematic changes, especially rename and move
refactorings. This special case has been studied for example by Thangthumachit,
Hayashi and Saeki [28] and Ge [14]. For the more general case, Kawrykow and
Robillard [18] developed a method to identify “non-essential” diﬀerences. Zhang
et al. [30] describe the tool “Critics” to help in inspecting systematic changes
using generic templates. Tao and Kim [27] propose an approach to partition
composite code changes. Further research could provide a better foundation to
decide which changes are low-risk, and it could look into the distinction between
change fragments that are error-prone and need to be checked in detail and
change fragments that only need to be read to help understanding. Another
research avenue is to include more data, such as test coverage information, to
assess review relevance. Nevertheless, much could already be gained by bringing
the promising existing results into wider use.
5.3

Support for Understanding the Change

A theme that occurred throughout our interviews is that large changes are best
reviewed with the search and hyperlinking support of an IDE (e.g. “I think
reviewing code purely in’Crucible’ only works for trivialities. Because naturally
many features are missing that you have in an IDE.”2 ). This improvement has
already made its way into some widely used review tools, either by making IDElike support available in a browser (e.g. “Upsource”1 ) or by making the review
tool available as an IDE plugin (e.g. “AgileReview”2 or “EGerrit”3 ).
Many of our interviewees try to get a high-level understanding of the change
at the start of the review (“at ﬁrst an overview because otherwise the problem is that you loose sight of the interrelation of the changes”10 ). The current
support for this activity is very limited, consisting mainly of the overview of
the commit messages of the singular commits belonging to the change. There is
relatively little research on visualizing and summarizing code changes for better understanding: McNair, German and Weber-Jahnke propose an approach
to visualize change-sets [20], as do Gomez, Ducasse and D’Hondt [16]. In addition, several textual summarization techniques have been proposed (e.g. [9]). A
related technique that can help to summarize the contents of a change is “change
untangling” [4,11,27]. We believe that more research on these topics is needed
to make change visualization eﬀectively usable by reviewers.
After having an overview of the changes, the reviewer needs to step through
the change’s details in some order. Many reviewers try to ﬁnd an order that helps
their understanding, but often fall back to the order presented by their review
tool: “The problem is you sometimes get lost and don’t ﬁnd a good starting
point.”10 “If you don’t have that, you just step through the ﬁles in the commit
one after another . . . ”10 . A similar ﬁnding resulted from a study by Dunsmore,
Roper and Wood where participants suggested “ordering of code” to improve
1
2
3

https://www.jetbrains.com/upsource/.
http://www.agilereview.org.
https://www.eclipse.org/egerrit/.

306

T. Baum and K. Schneider

inspections [13]. Guiding the reviewer as proposed here shares some similarities
with the reading techniques studied intensively for inspections [1,10]. The main
diﬀerence is that these reading techniques try to change the way the reviewer
works, while the proposed guiding moves some cognitive load from the human
reviewer to the tool. In addition, most reading techniques proposed so far are
not intended to be used with changesets, so that research opportunities abound
in this area.
5.4

Decrease the Need for Code Understanding

From a theoretical point of view, reducing the need to understand the code is
another possibility to solve the stated problem. Essentially this is a question of
eﬃciency: Is in-depth code review the most eﬃcient way to ﬁnd a certain defect
type or are there more eﬃcient ways, e.g. static code analysis or testing? [15,25] As
long as there are practically relevant defect types for which in-depth code review is
most eﬃcient, understanding the code will still be needed. And when there will be
no such defect types anymore, for example after a breakthrough in static analysis
research, code review in its current form will not be needed any longer for defect
detection. Therefore, we won’t discuss this topic further in this article.

6

A New Generation of Code Review Tools

About a decade ago, Henrik Hedberg proposed a classiﬁcation of software inspection/review tools into generations [17]. He concluded that the coming ﬁfth generation should provide ﬂexibility with regard to the supported documents and
processes and that they should comprehensively include existing research results.
This prediction has come true (with limitations): Current review tools like “Gerrit”4 , “Crucible”5 or “Collaborator”6 are ﬂexible and commonly support the
review of any kind of text ﬁle. In the preceding sections, we derived opportunities to reach a higher level of review eﬀectiveness. For most of them, a reiﬁcation
from the review of changes in text ﬁles to the review of changes in source code
has to take place. Flexibility is traded for better reviewer support to some degree.
This leads us to expect the rise of the sixth generation of code review tools, the
generation of “cognitive support review tools”.

7

Summary

We collected four ﬁndings on code review we regard as well established: (1) Code
review eﬀectiveness and eﬃciency depend to a large degree on the reviewer, its
style of work and its ﬁt to the artifact under review. (2) Understanding the
review artifact is the most important aspect of reviewing code. (3) Review in
4
5
6

https://www.gerritcodereview.com.
https://www.atlassian.com/software/crucible.
https://smartbear.com/product/collaborator/.

On the Need for a New Generation of Code Review Tools

307

industry is commonly done change-based. (4) The review of large changes is
the most signiﬁcant challenge in code review. Based on these assumptions we
derived leverage points to improve review eﬀectiveness and eﬃciency through
tool support. For each of these points, we surveyed existing research and state
open research questions. In our own work, we currently look into some of these
research questions. We believe there is an abundance of open questions for other
researchers to join us in our eﬀorts to lay the foundation for the generation of
“cognitive support review tools”.

References
1. Aurum, A., Petersson, H., Wohlin, C.: State-of-the-art: software inspections after
25 years. Softw. Test. Veriﬁcation Reliab. 12(3), 133–154 (2002)
2. Bacchelli, A., Bird, C.: Expectations, outcomes, and challenges of modern code
review. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 712–721. IEEE Press (2013)
3. Balachandran, V.: Reducing human eﬀort and improving quality in peer code
reviews using automatic static analysis and reviewer recommendation. In: Proceedings of the 2013 International Conference on Software Engineering. IEEE Press
(2013)
4. Barnett, M., Bird, C., Brunet, J., Lahiri, S.K.: Helping developers help themselves:
automatic decomposition of code review changesets. In: Proceedings of the 2015
International Conference on Software Engineering. IEEE Press (2015)
5. Baum, T., Liskin, O., Niklas, K., Schneider, K.: A faceted classiﬁcation scheme
for change-based industrial code review processes. In: 2016 IEEE International
Conference on Software Quality, Reliability and Security (QRS). IEEE (2016)
6. Baum, T., Liskin, O., Niklas, K., Schneider, K.: Factors inﬂuencing code review
processes in industry. In: Proceedings of the ACM SIGSOFT 24th International
Symposium on the Foundations of Software Engineering. ACM (2016)
7. Baysal, O., Kononenko, O., Holmes, R., Godfrey, M.W.: Investigating technical and
non-technical factors inﬂuencing modern code review. Empir. Softw. Eng. 21(3),
932–959 (2016). doi:10.1007/s10664-015-9366-8
8. Biﬄ, S., Halling, M.: Investigating the inﬂuence of inspector capability factors with
four inspection techniques on inspection performance. In: Eighth IEEE Symposium
on Software Metrics, 2002, Proceedings, pp. 107–117. IEEE (2002)
9. Buse, R.P., Weimer, W.R.: Automatically documenting program changes. In: Proceedings of the IEEE/ACM international conference on Automated software engineering, pp. 33–42. ACM (2010)
10. Denger, C., Ciolkowski, M., Lanubile, F.: Investigating the active guidance factor in
reading techniques for defect detection. In: International Symposium on Empirical
Software Engineering, 2004, Proceedings, pp. 219–228. IEEE (2004)
11. Dias, M., Bacchelli, A., Gousios, G., Cassou, D., Ducasse, S.: Untangling ﬁnegrained code changes. In: 2015 IEEE 22nd International Conference on Software
Analysis, Evolution and Reengineering, pp. 341–350. IEEE (2015)
12. Dunsmore, A., Roper, M., Wood, M.: The role of comprehension in software inspection. J. Syst. Softw. 52(2), 121–129 (2000)
13. Dunsmore, A., Roper, M., Wood, M.: Systematic object-oriented inspection - an
empirical study. In: Proceedings of the 23rd International Conference on Software
Engineering, pp. 135–144. IEEE Computer Society (2001)

308

T. Baum and K. Schneider

14. Ge, X.: Improving tool support for software developers through refactoring detection. Ph.D. thesis, North Carolina State University (2014)
15. Gilb, T., Graham, D.: Software Inspection. Addison-Wesley, Wokingham (1993)
16. Gómez, V.U., Ducasse, S., D’Hondt, T.: Visually characterizing source code
changes. Sci. Comput. Program. 98, 376–393 (2015)
17. Hedberg, H.: Introducing the next generation of software inspection tools. In:
Bomarius, F., Iida, H. (eds.) PROFES 2004. LNCS, vol. 3009, pp. 234–247.
Springer, Heidelberg (2004). doi:10.1007/978-3-540-24659-6 17
18. Kawrykow, D., Robillard, M.P.: Non-essential changes in version histories. In: Proceedings of the 33rd International Conference on Software Engineering, pp. 351–
360. ACM (2011)
19. Laitenberger, O., Leszak, M., Stoll, D., El Emam, K.: Quantitative modeling of
software reviews in an industrial setting. In: Sixth International, Software Metrics
Symposium, 1999, Proceedings, pp. 312–322. IEEE (1999)
20. McNair, A., German, D.M., Weber-Jahnke, J.: Visualizing software architecture
evolution using change-sets. In: 14th Working Conference on Reverse Engineering,
2007, WCRE 2007, pp. 130–139. IEEE (2007)
21. Porter, A., Siy, H., Mockus, A., Votta, L.: Understanding the sources of variation
in software inspections. ACM Trans. Softw. Eng. Methodol. (TOSEM) 7(1), 41–79
(1998)
22. Raz, T., Yaung, A.T.: Factors aﬀecting design inspection eﬀectiveness in software
development. Inf. Softw. Technol. 39(4), 297–305 (1997)
23. Rigby, P.C.: Understanding open source software peer review: review processes,
parameters and statistical models, and underlying behaviours and mechanisms.
Ph.D. thesis, University of Victoria (2011)
24. Rigby, P.C., Bird, C.: Convergent contemporary software peer review practices. In:
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering,
pp. 202–212. ACM (2013)
25. Roper, M., Wood, M., Miller, J.: An empirical evaluation of defect detection techniques. Inf. Softw. Technol. 39(11), 763–775 (1997)
26. Tao, Y., Dang, Y., Xie, T., Zhang, D., Kim, S.: How do software engineers understand code changes? an exploratory study in industry. In: Proceedings of the ACM
SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM (2012)
27. Tao, Y., Kim, S.: Partitioning composite code changes to facilitate code review.
In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories
(MSR), pp. 180–190. IEEE (2015)
28. Thangthumachit, S., Hayashi, S., Saeki, M.: Understanding source code diﬀerences
by separating refactoring eﬀects. In: 2011 18th Asia Paciﬁc Software Engineering
Conference (APSEC), pp. 339–347. IEEE (2011)
29. Thongtanunam, P., Tantithamthavorn, C., Kula, R.G., Yoshida, N., Iida, H., Matsumoto, K.-I.: Who should review my code? a ﬁle location-based code-reviewer
recommendation approach for modern code review. In: 2015 IEEE 22nd International Conference on Software Analysis, Evolution and Reengineering (SANER)
(2015)
30. Zhang, T., Song, M., Pinedo, J., Kim, M.: Interactive code review for systematic
changes. In: Proceedings of 37th IEEE/ACM International Conference on Software
Engineering. IEEE (2015)

Process Improvement

GQM+Strategies and IDEAL: A Combination
of Approaches to Achieve Continuous SPI
An Experience Report in a Large Multi-industry
State-Owned Company
Gustavo López ✉ , Alexia Pacheco, Francisco Cocozza, Diana Garbanzo,
Brenda Aymerich, and Gabriela Marín
(

)

Universidad de Costa Rica, San José, Costa Rica
{gustavo.lopez_h,alexia.pacheco,francisco.cocozzagarro,
diana.garbanzo,brenda.aymerich,gabriela.marin}@ucr.ac.cr

Abstract. GQM+Strategies is an approach that aligns the business goals at each
level of an organization to strategies and assesses the achievement of goals. The
IDEAL model is an organizational improvement model. In this paper, we present
our experiences applying the IDEAL model and GQM+Strategies to conduct
continuous software process improvement (SPI) and establish a measurement
program in a large multi-industry state-owned company. Our goal is to provide
evidence of the use of these methods and models in such complex scenarios. The
motivation for this paper was the lack of “from the trenches” perspectives on SPI
in this kind of contexts. The main challenges faced during the experiences
reported in this paper include: rigid control structures used to manage and monitor
IT investment, inadequate or incomplete use of other measurement methods, and
lack of continuous improvement culture (due to many years in a monopolistic
industry). Moreover, we present ways in which we combined GQM+Strategies
and the IDEAL model to deploy a continuous process improvement program in
a context of limited resources and serious business threats, and to convince the
company employees of the need for process improvement.
Keywords: Software engineering · Software process improvement · IDEAL
model · GQM+Strategies · Measurement · Large multi-industry state-owned
company · Experience report

1

Introduction

Software Process Improvement (SPI) is a complex and expensive task. SPI becomes
even harder to implement when the company in which is going to be applied is either
non software-driven or managers have not realized that it is in fact a software-driven
company.
Nowadays, most companies are software-driven because productivity, time-tomarket, and time-to-decide are the most critical diﬀerentiators in competitive industries.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 311–326, 2016.
DOI: 10.1007/978-3-319-49094-6_20

312

G. López et al.

Software processes have a large impact not only on the organizations following them,
but also on the products being developed [1]. Moreover, the impact of process change
can range from outstanding to chaotic for the companies undertaking SPI initiatives.
Traditionally, small and medium sized enterprises (SME) have worried about fast
return of investment (RoI), and large companies have worried about large RoI. However,
the volatility in most industries is forcing a change in these perspectives [2].
In this paper, we present an experience report in which a SPI initiative is undertaken
in a large multi-industry company. Two major characteristics of this company are that
it is a state-owned company and that one of the industries in which it performs recently
moved from a monopolistic industry to a competitive one. Furthermore, this company
is part of a larger corporation.
One challenge of implementing a SPI program in this context is the fact that the
company investment in software development is high; however, most of the investment
goes to third parties providing services or products (e.g., outsourcing, solution and
infrastructure providers). Another complication is that, since it is a state-owned
company, rigid control structures are used to manage and monitor incomes and expenses.
However, the largest challenge is that the fact of being in a competitive environment
is only starting to sink in the company employees’ minds. Moreover, old monopolistic
practices became instantly obsolete when the market opened for competition. To comply
with regulations, to invest eﬃciently in SPI and to speed up the employee adoption of
new practices, we combined a systematic approach for SPI (i.e., IDEAL model (hence‐
forth IDEAL)) and a goal alignment method (i.e., GQM+Strategies).
In order to become a high performance company, goals must be understood, stratiﬁed
by levels and linked to one another [3, 4]. We applied GQM+Strategies to try to achieve
this goal. We applied IDEAL to comply with internal and external regulations (e.g.,
audits, comptroller reviews, and even internal questionings).
This paper presents the eﬀorts of an academic research team jointly with company
representatives to execute a SPI in a complex company. The goal of this research is to
assess the applicability of IDEAL and GQM+Strategies combined to conduct a SPI
eﬀort. This combination was necessary due to the necessity of explicitly stating the SPI
plan.
Even though GQM+Strategies is based on a method for SPI [5], it is focused on
measurement and goal alignment. The use of Balanced Scorecards [6, 7] is an established
practice in the company. Therefore, using other performance management tool (as
managers originally saw GQM+Strategies) to drive SPI was not conceptually correct,
as it would be contradictory with traditional practices.
This rest of the paper is structured as follows: Sect. 2 presents theoretical back‐
ground; Sect. 3 delves in the application context in which we implemented GQM+Strat‐
egies and IDEAL, Sect. 4 presents the diﬀerent ways in which we saw ﬁt to combine
GQM+Strategies and IDEAL, Sect. 5 describes the main results of our experiences. In
Sect. 6 we discuss our experiences and detail lessons learned. Finally, Sect. 7 presents
our ﬁnal considerations.

GQM+Strategies and IDEAL

2

313

Theoretical Background

2.1 GQM+Strategies
GQM+Strategies is a measurement approach that helps align the goals and strategies of
an organization. A goal is a description of a status in which the organization wants to
be in the future. A strategy deﬁnes the actions that the organization will carry to achieve
such status [5].
The core of GQM+Strategies is based on the fact that goals must be measurable in
order to assess if they are accomplished or not. Strategies are deﬁned considering context
factors and organizational assumptions that could aﬀect their outcome [5].
GQM+Strategies requires interpretation models that would help managerial level
decision-makers to understand the measurement results. GQM+Strategies consists of 6
phases [5, 8]:
• Initialize. In this phase is intended to assure commitment and resources for applying
GQM+Strategies. Moreover, the responsibilities are deﬁned and involved people are
trained to understand the approach. The application process is planned and the scope
is deﬁned.
• Characterize the Environment. The context is key in the application of GQM
+
Strategies, therefore, this stage is intended to characterize the organization in order
to understand the “best” approach to achieve goal alignment.
• Deﬁne Goals, Strategies, and Measurement. GQM+Strategies main tool is the
GQM+Strategies Grid. The grid is an instrument that shows the organization goals
and their relation. Moreover, the goals are linked with strategies across diﬀerent units.
The goals also have a measurement model for evaluating goal attainment, and guide‐
lines for interpreting the measurement data. The grid is a live tool (i.e., it changes
according to the organization’s reality).
• Plan Grid Implementation. In this phase, plans are prepared for implementing,
measuring and deploying the strategies described in the grid. At this point, speciﬁc
processes must be executed to implement the strategies deﬁned in the GQM+Strat‐
egies Grid (i.e., GQM+Strategies does not propose methods to implement the strat‐
egies).
• Execute Plans. In this phase, strategies are implemented and measured according to
the plans. Plan execution triggers a cycle within itself, therefore, the plan should have
been design to assess the execution in order to continue the implementation of strat‐
egies across the organization. GQM+Strategies proposes the implementation of strat‐
egies in small units at ﬁrst, to validate and assess their adequacy.
• Analyze Outcomes. The idea of analyzing the outcomes of the plan execution phase
lies in the assessment of either a serious problem or a successful deployment of the
strategy addressed.
• Package Improvements. The ﬁnal phase of GQM+Strategies is designed to learn
from the process, and modify the grid according to the lessons learned. Moreover,
the authors of GQM+Strategies state that this phase must communicate the outcomes
to provide visibility and to initiate a new cycle.

314

G. López et al.

2.2 The IDEAL Model
The IDEAL model is deﬁned as “A life cycle approach for process improvement” [9,
10]. It was developed by the Software Engineering Institute (SEI) at Carnegie Mellon
University. IDEAL oﬀers guidelines for the selection, implementation and evaluation
of software processes and methods in an organization. Moreover, it provides strategies
to understand when the process improvement might be required. IDEAL consists of 5
phases and a total of fourteen activities, which address the speciﬁc requirements of one
complete cycle of the model. Each phase is composed of several activities.
• Initiating. This is the starting point of IDEAL once the stimulus for improvement is
evident; the main infrastructure is established, as well as roles, responsibilities, and
high level goals. Moreover, this phase sets the context and sponsorship for the
improvement. The main product of this phase is a plan to guide the process improve‐
ment through the organization. IDEAL has some key groups that lead or execute
tasks. In the initiating phase, the Management Steering Group (MSG) and the soft‐
ware engineering process group (SEPG) are deﬁned and conformed.
• Diagnosing. This phase sets the baselines for the process improvement. It is aimed
to identify the infrastructure, characterize current practices and deﬁne improvement
recommendations. It’s a high level diagnosis of the organization’s vision, business
plans and current process maturity. The recommendations that arise from this activity
provide the basis for an action plan draft.
• Establishing. In this phase the organization prioritizes the improvement activities,
based on the diagnosis ﬁndings, diﬃculty of implementing the improvements, and
other factors. With the improvements prioritized, the plans are created and measur‐
able goals are deﬁned, these goals must be aligned with the ones established in the
initiating phase. Monitoring metrics are deﬁned, and the resources required to
perform the work are committed. In this phase the technical working groups (TWGs)
that will execute the improvement are deﬁned.
• Acting. This is the execution phase; solutions to address the areas for improvements
are created, piloted, and deployed throughout the organization. This phase has two
main purposes: assess the organization’s readiness to adopt the new practices, and
document the experiences gathered from the pilot implementations of the improve‐
ments, in order to ease the institutionalization. After assessing the success of the
improvement, it is planned and executed, and the installation is tracked.
• Leveraging. The ﬁnal phase is conducted to learn from the experiences and to deter‐
mine if the goals have been met. Moreover, this phase is crucial to improve the actions
that will be executed in further IDEAL cycles. An evaluation of the strategy, methods
and infrastructure is conducted.
This section theoretically described GQM+Strategies and IDEAL. These methods
were selected to be implemented in a large multi-industry state-owned company. Even
though authors consider important the comparison of these approaches with related
works, lack of space to deliver a proper comparison left this part of the work out of this
research paper. Next section describes the application context.

GQM+Strategies and IDEAL

3

315

Application Context

The experiences reported in this paper are the result of a SPI application in a large multiindustry state-owned company. Moreover, this company is part of a state-owned corpo‐
ration. The company has more than 15.000 employees on the payroll. It is multi-industry
working in the communications and electricity generation contexts. One of which
recently transitioned from a monopoly to a competitive environment. The company is
state-owned (i.e., part of the budget is managed by the government).
Figure 1 shows the SPI scope within the main corporation. Even though the SPI is
only being applied in only one of the companies, it is expected for the investment to be
replicable in all other companies. The main reason for this is that most of the investment
comes from public funds.

Fig. 1. SPI scope within the corporation (ellipsis represent further levels; dark boxes represent
the main scope of the SPI application reported in this paper)

The SPI initiative comes from the IT Department of Company 2. This IT Department
provides services for all three management departments within the company; however,
it is under the supervision of the administrative management.
Industries 1 and 2 have fairly diﬀerent dynamics, the most important being that one
still works in a monopolistic context and the other recently opened to competition.
Furthermore, the main corporation and all its companies have rigorous controls because
they are state-owned, and lack process change experiences.
Even though, the necessity to adapt to the new competition rules and to reduce
expenses drove the company to conduct a SPI program, IDEAL was used to conduct it
systematically, and to comply with regulations and controls. One of the company’s goals
is to become a high performance enterprise, and improve its internal capabilities to
reduce the need for third parties.
We realized that in order to boost the SPI progress, visibility and transparency of
investment was required. Moreover, goal alignment was necessary to help managers
and employees to dimension the impact of the SPI. To achieve this goal GQM

316

G. López et al.

+

Strategies was a perfect fit. However, the first obstacle for GQM+Strategies was the
entrenched use of the Balanced Scorecard.
It is not that balanced scorecard is not an appropriate performance management
framework, but when applied without clear goals and other additional controls are used,
it can easily become a pitfall for the organization applying it [11].
In an eﬀort to visualize the SPI impact, we conducted GQM+Strategies as a process
improvement and understood the alignments and possible beneﬁts in the combination
of the two approaches (i.e., GQM+Strategies and IDEAL).

4

Combining GQM+Strategies and IDEAL

GQM+Strategies and IDEAL are methods proposed by recognized authors in the soft‐
ware engineering ﬁeld. The ﬁrst is a method that provides concepts and phases to link
goals and strategies across an organization using a measurement based approach [8], the
second a continuous SPI model [9, 10].
GQM+Strategies developers explicitly state the relation between GQM+Strategies
and continuous SPI: “GQM+Strategies is helpful to measure the contribution of the
improvement initiative. Moreover, GQM+Strategies could be used to improve the goal
alignment of continuous SPI approaches and to operationalize corresponding improve‐
ment initiatives” [5].
It is clear that the core of GQM+Strategies is a SPI method [12]. However, in our
experiences we believe that combining GQM+Strategies with other (not necessarily
measurement-driven) SPI model such as IDEAL can be beneﬁcial. For instance, in a
state-owned company that already has measurement models, GQM+Strategies could
not be used to drive a SPI initiative. Therefore, combining it with IDEAL provide a good
ﬁt to assure measurement alignment and convince management that the SPI is being
driven in a formal way.
IDEAL is being applied to create a company level program for software process
improvement. IDEAL was selected due to its success in several contexts [13–15], and
recommendations of expert consultants. The following sections describe the ways in
which we combined GQM+Strategies and IDEAL to support the SPI program.
4.1 Base Application of IDEAL and GQM+Strategies as the Method for
Measurement Process Improvement
In the context of a SPI application in a large company, we decided to address SPI with
IDEAL as our main guide. Therefore, we executed the Initiating phase. The stimulus
for improvement was primarily due to the industry competition, and the ever changing
requirements in the software development areas of the company.
The sponsorship was clearly deﬁned since members of the corporation requested the
involvement of an academic team to enhance their probabilities of succeeding in the
implementation of the SPI.
The Diagnosing phase started, interviews, observations and surveys were applied
both at tactical and managerial levels of the IT department (Fig. 1). This resulted in a

GQM+Strategies and IDEAL

317

list of over 30 improvement opportunities to be addressed. The diagnosis was based on
four dimensions: people, technology, process and culture.
Findings and recommendations were prioritized and the three most important
improvement opportunities were selected to be addressed in the ﬁrst iteration of the
Establishing phase. Plans to incorporate agile software development practices, DevOps
[16, 17] and improve measurement were developed. Both in agile software development
and DevOps, further diagnosis was required in order to establish a concrete implemen‐
tation plan. On the other hand, software metrics was already being addressed. In parallel,
academics and employees of the corporation were working in the application of GQM
+
Strategies. Figure 2 shows how the introduction of GQM+Strategies was implemented
as a part of the IDEAL model.

Fig. 2. Application of GQM+Strategies as the framework to implement measurement process
improvement within the IDEAL model

After this stage, we realized the possibilities in which GQM+Strategies and IDEAL
could be combined to provide systematic SPIs and goal alignment through the company
at the same time.
The process concluded at IDEAL’s acting phase with a retrospective session for
leveraging and the GQM+Strategies implementation continued (i.e., in our implemen‐
tation of IDEAL and GQM+Strategies, we concluded the IDEAL cycle and maintained
the GQM+Strategies cycle speciﬁcally for the measurements process improvement and
goal alignment).
It is important to mention that one of the a priori decisions was to improve the meas‐
urement process. However, the sponsors of the SPI originally, did not want to invest in
the implementation of GQM+Strategies. Once they realized the results of the GQM
+
Strategies Grid implementation, and the amount of duplicated eﬀort in similar goals
across the organization, their perspectives changed.

318

G. López et al.

This approach to combine IDEAL and GQM+Strategies is valuable since the appli‐
cation of GQM+Strategies assures that all improvements will be aligned with organi‐
zational and business goals, and the application of IDEAL makes sure that the improve‐
ments are performed in a systematic and well documented way, easing the institution‐
alization of the successful improvements at company level, and possibly, at corporation
level.
4.2 GQM+Strategies Results as Inputs for IDEAL’s Initiating, Establishing,
Diagnosing and Leveraging Phases
To execute IDEAL’s initiating phase, the organization requires identifying personnel
that would launch the improvement. Moreover, the business critical needs need to be
fully understood [10]. Assuming that the organization is already applying GQM+Strat‐
egies, that input will be available, since the discovery of qualiﬁed personnel to improve
processes is also a requirement for GQM+Strategies and the business goals will be
aligned with lower level goals that can be addressed in the SPI program.
IDEAL’s diagnosing phase requires the organization’s vision, business plan, and SPI
goals to be synergistic [10]. This is also one of the main results of applying GQM
+
Strategies. In order to establish process baselines in the diagnosing phase, GQM+Strat‐
egies metrics could be used to lighten baseline implementation.
Table 1 shows how the results of GQM+Strategies phases match IDEAL phase
inputs, reducing the eﬀort required to implement IDEAL.
Table 1. Match between GQM+Strategies results and IDEAL inputs

The eﬀort of applying GQM+Strategies to gain only these advantages will not be
beneﬁcial. However, in a context like the one we are working in, with GQM+Strategies
being used to align organizational goals and deﬁne progress measurement strategies,
and IDEAL to guide other SPI initiatives, the results of GQM+Strategies are really
beneﬁcial to reduce the work required for the IDEAL model execution.

GQM+Strategies and IDEAL

319

The eﬀorts executing GQM+Strategies were very useful to drive the SPI program,
from the goal prioritization, to SPI progress measurement. Furthermore, the GQM
+
Strategies Grid, can be used to track and identify SPI impact within the organization.
Figure 3 illustrates how we combined GQM+Strategies outputs to boost IDEAL’s
execution.

Fig. 3. Illustration of how we used GQM+Strategies results to boost IDEAL implementation in
processes other than measurement

4.3 GQM+Strategies Metrics as Measuring Inputs for IDEAL’s Action Phase
The proper application of GQM+Strategies requires an eﬀort in deﬁning GQM graphs
including measurement goals, questions, metrics, and interpretation models. IDEAL
creators propose a similar approach, distributed in the Establishing and Acting phases.
In our GQM+Strategies application we found a large amount of goals, and an external
contractor suggested metrics for each one of them. This approach helped us to deﬁne
how the Action phase of IDEAL was going to be measured.
It is clear that the metric proposals will require reﬁnement, based on the execution
plans for the Action phase. However, metrics were proposed following experiences and
case studies reported in literature, good measurement practices or standards. Therefore,
they can be a helpful guide to measure SPI progress.
Currently, two improvements are being conducted: Implementing agile practices in
the software development process and implementing DevOps culture. However, none
of them have reached IDEAL’s acting phase. Therefore, this proposal needs to be
assessed in order to measure the real value of having a generic metric proposal while
implementing IDEAL. Figure 4 illustrates how we propose to use GQM+Strategies
metrics in speciﬁc tasks within IDEAL’s acting phase.

320

G. López et al.

Fig. 4. Illustration of how we used GQM+Strategies results to boost IDEAL’s acting phase (this
ﬁgure has a higher granularity than Figs. 2 and 3 as it provides insights for tasks within the acting
phase)

4.4 Mapping Between IDEAL’s Acting Phase and GQM+Strategies Execute
Strategies Phase
The last combination of GQM+Strategies and IDEAL that we identiﬁed was to use
IDEAL’s artifacts in the Execute Plans phase of GQM+Strategies. In GQM+Strategies,
plans are designed to implement and deploy “improvements” through a series of
projects, in order to achieve a GQM+Strategies Grid goal. These improvements are
assessed and replicated in the organization.
In our context, the approach proposed by GQM+Strategies authors is too lightweight
(i.e., people within the organization could question the rigor of the process). Therefore,
using IDEAL to execute GQM+Strategies strategies can provide the necessary docu‐
mentation to demonstrate the process ﬂow and progress. Even though the ﬁnal assess‐
ment would be provided by the changes in the metric established for the goal that wants
to be achieved. The process is as important as the results in order to allow replication in
the organization.
Since we deﬁned that IDEAL would be the guide for process improvement before‐
hand, we did not implement GQM+Strategies execute strategies phase. However, we
considered that this approach will also be beneﬁcial if these two methods are combined.

5

Results

The main result of the IDEAL model application was the documentation of the SPI
program eﬀorts. It is important to mention that we adapted the instruments proposed by
the authors or the model in order to lighten them just enough to allow agility while
complying with the regulations.
Other important result of applying IDEAL was the establishment of the MSG and
SEPG (i.e., IDEAL’s permanent work forces [9]), because these groups of people have
enough credentials for decision making regarding the SPI and have inﬂuence on the
budget assignment.

GQM+Strategies and IDEAL

321

GQM+Strategies Grid was a valuable instrument to assess the eﬀorts proposed in a
large company. However, the eﬀorts required to execute the initialize, characterize the
environment, and the deﬁne goals, strategies, and measurement phases was very high.
These high costs are a reﬂection of a segmented company with duplicated goals at
management levels.
Figure 5 shows the GQM+Strategies Grid structure result of the ﬁrst GQM+Strategies
iteration and of the review of a large number of assets within the company. In Fig. 5,
dark boxes represent goals and white boxes represent strategies for those goals. The
GQM+Strategies Grid contains 79 goals and more than 150 strategies.

Fig. 5. GQM+Strategies grid. This Figure illustrates the complexity of goals in the company.

322

G. López et al.

One of the most important tasks of the GQM+Strategies application was to identify
duplicated goals or strategies. Those duplicates were uniﬁed in the GQM+Strategies
Grid, however, the process allowed us to identify which managements were planning
on implementing which improvements.
As it was mentioned, other important result of GQM+Strategies was the deﬁnition
of generic metrics for each goal. These metrics are a valuable asset to assess the
improvements once they start.

6

Discussion

As we showed in previous section, the combination of GQM+Strategies and IDEAL is
a good ﬁt for this project context. Even though we know that regulatory policies must
be changed, in order to allow agility, this change should be gradual and we found a good
tool in IDEAL to start this change.
GQM+Strategies is also a great method, especially in these multi-industry contexts.
The use of the GQM+Strategies Grid provided great visibility of the eﬀorts performed
in the SPI program.
The following sections describe the experiences, lessons learned of the combination
of GQM+Strategies and IDEAL. Moreover, we describe the unsolved problems, work
in progress and future work.
6.1 Experiences and Lessons Learned
We believe that IDEAL is a very structured way to conduct SPI programs. Since the
context in which SPI is being conducted is used to the application rigorous methods
IDEAL is a ﬁt. IDEAL only provides a model to conduct the improvements and does
not suggest ways to implement them; therefore, it must be combined with other methods
to assure acceptance within the organization.
In a similar way GQM+Strategies is a very structured way to align goals within the
organization and to deﬁne metrics. However, it does not provide insights on the way in
which strategies should be executed to achieve those goals. Therefore, it must be
combined with other models to attain the organization’s regulations (i.e., execute strat‐
egies in a structured way).
We know that these regulations are somehow obsolete. In a world in which software
processes are becoming more and more agile every day, the necessity of ordered
prescriptive processes is lower and lower. Since the context is a state-owned company
that is used to perform task in certain ways, the combination of this GQM+Strategies
and IDEAL is a great match.
In the beginning or our SPI, we suﬀered from managerial absence and lack of
commitment. After gaining conﬁdence with senior management, we suﬀered from
exhausted middle managers that did not want to free resources from their daily tasks to
participate in the SPI process. This problem was solved with the change of managers
and the participation of new resources speciﬁcally selected by the new managers to drive
the SPI process.

GQM+Strategies and IDEAL

323

The application of GQM+Strategies and the creation of the GQM+Strategies Grid
allowed us to regulate manager’s enthusiasm and help them realize the amount of low
level goals deﬁned in the organization. The largest version of the GQM+Strategies Grid
contained more than 79 diﬀerent goals and over 150 strategies associate to those goals.
Therefore, when the IDEAL application began, they were able to estimate the amount
of eﬀort required to execute a process improvement initiative and the possible impacts
of those processes for the organization.
On the other hand, we assessed a large amount of documentation. In some of the
documents a lightweight Capability Maturity Model Integration (CMMI) appraisal was
conducted and the “ideal” maturity level pre-established by the organization was 5 (i.e.,
optimized). By applying GQM+Strategies identiﬁcation and selection of existing assets,
we recognized these unrealistic expectations and were able to discuss them with middle
and senior management before implementing any improvement.
Other challenge that was faced during the ﬁrst iteration of the SPI was ﬁnding the
“right” time to execute the improvements. Middle managers were too busy with their
daily tasks and the improvements were stalled. In order to avoid obsolescence of action
plans, we decided not to continue the SPI until the sponsorship changed, and eﬀorts to
change managerial perspectives were made. Finally, with the change of managers, the
sponsorship arrived and the process continued. This could not have been possible
without the application of IDEAL guiding the SPI. We would have realized the lack of
middle managerial support when the improvement was going to be deployed rather than
at the beginning of the process. To avoid resource and time waste, we invested the stall
time to proceed with GQM+Strategies application.
As for the measurement, we realized that, due to lack of a better measurement infra‐
structure, the metrics were calculated using surrogate measures. Moreover, communi‐
cation problems avoided the proper use of gathered information in the decision-making
process. Some of these issues still persist, and we are working on solving them.
Other problem identiﬁed by applying GQM+Strategies and IDEAL was the lack of
an outsourcing governance program (i.e., standardized outsourcing practices). This issue
was identiﬁed while applying GQM+Strategies and helped us to include some gover‐
nance aspects during the IDEAL execution and helped the company to recognize the
business risks inherent to lack of outsourcing management standards.
Even though it might be considered a current agile trend, iterative SPI allowed us to
demonstrate progress in short periods of time. Moreover, by holding retrospectives we
endorsed continuous improvement of the SPI deﬁnition and implementation.
One of the best ways to provide visibility to a SPI program is by actually improving.
However, process selection based on priorities is required for large scale impact. The
big bang approach is rejected in our application context. We know that current SPI
program expectations will take several years to be met. However, to increase produc‐
tivity some changes can be quick ﬁxes. For instance, moving to agile outsourcing
contracts [18] and training inside employees to work with contractors in this way.
Finally, our SPI program is aligned with technical training courses and two graduate
programs: one on software development and the other on management information
systems. These linked programs are expected to help employees understand the new
processes, concepts, techniques and problem domains.

324

G. López et al.

6.2 Unsolved Problems and Work in Progress
One problem that rose from the application of GQM+Strategies was that, since the
method has not been widely adopted in the company, other eﬀorts performed to deﬁne
goals and strategies were not aligned with the GQM+Strategies Grid. For instance, the
latest IT Department plan was released and misalignments with the GQM+Strategies
Grid were found. This issue was ﬁxed by iterating over the GQM+Strategies Grid deﬁ‐
nition to include new goals and strategies. However, these eﬀorts and its costs could
have been mitigated if the methods and information were uniﬁed.
Other unresolved problem is the change in control structures. IDEAL’s documen‐
tation would help to provide new models to control agile projects; however, auditors,
comptrollers, and other actors need to change their perspectives towards agile practices
for the SPI program to work. This issue is directly linked with the company’s culture.
Other pending task is to adopt and boost a continuous improvement culture. In our
context, this task has become diﬃcult, because some of the employees still have practices
aligned with non-competitive markets. We expect to change the culture using the lead
by example approach. This change is required in order to be able to meet the company’s
needs and business goals. Moreover, a bottom up approach can be used to demonstrate
senior management that SPI can really increase RoI and transform their vision and will‐
ingness to support and launch a long term continuous SPI program.
We have already deﬁned some improvements by working with the management
steering group, software engineering process group, and technical working groups.
However, a pending task is planning how to get other employees to accept these new
processes and participate in other improvements.
In our context, the MSG was matched with other committee that was already in place
when the SPI began. Nonetheless, the participants of this board have traditional prac‐
tices, and do not match the agility requirements neither for IDEAL nor for GQM+Strat‐
egies. Therefore, eﬀort should be placed to change these practices.
One of the major diﬃculties that we have faced is that the company has very struc‐
tured controls and hierarchical interaction approaches, therefore, some employees have
aversion to be held responsible for some tasks. We know that in order for the SPI program
to be successful, everyone should be held responsible for the improvements.

7

Conclusion

In this paper we presented our experiences applying the IDEAL model and GQM+Strat‐
egies to conduct continuous software process improvement and establish a measurement
program in a large multi-industry state-owned company. Also, we suggested several ways
in which IDEAL and GQM+Strategies can be used to complement each other.
We were forced to use a systematic approach for SPI, because state-owned compa‐
nies have rigorous controls over resource investment and are accustomed to use heavy
methodologies in most of their processes. However, we expect to change or adapt
IDEAL, once a continuous improvement culture becomes part of the company.
We believe that the experience presented in this paper could be beneﬁcial for other
endeavors in the application of GQM+Strategies that face organizational resistance. We

GQM+Strategies and IDEAL

325

conducted a parallel eﬀort of GQM+Strategies and IDEAL up to the point in which we
could convince the company representatives that IDEAL was a formality. If the SPI is
conducted in a measurement driven way, the formality is not necessary as the progress
can be measured by results rather than documents generated. However, the application
of IDEAL is not trivial. Therefore, a signiﬁcant investment was covered while these
eﬀorts were conducted in a parallel way.
The use of GQM+Strategies was crucial to provide the goal alignment required to
implement a SPI program. We believe that, if it was not for the eﬀort invested in
constructing the GQM+Strategies Grid, we would not even found half of the already
proposed strategies to improve processes in the company. Even performing the IDEAL’s
diagnosing phase in a strict and deep way would not provide the assets that GQM
+
Strategies did.
We are still working on timing. In order to improve processes and pilot solutions,
internal resources are required. However, the company’s culture is still too rigid to free
the necessary resources to implement improvements. We are working in two speciﬁc
process improvements: agile development and DevOps; nevertheless, these eﬀorts are
not suﬃciently strong to produce the impact expected.
We present these experiences because there is a lack of “from the trenches” perspec‐
tive on SPI in this kind of contexts, and we hope for these experiences to be useful for
other researchers or companies implementing SPI programs.
Acknowledgment. This work was partially supported by Research Center for Communication
and Information Technologies (CITIC) at University of Costa Rica. Grant No. 834-B4-412. We
would also like to thank all the consultants that participated in this research including but not
limited to: Dr. Marcelo Jenkins (current minister of Science and Technology of Costa Rica), Dra.
Alexandra Martínez, Dr. Carlos Castro, Mag. Mauricio Arroyo, and Mag. Marcela Chacón.

References
1. Alagarsamy, K., Justus, S., Iyakutti, K.: Implementation speciﬁcation for software process
improvement supportive knowledge management tool. IET Softw. 2, 123–133 (2008)
2. Clarke, P., O’Connor, R.V.: The Inﬂuence of SPI on business success in software SMEs. J.
Syst. Softw. 85, 2356–2367 (2012)
3. Pfeﬀer, J.: Seven practices of successful organizations. Calif. Manage. Rev. 40, 96–124 (1998)
4. Muda, A.L., Fook, C.Y., Noordin, N.M.: The relationship between learning culture and high
performance and productivity culture with job satisfaction: a study among employees in one
public organization in Sarawak, Malaysia. In: Fook, Y.C., Sidhu, K.G., Narasuman, S., Fong,
L.L., Abdul Rahman, B.S. (eds.) 7th International Conference on University Learning and
Teaching (InCULT 2014), Proceedings: Educate to Innovate, pp. 17–25. Springer, Singapore
(2016)
5. Basili, V.R., Trendowicz, A., Kowalczyk, M., Heidrich, J., Seaman, C., Lindvall, M., Munch,
J.: Aligning Organizations Through Measurement - The GQM+Strategies Approach. Springer,
Switzerland (2014)
6. Kaplan, R.S., Norton, D.P.: The balanced scorecard - measures that drive performance. Harv.
Bus. Rev. 70(1), 71–79 (1992)

326

G. López et al.

7. Kaplan, R.S.: Conceptual foundations of the balanced scorecard. In: Handbooks of
Management Accounting Research, Vol. 3, pp. 1253–1269 (2008)
8. Basili, V.R., Lindvall, M., Regardie, M., Seaman, C., Heidrich, J., Rombach, D., Trendowicz,
A., Münch, J., Rombach, D., Trendowicz, A.: Linking software development and business
strategy through measurement. Comput. (Long. Beach. Calif.) 43, 57–65 (2010)
9. Mcfeeley, B.: IDEAL: a user’s guide for software process improvement, Pittsburgh,
Pennsylvania, USA (1996)
10. Gremba, J., Myers, C.: The IDEAL model: a practical guide for improvement. Bridg. Softw.
Eng. Inst., 23(3) (1997)
11. Fink, A., Marr, B., Siebe, A., Kuhle, J.: The future scorecard: combining external and internal
scenarios to create strategic foresight. Manag. Decis. 43, 360–381 (2005)
12. Basili, V., Caldiera, G., Rombach, H.D.: Experience factory. In: Encyclopedia of Software
Engineering. John Wiley & Sons, Inc. (2002)
13. Casey, V., Richardson, I.: A practical application of the IDEAL model. In: Oivo, M., KomiSirviö, S. (eds.) Product Focused Software Process Improvement, pp. 172–184. Springer,
Heidelberg (2002)
14. Klendauer, R., Hoﬀmann, A., Leimeister, J.M., Berkovich, M., Krcmar, H.: Using the IDEAL
software process improvement model for the implementation of Automotive SPICE. In: Cooperative and Human Aspects of Software Engineering, pp. 66–72. IEEE (2012)
15. Pino, F.J., García, F., Piattini, M.: Software process improvement in small and medium
software enterprises: a systematic review. Softw. Qual. J. 16, 237–261 (2008)
16. Swartout, P.: Continuous Delivery and DevOps: A Quickstart guide. Packt Publishing,
Birmingham (2014)
17. Verona, J.: Practical DevOps. Packt Publishing, Birmingham (2016)
18. Larman, C., Vodde, B.: Practices for Scaling Lean & Agile Development: Large, Multisite,
and Oﬀshore Product Development with Large-Scale Scrum. Addison-Wesley Professional,
Reading (2010)

On the Role of Software Quality Management
in Software Process Improvement
Jan Wiedemann Jacobsen1 , Marco Kuhrmann1(B) , Jürgen Münch2 ,
Philipp Diebold3 , and Michael Felderer4
1

4

The Mærsk Mc-Kinney Møller Institute,
University of Southern Denmark, Odense, Denmark
janwj12@student.sdu.dk, kuhrmann@mmmi.sdu.dk
2
Herman Hollerith Center, Reutlingen University, Böblingen, Germany
juergen.muench@reutlingen-university.de
3
Fraunhofer Institute for Experimental Software Engineering,
Kaiserslautern, Germany
philipp.diebold@iese.fraunhofer.de
Institute of Computer Science, University of Innsbruck, Innsbruck, Austria
michael.felderer@uibk.ac.at

Abstract. Software Process Improvement (SPI) programs have been
implemented, inter alia, to improve quality and speed of software development. SPI addresses many aspects ranging from individual developer
skills to entire organizations. It comprises, for instance, the optimization
of speciﬁc activities in the software lifecycle as well as the creation of
organizational awareness and project culture. In the course of conducting
a systematic mapping study on the state-of-the-art in SPI from a general
perspective, we observed Software Quality Management (SQM) being of
certain relevance in SPI programs. In this paper, we provide a detailed
investigation of those papers from the overall systematic mapping study
that were classiﬁed as addressing SPI in the context of SQM (including
testing). From the main study’s result set, 92 papers were selected for
an in-depth systematic review to study the contributions and to develop
an initial picture of how these topics are addressed in SPI. Our ﬁndings
show a fairly pragmatic contribution set in which diﬀerent solutions are
proposed, discussed, and evaluated. Among others, our ﬁndings indicate
a certain reluctance towards standard quality or (test) maturity models
and a strong focus on custom review, testing, and documentation techniques, whereas a set of ﬁve selected improvement measures is almost
equally addressed.
Keywords: Software process improvement · Software quality management · Software test · Systematic mapping study · Systematic literature
review

1

Introduction

To organize software development companies look for Software Process Improvement (SPI; [19]) allowing them to analyze and to continuously improve their
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 327–343, 2016.
DOI: 10.1007/978-3-319-49094-6 21

328

J.W. Jacobsen et al.

development approaches. In the course of conducting a systematic mapping study
[24], SPI was mentioned a diverse ﬁeld: many SPI facets are studied, several hundreds of custom SPI approaches were proposed, e.g., to address weaknesses of
standard approaches like CMMI [34], SPI success factors are collected and analyzed, and new trends such as SPI employing agility as improvement principle
are addressed. SPI thereby aims at improving companies’ competitiveness and
is considered important regardless of a company’s size [16].
Besides accelerated development procedures, the quality of the software products developed is another important criterion (cf. Bennett and Weinberg [4], who
found bug ﬁxing cost increasing by magnitudes in later lifecycle phases). Therefore, improving the quality of software and determining the economic value [14],
notably for small and very small companies [28] is of certain relevance. For those
companies, emphasizing quality is crucial, as software testing is a strenuous and
expensive process [5] consuming up to 50 % of the total development costs [17].
Therefore, improving the quality management and, in particular, the software
test activities provide a perfect starting point for improving the software process
and hence product quality.
Problem Statement and Objective. SPI programs have been implemented to
improve product quality and speed of software development and have shown
impact [2]. Also, software quality assurance techniques play an important role
to guarantee and improve quality. Yet, the role of software quality assurance and
SQM in SPI programs has not explicitly been investigated so far. The objective
of this research is therefore to analyze the literature to characterize the role of
SQM in SPI.
Contribution. This paper provides an overview of the study population on SPI
with a special focus on SQM and shows how these studies are evaluated. It
presents the software quality assurance techniques and improvement measures
addressed in SPI. Our ﬁndings show indication that SPI in the context of SQM
is equally focussed on software testing as well as on complementing (support)
activities including reviews and documentation techniques. Furthermore, our
ﬁndings show a trend towards utilizing individual testing approaches rather than
implementing/following standards.
Context: A Systematic Mapping Study on SPI. This study is grounded in a comprehensive systematic mapping study on the state of SPI of which the ﬁndings
where published in [24] (to which we refer to as the main study). Outcomes of
this study show SPI being an actively researched topic, yet lacking theories and
models. Instead, the ﬁeld of SPI is shaped by a constant rate of approx. 10–12
new SPI models per year. These trends observed were used to form topic clusters
of which one cluster addresses Software Quality Management and Software Test.
The study at hand investigates this particular cluster in more detail utilizing a
systematic review (cf. Sect. 3).

On the Role of Software Quality Management

329

Outline. The remainder of the paper is organized as follows: Sect. 2 discusses
related work. In Sect. 3, we describe our research approach, before we present
the results of our study in Sect. 4. We provide a discussion on the results in
Sect. 5 and conclude the paper in Sect. 6.

2

Related Work

In (general) SPI, diﬀerent topics are researched in secondary studies. For
instance, Monteiro and Oliveira [31], Bayona-Oré [3], and Dybå [7] study SPI
success factors, while Helgesson et al. [15] and van Wangenheim et al. [36] review
maturity models, and Hull et al. [18] review diﬀerent assessment models. These
exemplarily mentioned studies show that the SPI community has started the
search for generalizable knowledge. Yet, the mentioned studies address more
general SPI issues.
The study at hand is the ﬁrst literature study explicitly dedicated to the role
of Software Quality Management (SQM) and Test Process Improvement (TPI)
in SPI. It is, however, related to other reviews and secondary studies in SPI,
TPI, and the improvement of other analytical and constructive software quality
aspects. For instance, regarding TPI, Afzal et al. [1] provide a systematic review,
which identiﬁed 18 approaches and their characteristics, and an industrial case
study on two prominent approaches, i.e., TPI Next and TMMi. Authors found
that many of the test process improvement approaches do not provide suﬃcient
information nor do the approaches include assessment instruments. A systematic review by Garcia et al. [10] identiﬁed 23 test process models, many of them
adapted from TMMi and TPI. Reviews and comparisons of TPI models are also
covered by a number of industrial white papers (so-called “grey literature”, e.g.,
[21,27]), which points to the practical relevance of this ﬁeld. At the more general level of analytical veriﬁcation and validation processes, Farooq and Dumke
[9] discuss research directions for the improvement of veriﬁcation and validation
processes. Authors identify research challenges concerning quantitative management, improvement of existing approaches, approaches for emerging development
environments as well as empirical investigation of success factors and tool selection. Regarding constructive software quality aspects, several systematic reviews
(e.g., for software documentation [39]) are available, but reviews discussing these
quality aspects in relation to SPI are missing so far.
All these representatively selected studies address speciﬁc topics, yet, they
do not contribute to a more general perspective on SPI in the context of SQM.
The paper at hand thus ﬁlls a gap in literature by collecting and analyzing
publications that emphasize SPI in the SQM context and, therefore, also lays
the foundation to direct future research in this ﬁeld in SPI research.

3

Research Design

This study is an in-depth analysis of a data subset identiﬁed in a systematic
mapping study [24]. In this section, we present the research design including

330

J.W. Jacobsen et al.

research questions, data collection and analysis procedures, as well as considerations on the study’s validity. Our research approach for the present study follows
the procedures applied in [25]; an in-depth analysis of SPI in Global Software
Engineering.
3.1

Research Questions

In the course of analyzing the selected papers on SQM, this study aims to answer
the following research questions:
RQ 1 What is the study population on SPI with a special focus on SQM? This
research question aims at capturing the ﬁeld of SPI from the perspective of
quality management and test. It also helps positioning the sub-study to the
main study.
RQ 2 Which software quality assurance techniques and improvement measures
are addressed in SPI? Based on 58 new metadata attributes, this research question aims at determining the diﬀerent quality assurance techniques and improvement measures addressed by SPI.
RQ 3 How are studies on SQM in SPI evaluated? This research question is
concerned with the determination of the impact of the investigated studies, in
particular, to determine the rigor and relevance [20] of the result set.
3.2

Data Collection Procedures

Being a study on a data subset (see also [25]), in this study, we had no need for
an explicit and self-contained data collection. Input data was obtained from the
main study’s result set [24], which we refer to as the study’s raw data. The selection of the data of interest in the raw data was carried out by selecting all publications from the raw data having the attributes “Quality Management” and/or
“Test” set (Fig. 3), which initially results in 96 publications. The resulting subset
(to which we refer to as the study data) was then copied to an own spreadsheet.
To improve the reliability of the data analysis, two external researchers joined
the team. Finally, two researchers carried out the data selection and cleaning
procedures and the initial data analysis, one researcher was concerned with the
deﬁnition of the extended metadata set and the data classiﬁcation and analysis,
and the two remaining researchers took over quality assurance tasks.
Having the study data available, in the course of downloading all selected
papers, an initial quality assurance was performed. This quality assurance led to
the exclusion of four papers (reasons: misclassiﬁcation, violation of language constraints). Those papers’ metadata was updated, such that they will be returned
to the main study (Sect. 6). Eventually, 92 papers remained in the cleaned study
dataset, which where then analyzed as described in Sect. 3.3.

On the Role of Software Quality Management

3.3

331

Analysis Procedure

As “preparatory” study with the purpose of getting the big picture, the main
study was conducted as a systematic mapping study following the guidelines
as proposed by Petersen et al. [32]. The present study however aims to deliver
more insights and details and, thus, is carried out also using the systematic
review instrument as described by Kitchenham and Charters [23]. In particular,
during the paper download and quality assurance, the initial metadata set (40
attributes, Fig. 3) was revisited and, if necessary, updated. Furthermore, with
calling in an external researcher (an expert in quality management and testing),
the set of metadata was substantially extended by 58 extra attributes in nine
new metadata categories (see Fig. 4).
During the analysis, each paper was inspected by two researchers, who
checked (and if necessary revised) the initial values of the metadata, provided
an initial assignment of values to the new attributes, and developed a paper
summary of 2–3 sentences. Finally, to evaluate the papers regarding their rigor
and relevance, we applied the model proposed by Ivarsson and Gorschek [20]
to complete the picture. These steps were iteratively double-checked by a third
researcher, and ﬁnally independently checked by the two researchers concerned
with (general) quality assurance. The analysis as such utilizes descriptive statistics (e.g., charts and tables), whereas we mainly rely on bubble-charts and heat
maps.
3.4

Validity Procedures

To improve the validity of the results, we applied the following measures: First,
we called in two external researchers and formed two teams. Team 1 (3 persons)
conducted the data analysis, while team 2 (2 persons) was taking over the quality
assurance. Second, in the data analysis phase, team 1 re-applied the procedures
of the main study [24], i.e., all papers were re-inspected to check the correct
assignment and to complete the assignment of the 40 metadata attributes. Third,
in the inspection, the assignment of the attributes (40: main study, 58: new,
scoped), and the evaluation according to the rigor-relevance model [20] were
carried out using the systematic review instrument [23] using the full text of the
study-relevant papers.

4

Study Results

In this section, we present the results of the study. We start with an overview
of the study population, before we present the results of the analyses structured
according to the research questions in Sects. 4.1, 4.2 and 4.3. Section 5 presents
an integrated discussion of the results obtained from the study.
In total, 92 papers remained in the study data set for inspection. Figure 1
provides an overview of the publication frequency in the study timeframe. In
general, in the result set, we see about 3 and 4 papers on the topic of interest

332

J.W. Jacobsen et al.

Fig. 1. Number of publications on SPI with a focus on software quality management
and/or testing (n = 92). The graph includes two trend lines to visualize the long-term
development of the ﬁeld (calculation basis: mean, 3-year (black) and 10-year (red)
period), which show periodical waves, but also a continuously growing general interest.
(Color ﬁgure online)

published per year, but Fig. 1 also shows a ﬁrst big jump in 1998 (from there on,
the average publication frequency is 5+ papers per year). In subsequent sections,
we provide further details and analyze them in relation to the trend observed.
4.1

RQ1: General Study Population

In this section, we ﬁrst give an overview of the general study dataset using the
instruments from the main study [24] to allow for comparability. Figure 2 provides an integrated overview of the study dataset according to the classiﬁcation
using the standard schemas (research type facet (RTF) according to Wieringa et
al. [37]) and contribution type facet (CTF) according to Petersen et al. [32]).
Figure 2 shows the studied publications forming two CTF-clusters. In particular, SPI with a special emphasis on software quality management and software test is mainly reported as framework or as lessons learned, whereas the
framework-classiﬁed papers usually propose solutions and the lessons learned
emerge from experience and evaluation research. Furthermore, a considerable
share of the lessons learned papers are classiﬁed as philosophical papers, i.e., secondary studies or discussion/comparison papers. In line with the ﬁndings from
the main study [24], models and theories are in the minority or missing. Another
(unexpected) ﬁnding is the small number (only 2 out of 92 papers) of tool-related
publications. However, although tools are underrepresented in the “formal” literature, in [11], authors argue that more tool-related material can be found in
the “grey literature”. Insofar, the chart from Fig. 2 can be considered consistent
with the ﬁndings from [11].
Figure 3 shows the classiﬁcation of the study dataset using the metadata
system introduced in [24]. Regarding the process dimension, the study dataset
shows a strong focus on general improvement and custom models. Furthermore,
standard SPI and maturity models (CMMI and ISO/IEC 15504) are addressed,
but we can also see a certain focus on general measurement (and assessment)
activities. Regarding the context dimension, in the lifecycle phases, only project
management is signiﬁcantly represented showing the close relation of project-

On the Role of Software Quality Management

333

Fig. 2. Classiﬁcation of the study dataset according to the RTF and CTF schemas.

and quality management. Other lifecycle phases are scarcely addressed, which
suggests the publications from the dataset being narrowly scoped. Concerning the application domain, the classiﬁcation does not highlight any favorite,
i.e., SQM and testing are considered relevant in all application domains. Finally,
regarding the company size and scale group, publications address companies of
all sizes. Furthermore, globally distributed development is also addressed by the
study dataset. Figure 3 shows the studied ﬁeld mostly researched in a practical
manner, i.e., case study research is found the most frequently used instrument.
The ﬁgure shows that a number of multi-case or longitudinal studies are available (which is above the general tendency observed in the main study), yet, still,
replication research is absent.
4.2

RQ2: Improvement Measures and Quality Assurance
Techniques

To investigate which improvement measures and quality assurance techniques
are addressed by the study dataset, we extended the metadata system from
[24] and deﬁned 58 new attributes for classifying the papers under study. We
added “Quality Management and Testing” as new dimension, and we reﬁned
this dimension into nine groups (Fig. 4). For space limitations, in the following,
we provide the big picture in Fig. 4, but focus on the groups “Improvement Measures” and “Quality Assurance Techniques”. The big picture in Fig. 4 shows the
groups test activity, non-functional testing, and level of testing well covered. Furthermore, the dataset provides rich information regarding the groups improvement measures and quality assurance techniques. However, especially regarding test maturity models (or “standardized” testing approaches in general),

334

J.W. Jacobsen et al.

Fig. 3. Overview of the diﬀerent standard metadata attributes addressed over time.
The darker the color, the more papers in a year have this attribute assigned, whereas
one paper can have multiple attributes assigned. (Color ﬁgure online)

the dataset provides only little information, which indicates to a conﬁrmation of
the observed trend from [24] regarding the reluctance towards standardization—
also for quality management and testing (and as initially found in [26]).
Regarding the groups “Improvement Measures” and “Quality Assurance
Techniques”, in the data, we see a fairly balanced distribution, i.e., a variety of
topics is equally researched. The only remarkable outlier is the attribute software
infrastructure. Favorites regarding the improvement measures are the improvement of defect handling (50 mentions), cost and time optimization (54 and 56
mentions). Regarding the quality assurance techniques, review (62), as well as
testing and documentation (60 mentions each) are the most frequently mentioned ones. Subsequent sections provide further details for the aforementioned
two “favorite” groups.

On the Role of Software Quality Management

335

Fig. 4. Overview of the 58 new metadata attributes addressed over time. The darker
the color, the more papers in a year have this attribute assigned, whereas one paper
can have multiple attributes assigned. (Color ﬁgure online)

4.3

RQ3: Evaluation of Software Quality Management and Software
Testing

In this section, we limit our analysis to the groups improvement measures and
quality assurance techniques. As a ﬁrst step, we review the study methods applied
to the papers reporting knowledge in the groups of interest. In the second step,
the publications contributing to the groups of interest are evaluated according
to the rigor-relevance model [20] to allow for rating the (general) impact of the
diﬀerent topics.

336

J.W. Jacobsen et al.

Fig. 5. Overview study types applied to the groups improvement measures and quality
assurance techniques. (Color ﬁgure online)

Methods Applied. Figure 5 provides a heat map summarizing the study types
applied to investigate the diﬀerent topics. The overview shows that SPI in the
context of SQM is a fairly practically researched ﬁeld. The majority of the papers
assessed combine diﬀerent research methods, whereas case study research is the
most used approach—quite often in a mixed-method approach and also implementing a multi-case or longitudinal study approach (for term deﬁnitions, see
Wohlin et al. [38]). A remarkable insight is the absence of replication research.
Secondary studies and research based on Grounded Theory is present in the
study data set, yet the action research approach prevails. Regarding the topic
clusters, from the data, we see the cluster “Improvement Measures” fully covered, whereas in the cluster “Quality Assurance Technique” the topics software
infrastructure, traceability, training, and other are only partially covered.
Evaluation of Rigor and Relevance. In the second step, we evaluate the papers
within the groups of interest for their rigor and relevance according to [20]. In
the overall dataset, 58 out of 92 papers are rated highly relevant (4 points), and
of those, 37 papers are rated of high to very high rigor (2–3 points). In the following, we break-down our analysis to the groups “Improvement Measures” (Fig. 6)
and “Quality Assurance Techniques” (Fig. 7). In Sect. 5, we use the following
presentation to direct the detailed discussion.
Figure 6 visualizes the six topics within the group “Improvement Measures”
and shows that the topics of favor in these groups are (general) quality criteria,
defects, cost, and time. Research addressing the improvement of risk management
is, so far, underrepresented and of less rigor and relevance. Remarkable, the
majority of the papers in the aforementioned four categories is considered highly
relevant (score 4).
Regarding the group “Quality Assurance Techniques”, Fig. 7 shows the following topics of relevance: review, testing, documentation, guideline, and training. The groups guideline and training comply with an expectation when coming
from the ‘pure’ SPI perspective—a focus on methods, their documentation (as
guideline) and training. Among the more ‘applicable’ techniques, review, testing, and (test) documentation show a clear focus of the study data, whereas the

On the Role of Software Quality Management

337

Fig. 6. Classiﬁcation of the study dataset (attributes from “Improvement Measures”)
according to the rigor-relevance model.

techniques static analysis and verification are not that present in the data. More
“sophisticated” topics, such as traceability and software infrastructures are (yet)
not well represented in the study data.

5

Study Summary and Discussion

To provide a in-depth discussion, we ranked the highest rated papers regarding their coverage of improvement measures and quality assurance techniques
(Figs. 6 and 7; both based on the classiﬁcation according to the rigor-relevance
model). Table 1 summarizes these papers for the two categories “Improvement
Measure” and “Quality Assurance Technique”, whereas we only provide a subset
for the in-depth discussion. In particular, we select the papers [8,14,22,29] as
sample from the study data set, as we found those papers represented in both
categories.
Elliot et al. [8] document a methodology for implementing a software quality
management system (SQMS). Table 1 shows the method proposed addressing
quality management in general thus covering a number of attributes (in particular documentation, guideline, and training; reviews and (general) testing were
mentioned as concrete techniques to, inter alia, better address diﬀerent quality
criteria, especially in the “system use” section). Key factors for the successful

338

J.W. Jacobsen et al.

Fig. 7. Classiﬁcation of the study dataset (attributes from “Quality Assurance Techniques”) according to the rigor-relevance model.

implementation of the SQMS were staﬀ training and treating users like customers, which was also required for a cultural change within the organization.
Harter et al. [14] present a framework for assessing the economic value of
SPI and quality over the software lifecycle. The eﬀects to be measured are
deﬁned based on the number of defects (development quality: defects found
prior customer testing; conformance quality: defects found in customer testing prior acceptance)—similar measures are deﬁned for development eﬀort and
cycle time, and support costs. Therefore, in [14], authors mainly address the
attributes defects, cost, and time to conclude the economic value of SPI (Table 1).

On the Role of Software Quality Management

339

Table 1. Overview of the highest rated papers according to the rigor-relevance model
in the categories Improvement Measure and Quality Assurance Technique.
Impr. Measure

Paper

QA Technique

Quality Criteria [8, 13, 14, 22, 29, 30, 33] Review

Paper
[8, 14, 22, 29, 35]

Defects

[12–14, 22, 29, 30]

Static Analysis

[6, 35]

Risk

[12, 30]

Testing

[6, 8, 14, 22, 29, 35]

Cost

[8, 12–14, 22, 29, 30, 33] Verification

[22, 35]

Time

[8, 12–14, 22, 29, 30, 33] Documentation

[6, 8, 14, 22, 29, 35]

Guideline

[8, 14, 22]

Software Infrastructure —
Traceability
Other

[8, 12, 13, 30, 33]

[6, 22]

Training

[6, 8, 35]

Other

[6, 8, 22, 35]

Eventually, authors found that higher quality is associated with reduced cycle
times and development eﬀort, and that savings accrue due to reduced rework
and, moreover, that support activity savings outweigh development savings. Harter et al. conclude that future research eﬀorts should focus on how SPI strategies
aﬀect support activities.
Kasoju et al. [22] use evidence-based software engineering (EBSE) to help
an organization improve its testing process (domain: automotive software).
They use an in-depth investigation of automotive test processes using a mixedmethod approach including case study research, systematic reviews and value
stream analysis/mapping. For eight analyzed projects, authors collect information regarding the test approaches, project/system kind and size, and the
development approach used (Table 1; mainly attributes cost, time, testing, verification). In interview sessions, among other things, authors found interviewees
stating a lack of a clear test process, which can be applied to any project lifecycle. Only 3 out of 8 studied projects follow a deﬁned process (which indicates to
the mainly individual and non-standardized process selection as already found in
[26]; moreover, authors found that a basic testing strategy is actually deﬁned, yet
not implemented by most of the teams, which is also consistent with our previous
ﬁndings from [26]). Eventually, in [22], authors conclude strengths found for automotive software testing, such as work in small agile teams, implementing agile
(communication) practices, or diﬀerent approaches like exploratory testing. However, authors also mention that these ﬁndings also depend on project/team size,
i.e., teams of diﬀerent size might go for diﬀerent solution, e.g., comprehensive test
case management tools are considered more valuable for larger teams. Nevertheless, authors found process issues problematic for teams of any size (consistent
with [16]), e.g., lacking uniﬁed testing process, unawareness of the process, or
diﬀerent process-related constrains like available time windows. Finally, authors
identiﬁed seven wastes, which were mapped to the testing process to drive process
improvement.

340

J.W. Jacobsen et al.

Li et al. [29] describe how agile processes aﬀect software quality, software
defects and defect ﬁxing eﬃciency (Table 1; mainly attributes defects, testing,
time). A major ﬁnding is that a signiﬁcant reduction of defect densities or
changes of defect proﬁles could not be found after Scrum was used. Yet, due to
the iterative development approach, the development was considered more eﬃciently (e.g., fewer surprises, better control over the quality, and better schedule
adherence). However, on the downside, authors also mention that Scrum puts
more stress and time pressure on the developers (which could make them more
reluctant towards performing tasks relevant for later maintenance). In a nutshell, authors conclude that the actual development approach is less important
than iterative development and early testing (in their study, authors showed that
about half of the (critical) defects was identiﬁed and ﬁxed early thus reducing
the risk of ﬁnding bugs late).
Summarizing the big picture obtained (Fig. 4) and the exemplarily selected
papers (Table 1), we conclude: ﬁrst, testing as such is not that massively represented in the study data as expected. For this, we argue that there is specialized
(grey) literature on test process improvement (TPI), which is not properly linked
to SPI—a phenomenon that we already observed for GSE [25]. In particular, so
far, we did not found detailed data, e.g., regarding the actual impact of switching to an alternative test approach. On the other hand, we found indication for
individual and project-speciﬁc test approach selection (even in highly-regulated
domains; [22]), which conﬁrms a ﬁnding we made in [26]. Second, so far, we
found improving the quality focussing on reducing the number of defects. In
[22,29], the authors found a lack of uniﬁed (standardized) testing approaches
[22], and that the actual development approach (agile or traditional) seemingly
not aﬀects the defect densities or defect proﬁles. Harter et al. [14] suggest putting
more eﬀort in improving support activities. It therefore remains as a question
for future work whether an SPI program with a “broader” perspective is more
beneﬁcial then optimizing a “technical” test method.
Threats to Validity. In the following, we evaluate our ﬁndings and critically
review our study regarding the threats to validity. As a literature study, this
study suﬀers from potential incompleteness of the search results and a general
publication bias. Beyond this general threat to validity, we have to particularly
discuss the internal and external validity. The internal validity could be biased
by personal ratings of the researchers. To address this risk, we continued and
reﬁned our study [24], which follows a proven procedure that utilizes diﬀerent
tools and researcher triangulation to support dataset cleaning, study selection,
and classiﬁcation. The internal validity is also aﬀected by the limited data collection, in particular, no new data was collected, and data analyzed is derived from
the main study that serves as an umbrella. Calling in extra researchers to analyze and/or conﬁrm decisions therefore further increases internal validity. The
external validity is threatened by missing knowledge about the generalizability
of the results. Furthermore, this study “inherits” several limitations regarding
the external validity by relying on the main study’s raw data only. Consequently,
this study also inherits the main study’s scope thus having certain limitations

On the Role of Software Quality Management

341

regarding the generalizability. Nevertheless, to increase the external validity, further independently conducted studies are required to conﬁrm our ﬁndings.

6

Conclusion

The paper at hand provides an in-depth investigation of how software quality
management (SQM) is treated in software process improvement (SPI). Based on
a systematic mapping study [24], we selected all papers from the main study’s
dataset that address the topics SQM and software testing. In total, in this study,
we inspected 92 papers.
Our ﬁndings show indication that SPI in the context of SQM is equally
focussed on software testing as well as on complementing (or support) activities including reviews and documentation techniques. Furthermore, our ﬁndings show a trend in SPI towards utilizing individual testing approaches rather
than implementing/following standards. A detailed discussion of four exemplarily selected papers reveals that the actual software process is less relevant than
a smart arrangement of test activities (early testing) and an interactive implementation of the development process [29]. Furthermore, Harter et al. [14] suggest putting more eﬀort on supporting activities rather than optimizing (isolated)
technical tasks.
Limitations. Our study is limited by the context of the main study [24], yet showed
some overlap and similar trends as obtained in other independently conducted
studies, such as [11,26]. In total, only 92 papers were selected for analysis and,
therefore, this study cannot claim to have delivered a generalizable set of conclusions. A major limitation is the use of a given dataset only without an extra
topic-speciﬁc literature search, which potentially limits the reliability of the data.
An extension and a complementing search, however, is subject to future research.
Future Work. This paper provides the ﬁrst analysis iteration of the 92 papers
selected thus barely scratching the surface. Future work therefore includes further
detailed analyses of the study data. Furthermore, as being a study on a data subset, in future iterations, the data analyzed will be (re-)integrated with the main
study’s data to improve the overall data quality and reliability of the data.

References
1. Afzal, W., Alone, S., Glocksien, K., Torkar, R.: Software test process improvement
approaches: a systematic literature review and an industrial case study. J. Syst.
Softw. 111, 1–33 (2016)
2. Ashraﬁ, N.: The impact of software process improvement on quality: in theory and
practice. Inf. Manag. 40(7), 677–690 (2003)
3. Bayona-Oré, S., Calvo-Manzano, J., Cuevas, G., San-Feliu, T.: Critical success
factors taxonomy for software process deployment. Software Qual. J. 22(1), 21–48
(2014)

342

J.W. Jacobsen et al.

4. Bennett, T., Wennberg, P.: Eliminating embedded software defects prior to integration test. CROSSTALK J. Defense Softw. Eng., pp. 13–18 (2005)
5. Bertolino, A., Marchetti, E.: A brief essay on software testing. In: Software Engineering: Development Process, 3rd edn., vol. 1, pp. 393–411 (2005)
6. Damian, D., Zowghi, D., Vaidyanathasamy, L., Pal, Y.: An industrial case study
of immediate beneﬁts of requirements engineering process improvement at the
australian center for unisys software. Empirical Softw. Eng. 9(1), 45–75 (2004)
7. Dybå, T.: An instrument for measuring the key factors of success in software
process improvement. Empirical Softw. Eng. 5(4), 357–390 (2000)
8. Elliott, M., Dawson, R., Edwards, J.: An evolutionary cultural-change approach to
successful software process improvement. Software Qual. J. 17(2), 189–202 (2009)
9. Farooq, A., Dumke, R.R.: Research directions in veriﬁcation & validation process
improvement. ACM SIGSOFT Softw. Eng. Notes 32(4), 3 (2007)
10. Garcia, C., Dávila, A., Pessoa, M.: Test process models: systematic literature
review. In: Mitasiunas, A., Rout, T., O’Connor, R.V., Dorling, A. (eds.) Software Process Improvement and Capability Determination, pp. 84–93. Springer,
Heidelberg (2014)
11. Garousi, V., Felderer, M., Mäntylä, M.V.: The need for multivocal literature
reviews in software engineering: complementing systematic literature reviews with
grey literature. In: Proceedings of the 20th International Conference on Evaluation
and Assessment in Software Engineering, EASE 2016, pp. 26:1–26:6. ACM, New
York (2016)
12. Camargo, K.G., Ferrari, F.C., Fabbri, S.C.P.F.: Identifying a subset of TMMi practices to establish a streamlined software testing process. In: Brazilian Symposium
on Software Engineering, SBES, pp. 137–146. IEEE (2013)
13. Green, G.C., Hevner, A.R., Collins, R.W.: The impacts of quality and productivity
perceptions on the use of software process improvement innovations. Inf. Softw.
Technol. 47(8), 543–553 (2005)
14. Harter, D.E., Krishnan, M.S., Slaughter, S.A.: The life cycle eﬀects of software
process improvement: a longitudinal analysis. In: Proceedings of the International
Conference on Information Systems, ICIS, Atlanta, GA, USA, pp. 346–351. Association for Information Systems (1998)
15. Helgesson, Y.Y.L., Höst, M., Weyns, K.: A review of methods for evaluation of
maturity models for process improvement. J. Softw. Evol. Process 24(4), 436–454
(2012)
16. Horvat, R.V., Rozman, I., Györkös, J.: Managing the complexity of SPI in small
companies. Softw. Process Improv. Pract. 5(1), 45–54 (2000)
17. Huang, L., Boehm, B.: How much software quality investment is enough: a valuebased approach. IEEE Softw. 23(5), 88–95 (2006)
18. Hull, M., Taylor, P., Hanna, J., Millar, R.: Software development processes - an
assessment. Inf. Softw. Technol. 44(1), 1–12 (2002)
19. Humphrey, W.S.: Managing the Software Process. Addison Wesley, Boston (1989)
20. Ivarsson, M., Gorschek, T.: A method for evaluating rigor and industrial relevance
of technology evaluations. Empirical Softw. Eng. 16(3), 365–395 (2011)
21. Karthikeyan, S., Rao, S.: Adopting the right software test maturity assessment
model. Technical report, Cognizant (2014)
22. Kasoju, A., Petersen, K., Mäntylä, M.V.: Analyzing an automotive testing process
with evidence-based software engineering. Inf. Softw. Technol. 55(7), 1237–1259
(2013)

On the Role of Software Quality Management

343

23. Kitchenham,B., Charters, S.: Guidelines for performing systematic literature
reviews in software engineering. Technical Report EBSE-2007-01, Keele University (2007)
24. Kuhrmann, M., Diebold, P., Münch, J.: Software process improvement: a systematic mapping study on the state of the art. PeerJ Comput. Sci. 2(1), 1–38 (2016)
25. Kuhrmann, M., Diebold, P., Münch, J., Tell, P.: How does software process
improvement address global software engineering? In: International Conference on
Global Software Engineering, ICGSE, pp. 89–98. IEEE (2016)
26. Kuhrmann, M., Fernández, D.M.: Systematic software development: a state of the
practice report from Germany. In: International Conference on Global Software
Engineering, ICGSE, pp. 51–60. IEEE (2015)
27. Kumar, P.: Test process improvement - evaluation of available models. Technical
report, Maveric (2012)
28. Larrucea, X., O’Connor, R.V., Colomo-Palacios, R., Laporte, C.Y.: Software
process improvement in very small organizations. IEEE Softw. 33(2), 85–89 (2016)
29. Li, J., Moe, N.B., Dybå, T.: Transition from a plan-driven process to scrum: a
longitudinal case study on software quality. In: Proceedings of the 2010 ACM-IEEE
International Symposium on Empirical Software Engineering and Measurement,
ESEM 2010, pp. 13:1–13:10. ACM, New York (2010)
30. McGarry, F., Burke, S., Decker, B.: Measuring the impacts individual process
maturity attributes have on software products. In: Proceedings of Fifth International on Software Metrics Symposium, Metrics 1998, pp. 52–60. IEEE (1998)
31. Monteiro, L.F.S., de Oliveira, K.M.: Deﬁning a catalog of indicators to support
process performance analysis. J. Softw. Maintenance Evol. Res. Pract. 23(6), 395–
422 (2011)
32. Petersen, K., Feldt, R., Mujtaba, S., Mattson, M.: Systematic mapping studies in
software engineering. In: International Conference on Evaluation and Assessment
in Software Engineering, EASE, pp. 68–77. ACM (2008)
33. Pino, F.J., Garcı́a, F., Piattini, M.: Software process improvement in small and
medium software enterprises: a systematic review. Software Qual. J. 16(2), 237–
261 (2008)
34. Staples, M., Niazi, M., Jeﬀery, R., Abrahams, A., Byatt, P., Murphy, R.: An
exploratory study of why organizations do not adopt CMMI. J. Syst. Softw. 80(6),
883–895 (2007)
35. Sylemez, M., Tarhan, A.: Using process enactment data analysis to support orthogonal defect classiﬁcation for software process improvement. In: International Conference on Software Process and Product Measurement, IWSM-MENSURA, pp.
120–125, October 2013
36. von Wangenheim, C.G., Hauck, J.C.R., Salviano, C.F., von Wangenheim, A.:
Systematic literature review of software process capability/maturity models.
In: International Conference on Software Process Improvement and Capability
Determination-SPICE (2010)
37. Wieringa, R., Maiden, N., Mead, N., Rolland, C.: Requirements engineering paper
classiﬁcation and evaluation criteria: a proposal and a discussion. Requirements
Eng. 11(1), 102–107 (2005)
38. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering. Springer, Heidelberg (2012)
39. Zhi, J., Garousi-Yusifoğlu, V., Sun, B., Garousi, G., Shahnewaz, S., Ruhe, G.:
Cost, beneﬁts and quality of software development documentation: a systematic
mapping. J. Syst. Softw. 99, 175–198 (2015)

Transitioning Towards Continuous
Experimentation in a Large Software
Product and Service Development
Organisation – A Case Study
Sezin Gizem Yaman1(B) , Fabian Fagerholm1 , Myriam Munezero1 , Jürgen
Münch1,2 , Mika Aaltola3 , Christina Palmu3 , and Tomi Männistö1
1

2

Department of Computer Science, University of Helsinki,
P.O. Box 68, 00014 Helsinki, Finland
{sezin.yaman,fabian.fagerholm,myriam.munezero,
jurgen.munch,tomi.mannisto}@helsinki.fi
Reutlingen University, Danziger Straße 6, 71034 Böblingen, Germany
juergen.muench@reutlingen-university.de
3
Ericsson, Hirsalantie 11, 02420 Jorvas, Finland
{mika.aaltola,christina.palmu}@ericsson.fi

Abstract. Context: Companies need capabilities to evaluate the customer value of software-intensive products and services. One way of
systematically acquiring data on customer value is running continuous experiments as part of the overall development process. Objective:
This paper investigates the ﬁrst steps of transitioning towards continuous experimentation in a large company, including the challenges faced.
Method: We conduct a single-case study using participant observation,
interviews, and qualitative analysis of the collected data. Results: Results
show that continuous experimentation was well received by the practitioners and practising experimentation helped them to enhance understanding of their product value and user needs. Although the complexities of a large multi-stakeholder business-to-business (B2B) environment
presented several challenges such as inaccessible users, it was possible to
address impediments and integrate an experiment in an ongoing development project. Conclusion: Developing the capability for continuous
experimentation in large organisations is a learning process which can
be supported by a systematic introduction approach with the guidance
of experts. We gained experience by introducing the approach on a small
scale in a large organisation, and one of the major steps for future work
is to understand how this can be scaled up to the whole development
organisation.
Keywords: Continuous experimentation · Experiment-driven software
development · Product management · Lean startup · Customer development · Customer involvement · Organisational transition · Agile software
development · Case study
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 344–359, 2016.
DOI: 10.1007/978-3-319-49094-6 22

Transitioning Towards Continuous Experimentation

1

345

Introduction

Continuous experimentation is a software development approach where research
and development (R&D) activities are driven by constantly conducting experiments with product value [1–3]. Customers and users are involved in the decisionmaking process as experiment subjects, providing data by interacting with experiment materials, such as the software features being developed or related design
artefacts. Product value is tested by observing actual behaviour rather than
relying on secondary sources, opinions, or assumptions.
Although several approaches to experiment-driven software development
have been proposed (e.g. [1,2,4]), guidance is lacking on how development
teams in large organisations with complex business partnership networks can
adopt them. In this paper, we investigate the introduction of continuous experimentation in a large software development organisation in a B2B domain. We
observe diﬀerent roles, means of communication, and integration with the overall development process. Furthermore, we investigate how customers and users
are accessed and involved. We collect observed challenges and lessons learned
that arise when the teams attempt to perform experiments to support decisionmaking. More speciﬁcally, we seek to answer the following research question:
RQ: How can a large software development organisation transition towards continuous experimentation in a B2B domain?
In order to address the research question, we conducted a single-case study
in which we observed and participated in the introduction of continuous experimentation in a large company. Two teams, a development and a UX team,
collaborated to select a target for experimentation and to design and implement an experiment to help make a focused product decision. Through the case
study, we uncovered some of the critical factors that may support or impede the
transition.
The rest of this paper is structured as follows. Section 2 presents the background and related work relevant to this study. Section 3 describes the research
approach, including the context in which the case study was conducted, and the
data collection and analysis methods. The design and execution details of an
experiment conducted by the case company are detailed in Sect. 4. The transition process towards continuous experimentation is outlined in Sect. 5. The
ﬁndings are discussed and the research question is addressed in Sect. 6. Section 7
concludes the paper and highlights potential future work.

2

Background and Related Work

Considering product value as a ﬁrst-class concept in software development was
proposed in value-based software engineering (VBSE) [5]. VBSE asserts that
instead of treating software engineering as value-neutral, its major artefacts and
activities should be analysed to assess what value they provide to customers
and users, and use knowledge of that value in decision-making. Value has also

346

S.G. Yaman et al.

been considered in agile software development [6,7] and in approaches to product
development and entrepreneurship such as Lean Startup [8], Customer Development [9], and Lean Analytics [10]. A body of literature is emerging in software
research that addresses this and related topics. In this section, we review a selection from this set of related work.
To survive and compete in today’s fast-changing development environments,
organisations have to develop, release, and learn from their software products and
services quickly [11]. Hence, many software companies have adopted or are adopting agile practices, which champion ﬂexibility, eﬃciency, and speed in developing
software [6]. Nevertheless, Holmström Olsson et al. [12] suggest that the application of agile methods in software R&D activities is only one stage on the
maturation path of companies’ software engineering practices. At the ﬁnal stage
of the model – R&D as an experiment system – development is based on rapid
experiments that utilise instant customer feedback and product usage data to
identify customer needs.
The experiment-driven stage of software product and service development
not only allows for quick delivery of value to customers but also helps companies
make decisions based on customer or user data rather than opinions [1–3,13].
Through experiments, organisations can gain evidence about which features customers actually want, thus helping them to avoid developing features that are
not valuable to customers [4]. As Bosch [14] states, “the faster the organisation
learns about the customer and the real-world operation of the system, the more
value it will provide.”
Continuous experimentation may take diﬀerent forms in diﬀerent environments. Rissanen and Münch [3] list a number of customer-related challenges
that continuous experimentation faces in B2B domains. For instance, customers
may have to be informed in advance and sign a written agreement to participate
in experiments. End users are not always the customers of the organisation, but
they can be a customer’s customer. Pro-active lead customers might have to be
involved in the experiment design process, but may be challenging to acquire.
Also, it may not be possible to interrupt the daily work of users in order to
involve them in experiment tasks.
Thus, how to integrate experimentation in the software product development cycle is still a key question. Fagerholm et al. [2] propose the RIGHT model
for continuous experimentation. The model consists of a process model and an
architectural model. In the process model, assumptions are ﬁrst identiﬁed, experiments are designed to test them, experiment materials (such as minimum viable
features) are built, the experiment is executed, and analysis results are then used
to support product development decision-making. The decision may be to fully
develop and deploy a feature or to pivot if the experiment indicates that the feature is unsuccessful. The architectural model outlines additional infrastructure
that is required to carry out such experiments continuously, in parallel, and at
scale. In this study, we are guided by the fundamentals of the RIGHT model in
the introduction of continuous experimentation.

Transitioning Towards Continuous Experimentation

3

347

Research Approach

This study follows a holistic single-case study approach [15] in order to gain
deeper understanding of how development teams in a large organisation adopt
continuous experimentation. Additionally, the study has elements of action
research, in that the researchers were actively involved with the process being
studied [16]. The unit of analysis is the process of transitioning towards continuous experimentation. We observe only the start of the transition, but consider
this unit of analysis to be bounded by an identiﬁable starting point, and potentially ending in either non-adoption or adoption to diﬀerent degrees. The transition process may be considered to concern several parts of the organisation, but
our observation is limited to one unit concerned with product development. The
data collection phase took place over a three-month period in autumn 2015.
3.1

Case Context

The company involved in the case study is a global corporation specialising in
providing communication technology and services. The organisation is highly
distributed, with globally allocated development teams. This study is conducted
in the context of a connectivity management and billing service platform that
the company develops for telecom operators and their enterprise customers. This
platform includes a management portal, used by operator users, which is the
focus of this study.
Figure 1 illustrates the parties and their location in the B2B network, revealing a multi-layer structure of stakeholders. The platform development project
involves 11 teams, with around 70 people, who are distributed over multiple
locations globally. The unit of observation in this study is one software development team and one UX team located in Finland, who are working on the
aforementioned management portal.
The teams are incrementally developing a new version of the portal, which
includes modernising the visual design and functionality. While the purpose is
to keep the current set of functionality, enhancements to user workﬂows can be
made if this does not impede the delivery schedule.
At the time of the study, the two teams were tasked with implementing an
activity log inside the portal which would provide information about mobile
subscription events, such as when a SIM card is registered on the network, a
data transfer occurs, or an SMS is sent. The activity log is used by operator
users to troubleshoot problems with enterprise subscriptions. A typical scenario
would involve troubleshooting during a support call. The activity log was chosen
for this study both because it was the teams’ next assignment, and because there
were open questions regarding its design.
3.2

Research Process

The study was conducted in an iterative fashion, with company representatives
evaluating decision points, executing the experiment, collecting and analysing

348

S.G. Yaman et al.

Fig. 1. The case company and other actors in the B2B network formed around the
platform. For clarity, only one operator is shown, although there are multiple operators.

the experiment data, and with researchers observing the process, analysing the
collected research data, and proposing alternative decision paths. An initial
meeting was held where the principles of continuous experimentation and the
RIGHT model [1,2] were explained to development and UX teams and product
owners. After reaching a positive decision from the company, the joint collaboration proceeded. Multiple meetings were held, both online and face to face, to
(1) understand the case context, (2) explore and select an experiment target, (3)
identify assumptions related to that target, (4) develop a hypothesis and experiment design, (5) discuss operational details regarding experiment execution, (6)
analyse experiment data, (7) draw conclusions based on the analysis and (8)
plan the next steps. Between meetings, materials from previous iterations were
analysed and developed to support subsequent decisions and actions.
This study uses materials produced in and for the meetings as well as other
primary data sources, which include participant observation, transcripts of audio
recordings of face-to-face meetings, minutes and notes of meetings (both at the
customer site and online, including weekly online status meetings), open-ended
semi-structured interviews, email communication and background material from
the company. In total, there were three on-site and eight remote meetings. The
accumulated material was analysed using thematic analysis [16,17]. The data
was ﬁrst extracted and analysed to form initial themes. These were then crosschecked against the gathered materials and reﬁned into ﬁnal themes which are
presented and discussed in Sect. 5.

Transitioning Towards Continuous Experimentation

4

349

Designing and Executing the Experiment

As our aim was to observe the introduction of continuous experimentation in a
company, we conducted an actual experiment round with a real product, i.e.,
the activity log described in Sect. 3.1. Here, we did not seek to reach a valid and
generalisable result in the scientiﬁc sense, but rather to obtain enough evidence to
support a technical decision. In this section, we describe the process of designing
and executing the experiment.
The experiment was planned by a technical coach from the development
team, two people from the UX team, and three researchers. The ﬁrst decision
to be made was to select a target for the experiment by analysing the feature requirements for the activity log. Behaviour-driven development (BDD)
stories [18] were developed and analysed during the study in order to better
understand the user requirements associated with the activity log.
In total, seven BDD stories pertaining to the activity log were analysed.
With each BDD story, underlying assumptions regarding user needs and behaviour were identiﬁed. From the identiﬁed assumptions, hypotheses to be tested
were formed. Subsequently, proposals of experimental designs to validate the
hypotheses were drafted. From these, the development and UX teams selected
one design proposal to be the experiment target, which was then elaborated into
a more complete experiment design.

Fig. 2. Mockup of activity log with “reconnect” button indicated and feedback message
displayed.

The selected experiment tested options for a feedback message that is displayed after operator users click on a “reconnect” button in the activity log
(see Fig. 2). The reconnect button sends a request to the mobile network, asking
it to ﬂush the current SIM card registration, which means the mobile device
must reconnect in order to resume normal operation. This action can be used
to recover from certain error conditions. As the mobile network provides no

350

S.G. Yaman et al.

feedback on the request, the reconnect status cannot be accurately displayed to
users in the activity log. This might lead to a situation where a user clicks the
button several times to no avail. Thus a good feedback message would inform
a user on the current state of the system as well as what to do next, while a
bad feedback message would result in increased load on the network, delays in
problem resolution, worse experience for all users involved, and potential costs
associated with these negative eﬀects.
A series of user interface mockups with feedback messages were created for
the experiment. These were ﬁrst piloted with the product owner and updated
based on the feedback given. After the update, the experiment was run with test
subjects. Two runs of the experiment were conducted as illustrated in Table 1.
In the ﬁrst run, the experimenters realised that there were ﬂaws in the mockups
and feedback messages – they were unclear and misleading to the test subjects.
Additionally, the experimenters had diﬃculties determining whether a user succeeded according to the criteria outlined in the hypothesis (see Table 1). Hence,
the mockups and feedback messages were updated and the experiment was rerun.
In the second run, the original reconnect feedback message was also included. In
both of the runs, the order of the message candidates was balanced so that each
message appeared at least once and the order changed for each test subject in
order to avoid the risk of a learning eﬀect biasing the results.
From the data analysis of the second run, one feedback message (message 6
in Table 2) had the highest score, on criterion 1, which was prioritised by the
teams. It also scored well on the other criteria. Message 6 was thus selected for
inclusion in the next product release. The results also revealed that the original
message (message 7) performed poorest on all three criteria.

5

Transitioning Towards Continuous Experimentation

In the process of planning, designing, executing, and analysing the experiment,
a number of observations and inferences were made regarding the transition
towards continuous experimentation both from practitioners’ and researchers’
points of view. In this section, we present these ﬁndings under the themes that
were deduced from our data analysis.
5.1

Initial Circumstances

Prior to the decision to proceed with experiment-driven software development,
we observed an initial interest towards continuous experimentation among company representatives, but also concerns as the adoption process started in the
middle of development with an evolving product. The product owner wanted to
limit risks while practising with the new approach. This raised some important
questions: is it possible to start at the team level and with small-scale experiments in order to gain experience before scaling up to multiple teams, higher
in the organisation, and experiment targets that have a larger impact on the

Transitioning Towards Continuous Experimentation

351

Table 1. Experiment details for the ﬁrst and second experiment run.
BDD story

As an activity log user, I want to ﬂush network memory for a
subscription so that I can be sure that there is no mismatched
information and next I can see when the device connects to the
network
First run
Second run

Hypothesis

We believe that with the right
feedback message, users should
be able to tell: (1) what the
state of the device connection is
and (2) what the next action is.
In order to validate this, users
will be shown a set of feedback
messages and will be asked to
provide answers to the above
two criteria. The message with
the most “yes” answers for each
criterion will be the best
message and will be selected

Minimum
viable feature

Five mockups (PowerPoint)
Seven mockups (PowerPoint)
with diﬀerent feedback messages with diﬀerent feedback messages

Test subjects

Three internal company
employees invited by the
experimenters based on
availability

Seven internal company
employees invited by the
experimenters based on
availability

Experimenters

One person from development
team and one from UX team

One person from development
team, one from UX team and an
additional observer from the UX
team (present only in some
sessions)

Collected data

Yes or no scores for each test subject according to each
hypothesis criterion, experimenters’ observations of test subjects
during the experimentation, and unstructured interview notes

Duration (total) 60 min
Data analysis

We believe that with the right
feedback message, users are able
to tell: (1) what the next action
to take is, (2) what the state of
device connection is, and (3)
what to do if the device does
not connect to the network. In
order to validate this, users will
be shown a set of feedback
messages and will be asked to
provide answers to the above
three criteria. The message with
the most “yes” answers for each
criterion, especially criterion 1,
will be the best message and
will be selected

120 min

Experimenter judgement (yes or no) scores on each criterion for
each feedback message candidate were summed. The sums were
used to rank the feedback messages to identify the best message

system being developed? Moreover, existing release deadlines dictated the target and scale of the experiment, as well as the resources that could be allocated
to it.

352

S.G. Yaman et al.

Table 2. Scores for each feedback message with the winning message highlighted.
Each test subject was exposed to seven message candidates and was scored by two
experimenters. Criterion 1 had double the weight when choosing the winner. (Note:
There was a data entry error for message 5 where one test subject’s scores on criterion
2 and 3 were not recorded. However, this does not impact the result of the experiment.)
Feedback
message
candidate
1
2
3
4
5
6
7

5.2

Criterion 1
(weight: 2×)

Criterion 2
(weight: 1×)

Criterion 3
(weight: 1×)

Yes

No

Yes

No

Yes

No

7
9
5
5
7
13
4

7
5
9
9
7
1
10

7
10
8
10
8
11
5

7
4
6
4
4
3
9

4
3
2
2
6
4
3

10
11
12
12
6
10
11

Starting with Small Teams

We observed that beginning with small teams who are interested did facilitate
the introduction of a new way of working in the large case organisation. The
development team consisted of four developers and the UX team consisted of
two persons. Each team had an active person, a “champion”, who took the lead
in conducting the experiment and communicating the approach to other team
members.
While it was possible to get a quick, low-risk start by beginning with small,
motivated teams, we observed challenges which might impact scaling of continuous experimentation. For instance, we observed that organisational factors
inﬂuenced the ease at which experimentation targets could be identiﬁed. The
necessary product requirements were not always available at the team level. We
furthermore observed that limitations in the teams’ area of inﬂuence aﬀected the
experimentation activities. For instance, the decision to involve real users in conducting the experimentation required approval from diﬀerent management levels
and extra consideration since most of the customers were abroad. Also, dependencies on other teams and release management decisions meant that product
changes based on the experiment result could not be immediately integrated into
the next release, but into the succeeding one.
5.3

Small-Scale Experiments

From the time that BDD stories were developed to the analysis of the results,
while the experiment planning process took approximately one month, executing
the experiment only took a couple of hours (see Table 1). Nevertheless, the aim
was to initiate the experimentation activity and to learn how to experiment,
i.e., to “experiment with experimentation” as the technical coach put it. He also

Transitioning Towards Continuous Experimentation

353

added that “[It’s better to] start experimenting with something small. [...] It’s
more important to start now. Practice will make it perfect.”
As continuous experimentation is a way to achieve customer and user involvement, user access was a discussion point during the experiment planning stage.
The decision to use internal test subjects was mostly a question of time and eﬀort
as noted by the UX designer: “It would be really time consuming to contact our
actual customers, write emails and explain what this [experiment] is about. [The
whole] idea of experimentation is quite new to our customers so [there are] kind
of political reasons why in the ﬁrst place we did not contact our customers. It
was so agile to do it in-house and we did it so fast with our workmates. [...] We
wanted to learn about the continuous experimentation approach and it would
be easier to practice it in-house for the beginning”. The technical coach also
added that “there is a limit to how much you can e.g., interview the customers
before you provide something [concrete]”. The team members were aware of the
drawbacks of using internal test subjects, but deemed it more important to get
started with the ﬁrst cycle than spending time on accessing users. “Of course
we thought of how much [more beneﬁt could be gained by experimenting] with
actual customers. But then, this experimentation is about the UX part, [...] and
we did not see that we would get much more beneﬁt if we had waited weeks to
get real customer input.” (Technical coach).
5.4

Identifying an Experimentation Target

We observed that it was not straightforward to identify an experimentation
target. In particular, options tended to be more technical than value-based.
During planning, it emerged that there was no clear understanding on some of the
platform features, and user requirements were not directly available in written
form. Instead, we deduced them from other materials, such as user journeys
and personas obtained from user research, and mockups from prototypes, all
developed in the beginning of the project. We had numerous discussions with
the teams to clarify the purpose of the activity log and its diﬀerent functions.
Finally, BDD stories were developed and utilised to identify assumptions behind
the user requirements, and the experiment was derived from those assumptions.
5.5

Designing and Executing the Experiment

The experiment was run with internal company employees. Even though the
case company had done a pilot study with a product owner to revise the experiment design, they easily recognised during the ﬁrst experiment run with test
subjects that additional planning was essential. Test feedback messages were
unclear and scoring criteria speciﬁed in the hypothesis were not explicit enough
for experimenters to reach an agreement. Therefore, better background information and clearer instructions for the subjects were developed before running
a second round of the experiment. The hypothesis was also revised and clearer
tasks for the experimenters were deﬁned. Although some eﬀort should be spent
on improving the design and execution, we found that it had to be balanced

354

S.G. Yaman et al.

with available resources. Small-scale experiments especially meant that eﬀort
should not be expended beyond what is required to get a suﬃcient result: “overplanning [improving the experiment beyond a certain point] would be pointless”,
according to the technical coach.
5.6

Collaborating with Experts

Expertise was provided by the ﬁve researchers involved in the introduction of
the continuous experimentation approach. Support for the transition was particularly provided during the planning and execution of the experiments.
At the beginning of the study, the teams and other company representatives had to spend time introducing the product and its context to the experts.
However, they stated that it was beneﬁcial to have expert facilitators guiding
the transition and providing support and guidance when they needed it. In this
case, some mistakes were avoided through expert opinion. For instance, during
the execution stage, guidance was provided on how to achieve more valid results
and avoid introducing bias during the experiment – e.g. avoiding leading the
users by keeping discussions between experimenters and test subjects minimal,
and ensuring that there were at least two experimenters.
5.7

Persistence

Continuous experimentation may be easy to understand in principle, but actually starting it in a real, large B2B organisation required persistence. The ﬁnal
experiment design was reached after a number of attempts. The pilot run and
two rounds of actual experiment runs were required to obtain data for the ﬁnal
analysis. The teams indicated that when starting, one should not dwell on temporary failures. Better to “ﬁx the experiment [the] best way you can and run it
again. You can learn so much with each experiment.” (Technical coach)
Moreover, the teams were willing to include experimentation in some of their
standard procedures. They decided to build a wiki library where all experimentation details and learnings would be stored so that the information can be reused
when necessary and help guide other teams who want to practice the approach.
Also, the champions in the teams were persistent in documenting each step of
the process, which helped communication internally and with experts. Thus,
some of the prerequisites of scaling the approach to cover a larger portion of the
organisation are in place.
Table 3 summarises the challenges faced when transitioning towards experimentation together with observed mitigation strategies under each of the six
themes presented in Sects. 5.1, 5.2, 5.3, 5.4, 5.5, 5.6 and 5.7.

6

Discussion

Transitioning towards continuous experimentation is a learning process, at the
core of which is the development of the organisational capability to identify

Transitioning Towards Continuous Experimentation

355

Table 3. Identiﬁed challenges and mitigation strategies.
Theme

Challenges

Mitigation strategies

Initial
circumstances

– Evolving product, existing
plans, deadlines
– Limited resources
– Need to limit risks

– Allocate only few resources to
begin with
– Choose a small scope for the
initial experiment

Starting with small –
teams and
small-scale
experiments
–

Higher level product informa- – Involve people from diﬀerent
tion might not be visible at the
teams in brainstorming and planteam level
ning the experiment together
The team’s area of inﬂuence – Utilise resources that are more
may be limited
accessible, e.g. internal company
– Inaccessibility of real users
employees
– Experimentation activities
– Good to have champions in
may not be initiated because
teams pioneering the transition
of prior commitments

Identifying an
experimentation
target

– Diﬃcult to select the features – Utilising existing product-related
to start experimenting with
materials helps identify experi– Identiﬁed experiment targets
ments, e.g., BDD stories
may be on a mostly technical – Having discussions with team
level
members and experts
– Carefully analyse the feature to
be experimented on to identify
user needs and assumptions

Designing and
executing the
experiment

– The lack of experimenter expe- – Piloting and rerunning the experirience can lead to biases being
ment helps to enhance the experiintroduced in the design and
ment design, and reach more valid
execution of experiments
results
– Eﬀort and time for planning
– Seek expert advice to avoid potenneeds to be allocated for
tial biases in the experiment
running a valid experiment
– Overplanning should be avoided;
when starting, the important
thing is to learn

Collaborating with
experts

– Eﬀort is needed for
introducing the product and
the context to the experts

Persistence

– First experimentations can be – Keep practising, learning will
seen as eﬀortful and
increase eﬃciency
non-eﬃcient
– Experiment designs and
guidelines for executing
experiments can be gathered and
reused, reducing future eﬀort

– Experts can help avoid mistakes in
experiment design and execution;
the eﬀort to introduce may pay oﬀ
– Introduction may be sped up by
using materials that are already
needed in development, such as
user stories and requirements
expressed as, e.g., BDD stories

356

S.G. Yaman et al.

assumptions, test them in experiments, and support development decisions based
on the evidence. However, many other factors play a role in the transition both
on the team and organisational levels. In this section, we address the research
question and compare the study ﬁndings with related work.
How can a large software development organisation transition towards continuous experimentation in a B2B domain?
Based on the ﬁndings from the case study data analysis, we identiﬁed circumstances and activities that can be taken to enable the transition towards
continuous experimentation. Even though there might be initial circumstances
that constrain the transition, we observed that initiating the transition is possible by starting with small teams and small-scale experiments. In order to lower
the barriers to starting, experiment targets can be identiﬁed from existing materials. Collaboration with experts can be used for guidance and support but team
eﬀort is still needed in planning, designing and executing experiments. At this
stage, learning about experimentation is the most important thing. Later, it is
essential to ﬁnd ways to sustain the process, making it continuous, and scaling
it to cover a larger part of the development organisation.
6.1

Challenges and Lessons Learned

The biggest beneﬁt gained by the teams was that they learned to perform experimentation in a more systematic way, which will help them to better understand
what their customers want and take the right steps in increasing user satisfaction
and reduce support costs. Doing experimentation also helped the teams gain new
insights and better understanding of their ongoing work. In addition, based on
the teams’ experiences, they realised that “experimentation made it clear to the
team that there is no need to debate between opinions and assumptions as you
can quickly test them with an experiment.” (Technical coach)
Information about user needs, e.g. requirements, is needed to identify assumptions. In this case, BDD stories proved to be useful for this purpose, but other
forms may be possible, as well. The experimentation activity needs to be integrated with the overall development process. Once integrated, the eﬀort put in
planning experiments could diminish. In this case, planning the study and the
experiment took around a month each, since they included establishing the collaboration between the experts and the teams from the organisation, getting
to know the context, and identifying assumptions in the product. On the other
hand, running the experiment itself took only hours.
Several challenges arose because of the complex B2B environment. For
instance, the path from the development organisation to users through the B2B
network was long and involved many organisations. For this reason, end users
could not be included in experiments with reasonable eﬀort, a ﬁnding that is
in line with Rissanen and Münch’s [3] observations. Other approaches to get
experiment data were needed in order to compensate this challenge. In this case,
internal test subjects were an economical way to start developing the capability

Transitioning Towards Continuous Experimentation

357

for continuous experimentation, and made it possible to even gain some decision
support for a small product decision although the set of subjects was limited.
Some barriers faced were a result of timing. The process started in the middle
of development with an evolving product. Existing release deadlines inﬂuenced
the resources available, as well as the target and scale of the experiment. The
selected product feature to be experimented on was quite small and the experiment was more about optimisation rather than validating a complete feature.
However, when the purpose is to practise experimentation, this is not a critical
issue, but it must be understood that experiment results might not be the most
beneﬁcial or target the most value-creating features in the beginning.
In general, it was diﬃcult to design an experiment that would actually test
the value of a feature. This would have entailed determining whether the feature
is necessary or suitable for accomplishing a given task that has already been
found to fulﬁl a user need. There are multiple possible reasons for why designing
a value-related experiment was diﬃcult, and the long chain from development
to user was one of them. It was unclear what the value of diﬀerent features was
and for whom. Also, the long chain meant more uncertainty about how a feature
contributes to value. A feature may fulﬁl a need indirectly, and mapping the
chain was not possible in this study.
Another challenge was that it was diﬃcult to design a behavioural experiment
task, meaning a task that would test whether a feature contributes to a behaviour change. This would have been necessary in order to determine whether the
feature contributes to a user need, since it is through behaviour that the need is
fulﬁlled. In this study, the experiment relied on subjects telling what they would
do rather than observing whether they carried out certain actions or not. Part
of the reason was the eﬀort and cost of setting up required experiment materials. Observing behaviour requires interactive materials that allow the user to
express the behaviour to be observed. Lack of such materials may be a barrier
when initiating the transition towards continuous experimentation.
Furthermore, it was observed that the scale of experiments can be adjusted
operationally so that little or no development eﬀort is required, for example
by using PowerPoint mockups as in this case. On the other hand, conducting
small-scale experiments is part of a tradeoﬀ between smoothing the path towards
continuous experimentation and reaching the level where experiments directly
target customer value.
6.2

Threats to Validity

Researchers’ expectancy bias might be a threat in this study. Researchers might
interpret the collected data in such a way that it fulﬁls their expectations. In
order to mitigate this threat, participants from the case company were involved
in the study data analysis stage and participant validation was used in order to
verify the study results.
Researcher triangulation was used to address construct and external validity
in terms of accuracy checking. Two researchers ﬁrst conducted the initial study
data analysis, then reviewed the analysis process with a third researcher and

358

S.G. Yaman et al.

later, all ﬁve researchers reviewed the results along with discussion sessions.
This ensured that the study results would not rely on interpretations of single
researchers only.
In terms of generalisability, we are interested in whether the results of this
study would be applicable to other software development organisations transitioning towards continuous experimentation. The characteristics of the case
company, such as its size, structure, customers, business domain, the scale of the
experiment, the product, and other contextual factors may limit the transferability of the results presented in this paper. It is not yet clear how such transfers can
be made. Due to the novelty of the ﬁeld and early maturity of experimentation
in the company, there is not much evidence available to support transferring the
results. We hope our ﬁndings will contribute to the knowledge about transition
to continuous experimentation when combined with further research.

7

Conclusion

We conducted a holistic single-case study in a large, global telecom company
operating in a B2B environment. We introduced an approach to continuous
experimentation in the case company. Two company teams and ﬁve researchers
conducted a single experiment cycle with internal test subjects. The experiment
results allowed the company to make a product development decision which
improved the usability of a part of their product. By participating in the activity,
we observed the ﬁrst steps of a transition towards continuous experimentation.
We found that the approach was easy for practitioners to understand and
reception was favourable in general. The experiment activity highlighted important questions about the product under development and how it could best
serve users. The collaboration between the UX and development teams was also
enhanced, as expertise from both was required to plan and execute the experiment. Starting with small teams and experiments with a tightly limited scope
allowed a fast start and a short, one-month cycle time from design to results.
We also found several challenges that may hinder the adoption of an
experiment-based approach and limit its beneﬁts in the initiating phase. Our
study shows that it may be diﬃcult to ﬁnd an experiment target due to information about user needs and goals being scattered in a large B2B organisation. This makes it diﬃcult to identify the assumptions that should be tested
in experiments. It may also be diﬃcult to reach the level where experiments
directly address product value rather than optimising usability. Involving users
directly in experiments was diﬃcult in this B2B case, and may come with additional cost, but would also make experiments more valid and relevant. Designing
experiments around assumptions about the user behaviours that are related to
value creation should result in experiments with more impact. This remains a
diﬃcult challenge which warrants further research. More research is also needed
on how to integrate continuous experimentation with the overall organisation
and how this aﬀects culture, architectures, methods, processes, management,
and staﬃng in contemporary organisations.

Transitioning Towards Continuous Experimentation

359

References
1. Fagerholm, F., Guinea, A.S., Mäenpää, H., Münch, J.: Building blocks for continuous experimentation. In: Proceedings of the 1st International Workshop on Rapid
Continuous Software Engineering, RCoSE 2014, pp. 26–35. ACM, New York (2014)
2. Fagerholm, F., Guinea, A.S., Mäenpää, H., Münch, J.: The RIGHT model for
Continuous Experimentation. J. Syst. Softw. (2016, in press). doi:10.1016/j.jss.
2016.03.034.
3. Rissanen, O., Münch, J.: Continuous experimentation in the B2B domain: a case
study. In: Proceedings of the Second International Workshop on Rapid Continuous
Software Engineering, RCoSE 2015, Piscataway, NJ, USA, pp. 12–18. IEEE Press
(2015)
4. Holmström Olsson, H., Bosch, J.: The HYPEX model: from opinions to datadriven software development. In: Bosch, J. (ed.) Continuous Software Engineering,
pp. 155–164. Springer, Cham (2014)
5. Boehm, B., Huang, L.G.: Value-based software engineering: a case study. Computer
36(3), 33–41 (2003)
6. Highsmith, J., Cockburn, A.: Agile software development: the business of innovation. Computer 34(9), 120–127 (2001)
7. Cockburn, A., Highsmith, J.: Agile software development, the people factor. Computer 34(11), 131–133 (2001)
8. Ries, E.: The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business, Houston (2011)
9. Blank, S.: The Four Steps to the Epiphany: Successful Strategies for Products That
Win, 2nd edn. K&S Ranch, Pescadero (2013)
10. Croll, A., Yoskowitz, B.: Lean Analytics: Use Data to Build a Better Startup Faster.
O’Reilly Media, Sebastopol (2013)
11. Tichy, M., Bosch, J., Goedicke, M., Fitzgerald, B.: 2nd International Workshop
on Rapid Continuous Software Engineering (RCoSE 2015). In: Proceedings of the
37th International Conference on Software Engineering, vol. 2, pp. 993–994. IEEE
Press (2015)
12. Holmström Olsson, H., Alahyari, H., Bosch, J.: Climbing the “Stairway to Heaven”
- a mulitiple-case study exploring barriers in the transition from agile development
towards continuous deployment of software. In: 2012 38th Euromicro Conference
on Software Engineering and Advanced Applications, pp. 392–399 (2012)
13. Yaman, S.G., Sauvola, T., Riungu-Kalliosaari, L., Hokkanen, L., Kuvaja, P., Oivo,
M., Männistö, T.: Customer involvement in continuous deployment: a systematic
literature review. In: Daneva, M., Pastor, O. (eds.) REFSQ 2016. LNCS, vol. 9619,
pp. 249–265. Springer, Heidelberg (2016). doi:10.1007/978-3-319-30282-9 18
14. Bosch, J.: Building products as innovation experiment systems. In: Cusumano,
M.A., Iyer, B., Venkatraman, N. (eds.) ICSOB 2012. LNBIP, vol. 114, pp. 27–39.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-30746-1 3
15. Yin, R.: Case Study Research: Design and Methods, 4th edn. SAGE Publications,
Inc., Thousand Oaks (2009)
16. Robson, C.: Real World Research. Wiley, Chichester (2011)
17. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol.
3(2), 77–101 (2006)
18. North, D.: Introducing BDD. Better Software., March 2006

Why Do We Do Software Process Improvement?
Study on Commonly Used Goals in Practice
Anna Schmitt ✉ and Philipp Diebold
(

)

Fraunhofer Institute for Experimental Software Engineering IESE,
Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
{anna.schmitt,philipp.diebold}@iese.fraunhofer.de

Abstract. Every company tries to improve its overall business, especially in the
fast and reacting world of software. For these improvement activities, the devel‐
opment process is a major aspect. Our goal was the elicitation of common
improvement goals that are considered for improving the development process.
For collecting the common improvement goals, we used a mixture of methods.
We started with existing literature and results of a survey. Further, we extended
both by a set of workshops with industrial partners. Besides the common aspects
of time, cost, and quality, some new goals, such as participation and democra‐
tization, appeared. These results lead to a more practitioner-oriented ﬁeld of
process improvement, since we are aware of common practical improvement
goals.
Keywords: Development processes · Software process improvement ·
Improvement goals · Reasoning

1

Introduction and Motivation

Software process improvement (SPI) methods, such as CMMI or SPICE, deal with the
continuous improvement of existing development processes and are an important aspect
of software engineering [1, 2]. Nonetheless, SPI projects are hardly ever initiated for
their own sake [3]. Instead, process improvement initiatives are generally triggered by
organizational or project-speciﬁc improvement goals. And here, we lack insight into
which organizational improvement goals are actually driving SPI initiatives in practice
and of these, which are the most important?
One aspect that has been researched thoroughly is SPI success factors [1], which are
only partially related to the improvement goals. In this study, we want to shed light on
the state-of-the-practice in setting SPI goals. That is, we want to identify those goals
that are typically considered when improving the software development process.
Although we investigate a wide variety of software domains, special emphasis has been
placed on web domains, business processing, and tool development in small and
medium-sized enterprises (SMEs).

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 360–367, 2016.
DOI: 10.1007/978-3-319-49094-6_23

Why Do We Do Software Process Improvement?

361

The remainder of this paper is structured as follows: After this introduction, we
brieﬂy discuss some related work on improvement goals (Sect. 2). Section 3 contains
our approach of how we collected the resulting improvement goals of Sect. 4.

2

Related Work

Software process improvement often relies on assessments and the use of reference
models, such as CMMI for Development [4] or SPICE (ISO15504) [5]. These
approaches only incorporate SPI goals implicitly by aiming for capability or maturity
levels that are loosely coupled with high-level improvement goals (e.g., “GP2.2 - plan
the process” or “GP2.5 - train people”).
However, there is some existing work on goal-based SPI: Diebold and Zehler [6]
introduce an Agile Capability Analysis that uses diﬀerent improvement goals and
applies GQM [7] and GQM+Strategies [8] approaches. This work also incorporates ISO
25010: Systems and software Quality Requirements and Evaluation (SQuaRE) - System
and software quality models [9], which are used as sub-goals for product quality.
Along with these very process-speciﬁc aspects, there are goal-based software engi‐
neering approaches in speciﬁc ﬁelds such as requirements engineering (RE) [10] and
testing [11]. These approaches focus on project and product improvement goals and do
not consider process and organizational ones.

3

Data Collection Approach

First of all, we used the input that was directly connected with our work ([6, 9] presented
in Sect. 2) as initial idea. (1) Out of this input, we put three examples on a poster to
collect further ideas at the OOP2016 conference (www.oop-konferenz.de). We ended
with 26 possible improvement goals, collected by the conference participants and other
exhibitors. (2) Afterwards, these items were discussed, sorted, and categorized in a
workshop with partners from academia and practice.
For further reﬁnement and extension, we (from academic viewpoint) (3) conducted
three independent workshops with diﬀerent industrial partners, all SMEs. These work‐
shops were guided by one of the authors and performed in an open way. We asked
generic questions regarding the improvement goals. We collected the data from all
workshop-participants (from 2 to 5 participants) on sticky-notes. In this collecting
process diﬀerent roles took part, e.g. management, project leaders, and developers. The
notes were discussed, grouped, and consolidated together with all participants. After the
workshops, we integrated these results into a mind map that visualized the previous
results.
To come up with ﬁnal results, we conducted a further workshop with practitioners
and academics to discuss the ﬁnal mind map in detail. The results that were concluded
in this meeting will be presented in the following section.

362

4

A. Schmitt and P. Diebold

Result: Current Improvement Goals

We visualized the diﬀerent improvement goals in a mind map including diﬀerent levels
of abstraction due to their level of granularity. The highest level of the mind map
(Fig. 1), level 1, shows the initially established improvement goals customer involve‐
ment, time-to-market, quality, and organizational democratization.

Fig. 1. Improvement goals

On level 2, the respective improvement goals from level 1 are reﬁned. Therefore,
customer involvement is divided into customer participation, customer satisfaction/
acceptance, (intermediate) product-transparency, budget-transparency, and projecttransparency. Organizational democratization is clustered into the following three
sub-goals: internal knowledge management (technical as well as nontechnical),
personnel motivation, and project democratization. Quality is composed of innovative
solutions, conceptual quality, technical quality, testability/acceptance criteria, and user
experience. Finally, time-to-market has four sub areas: automation, competence
focusing, resource management, and time-transparency. All improvement goals on level
2 will be described in the following sub-sections.
4.1 Customer Involvement
Objective is the integration of the customer in the early phases of the product develop‐
ment process. This is to receive early and regular feedback regarding the customer’s
expectations. It is addressed by the sub-goals customer participation and customer
satisfaction/acceptance. Furthermore, giving stakeholder transparency on the develop‐
ment increases comprehension of the product and business process. This conveys the
feeling of being involved in every step and is addressed by budget-transparency and
project-transparency.

Why Do We Do Software Process Improvement?

363

Customer participation: For a high customer involvement in the development
process, an appropriate qualitative and quantitative feedback is needed. Additionally,
predeﬁned deadlines with deﬁned commitments to assess the intermediate product
results helps increasing the involvement. To reach respectable feedback, receiving real
user feedback in an adequate quantity is important.
Customer satisfaction/acceptance: Understanding customers is a prerequisite to
reach their satisfaction/acceptance. This includes personal aspects and, especially, the
expectations of the customer. Misinterpretations of expectations can be created by either
the customer by, e.g. being uncertain about the desired result, or by the contractor by
misunderstanding the customers’ wishes. The objective is to reach a common under‐
standing concerning the desired output and to build up a common vocabulary regarding
the timed collaboration und functioning.
Budget transparency: Before gaining this goal, an initial (and later continuous) eﬀort
estimation is needed and presented openly. Furthermore, the up-to-date information
needs to be accessible anytime. The idea is to improve the cost estimation to decrease
number of adaptions. Additionally, a good, regular, and truthful communication
regarding time and progress is pursued.
Project transparency: To enable a good project transparency, a close and regular
documentation of the project work needs to be implemented in the company (work
package, problems, problem solutions, etc.). Additionally, every employee and customer
must have an overview over the ongoing projects. Further aspect is the increase of
understanding the system, e.g. by clearer and detailed requirements, clariﬁcation of
expectations, clear/understandable deﬁnition of the work packages, and transformation
of customer wishes into concrete realizable tasks. Moreover, tracking of progress needs
to be enabled, so the status quo can be presented to the stakeholders. Another important
sub-goal is the contract management, in particular, the tender preparation and project
approval. Project approval enables the customer to recognize, which products will be
contemporarily available on the market. Besides, the customer and the market compet‐
itors are comprehensible shown, which eﬀort (e.g. budget, duration, and employees) is
required to produce the end-product.
4.2 Organizational Democratization
Democratization of the organization tends to uniform, equal (working) conditions and
rights of co-determination of all employees within the company. It is clustered into:
internal knowledge management, employee motivation, and project-democratization.
Internal knowledge management: It is focused on technical as wells as on organiza‐
tion knowledge management, that is divided into: elimination of bottlenecks, better
documentation, knowledge transfer, exchange of experience, better communication
within and across teams, and increase of understanding the system (incl. interfaces).
Possible bottlenecks need to be prevented, existing bottlenecks need to be eliminated,
e.g. competition, cultural diﬀerences, distribution within and of team. Better

364

A. Schmitt and P. Diebold

documentation refers to documentation of change requests to prove anytime: “What
shall be proved when and how, and what has been changed in fact.” The exchange of
experience focusses on information distribution across all stakeholders and preparation
of project learning. Combined with them, a better communication within and across the
team is addressed. This is achieved as all employees are informed about time, progress,
and budget status of every project. The technical knowledge management should
increase the understanding of the system, including its interfaces. This increase shall be
ensured by an appropriate tool support at the beginning (e.g. Jira for ticketing) and by
avoiding repeating mistakes.
Employee motivation: The aspired improvement goals are: acceptance of the course
of action by the personnel, pleasure at work, satisfaction within the team/staﬀ/company,
increasing creativity, assumption of responsibility by all personnel, credibility
concerning successful implementation, recruiting of new personnel by increasing attrac‐
tiveness of the company, and qualiﬁcation (e.g. training opportunities). Attracting the
aspects pleasure at work, satisfaction within the team/staﬀ/company, increasing crea‐
tivity, assumption of responsibility by all personnel, and credibility concerning
successful implementation boost employee motivation by creating an overall feeling of
togetherness, being successful, and being appreciated. Recruiting new staﬀ increase
motivation by established employee by, on the one hand, getting new insights and ideas,
and, on the other hand, defending the acquired position. Acceptance of the course of
action leads to the identiﬁcation with the company values, vision, and working methods.
Project-democratization: Derived from the high-level organizational democratiza‐
tion, ﬂat hierarchies, simpliﬁcation of the communication and escalation paths, discus‐
sions about project content, multi-project-management, uniformity of processes, and/or
ﬁling by a close documentation need to be achieved and improved. This assures uniform
conditions regarding the collaboration of every single (team) employee. Additionally,
it improves the overall productivity.
4.3 Quality
One workshop resulted in a deeper clustering of the goal “quality”. The companies
explicitly addressed and discussed the sub-goals innovative solutions, conceptual
quality, technical quality, testability/acceptance criteria, user experience, and docu‐
mentation.
Innovative Solutions: One way to increase the process and, thus, the resulting product
quality is the creation of more innovative solution within the products. It leads to a
competitive advantage on the product market. This is because of creating attractive
products, the customers get convinced of their (innovative) unique quality.
Conceptual and technical quality: In particular, both quality aspects were discussed
elaborately due to their high importance. Conceptual quality refers to compatibility,
maintainability and portability of the system or the software. In contrast, the technical
quality focuses on the quality aspects of the ISO25010. These aspects contain

Why Do We Do Software Process Improvement?

365

functionality, performance, usability, reliability, and security of single technical parts
of the products. They improve software architecture, prototyping, technical up-to-date‐
ness as well as the balance between individual/customized and standard software.
Testability/acceptance criteria: A further possibility is the implementation of a more
eﬃcient testability/more eﬃcient acceptance criteria. This results in a faster testing
process. It enables a higher amount of iterations with diﬀerent scope and, thus, a faster
commercialization of a high-value product.
User experience: This goal focuses on the involvement of the user experience as it is
propagated by the ISO25010 [9] with its “quality in use”. Similar to [6], it shows the
importance of the diﬀerent quality aspects of the ISO, whereas the customers (especially
in information systems) focus on user experience. User experience increases by fast
recovering, evaluating, and implementing.
Documentation: Documentation is part of software development. Thus, it is an impor‐
tant ingredient for quality. The quality of documentation depends on diﬀerent aspects,
namely amount, availability, maintenance, and granularity of the documents.
4.4 Time-to-Market
Time-to-market aims at bringing products to the market fast for a competitive advantage
regarding the price policy. It contains the time from a vague idea through the develop‐
ment and test phase up to the real market launch. Time-to-market is divided into compe‐
tence focusing, resource management, and time transparency.
Competence focusing: First of all, competence focusing needs to be implemented and
lived. Time and eﬀort available shall fully be used in exactly these ﬁelds of work, which
represent the core competencies of the company. Therefore, core competencies need to
be identiﬁed. Afterwards, dealing with these within teams and organization must be
learned.
Resource management: Resource management needs to be optimized for not wasting
time, eﬀort, and resources. It refers to a project speciﬁc planning, adaption, and appli‐
cation of resources. Thus, a project speciﬁc capacity needs to be carried out as well as
a balance between projects, products and customers (e.g. regular customer vs. new
customer).
Time transparency: Time management/time transparency might be the most impor‐
tant factor to fulﬁll the time-to-market. It is addressed and fulﬁlled by the following two
sub-goals: (1) Adhere to internal and external deadlines. (2) Communicate delays within
the project plan (e.g. inclusion of the time buﬀer while doing time-critical tasks) in a
transparent, calm, and objective way. Furthermore, including suﬃcient time buﬀer for
changes that need be done besides the daily business is necessary.

366

5

A. Schmitt and P. Diebold

Threats to Validity

The aim of this study was not to come up with a complete model. The exploratory focus
was on having ﬁrst ideas of existing improvement goals in practice. Thus, we are aware
that conducting a collection on a conference (with mainly German companies) and
conducting workshops with only three industrial partners might be a low number.
Therefore, we performed the workshop with diﬀerent roles of partners to get diﬀerent
views and increase the number results. Furthermore, the three workshops were struc‐
tured in the same way. Also, they were performed by the same persons for a better
comparability and integrability.

6

Conclusions and Future Work

This paper presents the collection approach of improvement goals behind SPI as well
as the results from practitioners’ perspective. We identiﬁed four main goals, customer
involvement, organizational democratization, quality, and time-to-market, reﬁned into
sub-goals. These sub-goals were collected together with diﬀerent companies based on
their individual challenges. All results represent common improvement goals derived
from problems or challenges of SMEs. These results show that companies have similar
improvement potential on the high-level goals, but very individual and speciﬁc detailed
goals. These improvement goals simplify working on goal-speciﬁc SPI approaches that
address practical issues.
Nonetheless, we are aware of the threats to validity of our results. Thus, we are
collecting further input on this topic, e.g. with our project webpage (www.prokob.info)
or speciﬁc events. Another aspect of future work is the creation of an SPI-approach
similar to the application scenario in [6] for addressing the diﬀerent company goals.
The elaborated results help practitioners comparing and baselining their company.
They have the possibility to identify commonalities and diﬀerences of other enterprises
with respect to their improvement goals. Such that they can identify missing improve‐
ment goals. On this basis, practitioners consider, whether it is beneﬁcial and useful
including these goals in SPI.
Acknowledgments. This research is being funded by the German Ministry of Education and
Research (BMBF 01IS15038). We thank all the participants of the OOP-conference, and
workshops. Finally, we were happy to receive feedback of Thomas Zehler on this paper.

References
1. Münch, J., Armbrust, O., Kowalczyk, M., Soto, M.: Software Process Deﬁnition and
Management. Springer, Heidelberg (2012)
2. Kuhrmann, M., Diebold, P., Münch, J.: Software process improvement: a systematic mapping
study on the state of the art. PeerJ. Comput. Sci. 2, e62 (2016)
3. Austen, R., Hall, T.: Key success factors for implementing software process improvement: a
maturity-based analysis. JSS 62(2), 71–84 (2002)

Why Do We Do Software Process Improvement?

367

4. SEI: CMMI for Development (CMMI-DEV), Version 1.3 (2010)
5. International Organization for Standardization. ISO/IEC 15504:2012 – Information
technology – Process assessment (SPICE) (2012)
6. Diebold, P., Zehler, T.: The agile practice impact model – ideas, model, and application
scenario. In: ICSSP 2015. ACM (2015)
7. Van Solingen, R., Basili, V., Caldiera, G., Rombach, H.D.: Goal Question Metric (GQM)
Approach. Encyclopedia of Software Engineering. Wiley, Hoboken (2002)
8. Basili, V., Trendowicz, A., Kowalczyk, M., Heidrich, J., Seaman, C., Münch, J., Rombach,
D.: Aligning Organization Through Measurement – The GQM+Strategies Approach.
Springer, Heidelberg (2014)
9. International Organization for Standardization: ISO/IEC 25010: 2011 – Systems and Software
Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—
System and Software Quality Models (2011)
10. van Lamsweerde, A.: Goal-oriented requirements engineering: a guided tour. In: RE 2016,
pp. 249–262. IEEE Press (2001)
11. Nguyen, D., Perini, A., Tonella, P.: A goal-oriented software testing methodology. In: Luck,
M., Padgham, L. (eds.) AOSE 2007. LNCS, vol. 4951, pp. 58–72. Springer, Heidelberg
(2008). doi:10.1007/978-3-540-79488-2_5

Developing Processes to Increase Technical
Debt Visibility and Manageability – An Action
Research Study in Industry
Jesse Yli-Huumo1(&), Andrey Maglyas1, Kari Smolander2,
Johan Haller3, and Hannu Törnroos4
1

Lappeenranta University of Technology, Lappeenranta, Finland
jesse.yli-huumo@aalto.fi, maglyas@gmail.com
2
Aalto University, Espoo, Finland
kari.smolander@aalto.fi
3
Tieto Sweden AB, Stockholm, Sweden
johan.haller@tieto.com
4
Tieto Oyj, Helsinki, Finland
hannu.tornroos@tieto.fi

Abstract. The knowledge about technical debt and its management has
increased in recent years. The interest of academia and industry has generated
many viewpoints on technical debt. Technical debt management consists of
technical and organizational aspects, which make it a challenge in software
development. To increase technical debt visibility and manageability, new
processes must be developed and thoroughly empirically tested for their
applicability. In this paper, we use the action research methodology to design
processes for identiﬁcation, documentation, and prioritization of technical debt.
Our partner in this research is a large Nordic IT company Tieto, currently in a
need for new ways to improve their technical debt management. The results
include a set of processes and templates that were successfully used to identify
and document technical debt. The identiﬁed technical debt items were later
prioritized based on evaluation by Tieto employees. Tieto was able to create a
prioritized technical debt backlog, which is now used for reduction activities to
create a healthy and sustainable product for the future.
Keywords: Technical debt  Technical debt management
improvement  Action research

 Software process

1 Introduction
Technical debt refers to a situation in software development where shortcuts and/or
workarounds are used in technical decisions to gain time-to-market [1]. The beneﬁt of
taking technical debt is an earlier and faster release, which can lead to customer
satisfaction and other economic advantages [2]. However, the drawback is the ‘debt’
that is left in the system. In the long-term, shortcuts and workarounds will turn to
unnecessary complexity (interest) in the source code and architecture. Complexities in
software can become hard to ﬁx and change, which may cause decrease in software
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 368–378, 2016.
DOI: 10.1007/978-3-319-49094-6_24

Developing Processes to Increase Technical Debt Visibility and Manageability

369

quality and productivity of the development team [3]. Therefore, technical debt can be
a major problem for a software development company.
While shortcuts and workarounds can be seen as intentional decisions to speed up
release cycles, or to circumvent a complex part of the code, unintentional technical debt
occurs without immediate awareness [4]. Unintentional technical debt is introduced to
software, for example, by inexperienced developers or legacy software. An inexperienced developer can create technical debt unintentionally with non-optimal solution.
Old legacy software can consist of obsolete or non-optimal technology and solutions
from past decades, which may require a rewrite or replacement.
Technical debt management refers to activities that are used to manage and reduce
both intentional and unintentional technical debt with various approaches, practices and
tools [5]. Technical debt management not only includes technical development activities but also organizational ones, such as communication and decision-making.
This study is made in cooperation with one of the largest IT companies in Scandinavia, Tieto. Tieto’s Capital Market product unit is currently planning new processes
for their technical debt management. The goal of the study is to develop new processes
for technical debt identiﬁcation, documentation, and prioritization. The outcome of this
study includes new processes to increase the visibility and manageability of technical
debt, which can be used in the future for better decision-making.
This paper is limited to studying technical debt that has already been acquired
previously, and does not take in consideration the management activities related to
decision-making process of acquiring new technical debt.

2 Background
Processes for technical debt management have been studied and suggested in the
literature. Li et al. [5] gathered in a mapping study relevant research on technical debt
management. The study showed that technical debt management can be divided into
following activities: (1) identiﬁcation, (2) measurement, (3) prioritization, (4) prevention, (5) monitoring, (6) repayment, (7) representation/documentation, and (8) communication [5]. Li et al. [5] also state that currently there is a lack of empirical evidence
about technical debt management. In this study, we are mainly focusing on three out of
the eight management activities. Our goal is to use processes for representation/
documentation, identiﬁcation, and prioritization of already incurred technical debt to
provide empirical evidence with a real case company.
Technical debt representation/documentation has been studied and suggested in
literature with speciﬁc lists and templates as an approach to store technical debt issues
[6, 7]. A backlog or a list should increase technical debt visibility and manageability.
When technical debt is properly documented, it is easier to start other technical debt
management activities, because it is visible to the company.
Before a technical debt issue can be documented, it has to be identiﬁed. Identiﬁcation of smaller technical debt issues from the source code is possible with speciﬁc
tools [8]. However, technical debt is not always only related to issues in the source
code [9]. Technical debt in software architecture and design is a larger challenge [5, 9].

370

J. Yli-Huumo et al.

The identiﬁcation of architectural technical debt with tools is difﬁcult and often the
only solution is to use human knowledge and examination [9].
The prioritization of technical debt is difﬁcult, because some technical debt might
be important to ﬁx for business reasons, while other for technical reasons. Some models
and methods have been developed for prioritization. Seaman et al. [10] suggested four
approaches for technical debt decision-making: simple cost-beneﬁt analysis, analytic
hierarchical process, portfolio management model, and options. These approaches
have been used also in other domains, such as ﬁnance [10]. They support evaluating the
tradeoffs between proposed enhancements, corrective maintenance, and the payment of
technical debt items [10]. Schmid [11] developed a formal model based on providing
several well-deﬁned approximations, which can be used for technical debt prioritization. In addition, some papers have used quality attributes from ISO 9126 as an
evaluation to technical debt [12–14].
Overall, there exists a variety of different ideas for technical debt documentation,
identiﬁcation and prioritization. However, most of them are focused on one speciﬁc
activity only. Studies that approach the whole process from identiﬁcation to repaying
technical debt are rare. Therefore, we collaborate with a real software company to ﬁnd
and develop processes, including technical debt identiﬁcation, documentation, and
prioritization. We take inspiration from a study conducted by Li et al. [7] that had a
similar goal. Their approach was to identify architectural technical debt based on
architecture decisions and change scenarios [7]. Our approach extends this by
expanding the technical debt evaluation and prioritization processes. Our goal is to
create more reasoning possibilities in decision-making, which is required especially in
organizational aspects of technical debt management.

3 Research Methodology
Action research was selected as a research methodology for this study. Action research
combines theory and practice [15]. Action research is an iterative process involving
researchers and practitioners acting together on a particular cycle of activities,
including problem diagnosis, action intervention, and reflective learning [15]. Action
research is especially relevant in situations where participation and organizational
change processes are necessary [16]. It attempts to provide practical value to the client
organization while simultaneously contributing to the acquisition of new theoretical
knowledge [17]. The action research cycle [18] consists of three stages: (1) a pre-step to understand context and purpose; (2) six main steps - to gather, feedback and analyze
data, and to plan, implement and evaluate action; (3) a meta-step - to monitor.
The rationale for using action research as a research methodology is the nature of
this study. The company in this study had a goal to improve their technical debt
management. The research group in this study had previous experience on working
with various companies and cases related to technical debt and its management.
Therefore, action research, as an approach where both the company and the research
group work together to understand the problem and develop a solution, was especially
ﬁt for the purpose.

Developing Processes to Increase Technical Debt Visibility and Manageability

371

The selected product line in this research is a ﬁnancial system used in the capital
market industry by multiple customers around Nordics. The product is one of the three
main products provided by Tieto and it has a long development history including
source code from over 20 years ago. The product and development team have faced
both technological changes and organizational changes during their lifetime. Now the
main objective of Tieto’s Capital Market product unit is to migrate to new technology
with the aim to replace and rewrite old one, to improve quality and productivity, while
still serving all of its customers.
The objective of the study was to increase technical debt visibility and manageability by improving processes related to identiﬁcation, documentation and prioritization. Therefore, we set up the following research questions to address the problem:
RQ1: How to improve technical debt identiﬁcation and documentation? The
limitations of the tools currently available for technical debt identiﬁcation can be seen
as a big challenge. The identiﬁcation of architectural technical debt with tools is very
difﬁcult. Therefore, most if not all technical debt identiﬁcations have to be done with
manual code and architecture inspection, where developer or architect examines the
system and the source code for possible issues. Our goal is to observe how technical
debt is currently identiﬁed in practice and how it is documented afterwards. The
objective is to identify possible improvements to these current processes, and test them
in practice.
RQ2: What factors should be taken in consideration when prioritizing technical
debt? The decisions related to technical debt can be sometimes made based on hunches without any speciﬁc model or method to follow. Business owners might prioritize
issues that give direct value to customers, while technical people might put value more
on software quality and sustainability. Understanding both business and technical
effects of technical debt repayment can help technical debt evaluation and improve the
prioritization process for safer decisions. We will observe the processes of technical
debt evaluation and prioritization in practice with the aim to improve technical debt
evaluation and prioritization.

4 Action Research Process
The action research process used in this study is presented in Fig. 1. This research can
be divided into ﬁve main activities and outcomes.
The ﬁrst step of the research process is interviews, where researcher interviews
people related to the product line or company to understand the current issues related to
technical debt and its management. We conducted seven semi-structured interviews
with the average of 45 min. We recorded, transcribed and analyzed all the interviews.
In the analysis of the interviews, we identiﬁed major issues. First, we did not ﬁnd any
systematic process for technical debt identiﬁcation, evaluation or prioritization. This
led to a technical debt communication gap between the development team and project
managers. Knowledge of technical debt seemed to be tacit personal knowledge rather
than explicitly stored in a common list. Secondly, we noticed that the developers and
architects had much knowledge about the current issues regarding technical debt, but

372

J. Yli-Huumo et al.

Fig. 1. The research process

there was not any systematic way to document it. Thirdly, when there were technical
debt issues in discussion, the decision-making was mostly done based on hunches,
rather than evaluating and prioritizing them ﬁrst. The outcome of this step is a problem
identiﬁcation, which helps to understand the problem in current processes within a
company.
The second step is to develop a process for identiﬁcation. In our case, the identiﬁcation was conducted by gathering the data from previous knowledge and history of
people related to the product. The members of the product line used ten weeks to search
and identify technical debt issues. The reason for manual inspection was that the
company did not have any speciﬁc tools in use to identify technical debt. The outcome
of this step is the increase of technical debt visibility, which helps to understand the
overall technical debt view.
The third step is to develop a process for documentation. We decided to introduce
a simple process to document all technical debt issues to a single technical debt
backlog. The idea was to use backlog as an aid to make technical debt more visible to
everyone in the product line. We used a similar template (Table 1) to Guo and Seaman
[6] to collect all technical debt items. The template was sent to nine members of the
product line that was later returned back to the managers. The managers then combined
all the reported issues and created the technical debt backlog. The outcome of this step
is getting technical debt stored.
The result of documentation process was technical debt backlog that consisted 47
identiﬁed technical debt issues. For categorization we used 15 different technical debt
types identiﬁed by Alves et al. [19] in a mapping study. The majority of identiﬁed
technical debt (33/47 issues) was related to issues in design, architecture, code, and a
new category called legacy debt. Other types of technical debt (14/47) requirements,

Table 1. Template for technical debt documentation
Technical Debt ID
Date/Reporter
Technical Debt Name
Description
Alternatives
Rationale

Technical debt identiﬁcation number
Reporting date/Reporter name
Name of identiﬁed technical debt
Description of identiﬁed technical debt
Explanation of possible alternative solutions
Reasons to ﬁx technical debt

Developing Processes to Increase Technical Debt Visibility and Manageability

373

test, test automation, and process debt were associated more to activities outside
product implementation.
The fourth step is to organize a workshop. We developed a process to prioritize
technical debt issues with a simple technical debt evaluation and prioritization template
(Table 2). This template was used when all the technical debt issues were collected to
the backlog. In the workshop, the participants would evaluate each identiﬁed technical
debt issue based on the ﬁve questions to create a prioritization. The outcome of this step
is getting technical debt evaluated.
The research group also analyzed the returned evaluation templates based on each
question to understand how technical debt was being evaluated.
We identiﬁed three different beneﬁt categories: technical, economic, and organizational beneﬁts. Technical beneﬁts include improvements in software quality, software maintainability, software reusability, software performance, software testability,
and software deployment. Organizational beneﬁts include better software deployment,
development team productivity, organizational communicability, and future adaptability. Economic beneﬁts include economic value and customer satisfaction.
We identiﬁed three different risk categories: economic, technical, and organizational risk. Economic risks include cost, time effort, testing effort, and customer satisfaction. Organizational risks include management and competence. In addition,
technical risks like system breakdown and instability are critical to companies.
For reasons, we identiﬁed three different categories: intentional decision, unintentional cause, and organizational cause. Intentional reasons were often related to
time constraints, lack of resources, and business driven development. Unintentional
causes were legacy product and lack of knowledge. For organizational causes, software
processes and lack of management were the main reasons for technical debt.
We identiﬁed two types of solutions for technical debt: technical and organizational solutions. Technical solutions were refactoring, redesigning, rewriting, architectural analysis, and increased testing. Organizational solutions were new processes
and new management plan/strategy.
The ﬁfth step of the research process is sorting. When there is an evaluation for
each technical debt item, it is easier to sort the issues out based on their importance.
The last outcome of the process is prioritized technical debt backlog. The majority of
the issues (27/47) were prioritized at the lowest priorities 5 or 4, which shows that most
technical debt was not considered dangerous now. There were total of 14 issues rated as
the highest priorities at level 1 or 2. There were three level 1 issues related to legacy
Table 2. Technical debt evaluation template
#
1
2
3
4
5

Question
What are the beneﬁts of ﬁxing this issue? (Business value, quality, productivity, less bugs
etc.)
Are there any risks in ﬁxing this issue? (Expensive, breaks the system etc.)
Why was this issue done previously like that?
How to ﬁx this issue and what resources the ﬁx would require?
From scale 1 – 5, how important would you rank this issue to be dealt with? (1 – most
important, 5 – not so important)

374

J. Yli-Huumo et al.

debt, which can be explained by Tieto’s current goals to migrate to a new technology to
replace and rewrite old technology. Interestingly, the priorities also show that most of
the technical debt related to design, architecture, and code debt were prioritized as 4 or
5, while test, test automation, and process debt were rated higher. However, it is
important to notice that number of technical debt issues in design, architecture, and
code is much higher than other types of debts, which might explain the difference.
The outcome of this action research cycle was a prioritized technical debt backlog
that can be now used to add more development tasks related to technical debt reduction. For example, Tieto managers expressed that the backlog would be used in the
future by Tieto to reduce technical debt in small iterations. Tieto managers also
mentioned that this same process would be applied in future to other product lines.

5 Discussion
RQ1: How to improve technical debt identiﬁcation and documentation? Our study
made technical debt identiﬁcation and documentation possible with simple practices
that make technical debt more visible and manageable. These similar practices have
been already suggested in other literature [6, 7]. However, the problem is not the
practices themselves, but the fact that changing or adding new practices in companies is
always a challenge and takes time [20]. In our case, we had a company that was
motivated to improve and change these practices. We started some new practices with
templates and processes that gathered previously identiﬁed technical debt from the
minds of architects and developers to a speciﬁc backlog designed only for technical
debt.
Technical debt identiﬁcation is a challenge in software development. Identiﬁcation
of smaller technical debt issues happening in single code lines can be done with static
code analysis tools and often it can be ﬁxed by single developers. However, larger
issues in architecture and structure are often unreachable with tools [9, 21] and require
technical knowledge and competence [22], and discussion on an organizational level.
In our case, the people in the product line did not use any tools to ﬁnd and identify
technical debt. Instead, the technical debt was identiﬁed based on previous experience
and history with the product. The experience with the product of workshop participants
shows that the people responsible for identiﬁcation had extensive knowledge of the
product development history and high competence to build software. We used this fact
to our advantage, since we did not have to guide developers and architects to investigate product history, because the knowledge was already acquired during the
development years. This helped to identify existing technical debt and document it
based on our recommendations.
An interesting perspective on identiﬁed and documented technical debt and is the
variety of types of technical debts. The large variety of technical debt types shows that
when talking about technical debt, it is not only related to issues in design or code.
Instead, like in our case, the same phenomena of shortcuts and bad solutions happen in
other parts of software product development as well. To some development teams,
technical debt might include only issues happening in the source code and design,

Developing Processes to Increase Technical Debt Visibility and Manageability

375

while to some other teams, like in Tieto it might also include issues like those in testing
and processes. We argue that technical debt management is successful when a company
sets a clear standard to what is technical debt in their context and start to manage
technical debt based on that standard. However, in academia, there is a need to create a
common understanding for technical debt.
RQ2: What factors should be taken in consideration when prioritizing technical
debt? The prioritization in this study was based on evaluation of beneﬁts, risks,
reasons, and solutions of technical debt. Using these factors to evaluate each technical
debt issue could be a good beginning for companies that are trying to improve their
technical debt management. However, these factors are not always measureable with a
numeric value. The interest in technical debt that accumulates larger if not repaid, it is
difﬁcult to estimate [23]. Therefore, rather than trying to measure exact values, technical debt could be easier to understand from management perspective, if evaluated
based on factors related to it.
Companies should evaluate each technical debt issue on the basis of how ﬁxing the
issue can beneﬁt both company and software, such as improved quality, and how does
this quality improvement affect other factors such as maintainability, performance or
customer satisfaction. One challenge and risk of technical debt is that it often requires
competence to ﬁx or change existing solutions. When developers are changing very old
parts of the code, it is not always certain that it will go as planned and it can be a huge
risk that needs to be evaluated before.
Understanding the reasons behind technical debt can help to understand bigger
underlying problems with technical debt. For example, a single technical debt issue in
one smaller feature can be caused by some larger architectural issue. Instead of just
ﬁxing one single technical debt issue with the most economical value, it might be
possible that another major technical debt item can be actually more beneﬁcial to ﬁx in
a long term. Sometimes the solution might only require small refactoring, while
sometimes it might need a full rewrite of that certain part of the code. Therefore, it is
important to evaluate how much resources and effort does ﬁxing technical debt require.
Understanding the solution can enable a better evaluation, whether the time required for
ﬁxing is worth compared to its beneﬁt.
We believe that technical debt prioritization should be done based on evaluation
rather than measurement. The combination of the presented factors can be used as a
simple way to create basic prioritizations, which can help companies to make decisions
with more rationality. The decision-making may improve when development teams and
management communicate and understand the beneﬁts and risks in each technical debt
issue, accompanied with knowledge on the reasons and solutions for technical debt.
Study limitations. The ﬁrst limitation to this study is the generalization of the results.
It is not certain that this process is usable in other companies. In our case, most of the
involved people had many years of experience with the product. This helped the
identiﬁcation stage, since the people from the product line had already extensive
knowledge about the issues in the product. The second limitation is that we conducted
only one round of this action research. Conducting more rounds might change some
results in the priorities and numbers of technical debt issues, but we believe it would

376

J. Yli-Huumo et al.

not have any changes to the actual processes that were used in this study. The third
limitation is that the used process only takes in consideration already occurred technical
debt, and does not include management processes for a situation, where a decision has
to be made for a new technical debt case. This makes the developed management
process limited to only already occurred technical debt.

6 Conclusions
We used the action research process [18] together with a large IT company Tieto to ﬁnd
and develop processes for technical debt identiﬁcation, documentation, and prioritization to increase technical debt visibility and manageability. The action research
process consisted several interviews and meetings with the company representatives
and an organized technical debt workshop to improve processes in the company. The
outcome of the research was a set of templates and processes to identify, document, and
prioritize technical debt. These templates and processes were used successfully at Tieto
to transition from a situation where knowledge of technical debt was not explicitly
documented, to a situation where a speciﬁcally prioritized technical debt backlog was
available to reduce technical debt. Tieto’s Capital Market product unit is now using this
new technical debt backlog to increase technical debt visibility and manageability.
Since the results with the developed process were considered successful by both the
research group and the company, the same process will be expanded to other product
lines in Tieto. The main challenges and lessons learned can be summarized as
following:
• Technical debt can be brought visible with simple practices and processes in a
company that does not have a priori knowledge on technical debt management.
• Identiﬁcation of larger scale technical debt, such as architecture and design, with
tools is a challenge that needs to be addressed and improved in future research.
• Technical debt documentation can be done with simple templates, but requires
motivation and resources from software organization.
• Technical debt prioritization based on measurements is difﬁcult, and therefore
rougher evaluations based on e.g. beneﬁts and risks through opinions can be seen
easier to start with.

References
1. Cunningham, W.: The WyCash Portfolio Management System, Experience Report (1992)
2. Yli-Huumo, J., Maglyas, A., Smolander, K.: The sources and approaches to management of
technical debt: a case study of two product lines in a middle-size ﬁnnish software company.
In: Jedlitschka, A., Kuvaja, P., Kuhrmann, M., Männistö, T., Münch, J., Raatikainen, M.
(eds.) PROFES 2014. LNCS, vol. 8892, pp. 93–107. Springer, Heidelberg (2014). doi:10.
1007/978-3-319-13835-0_7

Developing Processes to Increase Technical Debt Visibility and Manageability

377

3. Yli-Huumo, J., Maglyas, A., Smolander, K.: The beneﬁts and consequences of workarounds
in software development projects. In: Fernandes, J.M., Machado, R.J., Wnuk, K. (eds.)
ICSOB 2015. LNBIP, vol. 210, pp. 1–16. Springer, Heidelberg (2015). doi:10.1007/978-3319-19593-3_1
4. McConnell, S.: Technical Debt-10x Software Development | Construx, 1 November 2007.
http://www.construx.com/10x_Software_Development/Technical_Debt/.
Accessed
25
March 2014
5. Li, Z., Avgeriou, P., Liang, P.: A systematic mapping study on technical debt and its
management. J. Syst. Softw. 101, 193–220 (2015)
6. Guo, Y., Seaman, C.: A portfolio approach to technical debt management. In: Proceedings of
the 2nd Workshop on Managing Technical Debt, New York, NY, USA, pp. 31–34 (2011)
7. Li, Z., Liang, P., Avgeriou, P.: Architectural technical debt identiﬁcation based on
architecture decisions and change scenarios. In: Proceedings of the 12th Working IEEE/IFIP
Conference on Software Architecture, WICSA (2015)
8. Zazworka, N., Vetro’, A., Izurieta, C., Wong, S., Cai, Y., Seaman, C., Shull, F.: Comparing
four approaches for technical debt identiﬁcation. Softw. Qual. J. 22(3), 403–426 (2013)
9. Kruchten, P., Nord, R.L., Ozkaya, I.: Technical debt: from metaphor to theory and practice.
IEEE Softw. 29(6), 18–21 (2012)
10. Seaman, C., Guo, Y., Zazworka, N., Shull, F., Izurieta, C., Cai, Y., Vetro, A.: Using
technical debt data in decision making: potential decision approaches. In: 2012 Third
International Workshop on Managing Technical Debt (MTD), pp. 45–48 (2012)
11. Schmid, K.: A formal approach to technical debt decision making. In: Proceedings of the 9th
International ACM Sigsoft Conference on Quality of Software Architectures, New York,
NY, USA, pp. 153–162 (2013)
12. Curtis, B., Sappidi, J., Szynkarski, A.: Estimating the size, cost, and types of technical debt.
In: Proceedings of the Third International Workshop on Managing Technical Debt,
Piscataway, NJ, USA, pp. 49–53 (2012)
13. Theodoropoulos, T., Hofberg, M., Kern, D.: Technical debt from the stakeholder
perspective. In: Proceedings of the 2nd Workshop on Managing Technical Debt, New
York, NY, USA, pp. 43–46 (2011)
14. Letouzey, J.-L.: The SQALE method for evaluating technical debt. In: Proceedings of the
Third International Workshop on Managing Technical Debt, Piscataway, NJ, USA, pp. 31–
36 (2012)
15. Avison, D.E., Lau, F., Myers, M.D., Nielsen, P.A.: Action research. Commun. ACM 42(1),
94–97 (1999)
16. Baskerville, R.L., Wood-Harper, A.T.: A critical perspective on action research as a method
for information systems research. J. Inf. Technol. 11(3), 235–246 (1996)
17. Sjoberg, D.I.K., Dyba, T., Jorgensen, M.: The future of empirical methods in software
engineering research. In: 2007 Future of Software Engineering, Washington, DC, USA,
pp. 358–378 (2007)
18. Coughlan, P., Coghlan, D.: Action research for operations management. Int. J. Oper. Prod.
Manag. 22(2), 220–240 (2002)
19. Alves, N.S.R., Mendes, T.S., de Mendonça, M.G., Spínola, R.O., Shull, F., Seaman, C.:
Identiﬁcation and management of technical debt: a systematic mapping study. Inf. Softw.
Technol. 70, 100–121 (2016)
20. Dyba, T.: An empirical investigation of the key factors for success in software process
improvement. IEEE Trans. Softw. Eng. 31(5), 410–424 (2005)

378

J. Yli-Huumo et al.

21. Zazworka, N., Spínola, R.O., Vetro’, A., Shull, F., Seaman, C.: A case study on effectively
identifying technical debt. In: Proceedings of the 17th International Conference on
Evaluation and Assessment in Software Engineering, New York, NY, USA, pp. 42–47
(2013)
22. Robillard, P.N.: The role of knowledge in software development. Commun. ACM 42(1),
87–92 (1999)
23. Falessi, D., Shaw, M.A., Shull, F., Mullen, K., Keymind, M.S.: Practical considerations,
challenges, and requirements of tool-support for managing technical debt. In: 2013 4th
International Workshop on Managing Technical Debt (MTD), pp. 16–19 (2013)

Applying Social Network Analysis and Centrality
Measures to Improve Information Flow Analysis
Stephan Kiesling1 ✉ , Jil Klünder1, Diana Fischer2, Kurt Schneider1,
and Kai Fischbach2
(

1

)

Software Engineering Group, Leibniz Universität Hannover, Hannover, Germany
{stephan.kiesling,jil.kluender,
kurt.schneider}@inf.uni-hannover.de
2
University of Bamberg, Bamberg, Germany
{diana.fischer,kai.fischbach}@uni-bamberg.de

Abstract. In software development projects, documents are very important for
sharing requirements and other information among employees. However, infor‐
mation can be transported in diﬀerent ways. Conversations, meetings, workshops
and emails convey and impart information as well. Especially large companies
struggle in dealing with unclear and incorrect information ﬂows. These informa‐
tion ﬂows can be improved by means of information ﬂow analysis and ﬂow
patterns. One technique to analyze information ﬂows is the FLOW method. It
supports visualization and analysis of information ﬂows to detect lacks and
anomalies and thereby improves information ﬂows. An analyst gathers informa‐
tion transported in the company. Afterwards, information ﬂows are visualized
and analyzed based on patterns and personal experience. Nevertheless, analysis
based on individual knowledge is error-prone. Hence, we improve the FLOW
method with the help of social network analysis applying centrality measures to
the FLOW method and to support the FLOW analyst.

1

Introduction

Documents play a major role in process-driven software development companies. They
share requirements and other information among team members. However, information
is also transferred through other channels like conversations, meetings or workshops.
Agile methods, becoming more and more established these days, prefer direct informa‐
tion communication channels and use less documents. Especially big companies have
to cope with unclear and wrong information ﬂows in and between teams. Companies
struggle to localize these communication problems. To ﬁnd the problems, information
ﬂows must be analyzed.
Information ﬂow analysis can uncover disruptions in information ﬂows in compa‐
nies, so that they can be corrected. The ﬁrst step of the analysis is to determine the ﬂow
of information in the company. After that, the ﬂow needs to be visualized and analyzed.
The analysis may uncover ﬁndings that can be used for information ﬂow improvement.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 379–386, 2016.
DOI: 10.1007/978-3-319-49094-6_25

380

S. Kiesling et al.

There are various possibilities to conduct an information ﬂow analysis. One technique
is the FLOW method, which is in focus in this paper [1].
FLOW analysis is a systematic method to visualize, analyze and improve informa‐
tion ﬂow [2]. The FLOW method provides a graphical notation (Fig. 1) as well as an
approach to identify and gather information ﬂows. The graphical notation provides two
diﬀerent types of information ﬂows. Non-documented, verbal or informal ﬂows of
information are denoted as ﬂuid, whereas solid information ﬂows are always docu‐
mented. By using this graphical notation, so-called FLOW diagrams can be created.
These diagrams represent all information ﬂows and all information stores for a task or
process. On the one hand, FLOW diagrams help the FLOW analyst to analyze gathered
information ﬂows. On the other hand, they can be used after analysis to present ﬁndings
to the company the analysis was conducted for.

Solid Store Multiple
Solid Stores

Fluid Store Multiple
Solid Flow Fluid Flow
Fluid Stores

Fig. 1. The basic FLOW notation

The FLOW method consists mainly of three phases: (1) Information ﬂows need to
be collected by interviewing appropriate persons. (2) After the elicitation, FLOW
diagrams are created by combining partial diagrams from each interviewee into an
overall one. (3) Afterwards, the analysis phase starts. The analyst regards the FLOW
diagram’s structure and looks for bad patterns to decide on improvement recommen‐
dations. Moreover, the analyst also compares the depicted information ﬂow with addi‐
tional information elicited in phase (1) to look for possible contradictions or conﬂicts.
Mainly the last phase entails several problems. The analysis can be very subjective
due to varying personal experiences. The analyst can overlook or misinterpret aspects.
He may look into some details more closely and dismiss others only based on a gut
feeling.
To reduce the inﬂuence of personal experience during analysis, we extend FLOW
with methods from social network analysis. We choose centrality measures that are
suited for FLOW and use them to compute key indicators for FLOW diagrams. An
analyst can use these key indicators as support for his calculations and indicators that
point to possible information ﬂow issues. These indicators are speciﬁcally intended to
support analysis of complex and large FLOW diagrams.
The remaining work is structured as follows. Section 2 presents related work.
Section 3 introduces the concepts of the new method. Finally, Sects. 4 and 5 conclude
this paper with a discussion and outlook.

Applying Social Network Analysis and Centrality Measures

2

381

Related Work

In the following section, we provide an overview of existing research on the relevant
topics in this paper.
Modelling and analyzing information ﬂow in organizations is motivated by the need
to improve the information processes, e.g. by eliminating redundant processes, mini‐
mizing the duplication of information, or managing and sharing intra- and inter-organ‐
izational information ﬂow [3]. The FLOW method is a method for diagrammatic model‐
ling information ﬂow in teams with the aim to identify critical points which may cause
loss of information.
Stapel et al. [2] considered the information ﬂow within diﬀerent kinds of project
teams to detect critical points like lacks of information, a wrong amount of information
sharing or gatekeepers who are very central within the network.
Stapel [1] extends this approach by presenting possibilities to analyze the FLOW
diagrams which visualize the information ﬂows within the team or business.
In the organizational context, social network analysis can help understand how
inter- and intra-organizational networks are linked to outcomes and processes [4], such
as job exit, team performance, innovation and individual satisfaction (for an overview
see, for example, [5]). In addition, social network analysis has been applied to explore
the information ﬂow in and between organizations. For example, Braha and Bar-Yam
[6] analyzed the information ﬂow structure of intra- and inter-organizational networks
in large-scale product development organizations. Friedkin [7] found that the strength
of relationships between employees has an impact on intra-organizational information
ﬂow. Ryynänen et al. [8] observed the internal information ﬂow network in the project
sales process to understand and improve the information ﬂow between the employees.
Information ﬂow networks resulting from the FLOW method are directed networks.
Analyzing directed networks requires a distinction between measures for incoming ties
and measures for outgoing ties [9]. This is important for the measures, which will be
applied in the network analysis in this contribution.
Social network analysis provides a wide range of measures to analyze a network
(e.g. [9]). In order to understand and analyze the roles of actors in networks, many studies
in social science have relied on centrality measures [6]. Centrality measures help to
understand the individual actor’s prominence according to the actor’s position in the
network [10]. We argue that centrality measures are suitable means to analyze the
directed ﬂow networks.

3

Improving the FLOW Method

This section gives an overview of our concept to enrich FLOW diagrams with social
network analysis by calculating key indicators.

382

S. Kiesling et al.

3.1 Selection of Appropriate Centrality Measures
This approach concentrates on the use of centrality measures which help to analyze
FLOW networks to simplify the process. Nevertheless, the FLOW analysis cannot be
fully automated, since a FLOW network will not include all information – in particular,
interpersonal aspects will be hard to visualize. However, there are some centrality meas‐
ures which help to identify critical points and support the individual analysis. We will
focus on embedding ﬁve centrality measures in FLOW analysis.
Degree centrality is a local measure counting the number of edges of a node, i.e.
the sum of sources (in-degree) and receivers (out-degree) of information. In directed
networks, in- and out-degree can be considered as two diﬀerent values to diﬀerentiate
between the quantities of incoming and outgoing information [11]. Applied to FLOW
analysis, degree centrality measures the frequency and the amount of incoming and
outgoing information. Degree centrality is a ﬁrst indicator for a person being important
for information sharing. It identiﬁes central persons who need to share or receive much
information. A drop out of one of these central persons can threaten project success since
all information passing this person would either be lost or delivered in a longer time.
Closeness centrality measures the average distance of a node to each other node
within the network [11]. It can be seen as a measure for the well-positioning of a person
within a network [10]. This measure assumes that information originates from all other
persons with equal probability and that all information ﬂows along the shortest paths
[10]. A low raw closeness score is an indicator for a person being well-positioned to
obtain novel information early [10]. Nodes with a short distance to other nodes tend to
receive ﬂows sooner, assuming comparable times of information transport between all
nodes in the network. Thus, closeness centrality is normally interpreted as the time until
arrival of something ﬂowing through a network [12]. Persons with high closeness scores
have short distances and hence a high collaboration [13] with other persons and will
receive information sooner. In FLOW analysis, these persons are very important for
information sharing. Identifying these persons is one of the aims of the FLOW method,
since they obtain novel information early. In addition, they bundle much information
that can be shared within the network. In some cases, there are tasks referencing to
several processes. Furthermore, they bundle information of many processes and hence
can transport them to responsive persons. Replacing such persons by farseeing persons
with less closeness centrality would go along with a drop in performance.
Betweenness centrality measures how often a node is located on the shortest (i.e.
geodesic) paths between two other nodes. Persons are central if they have the potential
to mediate the ﬂow of resources or information between other actors [14]. Thus, persons
who can mediate the ﬂow of information between other actors are considered as central.
This measure identiﬁes persons who are indispensable for information sharing between
other persons. Furthermore, regular exits due to holidays, retirement, or termination, can
cause problems since it is diﬃcult to replace the respective persons adequately. There‐
fore, knowing these persons is desirable.
Flow betweenness extends betweenness centrality in two ways: It considers all paths
between nodes and not only geodesics, and it is appropriate for graphs and weighted
graphs in which larger weights indicate stronger ties between actors. Edge weights are

Applying Social Network Analysis and Centrality Measures

383

taken to represent the potential for the ﬂow of information between nodes assuming that
the constraints for a metric in a mathematical view (except for the symmetry in directed
graphs) hold [15]. Applied to FLOW analysis, ﬂow betweenness describes the amount
of information, forwarded by a single person by considering all information transported
via this speciﬁc person. These persons are interesting because they coordinate the
process-independent information ﬂow.
As an extension of degree centrality, eigenvector centrality deﬁnes actors to be
central if they have ties to other actors who, in turn, are central themselves [14]. In
FLOW analysis, eigenvector centrality identiﬁes persons who are important for infor‐
mation ﬂow since they probably get many information in a short time. Furthermore, they
can share their information with other important persons with little eﬀort. These persons
themselves can share the information so that the information can spread in a short time.
Each of these centrality measures is an indicator for problems concerning the infor‐
mation ﬂow within a team. Hence, we use them to help the FLOW analyst identify critical
points in FLOW networks. Having determined these measures, the FLOW analyst can
interpret the network and help the team ensure the right amount of information ﬂow and
unburden persons who must not fail.
3.2 Transforming Flow Diagrams to Networks
In order to apply centrality measures to a FLOW diagram, the diagram has to be trans‐
formed into a valid network ﬁrst. The ongoing of this transformation is not trivial, since
FLOW diagrams highly resemble networks but diﬀer in some signiﬁcant aspects. To
clarify this, we provide a short introduction to structure of FLOW diagrams and their
elements below.
A FLOW diagram consists of information stores, information ﬂows, and FLOW
activities.1 Having a nested structure, a FLOW activity can contain information stores,
information ﬂows, and other FLOW activities. To convert the FLOW diagram into a
network, information stores and information ﬂows can be mapped directly to nodes and
edges of these networks but nested structures need to be dissolved.
As illustrated in Fig. 2, there are three diﬀerent ways of transforming a FLOW
activity. The left side of the ﬁgure shows an example of a FLOW diagram containing
an activity with a nested structure to be transformed. On the right side of the arrows are
all possible alternatives to transform the nested structure of the activity:
1. Directly connecting all incoming and outgoing stores of the activity.
2. Representing the activity through a distinct node connected to all incoming and
outgoing stores.
3. Specifying the underlying information ﬂow structure by deciding for each incoming
and contained store to which contained or outgoing store it has to be connected. In
this case, “A” is deﬁned as an incoming store, “X” and “Y” are deﬁned as contained
stores and “B” is deﬁned as an outgoing store.

1

Refer to [16] for a detailed description of the FLOW syntax.

384

S. Kiesling et al.

Fig. 2. Three alternatives to represent the same FLOW activity (left) in a network (right)

To transform FLOW activities properly, we have to clarify the purpose of the actual
activity ﬁrst. In the ﬁrst alternative, store “A” provides information for the activity,
which might or might not be processed and changed in the activity. It is obtained by
neither “X” nor “Y”. “B” receives the possibly changed information. This signiﬁes that
an internal way of processing this information is not considered to be relevant for the
overall process.
In the second alternative, the activity represents an abstract task or sub-process,
which can neither be transformed nor removed. This may occur, if a FLOW diagram
contains uncertain information, which could not be elicited during FLOW interviews.
For example, “A” and “B” do not know how information is being passed in the activity
and “X” and “Y” could not be identiﬁed and are thereby obscured for the FLOW analyst.
The third alternative represents a fully unrestricted way of interconnecting incoming
and outgoing stores. This case should be used in all cases except alternative 1 and 2.
This case can occur if the activity contains a sub-process of the overall modelled process.
In this example, “A” provides information for “X”, while “Y” provides information for
“B”.
Only the analyst can decide which alternative of these three above he should apply.
He has to decide for each FLOW activity which case is appropriate and which incoming
and outgoing information stores have to be selected.
3.3 Calculating Key Indicators
We developed a converter to transform FLOW diagrams from xml ﬁles created with our
FLOW editor ProFLOW into networks. The transformed network now has to be
analyzed by applying centrality measures. For this task, we use UCINET2 to calculate
key indicators based on our centrality measures depicted above. The converted FLOW
networks can be imported to UCINET, which calculates the key indicators. This process
cannot be done automatically yet because user interaction with UCINET is necessary.
After calculation, the determined key indicators are exported by the analyst for further
analysis of the FLOW network.

2

UCINET is a common used tool for social network analysis.

Applying Social Network Analysis and Centrality Measures

4

385

Discussion

In this section we point out the limitations of our method followed by an interpretation
of our ﬁndings.
4.1 Limitations and Interpretation
Applying information ﬂow analysis in a software company does not imply to remedy
all company’s problems, but can improve information ﬂows. Furthermore, ﬁndings
uncovered through information ﬂow analysis must be accepted by the company and
implemented by the employees.
The beneﬁt of information ﬂow analysis depends on the chosen ﬂow analysis method.
In our case, we have chosen FLOW to analyze information ﬂows which suited well for
the analyses we conducted so far. Applying the FLOW method takes a lot of time and
is highly based on personal experience. The approach requires a considerable number
of manual steps. One reason is that these steps require a decision from the FLOW analyst
that cannot be automated (see Fig. 2). Another reason is the interface between UCINET
and our ProFLOW editor. UCINET does not allow automated processes. It is likely that
the analysis will become more time-consuming for the analyst but he will gain more
insights.
A way to decrease manual steps would be to integrate the converter into the used
FLOW editor and relieve the user to switch between applications and to choose the
FLOW diagram to convert.
In this contribution we applied centrality measures which can enrich the original
FLOW method. Considering the added information (as key indicators) can lead to results
that have been overlooked before.

5

Conclusion

This paper describes concepts, methods and tools to improve the analysis of information
ﬂows by improving the FLOW method with social network analysis. The approach is
to convert FLOW diagrams into networks and to calculate centrality measures for the
converted diagrams. These centrality measures support the analysis of structure and
meaning of the original FLOW diagram and enables an easier detection of inconsisten‐
cies and anomalies in information ﬂows. This improves processes and procedures in the
project as well as in the entire software company.
We accomplished our goal to reduce the inﬂuence of personal experience on the
FLOW method as well as our purpose to uncover more ﬁndings in information ﬂows in
software companies.
Acknowledgements. This work was supported by the German Federal Ministry of Education
and Research under grant number K3: FKZ 13N13548 (2015-2018) and by the German Research
Foundation (DFG) under grant number 263807701 (Project TeamFLOW, 2015-2017).

386

S. Kiesling et al.

References
1. Stapel, K.: Informationsﬂusstheorie der Softwareentwicklung, 1st edn., Univ, Hannover,
Dr. Hut, München (2012)
2. Stapel, K., Knauss, E., Schneider, K.: Using FLOW to improve communication of
requirements in globally distributed software projects. In: Workshop on Collaboration and
Intercultural Issues on Requirements: Communication, Understanding and Softskills
(CIRCUS 2009), pp. 5–14 (2009)
3. Durugbo, C., Tiwari, A., Alcock, J.R.: Modelling information ﬂow for organisations: a review
of approaches and future challenges. Int. J. Inf. Manag. 33(3), 597–610 (2013)
4. Merrill, J., et al.: Findings from an organizational network analysis to support local public
health management. J. Urban Health Bull. N.Y. Acad. Med. 85(4), 572–584 (2008)
5. Borgatti, S.P., Foster, P.C.: The network paradigm in organizational research: a review and
typology. J. Manag. 29(6), 991–1013 (2003)
6. Braha, D., Bar-Yam, Y.: Information ﬂow structure in large-scale product development
organizational networks. In: Vervest, P., Heck, E., Pau, L.-F., Preiss, K. (eds.) Smart Business
Networks, 1st edn., pp. 105–125. Springer, Heidelberg (2005)
7. Friedkin, N.E.: Information ﬂow through strong and weak ties in intraorganizational social
networks. Soc. Netw. 3(4), 273–285 (1982)
8. Ryynänen, H., Jalkala, A., Salminen, R.T.: Supplier’s internal communication network during
the project sales process. Proj. Manag. J. 44(3), 5–20 (2013)
9. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, 1st edn.
Cambridge University Press, Cambridge (1994)
10. Borgatti, S.P.: Centrality and network ﬂow. Soc. Netw. 27(1), 55–71 (2005)
11. Freeman, L.C., Roeder, D., Mulholland, R.R.: Centrality in social networks: II. Experimental
results. Soc. Netw. 2(2), 119–141 (1979)
12. Borgatti, S.P.: Centrality and AIDS. Connections 18(1), 112–114 (1995)
13. Schneider, K., Liskin, O.: Exploring FLOW distance in project communication. In: 2015
IEEE/ACM 8th International Workshop on Cooperative and Human Aspects of Software
Engineering (CHASE), pp. 117–118 (2015)
14. Faust, K.: Centrality in aﬃliation networks. Soc. Netw. 19(2), 157–191 (1997)
15. Newman, M.J.: A measure of betweenness centrality based on random walks. Soc. Netw.
27(1), 39–54 (2005)
16. Stapel, K., Schneider, K., Lübke, D., Flohr, T.: Improving an industrial reference process by
information flow analysis: a case study. In: Münch, J., Abrahamsson, P. (eds.) PROFES 2007.
LNCS, vol. 4589, pp. 147–159. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73460-4_15

Design of Project Management Capabilities
Solvita Berzisa ✉ and Jānis Grabis
(

)

Information Technology Institute, Riga Technical University, Kalku 1, Riga, Latvia
{solvita.berzisa,grabis}@rtu.lv

Abstract. Project management (PM) capabilities deﬁne organizational abilities
of delivering predictable project results in a changing environment. To increase
maturity of the PM capabilities, they need to be formalized, aligned with standards
and best practices, measured, controlled and improved. One of capabilities stand‐
ardization and formalization approaches is to perform capability modelling. This
paper proposes to use the Capability Driven Development methodology to model
the PM capability models because this methodology allows for representing
unique capability delivery context situations and speciﬁcation of context-aware
PM processes. The paper outlines the capability modelling processes and elabo‐
rates the capability model for the risk management sub-capability. Potential
applications of the capability model are discussed.
Keywords: Project management capability · Capability modelling · CDD
methodology · Capability design tool

1

Introduction

An organizational project management (PM) capability enables success of PM in organ‐
ization and provides processes, technologies and peoples for predictable PM delivery
[1]. PM is a process of applying knowledge, skills, methods, techniques and tools to
project activities to meet the project requirements [2]. Organizational PM is a framework
for aligning PM practice with the organizational strategy by customizing or ﬁtting these
practices to organization context and situation [3]. An organization needs to know what
speciﬁc PM practices, knowledge, skills, tools and techniques are necessary for it to
successfully achieve the organization strategy and eﬀective PM [1]. So it is necessary
to identify current organizational PM capabilities, required improvements and establish
a roadmap to implement these improvements [1].
The Capability Maturity Model [4] evaluates an organizational ability to complete
projects successfully and describes practices helping to increase process maturity [5],
[6]. However, the maturity level is not the only factor contributing to the project success
[7]. Similarly, PMBOK [2] describes project management references processes without
considering speciﬁc project management situations. Recently, Capability Driven Devel‐
opment (CDD) [8] methodology has been proposed to enable development of organi‐
zational capabilities with respect to speciﬁc contextual situation. It deﬁnes capabilities
as ability and capacity that enable an enterprise to achieve a business goal in a certain
context. Capability modelling and design ensures formal deﬁnition of the PM capability
to provide for evaluation of the current PM capability and its improvement options.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 387–395, 2016.
DOI: 10.1007/978-3-319-49094-6_26

388

S. Berzisa and J. Grabis

The purpose of this position paper is to present a PM capability model and illustrate
a process of transforming traditional reference PM processes into context-aware PM
processes suitable for development of context-aware Project Management Information
Systems (PMIS). Development of context-aware PMIS as well as analysis, evaluation
and improvement of the PM capabilities are the main application areas of the PM capa‐
bility model. The main contribution of this work is development of building blocks (i.e.,
PM capability models and patterns) needed for further elaboration of context-aware
PMIS. That enables improvement of maturity of the PM processes on the basis of estab‐
lished PM practices.
The rest of the paper is structured as follows: Sect. 2 describes theoretical background
about the PM capabilities, capability modelling approach and motivation of the PM
capability modelling. Design of the PM capability model is presented in Sect. 3 and
application possibilities of models are discussed in Sect. 4. Conclusions and future work
are presented at the end of the paper.

2

Theoretical Background and Motivation

The PM capability (described in Sect. 2.1.) is one of digital organization’s capabilities
that is important for successful project-oriented organizations and is often evaluated
during tenders and outsourcing together with technical capabilities [2]. Capability
modelling (described in Sect. 2.2.) helps to understand and standardize organization
current PM practices and to understand what PM capabilities an organization has. Other
beneﬁts of the PM capability modelling are:
• Understanding of factors that aﬀect project and PM in an organization;
• Identiﬁcation of PM capability gaps preventing fulﬁlment of the strategy;
• Identiﬁcation of the PM capabilities maturity level and improvement possibilities
based on context situation speciﬁc PM practices. Also improvement of quality
management in an organization because many of quality management requirements
(e.g. ISO 9001) are related to implemented PM practices;
• Helping to understand requirements and to choose PMIS for an organization;
• Helping to understand conﬁguration requirements of PMIS for a particular project;
• Movement towards design and development of the context adaptive PMIS.
2.1 Project Management Capability
The PM capabilities includes peoples, processes and technologies [1]. Eﬀective PM
requires the right peoples with right skills [1] and is one of organization challenges to
get these peoples. The PM capability focuses to PM processes, but also needs to review
related process in project oriented organization – product related and support processes
[2]. All processes need to be tailored to organization context situation and standardized,
measure, controlled and improved to achieve higher processes capability [1]. From the
PM capability perspective, processes of organization project [2], program [9] and port‐
folio management [10] need to be considered. The third element of the PM capabilities

Design of Project Management Capabilities

389

is technologies available to the organization. For example, PMIS that supports imple‐
mentation of processes and helps people to perform their tasks.
2.2 Capability Modelling Approach
The capabilities are modelled using concepts deﬁned in the capability meta-model [8]
(simpliﬁed view of main concepts given in Fig. 1). Every capability has goals and
achievement of these goals is measured by indicators or KPI. The context (context set,
context element range, context element, context element value) deﬁnes circumstance
aﬀecting capability delivery and also deﬁnes context situations in which the capability
being able to deliver. The capability delivery is supported by a process. Process variants
can be constructed for dealing with speciﬁc capability delivery context situations.
Patterns are used to support capability design. The patterns provide reusable solutions
for capability delivery. They are also characterized by their context, which deﬁnes
situation when this pattern is applicable.

ContextElement

1..*
requires

influences

consists of
1

1..*

1..*

*

1..*

1
has value

*

1..*
requires

requires

Context Element
Value

ContextSet

Capability
1

0..1
0..1

1

1

1..*
consists of

KPI

Indicator

ContextElementRange
1

has

1
1..*
motivates

supported by

requires

1

Goal
*
0..1

requires
1..*

0..1

1..*

1

*
Context Situation
1..*

1

Pattern

Process

ProcessVariant
1..*

1

Fig. 1. Main concepts of capability modelling

3

PM Capability Design

According to the CDD methodology, the capability design process can be organized
following various pathways [11]. For the PM capability design purposes the capability
modelling is organized according the following pathway [12]:
1.
2.
3.
4.
5.
6.
7.

Deﬁne and name capability;
Deﬁne the capability goals;
Model context - identiﬁcation of contextual factors aﬀecting the capability;
Deﬁne main activities of process supporting the capability;
Identify process variation - indicate process dependences on context;
Specify of capability indicators and KPI;
Identiﬁcation of appropriate patterns in pattern repository and integration in process.

390

S. Berzisa and J. Grabis

The capability model designed provides a capability overview and an elaboration of
one of the sub-capabilities, namely, project risk management capability. The capability
model is divided in several sub-models and CDT (Capability Design Tool) [12] is used
for modelling.
The PM capability consists of interrelated sub-capabilities in diﬀerent levels:
project, program and portfolio (Fig. 2 demonstrates PMI view to PM sub-capabilities
based on [2, 9, 10]). There are interrelated and shared elements among the sub-capabil‐
ities. Some of the sub-capability have duplicated names in diﬀerent levels but these
capabilities have speciﬁc meaning in each level. The main PM capability is “Eﬀective
and eﬃcient PM” [1].

Fig. 2. Composition of PM sub-capabilities

Deﬁnition of the goals is similar to enterprise modeling [13] but in the case of the
PM capability that also has been distributed to diﬀerent levels. It is possible to deﬁne
goals for the main PM capability and also separately for each sub-capability. Figure 3
summarizes main goals of the PM capability according to PMI [3]. The main goal and
focus of the PM capability is to derive business value from implementation of practices.

Fig. 3. Goals of PM capability based on [3]

The context situation of the PM capability depends on such contextual elements as
domain, structure of organization (also geographical distribution and location), culture,
technologies, human resources, project and other characteristics. The impact of these
context elements on project success have been analyzed in several investigations: culture

Design of Project Management Capabilities

391

aspects [14]; culture, industry and PM practice/technology [15]. Similarly, as for the
goals, the context can be modelled for the main PM capability and also for each subcapability. The context model helps to identify environment in which the organizational
PM capability works. Some example: PM in local project (location in one country) will
diﬀer from global project (involved diﬀerent countries, team distributed in diﬀerent time
zones) or IT and construction PM capabilities also is deliver diﬀerently. A fragment of
the general PM context model is given in Fig. 4. that includes culture, geographical
distribution and project characteristics context elements.

Fig. 4. Fragment of the PM context model

The next step is deﬁnition of the capability delivery process. This task is performed
for each PM sub-capability. The process initially describes base activities what is
followed by identiﬁcation of context dependent variations in the next step. The risk
management capability process is considered for demonstration purposes. The reference
risk management capability process and sub-processes are summarized in Table 1.
Table 1. Processes of project risk management

Name
Project
risk
management
Risk identification
Risk analysis

Risk treatment
Risk monitoring

Process

392

S. Berzisa and J. Grabis

Context situation changes aﬀect the capability delivery process and require process
variations. The process variants can be viewed as execution scenarios of the delivery
process in a speciﬁc context situation. The process variations for the context situations
deﬁne in Fig. 4 are: local Europe software development (this process variant describes
the process of managing a software development project in Europe performed by a team
located in one country) or global Asia software implementation (software implementa‐
tion in Asia performed by a globally distributed team). Process variants are identiﬁed
for each PM sub-capability and can be aggregated in an overall process for the main
capability. To identify the process variation can be used systematic analysis of process
variation points (process gateway where decision depends on context element value;
example of variation point is in Fig. 5.) and it combinations during the process.

Fig. 5. Updated risk identiﬁcation process

The indicators are used for performance evaluation of the capability delivery. Clas‐
sical PM capability indicators according to successful project deﬁnition [2] are time,
quality and budget, and number of satisﬁed customer and other indicators. At least one
indicator is needed to be deﬁned for each PM sub-capability. In the risk management
capability example, percentage of occurred unidentiﬁed risks is one of the indicators.
During capability design, the patterns are used as recommendations or solutions for
process execution in a particular context. They are stored in the pattern repository [16].
Some examples of the PM patterns are given in Table 2. These example patterns are
related to risk management capability and propose solutions for risk identiﬁcation in
Table 2. Examples of PM patterns

Pattern problem
Pattern 1:
Risk identification for low
priority project

Context and other
characteristics
Project priority =
normal
Capability = project
risk Mgmt

Pattern 2:
Risk identification for critical
project

Project priority =
critical
Capability = project
risk Mgmt

Solution

Design of Project Management Capabilities

393

two context situations – one solution for critical priority projects and other for normal
priority. The risk identiﬁcation process after integration of these patterns is given in
Fig. 5. After this update, the risk identiﬁcation process becomes more elaborated and
relies on best practices. Its performance depends of two context elements – area (change
basic risk checklist used for review) and risk priority (change processes of risk identi‐
ﬁcation and create process variation point). During pattern identiﬁcation and integration
new process variation points and new process variants can be identiﬁed. Figure 5 shows
these new context dependent variation points for the risk identiﬁcation sub-process.

4

Application of PM Capability Models

As mentioned in Sect. 2, there are diﬀerent beneﬁts of capability design and modelling
what can be summarized as two application directions of the developed capability
models:
1. Organization PM capability analysis. There are several objectives of this analysis,
for example, to analyze problems and to evaluate context that aﬀects the organization
PM capabilities; to evaluate capabilities and identify capability gaps; benchmarking
against the best practices and identiﬁcation of improvement possibilities to increase
the maturity level of the PM capability; and to evaluate possibilities for attaining
new capabilities.
2. PMIS tailoring to support contextual process variability what is discussed in more
detail below.
PMIS is a standardized set of automated tools and techniques used in PM for plan‐
ning, execution, management and closing of the project, as well as for collecting,
combining and distributing project information [2]. PMIS is also part of PM capability
and its quality impacts the PM capability. PMIS provides a wide range of functions
directly supporting PM, as well as tools for its conﬁguration and modiﬁcation, where
conﬁguration is perceived as the most appropriate PMIS setup depending on project
situation [17]. Deﬁnition of the PMIS conﬁguration requirements must include the
following information [17]: data entities used in project; attributes of each data entity;
and processes related to the data entities. The capability model speciﬁes the speciﬁc
project situation by deﬁning context situations. It helps to understand data entities, their
attributes and processes.
During the conﬁguration of context-aware PMIS is proposed to separate a core PM
system and a context adaptive part. The core PM system implements processes often
using packaged PM systems such Jira or Microsoft Project Server. Context speciﬁc
processes would be setup dynamically depending on the observed context situation. That
is to be performed by the context adaptive part according to speciﬁcations provides in
the capability model. Context data might be static or dynamic and context brokering
solutions could be used for retrieving dynamic context data.

394

5

S. Berzisa and J. Grabis

Summary and Conclusions

This paper introduces the PM capability models and describes PM capability design
process and challenges. Design of the PM capabilities helps to document and formalize
the PM processes in an organization, understand the context situation in which the
organization can ensure predictable project results. Main application directions of the
PM capability model are the organization PM capability analysis, and PMIS tailoring
to requirements of the organizational PM capabilities.
PM capability modelling is complicated by a number of issues:
1. The PM capability is not one capability but it consists of interrelated sub-capabilities.
In this paper, one of the sub-capabilities is elaborated. Each particular organization
needs to decide how to organize sub-capabilities because they can be joined and
spilled diﬀerently. For example, risk/communication management in all level will
be one capability or it will be separate for project, program and portfolio level.
2. The PM context varies among sub-capabilities. Each sub-capability might have its
own context situation. Context granularity also should be decided upon, for example,
location can be identiﬁed with continents or country.
3. The PM processes contain many interrelations and cycles. The processes include
many variation points and their combinations. Many processes are organization
dependent and are related to product delivery and support process details.
Currently developed capability models reported in this paper are derived from PMI
standards. That demonstrates that the PM capability models allow adopting existing
general practices and standards for modelling of context-aware PM processes. On the
other hand, that limits a breath of context situations covered to those envisioned in the
PMI standards. Consideration of other PM standards and practices is needed to cover a
wider range of context situations and will be tackled in future research activities.
Future work will focus to three directions:
1. Modelling: modelling of diﬀerent PM sub-capabilities with focus to design of the
general PM capability. This general PM capability can be used as a base for design
or evaluation of particular organization capability.
2. Development the context-aware PMIS as outlined in Sect. 4: research of possibilities
to design the context adaptive PMIS from methodological view and also from PMIS
functionality and possibilities to support adaption according to the context changes.
3. Creation of the PM pattern repository: design of the PM pattern repository will
ensure collection of PM practices and reusable solutions.
Case studies of the organization PM capabilities and the pattern repository usage and
experimental designs of the context-aware PMIS also will be performed.
Acknowledgement. Support for this work was provided by the Riga Technical University
through the Scientiﬁc Research Project Competition for Young Researchers No. ZP-2016/24.

Design of Project Management Capabilities

395

References
1. PMI: Organizational Project Management Maturity Model (3ed). PMI, Pennsylvania (2013)
2. PMI: A Guide to the Project Management Body of Knowledge (5ed). PMI, Pennsylvania
(2014)
3. PMI: Implementing Organizational Project Management: A Practice Guide. PMI,
Pennsylvania (2014)
4. Paulk, M.C., Curtis, B., Chrissis, M.B., Weber, C.V.: Capability Maturity Model SM for
Software, Version 1.1. Carnegie Mellon University, Pennsylvania (1993)
5. Miklosik, A.: Improving project management performance through capability maturity
measurement. Procedia Econ. Financ. 30, 522–530 (2015)
6. Nenni, E.M., Arnone, V., Boccardelli, P., Napolitano, I.: How to increase the value of the
project management maturity model as a business-oriented framework. Int. J. Eng. Bus.
Manag. 6, 1–6 (2014)
7. Lee, L.S., Anderson, R.M.: An exploratory investigation of the antecedents of the IT project
management capability. e-Service J. 5(1), 27–42 (2006)
8. Bērziša, S., Bravos, G., Gonzalez, T.C., Czubayko, U., España, S., Grabis, J., Henkel, M.,
Jokste, L., Kampars, J., Koç, H., Kuhr, J.-C., Llorca, C., Loucopoulos, P., Pascual, R.J., Pastor,
O., Sandkuhl, K., Simic, H., Stirna, J., Valverde, F.G., Zdravkovic, J.: Capability driven
development: an approach to designing digital enterprises. Bus. Inf. Syst. Eng. 57, 15–25
(2015)
9. PMI: The Standard for Program Management (3ed). PMI, Pennsylvania (2013)
10. PMI: The Standard for Portfolio Management (3ed). PMI, Pennsylvania (2013)
11. Koç, H., Sandkuhl, K.: A business process based method for capability modelling. In:
Matulevičius, R., Dumas, M. (eds.) BIR 2015. LNBIP, vol. 229, pp. 257–264. Springer,
Heidelberg (2015). doi:10.1007/978-3-319-21915-8_17
12. Bērziša, S., España, S., Grabis, J., Henkel, M., Jokste, L., Kampars, J., Koç, H., Sandkuhl, K.,
Stirna, J., Valverde, F., Zdravkovic, J.: Deliverable 5.2: The Initial Version of Capability
Driven Development Methodology, CaaS – Capability as a Service for Digital Enterprises,
FP7 project no. 611351 (2015)
13. Sandkuhl, K., Stirna, J., Persson, A., Wibotzki, M.: Enterprise Modeling: Tackling Business
Challenges with the 4EM Method. Springer, Heidelberg (2014)
14. Zwikael, O., Shimizu, K., Globerson, S.: Cultural diﬀerences in project management
capabilities: a ﬁeld study. Int. J. Proj. Manag. 23, 454–462 (2005)
15. de Carvalho, M.M., Patah, L.A., de Souza Bido, D.: Project management and its eﬀects on
project success: cross-country and cross-industry comparisons. Int. J. Proj. Manag. 33, 1509–
1522 (2015)
16. Stirna, J., Zdravkovic, J., Henkel, M., Kampars, J.: Capability patterns as the enablers for
model-based development of business context-aware applications. In: Fleischmann, A.,
Guédria, W., Heuser, L., Kornyshova, E., Loucopoulos, P., Oberweis, A., Pastor, O., Proper,
H.A., Schmidt, W., Schönthaler, F., Stary, C., Vossen, G., Zdravkovic, J. (eds.)
Complementary Proceedings of the Workshops TEE, CoBI, and XOC-BPM at IEEE-COBI
2015, pp. 1–12, CEUR-WS.org (2015)
17. Bērziša, S., Grabis, J.: Combining project requirements and knowledge in conﬁguration of
project management information systems. In: Caivano, D., Baldassarre, M.T., García, F.O.,
Genero, M., Mendes, E., Runeson, P., Sillitti, A., Travassos, G.H., Visaggio, G. (eds.) Second
Proceedings: Short Papers, Doctoral Symposium and Workshops of the 12th International
Conference of Product Focused Software Development and Process Improvement (PROFES
2011), pp. 89–95. ACM, New York (2011)

Speed and Agility in System Engineering

Relationship of DevOps to Agile,
Lean and Continuous Deployment
A Multivocal Literature Review Study
Lucy Ellen Lwakatare(B) , Pasi Kuvaja, and Markku Oivo
Faculty of Information Technology and Electrical Engineering,
University of Oulu, Oulu, Finland
{lucy.lwakatare,pasi.kuvaja,markku.oivo}@oulu.fi

Abstract. In recent years, the DevOps phenomenon has attaracted
interest amongst practitioners and researchers in software engineering,
reﬂecting the greater emphasis on collaboration between development
and IT operations. However, despite this growing interest, DevOps is
often conﬂated with agile and continuous deployment approaches of software development. This study compares DevOps with agile, lean and
continuous deployment approaches in software development from four
perspectives: origin, adoption, implementation and goals. The study also
reports on the claimed eﬀects and on the metrics of DevOps used to
asses those eﬀects. The research is based on an interpretative analysis of
qualitative data from documents describing DevOps and practitioner’s
responses in a DevOps workshop. Our ﬁndings indicate that the DevOps
phenomenon originated from continuous deployment as an evolution of
agile software development, informed by a lean principles background.
It was also concluded that successful adoption of DevOps requires agile
software development.
Keywords: DevOps

1

· Agile · Lean · Continuous deployment · Eﬀect

Introduction

According to Sharp, Robinson and Woodman [1], the software engineering (SE)
ﬁeld is characterized by rapid change and constant emergence of new paradigms
that dominate most research agendas. In recent times, DevOps and the continuous deployment (CD) paradigm have attracted widespread interest among
SE practitioners and researchers [2,3]. With the increasing diversity of software development approaches described under the general heading of DevOps,
practitioner-led forums have highlighted a lack of clarity around the use of this
term, which has become an umbrella for more deﬁnitive terms appearing mostly
in mobile and web contexts [4]. DevOps is commonly associated with related software development approaches—especially agile, continuous delivery and CD—
to the extent that it is often conﬂated with these terms. While there are some
similarities, it remains crucial to diﬀerentiate between these concepts in order
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 399–415, 2016.
DOI: 10.1007/978-3-319-49094-6 27

400

L.E. Lwakatare et al.

to clarify the contribution and signiﬁcance of DevOps for both practice and
research.
As a blend of the words ‘development’and ‘operations’, DevOps emphasises
cross-functional collaboration within and between teams, with the goal of accelerating delivery of software changes [5]. Adoption of DevOps has many beneﬁts;
in particular, it enables better collaboration and reduces time to market of new
features to production [6]. According to Dingsøyr and Lassenius [3], the leading edge of DevOps is driven by the industry and consultants, with research
lagging behind in synthesizing and systematizing knowledge and in testing the
many claims made by proponents of the concepts. Among studies emerging to
address this concern, DevOps has been compared and contrasted with release
engineering practices [5] and roles [7].
As part of a broader goal of improving understanding of the DevOps phenomenon, the present study seeks to clarify the relationship of DevOps to associated
software development approaches. Speciﬁcally, an interpretative analysis of data
collected from online sources and from a workshop with practitioners explores
the relationship of DevOps to agile software development, lean software development and CD. The article also reports on the claimed eﬀects of DevOps and
on the metrics used to assess those eﬀects. The study addresses the following
research questions.
– RQ1. How does DevOps relate to agile, lean and CD?
– RQ2. What are the claimed eﬀects of DevOps, and what metrics can be used
to assess those eﬀects?
The rest of the paper is organised as follows. The next section describes the
background of DevOps and related work. Section 3 details research method, and
the results are presented in Sect. 4.

2

DevOps: Background and Related Work

The essential purpose of DevOps is to align the incentives of all those involved in
delivering software, with a particular emphasis on developers, testers and operations personnel [6]. The term has been popularised through a series of DevOpsDays, which started following a presentation by Patrick Debois and Andrew Clay
Shafers at the 2008 Agile Conference [8]. This phenomenon, and in particular
the deﬁnition of DevOps, has been said to lack clarity [4,5]. The ﬁrst deﬁnition
that can be considered scientiﬁc was proposed by Penners and Dyck [5] in 2015.
In 2016 Lwakatare, Kuvaja and Oivo [4] advanced the proposed deﬁnition as follows ‘a mind-set substantiated by a set of practices to encourage cross-functional
collaboration between teams—especially development and IT operations—within
a software development organization, in order to operate resilient systems and
accelerate delivery of change’. This enhanced deﬁnition was validated by analyzing descriptions of DevOps by practitioners in online documents such as blog
posts.

Relationship of DevOps to Agile, Lean and Continuous Deployment

401

DevOps is often associated (or used interchangeably) with the concepts of
agile software development and CD. In their book1 , Bass, Weber and Zhu [9]
note that the emphasis on the relationship between DevOps and agile practices
is one characterization of DevOps. Refering to IBM’s disciplined agile delivery,
they describe how DevOps adds to agile practices. According to their account
[9], DevOps practices aﬀect and relate to all three phases of IBMs framework:
inception, construction and transition. For instance, the construction phase of
disciplined agile delivery includes key elements of DevOps practices, which are
also agile practices that collectively play an important role in the ability to
automate the deployment pipeline [9]. These key elements include code branching
management, continuous integration (CI) and CD, whilst automated testing is
considered to be incorporated [9]. CD diﬀers from continuous delivery; while
the latter aims to keep software in a constantly releasable state [3], CD also
automates the ﬁnal step of delivering software to production as soon as code is
checked-in and all automated tests have been successfully passed [2]. CD enables
businesses to reduce cycle time so that feedback is quickly obtained from users.
In addition, the risk and cost associated with production deployments is reduced
and managed, as there is improved visibility in the delivery process itself [6].
Few studies have compared and contrasted DevOps with other areas of established knowledge and practices in SE. Penners and Dyck [5] recognised that the
lack of a clear deﬁnition of DevOps means that the term is easily confused with
or used as a synonym for other terms, such as release engineering. In investigating their diﬀerences and relationship, Penners and Dyck [5] found that, as a
discipline concerned with establishing and improving the process of delivering
high-quality software to customers in a holistic manner, release engineering is
broader in scope than DevOps. While not itself a discipline, DevOps stresses
the improvement of collaboration between development and operations teams
through cultural change in the operation of resilient systems and accelerated
delivery of software changes [5]. Kerzazi and Adams [7] compared the main
tasks of release and DevOps engineers by analysing online job postings, concluding that the role of a release engineer is broader than that of a DevOps
engineer because the former combines the principal activities of DevOps and
traditional build engineers. The present study aims to improve understanding
of the DevOps phenomenon by comparing and contrasting DevOps with other
software development practices, especially agile, lean and CD.

3

Research Method

Adopting a qualitative research approach, data were collected from (1) nonscientiﬁc documents, (2) scientiﬁc articles and (3) responses from practitioners who participated in a DevOps workshop. Scientiﬁc articles were gathered
by expanding our previous literature search on DevOps [10] to include recent
publications from November 2014 to March 2016. Non-scientiﬁc documents
were collected using the multivocal literature review (MLR) method [11–13].
1

For more detail, see [9], pp. 12–13.

402

L.E. Lwakatare et al.

Research Area

Data Collection

DevOps
phenomenon
RQ1.
RQ2.

Multivocal
Literatures

Scientific
documents

Results and
Reporting

Data Analysis

1. Perform Google
Search of ”What is
DevOps”

2. Include ML first based on
intial coding and then on
paper RQs

1. Include 22 studies from
Lwakatare et al. and
extend literature search
until March 2016

Responses from 1. Plan and conduct
workshop with
practitioners
practitioners

2. Search ”DevOps” in 5
databases and include
studies based on
selection criteria

2.Audio record and
take notes during
workshop

3. Quality assessment
based on the metadata
classification

3. From step 1 and 2,
include studies
relevant to the RQ1
of paper

3. Transcribed
workshop audios

Fig. 1. Research process

The responses of practitioners were obtained during a half-day workshop conducted on 25 April, 2016; the workshop was attended by eleven practitioners
from seven companies involved in the Need for Speed (N4S) project2 . Thematic
analysis was used to analyse the collected documents. The research approach is
summarised in Fig. 1.
3.1

Data Collection: Non-Scientiﬁc Documents

The ﬁrst part of the data collection process involved non-scientiﬁc documents,
such as blog posts and trade journals that are collectively known as grey or
multivocal literature (ML) [11–13]. ML typically comprises a large collection
of readily accessible writings available in a wide variety of forms. Because it
contains the views of a diverse set of authors and incorporates diﬀerent research
(or non-research) logics, it is important to have a clear goal prior to ML data
collection. Here, the overall research goal of an MLR was to understand how
diﬀerent practitioners variously describe the DevOps phenomenon. The goal of
this study—to identify how DevOps relates to agile, lean and CD—emerged from
initial coding within that overall goal.
There is no explicit guideline for collecting ML, and it is an acknowledged
challenge that the diversity of ML means that it cannot be collected and assessed
in the same way as scientiﬁc literatures. For this reason, data collection was
informed by procedures recommended or adopted by other authors using similar methods, including Ogawa [11], Tom, Aurum and Vidgen [12] and Garousi
and Mäntylä [13]. The data collection procedure took account of the following
aspects: (1) data sources and search strategy, (2) inclusion and exclusion and
(3) quality assessment.
2

http://www.n4s.ﬁ/en/.

Relationship of DevOps to Agile, Lean and Continuous Deployment

403

– Data Sources and Search Strategy. The Google search engine was used to locate
ML from the World Wide Web, using the query ‘What is DevOps?’. From the
retrieved records, links to search results were reviewed page by page, and the
outputs of each link were saved in PDF format. This procedure was followed
until pages showing job adverts began, at which point the review was stopped.
In total, 230 records were gathered for inclusion in the next step.
– Inclusion and Exclusion. All 230 documents were imported into NVivo3 for
initial analysis. Inclusion or exclusion of records was simultaneous with initial coding. No pre-deﬁned themes were used; instead, themes were allowed to
emerge from the data. The process also involved the classiﬁcation of records
with diﬀerent attributes, such as author information (e.g. name, role and place
of work) and source information (e.g. publication year, forum and link). Following the review of all 230 documents, 201 sources were identiﬁed as relevant
while 29 were excluded as either duplicates, video links, pointers to catalogues,
course adverts, certiﬁcation adverts or presentation slides. The initial coding
identiﬁed ‘claimed eﬀects’and ‘DevOps in relation to agile, lean and CD’as
new themes for further analysis; the ﬁrst of these was found in 66 and the
second in 75 of the 201 documents.
– Quality Assessment. Quality assessment was performed mostly during the classiﬁcation of sources by metadata. This process provided information about
author and source, as well as the minimum information required to establish
credibility.
3.2

Data Collection: Scientiﬁc Documents

The second part of the data collection process involved scientiﬁc studies, using
as a starting point 22 scientiﬁc documents already identiﬁed in our previous
study [10] and adopting a similar approach to identify new scientiﬁc documents
published after 11 November 2014. Because of the large number of retrieved
studies, snowballing was not used. The procedure adopted (both previously and
here) involved the following steps.
– The search term ‘DevOps’ was used to search and retrieve documents published after 11 November 2014 from the following ﬁve databases: ACM Digital
Library, ISI Web of Science, Science Direct, IEEE Xplore and Scopus. Reﬁnement of publication year for each database yielded a total of 340 documents,
distributed as follows: ACM (41), Web of Science (37), Science Direct (72),
IEEE Xplore (72) and Scopus (118).
– Similar inclusion criteria were applied to scientiﬁc documents; these were (a)
relevance to the topic, (b) peer review and (c) publication in a scientiﬁc journal
or in conference proceedings. Following exclusion during the ﬁrst stage of 104
duplicates and 55 irrelevant document types, such as glossaries, indexes and
prefaces, a total of 181 studies remained. The inclusion criteria were then further applied by reading titles and abstracts for relevance to the topic, as well
3

http://www.qsrinternational.com/what-is-nvivo.

404

L.E. Lwakatare et al.

as publication forums (for peer reviews and publication in scientiﬁc journals
and conference proceedings). On that basis, 87 studies in total were identiﬁed
and added to the 22 previously selected studies for analysis. For the current
study, a fourth criterion was added, including only those publications discussing agile, CD and lean in addition to DevOps. Ultimately, 33 studies were
included in the present analysis to address the research questions.
3.3

Data Collection: Practitioners

The last source of data was a workshop conducted with practitioners involved
in the N4S project. This is a large national programme that aims to create the
foundation for software-intensive Finnish businesses in the digital economy, in
which participating companies seek to develop real-time experimental business
models and capability for instant value delivery, based on deep customer insights.
The workshop sought to identify what DevOps means to practitioners and how
they have implemented the approach in practice. Any eﬀects attributed to the use
of DevOps and their associated metrics were also explored. The workshop session
was organised by researchers from three institutions. Two researchers facilitated
the session by posing a question to the practitioners, who then worked in small
groups and presented their answers, using ﬂipcharts as communication aids.
The presentations and discussions were recorded and subsequently transcribed
for analysis.
3.4

Data Analysis

Using thematic analysis, all data from the literature search, MLR4 and workshop
were saved and analysed in NVivo. Thematic analysis is a process for identifying,
analysing and reporting data patterns (themes); according to Braun and Clarke
[14], this process has six phases. The ﬁrst phase of thematic analysis, familiarisation with data, was conducted in tandem with the second phase, generating
initial codes. The initial codes were identiﬁed inductively from ML documents
into categories (themes) related to the initial research objective (describing the
DevOps phenomenon). The analysis yielded multiple themes, including the two
considered here: (1) DevOps in relation to agile, lean and CD and (2) Claimed
impacts of DevOps and metrics. These two themes were also used for deductive
coding of scientiﬁc articles and responses from practitioners. The third phase of
thematic analysis, searching for themes, was executed during the second iteration of coding, which involved inductive coding of the above two categories.
In addition to thematic coding, some elements of discourse analysis were used
speciﬁcally in the coding of emerging sub-themes for DevOps in relation to agile,
lean and CD practices, focusing on the words that authors used in explaining
their understanding of the concept. Discourse analysis is often utilised to understand what people are using language for in a given situation. After completing
4

Link to list of ML: http://tinyurl.com/z3jpu5v.

Relationship of DevOps to Agile, Lean and Continuous Deployment

405

the second iteration, themes were reviewed for reﬁnement, naming (e.g. adoption, implementation) and subsequent reporting in order to complete the fourth
(reviewing), ﬁfth (deﬁning and naming themes) and sixth (producing report)
phases of thematic analysis.

4

Findings

This section presents the results of the analysis in respect of the two research
questions. Figure 2 shows a comparison of included MLR and scientiﬁc articles.
4.1

How DevOps Is Related to Agile, Lean and Continuous
Deployment

The analysis showed that authors relate DevOps to agile, lean and CD with
respect to four main topics: (1) origin and background, (2) adoption, (3) implementation and (4) goals. Table 1 (at the end of section) summarises these ﬁndings.
Total number of documents discussing
DevOp in r elation to Agile, Lean and CD
250

80
70
60
50
40
30
20
10
0

Number of documents

Number of documents

Total number of documents per
publication year

200
150
100
50
0
ML

ML

Scientific Articles

Scientific Articles

Documents discussing DevOps in relation to Agile, lean and
CD

Fig. 2. Publication years and number of included ML and scientiﬁc articles.

DevOps and Agile Software Development
1. Origin and Background. Most ML items described DevOps as an evolution
or extension of agile software development—in particular, of agile principles
and values as speciﬁed in the Agile Manifesto—with the aim of applying to
operations the same values, ideas and practices of agile software development
applied by developers. A similar claim was identiﬁed in scientiﬁc articles [15–
19]. In addition, some ML and scientiﬁc articles framed DevOps as a type or
part of agile software development [20–22].

406

L.E. Lwakatare et al.

2. Adoption. DevOps has also been related to successful adoption of agile software development. While few ML and scientiﬁc documents reported on the
latter, those that did so described agile methods as the key factor in using
DevOps [23–29]. The arguments fell into two categories. First, agile methods
are well established [28], and their emphasis on cooperation among stakeholders helps to formalise similar collaborations between development and
operations [18,24]. For this reason, IT organisations applying agile methods
will ﬁnd DevOps familiar and easy to use and work with [25,27]. Second, most
of the practices advocated by DevOps are incompatible with the traditional
waterfall approach because one of the main goals of DevOps is to deliver new
software features frequently and quickly [22,29]. On the other hand, use of
agile methods does not in itself amount to adopting DevOps, unless collaboration with operations has been established to address challenges of deploying
and operating software in live environment where customers can appreciate
the value of software in the same time support continuous ﬂow of work. The
latter was evidently an important learning point for practitioners during the
workshop; as one practitioner put it,
“I think that when we talk about success factors, we want to emphasise
people and commitment over technologies. This is a somewhat familiar
idea from the agile world, but when you talk about DevOps, many
people immediately mention CI and Jenkins and Docker; these are
useful tools as well, but I think that the best DevOps achievement
was just me and one of the ops guys sitting down and setting up the
initial environment together. The most important thing is to get the
communication going, and it helps a lot when you have that element
of commitment.”
3. Implementation. DevOps is also related to the implementation aspect of agile
software development, as evidenced by two main points deduced from ML
and supported by scientiﬁc articles. First, DevOps builds upon agile software development practices, especially continuous integration [22,26,30–33],
as well as on existing software development roles in agile methods. Second,
DevOps practices enhance existing agile practices and roles by also taking
account of operations activities. In this latter regard, rethinking the day-today working relationship of the development team with the operations team
and using agile principles to guide interactions between the two groups usually helps [17,24]. For instance, one ML source listed improvements to Scrum
methodology when using DevOps [34]. Additionally, the following response
from a practitioner in a large non-consultancy organisation elaborated on the
implementation aspect:
“One of the things I came across is learning, and then collaboration.
Breaking the silos and enabling communication channels increases the
organisations overall competence. Because people are learning and collaborating, youre able to iterate these cycles faster and faster. Even
in a big organisation, this enables quite a lot of things, like the ability
to break down bureaucracy and silos. In one instance, a big customer

Relationship of DevOps to Agile, Lean and Continuous Deployment

407

took a build directly from a CI at the end of the day. Normally, that
build should have ended up with the customer after six months because
of the bureaucracy chain. As a result, learning happened because somebody took this step and learned that he can take the build directly from
the CI and put it in a live system.”
However, unlike agile software development, DevOps does not have methodologies such as Scrum and extreme programing. Instead, DevOps embodies
a vast and diverse set of practices, from which some patterns can be identiﬁed [23] and generically applied under certain conditions, depending on the
environment [30].
4. Goals and Values. DevOps and agile software development are driven by similar basic goals and values. First, in terms of organisational silos, DevOps
tears down the walls between developers and operations, not unlike how agile
methods tear down the walls between the development team and the customer, as well as walls within the team. DevOps requires the development
team to work closely with the operations team and to understand its needs
and priorities in much the same way as agile methods require developers to
work with the customer to understand real business needs and priorities and
to solve business problems together. Second, DevOps brings more decisions
about how the total system is set up and run to the level of the dev-and-ops
teams, just as agile methods bring more decisions on how the system needs to
work down to development team level. Moreover, when their goals are combined, DevOps and agile software development focus on rapid and incremental
releases, gathering feedback quickly and correcting problems [35].
DevOps and Continuous Delivery/Deployment
1. Origin and Background. Some ML authors [8,36] have argued that DevOps
emerged directly from CD during a presentation given by John Allspaw and
Paul Hammond at a Velocity conference in 2009. On that occasion, Allspaw
and Hammond showed how Flickr used the DevOps approach to deploy software updates to production multiple times a day [8,36]. CD represents a
step forward from continuous integration, and DevOps is required in order
to complete such a step [37]. DevOps in context of CD highlights issues and
challenges of deploying software updates on continuous basis (and in fast iterations) to production environment whilst also ensure reliable operability of
the live environment. When asked why the embedded systems company he
worked for had decided to introduce DevOps, one practitioner explained:
“We saw that infra-development separation doesnt work anymore. The
mindset has changed a lot in recent years. We need increased transparency and getting things done. We have and are doing continuous
integration—CI mostly—but if were going forward from CI, we need
to have this DevOps going on.”
2. Adoption. According to many ML and scientiﬁc articles [16,26,33,35,37–43],
using DevOps is a necessary and potentially great way to achieve CD.

408

L.E. Lwakatare et al.

The split between developers and operations is a major obstacle to fast and
frequent software releases; DevOps can bridge the gap by enabling more eﬃcient collaboration [44]. The use of DevOps facilitates continuous software
updates in both web and embedded systems domains [45]. However, the use
of DevOps in the embedded systems domain raises a number of concerns
[26,30,45], including competency challenges related to the use of technologies
not traditionally associated with embedded systems, such as cloud systems
in cyber-physical and embedded systems [45]. These concerns were also mentioned by two practitioners from a telecommunications company. According
to the ﬁrst, who heads a global eﬃciency team,
“We have many product creation teams; I dont know how they do
DevOps in detail, but in general, I know that were not doing very
much product development using DevOps principles.”
The second practitioner (from network management) explained the reason for
this:
“For example, in our product line, were still looking into what it would
mean for us as a very large organisation of 2000+ people to move
towards DevOps. Were still struggling with CD, even for internal purposes. We have our internal customers to do the network integration
and network veriﬁcation, in which multiple diﬀerent product lines are
delivering multiple diﬀerent products for integration as one system. So
lets say that were struggling with the scale of this work and the multiplicity of diﬀerent organisations, as well as the diﬀerent parts of the
diﬀerent organisations and were not even talking about the operator
customer yet, at this point in time.”
3. Implementation. Both ML and scientiﬁc articles agreed that the practices
and tools advocated by DevOps help to achieve CD. Some of these practices involve: (1) organisational and cultural changes designed to eliminate
the gap between development and operations, (2) automation of deployment
process in the deployment pipeline, (3) automation in infrastructure management to ensure reliability, reproducability and programatic operations and,
(4) monitoring that ensures reliable operations and feedback [42,44,46–48].
Both sources claimed that CD implementation typically includes a deployment pipeline in which tests and the deployment process are automated
[19,35,46]. This deployment pipeline is an automated delivery process involving software changes from version control and software building through multiple stages of testing and deployment until a new software version is delivered
to users [49]. The process includes automated monitoring of software quality
and performance at multiple stages in order to obtain feedback quickly. Use of
a deployment pipeline may mean that developers are responsible for testing,
deploying and monitoring software [24].
4. Goals and Values. Both DevOps and CD share a background in agile and lean
thinking. For that reason, they exhibit the same driving goals and values (i.e.
small and quick changes with a focus on the end customer) but from diﬀering
perspectives [50]. In particular, CD is much broader than DevOps in the

Relationship of DevOps to Agile, Lean and Continuous Deployment

409

sense that it’s focus on fast, frequent and rapid releases requires incorporating
additional changes in software delivery organisation such as business model.
DevOps and Lean Software Development
1. Origin and Background. Most of what has been addressed in DevOps is
informed by a lean thinking background [26,30]. Lean thinking goes further
than agile software development by emphasising that software development
does account for the entire IT value stream. In the same way, the DevOps
approach seeks to optimize the whole process as required in lean software
delivery [26,51].
2. Adoption. The Japanese word kaizen, meaning continuous improvement, is
often used in the lean software development context [31,52]. A kaizen culture
is described as implementing behaviour that continuously shows improvements [31,52]. Applying kaizen thinking results in focused attention on
increased feedback and a culture of collaboration between developers and
operations [52].
3. Implementation. ML and scientiﬁc documents related DevOps implementation to lean software development in terms of the product development
process ﬂow, encompassing a systems thinking approach (i.e. end-to-end integration of product development processes). In contrast to batch-and-queue
thinking, product development ﬂow requires eﬀective teamwork beyond the
software development function to eﬀectively and immediately connect the
actions of product design, implementation and testing, as well as deployment
of product features once requirements are identiﬁed [31]. In addition, developers can improve decision making about incremental product improvements
by exploiting operations knowledge. This highlights the eﬀective utilisation
of a short feedback loop in eliminating waste and moving beyond individual
performance to system performance, informed by insights into product quality [38]. An ML author [53] argued for the use of kanban in implementing lean
software development, and this sits well with DevOps. This argument ﬁnds
empirical support in [19,47]. Schneider [19] advocates kanban rather than
Scrum because of the potential to work immediately on high priority targets
for feature development and particularly on tasks that need to be performed
at once when they are occurring e.g. reacting on the basis of insights or
anomalies from production.
4. Goals and Values. DevOps incorporates lean principles and thinking into the
entire IT value stream, which extends development into production.

4.2

DevOps Eﬀects and Metrics

Reﬂecting the dynamic nature of the DevOps concept, the analysis suggested
a wide range of eﬀects and metrics for measuring DevOps success. Table 2 lists
some of the frequently claimed eﬀects of DevOps and associated metrics identiﬁed in this study. (The list is not exhaustive.) The analysis focused on identifying claimed eﬀects of DevOps from ML and on validating those claims, based on

410

L.E. Lwakatare et al.
Table 1. Summary of DevOps in relation to agile, lean and CD ﬁndings
DevOps as related to

Results

Source

Agile

Agile software development principles,
values and practices are required for successful adoption of DevOps

58 of 75 ML 21 of 33 sci. doc.

CD

DevOps implementation is necessary to
enable CD

36 of 75 ML 23 of 33 sci. doc.

Lean

Lean software development principles
and practices inform DevOps implementation

22 of 75 ML 9 of 33 sci. doc.

scientiﬁc articles and practitioners workshop responses. While there was some
evidence of a correlation between diﬀerent claimed eﬀects, this was not of immediate interest here. Most authors of ML described the eﬀects with reference to the
annual State of DevOps survey reports (by Puppet Labs) and the 2015 annual
State of DevOps (by Delphix) as supporting evidence. One limitation is that
the identiﬁed eﬀects were too generic, and it remains diﬃcult to argue that such
eﬀects are based entirely on the integration of operations and development.
Most of the claimed eﬀects of DevOps and associated metrics mentioned in
ML were also reported in the scientiﬁc articles. For instance, Farroha [18] listed
advantages of DevOps and key performance indicators that can be used to assess
the approachs success in an organisational context. However, only a few studies
provided empirical evidence of the claimed eﬀects, such as improved release
cycle time [35] in the context of an e-commerce platform and [46] in a project to
develop and deliver a system to network service providers. Claims about DevOps
eﬀects and metrics similar to those reported in ML were identiﬁed from the
workshop with practitioners, who were asked to name the most important eﬀect
of DevOps in their organisation and how this was measured. Importantly, other
factors such as context, customers and type and maturity of the project/product
emerged as key considerations in relation to eﬀects and metrics.

5

Discussion and Conclusion

The aim of this study was to explore DevOps in its relationship to agile, lean and
CD. Our ﬁndings conﬁrm that the DevOps phenomenon originated from CD as
an evolution of agile software development and is informed by a lean principles
background. While claimed DevOps eﬀects and metrics were also identiﬁed here,
most lacked suﬃcient empirical support. These ﬁndings invite four conclusions
– Agile software development principles, values and practices are required for
successful adoption of DevOps.
– DevOps implementation is necessary to enable CD.
– Lean software development principles and practices inform DevOps implementation
– Eﬀects of DevOps include the ability to release software quickly, frequently and
with improved quality. However, the use of popular metrics such as deployment

Relationship of DevOps to Agile, Lean and Continuous Deployment

411

Table 2. Summary of DevOps Eﬀects and Metrics
Source
ML

Sci. docs

Practi.

Fast and more frequent releases/deployments (fast
time-to-market, shortened lead time, rapid releases)

58

[18, 21, 23, 30, 37, 45–47, 49]

X

Improved quality and reliability of software product,
deployments and infrastructure

25

[18, 23, 46, 51]

X
X

Eﬀects

Increased eﬃciency through automation

22

[16, 35, 47, 49]

Increased innovation and customer experience

18

[18, 45]

Fast recovery time following unexpected events,
security ﬂaws etc

16

[18, 21, 30, 48]

X

Increased transparency and collaboration between
stakeholders especially developers and operations;
Improved employee morale and work culture

15

[19, 31, 37]

X

Lower chance of product failure once deployed

14

Maintainability and scalability of infra, Ops
processes

13

[16, 23, 29, 37]

Cost saving (associated with eﬀective utilization of
resources, reduced outages and automation)

13

[16]

X

Frequent operational and user feedback

8

[18, 45]

X

Deployment frequency/rate

13

[37, 46, 51]

X

Mean time to recover

12

[18, 21, 47]

X

Cycle/Lead Time (Time required to release software to
production)

12

[30, 46, 47, 51]

X

X

Metrics

Change success/failure rate

11

[47]

Frequency of production failures/outages

6

[47]

Customer/business associated metrics (customer
satisfaction, conversion rate, sales)

6

[23, 47]

X

rate and cycle/lead time are insuﬃcient to determine whether these eﬀects
arise from implementation of DevOps or other approaches.
While the supporting empirical evidence was poor, many sources (especially
ML) stood over many of the presented claims. The lack of empirical evidence
suggests that DevOps is still in its infancy. Additionally, most discussion of
DevOps is conﬁned to informal talks, workshops and events in forums outside
more formal dissemination channels.
In conclusion, it seems clear that the DevOps concept is often conﬂated
with other concepts, notably agile software development and CD. However, the
present study serves to clarify the essential diﬀerences between these concepts.
Like agile, lean and CD practices, DevOps makes its own signiﬁcant contribution
to SE while also building upon existing knowledge from agile and lean software
development. Future DevOps research should be clearer about its unique contribution and about what distinguishes it from and connects it to other established
and emerging practices in SE. As empirical studies on DevOps remain relatively
rare as compared to broad and mainstream informal discussions on DevOps
within the industry, studies of real-world projects can help to test these results
and conclusions.

412

6

L.E. Lwakatare et al.

Validity and Limitations

This study fulﬁls the thick-descriptive purpose of qualitative research in explicating the meanings and perspectives constructed by individuals, groups or both in
a particular context rather than eﬀectiveness or causal relationships [54]. Three
major quality criteria for thick-description qualitative research were addressed
here: triangulated and descriptive data, member checking and situating the
meaning of data [54].
Triangulation and member checking are crucial techniques for establishing
credibility, and three data collection methods were utilised for triangulation.
Given the nature of the ML, we did not conduct member checking by sending
individual e-mails to ML authors. Instead, we presented and solicited feedback
on the initial resultsand especially on the claimed eﬀects and metrics of DevOps
as derived from MLto practitioners at the end of the workshop. We acknowledge
threats to validity, and that our results are limited (particularly in terms of
the relationship of DevOps to agile, lean and CD practices) because there was
no member checking. Additionally, as most of the ML sources lacked empirical
evidence or rigorous presentation of the authors opinions or experiences, the
claims made need to be veriﬁed in future empirical evaluations, regardless of our
selection criteria. Additionally, we do not contend that the study has examined
all relevant material, especially from online sources. However, we believe our
choice of materials to be representative. Limitations also arise from the fact
that a majority of sources are of low to medium pertinence in discussing the
relationship of DevOps to agile, lean and CD practices.
Acknowledgement. This work was supported by TEKES as part of the N4S project
of DIMECC (Digital, Internet, Materials and Engineering Co-Creation).

References
1. Sharp, H., Robinson, H., Woodman, M.: Software engineering: community and
culture. IEEE Softw. 17(1), 40–47 (2000)
2. Humble, J., Farley, D.: Continuous Delivery: Reliable Software Releases through
Build, Test, and Deployment Automation. Addison-Wesley Professional, Boston
(2010)
3. Dingsøyr, T., Lassenius, C.: Emerging themes in agile software development: Introduction to the special section on continuous value delivery. IST 77, 56–60 (2016)
4. Lwakatare, L.E., Kuvaja, P., Oivo, M.: An exploratory study of DevOps: extending
the dimensions of DevOps with practices. In: 11th International Conference on
Software Engineering Advances, pp. 91–99. IARIA, Rome (2016)
5. Penners, R., Dyck, A.: Release engineering vs. DevOps-an approach to deﬁne both
terms. In: Full-Scale Software Engineering, pp. 49–53 (2015)
6. Humble, J., Molesky, J.: Why enterprises must adopt DevOps to enable continuous
delivery. Cutter IT J. 24(8), 6–12 (2011)
7. Kerzazi, N., Adams, B.: Who needs release and DevOps engineers, and why? In:
International Workshop on Continuous Software Evolution and Delivery, pp. 77–
83. ACM (2016)

Relationship of DevOps to Agile, Lean and Continuous Deployment

413

8. Wikipedia: DevOps. https://en.wikipedia.org/wiki/DevOps
9. Bass, L., Weber, I., Zhu, L.: DevOps: A Software Architect’s Perspective. AddisonWesley Professional, Old Tappan (2015)
10. Lwakatare, L.E., Kuvaja, P., Oivo, M.: Dimensions of DevOps. In: Lassenius,
C., Dingsøyr, T., Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212, pp. 212–217.
Springer, Heidelberg (2015). doi:10.1007/978-3-319-18612-2 19
11. Ogawa, R., Malen, B.: Towards rigor in reviews of multivocal literatures: applying
the exploratory case study method. Rev. Educ. Res. 61(3), 299–305 (1991)
12. Tom, E., Aurum, A., Vidgen, R.: An exploration of technical debt. J. Syst. Softw.
86(6), 1498–1516 (2013)
13. Garousi, V., Mäntylä, M.V.: When and what to automate in software testing? a
multi-vocal literature review. IST 76, 92–117 (2016)
14. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qualit. Res. Psychol.
3(2), 77–101 (2006)
15. Guerriero, M., Ciavotta, M., Gibilisco, G., Ardagna, D.: A model-driven DevOps
framework for QoS-aware cloud applications. In: 17th International Symposium on
Symbolic and Numeric Algorithms for Scientiﬁc Computing, pp. 345–351. IEEE
(2015)
16. Virmani, M.: Understanding DevOps & bridging the gap from continuous integration to continuous delivery. In: 5th International Conference on the Innovative
Computing Technology, pp. 78–82. IEEE (2015)
17. Olszewska, M., Waldén, M.: DevOps meets formal modelling in high-criticality
complex systems. In: 1st International Workshop on Quality-Aware DevOps, pp.
7–12. ACM (2015)
18. Farroha, B., Farroha, D.: A framework for managing mission needs, compliance,
and trust in the DevOps environment. In: Military Communications Conference,
pp. 288–293. IEEE, October 2014
19. Schneider, T.: Achieving cloud scalability with microservices and DevOps in the
connected car domain. In: CEUR Workshop Proceedings on Continuous Software
Engineering, pp. 138–141. CEUR-WS.org (2016)
20. Walter, J., van Hoorn, A., Koziolek, H., Okanovic, D., Kounev, S.: Asking “What”,
automating the “How?”: the vision of declarative performance engineering. In: 7th
International Conference on Performance Engineering, pp. 91–94. ACM (2016)
21. Syed, M.H., Fernandez, E.B.: Cloud ecosystems support for internet of things
and DevOps using patterns. In: 1st International Conference on Internet-of-Things
Design and Implementation, pp. 301–304. IEEE (2016)
22. Borgenholt, G., Begnum, K., Engelstad, P.: Audition: a DevOps-oriented service
optimization and testing framework for cloud environments. In: Conference of
Norsk Informatik Konferanse, pp. 146–157 (2013)
23. Cukier, D.: DevOps patterns to scale web applications using cloud services. In:
Conference on Systems, Programming, & Applications: Software for Humanity,
pp. 143–152 (2013)
24. Spinellis, D.: Being a DevOps developer. IEEE Softw. 33(3), 4–5 (2016)
25. Deshpande, A.: DevOps an Extension of Agile Methodology How It will Impact
QA? (2016). http://www.softwaretestinghelp.com/devops-and-software-testing/
26. Lwakatare, L.E., Karvonen, T., Sauvola, T., Kuvaja, P., Olsson, H.H., Bosch, J.,
Oivo, M.: Towards DevOps in the embedded systems domain: why is it so hard? In:
49th Hawaii International Conference on System Sciences, pp. 5437–5446. IEEE
(2016)
27. Racspace: Building Your DevOps Engine (2015). http://tinyurl.com/htovgd9

414

L.E. Lwakatare et al.

28. Staples, M., Zhu, L., Grundy, J.: Continuous validation for data analytics systems.
In: 38th International Conference on Software Engineering Companion, pp. 769–
772. ACM (2016)
29. Cook, N., Milojicic, D., Talwar, V.: Cloud management. J. Internet Serv. Apps.
3(1), 67–75 (2012)
30. Ebert, C., Gallardo, G., Hernantes, J., Serrano, N.: DevOps. IEEE Softw. 33(3),
94–100 (2016)
31. Fitzgerald, B., Stol, K.J.: Continuous software engineering: a roadmap and agenda.
J. Syst. Softw., 1–14 (2015)
32. Waller, J., Ehmke, N.C., Hasselbring, W.: Including performance benchmarks into
continuous integration to enable DevOps. ACM SIGSOFT Softw. Eng. Not. 40(2),
1–4 (2015)
33. Zhu, L., Bass, L., Champlin-Scharﬀ, G.: DevOps and its practices. IEEE Softw.
33(3), 32–34 (2016)
34. Scrum-Alliance: DevOps and Agile: Key Considerations for DevOps and Agile to
Coexist for Expedited Delivery (2014). http://tinyurl.com/j9xqxlw
35. Soni, M.: End to end automation on cloud with build pipeline: the case for DevOps
in insurance industry, continuous integration, continuous testing, and continuous
delivery. In: International Conference on Cloud Computing in Emerging Markets,
pp. 85–89. IEEE (2015)
36. A Short History of DevOps (2014). http://rewrite.ca.com/us/articles/devops/
a-short-history-of-devops.html
37. Balalaie, A., Heydarnoori, A., Jamshidi, P.: Microservices architecture enables
DevOps: migration to a cloud-native architecture. IEEE Softw. 33(3), 42–52 (2016)
38. Claps, G.G., Berntsson Svensson, R., Aurum, A.: On the journey to continuous
deployment: technical and social challenges along the way. IST 57, 21–31 (2015)
39. Cois, C.A., Yankel, J., Connell, A.: Modern DevOps: optimizing software development through eﬀective system interactions. In: International Professional Communication Conference, pp. 1–7. IEEE (2014)
40. Rodrı́guez, P., Haghighatkhah, A., Lwakatare, L.E., Teppola, S., Suomalainen,
T., Eskeli, J., Karvonen, T., Kuvaja, P., Verner, J.M., Oivo, M.: Continuous
deployment of software intensive products and services: a systematic mapping
study. J. Syst. Softw. (2016). http://www.sciencedirect.com/science/article/pii/
S0164121215002812
41. Wettinger, J., Andrikopoulos, V., Leymann, F.: Automated capturing and systematic usage of DevOps knowledge for cloud applications. In: International Conference
on Cloud Engineering, pp. 60–65. IEEE (2015)
42. Wettinger, J., Breitenbucher, U., Leymann, F.: Dyn tail - dynamically tailored
deployment engines for cloud applications. In: 8th International Conference on
Cloud Computing, pp. 421–428. IEEE (2015)
43. Wettinger, J., Andrikopoulos, V., Leymann, F.: Enabling DevOps collaboration
and continuous delivery using diverse application environments. In: OTM 2015.
LNCS, vol. 9415, pp. 348–358. Springer International Publishing, Switzerland
(2015)
44. Wettinger, J., Breitenbücher, U., Kopp, O., Leymann, F.: Streamlining DevOps
automation for cloud applications using TOSCA as standardized metamodel.
Future Gener. Comput. Syst. 56, 317–332 (2016)
45. Axelsson, J.: Architectural allocation alternatives and associated concerns in cyberphysical systems. In: The European Conference on Software Architecture Workshops, pp. 1–6. ACM (2015)

Relationship of DevOps to Agile, Lean and Continuous Deployment

415

46. Callanan, M., Spillane, A.: DevOps: making it easy to do the right thing. IEEE
Softw. 33(3), 53–59 (2016)
47. Rong, G., Zhang, H., Shao, D.: CMMI guided process improvement for DevOps
projects. In: International Workshop on Software and Systems Process, pp. 76–85.
ACM (2016)
48. Sun, D., Fu, M., Zhu, L., Li, G., Lu, Q.: Non-intrusive anomaly detection with
streaming performance metrics and logs for DevOps in public clouds: a case study
in AWS. Trans. Emerg. Topics Comp. 4(2), 278–289 (2016)
49. Weber, I., Nepal, S., Zhu, L.: Developing dependable and secure cloud applications.
IEEE Int. Comp. 20(3), 74–79 (2016)
50. Coté: DevOps, You Keep Using That Word What Is DevOps? A Discussion And History (2015). https://blog.pivotal.io/podcasts-pivotal/features/
devops-you-keep-using-that-word-what-is-devops-a-discussion-and-history
51. Liu, Y., Li, C., Liu, W.: Integrated solution for timely delivery of customer change
requests: a case study of using DevOps approach. Int. J. U- & E-Serv., Sci. & Tech.
7(2), 41–50 (2014)
52. IT-Revolution: DevOps Culture (2012). http://itrevolution.com/devops-culturepart-1/
53. Swift-Kanban: Kanban for DevOps/Continuous Delivery (2016). http://tinyurl.
com/gpvra3r
54. Cho, J., Trent, A.: Validity in qualitative research revisited. Qualit. Res. 6(3),
319–340 (2006)

Agile Practices, Collaboration and Experience
An Empirical Study About the Eﬀect of Experience
in Agile Software Development
Martin Kropp1(B) , Andreas Meier2(B) , and Robert Biddle3(B)
1

University of Applied Sciences and Arts Northwestern Switzerland,
Windisch, Switzerland
martin.kropp@fhnw.ch
2
Zurich University of Applied Sciences, Winterthur, Switzerland
meea@zhaw.ch
3
Carleton University, Ottawa, Canada
robert.biddle@carleton.ca

Abstract. Agile Software Development has been around for more than
ﬁfteen years and is now widespread. How does experience eﬀect the application of agile methods in organizations and what are the implications
on the individual and organizational culture? This paper presents indepth analysis of the Swiss Agile Study 2014. Switzerland oﬀers an illustrative microcosm of software development, with a range of industry
domains and sizes, and well-educated and internationally aware professionals. The study included more than a hundred professionals and managers, contacted through professional and industry associations. The topics addressed included experience with Agile development, motivations
for adopting it, barriers perceived, speciﬁc practices used, and speciﬁc
beneﬁts realized. Analysis of the data identiﬁed important trends and differences. Agile experience seems to be an important factor, which aﬀects
many aspects of practice and workplace culture. More troubling is that
it appears stress and overwork may be common among Agile professionals. All these ﬁndings illustrate important diﬀerences between Agile
processes as prescribed, and as actually practiced.
Keywords: Agile · Software process
culture · Software practices

1

·

Collaboration

·

Organizational

Introduction

After 15 years since the publication of the Agile Manifesto [1], Agile software
development has become mainstream. In most recent studies 70 % up to 94 % of
the participating companies claim to follow an Agile approach in their software
projects [11,15], with Scrum by far the most dominant process identiﬁed. These
studies report about the distribution and application of Agile practices in companies, and the eﬀects and changes they cause. But as far as we know, there are no
studies about if and how the application of Agile methods and practices change
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 416–431, 2016.
DOI: 10.1007/978-3-319-49094-6 28

Agile Practices, Collaboration and Experience

417

over time and with experience. Moreover, the Agile landscape has changed. In
earlier times, for example, Extreme Programming (XP) advocates often advised
novices to learn and apply all XP practices [5,8] whereas more recently, with a
greater variety of Agile processes and practices, it has become common for educators and workplaces to make their own selections [6,7] and blend as they see
ﬁt. This has meant that there can be a large diﬀerence between Agile processes
as-prescribed, and as-practiced.

Fig. 1. Number of technical and collaborative practices applied by company.

In this paper we explore the data from the Swiss Agile Study 2014 [11] with
respect to the agile experience of the study participants, and in particular we
were interested in answering the following questions:
– RQ1 : Is the usage of Agile practices dependent from the Agile experience of
the IT professionals and organizations?
– RQ2 : Does the application of Agile methods have any inﬂuence on the organizational culture of companies and, if so, how does this evolve with experience?
– RQ3 : What is the inﬂuence of Agile on the individual IT professional? Does he
work less overtime, has less stress? (as propagated by eXtreme Programming,
for example).
– RQ4 : What improvements are reported by Agile professionals and companies,
and are these dependent from their experience?

418

M. Kropp et al.

Our analysis of the data reveals two major issues: The amount of applied
Agile practices evolves only slowly with the Agile experience of the companies
as indicated in Fig. 1. Second, following an Agile approach over a long time
changes the organizational culture in companies.
One indication of the pattern and practices revealed is shown in Fig. 1. For
each company surveyed, this shows the relationship of the number of collaborative practices reported to the number of technical practices reported. Points
nearer the bottom left show companies reporting few practices, and points near
the top right show those reporting many. Points above the diagonal show companies reporting more collaborative practices, whereas those below the diagonal
report more technical practices. Each company’s experience with Agile is shown
by a colored symbol, with the darker colors indicating more experience. There
is some variation, but the pattern is evident: in general, more experienced companies report using more practices, and especially more collaborative practices.
Our study data reﬂects the current state of Agile software development in
Switzerland. Since Switzerland oﬀers an illustrative microcosm of software development, with a range of industry domains and sizes, and well-educated and
internationally aware professionals, we suggest that this situation might be very
similar to other countries. The results of our study show that it takes many
years of Agile experience until organizations have adopted the Agile work style.
Change to the collaborative practices seems to be especially hard, due to resistance to organizational changes. However, the longer organizations are applying
Agile development the more they tend to have a collaboration and cultivation
organizational culture. The results also show that it takes many years of Agile
experience until Agile development becomes eﬀective; but once Agile has been
really adopted it seems to bring improvements across the board. We speculate
that if organizations would pro-actively address the organizational change from
the beginning, Agile development might become eﬀective much faster and organizations would beneﬁt much earlier.
In the rest of this paper, we will show in detail how the application of the technical and collaborative practices changes with Agile experience, when the major
improvements occur, how barriers for further Agile application change, and how
the individual professional experiences the transformation to an Agile work style,
with all its consequences like personal stress, work life balance, engagement in
the project and identiﬁcation with the work and team. We will discuss the results
and their possible consequences for companies introducing Agile approaches.

2

Study Method

The Swiss Agile Study, conducted by the authors, is a biennial Swiss nationwide
online survey about the usage of development methods and practices in the IT
industry.
It addresses both Agile and non-Agile companies and IT professionals. It
comprises a catalog of about 30 questions about applied software development
methodology, techniques and practices on technical level, collaborative level and

Agile Practices, Collaboration and Experience
Table 1. Demographics of participating companies: role, % of participants

Table 2. Sizes of the participating
companies
Size

%

Role

%

Micro enterprise (≤ 9)

18 %

CEO

29 %

Small enterprise (10–49)

29 %

Development manager

14 %

Medium enterprise (50–249) 25 %

Project manager

8%

Team leader

7%

CIO

6%

CTO

5%

Senior software developer

5%

Designer/Architect

2%

Product manager
Other

419

Large enterprise ≥ 250)

28 %

1%
23 %

value level as outlined elsewhere [10]. The complete study reports are freely
available [11]. The latest survey, in 2014, included speciﬁc questions concerning the organizational culture in Agile and non-Agile companies according the
organizational culture model of William Schneider [12].
101 companies and 128 IT professionals participated in the last survey in
2014. We emailed 1461 companies and about 50001 IT professionals in Switzerland. The addresses of the companies and the professionals were delivered from
the participating IT associations SwissICT, SWEN and ICTnet, as well as from
our own institutional databases. In the company survey we addressed representatives of the company or the development department of a company, i.e. the
management level. Table 1 shows the demographics of the company participants.
It shows that almost 30 % of the participants were Chief Executive oﬃcers. The
relatively high number of “others” includes roles like Business Analysts, Business
Unit Managers, and CFOs, for example.
Table 2 show the distribution of the sizes of the participating companies following the oﬃcial categories of the Swiss Federal Statistical Oﬃce2 . The main
branches of the companies are IT Services/IT Consulting (40 %), Software Industry/Development (25 %). Medical and Health Care companies and Finance and
Insurance companies make 5 % each. The rest are 3 % and below. The responding IT professionals were typically Senior Software Developers (19 %), Software
Developers (18 %) and Project Managers (11 %).

1
2

We do not know the exact number, since these mailings were partially done by the
partner associations.
http://www.bfs.admin.ch/bfs/portal/en/index/themen/06/02/blank/key/01/
groesse.html.

420

3
3.1

M. Kropp et al.

Findings
Influence

In this section we look at the inﬂuences which Agile has on the software development process, i.e. which aspects in software projects and project management
got better or worse and how they change over time. The professionals responding
to the survey answered the question: “How has Agile software development inﬂuenced the following aspects?” To display which Agile inﬂuences are strongest, we
use co-occurrence grids, as shown in Fig. 2. Each grid shows the occurrences of
each inﬂuence, and co-occurrences of each pair of inﬂuences; this allows us to see
which inﬂuences are commonly occurring together. The inﬂuences are listed on
the left and the bottom of each grid. On the diagonal of each grid, the number
of professionals answering the question with improved or significantly improved
is shown on the diagonal, with higher numbers shown in shades of blue, and
lower numbers in shades of red. At each grid intersection, the co-occurrence of
two inﬂuences is shown.

Fig. 2. Agile inﬂuence by experience. (Color ﬁgure online)

At the left of Fig. 2 are the results for the 25 professionals with less than
2 years Agile experience. The pattern is clear and a bit of a disappointment:
Agile has clearly a positive inﬂuence on managing changing priorities and, to a
lower degree, on team morale, productivity and the development process. The
remaining items of the co-occurrence map are mostly red, i.e. those aspects
have not improved. In particular, development costs have not improved at all.

Agile Practices, Collaboration and Experience

421

Why is this? This is most likely because reducing development cost is not the
main objective of an Agile transformation. Instead of reducing development cost,
the money is spent on developing more features and better software products.
Also note that nobody regarded distributed teams as an inﬂuence.
In the middle of Fig. 2 are the results for the 68 professionals with 3 to 5 years
Agile experience. With more experience, a notable cluster of positive inﬂuences
occur. This cluster includes, in decreasing order, responding to changing requirements, development process, team moral, software quality, alignment between IT
and business, project visibility, maintainability and engineering discipline. Distributed teams are still not noted as an inﬂuence.
At the right of Fig. 2 are the results for the 27 professionals with more than
5 years Agile experience. The pattern is clear: mature teams proﬁt much more
of the beneﬁts of Agile. This is reﬂected in the blue color of the co-occurrence
grid. There are only three outliers: development cost, risk management and management of distributed teams. These outliers are a topic for concern and future
study.
In summary, the presented data suggests that Agile has a profound positive
inﬂuence on many aspects of software projects but it takes long time and suggests
great eﬀort.
3.2

Agile Practices

In the introduction to this paper we highlighted Fig. 1 showing the number of
practices varied with experience, and also the balance between categories of practices: technical and collaborative practices [10]. The practices we enquired about
are shown in Table 3. The practices comprise those recommended by eXtreme
Programming [4] and by Scrum [13], plus new practices that have come since
then like Continuous Delivery, Acceptance Test Driven Development, Behaviour
Driven Development. To explore the relationship between experience and practices more closely, we now show which practices are used with which levels of
experience.
At the left of Fig. 3 are the results for the 34 professionals with less than
2 years Agile experience. The pattern is clear: at the top in light blue are a
number of commonly used technical practices: unit testing, automated builds,
coding standards, and continuous integration. Lower down, the use of user stories, daily standups, and task boards are indicated. But otherwise, the grid is
predominantly red, showing very little usage of many practices, especially the
collaborative practices such as pair programming, collective ownership, retrospectives, and others.
To display which practices were used by each group, we again use cooccurrence grids, as shown in Fig. 3. In the middle of Fig. 3 are the results for
the 93 professionals with 2–5 years Agile experience. Here the pattern is somewhat diﬀerent. The practices used in the grid at the left are now more commonly
used, as indicated by the darker shade of blue: both for the technical practices at
the top, and the collaborative practices such as user stories and daily standups.

422

M. Kropp et al.

Fig. 3. Agile practices by experience. (Color ﬁgure online)

Agile Practices, Collaboration and Experience

423

Table 3. Agile practices
Practice

Type

Unit testing

Technical

Refactoring

Technical

Automated builds

Technical

Coding standards

Technical

Continuous integration

Technical

Clean code

Technical

Test Driven Development (TDD)

Technical

Continuous delivery

Technical

Automated acceptance testing

Technical

Burndown charts

Technical

Release planning

Collaborative

On-site customer

Collaborative

Iteration planning

Collaborative

User stories

Collaborative

Daily standup

Collaborative

Taskboard

Collaborative

Pair programming

Collaborative

Collective code ownership

Collaborative

Retrospective

Collaborative

Open work area

Collaborative

Kanban pull system/Limited WIP

Advanced

Acceptance Test Driven Development (ATDD) Advanced
Behavior Driven Development (BDD)

Advanced

But now there are other practices also indicated. Practices shown include burndown charts and release planning, retrospectives, and also use of pair programming.
At the right of Fig. 3 are the results for the 33 professionals with more than 5
years of Agile experience. Again, the pattern is diﬀerent, but continues the same
trends. Practices with light usages reported by the other groups are now much
more common, and practices such as collective code, test-driven development
(TDD), and clean code are strongly indicated.
Overall, the pattern is strikingly clear: professionals with more experience
report applying considerably more practices, more consistent use of practices,
and more use of related practices. Moreover, while those with less experience
report principally technical practices, more experience brings use of more collaborative projects.

424

3.3

M. Kropp et al.

Barriers

Our basic study [11] revealed that the ability to change organizational behavior
was listed as the greatest barrier to adoption of Agile software development in
an organization. We were also interested whether the barriers reported vary with
Agile experience within organizations, and if so, in which direction. Figures 4
and 5 show the relevant data for the IT professionals and companies. The ﬁgures
show which three barriers were most important.
The ﬁgures reveal that the change within the organization is by far the most
critical issue when companies switch to Agile, especially at the beginning. More
than 50 % of the IT professionals and more than 40 % of the companies ranked
this issue as the greatest barrier. We ﬁnd it interesting that this issue remains
among the greatest barriers even in companies experienced with Agile methods.
The ﬁgures also show that other issues become more important. However the
views of the professionals and the companies diﬀer. Both see customer collaboration becoming an important issue. But while the experienced professionals
rate the lack of skilled personal as critical, the managers of experienced companies suggest the handling of complex projects as the major barrier for further
adoption of the Agile methods.

Fig. 4. Barriers by companies

Fig. 5. Barriers by IT professionals (Legend as in Fig. 4)

Agile Practices, Collaboration and Experience

3.4

425

Organizational Culture

We examined the organizational culture of the participating companies, applying
the organization model of William Schneider. Schneider identiﬁes four diﬀerent
organizational cultures, the control culture, competence culture, collaboration
culture and the cultivation culture [12]. To identify culture, we used the questionnaire from Schneider’s book [12]. We were using ten out of 20 of William
Schneider’s questions about which organizational culture the participating companies exhibit. In this approach, the answer to each question identiﬁes which of
the four culture categories is indicated, and so the overall response yields four
numbers, one for each culture. In our results it was most common for one particular culture to dominate the others by three or four points, and we therefore chose
that as the dominant culture. We then evaluated if the organizational culture
depends on the Agile experience. Figure 6 shows that, as experienced by IT professional, organizations start with the preference of the traditional control and
competence culture, which changes with more Agile experience towards a cultivation and collaboration culture. This seems to imply, that following an Agile
approach over a long time changes an organization’s culture to collaboration and
cultivation culture.

Fig. 6. From a control and competence to a collaboration and cultivation culture.

3.5

My Agile

One section of the survey asked professionals to reﬂect on their personal experiences with Agile development; we called the section “My Agile”. Whereas earlier
we had asked about the inﬂuences they saw, this set of questions asked about
their actual personal experience. We suggested a range of possibilities, and asked
whether it applied in their case.

426

M. Kropp et al.

Fig. 7. My Agile by experience. (Color ﬁgure online)

Agile Practices, Collaboration and Experience

427

The data is illustrated in Fig. 7. We again show co-occurrence grids, and
distinguish professionals with less than 2 years, between 2 and 5 years, and
more than 5 years. Again, the grids show shades of blue for higher reported
factors, and red for those reported less. The patterns and diﬀerences are clear.
The grid on the left shows the data for the 22 less experienced professionals.
The most reported individual element was team morale, and other common elements reported include technical excellence, release stability, customer relations,
more fun, better team environment and initiative. There also are some patterns
where several of these factors are reported. The most striking part of the grid,
however, are the entries highlighted in red, indicating very few responses. Most
prominent were “working less overtime” and “I have less stress”. There are also
absences shown for team empowerment and servant leadership.
In the grids in the middle (2–5 years) and the right (more than 5 years),
we see remarkable changes. In particular far more factors are reported, and the
most experienced professionals show many reports of almost all of the elements.
However, in both cases, the weakest elements reported are the same as for less
experienced professionals.
The progression from less to more experience clearly shows an increasing
appreciation of a greater range of advantages, and that seems reasonable. However, the emphatic indications about overtime and stress lack are concerning. In
Beck’s books about Extreme Programming [3,4], there were speciﬁc recommendations about what in the second book was called “sustainable pace”. It appears
this is not occurring, especially for less experienced professionals, and perhaps
even for those with more experience. It is interesting to note that the stress and
overtime are happening at the same time that high team moral and fun are both
reported, suggesting a mixed picture of positive and negative aspects.
The milder issue raised by the data involves team empowerment and “servant leadership”. This concept is emphasized in several Agile processes, including
Scrum and XP, neither of which suggests a manager in any traditional sense. This
is a long history of approaches to socio-technical teams [14] and self-organizing
teams [9], and a wide range of related thought before and since. However, our
data suggests this is seen as one of the least common beneﬁts of Agile processes.
Further work is necessary to establish why this is so: Are professionals not concerned? Or has it simply not been possible? If, so, what were the circumstances
and consequences?
3.6

Quality Control

Figure 2 showed that signiﬁcant improvements in software quality come rather
late. Accordingly, we were also interested if there is a relation between quality
control and the late quality improvement, and if quality control is also dependent
on Agile experience.
Figure 8 shows the corresponding data for the IT professionals. We focus on
the data from the professionals because the quality control measures we explore
are at a detail level commonplace to professionals – managers may be unaware
of tools at this level.

428

M. Kropp et al.

Fig. 8. Agile quality control by experience. (Color ﬁgure online)

At the left of Fig. 8 are the results for the 30 professionals with less than 2
years Agile experience. The grid shows that these professionals just start using
code coverage and code reviews for quality control as indicated by the light blue
color. The application of other measures is low (light red to dark red color). In
the middle of Fig. 8 are the results for the 87 professionals with 2–5 years Agile
experience. Here the pattern is a little diﬀerent. The measures used before are
now more common (dark blue color) and also the usage of style checker tools has
increased. At the right of Fig. 8 are the results for the 32 professionals with more
than 5 years of Agile experience. Here the pattern changed signiﬁcantly. The
numbers of the used tools have further increased, and also Pair Programming is
now widely applied. Quite striking is the very high application of code reviews
(91 %).
Again, the pattern is clear: the usage of quality control tools increases with
experience. However, it takes more than ﬁve years of Agile experience until the
majority of the listed quality control measures are applied by the majority of
the professionals.

4

Discussion

Our goal in this paper was to study how Agile methods as-prescribed might
diﬀer from Agile methods as-practiced and how Agile evolves with experience in
organizations. In the previous sections we have presented our main ﬁndings with
respect to the research questions formulated in Sect. 1, and we will now attempt
to articulate answers to these questions:
Agile and Workplace Culture: One of the issues addressed in the 2014 survey
was workplace culture. In particular, the model of Schneider [12] was used, and
questions from Schneider’s book were included in the survey. The model suggests
4 kinds of workplace: Control, Collaborative, Competence, and Cultivation. We

Agile Practices, Collaboration and Experience

429

did ﬁnd that most sets did emphasize one of these, and the responses did align
with responses to questions about Agile methods: see Sect. 3.4.
Agile Changes with Experience: Perhaps the most wide-ranging ﬁnding from
our survey was how the responses to a range of questions related to experience with
Agile. In Sect. 3.1 we showed diﬀerences in reports of inﬂuences of Agile processes,
and in Sect. 3.2 we showed how the practices used varied from a few mostly technical practices for beginners, to a wide range of practices for those with more experience. As we outlined in the introduction, in the early days after the Agile Manifesto, it was common to advocate following all the practices, especially for beginners. More recently it has become common for workplaces to choose their particular process, mixing elements from Scrum, XP, and a range of sources. While our
results show a clear pattern, we do not know why that pattern emerges, what the
implications are, and what, if anything, might be preferred.
Agile Influence for Improvements: In Sect. 3.1 we showed that it takes very
long until Agile development becomes eﬀective in Agile organizations showing signiﬁcant improvements. While there are immediate improvements with handling
changing priorities, only in more experienced Agile organizations we see improvements in requirements management, software quality, productivity, and in the engineering discipline. This long-term beneﬁt may be related to the late application
of many practices, but it might also reﬂect the strong resistance to organizational
changes, especially when starting with the transformation to Agile. It could be a
matter of further research to ﬁnd out why the improvements come so late, and if it
is possible to shorten this path to success, and if so, how it could be done.
Agile is Commonly Stressful: In Sect. 3.5 above, we examined the results
from the survey where professionals reported their own reﬂections. Surprising to
us was that the topics least agreed with were all related to stress or overwork.
This seemed odd, considering that time-boxed iterations, collaborative environments, and self-management are all Agile practices, and would appear beneﬁcial
and lead away from stress. It appears this might not be the case. A potentially
important observation is that the professionals also reported that team moral
was high. This combination has been suggested before, in the 2007 study of social
factors in Agile teams by Whitworth and Biddle [16]. Quotes from that study
reﬂect a kind of characteristic zeal with a dark side: “This isn’t a place that you
go and hide”. Moreover, they point to research by Barker [2] on self-managing
teams, entitled “Tightening the Iron Cage: Concertive Control in Self-Managing
Teams”. That research shows how the strong social control in teams leads to
a peculiar kind of stress that is more intense because it is connected to social
factors in team commitment. Barker’s term “Iron Cage” is a sobering indication
that this aspect of Agile processes needs more attention.
Our aim for the survey was to be descriptive, so we must consider threats to
external validity, and especially factors that might make our results diﬀer from
reality. One such factor is self-selection, and this might lead to more participants
with responses they strongly wish to share: these might be positive or negative, and might therefore exaggerate results both ways. Another factor might be

430

M. Kropp et al.

uneven response from diﬀerent workplaces, especially in responses from professionals, where multiple participant might describe the same workplace. This was
not possible for the company responses, but there still might be more responses
from some industries or areas than others.
When analyzing the data, we noticed that the results from the IT professionals partly diﬀer signiﬁcantly from those of the managers. This is very obvious
in Figs. 4 and 5, in which both groups partially have quite diﬀerent opinions on
the barriers for further adoption of Agile methods. We feel these do not suggest
inconsistency, but rather a diﬀerent perception of their corresponding environments. While management may often have a high-level view on the business,
the IT professionals are dealing with everyday detail issues. Another surprising
result came up in the response to inﬂuence of Agile on various aspects in Fig. 2.
While the majority of experienced participants report improvements in requirements management and in handling of changing priorities, and report applying
the retrospective practice in Fig. 1, they report no improvements in risk management. We assume that this question has been misunderstood: While there is no
explicit “Risk Management” activity foreseen in Agile methods, short iterations,
early feedback and retrospective are risk management per-se. So we might have
to change this question in future surveys.

5

Conclusions

In this paper we addressed the issue of how Agile software development processes
are actually used and viewed, showing how the process as prescribed diﬀers from
the process as practiced. We presented data from the 2014 Swiss Agile Survey,
with responses from 101 companies and 128 professionals in Switzerland. The
survey covered a wide range of topics, from inﬂuences of Agile, to practices
actually used, barriers perceived, and reﬂections on personal experience.
Our main ﬁndings show several themes. One is that experience with Agile
methods is an important factor. More experience is related to a greater number
of practices used, greater emphasis on collaboration, and more sustainable workloads. Another ﬁnding is that, despite Agile principles such as “sustainable pace”
and strict time-boxing of iterations, there are warnings that stress and overload
remain a problem, especially for those new to Agile, but sometimes even for
experienced professionals. This is despite indications of high team morale.
There are limitations in our survey approach, because of self-selection and
of inability to follow up interesting results immediately with questions to probe
details and causality. In addition, the granularity in some questions might be
improved, for example to better gauge experience on a ﬁner scale.
In future research on this topic, we suggest that questions of causality should
be the priority. In particular, it would be interesting to know why some practices
appear only with more experience. In particular, practices around customer collaboration are seen as very important, but only arise with more experience. Is
it that only then is the importance realized, or are there barriers that prevent
it by those with less experience? Also, it would be important to explore causes

Agile Practices, Collaboration and Experience

431

of stress and overwork. Is it simply enthusiasm stemming from high morale, or
perhaps is it lack of familiarity with certain practices?

References
1. Agile Manifesto Signatories. Agile Manifesto (2001). http://agilemanifesto.org
2. Barker, J.R.: Tightening the iron cage: Concertive control in self-managing teams.
Adm. Sci. Q. 38(3), 408–437 (1993)
3. Beck, K.: Extreme Programming Explained: Embrace Change. Addison-Wesley
Longman Publishing Co., Inc., Boston (2000)
4. Beck, K., Andres, C.: Extreme Programming Explained: Embrace Change, 2nd
edn. Addison-Wesley Professional, Reading (2004)
5. Bergin, J., Caristi, J., Dubinsky, Y., Hazzan, O., Williams, L.: Teaching software
development methods: the case of extreme programming. SIGCSE Bull. 36(1),
448–449 (2004)
6. Diebold, P., Dahlem, M.: Agile practices in practice: a mapping study. In: Proceedings of the 18th International Conference on Evaluation and Assessment in
Software Engineering, pp. 30:1–30:10. ACM, New York (2014)
7. Diebold, P., Ostberg, J.-P., Wagner, S., Zendler, U.: What do practitioners
vary in using scrum? In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.)
XP 2015. LNBIP, vol. 212, pp. 40–51. Springer, Heidelberg (2015). doi:10.1007/
978-3-319-18612-2 4
8. Goldman, A., Kon, F., Silva, P.J.S., Yoder, J.W.: Being extreme in the classroom:
experiences teaching XP. J. Braz. Comput. Soc. 10(2), 4–20 (2004). http://www.
swissagilestudy.ch/ﬁles/2015/05/SwissAgileStudy2014.pdf
9. Hackman, J.R., Oldham, G.R.: Work Redesign. Addison-Wesley, New York (1980)
10. Kropp, M., Meier, A.: Teaching agile software development at university level:
values, management, and craftsmanship. In: Software Engineering Education and
Training (CSEE&T), pp. 179–188. IEEE, May 2013
11. Kropp, M., Meier, A.: Swiss agile study 2014. Technical report, Swiss Agile
Study (2014). http://www.swissagilestudy.ch/ﬁles/2015/05/SwissAgileStudy2014.
pdf. ISSN: 2296–2476
12. Schneider, W.: The Reengineering Alternative. McGraw-Hill Education, Columbus
(2000)
13. Schwaber, K., Beedle, M.: Agile Software Development with Scrum 1st edn.
Pearson (2001)
14. Trist, E.L., Bamforth, K.W.: Some social and psychological consequences of the
longwall method of coal-getting. Technol. Organ. Innov.: The Early Debates 1, 79
(2000)
15. VersionOne. 9th state of agile survey. Technical report, VersionOne Inc. (2015)
16. Whitworth, E., Biddle, R.: The social nature of agile teams. In: Proceedings of the
AGILE 2007, AGILE 2007, pp. 26–36. IEEE Computer Society, Washington, DC
(2007)

A Multiple Case Study on the Architect’s Role in Scrum
Matthias Galster1 ✉ , Samuil Angelov2, Marcel Meesters2, and Philipp Diebold3
(

)

1

2

University of Canterbury, Christchurch, New Zealand
mgalster@ieee.org
Software Engineering, Fontys University of Applied Sciences, Eindhoven, The Netherlands
{s.angelov,m.meesters}@fontys.nl
3
Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany
philipp.diebold@iese.fraunhofer.de

Abstract. Context: Previous research investigated how to approach architecting
in agile projects (e.g., in terms of processes and practices), but the role that archi‐
tects play in Scrum is still not well understood. Objective: We aim at capturing
scenarios of how architects (or those taking on architecture-related tasks) are
involved in Scrum. Furthermore, we aim at identifying how those taking on the
role of the architect interact with other roles in Scrum. Method: We conducted a
multiple case study and interviews with practitioners from six Dutch software
organizations. Results: We identiﬁed three generic scenarios of architects in
Scrum (“internal architect”, “external architect”, “internal and external archi‐
tects”). We found that how architects interact with other roles in Scrum heavily
depends on the Product Owner role. Conclusions: Some of our results are not in
line with recommended practices in the Scrum Guide. Our ﬁndings support those
who take on architecture-related tasks in preparing for Scrum-like projects.
Keywords: Software architect · Architecture · Agile software development ·
Scrum

1

Introduction

Software architecture is crucial for software project and product success [1]. However,
architecting activities and the role of the software architect are often not explicitly
considered in agile software development approaches (e.g., Scrum) [2]. This allows agile
teams (a) to consider speciﬁcs of individual projects, (b) to not restrict how architecting
responsibilities are distributed but to make most use of the expertise, skills and experi‐
ence of individual team members, and (c) to highlight that every team member is
responsible for all activities needed to make the project a success. Previous research
investigated how software architecting could be approached in agile development
projects, including processes and practices to design and maintain ﬂexible, adaptive,
and evolving architectures [3]. For example, it has been acknowledged that in agile
projects, an architecture may emerge during a project based on an architecture metaphor
and “quick design sessions”, rather than being fully designed upfront [4]. Yet, there is
currently little attention devoted to the role of software architects (i.e., the actor(s) in a
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 432–447, 2016.
DOI: 10.1007/978-3-319-49094-6_29

A Multiple Case Study on the Architect’s Role in Scrum

433

software development process who perform architecture-related activities) and how this
role (and related tasks and capabilities) appears in agile development projects.
The most frequently used agile development framework is Scrum [5]. The Scrum
Guide1 (which describes recommended practices for Scrum) recommends cross-func‐
tional team members but does not explicitly consider the role of an architect or any other
task-related role (see also the discussion on roles in Scrum in Sect. 2.1). In addition,
anecdotal evidence from industry suggests that there is no explicit role of an architect
in Scrum [4]. However, as discussed by Abrahamsson et al. [2], it is highly recommended
to include an architect role in agile projects: “Software architects are expected to act as
facilitators in whole software development projects and as the representatives of a
system’s overall quality attributes”. Even though there is no dedicated role of an architect
in many organizations that follow Scrum, certain roles in these organizations create
architecture designs and communicate their decisions to development teams [4]. This
means that by adopting Scrum, an organization does not suddenly “forget” that there
are many diﬀerent activities to perform (including design and architecture related activ‐
ities, quality assurance and testing). There must be actors in a Scrum team who perform
these activities. Depending on the type of product and maturity/experience/skills of team
members, those team members might be more or less inﬂuential. However, detailed and
systematic insights into the architect’s role is currently missing. In this paper, we look
beyond agile architecting practices and at investigating the role of architects themselves.
This results in the following research goal:

We focus on Scrum because it is the most often used agile development framework
[5]. Also, focusing on Scrum helps us scope our study and explicate the applicability of
our results. Concrete research questions are formulated in Sect. 3. To achieve the goal
of our study, we conducted a case study in industry [6]. This case study involved inter‐
views with practitioners in Scrum projects in six Dutch companies.
The main contribution of this paper is an empirically-grounded model of the archi‐
tect’s role in Scrum projects, including scenarios that describe interactions of architects
with other roles in Scrum. These roles and interactions may not be in the Scrum Guide.
Furthermore, results help educate and train less experienced (or novice) architects for
architecting activities in Scrum by explicating their position and relationships with other
actors in the development process. Finally, our ﬁndings are a guidance for deﬁning and
understanding roles related to architecting in Scrum when setting up new teams or
projects.
In Sect. 2, we discuss background and related work. In Sect. 3, we present the case
study design. In Sect. 4, we present the results which are discussed in more detail in
Sect. 5. The paper ends with conclusions and future work in Sect. 6.

1

http://www.scrumguides.org/scrum-guide.html.

434

2

M. Galster et al.

Background and Related Work

We ﬁrst discuss Scrum based on the Scrum Guide, focusing on roles in Scrum. We do
not describe Scrum in further detail but refer the reader to literature [7, 8]. We then
discuss the role of architects in general. Finally, we highlight related work of architects
in agile development.
2.1 Roles in Scrum
The Scrum Master takes care of the proper implementation of Scrum and related prac‐
tices. He/she also helps resolve any impediments the team may face and manages
resources (software, hardware, space, time, etc.). Finally, the Scrum Master protects the
development team from undesired inﬂuences during sprints. The development team is
self-organizing and cross-functional (i.e., team members allocate tasks to themselves
and perform activities as needed) and performs all software development activities,
including architectural work if required. The Product Owner performs all communica‐
tions between the team and the stakeholders outside the team (e.g., end users or manage‐
ment). In this sense, the Product Owner shields the team from the external environment.
The Product Owner needs to be an expert in the product domain or should at least be
able to become an expert quickly. As reported by Friedrichsen [9] and Diebold et al.
[10], most Product Owners care only about functional requirements but not about nonfunctional requirements and quality attributes. Typically, Product Owners do not engage
in the architecting aspects of a project as they “are often employees of a business depart‐
ment or a diﬀerent non-IT department” [9].
2.2 Role of the Software Architect
Generally, architects have three main tasks: getting input from outside world (listen to
stakeholders, learn about technologies, etc.), architecting (i.e., make architectural deci‐
sions to decompose systems, select technologies, decide on architectural patterns and
styles, etc.) and providing information (communicate architecture, help stakeholders,
etc.) [11]. Activities of architects and required skills center on processes, practices, and
technologies. The line between “development” and “architecture” is thin [12]. However,
in contrast to architects, developers implement, test and maintain code-related software
artifacts, and spend most of their time coding (relative to other types of activities).
Developer and architect are not necessarily separate: They are roles (not ranks or posi‐
tions), i.e., one position can take on more than one role.
2.3 Related Work on the Role of the Architect in Agile Development
Faber describes experiences from architecting and the role of architects in agile projects
at a speciﬁc company [13]. According to Faber, architects should actively guide but not
dominate developers, i.e., architects should accept deviations from original architectural
designs if developers request (and justify) it [13]. Martini and Bosch introduce three

A Multiple Case Study on the Architect’s Role in Scrum

435

types of architecture-related roles in large agile organizations (not Scrum-speciﬁc): the
chief architect (takes high-level architecture decisions and drives other architects and
teams), the governance architect (role between teams and chief architect) and the team
architect (responsible for architecture in team). These roles interact with diﬀerent types
of teams in large scale projects, i.e., feature teams (responsible for implementing features
on the backlog and therefore most similar to “teams” in Scrum), runway teams (respon‐
sible for the “architecture feature” and architecture refactoring), architecture teams
(groups of architects for diﬀerent projects), and governance teams (teams of architects
and other high-level decision makers). With regards to roles in agile teams, Scott Ambler
argues for “architecture owners” in large teams to facilitate architectural decisions on a
sub-team2. In this sense, architecture owners are similar to a chief architect or gover‐
nance architect deﬁned by Martini and Bosch. In contrast to Martini and Bosch, our
study (a) investigates Scrum in small, medium, and large organizations instead only on
large agile organizations in general, and (b) focuses on the role (in terms of tasks and
capabilities) of architects that interact with other roles in Scrum and teams, rather than
with other roles in an organization in general.
Some frameworks for Scrum at larger scale (e.g., Scrum@Scale3, The Nexus –
Scaled Professional Scrum Framework4, Large-scale Scrum Framework5, “Scrum of
Scrums”6) include high-level architecture-related practices, but do not discuss the role
of the architect in the development process. One exception is SAFe (Scaled Agile
Framework)7, which considers a “System Architect/Engineer” at development program
level (program level is where teams work on a common enterprise mission). There are
also “non-Scrum” agile frameworks, such as the DSDM (Dynamic Systems Develop‐
ment Method)8 which explicitly include architecture-related roles (e.g., the “technical
coordinator” in DSDM designs the system architecture).
In our own previous work, we identiﬁed challenges that architects face in Scrum
projects [14]. This current study is an extension and focuses on the role that architects
play in agile teams and agile development settings, as well as scenarios to describe
interactions of architects with other actors in Scrum projects.

3

Research Method

We study architects in Scrum in practice. Furthermore, architects cannot be studied in
isolation from their context (e.g., organization, project) and we have little control over
all variables (e.g., people, organizational structures). Therefore, we apply case study
research. Case studies oﬀer a deeper understanding of the tasks of architects in Scrum
projects and the context in which they operate. The research followed guidelines
2
3
4
5
6
7
8

http://www.ambysoft.com/essays/agileRoles.html.
https://www.scruminc.com/scrum-scale-case-modularity/.
https://www.scrum.org/Resources/The-Nexus-Guide.
https://less.works/.
https://www.scruminc.com/scrum-of-scrums/.
http://www.scaledagileframework.com/.
https://www.dsdm.org/resources/dsdm-handbooks/dsdm-atern-handbook-2008.

436

M. Galster et al.

described by Runeson and Hoest [6] and is described below. Based on the study goal
deﬁned in Sect. 1, we deﬁned the following research questions (RQ):
• RQ1: What is the position of the architect in Scrum projects?
• RQ2: How does the architect interact with other roles in Scrum?
Case study design: Our study is a multiple case study with six cases (Table 1). Our
research is exploratory as we are looking into an unexplored phenomenon [15]. Our unit
of analysis is the architect (speciﬁcally, tasks and role) in Scrum projects. Our sampling
method is purposive sampling (i.e., we investigate organizations that use Scrum; organ‐
izations were selected on actual Scrum practices rather than only based on their own
claim to use Scrum) augmented with convenience sampling (we selected cases based
on their accessibility and availability of interviewees) [16]. Also, studied organizations
should be representative rather than in very speciﬁc domains. Therefore, we selected six
well-known organizations from the Netherlands that use Scrum and have established
software development practices (due to conﬁdentiality reasons, we only use case
numbers in Table 1).
Table 1. Overview of the case organizations
Case
1
2
3
4
5
6

Domain
Ecommerce
Software
solutions
Finance
Consul‐
tancy
Navigation
systems
Appliances

Size
Medium (<150)

Interviewees
Lead developer

(Multi-)national
Multi-national

Reach
Global

Small (~50)

Lead developer

National

Local

Large (~3,500)
Senior architect
Large (>11,000) Senior architect

National
Multi-national

Local
Global

Large (~4,500)

Multi-national

Global

Multi-national

Global

Large
(>100,000)

Senior architect,
Software architect
Design owner

Preparation for data collection: Data for each case was collected via semi-structured
interviews on-site and follow-up phone calls and e-mails. Interviewees were not only
selected based on their job title, but also based on their tasks and capabilities and their
involvement in architecting activities. All interviewees held representative roles within
their organizations to report on architecture practices, process issues, roles, etc. To
answer our research questions, we asked questions about the position and role of the
interviewees in development projects and about the tasks which they perform using
tasks-descriptions from [11, 17]. For each task, we asked if and how it was performed,
if challenges/problems were observed when performing it, what kind of interactions
would take place, with which actors, etc. Open questions were used to ﬁnd out additional

A Multiple Case Study on the Architect’s Role in Scrum

437

information not captured through questions in the interview guideline9. We took notes
during interviews and each interview was recorded and later transcribed.
Analysis of collected data: The transcripts of recordings were complemented with
information from the notes taken during the interview. The full transcripts were then
analysed and information was clustered. To cluster data, we used open coding where
one code can be assigned to many pieces of text, and one piece of text can be assigned
to more than one code [18]. After initial coding, we looked at groups of code phrases
and merged them into concepts and related them to the role, position, and interactions
of architects. Codes and concepts emerged during the analysis and were not deﬁned upfront. Since data was collected in a case study and is context sensitive, we performed
iterative content analysis to make inferences from collected data in its context [19].
Analysing qualitative interview data requires integrating data where diﬀerent inter‐
viewees might have used terms and concepts with diﬀerent meanings or diﬀerent terms
and concepts to express the same thing. To address this problem, we use reciprocal
translation. Furthermore, we checked with interviewees to ensure that our interpretation
of data and ﬁndings is valid. Data were analysed and discussed by all authors.

4

Study Results

4.1 Overview of Results and Initial Observations
In Table 1, we provide an overview of the cases. “Size” in Table 1 refers to the number
of employees, “(Multi-) national” indicates whether an organization has sites/employees
in diﬀerent countries or only in the Netherlands. “Reach” indicates whether an organi‐
zation targets global or local markets. Below, we provide initial observations for each
case with regards to the role of the architect(s).
• In case 1, there is no explicit architect role in the company. Discussions/decisions
about the architecture involve the whole Scrum team. The lead developer is the most
senior technical person and thus has the ﬁnal say in an architectural decision.
• In the organization of case 2, one project is usually done by one Scrum team. Each
team has a lead developer who is responsible for the architecting activities but the
whole team works on the architecture and architecture-relevant tasks.
• In the organization of case 3, a senior architect residing outside the Scrum team
elaborates a high-level architecture and explains it to the team leader (who also fulﬁls
the role of the Scrum Master). The architect elaborates the detailed architecture
during the project (and if needed evolves the architecture). He supports the team on
architectural aspects throughout the project.
• Case 4 is a consultancy company oﬀering specialized architects to clients. Architects
reside outside the Scrum teams and provide their services and expertise to various
Scrum teams.

9

https://sites.google.com/site/samuilangelov/InterviewQuestions.docx.

438

M. Galster et al.

• In the organization of case 5, there is a senior software architect who is not part of
the Scrum team and another software architect who is part of the Scrum team. Given
the scale of the projects, multiple Scrum teams are involved in single projects. There‐
fore, Scrum is applied integrated in the Scaled Agile Framework (SAFe) with Scrum
teams at “Team” level of SAFe and the senior software architect at “Program” level
(at which teams work towards an enterprise mission).
• In case 6, an architecture team that is located outside Scrum teams maintains a refer‐
ence architecture which Scrum teams have to apply. The reference architecture and
a prototype implementation is provided to a Scrum team at the beginning of a project.
In addition to the architecture team, a system architect outside the Scrum team main‐
tains a requirements speciﬁcation focusing on legal and regulatory aspects of the
software. Furthermore, a Scrum team has a design owner who streamlines the archi‐
tecting activities within the team and elaborates when necessary detailed designs.
To answer RQ1 (position of architects in Scrum), we analyzed the data from the
interviews and the characteristics of the six cases. We identiﬁed the following three
general scenarios (these scenarios are described in further detail in Subsects. 4.2–4.4):
• “Internal architect” scenario in case 1 and case 2
• “External architect” scenario in case 3 and case 4
• “Internal and external architect” scenario in case 5 and case 6
We acknowledge that the three scenarios are still high-level and each scenario comes
in its own “ﬂavor” depending on the organization (as described in Subsects. 4.2–4.4).
Also, these scenarios might not come as a surprise or appear novel. However, in our
study, we derived these scenarios from empirically studying industry practice, rather
than anecdotal evidence from personal experience. Exploratory studies provide a picture
of reality and reality is rarely surprising or controversial [20].
With regards to RQ2 (interactions of architects with other roles in Scrum), we found
that the interactions depend on the positioning of architects in the process and the skills
of Product Owners. Just like in “non-agile” projects, the architect in Scrum projects
needs to document and communicate the architecture; understand requirements, the
overall architecture and its implementation status; and facilitate and oﬀer services to
teams [21]. Overall, architects interact with all other types of roles and stakeholders in
Scrum projects. We discuss RQ2 and interactions in more detail in the following sections
where we introduce the three scenarios found for RQ1.
4.2 “Internal Architect” Scenario
In case 1 and case 2, we found an “internal architect” scenario (Fig. 1) where the architect
is part of the development team. Figure 1 shows an aggregation of the interactions from
the two cases (i.e., we extracted the commonalities from case 1 and case 2). In Fig. 1,
entities within the “Team” indicate roles, rather than individuals.

A Multiple Case Study on the Architect’s Role in Scrum

Stakeholder

Product
Owner

Stakeholder

439

Scrum Master

Architect
Team

Developers, Testers

Stakeholder

Fig. 1. “Internal architect” scenario

The architect may take multiple roles, e.g., also act as developer and contribute to
coding. The architect deﬁnes an architecture draft together with the other team members,
which is maintained, updated and extended throughout the project. If there is no dedi‐
cated architecture expert in the team (as in case 1), the role of the architect can be taken
on by the whole team. As an interviewee in case 2 noted, “It is important to regard all
members of the Scrum team as equal members. Also engineers with limited work expe‐
rience can have an important contribution to the team’s architecture. In this way, the
best comes out of the team, and communication goes much more smoothly”.
With regards to interactions (RQ2), we found in case 1 that Product Owners with no
architectural knowledge, although being open to architecting activities, cannot facilitate
architecture work. “Internal” architects need to directly communicate with external
stakeholders about architecture issues which increases their workload and to some
degree defeats the purpose of the role of the Product Owner in Scrum. As we found in
case 2, Product Owners provide incomplete information to architects and the team.
Therefore (and similar to case 1), external stakeholders interact with the team and the
architect directly and vice versa (indicated by the dotted line in Fig. 1). Some Product
Owners in case 2 take oﬀence from being left out in communications and therefore
conﬂicts between Product Owners and architects may arise. Note that in Fig. 1 the
Product Owner may communicate with any team member (including the architect) and
we do not illustrate the communication within the team (all team members communicate
with each other).
4.3 “External Architect” Scenario
In case 3 and 4, we identiﬁed an “external architect” scenario (Fig. 2). Figure 2 shows
an aggregation of the interactions from the two cases (i.e., we aggregated case 3 and
case 4 into a generic scenario). In this scenario, the architect is not part of the Scrum
team and there is also no “internal” architect. The architect elaborates an initial highlevel design and presents it to the team. The architect is available during the project to

440

M. Galster et al.

clarify the design, to resolve problems, to adapt or extend the architecture or to make
new architectural decisions. The architect also monitors the compliance of the product
with the architecture design.

Stakeholder
Scrum Master
Stakeholder

Stakeholder

Product
Owner

Team

Developers, Testers

Architect

Fig. 2. “External architect” scenario

With regards to RQ2 (interactions), we found in case 3 that architects need to educate
project stakeholders outside the team about architecting and the value of architecting.
However, due to time constraints of the Product Owner in this case, the architect cannot
educate the Product Owner on architecting issues or involve the Product Owner in
architectural decisions. Also, since the architect is external to the team and responsible
for more than one team, the architect is not always available to the teams and is not
always up-to-date on the team’s developments. Therefore, the architect involves team
leaders to act as a “proxy” to represent the architect to the team. In case 4, similar to
case 3, the architect does not communicate with the PO on architectural issues as the
PO has no architecting knowledge (dotted line in Fig. 2). In contrast to case 3, in case
4, the architect is available to the team on a more regular basis.
4.4 “Internal and External Architects” Scenario
In case 5 and 6, we identiﬁed the “internal and external architects” scenario (Fig. 3 again
shows a generalized scenario for case 5 and 6). In this scenario, there is an external and
an internal architect. The “internal” architect deals with tasks related to the day-to-day
work of the team whereas the “external” architect is concerned with higher level deci‐
sions that potentially aﬀect also other teams. In case 6, the external architect is a member
of the architecture team and is coupled with the Scrum team.

A Multiple Case Study on the Architect’s Role in Scrum

441

Stakeholder
Architect
Stakeholder

Stakeholder

Scrum Master

Product
Owner

Team

Developers, Testers

Architect

Fig. 3. “Internal and external architects” scenario

With regards to interactions (RQ2), we found that the external architect interacts
with external stakeholders and the communication between external architect and team
is via the internal architect. In case 5, the external architect is empowered to remove
stories from the sprint backlog which the Product Owner did not specify suﬃciently,
positioning the architect hierarchically higher than the Product Owner. Furthermore,
internal architects in diﬀerent teams are encouraged to communicate between each other
on architectural decisions that aﬀect other teams or parts of a project (not depicted in
Fig. 3, as this is a speciﬁc of case 5). As we found in case 6, there is often a one-directional
communication between external architects and external stakeholders. This means that
the architects receive input from stakeholders, but do not check back with stakeholders
about the quality of their decisions. Furthermore, in case 6, the external architects might
not provide suﬃcient input for the development team during a project, which is crucial
in agile projects where architectural decisions may change over time. To mitigate this
issue, an external architect temporarily joins the development team at the beginning of
a project. In case 6, there is an additional system architect outside the Scrum and archi‐
tecture teams. The system architect interacts with the team independently from the
“external” architect, creating an additional communication channel towards the team
and imposing additional documentation requirements on the team.

5

Discussion

5.1 Relation to Existing Literature
The general idea that architects can appear at diﬀerent positions in agile projects has
been discussed by others [22]. Our work goes beyond the general idea and diﬀers in that
we derive three concrete scenarios for Scrum teams. In this section, we therefore relate
architect scenarios and interactions that we identiﬁed to existing literature. Furthermore,
since we were conducting a qualitative analysis of data, this comparison with literature
strengthens the theoretical validity of each scenario found in our study.

442

M. Galster et al.

• The “internal architect” scenario has similarly been described by Schwaber [8] and
Cohn [23]. Others have argued that all team members are involved in architecting
activities but no single team member will carry the title or full responsibility of an
“architect” [24]. As stated in the agile principles: “The best architectures, require‐
ments, and designs emerge from self-organizing teams”10. Friedrichsen [9] also
argues that architectural work should be done by “[…] either the whole team, or at
least some speciﬁcally skilled team members - to share the knowledge across the
team”. Similarly, Waterman et al. [25] view architectural decisions as a result of the
collaboration of the whole team. In contrast to Martini and Bosch who deﬁne the
team architect as a role taken by a technical leader or experienced developer [26],
we found that this role can also be taken by the whole team. Van der Ven and Bosch
conclude in their study that an architect role in the team (either shared between all
team members or carried by one team member) signiﬁcantly contributes to successful
projects and besides architects, the Product Owner and the development team are
also “involved in, and sometimes responsible for, the architecture decision process”
[27]. We found that the team is involved in the architecting process but not necessarily
the Product Owners due to their lack of understanding of architecting issues.
• In the “external architect” scenario, the architect might work with multiple agile
teams and partner with other architects (for example, as a project architecting team
or as a member of an architecture board where common architecting practices are
agreed and implemented) [13]. As suggested by Faber [13] and Madison [28], an
“external” architect focuses on the high-level architecture diagrams, the architectural
choices and quality attributes. The architect communicates with external stakeholders
and works closely with the Product Owner and with the team to deﬁne and evolve
the architecture. According to literature, good communication with the Product
Owner is vital [13, 28]. However, in our study, we observed insuﬃcient communi‐
cations with Product Owners due to their lack of architectural knowledge. The
“external architect” in our study is similar to the role of the “chief architect” identiﬁed
by Martini and Bosch. However, “chief architect” implies a formally recognized role
to take high-level decisions to drive the rest of architects to support strategic business
goals [26]. This might apply to large-scale organizations, but in our case 4 we found
that external architects do not necessarily have the role of the chief architect respon‐
sible for higher level decisions. This role might exist, but was not found to be inter‐
acting with Scrum teams in particular. However, a role similar to the “governance
architect” who would be responsible for scalability of agile architecting in a large
setting as an intermediate between chief architect and team [29] could be considered
the “external” architect in this scenario.
• The “internal and external architects” scenario is similar to the setup discussed by
Fowler [17] and Abrahamsson et al. who deﬁne two types of architects in agile
projects: architects who focus on the big decision, who are facing requirements and
act as external coordinators (“architectus reloadus”), and architects who focus on
internal team communications, mentoring, troubleshooting and code (“architectus
oryzus”) [2]. It could also be seen as the setup of a team architect combined with a
10

http://agilemanifesto.org/principles.html.

A Multiple Case Study on the Architect’s Role in Scrum

443

governance architect discussed by Martini and Bosch [26]. As noted by Abrahamsson
et al. [2], the “internal and external architect” setup is mostly valuable in projects
developing a “large, challenging, novel system”. In the study presented by Babar
[30], the two roles are named “solution architect” (performing more management
activities) and an “implementation architect” (ensuring the implementation of the
user stories and mentoring the developers).
5.2 Implications for Practice and Research
Overall, our study contributes to highlighting diﬀerent interaction scenarios of those
responsible for architecting activities in Scrum. Even though the architect may not be
an explicit and formally recognized role, related tasks and capabilities exist in Scrum
settings. In this section, we summarize potential scenario-speciﬁc and general implica‐
tions for practice and research:
• In the scenarios that involve external architects (i.e., the “external architect” and the
“internal and external architects” scenarios), we noticed that external architects fail
to provide suﬃcient information to external stakeholders about architectural deci‐
sions made (cases 5 and 6) and to teams, including “internal” architects (cases 3 and
6). This defeats the purpose of external architects. A possible explanation can be their
high and diverse workloads (reported in case 3), geographical distance between a
team and external architect (reported in case 5), or external stakeholders who lack an
understanding of the value of architecting (cases 3 and 5). Further research is needed
to better understand the reasons for external architects not performing some of the
activities for which they are responsible.
• In the “external architect” scenario, potential challenges for the architect can be
caused by the distance between the architect and the development team. Since there
is no “internal” architect, architecturally relevant input from the team may therefore
be harder to capture at the beginning of the project. This could cause problems in
particular in critical projects. Also, the external architect may not be able to constantly
support and mentor the team during the project. Moreover, being an outsider, the
architect may face diﬃculties in convincing the team in the rightfulness of his/her
ideas and into following the architectural decisions consistently throughout the
project. Architecturally relevant changes requested by the team during a sprint may
be harder to address due to the lack of easy access to the architect.
• The “external architect” scenario seems to contradict the idea of a cross-functional
team in Scrum and the Scrum paradigm, while the “internal architect” scenario is
what the Scrum framework expects and supports. However, in organizations with
the “external architect” scenario we found other practices recommended in Scrum.
It could be argued that the “external architect” scenarios show that organizations who
claim to adopt Scrum might simply be struggling with doing so (e.g., due to unclear
responsibilities for architecture-related tasks).
• In the “internal and external architects” scenario, synchronization of ideas and under‐
standing between the external and internal architects may be diﬃcult and even lead
to conﬂicts. This is particularly true where no clear hierarchical structure exists

444

M. Galster et al.

between external architects, the team (including team architect, Scrum Master) and
Product Owner.
• The tasks of the architect in Scrum are similar to the tasks of architects in non-Scrum
projects (see general role of the software architect discussed in Sect. 2.2). However,
in all cases of our study, we observed that architects in Scrum projects should have
certain skills that go beyond “traditional” architecting, such as understanding agile
practices, being able to communicate to and convince a Product Owner about their
decisions, and being able to fulﬁll (or delegate) responsibilities of diﬀerent types of
architects as described in the diﬀerent scenarios. Furthermore, Scrum architects may
perform architecting activities diﬀerently, e.g., architect smaller parts of the design
early on, interacting with a more diverse range of stakeholders, apply speciﬁc prac‐
tices, such as agile modeling as suggested by Scott Ambler11 (i.e., model and docu‐
ment “just barely good enough” architecture, modeling as part of iteration planning
activities, documenting continuously but late, etc.). Identifying architecture skills in
agile settings that potentially diﬀer from skills required for “traditional” up-front
architecting is a topic for future research.
• We found that the activities of architects were signiﬁcantly aﬀected by the architec‐
tural background, knowledge, and availability of Product Owners. An architecturally
un-savvy Product Owner may introduce multiple challenges to an architect. For
example, architects may face the need to cope with low priorities of architecturally
critical user stories (e.g., stories related to refactoring) prioritized by the Product
Owner without understanding the signiﬁcance of these stories for the product. Also,
architects may not involve Product Owners in architecting activities (e.g., in an
architecture review to evaluate “the proposed architectural solutions against the
architecturally signiﬁcant requirements” [30]) and communicate their decisions
directly to the team and external stakeholders. This seems to contradict common agile
principles and deﬁnitions of roles in Scrum. According to the Scrum Guide, the
Product Owner’s “decisions are visible in the content and ordering of the Product
Backlog. No one is allowed to tell the Development Team to work from a diﬀerent
set of requirements, and the Development Team isn’t allowed to act on what anyone
else says”. Some of these variations of Scrum in practice have already been discov‐
ered [10]. In theory, an architecturally savvy Product Owner could substantially
facilitate the work of the architect. The Product Owner could provide a single point
of communication for the information provided by external stakeholders, simplifying
communications between architects and external stakeholders. This implies that the
Product Owner needs to collect and provide architecturally relevant information,
communicate architectural decisions to the stakeholders, and even get involved in
architecting activities (e.g., architecture reviews). All activities related to “getting
input” for architecting and “providing information” to stakeholders that go beyond
the team would target the Product Owner rather than the external stakeholders.
However, this additional responsibility requires more eﬀort and time from the
Product Owner, but also competence in architecting activities. Except for case 5, in
all our cases, the Product Owners were not skilled enough to support architects in
11

http://www.agilemodeling.com/.

A Multiple Case Study on the Architect’s Role in Scrum

445

their activities. Product Owners must be trained to not hamper communication
between Product Owner and team due to lacking architecture related knowledge (and
thus inability to provide the team with architecture-relevant input from the stake‐
holders).
• Architects faced challenges with regards to adoption of their architectural decisions
in all but one cases. These were mostly due to disagreements with other stakeholders
or team members. Related to that Woods reports that “diﬃculties frequently arise
when agile development teams and software architects work together” and proposes
general architecture practices that encourage collaborative architecture work in agile
development [31]. These issues were not reported for architects only in case 2. This
could be because the organization of case 2 follows strictly Scrum practices (advo‐
cating team autonomy and team architecting). Again, further studies are needed to
understand what Scrum practices aﬀect architecting, interactions between architects
and other actors, and in what way.
5.3 Threats to Validity
With regards to construct validity (did we measure what is intended), our study is limited
since we gathered data only from a limited number of sources. However, we obtained
insights from diﬀerent organizations and projects. Also, we followed semi-structured
interviews based on an interview guideline. We included control questions and checked
the accuracy of data with the organizations. Also, some ﬁndings about Product Owners
are based on the perception of those involved in architecting activities. Future work may
address how Product Owners see their role in architecting and supporting teams in
architecting related activities. With regards to external validity (extend to which ﬁndings
are of interest outside the investigated cases) we acknowledge that we focus on an
analytical generalization (i.e., our results are generalizable to other organizations that
have similar characteristics as the cases in our case study and use Scrum). The list of
scenarios is based on six cases and is insuﬃcient for drawing major conclusions.
However, we also compared our ﬁndings with other literature to increase theoretical
validity. With regards to reliability (how data analysis depends on researchers), we
recorded interviews and interview data, and reviewed data collection and analysis
procedures before conducting the study. Also, the types of scenarios found in organi‐
zations could depend on the type of organization and its complexity. Cases 1 and 2 are
from small and medium-sized companies, while cases 3, 4, 5, 6 are from large organi‐
zations. Internal validity is not a concern since our exploratory case study does not make
any claims about causal relationships.

6

Conclusions

We performed a case study with six cases involving companies that apply Scrum prac‐
tices to better understand the role of architects in Scrum projects. The cases were chosen
to cover diﬀerent scenarios of how architects can be involved in Scrum. We identiﬁed
three scenarios: “internal architect”, “external architect” and “internal and external

446

M. Galster et al.

architect”. With regards to communication and interactions of architects, we found that
architecting activities are signiﬁcantly impacted by the skills of the Product Owner role
in Scrum. In order to improve architectural work in Scrum settings, attention may be
paid to educating Product Owners in architectural relevant matters in order to (a) appre‐
ciate the value of architecting, and (b) help Product Owners extract architecturally rele‐
vant information from stakeholders to support architects in performing their tasks. Some
of our ﬁndings are not in line with the Scrum Guide, but represent variations in Scrum
as presented in [10]. The ﬁndings reported in this paper increase architects’ awareness
(especially of novice architects) about diﬀerent positions in a Scrum process and inter‐
actions with other roles in Scrum. They support architects (or individuals who take on
architecture-related tasks) in preparing for participating in Scrum-like projects. Further‐
more, our ﬁndings support setting up new teams, processes, and projects.
Some future work has already been outlined in the previous section. As further
research, we plan to extend the number of cases (covering companies of diﬀerent sizes
and diﬀerent countries) and provide more general conclusions and a more reﬁned list of
roles and interactions. Furthermore, we plan on identifying relevant architecting skills
for the diﬀerent roles in Scrum. Finally, we aim at investigating group dynamics and
organization theory to understand the three scenarios and in particular interactions in
more detail.

References
1. Bass, L., Clements, P., Kazman, R.: Software Architecture in Practice. Addison-Wesley,
Boston (2012)
2. Abrahamsson, P., Babar, M.A., Kruchten, P.: Agility and architecture: can they coexist? IEEE
Softw. 27, 16–22 (2010)
3. Yang, C., Liang, P., Avgeriou, P.: A systematic mapping study on the combination of software
architecture and agile development. J. Syst. Softw. 111, 157–184 (2016)
4. Eloranta, V.-P., Koskimies, K.: Lightweight architecture knowledge management for agile
software development. In: Babar, M.A., Brown, A., Mistrik, I. (eds.) Agile Software
Architecture. Morgan Kaufmann, Boston (2014)
5. VersionOne Inc.: 9th Annual State of Agile Survey (2015)
6. Runeson, P., Hoest, M.: Guidelines for conducting and reporting case study research in
software engineering. Empirical Softw. Eng. 14, 131–164 (2009)
7. Schwaber, K.: Agile Project Management with Scrum (Developer Best Practices). Microsoft
Press (2004)
8. Schwaber, K., Beedle, M.: Agile Software Development with Scrum. Prentice Hall, Upper
Saddle River (2002)
9. Friedrichsen, U.: Opportunities, threats, and limitations of emergent architecture. In: Babar,
M.A., Brown, A., Mistrik, I. (eds.) Agile Software Architecture, pp. 335–355. Morgan
Kaufmann, Boston (2014)
10. Diebold, P., Ostberg, J.-P., Wagner, S., Zendler, U.: What do practitioners vary in using
scrum? In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212, pp.
40–51. Springer, Heidelberg (2015). doi:10.1007/978-3-319-18612-2_4
11. Kruchten, P.: What do software architects really do? J. Syst. Softw. 81, 2413–2416 (2008)
12. Brown, S.: Software Architecture for Developers. Leanpub (2012)
13. Faber, R.: Architects as service providers. IEEE Softw. 27, 33–40 (2010)

A Multiple Case Study on the Architect’s Role in Scrum

447

14. Angelov, S., Meesters, M., Galster, M.: Architects in scrum: what challenges do they face?
In: 10th European Conference on Software Architecture (ECSA), Copenhagen, Denmark.
Springer, Switzerland (2016, in press)
15. Wohlin, C., Runeson, P., Hoest, M., Ohlsson, M., Regnell, B., Wesslen, A.: Experimentation
in Software Engineering. Kluwer Academic Publications, Boston (2000)
16. Kitchenham, B., Pﬂeeger, S.L.: Principles of survey research - Part 5: populations and
samples. ACM SIGSOFT Softw. Eng. Notes 27, 17–20 (2002)
17. Fowler, M.: Who needs an architect? IEEE Softw. 20, 2–4 (2003)
18. Miles, M.B., Huberman, A.M.: Qualitative Data Analysis. Sage Publications, Thousand Oaks
(1994)
19. Krippendorﬀ, K.: Content Analysis: An Introduction to Its Methodology. Sage Publications,
Thousand Oaks (2003)
20. Torchiano, M., Ricca, F.: Six reasons for rejecting an industrial survey paper. In: First
International Workshop on Conducting Empirical Studies in Industry (CESI), pp. 1–6. IEEE
Computer Society, San Francisco (2013)
21. Babar, M.A., Ihme, T., Pikkarainen, M.: An industrial case of exploiting product line
architectures in agile software development. In: 13th International Software Product Line
Conference, pp. 171–179. CMU, San Francisco (2009)
22. Rost, D., Weitzel, B., Naab, M., Lenhart, T., Schmitt, H.: Distilling best practices for agile
development from architecture methodology. In: Weyns, D., Mirandola, R., Crnkovic, I. (eds.)
ECSA 2015. LNCS, vol. 9278, pp. 259–267. Springer, Heidelberg (2015). doi:
10.1007/978-3-319-23727-5_21
23. Cohn, M.: Succeeding with Agile: Software Development Using Scrum. Addison-Wesley,
Reading (2009)
24. Beck, K.: Extreme Programming Explained. Addison-Wesley, Boston (1999)
25. Waterman, M., Noble, J., Allan, G.: How much up-front? a grounded theory on agile
architecture. In: 37th International Conference on Software Architecture, pp. 347–357. IEEE
Computer Society, Florence (2015)
26. Martini, A., Bosch, J.: A multiple case study of continuous architecting in large agile
companies: current gaps and the caﬀea framework. In: 13th Working IEEE/IFIP Conference
on Software Architecture (WICSA), pp. 1–10. IEEE, Florence (2016)
27. van der Ven, J.S., Bosch, J.: Architecture decisions: who, how and when? In: Babar, M.A.,
Brown, A., Mistrik, I. (eds.) Agile Software Architecture, pp. 113–136. Morgan Kaufmann,
Boston (2014)
28. Madison, J.: Agile architecture interactions. IEEE Softw. 27, 41–48 (2010)
29. Martini, A., Pareto, L., Bosch, J.: Role of architects in agile organizations. In: Bosch, J. (ed.)
Continuous Software Engineering, pp. 39–50. Springer, Berlin (2014)
30. Babar, M.A.: An exploratory study of architectural practices and challenges in using agile
software development approaches. In: Joint Working IEEE/IFIP Conference on Software
Architecture & European Conference on Software Architecture (WICSA/ECSA), pp. 81–90.
IEEE Computer Society, Cambridge (2009)
31. Woods, E.: Aligning architecture work with agile teams. IEEE Softw. 32, 24–26 (2015)

Continuous Integration Applied to Software-Intensive
Embedded Systems – Problems and Experiences
Torvald Mårtensson1 ✉ , Daniel Ståhl2, and Jan Bosch3
(

)

1

3

Saab AB, Linköping, Sweden
torvald.martensson@saabgroup.com
2
Ericsson AB, Linköping, Sweden
daniel.stahl@ericsson.com
Chalmers University of Technology, Gothenburg, Sweden
jan@janbosch.com

Abstract. In this paper we present a summary of factors that must be taken into
account when applying continuous integration to software-intensive embedded
systems. Experiences are presented from two study cases regarding seven topics:
complex user scenarios, compliance to standards, long build times, many tech‐
nology ﬁelds, security aspects, architectural runway and test environments. In the
analysis we show how issues within these topics obstruct the organization from
working according to the practices of continuous integration. The identiﬁed
impediments are mapped to a list of continuous integration corner-stones
proposed in literature.
Keywords: Software integration · Software testing · Continuous integration ·
Embedded systems

1

Introduction

Continuous integration is widely promoted as an eﬃcient way of conducting software
development. The practice is said to enable that tests can start earlier, that bugs are
detected earlier and to increase developer productivity [3, 12].
Martin Fowler’s popular article [4] is often referred to as a summary of the practice
of continuous integration. Paul Duvall summarizes continuous integration in a similar
way into a list of seven corner-stones [3]. The corner-stones (here labelled C1-C7) are
presented in Table 1.
Applications of continuous integration and other agile practices on large, complex
systems have been presented by Craig Larman and Bas Vodde [7] and Dean Leﬃngwell
[8]. There are also reports describing various experiences from introducing continuous
integration practices, often together with other agile practices [2, 6, 9–11]. However,
these reports do not describe experiences from applying continuous integration to soft‐
ware-intensive embedded systems (software systems combined with electronical and
mechanical systems).

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 448–457, 2016.
DOI: 10.1007/978-3-319-49094-6_30

Continuous Integration Applied to Software-Intensive Embedded Systems

449

Table 1. Duvall’s seven corner stones of continuous integration
Id
C1

C2
C3
C4
C5
C6
C7

Continuous integration corner stone
All developers run private builds on their own workstations before committing their
code to the version control repository to ensure that their changes don’t break the
integration build
Developers commit their code to a version control repository at least once a day
Integration builds occur several times a day on a separate build machine
100 % of tests must pass for every build
A product is generated that can be functionally tested
Fixing broken builds is of the highest priority
Some developers review reports generated by the build, such as coding standards and
dependency analysis reports, to seek areas for improvement

The topic of this paper is an overview and discussion of factors speciﬁc for softwareintensive embedded systems that could constrain a full adaptation of continuous inte‐
gration (as deﬁned by Duvall’s corner-stones). The authors of this paper have under a
long period of time been involved in software development projects for large-scale and
complex systems. Through our work in various roles related to integration and testing,
we have gained experiences of problems and issues related to the practices of continuous
integration.
The main contribution of this paper is a summary of factors that we believe must be
taken into account when applying continuous integration to software systems combined
with electrical and mechanical systems. Further work could examine solution
approaches that can be applied in multiple case-studies.
The remainder of this paper is organized as follows. In the next section the study
cases are described. Subsequently in Sect. 3 we present the problems and issues that we
have experienced regarding seven topics. In Sect. 4, we present an analysis of how the
topics described in Sect. 3 are related to the corner-stones for continuous integration that
were presented in Sect. 1. The paper is concluded in Sect. 5 where we summarize those
relationships.

2

Case Study Companies

In order to discuss impediments for continuous integration, we will compare experiences
from two study cases, which both are companies developing large-scale and complex
software for products which also include a signiﬁcant amount of mechanical and elec‐
tronical systems.
2.1 Study Case A
Study Case A is a telecommunications company with a wide range of products that
serves the B2B market. The products are highly software-intensive, but also include
signiﬁcant electronical and mechanical parts.

450

T. Mårtensson et al.

Study Case A has an advanced system of automated build and test, which has been
implemented to support continuous integration. Build, test and analysis of varying
system scope and coverage run both on event basis and on ﬁxed schedules, depending
on needs and circumstances. A wide range of physical target systems as well as a multi‐
tude of both in-house and commercial simulators are used to execute these tests.
2.2 Study Case B
Study Case B is developing airborne systems and their support systems. The main
product is the Gripen ﬁghter aircraft, which has been developed in several variants.
Gripen was taken into operational service in 1996. An updated version of the aircraft
(Gripen C/D) is currently operated by the air forces in Czech Republic, Hungary, South
Africa, Sweden and Thailand. The next major upgrade (Gripen E/F) which will include
both major changes in hardware systems (sensors, fuel system, landing gear etc.) and a
completely new software architecture.
Continuous integration practices such as automated testing, private builds and inte‐
gration build servers are applied in development of software for the Gripen computer
systems. The software teams commit to a common mainline. Testing is conducted in
simulated environments, rigs and test aircraft.

3

Problems and Experiences

In this section we will compare the conditions at Study Case A and Study Case B
regarding seven topics (derived from the characteristics of the companies’ products).
The seven topics are shown in Table 2. In general, our experiences of applying contin‐
uous integration practices are positive, but we present challenges related to applications
with complex software systems together with mechanical and electronical systems.
Table 2. The seven topics discussed in Sect. 3
Id
T1
T2
T3
T4
T5
T6
T7

Topic title
Complex user scenarios need manual testing
Compliance to standards shifts focus away from working software
Longer build time due to tightly coupled systems
Complete system a secondary concern due to many technology ﬁelds
Restricted access to information due to security aspects
End-to-end testing impossible without architectural runway
Test environments often a limited resource with bespoke hardware

3.1 T1: Complex User Scenarios Need Manual Testing
Study Case A is developing communications solutions where systems interact which
other systems. The user experience is limited to measurable capabilities such as quality

Continuous Integration Applied to Software-Intensive Embedded Systems

451

and data transfer speed. Every other aspect of the user experience is linked to the user
interface of products that are provided by other companies.
Study Case B on the other hand develops a product where the pilot cockpit is a vital
part of the product. The pilot’s judgment is critical with regards to whether the presen‐
tation and manoeuvring of sensors, weapons and other systems on the displays can
support the pilot to fulﬁl the assigned missions.
Our experience is that usability testing for a product such as the Gripen ﬁghter (Study
Case B) is very diﬃcult to discuss in terms of automated testing. Testing with the purpose
of checking if for example a symbol is presented after a button is pressed can be auto‐
mated, but the pilot’s judgement when evaluating a complex user scenario is extremely
diﬃcult to replace with an automated test case. Our experience is that the challenges of
testing which include subjective experiences are clearly valid for Study Case B, but are
much less pronounced (if present at all) at Study Case A.
3.2 T2: Compliance to Standards Shifts Focus Away from Working Software
Development of airborne systems follows standards like DO-178B or speciﬁcally in
Sweden RML-V-5. Development is to a great extent requirement-driven, where highlevel requirements are broken down into low-level system requirements. Speciﬁc roles
are responsible for quality assurance through reviews and audits. The telecom industry
also has rules and regulations, but often not to the same extent as avionics software
systems.
If evidence that the product is compliant to a standard is at the same importance as
the product itself, however, a document review can be seen as time-critical and be given
higher priority than software problems. Our experience is that Study Case B (ﬁghter
aircraft) to a greater extent than Study Case A (telecom systems) has milestones and
project progress connected to audits (on system design or software) or formal documents
(a document is issued that is required at a certain stage in the process).
3.3 T3: Longer Build Time Due to Tightly Coupled Systems
The Gripen aircraft (Study Case B) is a highly integrated system which uses rate-mono‐
tonic scheduling with a cyclic execution pattern. Both execution within a computer and
communication between the central computers are scheduled. Our experience is that
when working with a highly integrated (tightly coupled) system, a small delivery to the
main track may cause building and linking of a large part of the computer system which
implies long build times.
Study Case A’s telecom systems have varying degrees of real time characteristics,
typically depending on the level of abstraction with regards to the underlying physical
interfaces. Similarly, the degree of coupling and ability to modularize also varies. Study
Case A has had (where possible) very positive experiences of increasing “integration
time” modularity – in other words, building and testing the systems in smaller, inde‐
pendent pieces. This approach is impeded by the tighter coupling of Study Case B.

452

T. Mårtensson et al.

3.4 T4: Complete System Secondary Concern Due to Many Technology Fields
Development of a product requires knowledge of all technology ﬁelds that the product
covers. The Gripen aircraft (Study Case B) covers technology ﬁelds spanning from for
example aerodynamics, engine control and electrical power system to communication
system, navigation and mission planning. The telecom products of Study Case A also
covers many technology ﬁelds, such as network optimization or handling of customer
data.
Our experience is that a large number of technology ﬁelds fosters silo behaviours.
The organization tends to establish tailored ways of working for each system (technology
ﬁeld) and also tends to see it as “our system”, and treating the complete system as a
secondary concern. This is arguably as a consequence of limited understanding of the
unique challenges and requirements governing the many parts of the complete system.
Silo mentality in not unique for this scenario, but we ﬁnd it severely exacerbated when
these silos operate in separate engineering disciplines with little or no understanding of
one another’s unique characteristics or challenges.
3.5 T5: Restricted Access to Information Due to Security Aspects
All companies have to take into account how to protect company conﬁdential informa‐
tion. Almost every company has a strategy for how to avoid information leakage.
Another aspect is the ability to protect customer data. That is, to ensure that information
about one customer’s performance or available functionality is not exposed to other
customers. Both Study Case A and Study Case B must make allowances for this.
A third aspect is defence-related security. Defence-related security includes safe‐
guarding of national security and foreign policy objectives for all (military) customers,
but also to follow export control regulations for parts or sub-systems supplied by a
foreign vendor. US arms regulations demand that it is secured that only speciﬁed indi‐
viduals have access to software included in defence-related items, which increases the
diﬃculty of a common understanding of the product. Export control of US technology
(especially arms regulations) is regulated by The International Traﬃc in Arms Regula‐
tions. Our experience is that these regulations are aﬀecting Study Case B (ﬁghter
aircraft), but are not relevant for Study Case A (telecom systems).
3.6 T6: End-to-End Testing Impossible Without Architectural Runway
Platforms like .NET or Java Virtual Machine possible for a developer to rapidly produce
software that includes both user input/output and communication with other software
modules. Embedded systems developed by Study Case A (telecom systems) and Study
Case B (ﬁghter aircraft) are not built on a commercially available platform like .NET.
Instead, the development of an entirely new product includes a long period of in-house
construction of a platform with all infrastructure functions. When you start from a clean
slate you give up the luxury of a platform with working infrastructure including for
example communication between systems, functional monitoring or data registration.

Continuous Integration Applied to Software-Intensive Embedded Systems

453

Dean Leﬃng well deﬁnes the term architectural runway as infrastructure suﬃcient
to allow incorporation of new requirements (new functionality) [8]. Development for
bespoke hardware with tight dependencies to the physical interfaces miss out the beneﬁts
from a commercially available platform. Consequently, the architectural runway is much
longer.
Our experience from both study cases is that at the initial phase of development of
a new product (lasting for a signiﬁcant part of the project) the sub-systems cannot be
integrated. Due to this, the product cannot for a long time be functionally tested end-toend to expose any problems.
3.7 T7: Test Environments Often a Limited Resource with Bespoke Hardware
Development of embedded systems is highly dependent on bespoke hardware, both
mechanical and electronical parts. The telecommunication equipment delivered by
Study Case A (for example network nodes) often contain specialized internally devel‐
oped hardware, and is deployable in a large number of variants. The equipment may
also coexist with a wide variety of topologies, including equipment developed by Study
Case A and/or any competing vendor. The computer system in the Gripen aircraft (Study
Case B) is built on internally developed hardware and equipment developed for aero‐
nautical applications. Gripen is designed in diﬀerent variants, and each variant have subvariants. Simulators with models of hardware are used by both Study Case A and B, but
have limitations regarding for example timing.
When the system is based on bespoke hardware (not running on any standard
computer) and hardware is considered expensive or in short supply, the test environ‐
ments often become a limited resource. Further on, a large number of hardware conﬁg‐
urations (caused by customer-speciﬁc hardware) increases the test eﬀort needed for
every build. Our experience is that both Study Case A and Study Case B are highly
dependent on bespoke hardware, with Study Case A having to handle a greater degree
of diﬀerences in hardware conﬁgurations.

4

Analysis

In the previous section we compared the conditions at Study Case A and Study Case B
regarding seven topics (T1-T7 in Table 2) related to product characteristics, based on
our experiences. In this section we will analyse how this relates to the seven-bullet
summary of continuous integration (C1-C7 in Table 1).
4.1 C1: All Developers Run Private Builds
The ﬁrst corner stone (C1) states that “All developers run private builds on their own
workstations before committing their code to the version control repository to ensure
that their changes don’t break the integration build”.
Test environments easily become a limited resource if the system is based on bespoke
hardware (T7). We argue that if the developers build and test in a simulated environment,

454

T. Mårtensson et al.

they cannot fully ensure that the exact same test cases will not expose problems during
test activities that run on real hardware.
4.2 C2/C3: Commit Code and Build Often
As we ﬁnd the two corner-stones “Developers commit their code to a version control
repository at least once a day” (C2) and “Integration builds occur several times a day
on a separate build machine” (C3) related they will be jointly discussed.
Build time is correlated with the size of the code base. If a product can be divided
into several parts that are built and linked in parallel as separate binaries, build time can
be reduced. If the product is a tightly coupled system, such sectioning is more diﬃcult
or even impossible which implicates a longer build time. We argue that a long buildand test-time (T3) reduces the developer’s interest in committing to the main track often,
and the developers will not commit their code to the repository at least once a day. Kent
Beck quite simply states that “if integration took a couple of hours, it would not be
possible to work in this style” [1]. If build- and test-time for the integration build (T3)
extends to several hours, this severely limits the number of integration builds that can
be produced in a day.
4.3 C4: 100 % of Tests Must Pass for Every Build
To use automated tests to support the practices of continuous integration is a far more
eﬀective approach than manual testing [3]. We ﬁnd automated tests to be a prerequisite
for the continuous integration of any not-trivial software system.
Testing should include diﬀerent categories of tests, from unit tests and component
tests to functional tests and tests of load/performance and other capabilities. Tests of
Human Machine Interaction (HMI) diﬀer from other types of testing, as the purpose of
the tests are to check that the usability is considered at least good enough by user repre‐
sentatives. Manual usability tests take longer time to execute and are less predictable
than automated tests, which means they cannot be repeated for every integration build
(at a build rate of several builds a day or more).
When the system is based on bespoke hardware (not running on any standard
computer) and hardware is considered expensive or in short supply, the test environ‐
ments soon become a limited resource. Further on, a large number of hardware conﬁg‐
urations (caused by customer-speciﬁc hardware) increases the test eﬀort needed for
every build. With a wide range of hardware conﬁgurations it is no longer clear what
“100 % of tests must pass” actually means – does it mean testing on all valid conﬁgu‐
rations or a representative subset?
Test environments more easily become a limited resource with bespoke hardware,
especially if the product uses many hardware conﬁgurations which increases the test
eﬀort (and consequently the demand of test environments). A large number of hardware
conﬁgurations also increases the risk for ﬂaky tests, as there are more test environments
to maintain.

Continuous Integration Applied to Software-Intensive Embedded Systems

455

We argue that both a product with complex user scenario testing (T1) and many
bespoke hardware conﬁgurations (T7) can be impediments when trying to adhere to the
rule that “100 % of tests must pass for every build”.
4.4 C5: A Product Is Generated That Can Be Functionally Tested
Before a ﬁrst version of all infrastructure for the complete product has been developed,
the developers don’t have a minimum viable product which then can be incrementally
expanded upon. That is, before the architectural runway is established, the product
cannot be generated (assembled) and cannot be functionality tested end-to-end (T6).
Another aspect is that is important that all participants have common understanding
of the desired functionality of the product. We argue that if the product has a large
number of technology ﬁelds (T4) and especially if the technology ﬁelds are not adjacent,
it becomes more diﬃcult to agree on the content and meaning of functionality tests.
Security aspects (T5) can also be an impediment, such as when developers are hindered
from communicating freely regarding the exact content of the functions they have built.
This further increases the diﬃculty of a common understanding of the product, which
also becomes an impediment related to testing the product end-to-end.
4.5 C6: Fixing Broken Builds Is of the Highest Priority
Fixing broken builds fast restores the conﬁdence for a stable and sound main track. If
status of the software is undisputed as the full picture of status in the project, it is easy
to keep focus on ﬁxing broken builds fast.
“Working software over comprehensive documentation” is one of the values in the
agile manifesto, which also fully applies to continuous integration. This might be seen
as a value that collides with the principles of development of safety-critical, highly
regulated software such as medical devices, nuclear power stations or ﬂight-critical
software. This conﬂict is also discussed by Janet Gregory and Lisa Crispin [5].
Regulated environments typically apply one or several standards that require that
the developing organization should “show evidence” of compliance to the standard,
which should be done in written documents. We argue that the obligation to show
compliance to a standard (T2) can be an impediment in relation to the intention of ﬁxing
broken builds as the highest priority.
4.6 C7: Developers Review Reports to Seek Areas for Improvement
The last corner stone states that “Some developers review reports generated by the build,
such as coding standards and dependency analysis reports, to seek areas for improve‐
ment”. We argue in the same way as for corner-stone C5 (Sect. 4.4) that a large number
of technology ﬁelds (T4) and security aspects (T5) make it more diﬃcult to achieve a
common understanding of the product. Only a few people have an overview of the whole
product, and in many cases information cannot be shared due to security restrictions. To
some extent, this aﬀects how developers review reports on other parts of the product
than where they are working themselves.

456

5

T. Mårtensson et al.

Conclusion

The analysis in the previous section relates the seven topics to the corner stones for
continuous integration that were presented in the introduction. The analysis is summar‐
ized into the following bullets:
• If the developers run tests in a simulated environment, they cannot fully ensure that
the same tests will pass for the integration build that runs on real hardware
• Tightly coupled systems (causing long build- and test-time) implies additional chal‐
lenges related to frequent deliveries and integration builds several times a day
• A product with complex user scenarios and/or bespoke hardware (especially a large
number of hardware conﬁgurations) implies that the rule “all tests must pass for every
build” must be replaced with other testing approaches
• In a highly regulated environment, “ﬁxing broken builds” must be balanced against
other project objectives
• At the initial phase of development of a new product (before the architectural runway
is established) the sub-systems cannot be assembled in order to test the system func‐
tionally end-to-end and expose any integration problems
• It is more diﬃcult to achieve a common understanding of a product with a large
number of technology ﬁelds or security aspects, which aﬀects tests and reviews
The relations that were found are summarized in Fig. 1.

Fig. 1. Relations between corner stones and impediments

We believe that these experiences represent an area of further work of high relevance
to large segments of the software industry. Any research promising to mitigate the
discussed impediments would be of great value in the embedded software development
community.

Continuous Integration Applied to Software-Intensive Embedded Systems

457

References
1. Beck, K.: Extreme Programming Explained: Embrace Change. Addison-Wesley Professional,
Reading (2004)
2. Downs, J., Hoskins, J., Plimmer, B.: Status communication in agile software teams: a case
study. In: Fifth International Conference on Software Engineering Advances, Nice, France
(2010)
3. Duvall, P.: Continuous Integration. Addison Wesley, Reading (2007)
4. Fowler, M.: Continuous Integration (2006). http://www.martinfowler.com/articles/continu
ousIntegration.html
5. Gregory, J., Crispin, L.: More Agile Testing, Chap. 21 (“Agile Testing in Regulated
Environments”). Addison Wesley, Reading (2015)
6. Karlström, D.: Introducing extreme programming-an experience report. In: Proceedings of
the 3rd International Conference on eXtreme Processing and Agile Processing Software
Engineering (XP 2002) (2002)
7. Larman, C., Vodde, B.: Practicies for Scaling Lean & Agile Development. Addison Wesley,
Reading (2009)
8. Leﬃngwell, D.: Agile Software Requirements. Addison Wesley, Reading (2011)
9. Miller, A.: A hundred days of continuous integration. In: Agile 2008 Conference, Toronto,
Canada (2008)
10. Roberts, M.: Enterprise continuous integration using binary dependencies. In: Eckstein, J.,
Baumeister, H. (eds.) XP 2004. LNCS, vol. 3092, pp. 194–201. Springer, Heidelberg (2004).
doi:10.1007/978-3-540-24853-8_22
11. Stolberg, S.: Enabling agile testing through continuous integration. In: Agile 2009
Conference, Chicago, IL (2009)
12. Ståhl, D., Bosch, J.: Experienced beneﬁts of continuous integration in industry software
product development: a case study. In: The 12th IASTED International Conference on
Software Engineering (2013)

Exploring Norms in Agile Software Teams
Viktoria Stray1(B) , Tor Erlend Fægri2 , and Nils Brede Moe2
1

University of Oslo, Oslo, Norway
stray@ifi.uio.no
2
SINTEF, Trondheim, Norway
{tor.e.fegri,nils.b.moe}@sintef.no

Abstract. The majority of software developers work in teams and are
thus inﬂuenced by team norms. Norms are shared expectations of how to
behave and regulate the interaction between team members. Our aim of
this study is to gain more knowledge about team norms in software teams
and to increase the understanding of how norms inﬂuence teamwork in
agile software development projects. We conducted a study of norms in
four agile teams located in Norway and Malaysia. The analysis of 22
interviews revealed that we could extract a varied set of both injunctive
and descriptive norms. Our results suggest that team norms have an
important role in enabling team performance.
Keywords: Group norms · Team values · Collaboration · Self-managing
teams · Behavioral software engineering

1

Introduction

Teamwork is an integral part of contemporary software practice. Productive collaboration in software teams requires a certain unity in norms. Team norms are
emergent, consensual standards that regulate team members behaviors [1]. Productive teamwork carries with it a set of norms such as listening and responding constructively to views expressed by others, giving others the beneﬁt of
the doubt, providing support and recognizing the interests and achievements of
others [2]. Such norms are important because they promote individual performance, which boosts team performance, and good team performance boosts the
performance of the organization. Understanding and inﬂuencing team norms is
therefore key to building a productive software team [3].
With the emergence of agile development methods, we have also seen a substantial research interest in team-related topics such as communication [4], coordination [5] and self-managing teams [6], to name a few. Despite the increased
interest in teamwork and behavioral aspects in software development research,
team norms has been largely ignored [7]. This paper seeks to contribute to our
understanding of team norms in software development by drawing on studies of
norms in other disciplines [8–10]. Our main contribution is the application of
team norm categorizations to a case study of four software teams. Not only do
we report on particular norms in the software development context, but we also
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 458–467, 2016.
DOI: 10.1007/978-3-319-49094-6 31

Exploring Norms in Agile Software Teams

459

hope this contribution will enable more data-driven empirical research in this
area in order to improve software processes.
We may think of team norms as shared expectations of how to behave in
the team [11]. Norms have the power to partially explain human behavior by
expressing our motivation for doing certain actions [8]. Norms are normative in
the sense that they associate value to certain patterns of behavior. Norms thereby
discriminate between acceptable and unacceptable behaviors of members in a
team [12]. Furthermore, norms are a fundamental element of a team’s structure
and constitute an important vehicle for team members’ identiﬁcation with the
team. When team members identify themselves with a team they will more
easily commit themselves to team goals [10,12]. One of the most important
characteristics of team norms is that they do not exist if they are not shared
with others [13]. Norms may promote adaptive and eﬀective behavior because
people feel compelled to act in ways that are consistent with the norms. Norms
simplify team processes because they make it possible for members to count on
certain things being done and other things not being done [12].
A recent study at Google found that some norms, for example the norm that
team members speak roughly the same amount, could raise the teams collective
intelligence, while other norms could halt the team [3]. A study by Teh et al. suggested that team norms can be adjusted to promote certain behaviors in software
teams [14]. In that particular study, group norms were altered using task priming, whereby team members would complete a pilot task under direct guidance
to establish new norms. Agile methodologies require a shift from command-andcontrol management to leadership-and-collaboration [15]. McHugh [16] found
that norms inﬂuence behavior in agile teams and argue that since traditional
bureaucratic controls are often reduced in agile teams, team norms may be even
more of importance than in traditional software teams. Further, Moe [17] argue
that a software team, in order to become self-managing, needs to change the
operating norms within the team, as well as in the wider environment. While
developing productive norms are important in co-located software teams, it is
even more important in distributed software teams. Sharp and Ryan [18] noted
that a crucial element of virtual team design was the establishment of a shared
set of norms. They argue that virtual teams beneﬁt from learning to express
explicit norms and role expectations to new members.

2

Study Design

The company in which this study was set is an international telecommunications
software company with roughly 700 employees. It was selected as a research site
because it was part of a large research project on teamwork in agile software
teams. We studied two teams located in Norway (Mercury and Mars) and two
teams located in Malaysia (Jupiter and Pluto), as shown in Table 1. One of the
team members worked in both Mercury and Mars, but identiﬁed more strongly
with the latter, and therefore we place her in this team in Table 1. All teams
had used Scrum with the recommended practices for more than two years.

460

V. Stray et al.
Table 1. Sample of teams

Team name Country Team members Team members interviewed
Mercury

Norway

9

1 architect, 1 developer, 1 Scrum Master,
1 tester

Mars

Norway

9

3 developers, 2 Scrum Masters, 1
technical writer

Jupiter

Malaysia

10

Pluto

Malaysia

6

5 developers, 1 project manager, 1
Product Owner, 1 Scrum Master, 1 tester
1 architect, 1 developer, 1 team leader

The 22 interviews were semi-structured and the respondents were asked questions regarding teamwork and meetings. The interview guide was based on a
teamwork model [19], which covers the following teamwork components: communication, team orientation, team leadership, monitoring, feedback, backup,
and coordination. By understanding the teamwork components in each team,
it will be possible to understand the patterns of behavior and the inﬂuencing
norms. Understanding team orientation is of particular interest in this study.
Team orientation is deﬁned as [19]: “the attitudes that team members have
toward one another and the team task. It reﬂects acceptance of team norms,
level of group cohesiveness, and importance of team membership.”.
The average interview duration was 60 min. All the interviews were audiotaped and fully transcribed. The ﬁrst author also observed the teams in meetings
and during daily work. Statements in the interviews regarding daily stand-up
meetings have been used in previous work [20], but the information concerning
norms was analyzed and reported on for the ﬁrst time for the study reported in
this paper.
The two ﬁrst authors studied the interview transcripts and observation notes
and identiﬁed statements that indicated norms - i.e. patterns of behavior. We
looked for statements where team members described a behavior as an “unwritten rule” or “how our team does it”. All the interviews were coded in NVivo.
We decided to use the categorization by Cialdini et al. [8] to understand and
analyze two type of norms: injunctive and descriptive (explained in Sect. 3). We
discussed which of the categories the identiﬁed norms belonged to, and whether
we believed the norms positively or negatively aﬀected team performance. We
also considered the framework proposed by Forsyth [1], but found that the categories were not as clear as those of Cialdini et al.

3

Results and Discussion

Upon analyzing our data we identiﬁed both injunctive and descriptive norms.
Injunctive norms are concerned with what people ought to do or should do.
Such norms describe approved or disapproved behaviors. Descriptive norms are
norms of what most people do, how they typically act, feel, and think in a given

Exploring Norms in Agile Software Teams

461

situation. Because what is approved behavior (injunctive norms) is often the
same as what is typically done (descriptive norms), it is easy to confuse these
two types, but they are conceptually and motivationally distinct [8].
3.1

Injunctive Norms

We found that injunctive norms were the easiest type of norm to identify because
the interviewees often expressed these as ways people ought to behave. For example, one developer in Jupiter described a norm of how to dress for work: “We
have to wear long pants, and we cannot wear slippers.” In Teams Jupiter and
Pluto they had the norm that “the Product Owner (PO) is not allowed to attend
retrospective meetings.” A third example is that all teams had the injunctive
norm: “team members have to be on time for meetings”, and they tried to counteract the tendency to violate this norm with concrete sanctions, such as having
to pay a ﬁne. While talking about allocation of tasks, one developer from Team
Jupiter noted: “We have specialized roles in order to go in depth in solving problems and to be able to solve tasks faster.” The expected behavior in this team
was that team members chose tasks according to specialization. This behavior
was positively sanctioned because the team members believed that it made them
more productive. This norm suggests that the team prioritized role specialization at the expense of agile teamwork norms such as having backup behavior
and knowledge redundancy [5].
Another respondent in Jupiter commented on the autonomy level of the team:
“The thing is, the diﬀerences from now and the early days of Scrum is that we
have full design rights. Previously we did not.” The reference is here to the
positively sanctioned design behavior, i.e. team members are allowed to design.
Design is a part of the work that sets direction for the subsequent coding. This
is a norm that contributes to team performance since it brings decision-making
authority to the level of operational problems.
One developer in Pluto reﬂected on a negative incident with a team member:
“Someone actually decided to take up a user story without informing us and then
told us it was done before the story was even groomed. It is not ok that team members take such decisions without informing the rest of the team.” The injunctive
norm here suggests that team members should not pick up user stories without
informing the other team members. This kind of behavior is often referred to as
decision hijacking [21] and is an example of violation of a norm. The injunctive
norm that team members should inform each other is an enabler for eﬀective
teamwork because agile team members should make decisions together.
3.2

Descriptive Norms

Descriptive norms are concerned with the behavior that generally occurs, and
these norms are predominantly based in implicit assumptions. Hence, in order
to identify these norms, we had to supplement the analysis of the interviews
with observational data to identify the usual behavior of the team members.
For example, when investigating how the burndown chart was updated, one

462

V. Stray et al.

respondent in Team Mars replied: “In Team Mercury, the team members report
and the Scrum Master update it. We have concluded that we do not do it like
that. We do it ourselves. Each one of us has the responsibility to update it.”
This statement suggests a pattern of behavior that is established in the team.
However, we observed that the team members rarely updated the burndown
chart. Hence, the descriptive norm in the team was to update the burndown
chart rarely, even if the project manager wanted them to update it often.
In Team Jupiter, during planning poker, the team member who estimated
the highest or lowest number of hours had to give an explanation of his or hers
estimate. This had resulted in a norm that most team members tried to estimate
a middle value in order to avoid speaking up and explain their value to the others.
Another observation of a descriptive norm in Team Jupiter was that it was ok
to be present in team meetings without paying attention, if the team member said
they had something more important to do. For example, some team members
coded during planning meetings. A consequence of this norm may be a reduced
shared understanding of the work and the teams goals, which negatively aﬀects
team performance.
In all of the teams, team members often arrived at work just in time for
the daily stand-up meeting, even if company policy stated an earlier time. This
illustrates an important aspect of team norms: the informally agreed on guidelines for acceptable behavior in a team may conﬂict with the organization’s
expectations of behavior. Team members will then ﬁnd themselves in a position
where they, often unconsciously, choose or negotiate between diﬀerent norms. In
this example, team norms got precedence above organizational norms. This may
indicate that the team members identiﬁed more strongly with the team than the
organization.
3.3

Co-existing Norms

Injunctive and descriptive norms may co-exist in the same behavioral pattern [8].
In the beginning of our data analysis this created some confusion. An example
of an injunctive and descriptive norm acting simultaneously is the following
statement from a developer: “When I have a problem, I ask for help immediately,
I do not try to sit for days trying to solve the problem myself ”. We often observed
that team members asked each other for help, either by going to a person sitting
close by or by sharing the problem in the daily stand-up meeting. The behavior
of seeking and providing assistance from each other was positively sanctioned in
these teams (injunctive norm). At the same time, it was what people usually did
(descriptive norm).
Another, more intricate example of co-existing norms is illustrated by the
following statement from a manager: “John is not too harsh on the PO, so the
PO would always give him new tasks behind the Scrum Masters back. This is how
John approaches stuﬀ, so we can just let him. It is not really wrong by the way,
he is just doing his part to improve the product.” It is disapproved (and hence an
injunctive norm) to allow the PO to approach team members directly without
the Scrum Masters consent. Nevertheless, this often happens (descriptive norm.)

Exploring Norms in Agile Software Teams

463

Teams go through a natural process of creating norms to ﬁnd a comfortable way
to operate [12]. They try to operate in such a way that they maximize the chances
for success and minimize the chances for failure, and that they also maximize
the satisfaction of the team members and minimize interpersonal discomfort
[22]. For example, the team as a whole are satisﬁed when they try not to accept
tasks from the PO, but at the same time they accept that some team members
solve this type of tasks because it minimizes interpersonal discomfort to let
this person say yes. Nevertheless, we believe that the injunctive norm (team
members should reject tasks from PO) positively aﬀected team performance,
while the acceptance of this being violated (the descriptive norm) negatively
aﬀected team performance.
3.4

Psychological Safety

Some norms of communication were described by the interviewees as cultural
diﬀerences. For example, in Malaysia, one tester noted: “In Norway, the testers
would just go to the developers cubicle and just talk to them whenever there is a
problem. In Malaysia, maybe the working culture is diﬀerent, because most of the
time we are communicating through e-mail to have it in black and white.” However, we believe that other factors than culture are also important in explaining
norms for communication. Norms of how team members behave towards each
other are closely related to the concept of psychological safety, which is a sense
of conﬁdence that the team will not embarrass, reject or punish someone for
speaking up [23].
We identiﬁed several norms that indicated a high degree of psychological
safety. For example, in Jupiter, the developers had the norm that they responded
positively whenever they were confronted with a bug. One tester explained: “In
my previous job I was afraid that developers would be oﬀended when I ﬁled a
bug because basically you are telling them that they have made a mistake. So I
had to think a lot of how I would present the bugs I found. But, I do not get
that feeling in this team because this team is quite mature. The developers are
happy if you ﬁnd a bug. It makes me feel happy about my job and my team.”
This supports the ﬁndings in a recent study of norms that stated that productive
software teams have norms that fosters a high degree of psychological safety in
the team [3].
3.5

Changing Norms

A capacity for learning about norms and how to change them is needed to
improve team performance. The results of this study indicate that it is important that teams reﬂect on the two types of norms associated with how they are
operating as a team, and how such norms evolve. Norms are socially developed
through interactions among team members. As a consequence, they are not static. An intriguing aspect of norms is that behavior that is found eﬀective can
gradually be turned into routine, norm-driven behavior [12,24].

464

V. Stray et al.

Organizations seek to establish norms in diﬀerent ways by enforcing process
standards, code of conduct etc. Similarly, teams will try to establish norms, for
example by agreeing on rules to regulate the team’s interaction. One example
is the set of rules deﬁned by Team Pluto. Figure 1 is a picture of a working
agreement that the team had posted on the wall in the oﬃce space. However,
agreeing on these rules are not suﬃcient to designate them as norms, they are
merely potential injunctive norms. A rule must also be associated with motivation to behave according to this rule (which, naturally, will most often be the
case). Hence, we must ﬁnd evidence of people’s inclination to behave according
to a certain pattern of behavior to verify if they are indeed injunctive norms.
By discussing the working agreement in Team Pluto, the team tried to establish their own norms for eﬀective teamwork. Teams that are able to improve their
own work methods often achieve a higher level of autonomy than teams that do
not make such decisions [11]. One way of changing norms in a conscious manner
is by reﬂection. To enable reﬂection, agile methods typically establish some form
of retrospective meetings. We argue that these should be used as a means to
discuss team norms.

Fig. 1. Working agreement in Team Pluto

In observation of retrospective meetings, we noted several examples of issues
related to team norms, for example: (1) How can we make sure people are punctual to grooming meetings? (2) How can we make team members prioritize the
retrospective meetings when they are busy preparing for the sprint demo? (3)
Should we ban laptops from meetings? and (4) How can we make sure that the

Exploring Norms in Agile Software Teams

465

burndown chart is updated more often? By discussing these issues, the teams
reﬂected on descriptive norms and tried to establish injunctive norms that would
subsequently be adopted as typical. This shows that rituals and ceremonies such
as daily stand-up meetings and retrospective meetings may reinforce acceptable
behaviors. Discussing the team’s own norms is an example of clan-based control. Often, the team will seek to establish sanctions to uphold these injunctive
norms [9]. Clan control is a type of control that operates when the behavior in
a team is motivated by shared values and norms [25]. Clan control empowers
team members in agile software teams [16].

4

Methodological Implications and Future Work

As far as we are aware, this is the ﬁrst study of norms in agile software teams.
Studying norms is a challenging undertaking because most people do not reﬂect
on how norms guide their behavior. Additionally, they may not be aware of
which norms that regulate their actions. We have come to understand norms
better through our analysis according to the framework by Cialdini et al. [8].
Perhaps symptomatic for many soft topics in software engineering, we need to
be vigilant to opportunities for using theory from other disciplines that explain
the practice of software development.
While it seems clear to us that norms are an integral aspect of working in
a team, they may be diﬃcult to uncover because of their degree of visibility,
as shown in Fig. 2. In order to understand norms, researchers have to uncover
assumptions. One may start with identifying artifacts and behaviors in the teams
to decipher the underlying sources of motivations, such as norms. Our position

Fig. 2. Visibility of values and norms. Figure adapted from [25], examples from Team
Mercury

466

V. Stray et al.

is that it is not enough to just interview project members to uncover norms,
one should supplement this data collection method with ﬁeld observations to
see what people actually do. Research in social psychology can serve as useful
examples for future research in software teams [10].
Future research might also explore the concept of team values. Team values
guide behavior and decision making in the team and they underlie norms [26,27].
However, team values may be even more diﬃcult to identify because they are
even less visible than norms.

5

Conclusion

Productive teams, where team members act in a collaborative manner to achieve
project goals, are important for successful software projects. In such teams, team
members often exhibit a strong sense of commitment to the team, and members
are inﬂuenced by shared norms. The purpose of our study was to evaluate the
presence of norms in four software teams. The results support the idea that some
norms enable team performance, while others hinder. In order to encourage productive team member behaviors, we suggest that teams regularly reﬂect on both
their injunctive norms (what is approved/disapproved behavior) and descriptive
norms (what is commonly done). Our contribution can serve as an initial basis
to guide and integrate research ﬁndings about norms in software teams.
Acknowledgments. We thank Yngve Lindsjørn and Øystein Ingebrigtsen for assisting with data collection. We also thank the participants for sharing their experiences
with us. This work was supported by the Smiglo project, which is partly funded by the
Research Council of Norway under the grant 235359/O30.

References
1. Forsyth, D.R.: Group Dynamics. Wadsworth, Cengage Learning, Belmont (2010)
2. Katzenbach, J., Smith, D.: The discipline of teams. Harvard Bus. Rev. 71, 1–11
(1993)
3. Duhigg, C.: What Google Learned From Its Quest to Build the Perfect Team.
http://goo.gl/6md7HQ
4. Pikkarainen, M., Haikara, J., Salo, O., Abrahamsson, P., Still, J.: The impact of
agile practices on communication in software development. Empir. Softw. Eng. 13,
303–337 (2008)
5. Moe, N.B., Dingsøyr, T., Dybå, T.: A teamwork model for understanding an agile
team: a case study of a scrum project. Inf. Softw. Technol. 52, 480–491 (2010)
6. Moe, N.B., Dingsøyr, T., Dybå, T.: Overcoming barriers to self-management in
software teams. IEEE Softw. 26, 20–26 (2009)
7. Lenberg, P., Feldt, R., Wallgren, L.G.R.: Behavioral software engineering: a deﬁnition and systematic literature review. J. Syst. Softw. 107, 15–37 (2015)
8. Cialdini, R.B., Reno, R.R., Kallgren, C.A.: A focus theory of normative conduct:
recycling the concept of norms to reduce littering in public places. J. Pers. Soc.
Psychol. 58, 1015–1026 (1990)

Exploring Norms in Agile Software Teams

467

9. Barker, J.R.: Tightening the iron cage: concertive control in self-managing teams.
Adm. Sci. Q. 38, 408–437 (1993)
10. Terry, D.J., Hogg, M.A.: Group norms and the attitude-behavior relationship: a
role for group identiﬁcation. Pers. Soc. Psychol. Bull. 22, 776–793 (1996)
11. Levine, J.M., Moreland, R.L.: Progress in small group research. Ann. Rev. Psychol.
41, 585–634 (1990)
12. Hackman, J.R.: The design of work teams. In: Handbook of Organizational Behavior, pp. 315–342. Prentice-Hall, Englewood Cliﬀs (1987)
13. Cialdini, R.B., Trost, M.R.: Social inﬂuence: social norms, conformity and compliance. In: Handbook of Social Psychology, pp. 151–193. McGraw-Hill, New York
(1998)
14. Teh, A., Baniassad, E., Van Rooy, D., Boughton, C.: Social psychology and software
teams: establishing task-eﬀective group norms. IEEE Softw. 29, 53–58 (2012)
15. Nerur, S., Mahapatra, R., Mangalaraj, G.: Challenges of migrating to agile methodologies. Commun. ACM 48, 72–78 (2005)
16. McHugh, O.: A study of clan control in agile software development teams. Ph.D.
thesis, NUI Galway (2011)
17. Moe, N.B.: Key challenges of improving agile teamwork. In: Baumeister, H., Weber,
B. (eds.) XP 2013. LNBIP, vol. 149, pp. 76–90. Springer, Heidelberg (2013). doi:10.
1007/978-3-642-38314-4 6
18. Sharp, J.H., Ryan, S.D.: A preliminary conceptual model for exploring global agile
teams. In: Abrahamsson, P., Baskerville, R., Conboy, K., Fitzgerald, B., Morgan,
L., Wang, X. (eds.) XP 2008. LNBIP, vol. 9, pp. 147–160. Springer, Heidelberg
(2008). doi:10.1007/978-3-540-68255-4 15
19. Dickinson, T.L., McIntyre, R.M.: A conceptual framework of teamwork measurement. In: Team Performance Assessment and Measurement: Theory, Methods, and
Applications, pp. 19–43. Psychology Press, NJ (1997)
20. Stray, V., Sjøberg, D.I.K., Dybå, T.: The daily stand-up meeting: a grounded
theory study. J. Syst. Softw. 114, 101–124 (2016)
21. Aurum, A., Wohlin, C., Porter, A.: Aligning software project decisions: a case
study. Intl. J. Softw. Eng. Knowl. Eng. 16, 795–818 (2006)
22. Feldman, D.C.: The development and enforcement of group norms. Acad. Manage.
Rev. 9, 47–53 (1984)
23. Edmondson, A.C.: Psychological safety and learning behavior in work teams. Adm.
Sci. Q. 44, 350–383 (1999)
24. Benner, M.J., Tushman, M.L.: Exploitation, exploration, and process management:
the productivity dilemma revisited. Acad. Manage. Rev. 28, 238–256 (2003)
25. Kirsch, L.J., Ko, D.-G., Haney, M.H.: Investigating the antecedents of team-based
clan control: adding social capital as a predictor. Organ. Sci. 21, 469–489 (2010)
26. Hogan, S.J., Coote, L.V.: Organizational culture, innovation, and performance: a
test of Schein’s model. J. Bus. Res. 67, 1609–1621 (2014)
27. Schein, E.H.: Coming to a new awareness of organizational culture. Sloan Manage.
Review. 25, 3–16 (1984)

Forces that Prevent Agile Adoption
in the Automotive Domain
Philipp Hohl1(B) , Jürgen Münch2,3 , Kurt Schneider4 , and Michael Stupperich1
1

Daimler AG, Research and Development, Ulm, Germany
{philipp.hohl,michael.stupperich}@daimler.com
2
Reutlingen University, Reutlingen, Germany
juergen.muench@reutlingen-university.de
3
University of Helsinki, Helsinki, Finland
4
Leibniz Universität Hannover, Hannover, Germany
kurt.schneider@inf.uni-hannover.de

Abstract.
Context: The current transformation of automotive development
towards innovation, permanent learning and adapting to changes are
directing various foci on the integration of agile methods. Although, there
have been eﬀorts to apply agile methods in the automotive domain for
many years, a wide-spread adoption has not yet taken place.
Goal: This study aims to gain a better understanding of the forces that
prevent the adoption of agile methods.
Method: Survey based on 16 semi-structured interviews from the automotive domain. The results are analyzed by means of thematic coding.
Results: Forces that prevent agile adoption are mainly of organizational,
technical and social nature and address inertia, anxiety and context factors. Key challenges in agile adoption are related to transforming organizational structures and culture, achieving faster software release cycles
without loss of quality, the importance of software reuse in combination
with agile practices, appropriate quality assurance measures, and the
collaboration with suppliers and other disciplines such as mechanics.
Conclusion: Signiﬁcant challenges are imposed by speciﬁc characteristics of the automotive domain such as high quality requirements and
many interfaces to surrounding rigid and inﬂexible processes. Several
means are identiﬁed that promise to overcome these challenges.

Keywords: Software development

1

· Agile methods · Automotive

Introduction

The automotive industry is confronted with high-frequent changes due to innovations and new technology. Today, it is a competitive advantage for car manufacturers to develop and distribute high-quality software at a high pace. A promising
solution to keep pace with this progress are agile software development methods.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 468–476, 2016.
DOI: 10.1007/978-3-319-49094-6 32

Forces that Prevent Agile Adoption in the Automotive Domain

469

High quantities with high cost pressure, test and validation under real-time
conditions and a high amount of software variants are important characteristics for automotive software development. Safety-critical applications on the one
hand and cost pressure on the used hardware on the other hand are encountered
in the automotive domain. Furthermore, strict processes for car development
have to be considered. Automotive functions must be veriﬁed by long term ﬁeld
tests and endurance tests that are enforced by law.
This survey investigates the speciﬁcs of agile adoption in the automotive
domain. It presents an interview-based qualitative survey that aims to understand today’s state of the practice of perceived forces on agile adoption. This
work focuses on the agile software development for electronic control units
(embedded software) in the automotive domain.

2

Related Work

Since a decade and more, agile software development methods show promising
beneﬁts in domains such as mobile or web development. In the beginning, it
was not yet clear if agile software development would be applicable to the automotive ﬁeld. In 2004, Manhart and Schneider [1] “tried to break the ice” for
agile embedded development. They summarized existing experiences with agile
methods, but emphasized that more knowledge of agile practices is needed. A
survey by Kugler Maag CIE [2] in 2015, revealed a non-uniform adoption of agile
development in the automotive domain with various and selective adopted agile
methods and practices.
A leader in agile adoption is the Original Equipment Manufacturer (OEM)
Volvo. Eliasson et al. [3] performed several case studies at Volvo to identify the
limitation of agile development in the automotive domain. These studies focus
on impacts with respect to software architecture and pointed out cooperation
problems with subcontractors. In 2014, Eliasson et al. [4] identiﬁed the necessity
to reveal possible showstoppers in the earlier phases of projects by means of
faster feedback. In addition, they focus on requirements engineering combined
with agile practices. Stelzmann et al. [5] analyzed success factors which help
projects to become agile. Katumba et al. [6] conducted a case study to identify
challenges in software development process related to frequent task switching,
individualism, lack of complete knowledge and communication. With this in
mind, our study aims at identifying hindering forces on the agile adoption and
potential solution approaches.

3

Study Approach

3.1

Research Questions

RQ 1: What are the perceived forces that prevent agile adoption in automotive?
RQ 1.1: What are the habits and inertia that prevent agile adoption?

470

P. Hohl et al.

RQ 1.2: What are the anxiety factors that prevent agile adoption?
RQ 1.3: Which context factors prevent agile adoption?
RQ 2: What are the perceived means to adopt agile in automotive?
3.2

Research Design

The study is based on a qualitative survey. It is designed as an exploratory semistructured interview. The method provides insights into the examined topic and
gives essential information to understand the phenomenon in its real context
[7,8]. For a semi-structured interview an interview guide was implemented [9].
The interview guide was structured along a funnel model [8]. Each section begins
with open, exploratory ground mapping questions [10]. These questions reveal
all topics of interests [11]. In addition, dimension mapping questions are used
to focus on interesting topics [11]. The interview guide was tested in a pilot
interview and adjusted to the problems which have arisen.
3.3

Data Collection and Analysis

Research Sites and Participants. The interview participants were selected
from employees of an OEM and an automotive consultant. The interviewee selection was based on two criteria: First, the interviewee should have a work experience of several years. The length of employment varied from 3 to 20 years,
with an average working experience of 16 years. Second, the interviewee should
already use agile practices for software development. To get a diﬀerent point of
view on the examined topic, the following participants were selected: Two managers, ﬁve process owners, two system architects, six software developers and
one automotive consultant for agile development processes. The interviews were
conducted by the primary researcher at the interviewees departments from May
to June 2016.
Interviews. There were 14 face-to-face interviews as well as a group interview
with two participants. Every interview took around one hour. The interview
questions were initially deﬁned in English and translated to the native language
of the interviewees. In consent with the interviewee, the interview was recorded
and transcribed verbatim for detailed analysis. All transcribed interviews notes
were managed using the reference management program Citavi.
Analysis. According to the classiﬁcation of Stol et al. [12], the coding concepts
of Straussian Grounded Theory were used. We used the three coding phases of
Straussian Grounded Theory: open coding, axial coding, and selective coding [12,
13]. The interpretive process of open coding generates categories and concepts
by breaking down the data analytically. The concepts were grouped together
and related to their subcategories in the axial coding. In the selective coding the
central categories were deﬁned.

Forces that Prevent Agile Adoption in the Automotive Domain

471

Validity Considerations. Validity was threatened by the possibility of misunderstandings between interviewees and the researcher. To minimize this risk, the
study goal was explained to the participants prior to the interview. Steps taken
to improve the reliability of the interview guide included a review and a pilot
test. To reduce researcher bias, the interviews were recorded and transcribed.

4

Results

4.1

Forces on Agile Adoption

We deﬁne six categories of forces on agile adoption (cf. Fig. 1). The categorization
aims to better understand the diﬀerent aspects of the transition process from
traditional to agile development practices. We distinguish between “trigger”,
“push” and “pull” as forces that lead to agile adoption. In contrast, we deﬁne
“inertia”, “anxiety”, and “context” as forces that prevent the agile adoption. The
classiﬁcation is inspired by the Customer Forces Diagram by Maurya1 that itself
is inspired by the Forces Diagram by Moesta and Spiek from the Jobs-to-be-done
framework2 .
A trigger force initiates a change and pushes an individual or an organization
towards agile adoption. A push force pushes an individual or an organization
towards adopting agile practices based on issues or demands. A pull force come
into eﬀect when individuals or organizations are pulled towards agile adoption
based on the attractiveness of a future situation.
Inertia forces, such as habits, keep people from trying out something new and
hence prevents agile adoption. The anxiety forces are representing fears which
could prevent the adoption of agile. Often, uncertainties surrounding new situations cause anxiety. The context forces result from constraints and obstacles in
the environment (such as organizational structures or process barriers) and prevent agile adoption. In the following, the preventing forces found in the study are
subsequently described and explained using the information from the interview
transcripts and the interpretations from coding and analysis.

Fig. 1. Agile adoption forces diagram

1
2

https://leanstack.com/science-of-how-customers-buy/.
http://jobstobedone.org.

472

4.2

P. Hohl et al.

Perceived Forces that Prevent Agile Adoption

Inertia. Most of the interviewees mention that a major problem is the missing
understanding of the applicability of agile methods within their context. Some
interviewees emphasize that it is not clear for the management how to manage
agile development and how to integrate it into their departments. Managers point
out that a change of the mind-set is needed to adopt agile, but it is unclear how to
achieve this. All interviewees mention that applying agile methods might require
more communication eﬀort. 50 % of the interviewees agree that communication
eﬀort is manageable in local, small teams but it is diﬃcult to coordinate if the
development is distributed.
Anxiety. The developers believe that it is necessary in agile development to give
more responsibility to software developers; they mention that the management
does not want to give up responsibilities. The managers emphasize that it is
unclear how to provide correct estimations on development eﬀorts when applying
agile practices. They mention that it is diﬃcult to prioritize features without
correct estimations. In addition, the managers mention that they do not want
to displease software developers by changing roles and responsibilities. In fact,
they fear that annoyed developers might leave the department. The software
developers emphasize their biggest fear that customer-relevant defects remain in
the delivered software.
Context. Except of two, all interviewees mention that with the current structure of the company too many responsible persons are involved in negotiations
about feature implementation. The interviewees state that this slows down the
software development and prevents agile adoption. Most of the interviewees consider the high amount of process-dependencies for software development as an
impediment for the transition towards agile development. One manager mentions the demand for more employees to maintain and manage the intersection
between the agile department and the traditional organization. The processes on
the higher system levels are seen as important. At the same time, however, they
are considered to prevent and restrict agile development. The software developers emphasize the need to synchronize with ﬁxed freeze dates and hardwaredevelopment what is seen as slowing down the process. In addition, one interviewee attributes the longer development time to the increased communication
eﬀort during the implementation. The interaction with the purchasing department and suppliers is highlighted by most interviewees as a challenging task.
The communication with a supplier is identiﬁed to be a problem. Challenges
with respect to communication are as well present in the context of globally
distributed software development projects.
All interviewees mention the high eﬀort to fulﬁll compliance and validation.
Technical risks and challenges are mentioned by seven interviewees. It is important that the software is validated and of high quality. The developers mention
that test and validation departments cannot increase speed due to the necessity

Forces that Prevent Agile Adoption in the Automotive Domain

473

of integration and validation in a real car. All interviewees refer to limited capacity in manpower and test systems when it comes to validation. Therefore, it is
necessary to reuse software parts in order to reduce certiﬁcation eﬀorts. Other
restrictions which are seen as important for adopting agile are long term ﬁeld
tests and endurance runs that are enforced by law, e.g. summer and winter tests.
These tests must be kept at reasonable costs.
4.3

Perceived Means to Adopt Agile

How to Overcome the Inertia? Most of the participants mention that they
already use incremental builds to shorten the release cycles. In addition, the
interviewees stress that they have implemented approaches to build prototypes
independent from the main development. Therefore they introduced auxiliary
processes to provide an environment for faster internal releases.
The managers mention that more than 80 % of the software development
for selected electronic control units is already transferred to in-house development. One developer emphasizes that in-house development is appropriate if the
speciﬁcation eﬀort for a feature functionality exceeds the development eﬀort.
Another interviewee describes a situation in which in-house implementation is
not possible. He stresses that the collaboration with the supplier should be a
closer collaboration with more coordination and communication.
How to Overcome the Anxiety? The developers mention an increasing software quality at an early stage of the development is allaying their fear of late
defects in the software. An increase in learning speed is considered as a mean to
increase the odds for delivering high quality software. Other interviewees expect
that prototyping might help to create a safe to fail environment that allows a
fast feedback about a new function which is under development. In addtition,
the risks of late defects can be reduced.
Several interviewees mention that software developers should be granted
more responsibility. One manager referes to the one-room principle where different employees from diﬀerent engineering domains like software, electronics
and mechanics are sitting together in one oﬃce. He highlights the beneﬁts from
interdepartmental cooperation. A mind-set change and redistribution of responsibility is therefore necessary to keep a cooperative atmosphere.
How to Overcome Obstacles Imposed by the Context? All interviewees
identiﬁed organizational structures in large organizations as a main reason of
preventing the adoption of agile practices. They stress that it is almost impossible to change the organization. Three participants mention that although top
management is fostering the change towards agile development, this is not the
case on all management levels. They recommend reorganization with lower hierarchies.
One interviewee describes how his department embedded agile software development in small agile environments in the organization. He emphasizes that this

474

P. Hohl et al.

approach is manageable on a small scale but needs more employees to maintain
the intersection with the traditional organization. The interviewee mentions the
need for new processes in order to manage the interaction between the traditional organization and the agile environment. Two participants mention that
they recruited a consultant to adapt agile practices appropriately to the context
of their department.
Several interviewees state that the beneﬁts achieved so far by the use of
product line management should not be neglected. Therefore, one participant
emphasizes that the level of software reuse should be maintained and extended
by the use of simulations that replace unavailable parts during development.
Table 1 summarizes the challenges with associated possible solution approaches
for the forces that prevent agile adoption.
Table 1. Challenges and Solution approaches
Forces

Challenge

Solution approach

Inertia

Missing understanding of the
applicability of agile methods in a
speciﬁc context
A change of the mind-set is needed
to adopt agile practices
More communication eﬀort required

Organize context-speciﬁc agile training and coaching
Collaborate with (external) experts

Word-of-mouth
recommendations
and feedback-culture
Current development process is seen Explain beneﬁts of agile adoption
as satisfying
Limited acceptance for organiza- Slow and stepwise integration of agile
tional restructuring
methods

Anxiety Management does not want to give Redeﬁne management role; Software
up responsibilities
developers should be granted more
responsibility
Unclear how to provide correct Shorten estimation interval
estimations on development eﬀorts
when applying agile practices
Fear that customer-relevant defects Provide a safe to fail environment and
remain in the delivered software
prototype development
Context Rigid and inﬂexible surrounding
processes
Hierarchy in the organization
Frequent synchronization with suppliers and QA
Globally distributed software development
Keep beneﬁts from software reuse

Deﬁne interface between agile and
traditional processes
Transform hierarchies into networks
More in-house software development
Use of simulations that replace
unavailable parts
Reorganize product line development

Forces that Prevent Agile Adoption in the Automotive Domain

5

475

Discussion and Conclusion

The study presented a state of the practice analysis on agile adoption in the
automotive domain. In 2015, Kugler Maag CIE [2] revealed a selectively adoption of agile methods in the automotive domain. In addition, our study identiﬁed
the forces on the agile adoption. Furthermore, our study associates the ﬁndings
by Katumba [6] and Eliasson [3,4]. Key challenges in agile adoption are related
to transforming organizational structures and culture, achieving faster software
release cycles without loss of quality, the importance of software reuse in combination with agile practices, appropriate quality assurance measures and the
collaboration with suppliers.
The survey reveals many avenues for further research. Potential directions
could be to integrate agile practices into existing product lines in the automotive
domain and to identify means for addressing the restrictions of a rather strict
surrounding process.

References
1. Manhart, P., Schneider, K.: Breaking the ice for agile development of embedded
software: an industry experience report. In: Proceedings of the 26th International
Conference on Software Engineering, pp. 378–386, 23–28 May 2004
2. Weber, S.: Agile in Automotive – State of Practice (2015). www.kuglermaag.com/
agile2015. Accessed 1 Dec. 2015
3. Eliasson, U., Heldal, R., Pelliccione, P., Lantz, J.: Architecting in the automotive
domain: descriptive vs prescriptive architecture. In: Bass, L., Lago, P., Kruchten,
P. (eds.) 12th Working IEEE/IFIP Conference on Software Architecture, WICSA,
pp. 115–118. IEEE, Piscataway (2015)
4. Eliasson, U., Heldal, R., Lantz, J., Berger, C.: Agile model-driven engineering in
mechatronic systems - an industrial case study. In: Dingel, J., Schulte, W., Ramos,
I., Abrahão, S., Insfran, E. (eds.) MODELS 2014. LNCS, vol. 8767, pp. 433–449.
Springer, Heidelberg (2014). doi:10.1007/978-3-319-11653-2 27
5. Stelzmann, E., Kreiner, C., Spork, G., Messnarz, R., Koenig, F.: Agility meets
systems engineering: a catalogue of success factors from industry practice. In: Riel,
A., O’Connor, R., Tichkiewitch, S., Messnarz, R. (eds.) EuroSPI 2010. CCIS, vol.
99, pp. 245–256. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15666-3 22
6. Katumba, B., Knauss, E.: Agile development in automotive software development: challenges and opportunities. In: Jedlitschka, A., Kuvaja, P., Kuhrmann,
M., Männistö, T., Münch, J., Raatikainen, M. (eds.) PROFES 2014. LNCS, vol.
8892, pp. 33–47. Springer, Heidelberg (2014). doi:10.1007/978-3-319-13835-0 3
7. Dresch, A., Lacerda, D.P., Antunes, J.A.V.: Design Science Research. Springer
International Publishing, Cham (2015)
8. Runeson, P., Hst, M.: Guidelines for conducting and reporting case study research
in software engineering. Empirical Softw. Eng. 14(2), 131–164 (2009)
9. Bryman, A.: Social Research Methods, 2nd edn. Univ. Press, Oxford (2004)
10. Easterbrook, S., Singer, J., Storey, M.-A., Damian, D.: Selecting empirical methods
for software engineering research. In: Shull, F., Singer, J., Sjberg, D.I.K. (eds.)
Guide to Advanced Empirical Software Engineering, pp. 285–311. Springer-Verlag,
London Limited, London (2008)

476

P. Hohl et al.

11. Ritchie, J. (ed.): Qualitative Research Practice: A Guide for Social Science Students
and Researchers. Sage, repr ed., Los Angeles (2011)
12. Stol, K.-J., Ralph, P., Fitzgerald, B.: Grounded theory in software engineering
research. In: Dillon, L., Visser, W., Williams, L. (eds.) Proceedings of the 38th
International Conference on Software Engineering, pp. 120–131 (2016)
13. Corbin, J., Strauss, A.: Grounded theory research: procedures, canons and evaluative criteria. Qual. Sociol. 13, 3–21 (1990)

Exploring IoT User Dimensions
A Multi-case Study on User Interactions in ‘Internet of Things’
Systems
Helena H. Olsson1 ✉ , Jan Bosch2, and Brian Katumba1
(

)

1

2

Department of Computer Science, Malmö University, Nordenskiöldsgatan 1,
205 06 Malmö, Sweden
{helena.holmstrom.olsson,brian.katumba}@mah.se
Department of Computer Science and Engineering, Chalmers University of Technology,
Hörselgången 11, 412 96 Göteborg, Sweden
jan.bosch@chalmers.se

Abstract. ‘Internet of Things’ (IoT) systems are fundamentally changing the
way in which users interact and perceive technology. In this paper, we focus on
two of the numerous dimensions of IoT systems with which the users interact;
(1) the IoT user interface and (2) the IoT ecosystem. Based on literature, we
develop a model that identiﬁes how data is presented to users and how users
interact with the system, and the level at which systems interconnect with, and
collects data from, multiple systems. Companies can use the model to assess their
systems in order to identify the current state of their systems and to identify the
desired state. Moreover, the model can be used to better understand the steps
necessary for transforming from one dimension to another in order to develop
more advanced IoT systems.
Keywords: Internet of Things · User interface · Ecosystem · User value

1

Introduction

In the ‘Hype Cycle for Emerging Technologies’, the ‘Internet of Things’ (IoT) is
presented as the new digital business paradigm that will oﬀer fundamentally new ways
for service- and value creation [1–3]. With technologies allowing interconnectivity of
objects, unobtrusive user interfaces and embedded intelligence, IoT applications will
rapidly permeate our everyday lives by transforming the way we interact with informa‐
tion technology and our surrounding environment. Already now, we see examples of
how IoT technologies change everyday life. As one example, ‘smart cities’ is rapidly
transforming city planning, construction and infrastructure [4]. Another area that is
dramatically changing due to IoT technologies is the area of ‘smart homes’ [5]. Finally,
and as an area that is receiving increasing interest is the area of wellbeing, where IoT
applications help people monitor their own health and ﬁtness by collecting data on e.g.
number of steps, calories, heart rate and pulse etc. [6].

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 477–484, 2016.
DOI: 10.1007/978-3-319-49094-6_33

478

H.H. Olsson et al.

IoT systems involve numerous interesting dimensions that challenge existing views
on how users interact with these. In this paper, we focus on two of the many dimensions
that make IoT systems interesting from a user perspective: (1) the IoT user interface and
(2) the IoT ecosystem. We develop a model in which we deﬁne these two dimensions
and we explore how they inﬂuence user interaction and user value of IoT systems. We
evaluate the two dimensions in the model in ﬁve case companies, and we outline the
typical evolution path that companies take when transitioning towards developing more
advanced IoT systems.

2

Background: ‘Internet of Things’ (IoT)

IoT is the interconnection of uniquely identiﬁable embedded computing devices within
the existing Internet infrastructure [7–9]. The devices collect information through the
use of embedded sensors, and data is transferred without human-to-human or humanto-computer interaction. Based on the data that is collected, IoT systems dynamically
adjust to the environment and behaviors of which they are part, and allow for increasing
opportunities for intelligent identiﬁcation and monitoring of objects. IoT systems allow
many new opportunities in how to present information to users and how to have users
interact with the system [10, 11]. Today, users interact not only by touching, pointing
and scanning, but also by audio, video and gestures [12], and as recognized in [13–15],
the desktop metaphor with windows, icons and pointers is quickly being replaced with
smartphone user interfaces and touch screen input. By deﬁnition, IoT systems are
networks of interconnected objects [16]. The diversity of objects and devices oﬀers
endless opportunities for innovation of new products and services [17]. Jim Hunter [18]
presents a pyramid in which he outlines the diﬀerent needs of an IoT system. At the
bottom level in the pyramid, basic capabilities such as e.g. connectivity, power, security
and network are identiﬁed, and that must exist for any IoT system to be useful. At the
very top of the pyramid, capabilities that are realized when multiple systems interconnect
are identiﬁed. These capabilities allow IoT systems to learn about users, to share what
they learn with other systems and to predict user behaviors [13, 19].

3

The UDIT Model: ‘User Dimension In IoT’

Based on literature, and an explicit interest in the user interface and the ecosystem
dimension of IoT systems, we develop the UDIT model: ‘User Dimensions In IoT’
(Fig. 1). The model focuses on two of the many dimensions of IoT systems: (1) the IoT
user interface dimension and (2) the IoT ecosystem dimension. In our model, the ‘IoT
user interface’ dimension pictures the diﬀerent formats in which information is presented
to users, and the ways in which users interact with the system. The ‘IoT ecosystems’
dimension pictures the ways in which the system interconnects with external systems.
In Fig. 1, we present the model and the deﬁnitions of each dimension.

Exploring IoT User Dimensions

479

Fig. 1. The UDIT model: ‘User Dimensions In IoT’.

4

Method

This paper reports on a ﬁve-months multi-case study (August – December 2015) in ﬁve
companies developing IoT systems. In our study, we adopt an exploratory approach [20–
23], in which we develop and evaluate two dimensions of IoT systems. We develop a
model in which we picture two dimensions of IoT systems in a two-by-two matrix. A
draft of the model was presented to the case companies at a cross-company workshop
to get initial feedback on the two dimensions. To further evaluate the model, we met
with representatives from the case companies to conduct interviews where we asked
them to assess and map one selected system according to the two dimensions in the
model. The interviews were exploratory in nature [24]. All interviews were carried out
face-to-face and lasted for 1–1.5 h. A problem that has been identiﬁed in relation to
qualitative research is that diﬀerent individuals may interpret the same data diﬀerently
[25]. This problem was addressed by using a ‘venting’ method, i.e. a process whereby
interpretations are discussed with professional colleagues [26, 27]. The case companies
are brieﬂy outlined below:
Company A is a supplier of energy and energy related services. For the purpose of
this study, we met with the project manager for business innovation with a main respon‐
sibility in the areas of smart homes and sustainable cities.
Company B develops mobile phones, tablets, smart wear and associated devices that
enhance use and experience of information and communications technology for
consumers and businesses. For the purpose of this study, we met with one of the senior
research managers for technology research and advanced applications.
Company C oﬀers a wide portfolio of IP-based products and solutions for security
and video surveillance. For the purpose of this study, we met with two senior people
from the core technology group.

480

H.H. Olsson et al.

Company D oﬀers mesh network technology with a software package that enables
mobile devices to form instant networks. For the purpose of this study, we met with two
software developers.
Company E develops monitoring and alarm solutions for homes. For the purpose of
this study, we met with three people from the research and development unit.

5

Results

In this section, we summarize our empirical ﬁndings by presenting (1) the current state
of the systems in each company, and (2) the desired state and transition towards more
advanced IoT systems as identiﬁed in each company. Finally, we present the mapping
of the systems that was carried out by the interviewees.
Current state: Homogenous and static IoT systems
Company A. The application is a monitoring solution that helps users track and reduce
energy consumption. It is used in private households, in larger apartment buildings and
in large public buildings. The application uses two main sources of data i.e. an optical
reader and a smart plug reader device sensor. Currently, data is collected from a limited
number of internal sources, and it is not combined or merged with any external data
source.
Company B. The application is an activity tracker application that presents information
in the form of ‘tiles’ via a mobile phone app interface. The diﬀerent tiles present infor‐
mation such as e.g. running and walking activities, surﬁng the web, watching a ﬁlm,
sleep patterns, heart rate, pulse etc. Currently, data is collected primarily from a few
internal sources, and the company is actively looking into the opportunities to connect
to external sources to add user value and experience.
Company C. The product is a surveillance camera that streams video data from surveil‐
lance systems at airports and grocery stores etc., and stores it for further analysis.
Currently, data is collected from one source, and although functionality such as inter‐
connectivity to other systems can be added, this is not yet done.
Company D. The technology provides connectivity when there are no master nodes and
when network infrastructure is poor. Connection of devices is realized by broadcasting
and peer-to-peer connection in a mesh network. In providing instant connection to other
devices, the system has the opportunity to use data from several external sources.
Company E. The system is a home security and surveillance system that allows the user
to keep in contact with the home wherever you are. Through a mobile app, users can
switch oﬀ the coﬀee machine, get information about who is at home, switch on the home
alarm system, control locks, lights and electronic devices and household appliances.
Currently, users cannot inﬂuence the mode of presentation and the system does not
integrate with external systems.

Exploring IoT User Dimensions

481

Desired state: Heterogeneous and dynamic IoT systems
Company A. The primary goal is to have the system connect with other data sources
such as e.g. data collected in other households, weather forecast data, and social media.
This would increase the opportunities for correlation of data as well as community
building among users. Also, the company aims at developing functionality that provides
the users with proactive recommendations, and that allow comparisons to other house‐
holds to stimulate competition among households. The intention is to have an increas‐
ingly autonomous system that learns from previous experience and from other systems
and that, based on these insights, acts proactive to improve user behaviors.
Company B. The company aims at providing a more dynamic system in which the
diﬀerent tiles base their information on several data sources, and where the presentation
of data is continuously updated without any delays. In being a consumer product, the
user experience is a key factor. Therefore, the interviewee identiﬁes the ability for users
to customize the information, as well as the appearance of the tiles, as critical to increase
user value and maintain user experience over time.
Company C. In similar with company A and B, company C views the transition towards
a more heterogeneous system as the ﬁrst and most important step in advancing IoT
systems. As an example, a camera in a grocery store could monitor people, but also
availability of products, what groceries that need to be reﬁlled and with a connection to
the sales system it could issue orderings etc. As such, the system would increase eﬃ‐
ciency and productivity and they would help automate manual tasks.
Company D. The company aims at having the technology they provide become a
standard for mobile communication so that future mobile devices have the technology
as a standard option to connect to the Internet. Also, there is great potential in building
applications on top of the network in order to allow dynamic interaction among multiple
users, and with an interface that responds dynamically to any change in the network.
Company E. The company aims at having the system connected with other systems in
order to provide an integrated user experience. This would allow the company to collect
data from several sources, and to combine this data to enhance functionality and increase
the value. The company believes that a lot of additional functionality, such as e.g. audio
and video, could be added to the product to increase user value.
Mapping of IoT systems: The UDIT model
As part of the interview study, we asked the interviewees to map the system they had
selected according to the dimensions in the UDIT model. In Fig. 2 below, we present
this mapping. As can be seen, the majority of the systems are placed within the lower
left quadrant implying that the user interface dimension is static/standardized, i.e. the
systems present information in a ‘display’ or ‘dashboard’ format, and that users cannot
inﬂuence the way data is presented. Also, this quadrant implies that the systems use one
or a very limited number of data sources as input.

482

H.H. Olsson et al.

Fig. 2. Mapping of case study IoT systems.

6

Discussion

Based on our interviews, we see that the companies identify the transition towards
heterogeneous ecosystems as critical. The main reason for this is the access to multiple
data sources that, if connected, could leverage new user value and innovative business
opportunities. This will be critical to maintain user interest and value over time. Also,
a heterogeneous ecosystem that connects with, and collects data from, multiple sources
would allow increasing system automation. Going a step further, heterogeneous ecosys‐
tems allow for embedded analytics, logic and distributed intelligence to be shared [18].
In our case companies, the opportunity to have interconnected systems that learn from
each other, and that adjust based on input from each other, is seen as the next step towards
autonomous systems. With systems that learn from each other, and adjust accordingly,
many of today’s manual tasks and decisions can be reduced, and based on our research
this is equally important regardless of domain.
With regard to the user interface dimension, our study reveals interesting ﬁndings
in that companies strive for increasingly dynamic user interfaces, while they view
exploratory interfaces as less important. Although they all agree on that exploratory
interfaces are important for improving user experience, they emphasize that with
increasingly intelligent systems user interaction will, and should, be reduced over time
[28]. In contrast to the prevailing belief that more options, more interaction and more
customization opportunities drive user value, the companies we studied foresee
successful IoT systems as systems that over time require less human intervention.
Although the companies we studied target diﬀerent domains, we identify a number
of similarities in relation to what drives the transition towards more advanced IoT
systems. All companies view the transition towards heterogeneous ecosystems as driven
by the increasing desire for system automation and autonomy. In our study, this is
reﬂected in companies striving for interconnected systems and multi-source data collec‐
tion as the basis for automation of actions and system-initiated decision-making.

Exploring IoT User Dimensions

483

The transition towards dynamic interfaces is driven by the desire to have systems that
continuously update and optimize information in order to present accurate recommen‐
dations based on real-time data.

7

Conclusion

In this paper, we explore the user interface and the ecosystem dimension of IoT systems.
We develop a model that captures these dimensions and we evaluate the model in ﬁve
case companies. Based on our ﬁndings, we conclude that (1) companies transition
towards heterogeneous ecosystems to increase system automation and autonomy, (2)
companies transition towards dynamic user interfaces to improve system accuracy and
optimization, and (3) companies foresee future IoT systems as increasingly autonomous
systems for which the desire for user interaction will decrease over time.

References
1. Fenn, J., Raskino, M.: Mastering the Hype Cycle: How to Choose the Right Innovation at the
Right Time. Harvard Business Press, Boston (2008)
2. Gartner’s 2015 Hype Cycle for Emerging Technologies Identiﬁes the Computing Innovations
That Organizations Should Monitor. http://www.gartner.com/newsroom/id/3114217
3. Heather Levy: What’s New in Gartner’s Hype Cycle for Emerging Technologies (2015).
http://www.gartner.com/smarterwithgartner/whats-new-in-gartners-hype-cycle-for-emergingtechnologies-2015/
4. Yonezawa, T., Galache, J.A., Gurgen, L., Matranga, I., Maeomichi, H., Shibuya, T.: A citizencentric approach towards global-scale smart city platform. In: 2015 International Conference
on Recent Advances in Internet of Things (RIoT), pp. 1–6 (2015)
5. De Silva, L.C., Morikawa, C., Petra, I.M.: State of the art of smart homes. Eng. Appl. Artif.
Intell. 25, 1313–1321 (2012)
6. Kovatcheva, E., Nikolov, R., Madjarova, M., Chikalanov, A.: Internet of Things for wellbeing –
pilot case of a smart health cardio belt. In: Roa Romero, L.M. (ed.) XIII Mediterranean
Conference on Medical and Biological Engineering and Computing 2013, pp. 1221–1224.
Springer, Switzerland (2014)
7. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of Things: vision, applications
and research challenges. Ad Hoc Netw. 10, 1497–1516 (2012)
8. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision,
architectural elements, and future directions. Future Gener. Comput. Syst. 29, 1645–1660
(2013)
9. Chen, G., Huang, J., Cheng, B., Chen, J.: A social network based approach for IoT device
management and service composition. In: 2015 IEEE World Congress on Services
(SERVICES), pp. 1–8 (2015)
10. VentureScanner: The State of Internet of Things in Six Visuals. https://medium.com/
@VentureScanner/the-state-of-internet-of-things-in-six-visuals-a4b9cda3324c#.qu0y4pk2k
11. Kranz, M., Holleis, P., Schmidt, A.: Embedded interaction: interacting with the Internet of
Things. IEEE Internet Comput. 14, 46–53 (2010)

484

H.H. Olsson et al.

12. Rukzio, E., Leichtenstern, K., Callaghan, V., Holleis, P., Schmidt, A., Chin, J.: An
experimental comparison of physical mobile interaction techniques: touching, pointing and
scanning. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 87–104.
Springer, Heidelberg (2006). doi:10.1007/11853565_6
13. Rowland, C., Goodman, E., Charlier, M., Light, A., Lui, A.: Designing Connected Products:
UX for the Consumer Internet of Things. O’Reilly Media, Inc., Sebastopol (2015)
14. Gärdenfors, D.: All or nothing? Interfaces for the Internet of Things — The Conference.
https://medium.com/the-conference/all-or-nothing-interfaces-for-the-internet-of-things-15b
64bd04ae3#.k2iduusp3
15. Yau, S.S., Buduru, A.B.: Intelligent planning for developing mobile IoT applications using
cloud systems. In: 2014 IEEE International Conference on Mobile Services (MS), pp. 55–62
(2014)
16. Evans, D.: The internet of things: how the next evolution of the internet is changing everything.
CISCO White Pap. 1, 14 (2011)
17. Leminen, S., Westerlund, M., Nyström, A.-G.: Living labs as open-innovation networks.
Technol. Innov. Manag. Rev. 2 (2012)
18. Hunter, J.: The Hierarchy of IoT “Thing” Needs. http://social.techcrunch.com/2015/09/05/
the-hierarchy-of-iot-thing-needs/
19. Liu, Y., Zhou, G.: Key technologies and applications of Internet of Things. In: 2012 Fifth
International Conference on Intelligent Computation Technology and Automation (ICICTA),
pp. 197–200 (2012)
20. Dubé, L., Paré, G.: Rigor in information systems positivist case research: current practices,
trends, and recommendations. MIS Q. 27, 597–636 (2003)
21. Yin, R.K.: Case Study Research: Design and Methods. Sage Publications, Los Angeles (2009)
22. Benbasat, I., Goldstein, D.K., Mead, M.: The case research strategy in studies of information
systems. MIS Q. 11, 369–386 (1987)
23. Eisenhardt, K.M.: Building theories from case study research. Acad. Manage. Rev. 14, 532–
550 (1989)
24. Kvale, S.: Doing Interviews. SAGE, Los Angeles (2008)
25. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software
engineering. Empir. Softw. Eng. 14, 131–164 (2008)
26. Kaplan, B., Duchon, D.: Combining qualitative and quantitative methods in information
systems research: a case study. MIS Q., 571–586 (1988)
27. Goetz, J.P., LeCompte, M.D., et al.: Ethnography and qualitative design in educational
research. Academic Press, Orlando (1984)
28. Sarkar, C., Nambi, S.N., Prasad, R.V., Rahim, A.: A scalable distributed architecture towards
unifying IoT applications. In: 2014 IEEE World Forum on Internet of Things (WF-IoT), pp.
508–513. IEEE (2014)

Requirements and Quality

An Industrial Case Study on Measuring
the Quality of the Requirements Scoping
Process
Krzysztof Wnuk1(&), Markus Borg2,
and Sardar Muhammad Sulaman3
1

Software Engineering Research Lab, Department of Software Engineering,
Blekinge Institute of Technology, Karlskrona, Sweden
krzysztof.wnuk@bth.se
2
SICS Swedish ICT AB, Lund, Sweden
markus.borg@sics.se
3
Department of Computer Science, Lund University, Lund, Sweden
sardar@cs.lth.se

Abstract. Decision making and requirements scoping occupy central roles in
helping to develop products that are demanded by the customers and ensuring
company strategies are accurately realized in product scope. Many companies
experience continuous and frequent scope changes and fluctuations but struggle
to measure the phenomena and correlate the measurement to the quality of the
requirements process. We present the results from an exploratory interview
study among 22 participants working with requirements management processes
at a large company that develops embedded systems for a global market. Our
respondents shared their opinions about the current set of requirements management process metrics as well as what additional metrics they envisioned as
useful. We present a set of metrics that describe the quality of the requirements
scoping process. The ﬁndings provide practical insights that can be used as input
when introducing new measurement programs for requirements management
and decision making.
Keywords: Requirements
improvement

engineering


Software

metrics


Process

1 Introduction
Requirements Management (RM) [4] iteratively integrates the requirements elicitation
and analysis results into the project management and development flows. RM also
supports managing requirements during the product lifecycle and between the products.
Large, globally operating software companies need to manage large quantities of
features and requirements that continuously arrive from ever-changing markets [10].
Measuring and optimizing requirements identiﬁcation, prioritization, deﬁnition and
implementation processes is, in a market-driven context [10], crucial for achieving and
sustaining competitive product growth [5].
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 487–494, 2016.
DOI: 10.1007/978-3-319-49094-6_34

488

K. Wnuk et al.

The process of selecting a subset of requirements for implementation within a given
project is called scoping. Many software-intensive companies increase the flexibility of
decision making by allowing scope fluctuations. Our previous work highlighted that
large companies experience frequent scope fluctuations and have limited support in
scope management [14]. The resulting late changes increase the need for improved
monitoring and management capabilities that can evaluate the adequacy of the selected
requirements management process models. Despite that, most published work on
requirements measurement focus on the attributes of requirements [2] rather than the
requirements management process [1]. Some published process metrics include:
(i) how much value a software team delivers in every iteration [3], (ii) the number of
requirements awaiting analysis, prioritization or decision, and (iii) the lead time in each
state for each user story [7].
In this paper, we present the results from an exploratory interview-based case study
with 22 participants working with the requirements management process at a large
company that develops embedded systems for a global market. During semi-structured
interviews with mostly senior-level practitioners working with requirements gathering,
prioritization, scope management, software resource planning and high-level management, we investigated the following two research questions:
RQ1: What are the current scope management process quality metrics used by the case
company?
RQ2: What scope management process quality metrics would the practitioners like to
implement in the future?

2 Case Company
The case company is a large (5,000 employees) organization active in the telecommunication domain, developing embedded systems for the global consumer market. As
the inflow of new requirements is rapid, product management often needs to make
unplanned scoping decisions [14]. The company utilizes the Software Product Lines
concept [9] where different development projects contribute to an evolving common
code base, also called a platform. The total number of features registered in the
company’s database exceeds 10,000 and is steadily growing as new products are added
to the product line, each containing on average 60 to 80 new features and associate up
to 20 system requirements per feature. Feature implementation is allocated to
approximately 20 to 25 development teams (each team has from 40 to 80 developers).
Features are managed based on a state machine depicted in Fig. 1. When features
are created, they are put into an administrative state called New Feature (NF). In the
next step, features enter the process and are discussed at the M0 forum. This forum
critically reviews if a proposed feature has a sponsor, sufﬁcient business justiﬁcation
and is aligned with the current product and portfolio strategy. Many features are
rejected at this stage mainly due to insufﬁcient business justiﬁcation or unclear deﬁnition. Next, a feature is promoted to the M1 state where it is prioritized against other
features by scope owners using a one-dimensional prioritization based on business
value. A feature could be returned to the M0 state for further reﬁnement.

An Industrial Case Study on Measuring the Quality of the Requirements

489

Fig. 1. The example history of three features. The ﬁrst feature, marked with dashed lines was
implemented. The second feature marked with solid arrows was withdrawn. The third feature
marked with dashed-dotted arrows was discarded. Also available at [13].

At the next stage, called M2 in Fig. 1, the development resources are consulted and
implementation schedules are discussed and agreed upon. Each feature comes to this
forum with a target delivery date that is discussed and adjusted depending on the
current software development organization load and other responsibilities. Prototypes
are used at this stage to provide more accurate effort estimates and possible delivery
times. A pipeline tool is used at the M2 stage to control the resources and to schedule
delivery of several hundreds of implementation running in parallel. After the M2 state,
the development organization takes the main responsibility for the features which are
promoted to the: Deﬁnition Ongoing (DO), Awaiting Execution (AE), Execution
Started (ES), Execution Completed (EC), Awaiting Integration (AI) and Integrated (I).
Transitions between any two states are in theory allowed, including backward transitions. However, there is one optimal path without backward transitions through a state
machine which is visualized with a dashed line in Fig. 1.

3 Research Methodology
To gain deep understanding and explore different requirements management metrics at
the case company, a flexible case study design was chosen [11] with semi-structured
interviews as the method for data collection.
We started phase 1 by iteratively developing an interview instrument in collaboration with four practitioners from the case company. Finally, two senior software
engineering researchers and two practitioners reviewed the 17 questions and they were
grouped into 6 topics: background, business goals, current metrics, desired metrics,
visualization, and open innovation. The interview instrument can be accessed online
[12]. Note that the results reported in this paper are exclusively related to questions
under topics 3 and 4 as well as background questions under topic 1. The remaining
topics are covered in a separate publication [15].
In the next step, we selected interview respondents by using a combination of
maximum variance and convenience sampling [11] to cover as many views on
the requirements management process as possible. Twenty-two respondents participated in the study. Their average experience in working with requirements processes

490

K. Wnuk et al.

was 6.5 years with the most experienced participant having almost 13 years of experience and the least experienced participant having about 3 years of experience [12].
Prior to the interviews, we sent the questions to the participants to help them
understand the scope of the study and prepare for the discussions under the interviews.
The ﬁrst author then interviewed all participants individually, recorded and transcribed
the interviews. The transcripts were sent to the interviewees to validate the content, and
to enable clariﬁcations where needed.
As the industry partner requested a quick summary of the ﬁndings, we concluded
the ﬁrst phase with ﬁve senior managers in a seminar. During this seminar, the ﬁrst
author presented preliminary results from an initial analysis of the data. The seminar
delivered early tangible outcomes and the discussions at the seminar also acted as a
validation, i.e. a sanity check that the direction of our work was promising, and
motivated the deeper analysis in phase 2.
Phase 2 involves the four steps of the systematic data collection and analysis. First,
the ﬁrst author divided the transcripts into chunks of text containing a few connected
sentences. The second author then repeated the process for 4 of the interviews (21 %),
to validated that we had a reasonable level of granularity. The authors compared the
chunk sizes, and agreed on simple rules resulting in the creation of a chunk for each
relevant proposition (i.e. what is believed, doubted, etc.) expressed in the interviews.
The ﬁrst author then reiterated the remaining chunks to apply the rules.
In the second step, the ﬁrst and second authors collaboratively analyzed 11 of the
interviews (48 %) with the goal of developing a robust coding scheme. The ﬁrst and
second authors then independently coded the remaining 12 interviews (52 %). The
authors calculated an inter-rater agreement using Cohen’s Kappa [6] on the coding
results. We achieved a Kappa score of 0.59, which we interpret as moderate agreement
on the coding scheme.
In the third step of phase 2, we analyzed the coded data. The output from the coding
step was synthesized by the ﬁrst and second authors, and reviewed by the third author
to provide further validation and observer triangulation. Finally, all authors prepared
the manuscript for this research article.
Validity. We discuss validity issues based on the guidelines by Runeson et al. [11].
We attempted to mitigate the interpretive validity threats by asking interviewees to
check the interview transcripts. Threats to evaluative validity are not applicable in this
case due to exploratory nature of the study and a lack of evaluative purpose. Threats to
description validity were addressed by recording the interview sessions and transcribing them. The transcripts were sent back to the interviewees for validation. Threats
to theoretical validity have a minimal impact on this study due to its explorative nature
and therefore a lack of theory, speciﬁc hypotheses, or conceptual frameworks to be
validated. Moreover, we minimized the bias of unclear questions by iteratively
developing the interview guidelines. The questions were formulated in a way to
minimize the possibility of imposing a particular answer. We took precautions that the
interviewer expressed neutrality when asking the questions and therefore the risks of
reflexivity are minimized.

An Industrial Case Study on Measuring the Quality of the Requirements

491

Due to an exploratory nature of this study, exploring to what extent our conceptualizations and conclusions derived from the interviews are correct remains to a certain
degree unclear and calls for inspection by other researchers in the ﬁeld as well as
follow-up studies. Since the investigated problem originates from the case company,
we can for sure claim that it is an authentic research problem.
We report that both internal and external generalizability are strongly limited in
this case, mainly due to only one company involved. The paper’s exclusive focus on an
individual company narrows the applicability of the observations. Nevertheless, we
attempted to gather as many perspectives as possible on the studied phenomenon by
inviting participants with various roles and experiences from the case company.

4 Results
Table 1 summarizes the 26 scope management process quality metrics identiﬁed in the
study. Among them, only ﬁve metrics are measured and 21 are needed or requested
metrics. The ﬁve currently used metrics are: the number of backward transitions (Q1)
and their reasons (Q2), the software design quality and if the process actually prioritizes
the most important features from the portfolio planning (Q4) and customer perspectives
(Q5).
The requested metrics include the impact of priorities on the lead-times (Q6 and
Q7) as well as the impact of high priority features on low priority features (Q8). The
accuracy of estimates and its impact on the efﬁciency of requirements analysis or
deﬁnitions (Q9 and Q12) clearly indicate that focusing plainly on speed may not give
the desired effects as quality of the work should not be compromised.
Several metrics also describe the features and their nature in terms of testability or
complexity, e.g. Q10 and Q11. These metrics should be introduced during the
requirements analysis phase and used as extensions to the widely accepted aspects, e.g.
correctness, ambiguity or completeness. Q10 and Q11 further detail requirements on
system test metrics suggested by Petersen and Wohlin [8].
Metrics Q14 and Q15 focus on how many times or why a feature was sent back in
the process due to unclear information. This indicates that some stages of the process
may either not do their work rigidly or receive appropriate input from earlier stages thus delivering requirements of insufﬁcient quality.
Similarly, our respondents would like to measure how many times a feature is
moved between the releases (Q13) and why they were moved, which could indicate
either: i) issues with accurate release planning or ii) several strategic changes after the
release plan is agreed upon. Metric Q13 provides interesting input for the iterative
release planning approaches that are based on continuous release re-planning and timely
responses to a frequently changing market situation. The number of release changes
could be correlated with how many times previously set delivery dates are altered (Q25).
Two requested metrics focus on the “waste” generated by analyzing unimplemented features (Q18 and Q19) while one metric (Q20) focuses on the effort saved on
unimplemented features or software deﬁnitions in relation to the previous (waterfall)
way of working. Two other requested metrics correlate the deﬁned scope with the
overall product strategy (Q24) or increased sales from successful products (Q23).

492

K. Wnuk et al.
Table 1. Elicited metrics. Respondents are coded with Greek alphabet letters.

ID

Metric deﬁnition

Mentioned or need
for/respondent

Q1
Q2
Q3

How many times a feature is sent backwards in the process
Why features are sent backwards in the process from the M2 forum
The quality of the software design and the associated user interaction
features
The overlap or potential discrepancies between the early product
deﬁnitions from the portfolio planning and the product scope at the TG
Commit
Priority levels of highly requested features by various stakeholders
The correlation between the priorities set and the time needed to
implement the features or the time needed for deﬁnition or
implementation
The frequency of priority changes in relation to the dev. performance
How new-coming highly-prioritized features impact low-priority features
(e.g. low-prioritized get delayed)
The accuracy of the estimates in relation to the efﬁciency of the process
The testability of the “vertical features” (features involving several
technical areas) and their impact on various technical areas
The complexity of the features that are sent to the deﬁnition (cf. DO in
Fig. 1) in terms of their impact on other organizations
The quality of feature deﬁnitions and estimates
How many times features are moved between the releases and why

Measured, ZETA
Measured, RHO
Measured, TAU

Q4

Q5
Q6

Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14
Q15
Q16
Q17
Q18

Q19
Q20
Q21
Q22
Q23
Q24
Q25
Q26

Measured, SIGMA, ETA

Measured, KAPPA
Need for, GAMMA,
LAMBDA
Need for, PI
Need for, RHO
Need for, ZETA
Need for, THETA
Need for, THETA

Need for, KAPPA
Need for, BETA, ETA,
KAPPA, MY
How many times (and why) the features are send back from the M1 to Need for, ALFA and CHI
the M0 forum
How many times (and why respondent CHI) features are send back from Need for, CHI, JOTA,
M2 to M1 forum (respondents JOTA and LAMBDA)
LAMBDA
How many time a feature is resubmitted at the M0 forum due to unclear Need for, LAMBDA
information or quality issues
The reasons why features are send back from the M0 forum to
Need for, RHO
redeﬁnitions
Need for, PHI, CHI
How much “waste” the process is producing (analyzed but
unimplemented features, e.g. how many features are withdrawn at each
stage)
The “waste of the scope” after the features are promoted to the deﬁnition Need for, KSI
The effort saved on unimplemented features or software deﬁnitions in Need for, PHI
relation to the previous (waterfall) way of working
The stability of the scope after the customer acceptance test
Need for, KAPPA, NY
How many changes to the OSS code each feature requires and when to Need for, PI
share these changes with the open source community
If the planned scope later implemented in the products is meeting the set Need for, ETA
sales and customer satisfaction business targets
To what degree the features that are created in the process reflect the
Need for, PI
overall strategy of the company
How many times previously set delivery dates altered (caused by e.g. Need for, ETA
resource shortages or changed priorities)
The percentage of effort put on legacy work
Need for, CHI

An Industrial Case Study on Measuring the Quality of the Requirements

493

Finally, the amount of changes in the open source code each feature requires (Q22)
in combination with the percentage of effort put on legacy work while developing new
features (Q26) can bring interesting insights regarding the selected sourcing strategy
and also suggestions about the amount of open source code in the product.

5 Implications and Conclusions
Our study delivers several implications for research and practice. Firstly, the fact that
we elicited 26 quality metrics is a clear indication that it is challenging to come up with
an accurate set of metrics to capture the important aspects the requirements management process. Secondly, the fact that 21 identiﬁed (required) quality metrics were
collected brings a possible interpretation that more focus should be directed towards
complementing efﬁciency metrics with quality metrics. For example, quickly delivering
features with minimal process waste is highly desired, as long as these features will
provide value to the end customers and positively realize product strategies, see for
example metrics Q23 and Q24.
Thirdly, requirements prioritization for agile development should go beyond popular one-dimensional priority or urgency lists and be correlated with measures that take
the holistic perspective on prioritization (see e.g. metrics Q5, Q6 and Q8) and integrate
it with product and portfolio planning.
Fourthly, measuring lead-times and delays on the interface between the requirements and development organizations appears to be equally important as measuring the
requirements process lead-times. Additional signiﬁcant factor is to measure backward
transitions and understand why they happen (see metrics Q1, Q2, Q14, Q16 and Q17)
or transitions between the releases (Q13). Fifthly, measuring the number of features in
each state should be complemented with the derived measures of the ratios between the
features in two states. This provides useful indications for rapid identiﬁcation of process bottlenecks.
In future work, we plan to create a conceptual model of measuring and tracking
potential waste in requirements management and decision making processes. Moreover, we plan to conduct additional case studies at other companies that record the
information during their requirements management processes. Such empirical studies
would help us in expanding our knowledge about the applicability of our model.
Acknowledgements. This work is supported by the IKNOWDM project from the Knowledge
Foundation in Sweden (20150033).

References
1. Ambriola, V., Gervasi, V.: Process metrics for requirements analysis. In: Conradi, R. (ed.)
EWSPT 2000. LNCS, vol. 1780, pp. 90–95. Springer, Heidelberg (2000). doi:10.1007/
BFb0095017
2. Costello, R.J., Liu, D.-B.: Metrics for requirements engineering. J. Syst. Softw. 29(1), 39–63
(1995)

494

K. Wnuk et al.

3. Feyh, M., Petersen, K.: Lean software development measures and indicators - a systematic
mapping study. In: Fitzgerald, B., Conboy, Kieran, Power, K., Valerdi, R., Morgan, L., Stol,
K.-J. (eds.) LESS 2013. LNBIP, vol. 167, pp. 32–47. Springer, Heidelberg (2013). doi:10.
1007/978-3-642-44930-7_3
4. Hood, C., Wiedemann, S., Fichtinger, S., Pautz, U.: Requirements Management: The
Interface Between Requirements Development and All Other Systems Engineering
Processes. Springer, Heidelberg (2007)
5. Höst, M., Regnell, B., Natt och Dag, J., Nedstam, J., Nyberg, C.: Exploring bottlenecks in
market-driven requirements management processes with discrete event simulation. J. Syst.
Softw. 59(3), 323–332 (2001)
6. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data.
Biometrics 33, 159–174 (1977)
7. Mujtaba, S., Feldt, R., Petersen, K.: Waste and lead time reduction in a software product
customization process with value stream maps. In: 21st Australian Software Engineering
Conference (ASWEC), pp. 139–148 (2010)
8. Petersen, K., Wohlin, C.: Measuring the flow in lean software development. Softw. Pract.
Exp. 41, 975–996 (2010)
9. Pohl, K., Böckle, G., van der Linden, F.J.: Software Product Line Engineering: Foundations,
Principles and Techniques. Springer, New York (2005)
10. Regnell, B., Svensson, R.B., Wnuk, K.: Can we beat the complexity of very large-scale
requirements engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008. LNCS, vol. 5025,
pp. 123–128. Springer, Heidelberg (2008). doi:10.1007/978-3-540-69062-7_11
11. Runeson, P., Höst, M., Rainer, A., Regnell, B.: Case Study Research in Software
Engineering Guidelines and Examples. John Wiley & Sons, Hoboken (2012)
12. The interview instrument can be accessed at http://serg.cs.lth.se/ﬁleadmin/serg/
InterviewQuestions.pdf
13. Wnuk, K., Gorschek, T., Callele, D., Karlsson, E.-A., Regnell, B., Ahlin, E.: Supporting
scope tracking and visualization for very large-scale requirements engineering-utilizing FSC
+, decision patterns, and atomic decision visualizations. IEEE Trans. Softw. Eng. 42, 47–74
(2016)
14. Wnuk, K., Regnell, B., Karlsson, L.: What happened to our features? Visualization and
understanding of scope change dynamics in a large-scale industrial setting. In: 17th IEEE
International Requirements Engineering Conference, RE 2009, pp. 89–98 (2009)
15. Wnuk, K., Pfahl, D., Callele, D., Karlsson, E.A.: How can open source software
development help requirements management gain the potential of open innovation: an
exploratory study. In: ACM-IEEE International Symposium on Empirical Software
Engineering and Measurement (ESEM), pp. 271–279 (2012)

Quality Rule Violations in SharePoint Applications:
An Empirical Study in Industry
Apostolos Ampatzoglou1 ✉ , Paris Avgeriou1, Thom Koenders2, Pascal van Alphen2,
and Ioannis Stamelos3
(

1

)

Department of Computer Science, University of Groningen, Groningen, Netherlands
apostolos.ampatzoglou@gmail.com, paris@cs.rug.nl
2
SharePoint Department, Capgemini Netherlands, Utrecht, Netherlands
thomkoenders@gmail.com, pascal.van.alphen@capgemini.com
3
Department of Computer Science, Aristotle University, Thessaloniki, Greece
stamelos@csd.auth.gr

Abstract. In this paper, we focus on source code quality assessment for Share‐
Point applications, which is a powerful framework for developing software by
combining imperative and declarative programming. In particular, we present an
industrial case study conducted in a software consulting/development company
in Netherlands, which aimed at: identifying the most common SharePoint quality
rule violations and their severity. The results indicate that the most frequent rule
violations are identiﬁed in the JavaScript part of the applications, and that the
most severe ones are related to correctness, security and deployment. The afore‐
mentioned results can be exploited by both researchers and practitioners, in terms
of future research directions, and to inform the quality assurance process.
Keywords: Quality assessment · Defect prediction · Sharepoint

1

Introduction

Although some organizations have such unique business processes that urge for custommade software solutions, in most cases, business processes are fairly typical; therefore,
their needs can be accommodated using more standardized solutions (e.g., by reusing
existing commercial-oﬀ-the-self—COTS software). This has led to a shift from devel‐
oping custom-made solutions to standardized solutions that are based on modular devel‐
opment. A prominent way to develop such solutions is SharePoint, in which standardized
COTS are combined to oﬀer the required functionality. These standardized modules can
be conﬁgured to target and facilitate the speciﬁc needs of diverse organizations. The
conﬁguration of such solutions can be performed by using a declarative programming
language (e.g., XML). Furthermore, SharePoint can be extended with custom modules
and code, so as to provide functionality that is not found in the oﬀered COTS. This way,
solutions can be created by combining standardized oﬀ-the-self components and organ‐
ization-speciﬁc modules (developed using an imperative language—e.g., C#).
Similarly to conventional software development maintainability of SharePoint appli‐
cations is of paramount importance, since maintenance of a software system is
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 495–505, 2016.
DOI: 10.1007/978-3-319-49094-6_35

496

A. Ampatzoglou et al.

considered as the most eﬀort-intensive part of the software development lifecycle [11],
urging organizations to increase maintainability to reduce the overall maintenance
expenditures [1]. In this paper we study a particular aspect of maintainability, namely
source code quality. Source code quality has been assessed in many ways in the litera‐
ture: through software metrics [2], number of defects [10], number of vulnerabilities [3],
etc. Focusing on the number of vulnerabilities (also known as quality rule violations)
has certain beneﬁts: (a) they are more easily interpreted compared to metrics, since they
are targeting speciﬁc lines of code and the way to resolve them is simple; and (b) they
can be handled at a pre-deployment phase (in contrast to defects), thus they do not reach
the attention of the customer/end-user. A common practice for identifying such rule
violations is code reviews [6], which can be performed manually (code inspection), or
with tool-support (tool-assisted code review). The latter enables not only distributed
reviewing, but also improves both the quality and the quantity of reviews [7].
The goal of this paper is to investigate the source code quality of SharePoint appli‐
cations, through tool-assisted code reviews. Speciﬁcally, we aim at investigating which
rules: (a) are more frequently violated; and (b) are the most severe ones depending on
certain characteristics. The identiﬁcation of the most frequently violated rules can
provide insights into what are the most common “programming mistakes”. Among
those, special attention should be given to the rules that are most severe, as well as those
that are more probable to lead to defects, as the number of defects is crucial for the
success of a software system. To achieve this goal, we performed am industrial case
study, based on the guidelines provided by Runeson et al. [9] (more details in Sect. 3).

2

Related Work

In this section we present research eﬀorts that can be characterized as related work to
our study. We note that, to the best of our knowledge, there are no available studies in
SharePoint quality assessment. Therefore, we present related work that uses rule viola‐
tions as a means for quality assessment. Quality rule violations have been extensively
investigated as indicators of quality in the software engineering literature. For example,
Misra and Bhavsar [8] have explored rule violations as indicators for correctness, and
Zaman et al. [12] have explored them as indicators for security and performance. When
using rule violations to quantify quality, it is a common practice to classify them into
categories. Zaman et al. [12] classiﬁed rule violations according to their eﬀect on speciﬁc
QAs (e.g., security and performance). Therefore, to evaluate software projects with
respect to their quality, one can perform static analysis by collecting the number of
violated rules. One of the most established tools that is used for this purpose is FindBugs.
FindBugs is capable of detecting vulnerabilities in software by using bug patterns [4],
divided into ﬁve categories (in total 246 bug patterns) that can be mapped to: correctness,
performance, and security. In this study, since FindBugs is not applicable for SharePoint
applications, we used SPCAF (see Data Collection and Analysis in Sect. 3). Other tools
that perform such analysis are PMD and CPPcheck.

Quality Rule Violations in SharePoint Applications

3

497

Case Study Design

Objectives and Research Questions: The goal of the study is described using the GoalQuestion-Metric (GQM) approach, as follows: “analyze quality rule violations for
SharePoint applications for the purpose of evaluation, with respect to their (a) frequency
of occurrence, and (b) severity according to certain characteristics, from the viewpoint
of software engineers, in the context of SharePoint application development”. Based to
the aforementioned goal, we derived two research questions that guide the design and
reporting of the case study.
RQ1: What are the most frequently violated rules in SharePoint applications?
This research question aims at identifying the rules that are most frequently violated
in real-world SharePoint applications. Existing tools for SharePoint quality assurance
are capable of assessing approximately 400 predeﬁned quality rules, so we will explore
which ones are more frequent. In addition, we investigate the frequency of rules at
diﬀerent levels of criticality (e.g., warning, error, critical error, etc.)
RQ2: What is the severity of SharePoint rule violations?
SharePoint quality rules can be classiﬁed based on several characteristics (e.g., some
rules are related to correctness). Before speciﬁc defects can be detected, it is important
to determine what kinds of rules are more severe, (diﬃcult to identify or critical). There‐
fore, this research question is focused on the orientation of preventive maintenance
activities (i.e., the ones that aim at identifying and correcting latent faults in the software
product before they become eﬀective faults [11]).
Case Selection: This study involves different cases for each research question. For
RQ1, as cases we used three projects developed at Capgemini. The three projects have
been selected so as to belong to a different production stage (ranging from ‘under
development’ to ‘production-ready’ versions). The projects are referenced as ProjectA, Project-B, and Project-C for confidentiality reasons. Project-A was developed
internally by Capgemini. At the time of the analysis, the project was still in its early
stages of development. Project-B was developed by an external organization and
Capgemini was managing the code-base. At the time of the analysis, this code-base
was ready for use in a production environment according to the external organization
that was responsible for the development. Finally, Project-C had been developed
internally by Capgemini. At the time of the analysis, the code-base was being used
in a production environment. Concerning RQ2, we conducted a supervised survey
with SharePoint experts, so as to investigate the relationship of quality rules and
defects. The term supervised survey [5] refers to the process during which an inter‐
view takes place, but with a specific data collection instrument (questionnaire) with
mostly closed questions that the responded would be able to complete without any
guidance. Nevertheless, in supervised surveys the researcher is in the same place as
the subjects to provide possible clarifications. The questionnaire has been given to
10 SharePoint experts of Capgemini Netherlands.
Data Collection and Analysis: The case study has been performed within Capgemini,
which is an international corporation primarily focused on providing IT services, which

498

A. Ampatzoglou et al.

is present in over 40 countries with more than 180,000 employees. At this point,
concerning SharePoint solutions, the quality assurance process is in a research stage.
Therefore, this project was of great interest to the company.
Used Tool—Code quality analysis for SharePoint is still in its infancy. An initial
research on the state-of-practice on this topic, unveiled that there is only one tool that
is in front of competition, namely SPCAF1. SharePoint Code Analysis Framework
(SPCAF) is a commercial set of four tools, specialized for SharePoint applications, each
one focused on diﬀerent aspects of code quality in SharePoint. From these tools, since
we are interested in identifying rule violations, we used only SPCop, which validates
both the imperative and declarative code. The tool is oﬀered with several predeﬁned
rule sets, each set consisting of a certain group of rules (e.g., the “All Rules” set includes
all of the rules2). If a violation of one of the rules is encountered, the occurrence is added
to the rule violations report. The rule violations are listed using the title of the rule that
is violated and the location (ﬁle and line number), where the violation was discovered.
For the purpose of our study, we used the predeﬁned set of analysis rules (i.e., those that
would be selected without any tool customization), to increase replicability of the study,
and mitigate researcher bias (which would be raised if any selection was made by the
authors). The rules that we used in our case study identify quality rule violations of the
following categories: correctness (cor), supportability (sup), deployment (dep), security
(sec), design (des), best practices (pra), memory (mem), naming (nam), localization
(loc), and JavaScript (jsh).
Data Collection Instruments from Experts—The outcome of using certain constructs
in SharePoint can be diﬃcult to predict, because some aspects are hard to assess, unless
they have been previously encountered. Therefore, we decided to collect knowledge
from experts to ﬁll in these uncertainties, and in addition to validate the information and
conclusions. We elicit data from experts using one questionnaire, aiming at answering
RQ2. The questionnaire consisted of ﬁve questions, each inquiring about the severity of
a certain SharePoint rule type, and an initial one aiming at understanding which part of
SharePoint code is more defect prone.
[q1] SharePoint projects are built using two types of code. The ﬁrst type being the
imperative code, which is the C# code. The second type being the declarative code,
which is the XML code. Which of these types of code do you consider more prone
to produce defects (Imperative code, Declarative code, or Equal)?
[q2] The absence of referenced resources will most probably be followed by rule
violations. A missing resource should be easily detectable. From this perspective,
how severe do you consider this kind of rule violations (“Not severe at all consid‐
ering it is easily detectable”, “Moderately severe, it may be easy to detect, but that
does not make the rule less important”, or “Highly severe, it is important such a
rule violation is detected and solved as soon as possible”)?
[q3] On the contrary, having too many resources might also result in rule violations.
In some cases SharePoint deploys prohibited assemblies, e.g. “ssocli.dll”, or

1
2

http://www.spcaf.com.
Rules are available in https://docs.spcaf.com/v6/SPCAF_PAGE_QUALITY.html.

Quality Rule Violations in SharePoint Applications

499

includes the same assembly in diﬀerent conﬁguration ﬁles. How severe do you
consider this (the options are the same as q2)?
[q4] In the XML conﬁguration ﬁles, a lot of required attributes have to be deﬁned. Not
ﬁlling in these required attributes might result in rule violations. How would you
value assistance on this kind of violations (“Very important, I consider these rules
to be severe”, “Moderately important, assistance would help me solve these rule
violations faster”, or “Do not need assistance, these rule violations are easy
enough to solve on my own”)?
[q5] A possible aspect of security related rule violations is that they may not be as
detectable as other violations, since security-related rules do not have to cause
crashes, but result in unwanted behavior that is more diﬃcult to discover. These
rule violations might pose a big threat. How do you value security related rule
violations that produce this unwanted behavior (“Highly severe, any assistance in
this ﬁeld would be highly helpful since these rule violations are hard to detect”,
“Moderately severe, this kind of rule violations are uncommon”, or “Not severe,
these kind of rule violations are no problem at all”)?
[q6] The previous questions have covered the following kinds of rule violations: (a)
Missing resources, (b) Having too many resources, (c) Missing attributes in XML,
and (d) Security issues. Are there categories of rule violations that have not been
covered by the previous questions? If so, what types of rules do you feel are not
represented (This is an open-ended question)?
Data Analysis: The research questions have been answered by using descriptive statis‐
tics. In particular we have used frequency tables, and bar charts for all questions. For
RQ2, in case subjects provided some qualitative data, we tried to analyze and synthesize
their answers using semantic analysis. Nevertheless, this process was very simple due
to the low number of responses and the similarity of answers.

4

Results

In this section we present the results of our data analysis, organized by research question.
Implications to researchers and practitioners are provided in the discussions section (see
Sect. 5). Figure 1 presents the number of violations identiﬁed in each project. From the
ﬁgure, we can observe that the three studied projects are diﬀerent, providing broad and
representative data. Additionally, regardless of the maturity of the code-base, the tool
was able, in all three cases, to provide valuable data on how to improve the source code
quality.

500

A. Ampatzoglou et al.

Fig. 1. Demographics on Rule Violations Criticality

Rule Violations Frequency: In Table 1, we present the most frequent rule violations
in the three examined projects. In particular, in Table 1 we present the category, the
criticality, the name, and the occurrence frequency of the most recurrent rule violations.
We note that each rule has a single criticality, regardless of the context in which it is
used.
Table 1. Most Frequent Rule Violations.
Category
loc
jsh
jsh
jsh
jsh
jsh

Criticality
W
W
W
W
W
W

sec

CW

jsh
nam

W
W

sec

CW

Name
Use resources for localizable attributes
Use curly braces around blocks
Use correct === and ! ==
Declare variable before it is used
Avoid trailing whitespaces
Do not exceed max length of a line in
code
Avoid usage of “RunWithElevatedPri‐
vileges”
Remove unused variables
Files / Folders should contain the name
of the parent solution
Avoid setting “AllowUnsafe Updates”
on SPWeb

Frequency
1642
745
700
430
379
324
196
189
146
142

From Table 1, two observations can be made: (a) jsh is the category, in which the
most frequent rule violations can be identiﬁed, and (b) the top-10 most frequent rule
violations are Warnings or Critical Warnings. The ﬁrst column of the table shows that
six out of the ten most recurring rule violations are found in the JavaScript (jsh) category.
This means that an important percentage of the identiﬁed rule violations is found in the
JavaScript code. However, this does not necessarily mean that violating these rules has
a negative impact on the external behavior of the system, since some of these rules focus
only on code conventions. Nevertheless, it still is interesting to take such an observation

Quality Rule Violations in SharePoint Applications

501

into account, when optimizing and improving code quality. Additionally, two out of the
ten most frequent rule violations are critical warnings, while the remaining eight are
warnings. Therefore, there are no errors in the ﬁrst positions of the rule violation
frequency table. By inspecting the complete list of rule violations (omitted from this
manuscript due to space limitations), it becomes apparent that the ﬁrst twenty-two rule
violations are either critical warnings or warnings.
Table 2. Most Frequent Errors caused by Rule Violations
Category
cor
dep
cor
dep
cor
dep
sup
sup
cor
cor

Criticality Name
E
Deﬁne attribute ‘ID’ in FieldRef in
correct casing
E
Do not deploy assembly multiple times
E
Declare required attributes in schema of
ListTemplate
E
Do not deploy assembly with DEBUG
mode
CE
Deﬁne unique value for ‘Id’ in
CustomAction
E
Do not deploy TemplateFile multiple
times
CE
Do not access SharePoint API via
reﬂection
E
Do not read ConnectionString from
SPContent-Database
E
Declare required attribute in
CustomAction
E
Declare required attributes in
SiteDeﬁnition

Frequency
50
28
27
24
22
10
6
6
4
4

By inspecting the top-10 most encountered errors (see Table 2), one can highlight
the following:
• The number of occurrences of the ten most occurring critical errors and errors
is relatively low, especially when compared to the amount of occurrences of the ten
most frequently occurring rule violations. However, each of these errors has a higher
potential to cause system crashes or unexpected behavior. Therefore, the information
that is provided by these reports can be considered valuable, since they directly
provide information on aspects that require immediate attention.
• The other aspect that is diﬀerent between the ten most frequently occurring rule
violations and the ten most recurring errors are the categories of the violations. The
list of top-10 most frequently occurring rule violations was dominated by the Java‐
Script (front-end) category, whereas the top-10 most recurring errors is identiﬁed
in the SharePoint backbone categories (i.e., correctness and deployment). A
possible explanation for this is that rules of the JavaScript category rule violations
cannot have a large negative inﬂuence on the system, but rather result in user interface

502

A. Ampatzoglou et al.

problems, while SharePoint backbone categories, can potentially have a huge impact
on the ability of the system to function as intended.
Severity of Rule Violations: In this section we discuss the level of severity of rule
violations, organized by the six questions included in our questionnaire. The ﬁrst ques‐
tion was designed to determine what type of code, (i.e., imperative or declarative), was
considered most prone to result in defects (related to Correctness). The results obtained
based on experts opinion suggest that 10 % of the participants considers imperative code
and declarative code equally prone to defects. The remaining 90 % chose the imperative
or the declarative code options nearly the same amount of times, consisting an indecisive
diﬀerence. Therefore, both types of code are considered equally prone to result in
defects. In addition to the quantitative results, we have encountered some qualitative
results as well: One developer stated that coding in C#, and XML was mostly used to
facilitate communication between websites. A second developer pointed to code written
with XSL, which is a form of declarative language, as the most defect prone parts of the
code. The third comment stated that XML is more sensitive to syntax related mistakes,
and that these small mistakes may have big consequences. However, these are supposed
to be easier to ﬁx, leaving choice on the multiple choice part to the imperative option.
Finally, the fourth comment stated that XML is more error prone, but the impact on the
systems’ defects and performance is signiﬁcantly less.
The second question was designed to determine the severity of leaving out refer‐
enced resources, the emphasis being on the inﬂuence, since this rule violation should
be relatively easy to detect (related to Deployment). Only 20 % of the participants
considered the potential to result in defects to be highly severe. On top of that, 60 %
considered the potential to result in defects to be moderately severe. Overall, this means
that a total of 80 % considered the potential for defects at least moderately severe. This
is a good indication that this aspect of SharePoint has to be monitored when analyzing
the code quality. The third question was designed to determine the negative inﬂuence
of adding too many assemblies, i.e. prohibited assemblies or including the same
assembly twice (related to Deployment). Out of the 80 % of the participants that chose
one of the multiple choice answers, only 10 % considered the defect not severe. There‐
fore, it was concluded that this kind of rule violation in SharePoint will be considered
moderately severe meaning this kind of rule violations will be monitored when analyzing
the code quality. After initial research into the SPCAF tool, it soon became clear that it
oﬀered good detection and assistance on missing required attributes in the XML
conﬁguration ﬁles. The fourth question was designed to explore the severity of this kind
of defects, to determine its impact on the number of times the system present unexpected
behavior, and to ascertain the importance of this rule. 40 % of the participants considered
the assistance very important, since they considered potential defects that can result from
this rule violation. 30 % considered the assistance moderately important, mainly because
it allowed them to solve the problematic code faster. Only 20 % of the participants stated
that they did not ﬁnd the assistance valuable. In conclusion, the severity of this type of
rules will be regarded as highly severe since 70 % considered it at least moderately
important and 40 % considered it as very important.

Quality Rule Violations in SharePoint Applications

503

Even though Security related vulnerabilities might not have a huge impact on the
behavior of the software, possible rule violations may have an even bigger impact on
the system. Software carrying security related vulnerabilities may appear to function as
intended, but would malfunction in the security area, e.g., it may provide entrance to
users to parts that should not be accessible. The ﬁfth question is designed to determine
how valuable the experts consider detection and assistance in the security area. The
results suggested that 70 % of the participants considered the potential of rule violations
to result in defects, highly severe, and that assistance on this type of defects is highly
appreciated. The remaining 30 % of the participants consider the rule violations moder‐
ately severe (since they are uncommon). Finally, the sixth question aimed to provide
the experts with the ability to name types of rules that were not discussed in the ﬁrst
ﬁve questions. This way, the types of rules that were not yet represented, could still be
brought forward. The additional types of vulnerabilities were: (a) Memory violations,
e.g. disposing all sorts of instances or memory leaks; (b) Performance related violations,
e.g. endless loops or other ineﬃcient code; (c) Common coding mistakes, e.g. wrong
syntax or improper use of variables; (d) Not using unique identiﬁcation of components;
and (e) Inconsistency of developed code. This new insight posed a valuable addition to
the types of rules that were already considered in the code quality analysis.

5

Discussion

The results of this study can be considered as a starting point for code quality analysis
in SharePoint applications, which is a rather understudied research ﬁeld. The obtained
results can be useful to both researchers and practitioners:
• (researchers) The relation of some SPCAF rules to defects remains uncertain. These
rules require further investigation.
• (researchers) This research eﬀort was exploratory since it was based on expert
opinion and descriptive statistics. More explanatory research is required.
• (researchers) Evolution analysis with data analytics can be performed, to conﬁrm the
relationship between the existence of defects and speciﬁc rule violations.
• (practitioners) The majority (approx. 67 %) of the predeﬁned rules oﬀered by SPCAF
is associated to defects. Therefore SPCAF can consist a good starting point for toolassisted code reviews.
• (practitioners) Most rule violations are related to the client-side of the application,
but these rules are not that sever. Correctness, deployment, and security rule viola‐
tions should be prioritized.

6

Threats to Validity

In this section we present potential threats to validity for our study following the guide‐
lines proposed by Runeson et al. [9]. According to Runeson et al., there are four types
of threats to validity: construct, reliability, external and internal validity threats. In this
study internal validity will not be considered, since causal relations are not in the scope
of this study. Concerning construct validity, we have identiﬁed one possible threat, i.e.,
the fact that we assessed the quality of the code, based on the suggestions of a single

504

A. Ampatzoglou et al.

tool. Although this threat is important, a discussion with practitioners suggested that 2
out of 3 rule violations, identiﬁed by the tool, are considered vital by practitioners. In
addition to that, no other tools for SharePoint applications quality assurance exist, to the
best of our knowledge. To mitigate threats to reliability, we presented in detail the case
study design, and we have not parameterized the used tools, to ensure that our results
are reproducible and comparable to future replications. Concerning external validity,
we need to underline that the obtained results cannot be generalized to all SharePoint
projects and that the use of a diﬀerent tool for code reviews, might have led to diﬀerent
results. Nevertheless, the diversity of the examined projects ensures some heterogeneity
in the cases.

7

Conclusions

This study aimed at exploring the quality assessment processes in SharePoint application
development, through tool-assisted code reviews. The results of the study suggested that
the majority of the rule violations can potentially lead to defects, and that they exist in
all stages of software, regardless of their positioning in the software development life‐
cycle. As expected, the number of critical errors and errors are eliminated in production
ready software. In addition, the most frequently occurring rule violations are warnings
that exist in the JavaScript part of the applications, whereas the more severe errors (i.e.,
correctness, deployment, and security) are more probable to appear in the imperative
parts of SharePoint applications.

References
1. Ampatzoglou, A., Ampatzoglou, A., Chatzigeorgiou, A., Avgeriou, P.: The ﬁnancial aspect
of managing technical debt: a systematic literature review. Inf. Softw. Technol. 64(8), 52–73
(2015). Elsevier
2. Charalampidou, S., Ampatzoglou, A., Avgeriou, P.: Size and cohesion metrics as indicators
of the long method bad smell: an empirical study. In: 11th International Conference on
Predictive Models and Data Analytics in Software Engineering (PROMISE 2015). ACM,
Beijing, October 2015
3. Feitosa, D., Ampatzoglou, A., Avgeriou, A., Nakagawa E.Y.: Investigating quality trade-oﬀs
in open source critical embedded systems. In: 11th International Conference on the Quality
of Software Architectures (QoSA 2015). ACM, Canada, May 2015
4. Hovemeyer, D., Pugh, W.: Finding bugs is easy. ACM SIGPLAN Not. 39(12), 92–106 (2004)
5. Kitchenham, B., Pﬂeeger, S.L.: Principles of survey research part 2: designing a survey. ACM
Spec. Interest Group Softw. 27(1), 18–20 (2002)
6. McConnell S.C.: Code Complete: A Practical Handbook of Software Construction. Microsoft
Press, Redmond (2004)
7. Meyer, B.: Design and code reviews in the age of the internet. ACM Commun. 51(9), 66–71
(2008)
8. Misra, S.C., Bhavsar, V.C.: Relationships between selected software measures and latent bugdensity: guidelines for improving quality. In: Kumar, V., Gavrilova, Marina, L., Tan, C.J.K.,
L’Ecuyer, P. (eds.) ICCSA 2003. LNCS, vol. 2667, pp. 724–732. Springer, Heidelberg (2003).
doi:10.1007/3-540-44839-X_76

Quality Rule Violations in SharePoint Applications

505

9. Runeson, P., Höst, M., Rainer, A., Regnell, B.: Case Study Research in Software Engineering:
Guidelines and Examples. John Wiley and Sons, Inc. (2012)
10. Vokac, M.: Defect frequency and design patterns: an empirical study of industrial code. IEEE
Trans. Softw. Eng. 30(12), 904–917 (2004)
11. Van Vliet, H.: Software Engineering: Principles and Practice. Wiley & Sons, New York (2008)
12. Zaman, S., Adams, B., Hassan, A.E.: Security versus performance bugs. In: 8th Working
Conference on Mining Software Repositories (MSR 2011), pp. 93–102 (2011)

Quality Assurance of Requirements Artifacts
in Practice: A Case Study
and a Process Proposal
Henning Femmer1(B) , Benedikt Hauptmann1 , Sebastian Eder1 ,
and Dagmar Moser2
1

Technische Universität München, Munich, Germany
{femmer,hauptmab,eders}@in.tum.de
2
Munich Re, Munich, Germany
dmoser@munichre.com

Abstract. Requirements artifacts build the basis for various software
engineering activities, such as development, testing or eﬀort estimations.
As such, the quality of requirements artifacts impacts the eﬃciency
and eﬀectiveness of these activities. Consequently, requirements artifacts
should be subject to quality assurance (QA).
Unfortunately, QA of requirements artifacts struggles in practice. We
contribute a ﬁrst industrial case study, in which we found that the main
problems in QA for requirements artifacts were a missing common quality
understanding, the low feedback speed, low eﬃciency in the QA process,
and, consequently, the lack of creating a sustaining QA processes.
Based on these results, we furthermore contribute a process for
requirements artifact QA that is designed to address these problems. We
discuss feasibility and impact of the process with industry, who acknowledge its potential to increase eﬃciency and to provide a more sustaining
QA process in practice.

Keywords: Requirements Engineering

1

· Artifacts · Quality Assurance

Introduction

Requirements artifacts (such as use cases) are central entities for software
projects: Based on these artifacts, developers build the system, test managers set
up a test-strategy, etc. Consequently, it is widely accepted that quality defects in
requirements artifacts can cause expensive consequences in downstream software
development activities [1]. We supported the evidence of negative consequences
of requirements quality defects also through experiments, e.g. for the impact of
passive voice on understanding in [2].
But how can we detect these quality defects to remove them before they
propagate with the aforementioned negative consequences? As analytical quality assurance techniques, Fagan inspections [3] or perspective-based reading have
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 506–516, 2016.
DOI: 10.1007/978-3-319-49094-6 36

Quality Assurance of Requirements Artifacts in Practice

507

shown a high eﬀectiveness for detecting quality defects in various empirical studies (summarized in [4]). However, based on our experience in industry, we argue
that, in practice such heavy-weight processes are (at most) only applied at the
beginning of a project, if at all. After a while, projects neglect quality assurance (QA) more and more, until QA is performed just in an ad-hoc instead of
continuous fashion, depending on the available resources in the projects. While
requirements engineers in a recent study by Mendez and Wagner [5] name various
quality defects, e.g. incomplete, hidden, inconsistent, or underspeciﬁed requirements, as well as their expensive consequences, the question remains: Which are
the concrete issues that engineers face during requirements engineering artifact
QA (in the following short: requirements QA)?
To understand these problems, we report on an ﬁrst exploratory case study,
in which we analyze the situation at Munich Re, one of the largest reinsurance
companies world-wide, to understand the individual challenges of requirements
engineers in practice. Based on the outcomes, we develop a process that aims at
controlling these challenges to enable a sustaining QA process for RE artifacts.
Problem Statement: We lack knowledge on the challenges of real-world
requirements QA processes, and eﬃcient processes in requirements (artifact) QA.
Contributions: This paper contributes an analysis of the challenges of requirements QA processes in a case study and a proposal for a more eﬃcient QA process
for RE artifacts.

2

A Case Study on Requirements QA in Practice

To gain ﬁrst insights into the state of the practice as well as the main problems
that requirements engineers face, we performed an exploratory case study with
three cases in the context of one company.
2.1

Research Questions

We concentrated on the following research questions:
RQ 1: How is quality understood?
RQ 2: What is the status quo of quality and QA of requirements artifacts?
RQ 3: What are the main problems for QA of requirements artifacts?
2.2

Case and Subject Description

To get practically relevant outcomes, we performed our study in the context
of industrial software projects within Munich Re. Munich Re has about 45,000
employees in reinsurance and primary insurance worldwide. For their business,
they develop and maintain a variety of individual software systems. The produced systems often involve many stakeholders across the globe, and software

508

H. Femmer et al.

systems are often run by changing, distributed teams over a long period of time.
Therefore, good requirements engineering (RE) and RE artifacts are very important to the company.
We performed three interviews with in total ﬁve requirements and system
engineers of Munich Re. The engineers we interviewed are responsible for three
diﬀerent, globally running business information systems. Each of the systems is
up to seven years old and under active development and maintenance. In the
projects, requirements are typically documented in form of use case descriptions,
business rules and change requests documented in Microsoft Word.
2.3

Data Collection and Analysis

To answer RQ 1 to RQ 3, we collected qualitative data through semi-structured
interviews. Upfront, we created an interview guideline in which we discussed four
aspects of the current project of the participants.
Each interview took about 60 min and covered diﬀerent aspects of the
project’s requirements QA. After the interviews, we openly classiﬁed our notes.
Afterwards, we summarized the results and compared the results across the different interviewees. To increase the validity of our interpretation, the results
were presented and discussed in the company at various diﬀerent levels, including requirements engineers from projects, RE methodologists and department
heads.
2.4

Results of the Case Study

In the following, we report on the results, structured through the aspects understanding of quality, status quo, and challenges.
RQ 1: Understanding of Quality in RE
We ﬁrst discussed with requirements engineers what they understood by good
requirements artifacts. We asked them to describe what high or low quality in
RE means for them.
We found that all projects strongly agreed on the importance of requirements
artifact quality, such as the quality of use cases. When asked about what quality
meant for the requirements engineers, they described quality in two diﬀerent
forms, as summarized in Table 1. While in the former understanding, engineers
broke down quality into properties of artifacts, in the latter, engineers described
it in terms of what the requirements engineer wants to use it for (activity-based
quality; cf. also our work in [6]).
Interpretation: We observed that the criteria can be split up into properties
of artifacts and properties of activities to be performed with the artifacts eﬀectively and eﬃciently (quality-in-use). Nevertheless, these two views are strongly
related: The rationale of most of the aspects of the former category is that certain activities of the latter can be executed. For example, the artifact should
have no implicit knowledge, to enable an eﬃcient understanding, and thereby to
enable an eﬃcient switch to new suppliers.

Quality Assurance of Requirements Artifacts in Practice

509

Table 1. Two views on quality
Quality as properties of artifacts:
– Artifact is correct/valid
– Artifact is clearly traced to tests
– Artifact contains no implicit knowledge
– Artifact is up to date
– Artifact has appropriate level of
abstraction
– Artifact is well-structured
– Artifact is using only un-ambiguous
phrases and grammatical constructs
(e.g. active instead of passive voice)

Quality-in-use: Artifacts enable. . .
– Estimate the testing and development
eﬀort
– Implement a requested change
– Keep an overview of the system and its
requirements
– Find a certain information of the business process
– Understand the system / compensate for
team ﬂuctuation
– Switch the development team to a new
supplier

Result: In summary, the requirements engineers were aware of or even advocating the importance of requirements artifact quality. There was no unique view
on quality, instead it could be seen from both the properties of the artifacts,
as well as quality-in-use perspective. Together, these views holistically form the
understanding of quality in the projects.
RQ 2: Status Quo of Quality and Requirements QA
As a second research question, we analyzed the status quo of the cases regarding
quality assurance in RE.
There was no systematic process for requirements QA in the cases we have
seen. Two of the projects used to have a requirements QA process at the beginning of the project. However, after a while, the processes decayed.
As a consequence, two of the projects explained that they are aware that their
requirements artifacts might not be up-to-date. They stated their dissatisfaction
with this situation, and explained the reasons (see next section). The requirements engineer of the third project claimed that the project’s requirements artifacts were of “relatively high quality” (“up-to-date”, “well-maintained”). She said
she takes care of this aspect, since these artifacts are vital for her to keep an
overview. However, also she did not have a systematic process for requirements
quality in place; instead, requirements QA was performed ad-hoc.
Interpretation: In all three projects there was no systematic requirements
QA in place. In some, there used to be a process, which was abandoned for
the reasons that we will explain in RQ 3. Regarding the impact of the missing
requirements QA, we had mixed responses.
Result: In summary, in these cases, requirements QA is done irregularly. The
resulting quality was, by the engineer’s own admissions, strongly varying.

510

H. Femmer et al.

RQ 3: Requirements QA Challenges
From the ﬁrst results, we understood that the requirements engineers were willing to create good artifacts, they had a varying understanding of what good
artifacts meant in their domain (see RQ 1). However, afterwards, we discovered
that there were no systematic processes (anymore) to analytically ensure that
the RE artifacts were of high quality (see RQ 2). The reason for this partially
lays in the process in which the company aims to focus on high quality of the
product instead of requirements and documentation. In addition, they explained
various challenges that hinder systematic analytical requirements QA:
Problem 1: QA is time consuming and expensive. In practice, requirements QA
mostly means manual reviews. These reviews are time-consuming activities since
they require various coordinated actions by various stakeholders: review goal
deﬁnitions, review participant selection, participant coordination, participant
review execution, collection of results, discussion of results, and ﬁnally, a check
whether all changes were executed correctly.
Problem 2: Synchronous QA meetings are ineﬃcient. Synchronizing for QA,
i.e. having a group of people coming together for a review, was considered inefﬁcient. Scrolling through the text together, and discussing typos, ambiguity or
other aspects in the group was perceived unnecessary, especially if the group is
large.
Problem 3: QA processes have long feedback loops. The company’s software
development process is built around short iterations (usually two-weeks). One
engineer said that a full-blown review, as they performed in the beginning of the
project, would take too long to ﬁt into one iteration1 .
Problem 4: QA initiatives decay over time. Maybe as a consequence of Problem 1–3, ampliﬁed by project pressure, after a while, the existing QA processes
decay and things are done in a cowboy-fashion, as we also see in the answers to
RQ 2.
Problem 5: Quality of requirements artifacts decays over time. Maybe also as
a consequence of Problem 4, participants state that the quality of artifacts gets
worse over time, especially in terms of being outdated. This makes the artifact
less and less usable, since stakeholders, such as testers or developers cannot rely
on it. This again will deteriorate the quality, starting a vicious circle.
1

In a presentation after the study, another engineer reported that for a review it
usually took multiple weeks until the original author received feedback. The person
mentioned that after she received feedback, she herself, although being the original
author, needed some time to understand the content again.

Quality Assurance of Requirements Artifacts in Practice

511

Problem 6: QA processes lack support for the maintenance phase. All analyzed
cases were in maintenance phase, meaning, that applied changes not always
introduce new use cases. Instead, sometimes just passages of existing use cases
are changed. Since reviewing the whole use cases was considered unnecessary, no
QA was performed instead. On the long run, this obviously leads to unreviewed
changes and decay of quality (see also Problem 4).
Problem 7: No support for introducing QA in the middle of a project. Lastly,
the question was raised whether it made sense to start improving all use cases
now. Sometimes, the authors of the original artifacts have left the team. Therefore, a QA initiative now would probably reveal many diﬀerent ﬁndings, overwhelm the inspectors, and leave the question where to start with cleaning up.
This is again related to Problem 4 and 6.
Problem 8: Quality deﬁnition varies. In addition to the aforementioned, explicitly stated problems, RQ 1 shows another problem, unmentioned by the requirements engineers: The understanding of requirements engineers strongly varies
from project to project.
Interpretation: We can group these problems into four categories: First, the
main challenges lay in the time-consuming QA review activities. Manual reviewing is an expensive activity that does not immediately pay back. Second, engineers express their discontent with the long feedback cycles, which (a) constraints
the development process and (b) creates additional eﬀort, since authors must
re-read the texts again. Third, probably since QA is not the primary responsibility of requirements engineers, many projects face diﬃculties sustaining the
QA process. Fourth, quality deﬁnition varies. This is not necessarily a problem,
since diﬀerent projects may require diﬀerent quality deﬁnitions. However, there
should be systematization or control behind the varying quality deﬁnitions.
Many of these problems are linked to each other (e.g. one problem is the
consequence of another). For example, as stated in the introduction, it is a
common assumption in the processes community that decay of QA processes
leads to quality decay. Many of these challenges are unavoidable facts. There
is no free lunch and QA will always cost resources. Although most challenges
are common-sense, listing them as constraints helps to improve the process by
trying to limit the eﬀect and hassle of each individual factor.
Result: The case revealed eight main problems, which can be summarized into
time-consuming QA, long feedback cycles, lack of sustaining the process, and a
varying understanding of quality.
2.5

Proposals for Sustaining Requirements QA

Based on the aforementioned problems, we developed a set of principles that
target these problems and foster a sustaining QA process for requirements arti-

512

H. Femmer et al.

Problem 1:
Time-consuming
& expensive

Problem 2:
Meetings

Problem 3:
Long feedback
loops

Principle 1:

Proposal:
Use meetings
for important
issues only

Proposal:
Phased
Inspections

Problem 4:
QA decays

Problem 5:
Quality decays

Principle 2:
Receive feedback
rapidly

Proposal:
QA every

Proposal:
Measure
clean-up

Problem 6:
QA in
maintenance?

Principle 3:
Sustain QA

Proposal:
Boy-Scouts
rule

Problem 7:
Introduce QA in
later phase?

Problem 8:
varies

Principle 4:
Understand
quality in context

Proposal:
Embed
QA tools into
working tools

Proposal:
Add a
QA role

Proposal:
Company Users
provides decide
guidelines quality

Fig. 1. Problems, principles and proposals

facts (see Fig. 1). Each principle is supported by one or more process proposals.
The proposals transfer various concepts from QA of source code or project management to RE.
2.6

A Step-by-Step Process for Requirements QA

We suggest to combine the various principles and proposals into step-by-step
QA process. We assume that the process is applied either in the beginning of a
project or in the middle of a development process (see Problem 7). The process
is applied continuously, for each change in the requirements. We furthermore
assume the existence of tool support in terms of automatic feedback. For possibilities and limitations of such a tool, see e.g. AQUSA [7] or our own [8,9].
Lastly, we assume the existence of a QA system that supervises the process
itself. Figure 2 illustrates the complete process.

Legend:

1

Quality

Variation

3

4

Requirements
engineer

Role

Activity

2

changes
requirements

incorporates
automatic
feedback
(within tool)

reports artifact
as ready
to review

Guidelines

Sections

selects
reviewer(s) for
review

5

6

7

Artifact
users

Req.
engineer

Quality
engineer

incorporates
provide manual
quality
sets up meeting
assurance
if necessary.

Usage

Usage

Artifact users

Synchrony

Meeting,
Participants

controls
QA'ed
sections

Supervision

Fig. 2. A proposal for a quality assurance process with roles, activities, quality deﬁnitions and variation points, based on proposals of Fig. 1

Quality Assurance of Requirements Artifacts in Practice

3

513

Discussion in Practice

As a next step, we need to understand whether the proposed process is applicable
and beneﬁtial in practice. As a ﬁrst step in this direction, we discussed the
proposed process in terms of feasibility, impact, and eﬀort with one of the leading
requirements engineers of the company.
Feasibility: First, we wanted to ﬁnd out whether the expert thought the approach was feasible in practice. The expert answered this question in the aﬃrmative. She stated, however, that besides the QA process itself, an additional
aspect is the integration into concrete development processes in the projects. Her
company strongly advocates short iterations and often development and testing
is performed in parallel to requirements engineering, hence, the question remains
when to conduct the QA.
Beneﬁts: Second, we iterated through the problems of RQ 3, as described
in Sect. 2.4, and asked whether the expert thought these problems could be
addressed by the process. She stated that in general, she saw the potential for
this. However, it also much depends individual people and projects at hand, so
further studies are necessary.
Required Eﬀort: Third, we asked about the required eﬀort for such a process
in comparison to the status quo. She named various cost items, which she estimated to be relevant: In general, the most relevant cost item are still the manual
reviews. She suggested that if the users of the artifacts execute the review (as proposed in the process), these eﬀorts could be reduced. She argued that these users,
e.g. the testers, have to read and work with the requirements anyways. Costs
for manual reviews include furthermore organizational costs for synchronous
reviews. In addition, writing protocols for reviews was considered costly, both
for reporting ﬁndings from the reviewer to the requirements engineers, but also
reporting changed sections from the requirements engineer to the reviewer after
the review comments have been incorporated. All aspects strongly depend on
the appropriate tool support. Consequently, the success of the proposed process
depends to some extent how it is integrated in a QA tool. This result supports similar conclusions in the domain of code quality, as reported by other
researchers [10,11].
Summary: Altogether, the eﬀort depends on the concrete integration into the
project, the participating individuals, and the tool support. In future work, when
evaluating the process in practice, these factors should be analyzed in detail.
Threats to Validity. Regarding internal validity, recording and interpretation of
the interview is subject to conﬁrmation bias and similar subjectivity. Therefore,
two researchers took notes independently and compared the notes afterwards.

514

H. Femmer et al.

In addition, we validated our results and conclusions through presentation in
various meetings with the company. Furthermore, we distributed a ﬁrst draft of
this paper in the company. Participants acknowledged the validity of the results.
The external validity of the results contains two levels. Regarding the representativeness of results for the whole company, we performed data source triangulation through three cases. These interviews provided a coherent view on
this matter. In addition, in discussions at the company, various senior engineers
reported that, according to their own experience and perception, these results
were representative for this company. Yet on a more general level, future work
must analyze whether the same results hold for other companies and processes.

4

Related Work

Various authors have published in the related ﬁelds of QA for source code, as
well as QA for requirements.
Our paper is related to the area of requirements engineering process improvement. A recent systematic mapping study by Mendez et al. found “very few
exploratory papers” [12] in this area.
Various techniques in related work provide pieces for the puzzle. The probably best-known method in reviewing are Fagan inspections [13]. Other technique
are perspective based reading (c.f. [4]), which includes, just as we do, the concept of the stakeholder viewpoint, as well as ad-hoc, checklist-based, scenariobased, pattern-based or defect-based reviews [14]. The last piece in the puzzle
are phased inspections, as proposed by Knight and Myers [15]. In their paper,
Knight and Myers discuss the combination of automatic and manual approaches
and discuss the consequences. In this early work, they also discuss application
in requirements engineering, but mainly focus on code.
However, these methods do not put the small pieces together. In our opinion,
the work that most holistically combines diﬀerent approaches was written by
Katasonov and Sakkinen [14]. In conclusion, our process proposal extends this
classiﬁcation through various aspects. Based on the problems that we phased in
QA in practice, we extend the framework of Katasonov and Sakkinen in Fig. 3.
Salger [16] provides an analysis of what he argues to be the ﬁve main challenges in reviews, based on related work. We detail his results through an empirical analysis from the perspective of industrial requirements engineers. The results
overlap only partly. Nevertheless, the gap indicates that RE research must carefully coordinate research topics with practitioners to gain traction in industry.
Discussion: In contrast to previous works, we approach the topic holistically
(suggesting a complete process instead of only parts) and inductively, i.e. through
an exploratory, empirical investigation in a certain context and a derivation of a
process from this context. We extend the work by Katasonov and Sakkinen [14]
through additional variables in Fig. 3.

Quality Assurance of Requirements Artifacts in Practice

515

Quality Control

Automatic Checks

Visualization

Review

Translation

Process

Reviewers /
Stakeholders

Changed Sections

Synchronous

Artifact Users

Complete Artifact

Asynchronous

Quality Criteria

Review Object

Tool-internal

Guidelines

Tool-external

Quality-in-Use

Reading
Techniques

Quality
Supervision

Representation
Forms

Customer
Domain Expert

Fig. 3. Our process within the framework of Katasonov and Sakkinen [14]. We highlight
added aspects in light-blue. (Color ﬁgure online)

5

Conclusion and Future Work

In the presented paper, we interviewed requirements engineers in a case study to
gain ﬁrst insights into the challenges of requirements QA in practice. The study
shows that quality assurance is a time-consuming and diﬃcult endeavor that can
quickly decay. Based on the interviews, we argue that the key lies in the eﬃciency
of the process. We suggest various tools, most prominently automatic guideline
and quality checks, to increase eﬃciency. In summary, we consider our process
a pragmatic QA process. This means that on the one hand the process provides
strict guidelines. On the other hand, the process aims to avoid the ineﬃciency of
heavy-weight processes by optimizing cost drivers. As a very ﬁrst step towards
evaluation, we discuss the process with an expert from industry, giving us a ﬁrst
indication that the process could be applied in practice and, to some extent,
provide the required improvements.
There are some aspects that should be studied in future work. First, we
reported on two diﬀerent perspectives on requirements quality. Future work could
analyze how they are related and intertwined. In addition, we executed the case
study on three projects, but only in one context. Future work should extend this
to diﬀerent domains and compare the resulting challenges. We also proposed a
process that seems ﬁt in our context. We integrated this to an existing framework
by Katasonov and Sakkinen, and compared our results to those by Salger. Future
work could extend our approach and propose diﬀerent solutions that are more
ﬁt in other domains. Lastly, future work is required in the form of experiments
and longitudinal studies to evaluate whether the proposed process can indeed
prevent the decay on the long run.
Acknowledgments. The authors thank Daniel Méndez Fernández, Dominik Holling,
and Jakob Mund for their reviews on drafts of this paper. We furthermore want to
thank the participants of Munich Re, for their support of the study. This work was
performed within the project Q-Eﬀekt; it was funded by the German Federal Ministry
of Education and Research (BMBF) under grant no. 01IS15003 A-B. The authors
assume responsibility for the content.

516

H. Femmer et al.

References
1. Broy, M.: Requirements engineering as a key to holistic software quality. In: ISCIS
2006 (2006)
2. Femmer, H., Kucera, J., Vetrò, A.: On the impact of passive voice requirements
on domain modelling. In: ESEM (2014)
3. Fagan, M.E.: Design and code inspections to reduce errors in program development. In: Broy, M., Denert, E. (eds.) Pioneers and Their Contributions to
Software Engineering, pp. 301–334. Springer, Heidelberg (2002). doi:10.1007/
978-3-642-48354-7 13
4. Shull, F., Rus, I., Basili, V.: How perspective-based reading can improve requirements inspections. IEEE Comput. 33(7), 73–79 (2000)
5. Méndez Fernández, D., Wagner, S.: Naming the pain in requirements engineering:
a design for a global family of surveys and ﬁrst results from Germany. In: IST
(2014)
6. Femmer, H., Mund, J., Méndez Fernández, D.: It’s the activities, stupid! a new
perspective on RE quality. In: RET (2015)
7. Lucassen, G., Dalpiaz, F., Brinkkemper, S., van der Werf, J.: Forging high-quality
user stories: towards a discipline for agile requirements. In: RE Conference (2015)
8. Femmer, H., Méndez Fernández, D., Juergens, E., Klose, M., Zimmer, I.,
Zimmer, J.: Rapid requirements checks with requirements smells: two case studies.
In: RCoSE (2014)
9. Femmer, H., Fernández, D.M., Wagner, S., Eder, S.: Rapid quality assurance with
requirements smells. J. Syst. Softw. (2016)
10. Deissenboeck, F., Juergens, E., Hummel, B., Wagner, S., Mas y Parareda, B.,
Pizka, M.: Tool support for continuous quality control. IEEE Softw. 25(5), 60–67
(2008)
11. Steidl, D., Deissenboeck, F., Poehlmann, M., Heinke, R., Uhink-Mergenthaler, B.:
Continuous software quality control in practice. In: ICSE (2014)
12. Méndez Fernández, D., Ognawala, S., Wagner, S., Daneva, M.: Where do we stand
in requirements engineering improvement today? First results from a mapping
study. In: ESEM (2014)
13. Fagan, M.: Design and code inspections to reduce errors in program development.
IBM Syst. J. 15(3), 182–211 (1976)
14. Katasonov, A., Sakkinen, M.: Requirements quality control: a unifying framework.
RE 11(1), 42–57 (2005)
15. Knight, J.C., Myers, E.A.: An improved inspection technique. Commun. ACM
36(11), 51–61 (1993)
16. Salger, F.: Requirements reviews revisited - residual challenges and open research
questions. In: RE Conference (2013)

Commodity Eats Innovation for Breakfast:
A Model for Differentiating Feature
Realization
Aleksander Fabijan1(&), Helena Holmström Olsson1, and Jan Bosch2
1

Faculty of Technology and Society, Malmö University, Nordenskiöldsgatan 1,
211 19 Malmö, Sweden
{Aleksander.Fabijan,Helena.Holmstrom.Olsson}@mah.se
2
Department of Computer Science and Engineering,
Chalmers University of Technology, Hörselgången 11,
412 96 Göteborg, Sweden
Jan.Bosch@chalmers.se

Abstract. Once supporting the electrical and mechanical functionality, software today became the main competitive advantage in products. However, in the
companies that we study, the way in which software features are developed still
reflects the traditional ‘requirements over the wall’ approach. As a consequence,
individual departments prioritize what they believe is the most important and are
unable to identify which features are regularly used – ‘flow’, there to be bought
– ‘wow’, differentiating and that add value to customers, or which are regarded
commodity. In this paper, and based on case study research in three large
software-intensive companies, we (1) provide empirical evidence that companies do not distinguish between different types of features, which causes poor
allocation of R&D efforts and suppresses innovation, and (2) develop a model in
which we depict the activities for differentiating and working with different
types of features and stakeholders.
Keywords: Customer feedback  Innovation  Commodity
Flow feature  Duty feature  Checkbox feature


Wow feature


1 Introduction
The amount of software in products is rapidly increasing. At ﬁrst, software functionality was predominately required in order to support tangible electrical, hardware and
mechanical solutions without delivering any other perceptible value for the customers
[1]. Today, software functionality is rapidly becoming the main competitive advantage
of the product, and what delivers value to the customers [2]. However, the way in
which software features are being developed, and how they are prioritized is still a
challenge for most organizations. Often, and due to immaturity and lack of experience
in software development, companies treat software features similarly to electronics or
mechanics components, with the risk of being unable to identify what features are
differentiating and that add value to customers, and what features are regarded commodity by customers. As a consequence individual departments continue to prioritize
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 517–525, 2016.
DOI: 10.1007/978-3-319-49094-6_37

518

A. Fabijan et al.

what they ﬁnd the most important and miss the opportunities to minimize and share the
investments into commodity features [3, 4].
In this paper, we identify that the lack of distinguishing between different types of
features is the primary reason for inefﬁcient resource allocation that, in the end, make
innovation initiatives suffer.
The contribution of the paper is twofold. First, we give guidelines on how to
distinguish between different types of features that are being developed and we provide
empirical evidence on the challenges and implications involved in this. Second, we
present a conceptual model to guide practitioners in prioritizing the development
activities for each of the feature types. With this model, companies can develop only
the amount of feature that is required for commoditized functionality and, on the other
hand, maximize their investments in innovative features.

2 Background
In most companies, customer feedback is collected on a frequent basis in order to learn
about how customers use products, what features they appreciate and what functionality they would like to see in new products [5–8]. The number of requests and ideas
that originate from this feedback often outnumbers available engineering resources and
prevents companies from realizing all of them [9]. To help practitioners control the
information overload originating from customer feedback, Knauss et al. [10] propose a
feedback-centric requirements approach, together with a tool that elicits the most
important information. Recently, Johansson et al. [11] stressed the importance of
complementing the qualitative customer feedback with quantitative input by showing
its implications on the product managers prioritization decisions. Moreover, and in
order to further develop only those requirements that will deliver the most business
value, various prioritization techniques have been introduced in requirement engineering and product development literature [12–16]. However, these do not consider
market factors such as the availability of the features being assessed in competitors’
products [17]. Also, very little is known on how to prioritize the development activities
for different types of features.
To recognize the importance of distinguishing between different types of functionality from a complexity point of view, Bosch [18] developed the ‘Three Layer
Product Model’. The model provides a high-level understanding of the three different
layers of features, i.e. commodity, differentiating and innovative, however, does not
give guidance on how to distinguish between the different types, neither which
activities to invest into for each of them. The model distinguishes between three types
of functionality layers, i.e. commoditized functionality (functionality necessary for
system operation that customers take for granted), differentiating functionality (the
functionality that differentiates the product from its competitors) and Innovation
functionality (functionality providing signiﬁcant value).

Commodity Eats Innovation for Breakfast

519

3 Research Method
This case study [19] research builds on an ongoing work with three case companies
(see Table 1 below) involved in large-scale development of software products. It was
conducted between August–December 2015.
3.1

Data Collection and Data Analysis

First, we conducted a workshop at each of the companies. Second, we conducted
twenty-two interviews that lasted one hour. During analysis, the workshop notes,

Table 1. Description of the companies and the representatives.
Company and their domain
Company A is a provider of telecommunication systems and
equipment, communications networks and multimedia solutions for
mobile and ﬁxed network operators. The company has several sites
and for the purpose of this study, we collaborated with
representatives from one company site.
The participants marked with an asterisk (*) attended the workshop
and were not available for a follow up-interview
Company B is a software company specializing in navigational
information, operations management and optimization solutions.
All the participants attended the workshop and were interviewed
Company C is a manufacturer and supplier of transport solutions,
construction technology and vehicles for commercial use.
All the participants that attended the workshop were interviewed. In
addition, one sales manager and one technology specialist wished to
join the project at a later stage, and were interviewed

Representatives
1 Product Owner
1 Product Manager
2 System Managers
2 Software Engineer
1 Release Manager
1 Area Prod. Mng.*
1 Lean Coach*
1 Section Mng.*
1 Product Owner
1 System Architect
1 UX Designer
1 Service Manager
1 Product Owner
2 Product Strategists
2 UX Managers
2 Function Owners
1 Feature Coord.
1 Sales Manager
2 Technology Spec.

interview transcriptions and graphical illustrations were used when coding the data.
The data collected were analyzed following the conventional qualitative content
analysis approach [20] where we derived the codes directly from the text data.
To improve the study’s construct validity, we conducted semi-structured interviews
at the workshops with representatives working in several different roles and companies.
Since these companies represent the current state of large-scale software development
of embedded systems industry, we believe that the results can be generalized to other
large-scale software development companies.

520

A. Fabijan et al.

4 Findings
4.1

Feature Realization: Current State of Feature Differentiation

Features that are being developed are handed over from one development stage to
another, together with their requirements and priorities. The differentiation strategy is
unclear to the practitioners developing the features (see e.g. Table 2).
Table 2. The current State of Feature Differentiation.
Current State
Vague
differentiating
strategy

Dev. investment
level does not vary

Feature
prioritization
processes is in
favor of commodity

4.2

Description
Practitioners struggle to know if
the feature is innovative and
requires e.g. direct investment, or
commodity and can be covered
from e.g. running maintenance
budget
Based on the interview data, we
do not see a signiﬁcant difference
in deﬁning the investment level
allocated to the feature
Prioritizing innovative features is
suppressed with numerous
commodity efforts that are
needed to satisfy the standards
and follow the competitors
instead of accurately
understanding what adds value to
the stakeholders

Quote
“Should we go into Maintenance
budget? Or should it go to
investment budget and we
prioritize there?”, – Product
Strategist from Company C
“There is a lot of functionality
that we probably would not need
to focus on.” – Technology
Specialist from Company C
“Customer could be more
involved in prioritization that we
do in pre-development. Is this
feature more important that the
other one?” – Product Owner
from Company B

Differentiating Features: Challenges and Implications

The current state advocates a situation where features are not differentiated in the
strategy between being innovative, differentiating or commodity and, as a consequence,
development activities do not differ between the features. Based on our interviews, we
see that there are a number of challenges associated with this situation. We present the
challenges in Table 3 below.
Due to an unclear differentiating strategy, our case companies experience a number
of implications during the development process of a feature. We summarize them in
Table 4 below.

Commodity Eats Innovation for Breakfast

521

Table 3. The challenges with differentiating features.
Challenge
Understanding the
stakeholder and
purpose of the feature

Description
The way in how feedback is
being collected is rather ad-hoc
and it depends on the stage of the
feature and not on the

Quote
“We want to understand what the
customer wants to have and also
truly, what do they need.” –
Product Manager from
Company A
Incomprehensible
Identifying whether a feature is “Functionality is software
basically and the features are
high-level directives
commodity, differentiating or
more subjective opinions and
innovative, and investing into
activities needed for each type is things that we can’t really… it is
hard to collect data.” – Function
left to the practitioners
owner from Company C
developing the feature
“Those are the things that
Companies do not learn from
Commodity
worries us the most. All of us,
customers fast enough and,
functionality is
internally considered to occasionally, consider and invest since it is so hard, you need to
gamble a bit. If it turns out that
in development activities for
be innovative
you are wrong, then you are
features they believe are
behind.” – Product Manager from
innovative
Company A

Table 4. The implications of feature realization.
Implication
Uniform
stakeholder
impression

Description
Individuals in our case companies
typically ﬁnd it difﬁcult to identify
and truly understand the stakeholder
and their needs that the feature is
being developed for
Companies risk to invest extensively
Arbitrary
in development activities that are
investments in
not required for a certain type of
development
feature
activities
Commodity
As a consequence, features that are
suppresses
expected to be innovative do not get
innovation
to be prioritized. Instead, and in a
rush to keep the pace, our case
companies study close competitors
and analyze their most promising
features
Companies base themselves on
Projecting
competition right now instead of
competitors
current state is the where it will be in the future
norm

Quote
“If you are sitting in a team…you
see that this is the most important
thing we need to do.” – Product
Manager from Company A
“We tend to focus on,…on the
wrong things. We need to look at the
beneﬁt for their customers.” –
Product Manager from Company A
“In our organization is really hard
to sew everything together, to make
it work. That requires funding that
is almost nonexistent.” – Software
engineer from Company A

“We do also our own tests of
competitors….We measure are we
on track or not.” – Product
Strategist from Company C

522

A. Fabijan et al.

5 The Feature Differentiation Model
In response to the empirical data from our case companies, combined with the ﬁndings
and the implications that we presented above, we expand the 3LPM model to a new
dimension and present our model for feature differentiation in the following section.
The contribution of our model is twofold. First, we provide four different categories of
features and their characteristics in order to give practitioners an ability to better
differentiate features early in the development cycle. Second, and as a guidance for
practitioners after classifying a feature, we provide a summary of development activities for every type of feature. Practitioners can use the guidance in prioritizing the
development activities for the features that they are developing. With this model,
companies can develop only the amount of feature that is required for commoditized
functionality and, on the other hand, free the resources to maximize their investments in
innovative features that will deliver the most value.
5.1

Differentiating Characteristics of New Feature Development

Our model advocates an approach in which four fundamentally different types of
features are being developed. We name them “duty”, “wow”, “checkbox” and “flow”
types of features. With “duty”, we label the type of features that are needed in the
products due to a policy or regulation requirement. “Checkbox” features are the features that companies need to provide in order to be on par with the competition that
provides similar functionality. With “wow”, we label the differentiating features that
are the deciding factor for buying a product. Finally, and with “flow”, we label the
features in the product that are regularly used. We depict the four types of features on
Fig. 1, where we place each of the types in relation to the 3LPM commoditydifferentiating-innovative categorization.

Fig. 1. The Feature Differentiation Model.

Commodity Eats Innovation for Breakfast

523

On the horizontal axis, we indicate the development extent for the features, ranging
from “Satisfy” to “Maximize”. On the vertical axis, however, we indicate the distinction between primarily “User-centric” and “3rd party-centric” features. By this, we
divide between features that are primarily being developed for users of the product
(hence “User-centric”) and other, 3rd parties-centric features, which are not directly
used by the users. And to help practitioners categorize a feature on these axis, we
distinguish between ﬁve characteristic points; the stakeholder (e.g. the requestor of the
feature), feature engagement (e.g. the expected level of feature usage), the source of the
feedback (e.g. the stakeholder generating the most feedback), the focus of the feature
(e.g. is the feature indenting to minimally satisfy a known need, or to innovate in a new
area?) and its impact on driving sales (is it a feature focusing on the customer paying
for the product?).
5.2

The Development Process

For each of the four feature types, we suggest how to set the extent of the feature that
should be developed. Here, the extent of the feature can be either deﬁned once
(constant) or dynamically adjusted during development and operation (floating alternates, following - follows the competitors or open - no limitation). Next, the
sources that contain the information required to set the development extent need to be
deﬁned, together with the techniques that make it possible for the practitioners to
collect relevant customer feedback. Next, we suggest the most important R&D activities. They are followed by the activities that do not deliver value for that type and
should be avoided. Finally, we suggest how to set the deployment frequency. We
summarize the most important differences between each approach in Fig. 2.

Fig. 2. The summary of the 4 different development approaches.

524

A. Fabijan et al.

6 Discussion
Multi-disciplinary teams involved in the development of a software product are
increasingly using customer feedback to develop and improve their products and
features. Both qualitative techniques [5, 6, 8] and quantitative techniques [7, 8] are used
to collect customer feedback and product data. Previous research shows that the
number of requests and ideas that originate from this feedback often outnumbers
available engineering resources [9]. And with increasing amount of ideas, prioritizing
development resources and identifying which features to develop to what extent is a
crucial step in new feature development process. Bosch [18] recognized the importance
of dividing functionality between commodity, differentiating and innovative. However,
and as shown in our empirical ﬁndings, companies still struggle with this and, consequently, invest into development activities that do not deliver value to the stakeholders. Innovative features are suppressed by commodity.
To address the concerns above, we develop the Feature Differentiation model,
where we illustrate how development activities depend on the type of the feature being
developed with respect to characteristics. The model helps companies to (1) differentiate between the four types of the features, and (2) prioritize the necessary development activities.

7 Conclusion
In this paper, and based on case study research in three large software-intensive
companies, we (1) provide empirical evidence that companies do not distinguish
between different types of features, i.e. they don’t know what is innovation, differentiation or commodity, which is the main problem that causes poor allocation of R&D
efforts and suppresses innovation. We (2) develop a model in which we depict the
activities for differentiating and working with different types of features.

References
1. Boehm, B.: Value-based software engineering: reinventing. SIGSOFT Softw. Eng. Notes 28,
3 (2003)
2. Khurum, M., Gorschek, T., Wilson, M.: The software value map- an exhaustive collection of
value aspects for the development of software intensive products. J. Softw. Evol. Process.
25, 711–741 (2013)
3. Lindgren, E., Münch, J.: Software development as an experiment system: a qualitative
survey on the state of the practice. In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP
2015. LNBIP, vol. 212, pp. 117–128. Springer, Heidelberg (2015). doi:10.1007/978-3-31918612-2_10
4. Olsson, H.H., Bosch, J.: Towards continuous customer validation: a conceptual model for
combining qualitative customer feedback with quantitative customer observation. In:
Fernandes, João M., Machado, Ricardo J., Wnuk, K. (eds.) ICSOB 2015. LNBIP, vol. 210,
pp. 154–166. Springer, Heidelberg (2015). doi:10.1007/978-3-319-19593-3_13

Commodity Eats Innovation for Breakfast

525

5. Fabijan, A., Olsson, H.H., Bosch, J.: Customer feedback and data collection techniques in
software R&D: a literature review. In: Software Business, ICSOB 2015. pp. 139–153, Braga,
Portugal (2015)
6. Williams, L., Cockburn, A.: Introduction: Agile Software Development: Its About Feedback
and Change (2003)
7. Holmström Olsson, H., Bosch, J.: Towards data-driven product development: a multiple case
study on post-deployment data usage in software-intensive embedded systems. In:
Fitzgerald, B., Conboy, K., Power, K., Valerdi, R., Morgan, L., Stol, K.-J. (eds.) LESS
2013. LNBIP, vol. 167, pp. 152–164. Springer, Heidelberg (2013). doi:10.1007/978-3-64244930-7_10
8. Bosch-Sijtsema, P., Bosch, J.: User involvement throughout the innovation process in
high-tech industries. J. Prod. Innov. Manag. 32, 1–36 (2014)
9. Bebensee, T., Weerd, I., Brinkkemper, S.: Binary priority list for prioritizing software
requirements. In: Wieringa, R., Persson, A. (eds.) REFSQ 2010. LNCS, vol. 6182,
pp. 67–78. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14192-8_8
10. Knauss, E., Lubke, D., Meyer, S.: Feedback-driven requirements engineering: the heuristic
requirements assistant. In: 2009 IEEE 31st International Conference on Software Engineering, pp. 587–590. IEEE (2009)
11. Johansson, E., Bergdahl, D., Bosch, J., Holmström Olsson, H.: Requirement prioritization
with quantitative data - a case study. In: Abrahamsson, P., Corral, L., Oivo, M., Russo, B.
(eds.) PROFES 2015. LNCS, vol. 9459, pp. 89–104. Springer, Heidelberg (2015). doi:10.
1007/978-3-319-26844-6_7
12. Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality.
J. Japanese Soc. Qual. Control. 14, 39–48 (1984)
13. Wiegers, K.E.: Automating requirements management. Softw. Dev. 7, 1–5 (1999)
14. Karlsson, L., Thelin, T., Regnell, B., Berander, P., Wohlin, C.: Pair-wise comparisons versus
planning game partitioning-experiments on requirements prioritisation techniques. Empir.
Softw. Eng. 12, 3–33 (2007)
15. Lefﬁngwell, D., Widrig, D.: Managing Software Requirements: A Uniﬁed Approach, pp. 10,
491. Addison-Wesley Longman Publ. Co., Inc., Boston (1999)
16. Karlsson, J., Ryan, K.: A cost-value approach for prioritizing requirements. IEEE Softw. 14,
67–74 (1997)
17. Kakar, A.K.: Of the user, by the user, for the user: engaging users in information systems
product. In: SAIS 2014 Proceedings (2014)
18. Bosch, J.: Achieving simplicity with the three-layer product model. Computer (Long. Beach.
Calif) 46, 34–39 (2013)
19. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in
software engineering. Empir. Softw. Eng. 14, 131–164 (2008)
20. Hsieh, H.-F., Shannon, S.E.: Three approaches to qualitative content analysis. Qual. Health
Res. 15, 1277–1288 (2005)

Process and Repository Mining

PROMOTE: A Process Mining Tool for Embedded System
Development
Arttu Leppäkoski ✉ and Timo D. Hämäläinen
(

)

Tampere University of Technology, Tampere, Finland
arttu.leppakoski@student.tut.fi, timo.d.hamalainen@tut.fi

Abstract. Embedded system development workﬂow is complex, often poorly
modelled, and thus diﬃcult to optimize. We propose a new process mining tool
PROMOTE as the ﬁrst step of the ﬂow improvement. The tool includes an event
log analyzer and web user interface. PROMOTE has been tested in four real
industrial projects, and in an open source SW project. We exposed several bottle‐
necks otherwise undiscovered, which proved the need and feasibility of
PROMOTE. It will be deployed in production in a big embedded system company
in 2017.
Keywords: Process mining · Embedded system development · Workﬂow ·
Business process modeling · Optimization · Web interface · Event log

1

Introduction

Embedded systems have become very complex requiring increasing eﬀorts in safety
critical SW development. Unexpected changes e.g. in HW sourcing can cause signiﬁcant
changes. The projects use hundreds of design tools, thousands of conﬁguration ﬁles, and
millions of the source code lines [1]. A change in a tool can drastically aﬀect the tasks,
ordering and scheduling.
According to a survey [2], an average project duration was 12.4 months and 57 %
ﬁnished “late or canceled”. The biggest technology challenge was “managing the design
complexity” [2]. Desagent Oy and TUT performed a similar survey for 102 Finnish
companies in 2013. The potential to improve the design eﬃciency was 30–50 % for 87 %
of the answers. Only 3 % mentioned their process is formally modeled. 65 % uses oﬃce
documentation. Practically none explicitly measure the eﬃciency of the development
process. The reason is that the workﬂow is very complex and e.g. vulnerable to tool
changes. The bottlenecks may remain hidden and nobody knows the overall eﬃciency.
The tools, such as Issue and Bug Tracking, Version Control System (VCS), Continuous
Integration (CI), Test Management and Code Review are widely used and lots of docu‐
mentation is produced. Thus, the developers dislike any extra tasks for workﬂow model‐
ling. To overcome this, we use process mining to examine tool event logs and then
automatically construct the workﬂow model [3].
Several tools are available for process mining [4, 5]. ProM is excellent, but we also
required outstanding visualization for non-expert users. Many process mining reports
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 529–538, 2016.
DOI: 10.1007/978-3-319-49094-6_38

530

A. Leppäkoski and T.D. Hämäläinen

exist of the systems themselves [6, 7], but none of the development process. Some studies
focus in SW development [8–10]. In [11], visualizations were utilized to illustrate events
in the issue management system. We have utilized the basic mining techniques found
in the related work also in our tool.
We have examined real logs in a large Finnish engineering and service company and
from an open source SW project [12]. It soon turned out that current process mining
tools were too focused on business domains, for which reason we propose a new tool,
PROMOTE (PROcess Mining tOol for embedded sysTem dEvelopment). The goal is
to bring a consolidated and transparent view to embedded systems development process
from the information scattered around log ﬁles. The aimed users are project stakeholders.
The key contributions in this paper are (1) usage of process mining in embedded
system development in a real industrial case, (2) new intuitive tool for it, and (3) analysis
of the case studies as proof-of-concept.
This paper is organized as follows. Section 2 introduces the proposed PROMOTE
tool, and Sect. 3 describes its user interface. Section 4 focuses on the evaluation of the
log data and Sect. 5 concludes the paper with ideas for future work.

2

PROMOTE Core Architecture

Figure 1 depicts the overall architecture. PROMOTE captures event logs from other
systems using the other tool interfaces available. Event logs are stored into database and
then processed and converted into a common format using conversion scripts. Converted
event logs are visualized and shared using WebUI, which is aimed to product stake‐
holders.

Fig. 1. General architecture

PROMOTE consists of several interface and conversion scripts which are imple‐
mented in Python. Systems to be analyzed are speciﬁed using a conﬁguration script,
which stores their information (name, type, URI etc.) into a dedicated database.
Capturing and conversion is executed automatically without any user interaction. Event
log data is stored into PostgreSQL database, which allows analysis to be executed in
incremental steps without need to retrieve all event logs for each analysis. Database data
is processed and stored into CSV and JSON ﬁle formats which are accessible by the

PROMOTE: A Process Mining Tool for Embedded System Development

531

WebUI. Processed data is stored into several ﬁles using categories such as Issue
events, and Review events.
Three speciﬁc challenges were identiﬁed in event log handling. The ﬁrst one is
coherency of timestamps across the systems, since otherwise the diﬀerences can cause
misleading analysis results. All timestamps were converted to Coordinated Universal
Time (UTC) and events without timestamp were discarded.
The second challenge is multiple usernames for same person in diﬀerent systems.
This matter was solved by creating a special algorithm for detecting which usernames
belong to single user. Individual users are detected by comparing usernames, full names
and email addresses. The third challenge was to determine and extract the relevant
information from the event logs and to process that information into usable form. This
requires external system speciﬁc knowledge and was manually included to the event log
analysis.
PROMOTE can capture the whole development workﬂow by linking events between
diﬀerent systems. The links used are e.g. issue IDs and VCS commit hash tags.
Figure 2 illustrates how the event log data is processed to capture the whole development
workﬂow. The Event log handler (middle) analyses event logs (left) and generates the
process model (right). Typically issue IDs used in the issue management system (e.g.
Task-23) is used to identify events in other systems as well. For example VCS commit
is linked to an implementation task by adding an issue ID to a comment or code review
is linked to an issue ID or a VCS commit. After the links have been captured, the process
model is generated by arranging events into a temporal order and by calculating the
amount of transitions.

Fig. 2. Capturing of the workﬂow

The typical size of a capturing script is 150–400 lines of Python code. Several libra‐
ries were created for handling user detection, process modeling, and timestamp conver‐
sions. Currently PROMOTE includes a total of 4530 lines of Python code and supports
the following systems: GIT, Subversion, Review Board, Jenkins, Redmine, Source‐
Forge, Jira, and VersionOne.

532

3

A. Leppäkoski and T.D. Hämäläinen

PROMOTE WebUI

The existing process mining tools are often used by analysis experts, for which reason
they are not easy to use by normal users. PROMOTE WebUI aims at changing this and
visualize the results for any project member. WebUI includes a hierarchy of views
starting from the overall status and digs into details as the user desires. The WebUI is
based on the HTML KickStart framework [13], which allows adding new views rapidly.
Each visualization view has been designed for universal usage and implemented with
the d3.js Javascript library that supports CSV and JSON ﬁles [14].
The start view, shown in Fig. 3, contains a list of analyzed systems and the status
summary with a word cloud from the most frequent words.

Fig. 3. PROMOTE WebUI start page

Figure 4 lists all the available views. Highest tree level includes categories distinctive
for the embedded system development. Second level views were selected to present the
most important information related to each category. Scatterplot view is similar to the
Dotted chart analysis available in ProM [4] but other views are created based on the
systems and processes available for the embedded system development. We present the
most important views in more detail in the following. For brevity, we also discuss anal‐
ysis ﬁndings from the industrial studies at the same context.

Fig. 4. PROMOTE WebUI views

PROMOTE: A Process Mining Tool for Embedded System Development

533

Project status is intended for showing an overall project status and it can be
displayed in large info screens.
Process ﬂow graph shows information about traces between events in one or multiple
systems. It shows how diﬀerent systems are linked to each other and reveals the actual
workﬂow. In addition, the graph is used to analyze how the actual and planned workﬂow
diﬀers. WebUI includes a process ﬂow graph for the whole development process or a
single system. Figure 5 depicts the process ﬂow of Kactus2 including creation and
closing of Redmine issues and SVN commits. Amount of transitions is shown on top of
each transition.

Fig. 5. Process ﬂow example from Kactus2. (Color ﬁgure online)

Analysis of industrial CI system revealed that build pipelines were not utilized at all.
In practice, builds are triggered by VCS commits or by predeﬁned schedule, and a single
commit can cause several builds to be executed in parallel. Pipelines should be used to
ensure that most important builds are executed ﬁrst and build failures are detected as
early as possible.
Scatterplot provides a way to investigate events in a temporal and logical manner.
It is intended to everyone to analyze their own behavior. The vertical axis contains task
IDs and the horizontal axis the timeline. Each task can include several events such as
Create issue, Assign, and Close issue. Each type of event is painted in individual color.
Issue ﬁltering helps ﬁnding causalities. WebUI includes scatterplots for the following
categories: Issue tracking, Testing, Reviews, Requirements, VCS, and CI.
Figure 6 depicts the review events for over 1200 reviews in industrial projects using
Review Board. Several anomalies were discovered. For example, Review Board is not
used in the same way in all projects. Some reviewers mostly add comments instead of
opening defects, while some only open defects. A common guideline is clearly needed.
Occasionally the time between opening and closing of the review is quite long. In most
of the cases the review has ended but forgotten to be closed. Thus, the review looks to
have taken even several months. It could also be seen that the review was left open on
purpose and defects were ﬁxed later. However, open defects may cause other problems.
Most of the reviews are completed within few days and reviews are rarely reopened.
Interestingly quite many reviews are discarded because the review had become obsolete
or the review had been created incorrectly.
Timeline collects all events in the embedded system development project to provide
a one glance overview. Timeline is mainly intended for the project managers to analyze
overlapping tasks and compare planned and realized schedule. Timeline for an industrial
development project is shown in Fig. 7. Timeline includes VCS tags (vertical lines),
VCS commits (brown dots), VCS merge commits (purple dots), CI builds (grey dots),

534

A. Leppäkoski and T.D. Hämäläinen

Fig. 6. Scatterplot for review events from an industrial study.

tasks (green horizontal lines), scrum stories (red horizontal lines), reviews (purple hori‐
zontal lines) and scrum sprints (blue boxes on the background).

Fig. 7. Project timeline from an industrial study. (Color ﬁgure online)

Timelines of the industrial projects show that reviews are not always completed
before the sprint ends. Some have taken even three sprints. We suggest to complete in
the same sprint so that the scrum story can be closed at the end of the sprint. In addition,
reviews should be small enough to take only few days, and time should be allocated for
upcoming reviews.
Heat map illustrates usage of the system in question. Darkness of the color indicates
the amount of the activities for the current hour. WebUI includes heat maps for following
categories: VCS, Issue tracking, CI, and Review. With heat maps it is possible to inves‐
tigate when the system is accessed most often and when people are working. Target
group for the heat maps are project managers and system administrators.
Figure 8 illustrates the usage of VCS in one large industrial project including over
1500 commits during two years. In this case VCS is used quite evenly from Monday to
Thursday but on Friday the amount of commits increases heavily. Further analysis indi‐
cates that this is most likely caused by the weekly release schedule, which requires that
a new SW version must be released on every Friday. This leads into situation that new
features are committed into VCS as late as possible, which causes problems in testing
that is executed after the release. Thus, developers should commit source codes evenly

PROMOTE: A Process Mining Tool for Embedded System Development

535

on every weekday, and the release schedule should be modiﬁed so that the weekly release
would be done on Wednesday or latest on Thursday.

Fig. 8. VCS heat map from an industrial study.

Figure 9 illustrates the usage of CI server. CI heat map and CI process ﬂow indicates
that CI server is executing scheduled builds during night. By analyzing the heat map it
is recommended that nightly builds should be divided more evenly for the whole night
to avoid overloading. In addition, dependency between the amounts of CI builds and
VCS commits depicted in Fig. 8 is clearly visible and highlights that events in one system
aﬀects other systems. This further advocates for the usage of CI pipelines.

Fig. 9. CI heat map from an industrial study.

Project members graph shows which persons have participated into which project.
The graph is used to check who are familiar with certain projects and helps illustrating
development organization. In addition, this graph is used to ﬁnd out which persons are
in key roles in the organization.
Social network is used to investigate how users collaborate, who are working
together and what kind of hierarchy exists in the organization. In addition, social
networks are used to discover roles of the project participants [4].
Target group for the social network graph are project managers, team leaders, human
resources, and stakeholders interested in collaboration between persons and teams.
Figure 10 illustrates the social network for the Kactus2 project especially for the issue
assignment. This particular social network implies that one user has a bigger role than
others when assigning tasks. It is notable that most of the assignments are done by the
users for themselves which reﬂects that the issue management should have more atten‐
tion. Scatterplot indicates that the issues are not prioritized, only categorized using target
version. In addition, Fig. 10 shows that one person has much more tasks than other
persons, which indicates that workload should be divided more evenly.

536

A. Leppäkoski and T.D. Hämäläinen

Fig. 10. Social network example from Kactus2.

Analysis of the social network for industrial projects shows that developers tend to
use same reviewers for all of their reviews. It would be beneﬁcial to use a bigger group
for reviewing source codes to ensure high review quality and coverage. Developers
should also include persons from other teams as reviewers. This would help spreading
information, knowledge, and practices between developers and teams.
Personal skills view illustrates the skills of the individuals in the form of radar charts.
The charts are created for the usage of tools and for programming skills. The results can
be used e.g. for identifying if some developer needs training, ﬁnding out the best candi‐
date for a new task or to investigate the overall usage of tool. The personal skills view
is targeted for the project managers and team leaders, but also for the other project
members for self-examination. Figure 11 shows an example of skills in usage of
Redmine in the Kactus2 project for a single user (left) and all users (right). It can be
seen that Redmine is utilized quite well in the Kactus2 project excluding feature Add
news, but a single user has used only a portion of all features.

Fig. 11. Redmine skills for single user and for all users from Kactus2.

4

Evaluation

PROMOTE has been piloted in Kactus2 and four industry projects including 15 devel‐
opers and test specialists. Table 1 lists contents of the latter. Events in the diﬀerent
systems were linked together with issue management IDs and VCS commit IDs.
PROMOTE was run on top of 64-bit 2 GHz Ubuntu 14.04 server with 1 GB of RAM
and 16 GB of hard-drive space.

PROMOTE: A Process Mining Tool for Embedded System Development

537

Table 1. Systems included into industrial pilot
System
7 GIT repositories
VersionOne (Issue tracking system)
Jenkins (CI system)
Review board

Content of the analysis
7185 commits, 1205 branches
10 projects, 20 epics, 221 backlog items, 606 tasks
35 jobs, 2586 builds
1247 reviews

Kactus2 utilizes the following systems: Subversion (2084 commits), Redmine (255
issues), and SourceForge. Time needed for the capturing and conversion was about
10 min. The conclusion is that Kactus2 is a small and agile project with ﬂexible mile‐
stones rather than strictly scheduled and prioritized traditional SW project. In large
industrial setups, time for the capturing and conversion is signiﬁcantly higher. The ﬁrst
execution of the PROMOTE Log handler commonly requires several hours. Follow-up
executions are much faster, typically taking 5–10 min, because only the new events
needs to be captured. Size of the PostgreSQL database generated in industrial pilot was
5 MB, but can be 100 MB in larger setups.
The user acceptance of PROMOTE has been very good based on about 15 persons
test group. The WebUI fulﬁlls the requirements of visualizing and summarizing the
information gathered from several sources. It was noticed that compared to related work,
PROMOTE’s architecture provides faster and more straightforward way to automati‐
cally capture and analyze event logs from several sources. In addition, deployment of
PROMOTE in new environments has proven to be very easy because of an expandable
architecture, simple conﬁguration, and reusable scripts.

5

Conclusions

PROMOTE performed very well in the presented cases, in which usage of tools is
extensive, documentation is comprehensive, and traceability between systems has been
taken into consideration. PROMOTE will also be tested with less rigorous development
setups to disclose even bigger bottlenecks. Implementation of PROMOTE started in
2015, an industry pilot in spring 2016, and later in this year it will be taken into wider
use in the pilot company.
In the future, we will model the workﬂows using YAWL [15, 16] and compare with
the models captured by PROMOTE to adjust model accuracy. Implementation of
PROMOTE continues by adding support for systems such as Quality Center, and by
adding reasoning to automatically ﬁnd bottlenecks and suggest corrective actions. As a
conclusion, PROMOTE is promising even as such and will be immediately deployed in
a company, and will be used as the data and process mining part for workﬂow optimi‐
zation research.

538

A. Leppäkoski and T.D. Hämäläinen

References
1. Leppäkoski, A., Salminen, E., and Hamalainen, T.D.: Framework for industrial embedded
system product development and management. In: 2013 International Symposium on System
on Chip (SoC), pp. 1–6, 23–24 October 2013
2. Blaza, D., Wolfe, A.: Embedded Market Study, presentation on Design West, San Jose, Ca,
USA, 22–25 April 2013. http://presentations.ubmdesign.com/events/san-jose/2013. [Cited: 4
May 2013]
3. van der Aalst, W., Weijters, T., Maruster, L.: Workﬂow mining: discovering process models
from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004)
4. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of
Business Processes, p. 352. Springer, Heidelberg (2011). ISBN:978-3-642-19344-6
5. Aalst, W., et al.: Process mining manifesto. In: Daniel, F., Barkaoui, K., Dustdar, S. (eds.)
BPM 2011. LNBIP, vol. 99, pp. 169–194. Springer, Heidelberg (2012). doi:
10.1007/978-3-642-28108-2_19
6. Heikkinen, E., Hämäläinen, T.D.: LOGDIG log ﬁle analyzer for mining expected behavior
from log ﬁles. Accepted for 14th Symposium on Programming Languages and SW Tools,
SPLST 2015, Tampere, Finland, 9–10 October 2015
7. Andrews, J.H., Zhang, Y.: General test result checking with log ﬁle analysis. IEEE Trans. SW
Eng. 29(7), 634–648 (2003)
8. Poncin, W., Serebrenik, A., van den Brand, M.: Process mining SW repositories. In: 2011
15th European Conference on SW Maintenance and Reengineering (CSMR), pp. 5–14, 1–4
March 2011. ISBN:978-1-61284-259-2
9. Anwar, W.A., Moussa, A.S., Salah, A.: Extracting hidden information and conclusions in SW
testing via distributed relational visual mining. In: 2013 17th International Conference on
Information Visualisation (IV), pp. 527–531, 16–18 July 2013
10. Rubin, V., Lomazoma, I., van der Aalst, W.M.P.: Agile development with SW process mining.
In: ICSSP 2014 (2014)
11. Lehtonen, T., Eloranta, V., Leppänen, M., Isohanni, E.: Visualizations as a basis for agile SW
process improvement. In: 20th Asia-Paciﬁc SW Engineering Conference (2013)
12. Tampere University of Technology, Kactus2 (2016). http://funbase.cs.tut.ﬁ. [Cited: 6 June
2016]
13. J. Gatcke: HTML Kickstart HTML Elements Documentation (2016). http://www.
99lime.com/elements/. [Cited: 6 June 2016]
14. Bostock, M.: D3.js - Data-Driven Documents (2016). http://d3js.org. [Cited: 6 June 2016]
15. The YAWL Foundation: YAWL (2015). http://www.yawlfoundation.org/. [Cited: 6 June
2016]
16. Adams, M., ter Hofstede, A.H.M., La Rosa, M.: Open source SW for workﬂow management:
the case of YAWL. IEEE SW 28(3), 16–19 (2011). 0740-74

Evaluation of Kano-like Models Deﬁned
for Using Data Extracted from Online Sources
Huishi Yin and Dietmar Pfahl(&)
Institute of Computer Science,
University of Tartu, J. Liivi 2, 50409 Tartu, Estonia
{huishi,dietmar.pfahl}@ut.ee

Abstract. The Kano model is a frequently used method to classify user preferences according to their importance, and by doing so support requirements
prioritization. To implement the Kano model, a representative set of users must
answer for each feature under evaluation a functional and dysfunctional question. Unfortunately, ﬁnding and interviewing users is difﬁcult and
time-consuming. Thus, the core idea of our proposed approach is to extract
automatically opinions about product features from online open sources (e.g.,
Q & A sites, App reviews, etc.) and to feed them into the Kano questionnaire to
prioritize software requirements following the principles of the Kano model.
One problem with our proposed approach is how to pair input extracted from the
internet into paired answers to the functional dysfunctional questions. This
problem arises because the reviews and comments from online sources that we
plan to transform into answers to either the functional or dysfunctional question
are usually unpaired. Therefore, the aim of this study is to ﬁnd a method that
produces results resembling those of the traditional Kano model although we
only retrieve partial information. We propose two Kano-like models, i.e., the
Half- and the Deformed-Kano model, for unpaired answers to functional and
dysfunctional questions. In order to analyze the performance of the two proposed models as compared to that of the traditional Kano model, we run several
simulations with synthetic data. Then we compare the simulation results to see
which Kano-like model produces results that are similar to those of the traditional Kano model. The simulation results show that on average both the
Half-Kano and Deformed-Kano models on average generate feature categorizations similar to those of the traditional Kano model. However, only the
Deformed-Kano model generates the same range of categorizations as the traditional Kano model. The Deformed-Kano can be used as an approximation of
the traditional Kano model when the input is unpaired or partly missing.
Keywords: Kano model

 Requirement prioritization  Online source

1 Introduction
In software product management, requirement prioritization is often used for determining which candidate requirements of a feature should be included in a software
release. Requirements are also prioritized to minimize risk during development so that
the most important or low-risk requirements are implemented ﬁrst [1, 2]. Several
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 539–549, 2016.
DOI: 10.1007/978-3-319-49094-6_39

540

H. Yin and D. Pfahl

methods for classifying and prioritizing software requirements exist. The Kano model
is one of the best-known and frequently used methods to do this.
The Kano model was developed by Noriaki Kano in the 1980s [3]. It deﬁnes the
relationship between user satisfaction and product features. Since the Kano model can
be used to prioritize user needs as a function of customer satisfaction, it is one of the
most popular methods that address customer needs prioritization. The traditional Kano
model deﬁnes ﬁve categories of user needs that have different effects on user satisfaction. Those are O, A, M, I, R1 [4–8]. Since it is possible to receive contradictory
responses from customers, the category Q (Questionable) is also an option.
In the context of a larger research project, we plan to use advanced machine
learning and data mining techniques to help us extract systematically user needs from
online open sources for complementing traditional sources for the elicitation and prioritization of software requirements. In order to fuel the information generated in this
way into requirements prioritization activities, we intend to use the principles of the
Kano model to classify user needs according to their importance. In this paper, we do
not report on the automatic elicitation and sentiment analysis of user needs voiced in
online open sources. In the following we rather focus on how to use the ideas and
principles of the Kano model when the information extracted does not correspond to
paired answers of the Kano questionnaire.

2 Related Work
Since the 1980s when the Kano model was ﬁrst introduced, it has become a popular
theory used by researchers and business practitioners across many industries. After an
extensive review of the literature on the Kano model, Josip and Darko summarized and
evaluated ﬁve methods, which classify quality features into the categories deﬁned by
the Kano model [9], however, in a different way than Noriaki Kano proposed. The
methods analyzed were the original Kano model developed by Noriaki Kano [3], the
“Penalty reward contrast analysis” (PRCA) originally proposed by Brandt [10], the
“Importance grid” developed by IBM [11], the “Qualitative data methods” including
CIT (critical incident technique) developed by Herzberg and ACC (analysis of complaints and compliments) used by Cadotte [12], Oliver [13], Friman, and Edvardsson
[14], and the “Direct classiﬁcation” method proposed by Emery and Tian [15]. Among
those ﬁve methods, only CIT and ACC have the same assumption, i.e. that “quality
features can be categorized by comparing how frequently customers mention it in a
positive context or a negative context” [9]. However, the reliability of both CIT and
ACC methods, as compared to the Kano model, remains questionable when the frequencies with which customers mention features are low. According to Josip and
Darko’s research, the Kano model and the direct-classiﬁcation method are the only
methods capable of classifying features.

1

O = One-dimensional Quality, A = Attractive Quality, M = Must-be Quality, I = Indifferent
Quality, R = Reverse Quality.

Evaluation of Kano-like Models

541

Based on the analysis of the related work, we concluded that the traditional Kano
model is the best approach to elicit future customers’ perceptions regarding a product’s
features.

3 Research Goal and Method
To apply the Kano model on data extracted from online open sources, we assume that
we have already ﬁltered out the sentiment information expressing a person’s feeling
from online reviews, comments, and questions, and that we have translated the sentiment information into a data set similar to the format of the Kano model. For example,
the statement “I dislike X very much!” represents a very negative answer to the
functional question, and “I would be very happy if there is no function X.” represents a
very positive answer to the dysfunctional answer regarding feature X. We put all
answers to the functional questions in one “Yes” (Y) vector, and all answers to the
dysfunctional questions in one vector “No” (N) vector.
One of the biggest problems we are facing is how to pair the required input to the
Kano model without conducting interviews with real people giving answers to both
functional and dysfunctional questions. The reviews and comments from an online
source are usually unpaired so that we cannot process the data following the traditional
Kano model. Because of this, we need to design Kano-like model algorithms for
processing unpaired data.
In this paper, we propose two kinds of Kano-like models, i.e., the Half-Kano model
and the Deformed-Kano model. The goal of our study is to compare the performance of
the Kano-like models with that of the traditional Kano model and to decide which of
the two Kano-like models behaves more similar to the traditional Kano model. Our
research method uses simulation experiments with synthetic data (see Sect. 5 for
details).

4 Kano-like Models
The two Kano-like models we propose differ in the way how they interpret the unpaired
answers derived from online open sources. The assumption of the Half-Kano model is
that we only have either answers to the functional question or answers to the dysfunctional question. The assumption of the Deformed-Kano model is that answers to the
functional and the dysfunctional questions are from the same group of people, even
though we lost the links between answers. One output of the traditional Kano model will
only contain one speciﬁc category to which a feature is classiﬁed to. However, in our
study, to be able to compare with the output of the Kano-like models, we use the
probability of each category that one feature is categorized into instead of the one
category to which the feature is most frequently categorized into. For example, if we get
ﬁve paired answers about one feature, and each paired answer leads to one category,
then we have a list of ﬁve categories (e.g., M, M, A, M, O). The traditional Kano model
output is that this feature is classiﬁed to category M. However, in our study, we say the
output is that this feature is 60 % classiﬁed to category M, 20 % to A, and 20 % to O.

542

4.1

H. Yin and D. Pfahl

Half-Kano Model

To implement the Kano-like models on unpaired data, we assume an extreme case, i.e.,
that each time when we have an interview with our interviewees, we ask only functional questions or only dysfunctional questions relating to one software feature, hence
we only get two groups of responses for functional and dysfunctional questions from
different interviewees. In this case, we cannot use two responses from different interviewees to classify one person’s satisfaction and based on that derive the satisfaction
category the software feature belongs to. However, we can implement an algorithm
which calculates probabilities with which a software feature would be classiﬁed based
on the responses for the functional and the dysfunctional questions. Since the data in
the Y and the N vectors are not matched, the Half-Kano model is not a traditional Kano
model. Nevertheless, we calculate the probabilities following the traditional Kano
model. The difference is that in this method, we use each signal value from Y and N
vectors to derive a satisfaction category. Figure 1 shows an example of the process of
how the Half-Kano model processes the Y and N vectors when the unpaired input is
“1” in the Y vector, and “−2” in the N vector. In this case, we can say that with a
probability of 40 % this feature should be classiﬁed to the M category, with 30 %
probability to the I category, and with 10 % probability each to the R, O, and Q
categories.

Fig. 1. An example of the process of Half-Kano
model (2 = I like it, 1 = It must be like it, 0 = I
am neutral, −1 = I can tolerate it, −2 = I dislike
it.)

Fig. 2. An example of the process of
Deformed-Kano model

The algorithm of probability (P) that vector Y (Functional) and vector N (Dysfunctional) are categorized to the same category (X) can be written as

Evaluation of Kano-like Models

Pm
PðcatðYÞ ¼ catðNÞ ¼ XÞ ¼

i¼1

Fx catðYðiÞÞ þ

Pn
j¼1

543

Fx catðNðjÞÞ

ð m þ nÞ  5

and


1 if YðiÞ ¼ 2
0 if YðiÞ 2 f2; 1; 0; 1g

1 if YðiÞ 2 f1; 0; 1g
FM catðYðiÞÞ ¼
0 if YðiÞ 2 f2; 2g
8
4 if YðiÞ ¼ 2
>
<
FO catðYðiÞÞ ¼

FR catðYðiÞÞ ¼

1 if Y ðiÞ 2 f1; 0; 1g
>
:
0 if YðiÞ ¼ 2


3 if YðiÞ ¼ 2
0 if YðiÞ 2 f2; 1; 0; 1g

3 if YðiÞ 2 f1; 0; 1g
FI catðYðiÞÞ ¼
0 if YðiÞ 2 f2; 2g

FA catðYðiÞÞ ¼


FQ catðYðiÞÞ ¼

1 if YðiÞ 2 f2; 2g
0 if YðiÞ 2 f1; 0; 1g

and


1 if NðjÞ ¼ 2
0 if NðjÞ 2 f1; 0; 1; 2g

3 if NðjÞ ¼ 2
FM catðNðjÞÞ ¼
0 if NðjÞ 2 f1; 0; 1; 2g
8
4 if NðjÞ ¼ 2
>
<
FR catðNðjÞÞ ¼ 1 if NðjÞ 2 f1; 0; 1g
>
:
0 if NðjÞ ¼ 2
FO catðNðjÞÞ ¼


1 if NðjÞ 2 f1; 0; 1g
0 if NðjÞ 2 f2; 2g

3 if NðjÞ 2 f1; 0; 1g
FI catðNðjÞÞ ¼
0 if NðjÞ 2 f2; 2g
FA catðNðjÞÞ ¼


FQ catðNðjÞÞ ¼

1 if NðjÞ 2 f2; 2g
0 if NðjÞ 2 f1; 0; 1g

and Fx is a function that maps the statement X to the set {0,1,3,4}
Fx : X ! f0; 1; 3; 4g
where
i 2 f1; 2; 3. . .mg
j 2 f1; 2; 3. . .ng
X 2 fO; A; M; I; R; Qg
m is the number of values of Y vector
n is the number of values of N vector
4.2

Deformed-Kano Model

In the Deformed-Kano model, we assume that the responses are from the same group of
people, but we lost the links between answers to functional and dysfunctional
questions.

544

H. Yin and D. Pfahl

We sequentially pick a number of the Y vector to combine with each number of the
N vector to derive the satisfaction categories, and then we get a list of satisfaction
categories. After each value in the Y vector has been combined with all values of the N
vector, we calculate the overall proportion of the appearance of each category. Figure 2
shows the example of the process of the Deformed-Kano model when the unpaired
input is “2, 1” in Y vector, and “−1, −2” in N vector. The output is that this feature is
25 % classiﬁed to category M, 25 % to A, and 25 % to O, and 25 % to I.
The algorithm of probability (P) that vectors Y (Functional) and N (Dysfunctional)
are categorized to the same category (X) can be written as
PðcatðYÞ ¼ catðNÞ ¼ XÞ ¼

Pm Pn
i¼1 ð
j¼1 xx ðcatðYðiÞÞ  catðNðjÞÞÞÞ
mn

and
8
O if Y ðiÞ ¼ 2 and N ðjÞ ¼ 2
>
>
>
>
>
A if Y ðiÞ ¼ 2 and N ðjÞ 2 f1; 0; 1g
>
>
>
< M if Y ðiÞ 2 f1; 0; 1g and N ðjÞ ¼ 2
catðYðiÞÞ  catðNðjÞÞ ¼
> I if Y ðiÞ 2 f1; 0; 1g and N ðjÞ 2 f1; 0; 1g
>
>
>
>
> R if Y ðiÞ 2 f2; 1; 0; 1g and N ðjÞ ¼ 2 jj Y ðiÞ ¼ 2 and N ðjÞ 2 f1; 0; 1g
>
>
:
Q if YðiÞ ¼ 2 and NðjÞ ¼ 2 jj YðiÞ ¼ 2 and NðjÞ ¼ 2

and xx is a function that map the statement X to the set {0,1}
xx : X ! f0; 1g
where
i 2 f1; 2; 3. . .mg
j 2 f1; 2; 3. . .ng
X 2 fO; A; M; I; R; Qg
m is the number of values of Y vector
n is the number of values of N vector

5 Simulation Study
Simulation Input: There are 31 possible value sets2 both in the Y and N vectors. For
example, value set ID No.1 indicates that the Y and N vectors only contain elements
with value ‘−2’. Value set ID No.31 indicates that both vectors contain all possible
values, i.e., ‘−2, −1, 0, 1, 2’.
2

C51 þ C52 þ C53 þ C54 þ C55 ¼ 31.

Evaluation of Kano-like Models

545

Simulation Approach: We use the R language to execute the simulation algorithms
we proposed in Sects. 4 and 5. We ﬁrst set the length of Y and N vectors equals to 20,
and these 20 numbers are picked from each possible value set to simulate responses of
one feature. We combined all 31 possible value sets of Y and N vectors. The total
number of possible ways to combine the Y with the N vector value sets is 31 *
31 = 961. In each round simulation, for each combination of value sets of Y and N, we
sample data randomly following a chosen distribution, e.g., uniform distribution. Then
we run traditional Kano and Kano-like models ﬁve times respectively. Next, we calculated the average value of those who have the same value set ID of Y and N vectors
and join them together to ﬁnally get a table which contains 961 rows and 20 columns
(Value set ID of Y and N vectors plus PO, PA, PM, PI, PR, and PQ for the traditional
Kano, Half-Kano, and Deformed-Kano model, respectively).
Simulation Hypothesis 1: Deformed-Kano model generates more similar output to the
traditional Kano than the Half-Kano model.
We pick the data from one out of 961 rows to show an example of the way to
calculate the difference between the traditional Kano model and the Kano-like models.
Table 1 shows the way to calculate the difference between traditional Kano and
Kano-like models, and the calculation results are shown as well. When we calculate the
absolute value of the difference between the two sets of data (Traditional and Half or
Traditional and Deformed), the range is 0 to 200 %. Hence, we divide the absolute
value by 2 to get the result in the range from 0 to 100 %.
Table 1. An example of the difference between the traditional Kano model and the Kano-like
models
Traditional (%)
Half (%)
Deformed (%)
Difference (%)

PO
PA
PM
PI
PR
0
20
0
45
25
3
15.5
7
40.5
27.5
0
19.5
0
45.5
24.5
Traditional − Half = (|0 − 3| + |20 − 15.5| + |0 − 7| + |
45 − 40.5| + |25 − 27.5| + |10 − 6.5|) /2 = 12.5
Traditional − Deformed = (|0 − 0| + |20 − 19.5| + |
0 − 0| + 45 − 45.5| + |25 − 24.5| + |10 − 10.5|) /2 = 1

PQ
10
6.5
10.5

The lower value of difference represents closer output to traditional Kano model. In
the case shown in the Table 1, we can see the Deformed-Kano model’s output is closer
to the output of the traditional Kano model (difference = 1 %) than the output of the
Half-Kano model (difference = 12.5 %).
The ranges and means of the differences between the outputs of the traditional Kano
model and the Kano-like models are shown in Table 2. We can see from Table 2 that
the range of difference between the traditional and Half-Kano models varies from
10.5 % to 80 %, which is much higher than the range of differences between the
traditional and Deformed-Kano models, which is 0 % to 18.74 %. The means show the
same trend. 25.99 % between the traditional and Half-Kano models and 4.28 %
between the traditional and Deformed-Kano models.

546

H. Yin and D. Pfahl

Table 2. The range and means of the difference between the outputs of Traditional Kano model
and the Kano-like models
Traditional-Half (%) Traditional-Deformed (%)
Ranges [10.5, 80]
[0, 18.74]
Means 25.99
4.28

To see more clearly the distribution of the differences of outputs between the
traditional Kano model and the Kano-like models, we draw 3D and 2D ﬁgures.
Figures 3 and 4 show that the Deformed-Kano model shows outputs which have lower
differences with the outputs of the traditional Kano model.

Fig. 3. The projection of the distribution of differences between the traditional Kano and the
Kano-like models on Y vector plane

Fig. 4. The projection of the distribution of differences between the traditional Kano and the
Kano-like models on N vector plane

Simulation Hypothesis 2: The Deformed-Kano model provides similar outputs as the
traditional Kano model.
According to the simulation results, we found that when the input value of the Y
vector or the N vector belongs to {−2}, {−1}, {0}, {1}, {2}, {−2, 0}, {−2,1}, {0, 1},
{−2, 0, 1}, the difference always equals zero, which means 477 out of 961 (49.6 %)
output combinations of Deformed-Kano and traditional Kano model show no
difference.

Evaluation of Kano-like Models

547

When the input value of Y vector and N vector does not belong to {−2}, {−1}, {0},
{1}, {2}, {−2, 0}, {−2,1}, {0, 1}, {−2, 0, 1}, the difference will always be more than
zero. The simulation results show that 484 out of 961 (50.4%) combinations show
differences between the outputs of the traditional Kano and the Deformed-Kano model
with a range in 1% to 18%. In addition, the average values are less than 11 %.

6 Threats to Validity
There are several limitations and threats to validity linked to our simulation study. First
of all, both Half-Kano and Deformed-Kano models are not fulﬁlling the requirements
of the traditional (interview-based) Kano model, because the input taken from online
sources will most of the time not generate paired answers to functional and dysfunctional questions, and the number of answers extracted from online sources will not be
balanced. The latter point is particularly limiting our study as comparisons with the
traditional Kano model could only be done with sets of balanced data. Thus, we had to
restrict our simulation experiments to cases of balanced data. Another limitation is the
choice of distributions used in our simulation experiments. Since we have not yet
started with extracting real data from online sources, we do not know what empirical
distributions of values in the Y and N vectors are realistic. Therefore, we chose a
neutral approach and sampled from uniform distributions in our simulation experiments. Finally, we noticed that the relative amount of questionable categorizations
might deviate from the typical numbers when using the traditional Kano model.
However, since we exclude questionable data from the comparisons between the
Kano-like and traditional Kano model, this limitation does not influence our comparison results.
It should be mentioned that the comparison of the Kano-like models used in our
simulation study with the traditional Kano model is not based on a single category
resulting from the majority of categorizations per interview (in the traditional Kano
model) and per vector matching (in the Kano-like models). Instead, we compare the
distributions (expressed as probabilities) categorizations per feature. Although this is a
deviation from the procedures used by the traditional Kano model, our approach could
be considered as giving richer output. Assume, for example, an extreme case where you
get 1000 paired answers, with 501 answers leading to category ‘M’ and 499 answers
leading to category ‘I’. The output of the traditional Kano model is that this feature
should be categorized as “Must be” (M). If only this ﬁnal categorization is conveyed,
one will not know that 49.9 % of the interviewees considered this feature to be
“Indifferent” (I).

7 Discussion
It is clear from our simulation study that the Deformed-Kano model produces outputs
that are more similar to those of a traditional Kano model than what the Half-Kano
model produces.

548

H. Yin and D. Pfahl

Although the Deformed-Kano model only partly works like the traditional Kano
model, based on our simulation experiments, we found that 49.6 % of its outputs have
no difference to the outputs of the traditional Kano model, and for those 50.4 % outputs
showing a difference, the differences are very small, with an average of less than 11 %.
Although the Deformed-Kano model does not work exactly like the traditional
Kano model, it has several advantages. Firstly, it can handle unpaired and unbalance
input (Y and N vectors). Secondly, the ability to process unpaired and unbalanced data
but nevertheless producing similar results than the traditional Kano model makes the
analysis of user preferences cheaper as all steps can be automated and now costly
interviews are needed. Thirdly, it is difﬁcult to guarantee the representativeness of the
opinions voiced by a small set of selected interviewees. Using data from online sources
has the potential of generating a more complete and thus more realistic input to the
analysis of user preferences.

8 Conclusions
According to our simulation experiments, we ﬁnd that the results of using the
Deformed-Kano model are always close to the results of the traditional Kano model.
Because of that, we consider the Deformed-Kano model to be a good approximation of
the traditional Kano model. Moreover, the Deformed-Kano model can be used even
when the input is unbalanced or partly missing. Therefore, we believe that the low cost
of using the Deformed-Kano model combined with the possibility to use unbalanced
data compensates for the potential lack of paired data when comparing with the traditional Kano model.
Acknowledgement. This research was supported by the Estonian Research Council.

References
1. Lehtola, L., Kauppinen, M., Kujala, S.: Requirements prioritization challenges in practice.
In: Bomarius, F., Iida, H. (eds.) PROFES 2004. LNCS, vol. 3009, pp. 497–508. Springer,
Heidelberg (2004). doi:10.1007/978-3-540-24659-6_36
2. Berander, P., Andrews, A.: Requirements prioritization. In: Aurum, A., Wohlin, C. (eds.)
Engineering and Managing Software Requirements, pp. 69–94. Springer, Heidelberg (2005)
3. Kano, N., Nobuhiku, S., Fumio, T., Shinichi, T.: Attractive quality and must-be quality.
J. Jpn. Soc. Qual. Control 14(2), 39–48 (1984)
4. Berger, C., Blauth, R., Boger, D., Bolster, C., Burchill, G., DuMouchel, W., Timko, M.:
Kano’s methods for understanding customer-deﬁned quality. Center Qual. Manage. J. 2(4),
3–35 (1993)
5. Jiao, J.R., Chen, C.H.: Customer requirement management in product development: a review
of research issues. Concurrent Eng. 14(3), 173–185 (2006)
6. Kai, Y.: Voice of the Customer: Capture and Analysis. McGraw-Hill, New York (2007)
7. Sharif Ullah, A.M.M., Tamaki, J.I.: Analysis of Kano-model-based customer needs for
product development. Syst. Eng. 14(2), 154–172 (2011)

Evaluation of Kano-like Models

549

8. http://foldingburritos.com/kano-model/. Accessed 15 June 2016
9. Mikulic, J., Prebezac, D.: A critical review of techniques for classifying quality attributes in
the Kano model. Managing Serv. Qual. An Intl. J. 21(1), 46–66 (2011)
10. Brandt, R.D.: A procedure for identifying value-enhancing service components using
customer satisfaction survey data. In: Surprenant, C. (ed.) Add Value to Your Service,
pp. 61–65. American Marketing Association, Chicago (1987)
11. Vavra, T.G.: Improving Your Measurement of Customer Satisfaction: A Guide to Creating,
Conducting, Analyzing, and Reporting Customer Satisfaction Measurement Programs. ASQ
Quality Press, Milwaukee (1997)
12. Cadotte, E.R., Turgeon, N.: Dissatisﬁers and satisﬁers: suggestions from consumer
complaints and compliments. J. Consum. Satisfaction Dissatisfaction Complaining Behav.
1, 74–79 (1988)
13. Oliver, R.L.: Satisfaction: A Behavioral Perspective on the Consumer. McGraw-Hill Series
in Marketing Show all Parts in this Series. McGraw-Hill, New York (1997)
14. Friman, M., Edvardsson, B.: A content analysis of complaints and compliments. Managing
Serv. Qual. Intl. J. 13(1), 20–26 (2003)
15. Emery, C.R., Tian, R.G.: Schoolwork as products, professors as customers: a practical
teaching approach in business education. J. Educ. Bus. 78(2), 97–102 (2002)

Log File Analyzing in Intelligent
Transportation Systems Development
Esa Heikkinen(&) and Timo D. Hämäläinen
Department of Pervasive Computing, Tampere University of Technology,
P.O. Box 553, 33101 Tampere, Finland
esa.heikkinen@student.tut.fi, timo.d.hamalainen@tut.fi

Abstract. Intelligent Transportation Systems (ITS) consist of a large number
of vehicles and stop monitors, as well as operations management center and
servers. Their development can be challenging due to 3rd party black-box
components and limited debugging visibility. Our solution is log ﬁle analysis.
We developed a tool framework called LOGDIG, which differs from related
work by supporting also very complex system behaviors discovered by recurrent
and backward processing of the log ﬁles. The tool was successfully used to ﬁnd
and ﬁx faulty timing of bus stop monitor information in a product called ELMI.
Keywords: Log ﬁle analysis  Systems engineering  Data mining  RTPIS  ITS

1 Introduction
This paper presents our experiences in developing an Intelligent Transportation System
(ITS) product called ELMI [1]. It included real-time bus tracking and bus stop monitors
displaying time of arrival estimates. ELMI is based on proprietary radio network and
involves parts from several vendors, which complicates testing on the ﬁeld. Log ﬁle
analysis helped us discovering the real vs. expected behavior of the black-box parts,
which helped ﬁxing the open ELMI parts. The ﬁrst log tools were TCL scripts but
eventually we ended up with a general purpose tool framework called LOGDIG.
The new contributions in this paper are (i) using feedback from log analysis for
continuous system development improvement loops, (ii) using a log generator to create
test logs and expand real data for better problem tracking and (iii) using LOGDIG
framework to analyze behavior of the case study ELMI.
This paper is organized as follows. Section 2 describes the motivation and related
work, and Sect. 3 the log ﬁle analysis in ELMI as complex behavior mining. Section 4
presents LOGDIG analyzer with generated data and in Sect. 5 using real case study.
The paper is concluded in Sect. 6.

2 Motivation and Related Work
In ELMI, we have applied a continuous development process combining Systems
Engineering [2, 3], CRISP-DM [4] data mining process and event-driven development
[5, 6]. Figure 1 depicts an overview (A) and a detailed (B) model of our development
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 550–559, 2016.
DOI: 10.1007/978-3-319-49094-6_40

Log File Analyzing in ITS Development
A. Simple model

B. Detailed model

E
K
E

A
S

5.

6.

7.

2.

1.

3.

4.

U
A M
SysLoop
AnaLoop
UserLoop

S U G

551

K = Knowledge base of system
E = Evaluation and engineering
process of system
A = Analyzing process (automatic
and manual)
S = ELMI system process
U = Users of system
M = Model of analyzing
L = Log files of system
G = Guidelines to users
I = System Info
P = Preprocessing

I
L

P

Feedbacks

P

Improvements

Fig. 1. Models of improvement loops in ELMI continuous development process

process. The knowledge base and the engineering process are out of scope, but we
focus on the following loops.
UserLoop is based on end users’ feedback that we call bug reports independent of
the form. Understanding and reasoning can take much time and is sensitive to
misunderstanding, but this is deﬁnitely the last source if nothing else is available.
SysLoop is based on an automated monitor and can detect and report misbehavior
for a system operator. The report is manually processed and sent to the engineering
process. This is less prone to human errors than above. It should be noted that the
Table 1. Feedback features for engineering process
Feature of feedback
1. Knowledge response time (no delay)

UserLoop

SysLoop

AnaLoop

Low

High

Medium

2. Knowledge delivery interval

Low

Medium

Medium

3. Knowledge accurance and
probability for understanding

Low

Medium

High

4. Automation

Low

Medium

High

5. No data preprocessing

Low

Medium

High

6. Variety of knowledge

Medium

Low

High

7. Amount of knowledge

Low

Medium

High

Medium

High

Low

9. Analyzing complexity

Low

Medium

High

10. Final (physical) behavior
observations

High

Low

Low

11. Suitability for engineering process

Low

Medium

High

8. Analyzing small development time

552

E. Heikkinen and T.D. Hämäläinen

monitors may not be originally designed for the engineering process at all, which was
the case in ELMI.
AnaLoop can automatically and regularly analyze the log ﬁles for extracting the
expected behavior. Reports are automatically sent to the engineering process. Human
errors are not probable, because the knowledge is already in the format the engineering
process requires. Table 1 summarizes feasibility of the three feedback loops. The
AnaLoop is the best, as we learned to know in practice in ELMI as well.
Our focus is in discovering the realized behavior that appears as an execution trace
or sequential interdependent events in log ﬁles. Log ﬁle analyzers can be divided in two
groups. The ﬁrst group includes analyzers, like Synoptic [7], Invarimint [8] and Daikon
[9], which creates behavior models from the log ﬁles. The second group includes
analyzers, like LFA [10], Texada [11] and CEP type analyzers [5], which have a
built-in expected behavior model independent of the log ﬁles. The latter only checks if
the behaviors in the log ﬁles are expected or not. Similar approach is also presented by

Messages from bus

Description of
logs events

LOGIN

LOCAT

Messages to
monitor of busstop
TAT

AD

Search (or found) order of
traces A and B for busstop BS1
Start time of searching

Bus B1 sign up to bus line L1
in terminal busstop TBS1
Trace A

1.

3.

2.

Time

Bus B1 leaves busstop TBS1
TAT for monitor of busstop BS1
TAT for monitor of BS2
Bus B2 sign up to bus line L2
in terminal busstop TBS2

4.

AD correction for waitime of
monitor of busstop BS1

Trace B

5.

6.
7.

Bus B1 arrives busstop BS1
Bus B2 leaves busstop TBS2

8.

AD correction for waitime of
monitor of BS2
TAT for monitor of BS3
TAT for monitor of busstop BS1
Bus B1 arrives busstop BS2
AD correction for waitime of
monitor of BS3
Bus B2 arrives busstop BS3
AD correction for waitime of
monitor of busstop BS1

9.
11.

10.

Bus B2 arrives busstop BS1

Configuration
of ITS system
in the example:

TBS1

Stop time of searching
Bus B1

BS3

TBS2

BS1

BS2
Bus line L1

Bus B2

Legend: LOGIN (login to a bus line), LOCAT (location of a bus),
TAT (theoretical arrival time of bus for busstop), AD (advanced or delayed correction of TAT)

Fig. 2. Example of the search method of LOGDIG in ITS type log data

Bus line L2

Log File Analyzing in ITS Development

553

Maoz [12] that uses model-based traces as runtime models. Batch-job analyzers can be
slower, but few on them, like LOGDIG, can perform very complex analysis.
LOGDIG has also a special “versatile searching” feature, which we have not seen
in any other scientiﬁc analyzer. This means the possibility of backward searching as
presented in Fig. 2. Other scientiﬁc analyzers seem to read log lines (events) only by a
forward way and often row by row all the lines. LOGDIG can read only the selected
lines from the log ﬁles.
One challenge in log analysis is that the information might not be purposed for
error detection but e.g. for business, user or application context. This requires capability to interpret log information. In addition, there can be chains of sequential
interdependent events, for which simple statistical methods are not sufﬁcient [13]. This
requires state sequence processing. LOGDIG [14] supports this, because it is intended
for expected behavior mining. We have also proposed a Behavior Mining Language
(BML) to support these needs [15]. There are also commercial real-time analyzers, like
Apache Spark, but they are out of scope of this paper.

3 Complex Behavior Mining in LOGDIG
Figure 2 depicts an example of the search method of LOGDIG. For brevity we consider
the ﬁgure self-explaining. This is a typical example of ITS system logs from sources
like buses and central computer. The behavior is seen in the following sequential
events: (1) a bus has sent login (i.e. start) to the line, (2) the bus has left the terminal
bus stop, (3) the central computer has sent the ﬁrst waiting time estimation to a bus stop
monitor, (4) the central computer has sent the last time estimation to the bus stop
monitor and (5) the bus has arrived to the monitor’s bus stop. The searching has found
two chains of sequential events, which we call traces A and B. Figure 2 also shows
how the searching actually works in a versatile way. It is selective, because it rejects
non-essential traces, which are shown as grey arrows. It can also go backward, like in
arrows 2, 5, 6 and 10.

4 Testing LOGDIG with Generated Log Data
We have used both real and generated log data in testing LOGDIG. We discuss ﬁrst the
latter. To generate the test log ﬁles, we have implemented a tool called LogGen that
takes static and dynamic properties as parameters. The former is e.g. test area, bus stop
areas and bus lines. The latter are e.g. scheduling and speeding of buses, number of
buses, number of generated traces and timing of logs. Figure 3 depicts a visualized
example of the static properties in our example case, which consists of 4 bus lines, 4
terminal bus stops for direction A in left side of the test area, 3 terminal bus stops for
direction B in the right side and 6 bus stop monitors in the edges of the matrix area. The
sizes of all areas have been deﬁned as meters. Bus lines are links as between bus stops,
e.g. the red bus line goes through A1, M1, M3 and B1 bus stops.
The details of LogGen and LOGDIG are out of scope, but we present two example
sets with the help of Fig. 4: Short (A) and complex (B) traces. The log ﬁle data is

E. Heikkinen and T.D. Hämäläinen
Direction A

Test area (Width 1500 m, height 1000 m)
Busstop’s areas
(Width 40 m, height 40 m)

A1
A terminal busstops (number of stops: 4)

Direction B

B1
B terminal busstops (number of stops: 3)

554

A2
M1

M5
M3
B2

A3
M6

M4
M2

Matrix area for M monitor busstops
(width 300 m, height 300 m,
2 vertical and 3 horizontal monitors)

A4

B3

Fig. 3. Visualized example of the static properties in LogGen

BML

BML

BMU
Search order (Found)
ESU
1.
RTAT

ESU
2.
LOGIN

ESU
3.
LOCAT S

ESU
4.
LOCAT A

ESU
5.
AD

ESU
6.
LOGOUT

ESU
vars

ESU
vars

ESU
vars

ESU
vars

ESU
vars

ESU
vars

Start
time

ESU
vars

ESU
vars

BMS

Time order

Time order

Start
vars

B. EXAMPLE OF LONG AND COMPLEX ITS-TYPE TRACE

ESU
2.
LOGOUT

Review time window

Start
vars

Explanations:
Time window
for searching

Event

Trace

BML = Behavior Mining Language
BMU = Behavior Mining Unit
BMS = Behavior Mining Storage
ESU = Event Search Unit
vars = variables

Data
flow

Moving
of bus

Position
dimension

Data flow
dimension

BMU
Search order (Found)
ESU
1.
LOGIN

Time flow
dimension

Control flow
dimension

A. EXAMPLE OF SHORT TRACES

Leaving
from
area

Arriving
to area

Fig. 4. Example traces and structure of LOGDIG for searching

BMS

Log File Analyzing in ITS Development

555

depicted in the middle of Fig. 4 by the order of lines originally written to them. BMU
executes the search of desired events (ESU) speciﬁed in the BML language. The results
are traces from the log ﬁles as explained in the following.
In Example A, we want to search many short traces. The main objective is to
measure all lengths of line logging and check if they are too long. To achieve this,
LOGDIG (1) searches LOGIN event of a bus, (2) searches LOGOUT event of the bus
and (3) checks if the time difference between the events is too long. This is repeated as
many times as all LOGIN events has been read in the given review time window. In the
example we have four “line logging” traces. The results are output to a Static Behavior
Knowledge (SBK) formatted .csv ﬁle depicted in Table 2.
Table 2. The SBK ﬁle of example A

SBK_ID,BUS_ID,LOGIN,LOGOUT,DIFF,STATUS,CNT_OK,CNT_ERROR
1,1,2016-06-21 12:01:00,2016-06-21 12:11:00,600,ERR,0,1
2,2,2016-06-21 12:06:00,2016-06-21 12:15:00,540,OK,1,1
3,3,2016-06-21 12:08:00,2016-06-21 12:13:00,300,OK,2,1
4,1,2016-06-21 12:12:00,2016-06-21 12:17:00,300,OK,3,1

In Example B we want to search one complex and long trace. The main objective is
to measure the time difference between real arriving and estimated time of a bus to the
stop monitor. To achieve this, LOGDIG (1) searches Real Theoretical Arrival Time
(RTAT) event of the bus, (2) LOGIN event of the bus, (3) leaving time of the bus from
a terminal, (4) arriving time to the bus stop monitor, (5) last Advanced or Delayed
(AD) event of the bus, (6) LOGOUT event of the bus and (7) calculates time difference.
The results are written to the SBK ﬁle.
For brevity and clarity, the examples do not include all the details, like input
parameters and adjusting of the time windows during the searching. For example, it is
possible to search events only for certain bus or for all buses. A start and a stop time of
the time windows typically depends on the results of previous searches. This can be
seen as dotted horizontal lines between search states in Fig. 4.

5 Testing LOGDIG with Real Case Study
A severe bus stop monitor problem was discovered in ELMI, in which the arriving bus
symbol disappeared too early [16]. The real case is similar to example B in Fig. 4, but
with thousands of traces per day. Figure 5 depicts a screenshot of LOGDIG for the ﬁrst
9 traces from a bus line between 07:00–8:30. The searching time windows are presented as vertical lines, and horizontally the traces with the corresponding colors. We
can ﬁnd there some issues. Trace 4 had problems, because no arriving was found to the
stop monitor and the trace shape is different than others. There are maybe missing trace
and departure of a bus between traces 4 and 5. Traces 8 and 9 have been almost the
same and maybe two buses have been driving one after the other because of the
morning rush hours.

556

E. Heikkinen and T.D. Hämäläinen

Fig. 5. LOGDIG’s screenshot of real traces from ELMI logs

Some of the original data is unfortunately no longer available even if we know how
the problem occurred. Thus, we augmented the original data with some generated data
to expose the problem. The results are from one bus line (2132) per one day between
03:00–23:00, and we present only the most important results helping to ﬁx the problems. Tables 3 and 4 depicts the minimum, average and maximum errors of the waiting
times in ﬁve bus monitors (Sign ID). The waiting time errors were divided into 15, 10
and 5 min before arriving to a stop monitor and 0 min when a bus has just arrived to
the stop monitor. A positive waiting time error means that the bus has arrived too late,
and negative one means the opposite. The most important value is the average error
value for 0 min. After ﬁxes to ELMI, it has clearly been improved from 124 to 17 s.
Another important result in Fig. 6 is the follow-up list, which lists all “bad” bus lines,
buses and stop monitors. This means over 2 min waiting time errors. Figure 6 depicts
how the total number of errors has been decreased from 16 (8 + 2 + 6) to 9 (3 + 2 + 4)
after improvements. Also the percentage values are much smaller, e.g. in bus line
0021321 it has been decreased from 45,2 % to 2,4 %.

Log File Analyzing in ITS Development

557

Table 3. Waiting time errors before improvements
Num

Sign D L TAT-msg 15 minutes
10 minutes
5 minutes
0 minutes
ID
R C Amount min
ave
max
min
ave
max
min
ave
max
min
ave
max
-----------------------------------------------------------------------------------------------1: 1031 E 1
15
-12
66
308
-12
66
308
-12
71
290
85
131
184
2: 1033 W 1
62
-430
86
634
-434
137
608
-429
107
462
62
133
548
3: 1041 E 1
15
-22
59
278
-22
59
278
-22
60
157
63
129
179
4: 1061 W 1
77
-249
55
441
-145
104
412
-86
116
426
-165
127
540
5: 1071 E 1
12
-23
43
270
-23
43
270
-23
43
270
-23
38
141
-----------------------------------------------------------------------------------------------ALL
181
-430
66
634
-434
104
608
-429
100
462
-165
124
548

Table 4. Waiting time errors after improvements
Num

Sign D L TAT-msg 15 minutes
10 minutes
5 minutes
0 minutes
ID
R C Amount min
ave
max
min
ave
max
min
ave
max
min
ave
max
-----------------------------------------------------------------------------------------------1: 1031 E 1
15
-12
66
308
-12
66
308
-12
51
146
3
15
25
2: 1033 W 1
62
-430
45
543
-501
35
459
-517
0
376
1
22
427
3: 1041 E 1
15
-22
59
278
-22
59
278
-22
20
79
-10
10
29
4: 1061 W 1
77
-249
40
412
-190
30
412
-168
10
369
-246
17
405
5: 1071 E 1
12
-23
43
270
-23
43
270
-23
43
270
-23
10
121
-----------------------------------------------------------------------------------------------ALL
181
-430
46
543
-501
38
459
-517
13
376
-246
17
427

Fig. 6. Follow-up list errors, Original (left) and Fixed (right).

Figure 7 depicts the variation of waiting time errors from −5 to +8 min by 30 s
resolution for all stop monitors. For example, there have been 5 waiting times in 0 min
errors and 44 waiting times in +1,5 min errors.
After the improvements the results are clearly better. Almost all (155) waiting time
errors are in 0 min and the variance is much smaller. There are still some single
problem cases, like one bus arrived 7 min too late to the stop monitor.

558

E. Heikkinen and T.D. Hämäläinen

Fig. 7. Histogram of the waiting time errors in 0 min.

6 Conclusions
We have introduced an automatic, general purpose log-ﬁle analyzer framework
LOGDIG, which was used to improve the ELMI real time passenger information
system after the ﬁeld deployment. To our best knowledge, there is no other such log ﬁle
analyzer capable of backward and selective searches through the log ﬁles, which was
required to ﬁx ELMI. LOGDIG also automated many analysis tasks that are usually
carried out manually. The results are looped to the systems engineering process every
night, so the improvements can be started in the next working day. LOGDIG and the
tools are freely available at GitHub [17] as well as the generated test data used in this
paper. In the future, we will evaluate the feasibility of commercial log ﬁle analyzers
and study further generation of representative and generally applicable test logs.

References
1. Aaltonen, J.: Implementation of GPS based real time passenger information system.
Licentiate in Technology, Tampere University of Technology, pp. 1–76 (1998)
2. Wasson, C.S.: System Engineering Analysis, Design, and Development: Concepts,
Principles, and Practices. John Wiley & Sons (2015)
3. Blanchard, B.S., Fabrycky, W.J., Fabrycky, W.J.: Systems Engineering and Analysis.
Prentice Hall, New Jersey (1990)
4. Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.:
CRISP-DM 1.0 Step-by-step data mining guide (2000)
5. Luckham, D.: The Power of Events. Addison-Wesley, Reading (2002)
6. Dunkel, J., Fernández, A., Ortiz, R., Ossowski, S.: Event-driven architecture for decision
support in trafﬁc management systems. Expert Syst. Appl. 38(6), 6530–6539 (2011)

Log File Analyzing in ITS Development

559

7. Beschastnikh, I., Abrahamson, J., Brun, Y., Ernst, M.D.: Synoptic: studying logged behavior
with inferred models. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th
European Conference on Foundations of Software Engineering, pp. 448–451. ACM
8. Beschastnikh, I., Brun, Y., Abrahamson, J., Ernst, M.D., Krishnamurthy, A.: Unifying
FSM-inference algorithms through declarative speciﬁcation. In: Proceedings of the 2013
International Conference on Software Engineering, pp. 252–261. IEEE Press (2013)
9. Ernst, M.D., Perkins, J.H., Guo, P.J., McCamant, S., Pacheco, C., Tschantz, M.S., Xiao, C.:
The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69(1),
35–45 (2007)
10. Andrews, J.H., Zhang, Y.: General test result checking with log ﬁle analysis. IEEE Trans.
Softw. Eng. 29(7), 634–648 (2003)
11. Lemieux, C., Beschastnikh, I.: Investigating program behavior using the texada LTL
speciﬁcations miner. In: 2015 30th IEEE/ACM International Conference on Automated
Software Engineering (ASE), pp. 870–875. IEEE (2015)
12. Maoz, S.: Using model-based traces as runtime models. Computer 42(10), 28–36 (2009)
13. Valdman, J.: Log ﬁle analysis, Technical report, Department of Computer Science and
Engineering, University of West Bohemia in Pilsen (FAV UWB), Czech Republic,, Tech.
Rep.DCSE/TR-2001–04, pp. 1–51 (2001)
14. Heikkinen, E., Hämäläinen, T.D.: LOGDIG log ﬁle analyzer for mining expected behavior
from log ﬁles. In: SPLST 2015 14th Symposium on Programming Languages and Software
Tools, Tampere, Finland (2015)
15. Heikkinen, E., Hämäläinen, T.D.: Behavior mining language for mining expected behavior
from log ﬁles. In: IECON The 42nd Annual Conference of IEEE Industrial Electronics
Society, Firenze, Italy, Accepted for publication in 24–27 October 2016
16. Pesonen, H., Laine, T., Bäckström, J., Granberg, M., Vehmas, A., Niittymäki, J.:
Assessment of impacts and socio-economical proﬁtability of real-time passenger information
system (ELMI). FITS Publications 7/2002, Ministry of Transport and Communications,
Straﬁca Ltd., SCC Viatek Ltd., LT-Consultants Ltd., Helsinki (2002) http://virtual.vtt.ﬁ/
virtual/proj6/ﬁts/julkaisut/hanke2/ﬁts7.pdf
17. Heikkinen, E.: LOGDIG’s source codes and examples, GitHub. https://github.com/
ErasRasmuson/LA. (Accessed 1 April 2016)

On the Eﬀectiveness of Vector-Based Approach
for Supporting Simultaneous Editing
of Software Clones
Seiya Numata1 , Norihiro Yoshida2(B) , Eunjong Choi3 , and Katsuro Inoue1
1

2

Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
{s-numata,inoue}@ist.osaka-u.ac.jp
Nagoya University, Furo-cho, Chikusa, Nagoya, Aichi 464-8601, Japan
yoshida@ertl.jp
3
Nara Institute of Science and Technology,
8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
choi@is.naist.jp

Abstract. Code clone is one of the factors that makes software maintenance more diﬃcult. Once a developer ﬁnd a defect in a code fragment,
he/she has to inspect the all of the code clones of the code fragment.
In this study, we investigated the eﬀectiveness of query-based use of a
vector-based clone detection tool for supporting simultaneous ﬁxing of
buggy clones in source code and compared it with the query-based use
of a token-based clone detection tool CCFinder.
Keywords: Code clone detection tool
taneous editing

1

· Software maintenance · Simul-

Introduction

A code clone is a code fragment that has identical or similar code fragments to
it in the source code [10]. So far, a lot of code clone detection techniques have
been developed to capture various aspects of source code similarity [7,10,13].
For the detection of syntactically identical or similar code fragments, tokenbased and tree-based approaches detect identical token sequence and similar
syntax tree in source code, respectively [7,10]. These approaches are able to
detect useful clones (i.e., code fragments to be merged [5], inconsistent code
clones that are suspected to include a bug [9]), but have limitations of false positives [5] (syntactically similar but semantically diﬀerent code) and false negatives
[4] (syntactically diﬀerent but semantically similar clones). As a more sophisticated approach, a few techniques have been proposed for the detection of only
semantically similar clones from source code [7,11,12].
For example, Komondoor and Horwitz proposed an approach to ﬁnding isomorphic subgraphs of program dependence graphs (PDGs) in order to ﬁnd
semantic clones from source code [12]. Also, MeCC detects C functions implementing semantically-similar computations based on the similarity of abstract
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 560–567, 2016.
DOI: 10.1007/978-3-319-49094-6 41

On the Eﬀectiveness of Vector-Based Approach

561

memory states between them [11]. Jiang and Su proposed an approach to compare program execution traces via random testing in order to ﬁnd functionally equivalent code fragments [8]. However, those approaches have limitations.
Identifying isomorphic subgraph of PDG is time-consuming as well as identifying abstract memory states of C functions [11], and comparing execution traces
require a number of test suites to achieve suﬃcient level of test coverage. Also,
it is diﬃcult for those approaches to be applied to uncompilable source code.
In our previous research [18], we developed a vector-based approach for the
lightweight detection of function clones. In our vector-based approach, a feature
vector is generated for each function, based on the occurrence of identiﬁers and
reserved keywords, and then clustering of the generated vectors is performed
by means of locality-sensitive hashing (LSH) [6]. Finally, clones are detected
based on the similarities between each pair of feature vectors. We conﬁrmed the
advantages of the vector-based approach over MeCC as follows:
– Detects a number of function clones but also maintains a low false positive
rate in comparison to MeCC
– Detect in a shorter time
– Finds a larger number of clone-related defects and bad smells
We introduced the tool based on the vector-based approach to a Japanese
multinational IT company and then got feedbacks from practitioners in the company. According to the feedbacks, the practitioners need to know the eﬀectiveness
of the vector-based approach for supporting simultaneous ﬁxing of buggy clones.
They are mainly motivated to perform query-based use of the vector-based approach. When they ﬁnd a defect in a function, they would like to give the function
as a query to the vector-based approach and then discover clone-related defects
from the detected function clones. In our previous research [18], we detected function clones from OSS and then manually conﬁrmed that a large number of the
detected function clones included defects and bad smells. However, the eﬀectivness of query-based use of the vector-based approach is still unknown so far.
In this study, we investigated the eﬀectiveness of query-based use of the
vector-based clone detection tool for supporting simultaneous ﬁxing of buggy
clones in source code. In the investigation, we used the collection of clone-related
defects that was collected by Li and Ernst for the evaluation of a cloned buggy
code detector CBCD [14,15].
The remainder of this paper is organized into the following sections. Section 2
details a vector-based approach to detecting code clones. Section 3 describes a
method to investigate the eﬀectiveness of query-based use of a clone detection
approach. Section 4 explains the investigation result, Sect. 5 reviews related work
and ﬁnally, Sect. 6 summarizes this study.

2

Vector-Based Approach to Detecting Clones

Figure 1 provides an overview of the vector-based approach. The Vector-based
approach takes source code is used as the input, and the output consists of a list

562

S. Numata et al.

Fig. 1. Overview of the Vector-based approach

of clone pairs (i.e., pairs of function clones that are identical or similar to one
another).
Hereafter, we use the term word to represent a set of identiﬁers (e.g., variables
and function names) and reserved keywords (e.g., conditional statements and
interactive statements).
2.1

STEP1: Extracting Word

In this step, word s are extracted from functions in the source code. During this
process, if an identiﬁer consists of more than one word, it is divided into diﬀerent
words as follows:
– It is divided using a delimiters such as hyphens or underscores, between words.
– It is split by using a capital letter for each word using CamelCase
2.2

STEP2: Generating Feature Vectors

In this step, feature vectors are generated based on the weights of extracted words
from STEP1. For this, we use Term Frequency Inverse Document Frequency(TFIDF ) [2], a popular technique, in IR, for weighting each word. Uramoto et al.
used TF-IDF to weight newspaper articles for the relation of multiple news
articles [17]. In this study, we use TF-IDF to weight words in the source code.
TF-IDF combines Term Frequency(TF) weights and Inverse Document Frequency(IDF) weights. For example, suppose Nw represents the occurrences of w
in a function, and Nall represents the total number of occurrences of all words
in a function. In addition, F unction represents the set of all functions in the
source code, while F unctionw represents the set of functions that contain word

On the Eﬀectiveness of Vector-Based Approach

563

w exists. The weighting of w using TF-IDF is deﬁned as follows:
tfw =
2.3

Nw
Nall

idfw = log

|F unction|
|F unctionw |

tf idf w = tfw × idfw

STEP3: Clustering Feature Vectors

Clustering is used to identify candidates for clone pairs, and is conducted. prior
to clone detection (STEP4) for the sake of time eﬃciency. In this step, the feature vectors generated in STEP 2 are clustered using Locality-Sensitive Hashing(LSH) [6], which is known to be an eﬃcient nearest neighbor search algorithm.
To cluster feature vectors, in this study we used E 2 LSH 1 [1], which implements the LSH algorithm. Given a feature vector as a query, E 2 LSH performs
clustering of feature vectors approximate to the query from the dataset, based
on Euclidean distance. Using a dataset of functions, with each feature vector
(function) as a query, a set of similar functions is returned.
2.4

STEP4: Detecting Clones

Clone pairs are detected based on the cosine similarity between all of the feature vectors of the clustered feature vectors obtained in STEP 3. Cosine similarity identiﬁes similarities between multidimensional vectors. Formally, the cosine
similarity between two vectors a, and b whose dimension is d are determined as
follows:
d
i=1 ai bi

sim(a, b) = cos(a, b) = 
d
d
2
2
a
i=1 i
i=1 bi
Cosine similarity takes a value between 0 and 1, because feature values only
have positive values, as seen in the formulation of the TF-IDF described in STEP
2. If the cosine similarity value between two feature vectors is higher than the
threshold, these two vectors are regarded as clone pairs. In this study, we set the
threshold at 0.9 to reduce the probability of false positive results.

3

Investigation Method

We investigated the eﬀectiveness of query-based use of the vector-based clone
detection tool for supporting simultaneous ﬁxing of buggy clones in source code
and compared it with the query-based use of a token-based clone detection tool
CCFinder [10]. Please note that we set 10 tokens as minimum length of a token
sequence for CCFinder because most clone-related defects in the dataset are 10
tokens or smaller in source code.
As we mentioned in Sect. 1, this research was triggered by the feedbacks
from the Japanese multinational IT company. Because this company has used
CCFinder for several years, they would like to know the eﬀectiveness comparison
of the vector-based approach and CCFinder.
1

http://www.mit.edu/∼andoni/LSH/.

564

S. Numata et al.
Table 1. The numbers of N1, N2, N3 and N4
Vector-based approach
CCFinder
Threshold = 0.9 Threshold = 0.5

3.1

N1 11

10

10

N2 22

13

11

N3

4

11

16

N4

1

4

1

Dataset

For our investigation, we used the dataset of clone-related defects that was collected by Li and Ernst [14,15]. The clone-related defects in the dataset are from
the OSS repositories of Git, Linux kernel and PostgreSQL that are written by
C/C++.
The dataset also includes commit IDs of not only clone-related defects but
also code clones of those defects [14]. Please note that we removed the instances
if a defect and its code clones in the same function because the purpose of this
study is the investigation of eﬀectiveness of query-based use of the vector-based
approach.
3.2

Eﬀectiveness Criteria

We used not only precision/recall and F-measure but also a categorization proposed by Li and Ernst [15]. Li and Ernst proposed the following categorization
for each instance in their dataset of clone-related defects.
–
–
–
–

N1:
N2:
N3:
N4:

no false positives, no false negatives
no false positives, some false negatives
some false positives, no false negatives
some false positives, some false negatives

After clones of each clone-related defect are detected, each clone detection tool
can be characterized by the numbers of N1, N2, N3 and N4.
Precison/recall and F-score for each approach are calculated from the total
numbers of true positives, detected functions and buggy functions that are
involved in the dataset.

4

Investigation Results

Table 1 shows the numbers of N1, N2, N3 and N4 and Table 2 shows recall,
precision and F-score for each approach.
According the numbers of N2 in Table 1, the vector-based approach with
threshold=0.9 is the most eﬃcient for supporting simultaneous ﬁxing of buggy

On the Eﬀectiveness of Vector-Based Approach

565

Table 2. Recall, Precison and F-score
Vector-based approach
CCFinder
Threshold=0.9 Threshold=0.5
Recall

0.41

0.53

0.53

Precision 0.59

0.11

0.01

F-score

0.18

0.02

0.48

clones because N2 means no false positive. When developers have only a limited
time, the vector-based approach with threshold=0.9 is the most suitable.
In terms of the numbers of N3 in Table 1, CCFinder is the highest. For the
development of a high-reliability software system, CCFinder is the most suitable
because N3 means no false negative.
N1, N2, N3 and N4 take account of the existence of false postives and negatives and do not take account of the numbers of them precisely. On the other
hand, Recall/Precision takes account of the numbers of them precisely.
According to Table 2, the precision of CCFinder is extremely low. In several
instances of clone-related defects in the dataset, CCFinder detects a large number of false positives (max. 218). This means that CCFinder is unsuitable when
developers have only a limited time. The vector-based approach with threshold=0.9 is most suitable when developers have only a limited time according to
the precision in Table 2.
In terms of recall, the all of the score are almost same. The vector-based
approach with threshold=0.5 and CCFinder are the highest score between them.
Since the precision of CCFinder is extremely low, the vector-based approach with
threshold=0.5 is more suitable for the development of a high-reliability software
system.

5

Related Work

Thus far, various techniques have been proposed for the detection of code clones
from source code. For the detection of syntactically identical or similar code
fragments, the token-based and tree-based approaches detect identical token
sequences and similar syntax trees in the source code, respectively [7,10]. However, these approaches may result in false positives (syntactically similar but
semantically diﬀerent clones) [5] and false negatives (syntactically diﬀerent but
semantically similar clones) [4].
As a more sophisticated approach, a few techniques have been proposed for
the detection of only semantically similar clones from the source code [8,11,12].
For example, Komondoor and Horwitz proposed for ﬁnding the isomorphic subgraphs of PDGs in order to ﬁnd the semantic clones from the source code [12].
Additionally, MeCC [11] detects C functions implementing semantically similar computations, based on the similarity of their abstract memory states. Jiang
and Su proposed an approach to comparing program execution traces via random

566

S. Numata et al.

testing in order to ﬁnd functionally equivalent code fragments [8]. The vectorbased approach in this paper is inspired by the existing vector-based approach
that is proposed by Marcus et al. [16]. Their original approach uses a LSI-based
clustering technique to form all clusters of similar entities. LSI-based retrieval
is a considerable idea to improve the recall of the vector-based approach in our
study. However, we do not use LSI because it leads the increase of the detection
time.
Various applications of the detection of code clones from source code have
been proposed. For example, several studies have been conducted on the support
of clone refactoring using clone detection techniques. Balazinska et al. proposed
a code clone classiﬁcation method for the identiﬁcation of reengineering opportunities [3]. Higo et al. [5] proposed a set of metrics to represent the diﬃculty of
merging clones detected by the token-based clone detection tool CCFinder [10].
Yoshida et al. proposed an approach for extracting clone clones that are related
to each other from the output of CCFinder, and suggesting these be used as
a large-scale reengineering opportunity [19]. Combining these approaches with
the vector-based approach appears to be a promising solution for achieving the
eﬃcient support of clone refactoring.

6

Summary

In this study, we investigated the eﬀectiveness of the query-based use of the
vector-based clone detection tool for supporting simultaneous ﬁxing of buggy
clones in source code. In the investigation, we used the collection of clone-related
defects that was collected by Li and Ernst for the evaluation of a cloned buggy
code detector CBCD [14,15].
The summary of the investigation result is as follows:
– The detection result of the vector-based approach with threshold=0.9 is highest precesion.
– The detection results of the vector-based approach with threshold=0.5 and
CCFinder are highest recall.
– Since the precision of CCFinder is extremely low, the vector-based approach
with threshold=0.5 is more suitable for the development of a high-reliability
software system.
Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers
25220003, 26730036, 15H06344 and 16K16034.

References
1. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest
neighbor in high dimensions. CACM 51(1), 117–122 (2008)
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval: The Concepts
and Technology behind Search, 2nd edn. (ACM Press Books) Addison-Wesley Professional, Harlow (2011)

On the Eﬀectiveness of Vector-Based Approach

567

3. Balazinska, M., Merlo, E., Dagenais, M., Lague, B., Kontogiannis, K.: Measuring
clone based reengineering opportunities. In: Proceedings of METRICS 1999, pp.
292–303 (1999)
4. Deissenboeck, F., Heinemann, L., Hummel, B., Wagner, S.: Challenges of the
dynamic detection of functionally similar code fragments. In: Proceedings of CSMR
2012, pp. 299–308 (2012)
5. Higo, Y., Kusumoto, S., Inoue, K.: A metric-based approach to identifying refactoring opportunities for merging code clones in a java software system. J. Softw.
Maintenance Evol. 20(6), 435–461 (2008)
6. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the
curse of dimensionality. In: Proceedings of STOC 1998, pp. 604–613 (1998)
7. Jiang, L., Misherghi, G., Su, Z., Glondu, S.: DECKARD: scalable and accurate
tree-based detection of code clones. In: Proceedings of ICSE 2007, pp. 96–105
(2007)
8. Jiang, L., Su, Z.: Automatic mining of functionally equivalent code fragments via
random testing. In: Proceedings of ISSTA 2009, pp. 81–92 (2009)
9. Jiang, L., Su, Z., Chiu, E.: Context-based detection of clone-related bugs. In: Proceedings of ESEC-FSE 2007, pp. 55–64 (2007)
10. Kamiya, T., Kusumoto, S., Inoue, K.: CCFinder: a multilinguistic token-based
code clone detection system for large scale source code. IEEE Trans. Softw. Eng.
28(7), 654–670 (2002)
11. Kim, H., Jung, Y., Kim, S., Yi, K.: MeCC: memory comparison-based clone detector. In: Proceedings of ICSE 2011, pp. 301–310 (2011)
12. Komondoor, R., Horwitz, S.: Using slicing to identify duplication in source code.
In: Proceedings of SAS 2001, pp. 40–56 (2001)
13. Krinke, J.: Identifying similar code with program dependence graphs. In: Proceedings of WCRE 2001, pp. 301–307 (2001)
14. Li, J., Ernst, M.D.: CBCD: Cloned buggy code detector. Technical report UWCSE-11-05-02, University of Washington Department of Computer Science and
Engineering (2011)
15. Li, J., Ernst, M.D.: CBCD: cloned buggy code detector. In: Proceedings of ICSE
2012, pp. 310–320 (2012)
16. Marcus, A., Maletic, J.I.: Identiﬁcation of high-level concept clones in source code.
In: Proceedings of ASE 2001, pp. 107–114 (2001)
17. Uramoto, N., Takeda, K.: A method for relating multiple newspaper articles by
using graphs, and its application to webcasting. In: Proceedings of ACL 1998, pp.
1307–1313 (1998)
18. Yamanaka, Y., Choi, E., Yoshida, N., Inoue, K.: A high speed function clone detection based on information retrieval technique. IPSJ J. 55(10), 2245–2255 (2014).
in Japanese
19. Yoshida, N., Higo, Y., Kamiya, T., Kusumoto, S., Inoue, K.: On refactoring support
based on code clone dependency relation. In: Proceedings of METRICS 2005, pp.
16:1–16:10 (2005)

Business Value and Benefits

The Developers Dilemma: Perfect Product
Development or Fast Business Validation?
Henri Terho(B) , Sampo Suonsyrjä, and Kari Systä
Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
{henri.terho,sampo.suonsyrja,kari.systa}@tut.fi

Abstract. To find a fast-track to profitability, a startup needs to
streamline and speed up two vital processes – developing novel products and finding new markets for their products. These two goals are
typically opposed to each other, business development requiring quick
iteration and product development requiring focus on quality. This difference in mindsets, where the focus should be on the balance of quality
to the business experimentation causes a conflicting environment for the
developers to develop products. This problem is aggravated in a startup
environment, where the reasons for product failure are not clear, increasing the frustrations felt by the developers. Clear ways to communicate
the product goals and even successes between management and developers is needed to create an environment for success. This balancing
act between quality and speed to achieve fast product iteration is the
developers dilemma.
Keywords: Startups · Software development
Lean startup · Prototyping · Agile

1

· Business development ·

Introduction

The success of a startup, or a potential company looking for a repeatable and
scalable business model, is often related to the time it takes for the startup to
develop their business model. [4,9] Consequently, the importance of fast iteration
cycles is intensiﬁed, as the entire business model can be unclear or at least it
remains under constant development [3,13].
Mastering this requires optimized techniques and methods for product and
customer management [3]. An emerging choice for such a management method is
the Lean Startup framework [4,9]. Products are tested through hypothesis driven
iterations, where the success of the product is measured by actionable metrics.
Moreover, when a product is deemed failed, a pivot is encouraged. As each
iteration brings new knowledge, iteration speed is vital – Failing faster means
also ﬁnding success faster. To meet the requirements of rapid course corrections
in the business, i.e. pivoting, the product development process has to adapt to
fast iterations [13].
In a startup environment the developers should be constantly aware that the
software might become waste and therefore typical quality thinking can become
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 571–579, 2016.
DOI: 10.1007/978-3-319-49094-6 42

572

H. Terho et al.

diﬃcult. This creates a situation where a software developer is pulled in two
directions: should I follow the values of professional software development or
work fast to support the constantly changing business directions?
This delicate balancing act between writing good quality software and spending as little resources as possible on a product that could be scrapped creates
the Developers Dilemma that we explore further in this paper.

2

Background

The Lean Startup method is a popularized collection of best practices from multiple previous entrepreneurship theories such as Creation theory [1] and Bricolage
[2]. With it, business development is seen as an iterative process of conﬁrming
business hypotheses with minimum viable products (MVPs) [9]. The method
consists of iterative cycles of building, measuring, and learning.
Each cycle is typically linked with its own MVP, which is used to test the
hypothesis of the current cycle. Based on these tests, the company either stays
on the same path, building additional MVPs on top of the data gained from the
ﬁrst, or pivots their business plan to a new trajectory. Startups could even be
said to be deﬁned by their pivot making capabilities [12].
An MVP is a version of the product that enables a full turn of the buildmeasure-learn loop. It should contain the features that realize the unique value
proposition of the software solution and little else. The idea is to cut out all
non-essential features and leave just the core features of your application and
the tools to enable learning [8].
As a software development method, producing MVPs is somewhat similar to
prototyping. Prototyping approaches have been developed for situations where
the work steps of a project cannot be clearly detailed before execution [10]. Prototyping incorporates many styles, such as iterative, rapid, evolutionary, throwaway incremental and mock up prototyping [7]. Stephen and Bates [11] deﬁne
the prototype through two common characteristics:
1. The prototype enables a high degree of user evaluation which substantially
aﬀects requirements, speciﬁcations, or design.
2. The prototype initiates a learning process for users and developers of the
system.
The ﬁrst deﬁnition matches the MVP’s aspect of user evaluation. The second
deﬁnition matches the MVP cycle, where the MVP is designed to enable one cycle
of the experimentation and produce learning with minimal development eﬀort.
The prototypes can be split into throwaway and evolutionary prototypes.
These two types are classiﬁed by their intended life cycle. Development based on
evolutionary prototypes goes through sequences of re-design, re-implementation
and re-evaluation without knowing the complete set of requirements beforehand
[7]. Although the exact requirements for further development might be unclear,
the implementation choice still matters as large parts of the code will be reused.
On the contrary, throwaway prototypes will not be reused.

The Developers Dilemma

573

Comparing these two with MVPs, MVPs cover both aspects and possibilities
of prototypes. In MVP development, the key idea is to validate the business case
as fast as possible with a minimum set of features. If the experiment fails, the
MVP should be thrown away, but in the case of a success it will be used again.
However, it is not typically known beforehand if the MVP results in a throwaway
prototype or in an evolutionary prototype. Therefore, the developers encounter
a dilemma of writing code suitable for either throw-away or evolutionary prototyping.

3
3.1

Developers Dilemma
Environment

Modern software development, especially in a startup business where direction
is changed rapidly, challenges the professional mind-set of software developers.
These contradicting goals summarize the Developer’s dilemma:
– As any professionals, software developers want to create artifacts. However,
experimentation and pivoting that are implemented in many startups often
lead to abandoning of software that did not receive positive feedback from the
users or could not create an attractive business model fast enough.
– One of the main ways of showing your skills as a software developers is to write
elegant code. However, when aiming at the minimum viability, developers
should not reﬁne their work in terms of quality and functionality to a level
that they can be proud of.
– Developers can be used to creating prototypes that are thrown away, but this
is not the case if they look at an MVP more as the ﬁrst version of a ﬁnal
product, i.e. an evolutionary prototype. When such an MVP fails, this can
cause a sense of loss for the developers who have poured their talent into the
creation of the MVP.
– In the sense that an MVP is actually closer to being a tool for market research
than an actual product, developers might spend too much eﬀort on developing
features that are ultimately not needed. The stress about features which are
not essential is unneeded.
This environment of creating software, where the passion of the software
developers might work against the goals of the company is typical in product
development and startup environments. It creates a diﬃcult environment to
manage and develop software in.
To further elaborate, developers typically want to distinguish prototypes from
a real products, but if the operation of the company is based on business experimentation, the choice is not known in advance.

574

3.2

H. Terho et al.

Organizational View

The developers dilemma is a problem and a strategic question for the whole
organization. The disjoint between the quality expectations of the software and
the learning that the organization wants to achieve can cause problems.
This problem is illustrated in Fig. 1. The quality of the software and the size
of the learning goals are placed on diﬀerent axes. Typically the more complex
the learning objectives, the larger product has to be built. For example, you are
assessing solutions to complex networked problems or totally novel technology.

Fig. 1. Quality vs learning scale

The quality of this product has to be above a viable level to produce reliable
results. This is illustrated by the Minimum Viability line on the graph. The
quality of an MVP above the line is such that it does not interfere in assessing
an business hypothesis. To further elaborate, the reason why the customers will
not use or buy the MVP is that the they actually do not see it worthy – not
because for example that the UI is horrible. Consequently, MVPs that are under
the Minimum Viability line do not produce reliable results to the learning goals.
On the other hand, when the learning goals are still small and abstract, such
as when an organization is just looking for the initial problem solution ﬁt (PSF),
MVPs can be smaller and of worse quality than when looking for product market
ﬁt (PMF). This is illustrated by the dotted vertical line.
In this case, we deﬁne quality to include not only typical quality attributes,
but also the eﬀort level it takes to develop an MVP such as additional features
and keeping track of technical debt. So the challenge is to optimize quality as
close to the Minimum Viability line as possible to reduce the amount of wasted
quality and work. Also the two steps to product validation, PSF and PMF are
marked to the chart to show the diﬀerent phases of validation [4].

The Developers Dilemma

575

Research into the motivations of programmers show that competence, not
experiments and ﬁnancial incentives motivate them. [5] For the developers the
way to show talent is to write quality code, so the natural tendency for quality in
this is to go up. When such overquality prototypes are thrown away, the eﬀorts
of the developer are wasted and it results in frustration for the developers. The
goal of the organization would be to make sure that the developers understand
the goals of the prototype/MVP. The management has to message to the developers the point in the x-axis for the current product. The developers are then
responsible for the positioning in the y-axis.
Some typical ﬁrst products are positioned on the scale. Landing pages are
small products which enable learning on a small scale, but do not require much
quality. Typically the goal of the landing page MVP is to just assess the initial
PSF. Brick and mortar stores as the ﬁrst product on the other hand require
a huge investment of resources to start and they may result in an immediate
failure, if the clientele is not interested in the products of the store. The idea
being that initially starting a brick-and-mortar store to asses your initial business
hypothesis produces a high quality “product”, but is a huge risk.
This positioning on the axes is at the heart of the developers dilemma and
how the wanted position in these axes is messaged to the developers. The positioning on both axes is important to the success of the company, but the management typically forgets to take into account the quality tendencies and its
psychological eﬀects when deciding the goals.
3.3

Example

The issues that result from the developers dilemma in development can be better
shown through an example:
A small startup is investigating their new hypothesis that people want to buy
blood pressure measurement services. An MVP to test the validity of the idea
is created. This MVP is a small website which enables customers to contact a
person to come and measure their blood pressure. For the software development
the user story in the backlog would be “As a user I want to ﬁll in my contact
details so the company can contact me to measure my blood pressure”. In this
ﬁrst iteration, the experiment is to see if people want to even buy these services.
The enabling MVP is the website that collects and stores contact information.
After the success of MVP, the company decides to experiment if they can
expand the business by changing the contact form to a time reservation system.
This way they can allocate personnel more eﬃciently. To facilitate this, a time
reservation website is built as the second MVP.
At the third stage, the company has seen that there is demand for blood pressure measurements and decides to experiment if the people are willing to measure their blood pressure in predesignated locations. To test this, they develop
a mobile application with GPS location to show the closest free blood measurement point. The new user story is: “As a user I want to be able to ﬁnd
out the nearest point where I can measure my blood pressure and reserve it.”
Here the experiment has expanded from a web form to a mobile application.

576

H. Terho et al.

Again from the angle of the software developer, the development of the mobile
application is a new production of a new stand alone application.
On the business development side, the continuous development of the theme
progresses and builds upon the results from the previous application, expanding
the scope of the project to one direction at a time. This however is not reﬂected
even in the backlog items created. The actual goals of the company are not
reﬂected by the user stories that the development is based upon. To improve
upon this, the startup should have a backlog item that shows the goals of the
current experiment, e.g. “We as a startup want to know if the customers want
to come to our premises for measurement.”
This communication problem causes the developers to develop three diﬀerent
projects. First a web application contact form is developed. Next a reservation
system is developed and third a separate GPS location application is built. These
three separate projects do not have large amounts of overlap and also the two
previous versions of the product have now been throwaway prototypes. On top
of this, the developer has seen two of his products thrown out as waste.
Even though the company has achieved its goals, this feeling might not have
been transferred to the developers. The feeling with the developers might even
be that most of their work was wasted, even though it clearly contributed to the
company goals. Similarly, if the developers have used huge amount of resources
in reﬁning for example the ﬁrst contact form MVP and its back-end scalability,
the organization can leave the developers without proper recognition as their
eﬀorts have done in vain in the eyes of the organization.
3.4

Lean Startup Diﬃculties

One of the most diﬃcult things in the execution of Lean Startup’s BuildMeasure-Learn cycle is understanding why an MVP actually fails. This assessment of product failures is a critical part of the Developer’s Dilemma. The Lean
Startup tries to avoid the dilemma, however, by focusing on learning goals, not
on development goals.
By developing diﬀerent types of MVPs, a startup can split their learning
goals into appropriately sized fragments. The diﬀerent types of MVPs have been
described e.g. in [9,13]. By sharing the vision of using many of them along the
life cycle within the whole organization, we think that the risk of Developer’s
Dilemma can be mitigated.
However, if the targets of developing an MVP are not made clear for the whole
organization (including developers) Lean Startup will not help in resolving the
Developer’s Dilemma. Therefore, the underlying communications problems and
the way developers valuate their own work form the crux of the Developer’s
Dilemma.
We propose that developers are intrinsically drawn to think that it is the
lack of reﬁning, missing features and things such as technical debt that make an
MVP fail. If this opinion is given the most value, an unsetting can be created in
which an MVP is reﬁned forever or until the organization runs out of funds, but
no one really understands (or accepts) why no user actually needs the product.

The Developers Dilemma

577

On the other hand, the development organization can be drawn to think
that an idea and the related MVP do not have any business value because the
MVP does not succeed straight away at a targeted level. In such a case, the
organization might perceive the developed MVP as a perfect artifact to ﬁnd
out if a business problem is worth solving. This, however, is quite often not the
case. Rather, MVPs usually need some reﬁning to be able to produce reliable
results for learning, if a business problem is actually worth solving. Again, this
premature rejection of an MVP can create an unfavorable setting. In such, MVPs
are thrown away before they are developed to their minimum viability.
We propose that the diﬃculty of balancing between these two unfavorable
settings, can be a root cause for the Developer’s Dilemma. On one hand, the
technical development of MVPs that can produce reliable results is required
and expected from the software developers. Also the developers naturally tend
toward better quality because it is a way for them to show their skills. However,
developing MVPs above the quality of minimum viability can be considered
waste. Thus, developers need to understand thoroughly the level of quality they
are expected to produce. On the other hand, the organization needs to have an
amazing competence in chopping down the learning goals to appropriate size
experiments. If these two premises do not exist in a case, the setup is ready for
example for contradicting opinions on why an MVP fails, i.e. one of the result
of the Developer’s Dilemma.
3.5

Analysis of the Developer’s Dilemma

This conﬂict between business-driven and technical-driven goals is not new. Similar dilemma exists in most Agile processes since the development should focus
on the tasks of the biggest business value. This often leads to compromises in
the architecture, and the maintenance of the architecture requires special care
in Agile development [6]. In one way, Lean approaches partly amplify this since
non-productive work is considered waste but by including principles like build
quality in the Lean community has recognized the importance of professional
development.
Many aspects of the developer’s dilemma are known but the startup approach ampliﬁes the developer dilemma since the process is driven by the business experiments. As in all organizations, the key element is communication and
understanding of common goals. Based on these assumptions we propose several
aspects that can be used to identify the developers dilemma:
– Do the same people develop the software and business aspects of the software?
The easiest way to understand both sides of the problem is to work on both
development and management roles. This allows for a larger perspective on
the whole company. The management should be transparent and allow for
opinions from both sides to be intermixed.
– Is the software development outsourced? Outsourcing the software development splits the goals of the company in two. The software company is responsible for delivering software and they have their own goals. The original business is left with just the business goals in mind and can optimize to fulﬁll

578

H. Terho et al.

those. For the outsourcing companies, the fulﬁllment of developer ethos might
be easier because the clients bring the problems to them and the company
does not have to do validation on those. Even if the product would fail on the
business side, the software developers have met their goals.
– What are the performance indicators that are used to measure the software
development? You get what you measure is an old mantra that still holds
true. If the development is only measured in regard to quality or number of
tickets completed the development work is separated from the business goals.
The measures should be developed in such a way to encompass business and
quality indicators.
– If asked, do management and software development teams have the same goals
for the product? Do they see the same future for it, i.e. still in use after
6 months, proﬁtable. Also what do the developers see as the core targets,
optimizing performance, creating products quick? Does management share
these goals?
– How does the company handle the failures of the previous MVP? Is it taken
as a way to learn from failure and do the management and development see
it in the same way. Are questions like was the UI good enough handled in the
same light as was there a market for the product?

4

Conclusions and Future Work

In this paper, we have analyzed the mismatch and especially the Developers
Dilemma between professionalism in software development and the needs of fast
business development. Thus this problem is not just limited to lean startup, but
also to other environments, where rapid product development is needed.
As as response, we propose a set of questions that can be used to analyze if
the developers dilemma is a problem in your company, outlined in Subsect. 3.4.
These questions allow for the initial recognition of possible problems between
uniﬁed goals of the developers and the management. It seems that one core
aspect to the developers dilemma might be the usage of diﬀerent measurements
to measure product success on the business development side and the software
development side. If these two sides are brought closer together and both sides
share a deeper understanding of the project goals the problem can be mitigated.
Although the idea comes from our personal experience and communication
with a number of local startups, there is still work to be done in reﬁning and
validating the proposed concepts.
As future work, we want to analyze how startups can eﬃciently recognize the
diﬀerence between throw-away and evolutionary parts of MVPs to assess the ﬁrst
solution. Similarly, we need to study to which extent the idea of experiments as
top-level requirements is already used and how it aﬀects development. Also other
ways that companies use to handle failure and waste are of interest. Instructions
on how to write and manage such requirements should be developed e.g. with
action research methodology. These two methods could be used to help solve the
dilemma.

The Developers Dilemma

579

The Developer’s Dilemma is a mismatch between the software developer’s
professional ambitions and the startup’s need for fast business experimentation.
As not responding to the dilemma can have grave consequences for the company,
the dilemma should be alleviated as well as possible to ensure the growth. This is
not just a problem for the management, but a challenge for the whole company
on how to communicate their core targets.
Acknowledgments. The authors wish to thank Digile’s Need4Speed program
(http://www.n4s.fi/) funded by the Finnish Funding Agency for Innovation Tekes
(http://www.tekes.fi/en/tekes/) for its support for this research.

References
1. Alvarez, S.A., Barney, J.B.: Discovery and creation: alternative theories of entrepreneurial action. Strateg. Entrepreneurship J. 1(1–2), 11–26 (2007)
2. Baker, T., Nelson, R.E.: Creating something from nothing: resource construction
through entrepreneurial bricolage. Adm. Sci. Q. 50(3), 329–366 (2005)
3. Blank, S.: The Four Steps to the Epiphany. K&S Ranch, Pescadero (2013)
4. Blank, S., Dorf, B.: The Startup Owner’s Manual. K&S Ranch, Pescadero (2012)
5. Da Silva, F.Q., França, A.C.C.: Towards understanding the underlying structure of
motivational factors for software engineers to guide the definition of motivational
programs. J. Syst. Softw. 85(2), 216–226 (2012)
6. Eloranta, V.P., Koskimies, K.: Aligning architecture knowledge management with
scrum. In: Proceedings of the WICSA/ECSA 2012 Companion Volume. pp. 112–
115. WICSA/ECSA 2012. ACM, New York (2012). doi:10.1145/2361999.2362023
7. Floyd, C.: A systematic look at prototyping. In: Budde, R., Kuhlenkamp, K.,
Mathiassen, L., Züllighoven, H. (eds.) Approaches to Prototyping, pp. 1–18.
Springer, Heidelberg (1984). doi:10.1007/978-3-642-69796-8 1
8. Maurya, A.: Running lean: iterate from plan A to a plan that works. O’Reilly
Media Inc., Sebastopol (2012)
9. Ries, E.: The Lean Startup. Penguin, New York (2011)
10. Sandor, C., Klinker, G.: A rapid prototyping software infrastructure for user interfaces in ubiquitous augmented reality. Pers. Ubiquit. Comput. 9(3), 169–185 (2005)
11. Stephens, M., Bates, P.: Requirements engineering by prototyping: experiences in
development of estimating system. Inf. Softw. Technol. 32(4), 253–257 (1990)
12. Terho, H., Suonsyrjä, S., Jaaksi, A., Mikkonen, T., Kazman, R., Chen, H.M.: Lean
startup meets software product lines: survival of the fittest or letting products
bloom? (2015)
13. Terho, H., Suonsyrjä, S., Karisalo, A., Mikkonen, T.: Ways to cross the rubicon:
pivoting in software startups. In: Abrahamsson, P., Corral, L., Oivo, M., Russo, B.
(eds.) PROFES 2015. LNCS, vol. 9459, pp. 555–568. Springer, Heidelberg (2015).
doi:10.1007/978-3-319-26844-6 41

Workshop-Based Corporate Foresight Process:
A Case Study
Leila Saari1 ✉ , Tanja Suomalainen1, Raija Kuusela1, and Tapio Hämeen-Anttila2
(

)

1

VTT Research Centre of Finland, Oulu, Finland
{leila.saari,tanja.suomalainen,raija.kuusela}@vtt.fi
2
Elisa Appelsiini, Helsinki, Finland
tapio.hameen-anttila@elisa.fi

Abstract. Corporate foresight is a value-creation tool that helps companies
survive in a competitive, ever-changing business environment. Foresight can be
perceived as a company’s capability to commit to continuous environmental
scanning and to respond quickly to the discovered market threats and opportuni‐
ties. Currently, there is scarce literature about foresight application procedures in
organizations. This paper presents a single-case study, with a foresight process
that was experimented in a real industrial setting. Based solely on a workshop
series, this simple and straightforward process used the participants’ tacit knowl‐
edge and insights as the main data sources. The participants presented diﬀerent
roles and departments of the case company. As a result, two new application areas,
My Data and eHealth, were found, and a message to the company’s top manage‐
ment was formulated. The future will show how the results will be implemented
in the company and what will be their business impact.
Keywords: Corporate foresight · Environmental scanning · Innovation · Insight ·
Process · Workshop

1

Introduction

The current global business environment is ever changing. Competition is tough; new
technologies, innovations, threats, and opportunities emerge continuously. An early
identiﬁcation of discontinuities can prevent companies from loosing ground in compe‐
tition. Companies have to consider how to survive or to defeat rivals. Many companies
have started to prepare better for the future by starting foresight practices. Foresight
requires future-oriented awareness and planning that enable businesses to respond
quickly to future market threats and opportunities [1].
Software companies have realized that changes in their business environment will
occur frequently with increasing speed and may include serious discontinuities. Compa‐
nies need to be aware of their business environment and be able to adopt or even create
radical adjacencies, that is, stepping outside the core and operating in new markets and/
or with novel products [2]. Moqaddamerad [3] deﬁnes corporate foresight as an organ‐
ization’s ability to scan the environment constantly and to consider future options.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 580–589, 2016.
DOI: 10.1007/978-3-319-49094-6_43

Workshop-Based Corporate Foresight Process: A Case Study

581

Both the overall corporate and speciﬁc business strategies should bend to foresight
and continuous innovation and hence help the company react to changes in the internal
and the external environments. As claimed by Grant [4], a turbulent business environ‐
ment makes systematic strategic planning more diﬃcult than before. Organizations are
aﬀected by both external changes, such as the economy, competition, and political
interference, and internal changes, such as management systems, organizational culture,
and employee morale [5].
Despite the abundance of current literature on the use of foresight, details of
successful methods and the results of foresight activities are scarce [1]. Plenty of fore‐
sight tools and methods are available today. However, it is challenging to choose the
most suitable ones for an organization. Companies interested in implementing foresight
are often unable to identify best practices because these ﬁrms conduct foresight activities
behind closed doors [1]. Mazurkiewicz et al. [6] also argue that foresight activities do
not always lead to successful implementations of their results. Foresight practitioners
usually focus on developing methodologies and conducting foresight exercises and have
no inﬂuence on the implementation of the results [6].
To ﬁll in this gap, a single-case study was conducted at Elisa Appelsiini, one of the
leading telecommunications, information and communication technology (ICT), and
online service companies in Finland. ICT industry ﬁeld relies heavily on software engi‐
neering as their products and innovations include software. The empirical data was
gathered through a series of workshops. The speciﬁc research question was formulated
as follows: How business opportunities can be discovered by a workshop based foresight
process? Thus, this paper’s main contributions are its presentation of an uncomplicated,
workshop-based foresight process and description of the ﬁrst experiences in using it.
The experiences gained from the case study form the ﬁrst validation of the process.
This paper is structured as follows: Sect. 2 reviews the current literature on foresight.
Section 3 describes the study’s research design. Section 4 presents and summarizes the
case study’s ﬁndings. Finally, in Sect. 5, the conclusions are drawn from the research.

2

Related Research

This section brieﬂy describes the characteristics of foresight. Since foresight and inno‐
vation are closely related, they are discussed as well. Foresight frameworks, methods,
and tools are then presented.
2.1 Setting Scene for Foresight
Foresight involves future-oriented awareness, enabling businesses to respond quickly
to future market threats and opportunities. It is useful in managerial planning, where the
ever-changing environment makes past data an unreliable basis for future action.
This paper focuses on the foresight of enterprises and corporations. Moqaddamerad
[3] deﬁnes corporate foresight as an organization’s ability to commit to continuous
environmental scanning and to implement corrective actions accordingly. Thus, fore‐
sight requires exploring new markets, products, and services to ensure the company’s

582

L. Saari et al.

long-term survival and success. Likewise, Rohrbeck [7] views corporate foresight as the
capability to detect, interpret, and respond to discontinuous changes.
The starting point of foresight includes environmental scanning, which identiﬁes
trends, driving forces, change factors, weak signals, and so on, that impact the business
environment Thus, foresight enables an enterprise to respond to the changes in the busi‐
ness environment or to actually foresee the coming changes, trends, and discontinuities
and prepare for them [3].
In enterprises, strategic planning had previously been solely the concern of the top
management. The growing request for transparency in management is gradually
increasing the share of staﬀ members who have the possibility to contribute to the fore‐
sight or even the strategy of an enterprise. This broader participation has to be organized
with suitable tools because face-to-face dialogue is not feasible with a huge group [8].
Foresight is not just a method or a technique but a social activity [9].
Foresight increases the future awareness of an enterprise and helps corporations
survive in a competitive business environment in times of continuous changes [3]. At
the organizational level, foresight refers to examining possible futures in a range of areas,
including technology, politics, and demographics, as well as determining what decisions
should be made today to establish the best possible future for the organization [10].
Moqaddamerad [3] argues that companies try to win the innovation race in diﬀerent
ways, such as the integration of foresight into their innovation processes. The purpose
of innovation management is to discover creative capabilities, constantly generate new
products, renew internal processes, and create a new market opportunities to promise a
long-lasting competitive advantage [3].
2.2 Foresight Process
The foresight process presented by Voros [11] was selected as our reference framework
because it is widely referred to in the current literature, e.g. [1, 8] and a practical
approach, describing the foresight process with questions to be answered. Actually,
foresight feeds strategic thinking with the “what if” question [11]. Prior to the process,
inputs represent the strategic intelligence that feeds the forthcoming foresight. The fore‐
sight process itself has three phases—analysis, interpretation, and prospection. The
process outputs bring options to decision making. In a strategy, the question of what
will be done will be decided; in action planning, how it will be done will be determined.
The three phases of foresight, together with the precondition (inputs) and post condition
(outputs), were selected as the starting points of our summary presented in Table 1,
which includes other foresight frameworks.
As shown in Table 1, the phases presented by the authors overlap to some extent.
The precondition (inputs) is mentioned by Voros [11], Bishop et al. [12], and Popper [8].
According to Voros [11], this phase is needed for information gathering before the actual
foresight. Popper’s [8] practical approach includes the people’s involvement. All the
presented foresight processes initially have an observation phase, that is, scanning,
monitoring, exploring, or sensing, which Voros [11] refers to as analysis. His second
foresight phase is named interpretation, which other authors call forecasting, analysis,
or seizing. Prospection, Voros’ third foresight phase, is termed visioning, projection, or

Workshop-Based Corporate Foresight Process: A Case Study

583

anticipation by other authors. The post-condition phase—output for strategy—
comprises action planning, transformation, renewal, and asset reconﬁguration. The
options are proposed for decision making, to be included in strategy and action planning.
Table 1. Foresight phases in literature
Voros [11]
Input for
foresight

Bishop et al. [12] Keenan et al. [13]
Framing:
identifying the
problem

Analysis
What seems to be
happening in the
environment?

Scanning:
Monitoring:
understanding
recognizing
the context of the relevant trends
organization

Interpretation
What is really
happening?
Which changes
are meaningful?
Prospection
What might
happen?

Forecasting:
taking into
account future
possibilities

Analysis:
understanding
the drivers of
change

Visioning:
moving from
possible to
preferred future
Planning:
building a
pathway to the
future
Action:
continuously
translating
foresight into
action

Projection:
anticipating the
future

Output for
strategy
What might we
need to do?

3

Transformation
: drawing
implications for
business

Popper [8]
Pre-foresight:
setting the overall
aspirations
Recruitment:
enrolling key
individuals and
stakeholders
Generation:
generating
prospective
knowledge and
shared visions in
three phases;
exploration,
analysis, and
anticipation

Action:
committing key
players
Renewal: mixing
intelligence and
wisdom

Teece [14]

Sensing:
identifying
opportunities and
threats (both
internal and
external)
Seizing: creating
business models,
determining
organizational
boundaries,
adapting the
strategy

Asset
reconﬁguration:
aligning tangible
and intangible
assets

Research Design

This section describes the research design of this study. It includes the purpose, methods,
and process of the research, as well as the case company description.

584

L. Saari et al.

3.1 Research Method and Process
This research was built on a single-case study [15]. A case-study approach was decided
to gain practical experiences. Järvinen [16] emphasizes the ability of case studies to
examine complicated circumstances and in this way, to gather information for the crea‐
tion of new knowledge. Additionally, the case-study methodology is reported to be well
suited for software engineering research as it involves the study of contemporary
phenomena in their natural context [17]. This study also follows the participative action
research [18] approach, which emphasizes participant collaboration. Action research is
an iterative process involving researchers and practitioners acting together on a partic‐
ular cycle of activities, including problem diagnosis, action intervention, and reﬂective
learning [18].
The research process followed the case-study research design presented by Yin [15].
It consisted of the following three phases: (1) deﬁne and design; (2) prepare, collect,
and analyze individual case results; and (3) analyze and conclude cross case results. The
research process was initiated by conducting a literature review, whose purpose was to
provide an understanding of the current state and knowledge relating to the main research
area. It is essential to construct a preliminary theory about a study [15]. The development
of the preliminary theory helps in deﬁning the appropriate research design and data
collection, as well as in generalizing the case study’s results.
Another intention of the literature review was to discover gaps and thus focus the
research on the most unexplored areas of study, and then to help structure the empirical
data collection. Thereafter, the case company was selected, and the data collection
protocol was designed. It was decided that the empirical data would be gathered through
several workshops, comprising the second phase of the research process. Both the case
company and the workshop participants were selected by using purposive sampling [19].
Purposive sampling involves designating a group of people for selection based on some
traits that are important for the study. After each workshop, all ﬁndings and results were
ﬁled and memos were written. Finally after all four workshops were conducted the
results were validated with the attendees in a remote session.
3.2 Data Collection and Analysis
The main data collection method used in this case study involved a series of four work‐
shops held at Elisa Appelsiini during the autumn of 2015. The research started with
planning meetings with three researchers and four contact persons from Elisa Appelsiini.
In these meetings (four in total), the schedule, number, and content of the workshops
were drafted, as well as the desired roles of the participants to be invited.
Thereafter, four half-day workshops were organized, with the following main
themes: (1) kick-oﬀ, (2) analysis, (3) interpretation, and (4) prospection and selection.
A total of 13 Elisa Appelsiini employees representing diﬀerent roles participated, from
solutions consultants to directors. There were 13 participants in the ﬁrst workshop, nine
in the second, 11 in the third, and nine in the last one.
In the kick-oﬀ workshop, an introduction to foresight was given, including the
process description, and various foresight methods and potential tools were presented

Workshop-Based Corporate Foresight Process: A Case Study

585

before agreeing on the title of this process. The second workshop’s goal was to gather
all trends, threats, discontinuities, and weak signals that might have eﬀects on the
company’s business. In the third workshop, the ﬁndings of the previous workshop were
further processed, this time from the viewpoint of potential customers. In the last work‐
shop, the participants prioritized the identiﬁed initiatives, evaluated their business
opportunities, analyzed the capabilities needed to implement the prioritized initiatives,
and formulated their message to the top management.
The workshops were facilitated by three researchers. One of them mastered the
process and ensured its smooth progress. The other researchers supported and docu‐
mented both the process and its intermediate ﬁndings, keeping notes and taking pictures.
The result slides were created based on the memos, post-it notes, and photos taken during
each workshop, and shared with the participants via email before the next workshop.
Finally, a result and feedback session for all initially invited workshop participants
was organized. The results were presented, and feedback was requested. The participants
honestly evaluated the content of the presentation, gave good feedback, and expressed
willingness to continue, either by deepening the same scope or selecting a new perspec‐
tive.
3.3 Case Company
Elisa Appelsiini is a merger of Elisa and Appelsiini. In 2010, Elisa acquired Appelsiini,
which started in 1999 as a small company with just a few employees. Elisa Appelsiini
continues to operate as Elisa’s aﬃliated company. Since their fusion, Elisa and Appel‐
siini have oﬀered a comprehensive service package in Finland, speciﬁcally targeting
small and medium enterprises. Elisa is a telecommunications, ICT, and online service
company serving 2.3 million consumers, including corporate and public sector
customers. The ﬁrm provides services for communication and entertainment, along with
tools for improving organizations’ operating methods and productivity.

4

Workshop-Based Foresight Process

The workshop participants were committed and enthusiastic, open to proposing new
ideas and thoughts, as well as trying novel methods and tools. First of all, they anticipated
that the results would be useful for the company. They were eager to bring the results
to the top management and to see the possible impacts of their proposals.
After the introduction in the kickoﬀ workshop, the participants started to discover
what should be the scope of this foresight experiment. Through the me-we-all method,
it was decided that the title of this process would be “The Business Transformation
Initiative of Elisa Appelsiini”. At the end of the kick-oﬀ workshop, there were high
expectations for the results; most of the attendees thought that emerging or radical busi‐
ness opportunities would be found during the process.
The second (analysis) workshop focused on environmental scanning. Usually, this
phase builds heavily on external information, including the literature review, biblio‐
metrics, and patent maps. In this case, we wanted to rely on the participants’ insights

586

L. Saari et al.

instead of studying reports that others had written. The workshop fostered an open
atmosphere to ensure that opposing views would also be expressed. Moreover, some
facilitation methods were used to switch the viewpoints of the participants. For example,
as we were identifying trends, the political, economic, social, technological, environ‐
mental, and legal (PESTEL) classiﬁcation [19] was shown to enlarge the focus. The
identiﬁed trends (over 50) were evaluated with a matrix of probability and impact. Four
topics were selected for further discussion: My Data, eHealth, morals and ethics in
business, and outsourcing/insourcing.
The third (interpretation) workshop started with stakeholder mapping, which iden‐
tiﬁed the case company’s various interest groups. After the generic stakeholder map,
more detailed studies on customers, partners, and competitors were conducted for both
of the selected application areas—My Data and eHealth. These maps were generated
for later use when considering potential partners and co-creators of new businesses.
The next step was to create several customer-segment canvases, one for each iden‐
tiﬁed segment. Adopted from the business model canvas [20], this canvas had the
customers as the starting point, including their jobs, pains while doing their tasks, and
gains if not having the pain. In total, 11 customer-segment canvases were created and
classiﬁed, with the new–old matrix showing the customer on the vertical axis and the
product (or service) on the horizontal axis. According to this evaluation by the partici‐
pants, most of the identiﬁed products were seen as new products for new customers.
The fourth and last workshop wrapped up the previous ones. It started with a matrix
(where participants had to move to the spot that described their attitude) with two axes,
as follows: “Foresight will have huge/no business impact” and “Afraid of/happy with
the organizational change.” The matrix generated a vibrant and open discussion, from
which the message to the top management was formulated. After the warm-up with the
matrix, the opportunities for the selected application areas (My Data and eHealth) were
prioritized and further reﬁned. The selected product or service ideas (found via the
customer-segment canvases in the third workshop) were analyzed, using the needs,
approach, beneﬁts, competition (NABC) analysis developed at the Stanford Research
Institute [21].
After the NABC analysis, a short pitch (ﬁve-minute talk) was given to clarify the
necessity of each product or service. A preliminary architecture was drawn to ﬁnd out
the elements needed to implement the selected ideas. Each identiﬁed element was clas‐
siﬁed by rating it with one to three stars. One star meant that the case company already
had the element, two stars signiﬁed that the company would develop it, and three stars
denoted that the company would buy or develop the element with a partner.
When closing the last workshop, the attendees had high spirits, believing that they
had formulated a meaningful proposal to the top management. Feedback was gathered
from their answers to open questions and four factors rated on a point scale from 4 to
101. The averages of factors were: facilitator’s skills 9.0, the feasibility of the used
methods and tools 8.45, the workshop-based foresight process itself 8.76, and the prob‐
ability of their recommending this process to a peer organization 9.0.

1

General scale in Finnish schools, where 4 = F, fail, 10 = A+, excellent.

Workshop-Based Corporate Foresight Process: A Case Study

587

Elisa Appelsiini’s business development manager concluded, “We are truly satisﬁed
with learning about the foresight process and are eager to begin the adaptation of these
practices in our organization. The workshops led by VTT were very professional,
combining the academic background with more hands-on methods in the actual work‐
shops. For us, this presented both the necessary framework to understand the big picture
and the concrete iterative model to look at the innovation for a competitive edge in our
market. We are truly looking forward to the next steps, as well as sharing our experiences
along the way.”

5

Discussion and Further Work

An uncomplicated workshop-based foresight process was facilitated at Elisa Appelsiini
during the autumn of 2015. In the kick-oﬀ workshop, the goal of the workshop series
was formulated as “the business transformation initiative of Elisa Appelsiini.” In the
workshops, the initiatives were processed with disruptions, technologies, business
models, and customer needs. In the ﬁnal workshop, the results were crystallized and a
message to the management was also formulated.
In this process, the participants’ insights were discovered and formulated through
various methods and tools. In this regard, all available tools are not easy to adopt and
exploit. Their selection and implementation require both knowledge on foresight and
experience in facilitation. While facilitating a foresight process, it is important to push
the thinking towards new tracks as the process relies on the participants’ insights.
The plan was to involve about ten attendees, representing diﬀerent roles and depart‐
ments of the case company in each workshop to keep the work intensive. The attendees’
roles varied from the director to business development managers and solution consul‐
tants. The outcome of this process was the common distillation of the participants’
insight. Therefore, the same results would likely not have been found with another group
of people.
The single-case study chosen in this research raises the question of the generaliza‐
bility of the results. Additionally, a single-case study poses the risk of misjudging and
exaggerating the research data [22]. This research uses the qualitative approach, which
relies much on interpretation and thus tends to be subjective. This study’s primary goal
was to experiment on a tailored, uncomplicated workshop-based foresight process that
relied on the insights of staﬀ members in a contemporary, turbulent business environ‐
ment.
The results’ external validity was achieved through the detailed description of the
experiment journey in the case company and by grounding this study on the previous
research. Internal validity was attained with diﬀerent types of triangulation [16]. First,
data triangulation was applied by using various information sources in terms of people
with diﬀerent roles who participated in the workshops. Second, methodological trian‐
gulation combined two types of data collection methods, that is, workshops and meet‐
ings. Third, investigator triangulation was employed; several people from diﬀerent
backgrounds participated in the analysis, including the company professionals with
varying areas of competence and the researchers outside the company. Fourth, theory

588

L. Saari et al.

triangulation involved diverse ﬁelds of research, comprising foresight, new service
development, and innovation.
At the time of writing this paper, it is not yet known how the results of the foresight
experiment will be used at Elisa Appelsiini; Hammoud et al. [1] and Mazurkiewicz et al.
[6] make similar claims in their articles. Future research topics could include how the
outputs of foresight activities would be utilized. The follow-up on the foresight results
would mean long-term research and include some visibility on the decision making,
strategy process, and even business impact. Additionally, it would be interesting to
expand this research to a multiple-case study with this or slightly modiﬁed process and
possibly use other data collection methods. A web-based interactive tool would be
applicable if more people could attend. The transparency of the process could be
enhanced as well if the intermediate results would be available to the staﬀ and their
comments would be solicited in the company’s Intranet. The participant roles could also
include representatives from customers, partners, competitors, and so on.
Acknowledgements. This research has been carried out under the Dimecc Need for Speed
program and has been partially funded by Tekes (the Finnish Funding Agency for Technology
and Innovation).

References
1. Hammoud, M.S., Nash, D.P.: What corporations do with foresight. Eur. J. Fut. Res. 2(1),
1–20 (2014)
2. Vitalari, N., Saughnessy, H.: The Elastic Enterprise. The New Manifesto for Business
Revolution. Telemachus Press (2012)
3. Moqaddamerad, S.: Corporate foresight: A contribution to innovation management, Turku
2014, Master thesis (2014)
4. Grant, R.M.: Strategic planning in a turbulent environment: evidence from the oil majors.
Strat. Manag. J. 24, 491–517 (2003)
5. Suomalainen, T., Kuusela, R., Tihinen, M.: Continuous planning: an important aspect of agile
and lean development. Int. J. Agile Syst. Manag. 8(2), 132–162 (2015)
6. Mazurkiewicz, A., Poteralska, B., Sacio-Szymańska A.: Implementation and evaluation of
foresight results. In: The XXIV ISPIM Conference – Innovating in Global Markets:
Challenges for Sustainable Growth, Helsinki, 16–19 June 2013
7. Rohrbeck, R.: Corporate Foresight Contributions to Management Science. Springer, Berlin
(2011)
8. Popper, R.: How are foresight methods selected. Foresight 10(6), 62–89 (2008)
9. Pina e Cuncha, M., Palma, P., Guimaraes da Costa, N.: Fear of foresight: Knowledge and
ignorance in organizational foresight. Futures 38, 942–955 (2006)
10. Horton, A.: Fore front: a simple guide to successful foresight. Foresight, J. Fut. Stud. Strat.
Think. Policy 1(1) (1999)
11. Voros, J.: A generic foresight process framework. Foresight 5(3), 10–21 (2003)
12. Bishop, P., Hines, A., Collins, T.: The current state of scenario development: an overview of
techniques. Foresight 9(1), 5–25 (2007)
13. Keenan, M., Popper, R.: Combining foresight methods for impacts. In: NISTEP 3rd
International Conference on Foresight, Tokyo (2007)

Workshop-Based Corporate Foresight Process: A Case Study

589

14. Teece, D.J.: Explicating dynamic capabilities: the nature and microfoundations of
(sustainable) enterprise performance. Strat. Manag. J. 298, 1319–1350 (2007)
15. Yin, R.K.: Case Study Research: Design and Methods. Applied Social Research Methods
Series, 3rd edn., vol. 5. Sage Publications, Inc., Thousand Oaks (2003)
16. Järvinen, P.: On Research Methods. Opinpajan kirja, Tampere (2001)
17. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software
engineering. Empirical Softw. Eng. 14(2), 131–164 (2009)
18. Avison, D.E., Lau, F., Myers, M., Nielsen, P.A.: Action research. Commun. ACM 42(1),
94–97 (1999)
19. Nardi, P.M.: Doing Survey Research, A Guide to Quantitative Methods. Pearson Education
Inc., Boston (2003)
20. Osterwalder, A., Pigneur, Y., Bernarda, G., Smith, A.: Value proposition Design. Wiley.
https://strategyzer.com/books/value-proposition-design
21. Stanford Research Institute. https://nielschrist.wordpress.com/2012/07/13/the-nabc-methodstandford-research-institute-sri/
22. Voss, C., Tsikriktsis, N., Frohlich, M.: Case research in operations management. Int. J. Oper.
Prod. Manag. 22(2), 195–219 (2002)

DevOps Adoption Benefits and Challenges
in Practice: A Case Study
Leah Riungu-Kalliosaari1(B) , Simo Mäkinen1 , Lucy Ellen Lwakatare2 ,
Juha Tiihonen1 , and Tomi Männistö1
1

Department of Computer Science, University of Helsinki,
Gustaf Hällströmin katu 2b, P.O. Box 68, 00014 Helsinki, Finland
riungu@cs.helsinki.fi
2
Department of Information Processing Science,
University of Oulu, P.O. Box 3000, 90014 Oulu, Finland

Abstract. DevOps is an approach in which traditional software engineering roles are merged and communication is enhanced to improve the
production release frequency and maintain software quality. There seem
to be benefits in adopting DevOps but practical industry experiences
have seldom been reported. We conducted a qualitative multiple-case
study and interviewed the representatives of three software development
organizations in Finland. The responses indicate that with DevOps, practitioners can increase the frequency of releases and improve test automation practices. DevOps was seen to encourage collaboration between
departments which boosts communication and employee welfare. Continuous releases enable a more experimental approach and rapid feedback
collection. The challenges include communication structures that hinder
cross-department collaboration and having to address the cultural shift.
Dissimilar development and production environments were mentioned as
some of the technical barriers. DevOps might not also be suitable for all
industries. Ambiguity in the definition of DevOps makes adoption difficult since organizations might not know which practices they should
implement for DevOps.

1

Introduction

Software and product development in the modern era of interconnected, highly
available systems requires close collaboration between members of the development team. Software distribution has changed and recurring software updates
can happen to the point that software releases are served at the same time as
when software is being used [12]. Because of the nature and frequency of the
releases, possible issues in the production systems need to be monitored closely
in order to provide optimum user experience [12]. Software organizations need
to adapt their practices to various changes brought about by new concepts such
as DevOps.
Our working deﬁnition of DevOps is: “a set of practices intended to reduce the
time between committing a change to a system and the change being placed into
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 590–597, 2016.
DOI: 10.1007/978-3-319-49094-6 44

DevOps Adoption Benefits and Challenges in Practice: A Case Study

591

normal production, while ensuring high quality” [3]. The DevOps phenomenon
has two core principles: (1) emphasis on collaboration between development
and operations; (2) the use of agile principles and automation to conﬁgure and
manage deployment environments [8]. DevOps extends collaboration between
development and operations teams which eases the handling of changes in the
production environment.
Industry reports on the DevOps practices have been rare in the past [9],
although more studies have surfaced lately. What potential do practitioners see
in DevOps, and what kind of challenges are there in adopting the DevOps? To
answer these questions, we performed a multiple-case study of three software
companies in Finland. We conducted semi-structured interviews with representatives from the companies and report the results of our analysis in this article.
The article is structured as follows. Section 2 describes previously reported
DevOps experiences, beneﬁts and challenges. Section 3 presents the applied
research method. Section 4 focuses on the results whereas Sect. 5 discusses the
results and considers the validity threats. Section 6 concludes the work.

2

Related Work

Several recent studies have recognized the importance of DevOps. Developer and
operations teams can streamline development processes in order to ﬁne-tune the
performance of services and increase scalability with virtualization [6]. Monitoring the production systems at real-time enables developers to react whenever
anomalies are detected [2,6,11]. On-demand infrastructures and timely feedback
from monitoring support continuous software delivery and deployment. Release
cycles can shorten to hours instead of weeks and months [11], which is seen as a
deﬁnite advantage of DevOps [4,7]. Monolithic architectures restrict the release
frequency [13] but microservice architectures help to break down components
into smaller pieces suitable for frequent releases [2,13].
Combining the expertise and knowledge of software experts from diﬀerent
functions can be challenging. The lack of cooperation between developers and
operations personnel results in uncoordinated activities [9,14]. This causes serious problems which include: (1) IT operations not being involved in the requirements speciﬁcation, (2) poor communication and information ﬂow, (3) unsatisfactory test environments, (4) lack of knowledge transfer, (5) immature systems,
and (6) operational routines not being established prior to deployment [9].
Poor communication between the development and operations functions produces undesirable results. Non-functional requirements e.g. performance or availability might be overlooked as the responsibility of running the product is
shifted to the operations team, letting developers oﬀ the hook [14]. Without
proper access to production systems and error logs, developers become frustrated [13,14]. Such challenges are only exacerbated if the development teams
are distributed [13].

592

L. Riungu-Kalliosaari et al.
Table 1. Characteristics of the organizations involved in the case study

Org. Domain

Personnel Unit size Interviewee position

A

Software development and 90
big data analytics

20

Senior Consultant 1 & 2

B

Digital service development 100

20

Senior Developer

C

Software development and >900
research

10

Lead Architect for Cloud

3

Research Method

We employed an explorative, qualitative research approach in our case study.
We wanted to ﬁnd out how the industry sees DevOps along with its advantages
and limitations. Our research questions are:
– RQ1: How do industry practitioners perceive the beneﬁts of DevOps practices
in their organization?
– RQ2: How do industry practitioners perceive the adoption challenges related
to DevOps?
We targeted software development organizations with suﬃcient DevOps
experience. Three Finnish development organizations were selected for the study.
Two of them were participants in a joint industry-academia program called Need
for Speed [1]. The sampling for the study can be considered as convenience sampling. Table 1 introduces the organizations.
Organization A is a consultancy company with a speciﬁc DevOps unit. Its
customers’ operations environments ranged from private servers and clouds to
the application of public cloud services. In Organization B, developers work as
consultants in customer projects, bringing their expertise as required. Organization C is an international organization with a technological expertise services
unit dedicated to cloud platform services. It sees DevOps as feasible for areas
with direct control of the operations environment.
Semi-structured interviews were the data collection method. The themes
related to views on the DevOps phenomenon, along with its advantages and limitations. Two interviews were conducted on-site and one remotely. The interviews
lasted from one to two hours. The interviews were recorded and transcribed.
Thematic analysis and synthesis [5] was used to support data analysis. The
method can be used to ﬁnd patterns in the data by assigning codes to segments
of text, translating the codes into higher-order themes and ﬁnally creating a
model of the themes [5]. The interview transcripts were coded and a set of
themes created as a result which allowed to construct a model of the beneﬁts
and challenges of DevOps.

DevOps Adoption Benefits and Challenges in Practice: A Case Study

4

593

Results

After analyzing the interview transcripts, we identiﬁed several themes. The
higher-level themes considered were the beneﬁts and challenges of adopting
DevOps. Figure 1 illustrates the thematic map and the themes covered in this
section.

Fig. 1. Thematic map for the perceived benefits and adoption challenges of DevOps

4.1

Perceived Benefits of DevOps

The respondents thought that DevOps practices lead to a number of beneﬁts.
Broadly speaking, DevOps was seen to positively aﬀect the pace at which software products could be delivered and the quality of products for instance by
intensifying feedback cycles. The internal mechanics of engineering products in
the organization including aspects such as communication were perceived to be
aﬀected, too.
As a deﬁnite advantage, the respondents highlighted more implemented features and frequent releases. Powered by automated build, testing and deployment
processes, one of the main advantages of DevOps is that organizations are able
to channel more features into the production and delivery pipelines as noted by
an interviewed Lead Architect. Automation also reduces the eﬀort required for
setting up releases, making it possible for organizations to churn out the releases
as frequently as required.
“There are multiple factors, relating to this but yes, deﬁnitely one of the key
drivers of DevOps is actually that you can get more commits of code into each
day basically.” (Lead Architect, Org C).
Subsequently, higher levels of automation were found to drive improved quality assurance. The automated DevOps production pipeline helps to ensure that
every change is veriﬁed before it is pushed forward for delivery. Because every
change in the code is checked at every stage of the development, and errors are
discovered and resolved on the ﬂy, the end products have fewer bugs and the
software can be readily released.
One of the major impacts of DevOps is that it forces the development and
operations teams to interact with each other more than before, which was seen
to lead to enhanced collaboration and communication. The walls between the
traditional development versus operations silos are slowly broken down, and as

594

L. Riungu-Kalliosaari et al.

a result encouraging a uniﬁed way of working. Increased collaboration accelerates
the exchange of knowledge and experiences between the teams. Multi-functional
teams with a variety of capabilities that boost productivity result in maximizing
competences since a broader set of the skills is utilized.
The respondents felt that when applying DevOps practices, the possibility
to release frequently has implications to the whole development process. Shorter
development cycles were seen as beneﬁcial for the customers who can enjoy newly
developed features faster than before. Smaller releases made more often promote
improved visibility of implemented features to the customer. In addition, due to
frequent releases, the development and operations teams are able to receive early
feedback from the end users and able to do testing with real customers, which
helps in improving the end products.
Working with real customers, companies are better equipped with knowledge on customer preferences and are ultimately able to tailor their products
to meet the market demands. According to the perceptions of the interviewees,
DevOps can help companies to test diﬀerent ideas quickly and make decisions
accordingly. The constant testing of hypotheses in order to determine the value
gains for both the customer and the organization is referred to as continuous
experimentation [10]. An eﬃcient DevOps process supports and enables continuous experimentation which requires well set up experimental infrastructure and
environments.
DevOps can impact occupational welfare, too. A senior developer noted that
frequent releases help to reduce the stress levels because the anxiety related
to handling huge releases is minimized. So, DevOps processes not only bring
beneﬁts to the organization, but they also improve the way of working, hence
positively contributing to the improved well-being of the DevOps teams.
4.2

DevOps Adoption Challenges

Due to many factors, DevOps might not always be successful. The respondents
raised challenges related to communication patterns, organization cultures which
are not malleable, diﬀerent constraints stemming from the domain and environments, and the obscurity of the meaning of DevOps.
A key impediment for successfully adopting DevOps is insuﬃcient communication. For instance, it was mentioned that operations teams do not always
monitor or pass all the performance and other metrics that can be of use to developers, which can cause problems. In sub-optimal cases, operations engineers and
developers care about diﬀerent, and possibly conﬂicting, metrics: operations personnel worry about the uptime of servers whereas developers are concerned with
the release frequency. It was noted that communication between the two groups
may also be lacking if it occurs only through electronic systems, causing delays
in reaction times to issues. In person communication is hard to replace with
electronic tools.
DevOps adoption also highlights cultural matters. Profound changes to the
cultural mindset are required and the deep-seated company culture can be a
challenge. As mentioned in the interviews, roles merge, responsibilities shift,

DevOps Adoption Benefits and Challenges in Practice: A Case Study

595

and people have to rethink their established roles. Developers have to take on
tasks they are not used to and might have reservations on accepting new responsibilities for the operations environment like being on call for system failures.
At the same time, operations people may be wary of the developers taking over
their turf or overly taxed with handling more frequent releases. Changing people’s behavior can be diﬃcult, especially if they have had long careers. Smaller
organizations might be in a better position to change their practices, though. As
mentioned by one of the interviewees, people need to be receptive towards changing the company culture to ﬁt the DevOps ideal since it is hard to push such
initiatives through if, for instance, the management is not supportive. A Lead
Architect stated the importance of culture and hardships in DevOps adoption.
“I think that’s a big cultural shift that we are also seeing, diﬃcult to address
and what is probably the biggest blocker in moving ahead with this, bigger than
the actual technical competences or processes. For me, I see it as two traditional
roles that suddenly need to merge and we just need to ﬁnd ways to, work towards
one common goal.” (Lead Architect, Org C).
DevOps practices might not be suitable in all circumstances. Access to production systems can be legally or contractually restricted so the industry constraints and feasibility in diﬀerent domains need to be considered when applying
DevOps. In speciﬁc cases, the environments such as databases used in production systems can be complex enough to make replicating the environments for
veriﬁcation and testing diﬃcult as mentioned by a respondent. As a consequence,
automated testing becomes less trustworthy meaning that heterogeneous environments provide a challenge for successful DevOps adoption.
While there is agreement about certain characteristics of DevOps, its true
essence is still somewhat vague. Since there is not a standard set of ﬁxed practices
related to DevOps, practitioners ﬁnd it hard to say what practices they should
take into use for DevOps. It was seen that the meaning of DevOps has shifted
in the previous years and new tools for DevOps keep coming up so one of the
challenges is that DevOps is unclear but also evolving.

5

Discussion

The interviewees saw that adopting DevOps had some beneﬁts. They saw
DevOps as a means to increase the implemented features and generate more
releases. The idea of rapid delivery aligns with the DevOps notion of reducing
the time it takes for a software release to reach the production environment [3].
DevOps encourages automation, which was seen to help in improving the quality
of releases.
DevOps helped to bridge the communication gap between developers and
operations engineers. This fosters collaboration towards improving the development process and end product. Furthermore, the existing diﬀerent skills can be
readily utilized hence increasing the team’s reactivity to problems.
The respondents thought that DevOps practices support real-time monitoring which helps to foster fast feedback loops and an experimental culture that

596

L. Riungu-Kalliosaari et al.

engages more interaction with the end users. Real-time monitoring has also been
previously highlighted by developers as a factor which helps to create fault-aware
systems [2,6] and DevOps has been seen to encourage experimental culture [11].
Shortcomings in communication and the prevailing company culture were
some of the challenges that we identiﬁed. The lack of knowledge and information
sharing can result in obscuring vital facts. Guidelines suggesting how to share
information can help but changing the culture of a company can be a challenge.
The cultural aspects are signiﬁcant, as has been previously stated [7,13]. The size
of the company or having company-wide support for the change might matter.
Smaller companies are in a better position to react faster to changes.
Constraints in the environment can prove challenging, too. Working habits
emphasizing e.g. security can prevent some companies from using DevOps practices. Technical environments that are diﬃcult to replicate add up to the challenges. Both challenges in the environment have been highlighted earlier [13].
As DevOps evolves, its deﬁnition, practices and tools are expected to change –
a challenge that is expected to remain in the long run. As organizations continue
to adjust to changes, Bass et al. [3] advise that DevOps should not be tied to any
speciﬁc must-have tools or communication practices, but it should be aligned to
the higher-level goals an organization wishes to achieve.
As an exploratory case study, there are no strong claims for causality and
threats to internal validity are not central. The interviewees’ responses are their
own opinions on DevOps. Hence, the reported beneﬁts and challenges are not
universal. External validity could be threatened by the selection of the cases
as two of the companies were consultancies. Consultants may see the situation
diﬀerently because they can draw experiences from multiple clients. Still, considering external validity, it is possible that a company adopting DevOps might
observe similar beneﬁts and challenges as presented in the study.
The most notable threats to validity are related to construct validity and the
operationalization of the concept of DevOps. Lacking a clear deﬁnition, respondents might have understood DevOps diﬀerently. Factors contributing to the beneﬁts and challenges could have been left out if a respondent had a narrow understanding of DevOps which could aﬀect the interpretation of its implications.

6

Conclusions

The study indicates some beneﬁts and challenges involved in adopting DevOps.
The beneﬁts include more frequent releases, improved test automation, better
communication and enhanced occupational welfare. DevOps can also support an
experimentation culture in software development.
The factors inhibiting DevOps adoption were along human aspects e.g. lack of
communication and resistance to change; and technical aspects e.g. the complexity of development and production environments. Suitability of DevOps might
be questioned for certain domains and industry sectors, at least for now. The
fuzzy deﬁnition of DevOps also prevents companies from having clear targets.
It would be interesting to learn about further implications of DevOps. Frequent releases are an advantage, but what are the eﬀects of short release cycles

DevOps Adoption Benefits and Challenges in Practice: A Case Study

597

and other DevOps practices? How does DevOps aﬀect the end users, or is it just
an internal matter for the development organizations? Understanding the eﬀects
in a larger scale could help in assessing the real value of DevOps.
Acknowledgments. This article was supported by TEKES as part of the N4S Program of DIMECC (Digital, Internet, Materials & Engineering Co-Creation).

References
1. Digile N4S (2016). http://www.n4s.fi/en. Accessed Sep 2016
2. Balalaie, A., Heydarnoori, A., Jamshidi, P.: Microservices architecture enables
DevOps: migration to a cloud-native architecture. IEEE Softw. 33(3), 42–52 (2016)
3. Bass, L., Weber, I., Zhu, L.: DevOps: A Software Architect’s Perspective. AddisonWesley, Boston (2015)
4. Callanan, M., Spillane, A.: DevOps: making it easy to do the right thing. IEEE
Softw. 33(3), 53–59 (2016)
5. Cruzes, D., Dyba, T.: Recommended steps for thematic synthesis in software engineering. In: International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 275–284, September 2011
6. Cukier, D.: DevOps patterns to scale web applications using cloud services. In:
Proceedings of the 2013 Companion Publication for Conference on Systems, Programming, & Applications: Software for Humanity, pp. 143–152, SPLASH 2013.
ACM, New York (2013)
7. Ebert, C., Gallardo, G., Hernantes, J., Serrano, N.: Devops. IEEE Softw. 33(3),
94–100 (2016)
8. Humble, J., Farley, D.: Continuous Delivery: Reliable Software Releases Through
Build, Test, and Deployment Automation. Addison-Wesley Professional, Boston
(2010)
9. Iden, J., Tessem, B., Päivärinta, T.: Problems in the interplay of development and
IT operations in system development projects: a Delphi study of Norwegian IT
experts. Inf. Softw. Technol. 53(4), 394–406 (2011)
10. Lindgren, E., Münch, J.: Software development as an experiment system: a qualitative survey on the state of the practice. In: Lassenius, C., Dingsøyr, T.,
Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212, pp. 117–128. Springer, Heidelberg
(2015). doi:10.1007/978-3-319-18612-2 10
11. Neely, S., Stolt, S.: Continuous delivery? Easy! just change everything (well, maybe
it is not that easy). In: Proceedings of the 2013 Agile Conference, pp. 121–128,
AGILE 2013. IEEE Computer Society, Washington, DC (2013)
12. Roche, J.: Adopting DevOps practices in quality assurance. Commun. ACM
56(11), 38–43 (2013)
13. Smeds, J., Nybom, K., Porres, I.: DevOps: a definition and perceived adoption impediments. In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP
2015. LNBIP, vol. 212, pp. 166–177. Springer, Heidelberg (2015). doi:10.1007/
978-3-319-18612-2 14
14. Tessem, B., Iden, J.: Cooperation between developers and operations in software
engineering projects. In: Proceedings of the 2008 International Workshop on Cooperative and Human Aspects of Software Engineering, pp. 105–108, CHASE 2008,
NY, USA. ACM, New York (2008)

Towards Continuous Customer Satisfaction
and Experience Management: A Measurement
Framework Design Case in Wireless B2B
Industry
Petri Kettunen1(&), Mikko Ämmälä2, Tanja Sauvola3,
Susanna Teppola4, Jari Partanen2, and Simo Rontti5
1

Department of Computer Science, University of Helsinki, Helsinki, Finland
petri.kettunen@cs.helsinki.fi
2
Bittium Wireless Ltd., Oulu, Finland
{mikko.ammala,jari.partanen}@bittium.com
3
Department of Information Processing Science,
University of Oulu, Oulu, Finland
tanja.sauvola@oulu.fi
4
VTT Ltd., Oulu, Finland
susanna.teppola@vtt.fi
5
Faculty of Art and Design, University of Lapland, Rovaniemi, Finland
simo.rontti@ulapland.fi

Abstract. Customer satisfaction (CS) is continuously important in modern
industrial business environments. However, it is inherently affective even in
B2B contexts and thus not directly controllable. Satisfaction impacting customer
experiences (CX), respectively, can be managed by the supplier company. The
company should ﬁrst deﬁne its strategic CX vision, and then set the value-based
CX goals accordingly. The goals have to be made transparent to the entire
organization for producing the experiences with their current status and projected progress. A transparent measurement system is thus needed. In this
research work, we have investigated how satisfying experiences (chiefly UX)
can transparently be gauged in a B2B case company. Following our prior
research approach, instead of attempting to cover all possible experience
touchpoints in customer-supplier relationships, we focus on the main experience
factors of the case company. A real-time predictive CS/CX measurement
framework design is proposed. A use case is illustrated for initial evaluation.
Keywords: Customer experience  B2B  Key performance indicator
Transparency  Service design  User experience


1 Introduction
Customer experience (CX) is expected to become an increasingly important competitive advantage. In the digital economy business-to-business (B2B) companies must
also take that more consciously into account to succeed, following the B2C trends.
By emphasizing the customer perspective, leading industrial B2B companies are
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 598–608, 2016.
DOI: 10.1007/978-3-319-49094-6_45

Towards Continuous Customer Satisfaction and Experience

599

endorsing superior user experience (UX) and CX as their differentiating competitive
strategies.
In this paper we aim to develop towards such ideals in one particular case company
context. Bittium, www.bittium.com (later referred to as “the company”) offers innovative products and solutions based on own platforms for defense, public safety and
other authorities markets, IoT markets (Internet of Things), as well as for industrial use.
For the wireless communication markets and companies who need wireless connectivity to their products, the company offers R&D services based on latest wireless
technologies and applications. The company also offers high-security solutions for
mobile devices.
The strategic targets and challenges of the company are as follows: to achieve
constant transparency to the performed work as close to real-time as possible with
collaborative feedback system from the product development results, customer satisfaction, and developers to cover the whole value chain towards analytics supported
customer experience in B2B context. Therefore, dashboards representing the work and
its value are created. Customer involvement (understanding) and service design are key
means.
The contribution of this paper is in showing how B2B customer experience can be
systematized and, consequently, strategically managed throughout the organizational
levels and lifecycle processes to continuously satisfy the customers. We design a
proposal for a CS/CX measurement framework to realize it in practice, as demonstrated
in the company case. The CX workflow gets transparency of the R&D and the KPI
targets.

2 Background and Related Works
2.1

Customer Satisfaction and Experience in B2B

The key distinction of B2B contexts compared with business-to-consumer (B2C) is that
the clients are other companies and organizations with formal acquisition processes run
by professional buyers. Thus, in this paper, customers refer to users and acquirers of the
products, including the delivery project and service-related operations stakeholders.
Customer experience stems from the perceptions of all the cognitive and emotional
touchpoint encounters. Following their subjective experience judgements, the customers are satisﬁed in various degrees. This may then affect their future behaviors.
While traditional CS measurements (surveys, in particular) have been used for
years, more prompt and forward-looking measures are just emerging. Currently, there
are some published CX measurement models and frameworks (e.g., the Forrester’s CX
Index), but in B2B, only few exist and none are standardized. In B2B, the
customer-supplier dependencies are more complex, requiring more comprehensive
measures [1]. Measuring the customer value delivery and realization is still not often
practiced [2, 3].
In general, it is not possible for industrial companies to control all the factors
affecting their customer experiences. However, each company can make strategic
choices of the particular experiences they want to deliver to their customers. They can

600

P. Kettunen et al.

select the experience components to focus on accordingly [3, 4]. There are various CX
improvement models proposed, but no generally agreed, validated standard references
exist (not even de facto). Some publicly available are the CXMM and Naïve to Natural.
2.2

Transparency of Customer Experience and Satisfaction

As mentioned, CX has a direct link to the customer behavior towards company’s
products and services, and therefore it requires careful planning and management
throughout the organizational levels and lifecycle processes of the product/service.
However, comprehensive and continuous CX management requires that the company
has transparency built in their engineering processes, touchpoints, and customer
interaction so that the realisation of positive CX can continuously be followed and
planned as a whole.
In this research, we link transparency to the management of CX throughout the
product lifecycle, from the design to the build and delivery, and related information
system support. In recent research, transparency is connected to Lean thinking, continuous deployment and process visibility [5, 6]. However, it is still unclear how
increased data and process transparency contributes to process performance, and under
what circumstances transparency is needed and brings performance effects or business
value [6].
In the research of customer satisfaction in information systems development,
process transparency has also been identiﬁed as important [7]. Often, customer satisfaction information is not widely shared across the organization with all the employees,
affecting the customer experiences. In B2B contexts, formalities and even restrictions
may limit.
2.3

Continuous Customer Experience Development with Service Design

The sources and drivers of B2B customer satisfaction become more diverse. Service
design (SD) is a methodological approach that can be used for customer involvement
during the development process [8]. It is a holistic, multidisciplinary ﬁeld that helps
improve customer satisfaction by improving existing products and services, as well as
making them more useful and desirable for customers with UX centric practices.
Service design has already taken place in the B2C context, but it is also recognized as a
useful approach in the B2B context, as well as in the internal development of processes.
SD serves as a platform and facilitator for cross-functional user-driven identiﬁcation
of product opportunities and features as well as enabling early concretization of ideas
and their effect to the holistic customer experience. The consequent UX goals can be
deﬁned and managed systematically [9]. The translation of the ﬁndings into measurable
CX/UX key performance indicators (KPI) is the critical phase in order to truly enable
real-time, data-driven, and transparent management of user experience throughout the
subsequent development, manufacturing, delivery, and service processes.

Towards Continuous Customer Satisfaction and Experience

2.4

601

Customer Involvement and Feedback

Customer involvement in the development process and the understanding of customers’
needs and behaviors are essential when building successful products and services.
There are various methods and tools to involve customers. For instance, agile methods
focus on improving customer satisfaction through collaboration and active participation
of relevant stakeholders [8]. Ideally, the customer and the supplier treat themselves as
partners with mutually beneﬁcial commits to value creation. Customer involvement
provides an opportunity to enhance the product’s technical performance, as well as
overall user experience, based on a better understanding of customers’ needs, resulting
in a better alignment of R&D resources. Also, throughout different touchpoints, there
are opportunities to collect customer feedback. For example, continuous deployment
and rapid feature validation cycles with customers are seen compelling for learning to
improve R&D efﬁciency and customer satisfaction after deployment [8, 10].

3 Framework Design
This work is a continuation of our previous research with the case company. Initially,
we investigated certain elements of the company R&D process and identiﬁed a set of
strategies to improve the customer satisfaction (e.g., keeping customer promises)
[11, 12]. In principle, real-world management research can be either description-driven
(explanatory) or prescription-driven (design sciences). Our previous work with the case
company has primarily followed the former thread, but in this paper, we advance with
the latter, solution-focused approach. We propose the measurement framework as a
design artefact and evaluate the innovation.
Based on the case company needs and challenges in particular (Sect. 1), and
grounding on the extant literature in general (Sect. 2), the purpose of this research work
is to design a solution for the following questions:
1. What valuable customer experiences does the company want to bring and focus on?
2. How to achieve and sustain them?
3. What measurements indicate and predict them?
To begin with, Fig. 1 depicts a conceptual model for the CS/CX measurement
framework design. The reasoning is that the customers will be satisﬁed when they
experience (CX) to get value from the supplied products (UX) and services considering
the beneﬁts and costs. The aim for the supplier ﬁrm is to provide that optimally. Ideally,
all beneﬁt from mutual value creations – leading to manageable and economically
optimal CX. We thus propose four goal areas (exempliﬁed in Sect. 4) to be attained
(CS/CX vision):
I. Excellent product solution (design; portfolio)
II. High-quality implementation, manufacturing (capability, technology)
III. Reliable transactions, customer care (includes validation of the products/
services)
IV. Successful customer relationship (satisfaction, feedback; business/partner cases)

602

P. Kettunen et al.

Fig. 1. Conceptual model of CS/CX management: goal spheres and contributing factors

Figure 2 visualizes our design solution model for the CS/CX measurement
framework to manage the above. There are four measurement sections corresponding
the goal areas I–IV. For each section, there are the following components to device:
KPI displays (actuals and predicted), Information sources (feedbacks and feedforwards). The left-hand side of each grid displays the current state of the CX goals based
on their associated measurement data (real-time) sources. The right-hand sides in turn
reflect their future predictions based on the selected indicator signals. Effectively, the
upper-half of each grid addresses research questions 1 and 2, and the lower-half –
question 3, respectively.

Fig. 2. Design model visualization for the CS/CX measurement framework

The added value of our framework is in combining the high-level CX goal areas
and gauging them with both monitoring and predictive measures. Existing methods
(like GQM+Strategies) can then be used to deﬁne them for each section [12].

Towards Continuous Customer Satisfaction and Experience

603

4 Realization
This section shows how the CS/CX measurement framework design presented in
Sect. 3 can be utilized in practice in the case company introduced in Sect. 1. The
company has systematically managed its business-level and project-level satisfaction
feedback since 2003. Most techniques used were based on collecting customer feedback and satisfaction data through familiar means such as surveys. Also, log ﬁles, bug
reports, change requests, and feature assessment were used to guide product management decisions. At the time of this investigation the company development focus is
to improve company enterprise data management1 for product portfolio, transparency,
traceability, and foresight purposes. Updated company portfolio and service offerings
also have brought improvements to the company’s UX and product feedback processes.
Table 1 exhibits how all the four sections (I–IV) of the measurement frame (see
Fig. 2) can be instantiated with the actual company data sources and reports (displays).
With this, the following company-speciﬁc CX goals and factors can be gauged
end-to-end:
• Product/Solution: UX design use cases linked to product requirements; Proven UX
factors applied into product features and usage
• Production & Delivery: Support for test case management to ensure early use case
feedback both internally and externally
• Projects & Services: To react faster, adapt and improve – to be able to cope with the
customer’s requirements, technology, competences and way-of-working.
• Customer & Business: To create better knowledge and understanding of the customer in
order to improve the services provided. With the help of creating better CX, customers
will be more happy and loyal, and the customer relationship will be better.
An actual business case (conﬁdential) can be characterized as following mapping of
the total CX into the four main components of the measurement framework (Fig. 2):
– Develop customized (III) mobile devices and solutions (I) for special authority
(IV) use based on the company’s special device platform (II) having enhanced
security.
For each component area, the company shall then deﬁne speciﬁc CX goals to be
followed. The following is an excerpt of the consequent goal deﬁnition and flowdown:

1

Company’s Product Master Data program.

604

P. Kettunen et al.
Table 1. Framework realization

ACTUALS
Customer Care
Portal (product
statistics);
UX monitoring
(evaluation of
user stories);
Prototype test
statistics

II: Production & Delivery

Reports of
Product delivery
related enquiries;
Product/delivery
realization data
monitoring

Product Delivery related Enquiries to product orderer, on
deliveries;
Delivery coplanning; Enterprise Data applications1 (shared
solution data)

III: Project & Services

Customer Feedback Summary –
Trends (reports
of Project or
Service related
CS Enquiries)

Project or Service related CS
Enquiries to
project management and
teams / service
orderer, during
finalization
(long cases multiple)
Customer Satisfaction Surveys
to business /
account management, annually or by plan

IV: Customer
& Business

I: Product / Solution

Feedback
Sources
Customer Care
Portal (product
feedback);
Verification of
user stories;
Prototype tests
(real-life)

Reports of Customer Satisfaction Surveys

PREDICTORS
Feature Assessment with Value
Tool [13];
Lead user experience (trials);
Product performance metrics
translated from
specific UX
requirements
Solution data
management and
Product content
updates; Product
release and trend
management;
Enterprise Data
modeling and
visualization

Impact Mapping
Grid [12]

Customer Care
Portal (statistics)

Feedforward
Sources
Technology and
business evaluations;
Usability expert
evaluation data;
Prototype trials
(with SD);
Product Master
Data applications
Company development environment for
Continuous Integration/Deploy
ment (SW &
HW);
Product Master
Data and Enterprise Data applications1
Project progress
trends, estimates; Process
audits; Employee feelings surveys

Customer Care
Portal (customer
feedback)

Towards Continuous Customer Satisfaction and Experience

605

With the four sections (I–IV) of the CS/CX measurement framework, the particular
goal above can transparently be monitored and steered accordingly throughout the
entire product value chain and lifecycle as follows (c.f., Table 1):
I. “Seamless navigation” designed and incorporated into the product requirements
II. Software and hardware implementation fulﬁlling the requirements,
production-to-design, related user documentation
III. Customer perceptions: The quality of deliverables is/was good.
Firm factors: The processes of the projects are/were well deﬁned and tailored to
ﬁt the case, and followed (UX plan). Design tools and networks are/were utilized
effectively. Veriﬁcation and validation of deliverables is/was functional (UX
tests).
IV. Customer feedback concerning the navigation and the keyboard

5 Discussion
Our research and development work can be judged both in general and locally in the
case company context with informed arguments. There are theoretical and practical
implications leading to future research avenues, as well as suggestions for the
company.
5.1

Evaluation

Our proposition is generic in that it only suggests the four main component areas, but it
does not prescribe the speciﬁc goal-setting to be instantiated by the particular B2B
company. This conﬁguration work alone is expected to be beneﬁcial for the company,
since it forces to make strategic choices (question 1 in Sect. 3) and touchpoint analysis.
The proposed four component sections build transparency to the customer experience information over the organizational levels (Sect. 2.2) supporting the common
awareness of the customer experience targets and their realisation during lifecycle
processes. However, we acknowledge that such sectors as marketing and sales (brand
experience) exist. Naturally, in different companies for instance the different types of
products may lead to different weightings of the sections. Moreover, with B2B customers there are typically different persons (including primary/secondary/indirect
users) involved at different stages (pre/post-delivery, use) of the customer journey
(stakeholder analysis). The ISO/IEC 25010 standard advocates promoting such different stakeholder perspectives and types of users (quality in use).
At the time of this writing, the case exhibited in Sect. 4 is under development, so
we do not have enough feedback from practitioners to conclusively assess, how beneﬁcial our CS/CX measurement framework proposal could be to use. However, the
proof-of-concept conﬁguration described in Table 1 reflects its application potential ex
ante.
Comparing, there is some conceptual research published although more in the B2C
contexts. Choi, et al. investigated consumer service experiential components (including

606

P. Kettunen et al.

price) [4]. In B2B, for example one of the seminal works by Eggert and Ulaga tests the
impact of customer perceived value on customer satisfaction [14]. Basten and Pankratz
distinguish between the overall customer satisfaction of the organization and the
end-user perceptions of the product performance [7]. Payne et al. advocate designing
the different customer encounters considering the forms and apt metrics [3]. Our
framework promotes that line of thinking, focusing on the usage and service touchpoints. Value-in-use is the key component in the research model of Lemke et al [1].
Understanding the deep customer needs, co-creating the solution, and monitoring the
actual value realization are principal satisﬁer factors. This mirrors the four sections of I,
II–III, and IV of our CS/CX measurement framework, respectively (c.f., Fig. 1). In
contrast to the conceptual research, the main contribution of our approach is
operational.
A general limitation of this investigation is that it is based on a single company
case. Consequently, the suggestions and ﬁndings cannot be generalized to the entire
B2B industry: rather they are meant to provide insight into similar companies.
5.2

Implications

The prime lesson of our research work is in discerning how industrial B2B customer
experience can systematically be framed and managed in practice. The conceptual
model (Fig. 1) is for the design reasoning, not theory development and testing.
As customer experience is a summary of various organizational levels of customer
interaction (Sect. 2.4) during the product/service lifecycle (Sect. 2.2), the case company main focus has been on enhancing the content and the accessibility of information
through the information systems, which would support the efﬁcient management and
even prediction of customer experience by utilizing data sources that were accurate,
timely, and relevant. A key target is to systematize product data and the aftersales
operations. Our CS/CX measurement framework is aimed at facilitating that line of
development – leaning on the assumption that the higher the process transparency, the
more accurate the solution realization foresight.
In general, we can delineate the following customer experience management
strategies with our CS/CX measurement framework: Design for Customer (IV in
Fig. 1), Design with Customer (I, II), Design by Customer (III). In B2B, a R&D project
can be considered a service to be codesigned and gauged (Sect. 2.3). Several counterparts from both the supplier’s and the client’s side can be asked considering each
four section of our CS/CX framework: what knowledge, technology (assets), and
resources they bring – and why (motivation). That is, in addition to the customer’s
view, we may also continuously appreciate the supplier’s view, including the ﬁrm’s
economy (see Fig. 1). These can also be supported by SD. In the case company, the
project level can be seen as the continuation of introducing design thinking. The ﬁrst
step has been paying attention to usability of software user interfaces and individual
devices. On the next level the design focus expands to more holistic UX including
fluent end-user work processes.
The basic premise of our research (Fig. 1) is that positive customer experiences
improve customer satisfaction, which in turn, positively influences the business

Towards Continuous Customer Satisfaction and Experience

607

relationship outcomes (e.g., loyalty). However, such causal linkages are complex to
validate. Nevertheless, each company should be able to discern the cost/beneﬁt ratio of
their different CX components. Customer value(-in-use) is the key concept [1].
While ﬁrm economy and business performance are out of the scope of our research
work, we suggest that the company considers the potential negative impacts of poor
customer experiences in the four sections of our CS/CX measurement framework both
from the customer’s and the ﬁrm’s points of views. By systematically measuring the
distinct touchpoints, the company is able to improve the performance impacts.
5.3

Future Work

We plan to strengthen the company case. More feedback from practitioners of applying
our framework would be appreciated to judge its value and to discover most suitable
internal and external (customer) data collection points as the sources of the measurements. Like exhibited in Sect. 4, we can record successful conﬁguration templates
(Table 1) for the known customers. In all, the predictive power of different indicators
(the right-hand sides of the grids in Fig. 2) could be validated in the company’s context.
New internal information sources could be employee insights (of product realization)
and employee satisfaction. Finally, we intend to follow the recent research stream of
co-creating customer value [1–3]. This spans our entire CS/CX measurement framework
space, but in our case company, the product value is the current focus area [13].

6 Conclusion
This paper suggests continuous customer experience management for modern B2B
companies. While CX measurement cannot be fully standardized, with individual B2B
customers it is possible for the company to control their selected touchpoints by their
contributing factors and the associated experience components to gauge for customer
satisfaction. By systematically measuring them, the company is armed to improve the
performance with impact-oriented customer experience management. In this paper, we
have designed a CS/CX measurement framework proposal to facilitate those aims as
demonstrated in our target company case.
Acknowledgements. This work was supported by TEKES as part of the Need 4 Speed Program
of DIMECC (Finnish Strategic Centre for Digital, Internet, Materials & Engineering
Co-Creation).

References
1. Lemke, F., Clark, M., Wilson, H.: Customer experience quality: An exploration in business
and consumer contexts using repertory grid technique. J. Acad. Mark. Sci. 39(6), 846–869
(2010)

608

P. Kettunen et al.

2. Keränen, J.: Customer Value Assessment In Business Markets. Doctoral Dissertation,
Lappeenranta Univ. of Tech, Acta Universitatis Lappeenrantaensis 579 (2014)
3. Payne, A.F., Storbacka, K., Frow, P.: Managing the co-creation of value. J. Acad. Mark. Sci.
36, 83–96 (2008)
4. Choi, E.K., Wilson, A., Fowler, D.: Exploring customer experiental components and the
conceptual framework of customer experience, customer satisfaction, and actual behavior.
J. Foodservice Bus. Res. 16, 347–358 (2013)
5. Rodriquez, P., et al.: Continuous deployment of software intensive products and services: a
systematic mapping study. J. Syst. Softw. (in press, 2016)
6. Berner, M., Augustine, J., Maedche, A.: The impact of process visibility on process
performance. Bus. Inf. Syst. Eng. 58(1), 31–42 (2016)
7. Basten, D., Pankratz, O.: Customer satisfaction in IS projects: assessing the role of process
and product performance. Commun. AIS 34, 430–447 (2015)
8. Yaman, S.G., Sauvola, T., Riungu-Kalliosaari, L., Hokkanen, L., Kuvaja, P., Oivo, M.,
Männistö, T.: Customer involvement in continuous deployment: a systematic literature
review. In: Daneva, M., Pastor, O. (eds.) REFSQ 2016. LNCS, vol. 9619, pp. 249–265.
Springer, Heidelberg (2016). doi:10.1007/978-3-319-30282-9_18
9. Kaasinen, E., et al.: Deﬁning user experience goals to guide the design of industrial systems.
Behav. Inf. Technol. 34(10), 976–991 (2015)
10. Sauvola, T., Lwakatare, L.E., Karvonen, T., Kuvaja, P., Holmström Olsson, H., Bosch, J.,
Oivo, M.: Towards customer-centric software development: a multiple-case study. In: 41st
Euromicro Conference on Software Engineering and Advanced Applications (SEAA),
pp. 9–17. IEEE (2015)
11. Münch, J., Fagerholm, F., Kettunen, P., Pagels, M., Partanen, J.: Experiences and insights
from applying gqm+strategies in a systems product development organisation. In: Demirors,
O., Turetken, O. (eds.) 39th Euromicro Conference on Software Engineering and Advanced
Applications (SEAA), pp. 70–77. IEEE (2013)
12. Kettunen, P., Ämmälä, M., Partanen, J.: Towards predictable B2B customer satisfaction and
experience management with continuous improvement assets and rich feedback. In:
Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212, pp. 205–211.
Springer, Heidelberg (2015). doi:10.1007/978-3-319-18612-2_18
13. Mendes, E., Turhan, B., Rodriguez, P., Freitas, V.: Estimating the value of decisions relating
to managing and developing software-intensive products and projects. In: 11th International
Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE).
ACM (2015)
14. Eggert, A., Ulaga, W.: Customer perceived value: a substitute for satisfaction in business
markets? J. Bus. Ind. Market. 17(2/3), 107–118 (2002)

Emerging Research Topics

Gamiﬁcation of Software Testing - An MLR
Mika V. Mäntylä1 ✉ and Kari Smolander2
(

)

1

University of Oulu, Oulu, Finland
mika.mantyla@oulu.fi
2
Aalto University, Espoo, Finland
kari.smolander@aalto.fi

Abstract. This paper presents an initial multi-vocal literature review that
extracts ideas for gamiﬁcation of software testing. We surveyed the type of
testing, system under test, role of individuals, gamiﬁcation elements, challenges
and drawbacks, support constructs and tools, and empirical evidence from
academic sources and grey literature. Ideas were given to both automated unittesting, and end-user related testing done by exploratory testers and beta testers.
The most frequent gamiﬁcation elements were points (13 sources), awards (4),
stories (4), badges (3), rankings (3), levels (3) and time-pressure (3).
Keywords: Gamiﬁcation · Testing · Multi-vocal literature · Grey literature

1

Introduction

Gamiﬁcation is the utilization of game elements outside the context of computer games.
Its purpose is to increase the engagement, motivation and performance of the participants
[1]. We study gamiﬁcation of software testing as software testing costs are high (35 %
of the IT costs [2]), testing is often an undervalued job, and testing produces lots of
information in the form of numbers. The numbers can be turned to points that are a
fundamental gamiﬁcation element. The high cost, low appreciation, and the apparent
gamiﬁability of testing make it an excellent target for gamiﬁcation.
Our research method was a multi-vocal literature review (MLR). MLRs aim at
studying all types of writings on a particular topic [3]. Thus, in comparison to traditional
systematic literature review they also include grey literature such as web-pages, blog
posts and discussion forum content. They are suitable for topics where academic liter‐
ature is lacking due to the recent emergence of the topic or for some other reason.
When searching the literature, we performed two main steps. First, we used a recent
SLR of gamiﬁcation in Software Engineering [1] as a starting point for forward and
backward snowballing academic literature. We studied the SLR and all primary studies
it was refereeing and found three papers that were related to software testing, veriﬁcation
and validation. Forward snowballing additionally revealed two additional studies.
Second, we used Google Search Engine to search for relevant grey literature. Our search
string was “gamiﬁcation software testing”. We utilized the page-rank algorithm. During
the process, the quality of the grey literature quickly deteriorated as we progressed
further. Our ﬁnal included grey literature source was 34th hit provided by Google.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 611–614, 2016.
DOI: 10.1007/978-3-319-49094-6_46

612

M.V. Mäntylä and K. Smolander

Overall, our goal was not to cover all possible corners but to ﬁnd a reasonable amount
of sources (n = 20) to enable the collection of potential ideas and requirements for a full
design science project, where a software testing gamiﬁcation environment is built. Our
resulting spreadsheet is online [4] and sources are referred with “S” and a number of the
reference, e.g. S1 refers to the ﬁrst source.

2

Results

2.1 Types of Testing, Systems Under Test (SUT) and Roles of Individuals
First, we wanted to understand what types of testing gamiﬁcation has been proposed for
gamiﬁcation. Unit testing was the most popular option mentioned in six sources (see
our online spreadsheet for details). In particular, Test-driven development, a specialized
way of unit testing was mentioned in three out of the six papers that mentioned unittesting. Six sources did not mention the type of testing or talked about testing in general
with no particular focus area. Two sources mentioned beta-testing as the type of testing
suitable for gamiﬁcation. Two mentions were also given to exploratory testing. To
summarize, gamiﬁcation was mostly suggested to the very opposite ends of testing. Unittesting is technical and typical performed by developers who also develop the software.
On the other hand, beta-testing and exploratory testing is often done from end-users’
perspective to ﬁnd out problems particularly related to the user or the customer or the
domain rules were the software is used.
Second, SUT can inﬂuence whether the gamiﬁcation of software testing is desirable.
However, the majority of our sources (13) did not specify the system under test in detail
or claimed that is applicable to all systems types. The only type of system receiving
more than one mention were Games with two sources. Other systems mentioned once
were: Data-center software, Inter-active system with complex workﬂows, Enterprise
Systems, Java, and an artiﬁcial element with linked list for education purposes. Overall,
it appears that any type of SUT would be suitable for gamiﬁcation.
Third, as past work suggests that testing is also performed by many roles (testers,
developers, customers, product managers and help-desk personnel) [5, 6], we investi‐
gated the roles who participate in the gamiﬁed testing. Nine sources mentioned devel‐
opers. Software testers were another notable group mentioned in eight sources. Other
roles received considerably less mentions: students were mentioned three times, betatesters and managers were mentioned in two sources and one source mentioned customer
support, designers, and crowdsourced workers. Additionally, one source was ambiguous
with respect to roles.
Although the literature of gamiﬁcation recognizes that multiple roles can participate
in gamiﬁed software testing eﬀorts, still majority of our source focus on the two obvious
groups developers and testers.
2.2 Gamiﬁcation Elements
We classiﬁed the type of gamiﬁcation elements [1, 7] of our sources. Gamiﬁcation
elements refer to constructs that try to transform work, i.e. software testing in our case,

Gamiﬁcation of Software Testing - An MLR

613

to a game. Transforming work that is supposedly boring and tedious to a game that
supposedly is fun and engaging is the key motivator of gamiﬁcation.
As in prior work [1] points were the most frequently mentioned gamiﬁcation element
with 13 sources. Points are a basic element in gamiﬁcation that can be turned to other
elements such as awards (4 sources), badges (3), and rankings (3). Empirical evidence
of the eﬀect of the gamiﬁcation elements was limited in our sources, but for example
S20 mentioned that a leaderboard (a public ranking based on points) was the most
eﬀective gamiﬁcation element.
Other game elements were also frequently present. Stories or quests were mentioned
in four sources. For example, S5 states that “Testers will be impersonating diﬀerent
characters from a detective in industrial London….” Time-pressure was an element
mentioned in three sources. Time-pressure can increase eﬃciency and make games more
engaging. Levels, mentioned in three sources, enable iterative progress and adjust the
diﬃculty of the game so that a suitable challenge is always present, e.g. to complete
level 1 one must have one unit-test for all classes, for level 2 one must have unit-tests
for all methods and so on. Tips, mentioned in one source, can be also be used to make
the games engaging and reduce deadlocks.
For an organization, games may require task distribution mechanics (mentioned in
two sources). Rules, mentioned in two sources, are also important as otherwise someone
can start getting points with inappropriate ways, e.g. to write meaningless code to
increase unit-test coverage.
2.3 Empirical Evidence, Support Constructs, and Challenges
Only three sources provided properly reported empirical results. S12 presents two
industrial case studies with impressive numbers that support the adoption of gamiﬁca‐
tion. The ﬁrst increased the defect ﬁxing speed while the other focused on using static
analysis results to motivate developers to create higher code quality. S13 presents a
student experiment in unit-testing showing that the treatment group (gamiﬁed) found
signiﬁcantly more defects and had higher requirements coverage than the control group.
Finally, S20 performed an industrial case study and found that their system was
successful in motivating developers to write more and higher quality unit tests.
Twelve sources presented support constructs to gamiﬁcation. S3 tells how their tool
for gamiﬁcation of testing failed and how it could be improved. S4 provides a storytelling
scaﬀolding with roles and principles that help with gamiﬁcation of testing. S7 connects
gamiﬁcation idea to the testers’ career path. S15 provides a rule system. Actual tools
were provided as JIRA add-ons (S8), Eclipse plugins (S10, S11, S13), web-based
learning environments (S18), and as GUnit tool (S20).
From our sources, we found three challenges in the gamiﬁcation of software testing
that we consider notable. First, S6 highlighted that people have diﬀerent ways to achieve
the same goal. If one then starts to measure things like the quality of defect reports in
the defect tracking system, then other qualities like excellent face-to-face communica‐
tion that can be used as a substitute for the poor quality of written reports would be
ignored. Thus, the game would reward only a selected subset of the bug reporting
process, i.e. the one that is visible in the defect tracker. This ignores many relevant parts

614

M.V. Mäntylä and K. Smolander

of the process as pointed out in the literature [8]. Second, even counting bugs will intro‐
duce challenges, for example counting and awarding based on the bug count could lead
to a situation where ﬁve spelling mistakes in the application would result in ﬁve bug
reports (S1). Such issues need to be dealt with some types of rules and game referees
that decide what is right. Third, S2 pointed out a need for balance to give individuals
enough freedom so that the game stays engaging and allows creativity to blossom, while
still maintaining control and coordination.

3

Conclusions

We make four ﬁndings in this paper. First, gamiﬁcation proposals were given to both
ends of testing – automated unit-testing that is technical, and end-user related testing,
i.e. beta-testing and exploratory testing. Second, the multitude of diﬀerent roles and
crowdsourcing in testing were recognized. Third, numerous gamiﬁcation elements were
present. Points were the most popular while stories appeared as something that could be
used to increase the engagement in ways that are not possible for numeric point based
approaches. Fourth, problems of gamiﬁcation were discussed, e.g. the gamiﬁed approach
might not allow employees to work in a way that is the most natural for each individual,
thus, resulting in unfairness and lower productivity.

References
1. Pedreira, O., García, F., Brisaboa, N., Piattini, M.: Gamiﬁcation in software engineering – a
systematic mapping. Inf. Softw. Technol. 57, 157–168 (2015)
2. Buenen, M., Walgude, A.: World Quality Report 2015–2016. Capgemini, Sogeti und HP (2015)
3. Garousi, V., Felderer, M., Mäntylä, M.V.: The need for multivocal literature reviews in
software engineering: complementing systematic literature reviews with grey literature, p. 26.
ACM (2016)
4. Mäntylä, M.V., Smolander, K.: Spreadsheet of sources: Gamiﬁcation of Software Testing - an
MLR [Internet]. Figshare. doi:10.6084/m9.ﬁgshare.3756600
5. Mäntylä, M.V., Itkonen, J., Iivonen, J.: Who tested my software? Testing as an organizationally
cross-cutting activity. Softw. Qual. J. 20, 145–172 (2012)
6. Prechelt, L., Schmeisky, H., Zieris, F.: Quality experience: a grounded theory of successful
agile projects without dedicated testers, pp. 1017–1027. ACM (2016)
7. Hamari, J., Koivisto, J., Sarsa, H.: Does gamiﬁcation work? – A literature review of empirical
studies on gamiﬁcation, pp. 3025–3034. IEEE (2014)
8. Aranda, J., Venolia, G.: The secret life of bugs: going past the errors and omissions in software
repositories, pp. 298–308. IEEE Computer Society (2009)

Internationally Distributed Software Development:
On the Impact of Distance Based on a Case Study
Harri Sten ✉ , Hannu Jaakkola, and Kari Systä
(

)

Department of Pervasive Computing, Tampere University of Technology, Tampere, Finland
{harri.sten,hannu.jaakkola,kari.systa}@tut.fi

Abstract. Distribution of software development is increasingly global and
crosses the geographical and cultural borders. As software development is crea‐
tive teamwork, the distribution is not about mechanical division of work. The
poster presents on on-going study on internationally distributed software devel‐
opment. Based on a literature and experiences from a case company, investigates
the motivations, models of distribution and the most problematic areas.
Keywords: Distributed development · Software development · Management

1

Introduction

Software development is often an international activity. Distributed software develop‐
ment is in many ways diﬀerent from the situation where developers work in a co-located
fashion. While all the disadvantages associated with international distribution cannot be
eliminated in full, their eﬀects can be reduced. Therefore, ﬁnding a balance between the
various beneﬁts, disadvantages, and successes is the most important aspect for a
successful international and distributed software development project. Carmel and Tjia
[1] describe the problems in terms of ﬁve centrifugal forces, each of which alienate
project staﬀ from each other and make the work more diﬃcult. These forces are culture,
communication, coordination, control, and cohesion.
In this paper, these ﬁve centrifugal forces are used as a framework for analyzing
distributed software development. In particular, we focus on the attributes of distance
in distributed software development. This research analyzes the eﬀects of these two
dimensions, centrifugal forces and distance attributes by using a concrete case company
that develops a product family in the globally distributed environment.

2

Research Methods and Data Collection

We address the aspects of distribution of software development in the light of earlier
studies in the form of the literature review [2], and reﬂect the ﬁndings to a case company
using an ethnographic study approach [3].
The ethnographic study is based on the main author’ long (12 years) career in the
case company. In addition, retrospectives have been important for development and
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 615–620, 2016.
DOI: 10.1007/978-3-319-49094-6_47

616

H. Sten et al.

improvement of the process. These retrospectives been conducted both in project and
team level. As a genuine insider, the main author has collected data and analyzed the
changes and documented conclusions in the internal wiki pages of the organizations.

3

Background

3.1 Attributes of Distance
The distance between developers has many attributes. One categorization of those
attributes includes geographical, temporal, socio-cultural and organizational distance
[4, 5]. When considering geographical distance, the physical distance does not explain
all the impact to the software development. Eﬀort, time, and cost of travelling are more
relevant than simple physical distance [4, 6]. Temporal distance means that the devel‐
opers work on diﬀerent time zones and therefore have only few or no common working
hours. Similarly, to geographical distance it hinders real-time communication [4], but
also opens new possibilities like “follow-the-sun” [7] development.
Socio-cultural distance refers to a person’s ability to understand and adapt to another
person’s values and normative practices. Factors of socio-cultural distance include e.g.
national culture, language, politics, organization and employee motivation [4, 6]. One
attribute of distance is organizational. The software development can be distributed to
diﬀerent parts of the company or even outsourced to another company. The organiza‐
tional aspect can both decrease and increase the distance. Company culture can help
organizational distribution to overcome socio-cultural diﬀerences [4].
3.2 Centrifugal Forces of Distributed Model
The literature mentions few reference models for distributed software development.
Carmel [8] sees globalization of software development as a set of centrifugal forces that
propels things outwards from the center as it disperses developers to the far corners of
the world. Each of these ﬁve forces alienates project staﬀ from each other and makes
the work more diﬃcult. Those forces are now introduced one by one.
Culture. Cultural eﬀects are most visible in long and multi-national projects. Cultural
diversity can have several dimensions like corporate, project, or national. Even small
cultural mistakes may lead to culture clashes, mistrust and later to conﬂicts. Organiza‐
tional culture has an impact to development environment, however the regional culture
has still a much greater impact than organizational culture [9].
Communication. Communication requires special attention in distributed software
development, because there is less or no informal communication.
Coordination. Software development is a set of complex tasks with complex interdependencies. Well thought division of labor is important in the distributed software
project. Software project and /or the product must be suﬃciently modular so that it can
be divided into separate and independent work packages for each site to help the

Internationally Distributed Software Development

617

coordination of software development [4, 10, 11]. Also working time arrangements
should be coordinated so that common working time between sites can be arranged to
guarantee simultaneous work.
Control. Control ensures that the work is consistent with the objectives and selected
standards and practices. Control is best achieved when the project managers can be and
work among his subordinates (i.e. management by walking). That cannot be imple‐
mented in distributed software development project. To improve control in distributed
software project must be prepared both ﬁnancially and in terms of time for frequent
travelling as often as possible to all sites [1, 12].
Cohesion. If a software development team has good cohesion, members can rely on
each other, like each other and to help one another. All the members of the project team
should trust each other, so that eﬃcient co-operation is at all possible.

4

A Validating Case Study

Description of the organization. The case company had a really distributed organization
with sites in Asia, North America and Europe. Headquarter and executive management
was in Finland. Distribution to Asia and subcontracting were used to enable the growth
but also due to cost pressures. Site in North America was established and acquisition in
a Southern European country was implemented to access necessary technological knowhow. Fast increase of distribution inevitably causes a lot of diﬃculties. Most of the
problems during the development were consequences of geographical and temporal
distances. Physical distance increased traveling related costs and temporal distance led
to diﬃculties in communication (coordination and control).
Culture. The development work in the case company was distributed to three continents
and seven countries. Cultural diversity required the management to familiarize and
understand the cultures of the diﬀerent countries. The main part of the software was
developed in Finland, India and the USA. In the Lewis’ LMR model [13], all these
countries are on diﬀerent edges of the triangle. Cooperation between these oﬃces
appeared very diﬃcult. A strong corporate culture helped to decrease some implications
in the area of culture, but in the subcontracting model it does not help.
Communication. The case company faced a lot of communication challenges. Long
temporal and geographical distances lead to asynchronous communications. The main
communication tool was e-mail. Synchronous communication was accompanied mainly
with regular telephone and videoconferencing, but lack of mutual face-to-face meetings
prevented successful of the communication.
Coordination. Coordination of work required special attention from the management.
Product architecture was created to support distribution of tasks to diﬀerent sites and
subcontractor. As competencies and practices were very diﬀerent at sites work slit was
not ideal. The one positive consequences opportunity to use “follow-the-sun” while

618

H. Sten et al.

establishing their continuous integration and testing so that China tested every day what
other sites had accomplished.
Control. Strong company and project culture helped organization to establish opera‐
tional control of projects. But still some problems occurred during project related to oral
reporting in the meetings. Culture diﬀerences made interpretation of daily oral reporting
very hard as delivered quite often through some ﬁltering. Also daily reporting had delays
based on temporal diﬀerences.
Cohesion. Negative side eﬀects of cultural diﬀerences, communication problems were
lack of trust on each other’s knowledge and competences. Lack of conﬁdence and trust
was already seen at early stages of development project. Organization started to establish
kick-oﬀ meetings at the beginning of each project to create conﬁdence on competencies
and creating trust between sites and people.

5

Results and Analysis

Diﬀerent organizational and project structures cause combinations of attributes of
distance with many implications, both positive and negative. With good processes and
development methods the organizations can decrease the negative impacts and at the
same time to help managing the problematic areas of distributed development.
Table 1 shows how different distance attributes affect distributed work through
the centrifugal forces. The table is based on earlier studies and their key findings e.g.
[14–17], hypothesis of a main author and the case study (ethnographic study)
presented in Sect. 3. Strength of these impacts has been estimated with four values
(low/medium/high/very high). In very high level impact success of development
work require serious actions to eliminate problems. Low level impact is not a serious
threat needs to be taken into account. The most problematic areas are indicated with
grey background in Table 1.
Table 1. Strength of impacts in the distribution

The geographical distance has always impacts due to wasted time and costs of travel.
The biggest impact is on coordination, control and cohesion. Controlling of the work
and people from a long physical distance is hard to implement and oral two-way

Internationally Distributed Software Development

619

communication is often replaced with written reporting. It is not easy to use the manage‐
ment-by-walking method or true co-operation across distributed sites. Cohesion over
long geographical distance is very diﬃcult to establish and needs extra eﬀort from the
management.
The high temporal distance has a major impact to key areas of software development
project. The case and earlier studies have shown that temporal distance leads to delayed
and asynchronous communication, and lack of informal communication [1]. In addition
to weak collaboration and trust all this makes implementation management mechanisms
diﬃcult [18]. On the other hand, the case has been shown that “follow-the-sun” devel‐
opment can take an advantage of temporal distance.
Diﬀerent socio-cultural distances have a major impact on the working culture of the
project. This eﬀect can be reduced by introducing a strong company or project culture.
Communication is diﬃcult when people do not have common language or their cultural
behavior is signiﬁcantly diﬀerent. It is hard to build trust and co-operation between
people who have diﬀerent backgrounds and believes. Our case showed that without prior
collaboration or knowing and understanding competencies and ways of working in
diﬀerent cultures, developers cannot work together successfully during the project.
Cultural problems are most harmful in outsourcing because no shared company culture
to compensate resulting problems is present [1].
Organizational distance impacts all centrifugal forces, but mostly cultural. That can
be seen clearly in the case of outsourcing where company cultures diﬀer. When the
company needs to distribute teams to diﬀerent cultures, the management mechanisms,
communication channels, practices and tools need to be upgraded and take into account
national and company culture diﬀerences. [1]
To summarize, Table 1 shows how diﬀerent attributes of distance and centrifugal
forces impact the success of distributed work and it seems that common impact is even
stronger in the presence of multiple attributes of distance.

6

Conclusions

Distributed software development is a daily routine in software industry. In this research
we analyzed four selected attributes [4] of distance in internationally distributed soft‐
ware development and how those attributes inﬂuence the aﬀects the centrifugal forces.
In addition to geographical distribution, distributed development covers other factors,
such as temporal and socio-cultural distribution, out of which the temporal distance has
been shown to be the most problematic attribute of distance.
Acknowledgments. The research has been supported by Tekes-funded Digile project Need for
Speed.1

1

http://www.n4s.ﬁ/en/.

620

H. Sten et al.

References
1. Carmel, E., Tjia, P.: Oﬀshoring Information Technology: Sourcing and Outsourcing to a
Global Workforce. Cambridge University Press, Cambridge (2005)
2. Kitchenham, B.: Procedures for Performing Systematic Reviews. Keele University Technical
report. Keele, Staﬀs, UK, Software Engineering Group, Department of Computer Science,
Keele University. TR/SE-0401 (2004)
3. Randall, D., Harper, R., Rounceﬁeld, M.: Fieldwork for Design: Theory and Practice.
Springer, Heidelberg (2007)
4. Ågerfalk, P.J., Fitzgerald, B., Holmström, H., Lings, B., Lundell, B., Conchuir, E.O.: A
framework for considering opportunities and threats in distributed software development. In:
Proceedings of the International Workshop on Distributed Software, Austrian Computer
Society (2005)
5. Gumm, D.C.: Dimensions of distribution in software development projects: a taxonomy. IEEE
Softw. Spec. Issue Glob. Softw. Dev. 23, 45 (2006)
6. Holmström, H.Ó., Conchúir, E., Ågerfalk, P.J., Fitzgerald, B.: Global software development
challenges: a case study on temporal, geographical and socio-cultural distance. In:
International Conference on Global Software Engineering (ICGSE2006), Costão do Santinho,
Florianópolis, Brazil, 16–19 October 2006
7. Carmel, E., Espinosa, J.A., Dubinsky, Y.: “Follow the Sun’’ Workﬂow in Global Software
Development. J. Manag. Inf. Syst. 27(1), 17–38 (2010)
8. Carmel, E.: Global Software Teams: Collaborating Across Borders and Time Zones. Prentice
Hall PTR, Upper Saddle River (1999)
9. Auch, F., Smyth, H.: The culture heterogeny of project ﬁrms and project teams. Int. J. Manag.
3(3), 443–461 (2010)
10. Lings, B., Lundell, B., Ågerfalk, PJ., Fitzgerald, B.: A reference model for successful
distributed development of software systems. In: Proceedings of the 2nd International
Conference on Global Software Engineering (ICGSE 2007), Munich, Germany, 27–30
August 2007
11. Ågerfalk, P.J., Fitzgerald, B., Olsson, H.H., Ó Conchúir, E.: Beneﬁts of global software
development: the known and unknown. In: Wang, Q., Pfahl, D., Raﬀo, D.M. (eds.) ICSP 2008.
LNCS, vol. 5007, pp. 1–9. Springer, Heidelberg (2008). doi:10.1007/978-3-540-79588-9_1
12. Ramasubbu, N., Krishnan, M.S., Kampalli, P.: Leveraging global resources: a process
maturity framework for managing distributed development. IEEE Softw. 22(3), 80–86 (2005)
13. Lewis, R.: When Cultures Collide. Leading Across Cultures, 3rd edn. Nicholas Brealey
International, Boston (2006)
14. Noll, J., Beecham, S., Richardson, I.: Global software development and collaboration: barriers
and solutions. ACM Inroads 1(3), 66–78 (2010)
15. da Silva, F.Q.B, et al.: Challenges and solutions in distributed software development project
management: a systematic literature review. In: 2010 5th IEEE International Conference on
Global Software Engineering (ICGSE), IEEE (2010)
16. Jiménez, M., Piattini, M., Vizcaíno, A.: Challenges and improvements in distributed software
development: a systematic review. Adv. Softw. Eng. 2009, 3 (2009)
17. Šmite, D., et al.: Empirical evidence in global software engineering: a systematic review.
Empirical Softw. Eng. 15(1), 91–118 (2010)
18. Fox, S.: Information and communication design for multi-disciplinary multi-national projects.
Int. J. Managing Proj. Bus. 2(4), 536–560 (2009)

Using Scrum to Develop a Formal Model –
An Experience Report
Marta Olszewska ✉ , Sergey Ostroumov, and Marina Waldén
(

)

Faculty of Natural Sciences and Engineering, Åbo Akademi University, Turku, Finland
{marta.plaska,sergey.ostroumov,marina.walden}@abo.fi

Abstract. The beneﬁts of merging agile and formal methodologies have been
discussed on the conceptual level for several years now, also in our previous work.
This paper presents a hands-on investigation on the synergy of the Event-B formal
method within the Scrum development process. A case study of the landing gear
critical system is used to investigate the feasibility of such a merge. We provide
the quantitative and qualitative analysis of the case study by measuring develo‐
per’s eﬀort, size and complexity of the created model, as well as observing the
development process. Our results show that the merge increases quality, in terms
of sustaining the creation of correct and reliable systems (Event-B) and at the
same time smoothens the modelling process, enhances comprehension of the
system domain and requirements (Scrum).
Keywords: Scrum · Event-B · Agile development process · Formal methods

1

Background

Agile methods, dated back to Agile Manifesto [1], are known for enabling rapid, ﬂexible
and evolutionary development with a strong emphasis on its social aspect (team work
and communication) [2]. They all have provided practices and values, which can be
tailored with respect to the context of their application. Due to the ﬂexibility and certain
degree of freedom when using such methods, they are not considered as a ﬁrst choice
for supporting the development of critical systems. However, the quality and correctness
of critical systems [3, 4] have been assured with formal methods for over 40 years now.
The mixture of these methods would create a development setting, which can ensure
high quality of the system being created in a process that supports iterativeness and
response to change. This combination would beneﬁt from providing transparency in the
project by increasing the interaction between team members and improving compre‐
hension of the requirements of the system to be developed.
In our previous work, we investigated several agile methods with respect to their
feasibility in the development of critical systems [5]. We explored the values, principles
and practices of agile development methods and placed them in the context of formal,
This work was carried out within the project ADVICeS (https://research.it.abo.ﬁ/ADVICeS/),
funded by the Academy of Finland, grant No. 266373.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 621–626, 2016.
DOI: 10.1007/978-3-319-49094-6_48

622

M. Olszewska et al.

reﬁnement-based developments. We provided a mapping between the characteristics of
these two, which established FormAgi – a high-level framework consisting of (i) guide‐
lines on what concerns should be tackled before committing to a certain agile method
and (ii) pointers in which aspects an agile method can be a facilitator in the formal
development.
We chose to use Event-B [6, 7] as a formal method and modelling language and to
apply it within the Scrum [8] agile process. We ﬁnd Event-B particularly useful, as it
supports the iterative creation of systems in a correct-by-construction manner. We are
able to model software, hardware, as well as environment [9]. Scrum matches well
Event-B and its idea of stepwise development of a system, since it is time-framed, iter‐
ative and incremental. Although Event-B is not considered as a lightweight approach,
in this work we apply it in a rapid manner by conducting the development in small
reﬁnement steps [6], and by decomposing the models [10], as well as by using compo‐
nent-based visual development [11]. The goal is to achieve a development of a high
quality and correct system in an adaptive, ﬂexible, continuous and timely way.
In this work, we validate our claims we made in the previous conceptual study
[5, 12]: (i) a refinement step (formal modelling) and a development iteration (agile
process) are not exactly corresponding, (ii) in order to fit the iterations, the require‐
ments may need to be decomposed, (iii) Scrum smoothens the development by
facilitating communication and improving understandability in the project, as well
as providing control over development (supports progress and steady pace). We
perform a hands-on investigation on the Event-B development in a Scrum setting
by using a case study (Landing Gear System – later referred to as LGS) from the
industry from the aerospace domain. The LGS case study is well described in [13]
and modelled using various formalisms. In our work we concentrate on the Scrum
development process and how it can facilitate the Event-B formal modelling.

2

Event-B Within Scrum

Event-B provides rigour to the speciﬁcation and design phases of the development
process of critical systems. It enables us to gradually introduce more details and func‐
tionality to the constructed system via reﬁnement steps. The formal development starts
by modelling a set of requirements as an abstract speciﬁcation which is then reﬁned in
a number of steps. The consistency between the reﬁnement levels is veriﬁed by math‐
ematical proofs. An Event-B speciﬁcation uses a pseudo-programming notation
(Abstract Machine Notation) and consists of a dynamic and a static part, which are linked
to each other. Event-B is supported by the Eclipse-based, open source tool called Rodin
[14] that is a “rich client platform” extendable with plug-ins.
We beneﬁt from the way software systems are developed with Event-B. The gradual
introduction of properties to the system enabled by reﬁnement allows us to comply with
the iterative, incremental and time-framed nature of agile development. Moreover, we
can handle complexity issues more eﬃciently by decomposing the problems to simpler
and smaller ones. The quality aspect of development is assured by the correct-byconstruction approach and strengthening the work on requirements (elicitation). Finally,

Using Scrum to Develop a Formal Model – An Experience Report

623

modelling and proving properties of the system contributes to building a well-deﬁned
system and diminishing the risk of unnecessary re-work due to misunderstanding or not
suﬃciently described requirements.
Formal development (herein modelling activity) diﬀers from traditional develop‐
ment not only in the rigour of the development, but also in how the progress of the
development can be seen and measured. For instance, the progress of the development
is determined by the artefacts being created during the development (like requirements,
speciﬁcations, models on speciﬁc abstraction levels or implementation of a feature), as
opposed to recognising executable code as the main measure of progress.
A set of requirements to be developed, acting as product backlog, is called the item
pool. It contains high- and low-level requirements, safety cases, environmental and
context descriptions. A subset of the item pool comprising of requirements chosen for
the current sprint is a backlog. The requirements are only prioritised within the sprints.
The reasoning is twofold: (i) we do not want to rush decisions which would lead to a
complex and hard to prove model and (ii) the work on the requirements and their struc‐
turing with respect to the modelling strategy will pay oﬀ later, when the model needs to
be extended. Therefore, a sprint includes modelling of the requirements, as well as
developing and proving a model. Finally, model animation and simulation, validation
mechanisms supported by Rodin, can also be a part of the sprint.
Finally, although not present in the original Scrum, a tool-supported and proactive
feedback system is included in the sprints via the Monitoring and Metrics mechanisms
(M&M). It is to raise understanding on the current status of the iteration, facilitate the
process improvement and provide control and evidence on the development, by meas‐
uring the number of proof obligations and the time that is used for modelling, as well
as the size and complexity of the model.

3

Formally Modelling Landing Gear System in Scrum

We demonstrate the proposed synergy using the Landing Gear System case study LGS
[13]. We focus on the process of formally building the model and how it can be smooth‐
ened by utilising Scrum. Further details of the LGS case study and the context of our
investigation are given in [15]. The construction details of the formal model, its compo‐
nentization and patterns are described in detail in [11].
The LGS system consists of a digital controller and a few actuators. The function of
the system is to operate the landing gears and associated doors. Depending on the reac‐
tions from the pilot, the digital controller manipulates the mechanical part. The mechan‐
ical part, in its turn, consists of front, left and right landing sets. Each set includes a door,
a landing gear and hydraulic cylinders that are attached to and move the corresponding
doors and gears. In addition, the system has an analogical switch to prevent an abnormal
behaviour of the digital part.
We performed our hands-on investigation in academic setting that involved three
persons with the following expertise: formal methods, formal modelling of systems,
quality assurance and quality measurements. The roles, to some extent shared, were as
follows: (i) product owner: role shared by the modeller, due to the familiarity with the

624

M. Olszewska et al.

requirements in [13], and the senior expert; (ii) scrum master: quality assurance expert
and agile expert; (iii) team: role shared by all three members. The participants were
selected based on their familiarity with formal methods, as our goal was to make the
work environment close to the one of the engineers from safety-critical domains.
Due to other work-related commitments of the team members, we set up a two-week
restriction time for the development, divided into two sprints, each one week long (long
sprint), where every work-day was treated as a short sprint. We held an introductory
planning meeting, “sprint 0”, to familiarise the non-agile members with the concepts of
Scrum, as well as to ensure that the goals and vocabulary of the case study were clear
for all of the team members.
The item pool and backlog were managed in an Excel form, which was constructed
as a simpliﬁed tracking system. The safety, functional, equipment and other require‐
ments of the case study [13] were added to the item pool/backlog in “sprint 0”. The team
planned the modelling by ﬁrst prioritising the features listed in the item pool (assigning
priorities 0-3) and then scheduling them for certain iterations (1st or 2nd). However,
more requirements were added to the item pool/backlog by the team and product owner/
stakeholder later on, since some additional properties were revealed during the devel‐
opment. A requirement was assigned status Done when it was both modelled and proven.
Only then the model could be submitted to the SVN-repository. Note that this study was
not the sole activity of the development team during the workdays, so one cannot relate
the collected data to the complete workday.

4

Analysis and Observations

The quantitative data we report on were collected and computed automatically and are
describing: the development process (including meetings), the modeller’s eﬀort, the
number of automatic and interactive proof obligations, and ﬁnally the size and
complexity of the model (the dynamic and static parts). We also gathered some quali‐
tative data about the development process and its suitability for the Event-B development
based on the statements of the modeller.
In the quantitative study, we were particularly interested in the eﬀort required for
modelling and proving such system and how it reﬂects in perspective of Scrum. Some
preparatory activities were needed at the beginning of the development (sprint “0”).
Since the choice of modelling strategy and the sequence of modelling artefacts and
properties has an impact on how easy it will be to prove the model, there is an overhead
that should be taken into account when the system is being studied and the modelling is
planned. Therefore, investing some time at the beginning of the sprints is beneﬁcial in
the long run and does not contradict the idea of an agile development process. On the
contrary, any progress, also involving eliciting and reworking the requirements,
conforms to the agile philosophy by supporting value creation (and indirectly elimi‐
nating waste by constructing the system in such a way that it does not need to be remod‐
elled). In order for some requirements to obtain the status “Done” the developer needed
to spend more than one day of work per requirement; whereas several other requirements
could be completely modelled and proven within one day.

Using Scrum to Develop a Formal Model – An Experience Report

625

Whenever analysing a formal development, it is particularly interesting to investigate
the proof statistics, which may shed some light on the development complexity or
required eﬀort. Proof statistics for our model consist of data representing the interactive,
automatic and total number of Proof Obligations (POs) in each reﬁnement step. Amongst
the 504 proof obligations, only 13 required interactive proving. This was due to the
redundancy of components and the well-deﬁnedness condition.
The size and complexity of the model, as well as the diﬃculty and eﬀort of its creation,
were calculated according to the set of metrics we established for Event-B [16] for each
model submitted to the repository (Done status) for all reﬁnement steps. The ﬁnal model
consists of ten reﬁnement steps, each having a dynamic and static part. We observed
that the dynamic part of the model had a steady pace of development (no sudden peaks
in measures), which denotes that the requirements were well decomposed into features
and that they were modelled in an iterative way following the reﬁnement process. Thus,
the development was well reﬂecting the agile principles. Moreover, the diﬃculty meas‐
ures were not changing throughout the modelling, which signiﬁes that the strategy for
the development was well chosen. The static part of the model was considerably growing
with respect to size, complexity and eﬀort measures whenever a component was intro‐
duced, which was expected.
Additionally, we explored the eﬀort distribution over the reﬁnement steps based on
the model metrics. When compared to the eﬀort reported by the developer, the eﬀort
based on model metrics takes into consideration not only modelling and proving activ‐
ities, but also the eﬀort necessary to comprehend the model. It shows gradual growth
concerning the development progress. Finally, the size and complexity of the complete
model, as well as the eﬀort computed from the model correlated.
Our qualitative analysis is based on the observations noted in our development diary
during the sprints and a post-mortem interview with the developer. We observed that a
reﬁnement step does not correspond to an iteration, regardless of the deﬁnition of iter‐
ation (otherwise the length of the iteration would have to be ﬂuctuating). When a reﬁne‐
ment step takes longer than one iteration, i.e., it involves a problem that is too intricate,
the problem needs to be decomposed into several smaller ones; otherwise, there may be
several reﬁnement steps in a single iteration. Since the “Done” status might not neces‐
sarily be obtained at the end of the working day, submissions to the subversion control
system may not occur on a daily basis. Finally, the sequential nature of reﬁnement does
not fully conform to rapid and iterative development suggested by agile methodologies
(a new feature request may require re-engineering the model). However, we noticed that
Scrum is particularly helpful when it comes to monitoring and managing the develop‐
ment with respect to the planned modelling, due to requirements prioritisation and
continuous control of progress (backlog). Communication and transparency of the
development (a weak point in formal methods application) is facilitated, which is attrib‐
utable to the set of meetings proposed by Scrum.
The quantitative and qualitative data conﬁrmed that the synergy of Event-B and
Scrum is beneﬁcial for safety-critical developments, since the well-deﬁned development
methods are complemented with eﬃcient and ﬂexible development process. We are
aware that our work is limited to two Scrum iterations, performed in a controlled,

626

M. Olszewska et al.

academic environment by one team and thus the development may seem of a smallscale. However, not only the case study was provided by industry and represented real
life requirements from the aerospace domain, but also the team was of experience similar
to the one of the engineers working in industry.
Due to the nature of our study, we cannot generalise our ﬁndings, nor validate them
statistically. There are many variables that can impact the investigation, e.g., experience
of the developer with a formal method or the tool that supports it; familiarity with agile
processes; domain knowledge of the problem to be modelled. However, we believe that
our results can be transferable to other formalisms and agile methods, as we show: (i)
what kind of issues need more attention when choosing Event-B as a modelling language
in an agile setting and (ii) what kind of ﬁne-tuning of an agile process is needed to beneﬁt
the most from the agile and formal combination.

References
1. Manifesto for Agile Software Development. http://agilemanifesto.org/. Accessed 11–13 Feb
2001
2. Agile Alliance: What is Agile? In: Agile Alliance. https://www.agilealliance.org/agile101/
what-is-agile/
3. Butler, R.: What is formal methods? In: NASA LaRC Formal Methods Program (2001)
4. Holloway, M.: Why engineers should consider formal methods. In: AIAA/IEEE16th Digital
Avionics Systems Conference (1997)
5. Olszewska, M., Waldén, M.: FormAgi – A Concept for More Flexible Formal Developments.
TUCS TR, Åbo Akademi University, Turku (2014)
6. Abrial, J.-R.: Modeling in Event-B: System and Software Engineering. Cambridge University
Press, New York (2010)
7. Abrial, J.-R.: Extending B without changing it (for developing distributed systems). In:
Proceedings of 1st Conference on the B Method, Nantes (1996)
8. Schwaber, K., Sutherland, J.: Scrum. The Oﬃcial Guide (2010). http://www.Scrum.org
9. Event-B: Home of Event-B and the Rodin Platform (2008). http://www.event-b.org/
index.html
10. Abrial, J.-R.: Event Model Decomposition. http://wiki.event-b.org/images/Event_Model_
Decomposition-1.3.pdf
11. Ostroumov, S., Waldén, M.: Facilitating Formal Event-B Development by Visual
Component-based Design. TUCS Technical report 1148, Turku Centre for Computer Science,
Turku (2015)
12. Olszewska, M., Waldén, M.: DevOps meets formal modelling in high-criticality complex
systems. In: 1st International Workshop on Quality-Aware DevOps (QUDOS 2015),
Collocated with 10th Joint Meeting of the ESEC and ACM SIFSOFT FSE (2015)
13. Boniol, F., Wiels, V., Ait Ameur, Y., Schewe, K.-D. (eds.): ABZ The Landing Gear Case
Study. CCIS, vol. 433. Springer, Heidelberg (2014). doi:10.1007/978-3-319-07512-9
14. Rodin Platform (2006). http://www.event-b.org/platform.html
15. Olszewska, M., Ostroumov, S., Waldén, M.: Synergising Event-B and Scrum Experimentation on a Formal Development in an Agile Setting. TR 1152, TUCS, Turku (2016)
16. Olszewska (Pląska), M., Sere, K.: Speciﬁcation Metrics for Event-B Developments. In: 13th
International Conference on Quality Engineering in Software Technology (CONQUEST
2010), Dresden (2010)

Towards Better Selection Between Moving
Windows and Growing Portfolio
Sousuke Amasaki1(B) and Chris Lokan2
1

Department of Systems Engineering, Okayama Prefectural University, Soja, Japan
amasaki@cse.oka-pu.ac.jp
2
School of Engineering and Information Technology,
UNSW Canberra, Campbell, Australia
c.lokan@adfa.edu.au

Abstract. BACKGROUND: Several studies in software eﬀort estimation have shown that it can be eﬀective to use a window of recent projects,
rather than the growing portfolio of all past projects, as training data.
Practitioners need a method for choosing between those approaches when
estimating eﬀort for a target project. OBJECTIVE: This study examined the usefulness of the ﬁtted values for choosing between moving windows and the growing portfolio. METHOD: An empirical experiment was
conducted with a single-company ISBSG dataset. RESULTS: The ﬁtted
values could be useful for the selection on average. CONCLUSIONS:
This positive result contributes to understanding when windows may be
appropriate.

1

Introduction

Accurate eﬀort estimation is essential to software project success. Intuitively, it
may make sense to discount older projects as training data, as they may be less
representative of an organization’s current practices.
Recent studies [1,2] have shown that selecting only newer projects, using a
moving window, could improve estimation accuracy. Several factors aﬀect the
usefulness of moving windows. As moving windows do not work for every situation, practitioners need a method to decide when they may be appropriate.
This paper investigates a method which gives a hint for selecting between
moving windows and the growing portfolio.

2

Fitted Values Based Selection

Figure 1 depicts how a dataset is used as a growing portfolio, with moving windows, and with the selection method we examine here. The horizontal line represents a timeline of project data collection in an organization. The circle represents
a target project to be estimated. The growing portfolio approach uses all project
data collected in the past. The moving windows approach uses recent projects
segmented by a window of size w.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 627–630, 2016.
DOI: 10.1007/978-3-319-49094-6 49

628

S. Amasaki and C. Lokan
Training Data for Growing Portfolio

w
Training Data for
Moving Windows

past

Target Project

future
Evaluation Data for
Fitted Selection

Fig. 1. The use of data with the growing portfolio, moving windows, and the selection
method

The use of moving windows assumes that recent project data represents an
organization’s current situation better than older project data, and can thus
train a good eﬀort estimation model for a target project. It is also reasonable
to suppose that an eﬀort estimation model would ﬁt more recent training data
better than older training data, if it better reﬂects the current situation of the
organization. In other words, the hypothesis is that if a model can learn characteristics of recent projects, it can ﬁt them, and a new project, well.
Therefore, the ﬁtted values of recent projects can give a hint for choosing
between the growing portfolio and moving windows.
This study examined a simple method based on that idea:
Step 0: Make two training datasets; one holds all past project data, and the
other holds a part of past project data segmented by a window of size w such
that it only holds recent projects.
Step 1: Train two models: one uses all past project data, and the other uses the
window of recent data.
Step 2: Evaluate the ﬁtting performance (accuracy) of both models, for the
projects in the window (shown as “Evaluation Data for Fitted Selection” in
Fig. 1).
Step 3: Select the model that best ﬁts the projects in the window.
At Step 2, the model based on the growing portfolio is evaluated on part
of its training data, while the moving window is evaluated with all of its training data, as shown in Fig. 1. Note that we did not use any unseen or future
projects: accuracy is based on ﬁtted values for past projects, thus we call this
fitting performance. Note also that we did not use a validation dataset in addition to a training dataset. This is because separating a validation dataset from
recent projects may degrade the representativeness of the training dataset. Furthermore, this treatment would introduce the complication of determining the
size of the validation dataset in addition to the size of the moving window.
We adopted mean absolute error (“MAE”) for evaluating estimation accuracy. MAE is widely used in evaluation of eﬀort estimation models.

Towards Better Selection Between Moving Windows and Growing Portfolio

3

629

Experiment

We conducted an experiment for examining the eﬀect of using the ﬁtted values
based method. The experiment used the single-company ISBSG dataset of 228
projects investigated in past studies [1,2].
We used linear regression for building eﬀort estimation models. The linear
regression model used a log-transformation for eﬀort and size, to improve the normality of residuals. Feature selection was also adopted for performance improvement.
The experiment evaluated the eﬀects of using the selection method along with
a timeline of projects’ history. The sizes of moving windows were varied, as in
past studies, because it signiﬁcantly aﬀected the accuracy. This study considered
windows of 20 to 120 projects, as in [2].
For each window of size w, the experiment was performed as follows:
1. Sort all projects by starting date.
2. For a given window size w, ﬁnd the earliest project p0 for which at least w + 1
projects were completed prior to the start of p0 (projects from p0 onwards
are the ones whose training set could be aﬀected by using a window, so they
form the set of evaluation projects for this window size).
3. For every project pi in chronological sequence, starting from p0 , form estimates using the growing and windowing approaches. For the growing approach, the training set is all projects that ﬁnished before pi started. For moving windows, the training set is the w most recent projects that ﬁnished before
pi started. If multiple projects ﬁnished at the same date, projects that started
more recently are prioritized.
4. Evaluate estimation accuracy with MAE.
To test for statistically signiﬁcant diﬀerences between accuracy measures, we
use the two-sided Wilcoxon signed-rank test (wilcoxsign test function of the
coin package for R) and set the statistical signiﬁcance level at α = 0.05. We
also controlled the false discovery rate (FDR) of multiple testing [3] with the
“qvalue” function of the qvalue package in R. FDR is a ratio of the number of
falsely rejected null hypotheses to the number of rejected null hypotheses.

4

Preliminary Result

As a preliminary experiment, we compared the ﬁtted values based method and
the growing portfolio. Figure 2 shows the diﬀerence in mean absolute error
against window sizes. The x-axis is the size of the window, and the y-axis is
the subtraction of the accuracy measure value with the growing approach from
that with the ﬁtted selection at the given x-value. The ﬁtted selection is advantageous where the line is below 0. Circle points mean a statistically signiﬁcant
diﬀerence, with the ﬁtted selection being better than the growing approach.
At these points, the corresponding q-value is below 0.05 (this means that the
number of falsely rejected hypotheses was at most 5 % of rejected hypotheses.)
Figure 2 shows that the proposed method can improve estimation accuracy,
and improvements are signiﬁcant with larger windows.

S. Amasaki and C. Lokan

Differences in mean AE(%)
−20
−10
0
5 10

630

20
40
60
80
100
120
Window Size (number of projects)

Fig. 2. The diﬀerence of accuracy between growing portfolio and ﬁtted-selection

5

Conclusion and Future Directions

This paper investigated a method which gives a hint for choosing between moving windows and the growing portfolio, using data from one organization in the
ISBSG dataset. We have shown that it has a statistically signiﬁcant eﬀect, compared to always using the growing portfolio. This result supported using ﬁtted
values in making the decision.
Examining the method with duration-based moving windows, enhancing the
selection method in the eﬀectiveness, and generalizing the results under other
situations are future directions for this research.
Acknowledgment. This work was partially supported by JSPS KAKENHI Grant
#15K15975.

References
1. Amasaki, S., Lokan, C.: On the eﬀectiveness of weighted moving windows: experiment on linear regression based software eﬀort estimation. J. Softw. Evol. Process
27(7), 488–507 (2015)
2. Lokan, C., Mendes, E.: Investigating the use of duration-based moving windows to
improve software eﬀort prediction: a replicated study. Inf. Softw. Technol. 56(9),
1063–1075 (2014)
3. Storey, J.D.: A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B (Stat.
Methodol.) 64(3), 479–498 (2002)

Assessing the Behavior of Software
Analysis Tools
Lerina Aversano, Carmine Grasso, Pasquale Grasso,
and Maria Tortorella(&)
Departement of Engineering, University of Sannio, Via Traiano,
82100 Benevento, Italy
{aversano,carminegrasso,pasgrasso,
tortorella}@unisannio.it

Abstract. Quality models and metrics permit an objective evaluation of the
quality level of a software product. Moreover, the adoption of software analysis
tools that facilitate the measurement of software metrics can ease the evaluation
tasks. However, the available tools do not interpret and measure metrics in the
same manner. This paper proposes a preliminary investigation on the behavior
of existing software metric tools and shows that the evaluation of the metrics of
a software system by using different analysis tools provides different values.
This aspect could impact on the overall software quality evaluation.

1 Introduction
A software project is an important investment for an organization. The commitment of
ﬁnancial resources and the expected time to get the ﬁnal result is only justiﬁed by the
necessity of addressing a need. On the other side, the software quality is “how well” the
problem has been solved. Therefore, develop and/or select software products of an
good quality represents a relevant activity. For knowing the quality level of a software
system, it is important to adopt a quality model and metrics permitting an objective
evaluation of the quality of a software product, and the availability of software analysis
tools facilitating the measurement of software metrics can ease the evaluation tasks.
Many software analysis tools measuring metrics exists. They have different characteristics concerning the programming language they analyse and measurement they
perform, then the evaluator can have difﬁculties in identifying the software analysis
tool that better addresses his needs.
This paper aims at investigating if the software analysis tools achieve similar results
when they are used for assessing a software system, so as to be able to indifferently
choose a tool instead of another. In particular the paper analyses a set of software
analysis tools and veriﬁes if they consider the same set of metrics and the way they are
assessed. Therefore, the chosen tools are used for measuring a set of selected metrics on
a software systems, and the obtained results are compared.
Next section describes the experimental setup of the study and lists the chosen
software analysis tools and the selected metrics. Results of the evaluation is presented
in the subsequent section, and ﬁnal considerations are given in the last section.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 631–635, 2016.
DOI: 10.1007/978-3-319-49094-6_50

632

L. Aversano et al.

2 Experimental Setup
This section presents the planning of the presented analysis.
The ﬁrst performed step regards the Scope deﬁnition. As already stated, the goal is
to analyze and compare a set of software analysis tools evaluating software quality
metrics with the aim of verifying if they consider the same metrics and interpret and
evaluate them in the same manner.
The following step is Selection of the evaluation tools. The goal is to select the
software tools to be analyzed and compared. They were chosen among the most used
open source systems used for measuring software metrics. Such a kind of tools was
considered for economical reasons. The tools were also chosen on the basis of the
programming language they could evaluate, as the measurements obtained by the
different tools had to be comparable. The Java programming language was chosen.
A free search in the SourceForge.net site brought to the identiﬁcation of forty
software analysis tools. The preliminary analysesd characteristics regarded: name,
manufacture, link to the home page, license type, availability, supported programming
languages, supported operating system/environment and evaluated metrics. This
information permitted to considered the following nine software analysis tools:
Eclipse Metrics Plug-in 1.3.6. A metrics calculation and dependency analyzer Eclipse
plugin.
CCCC. A command-line tool generating reports on various metrics.
Understand. A reverse engineering, code anylizer and metrics tool for different programming languages. It evaluates a set of standard metrics and visualizes them in
different ways.
JArchitect. A static analysis tool for Java code evaluating different code metrics.
Stann4j. An Eclipse plug-in that allows for analysis of the dependencies between
classes and packages, and provides code metrics.
CodePro Analytix. An Eclipse plug-in, freely offered by Google, regarding software
quality improvement and reduction of development costs and schedules. It also supports code analysis, test cases, dependency analysis and metric measurement.
LocMetrics. A freeware simple tool, measuring the size of a software program by
counting the number of lines in the source code
SourceMonitor. A tool for code exploration, including the measurement of a set of
metrics related to the identiﬁcation of module complexity.
CodeAnalyzer. A Java application for C, C++, Java, Assembly, Html. It calculates
metrics across multiple source trees as one project.
Some of the tools above use a graphical interface, such as, JArchiect, Undertand,
LocMetrics, CodeAnalyzer, and SourceMonitor; others are Eclipse plug-in, such as
Metrics, Stan4j, CodePro Analytix, while CCCC software is a command line analysis
tool.
Then, the Metrics selection task followed. It required the analysis of standards and
evaluation models for open-source software systems to identify features, sub-features
and metrics to be automatically evaluated by using the considered software analysis
tools [4]. The considered metrics can be classiﬁed as it follows:

Assessing the Behavior of Software Analysis Tools

633

• Dimensional Metrics, used to evaluate the software quality analyzing the software
system dimensions. Examples of this kind of metrics are: LOC (Lines of Code),
TLOC (Total Lines of Code), NOP (Number of Packages), NOM (Number Of
Methods), MLOC (Medium LOC per method), NOA (Number Of Attributes), etc.
• Object Oriented Metrics, such as the object oriented metrics proposed by
Chindamber and Kermerer in 1991 [1], CK Metrics. Some example are: WMC
(Weighted Methods for Class), CBO (Coupling between Objects), RFC (Response
For Class), LCOM (Lack of Cohesion of Methods), DIT (Depth of Inheritance
Tree), NOC (Number of Children) [2].
• Complexity Metrics, for assessing the complexity of the software, such as CC
(McCabe’s Cyclomatic Complexity) [1, 3].
Other important metrics are: AC (Afferent Coupling), EC (Efferent Coupling),
I (Instability), A (number of abstract classes respect to the one of the concrete classes),
D (distance from the ideal quality). Table 1 lists on the ﬁrst column the metrics divided
of the basis of the classiﬁcation above, and on the ﬁrst line the considered software
analysis tools. The list is not complete, as the included metrics are those ones evaluated
by at least three analysis tools. A number in a table cell indicates that the tool on the
column evaluated the metric on the row.
Table 1 shows that the software analysis tools do not evaluate all the metrics. Three
tools (Metrics, Jarchitect and CodePro) measure the large part of the metrics; while
LOCMetrics, SourceMonitor and CodeAnalyzer consider only the dimensional metrics,
and only six tools on nine consider the CK metrics. Deﬁnitively, none of the considered
tools can be used to perform a complete evaluation of a software system quality, and it
can be necessary to integrate them in a common evaluation strategy.
After deﬁning what and how to make the analysis, task Selection of the software
systems to be evaluated aimed at choosing the software systems to be analyzed. An
open source software system was considered, as many metrics to be evaluated consider
the source code. A software system named SimpleWeb was chosen. Then, the Metric
evaluation task was performed for calculating the chosen metrics by using the considered software analysis tools. Finally, a Results analysis was executed for comparing
the values of the metrics obtained by using the different software analysis tools, with
the aim of verifying to which extent the evaluation tools interpret the metrics in a
similar manner, by applying the same rules.

3 Evaluation
This section reports the analysis of the metric values evaluated by considering the
chosen software analysis tools. Table 1 reports the values obtained by evaluating
SimMetrics. Most of the values are given as mean, while few metrics have an integer
value, such as: Class and Interface, SourceFile, LOC, TLOC, CommentWord and
Blank Line. Table 1 indicates that in some cases the metrics have different values. This
happen, for example, for the metric LCOM, that ranges from a minimum value of 0.28
to a maximum value of 46 measured with JArchitect. Similarly, the metric DIT passes
from a value of 2.48 obtained with CodePro to a value equal to 1.17 measured with
CCCC. Moreover, other metrics have mean values quite different between them.

634

L. Aversano et al.
Table 1. Metric values obtained for simmetrics

Software
Analysis
Tool/Metric

Metrics Stan4j LOC
Metric

AC
6.79
EC
3.89
D
0.38
I
0.51
A
0.11
Object-Oriented metrics
CBO
DIT
1,71
LCOM
0.28
NOC
0.61
WMC
10.39
Dimensional metrics
TLOC
LOC
2238
NOM
6.05
MLOC
4.66
NOA
1.24
NOP
9
CWords
Blank Lines
% Lines with
Comments
Source File
Class and
47
Interface
Complexity metrics
CC
1.63
NBD
1.22

Source
Monitor

0
7.04
0.14
1
0.19

0.42

1.15
1.71
5.49
0.53
9.26

2467
6.79

7326
2238

0.74

1191
6.72
4.49
2.14
9
2679

7280
2238
5.23
5.92
1.08
9
755

60.30

69.22

33.70

47
47

47
47

47

1.55
0.91

7280
1683
5.85
3.63

4351
737

47

0
47
0.19
1
19.10

3.19
1.49
0.28
0.55

1.47

2.15
9

47

JArchitect CodePro CCCC UnderStand Code
Analyzer

1.36

1.86
1.85

2.48

5.72
1.17
1.19
5.12

2.75
1.68
46.46
0.61

7279
2237
6.39

7280
2238

4357
722
1.95

4358
722

47
47

47

58

2.83

1.53

2283
5.12

4389

It is not possible to provide a more detailed description of the evaluation of some
metrics, such as, CBO, DIT, LCOM, NOC, WMC, AC, EC, NOA, DC, MLOC, NBD,
D, NOM, I, as their assessment depends on the speciﬁc deﬁnition adopted by the
different tools. For example, this occurs for the differences detected in Class and
Interface metric that with CCCC obtains a value of 58 differently from all the other
tools, which provide 47 as a value.
A manual inspection of the source code allowed to deduce that CCCC tool, for the
evaluation of Class and Interface metric, considers classes and interfaces but even the
packages. Moreover, unlike the other tools, only tool Metrics returns a separate value
for the number of classes and the one of interfaces.
In the case of metric TLOC, not all tools consider all the lines from the ﬁrst to the
last bracket. In particular Understand does not consider the ﬁrst white line, while
LocMetrics considers all the lines including those after the last curly bracket, so it
provides a highest value. With regard to the LOC metric it ranges from a minimum
value of tool JArchitect with 1191 lines of code to a maximum value of tool Stan4j

Assessing the Behavior of Software Analysis Tools

635

with 2467 lines of code. This occur because JArchitect considers a method as a single
statement, while Stan4j considers also the white lines of the methods as statement.
Finally the CCCC tool considers a statement written on multiple successive lines as
more lines of code. Performing a manual inspection of the source code it emerged that
the actual number of lines of code value is 2238. as evaluated by tools Metrics, Loc
Metrics, CodePro, Analytix and CodeAnalyzer. Regarding the Comment words metric,
It can be observed that it ranges from a minimum value of 755 lines obtained by
CodePro to a maximum value of 4389 obtained by Understand. This is due to the fact
that some software counts as comments the lines included from the start to the end
delimiters of the comments (/* - */).

4 Conclusions
Nowadays. software engineering managers always more often needs to deal with
quantitative data regarding the quality of a software system. Indeed a number of metrics
are generally adopted and measured during maintenance and evolution processes to
predict effort for maintenance activities and/or identify parts of a system needing
attention. Numerous software metrics tools have been implemented for evaluating the
software metrics, however their use in practice requires their validation for understanding if they are really useful for achieving the evaluation goal. With this in mind,
this paper investigates if different software analysis tools perform the same kind of
assessment on a software system, and veriﬁes if they consider the same set of metrics
and interpret them in the same way. The performed analysis indicates that at least for
the investigated software analysis tools differences exist. Actually, the evaluation of
certain metrics with some tools delivered similar results, and they were not assessed as
intended with other tools. This is because each metric is interpreted differently by each
tool and then is differently calculated. Therefore, when a tool is chosen for performing
a quality measurement, it is necessary to consider that this choice can be also misleading during the evaluation task.
Future work will consider more case studies and additional tools and metrics.
Indeed, a larger base of software systems should be measured to increase the practical
relevance of our results.

References
1. Chidamber, S.R., Kemerer, C.F.: A metrics suite for object-oriented design. IEEE Trans.
Softw. Eng. 20(6), 476–493 (1994)
2. Henderson-Sellers, B.: Object-Oriented Metrics: Measures of Complexity. Prentice-Hall Inc.,
Upper Saddle River (1996)
3. Li, W., Henry, S.: Maintenance metrics for the object oriented paradigm. In: IEEE
Proceedings of the 1st International Software Metrics Symposium, pp. 52–60, May 1993
4. Lincke, R.: Validation of a Standard - and Metric-Based Software Quality Model – Creating
the Prerequisites for Experimentation. Licentiate thesis, MSI, V ̈axj ̈o, University, Sweden,
April 2007

Driving Academic Spin-oﬀ by Software Development
Process: A Case Study in Federal Institute of Rio Grande
do Norte - Brazil
Claudia M.F.A. Ribeiro ✉ , Fellipe A. Aleixo, and Marília A. Freire
(

)

Federal Institute of Rio Grande do Norte – IFRN, Natal, RN, Brazil
{claudia.ribeiro,fellipe.aleixo,marilia.freire}@ifrn.edu.br

Abstract. This paper presents, as a case study, the experience of Federal Institute
of Rio Grande do Norte (IFRN) in using software process, as a means to create
institutional ambience for technological innovation and new start-ups formation.
As a professional qualiﬁcation institution, IFRN has succeeded during its centuryold existence in providing good job opportunities for its students. However, the
same educational model has proved not to be as adequate to give the students the
same opportunity to run their own businesses. Having this in mind, signiﬁcant
institutional arrangements and curriculum innovations were made, mainly related
to software development disciplines, in order to support the gradual transition of
academic requirements (e.g. knowledge acquisition and best practices) into busi‐
ness and more market-oriented ones. This paper reports this trajectory, the main
lessons learned, and the new challenges ahead.
Keywords: Software development process · Start-up · Academic spin-oﬀ

1

Introduction

Professional education aims at providing proper conditions for job activities. In this
context, excellence of professional qualification is a key instrument to guarantee the
best job opportunities. This is particularly true for technological areas, such as soft‐
ware development, where the pace of advances in technology dictates adjustments
on curriculum.
Entrepreneurship also poses some challenges for curriculum planning, designing and
implementation. Cutting-edge technologies and classes are not the single requirement.
It must be complemented with real-world problems discussions, activities beyond the
classroom, and other business-related subjects to give students the proper conditions to
embark on a career in private equity.
Regarding these two educational aspects, Federal Institute of Rio Grande do Norte
(IFRN) has played an important role in professional context, mainly for Brazilian North‐
east region, where it is located. Over the last century, IFRN has changed to match the
necessities of Brazilian economy and marketplace. This document covers part of this
experience presenting curriculum and institutional adjustments made, to support

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 636–639, 2016.
DOI: 10.1007/978-3-319-49094-6_51

Driving Academic Spin-oﬀ by Software Development Process

637

students formation program. More speciﬁcally, we investigated these two following
research questions:
RQ1. How can software process chain drive the maturity of software projects?
RQ2. Can the adoption of a software process in undergraduate curriculum inﬂu‐
ence the derivation of academic spin-oﬀ?
The answer for these two questions represents qualitative indicators that show how
adjusted the curriculum is to the new professional demands. To support students during
the many activities of software development, it was initially adopted RUP as the software
development process. Despite the clear beneﬁts of the process adoption and RUP itself,
it became clear that to accomplish the curriculum demands, both academic and business
requirements, it is critical to adapt the software process. In this paper, we share the
lessons learned during a decade when such an adaptation process occurred.
This paper is organized as follows. Section 2 deals with academic spin-oﬀ models.
Section 3 is dedicated to design and procedures of the case study. Results, conclusions
and future work are subject of Sect. 4.

2

Academic Spin-oﬀ

Academic institutions all over the world have been the place where knowledge is created,
acquired, transferred, combined, and converted into diﬀerent types of knowledge
(Gamble and Blackwell 2001; Waugaman 2014). As such capabilities have increasingly
become critical for companies, mostly those concerned about innovative products and
services, it is reasonable to think of academic environment as the natural crib for
nurturing new businesses. This kind of institution is commonly named academic spinoﬀs (Corbett et al. 2014).
Academic spin-oﬀ model is not a novelty in many countries (Wright 2007), however,
it is a quite new movement in Brazilian institutions. To support a proper environment
for business inspiration, an academic spin-oﬀ has to make institutional arrangements,
in order to give students a variety of extra-classes activities to complement entrepre‐
neurship skills. Some of these arrangements include incubators, technology transfer
oﬃcer, and the establishment of eﬀective relations with external agents. In fact, closing
the gap between academy and society is critical to provide a clear sense of real demands,
and inspire new and successful business solutions.
IFRN has made signiﬁcant eﬀorts in this direction. However, as many institutions
worldwide, culture is still one of the biggest challenges to overcome (Waugaman
2014). As stated in (Rasmussen and Borch 2010) “While leadership is about making
things happen; organizational culture is the juice that makes people want to make things
happen”. It is worth noting that in the context of this case study, cultural aspect exceeds
academic limits, as only recently (in 2016), discussions about intellectual property
licensing, originated from research projects, gained legal support in Brazil. After that,
researcher’s productivity at universities and institutes, traditionally measured by the
number of published paper, was revised to embrace also entrepreneurial activities, as
well.

638

3

C.M.F.A. Ribeiro et al.

Case Study Design and Procedures

In this case study, the collected data refer to an interval of 10 years, from 2006 to 2016,
of TADS (Technology of Software Analysis and Development) course. TADS is a threeyear undergraduate course, which oﬀers modules that embrace the main subjects related
to software development. A key component of TADS curriculum is a Software Devel‐
opment Project (PDS). The main goal of PDS is to allow the students the opportunity
to practice the theory covered in the formal disciplines. PDS was conceived as a bridge
that intends to gap the distance between academic and marketplace requirements. The
teams involve ﬁve or six members, who play diﬀerent roles according to a proposed
software process.
During this decade of TADS history, many adjustments on the software process were
made, in order to maximize the quality level of resultant software products. So that, it
is fair to say TADS history can be explained through software process evolution, which
ended up to originate a family of (academic agile) processes. This process family was
formally deﬁned at 2014 and is composed by three processes: (i) PAWEB (Web Appli‐
cation Development); (ii) PAAD (Distributed Application Development); and (iii)
PAAC (Enterprise Application Development).
To understand a case study that involves subjective aspects, such as here described,
it is critical to understand the context that surrounds and involves the participants. IFRN
has many institutional instruments to support innovation and start-up formation, such
an oﬃcer dedicated to technology transfer (NIT), a technological incubator (ITNC), and
a mentoring program. These organizational units have developed many activities to
install a culture of innovation at IFRN. While the exercise of assessment is not a trivial
task, revisiting the seven principles used by successful academic spin-oﬀ (Corbett et al.
2014) it is an interesting way to investigate how IFRN is going on implanting innovationfriendly environment. During the past decade, discussions about innovation and how to
create academic spin-oﬀ to stimulate new business occurred, involving typical entities
in the triple helix model (Etzkowitz 2008).
In a very broad sense, this case study aims to investigate the eﬀectiveness of IFRN
as an academic spin-oﬀ agent. To tackle this problem, it is necessary to deal with both
objective and subjective issues. For the assessment of objective attributes of projects,
all of software artefacts, such as code and documents, are saved in a repository under
Redmine system. Additionally, the supervisors’ notes cover subjective aspects on the
projects. Finally, information about program mentoring and incubator processes are also
recorded and evaluated, as an innovation.
This case study embraces a qualitative approach to organizing its data into three
criteria. These criteria have been identiﬁed from the researchers’ experience in
conducting supervision of software development projects. They were deﬁned as follows:
(1) Project’s success per cycle; (2) Project evolution through out cycles; and (3)
Student’s innovation engagement. The results of this case study are subject of next
section, along with the main ﬁndings that came out from this analysis.

Driving Academic Spin-oﬀ by Software Development Process

4

639

Results, Conclusions and Future Work

The case study presented described a decade of experience of Federal Institute of Rio
Grande do Norte (IFRN) in using software process, as a means to guide students towards
individual qualiﬁcation and entrepreneurial initiatives. The main ﬁndings achieved
pointed out towards the two research questions, as follows.
RQ1. How can software process chain drive the maturity of software projects?
Answer: In academic institutions, a software process chain can promote higher levels
of maturity of software projects, by introducing gradually and consistently a minimum
set of fundamental requirements, artefacts, roles, and activities, specially deﬁned to
observe the same level of intellectual students maturity.
RQ2. Can the adoption of a software process in undergraduate curriculum inﬂu‐
ence the derivation of academic spin-oﬀ?
Answer: For the sake of our experience described in this case study; the answer
depends on the observance of rigorous application of software process. This is due the
level of conﬁdence acquired by students, not only about their own skills, but also on
the potential market opportunities for the resultant product. While the former matches
individual educational purposes, the latter also depends on how the innovation culture
is disseminated in the institution.
More information about the results can be found at (Ribeiro 2016). As future work,
we intend to make continuous analysis to improve our processes, and also plan to inves‐
tigate automatic software metrics to attest the increment of quality on software artefacts.
At the institutional level, we intend to make eﬀort to implement the seven principles
that guide innovative academic institutions (Waugaman 2014).

References
Corbett, A.C., Siegel, D.S., Katz, J.A.: Academic Entrepreneurship: Creating An Entrepreneurial
Ecosystem. Emerald Group Publishing, Bingley (2014)
Etzkowitz, H.: The Triple Helix: University-Industry-Government Innovation in Action.
Routledge, London (2008)
Gamble, P.R., Blackwell, J.: Knowledge Management: A State of the Art Guide. Kogan Page
Publishers, London (2001)
Rasmussen, E., Borch, O.J.: University capabilities in facilitating entrepreneurship: a longitudinal
study of spin-oﬀ ventures at mid-range universities. Res. Policy 39(5), 602–612 (2010)
Ribeiro, C.M., Aleixo, F.A., Freire, M.A.: Academic Spin-oﬀ (2016). Accessed August 2016 from
Software Development Research Group/CNAT/IFRN. http://nudes.ifrn.edu.br/index.php/
projetos/academic-spin-oﬀ/
Waugaman, P.G.: Innovation U 2.0 reinventing university roles in a knowledge economy. J. Res.
Admin. 45(1), 125 (2014)
Wright, M.: Academic Entrepreneurship in Europe. Edward Elgar Publishing, Cheltenham (2007)

Future of Computing

The CRUSOE Framework: A Holistic
Approach to Analysing Prerequisites
for Continuous Software Engineering
Teemu Karvonen1(&), Tanja Suomalainen2, Marko Juntunen3,
Tanja Sauvola1, Pasi Kuvaja1, and Markku Oivo1
1

Information Technology and Electrical Engineering,
University of Oulu, Oulu, Finland
{teemu.3.karvonen,tanja.sauvola,pasi.kuvaja,
markku.oivo}@oulu.fi
2
VTT Technical Research Centre of Finland Ltd, Espoo, Finland
tanja.suomalainen@vtt.fi
3
Oulu Business School, University of Oulu, Oulu, Finland
marko.juntunen@oulu.fi

Abstract. Continuous software engineering (CSE) is used for customer
experiments and repetitive integrated processes within and between business
planning and software development. First, this paper deﬁnes a new framework,
called CRUSOE, for analysing CSE prerequisites. The framework allows for a
more precise analysis of the interrelations and estimation of the changes that are
prerequisites for moving from traditional product development to CSE. CRUSOE addresses prerequisites associated with and interdependencies among
(1) the strategy, (2) architecture and (3) organisation. Second, this paper
describes a case study conducted as part of a smartphone platform project to
investigate the CSE prerequisites for product-focused software development.
The results are synthesised together with recent related studies using the
CRUSOE framework. The ﬁndings conﬁrm challenges in moving towards CSE
in embedded system development. Moreover, context-speciﬁc prerequisites
should be considered, while it is still unclear as to how CSE can be systematically applied to the non-website development context.
Keywords: Continuous software engineering
Organising  BizDev  Software ecosystem


Strategy


Architecture


1 Introduction
Embedded and product-intensive software development project teams are becoming
increasingly interested in applying practices and tools for continuous software engineering (CSE) [1]; e.g., the Lean Startup method [2], DevOps [3], continuous delivery
(CD) [4] and continuous experimentation [5]. Although many of these practices are
widely acknowledged in the ﬁeld of website development [6, 7], there are only a few
frameworks that describe how CSE can be applied in product-focused embedded
system development (e.g. smartphones, cars etc.). Moreover, there is still very little
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 643–661, 2016.
DOI: 10.1007/978-3-319-49094-6_52

644

T. Karvonen et al.

empirical evidence of the actual usage of these practices in this context. The existing
studies have mostly indicated severe challenges in adopting these practices in
business-to-business (B2B) and embedded system development [8–11] contexts. In
addition, CD and continuous experimentation still seem to mostly be used for
small-scale website development projects [6, 8, 12]. Fagerholm et al. [5] have recently
investigated continuous experimentation in university software laboratory projects with
two case companies and have introduced a model for explaining how the continuous
experimentation can be organised. However, more empirical studies are needed to
increase our understanding of how these practices could be implemented in different
software development contexts. Consequently, in this paper, our goal is to clarify the
key prerequisites for applying CSE in product-focused software development.
The sustainable success of a company can be linked to its capabilities in terms of
bringing new innovations to market. In today’s competitive and turbulent business
environment, time to market has also become very important. Consequently, business
stakeholders have identiﬁed rapid ﬁelding and continuous experimentation as important
elements of their long-term strategies. Development stakeholders are tasked with
ﬁnding a balance between development speed and stability, as the development process
speed can often be temporarily increased by collecting technical debt (e.g., skipping
some steps in the process), followed by a slowdown in development due to having to
pay off the debt later. Consequently, companies need practices for maintaining a
consistently high velocity. Bellomo et al.’s [13] suggests that companies must develop
combined practices such as “release planning with architecture considerations” and a
“prototype/demo with [a] quality attribute focus” to balance process speed and stability.
Efﬁcient integrative activities between software development and other functions (e.g.,
business and operations) are needed in all stages of the product lifecycle.
Fitzgerald and Stol [3] have emphasised continuous integration (CI) between
software development and its operational deployment (i.e., DevOps) as well as continuously assessing and improving the link between the business strategy and software
development (i.e., BizDev). However, they do not explicitly deﬁne how such a business strategy should be carried out or how it is enacted in a continuous manner. In this
paper, we want to clarify strategy planning activities and their interrelationship with
CSE. The focus of our study is on investigating CSE prerequisites in product-focused
(e.g., embedded systems) software development projects. Our research contributions
are as follows. First, we review the literature on CSE, strategy planning and models for
analysing holistic aspects of software-intensive product development. Second, we
construct and specify the CRUSOE framework (Continuous inteRdependencies in
prodUct-focused SOftware Engineering) for analysing CSE in software-intensive
projects. Third, we conduct a case study from a smartphone product platform project to
validate the framework. The research question for the case study is: What are the
prerequisites for using the CSE approach in software-intensive product development?
Finally, we synthesise the case-study ﬁndings with recent related studies by applying
the CRUSOE framework.

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

645

2 Background
2.1

Holistic Models for Analysing the Development of Software-Intensive
Products

Various aspects of business and software ecosystems have been identiﬁed as important
research topics in the context of CSE [1, 3]. In addition, as explicitly stated by
Fitzgerald and Stol [3], continuity is required in all stages of the product lifecycle.
Subsequently, they stress that it is necessary to constantly evaluate and improve
software development interfaces with adjacent business-oriented activities. Previous CSE studies [3, 14] have suggested that delays in product development are often
caused by a lack of holistic thinking and/or models for analysing software product
development in a holistic manner. For example, inefﬁciencies and delays related to
“handoffs” [15], as addressed by lean thinking, have been identiﬁed as a typical form of
waste in software development, and thus the planning and engineering aspects of
software product development should not be decoupled in separate silos for efﬁciency
reasons. Still, information gaps and waste in between the business planning and
development cycle could become evident when organisations are pushed towards faster
(e.g., daily) or continuous software release cycles.
There are only a few documented approaches for analysing software product
development in a holistic manner. As stated by Bosch et al. [17], few, if any, models
exist that can analyse both the internal and ecosystem dimensions of research and
development (R&D). The Business, Architecture, Process and Organisation (BAPO)
[16] model (the left-hand side of Fig. 1) has been used for evaluating software product
families and for analysing four main concerns addressed in product development:
(1) how to make a proﬁt from products, (2) the technical means to build the software,
(3) responsibilities and relationships within software development and (4) the mapping
of roles and responsibilities to organisational structures. As an update and extension to
the BAPO model, the Ecosystem, Strategy, Architecture and Organizing (ESAO) [17]
model (the right-hand side of Fig. 1) addresses both the internal and ecosystem
dimensions for analysing company strategy, architecture and organising. Recently,
both the BAPO and ESAO model dimensions have been applied to describe the evolutionary steps in the transition from traditional development towards an innovation
experiment system (IES) [18]. Figure 1 illustrates the key dimensions of the BAPO and
ESAO models. Later in this paper, we elaborate on the dimensions of the ESAO model
as we construct the CRUSOE framework to analyse the CSE prerequisites.
The CRUSOE framework illustrates possible interrelationships among ESAO dimensions that we consider to be important for CSE. These interrelationships (links) have
also been previously addressed by the BizDev and DevOps concepts [3].
2.2

CSE in a Nutshell

When we use the term “CSE”, we are referring to an emerging software development
paradigm recently characterised by Bosch [14] (e.g., Stairway to Heaven, IES, and
continuous deployment) and Fitzgerald et al. [3] (e.g., Continuous*). IES is

646

T. Karvonen et al.

Fig. 1. The BAPO [16] and ESAO [17] models.

characterised by three coexisting aspects [19]: (1) “continuously evolving the software
by frequently deploying new versions”, (2) “customers and customer usage data play
[ing] a central role throughout the development process”, and (3) “development … focus
[ing] on innovation and testing as many ideas as possible with customers”. To be able to
frequently deploy new versions, it is necessary to be capable of CD, which, as deﬁned by
Humble and Farley [4], is an array of software development, CI and continuous
deployment patterns for enabling fast, reliable and automated deployments to production. Consequently, the CSE paradigm can be associated with the Lean Startup method
[2] (i.e., rapid validated learning) and enterprise agility [20] sensing (i.e., the capability
of continuously determining what is the most valuable feature for the customer) and
response (i.e., the capability of continuously deploying new versions to production).
Underlying agile principles emphasise collaborative work methods between business
people and developers: “Business people and developers must work together daily
throughout the project” [21]. Interdependencies between the business and development
aspects are often addressed via the need for establishing frictionless information flow
and decision making in product development [3, 15] (e.g., fast information flow,
transparency and continuous planning practices). Continuous information flow and
smaller batch sizes allow for better synchronisation of business planning with iterative
release planning methods and tools [22] (e.g., iterative feature prioritisation and road
mapping) and for breaking down requirements into small chunks that can be implemented, tested and deployed in approximately 1- to 2-week sprints, as emphasised in
Scrum methodology [23]. Finally, shorter iterations allow for faster customer feedback
cycles, thereby reducing the risk of developing the wrong product. In website development, continuous automated deployment can even enable rapid controlled experiments [7] (e.g., A/B tests) with end users.

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

2.3

647

Business Management Views on Strategy and Strategic Planning

When juxtaposing the ideology of CSE practices [1, 3] with the traditional, rationalistic
view of strategy and organisations [24], they appear ill-matched. Whereas CSE stresses
the importance of real-time actions and continuous change, the rationalistic view of the
strategy process [24] focuses on the creation of a structured future plan that is temporally and practically separated from its implementation [26, 27]. This separation thus
relies on the assumption of a comparatively static and predictable business environment
that allows the rational managers [26] to ﬁrst create a plan based on systematic
scanning and positioning [25] and then implement it while having sufﬁcient control
over the consequences of their actions [27]. The usefulness of such theories for practice
has been questioned, as they do not sufﬁciently reflect today’s volatile organisational
reality [28, 29]. Therefore, there is a need for creating an understanding of strategymaking that better addresses the turbulent organisational reality.
Strategic planning is all about answering the questions of where you are, where you
want to be and how you get there, as well as deﬁning how these aspects are connected
[30]. The strategy process varies across companies, but at the company level, it should
be a continuous and issue-driven process [31]. In addition, the ways in which the
strategy can be implemented fall into speciﬁc routines and work patterns that vary from
ﬁrm to ﬁrm and between different types of ﬁrms [32]. Similarly, Brömmelstroet [33]
deﬁnes how strategic planning phases can vary widely in terms of how they are
organised (i.e., bottom up or top down), but all strategic planning processes can be seen
as multilevel company processes in which planning actors work together towards a
shared outcome. Even though the value of formal strategic planning has been strongly
questioned [34], it is still an activity that is widely carried out in companies [35].
Formal strategy has power in affecting organisational actions and practices, as it deﬁnes
roughly what is done and what is not done [36]. Eisenhardt and Brown [37] state that
strategy should be seen as temporary, complicated and unpredictable and that
strategy-making is a continuous process that is more oriented towards real-time
operations than long-term stable goals [37].
From the software development perspective, and especially from the agile and lean
software organisational perspective, Mavengere [38] clariﬁes that supply chain participants should have their own strategic plans which should be related to the whole
supply chain’s plan, but he does not go into detail about how these plans are created
(i.e., the planning process in detail). On the other hand, Koenigsaecker [39] presents a
lean strategic organisational process in which strategic planning is typically done once
per year and is a learning experience in and of itself. In addition, monthly strategy
deployment meetings are held to review progress and create opportunities for sharing
lessons learned. The monthly strategy deployment reviews also help get the enterprise
thinking about how to make the work process fundamentally better every month.
According to [40], among software development companies, the time frame for
long-term strategic planning is commonly 3 years plus the current year. One of the case
companies in this study used continuous strategic planning in which strategic plans
were reviewed quarterly and monthly. The two other case companies reviewed and
updated their strategic plans annually. Their strategic planning practices were closer to
traditional project-based strategies than to continuous strategy practices – because the

648

T. Karvonen et al.

planning was performed annually, management approved plans which were then
implemented for the rest of the year. Continuous strategy is extremely vital in a
business environment that is constantly changing [41]. Suomalainen et al. [41] clarify
that even though a strategy exists all the time, it should be iteratively and continuously
updated based on market and customer demands. For example, past ﬁnancial crises
have forced companies to realise that continuous planning is required throughout their
organisations; not only at the software development level, but also at the strategic,
business and ﬁnancial levels.

3 The CRUSOE Framework
In this section, we introduce a new framework called CRUSOE that we later use to
analyse the CSE prerequisites.
Figure 2 provides a simpliﬁed illustration of the CRUSOE framework, which
utilises the dimensions of the ESAO model [17]: (1) Strategy: ecosystem strategy
(ES) and company internal strategy (CIS); (2) Architecture: ecosystem architecture
(EA) and company internal architecture (CIA); and (3) Organising: ecosystem organising (EO) and company internal organising (CIO). The CRUSOE framework is an
enhancement of the ESAO model that, according to the principles of CSE [1, 3],
highlights the interdependencies between the three dimensions. These interdependencies are illustrated in Fig. 2, in which Areas 4, 5, 6 and 7 overlap with two adjacent
dimensions. These areas illustrate interdependencies (e.g., integrative activities and
combined practices) between dimensions. Area 7 illustrates the most overarching,
holistic practices for company governance. Consequently, Areas 4, 5 and 6 illustrate
more explicit integrative activities between dimensions, such as the BizDev concept
addressed by Fitzgerald et al. [3]. We argue that the BizDev activities are associated
with Areas 4, 5 and 7. According to Linden et al. [16], due to the interrelationships
among the dimensions, any change in one dimension may have consequences in
another dimension. These interrelationships have also been illustrated in the BAPO
model with arrows and lines (Fig. 1). In the rationalistic view of the strategy process,
and as stated by Bosch et al. [17], strategy should “idealistically” drive architecture and
architecture should drive organising. However, in practice, “one has to allow for
bi-directional dependencies” [17]. Moreover, the strategy must conform to empiricism
and business realities; e.g., seemingly irrational customer behaviour and the existing
constraints and capabilities of information technology (IT) and R&D. To summarise
the notions referred to earlier about the need for a flexible and dynamic strategy
process, we argue that the strategy process should not be seen as the process governing
“only business” or “all company processes”, but rather as a process that can, and
should, be continuously influenced by other company processes. Subsequently, there
are also bi-directional interactions and dependencies illustrated as overlapping areas
between each dimension and highlighted with numbers 4 to 7 in Fig. 2. Although the
CRUSOE framework is used herein to analyse the CSE prerequisites, we anticipate that
the framework could also be used for analysing other kinds of software-intensive
product development processes.

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

649

Fig. 2. Simpliﬁed illustration of the CRUSOE framework.

In Table 1, we further elaborate on the key aspects of each area of the CRUSOE
framework. Areas 1 to 3 are adopted from the ESAO model deﬁnitions. Our contribution to this area of study relates to the questions associated with establishing integrative practices among the various ESAO dimensions (Areas 4–7). These questions
highlight relations between the ESAO dimensions and choices that the company has
available to it. For example, there could be a vast number of choices for how to build a
software-intensive product. However, only a few of the choices perhaps allow for the
proper means with which to generate revenue in the future (e.g. providing a proper
platform for a service business based on continuous deployment). Meanwhile, different
ecosystems could provide different technical and procedural capabilities for CSE. In
this paper, the purpose of the framework is to aid in analysing CSE in softwareintensive product development.

4 Case-Study Design
To investigate CSE in a real software development context and to validate the CRUSOE framework, we applied the case-study method [42]. Our goal in conducting
interviews was to gather data for a comprehensive understanding of the project’s goal
and of development and deployment practices. In addition, we asked about information
flow and interactions among company stakeholders, customers and suppliers. In eight
semi-structured face-to-face interviews, we asked participants to describe company
strategy planning practices and product development processes. We also explicitly
asked interviewees to identify the beneﬁts and barriers associated with using the CSE
approach to product development.
The unit of analysis in our case study was the project; i.e., developing a smartphone
platform. Due to conﬁdentiality reasons, we cannot provide a very detailed description
of the features of the product. The platform included both software and hardware
components. Consequently, this project is large, employing over 100 people directly
inside the case company and also several partners involved in hardware and software
development. We interviewed company personnel who were directly involved in the
product development or company-level strategy planning. We also collected data from
the organisation’s public webpages for a better understanding of the project’s purpose,

650

T. Karvonen et al.
Table 1. CRUSOE framework areas explained.

CRUSOE
framework Areas 1–
7
1* - Strategy

Analysis scope: company
internal (I)

Analysis scope: ecosystem (E)

What are the options for how the
company generates revenue now
and in the future? [17]

2* - Architecture

What are the options for
technology choices, technical
means and technical structures
to build software-intensive
products? [17]

3* - Organising

What are the options for ways of
organising work, ways of
working, roles, responsibilities,
processes and tools within
software development? [17]

4 - Strategy &
Architecture
interdependencies

What are the options to connect
the internal strategy and
architecture? E.g. what are the
practices for continuously
validating technology choices,
technical means and technical
structures that generate revenue
now and in the future?
What are the options to connect
the internal strategy and
organising? E.g. practices for
continuously adopting efﬁcient
ways of organising work, ways
of working, roles,
responsibilities, processes and
tools.

What are the options that the
company has available in its
current role in the ecosystem?
[17]
What are the options for how to
design interfaces between the
company’s internal
architecture and related
ecosystem partners, such as
suppliers providing solutions
and ﬁrms that build software on
top of a product or platform?
[17]
What are the options for how a
company works with customers,
suppliers, and ecosystem
partners in terms of processes,
tools used, ways of working and
ways of organising the
collaboration? [17]
What are the options to connect
the ecosystem strategy and
architecture? E.g. what are the
practices for continuously
comparing different ecosystems’
technical capabilities and
interfaces that generate revenue
now and in the future?
What are the options to connect
the ecosystem strategy and
ecosystem organising? E.g.
practices for continuously
validating investments in
ecosystem processes, tools,
ways of working and ways of
organising the collaboration in
the ecosystem.
What are the options to connect
the ecosystem architecture and
organising? E.g. practices for
providing appropriate technical
structures for continuous
deployments and collaboration
(continued)

5 - Strategy &
Organising
interdependencies

6 - Architecture &
Organising
interdependencies

What are the options to connect
the architecture and organising?
E.g. practices for continuously
refactoring technical structures
that provide efﬁcient organising
ways of working, roles,

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

651

Table 1. (continued)
CRUSOE
framework Areas 1–
7

7 - Strategy &
Architecture &
Organising
interdependencies

Analysis scope: company
internal (I)

Analysis scope: ecosystem (E)

responsibilities, processes and
tools.
What are the overarching
company governance options for
connecting the company strategy
with technical architectures and
with ways of organising? E.g.
practices for enabling a
company culture of continuous
improvement, experimentation
and innovation.

with customers and ecosystem
partners.
What are the overarching
company governance options for
connecting the company strategy
with ecosystem interfaces and
ways of collaborating with
customers and ecosystem
partners? E.g. practices for
enabling a culture of continuous
improvement, experimentation
and innovation with customers
and ecosystem partners.

*Areas 1 to 3 are the same as in the ESAO model [17].

company vision and strategic signiﬁcance at the organisational level. We used convenience sampling for selecting interviewees and projects (i.e., those involving the
company that we could easily access). In addition, related workshop materials such as
video clips, photos and ﬁeld notes were collected and stored to support the analysis. All
of the data was collected in 1- to 1.5-hour semi-structured interviews with eight
company employees. The interviewed employees’ job titles and responsibilities are
summarised in Table 2. All interviews were recorded for later transcription and a
Table 2. Interviewees’ job titles and responsibilities.
Job title

Job responsibilities

Senior product
manager
Software platform
product owner
Quality manager

Responsible for delivering product programs to
customers
Platform software component–related supervising

Senior specialist
President of the
business segment
Business developer
Scrum master
Product manager

Product quality management including conformance
to product safety standards and environmental
regulations
Design and implementation of continuous
deployment processes and tools
Chief Executive Ofﬁcer for the business segment
Product business development
Responsible for coordinating software development
team work
Responsible for coordination of the product program

Interview
duration
117 min
103 min
105 min

104 min
67 min
78 min
86 min
92 min

652

T. Karvonen et al.

qualitative data analysis was conducted using the NVivo tool [43]. The analysis was
performed by considering the CRUSOE framework dimensions and the questions
presented in Table 1.

5 Findings
In this section, we analyse the prerequisites for applying CSE through case-study data
and by applying the CRUSOE framework areas introduced in Sect. 3.
5.1

ESAO Overview: Strategy (1), Architecture (2) and Organising (3)

The case company offers a wide range of products, platforms and R&D services that
typically involve radio technologies and wireless data transfer. Consequently, the
company is involved in multiple ecosystems. The company is also an active contributor
to several open source software projects. Recently, the company has expanded its
product portfolio towards Internet of Things (IoT) solutions and data analytics services.
Its main customers are from the B2B domain, including both private- and public-sector
organisations. Hence, when analysing the strategy and connected software development
practices, it is necessary to explicitly specify which product category and customer
segment is under analysis. The company has an established position in manufacturing
systems for public safety and the military. However, more recently, the company has
adopted a strategy for developing product platforms that allow tailored products to be
created that could be sold to consumer markets (B2C). The smartphone platform project
that we analysed in this case study is an example of this type of product.
We consider this company to be a very interesting research context for CSE
because the company has many products in their portfolio and there are many different
kinds of customers involved. This clearly addresses the challenges in deﬁning productand company-level processes.
In this way, the company has systematically developed capabilities to adapt to
different customer contexts and software ecosystems and also to rotate employees
among projects to develop the employees’ skills and technical knowledge. Consequently, the interviewees often referred to other projects that they were aware of or had
worked on previously. The interviewees emphasised that although there was a
company-wide deﬁned product development process that was, in principle, guiding all
company projects, individual projects often improvised; i.e., very different methods
were used or they worked in collaboration with speciﬁc customers and other ecosystem
partners. Hence, when asked about the beneﬁts of and barriers to using the CSE
approach in product development, the interviewees considered opportunities for using
the CSE approach as highly context-sensitive. Different products and customer segments were considered as having very different CSE prerequisites. These customer
segments could be characterised by two extremes: “conservative public-sector customers” and “fast-moving private-sector customers”. Different product segments could
be characterised by “large and complex multivendor legacy systems” and “compact
consumer products”.

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

5.2

653

CRUSOE Area 4: Connecting Strategy and Architecture for CSE

This section analyses the options for connecting the internal and ecosystem strategy
with the architecture for CSE; i.e., interfaces and technical structures to create revenue.
The Android operating system (OS) [44] was selected as the software baseline for the
smartphone project. When considering the various smartphone OS ecosystems, the
Android OS currently has by far the largest market share, dominating markets with
over 80 % of the total market share [45]. Hence, the selected platform also allowed for
opportunities to generate future revenue as additional product applications could be
continuously provided via the Google Play store (http://play.google.com/store/apps).
The project budget and product strategy management-level planning occurred in
monthly cycles. The development teams organised their work into 2-week sprints that
also guided and synchronised several planning-related activities such as the planning
associated with validating technical interfaces, features and release-content prioritisation. The project actively used a CI system, thus new versions of the product could be
produced several times a day. These new software versions were mainly used for
internal testing purposes. Nevertheless, the project was considered capable of continuously delivering new versions to demonstrate the latest interfaces and product features
for customers and ecosystem partners. We consider that interdependency with business
planning (i.e., continuous synchronisation of development sprints with continuous
strategic planning and budgeting cycles) was clearly a key prerequisite for using the
CSE approach because it allowed for continuous feedback cycles and transparency in
terms of how the project was progressing.
5.3

CRUSOE Area 5: Connecting Strategy and Organising for CSE

This section analyses the options for connecting the company’s internal and ecosystem
strategy with internal and ecosystem organising for CSE. The interviewees emphasised
how the transition towards CSE was a strategic decision that governed practices
relating to how products were designed and how the company was organised. Moreover, adopting CSE was considered as a competitive advantage in that it provides better
transparency, efﬁciency and flexibility when working with different customer projects.
When considering the smartphone project-level CSE strategy, we could see that the
CSE approach was mainly limited to software development practices such as CI and
test automation. The interviewees also pointed out that while the company was a small
player in the Android OS ecosystem, it had to adjust its internal plans according to
supplier schedules and technology roadmaps. Meanwhile, larger smartphone vendors
had more power to affect their supplier’s plans. Consequently, the interviewees considered supply chain support as a key CSE prerequisite and also as a key hindrance to
not being able to fully implement the CSE approach (e.g., CD in the project).
To summarise our ﬁndings regarding strategic dimension interdependencies with
organising, we consider that currently, the company’s internal strategy is to adopt a
CSE capability. This is also the main driver for using CSE in the company projects.
The company had already made signiﬁcant investments in terms of promoting CI and
CD solutions (e.g., automation) for use in software-intensive product projects.

654

T. Karvonen et al.

As concrete evidence of the strategic decision, the company had established a team of
experts to implement the technical CI and CD framework (toolbox) so that it could be
adopted in all company projects. This expert team was also in charge of coaching
projects on how to adopt CI development practice. We consider this activity as a key
interdependency between the strategy and organising aspects that was clearly contributing to the company’s transition towards CSE.
When considering other aspects of the ecosystem, some of the existing customers
showed very little interest in using the CSE approach in product development projects.
So far, only a few private-sector B2B customers had insisted on using the CSE
approach in the development of consumer products. One interviewee pointed out that
some customers have very little knowledge of and experience in agile product development methods. Meanwhile, public-sector customers in particular often have established and formal staged processes for acquiring software-intensive products; e.g.,
communication equipment for the military and for public safety must go through
rigorous testing and certiﬁcation processes before it can be put into actual use. Consequently, an educated and motivated customer was considered as an important CSE
prerequisite.
5.4

CRUSOE Area 6: Connecting Architecture and Organising for CSE

This section analyses the options for connecting the company’s internal and ecosystem
architecture with organising for CSE. The interviewees considered the selected
Android OS platform architecture (e.g., the hardware and software technology platforms and associated tools) to provide an adequate technical capability in terms of
delivering software to end users rapidly and over-the-air (OTA) [46]. We consider this
as a key prerequisite associated with the interdependency between the organising and
architecture dimensions.
Several interviewees emphasised how the technical capability of providing updates
continuously must be aligned with quality assurance practices and cycles for testing.
The interviewees stated that the prerequisite of frequently delivering new updates to the
end user would have to precede the rigorous internal testing period. One interviewee
stated that some bugs can be identiﬁed only after using the product for a long period of
time, which could be a challenge for CD. System updates that require rebooting or
interrupting end-user product usage were also considered problematic since they could
easily annoy end users. Moreover, updates in business and critical safety systems
cannot interrupt or compromise the availability of the service. Consequently, it is a
prerequisite to minimise breaks in service availability and deliver updates so that the
end user is not interrupted.
We consider the interviewees’ previous experiences in using rapid prototyping and
demoing to be somewhat controversial. Although rapid prototyping was acknowledged
as very important and good practice for identifying key functionalities and requirements for the product in the early phases of development, there were also drawbacks
such as undisciplined processes for bug ﬁxes and feature prioritisation. Two interviewees identiﬁed the problem known as the HiPPO; i.e., the “Highest Paid Person’s
Opinion” [7]. One interviewee emphasised how the processes for building prototypes

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

655

and actual products were very different. Although prototypes can often be used for
demonstrating new functionalities, they do not typically meet the proper internal
quality criteria that are required for real products. Consequently, some managers are
often too optimistic about how much work is still to be done in order to ﬁnalise the
product. Therefore, the company needs to develop balanced processes that integrate
speed and stability in order to build actual products in an experimental manner. An
important CSE prerequisite is thus that the company increases its understanding of the
experimentation process and that it reviews current best practices, milestones and
checklists for product development. Methods for managing technical debt are particularly important prerequisites for CSE.
5.5

CRUSOE Area 7: Overarching Governance for CSE

Finally, this section analyses the options for overarching company governance for CSE.
As stated earlier, we considered that the company management had clearly made a
strategic decision to promote the CSE approach in all of its company projects. Two
interviewees stated that any investment promoting CSE was an important “investment
for the future”. The interviewees considered it important to establish and improve
systems for company-wide information transparency and for the real-time availability
of customer feedback and product quality metrics. We consider the company’s senior
management’s commitment to promoting the capability of using the CSE approach in
product development as a key CSE prerequisite. We can identify several activities that
indicate the company management’s commitment to investing in CSE, such as an
investment in people, tools and processes for enabling the rapid deployment of CI in all
company projects; increasing test automation coverage; developing methods for
end-user data collection (product platform instrumentation for data collection); systematising customer feedback collection (e.g., customer surveys); and developing tools
and processes for analysing user data and sharing information internally in the company via IT systems. The company also arranged regular sessions for employees to
promote internal experience sharing and bottom-up strategic planning.
5.6

Findings’ Summary

In summary, we identiﬁed the following key CSE prerequisites in a smartphone
platform project: (1) customer education and motivation, (2) software ecosystem
support, (3) supply chain stakeholder support, (4) leadership commitment, (5) process
rigor for experimentation, (6) quality assurance process cycle duration, (7) technical
debt management, (8) OTA updates with minimised breaks in service availability, and
(9) internal experience sharing and bottom-up strategic planning. The CRUSOE
framework has signiﬁcantly helped us to systematically categorise and more clearly
articulate the prerequisites for using CSE in the case-study (smartphone platform)
project. Based on our case-study ﬁndings, applying CSE to product-oriented development can involve a complex organisational change within and between software
development and business activities. Whereas the adoption of technical infrastructure

656

T. Karvonen et al.

and development practices is an important starting point for CSE, one should also
consider the company’s culture, leadership and key stakeholder relations.

6 Discussion
This section continues our interpretation of the case-study results and synthesis of the
prerequisites for CSE in software-intensive projects together with recent related
empirical studies on the research topic. In Table 3, we list the main ﬁndings from the
studies.
6.1

Synthesising the Prerequisites for CSE

As identiﬁed in previous studies referred to in Table 3, the challenges associated with
CSE are often multidimensional. Incorporating pilot or lead customers in the development process and business-model change is clearly a commonly identiﬁed prerequisite for CSE that involves both the company’s internal and ecosystem processes. In
addition, the supply chain (e.g., the component and technology platform suppliers)
must be incorporated in the development cycle to be able to continuously integrate all
of the product components and test the product.
As stated by Facebook’s release engineering manager, “Mobile deployments are
more challenging than Web deployments because we don’t own the ecosystem, so we
can’t do all the things that we would normally do” [12]. A company’s role in the
ecosystem can signiﬁcantly affect how feasible it is to use the CSE approach. Earlier
case studies have identiﬁed ecosystem-related challenges in CD, such as dependencies
on the hardware platform component supply process and interrelated customer processes, such as periodic tendering, periodic budgeting, product piloting and acceptance
Table 3. Recent empirical studies on obstacles and challenges for CSE.
Leppänen et al. [8] Obstacles for CD: “resistance to change”, “customer preferences”, “domain
constraints”, “developer trust and conﬁdence”, “legacy code considerations”, “[test automation]
duration, size and structure”, “different development and production environments”, “manual
and non-functional testing”.
Lindgren et al. [9] Domain-independent challenges associated with continuous
experimentation: “organizational culture”, “availability and sharing of data”, “data analysis”,
“identifying metrics”, “release cycle speed”, “deﬁning product roadmap”, “time [resources]”,
funding [resources], “technical obstacles”.
Rissanen et al. [10] B2B speciﬁc challenges of CD: “technical challenges”, “customer
challenges”, “procedural challenges”.
Olsson et al. [47] Challenges identiﬁed in the adoption of CD: “diverse adoption of agile
practices among teams”, “complexity of team resource allocation”, “dependence on resources
outside of the team”, “difﬁculties in analyzing and maintaining automated tests”, “difﬁculty in
removing or reducing old tests”, “difﬁculties in establishing efﬁcient rollback mechanisms”, “no
effective mechanism for analysis of customer data”, “lack of understanding about feature use”,
“no pro-active use of customer data”.

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

657

testing practices. These ecosystem-related constraints could require overarching
changes in a company’s business model, architecture and organising.
As identiﬁed in previous case studies, the product architecture must provide
capabilities for adequate componentisation for partial and staged release, including
roll-back mechanisms. Additionally, internal and ecosystem stakeholder needs must be
addressed in the architecture; e.g., enabling CI and partial deliveries of products
without updating the whole product. Technical capabilities for CD and continuous
testing in production-like (staging) test environments are prerequisites for pushing
reliable, bug-free releases into the customer’s production environment. While
deployment to the production environment can technically be similar to deployment to
the staging environment, it involves risks directed towards the customer’s business.
Consequently, deployment to production must ensure the rapid identiﬁcation of any
abnormalities in the system and, if needed, an immediate roll-back to the previous
functional conﬁguration.
The experimentation of new functionalities on Facebook is conducted via “canarying” [12]; i.e., collecting user data from alpha and beta test groups before engaging
in mass deployments, where changes are pushed out to all production systems (servers).
As identiﬁed by Rahman et al. [6], CD is used almost solely for deploying websites.
Hence, we consider that Internet- and cloud-based virtualisation technologies provide
the best technical capabilities for CSE. Cloud-based services are nowadays often integrated with embedded systems and consumer products (e.g., sports-tracking and
health-monitoring applications). Consequently, this trend may also enable the increasing
use of the CSE approach in the future.
As identiﬁed in previous studies, it is a prerequisite that a project must have the
capability for continuous integration and testing of the whole product. The existing
system for CI can often be incrementally upgraded for automated delivery; i.e., a
continuous release process also involving the release of the decision-making (e.g.,
acceptance) process for customer deliveries. As the release cycle may shorten quite
dramatically, it is paramount that the user experience (UX) and system design functions
are integrated into the development team to enable the efﬁcient planning of features.
From a release planning point of view, in traditional software projects, the customer’s role is to be involved in the planning and freezing of requirements at the
start-up stages of the project. Consequently, the customer accepts project delivery
based on customer validation at the end of the project. From the development process
point of view, a customer’s role and responsibilities could change radically when
moving from periodic traditional methods towards CSE. Although a product owner can
represent the customer, the actual customer must also take a more active stakeholder
role throughout the project because deliveries can be experimental (tentative and prone
to change) and they can occur more frequently. Subsequently, the delta between
deliveries to be accepted by the customer is smaller. CD, however, depending on the
industry domain, may radically impact how value is delivered to users.
Moving away from periodic delivery (large releases) often leads to inevitable
business-model transformation. CD involves customer use and purchase processes.
Internal business planning, requirement prioritisation and release delivery processes are
also involved. However, organisational capabilities for CD must be considered. As
stated by Facebook’s release engineering manager, “CD works for small teams, within

658

T. Karvonen et al.

20 to 30 changes per day”. Hence, more complex systems and large projects (i.e., with
more than 100 developers) working in a common code base may have to settle for a
lower continuous deployment frequency.
6.2

Study Limitations, Future Studies and Threats to Research Validity

The main limitation in our study relates to the constructed CRUSOE framework
because it has only been validated through one case study. More empirical studies are
needed to validate the framework’s true utility. Moreover, as our goal was to understand the holistic prerequisites for CSE, we acknowledge that is it also beneﬁcial to
investigate explicit CSE practices and also dependencies within individual dimensions.
Consequently, for future studies, it might be useful to either scope the research topic
and/or apply the framework in a different software project context. Applying the
CRUSOE framework in a different context would also provide information on how to
improve the framework.
The main threats to validity and the limitations in case studies are typically
addressed by the data-collection methods, data interpretation, reliability and generalisability of the results. Due to the nature of the case-study method, the results are not
generalisable to the whole industry. To mitigate the risks associated with construct
validity (e.g., misunderstandings and misinterpretations between the interviewer and
interviewees), we started each interview with a 5- to 10-min introduction on the
research topic and key concepts of CSE (e.g., CD and continuous experimentation).
The semi-structured interview method allowed us to ask clarifying questions
throughout the interview. We also arranged for an interactive feedback session with the
case company representatives to share interview summaries and to get feedback on our
analysis; i.e., how we (the researchers) interpreted the interviewees’ answers. The
company stakeholders conﬁrmed that our ﬁndings were correct.

7 Conclusion
In this paper, we have applied the case-study method to a smartphone platform project
to investigate the prerequisites for CSE. First, we speciﬁed the CRUSOE framework in
terms of it allowing for a holistic, systematic and structured investigation of the prerequisites for CSE. We consider that the CRUSOE framework enabled us to more
precisely articulate and analyse the prerequisites for CSE. The framework can further
aid in developing estimations regarding the changes that are needed when moving from
traditional product development to CSE. Finally, the case-study ﬁndings were synthesised alongside related recent studies. The results indicate that using the CSE
approach in product-focused software development could involve several areas within
and between the strategy, architecture and organising dimensions. Moreover, novel
integrative activities are needed for eliminating disconnects and for balancing speed
and stability (e.g., feature-driven development and managing cumulating technical
debt). Although these are initial ideas on how to organise continuous experimentation
in software development, rigorous processes are needed between the customer and the

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

659

supplier. Our case study indicated that opportunities for using the CSE approach in
product development are often context-sensitive (e.g., customer and product dependent). Moreover, customer motivation and ecosystem support for CSE are important.
Although the CSE approach is mostly used for website development, more systematic
use of CSE could enhance the competitiveness of product-oriented companies. However, more prescriptive models and best practices are clearly needed to describe how
CSE should be implemented in product-oriented software development.
Acknowledgments. This work was supported by TEKES as part of the Need for Speed Project
(http://www.n4s.ﬁ/) of DIMECC (Digital, Internet, Materials & Engineering Co-Creation).

References
1. Bosch, J.: Continuous Software Engineering: An Introduction. In: Bosch, J. (ed.) Continuous
Software Engineering, pp. 3–13. Springer, Switzerland (2014)
2. Ries, E.: The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to
Create Radically Successful Businesses (2011)
3. Fitzgerald, B., Stol, K.J.: Continuous software engineering: a roadmap and agenda. J. Syst.
Softw. (2015). doi:10.1016/j.jss.2015.06.063
4. Humble, J., Farley, D.: Continuous Delivery: Reliable Software Releases through Build,
Test, and Deployment Automation (2010)
5. Fagerholm, F., Guinea, A.S., Mäenpää, H., Münch, J.: The RIGHT model for continuous
experimentation. J. Syst. Softw. (2016). doi:10.1016/j.jss.2016.03.034
6. Rahman, A.A.U., Helms, E., Williams, L., Parnin, C.: Synthesizing continuous deployment
practices used in software development. In: 2015 Agile Conference, pp. 1–10. IEEE (2015)
7. Kohavi, R., Henne, R.M., Sommerﬁeld, D.: Practical guide to controlled experiments on the
web. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining - KDD 2007, p. 959. ACM Press, New York (2007)
8. Leppänen, M., Mäkinen, S., Pagels, M., Eloranta, V.-P., Itkonen, J., Mäntylä, M.V.,
Männistö, T.: The highways and country roads to continuous deployment. IEEE Softw. 32,
64–72 (2015)
9. Lindgren, E., Münch, J.: Software development as an experiment system: a qualitative
survey on the state of the practice. In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP
2015. LNBIP, vol. 212, pp. 117–128. Springer, Heidelberg (2015). doi:10.1007/978-3-31918612-2_10
10. Rissanen, O., Münch, J.: Transitioning towards continuous delivery in the B2B domain: a
case study. In: Lassenius, C., Dingsøyr, T., Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212,
pp. 154–165. Springer, Heidelberg (2015). doi:10.1007/978-3-319-18612-2_13
11. Lwakatare, L.E., Karvonen, T., Sauvola, T., Kuvaja, P., Olsson, H.H., Bosch, J., Oivo, M.:
Towards DevOps in the embedded systems domain: why is it so hard? In: 2016 49th Hawaii
International Conference on System Sciences (HICSS), pp. 5437–5446. IEEE (2016)
12. Adams, B., Bellomo, S., Bird, C., Marshall-Keim, T., Khomh, F., Moir, K.: The practice and
future of release engineering: a roundtable with three release engineers. IEEE Softw. 32,
42–49 (2015)
13. Bellomo, S., Nord, R.L., Ozkaya, I.: A study of enabling factors for rapid ﬁelding combined
practices to balance speed and stability. In: Proceedings of the International Conference on
Software Engineering, pp. 982–991 (2013)

660

T. Karvonen et al.

14. Bosch, J. (ed.): Continuous Software Engineering. Springer, Switzerland (2014)
15. Poppendieck, M., Poppendieck, T.: Implementing Lean Software Development: From
Concept to Cash (2006)
16. Van Der Linden, F., Bosch, J., Kamsties, E., Känsälä, K., Obbink, H.: Software product
family evaluation. In: Nord, R.L. (ed.) SPLC 2004. LNCS, vol. 3154, pp. 110–129.
Springer, Heidelberg (2004). doi:10.1007/978-3-540-28630-1_7
17. Bosch, J., Bosch-Sijtsema, P.: ESAO: a holistic ecosystem-driven analysis model. In:
Lassenius, C., Smolander, K. (eds.) ICSOB 2014. LNBIP, vol. 182, pp. 179–193. Springer,
Heidelberg (2014). doi:10.1007/978-3-319-08738-2_13
18. Olsson, H.H., Bosch, J.: Climbing the stairway to heaven: evolving from agile development
to continuous deployment of software. In: Bosch, J. (ed.) Continuous Software Engineering,
pp. 15–27. Springer, Switzerland (2014)
19. Bosch, J.: Building products as innovation experiment systems. In: Cusumano, M.A., Iyer,
B., Venkatraman, N. (eds.) ICSOB 2012. LNBIP, vol. 114, pp. 27–39. Springer, Heidelberg
(2012). doi:10.1007/978-3-642-30746-1_3
20. Overby, E., Bharadwaj, A., Sambamurthy, V.: Enterprise agility and the enabling role of
information technology. Eur. J. Inf. Syst. 15, 120–131 (2006)
21. Beck, K., Beedle, M., Van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M.,
Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick, B., Martin, R.C., Mellor,
S., Schwaber, K., Sutherland, J., Thomas, D.: Agile Manifesto. http://agilemanifesto.org/
22. Ruhe, G.: Product Release Planning Methods, Tools and Applications. Auerback
Publications, Taylor and Francis Group, LLC (2010)
23. Schwaber, K., Beedle, M.: Agile Software Development with Scrum (2001)
24. Hutzschenreuter, T.: Strategy-process research: what have we learned and what is still to be
explored. J. Manage. 32, 673–720 (2006)
25. Tsoukas, H., Chia, R.: Philosophy and Organization Theory. Emerald Group Publishing
Limited (2011)
26. Vaara, E., Kleymann, B., Seristo, H.: Strategies as discursive constructions: the case of
airline alliances. J. Manag. Stud. 41, 1–35 (2004)
27. MacKay, R.B., Chia, R.: Choice, chance, and unintended consequences in strategic change:
a process understanding of the rise and fall of NorthCo Automotive. Acad. Manag. J. 56,
208–230 (2012)
28. Sandberg, J., Tsoukas, H.: Grasping the logic of practice: theorizing through practical
rationality. Acad. Manag. Rev. 36, 338–360 (2011)
29. Chia, R.: A “Rhizomic” model of organizational change and transformation: perspective
from a metaphysics of change. Br. J. Manag. 10, 209–227 (1999)
30. Bryson, J.M.: Strategic Planning for Public and Nonproﬁt Organizations: A Guide to
Strengthening and Sustaining Organizational Achievement (2011)
31. Bogsnes, B.: Implementing Beyond Budgeting: Unlocking the Performance Potential (2008)
32. Nordqvist, M., Melin, L.: The promise of the strategy as practice perspective for family
business strategy research. J. Fam. Bus. Strateg. 1, 15–25 (2010)
33. te Brömmelstroet, M.: Performance of planning support systems. Comput. Environ. Urban
Syst. 41, 299–308 (2013)
34. Mintzberg, H.: The Rise and Fall of Strategic Planning (2000)
35. Whittington, R., Cailluet, L.: The crafts of strategy. Long Range Plann. 41, 241–247 (2008)
36. Balogun, J., Huff, A.S., Johnson, P.: Three responses to the methodological challenges of
studying strategizing. J. Manag. Stud. 40, 197–224 (2003)
37. Eisenhardt, K.M., Brown, S.L.: Competing on the edge: strategy as structured chaos. Long
Range Plann. 31, 786–789 (1998)

The CRUSOE Framework: A Holistic Approach to Analysing Prerequisites for CSE

661

38. Mavengere, N.B.: Information technology role in supply chain’s strategic agility. Int. J. Agil.
Syst. Manag. 6, 7–24 (2013)
39. Koenigsaecker, G.: Leading the Lean Enterprise Transformation (2009)
40. Suomalainen, T.: Deﬁning continuous planning through a multiple-case study. In:
Abrahamsson, P., Corral, L., Oivo, M., Russo, B. (eds.) PROFES 2015. LNCS, vol.
9459, pp. 288–294. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26844-6_21
41. Suomalainen, T., Kuusela, R., Tihinen, M.: Continuous planning: an important aspect of
agile and lean development. Int. J. Agil. Syst. Manag. 8, 132 (2015)
42. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in
software engineering. Empir. Softw. Eng. 14, 131–164 (2008)
43. QRS International: NVivo (2016). http://www.qsrinternational.com/
44. Google: Android. https://www.android.com/
45. Gartner: Gartner Says Worldwide Smartphone Sales Grew 9.7 Percent in Fourth Quarter of
2015. http://www.gartner.com/newsroom/id/3215217
46. Hoffman, T.L.: Over-the-air programming of wireless terminal features (2003). https://www.
google.com/patents/US6622017
47. Olsson, H.H., Bosch, J.: Towards agile and beyond: an empirical account on the challenges
involved when advancing software development practices. In: Cantone, G., Marchesi, M.
(eds.) XP 2014. LNBIP, vol. 179, pp. 327–335. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-06862-6_27

Software Development in the Post-PC Era:
Towards Software Development as a Service
Sami Alajrami1(B) , Alexander Romanovsky1 , and Barbara Gallina2
1

Newcastle University, Newcastle upon Tyne, UK
{s.h.alajrami,alexendar.romanovsky}@newcastle.ac.uk
2
Mälardalen University, Västerås, Sweden
barbara.gallina@mdh.se

Abstract. Over the years, software development has evolved to meet
the needs of new types of applications and to embrace new technological
disruptions. Today, we witness the rise of mobility where the role of the
conventional high-end PC is declining. Some refer to this era as the PostPC era. This technological shift, powered by a key enabling technology,
cloud computing, has opened new opportunities for human advancement.
Consequently, the evolving landscape of software systems drives the need
for new methods for conceiving them. Such methods need to: (a) address
the challenges and requirements of this era and (b) embrace the benefits
of new technological breakthroughs. In this paper, we list the characteristics of the Post-PC era from the software development perspective
and describe two motivating trends of software development processes.
Then, we derive a list of requirements for the future software development from the characteristics of the Post-PC era and from the motivating
trends. Finally, we propose a reference architecture for cloud-based software process enactment as an enabler for Software Development as a
Service. The architecture is the first step addressing the needs that we
have identified.

Keywords: Software Development
ment · Clouds

1

·

Post-PC Era

·

Process Enact-

Introduction

Software systems are playing a critical role in modern society. Many aspects of
our lives (e.g. transport and health care) are dependent on software. In a way,
software is smartifying our lives through the smart X trend (phones, watches,
glasses, cars, grids and cities). The list goes on leading to a smart society where
every aspect of the society is connected to, inﬂuenced by, and dependent on
software. Although, this helps addressing several societal challenges, it comes
with the cost of increased software complexity. This complexity is then reﬂected
on the way software is conceived where the expectations of quality, reliability,
security, safety and fast delivery are higher than ever.
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 662–671, 2016.
DOI: 10.1007/978-3-319-49094-6 53

Software Development in the Post-PC Era

663

Driven by challenges and opportunities, software development will continue
to evolve to address the smart society needs and beyond. For example, the
Internet has made Global Software Engineering (GSE) possible while economical
factors and market needs have motivated the rise of new development paradigms.
As Maximilien and Campos point out [10], we are entering the Post-PC era.
This era is characterized by the increasing mobility and connectivity of people
and devices, and the use of the Internet as a computing delivery medium. The
role of the traditional personal computers (high-speciﬁcation desktops) is gradually declining. Personal computers are becoming mobile and low-speciﬁcation
devices. Users can use any Internet-connected low-speciﬁcation device to perform their tasks on powerful computing resources delivered over the Internet
(using tools which are delivered as services). With this mobility, the relevance of
OSs/platforms becomes less [7] as many software applications are oﬀered in an
OS/platform neutral fashion (e.g. services or HTML5). Cloud computing provides the enabling computing infrastructure on demand for such applications.
Accordingly, the way software is conceived needs to adapt to the rising Post-PC era. Software development is a complex socio-technical process
which involves multiple stakeholders. Development teams use a wide range of
tools/platforms for development, testing, deployment and operation of software.
Some of these tools are already oﬀered through the Internet (e.g. Eclipse Orion1 ).
This paradigm is often referred to as Tools as a Service (TaaS). TaaS, however,
overlooks the organizational aspects of the process. Therefore, there is a need
for Software Development as a Service (SDaaS) which uses the cloud to support
modelling, managing and enacting software processes in a model-driven fashion. SDaaS can utilize cloud as an execution and distribution medium where
tools are oﬀered as services and orchestrated in workﬂows. Development environments will be created on the ﬂy and scaled as needed. Engineers will be able
to do their work on-the-go from anywhere. Furthermore, modelling and monitoring the process itself will integrate the organizational and management aspects
into the development environment.
In this paper, we propose a reference architecture for cloud-based software
process enactment as an enabler for Software Development as a Service (SDaaS).
This architecture brings the beneﬁts of clouds and modelling to support development processes. We describe two industry-inspired development trends from the
two themes: Continuous Delivery and Global Software Engineering. We highlight the impact of the Post-PC era on software development and identify the
requirements of software development in that era. Based on these requirements,
we design the proposed SDaaS architecture.

2

Motivating Trends

In this section, we list and discuss two industry-inspired motivating trends which
describe diﬀerent development/business needs a modern software vendor is facing. For each trend, we discuss its impact on software development.
1

https://orionhub.org/.

664

2.1

S. Alajrami et al.

Continuous Delivery

Continuous Delivery [9] has become a trendy software development paradigm
along with DevOps. Together, they aim at bridging the gaps between development and operations teams and automate the build-test-deploy-release cycle.
The motivation is to achieve frequent releases, reduce conﬂicts and therefore,
reduce cost. To achieve such automation, teams should follow certain practices
and use supporting tools/platforms. Humble and Farley [9] set the principles
and technical practices for successful implementation of Continuous Delivery. We
use Facebook’s deployment pipeline [6] as an example of a Continuous Delivery
process for large projects. Facebook is an example of a complex software that
requires rapid innovation and release of new features. As shown in Fig. 1, the
release cycle for each new feature starts by engineers coding a new feature or a
bug ﬁx. The code is then reviewed by a diﬀerent engineer using the Phabricator
code review tool. Tools such as distributed source control and automated testing
packages are used. The code is released on stages: ﬁrst it is released to internal
employees to test it and is also tested for performance issues using Preflab. Then
(after ﬁxing any discovered issues), it is released to a small portion of users using
the Gatekeeper tool. Only after these stages have passed successfully, the new
feature would be released to all users.
Discussion. Facebook is delivered through the Internet and changes and new
features are continuously pushed to users transparently. This means that developers will be committing and integrating code very often (sometimes on daily
basis). The beneﬁts of such frequency includes maintaining a bug-free code base
and easier bug ﬁxing (since searching for bugs is limited to last pushed code).
Automation and repeatability of the software build-test-deployment-release are
a key enabling factor to Continuous Delivery. To pick up the fruits of Continuous
Delivery, the social/organizational aspect must be considered. For example, if
developers do not commit their code regularly, the Continuous Delivery chain
is broken. Therefore, there is a need for convergence and monitoring support to
ensure certain processes and practices are followed.
2.2

Software Outsourcing

The Post-PC era is also a globalized era. Software development outsourcing
was driven by business and economic factors (e.g. exploiting low-cost developers

Fig. 1. Facebook’s deployment pipeline [6].

Software Development in the Post-PC Era

665

and reducing the time-to-market). In addition, companies tend to outsource the
tasks that they lack the skills or expertise to perform. Outsourcing can take place
either within the same organization (intra-organization) or across organizations
(inter-organization).
This example is inspired by the railway system development. In this scenario, there are two companies cooperating on system development. Company
A is a contractor that runs large industrial projects for designing/redesigning
railway networks. Among various tools the company uses a number of simulation tools to visualise and analyse the systems it is building, to debug them,
to check their characteristics (such as throughput, energy consumption, performance and capacity). During such projects company A develops a wide range
of models, diagrams, documents and blueprints that will be used for building
the network. As part of this work, company A needs to develop a safe signalling
software to operate the network by following a stringent software process. To
ensure the system safety, company A would like to use industry-strength formal technologies. Company A does not have expertise in conducting large-scale
formal veriﬁcation of complex systems so it decides to outsource this work to
small independent company B that has the right skill set. Conducting this type
of veriﬁcation is the main business of company B. The artefacts to be used by
company B include layouts, infrastructure data, service patterns, timetables and
control tables. Due to the conﬁdential nature of these artefacts, company B signs
a non disclosure agreement and a Service Level Agreement (SLA) with company
A and as a precaution, it undertakes all its processes on a private infrastructure.
Both companies (A and B) only exchange relevant artefacts and do not know
each other’s internal processes.
Discussion. In reality, large scale projects may include intra and interorganization outsourcing with other teams/partners. Management of such
projects can be tedious and consumes enormous resources (time and eﬀort) to
monitor and synchronize the diﬀerent outsourced sub-projects. Several issues
may arise. Small issues such as using diﬀerent tool versions by diﬀerent teams
may easily go unnoticed till a late stage of the project at which it will become
very costly to ﬁx. Other concerns include how to ensure the quality of the outsourced tasks and how to monitor that they have been performed according
to SLAs. Process-state-awareness and communication is vital for the success of
such distributed development projects [7]. Therefore, there is a need for eﬃcient
management and monitoring of such projects.

3

Characteristics of the Post-PC Era

The term Post-PC era was used to describe the fall of PC sales due to the rise of
mobile devices. When David Clark used the term for the ﬁrst time in a talk called
“The Post-PC Internet” in 1999, he predicted that the future will be “inevitably

666

S. Alajrami et al.

heterogeneous” and “a network full of services”2 . Today, we can see this prophecy
taking place in the form of heterogeneous mobile devices and services while PCs
are becoming more portable and low-speciﬁcation. The technology shift in this
era is enabled by cloud computing technology and the Internet. This shift has
changed the way users access and interact with technology. We categorize the
characteristics of the Post-PC era into two categories: (a) technical and (b)
organizational.
3.1

Technical Characteristics

The Rise of Mobility. Over the past few years, mobile devices have been shaking the dominance of PCs. Users use mobile devices for many daily activities.
This has enabled new business models and new software distribution platforms
(e.g. app stores) [7]. Consequently, users have become more mobile and have
adopted new interaction patterns for interacting with technology (e.g. touch
and voice). This increasing mobility impacts software development in two ways:
one impacts the produced mobile software (e.g. to have less power consumption) and the other impacts the development process itself. The new interaction
paradigms that came with mobile devices have driven new works on unconventional development methods. Microsoft TouchDevelop [4] platform enables
programming on the go using only mobile phone touch screens. Another trend is
using voice recognition to input code3 . The Cloud as the Development and
Operation Platform. Mobile devices have limited computing power. To overcome this challenge, mobile applications delegate the processing and storage to
cloud platforms over the Internet. Cloud computing allows acquiring computing
resources on the ﬂy and on a pay-as-you-go pricing model. This paradigm has
enabled Software, Platform (hardware, OS, etc.) and Infrastructure to be oﬀered
as services over the Internet. Consequently, software development is increasingly
relying on Internet services which enable collaboration and integration between
development teams (e.g. Github4 ). Open source software and crowdsourcing are
examples of how the Internet (powered by the cloud) enables collaborative development. In addition, many software systems are now built by aggregating other
services from the Internet. Cloud is becoming the development and the operation
environment for software. This trend raises the need for alternative methods and
technologies to conceive, design, implement, test, deploy and evolve software [7].
3.2

Organizational (Business) Characteristics

On Demand Infrastructure and Tools Acquisition. With cloud and
services, traditional software distribution models have changed. Desktop
clients are being changed to cloud-based tools and mobile applications.
2
3
4

http://www.nytimes.com/1999/04/18/business/economic-view-is-mr-gates-pouringfuel-on-his-rivals-fire.html.
https://www.youtube.com/watch?v=8SkdfdXWYaI.
https://github.com/.

Software Development in the Post-PC Era

667

Computing infrastructure is now only acquired and scaled up/down as needed.
Along with this shift, pricing models have also changed from the desktop
client licence model to in-app purchases and pay-as-you-go models. Globalized
Development. As mentioned earlier, the Post-PC era is driving the development and operation to take place in the cloud. This has facilitated undertaking global software development projects. Software development outsourcing
helps reducing costs and development time, but also introduces management
challenges to overcome spatial, cultural and geographical distances in order to
ensure the quality of the product and eﬀective communication between development teams. Dissolving Boundaries. The Internet has made geographical
boundaries within or between companies disappear. In addition, team boundaries are also fading [7]. Design, development, testing and operations are no
longer isolated tasks. Trendy development paradigms such as DevOps calls for
tight collaboration and integration across these tasks.

4

Software Development in the Post-PC Era

The characteristics listed in the previous section aﬀect how software development
is going to be conducted in the near future and raises the need for new methods
and tools for software development. Here, we list a non-exhaustive list of requirements (derived from Sects. 2 and 3) for the next software development environments: Process Monitoring & Management. Regardless of which process
model you use, the need for process visualization, monitoring the process status
and detecting/predicting problems and deviations becomes vital. Considering
the outsourcing scenario in Sect. 2, visual models of the process would ease communication and understanding between distributed teams. Process monitoring
and status checking would help project managers to identify bottlenecks in the
process; Tools as a Service. The process models contain the tools needed to
support the process. To achieve executability of models, the required tools should
be available as a service over the Internet. While some tools can be automated,
others can be interactive. Interactive tools should provide interaction patterns
over the Internet. Consistency of tool versions used by distributed teams for
development and production is vital. As Humble and Farley [9] demonstrate
(using their experience from real-world projects), using diﬀerent versions of the
same package by collaborating teams could create very costly problems; Provenance, Governance & SLA monitoring. Software development is a humancentric process and when the involved humans are distributed within the same
or across diﬀerent companies, eﬀective management becomes essential. As mentioned earlier, process monitoring and consistency checks are important, but they
are not enough. Data about the process, its enactment environment, the tools
used, the stakeholders involved and the artefacts produced/consumed should
be logged. Such data can be useful for process improvement and accountability. Moreover, when multiple companies are involved in a project, the processes
followed by both parties should comply with the agreed SLA. Therefore, there
is a need for SLA monitoring to assist the management of such collaborative

668

S. Alajrami et al.

projects and ensure all parties are compliant; Artefacts Management. Artefacts are tightly related to the previous needs and process models are artefacts
themselves. Therefore, artefacts should be managed and stored eﬀectively. They
should be accessible from anywhere and available at any time. Changes made
to them should be tracked and diﬀerent versions of an evolving artefact should
be kept; Automation. The question about how much automation one can have
in a software process is important. The answer is indeed, a limited portion.
However, automation when possible is beneﬁcial. Repetitive tasks such as the
build-test-deploy-release cycle are error-prone and their automation can prevent
errors and save time. Non-interactive tasks (e.g. testing or model checking) can
be automated. Furthermore, automated background service can be run to check
consistency and compliance and monitor SLAs.

5

Reference Architecture for Enabling SDaaS

Aggregating the previous needs leads to Software Development as a Service
(SDaaS). SDaaS provides tools for modelling, enacting and managing software
processes. It enables orchestrating tools on the ﬂy as services and manage and
store artefacts in the cloud. In addition, it enables utilizing the scalable cloud
resources to run automated processes and meet the needs of computing-intensive
tasks (e.g. code analysis and testing). In this section, we propose a reference
architecture for SDaaS. The architecture is model-driven where processes are
modelled and enacted as workﬂows.
The architecture complies with the Workﬂow Management Coalition
(WfMC) reference model [8] and is designed as a service. It consists of three
main components: (a) The modelling and management interface is oﬀered
as Software as a Service (SaaS) and allows distributed teams to access, model,
enact and manage processes. (b) The enactment service is oﬀered as a Platform as a Service (PaaS) and handles the instantiation, enactment and monitoring of process models. And (c) Workflow Engines are deployed in a set of
hybrid clouds and enact the individual workﬂow tasks/activities.
5.1

Process Modelling (Build Time)

Software processes consist of a set of diﬀerent types of (e.g. interactive or automated) activities, which are to be enacted by diﬀerent stakeholders with different enactment requirements (e.g. privacy, computing power). These process
details need to be captured. Software & Systems Process Engineering Metamodel (SPEM2.0) [11] is the Object Management Group (OMG) standard for
modelling software processes. SPEM2.0 lacks explicit support for expressing
cloud-based process enactment and control ﬂow semantics. Consequently, we
proposed EXE-SPEM [2] which is an extension of SPEM2.0 for cloud-based
enactment. Software process models modelled in EXE-SPEM can be mapped to
an executable XML notation.

Software Development in the Post-PC Era

669

Figure 2 shows the software process build time components which are packed
as a SaaS solution. The Model Authoring module allows constructing process
models using EXE-SPEM constructs. The Access & Sync. Service applies
access management policies and ensures the consistency of models that are being
authored by distributed teams simultaneously. This module also notiﬁes collaborators when a model is changed/updated. Once the model is authored, the
Model Storage Service allows saving/retrieving the model into the cloudbased repository through the enactment service API. Finally, models can be
transformed into the executable XML notation from EXE-SPEM using the
Model Transformations module.

Fig. 2. Detailed architecture for the Software Development as a Service (SDaaS) platform.

5.2

The Enactment Service (Runtime)

The enactment service has an API to interact with the process modelling service. This way, modelling can be done from SaaS or a plug-in for a legacy desktop client. Behind the API, the service is responsible for the runtime instantiation and execution of process models. To do this, the service consists of several
modules as illustrated in Fig. 2. These modules are: The REST API provides endpoints for process enactment and monitoring and artefacts storage and
retrieval; The Artefacts Manager stores the artefacts and meta-data about
them into the artefacts repository. Software processes involve producing large
number of artefacts such as: code, models and documentation. These artefacts
capture invaluable information about both the software process and product evolution. The artefact meta-data includes: actors involved, version, tools used and
the date and time the artefact was created/modiﬁed; The External Tools are
service blocks performing the process activities. These blocks are either: interactive, control points (providing control ﬂow during the process execution) or
automated ﬁre-and-forget activities. This module provides the necessary information on these activities when needed for process execution; The Execution

670

S. Alajrami et al.

Manager orchestrates the enactment of process models. First, an instance of
the model is created and the ready-to-execute activities are passed to the scheduler. The scheduled activities are then executed on workﬂow engines. During
the execution of the process, the execution manager tracks of the status of the
process instance being executed. This module also logs all the provenance data
about each process instance execution; The Workflow Engines Registry is
responsible for starting, stopping and monitoring workﬂow engines based on the
activities scheduling policies used by the scheduler. Workﬂow engines are independent applications running on diﬀerent cloud providers. Activities get executed in a workﬂow engine that is deployed on a public or private cloud. The
workﬂow engine has to meet the execution requirements expressed in the process
model. The execution of activities is a black-box execution which means that the
workﬂow engine would not know any information about the process being executed. This reduces the risks of privacy and conﬁdentiality breaches. In order to
decouple the enactment service from the workﬂow engines, asynchronous communication between them is achieved through message oriented middleware; The
Scheduler handles the planning of process execution. This involves checking
the needed resources (from the process model). The scheduler should operate
using a policy to meet the enactment requirements (e.g. enacting an activity
on a private cloud) while minimizing the cost. Several cloud-based workﬂow
scheduling algorithms exist and can be used (e.g. [1]). The schedules generated
by the scheduler determine the expected load of execution and is used by the
workﬂow engines registry to dynamically scale the number of workﬂow engines;
The Consistency Checker automatically checks the process consistency during its execution which can alleviate problems early and save time and cost (as
explained in Sect. 4). Discussion of consistency checking techniques is beyond the
scope of this paper; The SLA monitor transparently ensures that all parties
collaborating on a project are not breaching the agreed SLA (as explained in
the software outsourcing scenario in Sect. 2). While each organization can have
its own SDaaS environment, these environments can exchange data about the
process state and execution using the External Workﬂow Collaboration module;
Finally, The External Workflow Collaboration allows process execution to
incorporate invoking processes managed by another workﬂow system (e.g. from
a diﬀerent company).

6

Conclusion

The Post-PC era is here and software is embedded in almost every aspect of our
daily life. Software systems have evolved but the way they are conceived still
needs to be rethought to adapt to the new era’s challenges and to embrace its
technological breakthroughs.
In this paper, we have described the characteristics of the new era and its
impact on software development. We also proposed the SDaaS reference architecture for supporting software processes enactment. To become a reality, this
proposal requires tools to be oﬀered as services. We have developed a prototype

Software Development in the Post-PC Era

671

of the proposed architecture consisting of an enactment engine that executes
software processes, a number of oﬀ-the-shelf tools deployed as services in our
tool repository and an artefact store. The prototype was used to enact a safetyrelated process [3] and a number of veriﬁcation/modelling processes. Our ongoing
work focuses on implementing larger and more complex processes and evaluating the architecture proposed. In a longer run we aim at creating a community
of developers extending the architecture and applying it for the development of
complex software systems.
Additionally, Empirical studies are needed to study the eﬀects of this proposal
on the organizational, technical and economical aspects of software development
processes. Furthermore, the eﬀect on diﬀerent development process models (e.g.
Agile) also needs to be analysed and benchmarked. Usability studies can determine the eﬀects this approach may have on individual developers, managers and
other stakeholders. Indeed, as Fred Brooks put it, “There is no silver bullet” and
we can only eliminate accidental diﬃculties in software development. Inherent
diﬃculties will continue to exist as software and its development evolve [5].

References
1. Abramson, D., Lees, M., Krzhizhanovskaya, V., Dongarra, J., Sloot, P.M., Wang,
J., Korambath, P., Altintas, I., Davis, J., Crawl, D.: Workflow as a service in the
cloud: architecture and scheduling algorithms. Procedia Comput. Sci. 29, 546–556
(2014)
2. Alajrami, S., Gallina, B., Romanovsky, A.: EXE-SPEM: towards cloud-based executable software process models. In: Proceedings of the 4th International Conference on Model-Driven Engineering and Software Development, MODELWARD
2016, pp. 517–526 (2016)
3. Alajrami, S., Gallina, B., Sljivo, I., Romanovsky, A., Isberg, P.: Towards cloudbased enactment of safety-related processes. In: Skavhaug, A., Guiochet, J., Bitsch,
F. (eds.) SAFECOMP 2016. LNCS, vol. 9922, pp. 309–321. Springer, Heidelberg
(2016). doi:10.1007/978-3-319-45477-1 24
4. Ball, T., Burckhardt, S., de Halleux, J., Moskal, M., Tillmann, N.: Beyond open
source: the TouchDevelop cloud-based integrated development environment. Technical report MSR-TR-2014-127, Microsoft Research, September 2014
5. Brooks, F.P.: No silver bullet: essence and accidents of software engineering. IEEE
Comput. 20, 10–19 (1987)
6. Feitelson, D., Frachtenberg, E., Beck, K.: Development and deployment at facebook. IEEE Internet Comput. 17(4), 8–17 (2013)
7. Fuggetta, A., Di Nitto, E.: Software process. In: Proceedings of the on Future of
Software Engineering, pp. 1–12. FOSE, ACM (2014)
8. Hollingsworth, D.: Workflow Reference Model. No. TC00-1003, Workflow Management Coalition (WfMC), January 1995
9. Humble, J., Farley, D.: Continuous Delivery: Reliable Software Releases Through
Build, Test, and Deployment Automation, 1st edn. Addison-Wesley Professional,
Boston (2010)
10. Maximilien, E.M., Campos, P.: Facts, trends and challenges in modern software
development. Int. J. Agil. Extrem. Softw. Dev. 1(1), 1–5 (2012)
11. OMG: Software and Systems Process Engineering Meta-Model Specification, V2.0,
April 2008

Invited Papers

The Origins of Design Thinking and the Relevance
in Software Innovations
Matilde Bisballe Jensen ✉ , Federico Lozano, and Martin Steinert
(

)

Department of Engineering Design and Materials, NTNU, Richard Birkelandsvei 2B,
7491 Trondheim, Norway
{matilde.jensen,federico.lozano,martin.steinert}@ntnu.no

Abstract. This paper argues that the methods used in the trending buzzword
Design Thinking have deeper roots and bigger application potential, beyond
product development IDEO and the Stanford University d.school style. The
conscious combination of these Design Thinking methods and rapid iteration
sessions is also of value when deploying it to software development. It is a
powerful approach for requirement discovery and hence becomes relevant when
developing novel solutions. This argument is supported by the case of SAP
AppHaus and their experience on implementing the Design Thinking process for
HANA related software development. Here Design Thinking forces to holistically
explore a solution space with the customer, but also to bring diﬀerent internal
disciplines together early. Hence anybody who is interested in software innova‐
tions might want to consider the core ideas behind Design Thinking.
Keywords: Design thinking · Software development · SAP case study · Product
discovery

1

Introduction

This paper provides an introduction and justiﬁcation of the activities behind Design
Thinking. Although the “buzzword-factor” of Design Thinking has increased in recent
decades1 the methods applied origins from older academic research ﬁelds such as the
design research group of Stanford University, Social Science, and Mechanical Engi‐
neering. Hence it is more than a brief management trend praising post-its, pipe cleaners
and play dough, but in combination a powerful method for developing radical innova‐
tions. First we present the origins of the most domesticated Design Thinking Rules:
Empathize; Deﬁne; Ideate; Prototype and Test. Moreover, we provide the reader with
insights from Silicon Valley based SAP AppHaus and argue why any Software Inno‐
vation could beneﬁt from the Design Thinking process.

1

According Google Trends the number of google searches on Design Thinking is 10 times higher
today than 10 years ago.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 675–678, 2016.
DOI: 10.1007/978-3-319-49094-6_54

676

2

M.B. Jensen et al.

The Research Domains Behind Design Thinking

The d.school Bootcamp Bootleg describes 5 essential activities covering the process of
Design Thinking; Empathize, Deﬁne, Ideate, Prototype, Test. Below we brieﬂy describe
the origins of these activities.
Empathize and Deﬁne. In Design Thinking you are to empathize with not only the enduser, but also any stakeholder of you product. You are looking for relevant pain points
to address in your solution. One should be able to put him/her-self in the place of a
stakeholder and understand their motivations and frustrations. This demand for under‐
standing and empathy are traditional skills required in the ﬁeld of anthropology and
sociology. Actor Network Theory was introduced in the early 80’s and argues the value
of seeing problems in the context of actor-networks and identifying the misalignments
in this network (Callon 1984, 1986; Latour 1996). This is the basic idea behind need
ﬁnding and the holistic approaches taught in Design Thinking.
Ideate. Ideation originates from the ﬁeld of creativity research and the ability to
consciously focus on either divergent or convergent processes (Runco and Okuda
1988). Ideation covers developing several solutions to an initial challenge. It is not
enough to stop at the ﬁrst idea at hand, but instead one have to explore the solution space.
Onarheim and Biskjaer (2015) describe ideation as much more than just a eureka
moment. They argue that one of the most important skills when ideating are to be aware
of your creativity constraints that are surrounding the context and might you constrain
you.
Prototype and Test. Prototypes make ideas tangible. This changes the design dialogue
from abstract to concrete. Bringing in a physical object to talk about is the principle of
boundary objects introduced by sociologist Star and Griesemer (1989). They argue the
value of bringing stakeholders together around a boundary object opening up for view‐
points across disciplines and interests.
Moreover in Design Thinking prototyping are a strategy to learn. In this way it relates
to active learning or action research, where you built representations or small experi‐
ments to challenge and test context hypotheses (Berg 2004; Bonwell and Eison 1991;
Cameron 2009). Hence you get feedback on your ideas very early on in the project. The
process of testing speciﬁc ideas as hypotheses origins from the scientiﬁc method it self
with Karl Popper as the main spokes person (Popper 1959).

3

The Case of SAP AppHaus

Agreeably Agile Development and the Design Thinking process are similar. Both favor
user involvement, rapid prototyping and testing. Yet Design Thinking is for discovery
hence requirement deﬁning. Agile development methods take a starting point in coding
for semi-well-deﬁned requirements. Hence the two methods compliment each other Design Thinking ﬁtting the early stage of product discovery followed by Agile processes
when the right “it” has been deﬁned. To illustrate this claim, the case of AppHaus Silicon

The Origins of Design Thinking and the Relevance

677

Valley is being presented. The insights were kindly provided through a semi-structured
interview with Design & Innovation Executive Philipp Skogstad from SAP AppHaus.
Design Thinking for Product Discovery, Early Co-Collaboration and Business
Strategy Considerations
At SAP Design Thinking is a core nominator for the early stage product development.
The core values of implementing design thinking have been to allow rapid iterative
development involving the three-dimensional product core; customer, technology and
business. This allows much faster to reframe the actual problem of the customer. SAP’s
beneﬁt of Design Thinking is to gain the holistic overview of a problem. Hence a project
always starts with a discovery phase conducting research on customers, end-user as well
as current technological solutions. In order not to start coding immediately several future
interactions are physically prototyped through role-plays and paper wireframes. SAP
even developed the tool Scenes that allow a tangible dialogue among customers, devel‐
opers and designers. Through story-boards and scenes the dialogue reﬂect future
scenarios rather than detailed speciﬁcations2.
“One of the core values when implementing Design Thinking in our development process was
actually not only to bring in external stakeholders early on in the process, but actually to get
programmers and UI designers to meet earlier as well. This improves the understanding of
speciﬁc design choices and to what degree certain solutions are ﬂexible. This saves us a lot of
time.” Philipp Skogstad SAP

Involving customers early on in the development inﬂuenced the communication
topics as well. Helping the customers seeing the bigger context of their products trans‐
formed the dialogue from being product to strategy or underlying problem focused.
“First you might see yourself having a meeting with the people from product development. Next
time you ﬁnd your self being brought to business strategy meetings.” Philipp Skogstad, SAP

This indicates that Design Thinking is more than a process to develop innovative
products, but a mindset that allow you to explore and foresee uncertainties as well. This
is supported by recent scholars arguing that design thinking covers a much broader ﬁeld
than solely product development (Cooper et al. 2009).
Implementing Design Thinking
SAP made strategic eﬀorts to implement the exploring and testing mindset of Design
Thinking. Their core initiative covers the so-called three P’s: People, Process and Place.
People means making sure you are working in an interdisciplinary environment. Process
means having a conscious knowledge on which methods to use when and being able to
switch between methods taken from Design Thinking as well as Agile Development
processes. In SAP Design Thinking is hence not something to stand alone, but it comple‐
mented supported by process such as Agile Development. Place covers having a phys‐
ical workspace that creates and atmosphere of openness to idea sharing, experimenting
and prototyping. At SAP they aimed at creating an art studio rather than a gallery.
2

https://experience.sap.com/designservices/scenes.

678

M.B. Jensen et al.

“Many companies fail at creating such a space and end up with fancy chairs etc. like an art
gallery. We seek to create the feeling of an art studio where you are allowed to test, experiment
and get things dirty,” Philipp Skogstad, SAP

4

Conclusion

In this paper we describe the original research ﬁelds behind the methods applied in the
Design Thinking Process in order to remove the term from a brief buzzword to a credible
innovation method. We argue that though these methods are not new, the favor of rapid
iteration and stakeholder involvement supports the exploratory mindset any innovator
needs in the early stages of product development. They are powerful in deﬁning the
product requirement that in term form the foundation for applying agile development
methods. This description is supported by the case of Silicon Valley based AppHous
that in their daily work successfully apply Design Thinking methods for product
discovery and agile method for test and implementation.

References
Berg, B.: Qualitative Research Methods for the Social Sciences, 4th edn. (2004)
Bonwell, C.C., Eison, J.A.: Active Learning: Creating Excitement in the Classroom. School of
Education and Human Development, Washington, DC (1991)
Callon, M.: Some elements of a sociology of translation: domestication of the scallops and the
ﬁshermen of St Brieuc Bay. Sociol. Rev. 32(S1), 196–233 (1984)
Callon, M.: The Sociology of an actor-network: the case of the electric vehicle. In: Callon, M.,
Law, J., Rip, A. (eds.) Mapping the Dynamics of Science and Technology: Sociology of
Science in the Real World, pp. 19–34 (1986)
Cameron, M.: Review Essays: Donald A. Schon, The Reﬂective Practitioner: How Professionals
Think in Action. Basic Books, New York (1983, 2009)
Cooper, R., Junginger, S., Lockwood, T.: Design thinking and design management: a research
and practice perspective. Des. Manage. Rev. 20(2), 46–55 (2009)
Latour, B.: On actor-network theory: a few clariﬁcations. Soziale Welt 47, 369–381 (1996)
Onarheim, B., & Biskjaer, M.: Balancing Constraints and the Sweet Spot as Coming Topics for
Creativity Research. Creativity in Design: Understanding, Capturing, Supporting 1, 1–18
(2015)
Popper, K.: The Logic of Scientiﬁc Discovery (1959)
Runco, M.A., Okuda, S.M.: Problem discovery, divergent thinking, and the creative process. J.
Youth Adolesc. 17(3), 211–220 (1988)
Star, S.L., Griesemer, J.R.: Institutional ecology, ‘translations’ and boundary objects: amateurs
and professionals in Berkeley’s museum of vertebrate zoology, 1907–1939. Soc. Study Sci.
19, 387–420 (1989)

Playing Protection Poker for Practical
Software Security
Martin Gilje Jaatun(B) and Inger Anne Tøndel
Department of Software Engineering, Safety and Security,
SINTEF ICT, 7465 Trondheim, Norway
martin.g.jaatun@sintef.no
http://www.sintef.no/sos-agile

Abstract. Software security is about creating software that keeps performing as intended even when exposed to an active attacker. Secure
software engineering is thus relevant for all software, not only security
software. We describe Protection Poker, a tool for risk estimation to be
used as part of the iteration planning meeting, and discuss some preliminary experiences.

1

Introduction

Protection Poker is a security risk assessment technique for agile development
teams proposed by professor Laurie Williams and colleagues at NCSU [1]. The
idea is to play Protection Poker as part of every iteration1 planning meeting,
in order to rank the security risk of each feature to be implemented in that
iteration, and possibly identify additional security mechanisms that have to be
implemented to maintain an acceptable risk level.
1.1

Risk in Protection Poker

Protection Poker uses a slight variation of the traditional computation of risk:

risk = (
value of assets that could be exploited) × (the exposure)
(1)
Risk is always related to the requirements that are to be implemented in the
next iteration, often this will be some new, enhanced or corrected functionality.
Exposure relates to how hard or easy this change in functionality makes it to
attack the system, and in the evaluation of exposure, one should consider the
possible ways in which attackers can attack the system (attack surface), what
type of breaches they can perform (conﬁdentiality, integrity, availability) and
the skill level required. For asset value, the value of the asset for various groups
should be considered: the value of the asset for an attacker is important for
attacker motivation, whereas the value of the asset for customers, users, the
business, etc. highly determines the consequences that a successful attack may
1

e.g., a “sprint” in Scrum.

c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 679–682, 2016.
DOI: 10.1007/978-3-319-49094-6 55

680

M.G. Jaatun and I.A. Tøndel

have. Assets are typically considered to be database tables or system processes
that the new functionality controls.
Our dialect of Protection Poker2 uses the following numbers to determine
asset value or system exposure: <10, 20, 30, 40, 50, 60, 70, 80, 90, 100. In
calculations,“<10” is counted as 10. To be able to prioritize between diﬀerent
requirements, it is important to be able to get a spread in the numbers assigned.
This is to avoid that, e.g., high risk projects rate every requirement with a high
number. That would make it very hard to prioritize within the project.
1.2

Calibration

With Protection Poker, the security risk of a requirement is compared to other
requirements of the same system. The goal is not to establish a “perfect” and
“universal” risk value, but to rate the security risk of the requirements in order
to be able to better prioritize security eﬀort. Before starting to play Protection
Poker for a system, it is thus recommended to perform calibration in order to
arrive at a common understanding of the end-points of the scale, i.e., what does
a <10 or a 100 mean for this product? This is done the following way:
Asset value: The team asks itself: what assets are most important in this
system, and what assets are of little value. The asset they can think of as
most important is given a ‘100’ and the asset they can think of with little
value is given a ‘<10’.
Exposure: The team asks itself: what types of functional requirements can open
up most for attacks, and which functional requirements can limit exposure,
and assign a ‘100’ and a ‘<10’ accordingly.
In the evaluation of asset value and exposure, numbers should be assigned relative to these endpoints, as well as the values assigned for previously assessed
requirements.

2

Playing the Game

Protection Poker is played during an iteration planning meeting, and it is recommended that the full development team participates. One person should have
the role as moderator, and this person will be responsible for leading the team
through the game, and point the discussions in a good direction. Ideally, a separate person should be tasked with recording important security solutions and
ideas that emerge during play. Focus is on the speciﬁc requirements the team
will likely implement during the next iteration.
Step 1 – Common understanding of the requirements: The requirements to be
implemented in the iteration are explained to the team (e.g., by the product
manager or business owner) and the team members discuss or ask questions
to clarify the requirements.
2

http://www.sintef.no/protection-poker.

Playing Protection Poker for Practical Software Security

681

Step 2 – Initial discussion of security implications: The team performs a ﬁrst
discussion of the security implications of the requirements. The moderator
can ask leading questions, e.g., “Who would want to attack this system?”;
“What would an attacker do if he got access to this data?”; or “What damage
could an insider do with this functionality?”
Step 3 – Identify assets: Everybody together identify which assets are created
or touched upon by the requirement under consideration. Some of these may
have already been assigned a value, and then this value can be reused.
Step 4 – Assign value to assets: For identiﬁed assets that have not previously
been assigned a value, each participant picks the Protection Poker card (individually and without telling anybody about which card has been picked) that
best describes their understanding of the asset’s value. All participants show
their selected card to the whole team, and the team discusses the rationale
for selecting the cards; the team members with the highest and lowest cards
explain them to the group, followed by an open discussion until the team is
ready to revote (if there is disagreement). When the team has reached a consensus on the asset value (or when there is no use in discussing any further –
in these cases the moderator is responsible for making a suggestion for what
value to assign to the asset), the moderator notes the asset value. The team
now moves on to the next asset (if there are more left to assess) or to the
exposure evaluation.
Step 5 – Evaluate exposure: As for asset value, the team bids to evaluate to
which extent the requirement increases the exposure of the system and assets
to attack.
Step 6 – Calculate risk: The numbers assigned for asset values and exposure
are used to calculate a risk value as given in Eq. 1.
Step 7 – Compare risk related to other requirements: The risk value for the
requirement can be compared to the risk value of other requirements to see
for which requirements security should be given speciﬁc priority.
Step 8 – Prioritize security activities: Based on the risk value and the discussion
decision should be made on how to address security for this requirement.
The decision should be documented. If there is a need for speciﬁc security
activities or functionalities, these should be documented together with other
requirements (e.g. in the backlog).

3

Experiences and Challenges

We made some small adjustments to the Protection Poker cards and terminology.
Whereas the original Protection Poker uses the term “ease of exploitation”, we
found that this concept was distracting or not properly understood by some
pilot players, e.g., leading them to focus too much on threats such as “shoulder
surﬁng”. In order to focus more on how a feature increases the attack surface of
an application, we decided to change it to “exposure”.
The original Protection Poker3 uses the same cards as Planning Poker [2,3]
(also known as Scrum Poker), used for eﬀort estimation in agile teams. Planning
3

http://collaboration.csc.ncsu.edu/laurie/Security/ProtectionPoker/.

682

M.G. Jaatun and I.A. Tøndel

Poker cards follow a Fibonacci-like sequence, after the rationale that it is easier
to have an opinion on whether a task takes 1 or 2 days than whether it takes 40
or 41 days. We argue, however that the same is not true when it comes to relative
value of assets or degree of exposure, and since we are less concerned about small
risks and more interested in the bigger risks, we opted for an even scale instead.
This enables us to diﬀerentiate between big risks, not just the small ones.
We have tried out Protection Poker with representatives from various
Norwegian organizations and in general it has been well received. However,
some have indicated that they feel it takes too long to play the game, especially
when considering that planning meetings already are perceived as being too full.
Laurie Williams [1] found that the time required for playing dropped signiﬁcantly after the team gained familiarity with the technique, but we need more
experience with our partner companies to determine whether that will also be
the case here. We remain open to the possibility of changing how and when
Protection Poker is played in order to maximise the beneﬁt.
For some development groups, we have observed that asset identiﬁcation can
be diﬃcult, and particularly the granularity of assets can be challenging. It is
important that at least within a development team, the assets have a consistent
granularity, as this otherwise might skew the risk calculations.
Acknowledgment. This work was supported by the SoS-Agile: Science of Security in
Agile Software Development project, funded by the Research Council of Norway, grant
number 247678.

References
1. Williams, L., Meneely, A., Shipley, G.: Protection poker: the new software security
game. IEEE Secur. Priv. 8(3), 14–20 (2010)
2. Grenning, J.: Planning poker or how to avoid analysis paralysis while release planning. Hawthorn Woods: Renaissance Softw. Consult. 3, 1–3 (2002)
3. Moløkken-Østvold, K., Haugen, N.C., Benestad, H.C.: Using planning poker for
combining expert estimates in software projects. J. Syst. Softw. 81(12), 2106–2117
(2008). Best papers from the 2007 Australian Software Engineering Conference
(ASWEC 2007), Melbourne, Australia, 10–13 April 2007

Exploring Expectations About Risk-Based Testing:
Towards Increasing Eﬀectiveness and Eﬃciency
Michael Felderer1 ✉ and Rudolf Ramler2
(

)

1

2

Institute of Computer Science, University of Innsbruck, Technikerstrasse 21a,
6020 Innsbruck, Austria
michael.felderer@uibk.ac.at
Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232 Hagenberg, Austria
rudolf.ramler@scch.at

Abstract. Risk-based testing is sometimes reduced to an approach that focuses
on cutting costs and time in testing. While the high eﬀort involved in testing makes
eﬃciency an important issue, for many companies the main concern is still to
ﬁnd the critical defects in their software products. Such defects can cause costly
remedial upgrades and ﬁxes, and they threaten the company’s long-term business
success. In this paper we explore how the two goals “eﬀectiveness” and “eﬃ‐
ciency” motivate a risk-based testing approach in diﬀerent organizations. Further‐
more, we identify a third goal, summarized as “management support”. In a survey
conducted as part of a tutorial on risk-based testing we investigated common
expectations and potential beneﬁts associated with these three goals. The results
indicate that the main motivation for a risk-based approach is making testing more
eﬃcient. Nevertheless, eﬃciency and eﬀectiveness are not conﬂicting goals and
the main challenge is therefore ﬁnding strategies that increase the overall beneﬁt
of including risk information in testing.
Keywords: Test management · Software risk management · Software testing ·
Risk-based testing · Test process improvement · Eﬀectiveness · Eﬃciency

1

Introduction

Software testing is an essential and widely practiced measure for assuring software
quality. By accounting for up to 50 % of the overall project eﬀort, testing is also a highly
costly and time-intensive activity in software development. Hence, an adequate test
strategy plays a key role in balancing product quality with cost and time-to-market.
Ideally this balance is achieved by taking the risks into account, which are associated
with the consequences of poor quality caused by software defects. A systematic consid‐
eration of the involved risks is suggested by risk-based testing [1].
However, risk-based testing is sometimes (mis-)understood as an approach that
focuses primarily on minimizing costs and time – according to the pun “cut testing and
take the risk”. The high eﬀort involved in software testing makes eﬃciency an important
management goal. Although important, for many companies the main concern is still to
miss critical bugs. Critical bugs can signiﬁcantly delay time-to-market, result in costly
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 683–688, 2016.
DOI: 10.1007/978-3-319-49094-6_56

684

M. Felderer and R. Ramler

hotﬁxes, and threaten the acceptance of software products and services by customers on
the long run. In such a case, eﬀectiveness is often the primary goal in software testing.
It has to be balanced with short-term economic constraints such as limited time and or
resources available for software testing.
The need to balance eﬀectiveness and eﬃciency requires adequate management
support. Providing management supports is therefore a further, commonly observed goal
of risk-based testing. It is also found in contexts where testing is determined by the need
to fulﬁll industry standards and organizational regulations.
In this paper we explore these main goals associated with risk-based software testing.
Section 2 provides an overview of the relevant concepts and links them to ﬁndings from
related work. Section 3 shows the results from a survey conducted as part of a previous
tutorial held at the Software Quality Days 20161. The paper is concluded by a summary
and discussion in Sect. 4.

2

Background

Risk-based testing is a testing approach which considers risks of the software product
as the guiding factor to support decisions in all phases of the test process [2]. In previous
studies we investigated the potential of risk-based testing in large enterprises [3] as well
as in small and medium enterprises [4]. We also studied the introduction of risk-based
testing in an organization [5], where an essential ﬁrst step is to establish a risk-based
test strategy with clear goals and expectation for all testing activities. Among other
research questions we explored “What is the beneﬁt and improvement potential that
defect prediction can provide for software testing?” in these diﬀerent contexts. Our
ﬁndings can be summarized as follows.
The general motivation is that information about fault-prone modules, i.e., those that
have a high risk of causing critical failures, allows focusing the testing eﬀort on selected
parts of the software system instead of testing the entire system with the same rigor
(Fig. 1). The associated improvement potential is based on the observation that the
majority of the faults (usually approximated as “80 % of the defects”) comes from a
relatively small amount of the code (usually approximated as “20 % of the modules”)
and that “about half the modules are defect free” [6, 7].
2.1 Eﬀectiveness
One of the resulting beneﬁts of risk-based testing is its ability to increase the eﬀective‐
ness of testing, which can be deﬁned as the degree to which testing is able to detect all
defects in the system under test (e.g., defects actually detected per total defects). Testing
is an investment in the quality of a software product [8, 9]. Even though resources are
limited in general, it is important to achieve quality requirements such as functional
correctness, reliability, performance etc.

1

https://2016.software-quality-days.com/en/.

Exploring Expectations About Risk-Based Testing

685

Fig. 1. Increasing eﬀectiveness and eﬃciency with risk-based software testing.

A risk-based approach can help to make testing more eﬀective by including infor‐
mation about high-risk components in test planning from the very beginning. Directing
the main testing eﬀort to high-risk components to pursue a thorough, systematic testing
approach (1) increases the likelihood that the existing defects are detected. Besides an
improvement of the product’s quality, (2) setting priorities based on risk considerations
fosters the detection of critical defects ﬁrst. Risk-based prioritization also ensures that
(3) defects are found in the early iterations of testing.
In our study on risk-based testing in industry [3] we found that the investigated
companies use risk information to further increase the range of testing with additional
risk-based test cases. These test cases are added to the existing set of tests that were
systematically derived from the requirements in order to boost the chance of detecting
additional defects. It is expected that fewer defects will slip through to the ﬁeld.
2.2 Eﬃciency
Another potential beneﬁt of risk-based testing is its ability to increase the eﬃciency of
testing, which can be deﬁned as ratio between output and input (e.g., detected defects
per time spent). Time and manpower are constrained resources in the development of
software systems. This also aﬀects the resource allocation in software testing, which has
to cope with the fundamental challenge that exhaustive testing of a software system is
not possible in a realistic timeframe [10]. Therefore, it is sensible to allocate (prioritize)
the available resources in a way so that most output can be generated.
Most organizations use testing as a means to reduce the risk of delivering software
with critical bugs. However, a harsh competitive environment can imply severe time
and resource constraints that outweigh quality risks. In the struggle to meet strict release
deadlines or to maintain short-term business opportunities, companies may be forced to
(1) reduce testing time and (2) cutback resources allocated for testing. In this context,

686

M. Felderer and R. Ramler

the understanding of risks is typically used to make testing more eﬃcient, i.e., to adjust
the amount of testing to optimally use the available time and resources for covering at
least the most critical parts.
Small and medium enterprises (SME) seem to be most aﬀected by time and resource
constraints. In our study on risk orientation in software testing processes of SME [4] we
observed cases where risk information has been used to increase test eﬃciency, i.e., to
adjust the amount of testing to reduce cost and time. Larger organizations seem to tackle
time and resource constraints by using risk information to increases the chance of ﬁnding
critical defects in the early iterations of testing and, in consequence, to reduce the overall
costs and time required for stabilization [3].
2.3 Management Support
Finally, we also found that companies beneﬁt from the risk information used in riskbased testing for supporting management in decision making and in pursuing process
improvement initiatives. A risk-based approach helps to make the balance between cost
and quality transparent and easier to communicate. These beneﬁts were observable in
small, agile organizations [4] as well as in large enterprises [3] that have to cope with
the fulﬁllment of industry standards and organizational regulations.
One example is the use of risk-burndown carts [5]. Traceability between the executed
test cases, the test results and the risk items enables reliable release quality statements
as well as the estimation of residual risks. In risk-based evaluation and reporting, risk
burn-down charts, which illustrate the development of risk exposure for a system or
speciﬁc artifacts over time, are a suitable measure for release quality and residual risk
estimation. The risk that is estimated in the beginning is reduced with every test cycle
until a risk level is reached where it is acceptable to release the software application.
This method based on risk burn-down charts proposed in the studied project has been
found intuitively accessible and suitable for supporting release decisions by project
management.

3

Preliminary Survey Results

A list of commonly observed expectations and potential beneﬁts that motivate the adop‐
tion of a risk-based testing approach has been collected from previous projects (e.g. [5])
as well as related literature (e.g., [11, 12]). In a survey conducted as part of a tutorial
held at the Software Quality Days 2016, one of the biggest industry-academia conven‐
tions on software quality in Europe, we asked the participants about their motivation for
risk-based testing in their projects or products.
The survey showed 20 diﬀerent motivations for risk-based testing. Each participant
was asked to select all motivations considered relevant in context of his or her work.
The participants’ main roles were software tester, test manager, project manager, quality
manager, team lead or head of QA, software architect and developer with practical
experience in software testing ranging from 1 to more than 10 years; majority in the
range of 3 to 5 years. We received answers from 23 participants (N = 23).

Exploring Expectations About Risk-Based Testing

687

Figure 2 shows the results sorted by the number of times a motivation has been
selected by a survey participant. Motivations related to eﬀectiveness are shown in blue,
motivations related to eﬃciency are shown in orange, and motivations related to organ‐
izational and management support are shown in green color.
0

5

10

15

20

25

Finding critical defects
Finding defects early
Reducing test cycle times
Reducing test costs
Reducing the resources needed for testing
Focus on the essential things
Systematic approach for taking decisions
Realistic estimates of the current situation
Increase awareness about quality in team
Fast feedback
Increase the defect finding rate
Clear communication of status
Increase the range of defects found
Approach following objective criteria
Adhere to a defined approach
Find differet types of defects
Better fulfillment of user expectations
Increase the possiblity to find all defects
Other
Try something new
Support for budget negotiation

Fig. 2. Number of times a motivation has been selected by a survey participants (N = 23).
Motivations are related to eﬀectiveness (blue), eﬃciency (orange) or management support (green).
(Color ﬁgure online)

4

Conclusions and Future Work

Despite its preliminary character, the survey indicates that the main motivation for
introducing or conducting risk-based testing is making testing more eﬃcient. Weighted
over all possible motivations selected by the participants, 48 % can be related to eﬃciency
aspects, 29 % to eﬀectivity, and 23 % to management support. The emphasis on eﬃciency
is also visible from the ranking of the motivating factors. It is, however, worth noting
that the top most motivation selected by 22 out of 23 participants is “ﬁnding critical

688

M. Felderer and R. Ramler

defects”, which is remarkable since we consider this motivation to be associated more
with eﬀectiveness than with eﬃciency.
Eﬃciency and eﬀectiveness are not conﬂicting objectives in risk-based testing. From
a research perspective, thus, the main question for future work is to explore which strat‐
egies that maximize the overall beneﬁt of testing with risk information.
Acknowledgements. This work has been supported by the COMET Competence Center
program of the Austrian Research Promotion Agency (FFG), and the project QE LaB – Living
Models for Open Systems funded by the Austrian Federal Ministry of Science, Research and
Economy.

References
1. Ramler, R., Felderer, M.: A process for risk-based test strategy development and its industrial
evaluation. In: Abrahamsson, P., Corral, L., Oivo, M., Russo, B. (eds.) PROFES 2015. LNCS,
vol. 9459, pp. 355–371. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26844-6_26
2. Felderer, M., Schieferdecker, I.: A taxonomy of risk-based testing. Int. J. Softw. Tools
Technol. Transf. 16(5), 559–568 (2014). Springer
3. Felderer, M., Ramler, R.: A multiple case study on risk-based testing in industry. Int. J. Softw.
Tools Technol. Transf. 16(5), 609–625 (2014). Springer
4. Felderer, M., Ramler, R.: Risk orientation in software testing processes of small and medium
enterprises: an exploratory and comparative study. Softw. Q. J. 24(3), 519–548 (2016).
Springer
5. Felderer, M., Ramler, R.: Integrating risk-based testing in industrial test processes. Softw. Q.
J. 22(3), 543–575 (2014). Springer
6. Boehm, B., Basili, V.R.: Software defect reduction top 10 list. IEEE Comput. 34(1), 135–137
(2001). IEEE
7. Shull, F., Basili, V.R., Boehm, B., Brown, A.W., Costa, P., Lindvall, M., Port, D., Rus, I.,
Tesoriero, R., Zelkowitz, M.: What we have learned about ﬁghting defects. In: 8th Symposium
on Software Metrics, METRICS 2002, p. 249. IEEE (2002)
8. Huang, L.G., Boehm, B.: How much software quality investment is enough: a value-based
approach. IEEE Softw. 23(5), 88–95 (2006)
9. Ramler, R., Biﬄ, S., Grünbacher, P.: Value-based management of software testing. In: ValueBased Software Engineering, pp. 225–244. Springer, Heidelberg (2006)
10. Ramler, R., Wolfmaier, K.: Economic perspectives in test automation: balancing automated
and manual testing with opportunity cost. In: International Workshop on Automation of
Software Test, AST 2006, pp. 85–91. ACM (2006)
11. Amland, S.: Risk-based testing: risk analysis fundamentals and metrics for software testing
including a ﬁnancial application case study. J. Syst. Softw. 53(3), 287–295 (2000)
12. Redmill, F.: Theory and practice of risk-based testing. Softw. Test. Veriﬁcation Reliab. 15(1),
3–20 (2005). Wiley

2nd International Workshop on Human
Factors in Software Development
Processes

Human Factors in Software Development
Processes: Measuring System Quality
Silvia Abrahao1, Maria Teresa Baldassarre2, Danilo Caivano2, Yvonne Dittrich3,
( )
Rosa Lanzilotti2, and Antonio Piccinno2 ✉
1

Universidad Politecnica de Valencia (UPV), Valencia, Spain
sabrahao@dsic.upv.es
2
Università degli Studi di Bari Aldo Moro, Bari, Italy
{mariateresa.baldassarre,danilo.caivano,rosa.lanzilotti,
antonio.piccinno}@uniba.it
3
IT University of Copenhagen, Copenhagen, Denmark
ydi@itu.dk

Abstract. Software Engineering and Human-Computer Interaction look at the
development process from diﬀerent perspectives. They apparently use very
diﬀerent approaches, are inspired by diﬀerent principles and address diﬀerent
needs. But, they deﬁnitively have the same goal: develop high quality software
in the most eﬀective way. The second edition of the workshop puts particular
attention on eﬀorts of the two communities in enhancing system quality. The
research question discussed is: who, what, where, when, why, and how should
we evaluate?
Keywords: Human Computer Interaction · Software Engineering · Human
factors · Development process

1

Introduction and Motivation

Software Engineering (SE) and Human-Computer Interaction (HCI) look at develop‐
ment processes of interactive software systems from diﬀerent perspectives. Eﬀorts to
reduce the gap between the two communities for what concerns the introduction of
human factors in software development processes have started to be discussed in the
ﬁrst edition of the workshop, held in Bolzano in 2015 [1]. One aspect that emerged from
the discussion pointed out concerns on how system quality should be measured, in order
to satisfy both communities. Indeed, the software product industry emphasizes how
important it is to involve users and customers to evaluate quality in terms of functionality
and usability of software products [2].
Although researchers and practitioners from the two communities share the same
goal of developing high quality systems, the methodologies, methods and metrics they
use to evaluate such quality are very diﬀerent due to their background and expertise.
The second edition of the workshop on Human Factors in Software Development
Processes aims at providing a forum for discussing measuring system quality from both
perspectives. In particular, the following research issues are addressed:
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 691–696, 2016.
DOI: 10.1007/978-3-319-49094-6_57

692

S. Abrahao et al.

• key methods that allow to integrate human factors in the evaluation of the software
quality;
• methodologies and techniques currently used in software development teams to
engage users in the evaluation process;
• how the level of human factor involvement can be objectively veriﬁed during and
after software development;
• how Software Engineering and HCI researchers and practitioners can overcome the
communication gap when evaluating system quality.
Researchers and practitioners who face the problem of integrating human factors in
software quality evaluation should have a place to discuss their experiences, lessons
learned and future intentions to reach a common understanding on evaluation topics.

2

Filling the GAP Between SE and HCI

SE and HCI apparently use very diﬀerent approaches, are inspired by diﬀerent principles
and address diﬀerent needs. Ultimately though they have the same goal: developing high
quality software in the most eﬀective way.
Based on the discussions of the previous edition and of the contributions received,
the authors of this paper have classiﬁed some gaps between the two communities that
can be seen as two sides of the same medal. In the following, we illustrate and discuss:
(i) the main diﬀerences between SE and HCI approaches adopted, (ii) categorize the
common wisdoms and (iii) explore possible ways to reduce the gap and converge.
2.1 The Diﬀerences
2.1.1 User vs. Market Oriented Systems
One of the most popular claims in the two communities is that they address diﬀerent
types of software products.
The HCI products are User Oriented Systems where typically there is some type of
“users” (user, lead user, customer and so on) to refer to during the development. Here
the source of requirements (functional and non functional) is primarily the “user” himself
who is actively involved in the design, review and validation, i.e. ingrained in the entire
development process right up until delivery. The systems are very focused on speciﬁc
domains, they oﬀer the functions that are specialized on users’ needs.
The SE products are Market Oriented Systems that address a wide range of needs
and are used by tons of users that diﬀer for functions used, language, culture, ability,
competences and skills. Here the source of requirements are the laws, domain and busi‐
ness rules, books, the already existing legacy systems, competitors and, in general, the
so called stakeholders. The “users” are rarely involved and typically a “customer”
doesn’t exit because the developed product is addressed to an entire market segment
with hopefully hundreds of customers. The systems are so big that there isn’t a single
“user” that knows all the requirements to develop and thus the “users” are simply not
essential.

Human Factors in Software Development Processes

693

2.1.2 Vertical vs. Layered or Horizontal Architecture
An assertion emerged in the previous workshop is that the diﬀerences between User and
Market Oriented product imply the use of diﬀerent software architectures. The HCI
architectures were perceived as “vertical” in that they address speciﬁc needs of speciﬁc
users from speciﬁc domains. Due to these characteristics the system development starts
from interface until database without particular attention to maintainability and reuse.
The resulting systems are generally characterized by a high coupling between data
functions and interfaces and a possible low internal cohesion.
The SE architectures are usually layered architectures inspired by principles such as
modularization, separation of interest, information hiding, reuse etc. and the resulting
systems are assumed to be maintainable and robust. The interface is only one of the system
layers and it is often considered less important than others such as the data layer or busi‐
ness layer. Here the belief is that the obtained systems are so maintainable that their inter‐
faces can be adjusted or completely changed in a short time and fashionable way without
particular problems. Common architecture styles are Software Product Lines, Enterprise
Architectures, Service Oriented Architectures etc. A software system is naturally perceived
along the horizontal dimension because it covers a wide range of domains, from business
to technical. A typical example is an Enterprise Resource Planning.
2.1.3 User Time - Ex Ante vs. Ex Post
A crucial point in the SE-HCI debate is the decision on when the users should be
involved. The HCI philosophy is “as soon as possible” while the SE face states that the
“users are rarely involved”. This is the result of a numbers of convictions; for HCI: the
interface and usability are crucial for system success and the nature and type of interface
strongly inﬂuences the architecture style; The interface cannot be simply pushed into a
system right after being developed; The source of requirements is also the “user” and
thus drives system development from the start (User Driven Development) [7].
SE: in this community a Process/Product Driven Development is adopted. The
source of requirements “can also” be the user but he/she is generally not considered
particularly important, rather time consuming and misleading. If the system is developed
according to the SE processes and principles the user interface can be easily pushed into
the system at the end of development when the critical and most important layers are
completed. The ﬁnal users can be involved before software delivery to carry out pilot
studies aimed at validating user satisfaction and product correctness. All the suggestions
coming from users are then quickly and eﬀectively incorporated.
2.2 Common Wisdom
Following to discussions of the previous edition we have outlined some “common
wisdom” points spread within the communities, just to mention a few:
– HCI systems are more usable than SE ones;
– HCI systems are less maintainable than SE ones
– HCI systems are less performing than SE ones

694

S. Abrahao et al.

Obviously these are anecdotal assertions and with lack of any evidence with the
only intent of soliciting and marking the differences and distance between the two
communities.
2.3 Bridging the Gap
How do we reduce the gap? What approaches, techniques or processes should we
use? Discussion and information collected among participants of both communities
were trivial in some cases and original in others. They have been summarized in the
following list:
– make use of short iterations, meetings and focus groups between “stakeholders”
during system development in order to reduce the risks of omitted requirements and
deliver unusable systems. Stakeholders obviously include among others, users, lead
users and customers as well as developers and experts. To this end during the discus‐
sion of this point was clear that there is a misleading use of terms in the two commun‐
ities and that most likely terms like user, lead user or customer as intended by HCI
are included and considered by SE as stakeholders;
– use of lightweight processes, especially those ones proposed in HCI community that
starting from the well known agile processes, such as SCRUM or XP, extend them
by actively involving the “user” or “customer”;
– concepts such as experimentation and empirical evaluation can represent a common
means for both communities. Action research, ethnographic studies, Cooperative
Method Development, formal experiments, case studies [6], surveys, qualitative and
quantitative evaluations etc. could become a shared platform of methods for a joint
evaluation of what is done in the two communities;
– use of interdisciplinary teams that include both HCI and SE experts;
– deﬁnition and use of shared quality models able to objectively evaluate the quality
of the system developed.
In the end we can observe that in the recent years the two communities are progres‐
sively converging towards a set of common practices, process and empirical techniques.
The considerations above and the outcomes of the discussions enhanced during the ﬁrst
edition of the workshop suggest to refer to empirical approaches [5] and evaluation of
software quality [3, 4] as a way for enhancing the convergence between communities
and improves software quality overcoming skepticism and common wisdoms.
Despite the discussion, the research problem still remains: who, what, where, when,
why, and how should we evaluate quality?
Nowadays software quality approaches are also converging towards product evalu‐
ation from diﬀerent point of views: internal, external quality, quality in use. For example,
if we refer to ISO/IEC 25000 we can observe that it standardizes the processes and
models for product evaluation and provides useful guidelines for addressing system
quality in a more objective way. This has been one of starting points for triggering further
discussion in the current edition.

Human Factors in Software Development Processes

3

695

Audience and Expected Outcomes

The overall goal of this interdisciplinary workshop has been to raise the level of engage‐
ment and discussion about human factors in software quality measurements. A further
goal of the workshop has been to identify opportunities to improve the synergy of the
two communities on the scientiﬁc discourse and progress on human aspects of software
evaluation, as well as to better identify opportunities able to educate practitioners and
researchers about how to conduct sound human-centered evaluations in the context of
software engineering. Indeed, the organizers of the workshop are a synergic composition
of active researchers belonging to both communities. The expected outcome is a descrip‐
tive framework that helps to organize the current best practices and a set of recommen‐
dations for formalizing and verifying software system quality.
The workshop has received a positive response from both HCI and SE communities
with several interesting and valuable contributions. The submissions were peerreviewed by international committee members for their quality, topic relevance, inno‐
vation, and potentials to foster discussion. Finally, ﬁve papers were accepted.
In the ﬁrst paper, “Gamiﬁcation and Functional prototyping to Support Motivation
towards Software Process Improvement” authors discuss commitment in software
process improvement initiatives in the context of people-driven processes to help ensure
software quality. Gamiﬁcation and functional prototyping are used as a means to boost
motivation and commitment.
The second paper “Exploring Mobile User Experience through Code Quality
Metrics” presents a set of features for evaluating the code quality of Android applica‐
tions. The discussion points out how user experience varies in mobile ecosystems and
who developers should focus on software quality to assure usable applications from a
user perspective.
In the third paper “Early-Usability in Model-Driven Game Development”, authors
propose a model that can be used to evaluate the usability of video games in early stages
of development. Moreover, the method relies on a model that decomposes usability into
measurable attributes and metrics speciﬁc to the video game domain, bridging de facto
a gap between SE and HCI.
In the fourth paper “What aspects of context should be described in case studies
about software teams? Preliminary results from a mapping study”, authors illustrate the
results of a mapping study aiming at addressing human-based factors that inﬂuence the
selection and composition of software engineering teams and how these can inﬂuence
and impact ﬁnal quality.
Finally, in the ﬁfth contribution “Miscommunication in Software Projects: Early
recognition through tendency forecasts”, authors address issues of team communication
and how miscommunication can lead to delay of software releases and especially hamper
customer satisfaction. Aspects related to team composition and their interaction with
users is also addressed.
Acknowledgment. We would like to thank the organizers of PROFES 2016 for giving us the
opportunity to organize this workshop. We are also grateful to our international program
committee of experts in the ﬁeld for their reviews and collaboration.

696

S. Abrahao et al.

References
1. Abrahao, S., Baldassarre, M.T., Caivano, D., Dittrich, Y., Lanzilotti, R., Piccinno, A.: Human
factors in software development processes. In: Abrahamsson, P., Corral, L., Oivo, M., Russo,
B. (eds.) Proceedings of the Product-Focused Software Process Improvement: 16th
International Conference, PROFES 2015, Bolzano, Italy, 2–4 December 2015, pp. XIV–XVI.
Springer, Switzerland (2015)
2. Costabile, M.F., Fogli, D., Lanzilotti, R., Mussio, P., Piccinno, A.: Supporting work practice
through end-user development environments. J. Organ. End User Comput. 18(4), 43–65 (2006)
3. Pardo, C., Pino, F.J., García, F., Piattini, M., Baldassarre, M.T.: A process for driving the
harmonization of models. In: ACM International Conference Proceeding Series, pp. 51–54
(2010). doi:10.1145/1961258.1961271
4. Pardo, C., Pino, F.J., García, F., Piattini Velthius, M., Baldassarre, M.T.: Trends in
harmonization of multiple reference models. In: Maciaszek, L.A., Loucopoulos, P. (eds.)
ENASE 2010. CCIS, vol. 230, pp. 61–73. Springer, Heidelberg (2011). doi:
10.1007/978-3-642-23391-3_5
5. Ardimento, P., Caivano, D., Cimitile, M., Visaggio, G.: Empirical investigation of the eﬃcacy
and eﬃciency of tools for transferring software engineering knowledge. J. Inf. Knowl. Manag.
7(3), 197–207 (2008). doi:10.1142/S0219649208002081
6. Baldassarre, M.T., Bianchi, A., Caivano, D., Visaggio, G.: An industrial case study on reuse
oriented development. In: IEEE International Conference on Software Maintenance, ICSM
2005, Art. No. 1510124, pp. 283–294 (2005). doi:10.1109/ICSM.2005.20
7. Ardito, C., Buono, P., Caivano, D., Costabile, M.F., Lanzilotti, R., Bruun, A., Stage, J.:
Usability evaluation: a survey of software development organizations. In: International
Conference on Software Engineering and Knowledge Engineering (SEKE 2011), Miami,
Florida, USA, pp. 282–287, 7–9 July 2011

Gamiﬁcation and Functional Prototyping to Support
Motivation Towards Software Process Improvement
Mercedes Ruiz ✉ , Manuel Trinidad, and Alejandro Calderón
(

)

Department of Computer Science and Engineering, University of Cádiz, Cádiz, Spain
{mercedes.ruiz,manuel.trinidad,alejandro.calderon}@uca.es

Abstract. The topic of commitment in software process improvement (SPI) has
been a recurrent topic of research that has not received enough attention in the
recent years. In many situations, the lack of commitment in SPI initiatives is
appointed as the major cause leading to failure. In this paper, we propose the use
of simulation-based process functional prototyping as a tool to design and test
SPI initiatives in a risk-free environment together with gamiﬁcation, as a means
to boost motivation and commitment. The paper shows the conceptualization of
the proposal and the design of the gamiﬁcation strategy, describes the tools built
at the technical implementation of the strategy and summarizes the main results
of a pilot study conducted to initially evaluate the proposal.
Keywords: Gamiﬁcation · Process prototyping · Simulation · Software process
improvement · Motivation · Commitment

1

Introduction

This work is part of a research endeavor aimed at exploring the application of simulation
techniques and gamiﬁcation in the context of people-driven processes to help ensure
software quality. In this paper, we explore the aspects related to commitment in software
process improvement initiatives and how simulation techniques, as a means of functional
process prototyping, and gamiﬁcation can be used to help improve attitude and commit‐
ment.
The topic of commitment in software process improvement (SPI) has been a recur‐
rent topic of research that has not received enough attention in the recent years. Abra‐
hamssom and Iivani [2] conclude that even with investments of billions of dollars in
SPI, the results show that two thirds of the eﬀorts fail in reaching the objectives. In many
situations, the lack of commitment in SPI initiatives is appointed as the major cause
leading to this failure. This sounds at least paradoxical since it is undoubted that SPI has
to be present whenever an organization attempts to improve its eﬃciency and results.
In this paper, we propose the use of process functional prototyping as a tool to design
and test SPI initiatives in a risk-free environment together with gamiﬁcation, as a means
to boost motivation and commitment. This paper is structured as follows: Sect. 2
provides the background for this study and describes similar works to ours highlighting
the diﬀerences between other initiatives and our proposal. Sections 3 and 4 describe our
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 697–704, 2016.
DOI: 10.1007/978-3-319-49094-6_58

698

M. Ruiz et al.

proposal and the tools that have been already developed to technically support this
approach, respectively. Section 5 describes the pilot study conducted to initially assess
the proposal and the main results. Finally, Sect. 6 summarizes the paper, draws our
conclusions and describes our future work.

2

Background

2.1 Software Process Prototyping
In the context of this work, we deﬁne a functional prototype as a fully functional model
used to test an idea or process in order to learn from its use and provide the speciﬁcations
to implement this idea or process in the real world. As an element of functional proto‐
typing, the use of simulation models has been an active research topic over the past two
decades. Simulation applied to software development projects was introduced in the
ﬁeld of software engineering in the pioneering work of Abdel-Hamid and Madnick [1].
Zhang and his colleagues conclude in their review of the literature that it is an eﬀective
tool for the assessment and management of changes made in the projects and software
organizations [23]. Similarly, Raﬀo and Wakeland [16] deﬁne the process simulation
software as ‘possibly the most useful tool to improve the maturity and capability of
processes, allowing increasing the level of maturity in CMMi’. In addition, another
important advantage of using this type of functional prototyping is its integration with
other techniques that promote analysis and knowledge extraction from its results.
2.2 Gamiﬁcation
It was not until 2011 when gamiﬁcation was deﬁned in a way that is formally accepted.
Deterding and his colleagues [7] deﬁned gamiﬁcation as the application of the design
elements of games in diﬀerent environments, such as websites, education or social
networks.
In the ﬁeld of software engineering, some applications of gamiﬁcation have already
been published. For example, Singer and Schneider [19] proposed a gamiﬁed experience
to enhance the practice of software engineering and control versions or Dubois [9]
suggested the use of self-organizing models for the gamiﬁcation of context-sensitive
applications. In [14], a case study that shows a real application of game mechanics in
software engineering processes is collected. In the context of software testing, some
applications of gamiﬁcation have been also published. For example, Jonathan Khol [12]
indicates that gamiﬁcation helps testing activities to be more interesting, creative,
productive and fun, leading to more eﬀective results.
Some companies in the sector are already beginning to apply gamiﬁcation in its
working methods. A comprehensive systematic mapping of the ﬁeld of gamiﬁcation in
software engineering is shown in [15].
Gamiﬁcation and SPI. In this section, we focus on the works about the use of gami‐
ﬁcation in the speciﬁc scope of SPI. Dorling and McCaﬀery [8] propose using gamiﬁ‐
cation in the context of SPICE. In this work, gamiﬁcation is presented as a solution ‘for

Gamiﬁcation and Functional Prototyping to Support Motivation Towards SPI

699

better user engagement, faster feedback of achievement and more visible progress indi‐
cators of process improvement’. Herranz and his colleagues [10, 11] deﬁne a methodo‐
logical gamiﬁcation framework to guide organizations to apply gamiﬁcation in the scope
of organizational change management of SPI.
To the best of our knowledge, these are the only works reported in the literature that
are speciﬁcally focused on the application of gamiﬁcation in the scope of SPI. Our
proposal is also placed in the scope of promoting the application of gamiﬁcation in SPI.
However, our contribution takes a further step by conceptually and formally designing
a gamiﬁcation strategy that, based on the use of process functional prototyping, is
intended to boost motivation and commitment towards SPI. The technical platform to
support the implementation of the strategy is also described.

3

Gamiﬁcation Framework for SPI

This section describes the process we have followed to design a gamiﬁcation strategy.
The strategy’s aim is to gamify the use of process functional prototyping in order to
bring together the best of both worlds into the area of SPI initiatives design and test.
First, before starting any process leading to the implementation of a gamiﬁcation
strategy, it must be clear that the beneﬁts intended to be obtained meet your needs. Even
though this may sound obvious, a poorly designed gamiﬁcation strategy leads most
probably to failure in meeting your business objectives [5]. In their book, Werbach and
Hunter [21] go deeper into this problem and state that gamiﬁcation is not a solution to
every business problem. Hence, the very ﬁrst stage in the design of a gamiﬁcation eﬀort
is to value if it will help to meet your objectives.
In order to perform the initial evaluation of our approach, we followed Werbach and
Hunter’s proposal [21] consisting on providing an answer to the following questions:
Q1: Motivation: Where would you derive value from encouraging behavior?
Some studies have highlighted the importance of motivation in the software engi‐
neering practice. Beechama et al. [4] performed a systematic literature review to ﬁnd,
among other things, what motivates software developers to be more productive. Further‐
more, when analyzing the speciﬁc ﬁeld of SPI, there is a common agreement in the
published literature about the importance of commitment and motivation in the success
of any SPI initiative [2, 13, 20].
Therefore, the works reported support the thesis that the improvement of motivation
is among the most crucial issues in SPI and, consequently, gamiﬁcation is a suitable tool
to be used in this area.
Q2: Meaningful Choices: Are your target activities suﬃciently interesting?
When designing a gamiﬁcation strategy one should concern about the participant’s
autonomy. For this reason, and based on the fact that change and challenge drive soft‐
ware engineers motivation, we decided to design an environment where the participant
could design and explore the result of as many diﬀerent SPIs initiatives as they could
imagine.

700

M. Ruiz et al.

Q3: Structure: Can the desired behaviors be modeled through a set of algorithms?
Every gamiﬁcation strategy needs to be measurable. In our proposal, we track and
record measures regarding the complexity of the SPI initiative designed by the partici‐
pant and the level of success of that initiative. Later, the structure of feedback informa‐
tion and rewards received by the participant are based on these measures.
Q4: Potential Conﬂicts: Can the game avoid conﬂicts with existing motivational
structures?
Practice shows that some game mechanics intertwined with other organization
rewards, such as salary, lead to demotivation [21]. For this reason, we decided that our
reward system should be built on the basis of promoting self-improvement, teamimprovement and, therefore, organizational improvement.
Once we can provide a positive answer to each of the previous questions, we can
conclude that gamiﬁcation can help solve our problem of improving motivation and
commitment towards SPI initiatives.
3.1 Designing the Gamiﬁcation Strategy
In order to design our gamiﬁcation strategy, we followed the process described by
Werbach and Hunter [21]. The process and the results of each step are described below:
Step 1. Deﬁne your business objectives. In our case, we pursue the following objec‐
tives: (a) improve the attitude of participants towards SPI initiatives, (b) increase their
motivation towards the design and implementation of SPI, and (c) improve the quality
of SPI initiatives.
Step 2. Delineate your target behaviors. We pretend the participants to use the tech‐
nique of process functional prototyping to design and evaluate diﬀerent SPIs initiatives.
Step 3. Describe your players. We include in our strategy intrinsic motivators aimed
at: (a) rewarding individuals for the knowledge and mastery shown when designing
process prototypes, (b) promoting eﬀective communication, by rewarding the quality of
the solutions proposed, and (c) encouraging competition among teams of participants,
to enhance team building.
Step 4. Devise your activities cycles. Actions such as using the technique, creating
prototypes, adding components and complexity to their prototypes, etc. are examples of
actions placed in this low-level cycle that are monitored, recorded and awarded. As the
data associated to these actions become available, the actions placed at the high-level
cycle can be performed. These actions are mostly to challenge the participant with a SPI
problem of a higher complexity.
Step 5. Don’t forget the fun! Our current and next software engineers belong to the
so-called Generation Y (also known as Millennials). According to their features
described in [18], we decided to provide a technological environment that meets their
expectations, mostly ﬂexibility provided by a mobile application, instant feedback and
collaborative culture.

Gamiﬁcation and Functional Prototyping to Support Motivation Towards SPI

701

Step 6. Deploy the appropriate tools. Since our strategy is not based on the interaction
of the user with a web-like platform, we decided to develop a particular technical imple‐
mentation to support the strategy, as described below.

4

Technical Implementation

The technical implementation consists of three tools: SysDyn, OpenBadgesUCA and
GamAnalyze that work collaboratively.
The main features of each tool are the following:
1. SysDyn. Its main function is to allow the user to create functional simulation-based
process prototypes on touch devices such as tablets. It is a mobile application for
Android implemented as a hybrid system, i.e., combining native code Android
(Java), with HTML5, CSS3 and JavaScript.
2. OpenBadgesUCA. OpenBadgesUCA aims to eﬀectively send the user’s badges
awarded by SysDyn to their account in Mozilla OpenBadges platform [3]. It is
implemented as a web app with a PHP script that receives encrypted data sent by
SysDyn and decrypts them to allow the user upload their awards to their account in
Mozilla OpenBadges. OpenBadgesUCA uses HTML5, JavaScript, CSS3 and PHP.
3. GamAnalyze. GamAnalyze aims to help analyze the data collected by mostly Open‐
BadgesUCA to provide information about the results of the gamiﬁcation strategy.
GamAnalyze is a web 2.0 application built using Yii Framework [22].

5

Evaluation of the Proposal

In this section, we present the preliminary results based on a pilot study carried out with
a small number of users. For this study, we designed a pilot evaluation based on [17],
in which the participants used the framework to solve a given SPI problem and provided
their feedback using a post-experience questionnaire.
In order to measure the motivation, we selected the following seven core indicators,
partially based on Chou’s proposal [6]: (a) Accomplishment, (b) Empowerment, (c)
Ownership, (d) Curiosity, (e) Control, (f) Creativity and (g) Social inﬂuence. The ques‐
tionnaire is answered individually. The information is collected by asking the participant
to rate, in a Likert-like scale ranged from 1 (lowest value) to 5 (highest value), their
agreement with a total of 20 statements.
Five professors of software engineering related subjects and ﬁve students attending
a course in software processes at the University of Cádiz in Spain took part in the study.
They were selected by invitation and participated voluntarily in the pilot study. All of
them completed the post-experience questionnaire.
Figure 1 shows the results obtained in this pilot study regarding the motivation indi‐
cators. The indicators related to Accomplishment and Empowerment were the best rated.
The use of SysDyn helped participants make progress and motivated them to overcome
the challenges, since the feedback to the SPI initiatives they were designed was

702

M. Ruiz et al.

immediate thanks to the in-built simulator in SysDyn. This automate feedback assisted
them in the adjustment of their solutions, helping them feel more creative and engaged
in the process.
Accomplishment
5
4

Social influence

Enpowerment

3
2
1
Creativity

Ownership

Control

Curiosity

Fig. 1. Radial diagram of the motivation indicators

The indicators of Ownership and Control were also positively rated, since the sense
of creating your own solution, by increasing and improving it was present in the expe‐
rience.
The social inﬂuence and the curiosity were the indicators that received the lowest
rate. The factor of curiosity is highly inﬂuenced by the sense of the unknown, i.e. not
knowing what is going to happen after you have made a decision. We consider that this
result may be due to the fact that in this stage, our tools did not implement any kind of
uncertain or random event that may happen during the simulation of a SPI initiative,
making them more unpredictable and fostering the curiosity of the participants. As for
the social inﬂuence, our tools help the user to share their achievements via Twitter. This
is a good option for participants who are motivated by trying to achieve what others
have. However, not all our participants seem to have that competitor proﬁle.

6

Conclusions

This paper presents a novel approach towards the improvement of commitment and
motivation in SPI initiatives. The main contributions of this work are the following:
1. The application of the technique of simulation-based functional process prototyping
to provide a risk-free environment where software engineers can design, test, and
evaluate the eﬀect of the SPIs initiatives.
2. The integration of the mentioned technique with gamiﬁcation as a means of boosting
motivation and commitment in SPI.

Gamiﬁcation and Functional Prototyping to Support Motivation Towards SPI

703

3. A gamiﬁcation strategy that supports this integration and that has been conceptually
created and designed following the process suggested by Werbach and Hunter [21].
4. A suite of tools that implement and support the technical implementation of the
strategy.
5. A pilot study conducted to initially assess the validity of the proposal.
Our future work is mostly focused on the enrichment and improvement of the strategy
and the tools developed based on the results of the pilot study. Next, we are planning to
develop more experiments to help us validate our proposal.
Acknowledgements. This work has been partially supported by the Spanish Ministry of Science
and Technology (grant TIN 2013-46928-C3R) with ERDF funds and the Andalusian Plan for
Research, Development and Innovation (grant TIC-195).

References
1. Abdel-Hamid, T., Madnick, S.: Software Project Dynamics: An Integrated Approach, 1st edn.
Prentice-Hall, Upper Saddle River (1991)
2. Abrahamssom, P., Iivari, N.: Commitment in software process improvement - in search of the
process. In: Proceedings of the 35th Annual Hawaii International Conference on System
Sciences (HICSS-35 2002), vol. 8. IEEE Computer Society, Washington (2002)
3. Atkins, T.: It’s Raid Night! Gamification for Software Test Teams? http://think
testing.files.wordpress.com/2012/05/pqa-gamificationtesting-v120426.pdf. Accessed 9 March
2016
4. Beechama, S., Baddooa, N., Halla, T., Robinsonb, H., Sharpb, H.: Motivation in software
engineering: a systematic literature review. Inf. Softw. Technol. 50(9–10), 860–878 (2008)
5. Burke, B.: Gamiﬁcation 2020: What Is the Future of Gamiﬁcation? http://www.grow
thengineering.co.uk/what-is-the-future-gamiﬁcation/. Accessed 9 March 2016
6. Chou, Y.K.: Octalysis – complete Gamiﬁcation framework. http://yukaichou.com/gamiﬁ
cation-examples/octalysis-complete-gamiﬁcation-framework/. Accessed 9 March 2016
7. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness:
deﬁning “Gamiﬁcation”. In: MindTrek 2011, 28–30 September, Tampere, Finland (2011)
8. Dorling, A., McCaﬀery, F.: The gamiﬁcation of SPICE. In: Mas, A., Mesquida, A., Rout, T.,
O’Connor, R.V., Dorling, A. (eds.) SPICE 2012. CCIS, vol. 290, pp. 295–301. Springer,
Heidelberg (2012). doi:10.1007/978-3-642-30439-2_35
9. Dubois, D.J.: Toward adopting self-organizing models for the gamiﬁcation of context-aware
user applications. In: 2012 2nd International Workshop on Games and Software Engineering
(GAS), vol. 9(15), p. 9 (2012)
10. Herranz, E., Colomo-Palacios, R., Amescua-Seco, A.: Towards a new approach to supporting
top managers in SPI organizational change management. Procedia Technol. 9, 129–138 (2013)
11. Herranz, E., Colomo-Palacios, R., de Amescua Seco, A., Yilmaz, M.: Gamiﬁcation as a
disruptive factor in software process improvement initiatives. J. Univ. Comput. Sci. 20(6),
885–906 (2014)
12. Khol, J.: Software Testing Is a Game. Better Software, pp. 7–8, January/February 2013
13. Niazi, M.: Software process improvement implementation: avoiding critical barriers,
CROSSTALK. J. Defense Softw. Eng. 22(1), 24–27 (2009)

704

M. Ruiz et al.

14. Passos, E.B., Medeiros, D.B., Neto, P.A.S., Clua, E.W.G.: Turning real-world software
development into a game. In: 2011 Brazilian Symposium on Games and Digital Entertainment
(SBGAMES), vol. 260(269), 7–9 November 2011
15. Pedreira, O., García, F., Brisaboa, N., Piattini, M.: Gamiﬁcation in software engineering – a
systematic mapping. Inf. Softw. Technol. 57, 157–168 (2015)
16. Raﬀo, D.M., Wakeland, W.: Moving up the CMMI capability and maturity levels using
simulation. Technical report CMU/SEI-2008-TR-002 ESC-TR-2008-002, January 2008
17. Runeson, P., Host, M., Rainer, A., Regnell, B.: Case Study Research in Software Engineering:
Guidelines and Examples. John Wiley & Sons, Hoboken (2012)
18. Saxena, P., Jain, R.: Managing career aspirations of generation Y at work place. Int. J. Adv.
Res. Comput. Sci. Softw. Eng. 2(7), 114–118 (2012)
19. Singer, L., Schneider, K.: It was a bit of a race: gamiﬁcation of version control. In: 2012 2nd
International Workshop on Games and Software Engineering (GAS), vol. 5(8), p. 9, June 2012
20. Stelzer, D., Mellis, W.: Success factors of organizational change in software process
improvement. Softw. Process. Improv. Pract. 4(4), 227–250 (1998)
21. Werbach, K., Hunter, D.: For the Win. How Game Thinking can Revolutionize your Business.
Wharton Digital Press, Philadelphia (2012)
22. Yii framework. The Fast, Secure and Professional PHP Framework. http://www.yii
framework.com/. Accessed 9 March 2016
23. Zhang, H., Kitchenham, B., Pfahl, D.: Software process simulation modeling: facts, trends
and directions. In: Proceedings of the 15th Asia-Paciﬁc Software Engineering Conference,
pp. 59–66. IEEE Computer Society (2008)

Exploring Mobile User Experience Through
Code Quality Metrics
Gerardo Canfora2 , Andrea Di Sorbo2 , Francesco Mercaldo1(B) ,
and Corrado Aaron Visaggio2
1

2

Institute for Informatics and Telematics,
National Research Council of Italy (CNR), Pisa, Italy
francesco.mercaldo@iit.cnr.it
Department of Engineering, University of Sannio, Benevento, Italy
{canfora,disorbo,visaggio}@unisannio.it

Abstract. Smartphones have been absorbed into everyday life at an
astounding rate, and continue to become more and more widely used.
Much of the success of the mobile paradigm can be attributed to the discover of a huge market. Users may pick from a large collection of software,
in domains ranging from games to productivity. Each platform makes the
task of installing and removing apps very simple, further inciting users
to try new software. Smartphone users may download applications from
the oﬃcial Google Play market, but those applications do not pass any
review process, and can be downloaded very shortly after submission.
Google Play does not oﬀer any mechanism to ensure the user about the
quality of the installed app, and this is particularly true for user experience: the user simply downloads and runs the application. In this paper
we propose a features set to evaluate the code quality of Android applications to understand how user experience varies in mobile ecosystem.
Our ﬁndings show that developers need to focus on software quality in
order to make their applications usable from the user point of view.

Keywords: Software quality

1

· Mobile applications · User experience

Introduction and Background

Since its inception, the smartphone has quickly become one of the most ubiquitous technological artifacts in today’s society.
Global sales of smartphones to end users totaled 349 million units in the ﬁrst
quarter of 2016, a 3.9 percent increase over the same period in 2015, according
to Gartner: smartphone sales represented the 78 percent of total mobile phone
sales in the ﬁrst quarter of 20161 . Sales continue to increase year after year, as
the smartphones become even more accessible and intertwined in day-to-day life.
Such phones have achieved success due in part to easily available applications
(apps), typically provided by oﬃcial market.
1

http://www.gartner.com/newsroom/id/3323017.

c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 705–712, 2016.
DOI: 10.1007/978-3-319-49094-6 59

706

G. Canfora et al.

Apps have major implications for both end users and software developers,
especially in terms of software quality and user experience.
Indeed, mobile device users may download applications from the oﬃcial
Google Play store (formerly the Android Market). Apps on the Play Store are not
reviewed in any way, and can be downloaded very shortly after submission. This
unrestricted model is often called the “wild west” [1]. In this paradigm, more
responsibility is placed on the user to screen an app and determine whether it
is safe and whether it meets his needs. This can be problematic as the app may
be new, and the user likely has little insight into how the app works.
No criteria is given as to how these apps should be scored, so the decision is
left completely to the user. Users may also leave comments regarding the app.
Many of these indicate the user’s feelings toward the quality of the app, often
advising others on whether the app is worth downloading.
With such a large presence of apps and markets, and their ever-increasing
popularity, the question of how to determine the quality of an app and its marketplace is one that warrants consideration.
In this paper we evaluate mobile applications extracting a set of code quality
features in order to investigate the user experience evolution in Android environment.
Diﬀerent authors propose metrics for evaluating the user experience on
mobile devices. Authors in [2] apply a set of features with the aim at understanding whether there are diﬀerences in terms of usability between trusted and
malware mobile samples. Researchers in [3] developed a platform named ATE
for supporting design of UX tests. User perception is obtained by measuring
the smartphone’s resources (i.e. time, memory occupation). The main diﬀerence
with these works is that both of the papers do not address the problem of UX
evaluation through code quality metrics.
The paper poses the following research question:
– RQ: is there a diﬀerence in user experience among mobile applications developed in diﬀerent years?
The paper is organized as follows: the next section describes the methodology,
the third section discusses the results and, ﬁnally, conclusion and future works
are given in the last section.

2

The Methodology

In this section we explain the rationale behind the features set we extracted and
the approach we used to extract them.
We deﬁned a set of features for evaluating the quality of Android mobile
applications from users’ side, i.e. user experience (UX) related.
We consider following categories with their respective features:
– Separation of View and Controller : we compute these features in order to
assess the conformance of each app to the model-view-controller (MVC) architectural pattern. In this pattern, commonly used for user-interface software,

Exploring Mobile User Experience Through Code Quality Metrics

707

there are three components. The model contains the data for the application.
A view presents something to the user, generally based on the model, and
allows interaction. Finally, the controller implements the program logic, mediating the interaction between the view and the model. Android development is
based around this pattern. Activities are the controllers, plain old Java objects
(POJO) comprise the model, and the previously discussed mechanisms (XML
or Java) comprise the view. A fundamental tenet of the pattern is that views
should not be deﬁned in controllers. We compiled a list of view objects from
the Android.widget package2 , and used it to determine where view objects
were deﬁned.
In this category fall following metrics:
• SV C1: this feature represents the number of views in controllers;
• SV C2: this feature represents the number of views not in controllers;
• SV C3: this feature represents the percentage of views deﬁned outside of
controllers.
While the previous features are related to software quality, the following categories directly inﬂuence the user’s experience. We detail each of these features.
– Unchecked Bundles. On any given day, one in ﬁve Android users will experience a crash, and up to half will uninstall the oﬀending app3 . Research has
shown that the most common reason for app crashes is the occurrence of
a NullPointerException [4]. Many of these are left uncaught and thus cause
crashes. The NullPointerException is often manifested in the implementation
of the application lifecycle. In order to pass data among views, the developer
can use a custom object, the Intent, to wrap data. The developer must call
the putExtras() method of the Intent class, which creates a Bundle, another
custom object, to wrap the data. From the receiving end, there is a class-level
Intent. A Bundle can accessed from the Intent using the getExtras() method.
In some cases this method might return null4 , meaning that if left unchecked
it may lead to a crash. Thus, the developer should handle the possibility of
the exception being thrown when using Bundles from Intents to prevent the
app from crashing. This can be accomplished using a try-catch block or an if
statement. We counted the number of Bundles in each application.
In this category the following features fall:
• UB1: this feature counts the number of Bundles in each application;
• UB2: this feature counts the number of Checked Bundles in each application;
• UB3: this feature counts the number of Unchecked Bundles in each application.
– Token Exceptions. In Android development, the Context object is used to store
information regarding, as expected, the current context of the application5 .
2
3
4
5

https://developer.android.com/reference/android/widget/package-summary.html.
http://visual.ly/android-errors-real-time?utm source=visually embed.
https://developer.android.com/reference/android/content/Intent.html.
https://developer.android.com/reference/android/content/Context.html.

708

G. Canfora et al.

This is necessary to perform certain actions, such as showing a notiﬁcation. In order to show a dialog, the context of an Activity must be used.
Using any other context, such as that of a service, will throw a WindowManager.BadTokenException6 . We counted the number of dialogs shown from
classes other than activities, which might lead to crashes.
This category includes the following metric:
• TE: this feature represents the number of potential bad token exception.
– Number of Fragments. The Fragment was added to the Android SDK level
11. It represents a behavior or a portion of user-interface in an Activity. This
can be combined to create a fragmented interface7 . Since these are new to the
platform, we determined that they might be useful in determining quality, as
developers may have trouble adjusting to new techniques.
In this category falls following metric:
• NF: this feature counts the number of fragments in each application.
In order to study the evolution of UX, relying on the assumption for which
apps presenting a previous packaging date have been developed earlier than apps
presenting a subsequent packaging date, (i) we grouped together apps presenting
same packaging year, and (ii) for each resulting group we computed the correspondent set of descriptive statistic indicators for all the features we computed,
in order to describe the evolution trends of each feature belonging to the dataset.
Speciﬁcally, we computed for each feature: (i) Minimum, (ii) 1st Quartile, (iii)
Median, (iv) 3rd Quartile, (v) Maximum, and (vi) Mean.

3

Evaluation

We designed an experiment [5,6] in order to evaluate the eﬀectiveness of the
proposed feature set, expressed through the research question RQ stated in the
introduction. More speciﬁcally, the evaluation is aimed at verifying whether the
eight features are able to highlight diﬀerences from the UX point of view in
Android mobile applications developed in diﬀerent years.
We conducted measurements on a dataset containing 15,426 Android trusted
apps downloaded from the Google’s oﬃcial app store8 . We identiﬁed Google Play
as a target for our study for several reasons. This is by far the largest and most
used source for Android applications, with over a million apps in July 2013 and
was most recently placed at 2 million apps in February 20169 . The majority of
devices sold are compatible and come with the market preinstalled, making it the
most accessible. In fact, using other markets requires the explicit permission of
the user. In order to download applications we crawled the Google Play market
using an open-source crawler10 . The obtained dataset includes samples belonging
6
7
8
9
10

https://possiblemobile.com/2013/06/context/.
https://developer.android.com/guide/components/fragments.html.
https://play.google.com/store.
http://www.statista.com/statistics/266210/.
https://github.com/liato/android-market-api-py.

Exploring Mobile User Experience Through Code Quality Metrics

709

to all diﬀerent categories available on the market. The crawler is conﬁgured to
equally download applications from the diﬀerent categories of apps.
The .apk ﬁle of an Android application is basically a zip archive containing all
the resources an application needs to run, such as the application binary code and
images. Using this package format all the ﬁles that makes an application go from
the computer of the developer to end user devices without any modiﬁcation. In
particular, all metadata of the ﬁles contained in the .apk package, for instance the
last modiﬁcation date, are preserved. All bytecode, representing the application
binary code, is assembled into a classes.dex ﬁle that is produced at packagingtime. Thus the last modiﬁcation date of this ﬁle represents the packaging time.
We retrieve the date of the compilation at which the Dalvik bytecode was produced in order to infer the historical distribution of the samples in our dataset.
Speciﬁcally, we retrieve application packaging in: 2008 (178 samples), 2009 (143
samples), 2010 (167 samples), 2011 (629 samples), 2012 (1548 samples), 2013
(3640 samples), 2014 (2448 samples), 2015 (6423 samples), 2016 (250 samples).
We provide a comparison of descriptive statistics of the populations of
applications.
Figure 1 shows the boxplots related to the features SCV 1, SCV 2, SCV 3,
U B1, U B2, U B3, T E and N F (from left to right).
The boxplots related to SCV 1 feature show that the value of the views in
controller is very similar between apps developed in diﬀerent years. The implementation of a view in a controller is considered a bad programming practice,
the boxplot suggests that this practice is quite diﬀused in Android programming
indiﬀerently from the packaging year. The boxplots related to the SV C2 feature,
i.e. the number of views that are not in the controller, is symptomatic of the fact
that Android programmers employ frameworks for developing applications, especially in the more recent year (i.e., 2016): this is reﬂecting the increasing trend
in the SCV 2 feature. The trend is also considerable in 2009 but, as explained
by previous boxplots, developers anyway make use of view embedded into controllers. The boxplots related to the SV C3 feature, i.e. the percentage of views
deﬁned outside of the controllers, conﬁrm the trend we highlighted in discussing
SV C2: Android developers take into account the best practices of software engineering when they design the application, as matter of fact newer and older
mobile applications present the 100 % of percentage of view correctly deﬁned
outside of controllers.
Bundles are generally used for passing data between various Android activities, this is the reason why a not handled bundle may cause NullPointerException whether expected data are not returned. Boxplots show that when the
packaging year is increasing, the number of bundled employ by the application is
increasing: this may happen because mobile applications are quickly increasing
in complexity and functionality to oﬀer to end users. Applications developed in
2010, 2015 and 2016 exhibit an overusing of bundles if compared with previous
packaging years, as demonstrated by boxplots related to the U B1 feature, i.e.
the number of Bundles in each application.
Boxplots related to the U B2 feature, i.e. the number of Checked Bundles
in each application show that developer do not take into account that the

710

G. Canfora et al.

Fig. 1. Box plots for the full features set.

application can cause a NullPointException due for a Bundle, i.e. the usage
of check related to Bundle is very limited and it is almost similar between diﬀerent packaging years. The boxplots related to the U B3 feature, i.e. the number of
Unchecked Bundles in each application, reﬂect the results of the U B2 feature: we
denote an increment of Unchecked Bundles in newer Android applications. With
the increase of functionality and the usage of web-services required by new applications developers have to employ more and more Bundles in order to exchange
data between activities: when developers do not handle bundle exception, they
make their applications sensitive to unpredictable crashes due to NullPointerException. This result has a dramatic impact on UX, because the application may
crash at any time unnoticed by the user.

Exploring Mobile User Experience Through Code Quality Metrics

711

To show a dialog, the context of an Activity must be invoked. Using
any other context, such as that of a service, will throw a WindowManager.BadTokenException. The boxplots boxplots related to the T E feature, i.e.
the number of potential token exception in each application, do not exhibit an
increment between diﬀerent packaging years: this happens because developer do
not increasing employ dialogs shown from classes other than activities, that may
cause unexpected crashes.
Fragments represent a behavior or a portion of user-interface in an Activity.
This can be combined to create a fragmented interface. Basically a fragment is a
modular section of an activity, which has its own lifecycle, receives its own input
events: it is a sort of “sub activity” that it is possible to reuse in diﬀerent activities. The boxplots related to the N F feature, i.e. the number of fragments in each
application, show that the trend increases when the packaging year increases.
This happens for a two-fold motivation: the ﬁrst one is that developers learn
quickly how to use enhancements provided by new SDK version, but on the
other side they don’t care of software quality. As matter of fact, Fragments were
created to embed small user-interface portion in an Activity but they do not
exceed with using them. Indeed, an extensive use of Fragments is not in line
with the Separation of View and Controller.
Table 1 shows the results obtained for each median indicator we considered
in the study.
Table 1. Median indicator for each feature we computed for the packaging year.
Feature 2008
SCV 1

49

2009
137

2010
73

2011
82

2012
81

2013
86

2014
73

2015
81

2016
90

SCV 2

711

1327

810

755,5

924

1010

1099

1178

1487

SCV 3

95,51

91,87

87,08

90,56

88,79

89,95

91,82

91,56

91,14

U B1

0

22

10

10

10

13

11

15

10

U B2

0

8

1

4

4

6

5

6

6

U B3

0

11

4

5

4

6

5

9

4

TE

0

1

2

2

1

2

2

1

0

NF

345

764

348

378,5

404

512

490

548

628

From results we denote that SV C2 median feature exhibits greater values
if compared with the SV C1 median i.e., the number of views not in controllers
exceed the number of views in controllers. From the software engineering side
this is a result representative that developer take into account MCV pattern
when they developer mobile applications, and from the UX ones, being the view
not coupled from business logic, the application will be easily adaptable by the
developer to run on a plethora of devices with diﬀerent screen size with the
same UX level from the user point of view. The SV C3 feature median, i.e. the

712

G. Canfora et al.

percentage of views deﬁned outside of controllers, conﬁrms this software quality
positive trend.
The U B1, U B2 and U B3 features median exhibits that recent mobile application make less use of Bundles if compared with previous developed ones. Considering that a non handled bundle can cause a NullPointerException and the
consequently crash of the application with the correspondent lose of data, we
conclude from this analysis that developer take into account to caught exceptions
from Bundle, in order to guarantee an high UX to the users.
Relatively to T E feature median, the table shows that the number of potential to token exceptions in each application is almost the same between diﬀerent Android application. Developers are aware that this may cause unexpected
crashes and they avoid to use them, for the beneﬁt of either software quality and
UX. The N F feature median increments occur in recent packaging years. The
excessive usage of Fragment is a bad programming practice, because it pushes to
include small portions of graphics interface into business logic classes. As matter
of fact, an extensive use of Fragments, merging view and controllers, is resulting
in a consequently poor software quality and UX.

4

Conclusions and Future Works

In this paper, we extract from 15,426 applications a set of features in order to
evaluate the quality of user experience in the software for mobile devices.
Results obtained show that Android mobile applications present several indicators that make us think that the developers do not care about the quality of
user experience in a systematic way. Future works concern the evaluation of
the review mechanism on Google Play to verify whether the user reviews are
coherent with the software quality and user experience.

References
1. Higa, D.: Walled gardens versus the wild west. Computer 41(10), 102–105 (2008)
2. Mercaldo, F., Visaggio, C.A.: Evaluating mobile malware by extracting user
experience-based features. In: Abrahamsson, P., Corral, L., Oivo, M., Russo, B.
(eds.) PROFES 2015. LNCS, vol. 9459, pp. 497–512. Springer, Heidelberg (2015).
doi:10.1007/978-3-319-26844-6 37
3. Canfora, G., Mercaldo, F., Visaggio, C.A., DAngelo, M., Furno, A., Manganelli,
C.: A case study of automating user experience-oriented performance testing on
smartphones. In: 2013 IEEE Sixth International Conference on Software Testing,
Veriﬁcation and Validation, pp. 66–69. IEEE (2013)
4. Kechagia, M., Spinellis, D.: Undocumented and unchecked: exceptions that spell
trouble. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp. 312–315. ACM (2014)
5. Carver, J.C., Juristo Juzgado, N., Baldassarre, M.T., Vegas Hernández, S.: Replications of software engineering experiments. Empirical Softw. Eng. 19(2), 267–276
(2014)
6. Ardimento, P., Caivano, D., Cimitile, M., Visaggio, G.: Empirical investigation of
the eﬃcacy and eﬃciency of tools for transferring software engineering knowledge.
J. Inf. Knowl. Manage. 7(03), 197–207 (2008)

Early Usability in Model-Driven Game
Development
Silvia Abrahão(&), Emilio Insfran, José Ángel Carsí,
and Adrián Fernandez
DSIC – Universitat Politècnica de València, Camino de Vera s/n,
46022 Valencia, Spain
{sabrahao,einsfran,jcarsi,afernandez}@dsic.upv.es
Abstract. Video games are one of the most influential types of software today.
However, they are becoming more and more complex to design and evaluate. In
this context, Model-Driven Development approaches seem to be very promising
since a video game can be obtained by transforming platform-independent
models into platform-speciﬁc models that can be in turn transformed into code.
In previous work, we deﬁned a usability evaluation method speciﬁcally tailored
to this type of development process. This paper shows how this method can be
used to evaluate the usability of video games in early stages of a model-driven
development process. The method relies on a model that decomposes usability
into measurable attributes and metrics speciﬁc for the video game domain. To
show the feasibility of the method, we performed an early usability evaluation of
a video game for the XBOX360 platform.
Keywords: Video game

 Usability inspection  Model-Driven Development

1 Introduction
Video games are one of the most influential types of software today. Games are being
used for far more than entertainment, with applications including social marketing,
education, and specialist training. The video game development industry is a strong
economic sector that deals with the development of highly interactive software, i.e.,
video games, for a wide variety of technology platforms.
The interaction between the game and the players is a critical factor in the success
of a video game. Usability and playability are considered to be the most important
quality factors of video games [10]. Usability is deﬁned as the degree to which the
video game can be understood, learned, used and is attractive to the user, when used
under speciﬁed conditions [6]. Playability is deﬁned as a collection of criteria with
which to evaluate a product’s gameplay or interaction [7]. Playability is often evaluated
by using early prototypes and iterative cycles of playtesting during the entire video
game development cycle. However, the evaluation of usability in current video game

This research work is funded by the Value@Cloud project (MINECO TIN2013-46300-R).
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 713–722, 2016.
DOI: 10.1007/978-3-319-49094-6_60

714

S. Abrahão et al.

development practices is often deferred to late stages in the game development cycle,
thus signifying that usability problems from early stages may be propagated to late
stages of the development, and consequently making their detection and correction a
very expensive task. The state of the art for game development in software engineering
has been summarized in a systematic literature review [1]. The results of this review
show a signiﬁcant lack of studies in the key dimensions of video game quality:
playability and usability.
Today there is no common agreement what kind of usability evaluation methods
can and should be used to enhance the design of games [3]. Traditional video game
development approaches do not take full advantage of a usability evaluation of the
game design artifacts that are produced during the early stages of the development.
These intermediate artifacts (e.g., screen mock-ups or screen flow diagrams) are used to
guide game developers but not to perform usability evaluations.
This problem may be alleviated by using a Model-Driven Development
(MDD) approach due to its intrinsic traceability mechanisms that are established by the
transformation processes. Platform-independent models (PIM) or platform-speciﬁc
models (PSM) can be evaluated during the early stages of video game development to
identify and correct some of the usability problems prior to the generation of the source
code of the ﬁnal video game application. We are aware that not all the usability
problems can be detected based on the evaluation of models since they are limited by
their own expressiveness and, most important, they may not predict the user behavior
and preferences. However, the use of inspection methods for detecting usability
problems can be complemented with other evaluations performed with end-users. In
previous work, we deﬁned a usability evaluation method speciﬁcally tailored to
model-driven video game development. In this paper, we show how this method can be
used to evaluate the usability of a video game for the XBOX360 platform.
This paper is organized as follows. Section 2 discusses our usability inspection
method. Section 3 describes a strategy to apply this method for performing early
usability evaluations in model-driven video game development. Section 4 presents the
evaluation of a speciﬁc game to illustrate the inspection method. Finally, Sect. 5
presents our conclusions and further work.

2 Usability Inspection Method
Our usability inspection method relies on a Video Game Usability Model [5] which
contains a set of usability attributes and measures that can be applied by the designer in
the following phases of a MDD development process: (i) in the PIM, to assess different
models that specify the video game application independently of platform details
(e.g., screen flow diagrams, screen mock-ups, screen navigation diagrams); (ii) in the
PSM, to assess the concrete design models related to a speciﬁc platform (if they exist);
and (iii) in the code model, to assess the generated video game.
The goal of this usability model is to extend the Software Quality Model proposed
in the ISO/IEC 25010 (SQuaRE), speciﬁcally the usability characteristic, for specifying, measuring, and evaluating the usability of video games that are produced
throughout a model-driven development process from the end-users perspective.

Early Usability in Model-Driven Game Development

715

The SQuaRE states that the usability of a software product can be decomposed into
the following sub-characteristics: Appropriateness Recognisability, which refers to how
the software product enables users to recognize whether the software is appropriate for
their needs; Learnability, which refers to how the software product enables users to
learn its application; Ease of Use, which refers to how the software product makes it
easy for users to operate and control it; Helpfulness, which refers to how the software
product provides help when users need assistance; Technical Accessibility, which refers
to how the software product provides help when users need assistance; and Attractiveness, which refers to how appealing the software product is to the user.
Since these sub-characteristics are too abstract we decompose them into usability
attributes and measures as showed in the second and third column of Table 1. The
attributes have been deﬁned by considering and adapting ergonomic criteria for user
interfaces [2] as well as knowledge from other domains such as Web development [4],
and the underlying usability principles from game development [8]. The measures are
generic to ensure that they could be operationalized in different software artifacts from
any model-driven video game development method.
It is worth to mention that we cannot guarantee that our usability model covers all
the possible usability attributes for the video game domain. Our model is an attempt to
operationalize subjective heuristics, usability guidelines and recommendations into
usability attributes that can be quantiﬁed by means of measures.

3 Evaluation Process
In order to apply the Video Game Usability Model to a speciﬁc model-driven video
game development, we follow a usability evaluation process. A typical video game
development process consists in the following activities: requirements speciﬁcation,
game design, implementation, and playtesting, along with the usability evaluation. The
usability evaluation is conducted by applying the following steps:
1. The establishment of evaluation requirements. Evaluation proﬁles are chosen
to specify which model-driven game development method is employed, which type
of video game is developed, what the target technological platform is, and at which
target players the game is aimed. Given a speciﬁc model-driven game development
method, software artifacts (models) and usability attributes from the Video Game
Usability Model are selected to perform early usability evaluations. The measures
associated with the selected attributes are instantiated for a speciﬁc software artifact
of the game development method, and the thresholds for the measures are
established.
2. Early usability evaluation. Each selected game software artifact is evaluated
with a set of measures. Each measure returns a numeric value within a speciﬁc
threshold that indicates whether there is a usability problem in the video game.
A usability report is generated with the usability problem details and suggestions to
solve them.
3. Usability evaluation in-use. Even when early usability evaluation is performed
on models, the game should also be further evaluated with end-users in speciﬁc

716

S. Abrahão et al.

Table 1. Decomposition of the SQuaRE into measurable attributes and generic measures
Sub-characteristics
Appropriateness
Recognisability

Learnability

Attributes
Visibility
Interface Simplicity
Control Simplicity
Consistency
Feedback

Tutorial Support
Ease of Use

Control Consistency
Internal Navigational
Simplicity
External Navigational
Simplicity

Helpfulness

Hint Support
Goal Support

Technical
Accessibility

Attractiveness

Subtitle Support

Magniﬁer Support
Customization
Wait Reduction

Measures
Percentage of screen usage
Total number of GUI elements
Total number of control mappings
Ratio of similitude between screens
Total number of GUI elements
displaying state changes
Ratio of GUI elements highlighting
state changes
Ratio of meaningful messages
Tutorial interactivity
Tutorial coverage
Ratio of similitude between colliding
game actions
Internal menu navigation depth
Internal menu navigation breadth
Shortest Path To Gameplay
Shortest Path To Exit
Shortest Return Path To Gameplay
Availability of hints
Hint understandability
Goal visibility
Goal understandability
Availability of subtitles
Subtitle support for hearing impaired
players
Subtitle style differentiation
Subtitle resize support
Control remapping
Interface customization
Inactive wait
Skip capability of non-interactive
content

contexts of use. Since this paper focuses on early usability evaluation in
model-driven development, usability evaluation in-use is not within the scope of
this work.
After usability evaluations, game designers should perform changes to the models in
order to solve the usability problems. Early usability problems detected in the game
design can be corrected in each model of the corresponding development stage (e.g.,
PIM, PSM) prior to the code generation.

Early Usability in Model-Driven Game Development

717

4 Applying the Usability Inspection Method
The usability model was applied to a 2D ﬁghting game for the XBOX 360, which is
similar to the commercial Capcom’s Street Fighter V™ for the same platform.
4.1

Model-Driven Video Game Development

Model-driven video game development [9] is a game development methodology that
focuses on deﬁning platform-independent models which provide a precise high-level
speciﬁcation of the gameplay, control, and graphical user interface of the video game
under development. We focus only on the following platform-independent models that
offer the most suitable modeling primitives for usability evaluation:
Screen Navigation Diagram. Video games display visual information on different
game screens through which players can navigate. Figure 1 shows the screen navigation metamodel. A screen navigation diagram can be speciﬁed by using screen nodes
and screen transitions.
A game screen represents a game state in the screen navigation. Two special screen
nodes denote the initial and ﬁnal states that deﬁne the screens on which a video game
starts and ends. Screen transitions represent a change of state in the screen navigation,
i.e., moving from one screen to another. Screen transitions are triggered by screen
events such as control interactions, time, or rule executions.
Screen Layout Diagram. When the flow of screens is clearly deﬁned in a navigation
diagram, each game screen GUI should be further speciﬁed by using a screen layout
diagram. Figure 2 shows the screen layout metamodel. A screen layout diagram can be
speciﬁed by different GUI display primitives that can be positioned and sized on the
screen. These primitives provide a visual representation of a game attribute which is
previously deﬁned in the gameplay perspective. There are four types of GUI display
primitives: numeric containers and textual containers which represent information as
plain numbers or text, image containers which represent information using 2D images

Fig. 1. Excerpt of the Screen Navigation metamodel

718

S. Abrahão et al.

Fig. 2. Excerpt of the Screen Layout metamodel

or animations, and progress containers which represent the progress of information as
a relative percentage of a colored bar or a succession of small icons.
Control Mapping Diagram. A game control mapping deﬁnes how players interact
with controller devices to communicate with the game. Figure 3 shows the control
mapping metamodel. A controller is a device that players use to communicate with the
game. Controllers are made up of smaller control elements such as keys, buttons,
joysticks and triggers that players use to communicate atomic game interactions.
Control element interactions such as pressing or releasing a button, moving a joystick,
or pulling a trigger, activate the speciﬁc action rules of a player’s character. A control
mapping diagram speciﬁes which control elements and interactions are associated with
gameplay actions.
4.2

Establishment of the Usability Evaluation Requirements

The evaluation proﬁle of the 2D ﬁghting game example used is as follows:
• Game development method: the game is designed by using a model-driven
development method. The software artifacts involved in the usability evaluation are
the screen navigation, the screen layout and the control mapping diagrams.
• Type of video game: the game belongs to the 2D ﬁghting genre.

Fig. 3. Excerpt of the Control Mapping metamodel

Early Usability in Model-Driven Game Development

719

• Target technological platform: the game is developed for the XBOX 360.
• Target audience: the game, like most 2D ﬁghting games, is targeted at a hardcore
audience of players who have a great deal of previous experience in games of the
same genre, and who thus know and expect certain common genre conventions.
Table 2. Operationalized measures for the case study
Measure
Attribute
Artifact
Operationalization

Formula
Thresholds

Measure
Attribute
Artifact
Operationalization
Formula
Thresholds

Measure
Attribute
Artifact
Operationalization
Formula
Thresholds

Percentage of Screen Usage (PSU)
Appropriateness Recognisability/Visibility
Screen Layout Diagram (PIM)
Each display primitive of the Screen Layout Diagram has attributes for
its width and height. The screen meta-class also has attributes for its
width and height. Both the primitive display size and the screen size can
be deﬁned as the product of their width and height
PSU = (Sum of all display primitives width x height) / (screen width x
height)
The XBOX 360 is typically played on a high-resolution TV, which
beneﬁts visibility. Hardcore players are also well trained in the speciﬁc
genre conventions of 2D ﬁghting games, thus minimizing the space
needed to convey the game’s visual information.
Critical Usability Problem: [PSU > 0.5] Low Usability Problem:
[0.1 < PSU ≤ 0.2]
Medium Usability Problem: [0.2 < PSU ≤ 0.5] No Usability Problem:
[PSU ≤ 0.1]
Total Number of GUI Elements (TNGUIE)
Appropriateness Recognisability/Interface Simplicity
Screen Layout Diagram (PIM)
Each display primitive of the Screen Layout Diagram is a GUI of the
screen
TNGUIE = Sum of all display primitives of the Screen Layout Diagram
Hardcore players are experienced in the genre conventions of 2D
ﬁghting games and they expect their typical interface layout, with a
number of GUI elements between 3 and 10.
Critical Usability Problem: [TNGUIE > 10] Low Usability Problem:
[3 < TNGUIE ≤ 5]
Medium Usability Problem: [5 < TNGUIE ≤ 10] No Usability
Problem: [0 ≤ TNGUIE ≤ 3]
Shortest Return Path To Gameplay (SRPTG)
Ease of Use/External Navigational Simplicity
Screen Navigation Diagram (PIM)
Each screen primitive of the Screen Navigation Diagram can be
associated with a game screen.
SRPTG = minimum number of transitions between the game over
screen node and the gameplay screen node.
Hardcore players value immediateness of menu interfaces.
Medium Usability Problem: [SRPTG > 2] No Usability Problem:
[0 ≤ SRPTG ≤ 2]

720

S. Abrahão et al.

For the sake of simplicity, only two usability sub-characteristics were evaluated:
Appropriateness Recognisability and Ease of Use. The selected attributes were Visibility, Interface Simplicity, and External Navigation Simplicity, whose associated
measures are shown in Sect. 2. The operationalization of the aforementioned measures
are presented below in Table 2. Note that the measure thresholds are deﬁned in
accordance with speciﬁc information from the evaluation proﬁle for the video game
evaluated in this case study. We gather this information from game developers but we
are aware that these values need to be empirically validated to determine which
thresholds are more appropriate for pre-deﬁned evaluation proﬁles.
4.3

Early Usability Evaluation of Software Artifacts

Figure 4 shows the main GUI elements of the game rendered in a Screen Layout
Diagram. Figure 5 shows the game screens and transitions in a Screen Flow Diagram.
The Control Mapping Diagram is not rendered as a ﬁgure for the sake of simplicity: the
game uses a 2-dimensional control element (the left thumbstick associated with character movement), and six 1-dimensional control elements (the buttons associated with
light/medium/strong punches and kicks).
With regard to the Screen Layout Diagram (see Fig. 4), we apply the two speciﬁc
measures shown for this artifact in Sect. 4.2 in order to evaluate the Visibility and
Interface Simplicity of the video game. By applying the Percentage of Screen Usage
formula we obtain PSU = 0.09 (by dividing the sum of the size of all the display
primitives by the screen size). This indicates that there is no usability problem related to
the Visibility attribute since PSU is in the threshold [PSU ≤ 0.1]. By applying the Total
Number of GUI Elements formula we obtain TNGUIE = 13 (by counting all the
display primitives in the diagram), which leads to a critical usability problem related to
the Interface Simplicity attribute since the value obtained is [TNGUIE > 10]. Table 3
shows the usability report associated to this usability problem (UP001).

Fig. 4. Street Fighter V screenshot and the corresponding Screen Layout Diagram

Fig. 5. Street Fighter V Screen Flow Diagram

Early Usability in Model-Driven Game Development

721

Table 3. Usability report for usability problem UP001
ID
Description
Affected attribute
Severity level
Artifact evaluated
Problem source
Recommendations

UP001
There are too many GUI Elements on the same game screen.
Appropriateness Recognisability/Interface Simplicity
Critical [TNGUIE = 13 > 10]
Screen Layout Diagram
Screen Layout Diagram
Collapse GUI elements that render the same information, such as the
image and the text container that portray the ﬁghter portrait and name
Table 4. Usability report for usability problem UP002

ID
Description
Affected attribute
Severity level
Artifact evaluated
Problem source
Recommendations

UP002
Players need to navigate through several screens to restart the game
Ease of Use/External Navigational Simplicity
Medium [SRPTG = 4 > 2]
Screen Flow Diagram
Screen Flow Diagram
Add a shortcut (e.g., retry) from the game-over screen to the gameplay
screen

With regard to the Screen Flow Diagram (see Fig. 5), we can use up to three speciﬁc
measures in order to evaluate the External Navigation Simplicity attribute belonging to the
video game’s Ease of Use sub-characteristic. In this example, we use the Shortest Return
Path To Gameplay formula obtaining a SRPTG = 4 (by counting the screen transitions
from gameplay, result, and vs screens), which leads to a medium usability problem related
to the External Navigation Simplicity, since the value obtained is in the threshold
[SRPTG > 2]. Table 4 presents this usability problem (UP002).
After applying the measures, we can conclude with regard to the Appropriateness
Recognisability sub-characteristic that the video game has poor Interface Simplicity but
very good Visibility, i.e., the game has a complex interface but effectively manages to
keep gameplay visible. With regard to the Ease of Use sub-characteristic, we realize
that the video game has poor External Navigational Simplicity, i.e., the game has a
complex flow of screens which makes it difﬁcult to restart the game.

5 Conclusions
We presented a usability inspection method that can be used in early stages of
model-driven video game development. The method relies on a usability model that has
been developed speciﬁcally for the video game domain. This model is aligned with the
SQuaRE standard and allows the evaluation of the usability of video games developed
according to a model-driven development process. Usability is considered throughout
the entire game development, thus enabling a more usable video game to be developed
and thereby reducing effort during the maintenance stage.

722

S. Abrahão et al.

The inherent features of model-driven development provide a suitable context in which
to perform usability evaluations since usability problems that may appear in the ﬁnal
application can be detected and corrected at the model level. Although the proposed usability
inspection method has been operationalized to a speciﬁc model-driven development process
that has been used to develop a video game for the XBOX360 platform, it can also be
integrated into other model-driven video game development processes by establishing the
relationships between the generic measures from the usability model and the modeling
primitives of the different software artifacts of the selected development process.
Nevertheless, we are aware that a further comparison with the users’ perception on
the usability of a video game obtained by applying a model-driven development
process is needed. For this reason, we are currently performing an empirical study to
compare the predicted usability of two video games measured using a set of metrics
from the Usability Model with the perceived usability of these video games.
Future work includes the application of the method to industrial case studies, the
deﬁnition of aggregation mechanisms for combining the values obtained from individual measures into usability indicators. We also plan to empirically validate the
effectiveness of the proposed usability inspection method by means of controlled
experiments in which the results of the evaluations obtained at the model level will be
compared to the ones obtained when players interact with the generated video game.

References
1. Ampatzoglou, A., Stamelos, I.: Software engineering research for computer games: a
systematic review. Inf. Softw. Technol. 52(9), 888–901 (2010). doi:10.1016/j.infsof.2010.
05.004. ISSN:0950-5849
2. Bastien, J.M., Scapin, D.L.: Ergonomic Criteria for the Evaluation of Human-Computer
Interfaces, version 2.1 (1993)
3. Bernhaupt, R., Eckschlager, M., Tscheligi, M.: Methods for evaluating games: how to measure
usability and user experience in games? In: Proceedings of the International Conference on
Advances in Computer Entertainment Technology, pp. 309–310. ACM Press, Salzburg (2007)
4. Calero, C., Ruiz, J., Piattini, M.: Classifying web metrics using the web quality model.
Emerald Group Publishing Limited 29(3), 227–248 (2005)
5. Fernandez, A., Insfran, E., Abrahão S., Carsí J.A., Montero, E.: Integrating usability
evaluation into model-driven video game development. In: 4th International Conference on
Human-Centered Software Engineering (HCSE), Toulouse, France (2012)
6. ISO/IEC 25010: Systems and software engineering, Systems and software Quality
Requirements and Evaluation (SQuaRE), System and software quality models (2011)
7. Järvinen, A., Heliö, S., Mäyrä, F.: Communication and Community in Digital Entertainment
Services. Prestudy Research Report, Hypermedia Laboratory, University of Tampere,
Tampere (2002). http://tampub.uta.ﬁ/tup/951-44-5432-4.pdf
8. Microsoft: Best Practices for Indie Games 3.1. http://create.msdn.com/en-US/education/
catalog/article/bestpractices_31
9. Montero, E., Carsí, J.A.: A platform-independent model for videogame gameplay
speciﬁcation. In: Digital Games Research Association Conference (DiGRA 2009), London,
UK (2009). http://www.digra.org/dl/db/09287.28003.pdf
10. Nacke, L.: From playability to a hierarchical game usability model. In: FuturePlay at Game
Developers Conference Canada, Vancouver, Canada (2009)

What Aspects of Context Should Be Described in Case
Studies About Software Teams? Preliminary Results
from a Mapping Study
Maria Teresa Baldassarre1 ✉ , César França2, and Fabio Q.B. da Silva3
(

)

1

2

Department of Informatics, University of Bari, Bari, Italy
mariateresa.baldassarre@uniba.it
Department of Statistics and Informatics, Federal Rural University of Pernambuco,
Recife, PE, Brazil
cesar@franssa.com
3
Center of Informatics, Federal University of Pernambuco, Recife, PE, Brazil
fabio@cin.ufpe.br

Abstract. In this article, we report the ﬁndings of a systematic literature mapping
study aimed at identifying contextual factors that should be described in case
studies about teams in software engineering. As a result, we identiﬁed 26 factors,
which we organized in ﬁve dimensions: characteristics of individuals, groups,
team processes, projects and organizations. These dimensions and factors can
guide future reports to present better descriptions of the context in which their
software teams are studied.

1

Introduction

Understanding how software teams work, and how best to design them is an issue of
high relevance for industry in general [1]. However, given the speciﬁcities of the soft‐
ware development activity, several theories from organizational behavior do not apply
directly in this ﬁeld [2], which has recently attracted attention and eﬀort from software
engineering researchers. However, given the large variety of soft factors and uncon‐
trollable variables involved in studies with human subjects, it has been challenging to
software engineering researchers to elaborate reliable and general theories or predictive
models on software development teamwork [3]. Recent secondary studies [3–5] point
out that the main weakness of those studies is that they still miss potentially relevant
contextual information. Enriching research reports with more contextual information
could bring many beneﬁts, such as increasing the strength of the presented evidences;
enhancing their transferability; and enabling future work to synthesize knowledge
following more systematic procedures, such as meta-ethnographies [6] and others [7].
In this research, we are interested in documenting the concrete characteristics of the
context that should be described in teamwork case study reports, to support the
improvement of this type of research. The currently available qualitative research
guidelines [8–11] are not very specific about that basically because they are designed

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 723–730, 2016.
DOI: 10.1007/978-3-319-49094-6_61

724

M.T. Baldassarre et al.

to be independent of the research subject. To this end, we designed a systematic liter‐
ature mapping study with a twofold complementary approach: first, we review papers
that contain methodological recommendations on what the characteristics of teams
should be included in rich descriptions of contexts in studies involving software engi‐
neering teams; second, we look at papers that conducted case studies with software
teams to map what characteristics they actually have reported.
This is yet an in progress research. In this short article we present the results of the
ﬁrst part of the study, which is a thematic synthesis made in the six papers, manually
selected out from the 4,915 reviewed titles published between 2008 and 2014 in relevant
journals and conferences. Our results uncovered 26 diﬀerent contextual factors that
should be described in studies on software teams. These factors are organized in ﬁve
dimensions: characteristics of individuals, of groups, of team processes, of projects and
of organization. Although these factors are of high relevance to this type of research,
several of them, particularly those related to individuals and groups, were only brieﬂy
mentioned in the previously existing checklists. This paper is organized as follow: in
Sect. 2, we brieﬂy detail the referential work that motivated our research; in Sect. 3 we
describe our mapping study protocol and its threats to validity; in Sect. 4 we present
preliminary results of this study, which a discussed in more details in Sect. 5, where we
also conclude and posit our future steps.

2

Background

According to Yin [12], a case study is deﬁned as “an empirical inquiry that investigates
a contemporary phenomenon in depth and within its real life context, especially when
the boundaries between the phenomenon and context are not clearly evident”. Dybå
et al. [11] deﬁne context as “the circumstances, conditions, situations, or environments
that are external to a speciﬁc phenomenon and that enable or constrain it”. However,
sometimes, the context interfere in, or interacts with, the phenomenon of interest. Thus,
case studies reports must provide in-depth description of the natural settings, oﬀering
rich, qualitative data, to enable researchers to draw appropriate interpretations of the
data. A rich description of the context also help case studies to improve their transfer‐
ability, making easier to readers or user to decide whether the ﬁndings are likely be
applied to other situations or not [13]. Dybå et al. [11] also distinguishes two types of
context: the discrete context refers to speciﬁc contextual variables, such as organization
size, or product complexity; and the omnibus context refers to a broad perspective,
drawing attention to who, what, when, where, and why involved in the activities under
observation. In order to make the context description in case studies reports more
straightforward, Petersen and Wohlin [10] proposed a checklist of 21 discrete charac‐
teristics of context, organized in six diﬀerent context facets: product, processes, practices
and techniques, people, organization, and market. They suggest, however, that this
checklist should be extended, and adapted according to the object of study and to the
consensus of the speciﬁc community. In this article, we are speciﬁcally interested in the
characteristics of the discrete context of case studies that address teamwork related
phenomena. According to Marks et al. [14] there are at least three approaches to

What Aspects of Context Should Be Described in Case Studies

725

characterize a team: describing the characteristics of the team members; describing the
properties of the team that vary as a function of team context, and the environment in
which the team is acting; and describing the processes through which the team works.
Those approaches are only partially covered in Petersen and Wohlin’s [10] checklist.

3

Review Protocol

We have taken a diﬀerent approach from Petersen and Wohlin [10]. While they have
proposed a checklist from scratch, we decided to carry out a systematic literature
mapping study on scientiﬁc papers addressing teamwork in software engineering
published in highly qualiﬁed venues, looking for methodological recommendations on
what the characteristics of context should be described in future reports. A mapping
study is deﬁned as a type of systematic literature review whose the aim is “to identify
all research related to a speciﬁc topic” either to answer broad research questions or to
picture trends in a given area [16, 19]. In this paper, we aim to answer the following
research question:
RQ1. What characteristics of the context should be described in case studies about
teams in software engineering?
To this end, we designed a mapping study protocol based on Kitchenham and Char‐
ters’ [17] guidelines. The thematic synthesis followed the recommendations of Cruzes
and Dybå [18]. Overall, our study was conducted in three main steps. First, we carried
out a manual search in the proceedings of the last seven years (2008–2014) of four
conferences (ICSE, ESEM, EASE, CHASE) and four journals (EMSE, IST, JSS, TSE).
We chose this time period because the main known guidelines for qualitative research
in software engineering data from 2008 [15]. This initial selection was based on reading
titles and abstracts, resulted in the list of 71 potentially relevant studies. That activity
was done independently by two researchers. Second, we read full texts, and applied
inclusion and exclusion criteria. We included both theoretical articles addressing case
studies or other qualitative approaches in software engineering research; and secondary
research reviewing case studies or other qualitative methods, presenting a detailed
discussion on how primary research software engineering teams or teamwork in soft‐
ware projects reports should improve. We excluded papers not written in English or not
available on the web as well as invited papers, keynotes, workshop reports, books, theses
and dissertations. We also excluded papers of computer science domain but clearly out
of the software engineering scope or papers that did not target software engineers, soft‐
ware engineering teams, or that did not relate to teamwork or software teams. Each paper
was judged by two researchers, independently, and conﬂicting opinions were individu‐
ally discussed and resolved. In the end, six papers were selected to analysis and synthesis.
In the third step, we extracted relevant segments of the papers, labeled them with repre‐
sentative codes. Two researchers read and analyzed all the six papers independently.
Then, these two researchers, together, grouped the labels in higher order categories,
following an iterative and incremental process. The research team was composed of four
researchers. The data management was conducted using Google Forms and worksheet

726

M.T. Baldassarre et al.

ﬁles in Microsoft Excel ™. A quality assessment was not carried out because we
assumed the papers to have high quality given the sources where they were published.

4

Results

In this section we present the results of the thematic synthesis conducted in the six
selected papers. The references are listed at the appendix in the end of this article. The
results are organized according to Marks et al. [14] dimensions in the subsections below,
and summarized in Table 1.
Table 1. Contextual factors

4.1 Characteristics of the Individuals
This dimension groups individual characteristics of the team members, which charac‐
terize the people in the team individually, and are independent of others. These charac‐
teristics are relevant, because as pointed out in P4:
“The success of a software development project depends on people and human
resource factors (…) Not only the experience and competency of the team members is
important, but also their personal characteristics such as honestly, collaborative attitude,
sense of responsibility, readiness to learn, and work with others are considered equally
important, if not more” (P4, p. 1872, Sect. 3.2).

What Aspects of Context Should Be Described in Case Studies

727

Thus, in addition to the basic demographics (age, education, gender, function, etc.)
it seems to be relevant to characterize the individuals in the teams in terms of their
technical and interpersonal skills [P1, P3, P4, P5]; perceptions that they hold about the
organization [P1, P2, P3, P5]; level of career experience and expertise [P1, P4, P5];
personality traits and behavioral patterns [P2, P4]; and individual interests [P5].
4.2 Characteristics of the Group
When the individuals are put together to work as a team, there are characteristics that
pertain to the collectivity. In this dimension, we sought to cluster these properties of the
group that only make sense when the team is formed. In this dimension, we identiﬁed
ﬁve categories: Group composition refers to how the group is balanced in terms of
combined individual characteristics (knowledge, expertise, personality, etc.) [P1, P2,
P3, P4, P6]; Group structure regards the size, functions, and power allocated to the team
members [P1, P3, P4]; Geographical dispersion describes the physical conﬁguration of
the participants [P3, P4, P6]; Management style describes formal and informal relation‐
ships established between the group and its leader [P2, P3]; and ﬁnally Team climate
describes the relationship between the individuals and the group, such as commitment
with others, collective identity or team spirit, motivation to work, cohesion, trust, etc.
[P1, P2, P3, P6].
4.3 Characteristics of the Team Processes
The dimension of Team Processes is intended as the way of doing things on behalf of
teams, or how teams work and operate, and what they do. According to P6: “Besides,
to understand the level of coordination practices, communication process and tech‐
nology should be described as well. While availability of collaboration technology and
its adoption level could be a mediate factor of team dispersion on perform” (P6, p. 136,
Sect. 5.1).
This dimension includes six characteristics: Customer relationship clariﬁes the type
of relationship that the team has with its customers as well as the role that a customer
play in the team [P1, P3, P4, P5]; communication dynamics relates to how team members
communicate among each other [P2, P3, P4, P5, P6] includes communication processes,
interactions of the various participants, members interaction and how they share work
and so on; management and coordination activities [P3, P5, P6] describes the types of
activities and procedures that management puts in practice during a project involving
software development teams such as devolution of project responsibility to project busi‐
ness owners, measure team performance to assure team is working on schedule and
within budget, adopt standard procedures, software development process [P1, P2, P3,
P4, P5, P6] details the characteristics of the processes teams adopt, some examples are:
amount of documentation, code ownership, development practice, coding standards,
planning game activity; researcher involvement [P1, P2, P3, P4, P5, P6] discusses how
an external researcher should coordinate in a development team, and working schedule
[P5] articulates the mechanisms that should characterize team members’ schedule such
as dynamic schedule, no overtime, 40-h working weeks.

728

M.T. Baldassarre et al.

4.4 Characteristics of the Environment
The Environment dimension was intended as inclusive of all those factors that charac‐
terize the context in which a project is carried out and where a team operates. As we
deﬁned the categories and coded the papers in our synthesis analysis we agreed on the
fact that this dimension actually was more appropriate to be divided into two dimensions:
Project and Organization as the categories coded ﬁt better. So, as it can be seen in
Table 1, Environment category has been divided into Project and Organization.
The ﬁrst comprises features that a project should have, just to give an example:
“Initial characterization of a software development project can play an important
role in shaping participant’s attitudes and actions in the development process.” (P5, p.
11, Sect. 5.2.1). It includes categories such as: stakeholders [P3, P5, P6]; project impor‐
tance and criticality [P3, P5]; project complexity and variability [P3, P5]; project dura‐
tion and size [P5].
The Organization dimension relates to the characteristics that the organization should
have with respect to software development teams [P2]. “Thus changing the culture of
an organization is not a trivial task, since it requires changes on members’ ways of
thinking, communicating, relating to each other, as well as consolidated habits and
working manners.” (P2, p. 1956, Sect. 3.)
This dimension includes categories such as: working facilities and office outline [P1,
P3, P5, P6] that define in some way how the office should be organized like logistical
arrangements, physical and temporal settings, spaces distribution and so on The organi‐
zational culture category includes details relating to aspects such as ways of thinking,
historical organizational practices, corporate culture [P1, P2, P3, P4, P5]; Business
model category includes information of the models adopted such as organization type,
setting of the industry or of the study carried out [P1, P5]; Human-resources related
policies includes aspects about rewarding system adopted for staff and company poli‐
cies related to the employees’ motivation [P2, P3]. Finally, organizational size and struc‐
ture provides insight on the dimensions of the company with codes on multiple inde‐
pendent teams, organization structure and size, as well as having projects with small
teams [P1, P3].

5

Discussion and Conclusion

This research was motivated by the fact that research reports and studies of software
development teams ignore potentially relevant information of the particular context in
which they are conducted, which may directly compromise the transferability of this
type of research [20]. Less obviously, it also impacts the ability of the whole ﬁeld to
accumulate knowledge, learning from comparative synthesis of diﬀerent studies [19].
Dybå et al. [11] warns that what counts as context depends on the substantive
problem under scrutiny, so it is not possible to draw checklists that are general enough.
Our checklist is looking speciﬁcally to the study of software development teams, but
even so the relevance of the elements in the list may vary. It is indeed a long list of
contextual factors. Petersen and Wohlin [10] suggest that the importance of the discrete

What Aspects of Context Should Be Described in Case Studies

729

checklist is to help the researchers to make informed decisions about what to describe
in their reports.
Compared to previous work carried out by other authors [10, 14] we have particularly
focused on the characteristics that address teamwork related phenomena. To this end,
our categories related to organization, processes and project are common to the ones in
Petersen and Wohlins’ checklist [10] labeled as organization, processes/practices and
market; while other Petersen and Wohlin categories such as product have not been
considered in our dimensions as they may be not central to teamwork features. Our
categories are more oriented to human aspects, rather than the general context. This
motivates the identiﬁcation of both individual and group categories in the sense that they
focus on team member characteristics as single elements and as components of a group
immersed in an environment where the team acts and works through speciﬁc processes.
The contextual factors listed in Table 1 are also of some importance to the software
engineering practice, because they reveal a variety of aspects that software engineers
must deal with, and that managers should care about, because they can somehow impact
the eﬀectiveness of the team [20]. It was not our objective to map how diﬀerent prop‐
erties aﬀect the eﬀectiveness of the team, but that information can be found in the six
papers that we analyzed (Appendix A). Dutra et al. recently presented a synthesis of
these factors [4].
Finally, this study is part of a larger eﬀort to document characteristics of the context
that should be described in teamwork studies. This work was limited to identify other
authors’ suggestions, but it is not yet an exhaustive list. In the next step, we will look at
what factors have actually been reported in these types of studies, and why. We hope
that, by enriching their context descriptions, in the future we will be able to address more
sophisticated questions such as what software engineering approaches work best for a
given type of team, or how a team should be assembled and organized, given a set of
characteristics of the environment.

A Appendix: List of Selected Papers
[P1] T. Dybå and T. Dingsøyr. “Empirical studies of agile software development: A
systematic review”. IST, 50 (9–10), pp. 833–859 2008.
[P2] C. Tolfo and R. Wazlawick. “The inﬂuence of organizational culture on the
adoption of extreme programming”. JSS, 81 (11), pp. 1955–1967 2008.
[P3] T. Chow and D. Cao. “A survey study of critical success factors in agile software
projects”. JSS, 81 (6), pp. 961–971 2008.
[P4] S. Misra, V. Kumar and U. Kumar. “Identifying some important success factors
in adopting agile software development practices”. JSS, 82 (11), pp. 1869–1890 2009.
[P5] L. Mcleod, S. Macdonell and B. Doolin. “Qualitative research on software devel‐
opment: a longitudinal case study methodology”. EMSE, 16 (4), pp. 430–459 2011.
[P6] N. Anh, D. Cruzes and R. Conradi. “Dispersion, coordination and performance
in global software teams: a systematic review”. ESEM, pp. 129–138 2012.

730

M.T. Baldassarre et al.

References
1. Sawyer, S.: Software development teams. Commun. ACM 47(12), 95–99 (2004)
2. Sharp, H., Badoo, N., Beecham, S., Hall, T., Robinson, H.: Models of motivation in software
engineering. IST 51, 219–233 (2009)
3. Moe, N., Dingsøyr, T., Dybå, T.: A teamwork model for understanding an agile team: a case
study of a Scrum project. IST 52(5), 480–491 (2010)
4. Dutra, A., Prikladnicki, R., França, C.: What do we know about high performance teams in
software engineering? Results from a systematic literature review. In: SEAA 2015
5. Ribeiro, D., Cardoso, M., da Silva, F., França, C.: Using qualitative meta summary to
synthesize empirical ﬁndings in literature reviews. In: ESEM 2014
6. da Silva, F., Cruz, S., Gouveia, T., Capretz, L.:. Using meta-ethnography to synthesize
research: a worked example of the relations between personality on software team processes.
In: ESEM 2013
7. Cruzes, D., Dybå, T.: Research synthesis in software engineering: a tertiary study. IST 53(5),
440–455 (2011)
8. Seaman, C.: Qualitative methods in empirical studies of software engineering. IEEE TSE
25(4), 557–572 (1999)
9. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software
engineering. EMSE 14, 131–164 (2009)
10. Petersen, K., Wohlin, C.: Context in industrial software engineering research. In: ESEM 2009
11. Dybå, T., Sjøberg, D., Cruzes, D.: What works for whom, where, when, and why? On the role
of context in empirical software engineering. In: ESEM 2012
12. Yin, R.: Case Study Research: Design and Methods: Applied Social Research Methods Series,
vol. 5, 4th edn. SAGE Publications, Thousand Oaks (2009)
13. Merriam, S.: Qualitative Research: A Guide to Design and Implementation, 2nd edn. JosseyBass, [S.l.] (2009)
14. Marks, M., Mathieu, J., Zaccaro, S.: A temporally based framework and taxonomy of team
processes. Acad. Manage. Rev. 26(3), 356–376 (2001)
15. Dittrich, Y., John, M., Singer, J., Tessem, B.: Editorial for the special issue on qualitative
software engineering research. IST 49, 531–539 (2007)
16. da Silva, F., Santos, A., Soares, S., França, C., Monteiro, C., Maciel, F.: Six years of systematic
literature reviews in software engineering: an updated tertiary study. IST 53, 899–913 (2011)
17. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in
software engineering. EBSE Technical Report (2007)
18. Cruzes, D., Dybå, T.: Recommended steps for thematic synthesis in software engineering. In:
ESEM (2011)
19. Carver, J.C., Juristo, N., Baldassarre, M.T., Vegas, S.: Replications of software engineering
experiments. Empir. Softw. Eng. J. 19(2), 267–276 (2014). doi:10.1007/s10664-013-9290-8
20. Ardimento, P., Caivano, D., Cimitile, M., Visaggio, G.: Empirical investigation of the eﬃcacy
and eﬃciency of tools for transferring software engineering knowledge. J. Inf. Knowl.
Manage. 7(3), 197–207 (2008)

Miscommunication in Software Projects: Early
Recognition Through Tendency Forecasts
Fabian Kortum(B) , Jil Klünder, and Kurt Schneider
Software Engineering Group, Leibniz Universität Hannover,
Welfengarten 1, 30167 Hannover, Germany
{fabian.kortum,jil.kluender,kurt.schneider}@inf.uni-hannover.de

Abstract. Eﬃcient team communication is essential for software project
success. Misunderstood or underestimated demands on customer requirements and insuﬃcient information sharing within a team can rapidly
cause the delay of software releases, hamper customer satisfaction or
even endanger the project succeed. The challenges remain to quantify the
right amount of communication according to durations, necessary eﬀort,
and the ambitions to avoid ﬁreﬁghting situations. Especially newly build
or less experienced teams often struggle with their information ﬂow. To
improve team communication performances for these teams, we build
an experience based classiﬁer model that interpolates tendency forecasts
with ﬁve approved team communication metrics from related work. The
model matches archival project communications with present team conditions and computes tendency forecasts for the ongoing project. These
future trends can indicate critical communication conditions right from
early phases. Hence, they can reduce risks of miscommunication during
a project.
Keywords: Machine learning · Team communication · Experience-base

1

Introduction

Successful software projects and today’s development processes are primarily
grounded on eﬃcient team communication. Communication within software
developing teams consists of several psychological facets, practical use of technologies and organizational expertise [5,12]. Tightened project costs and reduced
resources require further improvements to stay internationally competitive. Software projects with zero margins for errors, misunderstandings or functionality
issues lead to higher pressuring rates of developer conditions [11]. Risks of later
improvements or potential delay of releases can cause a radical loss of customer
satisfaction and probably the cancellation or failure of a project. Therefore, team
communication and the ﬂow of information seem to be a prerequisite for project
This work was funded by the German Research Foundation (DFG) under grant
number 263807701 (Project TeamFLOW, 2015–2017).
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 731–738, 2016.
DOI: 10.1007/978-3-319-49094-6 62

732

F. Kortum et al.

success [5]. The challenges remain in scheduling the right amount of communication according to durations, necessary eﬀort and the ambitions of a team
to avoid ﬁreﬁghting situations. Especially new build or less experienced teams
often struggle with their communication management. Tuckman [15] revealed
that teams typically change and expose their social behavior over time, settled
by the experiences made during each phase by the team members. Furthermore,
most project teams suﬀer stressful or troubling situations diﬀerently due to individual team characters and communication styles [11,12].
In this approach, we introduce a technique that enables future tendencies
of team communication behavior for ongoing software projects, resulted from
present team communication characteristics and experience-based support from
archival projects. We build a machine learning classiﬁer that operates on relevant
data records and interpolates a multi-week communication tendency with ﬁve
signiﬁcant metrics describing a team’s simpliﬁed communication behavior. The
applied archival data and communication metrics used within this approach
refer to a ﬁeld study from previous investigations with retrospectives on team
communication in student software projects [12]. A machine learning classiﬁer
which is developed in Java is realized through a k-nearest neighbor (kNN) model
with predictive characteristics to involve training and testing data. This kNN
model accesses archival project information from the training set and matches
them with present team communication conditions. After all, it interpolates
communication trends based on best matches with closely related conditions. The
resulting future trends can indicate critical communication behavior right after
early project phases and enable early adjustments. Due to dynamic changes of a
team’s communication behavior over time, the kNN model considers the entire
course of team communication since the project start. We validated our model
through leave-one-out-cross-validations (LOOCV), also encountered its overall
accuracy on team communication tendencies, measured as the aggregated error
rates between the model’s forecasts and real testing data references of archival
project records.
The model currently interpolates tendency forecasts with future perspectives
up to 14 weeks and operates with high accuracies during several project phases.
We present validating results and insights gained from applying both psychological and software engineering perspectives, additionally to their respective
interpretations. It would be highly desirable to obtain improved team communication for project success based on experience-based tendency assistances with
archival records and ongoing team communication conditions. In our vision, this
terminology can be especially helpful for newly formed teams and allow a better
understanding in scheduling and planning meeting quantities during diﬀerent
phases of project.

2

Related Work

This approach is grounded on several previous publishing from psychological
perspectives on socio-interactive team communications and predictive analytics.

Miscommunication in Software Projects

733

Kauﬀeld et al. [5] investigated that the interaction and meeting intensities of
software developing teams shows strong aﬀects to the project’s overall success.
Klünder et al. [6,7] and Schneider et al. [12] monitored and explored
socio-interactive communication behaviors of students in software projects. The
authors applied a multi-week ﬁeld study with real software projects and environmental framework to elicit project information with unsynthesized origin. In
cooperation with psychological experts, the authors derived and manifested ﬁve
relevant communication metrics which have a direct impact on team communication behavior.
Kumar et al. [10] describe the importance of training and guidance of communication ways for software engineering students. The authors use comparison
methods on two student software projects to determine their communication
style. The authors also explored pitfalls and indicate the communication diﬃculties for newly formed teams. This was out motivation to apply an assistance
approach combining both experience-based guidance and predictive analytics on
communication behavior.
Jørgensen et al. [4] elaborated the signiﬁcance of forecasts for eﬀort-scaling
in software projects. They established a sequential approach with distinctions of
measurable project outcomes under lessons learned circumstances. The author’s
goals were to validate whether experience-based information from a previous
project will aﬀect the following project based on the additional available source
of experienced data.
Sharma et al. [13] approached the accuracy of priority handler for occurring
bugs by using machine learning methodologies and cross-validations with an
experience base of earlier prioritized bugs. They analyzed and validated the
accuracy of 76 sample studies using support vector machines.
We use the author’s conceptual theory on experience-based approaches supported by machine learning to forecast team communication tendencies that
consider a team’s ongoing communication condition. In a previous approach, we
applied psychological perspectives and analytical methods to explore the feasibility of a ﬁrst simpliﬁed forecasting tool that can compute tendencies on team
communication intensities in support of a machine learning classiﬁer [6,9].

3
3.1

Methodology
Study Design

We accessed 34 student software projects from a previous ﬁeld study [12] that
provides a series of small, yet complete software projects including team communications in a variety of psychological and software project metrics. All software
projects were comparable in their framework conditions, complexity, and duration. Most importantly, they did not consist of synthetic made or biases and
intended for real use in their respective context by the customer.
We use ﬁve manifested metrics [7,12] to describe team communication behavior
through relevant expressions and attributes from the ﬁeld study records.

734

F. Kortum et al.

These metrics are compounds of communication attributes with matter for teams
like mutual network structures, communication paths, proportions of communication media channels and their intensity of use, as well as meeting distances, team
moods, and spirits. We extracted a synergistic conjunction with these ﬁve metrics
as predictable subjects involving FLOW distance [7], FLOW centralization [7], and
team spirits [14]. We also consider the positive and negative moods which have been
previously proven to be relevant factors for team communication [12].
An application realized in this approach uses the kNN algorithm. Furthermore, it supports the integration of entire collections with communication metrics for training the model and compute communication tendencies. An internal
diagnose functionality resolves the model’s predictions accuracy automatically
due to actually loaded training data, its quality and signiﬁcance of the raw information. The latter, it can help teams to validate whether their training data is
qualiﬁed as experience data for tendency forecasts or not.
3.2

Team Communication Metrics

FLOW distance is a metric considering diﬀerent kind of communication channels
like e-mail, meetings and phone calls and the perceived communication intensity [7]. We applied it as central metric for communication behavior [6] since it
indicates potential problems in the communication way of a team [12].
FLOW centralization combines the quantitative distribution of all communication ways between team members and resolves the dominance positioning of
the single members. It indicates the degree to which communication is focused
on a subset of an entire team [7].
During a software project, team moods yield impressive information with signiﬁcant reﬂection during releases, quality gates, issues and similar [11]. In this
approach, each team member’s mood is covered by positive and negative aﬀects
describing the feelings during a week [12]. A team’s emotional expression reﬂects
a direct condition about the project’s status and thus provides important information with impact to project success [3]. The positive and negative attributed
mood were aggregated into two central metrics [2].
In analogy to the team’s mood, team spirit represents a further emotional
reﬂection about the status of socio-interacting conditions. The metric used in
this approach is aggregated from each team’s responded atmosphere and subset
to inner team’s mutual trusts, creativity with innovative thinking, satisfaction on
team results, time availability, and schedulability, as well as inner performances
and productivity [12,14]. The aggregated team spirit metric presents a subjective
opinion indicating the balance of a team’s overall atmosphere and satisfaction.
Therefore, it can be expressed as the mean value for each teams rating.
Figure 1 presents a plot of the diﬀerences in 34 student software projects and
the weekly provided metrics. The ﬁgure shows normalized ranges from 0.0 to
1.0 of the selected metrics. Verifying the ranges and variances of data is a ﬁrst
necessary demand before building and training classiﬁer models.

Miscommunication in Software Projects

735

Fig. 1. Distribution of communication metrics from ﬁeld study reports

3.3

Machine Learning Classifier

Predictive analytics and its machine learning subset provide powerful algorithms
with diﬀerent characteristics on experience based forecasts. The kNN algorithm
is one of these representatives with outstanding qualities in information matching [1]. In a previous approach [9], we introduced the feasibility of a simpliﬁed
prediction model that considers team communication intensities. According to
the goals of this approach, we wanted to improve the model’s level of maturity
and realize a Java application with a higher accuracy under consideration of ﬁve
selectable metrics.
The developed application consists of several auto features, e.g. data normalizing for k-nearest training and testing data as well as an upload function
for a training set. We integrated an accuracy validation function that enables
teams to forecast diﬀerent tendencies under limitation to selected metrics. This
functional feature is associated with an inner process detecting the best ﬁtting
k-value to achieve most accurate forecasts. The algorithm measures the vector distance between communication metrics from the experience base and the
new inputs from ongoing communication conditions during a project [6,9]. Both
information are proceeded by matching a k-number of archival projects from the
training data that fulﬁll the closest team communication similarities with the
real communication in the past weeks. These best matches are used to interpolate
tendencies.
With respect to changing communication styles, this model considers the
matching with other communication metrics on a multi-span of time. The accuracy of weekly communication forecasts is limited to the model’s training data
and matching comparability. The k-value of matching algorithm manages a number of required matches with the reference inputs that are necessary to interpolate a new tendency. Diﬀerent k-values can have more or less good predictions.
We included a diagnostic support for the training data to extract a best operational k-value due to the quality and signiﬁcance of the training data. This
ensures the highest precision outcome due to the measured signiﬁcance of the
experience data.

4

Validation and Forecast Results

The accuracy of forecasts remains to the quality and signiﬁcance by the applied
training data and the selected k-value. We applied the leave-one-out-cross-

736

F. Kortum et al.

validation (LOOCV) to our forecasting application and approved its overall
accuracy [8] on team communication tendencies. The training data of our kNN
model consists of 34 student software projects; each providing 15 weeks of weekly
submitted team communication information. LOOCV folds a single team project
from the training set and inserts its communication information iteratively as
weekly inputs to the model. This resolves tendency forecasts for diﬀerent weekly
progressing stages during the lifespan of a project. The error between the fold
records and the forecast delivers a measurable prediction deviance. The average
deviance of the application through all weekly forecasts is deﬁned as the forecast’s overall error rate. This rate represents so far only the overall errors for a
single fold project. Afterward, LOOCV iterates and folds again for every further
team project from the entire training data.
In fact, this procedure results from 34 single overall error rates which reveal
an average total error rate of the current forecasting model of 11.5 %. The mean
precision of our model operates with 88.5 % on a maximum forecast horizon of
up to 14 weeks. Compared with the results from the previous approach [9], we
increased the models precision by 5.5 % while operating with ﬁve communication
metrics. The applied best k of 6 is auto-detected from the highest forecast’s
overall precision due to the training data. The integrated auto k-ﬁnder updates
its k-value with every change of training data to ensure the highest accurate
forecast. Figure 2 shows the distribution of all measured error rates according
to the progressed weeks during the project and decrease of the forecast horizon.
The decrease is linked to the reduce of left weeks until the projects ending.

Fig. 2. Overall error rates

The forecast’s accuracy has an almost constant error rate of 9 % exception for
the last two weeks. The data quality of team records and submission decreases
towards project’s end, which has a direct aﬀect to training data.
The wide spread of communication styles with weak correlation dependencies
to the previous weeks costs a total loss of 3 % from the model’s overall precision.

Miscommunication in Software Projects

737

Nevertheless, the application still computes remarkable forecasts over multiple
weeks with high accuracy and a ﬁnal overall precision of 88.5 %. The data quality
issues at the end seem hereby marginal. The almost constant level of error rates
during changing forecast horizons is an indicator for well-performing tendency
forecasts with the chosen team communication metrics.

5
5.1

Conclusion
Discussion

Meanwhile our approach leads to good results, the results cannot be overgeneralized.
A generalized support for any type of software project with team communication forecast cannot be granted. Although, this approach on student software
projects and its applied communication metrics presents parallels with software
projects in industrial environments. However, the team’s communication behavior and tendency errors strongly depend on the applied training data and should
be so far only used with student software project approaches. A practical validation has not been established yet. We plan to extend our model with data taken
from industrial projects. At this moment, our tool only works great for projects
with a duration of 15 weeks.
5.2

Conclusion

In this approach, we presented the feasibility of team communication forecasts
with the FLOW distance, FLOW centralization, team spirit, positive and negative moods as signiﬁcant team communication metrics. We developed a Java
application that enables the integration of various data collections and normalizes its data content to perform predictive operations. The core is based on a
kNN model that computes the correlation between a team’s present communication course and ﬁnding for k-nearest communications condition within the
models training data. The k-best matches become interpolated as future communication tendencies during a project.
The precision of current tendency forecasts is validated through LOOCV and
resulted in mean precisions of 88.5 %. The application is designed to consider
future types of numeric metrics with dependencies to team communication in
software projects. This approach is an encouraging step to support new software
developing teams in ﬁnding their optimal communication structures. Experience
based tendency forecasts in combination with continuous team communication
can help hereby to reduce risks of miscommunication and promote for better
software quality due to less pressuring phases.

References
1. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “Nearest Neighbor”
meaningful? In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp.
217–235. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7 15

738

F. Kortum et al.

2. Flory, J.D., Manuck, S.B., Matthews, K.A., Muldoon, M.F.: Serotonergic function
in the central nervous system is associated with daily ratings of positive mood.
Psychiatry Res. 129(1), 11–19 (2004)
3. Graziotin, D., Wang, X., Abrahamsson, P.: Happy software developers solve problems better: psychological measurements in empirical software engineering. CoRR
abs/1505.00922 (2015)
4. Jørgensen, M., Gruschke, T.M.: The impact of lessons-learned sessions on eﬀort
estimation and uncertainty assessments. IEEE Trans. Softw. Eng. 35(3), 368–383
(2009)
5. Kauﬀeld, S., Lehmann-Willenbrock, N.: Meetings matter eﬀects of team meetings
on team and organizational success. Small Group Res. 43(2), 130–158 (2012)
6. Klünder, J., Karras, O., Kortum, F., Schneider, K.: Forecasting communication
behavior in student software projects. In: Proceedings of the The 12th International
Conference on Predictive Models and Data Analytics in Software Engineering,
PROMISE 2016, pp. 1–8. ACM, New York (2016)
7. Klünder, J., Schneider, K., Kortum, F., Straube, J., Handke, L., Kauﬀeld, S.: Communication in teams - an expression of social conﬂicts. In: Bogdan, C., Gulliksen,
J., Sauer, S., Forbrig, P., Winckler, M., Johnson, C., Palanque, P., Bernhaupt,
R., Kis, F. (eds.) HCSE/HESSD -2016. LNCS, vol. 9856, pp. 111–129. Springer,
Heidelberg (2016). doi:10.1007/978-3-319-44902-9 8
8. Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation
and model selection. In: IJCAI, vol. 14, pp. 1137–1145 (1995)
9. Kortum, F., Klünder, J.: Early diagnostics on team communication - experiencebased forecasts on student software projects. In: 10th International Conference
on the Quality of Information and Communications Technology (Quatic). IEEE
(2016)
10. Kumar, S., Wallace, C.: A tale of two projects: a pattern based comparison of
communication strategies in student software development. In: 2013 IEEE Frontiers
in Education Conference (FIE), pp. 1844–1850. IEEE (2013)
11. Liskin, O., Schneider, K., Kiesling, S., Kauﬀeld, S.: Meeting intensity as an indicator for project pressure: exploring meeting proﬁles. In: 2013 6th International
Workshop on Cooperative and Human Aspects of Software Engineering (CHASE),
pp. 153–156. IEEE (2013)
12. Schneider, K., Liskin, O., Paulsen, H., Kauﬀeld, S.: Media, mood, and meetings:
related to project success? ACM Trans. Comput. Educ. (TOCE) 15(4), 21 (2015)
13. Sharma, M., Bedi, P., Chaturvedi, K., Singh, V.: Predicting the priority of a
reported bug using machine learning techniques and cross project validation. In:
2012 12th International Conference on Intelligent Systems Design and Applications
(ISDA), pp. 539–545. IEEE (2012)
14. Sudhakar, G.P.: Software Development Teams: Performance, Productivity and
Innovation. Prentice-Hall India, Delhi (2016)
15. Tuckman, B.W., Jensen, M.A.C.: Stages of small-group development revisited.
Group & Organization Management 2(4), 419–427 (1977)

Doctoral Symposium

A Research Proposal: Tracking Open Source
Software Evolution for the Characterization
of Its Evolutionary Behavior
Munish Saini ✉ and Kuljit Kaur Chahal
(

)

Department of Computer Science, Guru Nanak Dev University, Amritsar, Punjab, India
munish_1_saini@yahoo.co.in, kuljitchahal@yahoo.com

Abstract. Open Source Software (OSS) has attracted a lot of attention in the last
decade. Due to the rising dominance of OSS in the software industry; not only
practitioners, but researchers as well as academicians are also keen to understand
the OSS development and evolution process. Several studies have been conducted
in the past in this regard. Most of the existing work relates to growth analysis of
OSS projects using source code level metrics. Lately, metrics related to change
activity have also been included to understand OSS evolution. Change activity as
recorded in Source Code Management (SCM) systems is used in a few cases.
Most of the work deals with ﬁnding change size, and change eﬀort distributions.
A few studies do change proﬁle analysis as OSS systems evolve. But that is
restricted to a few of the change categories, e.g., adaptive v/s non-adaptive
changes, corrective v/s non-corrective changes. This research study explores
change proﬁles of 106 OSS systems by extracting change type information from
their SCM repositories and then categorizing these changes automatically into
ﬁve diﬀerent categories - corrective, adaptive, perfective, preventive, and
enhancement related. The idea is to understand the way OSS projects undergo
change through long periods of time. The results indicate that change behavior
of the OSS projects is diﬀerent for diﬀerent types of changes.
Keywords: Open Source Software (OSS) · Software evolution · Change
classiﬁcation · Cluster analysis · Commit quality

1

Introduction

Software evolution refers to the phenomenon of continuous software change and growth
after its initial development. Understanding software evolution in general and OSS
projects evolution in particular has been of wide interest in the recent past. A wide range
of research studies has analyzed the OSS evolution from diﬀerent points of views such
as growth [1] quality [2], and group dynamics [3]. Several research studies, in the past
have studied the historical records of changes of OSS projects and found them useful
for understanding the software evolution. In this study, we explore the change evolution
of OSS projects focusing on their diﬀerent types, e.g. corrective, adaptive, perfective,
preventive, and enhancement.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 741–745, 2016.
DOI: 10.1007/978-3-319-49094-6_63

742

2

M. Saini and K. Kaur Chahal

Problem Deﬁnition

Kemerer and Slaughter [4] explore the evolution of commercial systems on the basis of
detailed change events. The study shows that systems pass through several evolutionary
phases. However, all the systems do not follow the same evolutionary paths. There are
few studies that explore the change evolution in OSS projects from this point of view.
Meqdadi et al. [5] study trends only in the adaptive changes in 3 OSS projects. As per
their analysis, adaptive changes decrease over the period of time. Most of the change
analysis studies in OSS projects investigate only the distributions of changes types,
change size, and change frequency. In this study, a large set of OSS projects are analyzed
to explore trends in their evolution corresponding to diﬀerent types of changes over a
period of time. The trend analysis indicates patterns in which diﬀerent change types
evolve. We have traced the evolution behavior of OSS projects at two distinct levels of
detail: ﬁne grained and coarse grained. In the ﬁne-grained analysis, OSS projects and
their repositories are explored in depth to analyze the repository level metrics; whereas
for coarse-grained analysis only project level metrics are explored for evaluation.
Following are the main objectives of this study:
A. Fine-grained analysis
• Analyzing the change proﬁles of OSS projects using their change logs.
• Change proﬁling of OSS projects to explore the evolutionary patterns with respect
to the types of changes performed over a period of time.
• Prediction of commits with respect to previous commits.
• Examining the change activity in OSS projects from three points of view: change
purpose (type), change size, and change eﬀort.
• Analyzing commit quality of OSS projects.
B. Coarse-grained analysis
• Understanding the programming language proﬁle of an OSS project.
• Analyze the existence of similar (or dissimilar) pattern in single and multiple contrib‐
utor projects in terms of: (a) main programming languages, (b) licensing, (c) number
of diﬀerent languages used, (d) identifying the relationship (between the number of
user, contributors, total commit, and LOC).

3

Signiﬁcance of the Work

Commits, and other related data in the SCM of OSS projects can be analyzed to see the
development activity, technical eﬀervescence and popularity of the projects [6]. All
stakeholders, including project managers, developers, and end users can use commit
evolution trends to understand the post implementation activities of a project (such as
enhancements of the old features of the project, the addition of new features, or improve‐
ments in its performance). When an end user selects an OSS project for use, he should
be able to see the history of change of the software project. How are changes handled
in the past? The kind of changes a software system encounters can indicate the health

A Research Proposal: Tracking OSS Evolution

743

of the software project (though it is not in the scope of this paper, but we believe that
our work creates a base for further discussion in this direction). This information is also
of interest to researchers and academicians as the OSS development paradigm oﬀers
them an opportunity to understand the software evolution process with the help of the
large data sets available in the public domain. What change patterns does an OSS system
follow? Can these change patterns be generalized across the problem domains or the
development paradigms? These are still open questions.

4

Results so Far

4.1 Fine-Grained Analysis of the OSS Projects
Analyzing the Change proﬁles of Software Systems using their Change Logs [8]
The change distribution and change pattern analysis of commit logs of six OSS
projects have shown that the corrective changes are most often performed in the OSS
projects, whereas the preventive changes have the minimum share in all the change logs
except for Twitter MySQL and Apache Tomcat as the code restructuring has been taken
up in these projects. In all the OSS projects, the enhancement activity is second among
the change activity performed. It is observed that change activity in all the projects
follows the up and down trend. It does not remain constant in any of the OSS projects.
Change Proﬁle Analysis using Cluster Analysis to identify the Evolution trends (the
paper under review)
Cluster analysis of all the change type of 106 OSS projects gives broadly three cate‐
gories of change activity: high activity, moderate activity, and low activity. In high
activity clusters, pressure to add more features dominates at the beginning followed by
corrective actions. Corrective changes stabilize before all other change types. Adaptive
and perfective changes catch slowly. It may be that developer teams focus on (problemspeciﬁc) features of projects ﬁrst, and the adaptive and perfective changes are imple‐
mented later to gain a competitive edge. In projects with moderate activity for diﬀerent
change types, enhancement changes are again dominating in the beginning and stabilize
later. Corrective changes increase at a moderate rate and surpass the enhancement
changes towards the end of the observation period. Interestingly, adaptive and perfective
changes, both, start at the same level.
Fuzzy Analysis and Prediction of OSS Projects Commit Activity [9, 10]
The computational method [7] of forecasting based on fuzzy time series is used to
predict the commit activity in OSS projects. The results have shown that the computa‐
tional method performs better than the best ﬁtted ARIMA model for the commit data
set of seven OSS projects. The error values in the case of the computational method
remain close to the zero, whereas they ﬂuctuate a lot in the case of the ARIMA.
Analyzing Change Proﬁle of OSS Projects using Burst Detection (the paper under
review)
The empirical investigation using the burst detection technique, in the change
activity and other change related attributes such as change eﬀort, and change size reveals

744

M. Saini and K. Kaur Chahal

that they follow almost same patterns of occurrence. The high or moderate activity
projects have signiﬁcant peak correlation for the change attributes. But it is not so in
projects with low levels of activity.
4.2 Coarse Grained Analysis
Understanding Language Proﬁle of OSS Projects using Association Rule Mining
[11]
The process of mining the most frequent associated group of languages from the data
set is performed by exploring the KDD approach and using Apriori association rule
mining algorithm. It provides combinations of the programming languages which are
often used together in a project. The generated rules indicate that the languages like C,
C++, C#, Java, PHP, Perl, Java Script, Python, Tcl, and Assembly are widely used in
the development of the OSS projects.
Empirically Investigating Evolution Trends of Single and Multiple Contributor
OSS Projects [12]
Popular languages (like Java, C, C++, PHP, Python, C# Perl, JavaScript, and Ruby)
and licenses (like GPL, GPL3 MIT, LGPL, BSD3-Clause, and GPL3-late) have shown
high usage trends in both types of projects. The average number of languages used in
single and multiple contributor projects is 4.60 and 8.51 respectively. The relationship
analysis between the attributes (user, LOC, commits, contributors) of single and multiple
projects indicates that multiple contributor projects have strong relationship between
contributors and commits; whereas contributor-LOC, user-commit, commit-LOC have
shown a moderate relation and user-LOC, user-contributor has shown a weak relation.
In single contributor projects, there is no strong relationship between diﬀerent sets of
attributes.
4.3 Work to Be Done
The commit messages of seven OSS projects (two large projects, three medium projects,
and two small projects) are to be analyzed by focusing on the syntactical wellness and
eﬀectiveness of commit messages. A new approach is proposed to calculate the commit
quality of a commit message by using 11 syntactical metrics of a commit message. The
commit quality of software projects helps developers and project analysts to better
understand the way OSS projects evolve. We will explore if there is any relation between
software evolution and the commit quality of OSS projects. We plan to correlate the
commit quality with number of contributors, and number of commits for large, medium
and small OSS projects.

A Research Proposal: Tracking OSS Evolution

745

References
1. Godfrey, M., Tu, Q.: Evolution in open source software: a case study. In: Proceeding of the
IEEE International Conference on Software Maintenance (ICSM), pp. 131–142 (2000)
2. Zhang, H., Kim, S.: Monitoring software quality evolution for defects. IEEE Softw. 4, 58–64
(2010)
3. Fang, Y., Neuﬁed, D.: Understanding sustained participation in open source software projects.
J. Manag. Inf. Syst. 25(4), 9–50 (2009)
4. Kemerer, C.F., Slaughter, S.A.: An empirical approach to studying software evolution. IEEE
Trans. Softw. Eng. 25(4), 493–509 (1999)
5. Meqdadi, O., Alhindawi, N., Collard, M., Maletic, J.: Towards understanding large-scale
adaptive changes from version histories. In: IEEE International Conference on Software
Maintenance, pp. 416–419 (2013)
6. Gonzalez-Barahona, J., Robles, G., Herriaz, I., Ortega, F.: Studying the laws of software
evolution in a long-lived FLOSS project. J. Softw. Evol. Process. 26(7), 589–612 (2014)
7. Singh, S.R.: A computational method of forecasting based on fuzzy time series. Math.
Comput. Simul. 79(3), 539–554 (2008)
8. Saini, M., Kaur, K.: Analyzing the change proﬁles of software systems using their change
logs. Int. J. Softw. Eng. (IJSE-Egypt) 7(2), 39–66 (2014)
9. Saini, M., Kaur, K.: Fuzzy analysis and prediction of commit activity in open source software
projects. IET Softw. J. 10(5), 136–146 (2016). doi:10.1049/iet-sen.2015.0087
10. Saini, M., Kaur, K.: Software Evolution Prediction using Fuzzy Analysis. EAIT. 349-354.
Indian statistical institute Kolkata, India (2014). doi:10.1109/EAIT.2014.66
11. Saini, M., Kaur, K.: Understanding languages proﬁle of open source software using
association rule mining. In: IEEE International Conference on Future Technologies
Conference, Fisherman’s Wharf San Francisco, United States (2016)
12. Saini, M., Kaur, K.: A study to ﬁnd signiﬁcant evolution trends in OSS projects with single
or multiple contributors. In: SCESM 2016, Heirank Business School, Noida, India (2016)

Transition from Plan-Driven to Agile:
An Action Research
Mohammad Abdur Razzak(B)
Lero–the Irish Software Research Centre, University of Limerick, Limerick, Ireland
abdur.razzak@lero.ie

Abstract. Nowadays, many individuals and teams involved on projects
are already using agile development techniques as part of their daily
work. However, we have much less experience in how to scale and manage agile practices in distributed software development. At this level,
there is an increasing need to standardize best practices to avoid reinvention and miscommunication across artifacts and processes. So, the
R
)
emerging growth of frameworks i.e.; Scaled Agile Framework (SAFe
R

does not
in industry requires an academic attention because SAFe
cover all aspects of agility required in a distributed environment context.
R
also reported that, geographically distributed
Early adopter of SAFe
teams experience lower productivity due to lack of alignment and solid
program execution. On the other hand, Global Teaming Model (GTM)
places particular emphasis on the organization and management of globally distributed development teams, it does not specify how to develop
software using Agile and Lean principles. Furthermore, the GTM recommended practices are normative, and do not prescribe how to implement
R
practices,
the practice. Thus, we hypothesize that combining SAFe
together with GTM recommendations will provide practitioners with a
framework of implementable practices.

Keywords: Global software development
work · Global Teaming Model

1

· Agile · Scaled agile frame-

Background

1.1

Global Software Development

Improved communication technologies, access to global talent, cheaper labour,
proximity to new markets and legal requirements have all contributed to the
growth in Global Software Development (GSD) [24]. GSD is software work undertaken in diﬀerent geographical locations, across national boundaries in a coordinated fashion through synchronous and asynchronous interaction [12]. As a result,
a growing number of software companies started to implement GSD to reduce
Supervisor: Dr. John Noll, Research Fellow, Lero–the Irish Software Research Centre,
University of Limerick, Limerick, Ireland; john.noll@lero.ie
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 746–750, 2016.
DOI: 10.1007/978-3-319-49094-6 64

Transition from Plan-Driven to Agile: An Action Research

747

time-to-market, increase operational eﬃciency, improve quality, and many more.
Over the years several recommendations have been published in support of this
complex development paradigm [6]. But, industrial experience shows that, GSD
is reputed to suﬀer from communication breakdowns, low morale and delays due
to teams being geographically, culturally and temporally separated [4,5,16].
1.2

Agile Methods

Traditionally Global Software Development has followed a plan driven, structured, waterfall approach, where tasks are allocated according to where they
appear in the software lifecycle [8]. It was considered that agile methods
envisaged for small projects and co-located teams [1] would be a poor ﬁt for
Global Software Development because both Agile and distributed development
approaches diﬀer signiﬁcantly [18]. Agile methods tend to rely on informal
processes to facilitate coordination whereas distributed software development
relies on formal mechanisms. There is a growing trend for companies to adopt
agile methods as reported in a tertiary study of Global Software Development
[11]. However, setting up an Agile team is usually motivated by beneﬁts such as
increased productivity, innovation, and employee satisfaction [21] but introducing an Agile method can change the culture (command and control model) in
a company; so to implement the Agile practices in global software environment
developers need to have more autonomy as well as decision-making power [9].
1.3

Agile Framework

The choice of scaling agile framework adopted or how the framework is tailored
will depend on the organization’s size or on “what works” based on their own
business goals, operative model, and needs. The Agile Scaling Knowledgebase
(ASK) (http://www.agilescaling.org/home.html) developed a matrix of diﬀerent
Agile frameworks namely Scrum-of-Scrum (SoS), Large Scale Scrum (LeSS),
R
), Disciplined Agile Delivery (DAD), Spotify
Scaled Agile Framework (SAFe 
R
, launched in 2012
Model, and Scrum at Scale. This matrix shows that SAFe
by Dean Leﬃngwell [13], focuses on large enterprises and takes a scaled approach
to Agile adoption.
R
framework includes [13,14]: a process model that covers the highThe SAFe
est and the lowest level in the enterprise; Associated Agile values and practices,
including Scrum [20], eXtreme Programming [3], Kanban and Lean Software
Development [17], and the Agile Manifesto [10]; four core values: code quality,
alignment, program execution and transparency.
1.4

Global Teaming Model

The Global Teaming Model (GTM) is a model for global software engineering,
with a particular emphasis on organization, governance and management of globally distributed development teams [19]. The Global Teaming Model follows the

748

M.A. Razzak

hierarchical structure and nomenclature of the CMMI [22]. At the highest level
there are two broad goals, “Deﬁne Global Project Management” and “Deﬁne Management Between Locations”. These goals are decomposed into Speciﬁc Practices
that deﬁne broad categories of practice that lead to the parent goals. Speciﬁc Practices are further elaborated into Sub-practices. Finally, Sub-practices have one
or more recommendations that specify detailed actions to be taken. In total, the
GTM has ﬁve Speciﬁc Practices, twenty Sub-practices, and 64 recommendations,
that have been validated against a real industrial case [4,7].

2

Research Problem

Software development is still driven by Infinite Diversity in Infinite Combinations– as a consequence of that, practitioners do not ask themselves why to adopt
these practices instead they ask how to scale these practices. So, there are two
visible challenges–ﬁrst one is, Scaling Practices and second one is, Combining
multiple development methodologies. Scaling Agile continues to be a challenge in
software development because when more teams works together then its required
strong coordination among teams as well as on the project [1,15,23]. Scott W.
Ambler [2] pointed out several factors, that needs to taken into consideration
when scaling Agile such as team size, geographical distribution, entrenched culture, system complexity, legacy systems, regulatory compliance, organizational
distribution, governance and enterprise focus.
To resolve the stated issues, a number of frameworks have been proposed
R
is one
to provide guidance for scaling agile across the enterprise and SAFe
R
of the commonly known models. SAFe has gained a rapid attention and an
important choice for organization that are in need of approaches for scaling agile
R
focuses merely on describing the best practices,
development. However, SAFe
roles and artifacts of agile and lean principles but no attempt has been made
R
also does not cover all aspects of
to describe implementation strategy. SAFe
R
agility required in a distributed environment context. Early adopter of SAFe
also reported that, geographically distributed teams experience lower productivity due to lack of alignment and solid program execution. On the other hand,
while the GTM places particular emphasis on the organization and management
of globally distributed development teams, it does not specify how to develop
software using Agile and Lean principles. Furthermore, the GTM recommended
practices are normative, and do not prescribe how to implement the practice.
R
practices, together with GTM recSo, we hypothesize that combining SAFe
ommendations will provide practitioners with a framework of implementable
practices. Thus, our research going forward is driven by following hypothesis:
Scaled Agile Framework can provide concrete implementations of GTM
recommendations for Agile development

3

Methods

This research will undertake a cycle of action research. The action research
process can be deﬁned as a number of learning cycles consisting of predeﬁned

Transition from Plan-Driven to Agile: An Action Research

749

stages. Within the action research a number of sub-methods will be use, namely
interviews and workshops for data collection, and grounded theory as well as statistical analysis. Alongside, we will also gather data through participant observations, informal meetings, informal communications (e-mails) and documents
from the organization and speciﬁc projects. This research will comprise multiple iterations over in three stages namely–Identify, Develop, and Implement &
Measure:
– Stage 1–Identify
• Identify the current “As-is” process in the industrial settings. (Via interviews, observation, Global Teaming survey, documentation inspection)
R
practices at 4 levels (Team, Program, Value Stream, Port• Extract SAFe
folio), and stakeholder roles and responsibilities
R
• Identify SAFe
practices that implement GTM Recommendations
• Identify GTM Recommendation that is not implemented
– Stage 2–Develop
R
practices that imple• Create “To-be” process model incorporating SAFe
ment unimplemented GTM recommendations
• Develop a process “Roadmap”, that shows how to transition from the
current “As-is” process, to the desired “To-be” process
– Stage 3–Implement and Measure
• Implement “To-be” process within the industrial settings and collect the
KPI’s
• Evaluate the implementation and revise Roadmap and “To-be” models
accordingly.
Acknowledgement. I would like to thanks Dr. Sarah Beecham, Senior Research Fellow, Lero. This work was supported, in part, by Science Foundation Ireland grants
10/CE/I1855 and 13/RC/2094 to Lero - the Irish Software Research Centre (www.
lero.ie).

References
1. Abrahamsson, P., Conboy, K., Wang, X.: Lots done, more to do: the current state
of agile systems development research. Eur. J. Inf. Syst. 18, 281–284 (2009)
2. Ambler, S.W.: Agile software development at scale. In: Meyer, B., Nawrocki, J.R.,
Walter, B. (eds.) CEE-SET 2007. LNCS, vol. 5082, pp. 1–12. Springer, Heidelberg
(2008). doi:10.1007/978-3-540-85279-7 1
3. Beck, K.: Extreme Programming Explained: Embrace Change. Addison-wesley professional, Boston (2000)
4. Beecham, S.: Motivating software engineers working in virtual teams across the
globe. In: Ruhe, G., Wohlin, C. (eds.) Software Project Management in a Changing
World, pp. 247–273. Springer, Heidelberg (2014)
5. Beecham, S., Noll, J.: What motivates software engineers working in global software
development? In: Abrahamsson, P., Corral, L., Oivo, M., Russo, B. (eds.) PROFES
2015. LNCS, vol. 9459, pp. 193–209. Springer, Heidelberg (2015). doi:10.1007/
978-3-319-26844-6 14

750

M.A. Razzak

6. Beecham, S., O’Leary, P., Richardson, I., Baker, S., Noll, J.: Who are we ng global
software engineering research for? In: IEEE 8th International Conference on Global
Software Engineering (ICGSE), pp. 41–50. IEEE (2013)
7. Beecham, S., Richardson, I., Noll, J.: Assessing the strength of global teaming
practices: apilot study. In: 2015 IEEE 10th International Conference on Global
Software Engineering (ICGSE), pp. 110–114. IEEE (2015)
8. Estler, H.C., Nordio, M., Furia, C.A., Meyer, B., Schneider, J.: Agile vs. structured
distributed software development: a case study. Empirical Softw. Eng. 19(5), 1197–
1224 (2014)
9. Fowler, M.: Using an agile software process with oﬀshore development. Capturado
em (2006). http://martinfowler.com/articles/agileOﬀshore.html
10. Fowler, M., Highsmith, J.: The agile manifesto. Softw. Dev. 9(8), 28–35 (2001)
11. Hanssen, G.K., Šmite, D., Moe, N.B.: Signs of agile trends in global software engineering research: a tertiary study. In: Sixth IEEE International Conference on
Global Software Engineering Workshop (ICGSEW), pp. 17–23. IEEE (2011)
12. Herbsleb, J.D., Moitra, D.: Global software development. Software 18(2), 16–20
(2001). IEEE
R
3.0 (2015). http://v3.
13. Leﬃngwell, D.: Scaled Agile Framework
scaledagileframework.com/. Accessed 15 Mar 2016
14. Levy, R., Short, M., Measey, P.: Agile foundations: principles, practices and frameworks. In: BCS (2015)
15. Maples, C.: Enterprise agile transformation: the two-year wall. In: Agile Conference, AGILE 2009, pp. 90–95. IEEE (2009)
16. Noll, J., Beecham, S., Richardson, I.: Global software development and collaboration: barriers and solutions. ACM Inroads 1(3), 66–78 (2010)
17. Poppendieck, M., Poppendieck, T.: Lean Software Development: An Agile Toolkit.
Addison-Wesley, Boston (2003)
18. Ramesh, B., Cao, L., Mohan, K., Xu, P.: Can distributed software development be
agile? Commun. ACM 49(10), 41–46 (2006)
19. Richardson, I., Casey, V., McCaﬀery, F., Burton, J., Beecham, S.: A process framework for global software engineering teams. Inf. Softw. Technol. 54(11), 1175–1191
(2012)
20. Schwaber, K., Beedle, M.: Agile Software Development with Scrum. Prentice Hall,
Upper Saddle River (2002)
21. Šmite, D., Moe, N.B., Ågerfalk, P.J.: Fundamentals of agile distributed software
development. In: Šmite, D., Moe, N.B., Ågerfalk, P.J. (eds.) Agility Across Time
and Space. Springer, Heidelberg (2010)
22. Team, C.P.: Capability maturity model integration for development. Software Engineering Institute Technical report CMU/SEI-2006-TR-008 (2006)
23. Turk, D., France, R., Rumpe, B.: Limitations of agile software processes. In: Third
International Conference on Extreme Programming and Flexible (2014)
24. Vizcaı́no, A., Garcı́a, F., Piattini, M., Beecham, S.: A validated ontology for global
software development. Comput. Stand. Interf. 46, 66–78 (2016)

Software Product Innovation Through Startup
Experimentation in Large Companies
Henry Edison(B)
Free University of Bozen-Bolzano, Bolzano, Italy
henry.edison@inf.unibz.it

Abstract. Startups are king of innovation. Their innovative products
not only have changed our lives today but also put long-established players under pressure. Corporate management is now looking for ways to
innovate like startup. Along with it, awareness and use of the Lean startup method has grown rapidly amongst the software startup community
and large companies in recent years. However, how large companies could
beneﬁt from this method is still not fully understood. To shed a light on
this issue, we conducted a multiple case study with 5 internal startups
from diﬀerent large companies.
Keywords: Innovation · Software product innovation
startup · Lean startup · Method in action

1

·

Internal

Introduction

The world has changed. With their innovative products, startups are disrupting traditional markets and replacing long-established players. Uber, Spotify,
AirBnB, to name a few, are examples of startup that have changed the way
we do business today. Today, Alibaba, a Chinese online retailer has become the
largest retail business in the world, which has greater revenues than Wal-Mart.
Startups oﬀer new product, business model and customer value at high speed
and with cutting edge technology. To identify an opportunity, startups continuously communicate with their potential users, iterate and experiment to ﬁnd
a repeatable and scalable business. When the opportunity seems not going anywhere, startups are willing to pivot immediately, by redirecting and reshaping
the product and the market. To compete in this age of disruption, companies
cannot rely on eﬃciency on both cost and time-to-market and quality improvement anymore [1]. Corporate management is now looking for ways to innovate
like startup. With a greater resource in-house, they hope that they are able
to bring innovative product to market with the speed and new technology as
startups do.
Along with it, awareness and use of the Lean startup method has grown
rapidly amongst the software startup community in recent years. Like most
previous methods, the development and promotion of these methods have been
almost entirely driven by practitioners and consultants, with little participation
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 751–756, 2016.
DOI: 10.1007/978-3-319-49094-6 65

752

H. Edison

from the research community during the early stages of evolution. These methods
are now the focus of more and more research eﬀorts.
Even though Lean startup method is originated in software startup, it has
also gained interest from large companies i.e. General Electric, 3M, Intuit, etc.
Recent survey on 170 corporate executives reveals that 82 % of them are using
some elements of Lean startup in their context [2]. Marijarvi et al. [3] report on
Finnish large companies’ experience in developing new software through internal
startups. Ries [4] claims that Lean startup bears potential to improve the innovativeness of large companies. However, scientiﬁc and empirical studies regarding
the implementation of Lean start-up in large software organisations are rare [5].
Based on this observation, the research questions investigated in this study are:
– RQ1: How Lean startup approach is applied to internal startup?
– RQ2: What are the challenges that Lean startup based internal startup faced
by?
– RQ3: What are the key factors aﬀecting Lean startup based internal startup?
– RQ4: How does Lean startup approach applied by internal startup support
software product innovation?

2

Related Work

Software product innovation (SPI) refers to the introduction of new software
product to an existing or new market [6]. Due to its nature, SPI is diﬀerent
with product innovation in general. Software is intangible and time is the main
resource consumed to write, compile and test the code [7]. As in other industry,
SPI is triggered by technology or market opportunity [8]. Technology is used
to improve the current or oﬀer new functionality. Market opportunity is arisen
out of the unmet customer needs from the current oﬀer or to address the newly
revealed customer needs. SPI diﬀers from software process innovation, which
refers to the implementation of new process (i.e. OO development, CASE tools,
open source, etc.) [9]. Software process innovation does not lead to SPI.
Our previous work reveals that the current research streams on SPI focus on
ﬁve diﬀerent areas: grassroot innovation, early user integration, agile-based innovation, startup experimentation and open innovation [9]. The ﬁrst two areas are
looking at how to capture new ideas from outside companies i.e. users, customers,
competitors, etc., and turn them into product in-house. Agile-based innovation
seeks a way to generate innovation using agile practise. Rather than developing
new product internally, research on open innovation suggests to collaborate with
external entity i.e. through living lab.
Startup experimentation approach is one of emergent themes in SPI which
is inspired by Lean startup method [10,11]. In this approach, software is developed and validated in a continuous experiments with all stakeholders. Bosch
[12] introduces an innovation experimentation system to minimise Research and
Development (R&D) investment and increase customer satisfaction. In this system, R&D is responsible to develop the product iteratively (2–4 weeks) based
on customer feedback. However, the method is limited to SaaS and embedded

SPI Through Startup Experimentation in Large Companies

753

system. Based on Bosch’s study, Fagerholm et al. [10] and Lindgren and Münch
[11] propose a continuous experimentation system, which continuously testing
the value of the product to users. These studies emphasise more on product
development itself and how to capture the product’s value. In this study, we do
not only look at product development but also business development.
Current research on Lean startup method is centred on applying its method
in software startup context to develop new product i.e. [13–15]. Very few peerreviewed studies investigate how Lean startup method support SPI in large companies. Our study is one of the ﬁrst eﬀort to establish the empirical and scientiﬁc
evidence of Lean startup method to support SPI in large companies.

3

Research Approach

Due to the uniqueness and complex nature of the phenomenon and the intention
to achieve an in-depth understanding of it, a multiple case study is considered a
suitable research approach. The case companies is selected based on the following
criteria: (1) the company develops software in-house, (2) a dedicated team is
responsible from ideation to commercialisation of a new software, and (3) the
software falls out of the current main product line. The unit of analysis in this
study is a development team or internal startup.
There are ﬁve cases involved in this study. Two of them are in the same
company. Some of the case companies will remain anonymous at the request of
the companies. The proﬁle of ﬁve internal startups are shown in Table 1.
Table 1. The proﬁle of the ﬁve cases
Lokki

Team A

Team B

Team C

Team D

Company

F-Secure

XCo

YCo

ZCo

ZCo

Business
domain

Cyber secu- Print
rity and pri- directory
vacy
publisher

Telecommunication

Classified advertisement

Member

6–7

7–10

5–18

5–6

Product

Location
sharing

Prepayment Audio & video con- e-Commerce
e-Commerce
platform
versation
platform
(gen- platform
eral)
(used cars)

Customer

Family

Merchant’s
owners

Segmented users

Segmented users

Segmented
users

Current
status

Terminated

Scaling

Scaling

Development

Terminated

Timeframe

2012–2014

#interviews 8

3–5

2014-now

2013-now

2016-now

2014–2016

2

3

2

1

Semi-structured interviews were used as the primary data collection method.
To better understand the phenomenon, several members were interviewed (see
Table 1). The interviewees were selected based on their involvement in the development and their availability in the interview process. In the case of Team A,

754

H. Edison

the team lead did not recommend us to talk to other current members. She
argued that they recently joined the team.
Most of the interviews were done in their oﬃce, but some of them were done
through Skype due to geographically constraint. Each interview lasted between
one and two hour, and was recorded. All interviews were transcribed verbatim.
Notes were taken during the interviews. Other supporting materials, such as
presentation, white papers, etc. were also collected to triangulate the interview
data.
To guide the study process, we employ method in action framework [16]
as the conceptual framework. The framework recognises the complex nature of
software development and each component aﬀects the overall system. However,
the framework does not prescribe detailed and speciﬁc action to use. It allows
us to reﬂect on the software development as rich and complex phenomenon
inﬂuenced by the components and interactions [17].

4

Initial Results

Except Team B and Team C, the startup experimentation was initiated by corporate management as part of strategy exercise to look for growth through product
innovation. In these cases, management was responsible to mobilise the resources
needed by the team. In the cases of Team B and Team C, the initiative came
from the employees who found a gap in the current market and technology. As
consequences, they had to ﬁnd a way to convince the management and get all
the resources needed. Hence, an internal startup was established to develop and
bring the new product to market. Based on Lokki case, the detail process on
how Lean startup method is applied to internal startup has been published in
[18]. The framework used in this study is used to analyse the remaining cases.
Using method in action framework, our initial results identify the challenges
faced by internal startups. The initial key challenges are presented in Table 2.
In all cases, the core team members were recruited internally and assigned
dedicatedly to that team. In the case of Team B, the team found diﬃculties to
recruit new members externally because YCo is not known as software developing
company. Moreover, YCo was struggling with attracting new talent to join the
company. In the case of Lokki, it was not clear to the employees what it means
by working in internal startup. Moreover, not everybody has entrepreneurial
mindset.
During the development process, all teams had to report the progress to
corporate management periodically. The role of founder is found in Team B and
Team C case. A founder is the one who has a vision about the new product. In
the other cases, there was no founder role inside the team. Hence they had to
ﬁgure it out what product should be developed. In Team A case, the internal
startup employed Design Thinking approach to seek for new idea. In both Lokki
case and Team D, no speciﬁc method was employed. The team members were
brainstorming to collect and decide ideas to be implemented. No metrics were
collected during that period until they had approval from management to start
the development.

SPI Through Startup Experimentation in Large Companies

755

Table 2. The key challenges of Lean based internal startups
Key challenges
Organisational context

Overhead communications
Potential internal conflict of interest

Company strategy

Strategy change
Organisational championship
Aligning between new product and company strategy

Human resources

Entrepreneurial mindset
No “real” founder, no vision
Recruiting new talent

Development/Business context Fear of cannibalisation
Autonomy in decision making process
Balancing the needs of all stakeholders

Our initial results show that only Team A had freedom in both product and
business development, whereas in the other cases, the team only had freedom
in product development. When it comes to deﬁning business model, the team
needed approval from management. In the case of Team A, the team was supported by the CEO. When new management came in, the CEO protected the
initiative. They were allowed to continue the project even though at that time,
the product did not bring any revenue. In the case of Team D, they did not get
approval for validating new business model, thus the initiative was closed. In
the case of Lokki, due to the strategy change, the new product had fell beyond
the core business. Moreover, no organisational championship that were able to
protect them led to the termination of that initiative.

5

Conclusion and Future Work

With the conceptual framework presented herein, we are able to understand
better how Lean startup method is used in large companies. This study proposes
several speciﬁc contributions. Firstly, the framework should facilitate companies
to better use Lean startup method suited to their needs. Secondly, based on the
identiﬁed key challenges companies can prepare and undertake any preemptive
actions to overcome or minimise these challenges. Third, using the common key
factors, companies can maximise its potential to support product innovation
through internal startup.
The next step of this study is to continue in-depth data analysis to all cases
using the same protocol to answer all research questions. Based on the analysis,
the framework will be reﬁned and extended. A cross-case comparison will be
performed to identify the common pattern among the cases.

756

H. Edison

References
1. Rejeb, H.B., Morel-Guimaraes, L., Boly, V., Assiélou, N.G.: Measuring innovation best practices: improvement of an innovation index integrating threshold and
synergy eﬀects. Technovation 28, 838–854 (2008)
2. Kirsner, S.: The barriers big companies face when they try to act like lean startups (2016). https://hbr.org/2016/08/the-barriers-big-companies-face-when-theytry-to-act-like-lean-startups
3. Marijarvi, J., Hokkanen, L., Komssi, M., Kiljander, H., Xu, Y., Raatikainen, M.,
Seppanen, P., Heininen, J., Koivulahti-Ojala, M., Helenius, M., Jarvinen, J.: The
cookbook for successful internal startups. In: DIGILE and N4S (2016)
4. Ries, E.: The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesse. Crown Business, New York (2011)
5. Edison, H., Wang, X., Abrahamsson, P.: Lean startup: why large software companies should care. In: XP 2015 Scientiﬁc Workshop Proceedings, pp. 2:1–2:7 (2015)
6. Lippoldt, D., Stryszowski, P.: Innovation in the Software Sector. OECD, Paris
(2009)
7. Moe, N.B., Barney, S., Aurum, A., Khurum, M., Wohlin, C., Barney, H.T.,
Gorschek, T., Winata, M.: Fostering and sustaining innovation in a fast growing agile company. In: Dieste, O., Jedlitschka, A., Juristo, N. (eds.) PROFES
2012. LNCS, vol. 7343, pp. 160–174. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-31063-8 13
8. Desouza, K.C., Awazu, Y., Kim, J.: Managing radical software engineering: leverage order and chaos. IJTPM 8(1), 22–40 (2008)
9. Edison, H., Duc, A.N., Jabangwe, R., Wang, X., Abrahamsson, P.: An investigation
into software product innovation: a systematic literature review. In: ICE/IEEE
International Technology Management 2016 Proceedings (2016)
10. Fagerholm, F., Guinea, A.S., Mäenpää, H., Münch, J.: Building blocks for continuous experimentation. In: RCoSE 2014 Proceedings, pp. 26–35 (2014)
11. Lindgren, E., Münch, J.: Software development as an experiment system: a qualitative survey on the state of the practice. In: Lassenius, C., Dingsøyr, T.,
Paasivaara, M. (eds.) XP 2015. LNBIP, vol. 212, pp. 117–128. Springer, Heidelberg
(2015). doi:10.1007/978-3-319-18612-2 10
12. Bosch, J.: Building products as innovation experiment systems. In: Cusumano,
M.A., Iyer, B., Venkatraman, N. (eds.) ICSOB 2012. LNBIP, vol. 114, pp. 27–39.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-30746-1 3
13. Efeoglu, A., Moller, C., Sérié, M.: Solution prototyping with design thinking - social
media for sap store: a case study. Commun. Comput. Inf. Sci. 447, 99–110 (2014)
14. Haniotis, J.: Innovation jams: lessons in agile product development - an experience
report. In: Agile Conference 2011 Proceedings, pp. 223–229 (2011)
15. May, B.: Applying lean startup: an experience report - lean & lean ux by a ux
veteran: lessons learned in creating & launching a complex consumer app. In: Agile
Conference 2012 Proceedings, pp. 141–147 (2012)
16. Fitzgerald, B., Russo, N.L., Stolterman, E.: Information Systems Development:
Methods in Action. McGraw-Hill Education, Boston (2002)
17. O’Neill, S., Morgan, L., Conboy, K.: A framework for investigating open innovation
processes in ISD. In: ICIS 2011 Proceedings (2011)
18. Edison, H., Wang, X., Abrahamsson, P.: Product innovation through internal
startup in large software companies: a case study. In: Euromicro SEAA 2016 Proceedings (2016)

Tutorials

Tutorials at PROFES 2016
Daniela S. Cruzes1(&) and Sabrina Markzac2
1

2

SINTEF-ICT, Trondheim, Norway
danielac@sintef.no
Computer Science School – PUCRS, Porto Alegre, RS, Brazil
sabrina.marczak@pucrs.br

Abstract. PROFES 2016 hosts nine exciting tutorials that will complement and
enhance the main conference program, offering a wider knowledge perspective
around the conference topics. The tutorials provide insights into special topics of
current and ongoing relevance to the conference focus areas. We have divided
the program in ﬁve special tracks: innovation and speed, software security,
software quality, regulated software, DevOps and Lean Startups. Our goal is to
have practitioners from different companies participating on this day.

1 Introduction to the Tutorials
PROFES 2016 hosts nine exciting tutorials that will complement and enhance the main
conference program, offering a wider knowledge perspective around the conference
topics. We have divided the program in ﬁve special tracks: Innovation and Speed,
Software Security, Software Quality, Regulated Software, DevOps and Lean Startups
(See Table 1).
On Track 1, Innovation and Speed are the focus. On the Design Thinking tutorial,
the presenters will give a hands-on introduction into Design Thinking and rough
physical prototyping in the early stages of any kind of product development – physical
as well as digital. On the Continuous experimentation tutorial, insights on how to use
continuous experimentation to steer development towards rapid value creation and to
avoid unnecessary development efforts will be the focus.
On Track 2, Software Security will be addressed on a full day tutorial, which will
provide a brief introduction to the core principles of software security, threat modeling
using data flow diagrams, attack trees and misuse cases. On the second part of the
tutorial, the participants will have a hands-on on Protection Poker, a tool for risk
estimation to be used as part of the sprint planning meeting.
On Track 3, Software Quality will be addressed on the perspective of Risk-Based
Testing and Architecture Evaluation. In the Risk-based Software Testing, the presenters
introduce the concept of risk in software testing as well as a practical approach for
developing a risk-based test strategy. The tutorial is based on results from previous
research and studies investigating the introduction of risk-based testing in large organizations as well as small and medium enterprises. On the Architecture Evaluation
tutorial, there will be a presentation of a method for architecture evaluation, with a
walk-through in an example of a previously performed evaluation.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 759–760, 2016.
DOI: 10.1007/978-3-319-49094-6

760

D.S. Cruzes and S. Markzac
Table 1. Tutorials at PROFES 2016

Track 1:
Innovation
and Speed

Track 2:
Software
Security
Track 3:
Software
Quality
Track 4:
Regulated
Software
Track 5:
DevOps and
Lean Startups

Morning
Afternoon
Continuous Experimentation:
Design Thinking for Software
Accelerating Innovation through
Innovations – a crash course in
Highly Effective Experiments
rough physical prototyping
(Martin Steinert, Federico Lozano & (Jürgen Münch)
Matilde Bisballe)
Practical Software Security in a Continuously Deploying World
(Martin Gilje Jaatun & Inger Anne Tøndel)
Risk-Based Software Testing:
Increasing Effectiveness and
Efﬁciency in Testing
(Michael Felderer & Rudolf Ramler)
Experience from integrating Agile
Development with process standards
like ASPICE and ISO 26262
(Even-Andre Karlsson)
Creating Champions and battling
Dragons – How to create a DevOps
culture?
(Pål Thomassen & Ingrid Sorgendal)

Architecture Evaluation – Threat or
opportunity?
(Even-Andre Karlsson)
Safe Scrum
(Geir K. Hanssen, Thor Myklebust,
Tor Stålhane & Børge Haugset)
Lean startups in established
companies: How to make it really
happen and how to avoid common
pitfalls?
(Nils Moe & Tone Merethe Aasen)

On Track 4, the tutorials focus on Regulated Software; Safety-critical software is
becoming an increasingly larger part of safety systems, such as ﬁre-detection, dynamic
positioning, and autonomous driving. In the morning, the presenters will introduce the
basic principles of process standards like ASPICE and ISO 26262, and also practical
experience how to incorporate them in an agile environment, how to incorporate safety
activities into sprints, handling of traceability in an agile environment, handling of
reviews, change management, etc. These standards have not been adapted to agile
development, and many of the underlying assumptions are based on a waterfall model
and many organizations have problems joining these two words. In the afternoon, the
tutorial will explain the speciﬁc challenges related to development of safety-critical
systems and explain how Scrum may be adapted to create a flexible and efﬁcient process
for both development and certiﬁcation, based on the SafeScrum process framework.
On Track 5, the two tutorials will focus on two very actual topics in software
companies, DevOps and Lean Startups. The term DevOps is still often misunderstood,
and is typically associated with many types of automation tools. However, while a key
strategy in many well performing teams, automation is a consequence of DevOps
workflows – not the main point. In this tutorial the presenters will demystify the term
and give some insights on how to change an organization towards DevOps culture. In
the afternoon, the tutorial will describe the idea and motivation behind Lean Startup,
why many agile companies are implementing this approach today, and how they do it.
Further, the tutorial will highlight challenges companies face when adopting the Lean
Startup approach, and explain why the Lean Startup team needs a high degree of
autonomy, and how to enable such autonomy.

Continuous Experimentation: Accelerating
Innovation Through Highly Effective
Experiments
Jürgen Münch1,2(&)
1

Herman Hollerith Center, Reutlingen University,
Danziger Straße 6, 71034 Böblingen, Germany
Juergen.Muench@Reutlingen-University.de,
Juergen.Muench@cs.helsinki.fi
2
Department of Computer Science, University of Helsinki,
P.O. Box 68, 00014 Helsinki, FI, Finland

Abstract. Finding the right scope for product development in order to build
innovative products that customers want is crucial for success. Continuous
experimentation is an important means to steer development towards rapid value
creation and to avoid unnecessary development efforts. Insights from such
experiments can directly influence frequent iterative deliveries. Continuous
experimentation helps companies to gain competitive advantage by reducing
uncertainties and rapidly ﬁnding product roadmaps that work. However,
deﬁning a product strategy in a testable way and running the right experiments
in an effective way is hard. Setting up experiments wrong can lead to false
results and wrong business decisions.

1 What You Will Learn
In this hands-on tutorial you will learn the tactics and habits for highly effective
experiments and how to introduce them into your company. Join this tutorial to learn
how to get out the most out of continuous experimentation.
– How to identify the relevant questions we need to answer for making good product
decisions.
– How to ﬁnd and formulate the right hypotheses to test.
– What are the components of a good hypothesis?
– How to deﬁne metrics that inform product decisions.
– How to select the right experiments.
– How to justify the efforts for experimentation.
– How to align experiments with your product decisions and product strategy.
– How to transition your organization towards continuous experimentation.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 761–762, 2016.
DOI: 10.1007/978-3-319-49094-6

762

J. Münch

2 Who Should Attend
This presentation is aimed at
–
–
–
–
–
–
–
–

product managers,
innovation managers,
startup founders,
business people,
software developers,
consultants,
coaches, and
anyone who is interested in making an impact with their products through
experimentation.

3 Who is Teaching This Tutorial
Jürgen Münch is a Professor of Software Engineering at Reutlingen University, Germany, and a Research Director in the Department of Computer Science at the
University of Helsinki, Finland. He regularly teaches product management courses and
helps companies to develop innovation capabilities and new digitally-enabled products
and services. He specializes in software engineering, in particular data- and
value-driven software development, product management, agile engineering, and
startups. Results are documented in ﬁve books and more than 150 refereed
publications.

4 Outline
1. Why Experiments?
2. Setting up Highly Effective Experiments
3. Achieving Breakthrough

5 What Former Attendees Said
– One of the most enriching courses.
– Jürgen has a very profound practical and theoretical knowledge.
– Deﬁnitely recommendable.

Integrating Agile Development with Process
Standards Like ASPICE and ISO 26262
Even-André Karlsson(&)
Addalot Consulting AB, Gråbrödersgatan 8, 211 21 Malmö, Sweden
even-andre.karlsson@addalot.se

Abstract. In many industries there is dual pressure on both being more agile
and adaptive to changing requirements at the same time as being compliant with
process standards like ASPICE and ISO 26262. These standards have not been
adapted to agile development, and many of the underlying assumptions are
based on a waterfall model. Many organizations have problems joining these
two words. In this tutorial we will look at the basic principles in these some
standards, and also practical experience how to incorporate them in an agile
environment. Examples that we will show are e.g. how to incorporate safety
activities into sprints, handling of traceability in an agile environment, handling
of reviews, change management, etc.

1 Biography and Contact Information of Tutorial Presenters
Even-André Karlsson has over 20 years’ experience as a consultant for larger software companies. One of his areas of expertise is evaluating, documenting and development of processes for software architecture. Even’s experience includes modelling
Ericsson GPRS and BTS systems. He served as adjunct professor at the Lund Technical
University for three years teaching an industrial course in Software Architecture.

2 Description, to be Used for Evaluation, Including Aims
and/or Learning Objectives
Learning objectives
• Understand how to be compliant with process standards in an agile process.
• Understand the major obstacles, and possible ways to handle them
• Get some practical experience from real world organizations that have addressed
these issues.

3 Target Audience and Desired Number of Participants
(Minimum and Maximum)
This presentation is interesting for developers and managers that have to tackle this
challenge. A basic understanding of agile development and process standards is useful.
No restriction on number of participants.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 763–764, 2016.
DOI: 10.1007/978-3-319-49094-6

764

E.-A. Karlsson

4 Tentative Agenda
Introduction (50 min)
• Who am I? Who are you?
• Agile development
• Process standards, e.g. ISO 26262, ASPICE
Areas of concern with examples (2 × 50 min)
•
•
•
•
•

Documentation
Traceability
Up front analysis and design
Planning and support processes
Safety activities

Architecture Evaluation - Threat
or Opportunity?
Even-André Karlsson(&)
Addalot Consulting AB, Gråbrödersgatan 8, 211 21 Malmö, Sweden
even-andre.karlsson@addalot.se

Abstract. Addalot has in many contexts been commissioned to evaluate the
architecture of a product. There may be situations where you want to buy a
company, taking over a product, or that you have purchased a product, but are
not satisﬁed and want to gain a deeper understanding of the evidence and
weaknesses. The purpose of the architecture evaluation is to obtain an objective
analysis for future product decisions. We usually say that an architecture
evaluation of a system is that an inspection of a car or a transfer inspection of a
house - something that should be mandatory. In this tutorial we want to go
through the method we use to do this, what areas we analyze, and common
weaknesses and risks that we often encounter. We conclude by walking through
an example of an evaluation as we recently did for Swedish Radio.

1 Biography and Contact Information of Tutorial Presenters
Even-André Karlsson has over 20 years’ experience as a consultant for larger software companies. One of his areas of expertise is evaluating, documenting and development of processes for software architecture. Even’s experience includes modelling
Ericsson GPRS and BTS systems. He served as adjunct professor at the Lund Technical
University for three years teaching an industrial course in Software Architecture.

2 Description, to be Used for Evaluation, Including Aims
and/or Learning Objectives
Areas that we evaluate are:
•
•
•
•
•
•
•

Functionality
Logical structure, interface, dynamic architecture, documentation and traceability
Flexibility, comprehensibility and maintainability
Scalability, performance, redundancy, security and safety (if relevant)
Code structure and quality
Testability, test strategy, automation and coverage
Future plans

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 765–766, 2016.
DOI: 10.1007/978-3-319-49094-6

766

E.-A. Karlsson

Learning objectives
• Understand how an architecture evaluation is done
• Understand when it can be useful to perform one
• Practical experience from evaluations that can help in your own work as architect,
both on current and future systems

3 Target Audience and Desired Number of Participants
(Minimum and Maximum)
This presentation is interesting for both architects who want to evaluate products and
architects who want to prepare themselves and their system of evaluation.
It is also useful for managers and decision makers that needs to understand what an
architecture evaluation is and when it can be useful to do.

4 Tentative Agenda
• Introduction (50 min)
– Who am I? Who are you?
– Why do we do architecture evaluations?
– How do we do it?
• SR example (30 min)
• Areas and examples (50 min)
– Functionality
– Architecture and documentation
– Non functional aspects
– Code, testing and environment
– Management of architecture
• Examples of conclusion (20 min)

SafeScrum Tutorial
Geir Kjetil Hanssen1(&), Thor Myklebust1, Tor Stålhane2,
and Børge Haugset1
1

2

SINTEF, Department of Software Engineering, Safety and Security,
Postboks 4760 Sluppen, 7465 Trondheim, Norway
{ghanssen,Thor.Myklebust,Borge.Haugset}@sintef.no
Informatics Department, Norwegian University of Science and Technology,
Sem Sælandsvei 9, 7491 Trondheim, Norway
tor.stalhane@idi.ntnu.no

Abstract. Safety-critical software is becoming an increasingly larger part of
safety systems, such as ﬁre-detection, dynamic positioning, and autonomous
driving. Such systems normally needs to be certiﬁed according to international
safety standards, such as IEC 61508 which provides details on how the system
and the software is to be built in order to be considered safe. In order to achieve
a certiﬁcate by in independent assessor, the system provider needs to provide
comprehensive documentation to demonstrate proof of conformance to the
standard. This makes the development process complex and documentation
costs very high. Traditionally this has been managed by the use of plan-based
approaches, typically variants of the V-model. This has been a viable approach
for decades as safety systems have largely been hardware based. The ongoing
software growth however calls for a new take on the development process and
the industry has picked up an interest in agile development methodologies to
address a growing need for flexibility in development. As an answer to this we
have proposed the SafeScrum process framework in order to meet new challenges in development and certiﬁcation of safety critical software. The tutorial
will explain the speciﬁc challenges related to development of such systems and
explain how Scrum may be adapted to create a flexible and efﬁcient process for
both development and certiﬁcation.
The software tends to be very complex which also affects the development
process.

1 Biography and Contact Information of Tutorial Presenters
Geir Kjetil Hanssen is a senior research scientist at SINTEF ICT, Norway. He has a
PhD in software engineering from the Norwegian University of Science and Technology (NTNU). His main areas of interest are software engineering methodologies –
in particular agile methods, software process improvement, and safety critical systems.
He is currently involved in industry-oriented research projects addressing the implementation and effect of agile methods for developing systems that will undergo certiﬁcation according to the IEC61508 and DO178C standards. Founder of SafeScrum.
Contact information: ghanssen@sintef.no, +47 92492454.

© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 767–769, 2016.
DOI: 10.1007/978-3-319-49094-6

768

G.K. Hanssen et al.

Thor Myklebust works as research and certiﬁcation manager at SINTEF ICT. He
is a Cand. Scient. in physics with an additional two years at university level on
Psychology and Statistics. He has experience in certiﬁcation of products and systems
since 1987 and has been a member of several international committees since 1988.
•
•
•
•

Safety (NEK/IEC 65),
IEC 61508 committee,
Railway (NEK/CENELEC/TC 9) and
NB-rail (notiﬁed bodies) since 2007.

He was vice-chairman and chairman of NB-rail in the period October 2013 –
October 2015. Founder of SafeScrum. Contact information: thor.myklebust@sintef.no,
+4795779869.
Tor Stålhane holds a master degree in physical electronics and a PhD in statistics.
He worked as a developer at SINTEF from 1969 to 1988 and as a safety analyst of
software intensive systems from 1988 to 2000. He then moved to NTNU where he was
a professor in software engineering, teaching software engineering, process improvement and safety analysis of software-intensive systems up to 2014. He is now professor
emeritus at NTNU. Founder of SafeScrum. Contact information: stalhane@idi.ntnu.no,
+47 97595326.
Børge Haugset is a research scientist at SINTEF ICT, Norway. He has a Master in
software engineering from the University of Oslo (UiO). His main areas of interest is
agile software development and where safety meets security – in particular the complex
problems that modern safety-critical software systems face when introduced to open
networks and the Internet of Things. Contact information: borge.haugset@sintef.no,
+47 93420190.

2 Description, to Be Used for Evaluation, Including Aims
and/or Learning Objectives
The tutorial will describe the SafeScrum process and then go through the important
steps in agile development of safety critical systems according to SafeScrum and IEC
61508 – safety analysis, the application of the agile practices and examples of applied
tool chains. The tutorial will end with a presentation of how we can adapt the agile
process to other relevant standards. After having attended the tutorial, the participants
should be able to:
• Basic understanding of safety-critical software and the IEC 61508 standard
• Understand and apply the basic ideas of agile development of safety-critical systems
• Create a SafeScrum process that is compliant to other relevant safety standards

SafeScrum Tutorial

769

3 Target Audience and Desired Number of Participants
(Minimum and Maximum)
Target audience is developers and project managers who (1) are, or will be involved in
the development of safety-critical systems and (2) will introduce agile development
methods into the process. The tutorial will focus on Scrum and development according
to IEC 61508 but can be adapted to most safety-critical standards and agile methods.
• Minimum number of participants: 5
• Maximum number of participants: 50

4 Special Requirements, Such as Facilities, Equipment,
and Materials, if Applicable
Necessary equipment is:
• Video equipment and screen
• Whiteboard and pens

5 Tentative Agenda
Time
13:00
13:30
13:50
14:20
14:35
15:05
15:30
16:00

Topic
Scrum and agile software development
Introduction to safety-critical systems
The SafeScrum process
Safety analysis, hazard logs and SafeScrum
Coffee break
Use of Agile Practices when developing Safety-Critical Software
The SafeScrum tool chain and tool classiﬁcation
Adapting SafeScrum to other standards than IEC 61508
End of tutorial

Responsible
GH
GH
TS
TM
BH
TS
GH

6 Acknowledgments
This work was partially funded by the Norwegian research council under grant
#228431 (the SUSS project). Research has been done in collaboration with Autronica
Fire & Security AS and Kongsberg Maritime AS. We are grateful for valuable input
from TÜV Nord and TÜV Rheinland.

Creating Champions and Battling Dragons –
How to Create a DevOps Culture
Pål Thomassen(B) and Ingrid Sorgendal
Bekk Consulting AS, Ferjemannsveien 10, 7042 Trondheim, Norway
pal.thomassen@bekk.no, ingrid.sorgendal@bekk.no
http://www.bekk.no

Abstract. The term DevOps is a vague and often misunderstood term.
It is typically associated with many types of automation tools. However,
while a key strategy in many well performing teams, automation is a
consequence of DevOps workflows – not the main point. In this tutorial
we will demystify the term and give some insights on how to change an
organization towards DevOps culture.

1

Introduction

Are you curious about what DevOps is, and how its practices can beneﬁt your
organization – or are you ready to implement a DevOps workﬂow, but struggling
to break up an unyielding silo mindset?
The DevOps movement is rapidly spreading throughout the software community. A group of concepts, inspired and catalyzed by Agile and Lean approaches,
there are many misconceptions of what DevOps is, and how it can be implimented to beneﬁt the development and operation of software. Oftentimes, a shift
to a DevOps oriented workﬂow requires organizational culture changes that can
be diﬃcult or even painful to undertake.
Join us for this tutorial, where we will demystify the term (and dispel some
myths), and give you the tools to identify how you can change your organization
into a smooth, well tuned DevOps machine.

2

Biography of Presenters

Pål Thomassen is Practice Lead for DevOps at BEKK Trondheim, and board
member for the Trondheim DevOps meetup. Pål draws from his experience as a
developer in projects for small and large Norwegian businesses, both in private
and public sectors. Pål is dedicated to improving the workﬂows and eﬃciency
of development teams, all the while spreading his enthusiasm and love for the
software community.
Ingrid Sorgendal is Practice Lead for User Experiences at BEKK Trondheim,
and board member for the Trondheim IxDA meetup. As a User Experience
Designer, Ingrid is a seasoned facilitator with experience from a broad array
c Springer International Publishing AG 2016

P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 770–771, 2016.
DOI: 10.1007/978-3-319-49094-6

Creating Champions and Battling Dragons

771

of projects and customers. She oftentimes ﬁnds herself working with improving
organizational cultures – enabling her customers to meet user needs, in addition
to creating better working conditions for the developers.
In this tutorial, Ingrid and Pål team up to share their experiences with
DevOps and organizational change.

3

Learning Objectives

– To give partipants an introduction to what DevOps is (and isn’t)
– To make participants conﬁdent with improving the culture of their own
organizations
– To challenge participants on how to implement and streamline a DevOps
workﬂow

4

Target Audience and Prerequirements

The tutorial targets practitioners interested in shifting to a DevOps workﬂow;
especially those struggling with change resistant cultures and/or silo cultures.
However, interested researchers will also beneﬁt from presentations as well as
activities. The tutorial will be hands on with speciﬁc techniques aimed at preparing tutorial participants on working with the cultures of their own organizations,
in order to implement the DevOps Concept into their software development and
operations.
There are no requirements, though participants should in advance decide on
a business or organization (your own workplace, customer, institute; as long as
you know it well) that can be use for the individual tutorial activities.

5
–
–
–
–
–
–
–

Tentative Agenda
Introduction: DevOps
Introduction: Actor mapping
Activity: Map your organization
Introduction: Creating champions and battling dragons
Activity: Identify champions and dragons
Activity: Create your battle strategy
Wrap up: DevOps in your day-to-day workﬂow

Lean Startups in Established Companies: How
to Make it Really Happen and How to Avoid
Common Pitfalls
Nils Brede Moe(&) and Tone Merethe Aasen
SINTEF, 7465 Trondheim, Norway
nils.b.moe@sintef.no

Abstract. Sustaining innovation in a company delivering services based on
software is difﬁcult. One common challenge is developing new products and
services when well-deﬁned requirements are lacking due to a high level of
uncertainty of what the customer really wants. This tutorial presents one
approaches to handling this uncertainty by relying on continuous experimentation and validated learning. The method “Lean Startup” is about
cross-functional teams given the authority to set directions for the new product,
and continuously testing out the assumptions and ideas on real customers.
However, the Lean Startup team is seldom able to solve all tasks by themselves.
While doing continuous experimentation, the team must align many decisions
regarding with the rest of the company, which usually slows them down. Further, the team’s autonomy is reduced due to multiple dependencies, which in
turn reduces the innovation potential of the team. The question is how do we
have Lean Startup teams which overcome these difﬁculties and reap the beneﬁts
using the method. Also: Who should be part of the cross-functional Lean Startup
team? Should the team be isolated? What are the consequences if the team
operating independently form the rest of the organization?

1 Biography and Contact Information of Tutorial Presenters
Nils Brede Moe works with software process improvement, agile software development, Lean-startup and global software development at SINTEF. His research interests
are related to organizational, socio-technical, and global/distributed aspects. His main
publications include studies on self-management, decision-making, innovation and
teamwork. He wrote his thesis on “From Improving Processes to Improving Practice —
Software Process Improvement in Transition from Plan-driven to Change-driven
Development”. He is also holding position at Blekinge Institute of Technology.
Tone Merethe Aasen works with innovation management, and knowledge processes and strategies at SINTEF. Her research interests are related to innovation as
participative processes, including new models and practices for collaborative innovation work. Her main publications include books and papers on innovation as collective
processes in and between organizations, and on employee-driven-innovation. She
wrote her thesis on “Innovation as social processes A participative study of the Statoil R & D program Subsea Increased Oil Recovery (SIOR)”.
© Springer International Publishing AG 2016
P. Abrahamsson et al. (Eds.): PROFES 2016, LNCS 10027, pp. 772–773, 2016.
DOI: 10.1007/978-3-319-49094-6

Lean Startups in Established Companies

773

2 Description, to Be Used for Evaluation, Including Aims
and/or Learning Objectives
The tutorial will describe the idea and motivation behind Lean Startup, why many agile
companies are implementing this approach today, and how they do it. Further, the
tutorial will highlight challenges companies face when adopting the Lean Startup
approach, and explain why the Lean Startup team needs a high degree of autonomy,
and how to enable such autonomy. After having attended the tutorial, the participants
should be able to:
• Have basic understanding of Lean Startup and how to apply it.
• Understand the challenges of such approaches in established companies related to
the product owner, management, and other departments.
• Understand the need for and how to empower the Lean Startup team.

3 Target Audience and Desired Number of Participants
(Minimum and Maximum)
Target audiences are team members, team leads, project and department managers,
business analysts, and HR. The tutorial will focus on Lean Startup, agile software
development, and empowered cross-functional teams. Minimum number of participants: 5, Maximum number of participants: 40.

4 Special Requirements, Such as Facilities, Equipment,
and Materials, if Applicable
Necessary equipment: Video equipment and screen, whiteboard, markers and pens.

5 Tentative Agenda
13:00
13:50

Introduction to the workshop and Lean Startup
The challenges of Lean Startup in an established company – a story
from an Norwegian bank
14:40
The self-managing cross functional Lean Startup team
15:00
Strategies for implementing Lean Startup
15:45 – 16.00 Summing up and closing the tutorial
Acknowledgments. This work was supported by the Smiglo project, partly funded by the
Research Council of Norway under grant 235359/O30, and by the Agile 2.0 project which is
supported by the Research council of Norway through grant 236759/O30.

Author Index

Eckhardt, Jonas 31
Eder, Sebastian 506
Edison, Henry 751

Aaltola, Mika 344
Aasen, Tone Merethe 772
Abrahão, Silvia 691, 713
Adenmark, Mikael 233
Afzal, Wasif 268
Alajrami, Sami 662
Aleixo, Fellipe A. 636
Alves, Carina 69
Amasaki, Sousuke 627
Ameller, David 48
Ämmälä, Mikko 598
Ampatzoglou, Apostolos 495
Angelov, Samuil 432
Aversano, Lerina 631
Avgeriou, Paris 495
Aymerich, Brenda 198, 311

Fabijan, Aleksander 517
Fægri, Tor Erlend 458
Fagerholm, Fabian 102, 344
Farré, Carles 48
Felderer, Michael 327, 683
Feldt, Robert 233
Femmer, Henning 506
Fernandez, Adrián 713
Fischbach, Kai 379
Fischer, Diana 379
França, César 723
Franch, Xavier 48
Freire, Marília A. 636

Babar, Muhammad Ali 111
Baldassarre, Maria Teresa 691, 723
Baum, Tobias 301
Berzisa, Solvita 387
Biddle, Robert 416
Bohlin, Markus 268
Borg, Markus 487
Bosch, Jan 86, 448, 477, 517
Caivano, Danilo 691
Calderón, Alejandro 697
Canfora, Gerardo 705
Carsí, José Ángel 713
Chauhan, Muhammad Aufeef
Choi, Eunjong 560
Chunduri, Annapurna 233
Cocozza, Francisco 311
Cruzes, Daniela S. 759

111

da Silva, Fabio Q.B. 723
Di Sorbo, Andrea 705
Diebold, Philipp 327, 360, 432
Dittrich, Yvonne 691
Durisic, Darko 127

Gallina, Barbara 662
Galster, Matthias 432
Garbanzo, Diana 198, 311
Grabis, Jānis 387
Grasso, Carmine 631
Grasso, Pasquale 631
Haller, Johan 368
Hämäläinen, Timo D. 529, 550
Hämeen-Anttila, Tapio 580
Hanssen, Geir Kjetil 767
Haugset, Børge 767
Hauptmann, Benedikt 506
Heikkinen, Esa 550
Hohl, Philipp 468
Hokkanen, Laura 15
Höst, Martin 181
Inoue, Katsuro 560
Insfran, Emilio 713
Jaakkola, Hannu 615
Jaatun, Martin Gilje 679
Jacobsen, Jan Wiedemann

327

776

Author Index

Jansen, Slinger 69
Jensen, Matilde Bisballe 675
Johnsson, Björn A. 181
Juntunen, Marko 643
Kakkonen, Kari 285
Karlsson, Even-André 763, 765
Karvonen, Teemu 643
Katumba, Brian 477
Kaur Chahal, Kuljit 741
Kettunen, Petri 598
Kiesling, Stephan 379
Kilamo, Terhi 102
Kiljander, Harri 15
Klünder, Jil 379, 731
Koenders, Thom 495
Kojo, Tero 102
Komssi, Marko 15
Kortum, Fabian 731
Kropp, Martin 416
Kuhrmann, Marco 327
Kuusela, Raija 580
Kuvaja, Pasi 144, 399, 643
Lanzilotti, Rosa 691
Larsson, Stig 268
Lehto, Jari 144
Leppäkoski, Arttu 529
Lokan, Chris 627
López, Gustavo 198, 311
Lozano, Federico 675
Lwakatare, Lucy Ellen 399, 590
Mäenpää, Hanna 102
Maglyas, Andrey 368
Magnusson, Boris 181
Mäkinen, Simo 590
Männistö, Tomi 102, 344, 590
Mäntylä, Mika V. 285, 611
Märijärvi, Jukka 15
Marín, Gabriela 311
Markkula, Jouni 144
Markzac, Sabrina 759
Maro, Salome 163
Mårtensson, Torvald 448
Meesters, Marcel 432
Meier, Andreas 416
Méndez Fernández, Daniel 31
Mercaldo, Francesco 705

Moe, Nils Brede 458, 772
Mohout, Omar 15
Moser, Dagmar 506
Motta, Corrado 127
Možucha, Jakub 217
Münch, Jürgen 327, 344, 468, 761
Munezero, Myriam 102, 344
Myklebust, Thor 767
Numata, Seiya 560
Nurminen, Mikko 102
Oivo, Markku 144, 399, 643
Olsson, Helena Holmström 86, 477, 517
Olszewska, Marta 621
Ostroumov, Sergey 621
Pacheco, Alexia 198, 311
Palmu, Christina 344
Partanen, Jari 598
Peikenkamp, Thomas 163
Pfahl, Dietmar 249, 539
Piccinno, Antonio 691
Probst, Christian W. 111
Raatikainen, Mikko 15
Ramler, Rudolf 683
Raulamo-Jurvanen, Päivi 285
Razzak, Mohammad Abdur 746
Ribeiro, Claudia M.F.A. 636
Riungu-Kalliosaari, Leah 590
Romanovsky, Alexander 662
Rontti, Simo 598
Rossi, Bruno 217
Ruﬁan, Guillem 48
Ruiz, Mercedes 697
Saadatmand, Mehrdad 268
Saari, Leila 580
Saini, Munish 741
Samuel, Triin 249
Sauvola, Tanja 598, 643
Schmitt, Anna 360
Schneider, Kurt 301, 379, 468, 731
Sjøberg, Dag I.K. 3
Smolander, Kari 368, 611
Sorgendal, Ingrid 770
Ståhl, Daniel 448
Stålhane, Tor 767

Author Index

Stamelos, Ioannis 495
Staron, Miroslaw 127
Steghöfer, Jan-Philipp 163
Steinert, Martin 675
Sten, Harri 615
Stray, Viktoria 458
Stupperich, Michael 468
Sulaman, Sardar Muhammad 487
Sundmark, Daniel 268
Suomalainen, Tanja 580, 643
Suonsyrjä, Sampo 571
Systä, Kari 571, 615
Tahvili, Sahar 268
Taušan, Nebojša 144
Teppola, Susanna 598
Terho, Henri 571
Thomassen, Pål 770
Tiihonen, Juha 590

Tøndel, Inger Anne 679
Törnroos, Hannu 368
Tortorella, Maria 631
Trei, Maria 163
Trinidad, Manuel 697
Valença, George 69
van Alphen, Pascal 495
Visaggio, Corrado Aaron 705
Vogelsang, Andreas 31
Waldén, Marina 621
Wnuk, Krzysztof 487
Yaman, Sezin Gizem 344
Yin, Huishi 539
Yli-Huumo, Jesse 368
Yoshida, Norihiro 560

777