

Main.SQAbstractsr1.84 - 16 Jul 2013 - 08:37 - AndyZaidman


Research Colloquium Presentation Abstracts

Estimating Rework with API Change Injection

Thursday March 6th, 2014: 11:00

Presenter: Steven Raemaekers

Interfaces in a library can be seen as contracts between the developers and the users of that library. When interfaces (or any public and used part of a software system) change, rework is expected elsewhere. In this paper, we have investigated the occurrence of interface changes between different versions of libraries in the Maven repository. We also introduce a new method to estimate the impact of an interface change by injecting the change and detecting the compilation errors it causes. This gives a good indication of expected rework, because we believe it closely matches the way developers adapt software. In this talk, I will demonstrate my change injection technique and share the results with you.
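As a drastically simplified illustration of the change-injection idea (all names and data below are hypothetical, and the real technique works on compiled Java code rather than string sets):

```python
# Sketch: "inject" an interface change (here: removing one method) and
# count the client call sites that would break, as a proxy for rework.
# API methods and call sites are plain strings purely for illustration.

def broken_call_sites(api_methods, client_calls, removed_method):
    """Simulate a 'method removed' change and return the client call
    sites that reference the now-missing method."""
    changed_api = api_methods - {removed_method}
    return [site for site, method in client_calls if method not in changed_api]

api = {"Logger.log", "Logger.flush", "Parser.parse"}
calls = [("ClientA.java:12", "Logger.log"),
         ("ClientA.java:40", "Logger.flush"),
         ("ClientB.java:7",  "Logger.log")]

impact = broken_call_sites(api, calls, "Logger.log")
# two call sites would no longer compile
```

The actual method injects changes into real library code and runs the compiler against client code, so it also catches indirect breakage this toy version misses.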

Improving Service Diagnosis Through Invocation Monitoring

Wednesday July 17th, 2013: 10:15

Presenter: Cuiting Chen

Service oriented architectures support software runtime evolution through reconfiguration of misbehaving services. Reconfiguration requires that the faulty services can be identified correctly. Spectrum-based fault localization is an automated diagnosis technique that can be applied to faulty service detection. It is based on monitoring service involvement in passed and failed system transactions.

Monitoring only the involvement of services sometimes leads to inconclusive diagnoses. In this paper, we propose to extend monitoring to also include the invocation links between the services. We show, through simulations and a case study with a real system, under which circumstances service monitoring alone inhibits the correct detection of a faulty service, and how and to what extent the inclusion of invocation monitoring can lead to improved service diagnosis.
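The spectrum-based ranking that such diagnosis relies on is often computed with the Ochiai similarity coefficient; the sketch below uses an invented involvement matrix in which service 1 participates in every failed transaction:

```python
import math

def ochiai(spectrum, outcomes):
    """Rank services by suspiciousness.
    spectrum[t][s] = 1 if service s was involved in transaction t;
    outcomes[t] = True if transaction t failed."""
    n_services = len(spectrum[0])
    total_failed = sum(outcomes)
    scores = []
    for s in range(n_services):
        ef = sum(1 for t, row in enumerate(spectrum) if row[s] and outcomes[t])
        ep = sum(1 for t, row in enumerate(spectrum) if row[s] and not outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Three services, four transactions; service 1 is involved in every failure.
spectrum = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1],
            [0, 1, 0]]
outcomes = [True, True, False, True]
scores = ochiai(spectrum, outcomes)   # service 1 gets the highest score
```

With involvement data alone, two services that always appear together get identical scores; that tie is exactly the kind of inconclusive diagnosis that the additional invocation-link monitoring is meant to break.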

Mining Models from Generated System Tests

Friday June 28th, 2013: 11:45

Presenter: Andreas Zeller

Modern analysis and verification techniques can easily check advanced properties in complex software systems. Specifying the necessary models and properties, however, is as hard as ever. I present techniques to extract models from legacy systems based on dynamic analysis of automatically generated system tests: models that are real by construction, and sufficiently complete and precise to serve as specifications for testing, maintenance, and proofs.

Andreas Zeller has been a full professor for Software Engineering at Saarland University in Saarbrücken, Germany, since 2001. His research concerns the analysis of large software systems and their development process. In 2010, Zeller was inducted as Fellow of the ACM for his contributions to automated debugging and mining software archives. In 2011, he received an ERC Advanced Grant, Europe's highest and most prestigious individual research grant, for work on specification mining and test case generation.

For a crisper definition of technical debt

Friday June 28th, 2013: 11:00

Presenter: Philippe Kruchten

The metaphor of technical debt, used broadly to designate several ills that large software development projects suffer from, is gaining a lot of momentum. At the same time, technical debt is ill-defined, which limits its usefulness to that of a rhetorical device for establishing a dialogue between business stakeholders and technical staff. Is technical debt just another term for bad software quality? Is technical debt quantifiable in monetary terms? Over a series of workshops organized in the context of ICSE, the software engineering community has come to a better understanding of technical debt: what criteria to use, its various forms, and also the limitations of the metaphor.

Variability in quality attributes in service-based software systems

Thursday June 27th, 11:00

Presenter: Sara Mahdavi-Hezavehi

Facilitating variability in software-intensive systems is essential to make sure that systems successfully adapt to changing needs. In service-based systems, variability is usually achieved through flexible service retrieval and binding, mostly focusing on functional aspects rather than quality attributes. Moreover, existing work on variability in service-based systems focuses on process workflow variability. Therefore, the objective of this thesis is to describe the state of the art of variability in quality attributes in service-based systems. We are particularly interested in a) assessing the quality of current research, b) collecting evidence about current research that suggests implications for practice, and c) identifying research trends, open problems and areas for improvement. Our ultimate goal is to summarize key issues in variability management of quality attributes in service-based systems and to propose new lines of research. Therefore, we propose a systematic literature review concerning variability in quality attributes in service-based software systems. In our review we intend to identify, evaluate, and interpret all available relevant research done in this particular domain, and to answer a set of specific research questions.

A Framework for Improving a Distributed Software System's Deployment Architecture

Monday June 3rd, 2013: 12:00

Presenter: Sam Malek

A distributed system’s allocation of software components to hardware nodes (i.e., deployment architecture) can have a significant impact on its quality of service (QoS). For a given system, there may be many deployment architectures that provide the same functionality, but with different levels of QoS. The parameters that influence the quality of a system's deployment architecture are often not known before the system’s initial deployment and may change at runtime. This means that redeployment of the software system may be necessary to improve the system’s QoS properties. In this talk I will describe a framework aimed at finding the most appropriate deployment architecture for a distributed software system with respect to multiple, possibly conflicting QoS dimensions. The framework supports formal modeling of the problem and provides a set of tailorable algorithms for improving a system’s deployment. We have realized the framework on top of a visual deployment architecture modeling and analysis environment. The framework has been evaluated for precision and execution-time complexity on a large number of simulated distributed system scenarios, as well as in the context of two third-party families of distributed applications.
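The optimization problem the framework addresses can be illustrated with a toy sketch; the components, hosts, traffic figures, and the single-objective cost function below are invented, and the framework's actual algorithms are tailorable heuristics rather than the brute-force search used here:

```python
from itertools import product

# Toy deployment problem: assign components to hosts so that the latency
# penalty for inter-host communication is minimized. Real instances have
# multiple, possibly conflicting QoS dimensions and far larger spaces.

components = ["ui", "logic", "db"]
hosts = ["h1", "h2"]
traffic = {("ui", "logic"): 10, ("logic", "db"): 20}   # messages/sec
latency = {("h1", "h2"): 5, ("h2", "h1"): 5}           # ms between hosts

def cost(deployment):
    """Sum of traffic * latency for component pairs split across hosts."""
    total = 0
    for (a, b), volume in traffic.items():
        ha, hb = deployment[a], deployment[b]
        if ha != hb:
            total += volume * latency[(ha, hb)]
    return total

best = min((dict(zip(components, assignment))
            for assignment in product(hosts, repeat=len(components))),
           key=cost)
# with this single objective, co-locating everything is optimal
```

Adding a second objective (e.g., memory headroom per host) immediately makes co-location non-optimal, which is why the framework supports multiple QoS dimensions and pluggable algorithms instead of one fixed cost function.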

Automatic composition of complex software components

Monday June 3rd, 2013: 11:30

Presenter: Roberto di Cosmo

Modern software systems are built by composing components drawn from large repositories, whose size and complexity are increasing at a very fast pace. Cloud applications are built by connecting together services running on different machines, configured with a selection of software packages whose co-installability problem, related to boolean satisfiability, is known to be algorithmically hard. We will survey recent work that combines sophisticated algorithms for checking co-installability developed in the Mancoosi project with a clean model of software services to create a unique software architecture configuration tool, codenamed Zephyrus, developed in the Aeolus project.

Reliability of APIs

Wednesday May 29th, 2013, 11:00

Presenter: Maria Kechagia, Athens University of Economics and Business

Application programming interfaces (APIs) of new open platforms, such as Android, are fertile ground for empirical software engineering studies. The availability of the APIs' code establishes them as a valuable corpus for research associated with their design. Locating deficient APIs and improving their design or implementation can prevent thousands of applications from crashing. In this talk, we report how we used software telemetry data, in the form of stack traces coming from Android application crashes, to analyze their causes and evaluate the reliability of the APIs used.

Analyzing the Change-Proneness of Service-Oriented Systems from an Industrial Perspective

Wednesday, May 15th, 2013, 14:00

Presenter: Daniele Romano

Antipatterns and code smells have been widely proven to affect the change-proneness of software components. However, there is a lack of studies that propose indicators of changes for service-oriented systems. Like any other software systems, such systems evolve to address functional and non-functional requirements. In this research, we investigate the change-proneness of service-oriented systems from the perspective of software engineers. Based on feedback from our industrial partners, we investigate which indicators can be used to highlight change-prone application programming interfaces (APIs) and service interfaces in order to improve their reusability and response time. The output of this PhD research will assist software engineers in designing stable APIs and reusable services with adequate response time.

The GHTorrent dataset and tool suite

Wednesday, May 15th, 2013, 14:00

Presenter: Georgios Gousios

During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this talk, we present the dataset details and construction process, and outline the challenges and research opportunities emerging from it.

Fixing the 'Out of Sight Out of Mind' Problem: One Year of Mood-Based Microblogging in a Distributed Software Team

Wednesday, May 15th, 2013, 14:00

Presenter: Kevin Dullemond & Ben van Gameren

Distributed teams face the challenge of staying connected. How do team members stay connected when they no longer see each other on a daily basis? What should be done when there is no coffee corner to share your latest exploits? In this paper we evaluate a microblogging system which makes this possible in a distributed setting. The system, WeHomer, enables the sharing of information and corresponding emotions in a fully distributed organization. We analyzed the content of over a year of usage data by 19 team members in a structured fashion, performed 5 semi-structured interviews, and report our findings in this paper. We draw conclusions about the topics shared, the impact on software teams, and the impact of distribution and team composition. Main findings include an increase in team connectedness and easier access to information that is traditionally harder to consistently acquire.

The Maven Repository Dataset of Metrics, Changes, and Dependencies

Wednesday, May 15th, 2013: 14:00

Presenter: Steven Raemaekers

We present the Maven Dependency Dataset (MDD), containing metrics, changes and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classes and packages of multiple library versions. A complete call graph is also presented, which includes call, inheritance, containment and historical relationships between all units of the entire repository. In this paper, we describe our dataset and the methodology used to obtain it. We present different conceptual views of MDD and we also describe limitations and data quality issues that researchers using this data should be aware of.

An N-gram Analysis on the complete corpus of MSR papers

Tuesday April 23rd, 11:00

Presenter: Serge Demeyer, University of Antwerp

On the occasion of the 10th anniversary of the MSR conference, it is a worthwhile exercise to meditate on the past, present and future of this thriving research community. Indeed, since the MSR community has experienced a big influx of researchers bringing in new ideas, state-of-the-art technology and contemporary research methods, it is unclear what the future might bring. In this paper, we report on a text mining exercise applied to the complete corpus of MSR papers to reflect on where we come from, where we are now, and where we should be going. We address issues like the trendy (and outdated) research topics; the frequently (and less frequently) cited cases; the popular (and emerging) mining infrastructure; and finally the proclaimed actionable information which we are expected to uncover.
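The core text-mining step behind such an exercise, counting word n-grams over a paper corpus, can be sketched as follows (the two "documents" below are made up):

```python
from collections import Counter

def ngrams(text, n):
    """Return the sliding word n-grams of a whitespace-tokenized text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

corpus = ["mining software repositories is fun",
          "mining software archives at scale"]

# Aggregate bigram frequencies over the whole corpus.
counts = Counter(g for doc in corpus for g in ngrams(doc, 2))
top = counts.most_common(1)   # the most frequent bigram and its count
```

Tracking how such frequency rankings shift from one conference year to the next is one simple way to surface trendy and fading topics; a real study would add tokenization, stop-word handling, and stemming.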

BIO: Serge Demeyer is a professor at the University of Antwerp and the spokesperson for the ANSYMO (Antwerp System Modelling) research group. He directs a research lab investigating the theme of "Software Reengineering" (LORE - Lab On REengineering). In 2007 he received a "Best teacher" award from the Faculty of Sciences at the University of Antwerp. As a consequence he remains very active in all matters related to teaching quality.

His main research interest concerns software reengineering, more specifically the evolution of object-oriented software systems. He is an active member of the corresponding international research communities, serving on various conference organizing and program committees. He has written a book entitled "Object-Oriented Reengineering" and edited a book on "Software Evolution". He has also authored numerous peer-reviewed articles, many of them in highly respected scientific journals. He completed his M.Sc. in 1987 and his PhD in 1996, both at the Vrije Universiteit Brussel. After his PhD, he worked for three years in Switzerland, where he served as technical coordinator of a European research project. Switzerland remains near and dear to his heart, as witnessed by his 2009-2010 sabbatical at the University of Zürich in the SEAL research group.

CodeMine: a Software Analytics Platform for Collecting and Analyzing Engineering Process Data at Microsoft.

Monday March 18, 12:30, Snijderszaal, TU Delft

Presenter: Jacek Czerwonka, Microsoft

The breadth of products Microsoft is involved in developing is unprecedented. From several flavors of operating systems (Windows for phones, PCs and servers), through client and server-side software in a box (Office, Exchange, SQLServer), many types of services (Skype, Bing, Office365), and finally to several types of hardware (Xbox, Surface); all used by hundreds of millions of people across the world. Some of these products have long histories and some are relatively new. And even though all products eventually adhere to the same release policies, there are substantial differences in internal processes followed by teams which in large part depend on the product’s history, their business model, where they are in the lifecycle, and their operating characteristics. Project CodeMine captures data from all the processes, from all products, and for all organizations building them into a common schema. CodeMine enables product teams to drastically shorten the time to answer engineering process questions and substantially improves both the quality and consistency of decision making. In addition, it provides the ability to easily perform comparative analysis of processes across product lines of different characteristics, which makes it invaluable to Microsoft’s empirical software engineering researchers.

This talk will describe the architecture of CodeMine, provide examples of its various uses, and examine several results in depth. It will also outline collaboration possibilities that exist for empirical software engineering researchers from outside Microsoft who are interested in performing studies based on CodeMine's data.

BIO: Jacek Czerwonka is a principal architect in the Tools for Software Engineers team at Microsoft. After spending 10 years working on Windows (mostly in testing), he is currently involved in creating solutions for understanding software engineering organizations and improving engineering processes at Microsoft. His interests revolve around engineering process analysis and improvement, software testing, and data-driven decision making on software projects. He has been involved in CodeMine since its inception.

Games Software Architects Play

Speaker: Philippe Kruchten

Over the years we've identified some of the strategies and tactics software architects use during the design of new, bold, large software-intensive systems: divide-and-conquer, brainstorming, reuse, etc. But we have also observed some strange tactics, biases and reasoning fallacies that creep in and somehow pervert the design process. They go by simple, funny or fancy names: anchoring, red herring, elephant in the room, post hoc ergo propter hoc, non sequitur, argumentum verbosium, etc. This talk will present a small illustrated catalogue of these games, with examples, and show how they sometimes combine into subtle but elaborate political plots. In other words, this talk is about cognitive biases and how they affect software development.

Bio: Philippe Kruchten is a professor of software engineering in the Department of Electrical and Computer Engineering of the University of British Columbia, in Vancouver, where he holds an NSERC Chair in Design Engineering. He teaches software engineering, more specifically software project management, and two interdisciplinary project courses on innovation, entrepreneurship and systems engineering. Philippe does research in software architecture and software processes, with a handful of graduate students and many collaborators around the world. He is known for his work as Director of Process Development (RUP) at Rational Software and as the developer of the 4+1 view model.

Venue: TU Delft, Faculty EEMCS, Mekelweg 4, Delft, Lipkensroom, December 19, 15:30-16:30.

Mining Structured Data in Natural Language Artifacts with Island Parsing

Speaker: Alberto Bacchelli

Software developers are nowadays supported by a variety of tools (e.g., version control systems, issue tracking systems, and mailing list services) that record a wide range of information into archives. Researchers mine these archives (or software repositories) both to support software understanding, development, and evolution, and to empirically validate novel ideas and techniques. Software repositories comprise two types of data: structured data and unstructured data. Structured data, such as source code, has a well-established structure and grammar, and is straightforward to parse and use with computer machinery. Unstructured data, such as documentation, discussions, comments, and customer support requests, comprises a mixture of natural language text, snippets of structured data, and noise. Mining unstructured data is hard, because out-of-the-box approaches adopted from related fields such as Natural Language Processing and Information Retrieval cannot be directly applied in the Software Engineering domain.

In my work I focus on mining unstructured data, because it gives us the chance to gain insights into the human factors revolving around software projects, so that we can better understand and support software development from a different perspective. In particular, I focus on the email communication occurring among people involved in a software project. In this presentation, I will detail our approach, based on island parsing, to recognize, parse, and model fragments of structured information (e.g., source code snippets) embedded in natural language artifacts. We evaluated our approach by applying it to the mailing lists of three open source software systems. The results show that our approach allows the extraction of structured data from messages with high precision and recall, despite the noisy nature of emails. I will discuss how the presented approach can be used to conduct novel forms of software analysis and lead to promising future work.
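A heavily simplified flavor of the idea, treating code-like fragments as "islands" in a sea of natural-language "water", can be sketched with a regular expression; the real approach uses proper island grammars rather than regexes, and the email text below is invented:

```python
import re

# Toy "island" extractor: anything that looks like a method call followed
# by a semicolon is treated as an island; everything else is water.
ISLAND = re.compile(r"[A-Za-z_][\w.]*\s*\([^)]*\)\s*;")

email = ("Hi all, the crash happens when we call parser.parse(input); "
         "right after logger.flush(); during shutdown. Any ideas?")

islands = ISLAND.findall(email)   # the two code fragments, in order
```

An island grammar generalizes this: it parses the recognized fragments into real syntax trees (so nested constructs and multi-line snippets work) while deliberately skipping over the surrounding prose, which is what makes the extraction robust to noisy email text.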

Bio: Alberto Bacchelli obtained his Bachelor's and Master's degrees in Computer Science from the University of Bologna, Italy. After graduating, he worked for a year as a professional software engineer in the largest Italian computing center, developing software for universities in a large team. He is currently a Ph.D. student in the Faculty of Informatics at the University of Lugano, working under the supervision of Prof. Michele Lanza in the REVEAL (Reverse Engineering, Visualization, Evolution Analysis Lab) research group. He was an intern at Microsoft Research in summer 2012, mentored by Dr. Christian Bird, working on understanding the motivations, outcomes, and challenges of tool-supported code reviews. He co-organized the second international workshop on mining unstructured data (MUD'12), held with WCRE 2012. His research interests include empirical software engineering, mining software repositories, unstructured data mining, qualitative research, and development tools.

Leveraging feature models for change impact analysis and design improvement

Speaker: Nicolas Dintzner

Large software systems are subject to constant change, leading to an increase in complexity and thus in maintenance and development cost. For long-lived systems, there is a need to address evolvability early on in the development process.

One way of designing an evolvable system is to describe likely upcoming changes and assess their impact on the current system. While not all changes can be foreseen, some can be inferred from what is already subject to change: the alternative and optional features of the system.

We propose here an approach to assess the impact of external feature changes on a design, based on a system description using a feature model and feature/design-artifact relationships. The approach is meant to be lightweight and usable iteratively, to quickly improve the capability of a design to withstand such potential changes.

Analyzing the Evolution of Web Services using Fine-Grained Changes

Speaker: Daniele Romano

In the service-oriented paradigm, web service interfaces are considered contracts between web service subscribers and providers. However, these interfaces continuously evolve over time to satisfy changes in the requirements and to fix bugs. Changes in a web service interface typically affect the systems of its subscribers. Therefore, it is essential for subscribers to recognize which types of changes occur in a web service interface in order to analyze the impact on their systems. In this paper we propose a tool called WSDLDiff to extract fine-grained changes from subsequent versions of a web service interface defined in WSDL. In contrast to existing approaches, WSDLDiff takes into account the syntax of WSDL and extracts the WSDL elements affected by changes as well as the types of changes. With WSDLDiff we performed a study aimed at analyzing the evolution of web services, using the fine-grained changes extracted from subsequent versions of four real-world WSDL interfaces. The results of our study show that the analysis of fine-grained changes helps web service subscribers to highlight the most frequent types of changes affecting a WSDL interface. This information can be relevant for web service subscribers who want to assess the risk associated with the usage of web services and to subscribe to the most stable ones.
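The essence of fine-grained interface diffing can be sketched as follows; this is not the actual WSDLDiff tool, the WSDL fragments are toy examples, and namespaces (which real WSDL processing must handle) are omitted for brevity:

```python
import xml.etree.ElementTree as ET

# Two toy versions of a portType: one operation is kept, one is removed,
# one is added between versions.
V1 = "<portType><operation name='getQuote'/><operation name='listSymbols'/></portType>"
V2 = "<portType><operation name='getQuote'/><operation name='getHistory'/></portType>"

def operations(wsdl):
    """Collect the names of all declared operations."""
    return {op.get("name") for op in ET.fromstring(wsdl).iter("operation")}

added = operations(V2) - operations(V1)     # new in V2
removed = operations(V1) - operations(V2)   # gone from V1
```

A removed operation breaks every subscriber that calls it, while an added one is usually harmless, which is why classifying the *type* of each change (rather than just flagging "interface changed") matters for risk assessment.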

Measuring library stability through code metrics and binary incompatibilities

Speaker: Steven Raemaekers

Third-party libraries are widely used in today's software systems, especially in Java. Changes in libraries can force the systems that link to them to be updated. A 'stable' library avoids imposing such changes on the systems that use it, while an 'unstable' library will cause so-called ripple effects sooner. What kind of metrics can capture this stability of libraries? We look at historical changes in the metric values of a library. With Java bytecode, it is possible to determine exactly the size of the impact of each change. Eventually, we want to determine how much it costs when a single line in a library or system is changed, including all ripple effects in other places.

Developer Motivation and the Adoption of Software Engineering Methods

Speaker: Leif Singer

Software engineering research and practice provide a wealth of methods that improve the quality of software and lower the cost of producing it. Yet even when processes mandate their use, these methods are not employed consistently. Software developers and development organizations thus cannot fully benefit from them. There can be diverse reasons for unsatisfactory adoption, the motivation of developers being one of them. We are developing a theoretical framework for supporting the more intrinsic motivations of software developers. This talk discusses our efforts and current status.

Leif Singer is a Ph.D. student in the Software Engineering Group at Leibniz Universität Hannover in Germany. He is interested in the different uses of social software in software development and the effects it can have on developers' behavior.

Declaratively Defining Domain-Specific Language Debuggers

Speaker: Ricky Lindeman

Tool support is vital to the effectiveness of domain-specific languages. With language workbenches, domain-specific languages and their tool support can be generated from a combined, high-level specification. This paper shows how such a specification can be extended to describe a debugger for a language. To realize this, we introduce a meta-language for coordinating the debugger that abstracts over the complexity of writing a debugger by hand. We describe the implementation of a language-parametric infrastructure for debuggers that can be instantiated based on this specification. The approach is implemented in the Spoofax language workbench and validated through realistic case studies with the Stratego transformation language and the WebDSL web programming language.

Performance Trade-offs in Client-Side Service Delegation

Speaker: Adam Nasr

Service Oriented Architecture, which builds on distributed computing platforms, is increasingly being adopted by organizations in both the public and private sectors. Migration from traditional monolithic systems to services, in particular web services, characterizes much of systems evolution today. This paper analyzes some of the performance and modularization problems involved in current service-oriented computing. It investigates under which circumstances the communication between service providers and service consumers can be made more efficient by eliminating certain steps from traditional Remote Procedure Call (RPC) methods. After discussing traditional service invocation and its drawbacks, this paper proposes an alternative approach called Distributed Service Delegates (DSD), which emphasizes client-side, or local, computation. An experiment is designed and implemented to measure the trade-offs between traditional methods, in this case web services, and the proposed DSD. The results of this experiment are discussed and their implications for future research are indicated.

Spectrum-based Health Monitoring for Self-Adaptive Systems

Speaker: Éric Piel

An essential requirement for the operation of self-adaptive systems is information about their internal health state, i.e., the extent to which the constituent software and hardware components are still operating reliably. Accurate health information enables systems to recover automatically from (intermittent) failures in their components through selective restarting, or self-reconfiguration.

This paper explores and assesses the utility of Spectrum-based Fault Localisation (SFL) combined with automatic health monitoring for self-adaptive systems. Their applicability is evaluated through simulation of online diagnosis scenarios, and through implementation in an adaptive surveillance system inspired by our industrial partner. The results of the studies performed confirm that the combination of SFL with online monitoring can successfully provide health information and locate problematic components, so that adequate self-* techniques can be deployed.

Using Source Code Metrics to Predict Change-Prone Java Interfaces

Speaker: Daniele Romano

Empirical studies have investigated the use of source code metrics to predict the change- and defect-proneness of source code files and classes. While results showed strong correlations and good predictive power of these metrics, they do not distinguish between interfaces, abstract classes and concrete classes. In particular, interfaces declare contracts that are meant to remain stable during the evolution of a software system, while the implementation in concrete classes is more likely to change. This study investigates to what extent existing source code metrics can be used to predict change-prone Java interfaces. The correlation between metrics and the number of fine-grained source code changes has been investigated in the interfaces of ten Java open-source systems. Then, the metrics have been evaluated to calculate models for predicting change-prone Java interfaces. The results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes. This metric also improves the performance of prediction models that classify Java interfaces into change-prone and not change-prone.

Using Vector Clocks to Monitor Dependencies among Services at Runtime

Speaker: Daniele Romano

Service-Oriented Architecture (SOA) enables organizations to react to requirement changes in an agile manner and to foster the reuse of existing services. However, the dynamic nature of service-oriented systems and their agility bear the challenge of properly understanding such systems. In particular, understanding the dependencies among services is a non-trivial task, especially if service-oriented systems are distributed over several hosts and/or use different SOA technologies. In this study, we propose an approach to monitor dynamic dependencies among services. The approach is based on vector clocks, originally conceived and used to order events in a distributed environment. We use vector clocks to order service executions and to infer causal dependencies among services. In our future work we plan to use this information to study change and failure impact analysis in service-oriented systems.
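The vector-clock mechanism the approach builds on can be sketched in a few lines (the service names are hypothetical): each service keeps a counter per known service, increments its own entry on a local execution, and merges clocks when it receives a message; comparing clocks then reveals causal dependencies:

```python
def tick(clock, service):
    """Advance a service's own entry on a local event."""
    c = dict(clock)
    c[service] = c.get(service, 0) + 1
    return c

def merge(local, received, service):
    """On message receipt: element-wise max of both clocks, then tick."""
    c = {s: max(local.get(s, 0), received.get(s, 0))
         for s in set(local) | set(received)}
    return tick(c, service)

def happened_before(a, b):
    """True if clock a causally precedes clock b."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys) and
            any(a.get(k, 0) < b.get(k, 0) for k in keys))

# Service A executes, then calls B; B's execution depends causally on A's.
a1 = tick({}, "A")        # A's clock after its execution
b1 = merge({}, a1, "B")   # B's clock after receiving A's call
```

If neither clock precedes the other, the two executions are concurrent, so no dependency edge is inferred; this is exactly what distinguishes vector clocks from simple timestamps.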

Understanding Service-Oriented Systems Using Dynamic Analysis

Speaker: Tiago Espinha


A Framework-based Runtime Monitoring Approach for Service-Oriented Software Systems

Speaker: Cuiting Chen

The highly dynamic and loosely coupled nature of a service-oriented software system makes it challenging to understand. In order to obtain insight into the runtime topology of a SOA system, we propose a framework-based runtime monitoring approach that traces the service interactions during execution. The approach can be transparently applied to all web services built on the framework, and it reuses parts of the information and functionality already available in the framework to achieve our goals.

Grammar Comparison Techniques

Speaker: Vadim Zaytsev

The need to compare languages is commonly encountered when developing grammarware (such as a parser or a manual), validating and fixing it, or assessing its compatibility. When making statements in natural language, it is very easy and commonly desirable to claim one language to be equal or equivalent to another, or to be a subset or a superset. In formal practice, however, these questions are undecidable, so a smart workaround is usually needed. Three different approaches from recent research will be covered: grammar convergence, a lightweight verification method for establishing and maintaining the correspondence between grammar knowledge ingrained in software artefacts; grammar-based testing, which systematically generates test data and feeds it into parsers in a differential way; and test-based nonterminal matching, which can go beyond nominal matching of syntactic categories by applying grammar-based testing on a per-nonterminal basis.
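The grammar-based differential testing idea can be sketched with a toy grammar and two hand-written recognizers, one deliberately deviant; everything below is invented for illustration:

```python
import random

# Toy grammar for the language a^n "ab" b^n.
GRAMMAR = {
    "S": [["a", "S", "b"], ["ab"]],
}

def generate(symbol="S", depth=0):
    """Randomly expand a nonterminal; depth bound keeps sentences finite."""
    if symbol not in GRAMMAR:
        return symbol
    rules = GRAMMAR[symbol]
    rule = rules[1] if depth > 3 else random.choice(rules)
    return "".join(generate(s, depth + 1) for s in rule)

def parser_ref(s):
    """Reference recognizer: peel matched a/b pairs down to 'ab'."""
    while s.startswith("a") and s.endswith("b") and s != "ab":
        s = s[1:-1]
    return s == "ab"

def parser_limited(s):
    """Deviant implementation: only handles shallow nesting."""
    return s in ("ab", "aabb")

random.seed(0)
sentences = {generate() for _ in range(50)}
# Sentences where the two recognizers disagree expose the deviation.
disagreements = sorted(s for s in sentences if parser_ref(s) != parser_limited(s))
```

Every generated sentence is valid by construction, so any rejection by an implementation under test is a finding; real grammar-based testing adds systematic (rather than random) coverage of the grammar's rules.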

Reconstructing Complex Metamodel Evolution

Speaker: Sander Vermolen

Metamodel evolution requires model migration. To correctly migrate models, evolution needs to be made explicit. Manually describing evolution is error-prone and redundant. Metamodel matching offers a solution by automatically detecting evolution, but is only capable of detecting primitive evolution steps. In practice, primitive evolution steps are jointly applied to form a complex evolution step, which has the same effect on a metamodel as the sum of its parts, yet generally has a different effect in migration. Detection of complex evolution is therefore needed. In this paper we present an approach to reconstruct complex evolution between two metamodel versions, using a matching result as input. It supports operator dependencies and mixed, overlapping and incorrectly ordered complex operator components. It also supports interference between operators, where the effect of one operator is partially or completely hidden from the target metamodel by other operators.

Using Pattern Recognition Techniques for Server Overload Detection

Speaker: Cor-Paul Bezemer

One of the key factors in customer satisfaction is application performance. To be able to guarantee good performance, it is necessary to take appropriate measures before a server overload occurs. While in small systems it is usually possible to predict server overload through the judgment of a human expert, an automated overload prediction mechanism is important for ultra-large-scale systems, such as multi-tenant Software-as-a-Service (SaaS) systems. An automated prediction mechanism would be an initial step towards self-adaptiveness of such systems, a property which leads to less human intervention during maintenance, resulting in fewer errors and better quality of service. In order to provide such a prediction mechanism, it is important to have a solid overload detection approach, which is (1) a first step towards automated prediction and (2) necessary for automated testing of a prediction mechanism. In this paper we propose a number of steps which help with the design and optimization of a statistical pattern classifier for server overload detection. Our approach is empirically evaluated on a synthetic dataset.

Software Engineering and Sensor Networks

Speaker: Matthias Woehrle

Wireless sensor networks (WSNs) are a new class of computation systems that enable seamless integration of the physical with the digital world. Developing applications of sensor network technology is challenged by various intricacies such as unpredictable environmental influences, unreliable communication between sensor nodes and severe resource constraints (processing, memory, bandwidth and energy). The development of sensor network software is still rather ad hoc, lacking suitable methodologies, techniques, and abstractions. These are topics where the software engineering community can help sensor network research in facilitating the development of novel, reliable and dependable WSN applications. This talk will highlight some of the software engineering challenges in WSNs and present a subjective selection of examples of software engineering research on WSNs.

A Self-Adaptive Deployment Framework for Service-Oriented Systems

Speaker: Sander van der Burg

Deploying components of a service-oriented system in a network of machines is often a complex and laborious process. Usually the environment in which such systems are deployed is dynamic: any machine in the network may crash, network links may temporarily fail, and so on. Such events may render the system partially or completely unusable. If an event occurs, it is difficult and expensive to redeploy the system to take the new circumstances into account.

In this paper we present a self-adaptive deployment framework built on top of Disnix, a model-driven distributed deployment tool for service-oriented systems. This framework dynamically discovers machines in the network and generates a mapping of components to machines based on non-functional properties. Disnix is then invoked to automatically, reliably and efficiently redeploy the system.

Collective Code Bookmarks for Program Comprehension

Speaker: Anja Guzzi

The program comprehension research community has been developing useful tools and techniques to support developers in the time-consuming activity of understanding software artifacts. However, the majority of the tools do not bring collective benefit to the team: After gaining the necessary understanding of an artifact (e.g., using a technique based on visualization, feature localization, architecture reconstruction, etc.), developers seldom document what they have learned, thus not sharing their knowledge. We argue that code bookmarking can be effectively used to document a developer’s findings, to retrieve this valuable knowledge later on, and to share the findings with other team members. We present a tool, called POLLICINO, for collective code bookmarking. To gather requirements for our bookmarking tool, we conducted an online survey and interviewed professional software engineers about their current usage and needs of code bookmarks. We describe our approach and the tool we implemented. To assess the tool’s effectiveness, adequacy, and usability, we present an exploratory pre-experimental user study we have performed with 11 participants.

Finding Software License Violations Through Binary Code Clone Detection

Speaker: Eelco Dolstra

Software released in binary form frequently uses third-party packages without respecting their licensing terms. For instance, many consumer devices have firmware containing the Linux kernel, without the suppliers following the requirements of the GNU General Public License. Such license violations are often accidental, e.g., when vendors receive binary code from their suppliers with no indication of its provenance. To help find such violations, we have developed the Binary Analysis Tool (BAT), a system for code clone detection in binaries. Given a binary, such as a firmware image, it attempts to detect cloning of code from repositories of packages in source and binary form. We evaluate and compare the effectiveness of three of BAT’s clone detection techniques: scanning for string literals, detecting similarity through data compression, and detecting similarity by computing binary deltas.
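
The compression-based technique mentioned above is, in essence, a normalized compression distance (NCD). A minimal sketch (using zlib as a stand-in compressor; whether BAT uses zlib specifically is an assumption on my part):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for near-identical inputs,
    approaching 1 for unrelated ones. Uses compressed sizes as a
    computable proxy for Kolmogorov complexity."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A firmware image that embeds a lightly modified copy of a known package compresses well together with that package, yielding a low NCD, whereas unrelated binaries do not.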

Building a Computing System for the World’s Information

Speaker: JC van Winkel, Site Reliability Engineer, Google Zurich

Thursday April 14, 14:00, Room HB09.130, EWI, TU Delft, Mekelweg 4, Delft

Google's mission (to organize the world’s information and make it universally accessible and useful) in today's world requires quite some infrastructure. Not only in hardware, but also in software. In this talk we will take a look at some of this infrastructure and some of the projects making use of it. Of course, achieving Google's goal and overcoming the challenges needed to perform at Google scale is only possible with people. Therefore we will also look at the culture within Google and the philosophy around which all engineering is done.

Bio: JC van Winkel has a B.S. and an M.S. in Computer Science (the M.S. from the Vrije Universiteit Amsterdam). From 1990 to 2010 he worked at AT Computing, a small courseware and consulting firm in Nijmegen, the Netherlands. There he taught UNIX and UNIX-related subjects, such as C++. Since November 2010 he has been working as a software engineer at Google Switzerland in Zurich.

Software Bertillonage: Finding the Provenance of an Entity

Speaker: Daniel M. German, Dept of Computer Science, University of Victoria

Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components --- such as external libraries or cloned source code --- is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets.

In this presentation, I will motivate the need for the recovery of the provenance of software entities using metrics that can be computed easily but are effective at reducing the search space, in a manner similar to that of Bertillonage. This was a simple and approximate forensic analysis technique based on biometrics that was developed in 19th-century France before the advent of fingerprinting.

As an example, we have developed a fast, simple, and approximate technique called Anchored Signature Matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective.

I will also describe how software Bertillonage is one of the first steps towards solving the problem of license compliance.

A co-Relational Model of Data for Large Shared Data Banks

Speaker: Erik Meijer, Microsoft

Wednesday April 13, 11:00-12:00, Lecture Room C, EWI, TU Delft, Mekelweg 4, Delft

One of the most hotly debated topics in the world of Cloud-scale data processing today is noSQL versus SQL. Because noSQL lacks a mathematical underpinning, the discussion around each model's strengths and weaknesses often relies on rhetoric and anecdotal evidence instead of sound technical reasoning. This makes it hard for enterprises and practitioners to make rational decisions about building (new) projects using either SQL or noSQL solutions.

In this talk we present categorical models for noSQL key-value stores and for SQL's foreign/primary-key data stores, and show that SQL and noSQL are in fact mathematical duals; in other words noSQL is really coSQL. We also illustrate how another concept from category theory, namely monads, can be seen as a generalization of the relational algebra and thus provides a uniform algebra for expressing queries over both SQL and noSQL stores.
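
To make the monad connection concrete: in the list monad, a foreign-key join and a key-value traversal have the same nested-bind shape. A small Python sketch (my own illustration, not code from the paper):

```python
def bind(xs, f):
    """Monadic bind for the list monad: flatten(map(f, xs))."""
    return [y for x in xs for y in f(x)]

# "SQL" style: flat tables, rows related by foreign keys.
authors = [{"id": 1, "name": "Codd"}]
books = [{"author_id": 1, "title": "A Relational Model"}]

sql_titles = bind(authors, lambda a:
             bind([b for b in books if b["author_id"] == a["id"]], lambda b:
             [(a["name"], b["title"])]))

# "coSQL" (key-value) style: the relationship is inverted -- each object
# points at its children directly, so no key comparison is needed.
nosql_authors = [{"name": "Codd",
                  "books": [{"title": "A Relational Model"}]}]

nosql_titles = bind(nosql_authors, lambda a:
               bind(a["books"], lambda b:
               [(a["name"], b["title"])]))
```

The query shape is identical in both cases; only the direction of the relationship changes (child points to parent via a key versus parent points to children directly), which is the duality the talk develops.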

Just as Codd's discovery of the relational algebra as a formal basis for SQL propelled a billion dollar industry around foreign/primary-key stores, we believe that the formalization of coSQL using category theory will allow the same to happen for coSQL key-value stores.

Full paper: Erik Meijer and Gavin Bierman. A Co-Relational Model of Data for Large Shared Data Banks. Communications of the ACM Vol. 54 No. 4, Pages 49-58, 2011 (fulltext).

Bio: Erik Meijer is head of the Cloud Programmability team at Microsoft, Redmond, USA. He is the (co-)creator of LINQ, Volta, and the Reactive programming framework Rx (Reactive Extensions) for .NET. He has been involved in over 150 software patent applications. In 2009, he was the recipient of the Microsoft Outstanding Technical Leadership Award.

Before joining Microsoft more than 10 years ago, he was associate professor at Utrecht University, where he worked on functional programming, in particular on the programming language Haskell.

Empirical Software Engineering for Agent Programming

Speaker: Birna van Riemsdijk

A variety of programming languages and frameworks for developing agents have been developed by now. These languages are moving more and more from the realm of theory and toy examples to being applied in challenging environments. However, very few systematic studies have been done on how the language constructs in these languages may be, and are in fact, used in practice. In this talk I will present recent empirical research on the use of the agent programming language GOAL, which is being developed in the MMI group at TU Delft. In particular, we have studied GOAL programs that were developed for the first-person shooter game Unreal Tournament 2004 by students of the first-year BSc course on multi-agent systems. This research aims to form the basis for the development of programming guidelines and language improvements for GOAL and agent programming languages in general.

Building DSLs for Algorithmic Currency Trading

Speaker: Karl Trygve Kalleberg

Writing low-latency, high-frequency trading systems that are reliable and predictable currently requires both an in-depth understanding of the trading domain and solid skills in engineering asynchronous systems. The domain expert and the systems engineer are rarely the same person (but in those few exceptional cases, said person is paid obscenely well). We propose to mend the highly problematic divide between trader and programmer by offering technically minded traders a domain-specific language for writing trading strategies, along with a web platform for end-to-end development, testing and deployment of these strategies.

The talk will include a short demonstration of KolibriFX, our web-based trading development platform, and it is my intent and hope that the presentation will spark debate on approaches for gracefully exposing non-programmers to the wonderful world of asynchronous programming by way of DSLs.

A Pragmatic Perspective on Software Visualization

Speaker: Arie van Deursen

For software visualization researchers taking the pragmatic philosophical stance, the ultimate measure of success is adoption in industry. For you as a researcher, what can be more satisfying than enthusiastic developers working better and more efficiently thanks to your beautiful visualization of their software?

One of the aims of this talk is to reflect on the factors affecting the practical impact of software visualization research. How much does rigorous empirical evaluation matter? What is the role of foundational research that does not subscribe to the philosophy of pragmatism? Can we make meaningful predictions of adoption in practice if adoption takes 10 years or more?

During the talk, I will illustrate the dilemmas, opportunities, and frustrations involved in trying to achieve practical impact with examples drawn from my own research in such areas as software architecture analysis, documentation generation, and Web 2.0 user interface reverse engineering. I will also shed light on some of my most recent research activities, which include work in the area of spreadsheet comprehension. This is research that we conduct with a major Dutch financial asset management firm. Our work consists of the identification of information needs of professional spreadsheet users, a visualization to address these needs, and an evaluation of this visualization with practitioners conducting real-life spreadsheet tasks.

Throughout the talk, I will encourage the audience to engage in the discussion and contribute their own perspectives on the issues that I raise.

On Technical Debt

Speaker: Philippe Kruchten

The technical debt metaphor is gaining significant traction in the software development community as a way to understand and communicate issues of intrinsic quality, value, and cost. The idea is that developers sometimes accept compromises in a system in one dimension (e.g., modularity) to meet an urgent demand in some other dimension (e.g., a deadline), and that such compromises incur a “debt” on which “interest” has to be paid and which should be repaid at some point for the long-term health of the project. Little is known about technical debt beyond a wide range of feelings and opinions.

Software developers and corporate managers frequently disagree about important decisions regarding how to invest scarce resources in development projects, especially in internal quality aspects that are crucial to system sustainability, but that are largely invisible to management and customers, and that do not generate short-term revenue. Among these properties are code and design quality and documentation. Engineers and developers often advocate for such investments, but executives question their value and frequently decline to approve them, to the long-term detriment of software projects. The situation is exacerbated in projects that must balance short deadlines with long-term sustainability.

There is a key difference between debt that results from employing bad engineering practices and debt that is incurred through intentional decision-making in pursuit of a strategic goal. While the metaphor is appealing, theoretical foundations for identifying and managing technical debt are lacking. In addition, while the term was originally coined in reference to coding practices, today the metaphor is applied more broadly across the project lifecycle and may include practices of refactoring, test-driven development, iteration management and software craftsmanship.

The concept of technical debt could provide a basis on which the various parties could reason about the best course of action for the evolution of a software product. In this brief presentation I will present the metaphor and its limitations, give examples of various types of technical debt, and discuss how it can be estimated, measured and perhaps tackled. I will also discuss a possible research agenda on technical debt.

Understanding Plug-in Test Suites from an Extensibility Perspective

Speaker: Michaela Greiler

Plug-in architectures enable developers to build extensible software products. Such products are assembled from plug-ins, and their functionality can be enriched by adding or configuring plug-ins. The plug-ins themselves may in turn consist of multiple plug-ins, and offer dedicated points through which their functionality can be influenced. A well-known example of such an architecture is Eclipse, best known for its use to create a series of extensible IDEs. In order to test systems built from plug-ins, developers use extensive automated test suites. Unfortunately, current testing tools offer little insight into which of the many possible combinations of plug-ins and plug-in configurations are actually tested. To remedy this problem, we propose three architectural views that provide an extensibility perspective on plug-in-based systems and their test suites. The views combine static and dynamic information on plug-in dependencies, extension initialization, and extension usage. The views are implemented in ETSE, the Eclipse Plug-in Test Suite Exploration tool. We evaluate the proposed views by analyzing eGit and Mylyn, two open source Eclipse plug-ins.

Replaying Past Changes on Multi-developer Projects

Speaker: Lile Hattori

“What was I working on before the weekend?” and “What were the members of my team working on during the last week?” are questions frequently asked by developers. They can be answered if one keeps track of who changes what in the source code. In this work, we present Replay, a tool that allows one to replay past changes as they happened at a fine-grained level, where a developer can watch what she has done or understand what her colleagues have done in past development sessions. With this tool, developers are able not only to understand what sequence of changes brought the system to a certain state (e.g., the introduction of a defect), but also to deduce the reasons why their colleagues performed those changes. One application of such a tool is discovering the changes that broke another developer's code.

Automating System Tests Using Declarative Virtual Machines

Speaker: Eelco Dolstra

Automated regression test suites are an essential software engineering practice: they provide developers with rapid feedback on the impact of changes to a system's source code. The inclusion of a test case in an automated test suite requires that the system's build process can automatically provide all the environmental dependencies of the test. These are external elements necessary for a test to succeed, such as shared libraries, running programs, and so on. For some tests (e.g., a compiler's), these requirements are simple to meet.

However, many kinds of tests, especially at the integration or system level, have complex dependencies that are hard to provide automatically, such as running database servers, administrative privileges, services on external machines or specific network topologies. As such dependencies make tests difficult to script, they are often only performed manually, if at all. This particularly affects testing of distributed systems and system-level software.

This paper shows how we can automatically instantiate the complex environments necessary for tests by creating (networks of) virtual machines on the fly from declarative specifications. Building on NixOS, a Linux distribution with a declarative configuration model, these specifications concisely model the required environmental dependencies. We also describe techniques that allow efficient instantiation of VMs. As a result, complex system tests become as easy to specify and execute as unit tests. We evaluate our approach using a number of representative problems, including automated regression testing of a Linux distribution.
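
As a hedged illustration of what such a declarative specification might look like (the option names and test-script calls below are my assumptions, written in the style of NixOS test specifications of that period, and are not taken from the paper):

```nix
{
  nodes = {
    server = { config, pkgs, ... }: {
      # Environmental dependency: declared, not scripted.
      services.openssh.enable = true;
    };
    client = { config, pkgs, ... }: { };
  };
  # Imperative test script run against the instantiated VMs.
  testScript = ''
    startAll;
    $server->waitForUnit("sshd");
    $client->succeed("ping -c 1 server");
  '';
}
```

From such a specification, the framework can build and boot the network of virtual machines on the fly and execute the test script against them, making the system test as repeatable as a unit test.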

A Metric for Assessing Component Balance of Software Architectures

Speaker: Eric Bouwers

The decomposition of a software system into components is a major decision in a software architecture, having a strong influence on many of its quality aspects. A system’s analyzability, in particular, is influenced by its decomposition into components. But into how many components should a system be decomposed? And how should the elements of the system be distributed over those components?

In this paper, we set out to find an answer to these questions by capturing them jointly inside a metric called Component Balance. We calibrate this generic metric with the help of a repository of industrial and open source systems. We report on an empirical study that demonstrates that the metric is strongly correlated with ratings given by experts. In a case study we show that the metric provides relevant results in various evaluation scenarios.
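
As an illustrative approximation only (the paper's actual definition and calibration of Component Balance differ; the "ideal" count, the linear penalty, and the use of a Gini coefficient here are my assumptions), one could combine a "right number of components" term with a "sizes evenly distributed" term:

```python
def gini(sizes):
    """Gini coefficient of a size distribution: 0 means perfectly even."""
    xs = sorted(sizes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def component_balance(sizes, ideal=7):
    """Toy metric: penalize deviating from an 'ideal' component count and
    penalize uneven component sizes. Both terms lie in [0, 1]."""
    n = len(sizes)
    count_term = max(0.0, 1.0 - abs(n - ideal) / ideal)  # assumed linear penalty
    uniformity_term = 1.0 - gini(sizes)
    return count_term * uniformity_term
```

Under this sketch, a system split into seven equally sized components scores 1.0, while a lopsided two-component decomposition scores far lower, capturing the intuition behind the questions above.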

Supporting Professional Spreadsheet Users by Generating Leveled Dataflow Diagrams

Speaker: Felienne Hermans

Thanks to their flexibility and intuitive programming model, spreadsheets are widely used in industry, often for business-critical applications. Similar to software developers, professional spreadsheet users demand support for maintaining and transferring their spreadsheets.

We first studied the problems and information needs of professional spreadsheet users by means of a survey conducted at a large financial company. Based on these needs, we then developed an approach that extracts this information from spreadsheets and presents it in a compact and easy-to-understand way using leveled dataflow diagrams. Our approach comes with three different views on the dataflow and allows the user to analyze the information in a top-down fashion, aided by slicing techniques.

To evaluate the usefulness of the proposed approach, we conducted a series of interviews as well as nine case studies in an industrial setting. The results of the evaluation clearly indicate the demand for, and usefulness of, our approach in easing the understanding of spreadsheets.

Extending Code Generators by Transforming Generated Code

Speaker: Rob Economopoulos

Code generated from high-level specifications often requires modification before deployment, but no approach exists that allows an application programmer to make these modifications in a safe, reliable, and systematic way. Hence, incomplete or incorrect generators cannot be changed easily, domain evolution is not supported, and experimenting with new features is not possible. In this talk, I'll show how to modify and extend code generators indirectly, by automatically transforming the generated code. Two related techniques to enable automatic code customizations are explored: i) syntactic customization patches, whose implementation required Stratego to be extended to allow the embedding of concrete syntax in concrete syntax, and ii) semantic join points, implemented as Java annotations and woven into the generated code using AspectJ. Both approaches are based on ideas from aspect-oriented programming, but have opposite characteristics. Customization patches do not require changes to the underlying code generator, but can suffer from the fragile pointcut problem, while semantic join points are stable but require support from the generator. I will describe and evaluate the application of these techniques in the customization of WebDSL, a domain-specific language for modeling dynamic, data-rich web applications.

Combining Micro-Blogging and IDE Interactions to Support Developers in their Quests

Speaker: Anja Guzzi

Software engineers spend a considerable amount of time on program comprehension. Although vendors of Integrated Development Environments (IDEs) and analysis tools address this challenge, current support for reusing and sharing program comprehension knowledge is limited. As a consequence, developers have to go through the time-consuming program understanding phase multiple times, instead of recalling knowledge from their own past or others' program comprehension activities. In this paper, we present an approach to making the knowledge gained during the program comprehension process accessible, by combining micro-blog messages with interaction data automatically collected from the IDE. We implemented the approach in an Eclipse plugin called James and performed a first evaluation of the effectiveness of the underlying approach, assessing the nature and usefulness of the collected messages, as well as the added benefit of combining them with interaction data.

Enabling Multi-Tenancy: An Industrial Experience Report

Speaker: Cor-Paul Bezemer

Multi-tenancy is a relatively new software architecture principle in the realm of the Software as a Service (SaaS) business model. It makes it possible to exploit economies of scale fully, as multiple customers – “tenants” – share the same application and database instance. All the while, the tenants enjoy a highly configurable application, making it appear as if the application were deployed on a dedicated server. The major benefits of multi-tenancy are increased utilization of hardware resources and improved ease of maintenance, resulting in lower overall application costs, making the technology attractive for service providers targeting small and medium enterprises (SMEs). Migrating existing single-tenant applications to multi-tenant ones can therefore be interesting for SaaS software companies. In this paper we report on our experiences with reengineering an existing industrial, single-tenant software system into a multi-tenant one using a lightweight reengineering approach.

Automated Deployment of a Heterogeneous Service-Oriented System

Speaker: Sander van der Burg

Deployment of a service-oriented system in a network of machines is often complex and laborious. In many cases components implementing a service have to be built from source code for the right target platform, transferred to the right machines with the right capabilities, and activated in the right order. Upgrading a running system is even more difficult, as this may break the running system and cannot be performed atomically. Many approaches that deal with the complexity of a distributed deployment process only support certain types of components or specific environments, while general solutions lack certain desirable non-functional properties, such as atomic upgrading. This paper presents Disnix, a deployment tool which allows developers and administrators to reliably deploy, upgrade and roll back a service-oriented system consisting of various types of components in a heterogeneous environment from declarative specifications.

Topic Clouds Extraction from End-user Documents

Speaker: Arsen Storojev

Word documents and spreadsheets are widely used in organizations all over the world. Domain information contained in these documents could be of great value to requirements engineers when collecting requirements for new software systems at those organizations. Visualizing the domain context is one possible way to communicate it to requirements engineers. The master's project of Arseni Storojev (supervisor: Felienne Hermans) explores the possibility of creating “Topic Clouds” – a visual, tag-cloud-like representation of the important topics extracted from end-user documents. Latent Dirichlet Allocation and an extended graph-based keyphrase extraction algorithm, TextRank, are selected for this task. The topic clouds generated by the algorithms are evaluated through experiments with domain experts.
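
For flavor, a bare-bones TextRank-style keyword extractor (a simplification written purely for illustration; the project uses an extended, more elaborate variant) runs PageRank over a word co-occurrence graph:

```python
from collections import defaultdict

def textrank_keywords(text, window=2, top_n=3, damping=0.85, iters=50):
    """Rank words by PageRank over a co-occurrence graph; return the top_n."""
    stop = {"the", "a", "of", "and", "in", "for", "to", "is", "are", "on"}
    words = [w.strip(".,").lower() for w in text.split()]
    words = [w for w in words if w and w not in stop]

    # Build an undirected co-occurrence graph over a sliding window.
    nbrs = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                nbrs[w].add(words[j])
                nbrs[words[j]].add(w)

    # Plain PageRank iteration on the graph.
    score = {w: 1.0 for w in nbrs}
    for _ in range(iters):
        score = {w: (1 - damping) + damping *
                    sum(score[u] / len(nbrs[u]) for u in nbrs[w])
                 for w in nbrs}
    return [w for w, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_n]]
```

Words that co-occur with many distinct, well-connected words accumulate the highest scores and end up in the topic cloud.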

Testing in the Google environment

Speaker: James A. Whittaker

Google releases software many times every day. Ever wonder what it takes to test in such an environment? James Whittaker talks about test methodology, tools and innovation surrounding the discipline of quality assurance at Google where testers are far outnumbered by developers. Specifically he will present how the webapp-chrome-chromium stack is tested to ensure that Google apps work well on Chrome browser and Chromium operating system. During the talk he presents how Google treats testing activity much like a hospital triages emergency room patients and how game playing metaphors have inspired the development of next generation test automation tools.

Speaker Bio: Dr. Whittaker is currently the Engineering Director over engineering tools and testing for Google's Seattle and Kirkland offices. He holds a PhD in computer science from the University of Tennessee and is the author or coauthor of four acclaimed textbooks: How to Break Software, How to Break Software Security (with Hugh Thompson), and How to Break Web Software (with Mike Andrews); his latest is Exploratory Software Testing: Tips, Tricks, Tours and Techniques to Guide Test Design. He has authored over fifty peer-reviewed papers on software development and computer security. He holds patents on various inventions in software testing and defensive security applications and attracted millions in funding, sponsorship, and license agreements while a professor at Florida Tech. He has also served as a testing and security consultant for dozens of companies and spent 3 years as an architect at Microsoft.

Requirements for collaborative software in global software engineering

Speaker: Martijn Reijerse

In global software engineering (GSE), coordinating a software development project can be difficult. Software tools can support coordination and collaboration among developers. Given the large differences between these tools and the scarcity of studies that treat multiple requirements, a study identifying the requirements for such tools is desirable. The research question we aim to answer is: 'Which requirements should be fulfilled by collaborative software supporting synchronous collaboration?' To answer this question, this study critically selects a range of studies, presents an analysis of them, and derives a list of requirements that covers all requirements presented in the selected studies. Subsequently, the requirements are validated against practical experiences, and rejected requirements are removed, in order to guarantee the validity of the study. Furthermore, the contribution of this study to GSE is investigated by measuring to what extent the requirements are already implemented in currently used collaborative software. The outcome is that not all validated requirements are implemented. As a result, this study can contribute to the development and selection of collaborative software, because its results are based on existing case studies and its data is validated with practical experiences.

Reducing Maintenance Effort through Generic Recording of Deployed Software Operation: A Field Study

Speaker: Henk van der Schuur

Knowledge of in-the-field software operation is typically acquired in an unsophisticated way: acquisition processes are implemented ad hoc and application-specifically, and are only triggered when end-users experience severe software operation failures. Vendors that do structurally acquire such knowledge are often unsuccessful in visualizing it effectively. The lack of a generic approach for acquiring software operation knowledge and the absence of visualization tools are only two causes of an explosion in maintenance costs; after all, maintenance effort is directly proportional to the time needed to identify software operation failures. We propose a technique for software operation knowledge acquisition and presentation based on generic recording and visualization of deployed software operation. A prototype tool that implements this technique is presented, as well as an empirical study that evaluates this tool on three widely-used software packages. Results show that the technique is effective and can be expected to reduce software maintenance effort and increase end-users' comprehension of software operation.

Bio: Henk is a PhD student in the field of software engineering and feedback. His research is directed at revealing the potential role of software operation knowledge (SOK; knowledge of software operating in the field) within the organizations of software vendors, and at identifying mechanisms for the identification, acquisition, integration, presentation and utilization of SOK in these organizations.

Do we test to find failures or faults? A Diagnostic Approach to Test Prioritization

Speaker: Alberto Gonzalez

Testing typically triggers fault diagnosis to localize the detected failures. However, current test prioritization algorithms are tuned for failure detection rate rather than providing diagnostic information. Consequently, unnecessary effort might be spent to localize the faults. We present a dynamic test prioritization algorithm that trades fault detection rate for diagnostic performance, minimizing overall testing and diagnosis cost. The algorithm exploits pass/fail information from each test to select the next test, optimizing the diagnostic information produced per test.
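
The selection idea can be sketched as follows. This is our own minimal illustration, not the algorithm from the talk: it assumes a known coverage matrix and deterministic pass/fail outcomes, and greedily picks the unexecuted test that best splits the set of still-suspect components.

```python
# Hypothetical sketch: diagnosis-aware test selection on a coverage matrix.
# All names and the greedy splitting rule are illustrative assumptions.

def next_test(coverage, candidates, executed):
    """Pick the unexecuted test whose covered set best splits `candidates`."""
    best, best_balance = None, -1
    for test, comps in coverage.items():
        if test in executed:
            continue
        inside = len(comps & candidates)
        # a test covering about half the candidates halves the ambiguity
        balance = min(inside, len(candidates) - inside)
        if balance > best_balance:
            best, best_balance = test, balance
    return best

def diagnose(coverage, faulty):
    """Simulate testing until one suspect remains (`faulty` drives outcomes)."""
    candidates = set().union(*coverage.values())
    executed = []
    while len(candidates) > 1:
        test = next_test(coverage, candidates, set(executed))
        if test is None:
            break
        executed.append(test)
        if faulty in coverage[test]:   # test fails: suspects are its covered set
            candidates &= coverage[test]
        else:                          # test passes: its covered set is exonerated
            candidates -= coverage[test]
    return executed, candidates
```

With four tests over components a-d, two well-chosen tests suffice to isolate a single suspect, which is the sense in which each test is selected for its diagnostic information rather than its detection power.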

Supporting Collaboration Awareness with Real-time Visualization of Development Activity

Speaker: Anja Guzzi

In the context of multi-developer projects, where several people are contributing code, developers must deal with concurrent development. Collaboration among developers assumes a fundamental role, and failing to address it can result, for example, in shipping delays. We argue that tool support for collaborative software development augments the level of awareness of developers and, consequently, helps them collaborate and coordinate their activities. In this context, we present an approach to augment awareness by recovering development information in real time and broadcasting it to developers in the form of three lightweight visualizations. Scamp, the Eclipse plug-in supporting this, is part of our Syde tool to support collaboration. We illustrate the usage of Scamp in the context of two multi-developer projects.

Supporting Collaboration with Synchronous Changes

Speaker: Lile Hattori (University of Lugano, Switzerland)

In a multi-developer project, team collaboration is essential for the success of the project. When team members are spread across different locations, individual awareness of the activity of others is compromised. Consequently, collaboration becomes a challenge. Studies have shown that some of the problems that arise due to low levels of awareness are the decline of the willingness of developers to help others and of the ability to spot specialists. Concomitantly, studies have strongly indicated the need for tools and techniques to increase team awareness and support collaboration.

In this talk we present Syde, a tool that promotes collaboration by enriching the Eclipse Integrated Development Environment (IDE) with awareness information. As opposed to the state-based approaches proposed so far, Syde uses an operation-based technique to continuously track source code changes at a fine-grained level. This allows Syde to have changes as first-class entities instead of recovering them through differencing algorithms, as state-based approaches do. We use the precise information about ongoing changes to build views and visual cues within the Eclipse IDE. The goal is to inform developers about changes that are of their interest, potential conflicts, and code ownership.

Supporting Developers with Natural Language Queries

Speaker: Harald Gall (University of Zurich, Switzerland)

The feature list of modern IDEs is steadily growing and mastering these tools becomes more and more demanding, especially for novice programmers. IDEs often cannot directly answer the questions that arise during program comprehension tasks. Developers have to map their questions to multiple queries that can be answered only by combining several tools and examining the output of each of them manually. Existing approaches are limited to a set of predefined, hardcoded questions, or require the user to learn a specific query language. We present a framework to query for information about a software system using guided-input natural language resembling plain English. We model data extracted by classical software analysis tools with an OWL ontology and use Semantic Web knowledge processing technologies to query it. In a case study we demonstrate how our framework can be used to answer queries about static source code information for program comprehension purposes.

The Missing Link: Unifying Remote Data and Services

Speaker: William Cook (University of Texas, Austin)

Most large-scale applications integrate remote services and/or transactional databases. Yet building software that efficiently invokes distributed services and accesses relational databases is still quite difficult. Existing approaches to these problems are based on the Remote Procedure Call (RPC) and Object-Relational Mapping (ORM). RPCs have been generalized to distributed object systems with remote proxies, a kind of remote object reference. ORM tools generally support a form of query sublanguage for efficient object selection. The last 20 years have produced a long litany of technologies based on these concepts, including ODBC, CORBA, DCE, DCOM, RMI, DAO, OLEDB, SQLJ, JDBC, EJB, JDO, Hibernate, XML-RPC, Web Services and LINQ. Even with these technologies, complex design patterns for service facades and/or bulk data transfers must be followed to optimize communication between client and server or client and database, leading to programs that are difficult to modify and maintain. While significant progress has been made, there is no widely accepted solution or even agreement about what the solution should look like.

In this talk I present a new unified approach to invocation of distributed services and data access. The solution involves a novel control flow construct that partitions a program block into remote and local computations, while efficiently managing the communication between them. The solution does not require proxies or an embedded query language. Although the result itself is elegant and useful, what is more significant is the realization that the original problems cannot be solved using existing programming language constructs and libraries. This work calls into question our assumption that general-purpose programming languages are truly general-purpose.

ASPIC: Awareness-based Support Project for Interpersonal Collaboration

Speakers: Kevin Dullemond and Ben van Gameren

On 1 October 2009 we started our PhD track. We will research what role awareness plays in collaborative software development, especially when the development team is distributed over multiple locations, and how this can be supported with technology. In this talk we will introduce this line of research and provide an example of a tool we developed which supports awareness of ongoing conversations in a distributed setting with reduced effort (OCL - Open Conversation List).

Detecting (r)evolution events from software product measurement streams

Speaker: Jose Pedro Correia (Software Improvement Group, Amsterdam)

Collecting product metrics during the development or maintenance process is an increasingly common practice to monitor and thus have more control over both the progress and the quality of a software product. An important challenge remains in interpreting the data as it is being collected and transforming it into actionable information. We present an approach for discovering significant events in the production process from the associated stream of measurement data. At the heart of our approach lies the view of measurement data streams as functions for which derivatives can be calculated. A data point is then selected or not as an event based on those values and the history of activity until that point. The technique described has been used to put in place an 'alert service' for 54 systems monitored by the Software Improvement Group.
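
The derivative view can be sketched in a few lines. This is our own simplification, not the SIG alert service: a point is flagged as an event when its absolute change exceeds a multiple of the mean absolute change seen so far.

```python
# Illustrative sketch: flag events in a metric stream by comparing each
# discrete derivative against the history of changes (the threshold factor
# is an assumption of this sketch).

def detect_events(stream, factor=3.0):
    """Return indices whose jump exceeds `factor` times the mean change so far."""
    events, deltas = [], []
    for i in range(1, len(stream)):
        delta = abs(stream[i] - stream[i - 1])
        if deltas:
            baseline = sum(deltas) / len(deltas)
            if baseline > 0 and delta > factor * baseline:
                events.append(i)
        deltas.append(delta)
    return events
```

A stream of line counts hovering around 100 that suddenly jumps to 150 would yield a single event at the jump, while the small day-to-day fluctuations stay below the history-relative threshold.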

What your IDE could do once you understand your code

Speaker: Arie van Deursen

A significant part of today's program comprehension research addresses the long and complicated road a developer needs to travel to understand a given piece of code. But perhaps the best way to shorten this road, is by focusing on the eventual moment of enlightenment that marks the end of this road, when the developer actually understands the code and is about to make the required change. Can we record this valuable moment of true understanding? Can the IDE know which methods, classes, execution traces, test cases, diagrams, or other artifacts contributed to this understanding? Are there light-weight mechanisms to ask the developer to record his understanding, for example via tagging, micro-blogging, or selection of a visualization that most accurately captures the understanding obtained? And is there a way to deal with non-monotonic understanding, in which some of the developer's earlier insights turn out to be false? At the moment, we don't have answers to these questions. But, together with the audience, we will explore what results have been achieved so far, and which challenges need to be addressed to find the required answers. Furthermore, we will reflect on the simplest possible IDE that could offer this type of support for recording actual understanding, and investigate to what extent Web 2.0 based technologies can be used to realize such an IDE.

Managing Code Clones Using Dynamic Change Tracking and Resolution

Speaker: Andy Zaidman

Code cloning is widely recognized as a threat to the maintainability of source code. As such, many clone detection and removal strategies have been proposed. However, some clones often cannot be removed easily, so other strategies, based on clone management, need to be developed. In this paper we describe a clone management strategy based on dynamically inferring clone relations by monitoring clipboard activity. We introduce CLONEBOARD, our Eclipse plug-in implementation that is able to track live changes to clones and offers several resolution strategies for inconsistently modified clones. We perform a user study with seven subjects to assess the adequacy, usability and effectiveness of CLONEBOARD, the results of which show that developers actually see the added value of such a tool but have strict requirements with respect to its usability.
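
The clone-management idea can be illustrated with a toy tracker (entirely our own sketch, not CLONEBOARD's implementation): a paste records a clone relation, and a group whose members have drifted apart textually is flagged as inconsistently modified.

```python
# Toy clone tracker: clipboard pastes create clone groups; groups whose
# members no longer have identical text are reported as inconsistent.

class CloneTracker:
    def __init__(self):
        self.groups = []   # each group is a list of fragment ids
        self.text = {}     # fragment id -> current text

    def record_paste(self, source_id, new_id, text):
        # a paste relates the source fragment to the newly created copy
        self.text.setdefault(source_id, text)
        self.text[new_id] = text
        for group in self.groups:
            if source_id in group:
                group.append(new_id)
                return
        self.groups.append([source_id, new_id])

    def edit(self, fragment_id, new_text):
        self.text[fragment_id] = new_text

    def inconsistent_groups(self):
        # a group is inconsistent when its fragments have diverging texts
        return [g for g in self.groups if len({self.text[f] for f in g}) > 1]
```

Editing one sibling of a clone pair flags the group until the other sibling receives the same change, which is the moment a resolution strategy would kick in.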

Test Generation with Grammars and Covering Arrays

Speaker: Paul Strooper

Grammars and covering arrays have seen extensive use in test generation. A covering-array algorithm takes a list of domains and generates a subset of the Cartesian product of the domains. A grammar-based test generation (GBTG) algorithm takes a grammar G and generates a subset of the language accepted by G. Covering arrays and GBTG are usually applied independently. We show that context-free grammar (CFG) rules and covering-array specifications can be freely intermixed, with precise, intuitive and efficient generation. The potential benefits for automated test generation are significant. We present an approach for "tagging" grammars with specifications for mixed-strength covering arrays, a generalisation of conventional covering arrays. We will demonstrate a prototype tool for generating test cases from tagged grammars.
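
The covering-array side can be illustrated with a greedy strength-2 (pairwise) generator; the tagged-grammar combination from the talk is not reproduced, and the greedy algorithm below is a standard textbook approach, not necessarily the tool's.

```python
# Greedy pairwise (strength-2) covering array: repeatedly add the full row
# that covers the most still-uncovered value pairs. Fine for small domains.
from itertools import combinations, product

def pairwise(domains):
    """domains: list of value lists. Returns rows covering all value pairs."""
    uncovered = set()
    for (i, di), (j, dj) in combinations(enumerate(domains), 2):
        uncovered |= {(i, a, j, b) for a, b in product(di, dj)}
    rows = []
    while uncovered:
        best_row, best_gain = None, -1
        for row in product(*domains):
            gain = sum(row[i] == a and row[j] == b for i, a, j, b in uncovered)
            if gain > best_gain:
                best_row, best_gain = row, gain
        rows.append(best_row)
        uncovered = {(i, a, j, b) for i, a, j, b in uncovered
                     if best_row[i] != a or best_row[j] != b}
    return rows
```

For three boolean parameters, a pairwise array needs only a handful of rows instead of the 8 of the full Cartesian product, which is where the efficiency of combining covering arrays with generation comes from.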

Paul Strooper is a Professor in the School of ITEE at The University of Queensland. He received the BMath and MMath degrees in Computer Science from the University of Waterloo, and the PhD degree in Computer Science in 1990 from the University of Victoria. His main research interest is Software Engineering, especially software specification, verification, and testing. He has had substantial interaction with industry through collaborative research projects, training and consultation in the area of software verification and validation. He was one of the program co-chairs for the 2002 Asia-Pacific Software Engineering Conference (APSEC), one of the General Chairs of the 2009 Australian Software Engineering Conference (ASWEC) and the program chair for ASWEC in 2004 and 2005. He is a member of the Steering Committees for ASWEC and APSEC, and a member of the editorial board of the IEEE Transactions on Software Engineering and the Journal of Software Testing, Verification and Reliability.

Supporting Collaboration Awareness in Multi-developer Projects

Speaker: Anja Guzzi

Teamwork is necessary to produce large software systems in a reasonable amount of time. A team of developers working on the same project must deal with concurrent development. Collaboration among team members assumes a fundamental role during the whole development process of systems. Failing to appropriately take care of collaboration aspects, such as awareness, communication and synchronization, can result in the delay of a whole project. However, the negative consequences of uncoordinated concurrent development can be reduced with tool support for collaborative software development. We developed Scamp, an Eclipse plug-in conceived to support collaboration awareness through visualization. Scamp is built on top of Syde, which provides an environment for synchronous development. Relying on the underlying structure, Scamp visualizes changes in a system as they happen in three different ways: a distinctive mark on changed entities, a Tag Cloud and a “Buckets view”.

Criteria for the Evaluation of Implemented Architectures

Speaker: Eric Bouwers

Software architecture evaluation methods aim at identifying potential maintainability problems for a given architecture. Several of these methods exist, which typically prescribe the structure of the evaluation process. Often left implicit, however, are the concrete system attributes that need to be studied in order to assess the maintainability of implemented architectures.

To determine this set of attributes, we have performed an empirical study on over 40 commercial architectural evaluations conducted during the past two years as part of a systematic "Software Risk Assessment". We present this study and explain how the identified attributes can be projected onto various architectural system properties, which provides an overview of criteria for the evaluation of the maintainability of implemented software architectures.

Traits at Work

Speaker: Stephane Ducasse

Traits have been proposed as a mechanism to compose and share behavioral units between distinct class hierarchies. They are an alternative to multiple inheritance, the most significant difference being that name conflicts must be explicitly resolved by the trait composer. Traits are recognized for their potential in supporting better composition and reuse. They have been integrated into a significant number of languages, such as Perl 6, Slate, Squeak, DrScheme OO and Fortress (Sun Microsystems). Although originally designed in a dynamically typed setting, several type systems have been built for traits (Fisher, Smith, Liquori, Reppy). We will present traits, their applications and their evolution.
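
Python has no native traits, but the defining property (name conflicts must be resolved explicitly by the composer, unlike silently linearized multiple inheritance) can be mimicked in a short sketch of our own:

```python
# Sketch of trait-style composition: behavioral units are plain dicts of
# methods, and duplicate names must be resolved explicitly by the composer.

def compose(*traits, resolve=None):
    """Build a class body from trait dicts; conflicting names must appear in
    `resolve`, which maps the name to the chosen implementation."""
    resolve = resolve or {}
    body = {}
    for trait in traits:
        for name, member in trait.items():
            if name in body and body[name] is not member:
                if name not in resolve:
                    raise TypeError(f"conflict on {name!r}: resolve explicitly")
                member = resolve[name]
            body[name] = member
    return body

TDrawable = {"draw": lambda self: "drawable"}
TPrintable = {"draw": lambda self: "printable"}

# the composer picks the winning implementation for the conflicting name
Shape = type("Shape", (), compose(TDrawable, TPrintable,
                                  resolve={"draw": TDrawable["draw"]}))
```

Composing the same two traits without a `resolve` entry raises a `TypeError`, which is exactly the explicitness the trait model demands of the composer.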


A quick look at two research project at TUT

Speaker: Tarja Systa

Analyzing runtime behavior with Bebop and supporting interactive model transformations with DReAMT

We propose an approach, called behavioral profiles, to model and validate architecturally significant behavior. Behavioral profiles are used to illustrate role-based behavioral rules using UML2 sequence diagram notation. With the Bebop tool, runtime behavior can be analyzed and validated against the profiles. The main goal is to focus only on certain, architecturally interesting behaviors, instead of analyzing the whole event trace. Bebop can identify whether the behavioral rules, given as behavioral profiles, have been followed in the implementation. In the DReAMT project, we have studied model-driven software development and analysis. Instead of assuming a set of predefined model transformations, we aim at developing transformations incrementally, gradually refining them into more complete ones as the body of knowledge of the domain grows. To achieve this, we describe discovered partial and incomplete transformations as patterns and iteratively refine them. When applying the transformations, we give the user the possibility to make decisions and influence the transformation steps. The decisions are made as needed during the transformation and not in a separate step.
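
The rule-checking part of the idea can be reduced to a toy: a behavioral rule as an ordered list of events that must occur, in that order, somewhere in the recorded trace. (Bebop works on UML2 sequence-diagram profiles; this subsequence check is our own simplification.)

```python
# Toy profile check: does the event trace contain the rule's events in order?

def follows_profile(trace, rule):
    events = iter(trace)
    # `e in events` advances the iterator, so ordering is enforced
    return all(e in events for e in rule)
```

A rule like "open before close" holds for a trace that opens and later closes, and is violated when the events appear in the wrong order, without the checker ever looking at the uninteresting events in between.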

New Uses of Simulation in Distributed System Engineering

Speaker: Alexander Wolf

Simulation has been used by software engineers for many years to study the functionality and performance of complex distributed system designs. For example, they are used to understand network protocols, tune distributed systems, and improve distributed algorithms. They are appealing to engineers because of their inherent efficiency and scalability. Unlike many other development artifacts, simulations seem to be used, and therefore well maintained, throughout the development process, both as early design tools and as late evaluation tools. Given the effort invested in the construction and maintenance of simulations, and the degree to which developers trust in them, we wonder whether there are other purposes to which they can be put.

In this talk I present two such uses, one to increase the power of large-scale distributed experimentation and the other to develop a rigorous testing method for distributed systems.

Improving Web Application Security and Reliability through Program Analysis

Speaker: Alessandro Orso

Over the past decade, the popularity of web applications has steadily grown. Nowadays, millions of users utilize web applications to access a multitude of services, such as online banking, e-shopping, and gaming. Because web applications have become such an essential part of our daily lives, it is essential to devise effective quality assurance techniques for these applications. Although there has been a great deal of research in software quality assurance, many techniques that work effectively on traditional software and on simple web applications are inadequate when used on modern, highly dynamic web applications. Part of the reason for this inadequacy is that traditional abstractions used in these techniques, such as control-flow, data-flow, and interfaces, are fairly different in web applications. In this talk, we present a set of program-analysis based techniques that we developed to address some of the shortcomings of existing approaches. We also show how our techniques can support and improve different quality assurance techniques for web applications. Finally, we discuss the results of an empirical evaluation of our techniques performed on a set of real web applications, in which we assess their effectiveness and practical applicability.

Invariant-Based Automatic Testing of AJAX User Interfaces

Speaker: Ali Mesbah

AJAX-based Web 2.0 applications rely on stateful asynchronous client/server communication, and client-side runtime manipulation of the DOM tree. This not only makes them fundamentally different from traditional web applications, but also more error-prone and harder to test. We propose a method for testing AJAX applications automatically, based on a crawler to infer a flow graph for (client-side) user interface states. We identify AJAX-specific faults that can occur in such states (related to DOM validity, error messages, discoverability, back-button compatibility, etc.) as well as DOM-tree invariants that can serve as oracles to detect such faults. We implemented our approach in ATUSA, a tool offering generic invariant checking components, a plugin mechanism to add application-specific state validators, and generation of a test suite covering the paths obtained during crawling.
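
The oracle side can be pictured with a miniature invariant checker (plain strings stand in for real DOM trees, and both example invariants are our own, not ATUSA's):

```python
# Mini invariant checker over crawled UI states; each invariant is a labeled
# predicate that must hold in every state the crawler discovered.

def check_states(states, invariants):
    """states: name -> DOM string; returns (state, invariant) violations."""
    return [(name, label)
            for name, dom in states.items()
            for label, holds in invariants
            if not holds(dom)]

# two illustrative invariants: no error text leaked into a state, and a
# crude well-formedness check on div nesting
no_error_page = ("no-error-page", lambda dom: "404 Not Found" not in dom)
balanced_divs = ("balanced-divs",
                 lambda dom: dom.count("<div") == dom.count("</div>"))
```

A state that shows an error message and an unclosed element violates both invariants at once; in the real tool such generic checks are complemented by application-specific state validators via the plugin mechanism.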

Studying the Relation Between Coding Standard Violations and Known Faults

Speaker: Cathal Boogerd

In spite of the widespread use of coding standards and tools enforcing their rules, there is little empirical evidence supporting the intuition that they prevent the introduction of faults in software. In previous work, we performed a pilot study to assess the relation between rule violations and actual faults, using the MISRA C 2004 standard on an industrial case. In this talk, we investigate three different aspects of the relation between violations and faults on a larger case study, and compare the results across the two projects. We find that 10 rules in the standard are significant predictors of fault location.

Evaluating Visualization

Speaker: Bas Cornelissen

Visualization is a common technique to support software understanding, and many variants have been proposed in the literature in the past decades. Yet, when it comes down to their evaluation, one typically resorts to anecdotal evidence rather than experiments with actual human subjects. Until recently, the same was true for our own visualization tool (the one with the exotic and colorful look). In this talk, we present the design of a controlled experiment to quantitatively measure the added value of visualization for typical software maintenance tasks. We then report the results of the actual application of this design to our visualization tool with 24 human subjects.

Software Deployment in a Dynamic Cloud: From Device to Service Orientation in a Hospital Environment

Speaker: Sander van der Burg

Hospital environments are currently primarily device-oriented: software services are installed, often manually, on specific devices. For instance, an application to view MRI scans may only be available on a limited number of workstations. The medical world is changing to a service-oriented environment, which means that every software service should be available on every device. However, these devices have widely varying capabilities, ranging from powerful workstations to PDAs, and high-bandwidth local machines to low-bandwidth remote machines. To support running applications in such an environment, we need to treat the hospital machines as a cloud, where components of the application are automatically deployed to machines in the cloud with the required capabilities and connectivity. In this talk, we suggest an architecture for applications in such a cloud, in which components are reliably and automatically deployed on the basis of a declarative model of the application using the Nix package manager.

Execution Models to Describe Large Software-Intensive Systems

Speaker: Trosky B. Callo

Execution models describe what a software system does at runtime and how it does it. Although execution models are clearly important assets for facilitating system evolution, in practice development organizations do not pay enough attention to creating useful execution models. In this talk we present the foundations and infrastructure that we have developed to support the construction of execution models within a large organization developing a large and complex software-intensive system. The foundations include the abstractions, concerns, and stakeholders of execution models. The infrastructure consists of the sources of information and tools that an organization should make available in order to construct useful execution models without considerable overhead.

Virtual Components: Integration Testing of Data Flow-oriented systems

Speaker: Eric Piel

In order to improve the quality and to test the correct behaviour of a component-based system, the components should not only be unit-tested, but the integration of the components should also be tested. In this presentation, after having detailed our motivations, we will first have a look at some existing approaches for integration testing. In particular, we will highlight the difficulties met when testing data flow-oriented systems, in which components never have any explicit expectation on the other components.

We will then present our approach based on the notion of "virtual components". Complementary to the other existing approaches, it allows one to verify very specific behaviours of the system while leveraging unit-testing techniques. Finally, we will give the status of the current implementation and of the validation of the approach.

Assessing the Value of Coding Standards

Speaker: Cathal Boogerd

In spite of the widespread use of coding standards and tools enforcing their rules, there is little empirical evidence supporting the intuition that they prevent the introduction of faults in software. Not only can compliance with a set of rules having little impact on the number of faults be considered wasted effort, but it can actually result in an increase in faults, as any modification has a non-zero probability of introducing a fault or triggering a previously concealed one. Therefore, it is important to build a body of empirical knowledge, helping us understand which rules are worthwhile enforcing, and which ones should be ignored in the context of fault reduction. In this talk, we discuss two approaches to quantify the relation between rule violations and actual faults, and present empirical data on this relation for the MISRA C 2004 standard on an industrial case study.

An observation-based model for fault localization

Speaker: Rui Abreu

Automatic techniques for helping developers in finding the root causes of software failures are extremely important in the development cycle of software. In this paper we study a dynamic modeling approach to fault localization, which is based on logic reasoning over program traces. We present a simple diagnostic performance model to assess the influence of various parameters, such as test set size and coverage, on the debugging effort required to find the root causes of software failures. The model shows that our approach unambiguously reveals the actual faults, provided that sufficient test cases are available. This optimal diagnostic performance is confirmed by numerical experiments. Furthermore, we present preliminary experiments on the diagnostic capabilities of this approach using the single-fault Siemens benchmark set. We show that, for the Siemens set, the approach presented in this paper yields a better diagnostic ranking than other well-known techniques.
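
The logic-reasoning model itself is not reproduced here, but the "well-known techniques" it is compared against are spectrum-based rankings such as Ochiai, which can be sketched directly:

```python
# Ochiai spectrum-based ranking: score components by how strongly their
# execution correlates with failing tests, then rank most-suspect first.
import math

def ochiai_ranking(spectra, outcomes):
    """spectra: per-test sets of executed components; outcomes: True = failed."""
    components = set().union(*spectra)
    total_failed = sum(outcomes)
    scores = {}
    for c in components:
        ef = sum(1 for s, failed in zip(spectra, outcomes) if failed and c in s)
        ep = sum(1 for s, failed in zip(spectra, outcomes) if not failed and c in s)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[c] = ef / denom if denom else 0.0
    return sorted(components, key=lambda c: scores[c], reverse=True)
```

A component executed in every failing run and no passing run ends up at the top of the ranking; the talk's claim is that its reasoning-based approach produces a better such ranking on the Siemens set than baselines of this kind.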

Further information: TechReport

A Systematic Survey of Program Comprehension through Dynamic Analysis

Speaker: Bas Cornelissen

Program comprehension is an important activity in software maintenance, as software must be sufficiently understood before it can be properly modified. The study of a program's execution, known as dynamic analysis, has become a common technique in this respect and has received substantial attention by the research community, particularly over the last decade.

This talk presents an introduction into the use of dynamic analysis for program comprehension. Next, we report on the results of our systematic literature survey on this topic, in which we selected and characterized a total of 172 articles on the basis of four main facets: activity, target, method, and evaluation. We conclude with several important lessons learned and a series of future directions.

A verifiable Posix File-System using Flash Memory

Speaker: Kees Pronk

This talk will be about three related subjects:

  • The Grand Challenges as defined in the UK by the UK Computing Research Committee, and GC6 (Dependable Systems Evolution) in particular. Two of the projects to be highlighted are the Mondex Electronic Purse and the Posix-compliant verifiable file store using Flash memory chips.
  • The physics of Flash memory explaining the problems to be solved when constructing a reliable file store for use in harsh and remote environments (space stations, rovers).
  • The implementation and test of the Posix file store using Model Checking (Spin) on a multi-core machine (Thesis work by Paul Taverne).

Putting Fluent Interfaces to the Test

Speaker: Eric Bouwers

The API of a framework usually consists of several configuration objects and a few methods that perform operations on these objects. To use the framework, a programmer initializes several value objects, configures these objects through set-methods, and then uses these objects as parameters for the methods that do the actual work. Unfortunately, this approach often leads to a long list of object creations and method calls in which it is hard to see the relation between the objects. In order to make the configuration of an object more readable, one can use a so-called "Fluent Interface". This term, introduced by Martin Fowler in 2005, describes an interface which is constructed to yield readable code.

Within this presentation we will introduce Fluent Interfaces using small examples. Furthermore, we will take a look at the API of JMock, a Java library for unit testing whose API is designed as a Fluent Interface. Also, we will share our experiences in implementing and using an API which is designed as a Fluent Interface.
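
A tiny Python counterpart of the style (JMock is Java; this builder and its method names are invented for illustration):

```python
# Toy fluent interface: each configuration method returns the object itself,
# so calls chain into a single readable sentence instead of a list of setters.

class QueryBuilder:
    def __init__(self):
        self._parts = {"select": "*", "table": None, "where": []}

    def select(self, cols):
        self._parts["select"] = cols
        return self  # returning self is what makes the interface "fluent"

    def from_table(self, name):
        self._parts["table"] = name
        return self

    def where(self, cond):
        self._parts["where"].append(cond)
        return self

    def build(self):
        sql = f"SELECT {self._parts['select']} FROM {self._parts['table']}"
        if self._parts["where"]:
            sql += " WHERE " + " AND ".join(self._parts["where"])
        return sql
```

Compare `QueryBuilder().select("name").from_table("users").where("age > 18").build()` with the equivalent sequence of four separate setter calls on a bare configuration object: the chained form makes the relation between the parts visible at a glance.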

Can Faulty Models Help to Fix Faulty Programs?

Speaker: Wolfgang Mayer

Debugging programs is a tedious task where automated tools would be most beneficial in reducing development effort. Model-based fault isolation techniques have shown great success in debugging physical systems; however, the absence of suitable formalisations of the correct behaviour hampers direct application of this paradigm to most software systems. Instead, model-based debugging derives a model from the incorrect system. Hence, the question arises to what extent such incorrect models can aid in locating faults in the underlying program.

In this talk I will explore different modelling paradigms that have been tried for model-based debugging and outline our experiences with selected models. I will show how the model-based framework relates to well-known program analysis techniques and that synergies between complementary paradigms can boost accuracy.

Automating Runtime Evolution and Verification of Component-based Systems

Speaker: Alberto Gonzalez

Systems-of-Systems (SoS) represent a novel kind of system for which runtime evolution is a key requirement, as components join and leave at runtime. Current component integration and verification techniques are not enough in such a dynamic environment. In the colloquium we will present a brief overview of the verification problem, plus the status of our current work and research platform: Atlas. Atlas is based on the Fractal component model, and extends it with ideas based on the built-in testing paradigm. Our long-term research objective (vision?) is devising a fully automated integration and verification process for dynamic systems, establishing what degree of automation can be performed by the runtime environment, and what features require extra support artefacts from the components.

Note: some of the ideas of this colloquium are available as a technical report.

Aspect-oriented Web Engineering

Speaker: Matthias Niederhausen (TU Dresden)

In a quickly evolving WWW, users are accessing web pages with an increasing range of devices and from different contexts. Adaptive web applications are one way to address this growing heterogeneity, but they still suffer from laborious authoring processes. Because adaptation is defined directly on the web application, authors have to take previously defined adaptation into account when updating the site. Therefore, the maintenance of such web applications is greatly complicated. To address these problems, we propose the use of aspect-oriented programming (AOP). By separating adaptation from the core web application into adaptation aspects, authoring and maintenance processes can be considerably simplified. We will give an overview of our current work in this field as well as open research issues and collaboration opportunities.

Can Developer Social Networks Predict Failures?

Speaker: Martin Pinzger

Software teams should follow a well-defined goal and keep their work focused. Work fragmentation is bad for efficiency and quality. In this paper we empirically investigate the relationship between the fragmentation of developer contributions and the number of post-release failures. Our approach is to represent the structure of developer contributions with a contribution network. We use social network centrality measures to measure the degree of fragmentation of developer contributions. Fragmentation is determined by the centrality of software modules in the contribution network. Our claim is that central software modules are more likely to be failure-prone than modules located in surrounding areas of the contribution network. We analyze this hypothesis by exploring the network centrality of Microsoft Windows Vista modules using several social network centrality measures as well as linear and logistic regression analysis. In particular, we investigate which centrality measures are significant to predict the probability and number of post-release failures. Results of our experiments show that central modules are more failure-prone than modules located in surrounding areas of the network. The basic centrality measures, number of authors and number of commits, are significant predictors for the probability of failures. For predicting the number of post-release failures the closeness centrality measures are most significant.
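
The contribution-network idea can be sketched on a toy graph (the data and the two measures below are illustrative; the study itself applies several centrality measures to Windows Vista data):

```python
# Toy bipartite contribution network: developers connect to the modules they
# touched; module centrality then serves as a fragmentation indicator.
from collections import deque

def neighbors(edges, node):
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def closeness(edges, node):
    """Closeness centrality: reachable nodes over summed shortest distances."""
    dist, queue = {node: 0}, deque([node])
    while queue:
        current = queue.popleft()
        for nb in neighbors(edges, current):
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(dist) - 1) / total if total else 0.0

edges = [("dev1", "m1"), ("dev2", "m1"), ("dev3", "m1"),
         ("dev2", "m2"), ("dev3", "m3")]
authors = {m: len(neighbors(edges, m)) for m in ("m1", "m2", "m3")}
```

Module m1, touched by all three developers, has both more authors and a higher closeness than the peripheral m2 and m3; the paper's hypothesis is that such central modules are the more failure-prone ones.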

Static Estimation of Test Coverage

Speaker: Tiago Alves

Test coverage is an important indicator for unit test quality. Test coverage can be computed by tools such as Clover or Emma by first instrumenting the code with logging functionality, and then running the instrumented code to log which parts are executed during unit test runs. Since computation of test coverage is a dynamic analysis, it presupposes a working installation of the analyzed software.

In the context of software quality assessment by an independent third party, a working installation is often not available. The evaluator may not have access to the required software libraries or hardware platform. The installation procedure may not be automated or even documented. Instrumentation may not be feasible, e.g. due to space or time limitations in case of embedded software.

In this paper, we propose a method for estimating test coverage through static analysis only. The method uses slicing of static call graphs to estimate the actual dynamic test coverage. We explain the method and its implementation in an analysis tool. We validate the results of the static estimation by comparison to actual values obtained through dynamic analysis using Clover.
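The intuition behind a static estimate can be sketched as a reachability computation over a static call graph: a production method counts as (estimated) covered if some test method can reach it. The Python sketch below uses hypothetical method names and a plain graph traversal; the paper's actual method uses slicing of static call graphs and a dedicated analysis tool.

```python
from collections import deque

def estimate_coverage(call_graph, test_methods, production_methods):
    """Estimate test coverage statically: the fraction of production
    methods reachable in the call graph from any test method.
    call_graph maps a method name to the set of methods it may call."""
    reached = set()
    queue = deque(test_methods)
    while queue:
        m = queue.popleft()
        for callee in call_graph.get(m, ()):
            if callee not in reached:
                reached.add(callee)
                queue.append(callee)
    covered = reached & set(production_methods)
    return len(covered) / len(production_methods)

# Toy call graph: testFoo reaches foo and helper; bar is never reached.
graph = {
    "testFoo": {"foo"},
    "foo": {"helper"},
    "bar": {"helper"},
}
print(estimate_coverage(graph, ["testFoo"], ["foo", "helper", "bar"]))
```

The estimate is an over-approximation of dynamic coverage: static call edges may never be exercised at run time, which is exactly why the paper validates against Clover's dynamic values.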

Towards an Assessment Methodology for Trace Abstraction Techniques

Speaker: Bas Cornelissen

The use of dynamic analysis for software understanding has become increasingly popular over the last few years, and various abstraction techniques are being offered to counter the scalability concerns involved. A major issue in this respect is that such techniques are typically not assessed in a systematic fashion: most evaluations comprise the gathering of anecdotal evidence through the treatment of a limited test set, which makes the techniques difficult to compare.

In this presentation, we take a large step towards an assessment methodology for trace abstraction techniques. In particular, we propose a systematic process in which such techniques can be quantitatively assessed and compared. We also report on the application of this methodology to a selection of three trace abstraction techniques. We discuss our findings and outline directions for future work.

Note: this research is also available as a technical report.

Exposing the Hidden-Web Induced by Ajax

Speaker: Ali Mesbah

Ajax is a very promising approach for improving rich interactivity and responsiveness of web applications. At the same time, Ajax techniques increase the totality of the hidden web by shattering the metaphor of a web "page" upon which general search engines are based. We present a technique for exposing the hidden web content behind Ajax by automatically creating a traditional multi-page instance. In particular we propose a method for crawling Ajax applications and building a "state-flow graph" modeling the various navigation paths and states within an Ajax application. This abstract model is used to generate linked static HTML pages and a corresponding Sitemap. We present our tool called Crawljax which implements the concepts discussed in this presentation. Additionally, we present a case study in which we apply our approach to two Ajax applications and elaborate on the obtained results.

Model Driven Engineering in the Road Traffic Management domain

Speakers: Michel Soares, Jos Vrancken

Road traffic in many industrialized countries is both highly important and highly problematic, the latter due to high casualty rates, frequent congestion and high environmental pollution. Dynamic traffic management (DTM) is an important means to control traffic and counter these negative effects.

DTM systems are complex, software-intensive systems that are very expensive to deploy, due to the IT infrastructure required along the roads. On the other hand, DTM is a lively research area, coming up with new and innovative control measures regularly. This entails a strong need for DTM systems and their infrastructure to be as flexible as possible, in order to accommodate new measures without renewed high investments. Yet DTM systems involve human life, so reliability is also a prime requirement.

In this talk we show how Model Driven Engineering can help in building systems with both high flexibility and high reliability. To that end, a number of different software and system engineering techniques and methodologies will be integrated into an MDE method and applied to the road traffic domain. Components of this methodology include Model Driven Requirements Engineering, UML Profiles (SysML) and Petri Nets.

Achieving Incremental Generative Model-driven Development

Speaker: T.D. Meijler (SAP, Dresden)

In the standard generative Model-driven Architecture (MDA), adapting the models of an existing system requires a full re-generation and restart of that system. This is due to a strong separation between the modeling environment and the run-time environment. Certain current approaches remove this separation by allowing a system to be changed incrementally through incremental model changes. These approaches are, however, based on interpretation of modeling information. In this presentation I present an approach (largely realized in a commercial R&D project) that enables full-fledged incremental generative model-driven development, i.e., applying incremental changes in a generative model-driven fashion to a system that has itself been developed in a generative model-driven way. To achieve this, model changes must be propagated to impacted elements along three dimensions: generated implementation, data (instances) and modelled dependencies. The result suggests a fundamental rethinking of the MDA, in which the three dimensions are explicitly represented in an integrated modelling and run-time environment to enable total traceability.

An Integrated System to Manage Crosscutting Concerns in Source Code

Speaker: Marius Marin (TU Delft)

Evolution of software systems accounts for the largest part of their lifecycle and costs. Software engineers therefore, more often than developing new systems, work on complex, existing ones that they have to understand in order to modify them. Understanding such systems requires insight into the various concerns the systems implement, many of which have to be inferred from source code. Particularly challenging for software comprehension, and consequently, software evolution, are those concerns said to be crosscutting: implementation of such concerns lacks modularity and results in scattered and tangled code.

The research presented in this talk proposes an integrated approach to consistent comprehension, identification, documentation, and migration of crosscutting concerns in existing systems. This work is aimed at helping software engineers to more easily understand and manage such concerns in source code. As a final step of our approach, we also experiment with the refactoring of crosscutting concerns to aspect-oriented programming and reflect on the support provided by this new programming technique for improving modularization of concerns.

Reuseware -- Generic Invasive Software Composition

Speaker: Steffen Zschaler (TU Dresden)

Traditionally, most composition systems have been either black or white box. Black box composition systems allow no knowledge about the internal structure of components to be used in composing them. White box composition systems, on the other hand, allow complete knowledge about internal structures to be used and allow these structures to be manipulated in any manner when performing compositions. Invasive software composition maintains a middle ground in this area by following a grey-box approach to composition. Here, the structure of a component is exposed and made available for manipulation in a controlled manner. Components specify explicitly what parts of their structure should be exposed, and the composition system uses this specification to construct a composition interface for the components.

The presentation gives an overview of Reuseware, a generic implementation of invasive software composition based on Eclipse. Reuseware allows invasive composition concepts to be integrated with any arbitrary language and generates parsing, editing, and composition tooling for these languages.

On How Developers Test Open Source Software Systems

Speaker: Andy Zaidman

Engineering software systems is a multidisciplinary activity, whereby a number of artifacts must be created and maintained synchronously. In this paper we investigate whether production code and the accompanying tests co-evolve by exploring a project's versioning system, code coverage reports and size metrics. Three open source case studies teach us that testing activities usually start later in a project's lifetime and are more "phased", although we did not observe increasing testing activity before releases. Furthermore, we note big differences in the levels of test coverage given the proportions of test code.
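A minimal sketch of the kind of version-history mining described above, assuming a hypothetical commit list and a simple file-naming heuristic for separating test code from production code. The study itself also draws on coverage reports and size metrics, which this toy omits.

```python
def is_test(path):
    """Heuristic: classify a file as test code by naming convention."""
    name = path.rsplit("/", 1)[-1]
    return name.startswith("Test") or name.endswith("Test.java")

def co_evolution(commits):
    """For each commit (a list of changed file paths), count how many
    production and test files it touches. Commits touching both kinds
    suggest co-evolving code and tests; runs of test-only commits
    suggest 'phased' testing."""
    rows = []
    for files in commits:
        test = sum(1 for f in files if is_test(f))
        prod = len(files) - test
        rows.append((prod, test))
    return rows

# Hypothetical commit history mined from a versioning system.
commits = [
    ["src/Parser.java"],                          # production only
    ["src/Parser.java", "test/ParserTest.java"],  # co-evolving change
    ["test/UtilTest.java"],                       # test only ("phased" testing)
]
print(co_evolution(commits))
```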

Documenting Crosscutting Concerns Using Queries

Speaker: Marius Marin

SoQueT is a tool that supports consistent documentation of crosscutting concerns in source code by using a set of pre-defined queries. Each query describes a typical relation and implementation idiom of crosscutting concerns, which we recognize as a concern sort. The tool allows the user to parameterize the sort queries in dedicated user-interfaces and then use these queries to document crosscutting concerns as sort instances. Such instances are building blocks that can be composed in a concern model to describe more complex relations or design decisions in the system under investigation. Concern models assist new developers or system maintainers in recognizing crosscutting relations and program elements (i.e., the query results) participating in these relations. The demo will show how SoQueT can be used to document crosscutting concerns, and how this documentation is useful for program comprehension and software evolution purposes.

Domain-Specific Languages in Perspective

Speaker: Jan Heering (CWI)

Domain-specific languages (DSLs) are languages tailored to a specific application domain. They offer substantial gains in expressiveness and ease of use compared with general-purpose programming languages in their domain of application. While the use of DSLs is by no means new, it is receiving increased attention in the context of software technologies such as software product-line engineering, software factories, language-oriented programming, and generative programming. These approaches advocate the development and use of DSLs as essential to the construction of software system families. We discuss these trends from the perspective of the roles DSLs have traditionally played.

Towards the Generation of a Text-Based IDE from a Language Metamodel

Speaker: Anneke Kleppe (University of Twente)

In the model-driven world, languages are usually specified by a (meta)model of their abstract syntax. For textual languages this is different from the traditional approach, where the language is specified by an (E)BNF grammar. Support for the designer of textual languages, e.g. a parser generator, is therefore normally based on grammars. This paper shows that similar support for language design based on metamodels is not only possible, but even more powerful than support based on grammars. In this paper we describe how an integrated development environment for a language can be generated from the language’s abstract syntax metamodel, thus providing the language designer with the possibility to quickly, and with little effort, create not only a new language but also the tooling necessary for using it.

Wireless Sensor Networks: A Promise that Fails to Deliver

Speaker: Koen Langendoen

Wireless Sensor Network (WSN) technology has been envisioned to revolutionize society by providing unlimited access to context data ranging from simple scalars (temperature, light, heartbeat, etc.) to aggregated information (object tracking). This vision has inspired a large body of research on how to construct large-scale, self-organizing networks out of resource-scarce sensor nodes running off batteries or ambient energy sources. The output has been a range of new protocols/algorithms trading off performance for energy efficiency. Taking these protocols from theory (simulation) to practice, however, has proven to be a mission impossible. The prime reason is that the traditional software development approach for embedded systems -- testing -- does not scale to WSNs because of the large number of nodes involved and the unreliable nature of the wireless medium. Also, debugging a wireless system is a "challenge" because of the limited access, especially when the protocol stack is under development. Many pilot WSN deployments have learned this the hard way, and if nothing changes WSN technology is doomed to fail in entering the consumer market, where development cost is a key factor to success. So HELP! Do you software engineering researchers have a proper development technology that could save the day? Any pointers greatly appreciated!

Meta-analysis for the Linux kernel: recognising, describing and analysing kernel domain abstractions expressed in C

Speaker: Peter Breuer

The Linux kernel is written in low-level C and assembler, yet expresses much higher-level concepts. While a C compiler does a good job of detecting and advising of low-level problems, such as writing a signed value into an unsigned receptacle, it cannot warn of higher-level problems such as accessing a field of a compound structure that is potentially visible to many different threads of computation at the same time without first having taken an appropriate lock. Detecting that kind of problem requires a much higher-level analysis which takes into account domain knowledge of the kernel structures and interfaces and their intended mode of use.

This talk will set out some of the experience gained and problems experienced (as well as solutions adopted, clearly) in recognising the higher-level structures in the kernel and codifying their intended modes of use, and detecting the real programming irregularities in the kernel that have arisen in the last ten years of practice. The analysis tool is an extensible logic compiler and lessons from its development should be applicable to the general problem of extending languages with domain-specific knowledge.

Declarative Object Identity using Relation Types

Speaker: Frank Tip (IBM)

Object-oriented languages define the identity of an object to be an address-based object identifier. The programmer may customize the notion of object identity by overriding the equals() and hashCode() methods following a specified contract. This customization often introduces latent errors, since the contract is unenforced, and at times impossible to satisfy. Notably, equals() may refer to mutable state, which allows object identity to change during execution, breaking standard library invariants.

We propose a programming model based on a relational view of the heap which defines identity declaratively, obviating the need for equals() and hashCode() methods. Each element in the heap (called a tuple) belongs to a relation type and relates an immutable identity to mutable state. The model entails a stricter contract: identity never changes during an execution. Objects, values, and singletons arise as special cases of tuples.

We formalize the model as an adaptation of Featherweight Java, and implement it by extending Java with relation types. Experiments on a set of Java programs show that the majority of classes that override equals() can be refactored into relation types, and that most of the remainder are buggy or fragile.
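The latent error described above -- identity that refers to mutable state -- can be reproduced in a few lines. The sketch below uses Python's `__eq__`/`__hash__` (the analogue of Java's `equals()`/`hashCode()`) on a hypothetical `Point` class; mutating a field after insertion breaks the standard library's hash-based containers, exactly the invariant violation the talk targets.

```python
class Point:
    """Identity defined over mutable state: the latent error in question."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash depends on mutable fields, so it can change during execution.
        return hash((self.x, self.y))

p = Point(1, 2)
s = {p}
p.x = 9          # identity silently changes while p sits in the set
print(p in s)    # False: the hash-based lookup can no longer find p
```

Under the proposed model this cannot happen: a tuple's identity is immutable by construction, so membership in a set is stable for the whole execution.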

This work will be presented at ECOOP'07.

Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking

Speaker: Martin Bravenboer

A wide range of parser generators are used to generate parsers for programming languages. The grammar formalisms that come with parser generators provide different approaches for defining operator precedence. Some generators (e.g. YACC) support precedence declarations, while others require the grammar to be unambiguous, thus encoding the precedence rules. Even if the grammar formalism provides precedence rules, a particular grammar might not use them.

The result is grammar variants implementing the same language. For the C language, the GNU Compiler uses YACC with precedence rules, the C-Transformers uses SDF without priorities, while the SDF library does use priorities. For PHP, Zend uses YACC with precedence rules, whereas PHP-front uses SDF with priority and associativity declarations.

The variance between grammars raises the question if the precedence rules of one grammar are compatible with those of another. This is usually not obvious, since some languages have complex precedence rules. Also, for some parser generators the semantics of precedence rules is defined operationally, which makes it hard to reason about their effect on the defined language.

We present a method and tool for comparing the precedence rules of different grammars and parser generators. Although it is undecidable whether two grammars define the same language, this tool provides support for comparing and recovering precedence rules, which is especially useful for reliable migration of a grammar from one grammar formalism to another. We evaluate our method by the application to non-trivial mainstream programming languages, such as PHP and C.

Modularization of Language Constructs: A Reflective Approach

Speaker: Thomas Cleenewerck (VUB)

Programming languages are in a continuous state of flux in order to keep up with the emerging needs of programmers. They are grown with new constructs so that programmers can express the problems from their domain within the language they are using. Growing a language means growing its implementation along with it. To support this, we wish to preserve the decomposition of languages into language constructs in their implementations. As the design of a language implementation directly reflects our intuitive decomposition, a developer can engage in the natural process of developing a language. We preserve the decomposition into language constructs by modularizing the definition of language constructs in separate implementation modules containing their syntactical representation and their translational semantics. In this setting, growing a language boils down to writing or selecting the appropriate language constructs and establishing the necessary interactions. As the language is continuously evolving during its implementation and future evolutions, the modularization of the language constructs renders the implementation less susceptible to these continuous changes.

The modularization of language implementations has been the subject of much research in the domain of compiler technology. The complexity of this research lies in the fact that language constructs intrinsically take into account other language constructs and therefore compromise their opportunities for modularization. Indeed, the mechanisms presented by the contemporary state of the art technologies for separating a language implementation into modules do not suffice.

In this talk, we present a lightweight formal model for the modularization of language constructs. From this model we deduce a new language implementation design in which a language consists of three kinds of concerns: the basic language concerns, defining the language constructs; the language specification, defining the interactions between the basic concerns; and the special-purpose concerns, defining the mechanisms to implement those interactions. As a solution for the above model, we present an open design for a new language development technique: a language implementation is decomposed into a set of interacting language modules called linglets. Each linglet defines in isolation the syntax and the (translational) semantics of a single language construct in terms of another (lower-level) language. The mechanisms to establish the necessary interactions among the language constructs are captured in strategies. The strategies are defined as extensions of a specifically tailored metaobject protocol (MOP).

Linglets and strategies can be reused across language implementations. In addition, new linglets and strategies can be defined, and existing ones can be specialized to respectively establish the interactions with other linglets and to meet and adapt the strategy for the challenges in a particular language implementation.

We validate our approach by developing a non-trivial family of domain-specific languages using a shared pool of language constructs and strategies, and by implementing the necessary metalanguages. The resulting language implementations are optimized with respect to separation of concerns according to their language constructs.

Middleware for Semantic Service Advertisement and Discovery on MANETs

Speaker: Zef Hemel

MANETs (Mobile Ad-hoc Networks) offer exciting new research opportunities now that devices with wireless capabilities become more widespread. Many wireless technologies, such as 802.11, support these ad-hoc style networks. Opportunities lie in many areas, such as routing protocols, services and applications. The network topology of MANETs is constantly changing and the devices on these networks, like laptops and PDAs, have limited processing and battery power.

Research on low-level protocols that do semantic service discovery on ad-hoc networks is emerging. Pervasive and mobile computing applications require these protocols, but using them requires a lot of engineering and knowledge of network protocols and service matching.

This talk will give an overview of the challenges that lie in this area and how they are addressed by the developed middleware. The purpose of the middleware is to make the task of defining, advertising and discovering semantic services on MANETs more straightforward by offering APIs to complete these tasks. As part of the semantic service matching, context such as location and workload can be defined and matched to further improve the discovery results. Services and context are described using ontologies. Queries for services can be expressed in a newly developed query language called RaSSQL (RDF and Semantic Service Query Language). The middleware, implemented in Python, is based on ideas from the OntoMobil protocol (developed at TCD), but can use any protocol that discovers services based on concept dissemination.

ComplexityMap, Bridging the Gap

Speaker: Mark Hissink Muller

A diagramming style will be presented which aims to bridge the gap between business users and application developers/architects by de/pre-scribing a common way of looking at integration architecture. This new style can help to open the traditional black box, which is too often used as a management approach in system development and maintenance. Based on the diagramming style, a proof of concept will be shown to indicate quality in various parts of a J2EE application. Such a 'ComplexityMap' allows quality to be monitored and managed as an integral part of the application lifecycle.

Model-Driven Consistency Checking of Behavioural Specifications

Speaker: Bas Graaf

For the development of software-intensive systems, different types of behavioural specifications are used. Although such specifications should be consistent with respect to each other, this is not always the case in practice, and maintainability problems are the result. In this paper we propose a technique for assessing the consistency of two types of behavioural specifications: scenarios and state machines. The technique is based on the generation of state machines from scenarios. We specify the required mapping using model transformations. The use of technologies related to the Model Driven Architecture enables easy integration with widely adopted (UML) tools. We applied our technique to assess the consistency of the behavioural specifications for the embedded software of copiers developed by Oce. Finally, we evaluate the approach and discuss its generalisability and wider applicability.

Towards Unification of Software Component Procurement and Integration Approaches

Speaker: Gerd Gross

Software component procurement and integration are primarily based upon having the right communication mechanisms available that can map component customer requirements to component provider specifications. Such mechanisms are currently only available on lower levels of abstraction, close to the implementation level. This paper describes the research being performed at the TU Delft Embedded Software Laboratory to elevate typical component feature mapping mechanisms from the implementation level up onto the design and requirements engineering levels.

An Evaluation of Similarity Coefficients for Software Fault Localization

Speaker: Rui Abreu

Automated diagnosis of software faults can improve the efficiency of the debugging process, and is therefore an important technique for the development of dependable software. In this presentation we discuss different similarity coefficients that are applied in the context of a program spectral approach to software fault localization (single programming mistakes). The coefficients studied are taken from the systems diagnosis / automated debugging tools Pinpoint, Tarantula, and AMPLE, and from the molecular biology domain (the Ochiai coefficient). We evaluate these coefficients on the Siemens Suite of benchmark faults, and assess their effectiveness in terms of the position of the actual fault in the probability ranking of fault candidates produced by the diagnosis technique. Our experiments indicate that the Ochiai coefficient consistently outperforms the coefficients currently used by the tools mentioned. In terms of the amount of code that needs to be inspected, this coefficient improves 5% on average over the next best technique, and up to 30% in specific cases.
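The Ochiai coefficient itself is simple to state: with a11 the number of failed runs that involve a component, a10 the passed runs that involve it, and a01 the failed runs that do not, it equals a11 / sqrt((a11 + a01) * (a11 + a10)). The sketch below ranks components of a made-up toy spectra matrix (not the Siemens Suite) by this suspiciousness score:

```python
import math

def ochiai(a11, a10, a01):
    """Ochiai similarity between a component's involvement and the error vector."""
    denom = math.sqrt((a11 + a01) * (a11 + a10))
    return a11 / denom if denom else 0.0

def rank_components(spectra, errors):
    """spectra[i][j] = 1 if component j was involved in run i;
    errors[i] = 1 if run i failed. Returns (component, score) pairs,
    most suspicious first."""
    n_comp = len(spectra[0])
    scores = []
    for j in range(n_comp):
        a11 = sum(1 for i, e in enumerate(errors) if e and spectra[i][j])
        a10 = sum(1 for i, e in enumerate(errors) if not e and spectra[i][j])
        a01 = sum(1 for i, e in enumerate(errors) if e and not spectra[i][j])
        scores.append((j, ochiai(a11, a10, a01)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy spectra: component 1 is involved in both failed runs and no passed run.
spectra = [[1, 1, 0],   # run 0 (failed)
           [0, 1, 1],   # run 1 (failed)
           [1, 0, 1]]   # run 2 (passed)
errors = [1, 1, 0]
print(rank_components(spectra, errors))
```

Component 1 scores 1.0 and tops the ranking, matching the intuition that a component involved in every failure and no pass is the prime fault candidate.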

Visualizing Testsuites to Aid in Software Understanding

Speaker: Bas Cornelissen

Agile methods and eXtreme Programming have brought renewed attention to testing during the software development process, both as a quality assurance method and as a form of live documentation. It is for this reason that a software system's testsuite is an ideal starting point for gaining knowledge about its inner workings. We propose to use sequence diagrams to visualize information that was dynamically obtained from testsuites. We employ abstraction techniques such as constructor hiding and stack depth limitation to make the diagrams more scalable. We use JPacman as a case study, validate our results by consulting with a domain expert, and use his feedback to fine-tune our techniques.

Prioritizing Software Inspection Results using Static Profiling

Speaker: Cathal Boogerd

Static software checking tools are useful as an additional automated software inspection step that can easily be integrated in the development cycle and assist in creating secure, reliable and high quality code. However, an often quoted disadvantage of these tools is that they generate an overly large number of warnings, including many false positives due to the approximate analysis techniques. This information overload effectively limits their usefulness.

In this talk we will discuss ELAN, a technique that helps the user prioritize the information generated by a software inspection tool, based on a demand-driven computation of the likelihood that execution reaches the locations for which warnings are reported. This analysis is orthogonal to other prioritization techniques known from literature, such as severity levels and statistical analysis to reduce false positives. We evaluate feasibility of our technique using a number of case studies and assess the quality of our predictions by comparing them to actual values obtained by dynamic profiling.
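A minimal sketch of the likelihood idea that drives this prioritization, assuming an acyclic control-flow graph annotated with hypothetical branch probabilities (ELAN's actual analysis is demand-driven and works on real programs, not this toy input format):

```python
def reach_probability(cfg, entry, target, memo=None):
    """Probability that execution starting at `entry` reaches `target`,
    for an acyclic CFG where cfg[node] maps successors to branch
    probabilities (a hypothetical input format for illustration)."""
    if entry == target:
        return 1.0
    if memo is None:
        memo = {}
    if entry in memo:
        return memo[entry]
    p = sum(prob * reach_probability(cfg, succ, target, memo)
            for succ, prob in cfg.get(entry, {}).items())
    memo[entry] = p
    return p

# Hypothetical CFG: a branch taken with probability 0.1 guards warning W2.
cfg = {
    "entry": {"check": 1.0},
    "check": {"rare": 0.1, "common": 0.9},
    "rare": {"exit": 1.0},
    "common": {"exit": 1.0},
}
warnings = [("W2", "rare"), ("W1", "common")]
ranked = sorted(warnings,
                key=lambda w: reach_probability(cfg, "entry", w[1]),
                reverse=True)
print(ranked)
```

The warning on the hot path (W1) is ranked above the one guarded by the rarely taken branch, so the user inspects the most likely executed locations first.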


Copyright © 2003-2014, Software Engineering Research Group, Delft University of Technology, The Netherlands