Research Colloquium Presentation Abstracts
Reliability of APIs
Wednesday May 29th, 2013, 11:00
Presenter: Maria Kechagia, Athens University of Economics and Business
Application programming interfaces (APIs) of new open platforms, such as Android, are fertile ground for empirical software engineering studies. The availability of the code of the APIs establishes them as a valuable corpus for research associated with their design. Locating deficient APIs and improving their design or implementation can prevent crashes in thousands of applications. In our talk, we report how we used software telemetry data, in the form of stack traces from Android application crashes, to analyze the causes of those crashes and evaluate the reliability of the APIs used.
Analyzing the Change-Proneness of Service-Oriented Systems from an Industrial Perspective
Wednesday, May 15th, 2013, 14:00
Presenter: Daniele Romano
Antipatterns and code smells have been widely shown to affect the change-proneness of software components. However, there is a lack of studies that propose indicators of changes for service-oriented systems. Like any other software system, such systems evolve to address functional and non-functional requirements. In this research, we investigate the change-proneness of service-oriented systems from the perspective of software engineers. Based on the feedback from our industrial partners, we investigate which indicators can be used to highlight change-prone application programming interfaces (APIs) and service interfaces in order to improve their reusability and response time. The output of this PhD research will assist software engineers in designing stable APIs and reusable services with adequate response time.
The GHTorrent dataset and tool suite
Wednesday, May 15th, 2013, 14:00
Presenter: Georgios Gousios
During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this talk, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.
Fixing the 'Out of Sight Out of Mind' Problem: One Year of Mood-Based Microblogging in a Distributed Software Team
Wednesday, May 15th, 2013, 14:00
Presenter: Kevin Dullemond & Ben van Gameren
Distributed teams face the challenge of staying connected. How do team members stay connected when they no longer see each other on a daily basis? What should be done when there is no coffee corner to share your latest exploits? In this paper we evaluate a microblogging system which makes this possible in a distributed setting. The system, WeHomer, enables the sharing of information and corresponding emotions in a fully distributed organization. We analyzed the content of over a year of usage data from 19 team members in a structured fashion, performed 5 semi-structured interviews, and report our findings in this paper. We draw conclusions about the topics shared, the impact on software teams, and the impact of distribution and team composition. Main findings include an increase in team-connectedness and easier access to information that is traditionally harder to acquire consistently.
The Maven Repository Dataset of Metrics, Changes, and Dependencies
Wednesday, May 15th, 2013, 14:00
Presenter: Steven Raemaekers
We present the Maven Dependency Dataset (MDD), containing metrics, changes and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classes and packages of multiple library versions. A complete call graph is also presented which includes call, inheritance, containment and historical relationships between all units of the entire repository. In this paper, we describe our dataset and the methodology used to obtain it. We present different conceptual views of MDD and we also describe limitations and data quality issues that researchers using this data should be aware of.
An N-gram Analysis on the complete corpus of MSR papers
Tuesday April 23rd, 11:00
Presenter: Serge Demeyer, University of Antwerp
On the occasion of the 10th anniversary of the MSR conference, it is a worthwhile exercise to meditate on the past, present and future of this thriving research community. Indeed, since the MSR community has experienced a big influx of researchers bringing in new ideas, state-of-the-art technology and contemporary research methods, it is unclear what the future might bring. In this paper, we report on a text mining exercise applied to the complete corpus of MSR papers to reflect on where we come from, where we are now, and where we should be going. We address issues like the trendy (and outdated) research topics; the frequently (and less frequently) cited cases; the popular (and emerging) mining infrastructure; and finally the proclaimed actionable information which we are deemed to uncover.
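As a rough illustration of the kind of n-gram counting such a text mining exercise relies on, the following minimal Python sketch counts word bigrams over a tiny corpus. The tokenization rule, the function name `ngram_counts`, and the two sample titles are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter
import re


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Zip n shifted copies of the word list to form consecutive n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical mini-corpus standing in for paper titles or abstracts.
corpus = [
    "Mining software repositories for bug prediction",
    "Bug prediction with mining software repositories data",
]

totals = Counter()
for document in corpus:
    totals += ngram_counts(document, 2)

# Show the most frequent bigrams across the corpus.
print(totals.most_common(3))
```

Ranking the resulting counts per publication year is one simple way to spot topics that rise or fade over time.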
BIO:
Serge Demeyer is a professor at the University of Antwerp and the spokesperson for the ANSYMO (Antwerp System Modelling) research group. He directs a research lab investigating the theme of "Software Reengineering" (LORE - Lab On REengineering). In 2007 he received a "Best teacher" award from the Faculty of Sciences at the University of Antwerp. As a consequence he remains very active in all matters related to teaching quality.
His main research interest concerns software reengineering, more specifically the evolution of object-oriented software systems. He is an active member of the corresponding international research communities, serving on various conference organization and program committees. He has written a book entitled "Object-Oriented Reengineering" and edited a book on "Software Evolution". He has also authored numerous peer-reviewed articles, many of them in highly respected scientific journals. He completed his M.Sc. in 1987 and his PhD in 1996, both at the "Vrije Universiteit Brussel". After his PhD, he worked for three years in Switzerland, where he served as a technical co-ordinator of a European research project. Switzerland remains near and dear to his heart, as witnessed by his sabbatical leave during 2009-2010 at the University of Zürich in the research group SEAL.
CodeMine: a Software Analytics Platform for Collecting and Analyzing Engineering Process Data at Microsoft.
Monday March 18, 12:30, Snijderszaal, TU Delft
Presenter: Jacek Czerwonka, Microsoft
The breadth of products Microsoft is involved in developing is unprecedented. From several flavors of operating systems (Windows for phones, PCs and servers), through client and server-side software in a box (Office, Exchange, SQLServer), many types of services (Skype, Bing, Office365), and finally to several types of hardware (Xbox, Surface); all used by hundreds of millions of people across the world. Some of these products have long histories and some are relatively new. And even though all products eventually adhere to the same release policies, there are substantial differences in internal processes followed by teams which in large part depend on the product’s history, their business model, where they are in the lifecycle, and their operating characteristics. Project CodeMine captures data from all the processes, from all products, and for all organizations building them into a common schema. CodeMine enables product teams to drastically shorten the time to answer engineering process questions and substantially improves both the quality and consistency of decision making. In addition, it provides the ability to easily perform comparative analysis of processes across product lines of different characteristics, which makes it invaluable to Microsoft’s empirical software engineering researchers.
This talk will describe the architecture of CodeMine, provide examples of its various uses, and examine several results in depth. This talk will also outline collaboration possibilities that exist for empirical software researchers from outside of Microsoft interested in performing studies based on CodeMine’s data.
BIO:
Jacek Czerwonka is a principal architect in the Tools for Software Engineers team at Microsoft. After spending 10 years working on Windows (mostly in testing), he is currently involved in creating solutions for understanding software engineering organizations and improving engineering processes at Microsoft. His interests revolve around engineering process analysis and improvement, software testing, and data-driven decision making on software projects. He has been involved in CodeMine since its inception.
Games Software Architects Play
Speaker: Philippe Kruchten
Over the years we've identified some of the strategies and tactics software architects use during the design of new, bold, large software-intensive systems: divide-and-conquer, brainstorming, reuse, etc. But we have also observed some strange tactics, biases and reasoning fallacies that creep in and somehow pervert the design process. They go by simple, funny or fancy names: anchoring, red herring, elephant in the room, post hoc ergo propter hoc, non sequitur, argumentum verbosium, etc. This talk will present a little illustrated catalogue of these games, with examples, and show how they sometimes combine into subtle but elaborate political plots. In other words, this talk is about cognitive biases and how they affect software development.
Bio:
Philippe Kruchten is professor of software engineering, in the department of Electrical and Computer Engineering of the University of British Columbia, in Vancouver, where he holds an NSERC Chair in Design Engineering. He teaches software engineering, more specifically software project management, and two interdisciplinary project courses on innovation, entrepreneurship, system engineering. Philippe does research in software architecture and software processes, with a handful of graduate students, and many collaborators around the world. He is known as Director of Process Development (RUP) at Rational Software, and developer of the 4+1 view model.
Venue:
TU Delft, Faculty EEMCS, Mekelweg 4, Delft, Lipkensroom, December 19, 15:30-16:30.
Mining Structured Data in Natural Language Artifacts with Island Parsing
Speaker: Alberto Bacchelli
Software developers are nowadays supported by a variety of tools (e.g., version control systems, issue tracking systems, and mailing list services) that record a wide range of information into archives. Researchers mine these archives (or software repositories) both to support software understanding, development, and evolution, and to empirically validate novel ideas and techniques. Software repositories comprise two types of data: structured data and unstructured data. Structured data, such as source code, has a well-established structure and grammar, and is straightforward to parse and use with computer machinery. Unstructured data, such as documentation, discussions, comments, and customer support requests, comprises a mixture of natural language text, snippets of structured data, and noise. Mining unstructured data is hard, because out-of-the-box approaches adopted from related fields such as Natural Language Processing and Information Retrieval cannot be directly applied in the Software Engineering domain.
In my work I focus on mining unstructured data, because it gives us the chance to gain insights on the human factors revolving around software projects, so that we can better understand and support software development from a different perspective. In particular, I focus on the email communication occurring among people involved in a software project. In this presentation, I will detail our approach---based on island parsing---to recognize, parse, and model fragments of structured information (e.g., source code snippets) embedded in natural language artifacts. We evaluated our approach by applying it to the mailing lists of three open source software systems. The results show that our approach allows the extraction of structured data in messages with high precision and recall, despite the noisy nature of emails. I will discuss how the presented approach can be used to conduct novel forms of software analysis and lead to promising future work.
Bio:
Alberto Bacchelli obtained his Bachelor's and Master's degrees in Computer Science from the University of Bologna, Italy. After graduating, he worked for one year in the largest Italian computing center as a professional software engineer in a large team developing software for universities. He is currently a Ph.D. student at the University of Lugano in the Faculty of Informatics, working under the supervision of Prof. Michele Lanza in the REVEAL (Reverse Engineering, Visualization, Evolution Analysis Lab) research group. He was an intern at Microsoft Research in summer 2012, mentored by Dr. Christian Bird, working on understanding motivations, outcomes, and challenges of tool-supported code reviews. He co-organized the second international workshop on mining unstructured data (MUD'12), held with WCRE 2012. His research interests include empirical software engineering, mining software repositories, unstructured data mining, qualitative research, and development tools.
Leveraging feature models for change impact analysis and design improvement
Speaker: Nicolas Dintzner
Large software systems are subject to constant change, which increases their complexity and thus their maintenance and development costs. For long-lived systems, there is a need to address evolvability early on in the development process.
One way of designing an evolvable system is to describe likely upcoming changes and assess their impact on the current system. While not all changes can be foreseen, some changes can be inferred from what is already subject to change: alternative and optional features of the system.
We propose here an approach to assess the impact of external feature changes on a design, based on a system description using a feature model and the relationships between features and design artifacts. The approach is meant to be lightweight and usable iteratively to quickly improve the capability of a design to withstand such potential changes.
Analyzing the Evolution of Web Services using Fine-Grained Changes
Speaker: Daniele Romano
In the service-oriented paradigm, web service interfaces are considered contracts between web service subscribers and providers. However, these interfaces continuously evolve over time to satisfy changes in the requirements and to fix bugs. Changes in a web service interface typically affect the systems of its subscribers. Therefore, it is essential for subscribers to recognize which types of changes occur in a web service interface in order to analyze the impact on their systems.
In this paper we propose a tool called WSDLDiff to extract fine-grained changes from subsequent versions of a web service interface defined in WSDL. In contrast to existing approaches, WSDLDiff takes into account the syntax of WSDL and extracts the WSDL elements affected by changes and the types of changes. With WSDLDiff we performed a study aimed at analyzing the evolution of web services using the fine-grained changes extracted from the subsequent versions of four real world WSDL interfaces.
The results of our study show that the analysis of the fine-grained changes helps web service subscribers to highlight the most frequent types of changes affecting a WSDL interface. This information can be relevant for web service subscribers who want to assess the risk associated with the usage of web services and to subscribe to the most stable ones.
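To make the idea of element-level WSDL comparison concrete, here is a minimal Python sketch that reports added, removed and changed operations between two versions of a WSDL 1.1 interface. It is not the WSDLDiff tool and does not reproduce its algorithm; the function names, the two inline documents and the `urn:example` namespace are assumptions made for this illustration, and only `portType` operations with their input and output messages are compared.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"  # WSDL 1.1 namespace


def operations(wsdl_text):
    """Map (portType name, operation name) to its (input, output) messages."""
    root = ET.fromstring(wsdl_text)
    found = {}
    for port_type in root.iter(WSDL_NS + "portType"):
        for op in port_type.findall(WSDL_NS + "operation"):
            inp = op.find(WSDL_NS + "input")
            out = op.find(WSDL_NS + "output")
            found[(port_type.get("name"), op.get("name"))] = (
                inp.get("message") if inp is not None else None,
                out.get("message") if out is not None else None,
            )
    return found


def diff_operations(old_text, new_text):
    """List operation-level changes between two WSDL versions."""
    old, new = operations(old_text), operations(new_text)
    changes = [("removed", key) for key in sorted(old.keys() - new.keys())]
    changes += [("added", key) for key in sorted(new.keys() - old.keys())]
    changes += [
        ("changed", key)
        for key in sorted(old.keys() & new.keys())
        if old[key] != new[key]
    ]
    return changes


# Two hypothetical versions of the same interface.
OLD = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="urn:example">
  <portType name="Orders">
    <operation name="getOrder">
      <input message="tns:GetOrderRequest"/>
      <output message="tns:GetOrderResponse"/>
    </operation>
    <operation name="cancelOrder">
      <input message="tns:CancelRequest"/>
    </operation>
  </portType>
</definitions>"""

NEW = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="urn:example">
  <portType name="Orders">
    <operation name="getOrder">
      <input message="tns:GetOrderRequestV2"/>
      <output message="tns:GetOrderResponse"/>
    </operation>
    <operation name="listOrders">
      <output message="tns:ListOrdersResponse"/>
    </operation>
  </portType>
</definitions>"""

for kind, (port_type_name, operation_name) in diff_operations(OLD, NEW):
    print(kind, port_type_name, operation_name)
```

Counting the change kinds over many version pairs gives a simple view of which types of changes occur most often in an interface.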
Measuring library stability through code metrics and binary incompatibilities
Speaker: Steven Raemaekers
Third-party libraries are widely used in today's software systems, especially in Java. Changes in libraries can force software systems to update the places that link to these libraries. A 'stable' library is capable of avoiding such changes in systems that use it, while an 'unstable' library will cause so-called ripple effects sooner. What kind of metrics can we think of that capture this stability of libraries? We look at historic changes in metric values of a library. With Java bytecode, it is possible to exactly determine the size of the impact of each change. Eventually, we want to determine how much it costs if a single line in a library or system is changed, including all ripple effects in other places.
Developer Motivation and the Adoption of Software Engineering Methods
Speaker: Leif Singer
Software engineering research and practice provide a wealth of methods that improve the quality of software and lower the costs of producing it. Even though processes mandate their use, methods are not employed consistently. Software developers and development organizations thus cannot benefit from these methods fully.
There can be diverse reasons for unsatisfactory adoption, the motivations of developers being one of them. We are developing a theoretical framework for supporting the more intrinsic motivations of software developers. This talk discusses our efforts and current status.
Leif Singer is a Ph.D. student in the Software Engineering Group at Leibniz Universität Hannover in Germany. He's interested in the different uses of social software in software development and the effects it can have on developers' behaviors.
Declaratively Defining Domain-Specific Language Debuggers
Speaker: Ricky Lindeman
Tool support is vital to the effectiveness of domain-specific languages. With language workbenches, domain-specific languages and their tool support can be generated from a combined, high-level specification. This paper shows how such a specification can be extended to describe a debugger for a language. To realize this, we introduce a meta-language for coordinating the debugger that abstracts over the complexity of writing a debugger by hand. We describe the implementation of a language-parametric infrastructure for debuggers that can be instantiated based on this specification. The approach is implemented in the Spoofax language workbench and validated through realistic case studies with the Stratego transformation language and the WebDSL web programming language.
Performance Trade-offs in Client-Side Service Delegation
Speaker: Adam Nasr
Service Oriented Architecture, which builds on distributed computing platforms, is increasingly being adopted by organizations in both public and private sectors. Migration from traditional monolithic systems to services, in particular web services, characterizes much of systems evolution today. This paper analyzes some of the performance and modularization problems involved in current service-oriented computing. It investigates under which circumstances the communication between service providers and service consumers can be made more efficient by eliminating certain steps from traditional Remote Procedure Call (RPC) methods. After discussing traditional service invocation and its drawbacks, this paper proposes an alternative approach called Distributed Service Delegates (DSD). DSD is based on emphasizing client-side or local computations. An experiment is designed and implemented to measure the trade-offs between traditional methods, in this case Web services, and the proposed DSD. The results of this experiment are discussed and their implications for future research are indicated.
Spectrum-based Health Monitoring for Self-Adaptive Systems
Speaker: Éric Piel
An essential requirement for the operation of self-adaptive systems is information about their
internal health state, i.e., the extent to which the constituent software and hardware components
are still operating reliably.
Accurate health information enables systems to recover automatically from
(intermittent) failures in their components through selective restarting, or
self-reconfiguration.
This paper explores and assesses the utility of Spectrum-based Fault Localisation (SFL)
combined with automatic health monitoring for self-adaptive systems.
Their applicability is evaluated through simulation of online
diagnosis scenarios, and through implementation in an adaptive
surveillance system inspired by our industrial partner.
The results of the studies performed confirm that the combination of SFL with online
monitoring can successfully provide health information and locate problematic
components, so that adequate self-* techniques can be deployed.
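For readers unfamiliar with SFL, the following minimal Python sketch ranks components by a similarity coefficient between their activity and the observed failures. It uses the Ochiai coefficient, a common choice in the SFL literature; the abstract does not state which coefficient the paper uses, and the spectra, error vector and function name below are made-up examples.

```python
from math import sqrt


def ochiai(spectra, errors):
    """Score each component by Ochiai similarity to the error vector.

    spectra: one row per observation; row[j] is 1 if component j was active.
    errors:  1 for an observation in which a failure was detected, else 0.
    """
    total_failed = sum(errors)
    scores = []
    for j in range(len(spectra[0])):
        active_failed = sum(1 for row, e in zip(spectra, errors) if row[j] and e)
        active_passed = sum(1 for row, e in zip(spectra, errors) if row[j] and not e)
        denominator = sqrt(total_failed * (active_failed + active_passed))
        scores.append(active_failed / denominator if denominator else 0.0)
    return scores


# Hypothetical spectra: 4 observations over 3 components.
spectra = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
]
errors = [1, 1, 0, 0]  # the first two observations showed a failure

scores = ochiai(spectra, errors)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(scores, "most suspicious component:", suspect)
```

A component that is active in every failing observation and in no passing one gets the highest score, which is what makes the ranking usable as a health indicator.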
Using Source Code Metrics to Predict Change-Prone Java Interfaces
Speaker: Daniele Romano
Empirical studies have investigated the use of source code metrics to predict the change- and defect-proneness of source code files and classes. While results showed strong correlations and good predictive power of these metrics, they do not distinguish between interface, abstract or concrete classes. In particular, interfaces declare contracts that are meant to remain stable during the evolution of a software system while the implementation in concrete classes is more likely to change.
This study aims at investigating to what extent the existing source code metrics can be used for predicting change-prone Java interfaces. The correlation between metrics and the number of fine-grained source code changes has been investigated in interfaces of ten Java open-source systems. Then, the metrics have been used to compute models for predicting change-prone Java interfaces. The results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes. This metric also improves the performance of prediction models that classify Java interfaces into change-prone and not change-prone.
Using Vector Clocks to Monitor Dependencies among Services at Runtime
Speaker: Daniele Romano
Service-Oriented Architecture (SOA) enables organizations to react to requirement changes in an agile manner and to foster the reuse of existing services. However, the dynamic nature of Service-Oriented Systems and their agility bear the challenge of properly understanding such systems. In particular, understanding the dependencies among services is a non-trivial task, especially if service-oriented systems are distributed over several hosts and/or use different SOA technologies.
In this study, we propose an approach to monitor dynamic dependencies among services. The approach is based on vector clocks, originally conceived and used to order events in a distributed environment. We use vector clocks to order service executions and to infer causal dependencies among services. In our future work we plan to use this information to study change and failure impact analysis in service-oriented systems.
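The ordering idea behind vector clocks can be sketched in a few lines of Python. This is a generic textbook-style illustration, not the monitoring approach from the study; the service names and helper functions are assumptions made for the example.

```python
def new_clock(services):
    """Start every service's counter at zero."""
    return {name: 0 for name in services}


def tick(clock, service):
    """Record a local execution of `service`."""
    updated = dict(clock)
    updated[service] += 1
    return updated


def merge(local, received, service):
    """Combine the clock carried by an incoming call with the local clock."""
    merged = {name: max(local[name], received[name]) for name in local}
    merged[service] += 1  # receiving the call is itself an event
    return merged


def happened_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(a[n] <= b[n] for n in a) and any(a[n] < b[n] for n in a)


# Hypothetical services in a small order-processing system.
services = ["orders", "billing", "shipping"]
start = new_clock(services)

orders_event = tick(start, "orders")                     # orders executes
billing_event = merge(start, orders_event, "billing")    # orders calls billing
shipping_event = tick(start, "shipping")                 # shipping runs on its own

print(happened_before(orders_event, billing_event))   # causal dependency
print(happened_before(orders_event, shipping_event))  # no ordering either way
print(happened_before(shipping_event, orders_event))
```

When neither clock precedes the other, the two executions are concurrent, so no dependency between those services can be inferred from that pair of events.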
Understanding Service-Oriented Systems Using Dynamic Analysis
Speaker: Tiago Espinha
Service-Oriented Architecture (SOA) enables organizations to react to requirement changes in an agile manner and to foster the reuse of existing services. However, the dynamic nature of Service-Oriented Systems and their agility bear the challenge of properly understanding such systems. In particular, understanding the dependencies among services is a non-trivial task, especially if service-oriented systems are distributed over several hosts and/or use different SOA technologies.
In this study, we propose an approach to monitor dynamic dependencies among services. The approach is based on vector clocks, originally conceived and used to order events in a distributed environment. We use vector clocks to order service executions and to infer causal dependencies among services. In our future work we plan to use this information to study change and failure impact analysis in service-oriented systems.
A Framework-based Runtime Monitoring Approach for Service-Oriented Software Systems
Speaker: Cuiting Chen
The highly dynamic and loosely coupled nature of a service-oriented
software system leads to the challenge of understanding
it. In order to obtain insight into the runtime
topology of a SOA system, we propose a framework-based
runtime monitoring approach to trace the service interactions
during execution. The approach can be transparently
applied to all web services built on the framework and reuses
parts of information and functionality already available in
the framework to achieve our goals.
Grammar Comparison Techniques
Speaker: Vadim Zaytsev
The need to compare languages is commonly encountered when developing grammarware (such as a parser or a manual), validating and fixing it, or assessing its compatibility. When making statements in natural languages, it is very easy and commonly desirable to claim one language to be equal or equivalent to another, or to be a subset or a superset. However, in formal practice these questions are undecidable, so a smart workaround is usually needed. Three different approaches from recent research will be covered: grammar convergence is a lightweight verification method for establishing and maintaining the correspondence between grammar knowledge ingrained in software artefacts; the grammar-based testing approach relies on systematically generating test data and feeding it into parsers in a differential way; test-based nonterminal matching can go beyond nominal matching of syntactic categories by applying grammar-based testing on a per-nonterminal basis.
Reconstructing Complex Metamodel Evolution
Speaker: Sander Vermolen
Metamodel evolution requires model migration. To correctly migrate models, evolution needs to be made explicit. Manually describing evolution is error-prone and redundant. Metamodel matching offers a solution by automatically detecting evolution, but is only capable of detecting primitive evolution steps. In practice, primitive evolution steps are jointly applied to form a complex evolution step, which has the same effect on a metamodel as the sum of its parts, yet generally has a different effect in migration. Detection of complex evolution is therefore needed. In this paper we present an approach to reconstruct complex evolution between two metamodel versions, using a matching result as input. It supports operator dependencies and mixed, overlapping and incorrectly ordered complex operator components. It also supports interference between operators, where the effect of one operator is partially or completely hidden from the target metamodel by other operators.
Using Pattern Recognition Techniques for Server Overload Detection
Speaker: Cor-Paul Bezemer
One of the key factors in customer satisfaction is application performance. To be able to guarantee good performance, it is necessary to take appropriate measures before a server overload occurs. While in small systems it is usually possible to predict server overload using a subjective human expert, an automated overload prediction mechanism is important for ultra-large scale systems, such as multi-tenant Software-as-a-Service (SaaS) systems. An automated prediction mechanism would be an initial step towards self-adaptiveness of such systems, a property which leads to less human intervention during maintenance, resulting in fewer errors and better quality of service.
In order to provide such a prediction mechanism, it is important to have a solid overload detection approach, which is (1) a first step towards automated prediction and (2) necessary for automated testing of a prediction mechanism. In this paper we propose a number of steps which help with the design and optimization of a statistical pattern classifier for server overload detection. Our approach is empirically evaluated on a synthetic dataset.
Software Engineering and Sensor Networks
Speaker: Matthias Woehrle
Wireless sensor networks (WSNs) are a new class of computation systems that enable seamless integration of the physical with the digital world.
Developing applications of sensor network technology is challenged by various intricacies such as unpredictable environmental influences, unreliable communication between sensor nodes and severe resource constraints (processing, memory, bandwidth and energy).
The development of sensor networks software is still rather ad-hoc, lacking suitable methodologies, techniques, and abstractions. These are topics where the software engineering community can help sensor network research in facilitating the development of novel, reliable and dependable WSN applications.
This talk will highlight some of the software engineering challenges in WSNs and present some subjectively selected examples of software engineering research on WSNs.
A Self-Adaptive Deployment Framework for Service-Oriented Systems
Speaker: Sander van der Burg
Deploying components of a service-oriented system in a network of machines is often a complex and laborious process. Usually the environment in which such systems are deployed is dynamic: any machine in the network may crash, network links may temporarily fail, and so on. Such events may render the system partially or completely unusable. If an event occurs, it is difficult and expensive to redeploy the system to take the new circumstances into account.
In this paper we present a self-adaptive deployment framework built on top of Disnix, a model-driven distributed deployment tool for service-oriented systems. This framework dynamically discovers machines in the network and generates a mapping of components to machines based on non-functional properties. Disnix is then invoked to automatically, reliably and efficiently redeploy the system.
Collective Code Bookmarks for Program Comprehension
Speaker: Anja Guzzi
The program comprehension research community
has been developing useful tools and techniques to support
developers in the time-consuming activity of understanding
software artifacts. However, the majority of the tools do not
bring collective benefit to the team: After gaining the necessary
understanding of an artifact (e.g., using a technique based on
visualization, feature localization, architecture reconstruction,
etc.), developers seldom document what they have learned, thus
not sharing their knowledge. We argue that code bookmarking
can be effectively used to document a developer’s findings, to
retrieve this valuable knowledge later on, and to share the
findings with other team members.
We present a tool, called POLLICINO, for collective code
bookmarking. To gather requirements for our bookmarking
tool, we conducted an online survey and interviewed professional
software engineers about their current usage and needs
of code bookmarks. We describe our approach and the tool
we implemented. To assess the tool’s effectiveness, adequacy,
and usability, we present an exploratory pre-experimental user
study we have performed with 11 participants.
Finding Software License Violations Through Binary Code Clone Detection
Speaker: Eelco Dolstra
Software released in binary form frequently uses third-party
packages without respecting their licensing terms. For instance,
many consumer devices have firmware containing the
Linux kernel, without the suppliers following the requirements
of the GNU General Public License. Such license
violations are often accidental, e.g., when vendors receive
binary code from their suppliers with no indication of its
provenance. To help find such violations, we have developed
the Binary Analysis Tool (BAT), a system for code clone
detection in binaries. Given a binary, such as a firmware
image, it attempts to detect cloning of code from repositories
of packages in source and binary form. We evaluate and
compare the effectiveness of three of BAT’s clone detection
techniques: scanning for string literals, detecting similarity
through data compression, and detecting similarity by computing
binary deltas.
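One common way to measure similarity through data compression is the normalized compression distance (NCD): two inputs are considered similar when compressing them together costs little more than compressing the larger one alone. The Python sketch below illustrates that general idea with `zlib`; it is an assumption-laden illustration and not BAT's implementation, and the random byte strings merely stand in for binary content.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the zlib-compressed data, at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical stand-ins for a package binary and an unrelated blob.
rng = random.Random(0)
package = bytes(rng.randrange(256) for _ in range(4000))
unrelated = bytes(rng.randrange(256) for _ in range(4000))

print("identical content:", ncd(package, package))
print("unrelated content:", ncd(package, unrelated))
```

Note that zlib only finds repetitions within a 32 KB window, so for larger inputs a different compressor or a chunked comparison would be needed for the distance to remain meaningful.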
Building a Computing System for the World’s Information
Speaker: JC van Winkel, Site Reliability Engineer, Google Zurich
Thursday April 14, 14:00, Room HB09.130,
EWI, TU Delft, Mekelweg 4, Delft
Google's mission (to organize the world’s information and make it universally accessible and useful) in today's world requires quite some infrastructure. Not only in hardware, but also in software. In this talk we will take a look at some of this infrastructure and some of the projects making use of it. Of course, achieving Google's goal and overcoming the challenges needed to perform at Google scale is only possible with people. Therefore we will also look at the culture within Google and the philosophy around which all engineering is done.
Bio: JC van Winkel has a B.S. and an M.S. in Computer Science (the M.S. from the Vrije Universiteit Amsterdam). From 1990 to 2010 he worked at AT Computing, a small courseware and consulting firm in Nijmegen, the Netherlands. There he taught UNIX and UNIX-related subjects, such as C++. Since November 2010 he has been working as a software engineer at Google Switzerland in Zurich.
Software Bertillonage: Finding the Provenance of an Entity
Speaker: Daniel M. German, Dept of Computer Science, University of Victoria
Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components --- such as external libraries or cloned source code --- is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets.
In this presentation, I will motivate the need for the recovery of the provenance of software entities using metrics that can be computed easily but are effective at reducing the search space, in a manner similar to that of Bertillonage. This was a simple and approximate forensic analysis technique based on biometrics that was developed in 19th-century France before the advent of fingerprints.
As an example, we have developed a fast, simple, and approximate technique called Anchored Signature Matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective.
I will also describe how software Bertillonage is one of the first steps towards solving the problem of license compliance.
A co-Relational Model of Data for Large Shared Data Banks
Speaker: Erik Meijer, Microsoft
Wednesday April 13, 11:00-12:00, Lecture Room C,
EWI, TU Delft, Mekelweg 4, Delft
One of the most hotly debated topics in the world of Cloud-scale data processing today is noSQL versus SQL. Because noSQL lacks a mathematical underpinning, the discussion around each model's strengths and weaknesses often relies on rhetoric and anecdotal evidence, instead of sound technical reasoning. This makes it hard for enterprises and practitioners to make rational decisions around building (new) projects using either SQL or noSQL solutions.
In this talk we present categorical models for noSQL key-value stores and for SQL's foreign/primary-key data stores, and show that SQL and noSQL are in fact mathematical duals; in other words noSQL is really coSQL. We also illustrate how another concept from category theory, namely monads, can be seen as a generalization of the relational algebra and thus provides a uniform algebra for expressing queries over both SQL and noSQL stores.
Just as Codd's discovery of the relational algebra as a formal basis for SQL propelled a billion dollar industry around foreign/primary-key stores, we believe that the formalization of coSQL using category theory will allow the same to happen for coSQL key-value stores.
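The idea of one query algebra over both kinds of store can be sketched in Python with the list monad. The data, the names `unit` and `bind`, and the two query functions are hypothetical illustrations and are not taken from the paper. In the relational layout the child rows point to their parent through a foreign key, while in the key-value layout the parent points to its children.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def unit(x: A) -> list[A]:
    """Wrap a single value in the list monad."""
    return [x]


def bind(xs: Iterable[A], f: Callable[[A], Iterable[B]]) -> list[B]:
    """Monadic bind for lists: apply f to each element and flatten the results."""
    return [y for x in xs for y in f(x)]


# Relational layout (hypothetical data): ratings refer to products by foreign key.
products = [{"id": 1, "title": "Book"}, {"id": 2, "title": "Lamp"}]
ratings = [
    {"product_id": 1, "stars": 5},
    {"product_id": 1, "stars": 3},
    {"product_id": 2, "stars": 4},
]

# Key-value layout (hypothetical data): products refer to their ratings by key.
store = {
    "p1": {"title": "Book", "ratings": ["r1", "r2"]},
    "p2": {"title": "Lamp", "ratings": ["r3"]},
    "r1": {"stars": 5},
    "r2": {"stars": 3},
    "r3": {"stars": 4},
}


def well_rated_relational() -> list[str]:
    """Titles of products with a rating of 4+ stars, via a join on the foreign key."""
    return bind(products, lambda p: bind(ratings, lambda r: (
        unit(p["title"]) if r["product_id"] == p["id"] and r["stars"] >= 4 else [])))


def well_rated_key_value() -> list[str]:
    """The same query, following keys from parent to child instead of joining."""
    parents = [v for v in store.values() if "ratings" in v]
    return bind(parents, lambda p: bind(p["ratings"], lambda key: (
        unit(p["title"]) if store[key]["stars"] >= 4 else [])))
```

Both functions are built from the same two operators and return the same titles; only the direction in which references are followed differs.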
Full paper: Erik Meijer and Gavin Bierman. A Co-Relational Model of Data for Large Shared Data Banks. Communications of the ACM, Vol. 54 No. 4, Pages 49-58, 2011.
Bio: Erik Meijer is head of the Cloud Programmability team at Microsoft, Redmond, USA. He is the (co-)creator of LINQ, Volta, and the Reactive programming framework Rx (Reactive Extensions) for .NET. He has been involved in over 150 software patent applications. In 2009, he was the recipient of the Microsoft Outstanding Technical Leadership Award. Before joining Microsoft more than 10 years ago, he was associate professor at Utrecht University, where he worked on functional programming, in particular on the programming language Haskell.
Empirical Software Engineering for Agent Programming
Speaker: Birna van Riemsdijk
Various programming languages and frameworks for developing agents have been developed by now. These languages are moving more and more from the realm of theory and toy examples to being applied in challenging environments. However, very few systematic studies have been done as to how the language constructs in these languages may be and are in fact used in practice. In this talk I will present recent empirical research on the use of the agent programming language GOAL that is being developed in the MMI group at TU Delft. In particular, we have studied GOAL programs that were developed for the first-person shooter game Unreal Tournament 2004 by students of the first year BSc course on multi-agent systems. This research aims to form the basis for development of programming guidelines and language improvements for GOAL and agent programming languages in general.
Building DSLs for Algorithmic Currency Trading
Speaker: Karl Trygve Kalleberg
Writing low-latency, high-frequency trading systems that are reliable
and predictable currently requires both an in-depth understanding of
the trading domain plus solid skills in engineering asynchronous
systems. The domain expert and systems engineer are rarely the same
person (but in those few exceptional cases, said person is paid
obscenely well). We propose to mend the highly problematic divide
between trader and programmer by offering technically minded traders a
domain-specific language for writing trading strategies along with a
web platform for end-to-end development, testing and deployment of
these strategies.
The talk will include a short demonstration of KolibriFX, our
web-based trading development platform, and it is my intent and hope
that the presentation will spark debate on approaches for gracefully
exposing non-programmers to the wonderful world of asynchronous
programming by way of DSLs.
A Pragmatic Perspective on Software Visualization
Speaker: Arie van Deursen
For software visualization researchers taking the pragmatic philosophical
stance, the ultimate measure of success is adoption in industry.
For you as researcher, what can be more satisfying than enthusiastic
developers being able to work better and more efficiently
thanks to your beautiful visualization of their software?
One of the aims of this talk is to reflect on factors affecting impact
in practice of software visualization research. How does rigorous
empirical evaluation matter? What is the role of foundational
research that does not subscribe to the philosophy of pragmatism?
Can we make meaningful predictions of adoption in practice if this
takes 10 years or more?
During the talk, I will illustrate the dilemmas, opportunities, and
frustrations involved in trying to achieve practical impact with examples
drawn from my own research in such areas as software architecture
analysis, documentation generation, and Web 2.0 user
interface reverse engineering.
I will also shed light on some of my most recent research activities,
which include work in the area of spreadsheet comprehension.
This is research that we conduct with a major (Dutch)
financial asset management firm. Our work consists of the identification
of information needs for professional spreadsheet users, a
visualization to address these needs, and an evaluation of this visualization
with practitioners conducting real-life spreadsheet tasks.
Throughout the talk, I will encourage the audience to engage in
the discussion, and contribute their own perspectives on the issues
that I raise in my talk.
On Technical Debt
Speaker: Philippe Kruchten
The technical debt metaphor is gaining significant traction in the software development community as a way to understand
and communicate issues of intrinsic quality, value, and cost. The idea is that developers sometimes accept compromises in
a system in one dimension (e.g., modularity) to meet an urgent demand in some other dimension (e.g., a deadline), and that
such compromises incur a “debt” on which “interest” has to be paid and which should be repaid at some point for the long-term
health of the project. Little is known about technical debt, beyond a wide range of feelings and opinions.
Software developers and corporate managers frequently disagree about important decisions regarding how to invest scarce
resources in development projects, especially in internal quality aspects that are crucial to system sustainability, but that are
largely invisible to management and customers, and that do not generate short-term revenue. Among these properties are
code and design quality and documentation. Engineers and developers often advocate for such investments, but executives
question their value and frequently decline to approve them, to the long-term detriment of software projects. The situation is
exacerbated in projects that must balance short deadlines with long-term sustainability.
There is a key difference between debt that results from employing bad engineering practices and debt that is incurred through
intentional decision-making in pursuit of a strategic goal. While an appealing metaphor, theoretical foundations for identifying
and managing technical debt are lacking. In addition, while the term was originally coined in reference to coding practices, today
the metaphor is applied more broadly across the project lifecycle and may include practices of refactoring, test-driven development,
iteration management and software craftsmanship.
The concept of technical debt could provide a basis on which the various parties could reason about the best course of action
for the evolution of a software product.
In this brief presentation I will present the metaphor and its limitations, give examples of various types of technical debt and how
it can be estimated, measured and maybe tackled. I will also discuss a possible research agenda on technical debt.
Understanding Plug-in Test Suites from an Extensibility Perspective
Speaker: Michaela Greiler
Plug-in architectures enable developers to build extensible software products. Such products are assembled from plug-ins, and their functionality can be enriched by adding or configuring plug-ins. The plug-ins themselves also consist of multiple plug-ins and offer dedicated points through which their functionality can be influenced. A well-known example of such an architecture is Eclipse, best known for its use to create a series of extensible IDEs. To test systems built from plug-ins, developers use extensive automated test suites. Unfortunately, current testing tools offer little insight into which of the many possible combinations of plug-ins and plug-in configurations are actually tested.
To remedy this problem, we propose three architectural views that provide an extensibility perspective on plug-in based systems and their test suites. The views combine static and dynamic information on plug-in dependencies, extension initialization, and extension usage. The views are implemented in ETSE, the Eclipse Plug-in Test Suite Exploration tool. We evaluate the proposed views by analyzing eGit and Mylyn, two open source Eclipse plug-ins.
Replaying Past Changes on Multi-developer Projects
Speaker: Lile Hattori
“What was I working on before the weekend?” and “What were the members of my team working on during the last week?” are questions that developers frequently ask. They can be answered if one keeps track of who changes what in the source code. In this work, we present Replay, a tool that allows one to replay past changes as they happened at a fine-grained level, where a developer can watch what she has done or understand what her colleagues have done in past development sessions. With this tool, developers are able to not only understand what sequence of changes brought the system to a certain state (e.g., the introduction of a defect), but also deduce the reasons why their colleagues performed those changes. Another application of such a tool is discovering the changes that broke a developer's code.
Automating System Tests Using Declarative Virtual Machines
Speaker: Eelco Dolstra
Automated regression test suites are an essential software engineering practice:
they provide developers with rapid feedback on the impact of changes to a system's source code. The inclusion of a test case in an automated test suite requires that the system's build process can automatically provide all the environmental dependencies of the test. These are external elements necessary for a test to succeed, such as shared libraries, running programs, and so on.
For some tests (e.g., a compiler's), these requirements are simple to meet.
However, many kinds of tests, especially at the integration or system level, have complex dependencies that are hard to provide automatically, such as running database servers, administrative privileges, services on external machines or specific network topologies. As such dependencies make tests difficult to script, they are often only performed manually, if at all. This particularly affects testing of distributed systems and system-level software.
This paper shows how we can automatically instantiate the complex environments necessary for tests by creating (networks of) virtual machines on the fly from declarative specifications. Building on NixOS, a Linux distribution with a declarative configuration model, these specifications concisely model the required environmental dependencies. We also describe techniques that allow efficient instantiation of VMs. As a result, complex system tests become as easy to specify and execute as unit tests. We evaluate our approach using a number of representative problems, including automated regression testing of a Linux distribution.
A Metric for Assessing Component Balance of Software Architectures
Speaker: Eric Bouwers
The decomposition of a software system into components is a major decision in a software architecture, having a strong influence on many of its quality aspects. A system’s analyzability, in particular, is influenced by its decomposition into components. But into how many components should a system be decomposed? And how should the elements of the system be distributed over those components?
In this paper, we set out to find an answer to these questions by capturing them jointly inside a metric called Component Balance. We calibrate this generic metric with the help of a repository of industrial and open source systems. We report on an empirical study that demonstrates that the metric is strongly correlated with ratings given by experts. In a case study we show that the metric provides relevant results in various evaluation scenarios.
Supporting Professional Spreadsheet Users by Generating Leveled Dataflow Diagrams
Speaker: Felienne Hermans
Thanks to their flexibility and intuitive programming model, spreadsheets are widely used
in industry, often for business-critical applications. Similar to software developers,
professional spreadsheet users demand support for maintaining and transferring their spreadsheets.
We first studied the problems and information needs
of professional spreadsheet users by means of a survey conducted at a
large financial company. Based on these needs, we then present an approach that extracts this
information from spreadsheets and provides it in a compact and easy to understand way
using leveled dataflow diagrams. Our approach comes with three different views on the dataflow and
allows the user to analyze the information in a top-down fashion also using slicing techniques.
To evaluate the usefulness of the proposed approach, we conducted a series of interviews
as well as nine case studies in an industrial setting. The results of the evaluation clearly indicate
the demand for and usefulness of our approach in easing the understanding of spreadsheets.
Extending Code Generators by Transforming Generated Code
Speaker: Rob Economopoulos
Code generated from high-level specifications often requires
modification before deployment, but no approach exists that allows an
application programmer to make these modifications in a safe, reliable,
and systematic way. Hence, incomplete or incorrect generators cannot be
changed easily, domain evolution is not supported, and experimenting
with new features is not possible. In this talk, I'll show how to modify
and extend code generators indirectly, by automatically transforming
the generated code. Two related techniques to enable automatic
code customizations are explored: i) syntactic customization patches,
whose implementation required Stratego to be extended to allow
the embedding of concrete syntax in concrete syntax and ii) semantic
join points, implemented as Java annotations and woven into the
generated code using AspectJ. Both approaches are based on ideas
from aspect-oriented programming, but have opposite characteristics.
Customization patches do not require changes to the underlying code
generator, but can suffer from the fragile point-cut problem, while
semantic join points are stable but require support from the generator.
I will describe and evaluate the application of these techniques in the
customization of WebDSL, a domain-specific language for modeling
dynamic, data-rich web applications.
Combining Micro-Blogging and IDE Interactions to Support Developers in their Quests
Speaker: Anja Guzzi
Software engineers spend a considerable amount of time on program comprehension. Although vendors of Integrated Development Environments (IDEs) and analysis tools address this challenge, current support for reusing and sharing program comprehension knowledge is limited. As a consequence, developers have to go through the time-consuming program understanding phase multiple times, instead of recalling knowledge from their own or others’ past program comprehension activities.
In this paper, we present an approach to making the knowledge gained during the program comprehension process accessible, by combining micro-blog messages with interaction data automatically collected from the IDE. We implemented the approach in an Eclipse plugin called James and performed a first evaluation of the approach's effectiveness, assessing the nature and usefulness of the collected messages, as well as the added benefit of combining them with interaction data.
Enabling Multi-Tenancy: An Industrial Experience Report
Speaker: Cor-Paul Bezemer
Multi-tenancy is a relatively new software architecture principle in the realm of the Software as a Service (SaaS) business model. It makes full use of economies of scale, as multiple customers – “tenants” – share the same application and database instance. All the while, the tenants enjoy a highly configurable application, making it appear that the application is deployed on a dedicated server. The major benefits of multi-tenancy are increased utilization of hardware resources and improved ease of maintenance, resulting in lower overall application costs, making the technology attractive for service providers targeting small and medium enterprises (SME). Therefore, migrating existing single-tenant to multi-tenant applications can be interesting for SaaS software companies. In this paper we report on our experiences with reengineering an existing industrial, single-tenant software system into a multi-tenant one using a lightweight reengineering approach.
Automated Deployment of a Heterogeneous Service-Oriented System
Speaker: Sander van der Burg
Deployment of a service-oriented system
in a network of machines is often complex and laborious.
In many cases components implementing a service have
to be built from source code for the right target platform,
transferred to the right machines with the right capabilities
and activated in the right order.
Upgrading a running system is even more difficult as this may break
the running system and cannot be performed atomically.
Many approaches that deal with the complexity of a distributed deployment process
only support certain types of components or specific environments, while
general solutions lack certain desirable non-functional properties,
such as atomic upgrading.
This paper shows Disnix, a deployment tool which
allows developers and administrators to reliably deploy, upgrade
and roll back a service-oriented system consisting of various
types of components in a heterogeneous environment
from declarative specifications.
Topic Clouds Extraction from End-user Documents
Speaker: Arsen Storojev
Word documents and spreadsheets are widely used in organizations all over the world. Domain information contained in these documents could be of great value for requirements engineers when collecting requirements for new software systems at the organizations. Visualizing domain context is one of the possible ways to communicate it to the requirements engineers. The master’s project of Arseni Storojev (supervisor: Felienne Hermans) explores the possibility of creating “Topic Clouds” – a visual tag-cloud-like representation of the important topics, extracted from end-user documents. Latent Dirichlet Allocation and an extended graph-based keyphrase extraction algorithm TextRank are selected for this task. The topic clouds generated by the algorithms are evaluated by experiments with domain experts.
Testing in the Google environment
Speaker: James A. Whittaker
Google releases software many times every day. Ever wonder what it takes to test in such an environment? James Whittaker talks about test methodology, tools and innovation surrounding the discipline of quality assurance at Google where testers are far outnumbered by developers. Specifically he will present how the webapp-chrome-chromium stack is tested to ensure that Google apps work well on Chrome browser and Chromium operating system. During the talk he presents how Google treats testing activity much like a hospital triages emergency room patients and how game playing metaphors have inspired the development of next generation test automation tools.
Speaker Bio
Dr. Whittaker is currently the Engineering Director over engineering tools and testing for Google's Seattle and Kirkland offices. He holds a PhD in computer science from the University of Tennessee and is the author or coauthor of four acclaimed textbooks: How to Break Software, How to Break Software Security (with Hugh Thompson), and How to Break Web Software (with Mike Andrews). His latest is Exploratory Software Testing: Tips, Tricks, Tours and Techniques to Guide Test Design, and he has authored over fifty peer-reviewed papers on software development and computer security. He holds patents on various inventions in software testing and defensive security applications and has attracted millions in funding, sponsorship, and license agreements while a professor at Florida Tech. He has also served as a testing and security consultant for dozens of companies and spent 3 years as an architect at Microsoft.
Requirements for collaborative software in global software engineering
Speaker: Martijn Reijerse
In global software engineering (GSE), coordination of a software development project can be difficult. Software tools can provide support for coordination and collaboration among developers. Given the many differences between the tools and the scarcity of studies that treat multiple requirements, a study to identify the requirements for such tools is desirable. The research question that we aim to answer is: 'Which requirements should be fulfilled by collaborative software supporting synchronous collaboration?'. To answer this question, this study critically selects a range of studies, presents an analysis of the selected studies and displays a derived list of requirements. This list is an overview of all presented requirements in the selected studies. Subsequently, the requirements will be validated with practical experiences and rejected requirements will be removed, in order to guarantee the validity of this study. Furthermore, the contribution of this study to GSE is investigated by measuring to what extent the requirements are already implemented in currently used collaborative software. The outcome is that not all validated requirements are implemented. As a result, this study can contribute to the development and selection of collaborative software, because the results are based on existing case studies and data is validated with practical experiences.
Reducing Maintenance Effort through Generic Recording of Deployed Software Operation: A Field Study
Speaker: Henk van der Schuur
Knowledge of in-the-field software operation is acquired unsophisticatedly: acquisition processes are ad hoc and application-specific, and are only triggered when end-users experience severe software operation failures. Vendors that do acquire such knowledge structurally are often unsuccessful in visualizing it effectively. The lack of a generic software operation knowledge acquisition approach and absence of visualization tools are only two causes of an explosion in maintenance costs. After all, maintenance effort is directly proportional to the time needed to identify software operation failures. We propose a technique for software operation knowledge acquisition and presentation by generic recording and visualization of deployed software operation. A prototype tool that implements this technique is presented, as well as an empirical study that evaluates this tool on three widely-used software packages. Results show that the technique is effective and is expected to reduce software maintenance effort and increase end-users' software operation comprehension.
Bio:
Henk is a PhD student in the field of software engineering and feedback.
His research is directed at revealing the potential role of software operation knowledge (SOK; knowledge of software operating in the field) within the organizations of software vendors, and at identifying mechanisms for identification, acquisition, integration, presentation and utilization of SOK in these organizations.
Do we test to find failures or faults? A Diagnostic Approach to Test Prioritization
Speaker: Alberto Gonzalez
Testing typically triggers fault diagnosis to localize the detected failures.
However, current test prioritization algorithms are tuned for
failure detection rate rather than providing diagnostic information.
Consequently, unnecessary effort might be spent to localize the faults.
We present a dynamic test prioritization algorithm that trades fault detection
rate for diagnostic performance, minimizing overall testing and diagnosis cost.
The algorithm exploits pass/fail information from each test to select the next test,
optimizing the diagnostic information produced per test.
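The idea of choosing the next test by its diagnostic value can be sketched as a greedy information-gain heuristic in Python. This is a stand-in, not the algorithm from the talk. It assumes a single faulty component, a prior probability per component, and that a test fails exactly when it covers the faulty component; the component and test names are hypothetical.

```python
import math


def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (in bits) of the fault-location distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def update(probs: dict[str, float], covered: set[str], failed: bool) -> dict[str, float]:
    """Keep the candidates consistent with the test outcome and renormalize."""
    keep = {c: p for c, p in probs.items() if (c in covered) == failed}
    total = sum(keep.values())
    return {c: p / total for c, p in keep.items()}


def expected_entropy(probs: dict[str, float], covered: set[str]) -> float:
    """Entropy expected to remain after running a test with the given coverage."""
    p_fail = sum(p for c, p in probs.items() if c in covered)
    p_pass = sum(p for c, p in probs.items() if c not in covered)
    remaining = 0.0
    for failed, p_outcome in ((True, p_fail), (False, p_pass)):
        if p_outcome > 0:
            remaining += p_outcome * entropy(update(probs, covered, failed))
    return remaining


def next_test(probs: dict[str, float], tests: dict[str, set[str]]) -> str:
    """Pick the test whose outcome is expected to leave the least uncertainty."""
    return min(tests, key=lambda name: expected_entropy(probs, tests[name]))


# Hypothetical coverage: which components each test exercises.
tests = {"t1": {"a", "b", "c", "d"}, "t2": {"a"}, "t3": {"a", "b"}}
prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
```

With this data, `next_test(prior, tests)` prefers `t3`, which splits the candidates evenly, over `t1`, which covers everything and therefore always fails. After each outcome, `update` narrows the distribution before the next selection.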
Supporting Collaboration Awareness with Real-time Visualization of Development Activity
Speaker: Anja Guzzi
In the context of multi-developer projects, where
several people are contributing code, developers must deal
with concurrent development. Collaboration among developers
assumes a fundamental role, and failing to address it can
result, for example, in shipping delays. We argue that tool
support for collaborative software development augments the
level of awareness of developers and, consequently, helps them
to collaborate and coordinate their activities.
In this context, we present an approach to augment
awareness by recovering development information in real time and
broadcasting it to developers in the form of three lightweight
visualizations. Scamp, the Eclipse plug-in supporting this, is
part of our Syde tool to support collaboration. We illustrate the
usage of Scamp in the context of two multi-developer projects.
Supporting Collaboration with Synchronous Changes
Speaker: Lile Hattori (University of Lugano, Switzerland)
In a multi-developer project, team collaboration is essential for the success of the project. When team members are spread across different locations, individual awareness of the activity of others is compromised. Consequently, collaboration becomes a challenge. Studies have shown that some of the problems that arise due to low levels of awareness are the decline of the willingness of developers to help others and of the ability to spot specialists. Concomitantly, studies have strongly indicated the need for tools and techniques to increase team awareness and support collaboration.
In this talk we present Syde, a tool that promotes collaboration by enriching the Eclipse Integrated Development Environment (IDE) with awareness information. As opposed to the state-based approaches proposed so far, Syde uses an operation-based technique to continuously track source code changes at a fine-grained level. This allows Syde to have changes as first-class entities instead of recovering them through differencing algorithms, as state-based approaches do. We use the precise information about ongoing changes to build views and visual cues within the Eclipse IDE. The goal is to inform developers about changes that are of interest to them, potential conflicts, and code ownership.
Supporting Developers with Natural Language Queries
Speaker: Harald Gall (University of Zurich, Switzerland)
The feature list of modern IDEs is steadily growing and mastering these tools becomes more and more demanding, especially for novice programmers.
IDEs often cannot directly answer the questions that arise during program comprehension tasks. Developers have to map their questions to multiple queries that can be answered only by combining several tools and examining the output of each of them manually. Existing approaches are limited to a set of predefined, hardcoded questions, or require learning a specific query language. We present a framework to query for information about a software system using guided-input natural language resembling plain English. We model data extracted by classical software analysis tools with an OWL ontology and use Semantic Web knowledge processing technologies to query it.
In a case study we demonstrate how our framework can be used to answer queries about static source code information for program comprehension purposes.
The Missing Link: Unifying Remote Data and Services
Speaker: William Cook (University of Texas, Austin)
Most large-scale applications integrate remote services and/or transactional databases. Yet building software that efficiently invokes distributed services and accesses relational databases is still quite difficult. Existing approaches to these problems are based on the Remote Procedure Call (RPC) and Object-Relational Mapping (ORM).
RPCs have been generalized to distributed object systems with remote proxies, a kind of remote object reference. ORM tools generally support a form of query sublanguage for efficient object selection.
The last 20 years have produced a long litany of technologies based on these concepts, including ODBC, CORBA, DCE, DCOM, RMI, DAO, OLEDB, SQLJ, JDBC, EJB, JDO, Hibernate, XML-RPC, Web Services and LINQ. Even with these technologies, complex design patterns for service facades and/or bulk data transfers must be followed to optimize communication between client and server or client and database, leading to programs that are difficult to modify and maintain. While significant progress has been made, there is no widely accepted solution or even agreement about what the solution should look like.
In this talk I present a new unified approach to invocation of distributed services and data access. The solution involves a novel control flow construct that partitions a program block into remote and local computations, while efficiently managing the communication between them. The solution does not require proxies or an embedded query language. Although the result itself is elegant and useful, what is more significant is the realization that the original problems cannot be solved using existing programming language constructs and libraries. This work calls into question our assumption that general-purpose programming languages are truly general-purpose.
ASPIC: Awareness-based Support Project for Interpersonal Collaboration
Speakers: Kevin Dullemond and Ben van Gameren
On the 1st October 2009 we started our PhD track. We will research what role awareness plays in collaborative software development, especially when the development team is distributed over multiple locations, and how this can be supported with technology. In this talk we will introduce this line of research and provide an example of a tool we developed which supports the awareness of ongoing conversations in a distributed setting with reduced effort (OCL - Open Conversation List).
Detecting (r)evolution events from software product measurement streams
Speaker: Jose Pedro Correia (Software Improvement Group, Amsterdam)
Collecting product metrics during the development or maintenance process is an increasingly common practice to monitor and thus have more control over both the progress and the quality of a software product. An important challenge remains in interpreting the data as it is being collected and transforming it into actionable information. We present an approach for discovering significant events in the production process from the associated stream of measurement data. At the heart of our approach lies the view of measurement data streams as functions for which derivatives can be calculated. A data point is then selected or not as an event based on those values and the history of activity until that point. The technique described has been used to put in place an 'alert service' for 54 systems monitored by the Software Improvement Group.
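The idea of treating a measurement stream as a function whose derivative can be inspected can be illustrated with a minimal sketch. It assumes a discrete first difference as the derivative and a simple selection rule: a point counts as an event when its change exceeds a multiple of the mean absolute change seen so far. The name `detect_events`, the `factor` parameter and the rule itself are illustrative assumptions, not the technique used in the alert service.

```python
def detect_events(measurements, factor=3.0):
    """Return indices whose change from the previous point is unusually large.

    Hypothetical rule: a point is an event when the absolute first difference
    exceeds `factor` times the mean absolute difference observed before it.
    """
    events = []
    history = []  # absolute first differences seen so far
    for i in range(1, len(measurements)):
        delta = abs(measurements[i] - measurements[i - 1])
        if history:
            baseline = sum(history) / len(history)
            # Only flag a change when there is a non-zero history to compare with.
            if baseline > 0 and delta > factor * baseline:
                events.append(i)
        history.append(delta)
    return events


# Example: a metric (e.g. lines of code in thousands) with one sudden jump.
print(detect_events([100, 102, 101, 103, 150, 151]))
```

Because the baseline is computed only from earlier points, the decision for each data point depends on the history of activity up to that point, as in the description above.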
What your IDE could do once you understand your code
Speaker: Arie van Deursen
A significant part of today's program comprehension research addresses the long and complicated road a developer needs to travel to understand a given piece of code. But perhaps the best way to shorten this road, is by focusing on the eventual moment of enlightenment that marks the end of this road, when the developer actually understands the code and is about to make the required change.
Can we record this valuable moment of true understanding? Can the IDE know which methods, classes, execution traces, test cases, diagrams, or other artifacts contributed to this understanding? Are there light-weight mechanisms to ask the developer to record his understanding, for example via tagging, micro-blogging, or selection of a visualization that most accurately captures the understanding obtained? And is there a way to deal with non-monotonic understanding, in which some of the developer's earlier insights turn out to be false?
At the moment, we don't have answers to these questions. But, together with the audience, we will explore what results have been achieved so far, and which challenges need to be addressed to find the required answers. Furthermore, we will reflect on the simplest possible IDE that could offer this type of support for recording actual understanding, and investigate to what extent Web 2.0 based technologies can be used to realize such an IDE.
Managing Code Clones Using Dynamic Change Tracking and Resolution
Speaker: Andy Zaidman
Code cloning is widely recognized as a threat to the
maintainability of source code. As such, many clone detection
and removal strategies have been proposed. However,
some clones often cannot be removed easily, so other strategies,
based on clone management, need to be developed. In
this paper we describe a clone management strategy based
on dynamically inferring clone relations by monitoring clipboard
activity. We introduce CLONEBOARD, our Eclipse
plug-in implementation that is able to track live changes to
clones and offers several resolution strategies for inconsistently
modified clones. We perform a user study with seven
subjects to assess the adequacy, usability and effectiveness
of CLONEBOARD, the results of which show that developers
actually see the added value of such a tool but have strict
requirements with respect to its usability.
Test Generation with Grammars and Covering Arrays
Speaker: Paul Strooper
Abstract:
Grammars and covering arrays have seen extensive use in test generation.
A covering-array algorithm takes a list of domains and generates a subset of the Cartesian product of the domains. A grammar-based test generation (GBTG) algorithm takes a grammar G and generates a subset of the language accepted by G. Covering arrays and GBTG are usually applied independently. We show that CFG rules and covering array specifications can be freely intermixed, with precise, intuitive and efficient generation. The potential benefits for automated test generation are significant. We present an approach for "tagging" grammars with specifications for mixed-strength covering arrays, a generalisation of conventional covering arrays. We will demonstrate a prototype tool for generating test cases from tagged grammars.
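To make the notion of a covering array concrete, here is a minimal sketch of a conventional strength-2 (pairwise) generator. It uses a simple greedy selection over the full Cartesian product, which is an assumption made for illustration only; the function name `pairwise_cover` is hypothetical, and neither mixed-strength arrays nor the tagging of grammars is shown.

```python
from itertools import combinations, product


def pairwise_cover(domains):
    """Greedily pick rows from the Cartesian product until every pair of
    values from two different domains appears together in some row."""
    # Every (column i, value a, column j, value b) combination to be covered.
    uncovered = set()
    for (i, di), (j, dj) in combinations(enumerate(domains), 2):
        for a, b in product(di, dj):
            uncovered.add((i, a, j, b))

    def pairs_of(row):
        return {(i, row[i], j, row[j])
                for i, j in combinations(range(len(row)), 2)}

    rows = []
    candidates = list(product(*domains))
    while uncovered:
        # Choose the row that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda r: len(pairs_of(r) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best)
    return rows


for row in pairwise_cover([[0, 1], [0, 1], [0, 1]]):
    print(row)
```

The result is a subset of the Cartesian product that still exercises every pair of values, which is why covering arrays are attractive when the full product is too large to test.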
Biography:
Paul Strooper is a Professor in the School of ITEE at The University of Queensland. He received the BMath and MMath degrees in Computer Science from the University of Waterloo, and the PhD degree in Computer Science in 1990 from the University of Victoria. His main research interest is Software Engineering, especially software specification, verification, and testing. He has had substantial interaction with industry through collaborative research projects, training and consultation in the area of software verification and validation. He was one of the program co-chairs for the 2002 Asia-Pacific Software Engineering Conference (APSEC), one of the General Chairs of the 2009 Australian Software Engineering Conference (ASWEC) and the program chair for ASWEC in 2004 and 2005. He is a member of the Steering Committees for ASWEC and APSEC, and a member of the editorial board of the IEEE Transactions on Software Engineering and the Journal of Software Testing, Verification and Reliability.
Supporting Collaboration Awareness in Multi-developer Projects
Speaker: Anja Guzzi
Teamwork is necessary to produce large software systems in a reasonable amount of time. A team of developers working on the same project must deal with concurrent development. Collaboration among team members assumes a fundamental role during the whole development process of systems. Failing to appropriately take care of collaboration aspects, such as awareness, communication and synchronization, can result in the delay of a whole project. However, the negative consequences of uncoordinated concurrent development can be reduced with tool support for collaborative software development. We developed Scamp, an Eclipse Plug-in conceived to support collaboration awareness through visualization. Scamp is built on top of Syde, which provides an environment for synchronous development. Relying on the underlying structure, Scamp visualizes changes in a system as they happen in three different ways: a distinctive mark on changed entities, a Tag Cloud and a “Buckets view”.
Criteria for the Evaluation of Implemented Architectures
Speaker:
Eric Bouwers
Software architecture evaluation methods aim at identifying potential
maintainability problems for a given architecture. Several of these
methods exist, which typically prescribe the structure of the
evaluation process. Often left implicit, however, are the concrete
system attributes that need to be studied in order to assess the
maintainability of implemented architectures.
To determine this set of attributes, we have performed an empirical
study on over 40 commercial architectural evaluations conducted during
the past two years as part of a systematic "Software Risk Assessment".
We present this study and we explain how the identified attributes can be
projected on various architectural system properties, which provides an
overview of criteria for the evaluation of the maintainability of implemented software architectures.
Traits at Work
Speaker:
Stephane Ducasse
Traits have been proposed as a mechanism to compose and share behavioral units between distinct class hierarchies. They are an alternative to multiple inheritance, the most significant difference being that name conflicts must be explicitly resolved by the trait composer. Traits are recognized for their potential in supporting better composition and reuse. They have been integrated into a significant number of languages, such as Perl 6, Slate, Squeak, DrScheme OO and Fortress (SUN Microsystems). Although originally designed in a dynamically typed setting, several type systems have been built for Traits (Fisher, Smith, Liquori, Reppy).
We will present traits, their applications and their evolution.
Link to research:
http://www.iam.unibe.ch/~scg/Research/Traits/index.html
A quick look at two research projects at TUT
Speaker:
Tarja Systa
Analyzing runtime behavior with Bebop and supporting interactive model transformations with DReAMT
We propose an approach, called behavioral profiles, to model and validate architecturally significant behavior. Behavioral profiles are used to illustrate role-based behavioral rules using UML2 sequence diagram notation. With the Bebop tool, runtime behavior can be analyzed and validated against the profiles. The main goal is to focus on certain, architecturally interesting behaviors only, instead of analyzing the whole event trace. Bebop can identify whether the behavioral rules, given as behavioral profiles, have been followed in the implementation.
In the DReAMT project, we have studied model-driven software development and analysis. Instead of assuming a set of predefined model transformations, we rather aim at developing transformations incrementally, gradually refining them into more complete ones as the body of knowledge of the domain grows. To achieve this, we describe discovered partial and incomplete transformations as patterns and iteratively refine them. When applying the transformations, we provide the user a possibility to make decisions and influence the transformation steps. The decisions are made as needed during the transformation and not in a separate step.
New Uses of Simulation in Distributed System Engineering
Speaker:
Alexander Wolf
Simulation has been used by software engineers for many years to study
the functionality and performance of complex distributed system designs.
For example, they are used to understand network protocols, tune
distributed systems, and improve distributed algorithms. They are
appealing to engineers because of their inherent efficiency and
scalability. Unlike many other development artifacts, simulations seem
to be used, and therefore well maintained, throughout the development
process, both as early design tools and as late evaluation tools. Given
the effort invested in the construction and maintenance of simulations,
and the degree to which developers trust in them, we wonder whether
there are other purposes to which they can be put.
In this talk I present two such uses, one to increase the power of
large-scale distributed experimentation and the other to develop a
rigorous testing method for distributed systems.
Improving Web Application Security and Reliability through Program Analysis
Speaker:
Alessandro Orso
Over the past decade, the popularity of web applications has steadily
grown. Nowadays, millions of users utilize web applications to access
a multitude of services, such as online banking, e-shopping, and
gaming. Because web applications have become such an essential part of
our daily lives, it is essential to devise effective quality assurance
techniques for these applications. Although there has been a great
deal of research in software quality assurance, many techniques that
work effectively on traditional software and on simple web
applications are inadequate when used on modern, highly dynamic web
applications. Part of the reason for this inadequacy is that
traditional abstractions used in these techniques, such as
control-flow, data-flow, and interfaces, are fairly different in web
applications. In this talk, we present a set of program-analysis based
techniques that we developed to address some of the shortcomings of
existing approaches. We also show how our techniques can support and
improve different quality assurance techniques for web applications.
Finally, we discuss the results of an empirical evaluation of our
techniques performed on a set of real web applications, in which we
assess their effectiveness and practical applicability.
Invariant-Based Automatic Testing of AJAX User Interfaces
Speaker:
Ali Mesbah
AJAX-based Web 2.0 applications rely on stateful asynchronous
client/server communication, and client-side runtime manipulation of the
DOM tree. This not only makes them fundamentally different from
traditional web applications, but also more error-prone and harder to
test. We propose a method for testing AJAX applications automatically,
based on a crawler to infer a flow graph for (client-side) user
interface states. We identify AJAX-specific faults that can occur in
such states (related to DOM validity, error messages, discoverability,
back-button compatibility, etc.) as well as DOM-tree invariants that can
serve as oracles to detect such faults. We implemented our approach in
ATUSA, a tool offering generic invariant checking components, a
plugin-mechanism to add application-specific state validators, and
generation of a test suite covering the paths obtained during crawling.
Studying the Relation Between Coding Standard Violations and Known Faults
Speaker:
Cathal Boogerd
In spite of the widespread use of coding standards and tools enforcing
their rules, there is little empirical evidence supporting the intuition
that they prevent the introduction of faults in software. In previous
work, we performed a pilot study to assess the relation between rule
violations and actual faults, using the MISRA C 2004 standard on an
industrial case. In this talk, we investigate three different aspects
of the relation between violations and faults on a larger case study,
and compare the results across the two projects. We find that 10 rules
in the standard are significant predictors of fault location.
Evaluating Visualization
Speaker:
Bas Cornelissen
Visualization is a common technique to support software understanding,
and many variants have been proposed in the literature in the past
decades. Yet, when it comes down to their evaluation, one typically
resorts to anecdotal evidence rather than actual human subjects. Until
recently, the same was true for our own visualization tool (the one with
the exotic and colorful look).
In this talk, we present the design of a controlled experiment to
quantitatively measure the added value of visualization for typical
software maintenance tasks. We then report the results of the actual
application of this design on our visualization tool and 24 human
subjects.
Software Deployment in a Dynamic Cloud: From Device to Service Orientation in a Hospital Environment
Speaker:
Sander van der Burg
Hospital environments are currently primarily device-oriented: software
services are installed, often manually, on specific devices. For
instance, an application to view MRI scans may only be available on a
limited number of workstations. The medical world is changing to a
service-oriented environment, which means that every software service
should be available on every device. However, these devices have widely
varying capabilities, ranging from powerful workstations to PDAs, and
high-bandwidth local machines to low-bandwidth remote machines. To
support running applications in such an environment, we need to treat
the hospital machines as a cloud, where components of the application
are automatically deployed to machines in the cloud with the required
capabilities and connectivity. In this talk, we suggest an
architecture for applications in such a cloud, in which components are
reliably and automatically deployed on the basis of a declarative model
of the application using the Nix package manager.
Execution Models to Describe Large Software-Intensive Systems
Speaker: Trosky B. Callo
Execution models describe what a software system does at runtime and how it does it. Although it is obvious that execution models are important assets to facilitate system evolution, in practice development organizations do not pay enough attention to creating useful execution models.
In this talk we present the foundations and infrastructure that we have developed to support the construction of execution models within a large development organization developing a large and complex software-intensive system. The foundations include the abstractions, concerns, and stakeholders of execution models. The infrastructure consists of sources of information and tools that an organization should make available in order to construct useful execution models without considerable overhead.
Virtual Components: Integration Testing of Data Flow-oriented systems
Speaker:
Eric Piel
In order to improve the quality and to test the correct behaviour of a component-based system, components should not only be unit-tested but also the integration of the components should be tested. In this presentation, after having detailed our motivations, we will first have a look at some existing approaches for integration testing. In particular, we will highlight the difficulties met when testing data flow-oriented systems, in which components never have any explicit expectation on the other components.
We will then present our approach based on the notion of "virtual components". Complementary to the other existing approaches, it allows verifying very specific behaviours of the system, while leveraging the techniques of unit testing. Finally, we will report on the status of the current implementation and on the validation of the approach.
Assessing the Value of Coding Standards
Speaker:
Cathal Boogerd
In spite of the widespread use of coding standards and tools enforcing their rules, there is little empirical evidence supporting the intuition that they prevent the introduction of faults in software. Not only can compliance with a set of rules having little impact on the number of faults be considered wasted effort, but it can actually result in an increase in faults, as any modification has a non-zero probability of introducing a fault or triggering a previously concealed one. Therefore, it is important to build a body of empirical knowledge, helping us understand which rules are worthwhile enforcing, and which ones should be ignored in the context of fault reduction. In this talk, we discuss two approaches to quantify the relation between rule violations and actual faults, and present empirical data on this relation for the MISRA C 2004 standard on an industrial case study.
An observation-based model for fault localization
Speaker:
Rui Abreu
Automatic techniques for helping developers in finding the root causes of software failures
are extremely important in the development cycle of software. In this paper we study a dynamic
modeling approach to fault localization, which is based on logic reasoning over program traces.
We present a simple diagnostic performance model to assess the influence of various parameters,
such as test set size and coverage, on the debugging effort required to find the root causes of
software failures. The model shows that our approach unambiguously reveals the actual faults,
provided that sufficient test cases are available. This optimal diagnostic performance
is confirmed by numerical experiments. Furthermore, we present preliminary experiments on the
diagnostic capabilities of this approach using the single-fault Siemens benchmark set. We show
that, for the Siemens set, the approach presented in this paper yields a better diagnostic ranking
than other well-known techniques.
Further information:
TechReport
A Systematic Survey of Program Comprehension through Dynamic Analysis
Speaker:
Bas Cornelissen
Program comprehension is an important activity in software maintenance, as
software must be sufficiently understood before it can be properly modified.
The study of a program's execution, known as dynamic analysis, has become a
common technique in this respect and has received substantial attention from the
research community, particularly over the last decade.
This talk presents an introduction to the use of dynamic analysis for program
comprehension. Next, we report on the results of our systematic literature
survey on this topic, in which we selected and characterized a total of 172
articles on the basis of four main facets: activity, target, method, and
evaluation. We conclude with several important lessons learned and a series of
future directions.
A verifiable Posix File-System using Flash Memory
Speaker:
Kees Pronk
This talk will be about three related subjects:
- The Grand Challenges as defined in the UK by the UK Computer Research Committee and GC6 (Dependable Systems Evolution) in particular. Two of the projects to be highlighted are the Mondex Electronic Purse and the Posix compliant verifiable file store using Flash memory chips.
- The physics of Flash memory explaining the problems to be solved when constructing a reliable file store for use in harsh and remote environments (space stations, rovers).
- The implementation and test of the Posix file store using Model Checking (Spin) on a multi-core machine (Thesis work by Paul Taverne).
Putting Fluent Interfaces to the Test
Speaker: Eric Bouwers
The API of a framework usually consists of several configuration objects and a few methods that perform operations on these objects. To use the framework a programmer initializes several value-objects, configures these objects through set-methods and then uses these objects as parameters for the methods that do the actual work. Unfortunately, this approach often leads to a long list of object-creation and method-calls in which it is hard to see the relation between the objects. In order to make the configuration of an object more readable one could use a so-called "Fluent Interface". This term, introduced by Martin Fowler in 2005, describes an interface which is constructed to write readable code.
Within this presentation we will introduce Fluent Interfaces using small examples. Furthermore, we will take a look at the API of JMock, a Java library for unit testing which has designed its API as a Fluent Interface. Also, we will share our experiences in implementing and using an API which is designed as a Fluent Interface.
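As an illustration of the style, the following minimal sketch shows a configuration object whose methods return the object itself, so that calls can be chained into one readable expression. The `Request` class and its method names are hypothetical and are not taken from JMock or from the talk.

```python
class Request:
    """Hypothetical configuration object with a fluent interface."""

    def __init__(self):
        self.settings = {}

    def to(self, url):
        self.settings["url"] = url
        return self  # returning self is what makes chaining possible

    def with_timeout(self, seconds):
        self.settings["timeout"] = seconds
        return self

    def retrying(self, times):
        self.settings["retries"] = times
        return self


# Reads almost like a sentence, instead of a list of separate set-method calls.
request = Request().to("http://example.org").with_timeout(5).retrying(3)
print(request.settings)
```

The same configuration written with separate set-methods would need one statement per property, which is the long list of calls the fluent style tries to avoid.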
Can Faulty Models Help to Fix Faulty Programs?
Speaker: Wolfgang Mayer
Debugging programs is a tedious task where automated tools would be most beneficial in reducing development effort. Model-based fault isolation techniques have shown great success in debugging physical systems; however, the absence of suitable formalisations of the correct behaviour hampers direct application of this paradigm to most software systems. Instead, model-based debugging derives a model from the incorrect system. Hence, the question arises to what extent such incorrect models can aid in locating faults in the underlying program.
In this talk I will explore different modelling paradigms that have been tried for model-based debugging and outline our experiences with selected models. I will show how the model-based framework relates to well-known program analysis techniques and that synergies between complementary paradigms can boost accuracy.
Automating Runtime Evolution and Verification of Component-based Systems
Speaker:
Alberto Gonzalez
Systems-of-Systems (SoS) represent a novel kind of system, for which runtime evolution is a key requirement, as components join and leave during runtime. Current component integration and verification techniques are not enough in such a dynamic environment. In the colloquium we will present a brief overview of the verification problem, plus the status of our current work and research platform: Atlas. Atlas is based on the Fractal component model, and extends it with ideas based on the built-in testing paradigm. Our long-term research objective (vision?) is devising a fully automated integration and verification process for dynamic systems, establishing what degree of automation can be performed by the runtime environment, and what other features need extra artefacts of support by the components.
Note: some of the ideas of this colloquium are available as a technical report at:
http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2008-007.pdf
Aspect-oriented Web Engineering
Speaker: Matthias Niederhausen (TU Dresden)
In a quickly evolving WWW, users are accessing web pages with an increasing range of devices and from different contexts. Adaptive web applications are one way to address this growing heterogeneity, but still suffer from laborious authoring processes. Because adaptation is defined directly on the web application, authors have to consider previously defined adaptation when doing updates to the site. Therefore, maintenance of such web applications is greatly complicated. To address these problems, we propose the use of aspect-oriented programming (AOP). By separating adaptation from the core web application into adaptation aspects, authoring and maintenance processes can be considerably simplified. We will give an overview of our current work in this field as well as open research issues and collaboration opportunities.
Can Developer Social Networks Predict Failures?
Speaker: Martin Pinzger
Software teams should follow a well defined goal and keep their work focused. Work fragmentation is bad for efficiency and quality. In this paper we empirically investigate the relationship between the fragmentation of developer contributions and the number of post-release failures. Our approach is to represent the structure of developer contributions with a contribution network. We use social network centrality measures to measure the degree of fragmentation of developer contributions. Fragmentation is determined by the centrality of software modules in the contribution network. Our claim is that central software modules are more likely to be failure-prone than modules located in surrounding areas of the contribution network. We analyze this hypothesis by exploring the network centrality of Microsoft Windows Vista modules using several social network centrality measures as well as linear and logistic regression analysis. In particular, we investigate which centrality measures are significant to predict the probability and number of post-release failures. Results of our experiments show that central modules are more failure-prone than modules located in surrounding areas of the network. The basic centrality measures, number of authors and number of commits, are significant predictors for the probability of failures. For predicting the number of post-release failures the closeness centrality measures are most significant.
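The two basic measures mentioned above, number of authors and number of commits per module, can be sketched as follows. The input format (a list of developer/module pairs, one per commit) and the function name `basic_measures` are assumptions made for illustration; the closeness centrality measures and the regression analysis are not shown.

```python
from collections import defaultdict


def basic_measures(commits):
    """Map each module to (number of distinct authors, number of commits).

    `commits` is a hypothetical list of (developer, module) pairs, i.e. the
    edges of a contribution network between developers and modules.
    """
    authors = defaultdict(set)
    counts = defaultdict(int)
    for developer, module in commits:
        authors[module].add(developer)
        counts[module] += 1
    return {module: (len(authors[module]), counts[module]) for module in counts}


commits = [("ann", "kernel"), ("bob", "kernel"), ("ann", "kernel"), ("cho", "ui")]
print(basic_measures(commits))
```

In this toy network the module touched by more developers is the more central one, which is the kind of module the hypothesis expects to be more failure-prone.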
Static Estimation of Test Coverage
Speaker: Tiago Alves
Test coverage is an important indicator for unit test quality. Test coverage can be computed by tools such as Clover or Emma by first instrumenting the code with logging functionality, and then running the instrumented code to log which parts are executed during unit test runs. Since computation of test coverage is a dynamic analysis, it presupposes a working installation of the analyzed software.
In the context of software quality assessment by an independent third party, a working installation is often not available. The evaluator may not have access to the required software libraries or hardware platform. The installation procedure may not be automated or even documented. Instrumentation may not be feasible, e.g. due to space or time limitations in case of embedded software.
In this paper, we propose a method for estimating test coverage through static analysis only. The method uses slicing of static call graphs to estimate the actual dynamic test coverage. We explain the method and its implementation in an analysis tool. We validate the results of the static estimation by comparison to actual values obtained through dynamic analysis using Clover.
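A much simplified version of this idea can be sketched as plain reachability in a static call graph: a production method counts as covered if it can be reached from some test method. This is an assumption made for illustration and ignores the details of the slicing used in the paper; the function name `estimate_coverage` and the dictionary representation of the call graph are hypothetical.

```python
def estimate_coverage(call_graph, test_methods, production_methods):
    """Estimate coverage as the fraction of production methods reachable
    from any test method in a static call graph (caller -> list of callees)."""
    reached = set()
    stack = list(test_methods)
    while stack:
        method = stack.pop()
        if method in reached:
            continue
        reached.add(method)
        stack.extend(call_graph.get(method, ()))
    covered = reached & set(production_methods)
    if not production_methods:
        return 0.0
    return len(covered) / len(production_methods)


call_graph = {"testA": ["a"], "a": ["b"], "c": ["d"]}
print(estimate_coverage(call_graph, ["testA"], ["a", "b", "c", "d"]))
```

No code is executed to obtain the estimate, so it can be computed without a working installation of the analyzed software.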
Towards an Assessment Methodology for Trace Abstraction Techniques
Speaker:
Bas Cornelissen
The use of dynamic analysis for software understanding has become increasingly popular over the last years, and various abstraction techniques are being offered to counter the scalability concerns involved therein. A major issue in this respect is that such techniques are typically not assessed in a systematic fashion: most evaluations comprise the gathering of anecdotal evidence through the treatment of a limited test set, which makes the techniques difficult to compare.
In this presentation, we take a large step towards an assessment methodology for trace abstraction techniques. In particular, we propose a systematic process in which such techniques can be quantitatively assessed and compared. We also report on the application of this methodology on a selection of three trace abstraction techniques. We discuss our findings and outline the directions for future work.
Note: this research is available as a technical report at
http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2008-005.pdf
Exposing the Hidden-Web Induced by Ajax
Speaker:
Ali Mesbah
Ajax is a very promising approach for improving rich interactivity and responsiveness of web applications.
At the same time, Ajax techniques increase the totality of the hidden web by shattering the metaphor of a web "page"
upon which general search engines are based.
We present a technique for exposing the hidden web content behind Ajax by automatically creating a traditional multi-page instance. In particular we propose a method for crawling Ajax applications and building a "state-flow graph" modeling the various navigation paths and states within an Ajax application.
This abstract model is used to generate linked static HTML pages and a corresponding Sitemap.
We present our tool called Crawljax which implements the concepts discussed in this presentation.
Additionally, we present a case study in which we apply our approach to two Ajax applications and elaborate on the obtained results.
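The construction of a state-flow graph can be sketched as a breadth-first exploration of user interface states. The sketch below works on an abstract application rather than a real browser: `transitions` is a hypothetical callback that returns, for a state, which clickable elements lead to which states. It is not the Crawljax API.

```python
from collections import deque


def build_state_flow_graph(initial_state, transitions):
    """Explore UI states breadth-first.

    `transitions(state)` is a hypothetical callback returning a mapping from
    clickable element to the state reached by firing an event on it.
    Returns (states, edges), where each edge is (source, element, target).
    """
    states = [initial_state]
    seen = {initial_state}
    edges = []
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()
        for element, target in transitions(state).items():
            edges.append((state, element, target))
            if target not in seen:  # a state not visited before
                seen.add(target)
                states.append(target)
                queue.append(target)
    return states, edges


# Toy application: three states connected by clickable elements.
app = {
    "index": {"news-tab": "news", "about-tab": "about"},
    "news": {"home-link": "index"},
    "about": {},
}
states, edges = build_state_flow_graph("index", lambda s: app[s])
print(states)
print(edges)
```

Each discovered state could then be written out as a static HTML page, with the edges turned into ordinary links between those pages.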
Model Driven Engineering in the Road Traffic Management domain
Speakers: Michel Soares, Jos Vrancken
Road traffic in many industrialized countries is both highly important and highly problematic, the latter due to high casualty rates, frequent congestion and high environmental pollution. Dynamic traffic management (DTM) is an important means to control traffic and counter these negative effects.
DTM-systems are complex, software intensive systems, that are very expensive to deploy, due to the IT-infrastructure required along the roads. On the other hand, DTM is a lively research area, coming up with new and innovative control measures regularly. This entails a strong need for DTM systems and their infrastructure to be as flexible as possible, in order to accommodate new measures without renewed high investments. Yet DTM systems involve human life, so reliability is also a prime requirement.
In this talk we show how Model Driven Engineering can help in building systems with both high flexibility and high reliability. To that end a number of different Software and System engineering techniques and methodologies will be integrated into an MDE method, and applied to the road traffic domain. Components of this methodology include Model Driven Requirements Engineering, UML Profiles (SysML) and Petri Nets.
Achieving Incremental Generative Model-driven Development
Speaker: T.D. Meijler (SAP, Dresden)
In the standard generative Model-driven Architecture (MDA), adapting the models of an existing system requires a full re-generation and restart of that system. This is due to a strong separation between the modeling environment and the run-time environment. Certain current approaches remove this separation by allowing a system to be changed in an incremental way through incremental model changes. These approaches are, however, based on interpretation of modeling information. In this presentation I present an approach (largely realized in a commercial R&D project) that enables full-fledged incremental generative model-driven development, i.e., applying incremental changes in a generative model-driven fashion to a system that has itself been developed in a generative model-driven way. To achieve this, model changes must be propagated to impacted elements along three dimensions: generated implementation, data (instances) and modelled dependencies. The result suggests a fundamental rethinking of the MDA, where the three dimensions are explicitly represented in an integrated modelling and run-time environment to enable total traceability.
An Integrated System to Manage Crosscutting Concerns in Source Code
Speaker: Marius Marin (TU Delft)
Evolution of software systems accounts for the largest part of their lifecycle and
costs. Software engineers therefore, more often than developing new systems,
work on complex, existing ones that they have to understand in order to
modify them. Understanding such systems requires insight into the
various concerns the systems implement, many of
which have to be inferred from source code.
Particularly challenging for software comprehension, and consequently,
software evolution, are those concerns said to be crosscutting:
implementation of such concerns lacks modularity and results in
scattered and tangled code.
The research presented in this talk proposes an integrated approach
to consistent comprehension, identification, documentation, and migration
of crosscutting concerns in existing systems. This work is aimed
at helping software engineers to more easily understand and manage
such concerns in source code. As a final step of our approach,
we also experiment with the refactoring of crosscutting concerns to
aspect-oriented programming and reflect on the support provided by this
new programming technique for improving modularization of concerns.
Reuseware -- Generic Invasive Software Composition
Speaker: Steffen Zschaler (TU Dresden)
Traditionally, most composition systems have been either black or white box. Black box composition systems allow no knowledge about the internal structure of components to be used in composing them. White box composition systems, on the other hand, allow complete knowledge about internal structures to be used and allow these structures to be manipulated in any manner when performing compositions. Invasive software composition maintains a middle ground in this area by following a grey-box approach to composition. Here, the structure of a component is exposed and made available for manipulation in a controlled manner. Components specify explicitly what parts of their structure should be exposed, and the composition system uses this specification to construct a composition interface for the components.
The presentation gives an overview of Reuseware, a generic implementation of invasive software composition based on Eclipse. Reuseware allows invasive composition concepts to be integrated with any arbitrary language and generates parsing, editing, and composition tooling for these languages.
On How Developers Test Open Source Software Systems
Speaker: Andy Zaidman
Engineering software systems is a multidisciplinary activity,
whereby a number of artifacts must be created and
maintained synchronously. In this paper we investigate
whether production code and the accompanying tests co-evolve
by exploring a project's versioning system, code coverage reports and
size metrics. Three open source case studies teach us that testing activities
usually start later on during the lifetime and are more "phased", although we
did not observe increasing testing activity before releases. Furthermore, we
note big differences in the levels of test coverage given the proportions of
test code.
Documenting Crosscutting Concerns Using Queries
Speaker: Marius Marin
SoQueT is a tool that supports consistent documentation of crosscutting concerns in source code by using a set of pre-defined queries. Each query describes a typical relation and implementation idiom of crosscutting concerns, which we recognize as a concern sort. The tool allows the user to parameterize the sort queries in dedicated user interfaces and then use these queries to document crosscutting concerns as sort instances. Such instances are building blocks that can be composed in a concern model to describe more complex relations or design decisions in the system under investigation. Concern models assist new developers or system maintainers in recognizing crosscutting relations and program elements (i.e., the query results) participating in these relations.
The demo will show how SoQueT can be used to document crosscutting concerns, and how this documentation is useful for program comprehension and software evolution purposes.
Domain-Specific Languages in Perspective
Speaker: Jan Heering (CWI)
Domain-specific languages (DSLs) are languages tailored to a specific
application domain. They offer substantial gains in expressiveness and
ease of use compared with general-purpose programming languages in
their domain of application. While the use of DSLs is by no means new,
it is receiving increased attention in the context of software
technologies such as software product-line engineering, software
factories, language-oriented programming, and generative
programming. These approaches advocate the development and use of DSLs
as essential to the construction of software system families. We
discuss these trends from the perspective of the roles DSLs have
traditionally played.
Towards the Generation of a Text-Based IDE from a Language Metamodel
Speaker: Anneke Kleppe (University of Twente)
In the model-driven world, languages are usually specified by a (meta)
model of their abstract syntax. For textual languages this is different
from the traditional approach, where the language is specified by a
(E)BNF grammar. Support for the designer of textual languages, e.g. a
parser generator, is therefore normally based on grammars. This paper
shows that similar support for language design based on metamodels is
not only possible, but is even more powerful than the support based on
grammars. In this paper we describe how an integrated development
environment for a language can be generated from the language’s abstract
syntax metamodel, thus providing the language designer with the
possibility to quickly, and with little effort, create not only a new
language but also the tooling necessary for using this language.
Wireless Sensor Networks: A Promise that Fails to Deliver
Speaker: Koen Langendoen
Wireless Sensor Network (WSN) technology has been
envisioned to revolutionize society by providing unlimited access to
context data ranging from simple scalars (temperature, light, heartbeat,
etc.) to aggregated information (object tracking). This vision has
inspired a large body of research on how to construct large-scale,
self-organizing networks out of resource-scarce sensor nodes running
off batteries or ambient energy sources. The output has been a range of
new protocols/algorithms trading off performance for energy efficiency.
Taking these protocols from theory (simulation) to practice, however, has
proven to be a mission impossible. The prime reason is that traditional
software development for embedded systems -- testing -- does not scale
to WSNs because of the large number of nodes involved and the unreliable
nature of the wireless medium. Also, debugging a wireless system is a
"challenge" because of the limited access, especially when the protocol
stack is under development. Many pilot WSN deployments have learned
this the hard way, and if nothing changes WSN is doomed to fail in entering
the consumer market where development cost is a key factor to success.
So HELP! Do you software engineering researchers have a proper development
technology that could save the day? Any pointers greatly appreciated!
Meta-analysis for the Linux kernel: recognising, describing and analysing kernel domain abstractions expressed in C
Speaker: Peter Breuer
The Linux kernel is written in low-level C and assembler, yet
expresses much higher level concepts. While a C compiler does a good
job of detecting and advising of low-level problems such as writing a
signed value into an unsigned receptacle for it, it cannot warn of
higher level problems such as accessing a field of a compound
structure that is potentially visible to many different threads of
computation at the same time without first having taken an appropriate
lock. Detecting that kind of problem requires a much higher level
analysis which takes into account domain knowledge of the kernel
structures and interfaces and their intended mode of use.
This talk will set out some of the experience gained and problems
experienced (as well as solutions adopted, clearly) in recognising
the higher-level structures in the kernel and codifying their
intended modes of use, and detecting the real programming
irregularities in the kernel that have arisen in the last ten years
of practice. The analysis tool is an extensible logic compiler and
lessons from its development should be applicable to the general
problem of extending languages with domain-specific knowledge.
Declarative Object Identity using Relation Types
Speaker: Frank Tip (IBM)
Object-oriented languages define the identity of an object to be an
address-based object identifier. The programmer may customize the
notion of object identity by overriding the equals() and hashCode()
methods following a specified contract. This customization often
introduces latent errors, since the contract is unenforced, and at
times impossible to satisfy. Notably, equals() may refer to mutable
state, which allows object identity to change during execution,
breaking standard library invariants.
We propose a programming model based on a relational view of the heap
which defines identity declaratively, obviating the need for equals()
and hashCode() methods. Each element in the heap (called a tuple)
belongs to a relation type and relates an immutable identity to
mutable state. The model entails a stricter contract: identity never
changes during an execution. Objects, values, and singletons arise as
special cases of tuples.
We formalize the model as an adaptation of Featherweight Java, and
implement it by extending Java with relation types. Experiments on a
set of Java programs show that the majority of classes that override
equals() can be refactored into relation types, and that most of the
remainder are buggy or fragile.
This work will be presented at ECOOP'07.
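The hazard described above can be reproduced in a few lines. The sketch below uses Python's `__eq__` and `__hash__` as a stand-in for Java's equals() and hashCode(); the `Point` and `PointKey` classes and the `heap` dictionary are hypothetical names chosen for this example. The second half only loosely mimics the idea of relating an immutable identity to mutable state and is not the relation-type extension presented in the talk.

```python
from dataclasses import dataclass


class Point:
    """Equality and hash are derived from mutable fields (the risky pattern)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


p = Point(1, 2)
seen = {p}
found_before = p in seen  # the set can find p
p.x = 99                  # mutation changes p's hash while it sits in the set
found_after = p in seen   # lookup now uses a different hash than the stored one


@dataclass(frozen=True)
class PointKey:
    """Immutable identity; the mutable state lives outside the key."""
    id: int


# Assumed toy "heap": immutable identity -> mutable state.
heap = {PointKey(1): {"x": 1, "y": 2}}
heap[PointKey(1)]["x"] = 99        # state changes, identity does not
still_found = PointKey(1) in heap  # lookup is unaffected by the state change
```

Keeping the key frozen means a collection's invariants cannot be broken by later updates to the associated state.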
Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking
Speaker: Martin Bravenboer
A wide range of parser generators are used to generate parsers
for programming languages. The grammar formalisms that come
with parser generators provide different approaches for
defining operator precedence. Some generators (e.g. YACC)
support precedence declarations, others require the grammar to
be unambiguous, thus encoding the precedence rules. Even if
the grammar formalism provides precedence rules, a particular
grammar might not use them.
The result is grammar variants implementing the same
language. For the C language, the GNU Compiler uses YACC with
precedence rules, the C-Transformers uses SDF without
priorities, while the SDF library does use priorities. For
PHP, Zend uses YACC with precedence rules, whereas PHP-front
uses SDF with priority and associativity declarations.
The variance between grammars raises the question of whether the
precedence rules of one grammar are compatible with those of
another. This is usually not obvious, since some languages
have complex precedence rules. Also, for some parser
generators the semantics of precedence rules is defined
operationally, which makes it hard to reason about their
effect on the defined language.
We present a method and tool for comparing the precedence
rules of different grammars and parser generators. Although it
is undecidable whether two grammars define the same language,
this tool provides support for comparing and recovering
precedence rules, which is especially useful for reliable
migration of a grammar from one grammar formalism to
another. We evaluate our method by the application to
non-trivial mainstream programming languages, such as PHP and
C.
Modularization of Language Constructs: A Reflective Approach
Speaker: Thomas Cleenewerck (VUB)
Programming languages are in a continuous state of flux in order to keep
up with emerging needs of programmers. They are grown with new constructs
so that programmers can express the problems from their domain within the
language they are using. Growing languages means to grow their implementations
along with them. To support this, we wish to preserve the decomposition
of languages into language constructs in their implementations. As the design
of a language implementation directly reflects our intuitive decomposition, a
developer can engage in the natural process of developing a language.
We preserve the decomposition into language constructs by modularizing the
definition of language constructs in separate implementation modules containing
their syntactical representation and their translational semantics. In this
setting, growing a language boils down to writing or selecting the appropriate
language constructs and establishing the necessary interactions. As the language
is continuously evolving during its implementation and future evolutions,
the modularization of the language constructs renders the implementation less
susceptible to the continuous changes.
The modularization of language implementations has been the subject of
much research in the domain of compiler technology. The complexity of this
research lies in the fact that language constructs intrinsically take into account
other language constructs and therefore compromise their opportunities for
modularization. Indeed, the mechanisms presented by the contemporary state
of the art technologies for separating a language implementation into modules
do not suffice.
In this talk, we present a lightweight formal model for the modularization
of language constructs. From this model we deduce a new language implementation
design in which languages consist of three kinds of concerns: the basic
language concerns, which define the language constructs; the language specification,
which defines the interactions between the basic concerns; and the special-purpose
concerns, which define the mechanisms to implement those interactions.
As a solution for the above model, we present an open design for a new
language development technique: A language implementation is decomposed
into a set of interacting language modules called linglets. Each linglet defines
in isolation the syntax and the (translational) semantics of a single language
construct in terms of another (lower level) language. The mechanisms to establish
the necessary interactions among the language constructs are captured
in strategies. The strategies are defined as extensions of a specifically tailored
metaobject protocol (MOP).
Linglets and strategies can be reused across language implementations. In
addition, new linglets and strategies can be defined, and existing ones can be
specialized to respectively establish the interactions with other linglets and to
meet and adapt the strategy for the challenges in a particular language implementation.
We validate our approach by developing a non-trivial family of domain-specific
languages using a shared pool of language constructs and strategies,
and by implementing the necessary metalanguages. The resulting language implementations
are optimized with respect to separation of concerns according
to their language constructs.
Middleware for Semantic Service Advertisement and Discovery on MANETs
Speaker: Zef Hemel
MANETs (Mobile Ad-hoc Networks) offer exciting new research
opportunities now that devices with wireless capabilities are becoming more
widespread. Many wireless technologies, such as 802.11, support these
ad-hoc style networks. Opportunities lie in many areas, such as
routing protocols, services and applications. The network topology of
MANETs is constantly changing and the devices on these networks, like
laptops and PDAs, have limited processing and battery power.
Research on low-level protocols that do semantic service discovery on
ad-hoc networks is emerging. Pervasive and mobile computing
applications require these protocols, but using them requires a lot
of engineering and knowledge of network protocols and service matching.
This talk will give an overview of the challenges that lie in this
area and how they are addressed by the developed middleware. The
purpose of the middleware is to make the task of defining,
advertising and discovering semantic services on MANETs more
straightforward by offering APIs to complete these tasks. As part of the
semantic service matching, context such as location and workload can
be defined and matched to further improve the discovery results.
Services and context are described using ontologies. Queries for
services can be expressed in a newly developed query language called
RaSSQL (RDF and Semantic Service Query Language). The middleware,
implemented in Python, is based on ideas from the OntoMobil protocol
(developed at TCD), but can use any protocol that discovers services
based on concept dissemination.
ComplexityMap, Bridging the Gap
Speaker: Mark Hissink Muller
A diagramming style will be presented, which aims to bridge the gap between business users and application developers/architects, by de/pre-scribing a common way of looking at integration architecture. This new style can help to open the traditional black box, which is too often used as a management approach in system development and maintenance.
Based on the diagramming style, a proof of concept will be shown to indicate quality in various parts of a J2EE-application. Such a 'ComplexityMap' allows quality to be monitored and managed as an integral part of the application lifecycle.
Model-Driven Consistency Checking of Behavioural Specifications
Speaker: Bas Graaf
For the development of software-intensive systems different
types of behavioural specifications are used. Although
such specifications should be consistent with respect
to each other, this is not always the case in practice.
Maintainability problems are the result. In this
paper we propose a technique for assessing the consistency
of two types of behavioural specifications: scenarios
and state machines. The technique is based on the
generation of state machines from scenarios. We specify
the required mapping using model transformations. The
use of technologies related to the Model Driven Architecture
enables easy integration with widely adopted (UML)
tools. We applied our technique to assess the consistency
of the behavioural specifications for the embedded software
of copiers developed by Oce. Finally, we evaluate
the approach and discuss its generalisability and wider
applicability.
Towards Unification of Software Component Procurement and Integration Approaches
Speaker: Gerd Gross
Software component procurement and integration are primarily
based upon having the right communication mechanisms available
that can map component customer requirements to component provider
specifications. Such mechanisms are currently only available on lower levels
of abstraction, close to the implementation level. This paper describes
the research being performed at the TU Delft Embedded Software Laboratory
to elevate typical component feature mapping mechanisms from
the implementation level up onto the design and requirements engineering
levels.
An Evaluation of Similarity Coefficients for Software Fault Localization
Speaker: Rui Abreu
Automated diagnosis of software faults can improve the efficiency
of the debugging process, and is therefore an important technique
for the development of dependable software. In this presentation we
discuss different similarity coefficients that are applied in the context
of a program spectral approach to software fault localization (single
programming mistakes). The coefficients studied are taken from the
systems diagnosis / automated debugging tools Pinpoint, Tarantula,
and AMPLE, and from the molecular biology domain (the Ochiai coefficient).
We evaluate these coefficients on the Siemens Suite of benchmark faults,
and assess their effectiveness in terms of the position of the actual
fault in the probability ranking of fault candidates produced by the
diagnosis technique. Our experiments indicate that the Ochiai coefficient
consistently outperforms the coefficients currently used by the tools
mentioned. In terms of the amount of code that needs to be inspected,
this coefficient improves 5% on average over the next best technique,
and up to 30% in specific cases.
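To make the ranking step concrete, the following sketch computes the Ochiai coefficient for each code block from a small program spectrum. The spectrum matrix and pass/fail vector are made-up data, and the function names are chosen for this example only; the sketch does not reproduce the experimental setup of the talk. For a block j, with a11 the number of failed runs that executed j, a10 the number of passed runs that executed j, and a01 the number of failed runs that did not execute j, the coefficient is a11 / sqrt((a11 + a01) * (a11 + a10)).

```python
from math import sqrt

# Hypothetical program spectra: one row per test run, one column per code
# block; 1 means the block was executed during that run.
SPECTRA = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
]
# Hypothetical outcomes: True means the corresponding run failed.
FAILED = [True, False, True, False]


def ochiai(spectra, failed, j):
    """Ochiai similarity between block j's execution pattern and the failures."""
    a11 = sum(1 for row, f in zip(spectra, failed) if row[j] and f)
    a10 = sum(1 for row, f in zip(spectra, failed) if row[j] and not f)
    a01 = sum(1 for row, f in zip(spectra, failed) if not row[j] and f)
    denominator = sqrt((a11 + a01) * (a11 + a10))
    return a11 / denominator if denominator else 0.0


def rank_blocks(spectra, failed):
    """Return block indices ordered from most to least suspicious."""
    n_blocks = len(spectra[0])
    scores = [ochiai(spectra, failed, j) for j in range(n_blocks)]
    return sorted(range(n_blocks), key=lambda j: scores[j], reverse=True)


scores = [ochiai(SPECTRA, FAILED, j) for j in range(3)]
ranking = rank_blocks(SPECTRA, FAILED)
```

In this toy data, block 1 is executed in exactly the failing runs, so it ends up first in the ranking; a developer would inspect blocks in that order.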
Visualizing Testsuites to Aid in Software Understanding
Speaker: Bas Cornelissen
Agile methods and eXtreme Programming have brought renewed attention
to testing during the software development process, both as a quality
assurance method and as a form of live documentation. It is for this
reason that a software system's testsuite is an ideal starting point
for gaining knowledge about its inner workings.
We propose to use sequence diagrams to visualize information that was
dynamically obtained from testsuites. We employ abstraction techniques
such as constructor hiding and stack depth limitation to make the
diagrams more scalable. We use JPacman as a case study, validate our
results by consulting with a domain expert, and use his feedback to fine-tune our techniques.
Prioritizing Software Inspection Results using Static Profiling
Speaker: Cathal Boogerd
Static software checking tools are useful as an additional automated software
inspection step that can easily be integrated in the development cycle and
assist in creating secure, reliable and high-quality code. However, an often
quoted disadvantage of these tools is that they generate an overly large
number of warnings, including many false positives due to the approximate
analysis techniques. This information overload effectively limits their
usefulness.
In this talk we will discuss ELAN, a technique that helps the user prioritize
the information generated by a software inspection tool, based on a
demand-driven computation of the likelihood that execution reaches the
locations for which warnings are reported. This analysis is orthogonal to
other prioritization techniques known from literature, such as severity levels
and statistical analysis to reduce false positives. We evaluate the feasibility of
our technique using a number of case studies and assess the quality of our
predictions by comparing them to actual values obtained by dynamic profiling.