PUBLICATIONS

A Modelling Language for Defining Cloud Simulation Scenarios in RECAP Project Context

Authors:

Cleber Matos de Morais, Patricia Endo, Sergej Svorobej, Theo Lynn

Abstract:

RECAP is a European Union funded project that seeks to develop a next-generation resource management solution, from both technical and business perspectives, for technological solutions spanning the cloud, fog, and edge layers. The RECAP project is composed of a set of use cases that present highly complex and scenario-specific requirements that should be modelled and simulated in order to find optimal solutions for resource management. Owing to the characteristics of these use cases, configuring simulation scenarios is a highly time-consuming task that requires staff with specialist expertise.

ALPACA: Application Performance Aware Server Power Capping

Authors:

Jakub Krzywda, Ahmed Ali-Eldin, Eddie Wadbro, Per-Olov Östberg, Erik Elmroth

Abstract:

Server power capping limits the power consumption of a server so that it does not exceed a specific power budget. This allows data center operators to reduce the peak power consumption at the cost of performance degradation of hosted applications. Previous work on server power capping rarely considers Quality-of-Service (QoS) requirements of consolidated services when enforcing the power budget. In this paper, we introduce ALPACA, a framework to reduce QoS violations and overall application performance degradation for consolidated services. ALPACA reduces unnecessarily high power consumption when there is no performance gain, and divides the power among the running services in a way that reduces the overall QoS degradation when power is scarce. We evaluate ALPACA using four applications: MediaWiki, SysBench, Sock Shop, and CloudSuite’s Web Search benchmark. Our experiments show that ALPACA reduces the operational costs of QoS penalties and electricity by up to 40% compared to a non-optimized system.

Application, Workload, and Infrastructure Models for Virtualized Content Delivery Networks Deployed in Edge Computing Environments

Authors:

Thang Le Duc, Per-Olov Östberg

Abstract:

Content Delivery Networks (CDNs) handle a large part of the traffic over the Internet and are of growing importance for the management and operation of coming generations of data-intensive applications. This paper addresses the modeling and scaling of content-oriented applications, and presents workload, application, and infrastructure models developed in collaboration with a large-scale CDN infrastructure provider, aimed at improving the performance of content delivery subsystems deployed in wide area networks. It has been shown that leveraging edge resources for the deployment of content caches greatly benefits CDNs. Therefore, the models are described from an edge computing perspective and are intended to be integrated into network-topology-aware application orchestration and resource management systems.

Analyzing the Availability and Performance of an E-Health System Integrated with Edge, Fog and Cloud Infrastructures

Authors:

Guto Leoni Santos, Patricia Takako Endo, Matheus Felipe Ferreira da Silva Lisboa Tigre, Leylane Graziele Ferreira da Silva, Djamel Sadok, Judith Kelner and Theo Lynn

Abstract:

The Internet of Things has the potential to transform health systems through the collection and analysis of patient physiological data via wearable devices and sensor networks. Such systems can offer assisted living services in real time and a range of multimedia-based health services. However, service downtime, particularly in the case of emergencies, can lead to adverse outcomes and, in the worst case, death. In this paper, we propose an e-health monitoring architecture based on sensors that relies on cloud and fog infrastructures to handle and store patient data. Furthermore, we propose stochastic models to analyze the availability and performance of such systems, including models to understand how failures across the Cloud-to-Thing continuum impact e-health system availability and to identify potential bottlenecks. To feed our models with real data, we design and build a prototype and execute performance experiments. Our results identify the sensors and fog devices as the components with the most significant impact on the availability of the e-health monitoring system as a whole in the scenarios analyzed. Our findings suggest that, in order to identify the best architecture to host the e-health monitoring system, a trade-off between performance and delays must be resolved.

ATMoN: Adapting the “Temporality” in Large-Scale Dynamic Networks

Authors:

Demetris Trihinas, Luis F. Chiroque, George Pallis, Antonio Fernandez Anta, Marios D. Dikaiakos

Abstract:

With the widespread adoption of temporal graphs to study fast-evolving interactions in dynamic networks, attention is needed to provide graph metrics in time and at scale. In this paper, we introduce ATMoN, an open-source library developed to computationally offload graph processing engines and ease the communication overhead in dynamic networks over an unprecedented wealth of data. This is achieved by efficiently and inexpensively adapting, in place, the temporal granularity at which graph metrics are computed, based on runtime knowledge captured by a low-cost probabilistic learning model capable of approximating both the metric stream evolution and the volatility of the graph topology. After a thorough evaluation with real-world data from mobile, face-to-face, and vehicular networks, results show that ATMoN is able to reduce the compute overhead by at least 76%, data volume by 60%, and overall cloud costs by at least 54%, while always maintaining accuracy above 88%.

Done Yet? A Critical Introspective of the Cloud Management Toolbox

Authors:

Mark Leznik, Simon Volpert, Frank Griesinger, Daniel Seybold, Jörg Domaschka

Abstract:

With the rapid rise of the cloud computing paradigm, the manual maintenance and provisioning of the technological layers behind it, both in their hardware and virtualized form, became cumbersome and error-prone. This has opened up the need for automated capacity planning strategies in heterogeneous cloud computing environments. However, even with mechanisms to fully accommodate customers and fulfill service-level agreements, providers often tend to over-provision their hardware and virtual resources. A proliferation of unused capacity leads to higher energy costs and, correspondingly, a higher price for cloud technology services. Capacity planning algorithms rely on data collected from the utilized resources. Yet, the amount of data aggregated through the monitoring of hardware and virtual instances does not allow for manual supervision, much less data analysis, correlation, or anomaly detection. Current data science advancements enable efficient automation, scheduling, and provisioning of cloud computing resources based on supervised and unsupervised machine learning techniques. In this work, we present the current state of the art in monitoring, storage, analysis, and adaptation approaches for the data produced by cloud computing environments, to enable proactive, dynamic resource provisioning.

Power-Performance Tradeoffs in Data Center Servers: DVFS, CPU pinning, Horizontal, and Vertical Scaling

Authors:

Jakub Krzywda, Ahmed Ali-Eldin, Trevor E. Carlson, Per-Olov Östberg, Erik Elmroth

Abstract:

Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. Results of the experiments show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of an underloaded server (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a significant impact on the tail response time of horizontally scaled applications.

Towards understanding HPC users and systems: A NERSC case study

Authors:

Gonzalo P. Rodrigo, P-O Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, Lavanya Ramakrishnan

Abstract:

The high-performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

Reliable Capacity Provisioning for Distributed Cloud/Edge/Fog Computing Applications

Authors:

P-O Östberg, James Byrne, Paolo Casari, Philip Eardley, Antonio Fernández Anta, Johan Forsman, John Kennedy, Thang Le Duc, Manuel Noya Mariño, Radhika Loomba, Miguel Angel López Peña, Jose Lopez Veiga, Theo Lynn, Vincenzo Mancuso, Sergej Svorobej, Anders Torneus, Stefan Wesner, Peter Willis, Jörg Domaschka

Abstract:

The REliable CApacity Provisioning and enhanced remediation for distributed cloud applications (RECAP) project aims to advance cloud and edge computing technology, to develop mechanisms for reliable capacity provisioning, and to make application placement, infrastructure management, and capacity provisioning autonomous, predictable and optimized. This paper presents the RECAP vision for an integrated edge-cloud architecture, discusses the scientific foundation of the project, and outlines plans for toolsets for continuous data collection, application performance modeling, application and component auto-scaling and remediation, and deployment optimization. The paper also presents four use cases from complementing fields that will be used to showcase the advancements of RECAP.

A Preliminary Systematic Review of Computer Science Literature on Cloud Computing Research using Open Source Simulation Platforms

Authors:

Theo Lynn, Anna Gourinovitch, James Byrne, PJ Byrne, Sergej Svorobej, Konstantinos Giannoutakis, David Kenny and John Morrison

Abstract:

Research and experimentation on live hyperscale clouds is limited by their scale, complexity, value, and issues of commercial sensitivity. As a result, there has been an increase in the development, adaptation, and extension of cloud simulation platforms to enable enterprises, application developers, and researchers to undertake both testing and experimentation. While there have been numerous surveys of cloud simulation platforms and their features, few examine how these platforms are being used for research purposes. This paper provides a preliminary systematic review of literature on this topic covering 256 papers from 2009 to 2016. The paper aims to provide insights into the current status of cloud computing research using open source cloud simulation platforms. Our two-level analysis scheme includes a descriptive and a synthetic analysis against a highly cited taxonomy of cloud computing. The analysis uncovers some imbalances in research and the need for a more granular and refined taxonomy against which to classify cloud computing research using simulators. The paper can be used to guide literature reviews in the area and identifies potential research opportunities for cloud computing and simulation researchers, complementing extant surveys on cloud simulation platforms.

A Review of Cloud Computing Simulation Platforms and Related Environments

Authors:

James Byrne, Sergej Svorobej, Konstantinos Giannoutakis, Dimitrios Tzovaras, PJ Byrne, P-O Östberg, Anna Gourinovitch, Theo Lynn

Abstract:

Recent years have seen an increasing trend towards the development of Discrete Event Simulation (DES) platforms to support cloud-computing-related decision making and research. The complexity of cloud environments is increasing with scale and heterogeneity, posing a challenge for the efficient management of cloud applications and data centre resources. The increasing ubiquity of social media, mobile and cloud computing, combined with the Internet of Things and emerging paradigms such as Edge and Fog Computing, is exacerbating this complexity. Given the scale, complexity, and commercial sensitivity of hyperscale computing environments, the opportunity for experimentation is limited and requires substantial investment of time and effort. DES provides a low-risk technique for providing decision support for complex hyperscale computing scenarios. In recent years, there has been a significant increase in the development and extension of tools to support DES for cloud computing, resulting in a wide range of tools which vary in terms of their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. This review identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed memory systems).