Publications

Jiang Y, Wang S, Valls V, Bong Jun K, Wei-Han L, Leung K, Tassiulas Let al., 2023, Model pruning enables efficient federated learning on edge devices, IEEE Transactions on Neural Networks and Learning Systems, Vol: 34, Pages: 10374-10386, ISSN: 1045-9227

Federated learning (FL) allows model training from local data collected by edge/mobile devices while preserving data privacy, which has wide applicability to image and vision applications. A challenge is that client devices in FL usually have much more limited computation and communication resources compared to servers in a data center. To overcome this challenge, we propose PruneFL--a novel FL approach with adaptive and distributed parameter pruning, which adapts the model size during FL to reduce both communication and computation overhead and minimize the overall training time, while maintaining a similar accuracy as the original model. PruneFL includes initial pruning at a selected client and further pruning as part of the FL process. The model size is adapted during this process, which includes maximizing the approximate empirical risk reduction divided by the time of one FL round. Our experiments with various datasets on edge devices (e.g., Raspberry Pi) show that: 1) we significantly reduce the training time compared to conventional FL and various other pruning-based methods and 2) the pruned model with automatically determined size converges to an accuracy that is very similar to the original model, and it is also a lottery ticket of the original model.

Journal article

Zafari F, Basu P, Leung KK, Li J, Towsley D, Swami Aet al., 2023, Resource Sharing in the Edge: A Distributed Bargaining-Theoretic Approach, IEEE Transactions on Network and Service Management, Vol: 20, Pages: 4369-4382

The growing demand for edge computing resources, particularly due to increasing popularity of Internet of Things (IoT), and distributed machine/deep learning applications poses a significant challenge. On the one hand, certain edge service providers (ESPs) may not have sufficient resources to satisfy their applications according to the associated service-level agreements. On the other hand, some ESPs may have additional unused resources. In this paper, we propose a resource-sharing framework that allows different ESPs to optimally utilize their resources and improve the satisfaction level of applications subject to constraints such as communication cost for sharing resources across ESPs. Our framework considers that different ESPs have their own objectives for utilizing their resources, thus resulting in a multi-objective optimization problem. We present an N-person Nash Bargaining Solution (NBS) for resource allocation and sharing among ESPs with Pareto optimality guarantee. Furthermore, we propose a distributed, primal-dual algorithm to obtain the NBS by proving that the strong-duality property holds for the resultant resource sharing optimization problem. Using synthetic and real-world data traces, we show numerically that the proposed NBS based framework not only enhances the ability to satisfy applications' resource demands, but also improves utilities of different ESPs.

Abstract
Cite

Journal article

Wang H, Liu CH, Yang H, Wang G, Leung KKet al., 2023, Ensuring Threshold AoI for UAV-Assisted Mobile Crowdsensing by Multi-Agent Deep Reinforcement Learning With Transformer, IEEE-ACM TRANSACTIONS ON NETWORKING, ISSN: 1063-6692

Journal article

Chen Z, Leung KK, Wang S, Tassiulas L, Chan K, Towsley Det al., 2022, Use coupled LSTM networks to solve constrained optimization problems, IEEE Transactions on Cognitive Communications and Networking, Vol: 9, Pages: 304-316, ISSN: 2332-7731

Gradient-based iterative algorithms have been widely used to solve optimization problems, including resource sharing and network management. When system parameters change, it requires a new solution independent of the previous parameter settings from the iterative methods. Therefore, we propose a learning approach that can quickly produce optimal solutions over a range of system parameters for constrained optimization problems. Two Coupled Long Short-Term Memory networks (CLSTMs) are proposed to find the optimal solution. The advantages of this framework include: (1) near-optimal solution for a given problem instance can be obtained in few iterations during the inference, (2) enhanced robustness as the CLSTMs can be trained using system parameters with distributions different from those used during inference to generate solutions. In this work, we analyze the relationship between minimizing the loss functions and solving the original constrained optimization problem for certain parameter settings. Extensive numerical experiments using datasets from Alibaba reveal that the solutions to a set of nonconvex optimization problems obtained by the CLSTMs reach within 90% or better of the corresponding optimum after 11 iterations, where the number of iterations and CPU time consumption are reduced by 81% and 33%, respectively, when compared with the gradient descent with momentum.

Journal article

Panigrahy NK, Vasantam T, Basu P, Towsley D, Swami A, Leung KKet al., 2022, On the Analysis and Evaluation of Proximity-based Load-balancing Policies, ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, Vol: 7, ISSN: 2376-3639

Journal article

Gkelias A, Leung KK, 2022, GAN-Based Detection of Adversarial EM Signal Waveforms, IEEE Military Communications Conference (MILCOM), Publisher: IEEE, ISSN: 2155-7578

Conference paper

Gkelias A, Panigrahy NK, Mobayenjarihani M, Leung KK, Towsley D, Baker PJ, Worthington O, Fowkes Let al., 2021, Resource management in Software Defined Coalitions (SDC) through slicing, IEEE Communications Society (ComSoc)/AFCEA/IEEE Military Communications Conference (MILCOM), Publisher: IEEE, Pages: 237-242, ISSN: 2155-7578

Future defence infrastructure systems will require increased flexibility and agility to respond to changing application goals, external threats and complex environments. A key enabler for such agility is Software Defined Coalitions (SDC), where the network comprise multiple domains of resources owned by different defence units (partners) but dynamically joined together to form an infrastructure for communications and computation. Software Defined (SD) Slicing aims to enable agile and near-real-time provision and configuration of “slices” of the infrastructure resources for supporting future communications and computing applications. An SD slice, makes use of the allocated resources distributed across several domains to support a set of applications including distributed analytic services. We present a 3-level control architecture with a global SD-slice controller at the top, domain controllers (DC) in the middle, and dynamic, end-to-end flow controllers at the bottom. Associated with each DC is a domain inference engine whose function is to estimate availability of various resources in that domain. Based on the inferred resource availability in domains, the global controller determines the feasibility of supporting a new SD slice and if so, allocates resources to achieve/maintain the required performance of all slices. Based on the resources allocated by the global controller, the slice controller is responsible for sharing the resources across domains among data/processing flows to optimize resource utility. The end-to-end flow controllers then allocate resources to data flows or processing tasks according to dynamic conditions of resources for efficiency and robustness.

Conference paper

Chen Z, Leung KK, Wang S, Tassiulas L, Chan Ket al., 2021, Robust solutions to constrained optimization problems by LSTM networks, IEEE Communications Society (ComSoc)/AFCEA/IEEE Military Communications Conference (MILCOM), Publisher: IEEE, Pages: 503-508, ISSN: 2155-7578

Many technical issues for communications and computer infrastructures, including resource sharing, network management and distributed analytics, can be formulated as optimization problems. Gradient-based iterative algorithms have been widely utilized to solve these problems. Much research focuses on improving the iteration convergence. However, when system parameters change, it requires a new solution from the iterative methods. Therefore, it is helpful to develop machine-learning solution frameworks that can quickly produce solutions over a range of system parameters. We propose here a learning approach to solve non-convex, constrained optimization problems. Two coupled Long Short Term Memory (LSTM) networks are used to find the optimal solution. The advantages of this new framework include: (1) near optimal solution for a given problem instance can be obtained in very few iterations (time steps) during the inference process, (2) the learning approach allows selections of various hyper-parameters to achieve desirable tradeoffs between the training time and the solution quality, and (3) the coupled-LSTM networks can be trained using system parameters with distributions different from those used during inference to generate solutions, thus enhancing the robustness of the learning technique. Numerical experiments using a dataset from Alibaba reveal that the relative discrepancy between the generated solution and the optimum is less than 1% and 0.1% after 2 and 12 iterations, respectively.

Conference paper

Zeng Q, Du Y, Huang K, Leung KKet al., 2021, Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing, IEEE Transactions on Wireless Communications, Vol: 20, Pages: 7947-7962, ISSN: 1536-1276

Edge machine learning involves the deployment of learning algorithms at the network edge to leverage massive distributed data and computation resources to train artificial intelligence (AI) models. Among others, the framework of federated edge learning (FEEL) is popular for its data-privacy preservation. FEEL coordinates global model training at an edge server and local model training at devices that are connected by wireless links. This work contributes to the energy-efficient implementation of FEEL in wireless networks by designing joint computation-and-communication resource management ( C2 RM). The design targets the state-of-the-art heterogeneous mobile architecture where parallel computing using both CPU and GPU, called heterogeneous computing , can significantly improve both the performance and energy efficiency. To minimize the sum energy consumption of devices, we propose a novel C2 RM framework featuring multi-dimensional control including bandwidth allocation, CPU-GPU workload partitioning and speed scaling at each device, and C2 time division for each link. The key component of the framework is a set of equilibriums in energy rates with respect to different control variables that are proved to exist among devices or between processing units at each device. The results are applied to designing efficient algorithms for computing the optimal C2 RM policies faster than the standard optimization tools. Based on the equilibriums, we further design energy-efficient schemes for device scheduling and greedy spectrum sharing that scavenges “spectrum holes” resulting from heterogeneous C2 time divisions among devices. Using a real dataset, experiments are conducted to demonstrate the effectiveness of C2 RM on improving the energy efficiency of a FEEL system.

Journal article

Dai Y, Guan YL, Leung KK, Zhang Yet al., 2021, Reconfigurable intelligent surface for low-latency edge computing in 6G, IEEE Wireless Communications Magazine, Vol: 28, Pages: 72-79, ISSN: 1070-9916

Edge computing, as one of the key technologies in 6G networks, establishes a distributed computing environment by deploying computation and storage resources in proximity to end users. However, the dense deployment of base stations, cellular dead zones, and high dynamics of mobile devices may cause serious interference issues and weak signal propagation, which will severely affect the transmission efficiency of edge computing and cannot support low-latency applications and services. Reconfigurable intelligent surface (RIS) is a new technology that can enhance the spectral efficiency and suppress interference of wireless communication by adaptively configuring massive low-cost passive reflecting elements. In this article, we introduce RIS into edge computing to support low-latency applications, where edge computing can alleviate the heavy computation pressure of mobile devices with ubiquitously distributed computing resources, and RIS can enhance the quality of the wireless communication link by intelligently altering the radio propagation environment. To elaborate the effectiveness of RIS for edge computing, we then propose a deep-reinforcement-learning-based computation offloading scheme to minimize the total offloading latency of mobile devices. Numerical results indicate that the RIS-aided scheme can improve wireless communication data rate and reduce task execution latency.

Journal article

Zhang Z, Ma L, Leung KK, Le Fet al., 2021, More is not always better: an analytical study of controller synchronizations in distributed SDN, IEEE/ACM Transactions on Networking, Vol: 29, Pages: 1580-1590, ISSN: 1063-6692

Distributed software-defined networks (SDN), consisting of multiple inter-connected network domains, each managed by one SDN controller, is an emerging networking architecture that offers balanced centralized control and distributed operations. In such networking paradigm, most existing works focus on designing sophisticated controller-synchronization strategies to improve joint controller-decision-making for inter-domain routing. However, there is still a lack of fundamental understanding of how the performance of distributed SDN is related to network attributes, thus impossible to justify the necessity of complicated strategies. In this regard, we analyse and quantify how the performance enhancement of distributed SDN architectures is influenced by inter-domain synchronization levels, in terms of the resulting number of abstracted routing clusters, and network structural properties. Based on a generic network model incorporating link preference for path constructions, we establish analytical lower bounds for quantifying the routing performance under any arbitrarily given network synchronization status. The significance of these performance bounds is that they can be used to quantify the contribution of controller synchronization levels in improving the network performance under different network parameters, which therefore serves as a fundamental guidance for future SDN performance analysis and protocol designs.

Journal article

Zafari F, Leung KK, Towsley D, Basu P, Swami A, Li Jet al., 2021, Let's share: a game-theoretic framework for resource sharing in mobile edge clouds, IEEE Transactions on Network and Service Management, Vol: 18, Pages: 2107-2122, ISSN: 1932-4537

Mobile edge computing seeks to provide resources to different delay-sensitive applications. This is a challenging problem as an edge cloud-service provider may not have sufficient resources to satisfy all resource requests. Furthermore, allocating available resources optimally to different applications is also challenging. Resource sharing among different edge cloud-service providers can address the aforementioned limitation as certain service providers may have resources available that can be “rented” by other service providers. However, edge cloud service providers can have different objectives or utilities . Therefore, there is a need for an efficient and effective mechanism to share resources among service providers, while considering the different objectives of various providers. We model resource sharing as a multi-objective optimization problem and present a solution framework based on Cooperative Game Theory (CGT). We consider the strategy where each service provider allocates resources to its native applications first and shares the remaining resources with applications from other service providers. We prove that for a monotonic, non-decreasing utility function, the game is canonical and convex. Hence, the core is not empty and the grand coalition is stable. We propose two algorithms, Game-theoretic Pareto optimal allocation (GPOA) and Polyandrous-Polygamous Matching based Pareto Optimal Allocation (PPMPOA) that provide allocations from the core. Hence the obtained allocations are Pareto optimal and the grand coalition of all the service providers is stable. Experimental results confirm that our proposed resource sharing framework improves utilities of edge cloud-service providers and application request satisfaction.

Journal article

Liu CH, Dai Z, Zhao Y, Crowcroft J, Wu D, Leung KKet al., 2021, Distributed and Energy-Efficient Mobile Crowdsensing with Charging Stations by Deep Reinforcement Learning, IEEE TRANSACTIONS ON MOBILE COMPUTING, Vol: 20, Pages: 130-146, ISSN: 1536-1233

Journal article

Zhang Z, Mudgerikar A, Singla A, Leung KK, Bertino E, Verma D, Chan K, Melrose J, Tucker Jet al., 2021, Reinforcement and transfer learning for distributed analytics in fragmented software defined coalitions, Conference on Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, Publisher: SPIE-INT SOC OPTICAL ENGINEERING, Pages: 1-11, ISSN: 0277-786X

By extending the Software Defined Networking (SDN), the Distributed Analytics and Information Sciences International Technology Alliance (DAIS ITA) https://dais-ita.org/pub has introduced a new architecture called Software Defined Coalitions (SDC) to share communication, computation, storage, database, sensor and other resources among coalition forces. Reinforcement learning (RL) has been shown to be effective for managing SDC. Due to link failure or operational requirements, SDC may become fragmented and reconnected again over time. This paper shows how data and knowledge acquired in the disconnected SDC domains during fragmentation can be used via transfer learning (TL) to significantly enhance the RL after fragmentation ends. Thus, the combined RL-TL technique enables efficient management and control of SDC despite fragmentation. The technique also enhances the robustness of the SDC architecture for supporting distributed analytics services.

Conference paper

Liu CH, Piao C, Ma X, Yuan Y, Tang J, Wang G, Leung KKet al., 2021, Modeling Citywide Crowd Flows using Attentive Convolutional LSTM, 37th IEEE International Conference on Data Engineering (IEEE ICDE), Publisher: IEEE COMPUTER SOC, Pages: 217-228, ISSN: 1084-4627

Author Web Link
Cite
Citations: 9

Conference paper

Tuor T, Wang S, Ko BJ, Liu C, Leung KKet al., 2021, Overcoming Noisy and Irrelevant Data in Federated Learning, 25th International Conference on Pattern Recognition (ICPR), Publisher: IEEE COMPUTER SOC, Pages: 5020-5027, ISSN: 1051-4651

Author Web Link
Cite
Citations: 24

Conference paper

Pritz PJ, Ma L, Leung KK, 2021, Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning, 30th ACM International Conference on Information and Knowledge Management (CIKM), Publisher: ASSOC COMPUTING MACHINERY, Pages: 1447-1456

Conference paper

Pritz PJ, Perez D, Leung KK, 2020, Fast-fourier-forecasting resource utilisation in distributed systems, 29th International Conference on Computer Communications and Networks (ICCCN), Publisher: IEEE, Pages: 1-9, ISSN: 1095-2055

Distributed computing systems often consist of hundreds of nodes (machines), executing tasks with different resource requirements. Efficient resource provisioning and task scheduling in such systems are non-trivial and require close monitoring and accurate forecasting of the state of the system, specifically resource utilisation at its constituent machines. Two challenges present themselves towards these objectives.First, collecting monitoring data entails substantial communication overhead. This overhead can be prohibitively high, especially in networks where bandwidth is limited. Second, forecasting models to predict resource utilisation should be accurate and also need to exhibit high inference speed. Mission critical scheduling and resource allocation algorithms use these predictions and rely on their immediate availability.To address the first challenge, we present a communication-efficient data collection mechanism. Resource utilisation data is collected at the individual machines in the system and transmitted to a central controller in batches. Each batch is processed by an adaptive data-reduction algorithm based on Fourier transforms and truncation in the frequency domain. We show that the proposed mechanism leads to a significant reduction in communication overhead while incurring only minimal error and adhering to accuracy guarantees. To address the second challenge, we propose a deep learning architecture using complex Gated Recurrent Units to forecast resource utilisation. This architecture is directly integrated with the above data collection mechanism to improve inference speed of the presented forecasting model. Using two real-world datasets, we demonstrate the effectiveness of our approach, both in terms of forecasting accuracy and inference speed.Our approach resolves several challenges encountered in resource provisioning frameworks and can also be generically applied to other forecasting problems.

Conference paper

Panigrahy NK, Basu P, Nain P, Towsley D, Swami A, Chan KS, Leung KKet al., 2020, Resource allocation in one-dimensional distributed service networks with applications, Performance Evaluation, Vol: 142, Pages: 1-25, ISSN: 0166-5316

We consider assignment policies that allocate resources to users, where both resources and users are located ona one-dimensional line [0, ∞). First, we consider unidirectionalassignment policies that allocate resources only to users locatedto their left. We propose the Move to Right (MTR) policy, whichscans from left to right assigning nearest rightmost availableresource to a user, and contrast it to the Unidirectional GaleShapley (UGS) matching policy. While both policies among allunidirectional policies, minimize the expected distance traveledby a request (request distance), MTR is fairer. Moreover, weshow that when user and resource locations are modeled bystatistical point processes, and resources are allowed to satisfymore than one user, the spatial system under unidirectionalpolicies can be mapped into bulk service queueing systems, thusallowing the application of many queueing theory results thatyield closed form expressions. As we consider a case wheredifferent resources can satisfy different numbers of users, wealso generate new results for bulk service queues. We alsoconsider bidirectional policies where there are no directionalrestrictions on resource allocation and develop an algorithm forcomputing the optimal assignment which is more efficient thanknown algorithms in the literature when there are more resourcesthan users. Finally, numerical evaluation of performance ofunidirectional and bidirectional allocation schemes yields designguidelines beneficial for resource placement.

Journal article

Han P, Wang S, Leung KK, 2020, Capacity analysis of distributed computing systems with multiple resource types, IEEE Wireless Communications and Networking Conference (IEEE WCNC), Publisher: IEEE, Pages: 1-6, ISSN: 1525-3511

In cloud and edge computing systems, computation, communication, and memory resources are distributed across different physical machines and can be used to execute computational tasks requested by different users. It is challenging to characterize the capacity of such a distributed system, because there exist multiple types of resources and the amount of resources required by different tasks is random. In this paper, we define the capacity as the number of tasks that the system can support with a given overload/outage probability. We derive theoretical formulas for the capacity of distributed systems with multiple resource types, where we consider the power of d choices as the task scheduling strategy in the analysis. Our analytical results describe the capacity of distributed computing systems, which can be used for planning purposes or assisting the scheduling and admission decisions of tasks to various resources in the system. Simulation results using both synthetic and real-world data are also presented to validate the capacity bounds.

Conference paper

Zafari F, Li J, Leung KK, Towsley D, Swami Aet al., 2020, Optimal energy consumption for communication, computation, caching and quality guarantee, IEEE Transactions on Control of Network Systems, Vol: 7, Pages: 151-162, ISSN: 2325-5870

Energy efficiency is a fundamental requirement of modern data-communication systems, and its importance is reflected in much recent work on performance analysis of system energy consumption. However, most work has only focused on communication and computation costs without accounting for data caching costs. Given the increasing interest in cache networks, this is a serious deficiency. In this paper, we consider the problem of energy consumption in data communication, computation and caching (C3) with a quality-of-information (QoI) guarantee in a communication network. Our goal is to identify the optimal data compression rates and cache placement over the network that minimizes the overall energy consumption in the network. We formulate the problem as a mixed integer nonlinear programming (MINLP) problem with nonconvex functions, which is non-deterministic polynomial-time hard (NP-hard) in general. We propose a variant of the spatial branch-and-bound algorithm (V-SBB) that can provide an ϵ-global optimal solution to the problem. By extensive numerical experiments, we show that the C3 optimization framework improves the energy efficiency by up to 88% compared to any optimization that only considers either communication and caching or communication and computation. Furthermore, the V-SBB technique provides comparatively better solutions than some other MINLP solvers at the cost of additional computation time.

Journal article

Zhang Z, Ma L, Leung KK, Poularakis K, Srivatsa Met al., 2020, State Action Separable Reinforcement Learning, 8th IEEE International Conference on Big Data (Big Data), Publisher: IEEE, Pages: 123-132, ISSN: 2639-1589

Conference paper

Qin Q, Poularakis K, Leung KK, Tassiulas Let al., 2020, Line-Speed and Scalable Intrusion Detection at the Network Edge via Federated Learning, 19th IFIP Networking Conference (Networking), Publisher: IEEE, Pages: 352-360

Author Web Link
Cite
Citations: 27

Conference paper

Han P, Wang S, Leung KK, 2020, Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach, 40th IEEE International Conference on Distributed Computing Systems (ICDCS), Publisher: IEEE COMPUTER SOC, Pages: 300-310, ISSN: 1063-6927

Author Web Link
Cite
Citations: 35

Conference paper

Liu CH, Zhao Y, Dai Z, Yuan Y, Wang G, Wu D, Leung KKet al., 2020, Curiosity-Driven Energy-Efficient Worker Scheduling in Vehicular Crowdsourcing: A Deep Reinforcement Learning Approach, IEEE 36th International Conference on Data Engineering (ICDE), Publisher: IEEE COMPUTER SOC, Pages: 25-36, ISSN: 1084-4627

Author Web Link
Cite
Citations: 17

Conference paper

Leung K, Nazemi S, Swami A, 2019, Distributed optimization framework for in-network data processing, IEEE ACM Transactions on Networking, Vol: 27, Pages: 2432-2443, ISSN: 1063-6692

In-Network Processing (INP) is an effective way to aggregate and process data from different sources and forward the aggregated data to other nodes for further processing until it reaches the end user. There is a trade-off between energy consumption for processing data and communication energy spent on transferring the data. Specifically, aggressive data aggregation consumes much energy for processing, but results in less data for transmission, thus using less energy for communications, andvice versa. An essential requirement in the INP process is to ensure that the user expectation of quality of information (QoI) is delivered during the process. Using wireless sensor networks for illustration and with the aim of minimising the total energy consumption of the system, we study and formulate the trade-off problem as a nonlinear optimisation problem where the goal is to determine the optimal data reduction rate, while satisfying the QoI required by the user. The formulated problem is a Signomial Programming (SP) problem, which is a non-convex optimisationproblem and very hard to be solved directly. We propose two solution frameworks. First, we introduce an equivalent problem which is still SP and non-convex as the original one, but we prove that the strong duality property holds, and propose an efficient distributed algorithm to obtain the optimal data reduction rates, while delivering the required QoI. The second framework applies to the system with identical nodes and parameter settings. In such cases, we prove that the complexity of the problem can be reduced logarithmically. We evaluate our proposed frameworks under different parameter settings and illustrate the validity and performance of the proposed techniques through extensive simulation.

Journal article

Shanmukhappa T, Ho IW-H, Tse CK, Leung KKet al., 2019, Recent development in public transport network analysis from the complex network perspective, IEEE Circuits and Systems Magazine, Vol: 19, Pages: 39-65, ISSN: 1049-3654

A graph, comprising a set of nodes connected by edges, is one of the simplest yet remarkably useful mathematical structures for the analysis of real-world complex systems. Network theory, being an application-based extension of graph theory, has been applied to a wide variety of real-world systems involving complex interconnection of subsystems. The application of network theory has permitted in-depth understanding of connectivity, topologies, and operations of many practical networked systems as well as the roles that various parameters play in determining the performance of such systems. In the field of transportation networks, however, the use of graph theory has been relatively much less explored, and this motivates us to bring together the recent development in the field of public transport analysis from a graph theoretic perspective. In this paper, we focus on ground transportation, and in particular the bus transport network (BTN) and metro transport network (MTN), since the two types of networks are widely used by the public and their performances have significant impact to people's life. In the course of our analysis, various network parameters are introduced to probe into the impact of topologies and their relative merits and demerits in transportation. The various local and global properties evaluated as part of the topological analysis provide a common platform to comprehend and decipher the inherent network features that are partly encoded in their topological properties. Overall, this paper gives a detailed exposition of recent development in the use of graph theory in public transport network analysis, and summarizes the key results that offer important insights for government agencies and public transport system operators to plan, design, and optimize future public transport networks in order to achieve more efficient and robust services.

Journal article

Zhang Z, Ma L, Poularakis K, Leung KK, Tucker J, Swami Aet al., 2019, MACS: deep reinforcement learning based SDN controller synchronization policy design, 27th IEEE International Conference on Network Protocols (IEEE ICNP), Publisher: IEEE COMPUTER SOC, Pages: 1-11, ISSN: 1092-1648

In distributed software-defined networks (SDN), multiple physical SDN controllers, each managing a network domain, are implemented to balance centralised control, scalability, and reliability requirements. In such networking paradigms, controllers synchronize with each other, in attempts to maintain a logically centralised network view. Despite the presence of various design proposals for distributed SDN controller architectures, most existing works only aim at eliminating anomalies arising from the inconsistencies in different controllers' network views. However, the performance aspect of controller synchronization designs with respect to given SDN applications are generally missing. To fill this gap, we formulate the controller synchronization problem as a Markov decision process (MDP) and apply reinforcement learning techniques combined with deep neural networks (DNNs) to train a smart, scalable, and fine-grained controller synchronization policy, called the Multi-Armed Cooperative Synchronization (MACS), whose goal is to maximise the performance enhancements brought by controller synchronizations. Evaluation results confirm the DNN's exceptional ability in abstracting latent patterns in the distributed SDN environment, rendering significant superiority to MACS-based synchronization policy, which are 56% and 30% performance improvements over ONOS and greedy SDN controller synchronization heuristics.

Conference paper

Tuor T, Wang S, Leung KK, Ko BJet al., 2019, Online collection and forecasting of resource utilization in large-scale distributed systems, 39th IEEE International Conference on Distributed Computing Systems (ICDCS), Publisher: IEEE COMPUTER SOC, Pages: 133-143, ISSN: 1063-6927

Large-scale distributed computing systems often contain thousands of distributed nodes (machines). Monitoring the conditions of these nodes is important for system management purposes, which, however, can be extremely resource demanding as this requires collecting local measurements of each individual node and constantly sending those measurements to a central controller. Meanwhile, it is often useful to forecast the future system conditions for various purposes such as resource planning/allocation and anomaly detection, but it is usually too resource-consuming to have one forecasting model running for each node, which may also neglect correlations in observed metrics across different nodes. In this paper, we propose a mechanism for collecting and forecasting the resource utilization of machines in a distributed computing system in a scalable manner. We present an algorithm that allows each local node to decide when to transmit its most recent measurement to the central node, so that the transmission frequency is kept below a given constraint value. Based on the measurements received from local nodes, the central node summarizes the received data into a small number of clusters. Since the cluster partitioning can change over time, we also present a method to capture the evolution of clusters and their centroids. As an effective way to reduce the amount of computation, time-series forecasting models are trained on the time-varying centroids of each cluster, to forecast the future resource utilizations of a group of local nodes. The effectiveness of our proposed approach is confirmed by extensive experiments using multiple real-world datasets.

Conference paper

Zhang Z, Ma L, Leung KK, Le F, Kompella S, Tassiulas Let al., 2019, How advantageous Is It? An analytical study of controller-assisted path construction in distributed SDN, IEEE ACM Transactions on Networking, Vol: 27, Pages: 1643-1656, ISSN: 1063-6692

Distributed software-defined networks (SDN), consisting of multiple inter-connected network domains, each managed by one SDN controller, is an emerging networking architecture that offers balanced centralized control and distributed operations. Under such a networking paradigm, most existing works focus on designing sophisticated controller-synchronization strategies to improve joint controller-decision-making for inter-domain routing. However, there is still a lack of fundamental understanding of how the performance of distributed SDN is related to network attributes, thus it is impossible to justify the necessity of complicated strategies. In this regard, we analyze and quantify the performance enhancement of distributed SDN architectures, which is influenced by intra-/inter-domain synchronization levels and network structural properties. Based on a generic network model, we establish analytical methods for performance estimation under four canonical inter-domain synchronization scenarios. Specifically, we first derive an asymptotic expression to quantify how dominating structural and synchronization-related parameters affect the performance metric. We then provide performance analytics for an important family of networks, where all links are of equal preference for path constructions. Finally, we establish fine-grained performance metric expressions for networks with dynamically adjusted link preferences. Our theoretical results reveal how network performance is related to synchronization levels and intra-/inter-domain connections, the accuracy of which is confirmed by simulations based on both real and synthetic networks. To the best of our knowledge, this is the first work quantifying the performance of distributed SDN in terms of network structural properties and synchronization levels.

Journal article

ProfessorKinLeung

Contact

Assistant

Location

Summary