NeTS: Small: Learning-Guided Network Resource Allocation: A Closed-Loop Approach

This is an NSF-sponsored project under the scope of data-driven networking.

Project Team

Personnel

Prof. Xin Liu (PI)

Dr. Huasen Wu (Former coPI)

Dr. Xueying Guo (Postdoc researcher)

Dr. Jiaxin Ding (Postdoc researcher)

Mr. Xiaoxiao Wang (PhD student)

Mr. Shahbaz Rezaei (PhD student)

Mr. Albara Ah Ramli (former undergraduate researcher and current PhD student)

Mr. Cazamere Comrie (Undergraduate researcher)

Thomas Munduchira (Undergraduate researcher)

Bryce Kroencke (Undergraduate researcher)

Faraz Cherukattil (Undergraduate researcher)

Collaborators

Rajarajan Sivaraj, Jing Wang, Jie Chuai, Zhitang Chen, Guochen Liu, Chongming Zhu, Feiyi Shen, and Amit Pande.

Project Goals

Based on network measurement and user behavior data, much recent work has studied the modeling and prediction of network utility and user experience using machine learning techniques. While such work provides important insights, prediction itself is often not the ultimate goal in networks. Ideally, a network could identify users with poor experience and take proper actions to proactively improve the overall performance. To achieve this goal, the project advocates a learning-based approach that uses learned utility functions to guide resource allocation and management in networks. Realizing this framework is highly challenging because the network utility function is unknown and noisy, and because the problem involves high dimensionality, a limited exploration budget, and non-convex optimization. To address these challenges, we adopt and develop techniques from convex and non-convex optimization, multi-armed bandits, and other sequential decision techniques, as well as heuristic algorithms. We conduct both theoretical analysis and empirical evaluations of the proposed schemes.

Final Outcome

Cellular network configuration plays a critical role in network performance. In current practice, network configuration depends heavily on the field experience of engineers and often remains static for long periods of time. This practice is far from optimal. Online-learning-based approaches have great potential to automate and optimize network configuration. However, learning-based approaches face three major challenges: 1) a highly complex function must be learned for each base station; 2) the network has a highly limited budget for exploration, which makes it difficult to balance the fundamental exploration-exploitation tradeoff; and 3) the configuration of one cell can interfere with the performance of neighboring cells, which results in complex interactions. To address these challenges, we have developed multiple approaches: 1) a joint-optimization approach based on learned utility functions; 2) a kernel-based multi-BS contextual bandit algorithm based on multi-task learning; and 3) a collaborative learning approach that leverages data from different cells to boost learning efficiency and improve network performance. We evaluate our proposed algorithm on a simulator constructed using real network data and demonstrate faster convergence than the baselines. More importantly, we also conducted a live field test on a real metropolitan cellular network consisting of 1700+ cells, optimizing 5 parameters over 2 weeks. Our proposed algorithm shows a significant performance improvement of 20%.

Motivated by the study of learning algorithms in cellular network configuration, we propose opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load variation. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal configuration). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We show the superiority of the proposed algorithm theoretically and empirically.
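The intuition above can be sketched in a few lines of Python. This is only an illustration, not the published AdaLinUCB algorithm: the class name OpportunisticLinUCB, the parameter alpha, and the choice to scale the confidence bonus by sqrt(1 - load), with load being the exploration cost normalized to [0, 1], are assumptions made for this example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]

class OpportunisticLinUCB:
    """LinUCB-style learner whose exploration bonus shrinks as load grows.

    Hypothetical sketch: the sqrt(1 - load) scaling is an assumption
    used to illustrate "explore more when exploration is cheap".
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # Inverse of A = I + sum of x x^T (ridge regularization of 1).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward-weighted contexts

    def theta(self):
        """Ridge-regression estimate of the linear payoff parameter."""
        return mat_vec(self.A_inv, self.b)

    def index(self, x, load):
        """Estimated payoff plus a load-dependent confidence bonus."""
        load = min(max(load, 0.0), 1.0)
        width = math.sqrt(max(0.0, dot(x, mat_vec(self.A_inv, x))))
        return dot(self.theta(), x) + self.alpha * math.sqrt(1.0 - load) * width

    def select(self, contexts, load):
        """Return the position of the arm with the largest index."""
        scores = [self.index(x, load) for x in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, x, reward):
        """Rank-one update of A_inv (Sherman-Morrison) and of b."""
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]

# Usage: after a few pulls of arm 0, the learner exploits under high load
# and explores the untried arm under low load.
agent = OpportunisticLinUCB(dim=2, alpha=1.0)
for _ in range(5):
    agent.update([1.0, 0.0], 0.5)
arms = [[1.0, 0.0], [0.0, 1.0]]
print(agent.select(arms, load=1.0), agent.select(arms, load=0.0))
```

With load equal to 1 the bonus vanishes and the learner acts greedily; with load equal to 0 it behaves like a standard upper-confidence-bound rule.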

As mobile networks proliferate, we are experiencing a strong diversification of services, which requires greater flexibility from the existing network. Network slicing has been proposed as a promising solution for resource utilization in 5G and future networks to address this need. In network slicing, dynamic resource orchestration and network slice management are crucial for maximizing resource utilization. Unfortunately, this process is too complex for traditional approaches to be effective, because accurate models are lacking and the network contains dynamic hidden structures. We formulate the problem as a Constrained Markov Decision Process (CMDP) without assuming knowledge of the models or hidden structures. We then propose to solve the problem using CLARA, a Constrained reinforcement LeArning based Resource Allocation algorithm. In particular, we handle cumulative and instantaneous constraints using adaptive interior-point policy optimization and a projection layer, respectively. Evaluations show that CLARA clearly outperforms baselines in resource allocation with service demand guarantees. Motivated by this need, we have also developed a constrained reinforcement learning algorithm, called IPO, that is generally applicable to other applications, and conducted a survey of constrained RL.

Broader Impact

The PI or her students have presented the work at multiple international conferences in both the networking and machine learning fields. Such efforts broaden participation and interaction between the networking and machine learning communities. The PI presented the results to collaborators in academia and industry, including AT&T, Verizon, Intel, and Target, which enhances the interaction between academia and industry.

This project partially supported the training of two postdoc researchers, five PhD students, and six undergraduate students. The PI has made a significant effort to recruit and train underrepresented minority students, including five female students and one Black student. These trainees have developed important technical, communication, collaboration, and leadership skills. One trainee has joined academia as a faculty member.

Accomplishments

Major Activities

In this report period, we have started to study learning-based approaches for resource allocation and management in network slicing, a promising solution for resource utilization in 5G and future networks. Specifically, with the proliferation of mobile networks, we face a strong diversification of services, which demands more flexibility from the current network. To satisfy this need, network slicing is being considered by both academia and industry. In network slicing, dynamic resource orchestration and network slice management are critical for resource efficiency. However, the problem is so complicated that traditional approaches cannot perform resource orchestration effectively, owing to the lack of accurate models and to hidden problem structures. First, traditional optimization approaches require accurate mathematical models with known parameters, which are often difficult to obtain in practice, especially given the increasing complexity, scale, and service diversity of 5G and future networks. Second, traditional methods do not adapt to epistemic uncertainty, which appears as hidden structures in networks, because they lack knowledge of the studied system and the ability to explore and learn from it. For example, a user who experiences poor quality of service may decide not to use the service again or to use it less frequently. Such hidden structures can significantly affect user experience and network performance, but are usually not directly observed or modeled.

To address this challenge, we propose learning-based approaches because they can explore and learn from the environment without assuming knowledge of accurate models. The industry has recognized machine learning (ML) as a core technology for future telecommunication networks, including 5G and beyond. Recently, a growing body of learning-based network research has shown significant performance improvements. However, few previous works have analyzed the resource allocation problem under the constraints imposed by service requirements, which are crucial for network slicing.

We first formulate the problem as a reinforcement learning (RL) problem with constraints. RL with constraints is usually modeled as a Constrained Markov Decision Process (CMDP), where the agent must respect constraints in addition to maximizing reward. There are two types of constraints: instantaneous constraints (e.g., maximum transmission power) and cumulative constraints (e.g., average latency). An instantaneous constraint is one that the chosen action must satisfy at every step. A cumulative constraint requires that the sum of a constraint variable from the beginning to the current time step stay within a certain limit. We have developed a framework called CLARA, a Constrained reinforcement LeArning based Resource Allocation algorithm. In particular, we handle cumulative and instantaneous constraints using adaptive interior-point policy optimization and a projection layer, respectively. Evaluations show that CLARA clearly outperforms baselines in resource allocation with service demand guarantees.
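The two constraint types can be illustrated with a minimal Python sketch. It is not CLARA's implementation: the function names, the resource-budget feasible set (non-negative allocations whose total does not exceed a capacity), and the use of a Euclidean projection are assumptions chosen for this example, since the text above does not specify the form of the projection layer.

```python
def project_to_budget(action, capacity):
    """Euclidean projection onto {a : a_i >= 0, sum(a) <= capacity}.

    Illustrates an instantaneous constraint: every raw action is mapped
    to a feasible allocation before it is applied. (Hypothetical choice
    of feasible set; a real system would use its own constraints.)
    """
    clipped = [max(a, 0.0) for a in action]
    if sum(clipped) <= capacity:
        return clipped
    # Budget exceeded: project onto the simplex {a >= 0, sum(a) = capacity}
    # with the standard sort-based threshold search.
    tau = 0.0
    running = 0.0
    for k, u_k in enumerate(sorted(action, reverse=True), start=1):
        running += u_k
        candidate = (running - capacity) / k
        if u_k - candidate > 0:
            tau = candidate  # last k for which the entry stays positive
    return [max(a - tau, 0.0) for a in action]

def cumulative_constraint_ok(costs, limit):
    """Check a cumulative constraint: total cost so far within the limit."""
    return sum(costs) <= limit

# Usage: an over-budget request is projected back into the feasible set,
# while the cumulative check looks at the whole history of step costs.
print(project_to_budget([3.0, 1.0, -1.0], capacity=2.0))
print(cumulative_constraint_ok([1.0, 2.0, 0.5], limit=4.0))
```

The projection guarantees feasibility at every step regardless of what the policy outputs, whereas the cumulative check can only be evaluated over a trajectory, which is why the two constraint types call for different mechanisms.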

We propose a first-order optimization method, Interior-point Policy Optimization (IPO), to solve CMDPs with different types of cumulative constraints. Specifically, inspired by the interior-point method, we augment the objective function of IPO with logarithmic barrier functions as penalty functions to accommodate the constraints. Intuitively, we would like to construct functions such that 1) if a constraint is satisfied, the penalty added to the reward function is zero, and 2) if the constraint is violated, the penalty goes to negative infinity. Logarithmic barrier functions approximate these requirements with a differentiable function, are easy to implement, and have nice analytical properties. For policy optimization, we leverage PPO and thus inherit its trust region property. We note that other policy optimization algorithms can be integrated when needed, which increases the flexibility of the proposed methodology. Our algorithm is simple to implement and the hyperparameters are easy to tune. We conduct extensive evaluations to compare our approach with state-of-the-art baselines in constrained reinforcement learning. Our algorithm outperforms the baseline algorithms in terms of both reward maximization and constraint satisfaction.
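The barrier idea can be made concrete with a small Python sketch. The function names, the sharpness parameter t, and the placeholder value standing in for the PPO surrogate objective are assumptions for illustration; the sketch shows only the shape of the penalty, not the full training loop.

```python
import math

def log_barrier(gap, t):
    """Logarithmic barrier log(-gap) / t for a constraint gap <= 0.

    gap is (expected cumulative cost) - (limit), so gap < 0 means the
    constraint is satisfied. A larger t makes the barrier a closer
    approximation of the ideal penalty (zero when satisfied, negative
    infinity when violated). t is a hypothetical tuning parameter here.
    """
    if gap >= 0:
        return float("-inf")  # constraint violated or exactly at the limit
    return math.log(-gap) / t

def ipo_objective(reward_objective, costs, limits, t):
    """Reward objective augmented with one barrier term per constraint.

    reward_objective stands in for the policy-optimization surrogate
    (PPO in the text above); here it is just a number.
    """
    return reward_objective + sum(
        log_barrier(cost - limit, t) for cost, limit in zip(costs, limits)
    )

# Usage: the same feasible policy is penalized less as t grows, and an
# infeasible policy always receives an objective of negative infinity.
for t in (10.0, 100.0):
    print(t, ipo_objective(1.0, costs=[1.0], limits=[1.5], t=t))
print(ipo_objective(1.0, costs=[2.0], limits=[1.5], t=100.0))
```

Because the barrier is differentiable wherever the constraint holds, the augmented objective can be optimized with the same first-order updates as the unconstrained one.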

Communities of interest

The PI or her students presented their work at multiple international conferences in both the networking and machine learning fields. The PI presented the results to collaborators in industry, including AT&T, Target, and Verizon.

The impact on the development of the principal discipline(s) of the project

The results of this project help the community better understand how to integrate learning into network resource management, from both theoretical and practical aspects.

The impact on other disciplines

Our algorithms and analysis further advance the state of the art in machine learning, in particular in multi-armed bandit algorithms.

The impact on the development of human resources

The project involves the PI, two postdoc researchers, five PhD students, and six undergraduate researchers. All of them have gained significant experience in terms of collaboration, communication, and leadership skills, in addition to technical skills.

As specified in the proposal, the PI has made a significant effort to recruit and train undergraduate researchers, as well as researchers from under-represented groups, including an African American student. The PI has also extended offers to a Hispanic student and a female student.

Products

Related publications:

CLARA: A Constrained Reinforcement Learning Based Resource Allocation Framework for Network Slicing [pdf]
Yongshuai Liu, Jiaxin Ding, Zhi-Li Zhang, and Xin Liu
IEEE International Conference on Big Data (IEEE BigData), 2021.
Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey [pdf]
Yongshuai Liu, Avishai Halev, and Xin Liu
The 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021
Can Online Learning Increase the Reliability of Extreme Mobility Management? [pdf]
Y. Li, E. Datta, J. Ding, N. B. Shroff, and X. Liu
IEEE/ACM International Symposium on Quality of Service, 2021.
On the Difficulty of Membership Inference Attacks [pdf]
Shahbaz Rezaei, Xin Liu
International Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Adaptive Federated Dropout: Improving Communication Efficiency and Generalization for Federated Learning [pdf]
Nader Bouacida, Jiahui Hou, Hui Zang, Xin Liu, arXiv.
IEEE INFOCOM Workshop on Distributed Machine Learning and Fog Networks 2021.
IPO: Interior-point Policy Optimization under Constraints [pdf]
Yongshuai Liu, Jiaxin Ding, Xin Liu
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning [pdf]
Shahbaz Rezaei, Xin Liu
The International Conference on Learning Representations (ICLR), 2020.
AdaLinUCB: Opportunistic Learning for Contextual Bandits [pdf]
Xueying Guo, Xiaoxiao Wang, and Xin Liu
The 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration [pdf]
Xiaoxiao Wang, Xueying Guo, Jie Chuai, Zhitang Chen, and Xin Liu
IEEE BigData Conference, 2019.
Adaptive Learning-Based Task Offloading for Vehicular Edge Computing Systems [pdf]
Y. Sun, X. Guo, J. Song, S. Zhou, Z. Jiang, X. Liu, and Z. Liu, arXiv.
Large-scale Mobile App Identification Using Deep Learning [pdf]
Shahbaz Rezaei, Bryce Kroencke, Xin Liu
IEEE Access, 2019.
A Collaborative Learning Based Approach for Parameter Configuration of Cellular Networks [pdf]
Jie Chuai, Zhitang Chen, Guochen Liu, Xueying Guo, Xiaoxiao Wang, Xin Liu, Chongming Zhu, Feiyi Shen.
IEEE INFOCOM, 2019.
How to Achieve High Classification Accuracy with Just a Few Labels: A Semi-supervised Approach Using Sampled Packets [pdf]
Shahbaz Rezaei and Xin Liu.
19th Industrial Conference on Data Mining (ICDM), 2019.
Deep Learning for Encrypted Traffic Classification: An Overview [pdf]
Shahbaz Rezaei, Xin Liu.
IEEE Communications Magazine, 2019.

Outreach and Broader Impacts

As specified in the proposal, the PI and her team have been working on generating broader impacts through this project. The PI and her students presented their work at multiple international conferences in both the networking and machine learning fields. The PI presented the results to collaborators in industry, including AT&T, Target, and Verizon. The PI has made a significant effort to recruit and train undergraduate researchers, as well as researchers from under-represented groups. Currently, one of the undergraduate researchers is from an under-represented minority group. The PI has also continued her efforts to recruit and train female and under-represented minority students. For example, she extended offers to a Hispanic student and a female student. The PI has continued her efforts to recruit and train undergraduate researchers. Currently, four undergraduate researchers work in the group.