Abstract: Collaborative cross-edge analytics is a new computing paradigm in which Internet of Things (IoT) data analytics is performed across multiple geographically dispersed edge clouds. Existing work on collaborative cross-edge analytics mostly focuses on reducing either analytics response time or wide-area network (WAN) traffic volume. In this work, we empirically demonstrate that reducing either analytics response time or network traffic volume does not necessarily minimize the WAN traffic cost, due to the price heterogeneity of WAN links. To explicitly leverage the price heterogeneity for WAN cost minimization, we propose to schedule analytic tasks based on both price and bandwidth heterogeneities. Unfortunately, the problem of WAN cost minimization underperformance constraint is shown non-deterministic polynomial (NP)-hard and thus computationally intractable for large inputs. To address this challenge, we propose price- and performanceaware geo-distributed analytics (PPGA) , an efficient task scheduling heuristic that improves the cost-efficiency of IoT data analytic jobs across edge datacenters. We implement PPGA based on Apache Spark and conduct extensive experiments on Amazon EC2 to verify the efficacy of PPGA.
Keywords: collaborative cross-edge analytics; Internet of Things; task scheduling