
Modeling TCP Incast Issue in Data Center Networks and an Adaptive Application-Layer Solution


1.Introduction

In data centers, an application often requests data from numerous servers simultaneously. This results in a many-to-one traffic pattern where multiple servers concurrently send data fragments to a single client via a bottleneck switch. For instance, in web search, a query is partitioned and assigned to many workers, and the workers' responses are then aggregated to generate the final result. As the number of concurrent senders increases, the bottleneck switch buffer, which is usually shallow[1], can easily overflow, leading to massive packet losses and subsequent transmission control protocol (TCP) timeouts. Because the minimum retransmission timeout (minRTO) is much greater than the round-trip delay in data centers, even one timeout remarkably prolongs the overall data transmission time. Hence, the client's goodput drops one or even two orders of magnitude below the link capacity. Such catastrophic TCP goodput collapse for applications with a many-to-one traffic pattern is referred to as TCP incast[1].

Many solutions to the TCP incast issue have been proposed at different layers. For instance, at the Ethernet layer, FQCN[2] uses explicit network feedback to control congestion among switches; at the Internet protocol (IP) layer, [3] explores the effectiveness of tuning explicit congestion notification (ECN) to mitigate incast, and CP[4] drops only the packet payload instead of the entire packet upon congestion to reduce the timeout possibility; at the transport layer, data center TCP (DCTCP)[5], PRIN[6], and pFabric[7] decrease the timeout possibility through the cooperation of end-hosts and network; by adding a "shim layer" to the data receiver, incast congestion control for TCP (ICTCP)[8], proactive ACK control (PAC)[9], and deadline and incast aware TCP (DIATCP)[10] proactively adjust the in-flight data size to reduce packet losses; and at the application layer, [1] and [11]-[13] restrict the number of concurrent connections to avert incast. Among these proposals, the application-layer solutions are the most practical for their low deployment barrier and minimal impact on ordinary one-to-one applications. Indeed, Ethernet-layer solutions require hardware revisions not supported by current commodity switches, and transport-layer and shim-layer solutions may cause fairness issues for ordinary applications running on regular TCP. By contrast, application-layer solutions merely regulate the data transfer of applications with the many-to-one traffic pattern, so they are easy to deploy and pose no side effect on ordinary applications.


Despite the numerous application-layer solutions in the literature, few works analytically study how an application's regulation of data transfer can affect incast. Currently, there are several analytical models for the TCP incast problem[1],[6],[11],[14]. However, most of them either ignore the possible existence of background TCP traffic[1],[11], or oversimplify the application layer as a dumb data source/sink without the ability to control data transfers[6],[14]. Therefore, the existing models provide few useful insights into addressing incast from the application layer, which explains why the current application-layer solutions can only avert incast in known and predefined network environments, e.g., where the bottleneck link is the last-hop link to the receiver[12],[13], or where the available bottleneck bandwidth is known by the receiver[1],[11]. But in real data centers, many network parameters may change drastically and cannot be known in advance. For instance, the bottleneck link often varies due to load balancing, and the available bandwidth may fluctuate drastically in the presence of background TCP traffic. In these varying environments, the current solutions often fail to effectively prevent TCP incast, as demonstrated in Section 4.

In this paper, we intend to understand and solve the TCP incast problem from the viewpoint of applications. Toward this goal, we first develop an analytical model to comprehensively investigate the impact of the application layer on incast. Then, guided by the model's insights, we propose an adaptive application-layer control mechanism to eliminate incast.

Since incast is essentially caused by TCP timeout, here we focus on modeling the relationship between the occurrence of timeout and various application-related factors, including the network environment and connection variables. Compared with the existing models[1],[6],[11],[14], our model is based on more general assumptions about TCP behaviors, and it considers the impact of background TCP connections on incurring timeout. In addition, combined with optimization theory, the model provides some insightful guidelines for dynamically tuning connection variables to minimize the incast probability in a wide range of network environments.

According to the theoretical results, the crux of avoiding timeout is to adaptively adjust the number of concurrent connections and to allocate the sending rate equally among connections. Following this idea, we design an application-layer incast probability minimization mechanism (AIPM), which only modifies receiver-side applications to avert timeout. With AIPM, a client (receiver) concurrently sets up a small number of connections, and assigns an equal TCP advertised window (awnd) to each connection. After a connection finishes transferring data, a new connection can be started. The regulation of awnd is based on the network settings and the number of concurrent connections. The concurrency level is decided by a sliding connection-window mechanism similar to TCP's congestion control. Under this mechanism, the connection-window size grows gradually upon a successful data transmission and shrinks when a connection is "lost". A connection is considered lost when three new connections have finished since the last time it transmitted new data. A lost connection is terminated at once and re-established when the connection window allows.

The simulations show that AIPM effectively eludes incast and consistently achieves high performance in various scenarios. Particularly, in a leaf-spine-topology network with dynamic background TCP traffic, AIPM's goodput stays above 68% of the bottleneck capacity, while the proposals in [1] and [11] both suffer from rather low goodput (< 30%) due to incast.


The major contributions of this paper are twofold:

● We build an analytical model to disclose the influence of the network environment and connection variables on the occurrence of incast. The model provides some insightful guidelines for tuning connection variables to minimize the incast probability.

● We design an adaptive application-layer solution to the TCP incast problem. To the best of our knowledge, it is the first application-layer solution that can efficiently control incast in network environments with dynamic background TCP traffic and multiple bottleneck links.

The rest of the paper is organized as follows. Section 2 describes our model for incast probability and provides some insightful guidelines for taming incast. Section 3 proposes an adaptive application-layer solution to incast, namely, AIPM. In Section 4 we evaluate AIPM in various scenarios using NS2 simulations. Section 5 concludes the paper.


2.Modeling and Minimizing TCP Incast Probability

As a data requestor, the receiver-side application (i.e., the client) can implicitly manage data transmission by adjusting connection variables, including the number of concurrent connections and the sending window size of each connection. This fact enlightens us to model the TCP incast probability as a function of the connection variables. Based on this model, we derive the conditions under which the incast risk is minimal.

2.1 Notations and Assumptions

First of all, we put forward an important concept related to the data transfers of concurrently living connections, called a round. The first round starts at the beginning of an endpoint's data transmission and lasts one round-trip time (RTT). Each subsequent round starts at the end of the previous round and lasts one RTT.

Then consider the TCP incast scenario in Fig. 1. Let R be the number of rounds for transmitting the data. During the kth round, n(k) servers send data fragments, formally termed server request units (SRUs), to a client via a common bottleneck link. The ith server's sending rate is xi(k), and its sending window is wi(k), for 1≤i≤n(k). The bottleneck link is also shared by m(k) background connections from other applications, whose sending rates are yj(k) for 1≤j≤m(k). The bottleneck link's capacity is C packets per second, its buffer size is B packets, and the propagation RTT is D. The remaining notations are summarized in Table 1.

We also make some assumptions to facilitate modeling. Firstly, we assume that the queuing policy of the bottleneck link is drop-tail. Secondly, we assume that the spare bottleneck buffer is negligible compared with the sum of the sending windows. This assumption is reasonable due to the "buffer pressure" phenomenon and the fact that switch buffers are usually very shallow in data centers[1]. Thirdly, we assume that a connection experiences timeout only if its entire window of packets is lost. Prior studies[5],[14] have shown that full-window loss is the dominant kind of timeout causing TCP incast, so other kinds of timeout can be neglected while modeling incast. Lastly, we assume that minRTO = 200 ms by default, which is significant compared with the overall transmission time of the requested data block, so even one timeout leads to TCP incast with drastic goodput degradation.

Fig. 1. General scenario of TCP incast, where multiple servers concurrently transmit data fragments (SRUs) to a single client.

Table 1: Meanings of the commonly used parameters

Symbol    Meaning
xi        Sending rate of a given connection
wi        Sending window size of a given connection
x         Set of all existing connections' sending rates
X         Sum of the sending rates of existing connections
Y         Sum of the sending rates of all background connections
n         Number of concurrently existing connections
pi        Packet loss rate of a given connection
awndi     Advertised window size of a given connection
B         Bottleneck buffer size
C         Bottleneck capacity
D         Round-trip delay without queuing

2.2 Probability of TCP Incast

Now we begin to model the probability of TCP incast as a function of the current network condition (i.e., B, C, D, and yj(k)) and the connection variables (i.e., n(k) and xi(k)).

First, recall that the client will suffer from incast if it has at least one timeout during the R rounds, so the incast probability can be expressed by

Pincast = 1 − ∏_{k=1}^{R} ∏_{i=1}^{n(k)} [1 − Pi(k)],   (1)

where Pi(k) is the timeout probability of the ith connection in the kth round.

The timeout probability Pi(k) of a connection is jointly determined by the sending window wi(k) and the packet loss rate pi(k) of that connection. Specifically, since we consider full-window loss as the only cause of timeout, Pi(k) equals

Pi(k) = [pi(k)]^wi(k).   (2)

The next task is to derive the packet drop rate pi(k). As assumed previously, the spare bottleneck buffer is negligible compared with the sum of the servers' sending windows, which means that the bottleneck link drops packets once the servers start transmitting data. We denote X(k) = Σi xi(k) as the sum of the servers' sending rates, and Y(k) = Σj yj(k) as the sum of the background connections' sending rates. During the kth round, packets are injected into the network at the total rate of X(k)+Y(k), and are processed by the bottleneck link at the rate of C. Hence, the drop rate pi(k) is

pi(k) = [X(k)+Y(k)−C] / [X(k)+Y(k)].   (3)


Moreover, a connection’s sending window wi(k) is related with its sending rate xi(k) and RTT D as

Eventually, by substituting (2) to (4) into (1), we derive the TCP incast probability as an analytical function of the network condition (i.e., B, C, D, and Y(k)) and the connection variables (i.e., n(k) and xi(k)) as follows:

Pincast = 1 − ∏_{k=1}^{R} ∏_{i=1}^{n(k)} {1 − [p(k)]^{xi(k)·D}}, where p(k) = [X(k)+Y(k)−C] / [X(k)+Y(k)].   (5)
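To make the model concrete, the sketch below evaluates (1) to (5) numerically; it is a direct transcription of the formulas, and all parameter values in the example are illustrative assumptions rather than measurements from the paper.

def incast_probability(rates_per_round, Y, C, D):
    """rates_per_round: one list of per-connection sending rates xi
    (packets/s) for each of the R rounds; Y: total background rate;
    C: bottleneck capacity (packets/s); D: round-trip delay (s)."""
    timeout_free = 1.0
    for rates in rates_per_round:
        total = sum(rates) + Y
        p = max(0.0, (total - C) / total)       # drop rate, from (3)
        for xi in rates:
            w = xi * D                          # sending window, from (4)
            timeout_free *= 1.0 - p ** w        # full-window loss, (1)-(2)
    return 1.0 - timeout_free                   # incast probability, (5)

# Illustrative example: 16 senders jointly overshoot a 1 Gbps link
# (125 000 packets/s at 1 kB packets) by 20% for 8 rounds.
C, D, Y = 125_000.0, 100e-6, 0.0
rounds = [[1.2 * C / 16] * 16 for _ in range(8)]
print(incast_probability(rounds, Y, C, D))      # close to 1.0

With 16 senders overshooting the capacity by only 20% over 8 rounds, the computed incast probability is already close to 1, matching the intuition that a mild but persistent overload makes at least one full-window loss almost certain.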

2.3 Minimizing Incast Probability

As (5) indicates, to minimize the incast probability Pincast, we must minimize the timeout probability in every round. That is, we must maximize the timeout-free probability that no connection incurs timeout in any round k, as follows:

f = ∏_{i=1}^{n(k)} {1 − [p(k)]^{xi(k)·D}}.   (6)

Here we explore how the client can maximize the timeout-free probability (6) by adjusting the sending rates x and the number of connections n(k). To reveal the individual impact of each parameter on timeout, we analyze it while keeping the other parameters unchanged. Because the analysis below focuses on maximizing (6) in every single round, we omit the round number k from the notations, e.g., writing n(k) as n.


1) Adjusting the sending rates, x: We fix the other parameters. Then maximizing the timeout-free probability in (6) amounts to solving the optimization problem below:

maximize f(x) = ∏_{i=1}^{n} (1 − p^{xi·D}) subject to x1+x2+...+xn = X and xi ≥ 0.   (7)

It is straightforward to check that the Hessian matrix of −ln[f(x)] is positive semi-definite over the region x ≥ 0, which means that ln[f(x)] is a concave function. This greatly simplifies the analysis. Specifically, the method of Lagrange multipliers states that ln[f(x)] is globally maximized by a sending rate allocation if and only if

∂ln[f(x)]/∂x1 = ∂ln[f(x)]/∂x2 = ... = ∂ln[f(x)]/∂xn.   (8)

Finally, since TCP’s sending window size is upperbounded by the advertised window size (awnd), AIPM can emulate the equal sending window assignment in (13) by assigning an identical awnd to its connections as


The unique solution of (8) that maximizes ln[f(x)] is

xi = X/n, for 1≤i≤n,   (9)

which also maximizes the timeout-free probability (6) and thus minimizes the incast probability.
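As a quick sanity check of (9), the snippet below compares the equal split of a fixed total rate X against random splits; per the concavity argument, no random split should beat the equal one. The parameter values are illustrative assumptions.

import random

C, D, Y, n = 125_000.0, 100e-6, 20_000.0, 8    # illustrative values
X = 1.1 * C                                    # fixed total sending rate
p = (X + Y - C) / (X + Y)                      # drop rate (3), fixed given X

def timeout_free(rates):
    prob = 1.0
    for xi in rates:
        prob *= 1.0 - p ** (xi * D)            # per (2) and (4)
    return prob

equal = timeout_free([X / n] * n)
best_random = 0.0
for _ in range(10_000):
    cuts = sorted(random.uniform(0, X) for _ in range(n - 1))
    rates = [b - a for a, b in zip([0.0] + cuts, cuts + [X])]
    best_random = max(best_random, timeout_free(rates))

print(equal, best_random)                      # the equal split wins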

Remark 1: To minimize the incast risk, the connections should always be given the same sending rate. This operation is feasible at the client application, as the client knows the number of concurrent connections n, and it can implicitly control each connection's sending rate xi by modifying the TCP advertised window field in acknowledgement (ACK) packets.

Remark 2: While the sum of sending rates X is assumed constant for deriving (9), the client can actually change X by tuning the sending rates. But as proved in the Appendix, the optimal X that minimizes the incast probability depends on the background traffic Y. Since the client does not know Y, it is unable to properly set X to prevent incast.

2) Adjusting the number of concurrent connections, n: We fix the other parameters. According to (6) and (9), we optimally set the sending rates to xi = X/n for 1≤i≤n and rewrite the timeout-free probability as

f(n) = [1 − p^{XD/n}]^n.   (10)

Let W = XD, which is a constant in this case. To maximize f(n), we calculate the first derivative of ln[f(n)]:

d ln[f(n)]/dn = ln(1 − p^{W/n}) + [W·ln(p)·p^{W/n}] / [n·(1 − p^{W/n})],   (11)

which is always negative since p lies in (0, 1). This suggests that the timeout-free probability (10) decreases with the number of concurrent connections n. Hence, the incast probability is an increasing function of n and is minimized at n = 1.
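The monotonicity claimed by (11) is easy to observe numerically; in the illustrative sketch below (assumed parameter values), the per-round timeout-free probability (10) shrinks steadily as n grows.

C, D, Y = 125_000.0, 100e-6, 20_000.0          # illustrative values
X = 1.1 * C                                    # fixed total sending rate
p = (X + Y - C) / (X + Y)                      # drop rate, fixed given X, Y

for n in (1, 2, 4, 8, 16, 32):
    f_n = (1.0 - p ** (X * D / n)) ** n        # equation (10)
    print(n, f_n)                              # strictly decreasing in n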


Remark: The client should lower the number of concurrent connections n to reduce the incast risk. On the other hand, an excessively small n may cause a great waste of bandwidth in cases where each connection has so little data (one SRU) to send that it finishes before fully utilizing the bandwidth. How to properly set n is discussed in the next section.

3.Minimizing Incast Probability at Application Layer

Based on the analyses of (9) and (11), we propose the AIPM scheme. AIPM is implemented in the client application, and it minimizes the incast probability by equally allocating the advertised windows of concurrent connections and dynamically adjusting the number of concurrent connections.

3.1 Allocate Equal Advertised Window to Connections

As (9) indicates, the risk of TCP incast is minimal if the existing connections have an equal sending rate. However, AIPM is essentially a part of the client application, which means it cannot directly control each connection's sending rate. Therefore, AIPM emulates the equal sending rate allocation by setting awnd at the client (e.g., via the setsockopt() application programming interface (API) in Berkeley software distribution (BSD) systems) as follows.

First, according to (4) and (9), the ideal sending rate allocation is equivalent to the following sending window allocation:

wi = XD/n, for 1≤i≤n,   (12)

where X is the total sending rate of AIPM's n connections, and D is the RTT without queuing.

Next, AIPM should let X equal the bottleneck capacity C, so that it can fully utilize the bottleneck link without self-induced drops. From X = C and (12), we derive that

wi = CD/n, for 1≤i≤n.   (13)

Finally, since TCP's sending window size is upper-bounded by the advertised window size (awnd), AIPM can emulate the equal sending window assignment in (13) by assigning an identical awnd to its connections as

awndi = CD/n, for 1≤i≤n,   (14)

where awndi is the awnd of connection i. Although such emulation is suboptimal, it avoids a polarized allocation of sending window sizes and thus decreases the timeout probability.
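As an illustration of how a receiver-side application might apply (14), the sketch below caps each connection's advertised window through its receive buffer before connecting; the helper name and constants are assumptions for illustration, not AIPM's actual code.

import socket

C_BYTES = 10**9 / 8          # assumed 1 Gbps bottleneck, in bytes/s
D = 100e-6                   # round-trip delay without queuing, in s

def connect_with_awnd(addr, n):
    """Open one of n concurrent connections with awnd capped per (14)."""
    awnd = max(2048, int(C_BYTES * D / n))     # awndi = C*D/n, in bytes
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The receive buffer upper-bounds the advertised window; set it
    # before connect() so the handshake already reflects it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, awnd)
    sock.connect(addr)
    return sock

Setting the buffer before connect() matters because the TCP window scaling option is negotiated during the handshake based on the buffer size at that moment.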

Of course, even if AIPM strictly follows (14) and equally allocates awnd to its connections, timeout may still happen due to the background traffic. Therefore, AIPM must further decrease the timeout risk by adaptively tuning the number of connections. Besides, it must quickly recover data transmission from timeout connections. These two demands are met by the following two mechanisms, respectively.

3.2 Determine the Proper Number of Concurrent Connections with a Sliding-Window Mechanism

To reduce the timeout probability while keeping goodput high, AIPM must selectively connect to a subset of the servers at a time. The question is which servers AIPM should connect to. To find the answer, AIPM employs a sliding-window-like mechanism to maintain a window of the concurrently existing connections. We term this window the connection window, or con_wnd for short. When the number of existing connections is less than the con_wnd size, AIPM establishes a new connection and admits it to con_wnd. When a connection finishes, AIPM removes it from con_wnd.

AIPM uses an additive increase multiplicative decrease (AIMD) policy to decide the con_wnd size n, with the initial value n = 1. Whenever a connection in con_wnd completes, AIPM infers that the bottleneck link is not congested, and thus gradually increases n according to the additive-increase rule (15).




After detecting a timeout, AIPM realizes that the current timeout probability is too high. According to (11), AIPM can reduce the timeout probability by lowering the number of concurrent connections n while fixing the other parameters. Hence, AIPM halves con_wnd as follows:

n ← n/2.   (16)

Because AIPM does not reduce the total awnd when performing (16), the live connections that have not timed out can quickly occupy the spare bandwidth and maintain relatively high link utilization.

3.3 Fast Reconnection and Slow Withdrawal

The growth of the number of concurrent connections inevitably leads to timeout. Since minRTO (200 ms) is much greater than a connection's ordinary lifetime (mostly less than 1 ms), AIPM deduces that a connection has timed out if three new connections have been admitted and have finished since the last time the connection transmitted any data.

1) Fast reconnection: For a timeout-broken data connection, AIPM terminates the connection (by sending a finish (FIN) segment to the connection's server) and removes it from con_wnd. AIPM then reconnects to the data server as soon as con_wnd allows. Since an SRU is small (typically dozens of kB), the server can retransmit it within a negligible period of time. This scheme, termed fast reconnection, enables AIPM to quickly recover data transmission from timeout-broken servers rather than being slowed down by TCP's sluggish timeout retransmission mechanism.

2) Slow withdrawal: When timeout occurs to some connections, if AIPM naively followed (16) and instantly halved the number of concurrent connections, it might have to close some live connections that have not timed out. However, these live connections have already cut their sending windows after seeing packet losses and are unlikely to cause more timeouts. Therefore, closing these live connections at once would result in unnecessary data retransmissions and might even lead to link under-utilization.

To solve the above two issues, AIPM adopts the slow withdrawal scheme in the presence of timeout. Specifically, upon detecting a timeout, AIPM records the current con_wnd size n, and then slowly decreases con_wnd by one each time a live connection finishes. This means that AIPM does not close any live connection that is still transmitting data. Moreover, this gives the live connections sufficient time to grow their congestion windows and to fully utilize the spare bandwidth left by the closed or timeout connections (since the delays in data centers are so small, the live connections can grow their congestion windows very fast). Once con_wnd shrinks to n/2, AIPM ends slow withdrawal and resumes the normal AI operation (15).
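Putting Sections 3.2 and 3.3 together, the condensed sketch below captures the con_wnd control loop: additive increase on completion, a single halving target with slow withdrawal on timeout, and fast reconnection of timeout-broken servers. The class interface and the exact AI step are assumptions for illustration, since the paper's rule (15) is given only by reference.

class ConnectionWindow:
    """Sketch of AIPM's con_wnd controller (interface assumed)."""

    def __init__(self):
        self.size = 1.0               # con_wnd size n, starts at 1
        self.withdraw_target = None   # set while in slow withdrawal

    def on_connection_finished(self):
        if self.withdraw_target is not None:
            # Slow withdrawal: shrink by one per finished connection,
            # never closing a live connection, until reaching n/2.
            self.size -= 1.0
            if self.size <= self.withdraw_target:
                self.withdraw_target = None   # resume normal AI
        else:
            self.size += 1.0   # additive-increase step of (15), assumed

    def on_timeout(self):
        # Fast reconnection: the broken connection is closed (FIN) and
        # re-admitted later, whenever con_wnd has room again.
        # Halve only once per episode, per the discussion in Section 3.4.
        if self.withdraw_target is None:
            self.withdraw_target = max(1.0, self.size / 2.0)  # target of (16)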

3.4 Some Discussions about Design Issues

1) How can AIPM know the bottleneck link's capacity C? Modern data centers employ mechanisms such as uniform high capacity between racks and load balancing, so that congestion happens only at the edge top-of-rack (ToR) switches[8]. This feature allows AIPM to conveniently set C to the link capacity of the ToR switch.

2) How can AIPM know the round-trip propagation delay D? A data center's network settings (i.e., hardware, framework, topology, etc.) are usually stable over a relatively long period. Thereby, the network administrator can measure D offline and feed the value to AIPM every time the network settings change.

3) How should AIPM react if it observes several timeout connections at one time? To avoid link under-utilization, after detecting timeout-broken connections, AIPM starts slow withdrawal normally and halves the window only once. AIPM will not further halve con_wnd even if it detects more timeout-broken connections during the slow withdrawal state.

4.Empirical Validations of AIPM

4.1 Simulation Settings

We evaluate the performance of AIPM with NS2 simulations in two different network scenarios. The first one is a single 1 Gbps bottleneck link with a 64 kB buffer and 100 μs RTT. This scenario represents static network environments. The second scenario has the leaf-spine topology shown in Fig. 2, which is commonly used in data centers[7]. The network connects 144 end-hosts through 9 leaf (i.e., top-of-rack) switches that are connected to 4 spine switches in a full mesh. Each leaf switch has 16 downlinks of 1 Gbps to the hosts and 4 uplinks of 4 Gbps to the spine. The RTT between two hosts connected to different leaf switches is 100 μs. Background TCP flows arrive following a Poisson process, where the source and destination of each flow are chosen randomly, and each flow's size follows the distribution observed in real-world data mining workloads[7], as shown in Fig. 3. The background flow arrival rate is set to obtain a load of 0.5. This scenario represents realistic data center network environments that are complex and dynamic. Throughout our simulations, the client requests a data block that is scattered over N servers, and it requests the next data block after all N servers finish sending the current one. The data packet size is 1000 bytes, and the acknowledgement (ACK) size is 40 bytes.
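For reproducibility, background traffic of this kind can be generated with a few lines; the sketch below draws Poisson arrivals and samples flow sizes from an empirical CDF by inverse transform. The CDF points are placeholders, not the measured distribution of Fig. 3.

import random

def poisson_flows(rate_hz, duration_s, size_cdf):
    """Yield (arrival_time_s, flow_size_bytes) for one simulation run."""
    t = 0.0
    while True:
        t += random.expovariate(rate_hz)       # exponential inter-arrivals
        if t > duration_s:
            return
        u = random.random()                    # inverse-transform sampling
        size = next(s for cum, s in size_cdf if u <= cum)
        yield t, size

# Placeholder CDF points: (cumulative probability, flow size in bytes).
cdf = [(0.5, 10_000), (0.8, 100_000), (0.95, 1_000_000), (1.0, 10_000_000)]
for arrival, size in poisson_flows(200.0, 1.0, cdf):
    pass   # hand each flow to the simulator with random src/dst hosts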


Fig. 2. Leaf-spine network topology used in the second simulation scenario.

Fig. 3. Flow size distribution in the second simulation scenario, based on real-world measurements of data mining workloads[7].

We compare AIPM with the Naïve method (i.e., the client concurrently connects to all N servers), as well as two state-of-the-art application-layer solutions, namely OSDT[1] and OSM[11]. Similar to AIPM, these two solutions also try to mitigate incast by restricting the number and sending rates of concurrent TCP connections. The major difference is that AIPM tunes its own parameters according to the network scenario, whereas OSDT and OSM are specified only for predefined environments with a single bottleneck link and no background TCP traffic. Due to this difference, AIPM remarkably outperforms OSDT and OSM in changing network environments, as we will see below.

4.2 Fixed SRU Size

In this subsection, we fix the SRU size of each server to 256 kB, and investigate the goodput of AIPM in the two aforementioned network scenarios, respectively.


Fig. 4 shows that our solution AIPM achieves the best goodput (0.68 Gbps to 0.88 Gbps) in both cases. The reason is that AIPM dynamically adjusts each connection's advertised window size (awnd) and the connection window size (con_wnd) based on its estimation of the network state (i.e., whether some connections incur timeout). Such adjustment enables AIPM to adaptively minimize the incast risk and rapidly recover from timeout even in a network environment with multiple bottleneck links and varying background traffic (Fig. 4 (b)).

Conversely, OSDT and OSM both restrict the number of concurrent connections to predefined values. Although such fixed values are well suited to static conditions, they can hardly elude incast in dynamic network environments where both the bottleneck links and the available bandwidth can change drastically. This is why OSDT and OSM achieve goodput of less than 0.3 Gbps in Fig. 4 (b).

Fig. 4. Goodput with SRU=256 kB for (a) a single bottleneck link without background traffic and (b) leaf-spine topology with background TCP load of 0.5.

4.3 Varying SRU Size

Next, we fix the overall data block size to 2 MB, and set the SRU size to 2 MB/N, where N is the total number of servers. We evaluate AIPM's performance in terms of goodput and request completion time.


Fig. 5. Goodput with block size=2 MB for (a) a single bottleneck without background traffic and (b) leaf-spine topology with background TCP load of 0.5.

As Fig. 5 illustrates, AIPM achieves higher goodput than the alternative solutions in both network scenarios, which again demonstrates that AIPM can effectively address the incast issue even in highly dynamic environments. Observe that AIPM, OSDT, and OSM all exhibit decreasing goodput as N grows, because a larger N reduces each server's SRU and thus decreases the average sending window size of the concurrent connections. However, AIPM's goodput decreases more slowly than the other two's due to its adaptive adjustment of the number of concurrent connections. Indeed, AIPM adapts to small SRU values by allowing more connections to send data concurrently (i.e., a larger con_wnd), so that it can fully utilize the bottleneck link and keep goodput high regardless of the SRU value.

Fig. 6 compares AIPM’s request completion time (RCT)with the other three schemes. As we can see, AIPM’s RCT keeps being the smallest in both scenarios. Particularly, in the leaf-spine scenario, AIPM’s RCT is less than 20% of OSM or OSDT’s RCT, and is less than 3% of Naïve’s RCT.This result clearly demonstrates that AIPM effectively avoids incast by triggering no TCP retransmission timeout.With such small RCT, AIPM makes the client application respond more promptly to the upper-level user, and hence improves user experience.


Fig. 6. Request completion time (RCT) with block size=2 MB for (a) a single bottleneck without background traffic and (b) leaf-spine topology with background TCP load of 0.5.

4.4 Higher Bottleneck Capacity

At last, we explore the scalability of AIPM in higher-speed data centers. In the single-bottleneck scenario, we increase the bottleneck capacity from 1 Gbps to 10 Gbps. In the leaf-spine scenario, we increase the edge link capacity from 1 Gbps to 10 Gbps and the core link capacity from 4 Gbps to 40 Gbps. Other settings remain unchanged, i.e., SRU=256 kB, buffer size=64 kB, and RTT=100 μs.

As we can see in Fig. 7, the goodput of AIPM is generally higher than that of the other methods. In particular, AIPM maintains goodput up to 91% of the bottleneck capacity in the network with no background TCP traffic (Fig. 7 (a)), and it achieves nearly two times higher goodput than the alternative solutions when coexisting with background TCP traffic (Fig. 7 (b)). Such good performance shows that AIPM is readily scalable for future higher-speed data centers.

Fig. 7. Goodput with the bottleneck capacity C=10 Gbps for (a) a single bottleneck link without background traffic and (b) leaf-spine topology with background TCP load of 0.5.

5.Conclusions

We built an analytical model to reveal how TCP incast is affected by various application-related factors. From this model, we derived two guidelines for minimizing the incast risk: equally allocating the sending rate to connections and restricting the number of concurrent connections.

Based on the analytical results, we designed an adaptive application-layer solution to incast, which allocates an equal advertised window to connections and uses a sliding connection-window mechanism to manage concurrent connections. Simulation results indicate that our solution effectively eliminates incast and achieves high goodput in various network scenarios.

Appendix

We adjust the sending rate sum X to maximize the timeout-free probability in (6) while fixing the other parameters. According to (9), we let the sending rates xi be xi = X/n for 1≤i≤n, and express the timeout-free probability (6) as

f(X) = [1 − p^{XD/n}]^n, where p = (X+Y−C)/(X+Y).

If the background traffic Y is much smaller than the sum of the connections' sending rates X, then p ≈ (X−C)/X and the timeout-free probability reduces to f(X) ≈ [1 − ((X−C)/X)^{XD/n}]^n, which is a decreasing function of X. Conversely, if Y is much greater than X, then p ≈ (Y−C)/Y is nearly independent of X, and f(X) becomes [1 − ((Y−C)/Y)^{XD/n}]^n, which is an increasing function of X. As a result, the optimal X that maximizes the timeout-free probability is dependent on the background traffic Y.
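This dependence is easy to see numerically. The illustrative sketch below (assumed parameter values) evaluates the timeout-free probability for increasing X under small and large background traffic Y; it decreases with X in the first case and increases in the second.

C, D, n = 125_000.0, 100e-6, 8                 # illustrative values

def timeout_free(X, Y):
    p = max(0.0, (X + Y - C) / (X + Y))        # drop rate
    return (1.0 - p ** (X * D / n)) ** n

for Y in (0.0, 10 * C):                        # Y << X versus Y >> X
    row = [f"{timeout_free(X, Y):.2e}" for X in (1.1 * C, 1.5 * C, 2.0 * C)]
    print(f"Y={Y:.0f}:", row)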

References

[1] S. Zhang, Y. Zhang, Y. Qin, Y. Han, Z. Zhao, and S. Ci, "OSDT: A scalable application-level scheduling scheme for TCP incast problem," in Proc. of IEEE Intl. Conf. on Communications, 2015, pp. 325-331.

[2] Y. Zhang and N. Ansari, "On mitigating TCP incast in data center networks," in Proc. of IEEE Conf. on Computer Communications, 2011, pp. 51-55.

[3] H. Wu, J. Ju, G. Lu, C. Guo, Y. Xiong, and Y. Zhang, "Tuning ECN for data center networks," in Proc. of the 8th ACM Intl. Conf. on Emerging Networking Experiments and Technologies, 2012, pp. 25-36.

[4] P. Cheng, F. Ren, R. Shu, and C. Lin, "Catch the whole lot in an action: Rapid precise packet loss notification in data centers," in Proc. of the 11th USENIX Symposium on Networked Systems Design and Implementation, 2014, pp. 17-28.

[5] M. Alizadeh, A. Greenberg, D. A. Maltz, et al., "Data center TCP (DCTCP)," ACM SIGCOMM Computer Communication Review, vol. 40, no. 4, pp. 63-74, Oct. 2010.

[6] J. Zhang, F. Ren, L. Tang, and C. Lin, "Modeling and solving TCP incast problem in data center networks," IEEE Trans. on Parallel and Distributed Systems, vol. 26, no. 2, pp. 478-491, Feb. 2015.

[7] M. Alizadeh, S. Yang, M. Sharif, et al., "pFabric: Minimal near-optimal datacenter transport," ACM SIGCOMM Computer Communication Review, vol. 43, no. 4, pp. 435-446, 2013.

[8] H. Wu, Z. Feng, C. Guo, and Y. Zhang, "ICTCP: Incast congestion control for TCP in data-center networks," IEEE/ACM Trans. on Networking, vol. 21, no. 2, pp. 345-358, 2013.

[9] W. Bai, K. Chen, H. Wu, W. Lan, and Y. Zhao, "PAC: Taming TCP incast congestion using proactive ACK control," in Proc. of the 22nd IEEE Intl. Conf. on Network Protocols, 2014, pp. 385-396.

[10] J. Hwang, J. Yoo, and N. Choi, "Deadline and incast aware TCP for cloud data center networks," Computer Networks, vol. 68, pp. 20-34, Feb. 2014.

[11] K. Kajita, S. Osada, Y. Fukushima, and T. Yokohira, "Improvement of a TCP incast avoidance method for data center networks," in Proc. of IEEE Intl. Conf. on ICT Convergence, 2013, pp. 459-464.

[12] H. Zheng and C. Qiao, "An effective approach to preventing TCP incast throughput collapse for data center networks," in Proc. of IEEE Global Telecommunications Conf., 2011, pp. 1-6.

[13] Y. Yang, H. Abe, K. Baba, and S. Shimojo, "A scalable approach to avoid incast problem from application layer," in Proc. of the 37th Annual IEEE Computer Software and Applications Conf. Workshops, 2013, pp. 713-718.

[14] W. Chen, F. Ren, J. Xie, C. Lin, K. Yin, and F. Baker, "Comprehensive understanding of TCP incast problem," in Proc. of IEEE Conf. on Computer Communications, 2015, pp. 1688-1696.

Jin-Tang Luo, Jie Xu, Jian Sun
Journal of Electronic Science and Technology, no. 1, 2018.
