当前位置:首页 >> >> An empirical study of dynamic scheduling on rings of processors

An empirical study of dynamic scheduling on rings of processors


An Empirical Study of Dynamic Scheduling on Rings of Processors
Extended Abstract

Dawn E. Gregory

Lixin Gao

Arnold L. Rosenberg

Paul R. Cohen

Department of Computer Science, University of Massachusetts Amherst, MA 01003, USA Abstract
We empirically analyze and compare two distributed, low-overhead policies for scheduling dynamic treestructured computations on rings of identical PEs. Our experiments show that both policies give signi?cant parallel speedup on large classes of computations, and that one yields almost optimal speedup on moderate size rings. We believe that our methodology of experiment design and analysis will prove useful in other such studies. computations. Our experiments largely substantiate both conjectures, at least on moderate-size rings. Our concern here is with the design and analysis of simulation experiments as much as with comparing scheduling policies.

2. The Formal Setting
The architecture. A p-PE ring of processing elements (PEs) has p identical PEs, denoted P0 P 1 : : : P p;1, with each PE Pi connected to its clockwise neighbor P i+1modp and its counterclockwise neighbor P i;1modp . The computational load. In order to simplify the design of our experiments, we formulate a purely combinatorial environment which abstracts the behavior of multigrid algorithms. Our abstract computational load comprises dynamically growing binary tree-dags (BTs, for short). The (dynamic) computation that generates a BT T proceeds as follows, until no active leaves remain. Initially, T has a single node, named 1, which is simultaneously its root and its (current) active leaf. At each step, some set (depending on the schedule) of then-current active leaf-tasks get executed. An executed task x may: halt, so x becomes a permanent leaf; spawn two children, the new active leaves 2x + 1, so x becomes a nonleaf.
2

1. Introduction
The promise of parallel computers to accelerate computation relies on the algorithm designer’s ability to keep all (or most) of a computer’s processors fruitfully occupied all (or most) of the time. In this paper, we study the problem of ef?ciently scheduling evolving binary-tree-structured computations on a ring-structured parallel computer. We thus face the challenge of ef?ciently scheduling a computation whose ultimate “shape” is unknown (precluding “of?ine” planning as in [3, 4, 7]), on an architecture that has a large diameter (precluding ef?cient random placements as in [6]) and small bisection bandwidth (precluding ef?cient massive data transmission as in [5]). We empirically analyze and compare two distributed, low-overhead dynamic-scheduling policies. Our simpler policy—called KOSO, for “keep-one-send-one”—has each PE keep one child of a spawning task and pass the other to its clockwise neighbor in the ring; our more sophisticated policy—called KOSO? —operates similarly, but allows childpassing only from a more heavily loaded PE to a more lightly loaded one. Based on partial (mathematical) analyses of the policies, we formulated two conjectures about the ef?ciency of the schedules they produce. (a) Both policies yield close to (optimal) p-fold speedup on p-PE rings, for large, significant classes of evolving tree-structured computations; (b) policy KOSO? generally outperforms policy KOSO on these

x and

Policies KOSO and KOSO? . Our two scheduling policies differ in their load-balancing regimens but share the same scheduling policy. Both have PEs retain tasks awaiting execution in a local priority queue, ordered by the tasks’ heights in the BT T being scheduled. Each computation begins with the root of T as the sole occupant of PE P 0 ’s task-queue and all other PEs’ task-queues empty. Subsequently, the taskqueue of each PE contains some subset of the then-active leaves of T . At each step, each Pi having a nonempty task-queue:

1. executes the active leaf x in its task-queue which is ?rst in the mandated order; 2. if task x spawns two children, then P i adds the new leaf 2x to its task-queue. Under KOSO, P i simultaneously sends the new leaf 2x + 1 to the task-queue of PE P i+1modp . Under KOSO? , P i sends task 2x + 1 to P i+1modp only if the latter PE has a lighter load than P i ; otherwise, P i adds this task also to its own task-queue. We assess one time unit for the entire process of executing a task and performing the balancing actions just described. Thus, we ignore the fact that a step of KOSO? consumes a bit more real time than a step of KOSO, because of the required comparisons of loads at each step. Our timing assessment is congruous with the ?ne-grain nature of the motivating multigrid algorithms. Table 1 illustrates how KOSO and KOSO? distribute the nodes of the 6-level complete binary tree T 6 in the ring R 4 .
PE level 0 1 2 3 4 5 1 2 3 4 5 2 3 4 5 3 4 5
KOSO-resident nodes KOSO

The second motivating result focuses on how KOSO and distribute work to the PEs of R p while the evolving BT keeps growing. In both parts of the following theorem, we start growing a BT with a single task in PE P 0 and all other PEs idle, and we observe the pattern of work distribution throughout R p .
KOSO?

Theorem 2 Focus on an evolving BT in which every executed task spawns two new tasks. (a) After N p ; 1 steps of policy KOSO, the numbers of unexecuted tasks residing in the heaviest and lightest loaded PEs of Rp differ by p ; 2. [2] (b) After N (p ; 1)2 steps of policy KOSO? , the numbers of unexecuted tasks residing in the heaviest and lightest loaded PEs of Rp differ by 1.

4. The Experimental Setup
Because multigrid computations rarely generate complete binary trees in practice, we wish to explore the behaviors of KOSO and KOSO? under more realistic conditions. To this end, we designed a suite of experiments based on simulation of R p under each policy. The inputs to the experiments. A key issue in simulation experiments is the generation of random test instances that model realistic computation. We created an abstract analogue of a real, important class of tree-structured computations, namely, the computations that arise from numerically integrating real-valued functions on real intervals using re?nement techniques such as the Trapezoid Rule or Simpson’s Rule. (Our abstraction extends easily to higherdimensional multigrid-type algorithms.) BTs arise naturally from such algorithms: tasks correspond to real intervals; spawning corresponds to bisecting the current interval to get more accuracy; halting corresponds to achieving adequate accuracy or encountering inadequate (computer) resolution. We seek a probabilistic growth model that leads to BTs similar in structure to those one might expect to be generated by random applications of the Trapezoid Algorithm. To this end, we desire a growth model that produces BTs which: are “bushy” near the root (since most nonlinear functions will require some interval re?nement initially); rather quickly get “scrawny” (since most “actual” functions will be rather smooth throughout most of the domain of interest); stay rather “shallow” (since [numerical] resolution will be exceeded before a BT gets very deep). Our formal model employs a single real-valued parameter < 1 to generate a suite of random trees for each of our experiments. Each experiment works on a family F ( ) of random trees, which is de?ned by the rule:

P0

? -resident nodes

1 2 4 8 16, 31 32, 47, 55, 59, 61, 62 3 5, 6 9, 10, 12 17, 18, 20, 24 33, 34, 36, 40, 48, 63 7 11, 13, 14 19, 21, 22, 25, 26, 28 35, 37, 38, 41, 42, 44, 49, 50, 52, 56 15 23, 27, 29, 30 39, 43, 45, 46, 51, 53, 54, 57, 58, 60

P1

P2 P3

1 2 4, 5 8, 10, 11 16, 17, 20, 21, 22 32, 33, 34, 35, 40, 41, 42, 44, 45 3 6 9, 12, 13 18, 23, 24, 25, 26 36, 37, 43, 46, 47, 48, 50, 52 7 14 19, 27, 28, 29 38, 49, 51, 53, 54, 55, 56, 57, 58 15 30, 31 39, 59, 60, 61, 62, 63

Table 1. The node-assignments when R4 executes T 6 under policies KOSO and KOSO? .

3. Motivating our conjectures
Our conjectures about the individual and comparative behaviors of policies KOSO and KOSO? are founded on two analytical results which may lend the reader some intuition about how the policies work. The ?rst motivating result asserts that policy KOSO schedules BTs that ultimately grow into complete binary trees asymptotically optimally. Theorem 1 Under policy KOSO, Rp executes any BT that grows into the height-n complete binary tree T n within time (2n ; 1)=p + (low-order terms ). [2]

Each level-` node of an evolving BT in F ( ) spawns (two children) with probability ` and halts with probability 1 ; ` . Clearly our BTs will be shallow and will quickly get scrawny. To foster “bushiness” near our BTs’ roots, we use values of that are close to unity—speci?cally, no smaller than 0:96. The simulation environment. The input generator and network simulation were written in GNU C and run on a DEC Alpha under OSF1.1 Our probabilisticinput model employs a linear congruential random number generator. Because spawning decisions are made locally, the order in which tasks are executed modi?es the shape of the evolving tree. Thus, both the number of PEs and the policy may result in different trees, even if the number generator is initialized with the same “seed” value. The simulation is con?gured by three parameters: the number of PEs, p 3, the load-balancing regimen, Alg 2 fKOSO,KOSO? g, and the input family, :96. After running to completion, the simulator reports N , the total number of tasks consumed, h, the maximum height of encountered tasks, and T , the total running time.
E P S I L O N

2000

1000

10000

20000

30000 N

40000

50000

Figure 1. The relationship between N and . Light circles denote KOSO trials, dark squares represent KOSO? .

for KOSO. As a ?rst test of this, we compare the overhead incurred by each policy, using a t-test to determine if the difference we observe could have arisen by random chance. Comparing the average overhead for KOSO (305:9) with that of KOSO? (91:8) yields a test statistic of 5:75, hence a probability < 10 ;5 that the policies incur equal overhead. Extending the analysis. Our ?rst experiment has shown that KOSO? is apparently a better load-balancing policy, but it has not resolved the issue of (approximate) optimality. Because we know that large rings may incur additional overhead due to the cost of initial loading, we extend the analysis to account for ring size p. Two-factor analysis of variance (ANOVA) separates the in?uences of p and Alg on by generating F-statistics for three effects: one for p (F = 100:9), one for Alg (F = 57:5), and one for the non-linear interaction between them (F = 18:9). All three of these values are highly signi?cant (indicating a probability of no effect < 10;5), which means that both p and Alg affect the value of , and that p affects the way that Alg affects the value of . The average under each condition is reported in table 2.
Policy
KOSO KOSO ?

5. The Experiments and Their Results
Our ?rst analyses are based on data from a pilot experiment, in which we exercised both policies under a wide range of conditions. We executed 20 random samples from the input families = f:96 :965 :97g on the networks R 3 , R6 , R10, and R20 under each policy, yielding 480 data points (2 policies 20 instances 3 input families 4 networks). Because our conjectures are concerned with network ef?ciency, we formulate this analysis in terms of the overhead, 0, incurred over a trial. represents the difference between actual running time and optimal running time, computed as = T ; dN=pe. To verify our ?rst conjecture—that both of our policies are “good”—we consider the relationship between and N : if the policies are performing optimally, then and N will be unrelated. Figure 1 displays this relationship as captured in the pilot experiment; clearly, increases as N increases. We derive an equation for this relationship using linear regression: = :013N + 26:9. Even though the slope .013 is fairly small, it can be statistically distinguished from zero, and we must conclude that N and are related. On the other hand, the line accounts for only 16% of the variance in , suggesting that other factors are at work. Our second conjecture—that KOSO? outperforms KOSO— suggests that should be signi?cantly smaller for KOSO? than
1 Source code for the simulation can be obtained via the WWW at http://eksl-www.cs.umass.edu/ gregory/SPDP.html.

3 29.1 2.8

6 65.8 7.3

10 229.6 19.8

20 899.1 337.2

Table 2. Mean for Alg and p.

p = 10 and p = 20. This leads us to suspect the results are

Under both policies,

increases signi?cantly between

heavily in?uenced by the startup costs associated with loading the ring. To isolate startup costs from other overhead, we ran another experiment that measures two new variables: Startup, the number of steps before all PEs become busy, and Steady, the number of steps in which all PEs are occupied. We divide each by total running time T , to arrive at the fraction of running time spent in each condition.

We repeated the two-factor ANOVA of p and Alg on each of these variables, in hopes of eliminating one or more of the effects. The analysis of Startupshows that p has a signi?cant impact, but the effect of Alg and the nonlinear interaction have gone away. This is precisely what we predicted: ring size affects the overhead due to startup costs. Unfortunately, our analysis shows that Steady is subject to the same effects as the overhead . Thus, while the new experiment yielded one interesting result, we have reached another dead end. Looking inside the computation. Still lacking precise quanti?cation of the effects of either p or Alg, we ran another experiment to look even deeper inside the computation. In this experiment, we collected time-series of the number of busy PEs and the net queue-size for each PE on each step of the trial. From a sample of such trials, we selected four representative instances of approximately equal length, one for each algorithm on two different size rings. These data are displayed in ?gure 2.
10

for most of the trial. This plateau is indicative of the ring reaching its steady-state, where all PEs are busy. Thus, we see further evidence of ring size affecting the startup time: larger rings never reach the steady state. The graphs of queue-size (right) tell an even more interesting story. When the ring never reaches its steady state (plots (a) and (c)), we see a large variance among the queuesizes. When the network spends most of its time in steady state (plots (b) and (d)), this variance should be reduced; however, this occurs only under policy KOSO? . In plot (d) we see that the queues for KOSO? are nearly the same size over the entire trial. This is an important result, because it shows that KOSO? results in more even queue sizes than KOSO; thus, KOSO? literally does a better job of balancing the load among the PEs, by minimizing variance in the length of PE queues.

6. Ongoing Work
Having (largely) substantiated our conjectures on an abstract computational load, we are currently devising an experimental study that will compare the results reported here with those obtained from using KOSO and KOSO? to schedule “real” tree-structured computations, speci?cally those generated by multigrid algorithms.

30

(a)

20
5

10

50

100

150

200

250

50
8

100

150

200

250

50

7

40
6

5

30

(b)

4

20
3

Acknowledgements. The research of L.-X. Gao and A. L. Rosenberg was supported in part by NSF Grant CCR-92-21785. The research of D. E. Gregory was supported by an NSF Graduate Research Fellowship.

2

10

100

200

300

100

200

300

References
[1] P.R. Cohen (1995): Empirical Methods for Arti?cial Intelligence. MIT Press, Cambridge, MA. [2] L.-X. Gao and A.L. Rosenberg (1996): Toward ef?cient scheduling of evolving computations on rings of processors. J. Parallel Distr. Comput., to appear. [3] L.-X. Gao, A.L. Rosenberg, R.K. Sitaraman (1995): Optimal architecture-independent scheduling of ?ne-grain tree-sweep computations. 7th IEEE Symp. on Parallel and Distr. Processing, 620-629.

10

20

(c)

5

10

50

100

150

200

250

50
8

100

150

200

250

30
7

6

5

20

(d)

4

3

10
2

100

200

300

100

200

300

Figure 2. Busy PEs (left) and queue-sizes (right) for: KOSO on (a) R16 and (b) R8 ; KOSO? on (c) R16 and (d) R8 . The plots in ?gure 2 show a variety of interesting features. First, note that the number of busy PEs (left) never stabilizes on R16 (plots (a) and (c)), while on R8 (plots (b) and (d)) this number quickly reaches a plateau and remains there

[4] S.L. Johnsson (1987): Communication ef?cient basic linear algebra computations on hypercube architectures. J. Parallel Distr. Comput. 4, 133-172. [5] R.M. Karp and Y. Zhang (1988): Randomized parallel algorithms for backtrack search and branch-and-bound computation. J. ACM 40, 765-789. [6] A.G. Ranade (1994): Optimal speedup for backtrack search on a butter?y network. Math. Syst. Theory 27, 85-101. [7] T. Yang and A. Gerasoulis (1991): A fast static scheduling algorithm for dags on an unbounded number of processors. Supercomputing ’91, 633-642.


更多相关文档:

0303 Fundamental issues in the study of SLA_图文.pdf

empirical studies, magnificently summarized by Hyl...dynamic approach to SLA, focusing on the way L2...knowledge, linked with modular-specific processors....

An Empirical Study of Dynamic Scheduling on Rings of Processors.unkown

An Empirical Study of Dynamic Scheduling on Rings of Processors Extended Abstract Dawn E. Gregory Lixin Gao Arnold L. Rosenberg Paul R. Cohen Department ...

An Empirical Study of Dynamic Scheduling on Rings of ....unkown

An Empirical Study of Dynamic Scheduling on Rings of Processors Miranda E. Barrows§ Dawn E. Gregory Lixin Gao Arnold L. Rosenberg§ Paul R...

...Analysis of the Minimum Capital Requirement. Empirical ....unkown

Dynamic Financial Analysis of the Minimum Capital Requirement. Empirical ...statements (financial position) and probability of ruin of an insurance ...

An empirical phase spaceanalysisof ring current dynamics' ....unkown

105, NO. A4, PAGES 7707-7719, APRIL 1, 2000 An empirical phase spaceanalysisof ring current dynamics' Solar wind control of injection and decay T. ...

Empirical Analysis of a Dynamic Social Network Built from PGP....unkown

c Springer-Verlag Berlin Heidelberg 2007 Empirical Analysis of a Dynamic Social Network Built from PGP Keyrings 159 each time it is possible to compute a...

...using long-run relationships An empirical analysis of the co.unkown

methodology A dynamic analysis of the effect of monetary policy on agriculture using long-run relationships An empirical analysis of the cointegrated ...

An Empirical Analysis of the Inuence of Jump Dynamics on ....unkown

An Empirical Analysis of the Inuence of Jump Dynamics on Value-at-Risk Estimation by Christian Vogl August 2016 Master's Programme in Economics ...

An Empirical Study of Dynamic Financial Early Warning Based ....unkown

COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(12C) 943-946 Wang hui zhen An Empirical Study of Dynamic Financial Early Warning Based on Grey Correlation ...

An Empirical Study of Corporate Liquidity Dynamics, with ....unkown

An Empirical Study of Corporate Liquidity Dynamics, with Conditioning on Earnings Andrew Carverhill September 10, 2012 Abstract We investigate the dynamics of...

An Empirical Study of Data Speculation Use on the Intel ....unkown

An Empirical Study of Data Speculation Use on the Intel Itanium 2 Processor Markus Mock Department of Computer Science University of Pittsburgh Pittsburgh, ...

...and Advertising Effectiveness: An Empirical Analysis of ....unkown

The Dynamic Impact of Product-Harm Crises on Brand Equity and Advertising Effectiveness: An Empirical Analysis of the Automobile Industry Yan Liu Assistant ...

Inventory Dynamics in the Financial Crisis: An Empirical ....unkown

Inventory Dynamics in the Financial Crisis: An Empirical Analysis of Firm Responsiveness and its Effect on Financial Performance Kai Hoberg Maximiliano Udenio ...

...Growth : An Empirical Analysis on Dynamic Panel Data of WA....unkown

M PRA Munich Personal RePEc Archive Taxation and Economic Growth : An Empirical Analysis on Dynamic Panel Data of WAEMU Countries N'Yilimon NANTOB ...

An Empirical Study on the Relationship between ....unkown

LV14062 An Empirical Study on the Relationship between Entrepreneurship and Dynamic Capabilities of Chinese Family Owned Businesses Yongjie Zhao Ningbo University...

AN EMPIRICAL STUDY ON THE FINANCIAL PERFORMANCE OF SCHEDULED ....unkown

AN EMPIRICAL STUDY ON THE FINANCIAL PERFORMANCE OF SCHEDULED URBAN COOPERATIVE BANKS IN INDIA THESIS SUBMITTED TO BHARATHIAR UNIVERSITY IN PARTIAL FULFILLMENT ...

An empirical study to investigate the effects of internal and....unkown

www.GrowingScience.com/msl An empirical study to investigate the effects of internal and external knowledge management on dynamic organizational skills through...

An Empirical Analysis of C#, PHP, JAVA, JSP and ASP.Net ....unkown

1 June 2014 An Empirical Analysis of C#, PHP, JAVA, JSP and ASP.Net regarding performance analysis based on CPU utilization Md. Ahsan Arif ...

...dynamics of scientific knowledge networks: An empirical ....unkown

Structures and dynamics of scientific knowledge networks: An empirical analysis based on a co-word networkc WANG Xiaoguang1,3*, JIANG Tingting2 & LI ...

AN EMPIRICAL STUDY ON THE FINANCIAL PERFORMANCE OF SELECTED ....unkown

ABSTRACT The study mainly focused attention to study the financial performance of Urban Cooperative banks in India, which were covered under schedule II of ...

更多相关标签:
网站地图

文档资料共享网 nexoncn.com copyright ©right 2010-2020。
文档资料共享网内容来自网络,如有侵犯请联系客服。email:zhit325@126.com