WMS 3.2 Release Notes
Written by WMS Support   
Monday, 02 March 2009 16:08

Release Notes of WMS 3.2

 

WMS 3.2 (Savannah patch 2597/3044) is a significant redesign of the gLite WMS. It is largely (though by no means only) the result of the code restructuring activity that took place during EGEE-II. It includes a redesign of the core architecture, a parallel match-making algorithm, a restructured WMProxy interface and a revised ISM structure that allows an optimized algorithm for match-making with data requirements.

 

Among the major problems revealed by previous WMS releases were the core architecture and the match-making performance, with the consequent difficulty of recovering from complex situations involving pending jobs, resubmissions, denials of service, timeouts and so on. The WMS has to cover a wide range of submission use cases and, even after the welcome introduction of bulk submission and match-making for collections (as requested by the experiments), both WMS stability and performance continued to suffer from an internal architecture that was prone to race conditions and relied heavily on large-scope locks.

 

In this respect, the new architecture implemented in this release can serve a wider range of submission use cases. Prior to this release, single-job processing and match-making were performed almost serially, because of "big" locks on concurrently accessed data structures. This release performs match-making in parallel thanks to a redesign of the ISM, which is now doubled in order to remove some locks with a huge scope that were previously needed to keep the structure synchronised with the readers and writers accessing it. A read-only copy is available to the readers (the request handlers that need to perform match-making), while a second copy is built in the background during purchasing. A pseudo-atomic swap between these two copies occurs periodically: the copy currently accessed by reader threads is retired, and the freshly purchased one, until then accessed only for writing, becomes available to the readers. Two ISM instances are present in memory at the same time only for a limited period (the time needed to carry out purchasing and to wait for older threads still pointing to the retired copy to complete), after which that instance can be cleared.
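The doubled-ISM scheme described above can be sketched in a few lines. This is a minimal Python illustration, not the WMS code (which is C++): the class and method names, and the dictionary standing in for ISM entries, are hypothetical. The point it shows is that each reader keeps its own reference to a snapshot, so the lock is held only for the reference swap, never for the duration of a match-making.

```python
import threading
from types import MappingProxyType


class DoubleBufferedIsm:
    """Hypothetical doubled ISM: match-making readers use a read-only snapshot
    while the purchaser builds a separate copy; the two are swapped under a
    short lock."""

    def __init__(self):
        self._swap_lock = threading.Lock()   # guards only the reference swap
        self._active = MappingProxyType({})  # read-only copy visible to readers

    def snapshot(self):
        # A reader takes the current copy once and keeps using it for the whole
        # match-making, even if a swap happens in the meantime.
        with self._swap_lock:
            return self._active

    def purchase_and_swap(self, purchase):
        # 'purchase' is a hypothetical callable returning {resource_id: data};
        # it runs without holding the lock, so readers are never blocked by it.
        fresh = MappingProxyType(dict(purchase()))
        with self._swap_lock:
            self._active = fresh
        # The retired copy is released automatically once the last reader
        # still holding a reference to it has finished.


ism = DoubleBufferedIsm()
ism.purchase_and_swap(lambda: {"ce01": 4})
old_view = ism.snapshot()                  # held by a long-running match-making
ism.purchase_and_swap(lambda: {"ce01": 0, "ce02": 8})
print(len(old_view), len(ism.snapshot()))  # -> 1 2
```

Readers cannot modify a snapshot (`MappingProxyType` rejects assignment), which mirrors the rule that the copy seen by readers is read-only while only the background copy is written.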

 

This redesign also allows faster operations on the ISM, such as dumping or updating. The so-called Task Queue has been replaced by a prioritized event queue, where requests queue up waiting to be processed in as stateless a fashion as possible. This has made it possible to remove several locks.
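A minimal sketch of such a prioritized event queue, in Python. The request kinds and their numeric priorities are hypothetical; the only property taken from these notes is that ISM requests are served before ordinary ones. A sequence counter keeps requests of equal priority in arrival order, and each entry carries everything needed to process it, so the worker popping it needs no shared per-request state.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower value is served first.
PRIORITY = {"ism_dump": 0, "ism_update": 0, "submit": 1}


class EventQueue:
    """Minimal prioritized event queue of self-contained requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def push(self, kind, payload=None):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = EventQueue()
q.push("submit", "job-1")
q.push("submit", "job-2")
q.push("ism_dump")
print([q.pop()[0] for _ in range(len(q))])  # -> ['ism_dump', 'submit', 'submit']
```

On its own this structure is not thread-safe; shared by several workers it would need a lock, or `queue.PriorityQueue`, which provides one.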

Two different queues are needed to accommodate this new model. Periodic activities, handled as "timed" events, reschedule themselves to show up at a given time in the "ready" queue. The number of threads is kept under control: every request for processing (a "submit" as well as a "purchase from BDII" request) is executed within the same thread pool as a generic functor. The WM core thus essentially becomes a multi-threaded function executor, without the clear-cut dispatcher/request-handler distinction that also used to require synchronisation (locks) on several data structures.
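The two-queue model can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the WM implementation: the class and method names are hypothetical, time is passed explicitly to `tick()` so the behaviour is easy to follow, and a periodic activity is re-scheduled relative to the time it was found due.

```python
import heapq
import queue
import threading
import traceback


class FunctionExecutor:
    """Hypothetical WM core as a multi-threaded function executor.

    - 'timed' is a heap of periodic activities waiting for their due time;
    - 'ready' is the queue feeding a fixed pool of worker threads;
    - every request (a submit, a purchase from BDII, ...) is just a callable.
    """

    def __init__(self, worker_threads=4):
        self._ready = queue.Queue()
        self._timed = []  # entries: (due_time, seq, fn, period)
        self._seq = 0     # tie-breaker so callables are never compared
        self._lock = threading.Lock()
        for _ in range(worker_threads):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, fn):
        self._ready.put(fn)

    def schedule_periodic(self, fn, period, now):
        with self._lock:
            self._push_timed(now + period, fn, period)

    def tick(self, now):
        """Move every due timed event to 'ready' and let it re-schedule itself."""
        with self._lock:
            while self._timed and self._timed[0][0] <= now:
                _, _, fn, period = heapq.heappop(self._timed)
                self._ready.put(fn)
                self._push_timed(now + period, fn, period)  # period must be > 0

    def drain(self):
        """Block until every callable put on 'ready' so far has run."""
        self._ready.join()

    def _push_timed(self, due, fn, period):
        heapq.heappush(self._timed, (due, self._seq, fn, period))
        self._seq += 1

    def _work(self):
        while True:
            fn = self._ready.get()
            try:
                fn()
            except Exception:  # a failing request must not kill its worker
                traceback.print_exc()
            finally:
                self._ready.task_done()


ex = FunctionExecutor(worker_threads=2)
done = []
ex.submit(lambda: done.append("submit"))
ex.schedule_periodic(lambda: done.append("purchase"), period=30, now=0)
for t in (10, 30, 60):  # a timer thread would normally drive tick()
    ex.tick(now=t)
ex.drain()
print(sorted(done))  # -> ['purchase', 'purchase', 'submit']
```

Submits and purchases end up on the same `ready` queue and are run by the same bounded set of workers, which is what keeps the thread count fixed regardless of the request mix.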

 

Situations like the following, with basically serialized match-making, are now prevented by design:

...

09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
09 Oct, 19:41:32 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/jg_p9HyHKGaTUB0qIV1_Ww (2/6042 [16.76, 20.33] )
09 Oct, 19:41:39 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/fQEtbp4kaF9qTicc-0bTig (0/6042 [19.25, 26.89] )
09 Oct, 19:41:46 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/lvHFKWrE5YkG2EWAUOe0jw (0/6042 [21.81, 28.56] )
09 Oct, 19:41:50 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/HRFXvdB2iGA9ylbvvAUmLA (0/6042 [25.48, 29.72] )

...
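How serialized the match-making in such an excerpt is can be quantified by measuring the time between consecutive "MM for job" lines. A minimal Python sketch, assuming exactly the log format shown in the excerpt (the regular expression is tailored to it, ignores the fields in parentheses, and does not handle midnight wrap-around):

```python
import re

# Timestamp (HH:MM:SS) and job id of a "MM for job" line, as in the excerpt.
MM_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) -I: \[Info\] checkRequirement\S*: MM for job: (\S+)"
)


def mm_gaps(log_text):
    """Seconds between consecutive match-making lines (no midnight wrap-around)."""
    secs = [int(h) * 3600 + int(m) * 60 + int(s)
            for h, m, s, _job in MM_LINE.findall(log_text)]
    return [b - a for a, b in zip(secs, secs[1:])]


sample = """\
09 Oct, 19:41:16 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/ubZmW8I1u5xiaUIlJiFcsg (0/6042 [12.67, 16.6] )
09 Oct, 19:41:20 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/Y-JZ3DEbqJbordWuIvkIxg (0/6042 [10.84, 15.47] )
09 Oct, 19:41:28 -I: [Info] checkRequirement(matchmakerISMImpl.cpp:79): MM for job: https://lb002.cnaf.infn.it:9000/MrYWfFnejXhh43SHX0Lp3g (0/6042 [12.39, 19.95] )
"""
print(mm_gaps(sample))  # -> [4, 8]
```

Applied to all seven lines of the excerpt, it returns [4, 8, 4, 7, 7, 4]: one job every 4 to 8 seconds, one after the other.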

 

This patch also provides an update of ICE, improving the performance and scalability of submissions to CREAM-based CEs via the WMS, although some scalability issues remain when ICE manages many (thousands of) active jobs.

 

Newly introduced features:

  • parallel match-making
  • ISM: restructured algorithm for match-making when data requirements are specified in the JDL.

The new algorithm performs a reverse search starting from data resolution, reducing the search space to only those computing resources that mount the storage holding the specified files. The new restructured data structure has been integrated into the broker/helper/brokerinfo modules, allowing faster collection, during match-making, of all the data needed to build the brokerinfo file. As a result, the brokerinfo file can be generated without any further query to the ISM for storage information.
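The reverse search described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `replicas` mapping stands for the outcome of data resolution, `ces_by_se` for an ISM index from storage elements to the CEs that mount them, and the policy of requiring every file to be reachable may differ from the real WMS one. It shows the two properties claimed: only CEs reached from the files are ever examined, and the storage data for the brokerinfo file is collected in the same pass.

```python
def data_aware_candidates(input_files, replicas, ces_by_se):
    """Reverse search: start from the files, not from the full list of CEs.

    input_files: logical file names required by the job (hypothetical LFNs)
    replicas:    {lfn: set of SEs holding a replica}      (data resolution)
    ces_by_se:   {se: set of CEs that mount this SE}      (hypothetical index)
    Returns {ce: {lfn: set of close SEs holding it}} for the CEs that can
    reach every required file; this doubles as brokerinfo storage data.
    """
    candidates = None
    brokerinfo = {}
    for lfn in input_files:
        reachable = set()
        for se in replicas.get(lfn, ()):
            for ce in ces_by_se.get(se, ()):
                reachable.add(ce)
                brokerinfo.setdefault(ce, {}).setdefault(lfn, set()).add(se)
        # Keep only CEs that can reach all files seen so far.
        candidates = reachable if candidates is None else candidates & reachable
    return {ce: brokerinfo[ce] for ce in (candidates or set())}


replicas = {"lfn:/grid/vo/a": {"se1", "se2"}, "lfn:/grid/vo/b": {"se2"}}
ces_by_se = {"se1": {"ce1"}, "se2": {"ce2", "ce3"}, "se3": {"ce4"}}
result = data_aware_candidates(["lfn:/grid/vo/a", "lfn:/grid/vo/b"], replicas, ces_by_se)
print(sorted(result))  # -> ['ce2', 'ce3']
```

Note that `ce4` is never looked at: it mounts no SE holding a required file, so it does not enter the search space at all.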

  • support for IPv6
  • improved error reporting for DAGs
  • run-time selection of LB type: server or proxy.

Typically for small VOs (but not only, as we will see later), it can make sense to install both WMS and LB on a single machine. In such circumstances, using the LB proxy (a cache of the jobs being processed) is discouraged, to avoid storing the same events twice (this will change with the advent of LB 2.0 wherever LBserver and LBProxy are co-located). Also, for job states to be computed correctly, the choice between LBserver and LB proxy must be made once and applied consistently to every component. Previous versions (incorrectly) allowed LBserver and proxy to be mixed. See the "Configuration changes" section for more details.

  • the jobwrapper template is now cached at each WM start (restart the WM to reload any changes)
  • restructured jobwrapper (Perl dependencies also removed)
  • the ISM can now be dumped more often and at a lower cost, simply by creating a jobdir request (basically a file) such as: [ command = "ism_dump"; ]

  • this code baseline is ready to support GridSite Delegation 2, which is currently disabled because, for backward-compatibility needs of external packages, the project has been built against GridSite 1.1.18
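Creating the `ism_dump` jobdir request from the list above can be scripted. A minimal Python sketch, assuming (this is not stated in these notes) that the jobdir follows a maildir-like layout with `tmp/` and `new/` subdirectories; the function name is hypothetical and the actual jobdir path depends on the installation. Writing under `tmp/` and then renaming into `new/` ensures the WM never picks up a half-written file.

```python
import os
import tempfile


def submit_jobdir_request(jobdir, request='[ command = "ism_dump"; ]'):
    """Drop a request into an (assumed) maildir-style jobdir: write it under
    tmp/, then rename it into new/ so it appears there in one step."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.join(jobdir, "tmp"))
    with os.fdopen(fd, "w") as f:
        f.write(request + "\n")
    final_path = os.path.join(jobdir, "new", os.path.basename(tmp_path))
    os.rename(tmp_path, final_path)  # atomic within one filesystem
    return final_path


# Demonstration against a throw-away directory with the assumed layout.
with tempfile.TemporaryDirectory() as demo_dir:
    for sub in ("tmp", "new"):
        os.mkdir(os.path.join(demo_dir, sub))
    path = submit_jobdir_request(demo_dir)
    print(os.path.basename(os.path.dirname(path)))  # -> new
```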

 

Configuration changes:

LBProxy (= true) is now in the Common section; it has been moved from the [WorkloadManagerProxy] section (see above).
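When upgrading, it can help to check where LBProxy is set. A minimal Python sketch, assuming the configuration has already been loaded into a dictionary of sections (the dictionary layout and function names are hypothetical; parsing the real configuration file is not shown). It reads the LB type once from Common, defaulting to true as stated above, and lists other sections that still carry a pre-3.2 LBProxy entry, which are worth reviewing since the setting is now expected only in Common.

```python
def lb_mode(config):
    """Effective LB type for every WMS component, read once from Common.
    LBProxy defaults to true, as in these release notes."""
    return "proxy" if config.get("Common", {}).get("LBProxy", True) else "server"


def leftover_lbproxy_sections(config):
    """Sections other than Common that still define LBProxy (pre-3.2 layout)."""
    return sorted(name for name, section in config.items()
                  if name != "Common" and "LBProxy" in section)


cfg = {"Common": {"LBProxy": False},
       "WorkloadManagerProxy": {"LBProxy": True},
       "WorkloadManager": {}}
print(lb_mode(cfg), leftover_lbproxy_sections(cfg))  # -> server ['WorkloadManagerProxy']
```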

RuntimeMalloc [WorkloadManager]: allows an alternative malloc library (e.g. nedmalloc, Google Performance Tools and many more) to be used, redirecting at run time via LD_PRELOAD. For example, with Google's malloc: RuntimeMalloc = "/usr/lib/libtcmalloc_minimal.so".

IsmThreads (= true) [WMConfiguration]: the new WM core processes each computation/request in a thread pool (WorkerThreads being the number of threads). Among many other benefits, this keeps the number of threads under control, so WorkerThreads should ideally be set to match the number of physical cores. This is what happens with IsmThreads set to false. Now, imagine all WorkerThreads threads busy with, say, submitting collections of 1000 nodes: ISM requests would then have to wait, even though they sit at the top of the queue (they have the highest priority). For a safer configuration in such use cases, the WM can be told to handle ISM requests in separate threads (not traversing any queue and not part of the pool), hence IsmThreads = true.
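The starvation scenario behind IsmThreads can be reproduced with the Python standard library. In this minimal sketch a `ThreadPoolExecutor` stands in for the WM worker pool (it is FIFO rather than prioritized, but with every worker busy the priority of a queued request makes no difference), `release.wait` stands in for a long collection submission, and the function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def dispatch_ism_request(pool, ism_threads, task):
    """Route an ISM request through the worker pool or, when IsmThreads is
    true, to a dedicated thread outside the pool and its queue."""
    if ism_threads:
        threading.Thread(target=task, daemon=True).start()
    else:
        pool.submit(task)


def ism_served_while_pool_busy(ism_threads, worker_threads=2, timeout=0.5):
    """Return True if the ISM request completes while every worker is busy."""
    release = threading.Event()  # stands in for long-running collection submits
    ism_done = threading.Event()
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        for _ in range(worker_threads):
            pool.submit(release.wait)  # saturate the pool
        dispatch_ism_request(pool, ism_threads, ism_done.set)
        served = ism_done.wait(timeout)
        release.set()  # let the busy workers finish so the pool can shut down
    return served


print(ism_served_while_pool_busy(ism_threads=False))  # -> False
print(ism_served_while_pool_busy(ism_threads=True))   # -> True
```

With the request routed through the pool it cannot start until a worker frees up; on a dedicated thread it runs at once, at the cost of one thread outside the controlled pool.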

QueueSize (= 1000) [WorkloadManager]: size of the queue of "ready" events to be processed by the worker thread pool.

 

Some remarks:

The workaround "EnableZippedISB = false" in the JDL can now be removed.

Given the increased performance of the WM, backlogs may be experienced on the JC/ICE side. For the JC this basically stems from Condor not keeping pace (given the way Condor is used within our design). To avoid this effect:

1) Make sure jobdir is enabled as the JC input

2) Consider using a machine with two disks and two separate controllers, and use one of them for the JC/Condor data.

3) A different physical layout, distributing the load evenly between WMS and LB, is suggested. It still requires two machines for a WMS+LB node, but with the services distributed differently. See:

https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/DistributedWMS


Known Issues:

- bug #49844: WMProxy does not catch signal 25

- bug #50009: wmproxy.gacl person record allows anyone to pass

Last Updated on Wednesday, 19 August 2009 15:22

Webmaster: WMS Support