IO Profiling
Below is a summary of the CMS IO profiling activity (collected and summarized by Haifeng Pi).
CMS IO Profiling
The I/O performance of CMS storage elements (SEs) and the efficiency of I/O-intensive applications have been studied.
These activities put in place a method for testing site-local or grid I/O performance. The standard CMS software settings expose a set of tunable parameters that directly affect I/O against the SE, e.g. cache-hint, read-hint, and cache-size.
For cache-hint, various data caching strategies can be used: application-only, storage-only, lazy-download, and auto-detect.
For read-hint, several buffering strategies can be used: direct-unbuffered, read-ahead-buffered, and auto-detect. Not all of the settings were tried in the test. Tuning of the SE settings themselves is also largely beyond the scope of the test.
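As a minimal sketch of how the parameter space described above could be enumerated for a test campaign, the Python snippet below builds every combination of cache-hint and read-hint. The option names are copied from the lists above (with "buffered" spelled conventionally). The function name `build_test_matrix` is hypothetical, and cache-size is left out because no values for it are given here. How each combination would actually be passed to the CMS software is not shown and would depend on the local setup.

```python
from itertools import product

# Option names taken from the text above. How they map onto an actual
# CMS software configuration is an open assumption and is not modeled here.
CACHE_HINTS = ["application-only", "storage-only", "lazy-download", "auto-detect"]
READ_HINTS = ["direct-unbuffered", "read-ahead-buffered", "auto-detect"]


def build_test_matrix(cache_hints=CACHE_HINTS, read_hints=READ_HINTS):
    """Return one dict per (cache-hint, read-hint) combination."""
    return [
        {"cache-hint": cache, "read-hint": read}
        for cache, read in product(cache_hints, read_hints)
    ]


if __name__ == "__main__":
    # Print the full matrix so that a subset can be picked for testing.
    for index, setting in enumerate(build_test_matrix(), start=1):
        print(index, setting["cache-hint"], setting["read-hint"])
```

Since not all settings were tried in the test, a list like this mainly helps to record which combinations were covered and which were skipped.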
A CMS software patch is available that improves processing efficiency at the application level by selecting more carefully which data to read, reducing the total amount of data read by a factor of 2-3; this relates indirectly to the I/O performance of the system.
Overall, the test software has been incorporated into the CMSSW Performance Toolkit and is ready to be released and used by a wide range of users.
Several types of SE architecture were tested: DPM, dCache, Storm+Lustre, and Bestman+Hadoop. The test is primarily based on CMS analysis jobs, which are both data- and CPU-intensive. Adopting some optimizations at the application level improved performance by ~20%. No problems were found with the I/O performance of the SEs.
The results are sensitive to which application is used. The main metric is the time taken to process a fixed number of physics events. The tests reveal some subtleties in how the application software interfaces with a given SE technology, and in how performance can be maximized by tuning some parameters.
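The metric used here, time to process a fixed number of events, can be illustrated with a small Python sketch. This is not the code used in the test: `time_fixed_events`, `relative_improvement`, and the `process_event` callable are hypothetical names, and a real measurement would run the actual CMS analysis job rather than a Python loop.

```python
import time


def time_fixed_events(process_event, events, clock=time.perf_counter):
    """Process every event once; return (event count, elapsed seconds).

    `process_event` is a hypothetical stand-in for the per-event work
    of an analysis job. `clock` is injectable so the function can be
    checked without depending on real wall-clock time.
    """
    start = clock()
    count = 0
    for event in events:
        process_event(event)
        count += 1
    return count, clock() - start


def relative_improvement(baseline_seconds, tuned_seconds):
    """Fraction of the baseline time saved by the tuned configuration."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - tuned_seconds) / baseline_seconds


if __name__ == "__main__":
    n, elapsed = time_fixed_events(lambda event: None, range(1000))
    print(f"processed {n} events in {elapsed:.6f} s")
```

With these definitions, a job that takes 100 s before tuning and 80 s after gives `relative_improvement(100.0, 80.0)` of about 0.2, i.e. a 20% improvement. Comparing runs only makes sense when the same application and the same number of events are used for both.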
--
HaifengPi - 2010/07/15