-- NaveenKashyap - 2016/08/24

Purpose

This document describes the current lack of a factory monitoring system and the design of our proposed solution.

Problem

The data produced by each factory is currently not aggregated. Such aggregation is important for monitoring and analyzing that data.

Solution

Our solution uses the Collector within each factory.

We want to use Tyson's script (information found here) to query a list of factories. To protect against failures and to improve performance, we do not want to query the factories sequentially. Instead, we want to query them in parallel, with each query writing to its own logging/output files. The queries should be sent every 15 minutes.

Technicalities:

  • Each factory should have its own thread/daemon to allow for parallel computation
  • Each thread/daemon should have its own logging/output files
  • The user (probably crontab) should only have to make one call to initiate all queries to all factories
  • It should be easy to edit the list of factories to query
  • It is important that we use only pieces of Condor and Python scripts and bindings
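
The requirements above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a finished design. It makes several assumptions: the factory list lives in a plain text file with one name per line, factory names are safe to use as file names, and the actual query (for example a wrapper around Tyson's script or the Condor Python bindings) is passed in as a callable. The names load_factories, make_logger, query_one, query_all and query_func are all hypothetical.

```python
import logging
import os
from concurrent.futures import ThreadPoolExecutor


def load_factories(path):
    """Read one factory name per line; skip blank lines and '#' comments."""
    with open(path) as handle:
        stripped = (line.strip() for line in handle)
        return [name for name in stripped if name and not name.startswith("#")]


def make_logger(factory, log_dir):
    """Return a logger that writes only to <log_dir>/<factory>.log."""
    logger = logging.getLogger("factory_query." + factory)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep each factory's output in its own file
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, factory + ".log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def query_one(factory, query_func, log_dir):
    """Query a single factory and log the outcome to that factory's file."""
    logger = make_logger(factory, log_dir)
    try:
        # query_func is a placeholder for the real query (assumption).
        result = query_func(factory)
        logger.info("query succeeded: %r", result)
        return factory, result, None
    except Exception as exc:
        # A failure is recorded here and does not affect other factories.
        logger.exception("query failed")
        return factory, None, exc


def query_all(factories, query_func, log_dir):
    """Query every factory in its own thread.

    Returns a dict mapping factory name to a (result, error) tuple.
    """
    if not factories:
        return {}
    os.makedirs(log_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=len(factories)) as pool:
        futures = [pool.submit(query_one, name, query_func, log_dir)
                   for name in factories]
        outcomes = [future.result() for future in futures]
    return {name: (result, error) for name, result, error in outcomes}
```

Under these assumptions, a single wrapper script would call query_all(load_factories("factories.txt"), query_func, "logs"), and crontab would run that script with the schedule "*/15 * * * *" to send the queries every 15 minutes. Editing the list of factories then only means editing the text file. Threads are used here because the work is mostly waiting on remote queries. Separate processes or daemons would follow the same structure.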
Topic revision: r3 - 2016/09/08 - 17:13:35 - NaveenKashyap
 