GridFTP-LVS Tests

This page details scalability tests of the GridFTP server/LVS setup, run over the course of November 2016.

Overview

The goal was to test the throughput and client load limits of the LVS and the GridFTP servers:

  • A variable number (100 to 8,000) of 30-45 minute jobs was submitted via condor_submit and placed in the Condor queue (condor_q).
  • Each job repeatedly selects a file at random from a list, copies it with gfal-copy (writing it to /dev/null), and then sleeps for a configurable time (2-10 seconds); this loop continues for 30-45 minutes.
  • The throughput (Gbps) of each individual gftp-X.t2.ucsd.edu server and the number of active jobs were recorded at 30-second intervals throughout each client's execution.
  • The objective is to test 6 GridFTP servers on the LVS setup and to see whether the system performs as expected.
  • Key metric: number of clients active when maximum throughput occurs
  • Key metric: total throughput per client (Gbps/client)
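As a rough sketch of how the first key metric could be read off the recorded samples: the page does not describe the format of the recorded data, so the three-column layout below (elapsed seconds, active jobs, total Gbps, one line per 30-second interval) and the function name peak_sample are assumptions made for illustration only.

```shell
#!/bin/bash
# Hypothetical sample format (an assumption, not the original log format):
#   <elapsed_seconds> <active_jobs> <total_gbps>
# Prints "<active_jobs> <total_gbps>" for the interval with the highest throughput.
peak_sample() {
    awk '
        # keep the first valid sample, then any sample with strictly higher throughput
        NF == 3 && (!seen || $3 + 0 > max) { max = $3 + 0; jobs = $2 + 0; seen = 1 }
        END {
            if (!seen) exit 1          # no usable samples
            printf "%d %.2f\n", jobs, max
        }
    ' "$@"
}
```

The function reads from the files given as arguments, or from standard input when none are given, so it can be fed directly from whatever produces the samples.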

Example Code

gfcTest.sh

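#!/bin/bash
# Usage: gfcTest.sh <sleep_seconds> <run_minutes>
sleepTime=$1                 # seconds to sleep after each gfal-copy
totalTime=$(( $2 * 60 ))     # run time, converted from minutes to seconds
home="gsiftp://gftp.t2.ucsd.edu/hadoop"
# SECONDS is bash's built-in count of seconds since the script started
while [ "$SECONDS" -lt "$totalTime" ]
do
    # pick one file at random from the list
    file=$(sort -R fileList.txt | head -n 1)
    path="$home$file"
    # discard the transferred data by writing it to /dev/null
    gfal-copy -f -v "$path" file:/dev/null
    sleep "$sleepTime"
done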

fileList.txt

/Path/To/File/test_1.file
/Path/To/File/test_2.file
...
/Path/To/File/test_n.file

100_30.submit

executable = gfcTest.sh

error = out/output_100_30/test-$(Cluster).$(Process).error

log = out/output_100_30/test-$(Cluster).$(Process).log

output = out/output_100_30/test-$(Cluster).$(Process).out

transfer_input_files = fileList.txt

RequestMemory = 1000

arguments = 10 30

queue 100
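Since the tests varied the number of jobs between 100 and 8,000, a small generator can save copying the submit file by hand. This is a minimal sketch, not the script used for the tests: the function name make_submit is hypothetical, and the <jobs>_<minutes> naming is only inferred from the 100_30.submit example above.

```shell
#!/bin/bash
# make_submit JOBS MINUTES SLEEP_SECONDS
# Prints a submit description equivalent to 100_30.submit for the given values.
# The <JOBS>_<MINUTES> tag is an assumption inferred from the example file name.
make_submit() {
    local jobs=$1 minutes=$2 sleep_secs=$3
    local tag="${jobs}_${minutes}"
    # \$ keeps $(Cluster) and $(Process) literal so HTCondor expands them, not bash
    cat <<EOF
executable = gfcTest.sh
error = out/output_${tag}/test-\$(Cluster).\$(Process).error
log = out/output_${tag}/test-\$(Cluster).\$(Process).log
output = out/output_${tag}/test-\$(Cluster).\$(Process).out
transfer_input_files = fileList.txt
RequestMemory = 1000
arguments = ${sleep_secs} ${minutes}
queue ${jobs}
EOF
}

# Example (not run here):
#   mkdir -p out/output_100_30
#   make_submit 100 30 10 > 100_30.submit
#   condor_submit 100_30.submit
```

The output directory is created first in the example because the submit file writes its error, log and output files there.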

Data

10 second sleep time

Initially, the submitted jobs slept for 10 seconds between gfal-copy executions. The data were inconsistent and no clear correlations were found. At maximum throughput, the average total bandwidth was 17.5 Gbps with an average of 740 active clients, each with an individual bandwidth of ~0.032 Gbps.

2 second sleep time

The jobs that slept for 2 seconds between gfal-copy executions gave much more consistent results. At maximum throughput, the average total bandwidth was ~17.5 Gbps. The number of active clients at maximum throughput stabilized in the range of 900-1200 jobs, with an average of 1011 active clients, each with an individual bandwidth of ~0.017 Gbps.
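The per-client figure can be checked by dividing total throughput by the number of active clients. The helper below is an illustrative sketch; the name per_client_gbps is hypothetical, and the page does not say how the reported per-client values were actually computed.

```shell
#!/bin/bash
# per_client_gbps TOTAL_GBPS ACTIVE_CLIENTS
# Prints total throughput divided by the number of active clients, in Gbps.
per_client_gbps() {
    awk -v total="$1" -v clients="$2" 'BEGIN {
        if (clients <= 0) exit 1       # avoid dividing by zero
        printf "%.4f\n", total / clients
    }'
}

per_client_gbps 17.5 1011   # 2-second runs: prints 0.0173
```

For the 2-second runs this matches the reported ~0.017 Gbps. The same division on the 10-second averages (17.5 Gbps over 740 clients) gives about 0.024 Gbps rather than the reported ~0.032 Gbps; one possible explanation is that the reported value averages per-run ratios rather than dividing the two averages, but the source does not specify this.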

Topic revision: r7 - 2016/12/14 - 20:45:46 - CliftonPotter
 