Difference: GridFTPLVSTests2016 (7 vs. 8)

Revision 8 - 2017/02/06 - Main.CliftonPotter

GridFTP-LVS Tests (gfal-copy writing to /dev/null)

 
This page details scalability tests on the GridFTP server/LVS run over the course of November 2016 - January 2017.

Overview

Using a sleeper pool at the Caltech T2, a variable number of scripts ran gfal-copy to /dev/null (with a 1-second sleep between copies) to test the throughput limits, client load limits, and overall performance of the LVS and the GridFTP servers.

  • 1000, 2000, 3000, 4000, and 5000 instances of gfcTest.sh were submitted to the Condor queue
  • gfcTest.sh selects a file at random from fileList.txt and runs gfal-copy, writing to /dev/null
  • Upon completion of gfal-copy, the script sleeps for 1 second and then executes gfal-copy again (total run time: 60 minutes)
  • The total load on each individual gftp-x.t2.ucsd.edu server and the total number of active jobs in condor_q were recorded every 30 seconds
  • The total throughput (the sum of all individual gftp-x loads) was calculated
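The aggregation in the last step reduces to summing the per-server loads at each 30-second tick. A minimal sketch of that step, assuming the per-server loads (in Gbit/s) arrive one per line; the sample values here are invented:

```shell
# Hypothetical per-server load samples (Gbit/s), one line per gftp-x host;
# in the real tests these came from monitoring each gftp-x.t2.ucsd.edu server.
total=$(printf '3.2\n4.1\n2.7\n' | awk '{sum += $1} END {printf "%.1f", sum}')
echo "total throughput: ${total} Gbit/s"
```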

Data

1000 Jobs:
1000jTEST.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484711637329&to=1484717359888
1000j_active.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484711637329&to=1484717359888

2000 Jobs:
2000jTEST.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484719120593&to=1484724632263
2000j_active.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484719120593&to=1484724632263

3000 Jobs:
3000jTEST.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484725398613&to=1484730470759
3000j_active.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484725398613&to=1484730470759

4000 Jobs:
4000jTEST.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484735972258&to=1484741688843
4000j_active.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484735972258&to=1484741688843

5000 Jobs:
5000jTEST.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484742602465&to=1484747749937
5000j_active.csv http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484742602465&to=1484747749937

Overall:
FIGURE 1:

Screen_Shot_2017-02-06_at_7.01.31_AM.png

http://condorflux.t2.ucsd.edu:3000/dashboard/db/caltech-network-tests?from=1484711447764&to=1484749117351


FIGURE 2:

Screen_Shot_2017-02-06_at_7.04.19_AM.png

http://condorflux.t2.ucsd.edu:3000/dashboard/db/condor-metrics-test-001?from=1484711447764&to=1484749117351

Analysis:

The total duration (minutes), number of data points, average total throughput, and maximum throughput were computed for each batch submission.


# Jobs | Duration (min) | # Data Points | # Points per Subinterval | Avg Throughput | Max Throughput
1000   | 94             | 191           | 47                       | 22.945 Gbit/s  | 32.934 Gbit/s
2000   | 92             | 185           | 46                       | 26.853 Gbit/s  | 34.211 Gbit/s
3000   | 84             | 169           | 42                       | 28.384 Gbit/s  | 37.190 Gbit/s
4000   | 95             | 191           | 48                       | 26.263 Gbit/s  | 36.328 Gbit/s
5000   | 85             | 172           | 43                       | 23.334 Gbit/s  | 37.076 Gbit/s


Looking at the overall behavior of each batch's job submission, execution, and completion (Figure 1), an immediate problem arises: the individual jobs do not execute simultaneously, which would skew the overall averages. To compensate, each batch was split into four equal subintervals:


First Subinterval: Jobs Submitting (Ramp Up)
Second Subinterval: All Jobs Active (Steady State)
Third Subinterval: All Jobs Active (Steady State)
Fourth Subinterval: Jobs Completing (Cool Down)
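The four-way split above can be sketched as a small awk pass: divide a batch's N samples into four equal subintervals and average the throughput column within each. The CSV layout (time,throughput) and the toy values are assumptions for illustration:

```shell
# Toy samples: time,throughput pairs standing in for one batch's CSV export
samples=$(mktemp)
printf '1,10\n2,20\n3,30\n4,40\n5,50\n6,60\n7,70\n8,80\n' > "$samples"
n=$(wc -l < "$samples")
# Map row NR to subinterval 1..4, then average column 2 per subinterval
out=$(awk -F, -v n="$n" '{
    s = int((NR - 1) * 4 / n) + 1
    sum[s] += $2; cnt[s]++
} END {
    for (i = 1; i <= 4; i++) printf "S%d: %.1f\n", i, sum[i] / cnt[i]
}' "$samples")
echo "$out"
```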

1000 Jobs:
Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s)
S1          | 24.484                  | 564.05     | 0.043407
S2          | 24.092                  | 1000.00    | 0.024092
S3          | 24.934                  | 969.90     | 0.025707
S4          | 18.409                  | 269.67     | 0.068266

2000 Jobs:
Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s)
S1          | 28.227                  | 1249.95    | 0.022583
S2          | 28.136                  | 2000.00    | 0.014068
S3          | 29.756                  | 1992.36    | 0.014935
S4          | 19.226                  | 808.64     | 0.023776

3000 Jobs:
Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s)
S1          | 30.090                  | 2244.44    | 0.013406
S2          | 27.033                  | 2995.68    | 0.009024
S3          | 28.913                  | 2884.05    | 0.010025
S4          | 27.385                  | 918.11     | 0.029828

4000 Jobs:
Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s)
S1          | 29.997                  | 2271.90    | 0.013204
S2          | 23.130                  | 3999.60    | 0.005783
S3          | 23.986                  | 3777.95    | 0.006349
S4          | 28.039                  | 1256.95    | 0.022307

5000 Jobs:
Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s)
S1          | 27.756                  | 1726.00    | 0.016081
S2          | 22.478                  | 4707.32    | 0.004775
S3          | 17.621                  | 5000.00    | 0.003524
S4          | 26.099                  | 4617.17    | 0.005653


FIGURE 3: Each batch's subinterval average total throughput (Column 2) was graphed against the average number of active jobs (Column 3).
The plotted subinterval averages show a clear pattern with a maximum slightly left of center.
A fourth-order function was fitted using Mathematica; the maximum average total throughput was found to be approximately 29 Gbit/s, occurring at roughly 2161 active jobs.

FIGURE 4: Each batch's subinterval average total throughput per job (Column 4) was graphed against the average number of active jobs (Column 3).
The plotted subinterval average throughput per job shows, unexpectedly, an exponentially decaying trend; even with outliers rejected, the trend still decreases.
The average throughput per job was expected to remain roughly constant as the number of jobs increased, so this decrease requires further investigation.

Code

setup.sh

#!/bin/bash
# $1 = total number of clients
# $2 = number of seconds to sleep after each gfal-copy
# $3 = number of minutes to run the script before exit
echo "$1 $2 $3"
mkdir -p "out/output_${1}j_${2}s_${3}m"
# Fill the template placeholders: % -> job count, * -> sleep seconds, ^ -> run minutes
sed "s@%@$1@g" blank.submit | sed "s@*@$2@g" | sed "s@\^@$3@g" > "${1}j_${2}s_${3}m.submit"
condor_submit "${1}j_${2}s_${3}m.submit"



blank.submit

#Argument 1 is the number of seconds of sleep for after each gfal-copy
#Argument 2 is the number of minutes this script will run for before quit
executable = gfcTest.sh
error = out/output_%j_*s_^m/test-$(Cluster).$(Process).error
log = out/output_%j_*s_^m/test-$(Cluster).$(Process).log
output = out/output_%j_*s_^m/test-$(Cluster).$(Process).out
transfer_input_files = fileList.txt
RequestMemory = 1000
arguments = * ^
queue %
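The %, * and ^ placeholders in blank.submit are filled by setup.sh's sed pipeline. A quick illustration of that substitution on the arguments line (the values 1000 jobs, 1 s sleep, 60 min are example inputs):

```shell
# Reproduce setup.sh's substitutions on one template line:
# % -> job count, * -> sleep seconds, ^ -> run minutes
line=$(echo 'arguments = * ^' | sed "s@%@1000@g" | sed "s@*@1@g" | sed "s@\^@60@g")
echo "$line"    # the generated submit file then passes "1 60" to gfcTest.sh
```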

gfcTest.sh

#!/bin/bash
sleepTime=$1                # seconds to sleep after each gfal-copy
totalTime=$(( $2 * 60 ))    # run time, converted from minutes to seconds
while [ "$SECONDS" -lt "$totalTime" ]
do
    # Pick a file at random from the list and copy it to /dev/null
    file=$(sort -R fileList.txt | head -1)
    home="gsiftp://gftp.t2.ucsd.edu/hadoop"
    path="$home$file"
    gfal-copy -f -v "$path" file:/dev/null
    sleep "$sleepTime"
done

fileList.txt

/Path/To/File/test_1.file
/Path/To/File/test_2.file
...
/Path/To/File/test_n.file
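One way fileList.txt could be generated is by listing files under the hadoop mount and stripping the mount prefix, so the stored paths append cleanly to gsiftp://gftp.t2.ucsd.edu/hadoop in gfcTest.sh. A sketch with a temporary directory standing in for the real mount point; the directory layout is invented:

```shell
root=$(mktemp -d)                    # stand-in for the /hadoop mount (assumption)
mkdir -p "$root/store/test"
touch "$root/store/test/test_1.file" "$root/store/test/test_2.file"
# Record paths relative to the mount root, one per line, as gfcTest.sh expects
(cd "$root" && find . -name '*.file' | sed 's|^\.||' | sort) > "$root/fileList.txt"
cat "$root/fileList.txt"
```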



Relevant Mathematica Commands

MathematicaNotebook.nb

Average Throughput Per Subinterval:

Importing Data: (Check linked CSV for import file formatting style)
data = Import["AverageThroughput.csv"]

{{269.667, 18.4091}, {564.05, 24.4837}, {808.64, 19.2264}, {918.111, 27.3855}, {969.9, 24.9335}, {1000, 24.0919}, {1249.95, 28.2273}, {1256.95, 28.0392}, {1726, 27.7564}, {1992.36, 29.756},
{2000, 28.1362}, {2244.44, 30.0901}, {2271.9, 29.9971}, {2884.05, 28.9132}, {2995.68, 27.0332}, {3777.95, 23.9858}, {3999.6, 23.13}, {4617.17, 26.0991}, {4707.32, 22.4782}, {5000, 17.6206}}

Fitting a 4th Order Approximation Function:
fitFunction = Fit[data, {1, x , x^2, x^3, x^4}, x]

Creating the Plot:
AvgThroughput = Show[
  ListPlot[data, ImageSize -> Full, Frame -> True,
    FrameLabel -> {{"Average Throughput (Gbps)", ""}, {"Number of Active Jobs", "Average Throughput at t=[0.25,0.5,0.75,1.0]"}},
    PlotTheme -> "Detailed", PlotRangeClipping -> True,
    PlotLabel -> None, LabelStyle -> {GrayLevel[0], Bold}],
  Plot[fitFunction, {x, 0, 5000}, PlotLabels -> "Expressions"]]

Show the Plot:
Show[AvgThroughput]

Export the Plot to JPEG:

Export["~/AverageThroughput.jpg", AvgThroughput, "JPEG"]

Average Throughput per Job per Subinterval:


Importing Data: (Check linked CSV for import file formatting style)

data = Import["AverageThroughputPerJob.csv"]

{{269.667, 0.0682662}, {564.05, 0.0434069}, {808.64, 0.0237763}, {918.111, 0.0298281}, {969.9, 0.0257073}, {1000, 0.0240919}, {1249.95, 0.0225827}, {1256.95, 0.0223073},
{1726, 0.0160814}, {1992.36, 0.0149351}, {2000, 0.0140681}, {2244.44, 0.0134065}, {2271.9, 0.0132036}, {2884.05, 0.0100252}, {2995.68, 0.00902405}, {3777.95, 0.00634888},
{3999.6, 0.00578309}, {4617.17, 0.00565263}, {4707.32, 0.00477516}, {5000, 0.00352412}}

Fitting a 4th Order Approximation Function:

fitFunction = Fit[data, {1, x , x^2, x^3, x^4}, x]

Creating the Plot:

AvgThroughputPerJob = Show[
  ListPlot[data, ImageSize -> Full, Frame -> True,
    FrameLabel -> {{"Average Throughput Per Job (Gbps)", ""}, {"Number of Active Jobs", "Average Throughput at t=[0.25,0.5,0.75,1.0]"}},
    PlotTheme -> "Detailed", PlotRangeClipping -> True,
    PlotLabel -> None, LabelStyle -> {GrayLevel[0], Bold}],
  Plot[fitFunction, {x, 0, 5000}, PlotLabels -> "Expressions"]]

Show the Plot:

Show[AvgThroughputPerJob]

Export the Plot to JPEG:

Export["~/AverageThroughputPerJob.jpg", AvgThroughputPerJob, "JPEG"]









Initial Attempts (kept only for archival purposes)

This page details scalability tests on the GridFTP server/LVS run over the course of November 2016.

 

Overview

To test the throughput and client load limits of the LVS and the GridFTP servers.

Attachments:
  • MaxThruPerClientVsActiveClients_10secSleep.jpg
  • MaxThruVsNumActiveClients_2secSleep.jpg
  • MaxThruVsNumActiveClients_10secSleep.jpg
  • Screen_Shot_2017-02-06_at_7.01.31_AM.png
  • Screen_Shot_2017-02-06_at_7.04.19_AM.png
  • 1000jTEST.csv, 2000jTEST.csv, 3000jTEST.csv, 4000jTEST.csv, 5000jTEST.csv
  • 1000j_active.csv, 2000j_active.csv, 3000j_active.csv, 4000j_active.csv, 5000j_active.csv
  • AverageThroughput.jpg, averagethroughput.jpg, averagethroughputperjob.jpg
  • AverageThroughput.csv, AverageThroughputPerJob.csv
  • MathematicaNotebook.nb
 