# Jobs Submitted | Duration (Minutes) | # Data Points | # Data Points per Subinterval | Average Throughput | Maximum Throughput |
---|---|---|---|---|---|
1000 | 94 | 191 | 47 | 22.945 Gbit/s | 32.934 Gbit/s |
2000 | 92 | 185 | 46 | 26.853 Gbit/s | 34.211 Gbit/s |
3000 | 84 | 169 | 42 | 28.384 Gbit/s | 37.190 Gbit/s |
4000 | 95 | 191 | 48 | 26.263 Gbit/s | 36.328 Gbit/s |
5000 | 85 | 172 | 43 | 23.334 Gbit/s | 37.076 Gbit/s |
1000 Jobs:

Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s) |
---|---|---|---|
S1 | 24.48367459 | 564.05 | 0.04340692241 |
S2 | 24.09186677 | 1000 | 0.02409186677 |
S3 | 24.93353163 | 969.9 | 0.02570732202 |
S4 | 18.4091187 | 269.6666667 | 0.06826620037 |

2000 Jobs:

Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s) |
---|---|---|---|
S1 | 28.22730083 | 1249.95 | 0.02258274398 |
S2 | 28.13615692 | 2000 | 0.01406807846 |
S3 | 29.75599496 | 1992.357143 | 0.01493507079 |
S4 | 19.22643441 | 808.64 | 0.02377625941 |

3000 Jobs:

Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s) |
---|---|---|---|
S1 | 30.09010767 | 2244.444444 | 0.01340648362 |
S2 | 27.03319889 | 2995.684211 | 0.009024048259 |
S3 | 28.91324895 | 2884.052632 | 0.01002521543 |
S4 | 27.38546522 | 918.1111111 | 0.02982805119 |

4000 Jobs:

Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s) |
---|---|---|---|
S1 | 29.99714949 | 2271.9 | 0.01320355187 |
S2 | 23.13003247 | 3999.6 | 0.005783086426 |
S3 | 23.98578401 | 3777.952381 | 0.006348884684 |
S4 | 28.03923519 | 1256.952381 | 0.02230731698 |

5000 Jobs:

Subinterval | Avg Throughput (Gbit/s) | Avg # Jobs | Avg Throughput per Job (Gbit/s) |
---|---|---|---|
S1 | 27.75642329 | 1726 | 0.01608135764 |
S2 | 22.47816741 | 4707.315789 | 0.004775156038 |
S3 | 17.62058298 | 5000 | 0.003524116595 |
S4 | 26.09912804 | 4617.166667 | 0.005652628532 |
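The "Avg Throughput per Job" column is simply the average throughput divided by the average number of active jobs in that subinterval. A quick sanity check, using the S1 row of the 1000-job table:

```shell
# Per-job throughput = average throughput / average number of active jobs;
# the operands below are the S1 row of the 1000-job table above.
awk 'BEGIN { printf "%.6f\n", 24.48367459 / 564.05 }'
# -> 0.043407
```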
setup.sh
#!/bin/bash
# $1 = total number of clients
# $2 = number of seconds to sleep after each gfal-copy
# $3 = number of minutes the client script runs before exiting
echo "$1 $2 $3"
mkdir -p "out/output_${1}j_${2}s_${3}m"
# Fill in the blank.submit template: % -> clients, * -> sleep seconds, ^ -> minutes
sed "s@%@$1@g; s@*@$2@g; s@\^@$3@g" blank.submit > "${1}j_${2}s_${3}m.submit"
condor_submit "${1}j_${2}s_${3}m.submit"
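The template substitution setup.sh performs can be seen in isolation; here `echo` stands in for a line of blank.submit (a leading `*` in a sed basic regular expression matches a literal asterisk, which is why the placeholder works):

```shell
# What the sed chain in setup.sh does to the blank.submit placeholders:
# % -> number of clients, * -> sleep seconds, ^ -> minutes.
echo 'arguments = * ^' | sed "s@%@1000@g" | sed "s@*@10@g" | sed "s@\^@94@g"
# -> arguments = 10 94
```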
blank.submit
# Argument 1 is the number of seconds to sleep after each gfal-copy
# Argument 2 is the number of minutes the script will run before quitting
executable = gfcTest.sh
error = out/output_%j_*s_^m/test-$(Cluster).$(Process).error
log = out/output_%j_*s_^m/test-$(Cluster).$(Process).log
output = out/output_%j_*s_^m/test-$(Cluster).$(Process).out
transfer_input_files = fileList.txt
RequestMemory = 1000
arguments = * ^
queue %

gfcTest.sh
#!/bin/bash
sleepTime=$1              # seconds to sleep after each gfal-copy
totalTime=$(( $2 * 60 ))  # minutes to run, converted to seconds
# $SECONDS is bash's built-in count of seconds since the shell started
while [ "$SECONDS" -lt "$totalTime" ]
do
    # Pick a random file from the list and copy it to /dev/null over gsiftp
    file=$(sort -R fileList.txt | head -1)
    home="gsiftp://gftp.t2.ucsd.edu/hadoop"
    path=$home$file
    gfal-copy -f -v "$path" file:/dev/null
    sleep "$sleepTime"
done

fileList.txt
/Path/To/File/test_1.file
/Path/To/File/test_2.file
...
/Path/To/File/test_n.file
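gfcTest.sh builds each transfer source by prepending the gsiftp endpoint to a line of fileList.txt. For the first example entry above:

```shell
# Source-URL construction used by gfcTest.sh (example entry from fileList.txt).
home="gsiftp://gftp.t2.ucsd.edu/hadoop"
file="/Path/To/File/test_1.file"
echo "$home$file"
# -> gsiftp://gftp.t2.ucsd.edu/hadoop/Path/To/File/test_1.file
```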
MathematicaNotebook.nb
Relevant Mathematica Commands

Average Throughput per Subinterval:
Importing Data: (Check linked CSV for import file formatting style)
data = Import["AverageThroughput.csv"]

{{269.667, 18.4091}, {564.05, 24.4837}, {808.64, 19.2264}, {918.111, 27.3855}, {969.9, 24.9335}, {1000, 24.0919}, {1249.95, 28.2273}, {1256.95, 28.0392}, {1726, 27.7564}, {1992.36, 29.756}, {2000, 28.1362}, {2244.44, 30.0901}, {2271.9, 29.9971}, {2884.05, 28.9132}, {2995.68, 27.0332}, {3777.95, 23.9858}, {3999.6, 23.13}, {4617.17, 26.0991}, {4707.32, 22.4782}, {5000, 17.6206}}

Fitting a 4th Order Approximation Function:
fitFunction = Fit[data, {1, x, x^2, x^3, x^4}, x]
Creating the Plot:
AvgThroughput = Show[
  ListPlot[data, ImageSize -> Full, Frame -> True,
    FrameLabel -> {{"Average Throughput (Gbps)", " "}, {"Number of Active Jobs", "Average Throughput at t=[0.25,0.5,0.75,1.0]"}},
    PlotTheme -> "Detailed", PlotRangeClipping -> True,
    PlotLabel -> None, LabelStyle -> {GrayLevel[0], Bold}],
  Plot[fitFunction, {x, 0, 5000}, PlotLabels -> "Expressions"]]

Show the Plot:
Show[AvgThroughput]

Export the Plot to JPEG:

Export["~/AverageThroughput.jpg", AvgThroughput, "JPEG"]
Average Throughput per Job per Subinterval:
Importing Data: (Check linked CSV for import file formatting style)

data = Import["AverageThroughputPerJob.csv"]

{{269.667, 0.0682662}, {564.05, 0.0434069}, {808.64, 0.0237763}, {918.111, 0.0298281}, {969.9, 0.0257073}, {1000, 0.0240919}, {1249.95, 0.0225827}, {1256.95, 0.0223073}, {1726, 0.0160814}, {1992.36, 0.0149351}, {2000, 0.0140681}, {2244.44, 0.0134065}, {2271.9, 0.0132036}, {2884.05, 0.0100252}, {2995.68, 0.00902405}, {3777.95, 0.00634888}, {3999.6, 0.00578309}, {4617.17, 0.00565263}, {4707.32, 0.00477516}, {5000, 0.00352412}}

Fitting a 4th Order Approximation Function:
fitFunction = Fit[data, {1, x, x^2, x^3, x^4}, x]
Creating the Plot:
AvgThroughputPerJob = Show[
  ListPlot[data, ImageSize -> Full, Frame -> True,
    FrameLabel -> {{"Average Throughput Per Job (Gbps)", " "}, {"Number of Active Jobs", "Average Throughput at t=[0.25,0.5,0.75,1.0]"}},
    PlotTheme -> "Detailed", PlotRangeClipping -> True,
    PlotLabel -> None, LabelStyle -> {GrayLevel[0], Bold}],
  Plot[fitFunction, {x, 0, 5000}, PlotLabels -> "Expressions"]]

Show the Plot:
Show[AvgThroughputPerJob]

Export the Plot to JPEG:

Export["~/AverageThroughputPerJob.jpg", AvgThroughputPerJob, "JPEG"]
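The linked CSVs themselves are not reproduced here; judging from the `Import[]` output above, each row is presumably `avg # jobs,avg throughput`. A sketch of producing rows in that format from the table values (the two-column comma layout is an assumption):

```shell
# Hypothetical sketch: emit CSV rows (avg # jobs, avg throughput) in the
# two-column format the Import[] output above suggests.
printf '%s %s\n' 564.05 24.48367459 1000 24.09186677 |
  awk '{ printf "%s,%s\n", $1, $2 }'
# ->
# 564.05,24.48367459
# 1000,24.09186677
```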
100_30.submit
executable = gfcTest.sh
error = out/output_100_30/test-$(Cluster).$(Process).error
log = out/output_100_30/test-$(Cluster).$(Process).log
output = out/output_100_30/test-$(Cluster).$(Process).out
transfer_input_files = fileList.txt
RequestMemory = 1000
arguments = 10 30
queue 100