Transcription

1 NETWORKING TRENDS
SCIENCE DMZ: INTRODUCTION, CHALLENGES, AND OPPORTUNITIES
Jorge Crichigno
Department of Integrated Information Technology
College of Engineering and Computing
University of South Carolina
Presentation at Universidad Catolica de Asuncion
Asuncion, Paraguay
August 14, 2019

2 Agenda
- Introduction to University of South Carolina (USC)
- The Science DMZ
- Motivation for a high-speed ‘science’ network architecture
- Science DMZ architecture
- Research opportunities: pacing, entropy-based intrusion detection, routers’ buffer size
- Resources online

4 University of South Carolina
- Founded in 1801
- Flagship state institution
- 350 programs (BSc, MSc, PhD)
- 50,000 students, over 34,000 on the Columbia campus

6 College of Engineering and Computing
- 3222/570 undergraduate/graduate students
- 135 TTT faculty
- Research awards: federal agencies, foundations, industry
- Industry partnerships: IBM, Boeing, Siemens, Samsung, Cisco, Palo Alto Networks, Juniper Networks, Barefoot Networks, VMware, etc.

7 University of South Carolina
The College of Engineering and Computing includes:
- Integrated Information Technology (IIT)
- Computer Science
- Electrical Engineering
- Mechanical Engineering
- Aerospace Engineering
- Biomedical Engineering
- Chemical Engineering
- Civil and Environmental

8 Information Technology
- More practical than theoretical in nature
- Promote applied research using professional tools and platforms
- Research agenda emerges from the practice
- Laboratory experiences with workplace relevance
[Figure: curriculum diagram. Labels: Capstones; IT Fundamentals; Platforms; Databases; Web and mobile; Programming; Security; Networks; General Ed; Electives, professional practice, other disciplines; Foundations of Math and Science; Upper-level (2 years); Lower-level (2 years); Internships]

9 Information Technology

10 USC’s Cyberinfrastructure (CI) Lab
- Information online at http://ce.sc.edu/cyberinfra/
- Development of custom protocols using programmable switches
- TCP rate control using pacing
- Entropy-based intrusion detection
- IoT traffic analysis
- Collaboration in the above topics with:
  - University of Texas at San Antonio (UTSA)
  - University of South Florida (USF)
  - U.S. Department of Energy and National Laboratories
  - The Energy Science Network (ESnet)
  - Brandon University (Canada)
  - Brno University (Czech Republic)

12 Motivation for a High-Speed Science Architecture
- Science and engineering applications are now generating data at an unprecedented rate
- From large facilities to portable devices, instruments can produce hundreds of terabytes in short periods of time
- Data must typically be transferred across high-throughput, high-latency Wide Area Networks (WANs)
[Figure panels: Applications; ESnet traffic]
The Energy Science Network (ESnet) is the backbone connecting U.S. national laboratories and research centers

14 Motivation for a High-Speed Science Architecture
- A biology experiment using the U.S. National Energy Research Scientific Computing Center (NERSC) resources
- SnapChat data produced per day worldwide by millions of people: 38 TB
- One biology experiment by a team of nine scientists: 114 TB (Photosystem II X-Ray Study)
http://www.nature.com/articles/ncomms5371

16 Motivation for a High-Speed Science Architecture
Enterprise network limitations:
- Security appliances (IPS, firewalls, etc.) are CPU-intensive
- Inability of small-buffer routers/switches to absorb traffic bursts
- At best, transfers of big data may last days or even weeks [1]
[Figure: two devices exchanging data on a 10 Gbps network; packet loss rate is 1/22,000, or 0.0046%]
1. E. Dart, L. Rotman, B. Tierney, M. Hester, J. Zurawski, “The science dmz: a network design pattern for data-intensive science,” International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2013.

18 Science DMZ
- The Science DMZ is a network designed for big science data [1, 2]
- Main elements:
  - High throughput, friction free WAN paths (no inline security appliances; routers/switches w/ large buffer size)
  - Data Transfer Nodes (DTNs)
  - End-to-end monitoring: perfSONAR
  - Security: access-control list, offline appliance/s (IDS)
- Layer labels shown beside the elements on slide 19: L1, L2-L3, L4-L5, L5, L3-L5
1. E. Dart, L. Rotman, B. Tierney, M. Hester, J. Zurawski, “The science dmz: a network design pattern for data-intensive science,” International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2013.
2. J. Crichigno, E. Bou-Harb, N. Ghani, “A comprehensive tutorial on science DMZ,” IEEE Communications Surveys and Tutorials, Vol. 21, Issue 2, 2nd quarter 2019.

20 Science DMZ Needs at USC
Table columns: topic; current support (award, amount); requirements. Researcher names legible on the slide: Guiseppe, Chandra.

Experimental nuclear physics (ENP)
  Current support: NSF: 1505615 (1.2M), 1614773 (610K), 1812382 (350K); Brookhaven National Laboratory (BNL) 218624 (15K); Jefferson Science Associates / DOE (11K)
  Requirements: 100 Gbps throughput to PSI, JLab. High throughput to other collaborators (Brookhaven, Argonne)

Chemical engineering
  Current support: NSF: 1254352 (400K), 1534260 (840K), 1565964 (300K), 1832809 (160K), 1632824 (3M), 1805307 (75K)
  Requirements: High throughput (at least 10 Gbps) to XSEDE (SDSC, TACC), PNNL

Aerospace, predictive maintenance
  Current support: Siemens (628M in-kind [44]), Boeing (5M [45]), DOD hq017-17-c-7110 (240K), Missile Def. Ag. HQ0147-16-C7606 (35K), Boeing SSOW-BRT-W0915-0001 (275K)
  Requirements: High throughput with encryption (10 Gbps) to internal and external HPCs, XSEDE, SDSC, TACC

Environment nanoscience
  Current support: NSF: 1828055 (635K), 1738340 (286K), 1655926 (4K), 1553909 (510K), 1437307 (300K), 1508931 (390K), 1834638 (380K); DOD 450388-19545 (380K); NIEH 1P01ES028942-01 (6M), NIH R03ES027406-01 (144K)
  Requirements: High throughput (5 Gbps) connection from TOF-ICP-MS instrument to Internet2

Digital image correlation (DIC)
  Current support: NASA C15-2A38-USC (1.2M), NSF 1537776 (165K), Boeing SSOW-BRT-W0915-0003 (140K)
  Requirements: High throughput from USC’s DIC laboratory to HPCs (SDSC, TACC) running ABAQUS, ANSYS

Ntl. Estuarine Research Reserve System
  Current support: NOAA: NA18NOS4200120 (760K), NA17NOS4200104 (980K), OOS.16 (028)USC.DP.MOD.1 (100K), U. Mich. 3003300692 (340K), FL Env. Protection CM08P (92K), NIEHS 1P01ES028942-01 (6M), USDA (43K)
  Requirements: High throughput from NOAA’s NERRS repository (located at USC) to Internet2 (large datasets downloads worldwide)

Particle astrophysics
  Current support: NSF 1614611 (900K), NSF 1307204 (1M), NSF 1808426 (306K)
  Requirements: 100 Gbps connection to MAJORANA (SD), CUORE (Italy), NERSC (CA)

Semiconductor material
  Current support: NSF: 1810116 (371K), 1711322 (370K), 1553634 (695K); NIBIB 1R03EB026813-01 (136K), DOD W911NF-18-1-0029 (585K), SRNL/DOE UC150 (24K), DOE DE-SC0019360 (666K), RCSA 23976 (100K)
  Requirements: High throughput (at least 10 Gbps) from X-ray photoelectron spectroscopy instrument and storage to Internet2 (SRNL, INL, Sandia, other institutions)

21 Science DMZ Needs at USC
Table columns: topic; current support (award, amount); requirements. Researcher names legible on the slide: Chandra, Shustova, Richardson, Myrick, Norman.

Semiconductor material
  Current support: NSF: 1810116 (371K), 1711322 (370K), 1553634 (695K); NIBIB 1R03EB026813-01 (136K), DOD W911NF-18-1-0029 (585K), SRNL/DOE UC150 (24K), DOE DE-SC0019360 (666K), RCSA 23976 (100K)
  Requirements: High throughput (at least 10 Gbps) from X-ray photoelectron spectroscopy instrument and storage to Internet2 (SRNL, INL, Sandia, other institutions)

Phytoplankton spectroscopy
  Current support: NSF 1542555 (2M) and DXP Supply Chain Services (40K)
  Requirements: High throughput (10 Gbps) from image photometer, storage to internal and external HPC

Genomics data mining
  Current support: NSF 1149447 (850K), NIEH 1P01ES028942-01 (6M), NSF SC EPSCoR 2031-231-2022570 ( gno
  Requirements: 100 Gbps throughput from genomics seq. instrument/storage to USC’s HPC; 10 Gbps connection to Frederick, Argonne, Oak Ridge Ntl. Laboratories, XSEDE resources

(topic not legible in the transcript)
  Current support: NSF 1736557 (1M), NOAA R/ER-49 (130K), NSF 1829519 (265K), NSF 1458416 (593K), NSF 1433313 (362K), NASA 23175500 (167K)
  Requirements: High throughput from USC’s estuarine database to HPCs and Internet2 (datasets downloads)

Genomics, aquatic biology
  Current support: NSF 1556645 (1.2M), SC Sea Grant Consortium/NOAA/DOC N250 (40K), DOD W81XWH1810088 (287K)
  Requirements: 100 Gbps connection to USC’s HPC; 10 Gbps connection to transport DNA / RNA-seq. datasets to XSEDE

Math, genome dynamics
  Current support: NSF: 1751339 (290K), 1410047 (210K)
  Requirements: 100 Gbps connection from genomics laboratory to USC’s HPC, XSEDE

Mathematical models for patient treatment
  Current support: SC Department of Commerce (300K), Duke Endowment Child Care Division 1971-SP (646K), American Cancer Society IRG-17-179-04 (30K), Patient-Centered Outcomes Research Institute ME-1303-6011 (960K)
  Requirements: 100 Gbps connection from engineering storage to USC’s HPC

Other USC campuses, genomics
  Current support: NOAA/DOC NA18NMF4330239 (503K), NOAA/DOC NA18NMF4270203 (230K), NOAA NA17NMF4540137 (153K), NOAA 719583-712683 (189K), NOAA NA15NMF4330157 (466K)
  Requirements: 10 Gbps connection to move datasets between USC Aiken - Internet2

Cyberinfrast.
  Current support: NSF 1822567 (420K), NSF 1829698 (500K)
  Requirements: 100 Gbps programmable network

22 USC’s Science DMZ
[Figure: network diagram. Labelled elements: NAT, security appliances; border routers (3); campus enterprise network (CEN); SDMZ; Bro cluster; labs, inst., storage in Science building 1, Science building 2, and the Engineering building; Research CI; SAN; DTN; HPC; perfSONAR]
Legend: S1-4: spine switches 1-4; L1-4: leaf switches 1-4; I: Internet; I2: Internet2. Link speeds: 100 Gbps; 40 to 100 Gbps; 10 to 40 Gbps; 10 Gbps.
CC* Networking Infrastructure: Building a Science DMZ for Data-intensive Research and Computation at the University of South Carolina, NSF Award # 1925484. Available online at https://nsf.gov/awardsearch/showAward?AWD_ID=1925484&HistoricalAwards=false

23 U.S. Backbones: Internet2 and ESnet
[Figure panels: Internet2; ESnet]

24 Science DMZs in the U.S.
[Figure: Science DMZ deployments, U.S.]

27 Research Opportunities: Pacing
- Packet loss is expensive in high-throughput, high-latency networks
[Figure panels: (a) Sawtooth behavior: sending rate with additive increase, multiplicative decrease, and packet loss; (b) TCP view of a connection (buffer); (c) Average throughput, where MSS: maximum segment size, RTT: round-trip time, p: loss rate, c: constant; (d) Impact of packet loss]
M. Mathis, J. Semke, J. Mahdavi, T. Ott, “The macroscopic behavior of the tcp congestion avoidance algorithm,” ACM Computer Communication Review, vol. 27, no. 3, pp. 67-82, Jul. 1997.
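The average-throughput relation in panel (c) is the model from the Mathis et al. paper cited on this slide: throughput is at most (MSS / RTT) · (c / sqrt(p)). The following is a minimal sketch, not the presenter's code. It assumes c = sqrt(3/2), the value Mathis et al. derive for periodic loss, and an illustrative 1460-byte MSS; neither value appears on the slide.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Upper bound on average TCP throughput, in bits per second.

    The default c = sqrt(3/2) is an assumption (periodic-loss case).
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Illustrative inputs: assumed 1460-byte MSS, and the 1/22,000 loss rate
# quoted on slide 16. The bound shrinks in proportion to 1/RTT.
for rtt_ms in (1, 10, 100):
    bps = mathis_throughput_bps(1460, rtt_ms / 1000, 1 / 22000)
    print(f"RTT {rtt_ms:>3} ms: at most {bps / 1e6:,.1f} Mbps")
```

Because the bound scales with 1/RTT and 1/sqrt(p), the same small loss rate that is harmless on a LAN becomes the limiting factor on a long WAN path.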

28 Pacing
- With TCP pacing, a transmitter evenly spaces or paces packets at a pre-configured rate
  - helps to mitigate transient bursts
  - improves fairness
  - challenge: how to discover the bottleneck bandwidth?
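To make "evenly spaces" concrete: at a pacing rate R, packets of S bytes leave every S · 8 / R seconds instead of back to back. A small sketch with illustrative values (the 1500-byte packet size and 10 Gbps rate are assumptions chosen for the example):

```python
def pacing_gap_s(packet_bytes, rate_bps):
    """Time between the starts of consecutive paced packets, in seconds."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return packet_bytes * 8 / rate_bps


def departure_times(n_packets, packet_bytes, rate_bps, start_s=0.0):
    """Scheduled departure time of each packet when paced at rate_bps."""
    gap = pacing_gap_s(packet_bytes, rate_bps)
    return [start_s + i * gap for i in range(n_packets)]


# Hypothetical example: 1500-byte packets paced at 10 Gbps.
gap = pacing_gap_s(1500, 10e9)
print(f"gap between packets: {gap * 1e6:.2f} microseconds")
print(departure_times(4, 1500, 10e9))
```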

29 Pacing
- Consider the following test [1]
  - 100 Gbps network, 92 msec RTT
  - Four concurrent flows
[Figure: throughput (Gbps) versus time (seconds); (a) Regular TCP; (b) Operator sets rates manually]
1. 40G-100G-data-transfer-nodes.pdf

30 ENABLING TCP PACING USING PROGRAMMABLE DATAPLANE SWITCHES
Elie Kfoury, Jorge Crichigno
College of Engineering and Computing
University of South Carolina
IEEE Telecommunications and Signal Processing Conference (TSP’19)
Budapest, Hungary
July 1, 2019

31 Overview P4 Switches
- Programming Protocol-Independent Packet Processors (P4) is a programming language for switches
- SDN is used to program the control plane
- P4 switches permit operators to program the data plane
  - Add proprietary features: invent, develop custom protocols
[Figure labels: P4 code; Barefoot’s Tofino (2016)]
N. McKeown, “Software Defined Networking: How it has transformed networking and what happens next,” Future Forum Summit, Beijing, Nov. 2018. Available online at http://yuba.stanford.edu/~nickm/talks.html.

33 Pacing using Programmable Switches
- What if a sender’s rate is adjusted based on feedback provided by a P4 switch?
- Feedback includes number of large flows and more
[Figure: sample topology. DTN1, DTN2, DTN3 in the SDMZ; border router (P4); 100 Gbps; Internet2; remote servers 1-3]

34 Pacing using Programmable Switches
[Figure: message flow among hosts 1-4 (DTNs 1-4) across P4 switches and a bottleneck link. Labelled steps: Initiate Large Flow; Update State; Broadcast State; Update Pacing Rate]

35 Pacing using Programmable Switches
- Switches store network’s state (number of large flows)
- To initiate a large flow, a DTN inserts a custom header during the TCP 3-way handshake, using the IP options field
- Switches parse custom header, update number of large flows
- Number of large flows is returned in the SYN-ACK message, and sent to all DTNs. DTNs update their pacing rate
[Figure: sample topology. DTN1, DTN2, DTN3 in the SDMZ; border router (P4); 100 Gbps; Internet2; remote servers 1-3. Custom protocol built using IP options field]
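The control loop described on this slide can be sketched as a toy model: the switch keeps a counter of large flows, and each DTN derives its pacing rate from that counter. The class and function names are hypothetical. The even split of the bottleneck and the decrement when a flow ends are assumptions; the slides only say that DTNs update their pacing rate.

```python
class LargeFlowSwitch:
    """Toy stand-in for the P4 switch state: a count of active large flows."""

    def __init__(self):
        self.large_flows = 0

    def on_large_flow_start(self):
        # Called when the custom header is parsed during the handshake.
        self.large_flows += 1
        return self.large_flows  # value reported back to the DTNs

    def on_large_flow_end(self):
        # Assumption: the counter is decremented when a large flow finishes.
        self.large_flows = max(0, self.large_flows - 1)
        return self.large_flows


def fair_pacing_rate_bps(bottleneck_bps, large_flows):
    """Assumed policy: split the bottleneck evenly among large flows."""
    if large_flows < 1:
        return bottleneck_bps
    return bottleneck_bps / large_flows


switch = LargeFlowSwitch()
for dtn in ("DTN1", "DTN2", "DTN3"):
    n = switch.on_large_flow_start()
    rate = fair_pacing_rate_bps(100e9, n)
    print(f"{dtn} starts a large flow: {n} active, pace at {rate / 1e9:.1f} Gbps")
```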

36 Emulation Results
- The custom protocol was implemented in Mininet
- The P4 switch is the BMv2 from P4.org
- Four hosts (DTNs) generating flows; 100 Mbps, 20 ms RTT
- Hosts adjusted their pacing rate using two pacing disciplines:
  - Fair Queue (FQ)
  - Hierarchical Token Bucket (HTB)
[Figure: topology. Hosts 1-4, P4 switches, bottleneck]
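On a Linux host, both disciplines are normally configured with `tc`. The helpers below only build the command lines and do not run them. The interface name `h1-eth0` and the 25 Mbit rate are hypothetical, and the slides do not give the settings actually used in the emulation.

```python
def fq_pacing_cmd(dev, rate_mbit):
    """tc command that caps per-flow pacing with the fq qdisc (maxrate)."""
    return ["tc", "qdisc", "add", "dev", dev, "root",
            "fq", "maxrate", f"{rate_mbit}mbit"]


def htb_pacing_cmds(dev, rate_mbit):
    """tc commands for a single-class HTB shaper at the given rate."""
    return [
        ["tc", "qdisc", "add", "dev", dev, "root",
         "handle", "1:", "htb", "default", "1"],
        ["tc", "class", "add", "dev", dev, "parent", "1:",
         "classid", "1:1", "htb", "rate", f"{rate_mbit}mbit"],
    ]


# Hypothetical host interface; four flows sharing 100 Mbps get 25 Mbit each.
print(" ".join(fq_pacing_cmd("h1-eth0", 25)))
for cmd in htb_pacing_cmds("h1-eth0", 25):
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings lets a caller pass them to `subprocess.run` directly, without shell quoting.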

37 Emulation Results
[Figures: throughput; coefficient of variation and Jain’s fairness; each shown over Periods 1-4]
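The two metrics named on this slide have standard definitions. Jain's fairness index is (Σx)² / (n · Σx²), and the coefficient of variation is the standard deviation divided by the mean. A short sketch follows; the throughput samples are made up, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def jain_fairness(rates):
    """Jain's index: 1.0 when all rates are equal, 1/n when one flow takes all."""
    if not rates or all(r == 0 for r in rates):
        raise ValueError("need at least one non-zero rate")
    total = sum(rates)
    return total * total / (len(rates) * sum(r * r for r in rates))


def coefficient_of_variation(rates):
    """Population standard deviation divided by the mean."""
    return pstdev(rates) / mean(rates)


# Made-up per-flow throughputs in Mbps for four flows.
for label, rates in (("even", [25, 25, 25, 25]), ("uneven", [55, 20, 15, 10])):
    print(label, round(jain_fairness(rates), 3),
          round(coefficient_of_variation(rates), 3))
```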

38 Work in progress
- Implement proposed protocol using a real P4 switched network
- Support for more complex topologies
- Extend the sharing bandwidth scheme for scenarios where an uneven allocation is desirable (priorities)
- Use proposed protocol in the production Science DMZ at USC

40 A FLOW-BASED ENTROPY CHARACTERIZATION OF A NATED NETWORK AND ITS APPLICATION ON INTRUSION DETECTION
Jorge Crichigno
College of Engineering and Computing
University of South Carolina
IEEE International Conference on Communications (ICC’19)
Shanghai, China
May 22, 2019

41 Motivation
- Offline scalable security appliances are required in high-speed networks such as Science DMZs
- There are two approaches to characterize traffic:
  - Flow-based: information collected from header fields
  - Payload-based: information collected from payload (deep inspection)
- The amount of processing of payload-based approaches may become excessive at very high rates [1, 2]
1. R. Hofstede, P. Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, A. Pras, “Flow monitoring explained: from packet capture to data analysis with NetFlow and IPFIX,” IEEE Communications Surveys and Tutorials, vol. 16, no. 4, 2014.
2. A. Gonzalez, J. Leigh, S. Peisert, B. Tierney, A. Lee, J. Schopf, “Monitoring big data transfers over international research network connections,” in Proceedings of the IEEE International Congress on Big Data, Jun. 2017.

42 Motivation
- Most networks use Network Address Translation (NAT)
- Although NAT has been used since the early 2000s, traffic behind NAT has not been characterized
- One approach for flow characterization is to measure the randomness or uncertainty of elements of a flow
  - E.g., entropy of IP addresses, ports, and combinations
- Goal: characterize normal traffic behavior (entropy) by using flow information

44 Methodology
- A flow is uniquely identified by the external IP, campus IP, external port, campus port, protocol
- Measure flow-element entropies
[Figure: inbound flows from the Internet to the campus network: ssh.usf.edu (22), gmail.com (80), msnbc.com (80), cnn.com (80), abc.com (80), google.com (80)]
External port: low uncertainty; most external ports expected to be 80 (http)

47 Methodology
- Entropy provides a measure of randomness or uncertainty
- For a variable X, the entropy of X is
  $$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$$
- For the previous port example, let X be the variable indicating the external port
  - Normalized entropy 0: no uncertainty (e.g., all external ports are 80)
  - Normalized entropy 1: random, high uncertainty
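As a worked instance of the port example, take the six inbound flows shown on slide 44: five with external port 80 and one with port 22. The calculation uses the Shannon entropy, H(X) = -Σ p(x_i) log2 p(x_i). It normalizes by log2 of the number of distinct values observed, which is an assumption; the slides state only that entropies are normalized.

```latex
\begin{align*}
p(80) &= \tfrac{5}{6}, \qquad p(22) = \tfrac{1}{6} \\
H(X) &= -\tfrac{5}{6}\log_2\tfrac{5}{6} - \tfrac{1}{6}\log_2\tfrac{1}{6}
      \approx 0.219 + 0.431 = 0.650 \text{ bits} \\
\frac{H(X)}{\log_2 2} &\approx 0.65
\end{align*}
```

The result lies between the two extremes: the external port is mostly, but not entirely, predictable.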

48 Methodology
- Campus network with 15 buildings
- Inbound traffic is used as a reference (external IP address is in the Internet, campus IP address is on campus)
- The collector organizes flow data in five-minute time slots
- Traffic data observed during a week is representative of the campus traffic
[Figure labels: NAT / Border; Internet; Netflow traffic; Campus traffic]

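49 Methodology
- The entropy of a random variable X is:
  $$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$$
  where $x_1, x_2, \ldots, x_N$ is the range of values for X, and $p(x_i)$ is the probability that X takes the value $x_i$
- For each external (campus) IP address (port) $x_i$, the probability $p(x_i)$ is calculated as
  $$p(x_i) = \frac{\text{number of flows with } x_i \text{ as external (campus) IP address (port)}}{\text{total number of flows}}$$
- Entropies are normalized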

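50 Methodology
- This paper also considers the entropy of the 3-tuple {external IP, campus IP, campus port}
- For a given 3-tuple $x_i$, the corresponding probability is calculated as
  $$p(x_i) = \frac{\text{number of flows with 3-tuple } x_i}{\text{total number of flows}}$$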
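The per-feature and 3-tuple entropies can be computed from the flow records of one time slot. This is a sketch under stated assumptions, not the paper's implementation. Probabilities are relative flow counts, normalization divides by log2 of the number of distinct values in the slot, and the flow records are invented using documentation address ranges.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy of the observed values, scaled to [0, 1]."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no flows in this time slot")
    if len(counts) == 1:
        return 0.0  # a single value carries no uncertainty
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # assumed normalization


# Invented flows for one five-minute slot:
# (external IP, campus IP, external port, campus port, protocol)
flows = [
    ("198.51.100.7", "192.0.2.10", 80, 50112, "tcp"),
    ("198.51.100.7", "192.0.2.10", 80, 50113, "tcp"),
    ("203.0.113.5", "192.0.2.10", 80, 50114, "tcp"),
    ("203.0.113.9", "192.0.2.11", 22, 50115, "tcp"),
]

features = {
    "external IP": [f[0] for f in flows],
    "campus IP": [f[1] for f in flows],
    "external port": [f[2] for f in flows],
    "campus port": [f[3] for f in flows],
    "3-tuple": [(f[0], f[1], f[3]) for f in flows],
}
for name, values in features.items():
    print(f"{name}: {normalized_entropy(values):.3f}")
```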

51 Results
[Figures: External IP entropy and External Port entropy, Sat. 3/25/17 to Fri. 3/31/17]
External IP
- In general, high entropy, ‘many’ external IP addresses
- External IPs dispersed in the Internet
- Abnormal low entropy points
- Entropy near zero (no uncertainty of the external IP address), or ‘very low’ level (few external IP addresses dominate the distribution)
External port
- Higher entropy during the night, weekends
- Low entropy during the day, noon
- Large volume of http flows when students are on campus (less uncertainty/entropy on external port)
- Abnormal high entropy points
- Entropy widely varies over ‘hours’ but not over very short time periods

52 Results
[Figures: Campus IP entropy and Campus Port entropy, Sat. 3/25/17 to Fri. 3/31/17]
Campus IP
- In general, low entropy, ‘few’ IP addresses on campus
- Higher entropy on weekends and at night
- Lower entropy when students are on campus
- A handful of public IP addresses used for regular Internet connectivity (NAT operation)
- Entropy varies over ‘hours’ but not over very short time periods
Campus port
- Lower entropy at night
- High entropy (close to uniform distribution) at noon
- Dynamic ports used by browsers when students connect to the Internet
- Abnormal low entropy points
- Entropy widely varies over ‘hours’ but not over very short time periods

53 Results: Abnormal behavior
- Anomalies are detected by a single feature or by correlating multiple features
- E.g., event I: low campus port’s entropy, high external port’s entropy, low external IP’s entropy
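The multi-feature rule for event I can be written as a simple predicate over the normalized entropies of a time slot. The thresholds (0.2 and 0.8), the dictionary keys, and the sample slots are hypothetical. The slides do not say how "low" and "high" were decided.

```python
def matches_event_i(entropy, low=0.2, high=0.8):
    """True when a slot shows the event I pattern described on the slide.

    entropy: dict of normalized entropies for one time slot.
    low/high: assumed cut-offs for "low" and "high" entropy.
    """
    return (
        entropy["campus_port"] <= low
        and entropy["external_port"] >= high
        and entropy["external_ip"] <= low
    )


# Invented slots: the first looks like daytime traffic, the second like event I.
slots = [
    {"campus_port": 0.85, "external_port": 0.30, "external_ip": 0.90},
    {"campus_port": 0.05, "external_port": 0.95, "external_ip": 0.10},
]
flagged = [i for i, slot in enumerate(slots) if matches_event_i(slot)]
print("slots flagged as event I:", flagged)
```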

54 Results
[Figure: distributed brute force attack]

55 Results
[Figure: correlation of entropy time-series]

57 ROUTERS’ BUFFER SIZE

58 Bufferbloat
- Routers and switches must have enough memory allocated to hold packets momentarily (buffering)
- Rule of thumb: buffer size = RTT · bottleneck bandwidth [1, 2]
[Figure: bottleneck bandwidth link; delay components: processing, queueing (waiting for transmission), propagation]
1. C. Villamizar, C. Song, “High performance TCP in ANSNET,” ACM Computer Communications Review, vol. 24, no. 5, pp. 45-60, Oct. 1994.
2. R. Bush, D. Meyer, “Some internet architectural guidelines and philosophy,” Internet Request for Comments, RFC Editor, RFC 3439, Dec. 2003. [Online]. Available: https://www.ietf.org/rfc/rfc3439.txt.

61 Bufferbloat
- Bufferbloat is a condition that occurs when the router buffers too much data, leading to excessive delays
[Figure: bottleneck bandwidth link with processing, queueing (waiting for transmission), and propagation delays; sending rate over time with additive increase, multiplicative decrease, and packet loss]
[Figure: RTT and throughput versus inflight data [1]. Regions: app. limited, bandwidth limited, buffer limited. Marks: RTprop; btlbw; BDP = RTprop · btlbw; BDP + buffer size. Annotations: optimal operating point; bufferbloat starts: queueing delay increases at router’s queue; operating point of traditional algorithms; packet loss]
1. N. Cardwell, Y. Cheng, C. Gunn, S. Yeganeh, V. Jacobson, “BBR: congestion-based congestion control,” Communications of the ACM, vol. 60, no. 2, pp. 58-66, Feb. 2017.

62 Bufferbloat
- Topology Lab 14
- 1 Gbps, 20 ms link s1-h3
- Measure RTT and throughput between h1 and h3
- Modify buffer size at s1 (interface s1-eth2)
  - Case 1: buffer size = $(1 \cdot 10^{9}) \cdot (20 \cdot 10^{-3})$ [bits] = 2,500,000 [bytes]
  - Case 2: buffer size = 25,000,000 [bytes]
[Figure: topology with hosts h1, h2, h3 and switch s1; interfaces h1-eth0, h2-eth0, h3-eth0, s1-eth1, s1-eth2, s1-eth3; link labels 40 Gbps, 40 Gbps, and 1 Gbps, 20ms]
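The two buffer sizes follow from the bandwidth-delay product (BDP) of the 1 Gbps, 20 ms link. The sketch below reproduces the arithmetic and adds the worst-case queueing delay of a full buffer, which is what bufferbloat adds to the RTT. The function names are illustrative.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * rtt_s / 8


def max_queueing_delay_s(buffer_bytes, bandwidth_bps):
    """Time needed to drain a full buffer at the bottleneck rate."""
    return buffer_bytes * 8 / bandwidth_bps


link_bps = 1_000_000_000  # 1 Gbps bottleneck from the slide
rtt_s = 0.020             # 20 ms

one_bdp = bdp_bytes(link_bps, rtt_s)
for label, size in (("case 1 (1 BDP)", one_bdp), ("case 2 (10 BDP)", 10 * one_bdp)):
    delay_ms = max_queueing_delay_s(size, link_bps) * 1000
    print(f"{label}: {size:,.0f} bytes, up to {delay_ms:.0f} ms of added queueing delay")
```

With ten times the BDP, a full queue can add delay ten times larger than the base RTT. That is the symptom the buffer-size comparison is designed to expose.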

63 Bufferbloat
[Figure panels: buffer size 1 BDP; buffer size 10 BDP]

65 Resources Online
- CI Lab website: http://ce.sc.edu/cyberinfra/
- A tutorial on Tools and Protocols for High-Speed Networks: http://ce.sc.edu/cyberinfra/workshop.html
- University of South Carolina: https://sc.edu/

66 Additional Slide
- Protocol Independent Switch Architecture
