Large Synoptic Survey Telescope
Data Management Database Design
Jacek Becla, Daniel Wang, Serge Monkewitz,
Andy Salnikov, Andrew Hanushevsky, Douglas
Michael Kelsey, and Fritz Mueller
LDM-135
Latest Revision: 2017-07-07
This LSST document has been approved as a Content-Controlled Document
nical Control Team. If this document is changed or superseded,
the Handle designation shown above. The control is on the most
this Handle in the LSST digital archive and not printed versions.
found in the corresponding DM RFC.
Abstract
This document discusses the LSST database system architecture.
LARGE SYNOPTIC SURVEY TELESCOPE
Data Management Database Design LDM-135 Latest Revision 2017-07-07
Change Record
Version | Date | Description | Owner name
1.0 | 2009-06-15 | Initial version. | Jacek Becla
2.0 | 2011-07-12 | Most sections rewritten, added scalability test section | Jacek Becla
2.1 | 2011-08-12 | Refreshed future-plans and schedule of testing sections, added section about fault tolerance. | Jacek Becla, Daniel Wang
3.0 | 2013-08-02 | Synchronized with latest changes to the requirements [LSE-163]. Rewrote most of the “Implementation” chapter. Documented new tests, refreshed all other chapters. | Jacek Becla, Daniel Wang, Serge Monkewitz, Kian-Tat Lim, Douglas Smith, Bill Chickering
3.1 | 2013-10-10 | Refreshed numbers based on latest LDM-141. Updated shared scans (implementation) and 300-node test sections, added section about shared scans demonstration | Jacek Becla, Daniel Wang
3.2 | 2013-10-10 | TCT approved | R Allsman
 | 2016-07-18 | Update with async query, shared scan, secondary index, XRootD, metadata service information. | John Gates, Andy Salnikov, Andrew Hanushevsky, Michael Kelsey, Fritz Mueller
 | 2017-07-05 | Move historical investigations to separate documents: DMTN-046, DMTN-047, DMTN-048, DMTR-21, DMTR-12 | T. Jenness
4.0 | 2017-07-07 | Bring up to date with current status, condense requirements section, re-order sections for improved readability. | Fritz Mueller
Document curator: Fritz Mueller
https://github.com/lsst/LDM-135
The contents of this document are subject to configuration control by the
Team.
Contents
1 Executive Summary
2 Introduction
3 Requirements
  3.1 General Requirements
  3.2 Data Production Related Requirements
  3.3 Query Access Related Requirements
  3.4 Discussion
    3.4.1 Design Considerations
    3.4.2 Query complexity and access patterns
4 Baseline Architecture
  4.1 Alert Production and Up-to-date Catalog
  4.2 Data Release Production
  4.3 User Query Access
    4.3.1 Distributed and parallel
    4.3.2 Shared-nothing
    4.3.3 Indexing
    4.3.4 Shared scanning
    4.3.5 Clustering
    4.3.6 Partitioning
    4.3.7 Long-running queries
    4.3.8 Technology choice
5 Implementation (Qserv)
  5.1 Components
    5.1.1 MySQL
    5.1.2 XRootD
  5.2 Partitioning
  5.3 Query Generation
    5.3.1 Processing modules
    5.3.2 Processing module overview
  5.4 Dispatch
    5.4.1 Wire protocol
    5.4.2 Frontend
    5.4.3 Worker
  5.5 Threading Model
  5.6 Aggregation
  5.7 Indexing
    5.7.1 Secondary Index Structure
    5.7.2 Secondary Index Loading
  5.8 Data Distribution
    5.8.1 Database data distribution
    5.8.2 Failure and integrity maintenance
  5.9 Metadata
    5.9.1 Static metadata
    5.9.2 Dynamic metadata
    5.9.3 Architecture
    5.9.4 Typical Data Flow
  5.10 Shared Scans
    5.10.1 Background
    5.10.2 Implementation
    5.10.3 Memory management
    5.10.4 XRootD scheduling support
    5.10.5 Multiple tables support
  5.11 Level 3: User Tables, External Data
  5.12 Cluster and Task Management
  5.13 Fault Tolerance
  5.14 Next-to-database Processing
  5.15 Administration
    5.15.1 Installation
    5.15.2 Data loading
    5.15.3 Administrative scripts
  5.16 Current Status and Future Plans
  5.17 Open Issues
6 Risk Analysis
  6.1 Potential Key Risks
  6.2 Risk Mitigations
7 References
Data Management Database Design
1 Executive Summary
Two facets of LSST database architecture and their motivating
database architecture in support of real time Alert Production,
support of user query access to catalog data. Following this,
implementation of the query access architecture.
The LSST baseline database architecture for real time Alert
time-based partitioning. To guarantee reproducibility,
bined with maintaining validity time for appropriate rows
cas are maintained to isolate live production catalogs from
chronized in real time using native database replication.
The LSST baseline database architecture for user query access
sively parallel processing) relational database composed
a distributed communications layer, and a master controller,
cluster of commodity servers with locally attached spinning
tal scaling and recovering from hardware failures without
catalogs are spatially partitioned horizontally into materialized chunks, and the remaining
catalogs are replicated on each server; the chunks are distributed
ject catalog is further partitioned into sub-chunks with overlaps, materialized on-the-fly
needed. Chunking is handled automatically without exposure
also partitioned vertically to maximize performance of
uses a few critical indexes to speed up spatial searches,
interactive queries. Shared scans are used to answer all
architecture is primarily driven by the variety and complexity
from single object lookups to complex $O(n^2)$ full-sky correlations over billions
A prototype implementation of the baseline architecture
above, Qserv, was developed during the R&D phase of LSST, and its
strated in early testing. Productization was subsequently
struction phase of LSST and is presently underway.
Qserv leverages two mature, open-source technologies as
MySQL as a SQL execution engine (though any alternative
without undue effort if need be), and XRootD¹ [11] to provide a distributed-system
for fault-tolerant, elastic, content-addressed messaging.
We currently maintain three running instances of Qserv at
nodes each in continuous operation on dedicated hardware
The system has been demonstrated to correctly distribute
volume queries, including small-area cone and object searches,
full table scans including large-area near-neighbor searches.
UDFs in both filter and aggregation clauses have also been
been successfully conducted on the above-mentioned clusters
imately 70 TB, and we expect to cross the 100 TB mark as tests
system is on track through a series of graduated data-challenge
the stated performance requirements for the project.
If an equivalent open-source, community-supported, off-the-shelf
become available in time, it could present significant support
ready Qserv. The largest barrier preventing us from using
sufficient spherical geometry and spherical partitioning
To increase the chances such a system will become reality
collaborate with the MonetDB open source columnar database
stration of Qserv based on MonetDB instead of MySQL was done
current with the state-of-the-art in petascale data management
dialog with all relevant solution providers, both DBMS and
intensive users, both industrial and scientific, through
² conference series we lead,
and beyond.
2 Introduction
This document discusses the LSST database system architecture
tation of part of that architecture (Qserv) in particular.
² https://xldb.org
Section 3 summarizes LSST database-related requirements that
Section 4 discusses the baseline architecture itself. Section 5 discusses Qserv as an implementation
of the baseline architecture for user query access. Section 6 covers attendant risk
analysis. For some additional background, DMTN-046 covers in-depth analysis of
potential solutions (Map/Reduce and RDBMS) as of 2013, and DMTR-21 and DMTR-12 describe
large-scale Qserv tests from 2013. The full Qserv test specification is LDM-552.
DMTN-048 discusses the original design trade-offs and decision
tests that were run and some Qserv demonstrations.
3 Requirements
Formal DM database requirements are called out in LDM-555. For purposes of exposition,
this section summarizes some of the key requirements which
tecture.
3.1 General Requirements
Incremental scaling. The system must scale to tens of petabytes and
must grow as the data grows and as the access requirements
become available during the life of the system must be able
quantitative storage, disk and network bandwidth and I/O LDM-141.
Reliability. The system must not lose data, and it must provide
face of hardware failures, software failures, system maintenance,
Low cost. It is essential to not overrun the allocated budget,
open-source solution is strongly preferred.
3.2 Data Production Related Requirements
The LSST database catalogs will be generated by a small set
• Data Release Production – it produces all key catalogs.
DRP takes several months to complete and is dominated
jobs. Ingest can be done separately from pipeline processing,
• Nightly Alert Production – it produces difference image
ject, SSObject, DiaSource, DiaForcedSource catalogs.
in under a minute after data has been taken, data has to
real time. The number of row updates/ingested is modest:
occur every ~39 sec [7].
• Calibration Pipeline – it produces calibration information.
no stringent timing requirements, ingest bandwidth needs
In addition, the camera and telescope configuration is captured
Database. Data volumes are very modest.
Further, the Level 1 live catalog will need to be updated
should not be taken off-line for extended periods of time.
The database system must allow for occasional schema changes
occasional changes that do not alter query results³ for the Level 2 data after the
been released. Schemas for different data releases are allowed
3.3 Query Access Related Requirements
The Science Data Archive Data Release query load is defined
large catalogs in the archive: Object, Source, and ForcedSource.
for example, though numerous, are expected to be fast. In
Reproducibility. Queries executed on any Level 1 and Level 2 data
ducible.
Real time. A large fraction of ad-hoc user access will involve
– queries that touch small area of sky, or request small number
³ Example of non-altering changes including adding/removing/resorting
derived information, changing type of a column without losing information, FLOAT to DOUBLE would be always
allowed, DOUBLE to FLOAT would only be allowed if all values can be expressed using FLOAT without losing any
information)
required to be answered in a few seconds. On average, we expect
running at any given time.
Fast turnaround. High-volume queries – queries that involve full-sky
to be answered in approximately 1 hour, while more complex
correlations are expected to be answered in ~8-12 hours.
queries are expected to be running at any given time.
Cross-matching with external/user data. Occasionally, LSST database catalog
be cross-matched with external catalogs: both large, such
such as small amateur data sets. Users should be able to
access them during subsequent queries.
Query complexity. The system needs to handle complex queries, including
tions, time series comparisons. Spatial correlations are
– this is an important observation, as this class of queries
partitioning with overlaps.
Flexibility. Sophisticated end users need to be able to access
with as few constraints as possible. Many end users will
SQL, so most of basic SQL92 will be required.
3.4 Discussion
3.4.1 Design Considerations
The above requirements have important implications on the
• The system must allow rapid selection of small number
tables. To achieve this, efficient data indexing in both
is essential.
• The system must efficiently join multi-trillion with multi-billion
izing these tables to avoid common joins, such as Object
ForcedSource, would be prohibitively expensive.
• The system must provide high data bandwidth. In order
minutes, data bandwidths on the order of tens to hundreds
required.
• To achieve high bandwidths, to enable expandability,
system will need to run on a distributed cluster composed
• The most effective way to provide high-bandwidth access
partition the data, allowing multiple machines to work
partitioning is also important to speed up some operations
building.
• Multiple machines and partitioned data in turn imply
will be executed in parallel, requiring the management
tasks.
• Limited budget implies the system needs to get most out
incrementally as needed. The system will be disk I/O limited,
attaching multiple queries to a single table scan (shared
3.4.2 Query complexity and access patterns
A compilation of representative queries provided by the
ence Council, and other surveys have been captured [5]. These queries can be divided
several distinct groups: analysis of a single object, analysis
in a region or across entire sky, analysis of objects close
special grouping, time series analysis and cross match with
as to the complexity required: these queries include distance
self-joins, and time series analysis.
Small queries are expected to exhibit substantial spatial
similar spatial coordinates: right ascension and declination).
expected to exhibit a slightly different form of spatial
have nearby spatial coordinates. Spatial correlations
spatial correlations will not be needed on Source or ForcedSource tables.
Queries related to time series analysis are expected to
vations for a given Object, so the appropriate Source or
joined and aggregate functions operating over the list of
External data sets and user data, including results from
tributed alongside distributed production table to provide
The query complexity has important implications on the overall
tem.
4 Baseline Architecture
This section describes the most important aspects of the
The choice of the architecture is driven by the project requirements (see LDM-555) as well as
cost, availability, and maturity of the off-the-shelf solutions
ket (see DMTN-046), and design trade-offs (see DMTN-048). The architecture is periodically
revisited: we continuously monitor all relevant technologies,
baseline architecture.
In summary:
• The LSST baseline architecture for Alert Production is
RDBMS system which uses replication for fault tolerance
horizontal (time-based) partitioning;
• The baseline architecture for user access to Data Releases
lel processing) relational database running on a shared-nothing
servers with locally attached spinning disk drives; capable
(b) recovering from hardware failures without disrupting
alogs are spatially partitioned into materialized chunks, and the remaining catalogs
replicated on each server; the chunks are distributed
alog is further partitioned into sub-chunks with overlaps,⁴ materialized on-the-fly
needed. Shared scans are used to answer all but low-volume
⁴ A chunk’s overlap is implicitly contained within the overlaps of its
4.1 Alert Production and Up-to-date Catalog
Alert Production involves detection and measurement of
(DiaSources). New DiaSources are spatially matched against
isting DiaObjects, which contain summary properties for
false positives). Unmatched DiaSources are used to create
has an associated DiaSource that is no more than a month old,
(DiaForcedSource) is taken at the position of that object,
was detected in the exposure or not.
The output of Alert Production consists mainly of three large
and DiaForcedSource - as well as several smaller tables
exposures, visits and provenance. These catalogs will be
Note that existing DiaObjects are never overwritten. Instead,
and DRP-produced DiaObjects are inserted, allowing users
erties of DiaObjects as known to the pipeline when alerts
historical queries, each DiaObject row is tagged with a
time of a new DiaObject version is set to the observation
Source that led to its creation, and the end time is set to
its validity end time is updated (in place) to equal the start
the most recent versions of DiaObjects can always be retrieved
    SELECT * FROM DiaObject WHERE validityEnd = infinity
Versions as of some time t are retrievable via:
    SELECT * FROM DiaObject WHERE validityStart <= t AND t < validityEnd
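The validity-interval scheme behind the two queries above can be tried out on a toy table. The following is a minimal sketch in Python with SQLite, not the production schema: the `flux` column, the helper names, and the use of a large sentinel (`FAR_FUTURE`) in place of "infinity" are assumptions made for illustration only.

```python
import sqlite3

# Assumption: a sentinel later than any real timestamp stands in for "infinity".
FAR_FUTURE = 1.0e12


def make_catalog():
    """Create an in-memory DiaObject table with validity columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DiaObject ("
        " diaObjectId INTEGER, validityStart REAL,"
        " validityEnd REAL, flux REAL)"
    )
    return conn


def insert_version(conn, dia_object_id, t, flux):
    """Close the current version (if any) at time t, then append a new one."""
    conn.execute(
        "UPDATE DiaObject SET validityEnd = ?"
        " WHERE diaObjectId = ? AND validityEnd = ?",
        (t, dia_object_id, FAR_FUTURE),
    )
    conn.execute(
        "INSERT INTO DiaObject VALUES (?, ?, ?, ?)",
        (dia_object_id, t, FAR_FUTURE, flux),
    )


def latest_versions(conn):
    """Most recent version of every object (validityEnd is still open)."""
    return conn.execute(
        "SELECT diaObjectId, flux FROM DiaObject"
        " WHERE validityEnd = ? ORDER BY diaObjectId",
        (FAR_FUTURE,),
    ).fetchall()


def versions_as_of(conn, t):
    """Versions whose validity interval [start, end) contains time t."""
    return conn.execute(
        "SELECT diaObjectId, flux FROM DiaObject"
        " WHERE validityStart <= ? AND ? < validityEnd"
        " ORDER BY diaObjectId",
        (t, t),
    ).fetchall()
```

Because old rows are only closed and never overwritten, a query for any past time t sees exactly the versions that were current at t.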
Note that a DiaSource can also be re-associated to a solar-system
processing. This will result in a new DiaObject version unless
associated DiaSources. In that case, the validity end time
time at which the re-association occurred.
Once a DiaSource is associated with a solar system object,
DiaObject. Therefore, rather than also versioning DiaSources,
associated DiaObject and solar system object, as well as
Re-association will set the solar system object ID and re-association
for DiaObject 123 at time t can be obtained using:
    SELECT *
    FROM DiaSource
    WHERE diaObjectId = 123
    AND midPointTai <= t
    AND (ssObjectId IS NULL OR ssObjectReassocTime > t)
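To make the re-association predicate concrete, here is a small sketch that runs the same WHERE clause against an in-memory SQLite table. The column names follow the query above; the table layout, the `diaSourceId` column, and the helper names are assumptions for illustration.

```python
import sqlite3


def make_sources(rows):
    """Build a toy DiaSource table from (id, objId, midPointTai, ssObjId, reassocTime)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DiaSource (diaSourceId INTEGER, diaObjectId INTEGER,"
        " midPointTai REAL, ssObjectId INTEGER, ssObjectReassocTime REAL)"
    )
    conn.executemany("INSERT INTO DiaSource VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def sources_for_object(conn, dia_object_id, t):
    """IDs of the sources that belonged to the object as of time t."""
    rows = conn.execute(
        "SELECT diaSourceId FROM DiaSource"
        " WHERE diaObjectId = ? AND midPointTai <= ?"
        " AND (ssObjectId IS NULL OR ssObjectReassocTime > ?)"
        " ORDER BY diaSourceId",
        (dia_object_id, t, t),
    ).fetchall()
    return [r[0] for r in rows]
```

A source that was later re-associated to a solar system object still counts toward the DiaObject for any time t before the re-association, and drops out afterwards.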
DiaForcedSources are never re-associated or updated in
From the database point of view then, the alert production
database operations 189 times (once per LSST CCD) per visit
1. Issue a point-in-region query against the DiaObject
versions of the objects falling inside the CCD.
2. Use the IDs of these diaObjects to retrieve all associated
Sources.
3. Insert new diaSources and diaForcedSources.
4. Update validity end times of diaObjects that will be superseded.
5. Insert new versions of diaObjects.
All spatial joins will be performed on in-memory data by
database. While Alert Production does also involve a spatial
produced) Object catalog, this does not require any database
are never modified, so the Object columns required for spatial
compact binary files once per Data Release. These files will
very fast region queries, allowing the database to be bypassed
The DiaSource and DiaForcedSource tables will be split into
and one containing records inserted during the current
be small and rely on a transactional engine like InnoDB,
failures. The historical-data tables will use the faster
age engine, and will also take advantage of partitioning.
seed the live catalogs will be stored in a single initial
erarchical Triangular Mesh trixel IDs for their positions).
diaForcedSources for the diaObjects in a CCD will be located
ing seeks. Every month of new data will be stored in a fresh
Such partitions will grow to contain just a few billion rows
for the largest catalog. At the end of each night, the contents
sorted and appended to the partition for the current-month,
entire current-month partition is sorted spatially (during
month is created.
For DiaObject, the same approach is used. However, DiaObject
occur in any partition, and are not confined to the current-night
to use a transactional storage engine like InnoDB for all
tables using the primary key, we will likely declare it to
followed by disambiguating columns (diaObjectId, validityStart).
will not be part of any index.
No user queries will be allowed on the live production catalogs.
separate replica just for user queries, synchronized in
native database replication. The catalogs for user queries
live catalogs, and views will be used to hide the splits (using “UNION ALL”).
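Hiding a table split behind a view can be illustrated with a few lines of SQL. The sketch below uses SQLite from Python; the physical table names `DiaSource_History` and `DiaSource_Night` are hypothetical, chosen only to mirror the historical/current-night split described above.

```python
import sqlite3


def make_split_catalog():
    """Two physical tables presented to users as one logical DiaSource view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical names for the historical and current-night tables.
    for name in ("DiaSource_History", "DiaSource_Night"):
        conn.execute(
            f"CREATE TABLE {name} ("
            " diaSourceId INTEGER, diaObjectId INTEGER, midPointTai REAL)"
        )
    conn.execute(
        "CREATE VIEW DiaSource AS"
        " SELECT * FROM DiaSource_History"
        " UNION ALL"
        " SELECT * FROM DiaSource_Night"
    )
    return conn


def light_curve(conn, dia_object_id):
    """Query the view exactly as if it were a single table."""
    return conn.execute(
        "SELECT diaSourceId, midPointTai FROM DiaSource"
        " WHERE diaObjectId = ? ORDER BY midPointTai",
        (dia_object_id,),
    ).fetchall()
```

UNION ALL (rather than UNION) is the natural choice here because the two tables hold disjoint rows, so no duplicate elimination is needed.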
For additional safety, we might choose to replicate the small
partitions, and the remaining (small) changing tables to
of disastrous master failure that cannot be fixed rapidly,
be used as a temporary replacement, and user queries will
resolved.
Based on the science requirements, only short-running,
needed on the Level 1 catalogs. The most complex queries,
bor queries, will not be needed. Instead, user queries will
searches, light curve lookups, and historical versions
sorted spatially, we expect to be able to quickly answer
ID columns and the SciSQL UDFs, an approach that has worked
date. Furthermore, note that the positions of diaSources/diaForcedSources
the same diaObject will be very close together, so that sorting
also ends up placing sources belonging to the same light curve
the data organization used to provide fast pipeline query
user queries.
4.2 Data Release Production
Data Release Production will involve the generation of significantly
Production. However, these are produced over the course
not write directly to the database, and there are no pipeline
execution time requirements to be satisfied. While we do expect
table scans over the course of a Data Release Production,
queries involving such scans on a daily basis. User query
of our scalable database architecture, which is described
the data loading process, please see Section 5.15.2
4.3 User Query Access
The user query access is the primary driver of the scalable
tecture is described below.
4.3.1 Distributed and parallel
The database architecture for user query access relies on
among autonomous worker nodes. Autonomous workers have
other and can complete their assigned work without data or
This implies that data must be partitioned, and the system
single user query into sub-queries, and executing these
high-volume query without parallelizing it would take unacceptably
very fast CPU. The parallelism and data distribution should
system and hidden from users.
4.3.2 Shared-nothing
Such architecture provides good foundation for incremental
cause nodes have no direct knowledge of each other and can
without data or management from their peers, it is possible
from such system with no (or with minimal) disruption. However,
and provide recovery mechanisms, appropriate smarts have
agement software.
Figure 1: Shared-nothing database architecture. (Diagram: a Distributor/Combiner connected to four MySQL Nodes, each with its own Partitioned Data.)
4.3.3 Indexing
Disk I/O bandwidth is expected to be the greatest bottleneck.
through index, which typically translates to a random access,
sequential read (unless multiple competing scans are involved).
Indexes dramatically speed up locating individual rows,
They are essential to answer low volume queries quickly,
spatial indexes are essential. However, unlike in traditional,
tages of indexes become questionable when a larger number
a table. In case of LSST, selecting even a 0.01% of a table
rows. Since each fetch through an index might turn into
read sequentially from disk than to seek for particular
index itself is out-of-memory. For that reason the architecture
dexing, only a small number of carefully selected indexes
queries, enabling table joins, and speeding up spatial
analytical query system, it makes sense to make as few assumptions
will be important to our users, and to try and provide reasonable
a query load as possible, i.e. focus on scan throughput rather
ther benefit to this approach is that many different queries
I/O, boosting system throughput, whereas caching index
fewer opportunities for sharing as the query count scales
afford).
4.3.4 Shared scanning
Now with table-scanning being the norm rather than the exception
significant amount of time, multiple full-scan queries
each employed their own full-scanning read from disk. Shared convoy
scheduling) shares the I/O from each scan with multiple queries.
and all concerning queries operate on that piece while it
from many full-scan queries can be returned in little more
query. Shared scanning also lowers the cost of data compression
among the sharing queries, tilting the trade-off of increased
heavily in favor of compression.
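The I/O saving from shared scanning can be seen in a toy model. The sketch below is illustrative only and is not Qserv's scheduler: chunks are plain Python lists, each query is a row predicate, and a "read" is counted every time a chunk is fetched.

```python
def independent_scans(chunks, queries):
    """Each query performs its own full pass over every chunk."""
    reads = 0
    results = {}
    for name, predicate in queries.items():
        count = 0
        for chunk in chunks:
            reads += 1  # this query fetches the chunk for itself
            count += sum(1 for row in chunk if predicate(row))
        results[name] = count
    return results, reads


def shared_scan(chunks, queries):
    """Each chunk is fetched once and handed to every waiting query."""
    reads = 0
    results = {name: 0 for name in queries}
    for chunk in chunks:
        reads += 1  # a single fetch, shared by all queries
        for name, predicate in queries.items():
            results[name] += sum(1 for row in chunk if predicate(row))
    return results, reads
```

With Q queries over C chunks, the independent approach performs Q x C chunk reads while the shared scan performs C; only the CPU work (applying each predicate) grows with the number of queries.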
Shared scanning will be used for all high-volume and super-high
scanning is helpful for unpredictable, ad-hoc analysis,
increasing the disk I/O cost – only more CPU is needed. On
run the following scans:
• one full table scan of Object table for the latest data
• one synchronized full table scan of Object, Source and
hours for the latest data release only,
• one synchronized full table scan of Object and Object_Extra
and previous data releases.
Appropriate Level 3 user tables will be scanned as part
answer any in-flight user queries.
Shared scans will take advantage of table chunking explained
single node a scan will involve fetching sequentially a
on this chunk all queries in the queue. The level of parallelism
available cores.
Running multiple shared scans allows relatively fast response
and supporting complex, multi-table joins: synchronized
between different tables. For self-joins, a single shared
node must have sufficient memory to hold 2 chunks at any given
and next chunk). Refer to the sizing model [LDM-141] for further details on the cost
scans.
Low-volume queries will be executed ad-hoc, interleaved
number of spinning disks is much larger than the number of
any given time, this will have very limited impact on the
shown in LDM-141.
4.3.5 Clustering
The data in the Object Catalog will be physically clustered
objects collocated in space will be also collocated on disk.
ForcedSource, DiaSource, DiaForcedSource) will be clustered
objectId – this approach enforces spatial clustering and
same object, allowing sequential read for queries that involve
SSObject catalog will be unpartitioned, because there
could choose to use for partitioning. The associated diaSources
with diaSources associated with static diaSources) will
sition. For that reason the SSObject-to-DiaSource join
all chunks, unlike DiaObject-to-DiaSource queries. Since
should not be an issue.
4.3.6 Partitioning
Data must be partitioned among nodes in a shared-nothing sharding
approaches partition data based on a hash of the primary
LSST data since it eliminates optimizations based on celestial
4.3.6.1 Sharded data and sharded queries
All catalogs that require spatial
(Object, Source, ForcedSource, DiaSource, DiaForcedSource)
associated with them, such as Object_Extra, will be divided
the same area by partitioning them into declination zones, and chunking each zone RA
stripes. Further, to be able to perform table joins without
partitioning boundaries for each partitioned table must
tables corresponding to the same area of sky must be co-located
sure chunks are appropriately sized, the two largest catalogs,
expected to be partitioned into finer-grain chunks. Since
constant density throughout the celestial sphere, an equal-area
that is uniformly distributed over the sky.
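One way to picture a zone-based partitioning that keeps chunk areas roughly equal is sketched below. This is not the Qserv partitioner: the zone count, the rule for choosing the number of RA segments per zone, and the chunk numbering are all assumptions made for illustration.

```python
import math

NUM_ZONES = 18  # assumption: 10-degree declination zones


def chunks_in_zone(zone, num_zones=NUM_ZONES):
    """Number of RA segments in a zone, shrinking toward the poles."""
    height = 180.0 / num_zones
    dec_lo = -90.0 + zone * height
    dec_hi = dec_lo + height
    # Use the declination inside the zone that is closest to the equator,
    # where a circle of constant declination is longest.
    if dec_lo <= 0.0 <= dec_hi:
        min_abs_dec = 0.0
    else:
        min_abs_dec = min(abs(dec_lo), abs(dec_hi))
    width_factor = math.cos(math.radians(min_abs_dec))
    return max(1, int(360.0 * width_factor / height))


def chunk_id(ra, dec, num_zones=NUM_ZONES):
    """Map a position in degrees to a (hypothetical) integer chunk ID."""
    height = 180.0 / num_zones
    zone = min(int((dec + 90.0) / height), num_zones - 1)
    n = chunks_in_zone(zone, num_zones)
    segment = min(int((ra % 360.0) / 360.0 * n), n - 1)
    # No zone has more than 2 * num_zones segments, so IDs never collide.
    return zone * 2 * num_zones + segment
```

Scaling the number of RA segments by the cosine of the declination keeps chunks near the poles from becoming much smaller in area than chunks at the equator, which is the motivation for an equal-area scheme given above.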
Smaller catalogs that can be partitioned spatially, such
be partitioned spatially. All remaining catalogs, such
cated on each node. The size of these catalogs is expected
With data in separate physical partitions, user queries
rate physical queries to be executed on partitions. Each
bined into a single final result.
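Combining per-partition results into one final answer is straightforward for counts and sums, but needs care for averages. The following sketch (function names are illustrative, not Qserv APIs) has each partition return a partial (count, sum) pair and merges the pairs on the master.

```python
def run_on_chunk(rows):
    """Per-partition partial aggregate: (row count, sum of values)."""
    return len(rows), sum(rows)


def combine(partials):
    """Merge partial aggregates into a global (count, mean)."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(value_sum for _, value_sum in partials)
    mean = total_sum / total_count if total_count else None
    return total_count, mean
```

Note that the global mean is computed from the summed counts and sums; averaging the per-partition averages would give the wrong answer whenever partitions differ in size.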
4.3.6.2 Two-level partitions
Determining the size and number of data
be obvious. Queries are fragmented according to partitions
titions increases the number of physical queries to be dispatched,
Thus a greater number of partitions increases the potential
the overhead. For a data-intensive and bandwidth-limited
to the number of disk spindles should minimize seeks and
mance.
From a management perspective, more partitions facilitate
when nodes are added or removed. If the number of partitions
nodes, then the addition of a new node would require the data
other hand, if there were many more partitions than nodes,
assigned to the new node without re-computing partition
Smaller and more numerous partitions benefit spatial joins.
are interested in objects near other objects, and thus a full $O(n^2)$ join is not required – a localized
spatial join is more appropriate. With spatial data split
computing the join need not even consider (and reject) all
all the pairs within a region. Thus a task that is $O(n^2)$ naively becomes $O(kn)$ where $k$ is the
number of objects in a partition.
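The reduction in work can be checked with simple arithmetic. The sketch below counts candidate pairs only and ignores pairs that straddle partition boundaries; the function names are illustrative.

```python
def naive_pair_count(n):
    """Candidate pairs when every object is compared with every other."""
    return n * (n - 1) // 2


def partitioned_pair_count(partition_sizes):
    """Candidate pairs when objects are compared only within a partition."""
    return sum(k * (k - 1) // 2 for k in partition_sizes)
```

For n = 1000 objects split into 10 partitions of k = 100, the naive join considers 499,500 pairs while the partitioned join considers 49,500, close to the n·k/2 = 50,000 expected from the $O(kn)$ estimate.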
In consideration of these trade-offs, two-level partitioning
ple way to blend the advantages of both extremes. Queries
coarse partitions (“chunks”), and spatial near-neighbor
partitions (“sub-chunks”) within each partition. To avoid
non-join queries, the system can store chunks and generate
tial join queries. On-the-fly generation for joins is cost-effective
of pairs, which is true as long as there are many sub-chunks
4.3.6.3 Overlap
A strict partitioning eliminates nearby pairs
partitions are paired. To produce correct results under
to objects from outside partitions, which means that data
each partition can be stored with a precomputed amount of
ping data does not strictly belong to the partition but is
the partition’s borders. Using this data, spatial joins
preset distance without needing data from other partitions
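The effect of overlap on join correctness can be demonstrated in one dimension. This is a simplified sketch, not the spherical implementation: positions are distinct non-negative numbers on a line, partitions are fixed-width intervals, and the function names are made up for the example.

```python
def brute_force_pairs(xs, radius):
    """Reference answer: all pairs within `radius`, ignoring partitions."""
    return {(a, b) for a in xs for b in xs if a < b and b - a <= radius}


def partitioned_pairs(xs, radius, width, overlap):
    """Join each partition against itself plus its stored overlap region."""
    pairs = set()
    num_parts = int(max(xs) // width) + 1
    for p in range(num_parts):
        lo, hi = p * width, (p + 1) * width
        own = [x for x in xs if lo <= x < hi]
        # Overlap rows lie just outside the partition's borders.
        halo = [x for x in xs
                if (lo - overlap <= x < lo) or (hi <= x < hi + overlap)]
        for a in own:
            for b in own + halo:
                if a != b and abs(a - b) <= radius:
                    pairs.add((min(a, b), max(a, b)))
    return pairs
```

With an overlap at least as large as the search radius, every partition can answer its part of the join from local data alone; with no overlap, pairs that straddle a border are silently lost.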
Overlap is needed only for the Object Catalog, as all spatial
catalog only. Guided by the experience from other projects
preset the overlap to ~1 arcmin, which results in duplicating
Catalog.
4.3.6.4 Spherical geometry
Support for spherical geometry is not common
and spherical geometry-based partitioning was non-existent
cided to develop Qserv. Since spherical geometry is the norm
tial objects (right-ascension and declination), any spatial
objects must account for its complexities.
4.3.6.5 Data immutability
It is important to note that user query access
only data. Not having to deal with updates simplifies the
extra optimizations not possible otherwise. The Level 1
and will not require the scalable architecture – we plan to
of-the-box MySQL as described in Section 4.1
4.3.7 Long-running queries
Many of the typical user queries may need significant time
To avoid re-submission of those long-running queries in
or hardware issues) the system will support asynchronous
mode users will submit queries using special options or syntax
query and immediately return to user some identifier of the
user session. This query identifier will be used by user to
query result after query completes, or a partial query result
The system should be able to estimate the time which user
refuse to run long queries in a regular blocking mode.
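The submit-then-fetch pattern described above can be sketched with the Python standard library. This is only an illustration of the interaction model, not the Qserv interface: the class and method names are hypothetical, and a thread pool stands in for the distributed query engine.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncQueryService:
    """Toy asynchronous query front end (all names are hypothetical)."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, run_query):
        """Start the query in the background and return its ID at once."""
        query_id = uuid.uuid4().hex
        self._futures[query_id] = self._pool.submit(run_query)
        return query_id

    def is_done(self, query_id):
        """Poll for completion without blocking."""
        return self._futures[query_id].done()

    def result(self, query_id, timeout=None):
        """Fetch the result later, possibly from a different session."""
        return self._futures[query_id].result(timeout=timeout)

    def close(self):
        """Wait for outstanding queries and release worker threads."""
        self._pool.shutdown(wait=True)
```

The essential point is that the identifier, not the client connection, is the handle on the query, so a dropped connection does not force the user to resubmit.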
4.3.8 Technology choice
As explained in DMTN-046, no off-the-shelf solution meets the above requirements
an RDBMS seems a much better fit than a Map/Reduce-based system
such as indexes, schema, and speed. For that reason, our
custom software built on two production components: an open
non-parallel DBMS (MySQL) and XRootD. To ease potential
munication with the underlying DBMS relies on basic DBMS functionality only, and
vendor-specific features and additions.
Figure 2: Component connections in Qserv.
5 Implementation (Qserv)
A prototype implementation of the baseline architecture
above, Qserv, was developed during the R&D phase of LSST, and its
strated in early testing (DMTR-21, DMTR-12). Productization was subsequently
resourced for the construction phase of LSST and is presently
rently implemented is described here.
5.1 Components
5.1.1 MySQL
To control the scope of effort, Qserv uses an existing SQL
much query processing as possible. MySQL is a good choice
ment community, mature implementation, wide client software
lightweight execution, and low data overhead. MySQL’s
munity means that expertise is relatively common, which
development or long-term maintenance in the years ahead.
is also lightweight and well-understood, giving predictable
vanced storage layout that may demand more capacity, bandwidth,
constrained hardware budget.
It is worth noting, however, that Qserv’s design and implementation
of MySQL beyond glue code facilitating results transmission.
order to allow the system to leverage a more advanced or more
the future.
5.1.2 XRootD
The XRootD distributed file system is used to provide a distributed,
cated, fault-tolerant communication facility for Qserv.
have been non-trivial, so we wanted to leverage an existing
scalability, fault-tolerance, performance, and efficiency
physics community. Its relatively flexible API enabled its
general communication routing system. Since it was designed
were confident that it could mediate not only query dispatch
transfer of results.
An XRootD cluster is implemented as a set of data servers and
to a redirector, which acts as a caching namespace lookup
appropriate data servers. In Qserv, XRootD data servers
menting plug-ins within the XRootD framework which advertise
addressable resources within the XRootD cluster. The Qserv
and receives results as an XRootD client by dispatching messages
Figure 3: XRootD.
5.2 Partitioning
In Qserv, large spatial tables are fragmented into spatial
scheme. The partitioning space is a spherical space defined by two angles, right ascension (α)
and declination (δ). For example, the Object table is fragmented
pair specified in two columns: right-ascension and declination.
ments are represented as tables named Object_CC and Object_CC_SS where CC is the “chunk id”
(first-level fragment) and SS is the “sub-chunk id” (second-level fragment
ment. Sub-chunk tables are built on-the-fly to optimize performance
Large tables are partitioned on the same spatial boundaries
between them.
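The naming convention above can be illustrated with a minimal sketch. The grid used here (equal-height declination stripes, the same number of equal-width chunks in every stripe, and a square grid of sub-chunks per chunk) is a simplifying assumption for illustration only and is not Qserv's actual partitioning scheme; the function name and its parameters are hypothetical.

```python
def chunk_table_names(ra_deg, dec_deg, table="Object",
                      num_stripes=18, subs_per_side=4):
    """Map a sky position to (chunk table, sub-chunk table) names.

    Hypothetical simplified grid, not the real Qserv partitioner.
    """
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("coordinates out of range")

    # First level: declination stripes, then right-ascension chunks.
    stripe_height = 180.0 / num_stripes
    stripe = min(int((dec_deg + 90.0) / stripe_height), num_stripes - 1)
    chunks_per_stripe = 2 * num_stripes
    chunk_width = 360.0 / chunks_per_stripe
    chunk_in_stripe = int(ra_deg / chunk_width)
    chunk_id = stripe * chunks_per_stripe + chunk_in_stripe

    # Second level: a subs_per_side x subs_per_side grid inside the chunk.
    sub_height = stripe_height / subs_per_side
    sub_width = chunk_width / subs_per_side
    dec_offset = dec_deg + 90.0 - stripe * stripe_height
    ra_offset = ra_deg - chunk_in_stripe * chunk_width
    sub_row = min(int(dec_offset / sub_height), subs_per_side - 1)
    sub_col = min(int(ra_offset / sub_width), subs_per_side - 1)
    sub_chunk_id = sub_row * subs_per_side + sub_col

    return f"{table}_{chunk_id}", f"{table}_{chunk_id}_{sub_chunk_id}"
```

Because every large table would be passed through the same function, rows from different tables that share a sky position land in chunks with the same id, which is what makes node-local joins between them possible.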
5.3 Query Generation
Qserv is unusual (though not unique) in processing a user
ments that are subsequently distributed to and executed
software. This is done in the hopes of providing a distributed
avoiding a full re-implementation of common database features.
that it is necessary to implement a query processing framework
more standard database, with the exception that the resulting
ments as the intermediate language.
A significant amount of query analysis not unlike a database
order to generate a distributed execution plan that accurately
queries. Incoming user queries are first parsed into an intermediate
modified SQL92-compliant grammar (Lubos Vnuk’s antlr-based
representation is equivalent to the original user query,
terpretation, but may not completely reflect the original
tation is to provide a semantic representation that may be
and transformation modules without the complexity of a parse
the original EBNF grammar.
Once the intermediate representation has been created,
modules. The first sequence operates on the query as a single
step occurs to split the single representation into a “plan”
tion, one to be executed per-data-chunk, and one to be executed
results into final user results. A second sequence is then
necessary transformations for an accurate result.
We have found that regular expressions and parse element
lyze and manipulate queries for anything beyond the most
5.3.1 Processing modules
The processing modules perform most of the work in transforming
ments that can produce a faithful result from a Qserv cluster.
• Identify spatial indexing opportunities. This allows
queries on only a subset of the available chunks constituting
given in Qserv-specific syntax are rewritten as boolean
• Identify secondary index opportunities. Qserv databases
are under consideration) as a key column whose values are
one spatial location. Identification allows Qserv to convert
into spatial restrictions.
• Identify table joins and generate syntax to perform distributed
marily supports “near-neighbor” spatial joins for limited
tioning coordinate space. Arbitrary joins between distributed
using the key column. Queries are classified according
ning. By identifying tables scanned in a query, Qserv
tion using shared scanning, which greatly increases efficiency.
5.3.2 Processing module overview
Figure 4: Processing modules.
This figure illustrates the query preparation pipeline that
input query string. User query strings are parsed (1) into
that is passed through a sequence of processing modules (2)
tation in-place. Then, it is broken up (3) into pieces that
execution on table partitions and pieces intended to merge
Another processing sequence (4) operates on this new representation,
crete query strings are generated (5) for execution.
The two sequences of processing modules provide an extensible
analysis and manipulation. Earlier prototypes performed
parsing, but this led to a practically unmaintainable code
ported the processing module model. Processing is split
flexibility to modules that manipulate the physical structures
query representation to modules that do not require the
between parsing, whose only goal is to provide an intelligible
tation, and the Qserv-specific analysis and manipulation
maintainability, and extensibility of the system and should
and future LSST needs.
5.4 Dispatch
Qserv uses XRootD as a distributed, highly-available communications
frontends to communicate with data workers. Up until 2015,
API with named files as communication channels. The current
eral two-way named-channeling system which eliminates
generalized protocol messages that can be flexibly streamed.
Service Interface (SSI) and is built on top of XRootD.
5.4.1 Wire protocol
Qserv encodes query dispatches in Google Protobuf messages,
to be executed by the worker and annotations that describe
teristics. Transmitting query characteristics allows Qserv
under changing CPU and disk loads as well as memory considerations.
re-analyze the query to discover these characteristics
determined by query inspection.
Query results are also returned via Protobuf messages. Initial
table dumps to avoid logic to encode and decode data values,
type MonetDB worker backend proved that data encoding and
problems whose solution could significantly improve overall
tating metadata operations on worker and frontend DBMS
encodes results in protobuf messages containing schema
ues. Streaming results directly from worker DBMS instances
is a technique under consideration, as is a custom aggregation
likely ease the implementation of providing partial query
5.4.2 Frontend
In 2012, a new XRootD client API was developed to address
sion’s scalability (uncovered during a 150 node, 30 TB scalability test [DMTR-21]). The new
client API began production use for the broader XRootD community
quently, work began under our guidance towards an XRootD
on request-response interaction over named channels, instead
ing files. A production version of this API, the Scalable Service
in early 2015 and Qserv has since been ported to use this
significant body of code that mapped dispatching and result-retrieval
SSI API now resides in the XRootD codebase, where it may be
The SSI API provides Qserv with a fully asynchronous interface
ing threads used by the Qserv frontend to communicate with
one class of problems we encountered during large-scale
terfaces that integrate smoothly with the Protobufs-encoded
novel features were specifically added to improve Qserv
sponse interface enables reduced buffering in transmitting
to the frontend, which lowers end-to-end query latency
on workers. The out-of-band meta-data response which arrives
be used to map out the Protobufs encoding and significantly
ory buffers.
The fully asynchronous API is crucial on the master because
rent chunk queries in flight expected in normal operation.
10k pieces, having 10 full-scanning queries running concurrently
chunk queries – too large a number of threads to allow on a
chronous API to XRootD is crucial. Threads are used to parallelize
While it does not seem to be important to parse/analyze/manipulate
parallel (and such a task would be a research topic), the
could be done in parallel if some portion of the aggregation/merging
code rather than loaded into the frontend’s MySQL instance
Thus results processing should be parallelized among results
query parsing/analysis/manipulation can be parallelized
5.4.3 Worker
The Qserv worker uses both threads and asynchronous calls
allelism. To service incoming requests from the XRootD API,
receive requests and enqueue them for action. Specifically,
is used on Qserv workers as well. The interface provides
on the front-end making the logic relatively easy to follow
prone.
Threads are maintained in a thread pool to perform incoming
the DBMS’s API (currently, the apparently synchronous MySQL
run in observance of the amount of parallel resources available.
I/O dependency of each incoming chunk query in terms of the
resources involved, and attempts to ensure that disk access
Thus if there are many queries that access the same table
of them to run as there are CPU cores in the system, but if
different chunk tables, it allows fewer simultaneous chunk
one table scan per disk spindle occurs. Further discussion
is described below.
5.5 Threading Model
Nearly every portion of Qserv is written using a combination
execution.
Qserv heavily relies on multi-threading to take advantage
executing queries, as an example, to complete one full table
chunks, 1,000 queries (processes) will be executed. To
of processes that are executed on each worker, we ended
and switching from a thread-per-request model to a thread
completely asynchronous, with real call-backs.
mysqlproxy                 Single-threaded Lua code driving non-blocking
                           client API
Frontend-C++               Processing thread per user-query for
                           Results-merging thread-per-user-query
Frontend-XRootD            Callback threads perform query transmission
                           sults retrieval
Frontend-XRootD internal   Threads for maintaining worker connections
                           host)
XRootD, cmsd               Small thread pools for managing live network
                           tions and performing lookups
Worker-XRootD              Small plug-in thread pool O(#cores) to make blocking
                           API calls into local mysqld; callback threads
                           perform admission/scheduling of tasks
                           and transmission of results
5.6 Aggregation
Qserv supports several SQL aggregation functions: AVG(),
and SQL92 level GROUP BY.
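A distributed AVG() cannot be obtained by averaging per-chunk averages, because chunks hold different numbers of rows. A common decomposition, sketched below as an assumption about how the per-chunk and merge steps of Section 5.3 could cooperate (the function names are hypothetical and this is not taken from the Qserv code), has each chunk return a partial SUM and COUNT and leaves the division to the merge step.

```python
def chunk_partial(values):
    """Per-chunk step: the equivalent of SELECT SUM(x), COUNT(x) on one chunk."""
    return (sum(values), len(values))


def merge_avg(partials):
    """Merge step: combine per-chunk (sum, count) pairs into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Two chunks of unequal size.
chunks = [[1, 2, 3], [4, 5]]
partials = [chunk_partial(c) for c in chunks]
global_avg = merge_avg(partials)

# Averaging the per-chunk averages would weight the chunks incorrectly.
naive_avg = sum(s / c for s, c in partials) / len(partials)
```

Here `global_avg` equals the average over all five rows, while `naive_avg` does not, which is why the partial results must carry both the sum and the count.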
5.7 Indexing
Qserv eschews heavy indexing in general, due to the prohibitive
cur as a result of the scale of the hosted data. Nevertheless,
primary key are anticipated to be a very common use case and
ciently. To that end, the current implementation admits
ter nodes, which can be used to map queries restricted by
and subchunks which contain those Objects. Query fragments
to just the set of involved workers, where in-memory subchunk
to be efficiently executed. This special-case global index
subchunk is referred to as the “secondary index”.⁵
The secondary index utilizes one or more tables using the
master node to perform lookups. Performance tests (Figure 5) on a single, dual-core
with 1 TB hard disk storage (not SSD) have shown that this
load of 40 billion rows in about 400,000 seconds (110 hours).
with multiple cores and SSD storage is expected to meet the
less than 48 hours.
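The figures quoted above imply a sustained load rate and a required speed-up, which the following short calculation makes explicit. It uses only the numbers stated in the text (40 billion rows, about 400,000 seconds, a 48-hour target); the variable names are illustrative.

```python
rows = 40_000_000_000        # rows loaded in the test
measured_seconds = 400_000   # measured load time on the test machine
target_hours = 48            # load-time target quoted in the text

measured_rate = rows / measured_seconds          # rows per second achieved
measured_hours = measured_seconds / 3600         # measured time in hours
required_rate = rows / (target_hours * 3600)     # rows per second needed
required_speedup = required_rate / measured_rate
```

The measured rate works out to 100,000 rows per second (about 111 hours in total), so meeting the 48-hour target requires roughly a 2.3-fold improvement in load throughput.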
Figure 5: Performance tests of MySQL-based secondary index.
To improve the performance of the InnoDB storage engine for
may be split across a small number (dozens) of tables, each
keys. This splitting, if done, will be independent of the
contiguity of key ranges will allow the secondary index service
table arithmetically via an in-memory lookup.
5 It is acknowledged that the name “secondary index” was poorly chosen,
literature. This name will probably be changed in the near future.
5.7.1 Secondary Index Structure
The secondary index consists of three columns: the Object
where all data with that key are located (chunkId), and the
data with the key are located (subChunkId). The objectId
as a 64-bit integer value; the chunkId and subChunkId are
spatial regions on the sky.
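The three-column structure can be sketched as a keyed lookup table. The example below uses an in-memory SQLite database purely as a stand-in for the MySQL/InnoDB table described in the text; the table name `secondary_index` and the helper functions are hypothetical.

```python
import sqlite3


def build_index(entries):
    """Create the three-column index from (objectId, chunkId, subChunkId) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE secondary_index ("
        " objectId INTEGER PRIMARY KEY,"
        " chunkId INTEGER NOT NULL,"
        " subChunkId INTEGER NOT NULL)"
    )
    # Insert in key order, mirroring the presorted loading described in 5.7.2.
    conn.executemany(
        "INSERT INTO secondary_index VALUES (?, ?, ?)", sorted(entries)
    )
    conn.commit()
    return conn


def locate(conn, object_id):
    """Return (chunkId, subChunkId) for an objectId, or None if unknown."""
    return conn.execute(
        "SELECT chunkId, subChunkId FROM secondary_index WHERE objectId = ?",
        (object_id,),
    ).fetchone()
```

A query restricted to a single objectId can then be dispatched only to the chunk and sub-chunk returned by `locate`, rather than to every chunk of the table.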
5.7.2 Secondary Index Loading
The InnoDB storage engine loads tables most efficiently if
been presorted according to the table’s primary key. When
is collected for loading (from each worker node handling
by objectId, and may be divided into roughly equal “splits”.
a table en masse.
To fully optimize the loading and table splitting, the entire
all workers and presorted in memory on the master. This
entries (requiring a minimum of 480 GB memory, plus overhead).
from a single worker can be assumed to be a “representative
objectIds, so table splitting can be done using the first
workers will be split and loaded according to those defined
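The idea of deriving split boundaries from one worker's sample and then routing every key to its split with an in-memory lookup can be sketched as follows. This is an illustration under stated assumptions: the boundaries are simple quantiles of the sample, and the function names are hypothetical rather than part of Qserv.

```python
import bisect


def split_boundaries(sample_ids, num_splits):
    """Pick num_splits - 1 boundary keys that divide the sample roughly equally."""
    ordered = sorted(sample_ids)
    step = len(ordered) / num_splits
    return [ordered[int(i * step)] for i in range(1, num_splits)]


def split_for(object_id, boundaries):
    """Return the index of the split table whose contiguous key range holds object_id."""
    # Keys equal to a boundary belong to the split that starts at that boundary.
    return bisect.bisect_right(boundaries, object_id)
```

Because each split covers a contiguous key range, only the short list of boundaries has to be held in memory; no per-key directory is needed to find the right table.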
5.8 Data Distribution
LSST will maintain the released data store both on tape media
tape archive is used for long-term archival. Three copies
be kept. The database cluster will maintain 3 online copies
clusters of reasonable size fail regularly, the cluster
provide continuous data access. A replication factor of
data integrity by majority rule when one replica is corrupt.
If periodic unplanned downtime is acceptable, an on-tape
three. However, the use of tape dramatically increases the
This may be acceptable for some tables, particularly those
although allowing service disruption may make it difficult
analysis on those large tables.
5.8.1 Database data distribution
The baseline database system will provide access for at
and previous. Data for each release will be spread out among
Data releases are partitioned spatially, and spatial pieces
robin fashion across all nodes. This means that area queries
almost guaranteed to involve resources on multiple nodes.
Each node should maintain at least 20% free space of its
maining free space is then available to be “borrowed” when
temporary use of storage capacity until more server resources
80% storage use is returned.
5.8.2 Failure and integrity maintenance
There will be failures in any large cluster of nodes, in the
volumes, in network access and so on. These failures will
resident on those nodes, but this loss of data access should
to analyze the data set as a whole. We need to set a data availability
confidence of the community in the stability of the system.
and to allow acceptable levels of node failures in a cluster,
a table level throughout the cluster.
The replication level will be that each table in the database
nodes. A monitoring layer to the system will check on the
hours, although this time will be tuned in practice. When
than three replicas available, this will initiate a replication
currently hosting that table. The times for the checking,
to the stability of the cluster, such that about 5% of all
1 or 2 replicas. Three replicas will ensure that tables will
failures, or when nodes need to be migrated to new hardware
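One pass of such a monitoring layer can be sketched as a function that finds under-replicated tables and proposes target nodes. The policy used here (copy to the least-loaded live node that does not already host the table) and all names are assumptions for illustration; the text does not specify how targets are chosen.

```python
def plan_replication(replicas, live_nodes, target=3):
    """Return a list of (table, node) copies needed to restore `target` replicas.

    replicas:   dict mapping table name -> set of nodes holding a replica
    live_nodes: list of nodes currently reachable
    """
    # Count how many tables each live node already holds.
    load = {node: 0 for node in live_nodes}
    for hosts in replicas.values():
        for node in hosts:
            if node in load:
                load[node] += 1

    plan = []
    for table in sorted(replicas):
        hosts = set(replicas[table]) & set(live_nodes)
        missing = max(target - len(hosts), 0)
        # Prefer the emptiest nodes that do not yet host this table.
        candidates = sorted(
            (n for n in live_nodes if n not in hosts),
            key=lambda n: (load[n], n),
        )
        for node in candidates[:missing]:
            plan.append((table, node))
            load[node] += 1
    return plan
```

Running this check periodically, at an interval tuned as the text describes, keeps the fraction of tables with only one or two replicas small.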
Should an entire node fail, replicating that data to another
sive in terms of time (on the order of hours). We plan on having
filling local storage to 80%. The free space will be used
failures, where replicas can take place in parallel between
new nodes with free storage are added to the cluster, then
free space into the drive, potentially taking several hours,
data during this time. Once this is complete, this data will
of time until these tables can be removed from the temporary
to 80% usage.
5.9 Metadata
Qserv needs to track various metadata information, static
infrequently), and dynamic (run-time) in order to maintain
optimize the cluster usage.
5.9.1 Static metadata
Qserv typically works with databases and tables distributed
it breaks individual large tables into smaller chunks and
All chunks that belong to the same logical table must have
parameters. Different tables often need to be partitioned
might be partitioned with overlap (such as the Object table),
no overlap (for example the Source table), and some might
a tiny Filter table). All this information about schema and
databases and tables needs to be tracked and kept consistent
Implementation of the static metadata in Qserv is based
which uses a regular MySQL database as a storage backend.
multiple masters and it must be served by a fault-tolerant
a master-master replication solution like MariaDB Galera
critical for metadata and it should be implemented using
engines in MySQL.
Static metadata may contain the following information:
• Per-database and per-table partitioning and scan scheduling
• Table schema for each table, used to create database tables
instances; the schema in the master MySQL instance can
information when a table is already created.
• Database and table state information, used primarily
table creation or deletion.
• Definitions for the set of worker and master nodes in a cluster
status.
The main clients of the static metadata are:
• Administration tools (command-line utilities and modules)
modify metadata structures.
• Qserv master(s), mostly querying partitioning parameters
table/database status when deleting/creating new tables
not depend on node definitions in metadata; the XRootD facility
with workers.
• A special “watcher” service which implements distributed
table management.
• An initial implementation of the data loading application
tions and will create/update database and table definitions.
will eventually be replaced by a distributed loading mechanism
separate mechanisms.
5.9.2 Dynamic metadata
In addition to static metadata, a Qserv cluster also needs
various statistics about query execution. This sort of data
per query execution, and is called dynamic metadata.
The implementation of the dynamic metadata is based on the
metadata it needs to be shared between all master instances
tolerant MySQL instance which is shared with static metadata
Dynamic metadata contains the following information:
• Definition of every master instance in a Qserv cluster.
• Record of every SELECT-type query processed by the cluster.
processing state and some statistical/timing information.
• Per-query list of table names used by the asynchronous
to delay table deletion while async queries are in progress.
• Per-query worker information, which includes chunk id
the worker processing that chunk id. This information
restart the master or migrate query processing to a different
failure.
The most significant use of the dynamic metadata is to track
queries. When an async query is submitted it is registered
ID is returned to the user immediately. Later users can request
query ID which is obtained from dynamic metadata. When query
can request results and the master can obtain the location
metadata.
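The asynchronous-query life cycle described above (register, return an ID immediately, poll status, then fetch the result location) can be sketched with a small in-memory registry. In the design the records would live in the shared, fault-tolerant MySQL instance; the class, its method names, and the state strings below are hypothetical.

```python
import itertools


class AsyncQueryRegistry:
    """In-memory stand-in for the dynamic-metadata records of async queries."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._queries = {}

    def submit(self, sql):
        """Register a query and return its ID without waiting for execution."""
        query_id = next(self._ids)
        self._queries[query_id] = {
            "sql": sql,
            "state": "EXECUTING",
            "result_location": None,
        }
        return query_id

    def complete(self, query_id, result_location):
        """Record that the query finished and where its result is stored."""
        record = self._queries[query_id]
        record["state"] = "COMPLETED"
        record["result_location"] = result_location

    def status(self, query_id):
        return self._queries[query_id]["state"]

    def result_location(self, query_id):
        record = self._queries[query_id]
        if record["state"] != "COMPLETED":
            raise RuntimeError(f"query {query_id} has not completed")
        return record["result_location"]
```

Keeping the state in shared storage rather than in one master's memory is what would allow a different master instance to answer a status or result request for the same query ID.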
Additionally, dynamic metadata can be used to collect statistical
that were executed in the past which may be an important tool
ing system performance.
5.9.3 Architecture
The Qserv metadata system is implemented based on master/server
data is centrally managed by a Qserv Metadata Server (qms). The information kept
worker is kept to a bare minimum: each worker only knows which
to handle, and all remaining information can be fetched from
our philosophy of keeping the workers as simple as possible.
The real-time metadata is managed inside qms in in-memory
with disk-based table. Such configuration allows reducing
delaying query execution time. Should a qms failure occur,
the information was lost will be restarted. Since the synchronization
occur relatively frequently (e.g. at least once per minute),
avoid overloading the qms, only the high-level information
stored in qms; all worker-based information is cached in
in a simple, raw form (e.g., key-value, ASCII file), and can
5.9.4 Typical Data Flow
Static metadata:
1. Parts of the static metadata known before data is partitioned/loaded
administration scripts responsible for loading data
start data partitioner.
2. The data partitioner reads static metadata loaded by
remaining information.
3. When Qserv starts, it fetches all static metadata and
in-memory optimized C++ structure.
4. The contents of the in-memory metadata cache inside Qserv
mand if the static metadata changes (for example, when
added).
Dynamic metadata:
1. Master loads the information for each query (when it starts,
2. Detailed statistics are dumped by each worker into a scratch
formation can be requested from each worker on demand.
chunk-queries except one completed, qms would fetch
chunk-query to estimate when the query might finish, whether
5.10 Shared Scans
Arbitrary full-table scanning queries must be supported
der to provide this support cost-effectively and efficiently,
Shared scans effectively reduce the I/O cost of executing
rently, reducing the required system hardware and purchasing
Shared scans reduce overall I/O costs by forcing incoming
queries scan the same table, theoretically, they can completely
I/O cost of a single query rather than the sum of their individual
for queries to share I/O because their arrival times are random
begins scanning at different times, and because LSST’s catalog
system caching is ineffective. In Qserv, scanning queries
shared scanning forces each query to operate on the same
by reordering and sequencing the query fragments.
5.10.1 Background
Historically, shared scanning has been a research topic
mentations. We know of only one implementation in use (Teradata).
mentations assume OS or database caching is sufficient, encouraging
to reduce the need of table scans. However, our experiments
are large enough (by row count) and column access sufficiently
columns when there are hundreds to choose from), indexes
indexes no longer fit in memory, and even when they do fit in
trieve each row is dominant when the index selects a percentage
finite number (thousands or less).
5.10.2 Implementation
The implementation of shared scans in Qserv is in two parts.
tion of incoming queries as scanning queries or non-scanning
to scan a table if it depends on non-indexed column values k chunks
(where k is a tunable constant). Note that involving multiple
lects from at least one partitioned table. This classification
on the front-end and leverages table metadata. The metadata
is set by hand. Higher scan ratings indicate larger tables
The identified “scan tables” and their ratings are marked
which use the information in scheduling the fragments of
The second part of the shared scans implementation is a scheduling
query fragment execution to optimize cache effectiveness.
off-the-shelf DBMS instances on worker nodes, it is not allowed
implement shared scans. Instead, it issues query fragments
access in data and time, and tries to lock the files associated
much as possible. Using the identified scan tables and their
the appropriate scheduler. There will be at least three schedulers.
to complete in under an hour, which are expected to be related
queries expected to take less than eight hours, expected
one for scans expected to take eight to twelve hours for ForcedSource
The reasoning being that a single slow query can impede the
all the other user queries on that scan. There may be a need
queries taking more than 12 hours.
Each scheduler places incoming chunk queries into one of two
id then scan rating of the individual tables. If the query
ning chunk id, it is placed on the active priority queue,
priority queue. After chunk id, the priority queue is sorted
to ensure that the largest tables in the chunk are grouped
Once the query is on the appropriate scheduler, the algorithm
dispatch slot is available, it checks the highest priority
fragment (hereafter called a task), and it is not at its quota
task, otherwise the worker checks the next scheduler. It
been started or all the schedulers have been checked.
Each scheduler is only allowed to start a task under certain
enough threads available from the pool so that none of the
threads as well as enough memory available to lock all the
the scheduler has no tasks running, it may start one task
tables in that task. This should prevent any scheduler from
without requiring complicated logic, but it could incur
Schedulers check for tasks by first checking the top of the
priority queue is empty, and the pending priority queue
queues are swapped with the task being taken from the top
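The two-queue arrangement can be sketched with a pair of heaps. Several details below are assumptions made to obtain a runnable example, since the text does not state them fully: a task goes to the active queue when its chunk id is at or beyond the chunk currently being scanned and to the pending queue otherwise, and within a chunk tasks with higher scan ratings come first. The class and method names are hypothetical.

```python
import heapq
import itertools


class ScanScheduler:
    """Active/pending priority queues ordered by chunk id, then scan rating."""

    def __init__(self):
        self._active = []
        self._pending = []
        self._current_chunk = None
        self._seq = itertools.count()  # tie-breaker keeps heap entries comparable

    def add(self, chunk_id, scan_rating, task):
        # Negate the rating so larger tables sort first within a chunk (assumption).
        entry = (chunk_id, -scan_rating, next(self._seq), task)
        if self._current_chunk is None or chunk_id >= self._current_chunk:
            heapq.heappush(self._active, entry)
        else:
            # The scan has already passed this chunk; wait for the next pass.
            heapq.heappush(self._pending, entry)

    def next_task(self):
        # When the active queue drains, the pending queue becomes the active one.
        if not self._active and self._pending:
            self._active, self._pending = self._pending, self._active
        if not self._active:
            return None
        chunk_id, _, _, task = heapq.heappop(self._active)
        self._current_chunk = chunk_id
        return task
```

Tasks for the same chunk therefore run back to back, so the chunk's files are read from disk once per pass instead of once per query.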
Since the queries are being run by a separate DBMS instance
over how it goes about running queries, the worker can only
the DBMS and also lock files in memory. Files in memory are
be paged out when memory resources are low, which would increase
in memory prevents this from happening. However, care must
memory can be used for locking files. Use too much and there
on DBMS performance. Set aside too little, and schedulers
the resources available and may be forced to run tasks without
memory.
The memory manager controls which files are locked in memory.
run a task, the task asks the memory manager to lock all the
The memory manager determines which files are associated
already locked in memory and there is enough memory available
not already locked, the task is given a handle and allowed
it hands the handle back to the memory manager. If it was the
table, the memory for the files used by that table is freed.
When the memory manager locks a file, it does not read the
for the file to occupy when it is read by the DBMS. In the special
even though there is not enough memory available, those tables
list of reserved tables and their size is subtracted from
freed. When memory is freed, the memory manager will try
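The handle-based bookkeeping described above amounts to reference counting against a memory budget. The sketch below models only that accounting; it does not lock any real memory, it omits the special case of reserved tables, and the class and method names are hypothetical.

```python
class MemoryManager:
    """Reference-counted accounting of tables locked in memory (bookkeeping only)."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self._refs = {}      # table -> number of running tasks using it
        self._sizes = {}     # table -> bytes reserved for its files
        self._handles = {}   # handle -> tables locked on behalf of that task
        self._next_handle = 1

    def lock(self, tables):
        """Try to lock `tables` (name -> size in bytes); return a handle or None."""
        # Only tables that are not already locked consume additional memory.
        needed = sum(size for name, size in tables.items()
                     if name not in self._refs)
        if self.used + needed > self.budget:
            return None
        for name, size in tables.items():
            if name not in self._refs:
                self._refs[name] = 0
                self._sizes[name] = size
                self.used += size
            self._refs[name] += 1
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = list(tables)
        return handle

    def release(self, handle):
        """Hand a handle back; free a table once its last user has finished."""
        for name in self._handles.pop(handle):
            self._refs[name] -= 1
            if self._refs[name] == 0:
                del self._refs[name]
                self.used -= self._sizes.pop(name)
```

A second task scanning an already-locked table costs no extra memory in this model, which is the property that lets concurrent scans of the same table share one set of resident files.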
Because Qserv processes interactive, short queries concurrently
query scheduler should be able to allow for those queries
a query scan. To achieve this, Qserv worker nodes choose
scribed above and a simpler grouping scheduler. Incoming queries with identified
are admitted to the scan scheduler, and all other queries
uler. The grouping scheduler is a simple scheduler that
(first-in-first-out) scheduler. Like a FIFO scheduler, it
cute, and operates identically to a FIFO scheduler with
by chunk id. Each incoming query is inserted into the queue
same chunk, and at the back if no queued query matches. The
that the queue will never get very long, because it is intended
queries lasting fractions of seconds, but groups its queue
provide a minimal amount of access locality to improve throughput
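The insertion rule of the grouping scheduler can be sketched in a few lines. The example assumes that a new query is placed directly after the last queued query for the same chunk, which is one reading of the rule above; the class name is hypothetical.

```python
class GroupingQueue:
    """FIFO queue that keeps queries for the same chunk adjacent."""

    def __init__(self):
        self._items = []  # list of (chunk_id, query) in execution order

    def push(self, chunk_id, query):
        # Scan from the back for the last queued query on the same chunk.
        for i in range(len(self._items) - 1, -1, -1):
            if self._items[i][0] == chunk_id:
                self._items.insert(i + 1, (chunk_id, query))
                return
        # No match: behave exactly like a FIFO queue.
        self._items.append((chunk_id, query))

    def pop(self):
        return self._items.pop(0) if self._items else None
```

The linear scan is acceptable here because, as noted above, the queue is expected to stay short.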
Some longer queries will be admitted to the grouping scheduler
ning queries, provided that they have been determined to
these non-shared scan query will disrupt performance of
disk on a worker, the impact is thought to be small because
a large fraction of) the work for a single user query, and
disks on all workers.
For discussion about the performance of the current implementation, see DMTR-16.
5.10.3 Memory management
To minimize system paging when multiple threads are scanning
mented a memory manager called memman. When a shared scan
the shared scan scheduler informs memman about the tables
how important it is to keep those tables in memory during
directed to keep the tables in memory, memman opens each data
memory, and then locks the pages to prevent the kernel from
uses. Thus, once a file page is faulted in, it stays in memory
the contents of the page without incurring additional page
the table completes, memman is told that the tables no longer
memman frees up the pages by unlocking them and deleting
This type of management is necessary to satisfy system paging
prime paging pool is the set of unlocked file system pages.
5.10.4 XRootD scheduling support
When the front-end dispatches a query, XRootD normally
attempt to spread the load across all of the nodes holding
well for interactive queries, it is hardly ideal for shared
memory and I/O usage, queries for the same table in a shared
to the same node. A new scheduling mode was added to the
scheduling. The front-end can tell XRootD whether or not
other queries using the same table. Queries that have affinity
node relative to the table they will be using. This allows
paging by running the maximum number of queries against the
that node fail, XRootD assigns another working node that
queries that have affinity.
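The difference between load-spreading and affinity-based dispatch can be sketched as follows. This is a simplified model, not the XRootD implementation: it assumes every listed node holds a replica of every table, uses plain round-robin for non-affinity requests, and all names are hypothetical.

```python
class AffinityRouter:
    """Route requests round-robin, or stick to one node per table when asked."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0
        self._table_node = {}  # table -> node chosen for affinity requests

    def _round_robin(self):
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node

    def route(self, table, affinity):
        if not affinity:
            # Interactive queries: spread the load across nodes.
            return self._round_robin()
        node = self._table_node.get(table)
        if node is None or node not in self._nodes:
            # First affinity request for this table, or its node has failed.
            node = self._round_robin()
            self._table_node[table] = node
        return node

    def mark_failed(self, node):
        self._nodes.remove(node)
```

Sending all affinity requests for a table to one node means that node's locked pages for the table serve every concurrent scan, instead of each replica holder paging the same data in separately.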
5.10.5 Multiple tables support
Handling multiple tables in shared scans requires an additional
scheduler will aim to satisfy a throughput yielding average
• Object queries: 1 hour
• Object, Source queries (join): 12 hours
• Object, ForcedSource queries (join): 12 hours
• Object_Extras⁶ queries (join): 8 hours.
There are separate schedulers for queries that are expected
twelve hours. The schedulers group the tasks by chunk id and
the all tables in the task. The scan ratings are meant to be
the size of the table, so this sorting places scans using
next to each other in the queue. Using scan rating allows flexibility
schemas different than that of LSST.
6 This includes all Object-related tables, e.g., Object_Extra, Object_Periodic, Object_NonPeriodic, Object_APMean
Since scans are not limited to specific tables, complicated
that could take more than twelve hours to process. The worker
if particular queries appear to be too slow for their current
placed on a dedicated “snail scan” so the rest of the queri
overly delayed.
5.11 Level 3: User Tables, External Data
Level 3 tables including tables generated by users, and
depending on their type and size, will be either partitioned
duction database servers, or kept unpartitioned in one central
and distributed Level 3 data will share the nodes with Level
disks, independent from the disks serving Level 2 data.
recoverability from failures.
Level 3 tables will be tracked and managed through the Qserv
scribed in Section 5.9. This includes both the static, as well as the
5.12 Cluster and Task Management
Qserv delegates management of cluster nodes to XRootD. The
ter membership, node registration/deregistration, address
cation. Its Scalable Service Interface (SSI) API provides
nels to the rest of Qserv, hiding details like node count,
existence of replicas, and node failure. The Qserv master
endpoints and Qserv workers focus on receiving and executing
Cluster management performed outside of XRootD does not
but includes coordinating data distribution, data loading,
discussed in Section 5.15. The SSI API includes methods that allow dynamic
data view of an XRootD cluster so that when new tables appear
system can incorporate that information for future scheduling
dynamically change without the need to restart the XRootD
5.13 Fault Tolerance
Qserv approaches fault tolerance in several ways. The design
underlying data by replicating and distributing data chunks
event of a node failure, the problem can be isolated and
to nodes maintaining duplicate data. Moreover, this architecture
incremental scalability and parallel performance. Within
modularized with minimal interdependence among its components,
narrow interfaces. Finally, individual components contain
handling, and recovering from errors.
The components that comprise Qserv include features that
prevention and failure-recovery capabilities. The MySQL
among several underlying MySQL servers and provide automatic
fails. The XRootD system provides multiple managers and highly
high bandwidth, contend with high request rates, and cope
Qserv master itself contains logic that works in conjunction
from worker-level failures.
A worker-level failure denotes any failure mode that can
nodes. In principle, all such failures are recoverable given
and alternative nodes containing duplicate data are available.
clude a disk failure, a worker process or machine crashing,
a worker unreachable.
Consider the event of a disk failure. Qserv’s worker logic
failure on localized regions of disk and would behave as if
worker process would therefore crash and all chunk queries
be lost. The in-flight queries on its local mysqld would be
The Qserv master’s requests to retrieve these chunk queries
error code. The master responds by re-initializing the chunk
XRootD. Ideally, duplicate data associated with the chunk
this case, XRootD silently re-routes the request(s) to the
queries are completed as usual. In the event that duplicate
chunk queries, XRootD would again return an error code. The
submit a chunk query a fixed number of times (determined
before giving up, logging information about the failure,
the user in response to the associated query.
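The bounded re-submission policy can be sketched as a small retry wrapper. The `dispatch` callable stands in for the master's request through XRootD, the use of `RuntimeError` as the error signal is an assumption, and `max_attempts` is a hypothetical name for the configured limit mentioned above.

```python
def run_chunk_query(dispatch, chunk_id, max_attempts=3):
    """Submit a chunk query, re-submitting on error up to max_attempts times.

    dispatch: callable taking a chunk id and returning its result; it is
              expected to raise RuntimeError when the request fails.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            # Each re-submission may be routed to a different replica holder.
            return dispatch(chunk_id)
        except RuntimeError as exc:
            errors.append(f"attempt {attempt}: {exc}")
    # Give up: surface the collected failure details to the caller,
    # which can log them and report an error for the user query.
    raise RuntimeError(
        f"chunk {chunk_id} failed after {max_attempts} attempts; "
        + "; ".join(errors)
    )
```

The retry loop itself needs no knowledge of replicas; it relies on the dispatch layer to route a repeated request to a node that still holds the chunk.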
Error handling in the event that an arbitrary hardware or
Qserv worker itself) causes a worker process or machine to
ner described above. The same is true in the event that network
ness/overload has the limited effect of preventing XRootD
more worker nodes. As long as such failures are limited to
not extend to the Qserv master node, XRootD is designed to
error code. Moreover, if duplicate data exists on other
XRootD, which will successfully route any subsequent chunk
In the event of an unrecoverable error, the Qserv master is
saging mechanism designed to both log detailed information
human-readable error message to the user. This mechanism
logic that encapsulates all of the master’s interactions
tion occurs, the master gracefully terminates the query,
event, and notifies the user. Qserv’s internal status/error
a status message and timestamp each time an individual chunk
Such milestones include: chunk query dispatch, written
results merged, and query finalized. This real-time status
in the event of an unrecoverable error.
Building upon the existing fault-tolerance and error handling
work includes introducing a heartbeat mechanism on worker
worker process and will restart it in the event it becomes
monitoring process could periodically ping worker nodes
essary. We are also considering managing failure at a per-disk
research since application-level treatment of disk failure
possible to develop an interface for checking the real-time
processed by Qserv by leveraging its internally used status/error
5.14 Next-to-database Processing
We expect some data analyses will be very difficult, or even
SQL language. This might be particularly useful for time-series
yses, we will allow users to execute their analysis algorithms
as Python. To do that, we will allow users to run their own
sources co-located with production database servers. Users
tion database which stream rows directly from database cluster
cluster, where arbitrary code may run without endangering
allows their incurred database I/O needs to be satisfied
scanning infrastructure while providing the full flexibility
5.15 Administration
5.15.1 Installation
Qserv as a service requires a number of components that all
figured together. On the master node we require mysqld, mysql-proxy,
metadata service, and the Qserv master process. On each of
be the mysqld, cmsd, and XRootD service. These major components
XRootD, and Qserv distributions. But to get these to work
software package, such as protobuf, Lua, expat, libevent,
so on. And many of these require more recent versions than
distributions.
To manage both the complexity of deployment and the diversity
vironments, we have adopted the use of Linux containers
their user-space dependencies are bundled within containers
uniformly to any host running Docker with a recent enough
ating system distribution, package profile, patch levels,
We have experimented with several container orchestration
ment. Of these, Kubernetes seems to be emerging as a clear
chestration solution of choice.
5.15.2 Data loading
As previously mentioned, Data Release Production will
Instead, the DRP pipelines will produce binary FITS tables
archived as they are produced. Data will be loaded into Qserv
tables are either not available, or complete and immutable
spective.
For replicated tables, these FITS files are converted to
header keyword value pairs, or by translating binary tables
files are loaded directly into MySQL and indexed. For partitioned
FITS tables are fed to the Qserv partitioner, which assigns
and converts to CSV.
In particular, the partitioner divides the celestial sphere
height H. For each stripe, a width W is computed such that
longitudes separated by at least W have angular separation
broken into an integral number of chunks of width at least
a varying number of chunks (e.g. polar stripes will contain
varies by a factor of about pi over the sphere. The same procedure
chunks: each stripe is broken into a configurable number
each substripe is broken into equal-width subchunks. This
erarchical Triangular Mesh for its speed (no trigonometry
of a point given in spherical coordinates), simplicity of
control it offers over the area of chunks and sub-chunks.
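The stripe-and-chunk scheme described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the Qserv partitioner itself: the function names are hypothetical, sub-stripes and sub-chunks are omitted, and the sketch assumes W is evaluated at the stripe edge farthest from the equator using the spherical law of cosines.

```python
import math


def chunks_in_stripe(lat_min, lat_max, height):
    """Number of equal-width chunks in a latitude stripe (all angles in degrees)."""
    lat = max(abs(lat_min), abs(lat_max))
    if lat >= 90.0 - 1e-9:
        return 1  # a stripe touching a pole is kept as a single chunk
    sin2 = math.sin(math.radians(lat)) ** 2
    cos2 = math.cos(math.radians(lat)) ** 2
    # cos(W) for the longitude difference W at which two points at latitude
    # `lat` are exactly `height` degrees apart on the sphere.
    c = (math.cos(math.radians(height)) - sin2) / cos2
    if c <= -1.0:
        return 1  # no such W exists this close to the pole
    width = math.degrees(math.acos(min(1.0, c)))
    # Largest integral number of chunks whose width is still at least W.
    return max(1, int(360.0 / width))


def build_layout(num_stripes):
    """Chunk count for each stripe, ordered from the south pole northwards."""
    height = 180.0 / num_stripes
    return [
        chunks_in_stripe(-90.0 + s * height, -90.0 + (s + 1) * height, height)
        for s in range(num_stripes)
    ]


def locate(ra, dec, layout):
    """Return the (stripe, chunk) indices for a position given in degrees."""
    num_stripes = len(layout)
    height = 180.0 / num_stripes
    stripe = min(int((dec + 90.0) / height), num_stripes - 1)
    chunk_width = 360.0 / layout[stripe]
    chunk = min(int((ra % 360.0) / chunk_width), layout[stripe] - 1)
    return stripe, chunk
```

Locating a row needs only a division and a table lookup per coordinate once the layout is built, which matches the speed argument made above. All trigonometry is confined to `build_layout`.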
The boundaries of subchunks constructed as described are
the overlap region for a subchunk is defined as the spherical
the subchunk but within the overlap radius of its boundary.
The task of the partitioner is to find the IDs of the chunk
titioning position of each row, and to store each row in the
its chunk. If the partitioning parameters include overlap,
might additionally fall inside the overlap regions of one
copy of the row is stored for each such subchunk (in overlap
Tables that are partitioned in Qserv must be partitioned
This means that chunk tables in a database share identical
mappings of chunk id to spatial partition. In order to facilitate
columns are chosen to define the partitioning space and all
lated set of tables) are either partitioned according to that
at all. Our current plan chooses the Object table’s ra_PS and decl_PS columns, meaning that
rows in the Source and ForcedSource tables will be partitioned
reference.
There is one exception: we allow for precomputed spatial
a table might provide a many-to-many relationship between
reference catalog from another survey, listing all pairs
separated by less than some fixed angle. The reference catalog
associated Object, as more than one Object might be matched
the reference catalog must be partitioned by reference object
in the match table might refer to an Object and reference
stored on different Qserv worker nodes.
We avoid this complication by again exploiting overlap.
time) that no match pair is separated by more than the overlap
match tables, we store a copy of each match in the chunk of
match. When joining Objects to reference objects via the
to find all matches to Objects in chunk C by joining with all
objects in C or in the overlap region of C.
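The match-table placement rule can be illustrated with a small Python sketch. It assumes that a match row is copied into the chunk of each of its two positions; the function and argument names are hypothetical, and chunk assignment is taken as given.

```python
def place_matches(matches, chunk_of_object, chunk_of_reference):
    """Map each chunk id to the match rows that should be stored in it.

    matches            -- iterable of (object_id, reference_id) pairs
    chunk_of_object    -- dict: object_id -> chunk id of the Object position
    chunk_of_reference -- dict: reference_id -> chunk id of the reference position
    """
    placement = {}
    for obj_id, ref_id in matches:
        # The set collapses the two chunk ids when both positions share a chunk,
        # so such a match is stored only once.
        for chunk in {chunk_of_object[obj_id], chunk_of_reference[ref_id]}:
            placement.setdefault(chunk, []).append((obj_id, ref_id))
    return placement
```

With this placement, a join restricted to chunk C finds every match involving an Object in C locally, provided no pair is separated by more than the overlap radius.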
All Qserv worker nodes will partition subsets of the pipeline
pect partitioning to achieve similar aggregate I/O rates
query access, so that partitioning should complete in a
time. Once it does, each Qserv worker will gather all output
them into MySQL. The structure of the resulting chunk tables
performance of user query access (chunk tables will likely
compressed), and appropriate indexes are built. Since chunks
of these steps can be performed using an in-memory file-system.
when reading the CSV files during the load and when copying
files) to local disk.
The last phase of data loading is to replicate each chunk to
will rely on table checksum verification rather than a majority
replica is corrupt or not.
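The checksum-based approach to replica verification can be sketched as follows. The text does not say which checksum mechanism is intended, so SHA-256 over a canonical row encoding is an assumption made purely for illustration, as are the function names.

```python
import hashlib


def table_checksum(rows):
    """Checksum of a table given as an ordered iterable of row tuples."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def corrupt_replicas(reference_checksum, replicas):
    """Names of replicas whose checksum differs from the recorded reference.

    replicas -- dict: replica name -> ordered iterable of row tuples
    """
    return sorted(
        name
        for name, rows in replicas.items()
        if table_checksum(rows) != reference_checksum
    )
```

Because each replica is compared against a checksum recorded at load time, a single surviving replica can be validated on its own. A majority vote would need at least three copies to identify the bad one.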
The partitioner has been prototyped as a multi-threaded
map-reduce implementation internally to scale across cores,
more input CSV files in parallel. It does not currently understand
are also parallelized - each output chunk is processed by
written to in parallel with no application level locking.
was able to sustain several hundred MB/s of both read and write
a CSV dump of the PT1.2 Source table.
We are investigating a pair of data loading optimizations.
either integrate the partitioning code or feed data directly
municating via persistent storage. The other is to write
format (e.g. as .MYD files, ideally using the MySQL/MariaDB
the CSV database loading step to be bypassed.
5.15.3 Administrative scripts
The Qserv design originally had a somewhat complicated set
to coordinate and sequence service launch and shutdown on
on service status, to automatically relaunch failed services,
upgrades out onto a cluster. We have recently found that all
container orchestration frameworks such as Kubernetes,
of removing this now-redundant tooling in order to simplify
Data ingest still tends to be a slightly detailed process,
to now, data preparation produces text files in CSV format
MySQL layer as a MyISAM table. The schema for these tables
places with columns for chunk and subchunk number. Loading
suspended, and indices and table statistics for the query
ated on each worker after the load. Additionally, a list of
has been loaded must be generated to aid in efficient query
are currently coordinated by a Python qserv-loader script.
5.16 Current Status and Future Plans
As of now (July 2017) we have implemented a basic version of
three running instances at a scale of approximately 30 physical
operation on dedicated hardware clusters at NCSA and CC-IN2P3.
The system has been demonstrated to correctly distribute
volume queries, including small-area cone and object searches,
full table scans including large-area near-neighbor searches.
UDFs in both filter and aggregation clauses have also been
Scale testing has been successfully conducted (footnote 7) on the above-mentioned clusters
of up to approximately 70 TB, and we expect to cross the 100
2017. To date the system is on track through a series of graduated
[LDM-552] to meet or exceed the stated performance requirements [LDM-555].
Footnote 7: See Qserv test reports DMTR-21, DMTR-12, DMTR-13, and DMTR-16 (most recent).
The shared-scan implementation is substantially complete,
high levels of concurrency without degraded full-scan query
The metadata system, as described above, for both static
and basic query management facilities (query cancellation
mented.
The system includes a comprehensive set of unit and regression
the build and CI systems.
In addition to background ongoing effort to improve query
coverage, implementation work ahead includes:
• automatic data distribution and replication;
• improved query management and monitoring tools;
• demonstrating cross-match with external catalogs;
• implementing support for Level 3 data;
• next-to-data processing framework;
• improvements to administration procedures and scripts;
• authentication and authorization;
• resource management;
• security;
• early engagement with astronomy users.
Automatic data distribution and replication. We have experimented successfully
static replication of data at ingest time, but for production
place data and respond dynamically to node arrivals
quired. This is currently under active development.
Improved query management and monitoring tools. The facility to enable clients
porarily disconnect from long-running queries and later
and/or collect results is currently under active development.
and estimate completion times and data sizes for long-running
mented.
Demonstrating cross-match with external catalogs. One of the use cases involves
matching with external catalogs. In cases where the catalog
will be replicated. For cross-matching with larger catalogs,
with will need to be partitioned and distributed to the worker
Implementing support for Level 3 data. Users should be able to maintain their
to store their own data or results from previous queries.
and replace their own tables within the system.
Next to data processing framework. A facility to stream query results
where user-submitted code can run against those streams
Improvements to administration procedures and scripts. To further automate common
tasks related to database management, table management,
tribution, and others we need to implement various improvements
cedures and scripts.
Authentication and authorization. A production database system should
facility for user or role-based access so that usage can
shared. This is in particular needed for Level-3 data products.
Resource management. A production system should have some way
resource usage and provide quality-of-service controls.
that can track each node’s load and per-user-query resource
Security. The system needs to be secure and resilient against
Early engagement with astronomy users. It is important that we engage
members of our target user-community, so we can have time
we are building. Does the system have the capabilities they
syntax usable and practical for them? We have begun some
ties in the PDAC (Prototype Data Access Center) cluster at
and plan to expand that audience in upcoming months.
5.17 Open Issues
What follows is a (non-exhaustive) list of issues, technical
discussed and where changes are possible.
• Row updates. Currently, all rows once ingested into a Qserv
immutable, and row updates are not supported. This simplification
because the motivating use case for Qserv (annually released,
logs) is primarily read-only. The apparent need for update
Qserv instance can be worked around by treating Level 3
gested; should changes to a Level 3 product be required,
and re-ingested/regenerated rather than updated in
sary that Qserv support row updates and/or transactions
changes to the Qserv architecture would likely be required.
• Very large objects. Some objects (e.g. large galaxies) are much
region; in some cases their footprint will span multiple
with the object center, neglecting the actual footprint.
cases that would benefit from a system that tracks objects
is currently not a requirement. A potential solution would
similar to the r-tree-based indexes such as the TOUCH [15].
• Very large results. Currently, the front-end that dispatches
ble for assembling its eventual results. In general,
the resources required to process the results may be
greater than those needed to dispatch the query. One potential
the front-end to the extent necessary to handle query
SSI interface could be augmented to itself allow running
once a particular front-end dispatches a query it can
connect from it. A different server could, using that handle,
process the results. This would be a more flexible model
dent scaling of query dispatch and result processing.
not requiring cancellation of in-progress queries dispatched
should that front-end die.
• Sub-queries. Qserv does not currently support SQL sub-queries.
that such a capability might be useful to users, we should
signs and understand how easy/difficult they might be to
approaches here might be to split sub-queries into multiple
variables. A naïve implementation that involves dumping
and then rereading them, similar to a multi-stage map/reduce,
6 Risk Analysis
6.1 Potential Key Risks
Insufficient database performance and scalability is one of the major stated major
DM [Document-7025].
Qserv as an implementation of the baseline architecture
enced above is already well on its way, and is currently resourced
tion on time and within budget.
A viable alternative might be to use an off-the-shelf system.
could present significant support cost advantages over a
it is a system well supported by a large user and developer
source, scalable solution will be available on the timescale
of production scalability approaching few hundred terabytes
systems larger than the largest single LSST data set have
production today. There is a growing demand for large scale
tition in that area (Hadoop, Greenplum, InfiniDB, MonetDB,
which was a closed-source project at the outset of LSST,
Finally, a third alternative would be to use a closed-source,
or InfiniDB (Teradata is too expensive). Some of these systems
believe the largest barrier preventing us from using an off-the-shelf
poorly developed spherical geometry and spherical partitioning
Potential problems with off-the-shelf database software used, such as MySQL is another
potential risk. MySQL has recently been purchased by Oracle,
the MySQL project will be sufficiently supported in the long-term.
independent forks of MySQL software have emerged, including
of the MySQL founders) and Percona. Should MySQL disappear,
compatible systems are a solid alternative. Should we need
we have taken multiple measures to minimize the impact:
• our schema does not contain any MySQL-specific elements
demonstrated using it in other systems such as MonetDB;
• we do not rely on any MySQL specific extensions, with the
which can be made to work with non-MySQL systems if needed;
• we minimize the use of stored functions and stored procedures
specific, and instead use user defined functions, which
face binding part needs to be migrated).
Complex data analysis. The most complex analysis we identified so
temporal correlations which exhibit O(n^2) performance characteristics, searching
lies and rare events, as well as searching for unknown are
dustrial users deal with much simpler, well defined access
be ad-hoc, and access patterns might be different than these
large-scale industrial users have started to express strong
understanding and correlating user behavior (time-series
nies, searching for abnormal user behavior to detect fraud
companies, analyzing genome sequencing data run by biotech
ket analysis run by financial companies are just a few examples.
ad-hoc and involve searching for unknowns, similar to scientific
(by rich, industrial users) for this type of complex analyses
rapidly starting to add needed features into their systems.
The complete list of all database-related risks maintained
• DM-014: Database performance insufficient for planned
• DM-015: Unexpected database access patterns from science
• DM-016: Unexpected database access patterns from DM productions
• DM-032: LSST DM hardware architecture becomes antiquated
• DM-060: Dependencies on external software packages
• DM-061: Provenance capture inadequate
• DM-065: LSST DM software architecture incompatible
dards
• DM-070: Archive sizing inadequate
• DM-074: LSST DM software architecture becomes antiquated
• DM-075: New SRD requirements require new DM functionality
6.2 Risk Mitigations
To mitigate the insufficient performance/scalability risk,
strated scalability and performance. In addition, to increase
source, community supported, off-the-shelf database system
collaborated with the MonetDB open source columnar database
lessons-learned, they are trying to add missing features
capable of supporting LSST needs. Further, to stay current
cale data management and analysis, we continue a dialog with
both DBMS and Map/Reduce, as well as with data-intensive
tific, through the XLDB conference and workshop series we
To understand query complexity and expected access patterns,
Science Collaborations and the LSST Science Council to understand
and query complexity. We have compiled a set of common queries [5] and distilled this set
a smaller set of representative queries we use for various
each major query type, ranging from trivial low volume, to [2]. We have
also talked to scientists and database developers from other
SDSS, 2MASS, Gaia, DES, LOFAR and Pan-STARRS.
To deal with unpredictability of analysis, we will use shared
will have access to all the data, all the columns, even these
dictable cost – with shared scans increasing complexity
I/O needs, it only increases the CPU needs.
To keep query load under control, we will employ throttling
[1] [DMTR-12], Becla, J., 2013, Qserv 300 node test, DMTR-12, URL https://ls.st/DMTR-
[2] Becla, J., 2013, Queries Used for Scalability & Performance
https://dev.lsstcorp.org/trac/wiki/db/queries/ForPerfTest
[3] [DMTR-13], Becla, J., 2015, Qserv Summer 15 Large Scale Tests, DMTR-13, URL https://ls.st/DMTR-13
[4] [LDM-555], Becla, J., 2017, Data Management Database Requirements, LDM-555, URL
[5] Becla, J., Lim, K.T., 2013, Common Queries, URL https://dev.lsstcorp.org/trac/wiki/db/queries
[6] [LDM-141], Becla, J., Lim, K.T., 2013, Data Management Storage Sizing and, LDM-141, URL https://ls.st/LDM-141
[7] Becla, J., Lim, K.T., Monkewitz, S., Nieto-Santisteban,
Bunclark, P.S., Lewis, J.R. (eds.) Astronomical Data
vol. 394 of Astronomical Society of the Pacific Conference ADS Link
[8] [DMTN-048], Becla, J., Lim, K.T., Wang, D., 2011, Qserv design prototyping experiments, DMTN-048, URL https://dmtn-048.lsst.io
LSST Data Management Technical Note
[9] [DMTN-046], Becla, J., Lim, K.T., Wang, D., 2013, An investigation of database, DMTN-046, URL https://dmtn-046.lsst.io
LSST Data Management Technical Note
[10] [DMTR-21], Becla, J., Lim, K.T., Wang, D., 2013, Early (pre-2013) Large-Scale, DMTR-21, URL https://ls.st/DMTR-21
[11] Dorigo, A., Elmer, P., Furano, F., Hanushevsky, A., 2005,
ers, 4, 348, URL http://xrootd.org/presentations/xpaper3_cut_journal.pdf
[12] [LSE-163], Jurić, M., et al., 2017, LSST Data Products Definition Document, LSE-163, URL https://ls.st/LSE-163
[13] [Document-7025], Kantor, J., Krabbendam, V., 2011, DM Risk Register, Document-7025, URL https://ls.st/Document-7025
[14] [LDM-552], Mueller, F., 2017, Qserv Software Test Specification, LDM-552, URL https://ls.st/LDM-552
[15] Nobari, S., Tauheed, F., Heinis, T., et al., 2013, In:
MOD International Conference on Management of Data,
New York, NY, USA, doi:10.1145/2463676.2463700
[16] [DMTR-16], Thukral, V., 2017, Qserv Fall 16 Large Scale Tests/KPMs, DMTR-16, URL https://ls.st/DMTR-16
[17] [DMTN-047], Tommaney, J., Becla, J., Lim, K.T., Wang, D., 2011, Tests with InfiniDB, DMTN-047, URL https://dmtn-047.lsst.io
LSST Data Management Technical Note