
delays
bounded network delays, 285
bounded process pauses, 298
unbounded network delays, 282
unbounded process pauses, 296
deleting data, 463
denormalization (data representation), 34, 554
costs, 39
in derived data systems, 386
materialized views, 101
updating derived data, 228, 231, 490
versus normalization, 462
derived data, 386, 439, 554
from change data capture, 454
in event sourcing, 458-458
maintaining derived state through logs,
452-457, 459-463
observing, by subscribing to streams, 512
outputs of batch and stream processing, 495
through application code, 505
versus distributed transactions, 492
deterministic operations, 255, 274, 554
accidental nondeterminism, 423
and fault tolerance, 423, 426
and idempotence, 478, 492
computing derived data, 495, 526, 531
in state machine replication, 349, 452, 458
joins, 476
DevOps, 394
differential dataflow, 504
dimension tables, 94
dimensional modeling (see star schemas)
directed acyclic graphs (DAGs), 424
dirty reads (transaction isolation), 234
dirty writes (transaction isolation), 235
discrimination, 534
disks (see hard disks)
distributed actor frameworks, 138
distributed filesystems, 398-399
decoupling from query engines, 417
indiscriminately dumping data into, 415
use by MapReduce, 402
distributed systems, 273-312, 554
Byzantine faults, 304-306
cloud versus supercomputing, 275
detecting network faults, 280
faults and partial failures, 274-277
formalization of consensus, 365
impossibility results, 338, 353
issues with failover, 157
limitations of distributed transactions, 363
multi-datacenter, 169, 335
network problems, 277-286
quorums, relying on, 301
reasons for using, 145, 151
synchronized clocks, relying on, 291-295
system models, 306-310
use of clocks and time, 287
distributed transactions (see transactions)
Django (web framework), 232
DNS (Domain Name System), 216, 372
Docker (container manager), 506
document data model, 30-42
comparison to relational model, 38-42
document references, 38, 403
document-oriented databases, 31
many-to-many relationships and joins, 36
multi-object transactions, need for, 231
versus relational model
convergence of models, 41
data locality, 41
document-partitioned indexes, 206, 217, 411
domain-driven design (DDD), 457
DRBD (Distributed Replicated Block Device),
153
drift (clocks), 289
Drill (query engine), 93
Druid (database), 461
Dryad (dataflow engine), 421
dual writes, problems with, 452, 507
duplicates, suppression of, 517
(see also idempotence)
using a unique ID, 518, 522
durability (transactions), 226, 554
duration (time), 287
measurement with monotonic clocks, 288
dynamic partitioning, 212
dynamically typed languages
analogy to schema-on-read, 40
code generation and, 127
Dynamo-style databases (see leaderless replica‐
tion)
E
edges (in graphs), 49, 403
property graph model, 50
edit distance (full-text search), 88
effectively-once semantics, 476, 516
Index | 567