Andrew Noyes
d7612a4426
Fix OPEN_FOR_IDE build errors
2019-04-05 16:30:42 -07:00
mpilman
4287b1d2a1
resolved minor merge issues
2019-04-05 13:12:19 -07:00
mpilman
c008e16c81
Defer formatting in traces to make them cheaper
...
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:
- TraceEvent::detail now takes a C string instead of a std::string for
literal keys. This prevents unnecessary allocations if the trace is not
going to be printed in the first place (for example for SevDebug).
Previously, `detail` expected a `std::string` as key, which meant that
any string literal was copied on each call.
- New templates Traceable and SpecialTraceMetricType. These templates can be
specialized for any type that needs to be printed. The actual
formatting is deferred until after the `enabled` check. This
provides two benefits: (1) if a TraceEvent is disabled, we don't pay
for the formatting, and (2) TraceEvent can trace types that it doesn't
know about.
- TraceEvent::enabled is set in the constructor if the Severity is
passed. This makes sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` is inlined, so for disabled TraceEvent
calls, a call to detail only introduces an if-branch, which is much
cheaper than a function call.
2019-04-05 13:12:19 -07:00
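The deferred-formatting idea above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not FoundationDB's actual `TraceEvent`: `SketchTraceEvent`, `kMinLoggedSeverity`, the `Severity` values, and the `formatCount` counter are hypothetical, and only the `Traceable` name is taken from the commit message.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical trait: specialize Traceable<T> for every type that may be traced.
template <class T>
struct Traceable : std::false_type {};

template <>
struct Traceable<int> : std::true_type {
	static std::string toString(int v) { return std::to_string(v); }
};

template <>
struct Traceable<std::string> : std::true_type {
	static std::string toString(const std::string& v) { return v; }
};

enum Severity { SevDebug = 5, SevInfo = 10, SevWarn = 20 };
constexpr Severity kMinLoggedSeverity = SevInfo;  // assumption: fixed threshold

static int formatCount = 0;  // counts formatting work, to show when it is skipped

class SketchTraceEvent {
public:
	// `enabled` is decided once, from the severity, in the constructor.
	SketchTraceEvent(Severity sev, const char* type) : enabled(sev >= kMinLoggedSeverity), type(type) {}

	// Defined in the class body, so it is implicitly inline and a disabled
	// event can reduce to one branch. The key is a C string, so no
	// std::string is built unless the event is enabled.
	template <class T>
	SketchTraceEvent& detail(const char* key, const T& value) {
		static_assert(Traceable<T>::value, "specialize Traceable<T> to trace this type");
		if (enabled) {
			++formatCount;
			fields[key] = Traceable<T>::toString(value);
		}
		return *this;
	}

	bool isEnabled() const { return enabled; }
	const char* eventType() const { return type; }
	std::size_t fieldCount() const { return fields.size(); }
	std::string field(const std::string& key) const {
		auto it = fields.find(key);
		return it == fields.end() ? std::string() : it->second;
	}

private:
	bool enabled;
	const char* type;
	std::map<std::string, std::string> fields;
};
```

Because `detail` checks `enabled` before calling `Traceable<T>::toString`, an event below the threshold does no formatting at all, and tracing a new type only requires a new `Traceable` specialization.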
Markus Pilman
101a05ae77
Merge branch 'master' into features/client-simulator
2019-04-03 10:03:56 -08:00
Evan Tschannen
39c595223b
Merge branch 'release-6.1'
2019-04-02 22:30:02 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected as failed
2019-04-02 14:15:51 -07:00
mpilman
371a41dbba
Allow classPath to be modified at runtime
2019-04-02 11:56:40 -07:00
mpilman
e19901186f
Fixed buggy register preparation for natives
2019-04-02 11:56:03 -07:00
mpilman
b148981bba
Fixed compilation issues with char*
2019-04-01 14:29:45 -07:00
mpilman
e23e63c6ac
Implemented JavaWorkload
...
This change allows a user to write a workload in Java.
The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.
If the workload is executed within the simulator, the resulting
test will no longer be deterministic, as it will execute in a
different thread (and even without that, it is not clear whether
we could get determinism, as the JVM does many things that are
not deterministic).
This is intended to improve testing of the Java client, and
layer authors can use the simulator to test their layers on a single
machine while still simulating failing machines, etc.
2019-03-31 17:57:43 -07:00
Evan Tschannen
d882c060bf
Merge commit '5dd6396eed0de0dfea6cf9eecc307995eff5cedc'
2019-03-28 18:00:55 -07:00
Evan Tschannen
b6008558d3
renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
...
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen
836bb95a7a
Merge pull request #1372 from etschannen/master
...
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
A.J. Beamon
71e2fdafb8
Changes to ratekeeper camel case
2019-03-27 08:24:25 -07:00
Jingyu Zhou
7c02ee6fdd
Fix compiler warning about unreferenced exception variable
2019-03-26 13:43:47 -07:00
Jingyu Zhou
10988f89d9
Code refactoring for ConsistencyCheck.actor.cpp
2019-03-23 11:06:43 -07:00
Evan Tschannen
36ab852bb1
Merge branch 'master' into ratekeeper
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen
f9aad46573
made use_provisional_proxies a transaction option
2019-03-19 18:44:37 -07:00
Evan Tschannen
eb54a700ba
changed the old memory configuration to memory-1
2019-03-18 15:10:04 -07:00
Jingyu Zhou
254c78053c
Fix a segfault error
...
After wait, ServerDBInfo may have changed. Using the old copy is wrong.
2019-03-15 22:11:13 -07:00
Jingyu Zhou
12ddd56698
Fix Ratekeeper and DataDistributor placement
...
Make sure both RateKeeper and DataDistributor are placed in the same data
center as the Master. Make sure only one RateKeeper is live in the cluster as
well.
2019-03-15 17:09:28 -07:00
Jingyu Zhou
bb5686eb75
Fix monitoring of DD and RK
2019-03-15 16:02:17 -07:00
Jingyu Zhou
9f6fe5f649
Merge remote-tracking branch 'apple/master' into ratekeeper
2019-03-15 11:30:04 -07:00
Jingyu Zhou
40860e0093
Attempt to fix.
2019-03-15 11:29:04 -07:00
Jingyu Zhou
9e59c9c253
Check DataDistributor and RateKeeper fitness
...
Fail the test if they are not placed on processes with the best fitness.
2019-03-14 16:14:57 -07:00
Steve Atherton
dbacfcbc82
Merge branch 'master' into feature-backup-json
2019-03-13 13:30:45 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Steve Atherton
8aab719c22
Merge branch 'master' into feature-backup-json
2019-03-12 18:23:16 -07:00
Stephen Atherton
f0eae0295f
Merge branch 'master' of https://github.com/apple/foundationdb into feature-backup-json
2019-03-12 03:35:03 -07:00
Stephen Atherton
e9b8bf601e
Added backup status JSON output to backup workload to get sim coverage.
2019-03-12 03:34:38 -07:00
Evan Tschannen
2627bcd35e
Merge branch 'master' into feature-metadata-version
2019-03-10 21:13:28 -07:00
Evan Tschannen
044b6b4f8a
Merge branch 'master' into feature-degraded-tlog
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen
710a64dc4e
replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails
2019-03-08 11:25:07 -05:00
Balachandar Namasivayam
f3391ea413
Merge pull request #1240 from satherton/feature-restore-by-timestamp
...
Restore by timestamp
2019-03-06 16:21:06 -08:00
Stephen Atherton
7778112f6a
Bug fix, restore was using the destination cluster to look up timestamps when printing the backup description instead of (optionally) the original cluster which generated the backup. Made missing cluster file errors more clear.
2019-03-06 02:45:55 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Evan Tschannen
82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
...
IPv6 Support
2019-03-05 17:14:16 -08:00
Steve Atherton
21f55e1878
Merge pull request #1190 from bnamasivayam/restore-multiple-ranges
...
Add support for restoring multiple ranges.
2019-03-05 10:15:55 -08:00
Evan Tschannen
f1897f3eb6
Merge branch 'master' into feature-metadata-version
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
2019-03-04 21:06:16 -08:00
Evan Tschannen
69d7633d5b
Merge pull request #1217 from alexmiller-apple/tstlog-goodref
...
Spill-By-Reference TLog Part 4: Actually Usable Reference Spilling
2019-03-04 20:58:24 -08:00
Trevor Clinkenbeard
89cbb77b4e
Merge branch 'master' of https://github.com/apple/foundationdb into lazily-fetch-health-metrics
2019-03-04 14:17:58 -08:00
Trevor Clinkenbeard
56ae46f89e
Client lazily fetches health metrics from proxies
2019-03-04 14:16:39 -08:00
Vishesh Yadav
cc9ad0e202
net: Use IPv6 in simulation testing #963
...
25% of the time, simulation uses IPv6 addresses.
2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d
net: Support IPv6 #963
...
- NetworkAddress now contains an IPAddress object, which can be either
an IPv4 or an IPv6 address. 128 bits are used even for IPv4 addresses;
however, only 32 bits are used when using/serializing an IPv4 address.
- ConnectPacket is updated to store an IPv6 address. It is backward compatible
with the old format, since the first 32 bits of the IP address field are used
for serialization of IPv4.
- Mainly updates the rest of the code to use the IPAddress structure instead
of a plain uint32_t.
- IPv6 address/port pairs should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
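A minimal sketch of the `[ip]:port` convention described above, assuming addresses are kept in textual form; `SketchAddress` and `parseAddress` are hypothetical names and do not reflect how FoundationDB's `NetworkAddress` or `IPAddress` store the 128-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical address type; the IP is kept as text for simplicity.
struct SketchAddress {
	std::string ip;  // e.g. "10.0.0.1" or "::1"
	std::uint16_t port = 0;

	bool isV6() const { return ip.find(':') != std::string::npos; }

	// IPv6 literals contain ':', so they are bracketed to keep the port unambiguous.
	std::string toString() const {
		std::string p = std::to_string(port);
		return isV6() ? "[" + ip + "]:" + p : ip + ":" + p;
	}
};

// Parses "a.b.c.d:port" or "[ipv6]:port"; returns std::nullopt on malformed input.
std::optional<SketchAddress> parseAddress(const std::string& s) {
	std::string host, portStr;
	if (!s.empty() && s[0] == '[') {
		std::size_t close = s.find(']');
		if (close == std::string::npos || close + 1 >= s.size() || s[close + 1] != ':') return std::nullopt;
		host = s.substr(1, close - 1);
		portStr = s.substr(close + 2);
	} else {
		std::size_t colon = s.find(':');
		// More than one ':' means an unbracketed IPv6 literal, which is rejected.
		if (colon == std::string::npos || s.find(':', colon + 1) != std::string::npos) return std::nullopt;
		host = s.substr(0, colon);
		portStr = s.substr(colon + 1);
	}
	if (host.empty() || portStr.empty() || portStr.size() > 5) return std::nullopt;
	unsigned long port = 0;
	for (char c : portStr) {
		if (c < '0' || c > '9') return std::nullopt;
		port = port * 10 + static_cast<unsigned long>(c - '0');
	}
	if (port > 65535) return std::nullopt;
	return SketchAddress{host, static_cast<std::uint16_t>(port)};
}
```

Requiring brackets for IPv6 keeps the port unambiguous, since an IPv6 literal such as `::1` itself contains colons.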
Alex Miller
fb4cb8c3a8
Print out configuration changes in ConfigureTest.
2019-03-04 01:42:38 -08:00
Alex Miller
4d4e0a1d54
Fix the build on -O0.
...
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:42:38 -08:00
Alex Miller
db546af4a3
Fix the build on -O0.
...
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:38:58 -08:00
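The pre-C++17 rule mentioned in the two commits above can be illustrated with a small sketch (`Limits`, `kMaxBatch`, and `makeBatchSizes` are hypothetical names). Binding a `static constexpr` member to a `const&` odr-uses it, and before C++17 that requires an out-of-line definition. Optimized builds often fold the constant away, which is why a missing definition typically shows up only as a link error at -O0.

```cpp
#include <vector>

struct Limits {
	static constexpr int kMaxBatch = 16;  // declaration with an initializer
};

// Out-of-line definition. Required before C++17 whenever kMaxBatch is
// odr-used; since C++17 the member is implicitly inline and this line is
// redundant (deprecated, but still accepted).
constexpr int Limits::kMaxBatch;

std::vector<int> makeBatchSizes() {
	std::vector<int> sizes;
	// push_back takes `const int&`, so binding kMaxBatch to it odr-uses the member.
	sizes.push_back(Limits::kMaxBatch);
	return sizes;
}
```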
Evan Tschannen
075fdef31a
Merge branch 'master' into feature-metadata-version
...
# Conflicts:
# fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Evan Tschannen
057ebe56e4
fix: unknownCommit handling relied on soleOwnership of the version stamp keys, so we need to use a second key to track the commit version for the metadataVersionKey
...
renamed a confusing option
2019-03-03 21:31:40 -08:00
Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
Evan Tschannen
c1de93f467
fix: binary search could get stuck in an infinite loop
...
fix: avoid picking a read version which could be before the last real commit
fix: we must wait on metadataVersionKey in case it is not already cached
fixed review comments
2019-03-02 13:55:41 -08:00
Balachandar Namasivayam
2a47fbb5a2
Fix tab spaces again
2019-03-01 15:30:40 -08:00
Balachandar Namasivayam
4324434b99
Fix tab spaces
2019-03-01 15:29:09 -08:00
Balachandar Namasivayam
74f64e4570
Fix FuzzApiCorrectness failure.
2019-03-01 15:25:37 -08:00
Evan Tschannen
2168b14834
Merge branch 'master' into feature-metadata-version
2019-02-28 17:45:55 -08:00
Evan Tschannen
3da85f3acd
implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid
2019-02-28 17:45:00 -08:00
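As a hedged sketch of how a layer might use such a key: the layer reads the key's value in each transaction and reloads its cached metadata only when that value differs from the one it cached against. `MetadataCache` and the loader callback are hypothetical; reading the key from the database and how its value gets bumped are not shown.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical layer-side cache that is invalidated whenever the value of the
// metadata version key differs from the one the cache was filled against.
class MetadataCache {
public:
	explicit MetadataCache(std::function<std::string()> loadMetadata) : load(std::move(loadMetadata)) {}

	// `currentVersion` is assumed to be the metadata version value read in the
	// current transaction; the expensive load runs only when it has changed.
	const std::string& get(const std::string& currentVersion) {
		if (!valid || currentVersion != cachedVersion) {
			cached = load();
			cachedVersion = currentVersion;
			valid = true;
			++reloads;
		}
		return cached;
	}

	int reloadCount() const { return reloads; }

private:
	std::function<std::string()> load;
	std::string cachedVersion;
	std::string cached;
	bool valid = false;
	int reloads = 0;
};
```

The point of the design is that checking one small key per transaction is much cheaper than re-reading all of a layer's metadata.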
A.J. Beamon
e2bcecb08f
Merge branch 'master' into ratekeeper-batch-priority-limits
2019-02-28 12:52:44 -08:00
A.J. Beamon
759c53a333
Add some very basic exercising of priority batch to simulation
2019-02-28 12:18:45 -08:00
A.J. Beamon
eb629d87a5
Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately.
2019-02-28 09:53:16 -08:00
A.J. Beamon
af69ba035a
Merge pull request #1200 from bnamasivayam/master
...
User provided Client Transaction id merging
2019-02-28 11:16:44 -05:00
Trevor Clinkenbeard
d2bde4e55b
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-27 16:30:25 -08:00
Trevor Clinkenbeard
3f59f82670
Calculate durabilityLag instead of NDV for health metrics
2019-02-27 16:30:01 -08:00
Evan Tschannen
fafb66b0a8
Merge pull request #1126 from bnamasivayam/ratelimit-consistencycheck
...
Dynamically rate limit consistency check.
2019-02-27 14:43:09 -08:00
A.J. Beamon
bccc065fdd
Update fdbserver/workloads/ClientTransactionProfileCorrectness.actor.cpp
...
Co-Authored-By: bnamasivayam <36455962+bnamasivayam@users.noreply.github.com>
2019-02-27 14:24:16 -08:00
Balachandar Namasivayam
cc0aac588a
Merge branch 'client-transaction-id'
...
* client-transaction-id:
Add user provided transaction id as part of the format in ClientTransactionProfileCorrectness workload.
Addressed review comments.
Apply suggestions from code review
Add support for client provided transaction identifier to be logged as part of trace logging or transaction profiling.
2019-02-27 13:11:15 -08:00
Balachandar Namasivayam
00acef6d4a
Add user provided transaction id as part of the format in ClientTransactionProfileCorrectness workload.
2019-02-27 13:10:49 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Evan Tschannen
8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
...
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Balachandar Namasivayam
ab99497695
Addressed review comments.
2019-02-25 18:29:30 -08:00
Balachandar Namasivayam
7eba50b086
Add support for restoring multiple ranges.
2019-02-25 18:00:28 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
...
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Alex Miller
6d23eb2d1a
Implement log_version.
...
This mega-commit introduces a new configuration setting, `log_version`,
that controls the TLog implementations and features that are available
within FDB, so that users can opt in to new features if they're willing
to sacrifice backwards compatibility.
2019-02-22 12:15:23 -08:00
Trevor Clinkenbeard
0d7f26beb1
Removed unnecessary code from Throttling.actor.cpp
2019-02-21 16:20:10 -08:00
Trevor Clinkenbeard
fb925f8ca6
Improved Throttling workload
...
Test now fails if client health metrics stop updating. Added SevError
trace lines for different failure cases. Also fixed bug so that
(detailedWorstDiskUsage == 0) causes test failure when detailed health
metrics are sent.
2019-02-21 15:50:17 -08:00
Trevor Clinkenbeard
fa96b8dd33
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-20 16:56:16 -08:00
Evan Tschannen
27e3617548
fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue
2019-02-20 14:18:36 -08:00
Trevor Clinkenbeard
a20f5482bc
Created StorageStats struct to combine health metrics for storage servers
2019-02-20 11:57:41 -08:00
Trevor Clinkenbeard
1bb08b6e14
Minor bug fix in Throttling.actor.cpp
2019-02-20 11:46:24 -08:00
Alex Miller
7b1afdc71e
Hacky plumbing of spill type and file renaming.
2019-02-19 22:18:10 -08:00
Alex Miller
cd00b749c8
Add changing log_engine to ConfigureTest.
2019-02-19 22:10:46 -08:00
mpilman
f79a9594c1
Several bugfixes to make fdb build on non-ide
2019-02-19 15:16:59 -08:00
mpilman
999ea09bfd
Use correct fwd decls in TesterInterface
...
Also TesterInterface.h -> TesterInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
699216f713
Use fwd decls in workloads
...
Also workloads.h -> workloads.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
27a3153719
Use ACTOR forward declarations in MoveKeys
...
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9
Fix minor IDE build errors
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
mpilman
78dd80ea8a
Proper fwd decl in BackupAgent
...
Also BackupAgent.h -> BackupAgent.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3cb2391b58
use proper fwd declarations in ManagementAPI
...
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
8ed89fd711
fixed review comments
2019-02-19 11:26:53 -08:00
Meng Xu
b35631365f
TeamRemover: Solve conflict when merging with PR 1061
...
The previous commit merged master, which had just merged
pull request #1062 from jzhou77/PR that adds a new DataDistribution role.
The merge caused conflicts and errors in simulation tests.
This commit resolves the code conflicts and
tries to fix the new errors after incorporating the new DataDistribution role.
2019-02-19 08:13:10 -08:00
Evan Tschannen
ed9e20ce17
forgot to fix merge conflicts
2019-02-18 17:09:55 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
62603d11a1
updated the killRegion simulation test to test a much larger variety of failure scenarios
2019-02-18 15:32:51 -08:00
Vishesh Yadav
e05b53d755
Merge remote-tracking branch 'apple/master' into task/tls-upgrade
2019-02-15 20:37:07 -08:00
Meng Xu
6d09ac483c
Merge with master
2019-02-15 17:03:40 -08:00
Jingyu Zhou
bf6da81bf9
Remove recovery version from data distribution queue
...
This parameter is no longer used/needed.
2019-02-14 16:37:16 -08:00
Vishesh Yadav
907446d0ce
Merge remote-tracking branch 'apple/master' into task/tls-upgrade
2019-02-14 11:37:38 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
Meng Xu
5481851e82
TeamCollection: Add knobs for team remover
...
Added three knobs to control the team remover:
bool TR_FLAG_DISABLE_TEAM_REMOVER:
Disable the teamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY:
Wait for the specified time before trying to remove the next machine team
double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY:
Wait before checking if all machines are healthy
2019-02-13 15:11:56 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
Meng Xu
3ae8767ee8
TeamCollection: Apply clang-format
2019-02-12 13:41:18 -08:00
Balachandar Namasivayam
f44f26c232
Dynamically rate limit consistency check.
2019-02-07 16:08:39 -08:00
A.J. Beamon
eb7c678e59
Return Void() in an actor return statement
2019-02-07 14:03:36 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
Evan Tschannen
7e0e0a7673
Merge pull request #1105 from vishesh/task/issue-218-compare-and-clear
...
Implements CompareAndClear AtomicOp
2019-02-05 18:11:28 -08:00
Meng Xu
2b73c89e98
TeamCollection: Test the number of teams
...
Call the traceTeamCollectionInfo function to record the team numbers
when we add a team directly from the shard information, instead of
using addTeamsBestOf logic.
2019-02-05 15:58:16 -08:00
Meng Xu
f5171d1b57
TeamCollection: Test the number of teams
...
The current simulator does not validate whether the number of teams in
the system is larger than the maximum desired number of teams.
This validation should be added because we do NOT want too many teams
in the system, which may impede the system's availability when
multiple fault zones (e.g., machines) crash at the same time.
This commit adds the test to the consistency check in simulation.
Since the current code does not handle the upgrade situation
when we enforce the machine teams, the test is expected to fail.
A later commit will handle the upgrade situation by gracefully
removing the surplus teams.
2019-02-04 18:14:36 -08:00
Vishesh Yadav
c532d5c277
Implements CompareAndClear AtomicOp
...
Adds a CompareAndClear mutation. If the given parameter is equal to the
current value of the key, the key is cleared. On the client, the mutation
is added to the operation stack. Hence, if the mutation evaluates to a
clear, we only learn so when `read()` evaluates the stack in
`RYWIterator::kv()`, which is unlike what we currently do for a typical
ClearRange.
2019-02-04 14:59:56 -08:00
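A minimal model of the CompareAndClear semantics described above, assuming a key's value is represented as `std::optional<std::string>` (empty means the key is absent). `applyCompareAndClear` and `PendingOps` are hypothetical; `PendingOps::read` only mimics the idea of evaluating a stack of pending operations lazily at read time, not the actual `RYWIterator` code.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a key's value, where std::nullopt means "no such key".
using Value = std::optional<std::string>;

// CompareAndClear: clear the key if `operand` equals its current value,
// otherwise leave the value unchanged.
Value applyCompareAndClear(const Value& existing, const std::string& operand) {
	if (existing && *existing == operand) return std::nullopt;
	return existing;
}

// Pending CompareAndClear operands for one key, evaluated in order only when
// the key is read (mimicking an operation stack evaluated at read time).
struct PendingOps {
	std::vector<std::string> compareAndClearOperands;

	Value read(Value base) const {
		for (const std::string& operand : compareAndClearOperands) base = applyCompareAndClear(base, operand);
		return base;
	}
};
```

Since the outcome depends on the value underneath, whether the key ends up cleared is only known once `read` runs, which is the behavior the commit message contrasts with an ordinary ClearRange.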
Trevor Clinkenbeard
a09afe5906
Added Throttling workload to test native health metrics API
2019-02-04 13:04:25 -08:00
Evan Tschannen
e9ddd94e27
The failure monitor is given a list of all IP addresses associated with a process
...
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Evan Tschannen
1d7fec3074
Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
...
# Conflicts:
# .gitignore
2019-01-24 17:43:06 -08:00
Evan Tschannen
699f8dd617
fix: coordinators auto could put two coordinators in the same zone
...
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
Evan Tschannen
4eb11d74af
Merge pull request #1029 from bnamasivayam/reenable-check_desired_classes
...
Re-enable CheckDesiredClasses after making necessary changes for mult…
2019-01-11 17:15:05 -08:00
A.J. Beamon
d4d5740282
* Add Optional.map and ErrorOr.map.
...
* Rename Optional/ErrorOr cast_to to castTo.
* Make printable(Optional<T>) templated rather than restricted to StringRef types.
* Fixes bug in (unused) ErrorOr.castTo where an ErrorOr that was not set would lose its error.
2019-01-11 09:03:38 -08:00
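A sketch of two of the behaviors named above, written against `std::optional` and a hypothetical `SketchErrorOr` rather than flow's own `Optional` and `ErrorOr`: `mapOptional` applies a function only when a value is present, and `castTo` carries the error across instead of dropping it.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

// Sketch of Optional.map on std::optional: apply f when a value is present,
// otherwise stay empty.
template <class T, class F>
auto mapOptional(const std::optional<T>& o, F&& f) -> std::optional<std::decay_t<decltype(f(*o))>> {
	if (!o) return std::nullopt;
	return std::forward<F>(f)(*o);
}

// Hypothetical minimal ErrorOr: holds either a value or an error code.
template <class T>
class SketchErrorOr {
public:
	static SketchErrorOr value(T v) {
		SketchErrorOr r;
		r.val = std::move(v);
		return r;
	}
	static SketchErrorOr error(int code) {
		SketchErrorOr r;
		r.err = code;
		return r;
	}

	bool present() const { return val.has_value(); }
	int getError() const { return err; }
	const T& get() const { return *val; }

	// castTo preserves the error when no value is present; losing it in that
	// case is the bug the commit message describes.
	template <class U>
	SketchErrorOr<U> castTo() const {
		if (!present()) return SketchErrorOr<U>::error(err);
		return SketchErrorOr<U>::value(static_cast<U>(*val));
	}

private:
	std::optional<T> val;
	int err = 0;
};
```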
Balachandar Namasivayam
a8e2e75cd5
Re-enable CheckDesiredClasses after making necessary changes for multi-region setup.
...
Fixed a couple of bugs
1) A rare race condition where a worker was still being assigned roles after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only the worst fitness is compared to see if a better fit is available.
2019-01-10 10:28:32 -08:00
Evan Tschannen
684a22a52b
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BackupContainer.actor.cpp
# fdbclient/HTTP.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/BackupCorrectness.actor.cpp
# versions.target
2019-01-09 16:14:46 -08:00
Stephen Atherton
604ad062d5
Updated backup correctness test to new behavior. WaitBackup() can now return the UID and BackupContainer atomically with the status code for a backup tag.
2019-01-08 18:12:15 -08:00
anoyes
6a4d87802b
Replace & operator with variadic function
2018-12-28 11:33:42 -08:00
Vishesh Yadav
3eb9b23024
Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
...
- This patch makes FDB listen to multiple addresses given via the
command line. Although we still use the first address in most places,
this patch starts using vector<NetworkAddress> in Endpoint in some basic
places.
- When sending packets to an endpoint, a random network address is picked
from the endpoint's addresses.
- Renames Endpoint::address to Endpoint::addresses since it
now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
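Picking a random address per send, as described above, could look like the following sketch. `SketchEndpoint` and `pickAddress` are hypothetical, addresses are plain strings for simplicity, and the explicit empty check addresses the unsafe first-element access noted in the related commit below.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an endpoint that listens on several addresses.
struct SketchEndpoint {
	std::vector<std::string> addresses;  // e.g. "10.0.0.1:4500", "10.0.0.1:4501"

	// Picks a uniformly random address for the next packet. Unlike reading
	// addresses[0] blindly, it rejects an empty address list.
	const std::string& pickAddress(std::mt19937& rng) const {
		if (addresses.empty()) throw std::logic_error("endpoint has no addresses");
		std::uniform_int_distribution<std::size_t> dist(0, addresses.size() - 1);
		return addresses[dist(rng)];
	}
};
```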
Vishesh Yadav
43e5a46f9b
Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
...
Extend the `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint, instead of one IP:PORT we'll
have multiple IP:PORT pairs.
This patch simply adds the field and makes the changes needed to compile the
codebase. The first element of the `address` field is used everywhere,
so the way we talk to endpoints remains the same with this patch.
NOTE:
Directly accessing the first member of Endpoint::address is unsafe,
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are replacing all those
unsafe accesses with ones that consider the whole vector anyway, this
patch does not make those accesses safe.
2018-12-13 13:36:52 -08:00
Meng Xu
8de031f9a6
TeamCollection: clang-format
...
Format the changes with git clang-format.
No functional changes.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 11:18:26 -08:00
Meng Xu
f7a7e069f0
TeamCollection: Remove unnecessary comments
...
Pass 41806 tests with no failure
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:56:35 -08:00
Meng Xu
5051b35c61
TeamCollection: Use machine team to create server team
...
Current server team collection logic does not consider
the fact that multiple storage servers can run on the same machine.
When multiple machines fail, all servers on those machines fail, and
the probability of having one process team fail and lose data is very high.
To reduce the probability of losing data when multiple machines fail,
we first create machine teams which span different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
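The two-step team construction described above can be sketched as follows, assuming machine teams have already been formed across distinct fault zones. `buildServerTeam`, `buildServerTeamFrom`, and the map from machines to servers are hypothetical; FoundationDB's team collection applies further selection logic that is not shown here.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

using MachineId = std::string;
using ServerId = std::string;

// Picks one server on each machine of `machineTeam`. Because the machines of a
// machine team sit in distinct fault zones, so do the resulting servers.
std::vector<ServerId> buildServerTeam(const std::vector<MachineId>& machineTeam,
                                      const std::map<MachineId, std::vector<ServerId>>& serversOnMachine,
                                      std::mt19937& rng) {
	std::vector<ServerId> team;
	for (const MachineId& machine : machineTeam) {
		const std::vector<ServerId>& servers = serversOnMachine.at(machine);
		if (servers.empty()) return {};  // no server on this machine: cannot build the team
		std::uniform_int_distribution<std::size_t> pick(0, servers.size() - 1);
		team.push_back(servers[pick(rng)]);
	}
	return team;
}

// Step 1: pick one machine team; step 2: pick one server per machine in it.
std::vector<ServerId> buildServerTeamFrom(const std::vector<std::vector<MachineId>>& machineTeams,
                                          const std::map<MachineId, std::vector<ServerId>>& serversOnMachine,
                                          std::mt19937& rng) {
	if (machineTeams.empty()) return {};
	std::uniform_int_distribution<std::size_t> pick(0, machineTeams.size() - 1);
	return buildServerTeam(machineTeams[pick(rng)], serversOnMachine, rng);
}
```

With this construction, no two members of a server team can share a machine, so a single machine failure removes at most one replica from any team.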
Evan Tschannen
4e54690005
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen
6353a6724b
strengthened the protections related to changing regions
2018-11-12 17:40:40 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
a654183f63
Merge pull request #791 from ajbeamon/remove-cluster-from-iclientapi
...
Remove cluster from IClientApi (phase 2 of removing DB names)
2018-11-10 10:16:18 -08:00
Evan Tschannen
6874e379fc
fix: set the simulator’s view of usable regions to one during configure tests which can disable usable regions
2018-11-09 10:06:03 -08:00
Evan Tschannen
bd60027544
test region priority changes
2018-11-04 20:11:23 -08:00
Evan Tschannen
c02690471d
added protection against configuration changes which cannot be immediately reverted
...
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Stephen Atherton
22f8a4efa9
Normalized all unit test names to begin with "/" if they should be included in random unit testing.
2018-10-05 22:09:58 -07:00
Stephen Atherton
3ea9193fa7
Renamed redwood to redwood-experimental. UnitTest names can now be hidden using # as the first character so that random correctness tests will not run them. Excluded redwood tests from correctness testing. Reverted default storage engine to ssd.
2018-10-05 14:43:54 -07:00
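The naming rules from these two commits can be summarized as a filter. This is a hypothetical sketch (`eligibleForRandomUnitTesting` and `randomTestCandidates` are not FoundationDB functions): a name beginning with "/" is eligible for random unit testing, and a name beginning with "#" therefore never is.

```cpp
#include <string>
#include <vector>

// Hypothetical rule: only names starting with '/' are picked for random runs;
// a leading '#' (or anything else) keeps a test out of them.
bool eligibleForRandomUnitTesting(const std::string& name) {
	return !name.empty() && name[0] == '/';
}

std::vector<std::string> randomTestCandidates(const std::vector<std::string>& names) {
	std::vector<std::string> out;
	for (const std::string& n : names)
		if (eligibleForRandomUnitTesting(n)) out.push_back(n);
	return out;
}
```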
Stephen Atherton
7c1dc305cb
Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood
2018-10-05 10:15:10 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen
c9f4109539
fix: add some additional time in the kill region workload to detect if we recovered successfully
2018-10-02 17:47:15 -07:00
Evan Tschannen
b560b94ebc
fix: do not force a recovery if the master was already in the other region (and therefore already recovered)
...
fix: reboot the remaining DC, because any storage server rejoins that were rolled back will cause that server to be unusable
2018-09-28 12:10:04 -07:00
A.J. Beamon
c831051474
This removes the idea of clusters from IClientApi.
2018-09-21 15:58:14 -07:00
Evan Tschannen
77e2fb787e
Merge branch 'release-6.0' into feature-fix-forced-recovery
2018-09-21 14:55:37 -07:00
Evan Tschannen
3f86905ea7
fix: restore did not take into account that the end version of a log file does not exist in that file. This resulted in restores done at the same version a snapshot completes to not apply the mutations at that final version.
2018-09-21 11:48:28 -07:00
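The off-by-one described above comes from treating a log file's version range as closed when it is half-open. A minimal sketch, assuming `[beginVersion, endVersion)` semantics; `LogFileRange` and `containsVersion` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

using Version = std::int64_t;

// Hypothetical log file descriptor covering the half-open range
// [beginVersion, endVersion): mutations at endVersion are NOT in this file.
struct LogFileRange {
	Version beginVersion;
	Version endVersion;  // exclusive

	bool contains(Version v) const { return v >= beginVersion && v < endVersion; }
};

// Mutations at `targetVersion` can only be applied if some file actually
// contains that version, i.e. has beginVersion <= targetVersion < endVersion.
bool containsVersion(const std::vector<LogFileRange>& files, Version targetVersion) {
	for (const LogFileRange& f : files)
		if (f.contains(targetVersion)) return true;
	return false;
}
```

A file ending at version 200 does not supply the mutations at 200, so a restore to 200 also needs a file that begins at or before 200 and ends after it.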
Stephen Atherton
2fc86c5ff3
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbrpc/AsyncFileCached.actor.h
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Alec Grieser
10a8e67266
Merge remote-tracking branch 'upstream/release-6.0' into merge-release-6.0
2018-09-11 21:49:59 -07:00
Evan Tschannen
c4eea395d4
fix: schema basic test did not pass in a complete json document
2018-09-10 09:28:53 -07:00
Stephen Atherton
ce3f01a0cf
Added concept of type to JsonString. Appending single items or key/value pairs is now type-safe and only allowed in certain cases. JsonString will refuse to produce invalid JSON. All duplicative functions have been replaced with templates. Encoding of values uses json_spirit's value writer which should be no worse performance than format() and it will escape everything properly. Final string form is now built directly using knowledge of type, such as when an instance becomes an Array or Object the appropriate opening character is written. This avoids a full copy just to prepend the opening character later. Index interface for key/value pairs no longer makes a temporary copy of the key string. JsonString is now only needed by Status.actor.cpp. Still more work to be done here.
2018-09-08 07:15:28 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
Evan Tschannen
dcdbb3ec4d
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-movekey-fixes
2018-09-05 10:29:13 -07:00
Evan Tschannen
40f5dbe423
fixed issues from review, added a safeguard to prevent configuring a cluster to an invalid configuration
2018-09-04 22:16:35 -07:00
Evan Tschannen
21f5cf9ce9
suppress spammy trace events
2018-09-04 17:12:26 -07:00
Evan Tschannen
d8ea3dbf9a
Added the ability to configure a cluster from a JSON file
2018-08-16 17:34:59 -07:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Alex Miller
63b1e85338
Ban `Void _ = wait(...)` constructions, and require just `wait(...)`.
...
There's never any reason to save the value of a Void return, and it's
the easiest source of redefined variable bugs that will creep back in
over time. So just `wait(...)`, it's cleaner that way.
2018-08-14 15:50:26 -07:00
Alex Miller
bca324eaa6
More actorcompiler.h fixes and additions.
2018-08-14 15:50:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
07e5281142
Restrict actor keyword #defines to actor files.
...
This introduces a new rule in our codebase, that any file that #includes
actorcompiler.h needs to do it as the last #include, and it needs to
then #include unactorcompiler.h at the end of the file.
The point of this is that it prevents our actorcompiler.h #defines from
leaking into boost or the c++ standard library. Both of these start
throwing errors if you s/state// their code, which `#define state `
effectively does.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
A.J. Beamon
3535ddad80
Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes
...
Fix bugs uncovered by -D_GLIBCXX_DEBUG
2018-08-09 08:18:51 -07:00
Alex Miller
9c2fdee86f
Simplify boolean logic.
2018-08-08 16:12:48 -07:00
Alex Miller
9ab56e5052
Remove a dead line that was accessing an empty vector.
2018-08-01 19:17:49 -07:00
Alex Miller
f40f33f555
Do not attempt to dereference past-the-end iterators in MemoryKVS.
...
mapIter could have been store.end(), if selector.getKey() was larger than
anything in store, and various conditionals weren't tolerant of this case.
This fixes #652.
2018-08-01 18:42:37 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Stephen Atherton
96389c74cd
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Stephen Atherton
1bc95862b7
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen
2820b6e0bb
data inconsistency is always an error when detected by the consistency check
2018-07-09 22:26:13 -07:00
Evan Tschannen
82cc30be62
added testing for two_satellite_fast and two_satellite_safe
2018-07-09 22:01:46 -07:00
Stephen Atherton
b2fc2b4829
Gave status schema validation trace events more descriptive names. Fixed schema in tests.
2018-07-05 17:05:47 -07:00
Evan Tschannen
cd4fb9285a
waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region
2018-07-05 14:04:42 -07:00
Stephen Atherton
9d85a05372
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-05 12:52:06 -07:00
Evan Tschannen
da5a232d7e
fix: If we have not recruited the remote logs yet and detect a configuration change, we must fail the master to update the remote recruitment request
2018-07-05 12:17:41 -07:00
Evan Tschannen
e17dfea3b6
fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
...
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Stephen Atherton
2925b9b984
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-03 23:03:56 -07:00
Stephen Atherton
09e68a4335
Lots of bug fixes around page reads and concurrency.
2018-07-03 15:39:32 -07:00
Evan Tschannen
604b3bca17
increased the api correctness timeout
2018-07-02 12:51:50 -04:00
Evan Tschannen
89a4b2cd68
fix: consistency check could loop too long
2018-07-02 12:08:02 -04:00
Evan Tschannen
73e61312c6
fix: shareLogRange was never initialized
2018-07-01 22:49:24 -04:00
Stephen Atherton
b95a2bd6c1
Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
...
# Conflicts:
# flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen
4a3247da69
fixed a few problems with the consistency check
2018-06-30 10:39:28 -07:00
Evan Tschannen
02f616eb68
fix: consistency check was broken when the key server key space is sharded
2018-06-28 23:16:32 -07:00
Balachandar Namasivayam
8caa6eaecf
Merge pull request #541 from etschannen/feature-remote-logs
...
More multiple DC improvements
2018-06-28 11:22:08 -07:00
Evan Tschannen
45cf0067e4
fix: consistency check was not checking for data inconsistencies
2018-06-28 11:08:16 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
Stephen Atherton
e5c48d453a
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-18 22:45:27 -07:00
Evan Tschannen
1ccfb3a0f4
fix: log_anti_quorum was always 0 in simulation
...
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton
90c8288c68
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-17 14:55:05 -07:00
Evan Tschannen
99e21c869c
fixed a number of status calculations, and re-enabled the status workload
2018-06-14 17:58:57 -07:00
Stephen Atherton
1eae9d621b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-13 15:58:21 -07:00
Stephen Atherton
2878f30f29
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Richard Low
39894ea798
Merge remote-tracking branch 'apple/release-5.2'
2018-06-12 18:31:20 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon
f965954122
Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
# fdbserver/workloads/DDMetrics.actor.cpp
# flow/Trace.cpp
2018-06-08 16:22:22 -07:00
Evan Tschannen
b9826dc1cb
fix: do not automatically reduce redundancy when we move keys if the database does not have remote replicas. This is to prevent problems when dropping remote replicas from a configuration.
2018-06-08 16:17:27 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
0ca51989bb
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/Status.actor.cpp
# flow/Trace.cpp
2018-06-08 13:24:30 -07:00
Balachandar Namasivayam
20febf5ef9
Address review comments.
2018-06-08 11:24:51 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Balachandar Namasivayam
514b0e3c20
Having fixed limits for getRange results in repeatedly getting transaction_too_old errors in some scenarios.
...
Cutting the limits in half in such cases allows the test to progress.
2018-06-07 15:27:05 -07:00