Daniel Smith
9c2937d4d0
Only check for files/directories when needed
2020-06-29 16:25:36 +00:00
Daniel Smith
b53faa1695
Actually check directory suffix
2020-06-18 17:21:14 +00:00
Daniel Smith
73091b212c
Allow detection of storage engines by presence of directory.
2020-06-17 21:50:06 +00:00
Young Liu
4dfb903a3a
tmp merge
2020-06-16 20:32:07 -07:00
Meng Xu
96206a8032
Merge pull request #3368 from apple/release-6.3
...
Merge Release 6.3 to master
2020-06-15 20:15:22 -07:00
Evan Tschannen
4c7d43271a
merge 6.3 into 7.0
2020-06-15 11:14:11 -07:00
Daniel Smith
a959c6eb23
Fix copy/paste error
2020-06-15 16:48:19 +00:00
Daniel Smith
acbfe2e4c9
Revert "Revert "Initial RocksDB""
2020-06-15 12:45:36 -04:00
Evan Tschannen
beab24de76
Merge branch 'release-6.3' of github.com:apple/foundationdb into release-6.3
2020-06-14 22:38:37 -07:00
Evan Tschannen
c56d97cc9f
randomize the coordinator a storage worker connects to
2020-06-14 22:26:06 -07:00
Young Liu
f211a54593
Merged from upstream master
2020-06-13 16:47:12 -07:00
Young Liu
f8c457d74d
Minor fix against Meng's comments
2020-06-13 16:27:08 -07:00
Meng Xu
8595813b7d
Merge pull request #3355 from apple/release-6.3
...
Merge Release 6.3 into master branch
2020-06-12 20:08:47 -07:00
Jingyu Zhou
9cd1614c82
Revert "Initial RocksDB"
2020-06-11 15:29:46 -07:00
Daniel Smith
a4dbb5dd01
Merge branch 'trace-batch-thread-hostile' into rocksdb-6.3
2020-06-11 15:53:57 +00:00
Young Liu
a47806a966
Fixed locked and metadataVersion in GetReadVersion
2020-06-10 15:55:23 -07:00
A.J. Beamon
739767b838
Delay cluster controller candidacy for all worst fit processes, not just storage servers.
2020-06-10 09:59:56 -07:00
Young Liu
3a37e0af75
Serve GetReadVersion through master instead of peer proxies
2020-06-09 20:47:34 -07:00
negoyal
23a565ec63
Few bug fixes.
2020-06-05 16:27:04 -07:00
Evan Tschannen
30bfd606c0
Merge branch 'release-6.2' into release-6.3
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/downloads.rst
# documentation/sphinx/source/release-notes.rst
# fdbserver/worker.actor.cpp
# packaging/msi/FDBInstaller.wxs
# versions.target
2020-06-04 19:21:32 -07:00
A.J. Beamon
9edc872041
Don't attempt to become a cluster controller on any process with a class that has NeverAssign fitness.
2020-06-03 16:05:21 -07:00
negoyal
cf13e00a8f
Merge remote-tracking branch 'origin/release-6.3' into fdb_cache_wo_allocator
2020-06-01 17:38:31 -07:00
A.J. Beamon
8329a242d2
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# documentation/sphinx/source/downloads.rst
# documentation/sphinx/source/release-notes.rst
2020-05-29 15:51:56 -07:00
Evan Tschannen
e938d741e3
kill the process when a shared tlog throws an io_error
2020-05-29 09:02:55 -07:00
Daniel Smith
8731700d80
Merge remote-tracking branch 'upstream/release-6.3' into rocksdb-6.3
2020-05-27 20:02:25 +00:00
Evan Tschannen
4b6e1d8a57
fix compile problem
2020-05-22 17:16:59 -07:00
Evan Tschannen
ced65cd30b
finished explicitly versioning everything stored in the database
2020-05-22 17:14:21 -07:00
Daniel Smith
5d361fe532
Copy/paste rebase onto 6.3
2020-05-22 15:02:51 +00:00
Markus Pilman
eaaceab845
fixed compiler issues
2020-05-14 13:48:19 -07:00
Markus Pilman
c2bc75516f
Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles
2020-05-14 10:34:53 -07:00
Evan Tschannen
07111f0e41
add a large random delay on failure detection so that not all storage servers need to attempt to become the cluster controller
2020-05-10 17:09:33 -07:00
Evan Tschannen
048201717c
Fixed a number of problems with monitorLeaderRemotely
2020-05-10 14:20:50 -07:00
Evan Tschannen
6fca885b9d
revert storage class monitor leader because of correctness issues
2020-05-09 18:03:59 -07:00
Evan Tschannen
f9518c3441
Merge pull request #3069 from alexmiller-apple/tls-connection-count
...
YOLO at reducing TLS connection count via doing monitorLeader on coordinators
2020-05-09 17:12:27 -07:00
Markus Pilman
025f27f389
control trace interval with a knob
2020-05-08 17:14:42 -07:00
Evan Tschannen
f0f52fb2be
Merge branch 'master' into feature-small-endpoint
...
# Conflicts:
# fdbclient/StorageServerInterface.h
2020-05-08 16:37:35 -07:00
Markus Pilman
5f9b127e56
Emit traces regularly about role assignment
...
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace line with the roles assigned to that
particular process; however, this is not sufficient for tools that can
make use of the UID -> Role mapping.
2020-05-08 16:27:57 -07:00
negoyal
749fcd13b0
Merge branch 'master' into fdb_cache_wo_allocator
2020-05-08 16:23:29 -07:00
Alex Miller
383099aef3
Bug fixes to get it actually doing the right thing:
...
* Initialize electionResult when constructing with NetworkAddress.
* Return after sending a reply.
* Reset the reply promise on each new request.
2020-05-08 01:00:18 -07:00
Evan Tschannen
51d3aaf4ae
fixed a few rare correctness bugs
2020-05-06 23:24:58 -07:00
Alex Miller
8a6e177950
Merge remote-tracking branch 'upstream/master' into tls-connection-count
2020-05-05 16:49:36 -07:00
A.J. Beamon
0b4c93bb1b
More aggressively clean up a bad process ID file in simulation
2020-05-05 15:59:02 -07:00
Evan Tschannen
f329164fb4
Merge pull request #2532 from dongxinEric/feature/hot-read-key-detection-part-2
...
Feature/hot read key detection part 2
2020-05-05 14:33:34 -07:00
Alex Miller
1117eae2b5
Rework to make ElectionResult code similar to OpenDatabase code.
...
And also restore and fix the delayed cluster controller code.
2020-05-05 01:00:17 -07:00
negoyal
dd033736ed
Merge branch 'master' into fdb_cache_subfeature2
2020-05-04 17:29:43 -07:00
Evan Tschannen
ca92a39f5d
reduced the size of proxy and tlog interfaces
2020-05-01 16:41:20 -07:00
Alex Miller
43a63452d8
YOLO at reducing TLS connection count via doing monitorLeader on coordinators
2020-05-01 14:40:21 -07:00
Evan Tschannen
4d131bdd4a
Merge branch 'master' into feature-small-endpoint
2020-05-01 13:16:15 -07:00
Dave Cottlehuber
98639645b1
fdbserver: update headers
2020-04-30 18:11:23 +00:00
Evan Tschannen
a442565e13
more work towards shrinking locality
2020-04-18 21:29:38 -07:00
Evan Tschannen
4c51e0a05b
Update fdbserver/worker.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-04-17 14:44:58 -07:00
Xin Dong
7dd7406c59
Merge branch 'master' into feature/hot-read-key-detection-part-2
2020-04-16 14:54:05 -07:00
Evan Tschannen
2eec3bb9b1
fixed logic for skipping broadcast
2020-04-13 13:09:21 -07:00
Evan Tschannen
8f78912483
knobified parameter
2020-04-11 20:54:17 -07:00
Evan Tschannen
e5ec7f2800
do not broadcast obsolete serverDBInfo
2020-04-11 20:05:03 -07:00
Evan Tschannen
1476057996
properly cache serialization of serverDBInfo
2020-04-11 19:30:05 -07:00
Evan Tschannen
07cc0a8d74
code cleanup
2020-04-10 17:02:11 -07:00
Evan Tschannen
ce4493f679
many bug fixes
2020-04-10 13:45:16 -07:00
Evan Tschannen
a51c92854a
Merge branch 'master' into feature-tree-broadcast
...
# Conflicts:
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
2020-04-06 21:09:44 -07:00
Evan Tschannen
2a1bd97120
fix compilation errors
2020-04-06 20:58:43 -07:00
Evan Tschannen
477d66b46d
implemented a tree broadcast for txn state message for proxies, and serverDBInfo for workers
2020-04-05 23:09:36 -07:00
negoyal
acaf91ac47
Merge branch 'master' into fdb_cache_subfeature2
2020-03-26 13:33:08 -07:00
Jingyu Zhou
f0f4e42a4c
Add removal for backupWorkerCache
2020-03-23 12:47:42 -07:00
Jingyu Zhou
658504bc66
Add a cache to handle repeated delivery of backup recruitment messages
2020-03-23 10:22:24 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
...
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Xin Dong
89861c661e
Fix the random crash. Use a thread safe 'ThreadReturnPromise' instead of the ThreadFuture.
2020-03-16 13:36:55 -07:00
Xin Dong
5967ef5eab
Added back the changes that report trace log flush failures and fix the random crash
2020-03-12 14:34:19 -07:00
Meng Xu
a9136f3f72
Add waitForUnreliableExtraStoreReboot to wait for extra store to reboot
2020-03-12 10:18:31 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
negoyal
3acd3ad3af
Some bugfixes and cleanup.
2020-03-02 17:11:23 -08:00
negoyal
cd949eca71
Merge branch 'master' into fdb_cache_subfeature2
2020-02-26 11:22:08 -08:00
Xin Dong
f20619c9fb
Resolve review comments. Changed how issues got cleared
2020-02-25 15:39:51 -08:00
Xin Dong
fce71e4516
Added a TODO for the usage of 'issues' in 'monitorServerDBInfo'
2020-02-25 15:39:38 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
288e95c7e1
Reallocate the issues set after each get. Changed an issues name to be accurate
2020-02-25 15:39:09 -08:00
Xin Dong
f4f860bfa8
Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe.
2020-02-25 15:38:14 -08:00
Xin Dong
a6580dc15f
Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to be an accurate check by using 'pthread_kill(writer_thread, 0)'
2020-02-25 15:37:53 -08:00
Xin Dong
0b0414fb94
Addressed review comments. Change the issue reporting from 'ITraceLogWriter' to a more generic approach.
2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42
Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
...
- Starting from the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller
- A successful flush will reset the accumulated counter.
Note that the current solution does not take time into consideration. The assumption is that flush failures tend to happen only in a clustered manner. Intermittent but short periods of flush failures are not considered a problem, since the memory pressure they build up should be negligible.
2020-02-25 15:37:32 -08:00
negoyal
308e088bca
Minor fixes.
2020-02-25 15:00:18 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
59ff782927
fix: only delete the processId file on binaryReader errors
2020-02-20 23:04:39 -08:00
A.J. Beamon
5586e6f6d8
Merge pull request #2697 from etschannen/feature-correctness-fixes
...
A variety of correctness fixes
2020-02-20 13:32:18 -08:00
Evan Tschannen
9b3254d5f4
A corrupted processId file should be deleted in simulation, as that is the manual operation that would fix the problem in the real world
2020-02-19 15:21:42 -08:00
Meng Xu
94d799552e
FastRestore:Apply clang-format against master
2020-02-18 16:41:59 -08:00
Meng Xu
132f5aa9ba
FastRestore:Improve trace name and cosmetic change
2020-02-18 16:41:19 -08:00
Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
Balachandar Namasivayam
1be6915a38
Fix an incorrect if else check.
2020-02-17 17:31:41 -08:00
A.J. Beamon
1d9140d874
Removed TLogVersion logging.
...
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
Balachandar Namasivayam
32165c506f
Fix a one line bug where the if check comparison was wrong.
2020-02-12 16:55:33 -08:00
A.J. Beamon
56053c565b
Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
...
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
negoyal
85cc35e81e
Merge branch 'master' into HEAD
2020-02-05 14:59:55 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
...
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Meng Xu
ca3b6135d0
FastRestore:Add debug to see why restore role is not connected
...
Reason: restore is an fdbserver that does not register with CC.
The new failure monitor changes how connection works for client and server.
For client, it does not connect to CC to get connected.
For server, it has to connect to CC to get connected.
Restore worker becomes the special role that behaves like a client but is a server.
2020-02-03 17:19:52 -08:00
Meng Xu
9c2046b11b
FastRestore:Mimic fdbd to monitor coordinators
...
Before we start a fdb restore process.
2020-02-03 14:48:31 -08:00
Meng Xu
559b95c61a
FastRestore:RestoreRole:Mimic how fdbd starts
2020-02-01 10:23:48 -08:00
Alex Miller
ee6490c9d1
Merge pull request #2314 from mengranwo/memory-engine
...
New Radix-Tree based Memory Storage Engine
2020-01-30 16:20:13 -08:00