Commit Graph

117 Commits

Author SHA1 Message Date
A.J. Beamon c851ee4031
Merge pull request #2897 from tclinken/fix-trace-batch-loggroup-and-role
Annotate trace batch events before dumping
2020-04-13 11:22:51 -07:00
tclinken 8ef5a04896 Guard all of annotateEvent with mutex 2020-04-10 13:03:15 -07:00
tclinken 01285f3374 Delay annotation of trace batch events created before trace file is opened 2020-04-09 14:09:00 -07:00
tclinken 10fee8fafc Annotate trace batch events before dumping 2020-04-02 19:34:02 -07:00
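The three annotation commits above can be pictured with a minimal C++ sketch. It assumes a hypothetical `TraceBatchSketch` that buffers events until dump time. The class names and the `LogGroup`/`Roles` field names are illustrative only and are not FoundationDB's actual API. Annotation runs under the same mutex that guards the attributes, so an event buffered before `setAttributes` is called still receives the final values when the batch is dumped.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered trace event: just a field map.
struct BufferedEvent {
    std::map<std::string, std::string> fields;
};

// Hypothetical batch: events may be queued before the trace file (and the
// log group / role) is known; they are annotated only when dumped.
class TraceBatchSketch {
public:
    // May be called after events were queued (e.g. log group set later).
    void setAttributes(std::string logGroup, std::string role) {
        std::lock_guard<std::mutex> lock(mutex_);
        logGroup_ = std::move(logGroup);
        role_ = std::move(role);
    }

    void add(BufferedEvent e) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(e));
    }

    // Annotate every pending event, then hand the whole batch to the caller.
    std::vector<BufferedEvent> dump() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& e : pending_) annotate(e);
        std::vector<BufferedEvent> out;
        out.swap(pending_);
        return out;
    }

private:
    // Called with mutex_ held, so attributes cannot change mid-annotation.
    void annotate(BufferedEvent& e) const {
        e.fields["LogGroup"] = logGroup_;
        e.fields["Roles"] = role_;
    }

    std::mutex mutex_;
    std::string logGroup_ = "default";
    std::string role_;
    std::vector<BufferedEvent> pending_;
};
```

Deferring annotation to `dump()` is what lets an attribute set after the first events were created still show up on those events.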
Xin Dong 6820167d77
Merge branch 'master' into feature/1689/allow-custome-trace-log-file-identifier 2020-03-31 16:50:46 -07:00
Xin Dong 2805111a32 When provided with a custom identifier, use that string instead of the port/PID as the last part of the baseName. 2020-03-31 11:02:02 -07:00
Xin Dong 03e2102a21 Fix macOS build failure. 2020-03-26 11:41:36 -07:00
Xin Dong a0177a9335 Allow the user to provide a custom trace log file identifier that will be used as the prefix of all trace log files created at the client side. 2020-03-26 11:25:05 -07:00
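A minimal sketch of the custom-identifier behaviour described in the two commits above: when an identifier is supplied it replaces the port/PID as the last part of the base name. The `traceBaseName` helper and the `<dir>/trace.<ip>.<suffix>` layout are hypothetical; the real naming scheme may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a trace file base name. A non-empty custom
// identifier replaces the port/PID suffix; otherwise the suffix is kept.
std::string traceBaseName(const std::string& dir, const std::string& ip,
                          const std::string& portOrPid,
                          const std::string& identifier) {
    const std::string suffix = identifier.empty() ? portOrPid : identifier;
    return dir + "/trace." + ip + "." + suffix;
}
```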
tclinken baf0fe956c Take trace mutex in setLogGroup 2020-03-26 09:55:03 -07:00
tclinken 7d5ed53215 Allow trace log group to be set after database is created 2020-03-25 13:40:43 -07:00
Balachandar Namasivayam 58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Xin Dong 31a9f0a26c Fix the segfault 2020-03-17 11:03:46 -07:00
A.J. Beamon f1523bd472 Setting the network thread more than once is a no-op 2020-03-16 15:37:06 -07:00
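"Setting the network thread more than once is a no-op" amounts to an idempotent setter. The sketch below assumes a hypothetical `NetworkThreadRegistry`: only the first call records a thread id, and later calls return without changing it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical registry: remembers the first thread registered as the
// network thread; any later registration attempt is ignored.
class NetworkThreadRegistry {
public:
    void setNetworkThread() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (isSet_) return;  // second and later calls are no-ops
        owner_ = std::this_thread::get_id();
        isSet_ = true;
    }

    bool isNetworkThread() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return isSet_ && owner_ == std::this_thread::get_id();
    }

private:
    mutable std::mutex mutex_;
    bool isSet_ = false;
    std::thread::id owner_;
};
```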
A.J. Beamon 7769218303 Move an increment after an ASSERT. 2020-03-16 14:11:07 -07:00
A.J. Beamon d8cfabe73b Extend the allocation tracing disabling flag to cover more parts of trace logging as a precaution. Make it possible to disable via knob. 2020-03-16 13:59:31 -07:00
Xin Dong 89861c661e Fix the random crash. Use a thread safe 'ThreadReturnPromise' instead of the ThreadFuture. 2020-03-16 13:36:55 -07:00
Xin Dong 5967ef5eab Added back the changes that report trace log flush failures and fix the random crash 2020-03-12 14:34:19 -07:00
A.J. Beamon 2466749648 Don't disallow allocation tracking when a trace event is open because we now have state trace events. Instead, only block allocation tracking while we are in the middle of allocation tracking already to prevent recursion. 2020-03-12 11:17:49 -07:00
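The recursion guard described above (block allocation tracking only while already inside allocation tracking) is a per-thread reentrancy flag. In this sketch `g_inAllocationTracking`, `hookedAllocate`, and `trackAllocation` are hypothetical names; recording an allocation itself allocates through the hook, and the flag turns that nested call into a no-op.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical per-thread flag: true while this thread is inside the tracker.
thread_local bool g_inAllocationTracking = false;
std::vector<std::size_t> g_trackedSizes;  // stand-in for the trace output

void trackAllocation(std::size_t size);

// Stand-in for an allocator hook that reports every allocation.
void* hookedAllocate(std::size_t size) {
    trackAllocation(size);
    return ::operator new(size);
}

void trackAllocation(std::size_t size) {
    if (g_inAllocationTracking) return;  // nested call from the tracker itself
    g_inAllocationTracking = true;
    g_trackedSizes.push_back(size);
    // Writing the record allocates through the hooked allocator, which calls
    // back into trackAllocation; the guard above makes that call return early.
    void* scratch = hookedAllocate(64);
    ::operator delete(scratch);
    g_inAllocationTracking = false;
}
```

Unlike a flag keyed on "a trace event is open", this guard still allows tracking from inside open (for example, long-lived state) trace events.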
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Xin Dong 39610d15f8 Revert this change since it somehow introduced a random crash detected on circus 2020-03-04 16:14:38 -08:00
Xin Dong 16575ae94d Address review comments 2020-02-27 11:54:15 -08:00
Xin Dong 4ac7b36e44 Added back the mutex holder that was removed accidentally 2020-02-27 10:19:17 -08:00
Xin Dong 7b51ab6b63 Rebased with master 2020-02-25 15:43:33 -08:00
Xin Dong f20619c9fb Resolve review comments. Changed how issues got cleared 2020-02-25 15:39:51 -08:00
Xin Dong 3f24ae93f2 Remove the unused variable 2020-02-25 15:39:38 -08:00
Xin Dong 090c89e90a Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request. 2020-02-25 15:39:38 -08:00
Xin Dong 288e95c7e1 Reallocate the issues set after each get. Changed an issue's name to be accurate 2020-02-25 15:39:09 -08:00
Xin Dong 1c346fcfb0 Added the new issues into Status Schema. Remove the issue reporting in lastError since:
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.

Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
Xin Dong f4f860bfa8 Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe. 2020-02-25 15:38:14 -08:00
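Thread-safe issue reporting, as in the commit above, can be sketched as a mutex-guarded set that the trace writer thread updates and another thread reads through a copy taken under the lock. `IssueReporterSketch` and its methods are hypothetical names, not the actual interface.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical issue reporter shared by the trace writer thread (which adds
// and resolves issues) and a reader thread (which builds status output).
class IssueReporterSketch {
public:
    void addIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex_);
        issues_.insert(issue);
    }

    void resolveIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex_);
        issues_.erase(issue);
    }

    // Copy taken while holding the lock, so callers never see a set that
    // is being modified concurrently.
    std::set<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return issues_;
    }

private:
    mutable std::mutex mutex_;
    std::set<std::string> issues_;
};
```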
Xin Dong a6580dc15f Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to an accurate check by using 'pthread_kill(writer_thread, 0)' 2020-02-25 15:37:53 -08:00
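One way to build the "loose" liveness check described above is a ping/pong counter pair: the monitor bumps a ping counter on each check, the writer thread echoes it whenever its loop runs, and a check fails if the echo has not caught up since the previous ping. This is a hypothetical sketch (`WriterLiveness` is not the actual class). Note that it only detects a writer that stopped making progress, whereas the commit's `pthread_kill(writer_thread, 0)` alternative would instead check that the thread still exists.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical loose liveness check between a monitor and a writer thread.
class WriterLiveness {
public:
    // Monitor side: returns true if the previous ping was answered,
    // then sends a new ping.
    bool checkAndPing() {
        bool answered = pong_.load() == ping_.load();
        ping_.fetch_add(1);
        return answered;
    }

    // Writer thread side: call periodically from the writer's loop.
    void respond() { pong_.store(ping_.load()); }

private:
    std::atomic<std::uint64_t> ping_{0};
    std::atomic<std::uint64_t> pong_{0};
};
```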
Xin Dong 0b0414fb94 Addressed review comments. Changed the issue reporting from 'ITraceLogWriter' to a more generic mechanism. 2020-02-25 15:37:53 -08:00
Xin Dong 034dfe5e42 Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
- Starting from the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, the current worker process reports this issue via the 'GetServerDBInfo' interface of the cluster controller
    - A successful flush will reset the accumulated counter.
    Notice that the current solution does not take elapsed time into consideration. The assumption is that flush failures tend to happen in clusters. Intermittent, short periods of flush failures are not considered a problem since the memory pressure they build up should be negligible.
2020-02-25 15:37:32 -08:00
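The failure-counting rule in the commit above (report once consecutive failures exceed a knob value, reset on any success, no time window) can be sketched as follows. `FlushFailureTracker` is hypothetical, and the constructor argument stands in for the knob.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical tracker: counts consecutive trace flush failures and signals
// when the count exceeds a threshold (standing in for the knob value).
class FlushFailureTracker {
public:
    explicit FlushFailureTracker(int threshold) : threshold_(threshold) {}

    // Returns true when the failure should be reported to the cluster.
    bool onFlushResult(bool ok) {
        if (ok) {
            consecutiveFailures_ = 0;  // any successful flush resets the counter
            return false;
        }
        ++consecutiveFailures_;
        std::fprintf(stderr, "trace flush failed (%d in a row)\n", consecutiveFailures_);
        return consecutiveFailures_ > threshold_;
    }

private:
    int threshold_;
    int consecutiveFailures_ = 0;
};
```

Because only consecutive failures count, short bursts separated by successful flushes never reach the threshold, which matches the stated assumption that such bursts build negligible memory pressure.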
Alvin Moore 0f64505d0b Merge branch 'release-6.2' of github.com:apple/foundationdb
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
A.J. Beamon dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
A.J. Beamon 2431d4d788 Always compute the time for a trace event when it is being logged rather than when it is being created. Usually these are the same, but if they aren't, doing the opposite can lead to out of order trace events. 2020-02-21 13:57:04 -08:00
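The ordering argument above can be illustrated with a sketch in which an event takes its timestamp in `log()` rather than in its constructor. `EventSketch` and `LoggedLine` are hypothetical, and the clock value is passed in explicitly so the example is deterministic.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct LoggedLine {
    double time;
    std::string type;
};

// Hypothetical event: remembers its type at creation, but reads the clock
// only when it is actually logged.
class EventSketch {
public:
    explicit EventSketch(std::string type) : type_(std::move(type)) {}

    void log(double now, std::vector<LoggedLine>& sink) {
        sink.push_back({now, type_});  // time computed here, not in the ctor
    }

private:
    std::string type_;
};
```

If `A` is created before `B` but logged after it, stamping at creation would write `A` with an earlier time after `B`; stamping at log time keeps the output non-decreasing in time.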
A.J. Beamon 6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Paul J. Davis 32e285a761 Add network option for the trace clock source
This option allows clients to select the clock source for trace events
similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock source makes loading event data into
OpenTracing systems like Jaeger more useful.
2020-02-15 11:30:43 -06:00
Evan Tschannen a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
Evan Tschannen 16b5af067c changed trace event name 2020-01-03 16:03:29 -08:00
Evan Tschannen deb032745a fix: do not set logged until the end of the function 2020-01-03 12:45:23 -08:00
Evan Tschannen 1867d30017 added asserts to protect against future actions on a trace event that has been logged 2020-01-03 12:31:06 -08:00
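The two commits above (assert against actions on an already-logged event, and set the logged flag only at the end of logging) can be sketched with a hypothetical `OnceEvent`. It is not the real trace event class.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace event that may be logged only once.
class OnceEvent {
public:
    OnceEvent& detail(std::string key, std::string value) {
        assert(!logged_ && "detail() called after log()");
        fields_.emplace_back(std::move(key), std::move(value));
        return *this;
    }

    void log(std::vector<std::string>& sink) {
        assert(!logged_ && "event logged twice");
        for (const auto& kv : fields_) sink.push_back(kv.first + "=" + kv.second);
        logged_ = true;  // set last, after all of the event has been written
    }

    bool logged() const { return logged_; }

private:
    std::vector<std::pair<std::string, std::string>> fields_;
    bool logged_ = false;
};
```

Setting the flag last means any helper that runs during `log()` and asserts `!logged_` still sees the event as open.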
Evan Tschannen 7152469cc3 log the base trace event before the endpoint messages 2020-01-03 12:15:38 -08:00
A.J. Beamon 9866d1ce27 Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time. 2019-12-06 10:15:49 -08:00
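Instead of making the clock `thread_local`, the commit above checks which thread is asking for the time. A hypothetical sketch: the network thread's cached time is read only on the network thread, and any other thread falls back to the system clock. `TraceClockSketch` is illustrative and not the real `g_trace_clock` logic.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical clock: the cached network time is only touched by the network
// thread; other threads never read it and use the system clock instead.
class TraceClockSketch {
public:
    explicit TraceClockSketch(std::thread::id networkThread)
        : networkThread_(networkThread) {}

    // Intended to be called only from the network thread's event loop.
    void setNetworkTime(double t) { networkTime_ = t; }

    double now() const {
        if (std::this_thread::get_id() == networkThread_) return networkTime_;
        using namespace std::chrono;
        return duration<double>(system_clock::now().time_since_epoch()).count();
    }

private:
    std::thread::id networkThread_;
    double networkTime_ = 0.0;
};
```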
A.J. Beamon b5d2234a13 Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
#	flow/FastAlloc.h
#	versions.target
2019-07-30 16:23:42 -07:00
A.J. Beamon e98cee016d Fix unsafe usage of now() function from multiple threads in trace logging. 2019-07-22 22:31:38 -07:00
A.J. Beamon d981de18e4 Restrict huge arena sampling to the network thread. Revert removal of thread_local definitions. 2019-07-17 16:23:17 -07:00
A.J. Beamon d5051b08dd Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated. 2019-07-12 16:12:35 -07:00
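Knob-controlled field lengths boil down to truncating each value at a configurable limit. In this hypothetical sketch `maxLength` stands in for the knob, and the `...` truncation marker is an assumption, not the documented trace format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: cap a field value at maxLength characters and mark
// the cut with "..." so truncated values are recognisable.
std::string truncateField(const std::string& value, std::size_t maxLength) {
    if (value.size() <= maxLength) return value;
    return value.substr(0, maxLength) + "...";
}
```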
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
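The motivation for `TaskPriority::Foo` is that a scoped enum does not convert implicitly from `int`, so passing an unrelated integer becomes a compile error. The enumerator names and numeric values below are hypothetical, not the real priority table.

```cpp
#include <cassert>

// Hypothetical subset of priorities as a scoped enum.
enum class TaskPriority : int {
    Low = 1000,
    DefaultYield = 7000,
    High = 9000,
};

// Relational operators work between values of the same scoped enum.
bool runsBefore(TaskPriority a, TaskPriority b) { return a > b; }

// runsBefore(7000, TaskPriority::Low);  // does not compile: no int conversion
```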