A.J. Beamon
c851ee4031
Merge pull request #2897 from tclinken/fix-trace-batch-loggroup-and-role
...
Annotate trace batch events before dumping
2020-04-13 11:22:51 -07:00
tclinken
8ef5a04896
Guard all of annotateEvent with mutex
2020-04-10 13:03:15 -07:00
tclinken
01285f3374
Delay annotation of trace batch events created before trace file is opened
2020-04-09 14:09:00 -07:00
tclinken
10fee8fafc
Annotate trace batch events before dumping
2020-04-02 19:34:02 -07:00
Xin Dong
6820167d77
Merge branch 'master' into feature/1689/allow-custome-trace-log-file-identifier
2020-03-31 16:50:46 -07:00
Xin Dong
2805111a32
When provided with a custom identifier, use that string instead of the port/PID as the last part of the baseName.
2020-03-31 11:02:02 -07:00
Xin Dong
03e2102a21
Fix macOS build failure.
2020-03-26 11:41:36 -07:00
Xin Dong
a0177a9335
Allow the user to provide a custom trace log file identifier that will be used as the prefix of all trace log files created on the client side.
2020-03-26 11:25:05 -07:00
tclinken
baf0fe956c
Take trace mutex in setLogGroup
2020-03-26 09:55:03 -07:00
tclinken
7d5ed53215
Allow trace log group to be set after database is created
2020-03-25 13:40:43 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
...
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Xin Dong
31a9f0a26c
Fix the segfault
2020-03-17 11:03:46 -07:00
A.J. Beamon
f1523bd472
Setting the network thread more than once is a no-op
2020-03-16 15:37:06 -07:00
A.J. Beamon
7769218303
Move an increment after an ASSERT.
2020-03-16 14:11:07 -07:00
A.J. Beamon
d8cfabe73b
Extend the allocation tracing disabling flag to cover more parts of trace logging as a precaution. Make it possible to disable via knob.
2020-03-16 13:59:31 -07:00
Xin Dong
89861c661e
Fix the random crash. Use a thread safe 'ThreadReturnPromise' instead of the ThreadFuture.
2020-03-16 13:36:55 -07:00
Xin Dong
5967ef5eab
Added back the changes that report trace log flush failures and fix the random crash
2020-03-12 14:34:19 -07:00
A.J. Beamon
2466749648
Don't disallow allocation tracking when a trace event is open because we now have state trace events. Instead, only block allocation tracking while we are in the middle of allocation tracking already to prevent recursion.
2020-03-12 11:17:49 -07:00
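The recursion guard described in the commit above can be sketched as follows. This is a minimal illustration, not the FoundationDB code: `inAllocationTracking`, `trackedAllocations`, and `trackAllocation` are hypothetical names, and the use of a `thread_local` flag is an assumption.

```cpp
#include <cstddef>

// Hypothetical sketch; these names are illustrative and do not come from FoundationDB.
// Assumption: the guard is a per-thread flag.
thread_local bool inAllocationTracking = false;
thread_local int trackedAllocations = 0;

void trackAllocation(std::size_t bytes) {
    // Block tracking only while we are already inside tracking, so that
    // allocations made by the tracking code itself do not recurse.
    if (inAllocationTracking) {
        return;
    }
    inAllocationTracking = true;
    ++trackedAllocations;
    // Simulate the tracking code allocating memory, which re-enters this
    // function; the guard turns the nested call into a no-op.
    trackAllocation(bytes);
    inAllocationTracking = false;
}
```

An open trace event does not set the flag in this sketch, so allocations made while an event is open are still tracked.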
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
Xin Dong
16575ae94d
Address review comments
2020-02-27 11:54:15 -08:00
Xin Dong
4ac7b36e44
Added back the mutex holder that was removed accidentally
2020-02-27 10:19:17 -08:00
Xin Dong
7b51ab6b63
Rebased with master
2020-02-25 15:43:33 -08:00
Xin Dong
f20619c9fb
Resolve review comments. Changed how issues got cleared
2020-02-25 15:39:51 -08:00
Xin Dong
3f24ae93f2
Remove the unused variable
2020-02-25 15:39:38 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
288e95c7e1
Reallocate the issues set after each get. Changed an issues name to be accurate
2020-02-25 15:39:09 -08:00
Xin Dong
1c346fcfb0
Added the new issues into Status Schema. Remove the issue reporting in lastError since:
...
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.
Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
Xin Dong
f4f860bfa8
Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe.
2020-02-25 15:38:14 -08:00
Xin Dong
a6580dc15f
Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to be an accurate check by using 'pthread_kill(writer_thread, 0)'.
2020-02-25 15:37:53 -08:00
Xin Dong
0b0414fb94
Addressed review comments. Changed the issue reporting from 'ITraceLogWriter' to a more generic way.
2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42
Now the inability to flush trace logs will be reported to both 'stderr' and the status json object.
...
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller
- A successful flush will reset the accumulated counter.
Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.
2020-02-25 15:37:32 -08:00
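The counting rule in the commit above (report once consecutive failures exceed a knob value, reset on any success) can be sketched as below. This is only an illustration under stated assumptions: `FlushFailureTracker`, `recordFlush`, and the threshold parameter are hypothetical names, not the actual FoundationDB classes or knobs.

```cpp
#include <cstdint>

// Hypothetical sketch; the class and its members are not FoundationDB names.
class FlushFailureTracker {
public:
    // maxConsecutiveFailures stands in for the knob value mentioned in the commit.
    explicit FlushFailureTracker(uint32_t maxConsecutiveFailures)
      : threshold(maxConsecutiveFailures) {}

    // Record the outcome of one flush attempt.
    // Returns true if the issue should currently be reported.
    bool recordFlush(bool succeeded) {
        if (succeeded) {
            consecutiveFailures = 0; // a successful flush resets the counter
        } else {
            ++consecutiveFailures;
        }
        return shouldReport();
    }

    // Report only when the count exceeds (not merely reaches) the threshold.
    bool shouldReport() const { return consecutiveFailures > threshold; }

private:
    uint32_t threshold;
    uint32_t consecutiveFailures = 0;
};
```

As the commit message notes, the rule ignores elapsed time: short, intermittent failures that are interrupted by a success never accumulate.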
Alvin Moore
0f64505d0b
Merge branch 'release-6.2' of github.com:apple/foundationdb
...
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
A.J. Beamon
dfa5f76c01
Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK.
2020-02-21 16:28:03 -08:00
A.J. Beamon
2431d4d788
Always compute the time for a trace event when it is being logged rather than when it is being created. Usually these are the same, but if they aren't, using the creation time can lead to out-of-order trace events.
2020-02-21 13:57:04 -08:00
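A small sketch of why the commit above stamps events at log time. The types here (`LoggedEvent`, `EventSketch`) and the injected clock are hypothetical and exist only for illustration; they are not the FoundationDB `TraceEvent` API.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; not the FoundationDB TraceEvent implementation.
struct LoggedEvent {
    double time;
    std::string type;
};

class EventSketch {
public:
    explicit EventSketch(std::string type) : type(std::move(type)) {}

    // The timestamp is read here, at log time, not in the constructor.
    // Events therefore appear in the output in non-decreasing time order,
    // even when they are logged in a different order than they were created.
    void log(const std::function<double()>& clock, std::vector<LoggedEvent>& out) const {
        out.push_back({ clock(), type });
    }

private:
    std::string type;
};
```

If the time were captured in the constructor instead, an event created early but logged late would carry an older timestamp than the entries written before it.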
A.J. Beamon
6810a03283
Add more logging to valley filler and mountain chopper
2020-02-21 10:55:14 -08:00
Paul J. Davis
32e285a761
Add network option for the trace clock source
...
This option allows clients to select the clock source for trace events
similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock sources makes loading event data into
OpenTracing systems like Jaeger more useful.
2020-02-15 11:30:43 -06:00
Evan Tschannen
a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
...
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
Evan Tschannen
16b5af067c
changed trace event name
2020-01-03 16:03:29 -08:00
Evan Tschannen
deb032745a
fix: do not set logged until the end of the function
2020-01-03 12:45:23 -08:00
Evan Tschannen
1867d30017
added asserts to protect against future actions on a trace event that has been logged
2020-01-03 12:31:06 -08:00
Evan Tschannen
7152469cc3
log the base trace event before the endpoint messages
2020-01-03 12:15:38 -08:00
A.J. Beamon
9866d1ce27
Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time.
2019-12-06 10:15:49 -08:00
A.J. Beamon
b5d2234a13
Merge branch 'release-6.1' into merge-release-6.1-into-master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbserver/MoveKeys.actor.cpp
# flow/FastAlloc.h
# versions.target
2019-07-30 16:23:42 -07:00
A.J. Beamon
e98cee016d
Fix unsafe usage of now() function from multiple threads in trace logging.
2019-07-22 22:31:38 -07:00
A.J. Beamon
d981de18e4
Restrict huge arena sampling to the network thread. Revert removal of thread_local definitions.
2019-07-17 16:23:17 -07:00
A.J. Beamon
d5051b08dd
Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated.
2019-07-12 16:12:35 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00