Commit Graph

20833 Commits

Author SHA1 Message Date
Jon Fu 9bfc6ee59e Merge branch 'main' of github.com:apple/foundationdb into jfu-mako-active-tenants 2022-06-16 10:32:20 -07:00
Jon Fu b891b424ea
Merge pull request #7387 from sfc-gh-jfu/jfu-mako-tenant-rows
When provided tenants in mako, divide number of rows by number of tenants.
2022-06-16 12:55:44 -04:00
Jon Fu 2b9c8ca874 introduce concept of active versus total tenants in mako 2022-06-15 15:30:02 -07:00
Andrew Noyes 83aceb216c
Use absl::GetStackTrace for slow task profiler (#7374)
* Make SlowTask workload runnable in joshua

* Remove SignalSafeUnwind, and use absl::GetStackTrace for slow task profiler
2022-06-15 14:53:52 -07:00
Andrew Noyes 0fea3fb731
Save a bunch of copies in the trace thread (#7392)
Currently, a std::string is copied unnecessarily for every key and value
in a trace event.

This actually showed up in a jemalloc heap profile while I was
investigating something unrelated. I was surprised to see it since these
allocations should have a very short lifetime.
2022-06-15 12:29:15 -07:00
Jon Fu a96928be2d Merge branch 'main' of github.com:apple/foundationdb into jfu-mako-tenant-rows 2022-06-15 12:15:23 -07:00
Johannes Scheuermann c9b4ff3302
Add support for lumberjack in logging and update example to 7.1 (#7357) 2022-06-15 19:41:01 +01:00
Jingyu Zhou a32127a0b9
Merge pull request #7391 from sbodagala/main
Do not always try to figure out the sequencer locality
2022-06-15 13:50:55 -04:00
Ata E Husain Bohra 9396b691b7
Generate GNU compatible build-id for mockkms golang binary (#7389)
* Generate GNU compatible build-id for mockkms golang binary

Description

 diff-1: Fix compilation issue

Generate GNU compatible build-id for mockkms golang binary
Leverage "cgo" to generate build-id

Testing

Debian package build, verified the GNU build-id
2022-06-15 10:43:46 -07:00
Sreenath Bodagala 2c85bb71c1 - Do not try to figure out the sequencer locality if knob
ENABLE_VERSION_VECTOR_HA_OPTIMIZATION is disabled.
2022-06-15 16:08:31 +00:00
Ata E Husain Bohra 8808d93813
Fix bugs in EncyrptKeyProxy actor (#7388)
Description

Major changes include:
1. GetEncryptByKeyIds cache elements can expire.
2. Update iterator after erasing an element during refresh encryption keys
   operation.

Testing

EncryptKeyProxyTest
2022-06-14 21:22:25 -07:00
Jon Fu 06c8f9068e fix arg parse ordering 2022-06-14 16:45:58 -07:00
Jon Fu 184c266bdf adjust rows calculation after parsing all arguments but before validation 2022-06-14 16:34:39 -07:00
Jon Fu 6999ebc86f When provided tenants, divide number of rows by number of tenants. Adjust population and range reads to account for this scenario 2022-06-14 16:22:07 -07:00
Yao Xiao 7da26db342
[ShardedRocksDB] 4/N Support removeRange. (#7345) 2022-06-14 13:52:03 -07:00
Yi Wu 6246664006
Support encrypting TxnStateStore (#7253)
Adding encryption support for TxnStateStore. It is done by supporting encryption. for KeyValueStoreMemory. The encryption is currently done on operation level when the operations are being write to the underlying log file. See inline comment for the encrypted data format.

This PR depends on #7252. It is part of the effort to support TLog encryption #6942.
2022-06-14 13:26:32 -07:00
Xiaoge Su 21ee76a44d fixup! Reformat source #2 2022-06-14 13:22:18 -07:00
Xiaoge Su c2676df2f8 fixup! Reformat source 2022-06-14 13:22:18 -07:00
Xiaoge Su 9fb6e5bb05 fixup! Fix the clang error when using std::move
This patch is to fix the compile error

/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: error: moving a local
object in a return statement prevents copy elision
[-Werror,-Wpessimizing-move]
 return std::move(resource);
        ^
/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: note: remove std::move
call here
 return std::move(resource);
        ^~~~~~~~~~        ~
1 error generated.
2022-06-14 13:22:18 -07:00
Vishesh Yadav fd6f6eb06a
Merge pull request #7364 from sfc-gh-ljoswiak/fixes/unnecessary-transaction-initialization
Remove unnecessary ReadYourWritesTransaction initialization
2022-06-14 11:02:31 -07:00
Xiaoge Su 00b805d8e0 fixup! Reformat source 2022-06-14 10:43:13 -07:00
Xiaoge Su e493f1c3cd fixup! Add a retry mechanism in changeQuorumChecker and changeQuorum
This is to fix an issue when recovery and change coordinator key happens
together. The issue will occur when:

1. Recovery starts
2. Coordinator key change transaction started
3. During the recovery the coordinator key is read from cluster file and
   stored in the storage server
4. The cluster controller received `ChangeCoordiatorsRequest`, and
   updated the cluster name with the new value.

at this stage, the value related to coordinator key in storage server and
the worker is inconsistent.

5. changeQuorumChecker is called, which will verify such consistency.
   Since they are different, the call is returning failure and the
   caller, which could be a TEST_CASE, fails.

This is a rare race issue, and it is also noticed that when the
recovery/coordinator key change process is done, the database is in a
proper state which allows changeQuorumChecker behave properly. In this
case, a retry mechanism should be sufficiently fix corresponding test
failures.
2022-06-14 10:43:13 -07:00
Junhyun Shim ed91ab5d54
Work around flow trace's data race bug (#7237)
* Work around flow trace's data race bug

BaseTraceEvent::setNetworkThread() and flushTraceFile[()|Void()]
has a long-standing race condition for traceEventThrottlerCache global
when flushTraceFileVoid() is not called from the network thread.

This race dates back to 2017 (commit hash 80e5fecfe2),
so before the race itself is fixed, work around the problem.

* Remove call to flushTraceFileVoid() from MkCertCli

* Apply clang format
2022-06-14 12:09:34 +02:00
Yao Xiao ddbecb69ad
Ignore ShardedRocksDBTest. (#7381) 2022-06-13 23:13:11 -07:00
Renxuan Wang 839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader will be 1, which is an exception to other cases and our expectation that if 2 nominees have the same frequency, the one with lower id will be the leader.

* Remove unnecessary new statement.

stream will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not capture by default, thus, `host` and `service` are not captured. This somehow successfully compile, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
Hao Fu 9cee4c94e7
Safely remove fdb_transaction_get_range_and_flat_map (#7314) 2022-06-13 19:05:00 -07:00
Trevor Clinkenbeard 6bed046148
Merge pull request #7352 from sfc-gh-xwang/feature/ddtxn
[DD testability enhancement] Create IDDTxnProcessor and simple refactoring
2022-06-13 16:01:13 -07:00
Xiaoxi Wang ef0f415e3d add option; change to shared_ptr 2022-06-13 13:55:48 -07:00
Xiaoge Su 5a2804e04b
fixup! Fix the XmlTraceLogFormatter (#7322)
* fixup! Fix the XmlTraceLogFormatter

The original escape process uses a `loop` while the code is actually not
an ACTOR. So the actorcompiler is not reacting. This causes the escape
not escaping the XML fields properly.

* fixup! Reformat source
2022-06-13 13:38:17 -07:00
Xiaoxi Wang ea2edebbeb comment out store tuple 2022-06-13 13:36:19 -07:00
Zhanwei Wang e632aef1c7
Make backup work with s3 compatible service (#6355)(#6382) (#7324)
1. Support virtual hosting endpoint.

2. On-premise s3 compatible storage service may use IP instead of s3 form domain name,
especially for development/test environment.

Instead of parsing service and region from domain name,

1). Hard code "s3" as service name in v4 signature
2). Add new parameter to allow pass region name from url

3. Fix creating bucket issue on aws, adding request body.
2022-06-13 13:33:05 -07:00
Andrew Noyes 013b290ca5
Don't fail test if log cursor times out during network partition (#7330)
* Don't fail test if log cursor times out during network partition

Also, exercise the codepath for handling timed_out in simulation, by
reverting this knob buggification behavior to that of 07976993e7.

* clang-format
2022-06-13 13:28:22 -07:00
Trevor Clinkenbeard 942d687506
Clean up includes in actor header files (#7331)
* Remove unnecessary actorcompiler.h includes (from non-actor files)

* Make AsyncFileChaos a non-actor header file

* Add unactorcompiler.h include to the end of actor header files

* Add missing actorcompiler.h includes to actor header files
2022-06-13 13:26:51 -07:00
Ata E Husain Bohra a5d91fe18a
KmsConnector implementation to support KMS driven CipherKey TTL (#7334)
* KmsConnector implementation to support KMS driven CipherKey TTL

Description

KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.

Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.

Testing

1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest

* KmsConnector implementation to support KMS driven CipherKey TTL

Description

  diff-1: Set expireTS for baseCipherId indexed cache

KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.

Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.

Testing

1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest

* KmsConnector implementation to support KMS driven CipherKey TTL

Description

  diff-2: Fix Valgrind issues discovered runnign tests
  diff-1: Set expireTS for baseCipherId indexed cache

KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.

Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.

Testing

1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest

* KmsConnector implementation to support KMS driven CipherKey TTL

Description

  diff-3: Address review comment
  diff-2: Fix Valgrind issues discovered runnign tests
  diff-1: Set expireTS for baseCipherId indexed cache

KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.

Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.

Testing

1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
2022-06-13 13:25:01 -07:00
Andrew Noyes 38db712e7a
Make ASAN arena aware (#7336) 2022-06-13 13:24:02 -07:00
Andrew Noyes 207e0bc105
Fix a few places we weren't doing exponential backoff (#7349)
* Fix a few places we weren't doing exponential backoff

We re-create the transaction every iteration of each of these retry
loops, so we need to manage exponential backoff here ourselves.

Closes #7301

* Remove former Backoff definition
2022-06-13 13:18:58 -07:00
Stitch-Zhang 2275127d8c
fix(fdbkubernetesmonitor): unclosed file description (#7356)
Closing additional environment file description as soon as read it completely
2022-06-13 13:16:26 -07:00
Andrew Noyes 2a8d8a1932
Fix more heap overflows (#7360)
From calling strlen on a not necessarily null-terminated buffer.
2022-06-13 13:13:05 -07:00
Xiaoxi Wang 1de6c09307 use struct instead of tuple 2022-06-13 11:27:50 -07:00
Ray Jenkins c45abc7c32
Add TRACING_SPAN_ATTRIBUTES_ENABLED Knob, default false. (#7354)
* Add TRACING_SPAN_ATTRIBUTES_ENABLED Knob, default false.

In order to prevent accidental leakage of PII to external tracing collector services,
we've added a knob to prevent additional attributes to be added to spans unless explicitly
enabled by the user.

* Enable span attributes knob for unit tests.
2022-06-13 11:37:09 -05:00
Xiaoxi Wang c12a7a30ed
Update fdbserver/DataDistributionQueue.actor.cpp
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-06-13 08:22:48 -07:00
Xiaoxi Wang 9604db3f10
Update fdbserver/DDTxnProcessor.h
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-06-13 08:19:14 -07:00
Lukas Joswiak 3b3ef49d40 Remove unnecessary transaction initialization
`ReadYourWritesTransaction` has memory allocated before being passed to
the main thread. This allows both threads to continue to access the
transaction object. Currently, the transaction gets allocated and
initialized on the foreign thread, and then re-initialized on the main
thread. This causes a bunch of extra, unnecessary work for each
`ReadYourWritesTransaction` where the temporary object gets destructed.

The fix is to only allocate memory for the `ReadYourWritesTransaction`
on the foreign thread, and then initialize it once on the main thread.
2022-06-10 16:53:19 -07:00
Jingyu Zhou d0c5449d5c Add 7.1.8 and 7.1.9 release notes 2022-06-10 15:09:10 -07:00
Andrew Noyes 849b1cd29a
Update to the latest jemalloc release (#7362)
Also remove our patch, since the fix is already present in the new
release.
2022-06-10 14:46:21 -07:00
Steve Atherton 90bb3a7f8c
Merge pull request #7341 from sfc-gh-satherton/net2-react-perf-fix
Performance bug fix: reactor.react() is called too often.
2022-06-09 17:55:07 -07:00
Xiaoxi Wang fb66561bc4 format code 2022-06-09 14:43:09 -07:00
Xiaoxi Wang 7ee6808ebd solve compiler warning 2022-06-09 14:32:24 -07:00
Xiaoxi Wang b99bd45730 format code 2022-06-09 12:36:20 -07:00
Xiaoxi Wang e5aa5fef22 merge upstream/main 2022-06-09 12:17:27 -07:00