Commit Graph

5102 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard 840dac1fa3 Merge remote-tracking branch 'origin/main' into global-tag-throttling3 2022-06-22 22:17:33 -07:00
Lukas Joswiak 75423a100c Move shared_ptr to save a reference increment and decrement 2022-06-22 14:50:17 -07:00
Lukas Joswiak 4f2b1807e4 Use shared_ptr to track initialization across threads 2022-06-22 14:50:17 -07:00
Lukas Joswiak 1b1a9d4df5 Initialize on main thread 2022-06-22 14:50:17 -07:00
Lukas Joswiak 88557d9169 Simplify function call when transaction is null 2022-06-22 14:50:17 -07:00
Lukas Joswiak b80ed948f1 Check initialization status before accessing field 2022-06-22 14:50:17 -07:00
Lukas Joswiak 4c2bb0b44e Fix undefined behavior from accessing field of uninitialized object 2022-06-22 14:50:17 -07:00
Bharadwaj V.R 8cf2be030f
Build a TenantCache for use by DD (#7207)
* Add a DD tenant-cache-assembly actor
* Add basic tenant list monitoring for tenant cache.
* Update DD tenant cache refresh to be more efficient and unit-testable
* Remove the DD prefix in the tenant cache class name (and associated impl and UT class names); there is nothing specific to DD in it; DD uses it; other modules may use it in the future
* Disable DD tenant awareness by default
2022-06-21 16:29:30 -07:00
sfc-gh-tclinkenbeard 2391e58fb2 Merge remote-tracking branch 'origin/main' into global-tag-throttling3 2022-06-21 10:09:15 -07:00
sfc-gh-tclinkenbeard bcd8785767 Remove old debug tracing 2022-06-20 00:59:22 -07:00
Josh Slocum 34e6a8f942
Merge pull request #7399 from sfc-gh-jslocum/bg_tenant_improvements
Bg tenant improvements
2022-06-17 11:19:41 -05:00
Markus Pilman 5aacaf891c
Merge pull request #7321 from sfc-gh-ajbeamon/multiple-tenant-creation
Support creating multiple tenants in the same transaction
2022-06-17 10:10:09 -06:00
Josh Slocum 1cc466e068 fixes and review comments 2022-06-17 08:17:44 -05:00
Xiaoxi Wang 6bb4e341f9
Merge pull request #7110 from sfc-gh-xwang/features/ppw-pause-state
Adding paused/running wiggling status to status json and also the last running/paused timestamp
2022-06-16 14:27:18 -07:00
sfc-gh-tclinkenbeard 99d243197e Add fdbcli quota command 2022-06-16 14:07:16 -07:00
Xiaoxi Wang a311cc28cc solve some comments 2022-06-16 11:07:21 -07:00
Josh Slocum b3597ef3a8 Added plumbing for tenant-aware purge granules 2022-06-16 13:04:34 -05:00
sfc-gh-tclinkenbeard 0216740c0c Add /GlobalTagThrottler/Simple unit test 2022-06-15 17:21:54 -07:00
Xiaoge Su 21ee76a44d fixup! Reformat source #2 2022-06-14 13:22:18 -07:00
Xiaoge Su c2676df2f8 fixup! Reformat source 2022-06-14 13:22:18 -07:00
Xiaoge Su 9fb6e5bb05 fixup! Fix the clang error when using std::move
This patch is to fix the compile error

/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: error: moving a local
object in a return statement prevents copy elision
[-Werror,-Wpessimizing-move]
 return std::move(resource);
        ^
/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: note: remove std::move
call here
 return std::move(resource);
        ^~~~~~~~~~        ~
1 error generated.
2022-06-14 13:22:18 -07:00
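The clang diagnostic quoted in the commit above can be reproduced in isolation. The following is a minimal sketch, not the code from S3BlobStore.actor.cpp: `Resource` and `makeResource` are hypothetical names chosen for illustration. It shows the form that the compiler note asks for, where the local object is returned by name so that copy elision stays possible.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the object returned in the commit; not FoundationDB code.
struct Resource {
    std::string path;
};

// Returning the local by name lets the compiler elide the copy (NRVO),
// or fall back to an implicit move when elision is not performed.
Resource makeResource(const std::string& bucket) {
    assert(!bucket.empty());
    Resource resource{ "/" + bucket };
    return resource; // preferred form
    // return std::move(resource); // this form triggers -Wpessimizing-move under clang
}
```

With `-Werror`, the commented-out form fails the build, which matches the error text in the commit message.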
Vishesh Yadav fd6f6eb06a
Merge pull request #7364 from sfc-gh-ljoswiak/fixes/unnecessary-transaction-initialization
Remove unnecessary ReadYourWritesTransaction initialization
2022-06-14 11:02:31 -07:00
Xiaoge Su 00b805d8e0 fixup! Reformat source 2022-06-14 10:43:13 -07:00
Xiaoge Su e493f1c3cd fixup! Add a retry mechanism in changeQuorumChecker and changeQuorum
This fixes an issue that arises when recovery and a coordinator key change
happen together. The issue occurs when:

1. Recovery starts
2. Coordinator key change transaction started
3. During the recovery the coordinator key is read from cluster file and
   stored in the storage server
4. The cluster controller received `ChangeCoordiatorsRequest`, and
   updated the cluster name with the new value.

At this stage, the coordinator key value in the storage server and the one in
the worker are inconsistent.

5. changeQuorumChecker is called, which verifies this consistency.
   Since the values differ, the call returns failure and the
   caller, which could be a TEST_CASE, fails.

This is a rare race. It has also been observed that once the
recovery/coordinator key change process is done, the database is in a
proper state that allows changeQuorumChecker to behave properly. In this
case, a retry mechanism should be sufficient to fix the corresponding test
failures.
2022-06-14 10:43:13 -07:00
Renxuan Wang 839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader is 1. This is inconsistent with the other cases and with our expectation that, if two nominees have the same frequency, the one with the lower id becomes the leader.

* Remove unnecessary new statement.

stream will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not captured by default; thus, `host` and `service` are not captured. This somehow compiles successfully, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
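The tie-breaking expectation stated in the commit above (equal frequency, lower id wins) can be sketched as follows. This is an illustrative simplification, not the actual getLeader() code: `chooseLeader` is a hypothetical name and nominees are reduced to plain integer ids.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical simplification: nominees are plain integer ids.
// Returns the most frequent id; on a tie, the lower id wins.
int chooseLeader(const std::vector<int>& nominees) {
    assert(!nominees.empty());
    std::map<int, int> counts; // std::map iterates in ascending id order
    for (int id : nominees) {
        ++counts[id];
    }
    int best = -1;
    int bestCount = 0;
    for (const auto& entry : counts) {
        // Strict '>' keeps the earlier (lower) id when counts are equal.
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

Under this sketch, nominees [0,1] yield leader 0 regardless of input order, because ties are resolved by iteration order over the sorted ids rather than by position in the input.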
sfc-gh-tclinkenbeard b7fd69ed7f Add GLOBAL_TAG_THROTTLING_TRACE_INTERVAL knob 2022-06-13 16:09:21 -07:00
Zhanwei Wang e632aef1c7
Make backup work with s3 compatible service (#6355)(#6382) (#7324)
1. Support virtual hosting endpoint.

2. An on-premise S3-compatible storage service may use an IP address instead of an S3-style domain name,
especially in development/test environments.

Instead of parsing service and region from domain name,

1). Hard code "s3" as service name in v4 signature
2). Add a new parameter that allows passing the region name in the URL

3. Fix a bucket creation issue on AWS by adding a request body.
2022-06-13 13:33:05 -07:00
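To illustrate point 2 of the commit above: in AWS Signature Version 4 the credential scope has the form `<date>/<region>/<service>/aws4_request`. The sketch below hard-codes "s3" as the service and takes the region as an explicit argument instead of parsing it from a domain name. The function name `credentialScope` is hypothetical and is not taken from the FoundationDB source.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a SigV4 credential scope with the service fixed to "s3".
// `date` is expected in YYYYMMDD form; `region` is supplied by the caller
// (for example, taken from a URL parameter) rather than parsed from a hostname.
std::string credentialScope(const std::string& date, const std::string& region) {
    assert(date.size() == 8);
    return date + "/" + region + "/s3/aws4_request";
}
```

Because nothing is derived from the hostname, the same scope is produced whether the endpoint is an IP address or a domain name.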
Andrew Noyes 013b290ca5
Don't fail test if log cursor times out during network partition (#7330)
* Don't fail test if log cursor times out during network partition

Also, exercise the codepath for handling timed_out in simulation, by
reverting this knob buggification behavior to that of 07976993e7.

* clang-format
2022-06-13 13:28:22 -07:00
Trevor Clinkenbeard 942d687506
Clean up includes in actor header files (#7331)
* Remove unnecessary actorcompiler.h includes (from non-actor files)

* Make AsyncFileChaos a non-actor header file

* Add unactorcompiler.h include to the end of actor header files

* Add missing actorcompiler.h includes to actor header files
2022-06-13 13:26:51 -07:00
Andrew Noyes 207e0bc105
Fix a few places we weren't doing exponential backoff (#7349)
* Fix a few places we weren't doing exponential backoff

We re-create the transaction every iteration of each of these retry
loops, so we need to manage exponential backoff here ourselves.

Closes #7301

* Remove former Backoff definition
2022-06-13 13:18:58 -07:00
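When a retry loop re-creates its transaction on every iteration, any backoff state kept inside the transaction is lost, so the loop has to carry the delay itself. The following is a minimal sketch of such caller-managed state. The class shape, member names, and parameters are assumptions for illustration and do not reproduce FoundationDB's own Backoff definition.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper; names and parameters are illustrative only.
class Backoff {
public:
    Backoff(double initial, double maximum, double growth = 2.0)
      : next_(initial), max_(maximum), growth_(growth) {
        assert(initial > 0 && maximum >= initial && growth > 1.0);
    }

    // Returns the delay to wait before the next attempt, then grows it up to the cap.
    double nextDelay() {
        double current = next_;
        next_ = std::min(next_ * growth_, max_);
        return current;
    }

private:
    double next_;
    double max_;
    double growth_;
};
```

A loop would construct one `Backoff` outside the loop body and call `nextDelay()` after each failed attempt, so the delay keeps growing even though the transaction object is new each time.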
sfc-gh-tclinkenbeard df71a49bf6 Merge remote-tracking branch 'origin/main' into global-tag-throttling3 2022-06-13 10:03:10 -07:00
sfc-gh-tclinkenbeard 75c858eb2c Differentiate between different quotas in GlobalTagThrottling workload 2022-06-13 10:01:55 -07:00
Lukas Joswiak 3b3ef49d40 Remove unnecessary transaction initialization
`ReadYourWritesTransaction` has memory allocated before being passed to
the main thread. This allows both threads to continue to access the
transaction object. Currently, the transaction gets allocated and
initialized on the foreign thread, and then re-initialized on the main
thread. This causes a bunch of extra, unnecessary work for each
`ReadYourWritesTransaction` where the temporary object gets destructed.

The fix is to only allocate memory for the `ReadYourWritesTransaction`
on the foreign thread, and then initialize it once on the main thread.
2022-06-10 16:53:19 -07:00
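The allocate-then-initialize split described in the commit above can be sketched with raw allocation and placement new. This is a single-threaded illustration under stated assumptions: `Transaction`, `allocateUninitialized`, `initialize`, and `destroy` are hypothetical names, and the hand-off between threads is omitted.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for ReadYourWritesTransaction.
struct Transaction {
    std::string name;
    explicit Transaction(std::string n) : name(std::move(n)) {}
};

// Step 1 (the foreign thread in the commit's description): reserve raw storage only.
// No constructor runs here, so there is no temporary object to destruct later.
Transaction* allocateUninitialized() {
    void* raw = ::operator new(sizeof(Transaction));
    return static_cast<Transaction*>(raw);
}

// Step 2 (the main thread): construct exactly once in the reserved storage.
Transaction* initialize(Transaction* storage, const std::string& name) {
    assert(storage != nullptr);
    return new (storage) Transaction(name);
}

// Destroys the object and releases the storage obtained in step 1.
void destroy(Transaction* tr) {
    tr->~Transaction();
    ::operator delete(tr);
}
```

Both sides can hold the same pointer from step 1 onward, but the object must not be accessed until step 2 has run.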
Jingyu Zhou 7acd184a38
Merge pull request #7339 from jzhou77/fix-status-memory
Add rss_bytes to process memory and fix available_bytes calculation
2022-06-08 13:10:51 -07:00
Jingyu Zhou b9ff6bc129 Address AJ's comments 2022-06-08 09:38:32 -07:00
Sreenath Bodagala fe5f11358f
Merge pull request #7318 from sbodagala/main
Introduce a knob that controls the placement of remote storage server commit versions in version vector
2022-06-08 12:18:15 -04:00
Yi Wu bbf8cb4b02
GetEncryptCipherKeys helper function and misc encryption changes (#7252)
Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from the local BlobCipherKeyCache and, on a cache miss, fetching them from the EKP (encrypt key proxy). These helper actors also handle the case where the EKP is shut down midway: they listen on ServerDBInfo, wait for a new EKP to start, and send a new request there instead.

The PR also has other misc changes:
* EKP is started by default in simulation regardless of the ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switched from on to off after restart, encrypted data can still be read.
* API tweaks for BlobCipher
* Adding an ENABLE_TLOG_ENCRYPTION knob, which will be used in later PRs. The knob should normally be consistent with the ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone.

This PR is split out from #6942.
2022-06-07 21:00:13 -07:00
Sreenath Bodagala 96a88e3847 Merge remote-tracking branch 'apple-upstream/main' 2022-06-07 18:38:35 +00:00
A.J. Beamon 4f308b34fc Fix an off-by-one error in determining whether to include the entire range in the conflict ranges when a reverse range read returns early due to limit. 2022-06-07 08:52:10 -07:00
Dan Adkins bd47f390bd
Add simulation test for three_data_hall configuration (#7305)
* Add simulation test for 1 data hall + 1 machine failure case.

* Disable BUGGIFY for DEGRADED_RESET_INTERVAL.

A simulation test discovered a situation where machines attempting to connect
to a dead coordinator (with a well-known endpoint) were getting themselves
marked degraded. This flapping of the degraded state prevented recovery from
completing, as it started over any time it noticed that tlogs on degraded
hosts could be relocated to non-degraded ones.

bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956
2022-06-06 13:14:49 -07:00
Josh Slocum a3289f9cab adding tenant prefix to bg ranges call 2022-06-06 14:09:10 -05:00
A.J. Beamon 49a9a850d6 Move the lock aware option into the database version of the tenant functions 2022-06-06 09:45:14 -07:00
A.J. Beamon 90625ba20d Update the create tenant transaction to take the ID as a parameter. Generate unique IDs for multiple creations in the same transaction. Don't set lock aware options inside the tenant transaction code. 2022-06-06 09:45:14 -07:00
Sreenath Bodagala a3c6ed2e86 - Introduce a knob that will control the placement of the commit
versions of remote storage servers in version vector. This optimization
will help reduce the size of version vector in HA configuration.
2022-06-03 19:29:54 +00:00
Josh Slocum b650410a48 Merge branch 'main' into blob_granule_kms 2022-06-03 09:13:49 -05:00
Josh Slocum fcd20c479d addressing review comments 2022-06-03 08:36:07 -05:00
Hao Fu e7fa8e9f6f
Add versionstamp support in tuple (#7293)
Tuple in C++ needs to support Versionstamp.
2022-06-02 17:44:10 -07:00
Xiaoge Su c886a6efe4 Refactor the WriteMap.h
Split the interface and implementation, so adding debugging information
in the code will not trigger full rebuild.
2022-06-02 15:57:26 -07:00
Junhyun Shim 3e79735b2f
Authz JWT support (#7279)
* Add JWT support to TokenSign

* Encapsulate OpenSSL public/private key type

Type-safe passing around of keys without having to DER/PEM-serialize
(OpenSSL doesn't have distinct types for public and private key)

* Apply Clang format

* Add verify benchmark for JWT and FlatBuffers token

* Unit test base64url::{encode, decode}

* Make all payload fields optional

Let user code validate non-signature fields

* Make all payload fields optional

Completely defer field check to user code

* Move rapidjson from fdbclient to contrib

* Make fdbrpc's rapidjson linkage private

Currently only sources include them.

* Modify rapidjson path in apiversioner.py

* Algorithm::Unknown > Algorithm::UNKNOWN
2022-06-02 13:22:50 +02:00
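JWTs encode their header, payload, and signature with the URL-safe base64 alphabet from RFC 4648, without `=` padding. The sketch below shows an encoder of that kind. It is not the `base64url::encode` from the commit above, whose signature is not shown here; `base64urlEncode` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoder: RFC 4648 URL-safe alphabet, padding omitted as in JWT.
std::string base64urlEncode(const std::string& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    std::string out;
    unsigned int buffer = 0; // holds bits not yet emitted
    int bits = 0;            // number of valid bits in buffer
    for (unsigned char c : in) {
        buffer = (buffer << 8) | c;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(kAlphabet[(buffer >> bits) & 0x3F]);
        }
        buffer &= (1u << bits) - 1; // keep only the leftover bits
    }
    if (bits > 0) {
        // Left-align the remaining bits in a final 6-bit group.
        out.push_back(kAlphabet[(buffer << (6 - bits)) & 0x3F]);
    }
    // Unpadded output length is ceil(4 * n / 3).
    assert(out.size() == (in.size() * 4 + 2) / 3);
    return out;
}
```

The only differences from standard base64 are the last two alphabet characters (`-` and `_` instead of `+` and `/`) and the missing padding.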
Andrew Noyes 2e087a6ec6
Fix some spammy trace events (#7296)
* Exponential backoff for some GlobalConfig retry loops

* Fix incorrect usage of random01() <= p idiom
2022-06-01 16:49:25 -07:00