2018 Commits

Author SHA1 Message Date
Markus Pilman 0838bfcfa2 Allow workloads to log errors when test times out 2023-03-06 17:36:26 -07:00
Ata E Husain Bohra 2db1da26d9
EaR: Update ApiWorkload to validate encryption at-rest guarantees (#9466)
* EaR: Update ApiWorkload to validate encryption at-rest guarantees

Description

FDB encryption at rest guarantees that, if the cluster is configured with the
feature enabled, all data written to persistent disks is encrypted. Because FDB
maintains multiple persistent stores during the lifecycle of the data, this patch
proposes a scheme to validate the invariant via simulation testing.

The patch updates the ApiCorrectness workload to do the following:
1. Enable the validation feature via client-supplied params and/or at random.
2. When validation is enabled, inject a known "marker string"
into the workload-generated Key and Value data patterns.
3. On shutdown, if validation is enabled, scan all test files
for the known "marker" pattern.
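
The shutdown scan in step 3 can be sketched roughly as follows. This is a minimal illustration in Python, not the workload's actual code: the marker value, the function names, and the chunked-read approach are all assumptions made for the example.

```python
import os

# Hypothetical marker; the real workload's marker string is not shown in this log.
MARKER = b"PLAINTEXT_MARKER"


def _file_contains(path, marker, chunk_size):
    """Return True if `marker` occurs anywhere in the file at `path`."""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last len(marker)-1 bytes so a marker that straddles
            # two chunks is still detected on the next iteration.
            tail = window[-(len(marker) - 1):] if len(marker) > 1 else b""


def files_containing_marker(root, marker=MARKER, chunk_size=1 << 16):
    """Walk `root` and list every file in which the plaintext marker is visible."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if _file_contains(path, marker, chunk_size):
                hits.append(path)
    return sorted(hits)
```

Under this sketch, an empty result means no file exposed the marker in plaintext; any returned path would indicate data that reached disk unencrypted.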

Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)

Hence, the updated test validates all possible combinations. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.

Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
2023-02-27 21:40:46 -08:00
Jingyu Zhou 842d485862
Merge pull request #9402 from yao-xiao-github/main
Add shard consistency validation.
2023-02-27 17:05:30 -08:00
Russell Sears bcc05b1058 Improve support for prebuilt boost 2023-02-27 15:38:58 -06:00
Junhyun Shim b811881f41
Allow unthrottled, unsuppressed traces for security-related events (#9459)
* Define API for unsuppressable TraceEvent types

Add trace checking tests for authz trace events

* Revert temporary configurations used for debugging

* Simplify/Modernize flow audit logging API

- Do event type whitelist checks at compile time
- Use ""_audit literal API instead of a tag struct
- Replace int with a lightweight struct for tracking/modifying TraceEvent enablement

* Revert installing signal handler for SIGTERM and refactor test script

Move trace checker to local_cluster.py

* Lengthen public key refresh interval and add more audited events

* Try and make MSVC and Mac build happy

* consteval > constexpr

'inline consteval' still causes link errors in Mac builds
2023-02-27 21:51:13 +01:00
Jingyu Zhou 9a257a60a4 Address review comments 2023-02-24 10:47:32 -08:00
Jingyu Zhou 0b2e02c402 Fix rare test failures
Unclog after DB is recovered, otherwise another recovery may become stuck again.
2023-02-23 15:42:33 -08:00
Jingyu Zhou 65443b6541 Fix compiling errors 2023-02-23 15:02:44 -08:00
Jingyu Zhou ecae81882c Change to only clog once for a particular tlog
If we repeat clogging, different tlogs may be excluded, which can cause the
recovery to get stuck.
2023-02-23 14:31:39 -08:00
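
One reading of "only clog once for a particular tlog" is that the workload remembers its first choice and never switches to a different tlog. The sketch below illustrates that idea only; the class name, method name, and tlog identifiers are hypothetical and do not come from the ClogTlog workload.

```python
import random


class ClogOnce:
    """Remembers the first tlog chosen for clogging and never picks another."""

    def __init__(self, rng):
        self.rng = rng
        self.clogged = None  # fixed after the first pick

    def pick(self, tlogs):
        # Choose only on the first call; later calls return the same tlog
        # even if the candidate set has changed in the meantime.
        if self.clogged is None:
            self.clogged = self.rng.choice(sorted(tlogs))
        return self.clogged
```

Because the choice is sticky, repeated rounds cannot spread the clog across several tlogs, which is the situation the commit message says can leave recovery stuck.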
Jingyu Zhou 955826f2fe Add ClogTlog workload 2023-02-23 14:31:12 -08:00
Junhyun Shim 2497aa5701
Clamp GetKeyServerLocations result to tenant prefix (#9424) 2023-02-21 18:55:24 +01:00
Yao Xiao a3b2324816 add validation 2023-02-16 13:52:27 -08:00
Junhyun Shim d9c126a2d9
Introduce WipedString for Arena block holding AuthZ tokens (#9381)
* Enable secure allocation mode in Arena

This mode allows zeroing out blocks holding sensitive data after use

* Introduce WipedString to all token-holding memory

Also introduce an option flag "sensitive"

* Make pointer equivalency a hard requirement for non-ASAN builds

So that we can detect when Arena/malloc/memory-wipe behavior changes
2023-02-16 10:44:32 +01:00
Jingyu Zhou 622520bd2d Return the source team if remote DC is dead
Also refactor the code with findTeamFromServers().
2023-02-10 11:11:07 -08:00
Jingyu Zhou 6c4a9b5f23 Fix DD stuck when remote DC is dead
When the remote DC is down, the remote team collection of DD can be initializing,
waiting for the remote to recover (all_tlog_recruited state). However, the
getTeam request can already be served by the remote team collection. So, for
a RelocateShard (data movement such as split, move), it will get a team for
the remote DC. But the data movement can't make progress on the remote team
because the remote DC hasn't recovered yet. Because the data movement is
stuck, the primary cannot reach the "storage_recovered" state and stays in
the accepting_commit state.

The specific test failure: slow/ApiCorrectness.toml -s 339026305 -b on
at commit: 0edd899d65

In this test, the primary DC has 1 SS killed, and the remote DC has 2 TLogs and
2 SSes killed. So the remote is dead: the remaining 2 SSes can't make progress
because of the loss of 2 TLogs. repairDeadDatacenter() can't reach the
"storage_recovered" state because DD fails to move shards away from the killed
SS in the primary.

The fix is to exclude all of the remote in repairDeadDatacenter(), which tells DD
to mark all SSes in the remote as unhealthy. Another fix is to return empty
results for getTeam requests if the remote team collection is not ready. This
allows the data movement to continue; essentially, the remote team is not changed
for the data movement.
2023-02-10 11:11:07 -08:00
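
The second fix described above (empty getTeam results while the remote team collection is not ready, so the remote team stays unchanged) can be modeled in a few lines. This is an illustrative Python sketch, not DD's implementation; `TeamCollection`, `get_team`, and `choose_remote_destination` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TeamCollection:
    """Toy stand-in for a DD team collection."""
    ready: bool
    teams: list = field(default_factory=list)

    def get_team(self) -> Optional[list]:
        # A collection that is still initializing returns an empty result
        # instead of handing out a team that cannot make progress.
        if not self.ready or not self.teams:
            return None
        return self.teams[0]


def choose_remote_destination(source_remote_team, remote):
    """Keep the current remote team when the remote collection has no answer."""
    team = remote.get_team()
    return team if team is not None else source_remote_team
```

With an unready remote collection, the relocation keeps its existing remote team and can proceed on the primary side.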
Josh Slocum 1e5bac6238
fixing fault injection stalling change feed fetch (#9316) 2023-02-08 15:49:56 -06:00
Junhyun Shim 21651bdd6d Log failed security verifications from server-side
Log request types related to private endpoint access and failed security verifications
2023-02-07 18:51:25 +01:00
Junhyun Shim be225acd2a Merge remote-tracking branch 'origin/main' into authz-tenant-name-to-tenant-id 2023-02-06 23:13:43 +01:00
Junhyun Shim 1afd63d7e3 Minimize the risk of TracedTooManyLines in simulation
- Disable audit logging for simulation
- Relax the max_trace_lines knob limit to reduce false positives
2023-02-06 21:50:39 +01:00
Xiaoxi Wang 7190fa0c08 Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/testTimeout 2023-02-03 13:48:54 -08:00
Xiaoxi Wang b757e8914a fix BOOST_SYSTEM_NO_LIB redefinition in CI 2023-02-03 13:47:50 -08:00
Junhyun Shim ce652fa284 Replace AuthZ's use of tenant names in token with tenant ID
Also, to minimize audit log loss, handle token usage audit logging at each usage.
This has a side-effect of making the token use log less bursty.
This also subtly changes the dedup cache policy.
Dedup time window used to be 5 seconds (default) since the start of batch-logging.
Now it's 5 seconds from the first usage since the closing of the previous dedup window
2023-02-03 21:46:31 +01:00
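
The new dedup policy (a 5-second window that starts at the first usage after the previous window has closed) can be illustrated with a small model. This is a sketch under assumptions: the class and method names are hypothetical, timestamps are passed in explicitly, and the real cache may differ, for example by tracking windows per token.

```python
class DedupWindow:
    """Suppresses repeat log entries for a token within one dedup window."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self.window_start = None  # set by the first usage after a window closes
        self.seen = set()

    def should_log(self, token_id, now):
        # The window opens lazily at the first usage, not at a batch boundary.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.seen = set()
        if token_id in self.seen:
            return False
        self.seen.add(token_id)
        return True
```

For example, a token used at t=0, t=1 and t=7 is logged at t=0 and t=7: the first window covers [0, 5), and the usage at t=7 opens a new window that lasts until t=12.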
Vaidas Gasiunas f8b1da8bc6
An option to initialize client tracing in setupNetwork (#9209)
* client_config_tester: use a generic mechanism to set specific network options

* trace_initialize_on_setup option to initialize client traces on network setup without local IP address

* trace_initialize_on_setup: Addressing review comments

* Restore correct formatting

* trace_initialize_on_setup: Update go bindings

* Include PID for identification into trace file names by default

* Use the same naming pattern for trace files in all configurations

* Empty commit
2023-02-02 10:00:51 +01:00
Jingyu Zhou e96adfa449 Fix excessive killing for HA configuration
In the HA configuration, it's possible that 2 out of 3 machines in the remote DC
were killed, leaving not enough machines for a successful recovery. So this PR
switches to Reboot to avoid such excessive killings.
2023-02-01 15:16:10 -08:00
Josh Slocum 1b4753a4d4
Fix chunked reads (#9246)
* removing chunked read loop

* reducing memory overhead of async file block cache by freeing some blocks during read if no longer needed
2023-01-30 13:43:24 -06:00
Chaoguang Lin 4c5cbe6cda Merge branch 'main' of github.com:apple/foundationdb into fix-nightly-failure 2023-01-25 18:43:37 -08:00
Chaoguang Lin fce9490c19 A Fix from Evan 2023-01-25 15:55:24 -08:00
Xiaoge Su 348e49d84f Update the license in the files 2023-01-24 15:12:31 -08:00
Xiaoge Su eb4e147ebf Reformat source 2023-01-24 15:06:27 -08:00
Xiaoge Su 85d64ce8fe Remove dependency of IConnection.h from simulator.h 2023-01-24 14:48:58 -08:00
Xiaoge Su c54e3fc78f Remove dependency of IUDPSocket.h from simulator.h 2023-01-24 14:48:53 -08:00
Xiaoge Su 0a60142160 Extract ProcessInfo, MachineInfo, KillType out from ISimulator 2023-01-24 14:48:42 -08:00
Xiaoge Su 50de69c897 Extract IConnection and NetworkAddress out from network.h 2023-01-24 14:48:31 -08:00
Xiaoge Su c11c48fa3f Extract ChaosMetrics out from network.h 2023-01-24 14:47:48 -08:00
Xiaoge Su 3f03a6b12d Extract out IPAddress and IUDPSocket 2023-01-24 14:47:39 -08:00
Xiaoge Su 890e578fe9 Extract TaskPriority to its own header file 2023-01-24 14:41:13 -08:00
Xiaoge Su a42f00da86 Do not include <boost/asio.hpp> in network.h 2023-01-24 14:41:03 -08:00
Xiaoge Su b14ae1a6a2 Extern template to reduce object instantiation
Wall time: 939.781
Clang time: 13212.0
2023-01-23 21:07:51 -08:00
Zhe Wu d578ace1f0
Merge pull request #9195 from halfprice/zhewu/log-ping-latency
Log PingLatency when there is no ping latency samples, but ping attempts
2023-01-23 09:34:29 -08:00
Jingyu Zhou ab06bf3b20
Merge pull request #9157 from sfc-gh-tclinkenbeard/drop-udp-packets-more
Drop UDP packets more frequently in simulation
2023-01-23 09:29:59 -08:00
Zhe Wu bb62ff94fa Log PingLatency when there is no ping latency samples, but ping attempts 2023-01-20 18:35:24 -08:00
A.J. Beamon b10d1f227b Remove tenant name from the TenantInfo object 2023-01-20 14:04:43 -08:00
Nim Wijetunga 82c92abca2
Change Noisy Sev30s to Sev20s (#9180)
change noisy sev30 traces to sev20
2023-01-19 14:06:34 -08:00
sfc-gh-tclinkenbeard b2222a5249 Remove rare annotations from Token code 2023-01-18 11:39:02 -08:00
Junhyun Shim e3c3922cc5
Merge pull request #9114 from sfc-gh-vgasiunas/vgasiunas-client-status-report
API for Client Status Report
2023-01-17 11:49:34 +00:00
sfc-gh-tclinkenbeard 986c792a9f Drop UDP packets more frequently in simulation 2023-01-15 17:32:57 -08:00
Xiaoxi Wang f0fd84e9f1 merge upstream/main; solve conflicts 2023-01-12 13:49:19 -08:00
Mohamed Oulmahdi dcea8a0ac7 Restart joshua 2023-01-11 12:12:16 -06:00
Mohamed Oulmahdi b2eba6956c Add tokensign dependency for Windows 2023-01-11 12:12:16 -06:00
Vaidas Gasiunas 8d734fba85 get_client_status: report database connection status 2023-01-11 17:43:35 +01:00