Commit Graph

25385 Commits

Author SHA1 Message Date
A.J. Beamon 3ac7e17b79 Fix create tenant usage in tenant management workload 2023-02-23 15:08:52 -08:00
Jingyu Zhou 65443b6541 Fix compiling errors 2023-02-23 15:02:44 -08:00
Jingyu Zhou 40c7cfec0c Move ClogTlog.toml to rare 2023-02-23 14:31:47 -08:00
Jingyu Zhou ecae81882c Change to only clog once for a particular tlog
If we repeat clogging, different tlogs may be excluded, which can cause the
recovery to get stuck.
2023-02-23 14:31:39 -08:00
Jingyu Zhou 6055f752c2 Exclude failed tlog if recovery is stuck for more than 30s
Because the tlog is clogged, recovery can get stuck in initializing_transaction_servers.
This exclusion allows the recovery to complete.
2023-02-23 14:31:32 -08:00
Jingyu Zhou c4773b7cc8 Update clogTlog workload to be single region 2023-02-23 14:31:24 -08:00
Jingyu Zhou 955826f2fe Add ClogTlog workload 2023-02-23 14:31:12 -08:00
Jingyu Zhou 1f1dc5e768 Allow a comma separated list of excluded addresses 2023-02-23 14:29:08 -08:00
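As an illustration of what accepting a comma-separated list of excluded addresses can involve, here is a minimal Python sketch. It is not the fdbcli implementation: the function name `parse_excluded_addresses`, the IPv4-only restriction, and the optional `:port` suffix are all assumptions made for this example.

```python
import ipaddress


def parse_excluded_addresses(value):
    """Split a comma-separated string into (ip, port) tuples.

    Hypothetical helper, not part of fdbcli. Assumes IPv4 addresses with an
    optional ":port" suffix; port is None when it is omitted.
    """
    result = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        host, sep, port = item.partition(":")
        ipaddress.IPv4Address(host)  # raises ValueError on a malformed address
        result.append((host, int(port) if sep else None))
    return result
```

For example, `parse_excluded_addresses("10.0.0.1:4500, 10.0.0.2")` yields one tuple with a port and one without, while a malformed entry raises `ValueError` instead of being silently skipped.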
Jingyu Zhou 6ac8720364 Add exclude to fdbcli's configure command
Right now this allows only one server address to be excluded. This is useful
when the database is unavailable but we want the recruitment to skip some
particular processes.

Manually tested that the concept works with a loopback cluster.
2023-02-23 14:28:20 -08:00
Markus Pilman c1f80fe471 Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli 2023-02-23 15:16:14 -07:00
Jingyu Zhou 792950dbdc
Merge pull request #9434 from sfc-gh-huliu/splitmetrics
Implement SplitMetric pagination in blob migrator
2023-02-23 14:10:27 -08:00
Markus Pilman 1862e65415 Fix build issue with awssdk_target 2023-02-23 15:05:17 -07:00
Markus Pilman 8759fd8f12 Fix refactoring mistake 2023-02-23 14:41:27 -07:00
A.J. Beamon b828f3f257 Add missing change to explicit TenantMapEntry conversion 2023-02-23 13:38:04 -08:00
A.J. Beamon 54955d54f2 Don't allow repopulating from a management cluster if there is another ID registered for the same cluster. Instead, the cluster must be unregistered first before repopulating from it. Also improves a trace event. 2023-02-23 13:28:10 -08:00
A.J. Beamon c2d28377af Set the restore ID in the data cluster after marking the cluster restoring in the management cluster 2023-02-23 13:28:10 -08:00
A.J. Beamon 6adccdafa9 Add a conflict range on the active restore ID when setting it 2023-02-23 13:28:10 -08:00
A.J. Beamon 537834ef00 Properly initialize API version of simulated MVC clusters when calling openDatabase 2023-02-23 13:28:10 -08:00
A.J. Beamon 06fe00544a Remove TenantMapEntry <-> MetaclusterTenantMapEntry conversion constructors and use named functions instead 2023-02-23 13:28:10 -08:00
A.J. Beamon dcae48cbbd Add concurrent restore testing to the metacluster restore workload 2023-02-23 13:27:20 -08:00
Markus Pilman 193e517cc4 Address review comments and move lock ID into TenantMapEntry 2023-02-23 14:25:36 -07:00
A.J. Beamon e151a2d363
Merge pull request #9451 from sfc-gh-ajbeamon/metacluster-management-workload-restore-support
Improve restore support in the metacluster management workload
2023-02-23 13:10:31 -08:00
Xiaoge Su 4c9c357d2c Change the storage directory to joshua/ensembles/results/applications/simulation_logs 2023-02-23 13:05:20 -08:00
Markus Pilman efc5bf9ee8
Merge pull request #9456 from sfc-gh-ajbeamon/smaller-tenant-in-txn-state-store
Store a smaller tenant object in the txn state store
2023-02-23 14:00:12 -07:00
A.J. Beamon 9e9a31c0f1 Use error variable consistently 2023-02-23 11:27:53 -08:00
Xiaoge Su 6408208c38
Update contrib/joshua_logtool.py
Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>
2023-02-23 10:49:11 -08:00
Evan Tschannen cf3a4e6161 Merge branch 'main' into feature-change-feed-cache 2023-02-23 10:16:13 -08:00
Jingyu Zhou 3d8b8a2a05
Merge pull request #9450 from sfc-gh-ahusain/ahusain-misc-fixes
EaR: RESTClient and EKP changes to handle unreachable external KMS
2023-02-23 10:04:12 -08:00
Evan Tschannen a581a55452 ensure a worker cannot run multiple blob worker roles 2023-02-23 09:51:26 -08:00
A.J. Beamon dd650215d4 Store a smaller tenant object in the txn state store 2023-02-23 09:29:33 -08:00
Vaidas Gasiunas 402f618180
Default transaction options for report_conflicting_keys and used_during_commit_protection_disable (#9441)
* Introducing default transaction options for report_conflicting_keys and used_during_commit_protection_disable, set the latter option always in Java bindings

* Reformatting TransactionIntegrationTest.java

* Update description of transaction_report_conflicting_keys option

* Remove dependency between mock and real database implementation in RangeQueryTest.java

* Update generated.go after changing description of an option

* Small improvements of the TransactionIntegrationTest code
2023-02-23 18:05:01 +01:00
Markus Pilman 16da8022f2
Merge pull request #9431 from sfc-gh-mpilman/features/testharness-include-config-string
Include configure string in test harness output
2023-02-23 10:02:36 -07:00
Xiaoge Su 28598a3100 fixup! Add the joshua_logtool.py 2023-02-22 18:11:36 -08:00
Ata E Husain Bohra 7d079690d4 Merge branch 'main' into ahusain-misc-fixes 2023-02-22 18:11:11 -08:00
Xiaoge Su 282f681d13 fixup! Address comments 2023-02-22 18:11:07 -08:00
A.J. Beamon a76af5c696 Improve restore support in the metacluster management workload 2023-02-22 17:42:36 -08:00
Ata E Husain Bohra 1f7ee9437f EaR: RESTClient and EKP changes to handle unreachable external KMS
Description

The two major changes proposed are:

I) Used the following setup for testing:
1. Run `fdbserver` locally.
2. Run a mock Python-based HTTP server (encryption endpoints not implemented)

The expectation was that the RESTClient code would loop, trying to establish a
connection to the desired encryption endpoint. However, the code was observed to
loop for one cycle and then SEGV on the following cycle while printing a log
using a RESTUrl object obtained as a 'pointer' from the caller. Update the code
to use a RESTUrl object instead of the pointer.

II) In the above setup, KMSConnector would throw an 'encrypt_key_fetch_failed'
error that wasn't handled by EKProxy, causing the service to terminate. Add
code to re-throw the error to the caller.

Testing
2023-02-22 17:15:34 -08:00
Ankita Kejriwal 64ac92bd4b Improve comments as per review 2023-02-22 17:13:44 -08:00
A.J. Beamon 9b906d9b3d
Merge pull request #9447 from sfc-gh-ajbeamon/metacluster-restore-fixes
Metacluster restore fixes
2023-02-22 17:07:19 -08:00
Hui Liu 0fba65a3cd Implement SplitMetric pagination in blob migrator 2023-02-22 16:00:49 -08:00
Ankita Kejriwal 8aafbfe6cc Improve space estimation in checkExclusion() 2023-02-22 15:58:25 -08:00
Markus Pilman 46dd75ed06 Fix compilation issues after merging with main 2023-02-22 16:51:16 -07:00
Markus Pilman 230170a431 Merge remote-tracking branch 'sfc/features/tenant-lock-fdbcli' into features/tenant-lock-fdbcli 2023-02-22 16:29:28 -07:00
Markus Pilman 7d7ca0bb34
Apply suggestions from code review
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-02-22 16:28:53 -07:00
Markus Pilman b1762916fb Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli 2023-02-22 16:27:24 -07:00
A.J. Beamon 87cb21be06
Merge pull request #9310 from sfc-gh-mpilman/features/tenant-lock2
Tenant Lock
2023-02-22 15:18:14 -08:00
A.J. Beamon 33431f062d Add some trace events, use a more appropriate error, and improve a check of allocated tenant groups 2023-02-22 14:39:51 -08:00
Markus Pilman 4e31cd7582 Fix compilation error due to TenantLockState being moved to a different namespace 2023-02-22 15:21:47 -07:00
Xiaoge Su ce54bca2bd fixup! Escalate log level from DEBUG to INFO for less pollution 2023-02-22 13:16:52 -08:00
Xiaoge Su 242962aed3 Provide a tool that allows downloading logs on simulation RocksDB failures
A script, rocksdb_logtool.py, is available to upload/download generated
XML/JSON log files when test harness 2 detects that the test has failed,
and the script detects that the test is using RocksDB.

To upload, no action is needed beyond the regular

```
joshua start --tarball correctness.tgz
```

The build system will automatically pack rocksdb_logtool.py into the
tarball, and test harness 2 will call the script if it determines that the
test has failed.

To download, simply provide the ensemble id and test uid, e.g.

```
python3 rocksdb_logtool.py download \
    --ensemble-id 20230222-204240-xiaogesu-cb6ea277a898f134 \
    --test-uid ab6fb792-088f-49d6-92d2-43bc4fb81668
```

Note that the test UID can be retrieved from the output of

```
joshua tail ensemble_id
```

It is in the `TestUID` field of the <Test> element in the XML generated by
test harness 2.

For convenience, it is possible to do a

```
python3 rocksdb_logtool.py list --ensemble-id ensemble-id
```

to generate all possible download commands for failed tests. However,
the list subcommand will *NOT* verify whether the test failure comes from
RocksDB, i.e. other test failures may be included and it is the caller's
responsibility to verify. If the test is not RocksDB related, the
download will fail because nothing was uploaded.
2023-02-22 13:15:57 -08:00
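The commit message above says the `list` subcommand generates one download command per failed test. A minimal Python sketch of that idea follows. It is not the code in rocksdb_logtool.py: the function name `build_download_commands` is hypothetical, and the caller is assumed to already have the ensemble id and the failed test UIDs. Only the command line shape (`download --ensemble-id ... --test-uid ...`) is taken from the message.

```python
import shlex


def build_download_commands(ensemble_id, test_uids, script="rocksdb_logtool.py"):
    """Return one shell command string per test UID.

    Hypothetical helper for illustration. Like the `list` subcommand described
    above, it does not check whether a failure is actually RocksDB related.
    """
    commands = []
    for uid in test_uids:
        argv = [
            "python3", script, "download",
            "--ensemble-id", ensemble_id,
            "--test-uid", uid,
        ]
        # shlex.join quotes arguments so the line is safe to paste into a shell
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in build_download_commands(
        "20230222-204240-xiaogesu-cb6ea277a898f134",
        ["ab6fb792-088f-49d6-92d2-43bc4fb81668"],
    ):
        print(line)
```

Building the command as an argument list and joining it at the end keeps the quoting correct even if an id ever contains shell-special characters.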