Commit Graph

5324 Commits

Author SHA1 Message Date
sramamoorthy b17ad85497 exec op not supported when log_anti_quorum > 0 2019-05-28 22:07:46 -07:00
sramamoorthy 3aa848b8af minor bug in whitelist binary path testing 2019-05-28 22:07:46 -07:00
sramamoorthy c906da1f62 simulator: spawnProcess to wait for long duration
spawnProcess was waiting for 3 seconds and then terminating
the child process for synchronous calls, but in the
simulator this can lead to non-determinism, because in
some cases the command finishes in under 3 seconds and in
others it takes longer. The fix is to make the wait
duration very long, so that the call either waits
synchronously for the results or the test times out.
2019-05-28 22:07:46 -07:00
sramamoorthy 31b6c86650 ignorePopDeadline to have high limit in simulator
- ignorePopDeadline to have a higher limit in the simulator
to accommodate the buggify delays and make the snapshot succeed.

- introduce a new knob for auto resetting the disabling of tlog pop
2019-05-28 22:07:46 -07:00
sramamoorthy 40358e1dd6 limit of getRange in snapTest reduced
With CLIENT_KNOBS->TOO_MANY in snapTest, by the time getRange
gathers all the results, the storage server's oldest version has
gone past the req->version and hence the transaction fails with
transaction_too_old
2019-05-28 22:07:46 -07:00
sramamoorthy b1b96946af logData->stop check right after execOpHold wait 2019-05-28 22:07:46 -07:00
sramamoorthy 5749e220bd use FlowLock for implementing critical section
Instead of using Promises and Futures to implement the
critical section, use FlowLock
2019-05-28 22:07:46 -07:00
sramamoorthy e6c0b87a4d remove unused variable 2019-05-28 22:07:46 -07:00
sramamoorthy dcb99c5138 txn to disable tlogPop to be timedout
If the disable tlog pop txn takes more than 30 seconds, the
tlog will automatically re-enable pop. The fix is to time out
the txn if it takes more than 10 seconds and retry with a new
txn to disable tlog pop.
2019-05-28 22:07:46 -07:00
sramamoorthy b56d8e648f bp::child->wait_for does not give correct err code
boost::process::child->wait_for does not give the error code
from the process being run. Rearrange the code to work around
it.
2019-05-28 22:07:46 -07:00
sramamoorthy f27a40f118 execProcessingHelper made synchronous
tLogCommit expects no blocking between the duplicate check and
the setting of the new version; that constraint was broken
when the asynchronous execProcessingHelper was introduced.
As a fix, execProcessingHelper was made synchronous.
2019-05-28 22:07:46 -07:00
sramamoorthy ceac68c990 restore - remove empty snapdir, snap loop retry fix
- remove partially snapped directories to avoid no cluster file assert
- snap create to retry max 3 times for not_fully_recovered and keep
  retrying for the other failures
2019-05-28 22:07:46 -07:00
sramamoorthy d3a179b6f9 Multiple bug fixes
- wait for snapTLogFailKeys in a loop, otherwise in some race
  condition it can cause a false assert
- in single region, there does not seem to be a guarantee of
  tagLocalityListKey for a given DC ID, avoiding that assert for now
- to find the workers that are coordinators, looking up by primary
  address is not sufficient in some cases, hence looking up by both
  primary and secondary address
- test make files to reflect the location of the new test cases
2019-05-28 22:07:46 -07:00
sramamoorthy bb474dc323 if recovery < fully_recovered then fail the exec
Will do more cleanup, pushing it for a test run in CI
2019-05-28 22:07:46 -07:00
sramamoorthy 16fc7b6aaa move SnapTests into restarting/from_6.2.0 2019-05-28 22:07:46 -07:00
sramamoorthy c53c4fa898 reduce the snap test durations 2019-05-28 22:07:46 -07:00
sramamoorthy 925499954b New status cluster_not_fully_recovered 2019-05-28 22:07:46 -07:00
sramamoorthy 591ff96b93 increase retry and use eat instead of parsing 2019-05-28 22:07:46 -07:00
sramamoorthy 6f42337c09 TransactionNotPermitted instead of conflict error
When the cluster has not recovered completely, return op not
permitted instead of conflict error
2019-05-28 22:07:46 -07:00
sramamoorthy dcd2d96751 make spawnProcess predictable in the simulator 2019-05-28 22:07:46 -07:00
sramamoorthy 4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy 936ffc2dde rebase related changes 2019-05-28 22:07:46 -07:00
sramamoorthy d68a229772 makefile changes to accommodate boost/process.hpp 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code reorganization and address comments 2019-05-28 22:07:46 -07:00
sramamoorthy b6e037ffbc Replace fork with boost::process::child 2019-05-28 22:07:46 -07:00
sramamoorthy c76cc84ded execute coordinators code reorganized 2019-05-28 22:07:46 -07:00
sramamoorthy e91c76834e tlog: move snap create part to independent funcs 2019-05-28 22:07:46 -07:00
sramamoorthy 61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 9e3104c2d4 Fix: races in async exec leading to bad backup 2019-05-28 22:07:46 -07:00
sramamoorthy 858604b51d minor cleanups to SnapTest 2019-05-28 22:07:46 -07:00
sramamoorthy 00ccee8a6c workaround for log giving remote log and others
logSystemConfig.allLocalLogs() sometimes returns remote TLog interface
and a workaround is implemented here. Other minor cleanup.
2019-05-28 22:07:46 -07:00
sramamoorthy 8838ba3d3b Split SnapTestSimpleRestart into two test cases 2019-05-28 22:07:46 -07:00
sramamoorthy 090bb53034 ShardInfo::addMutation to handle exec mutation 2019-05-28 22:07:46 -07:00
sramamoorthy cfdad0c5e6 tlog to snapshot exactly at exec version 2019-05-28 22:07:46 -07:00
sramamoorthy 89b7a052f5 Bug fixes for snapping coordinators 2019-05-28 22:07:46 -07:00
sramamoorthy 539e65efad Skip parsing mutations if it is tagged for TxsTag
In the TLog, if a mutation is targeted for TxsTag, skip
parsing it.
2019-05-28 22:07:46 -07:00
sramamoorthy 8370871e4c stale RESTORE option related code removed 2019-05-28 22:07:46 -07:00
sramamoorthy 17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy 898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
sramamoorthy aa79480d69 changes to make fdbfork asynchronous 2019-05-28 22:07:46 -07:00
sramamoorthy c4d27ac9d2 bug fixes in SnapTest
Earlier the test was checking for the following condition:
durable version of storage > min version of tlog, but the
check has been modified to:
durable version of storage >= min version of tlog - 1.

Ensure that the pre-snap validate keys are exactly 1000 in
the case of commit retries.
2019-05-28 22:07:46 -07:00
sramamoorthy f129d996fe Remove dumpAfterTest=true in snap tests 2019-05-28 22:07:46 -07:00
sramamoorthy d282016f93 Exec op to tag only local storage nodes 2019-05-28 22:07:46 -07:00
sramamoorthy a60145b9a1 Restore the cluster in single region configuration 2019-05-28 22:07:46 -07:00
sramamoorthy f7ba0635ef Make Exec op the first op in the batch 2019-05-28 22:07:46 -07:00
sramamoorthy 382b246930 trace change and retain fitness file after restore 2019-05-28 22:07:46 -07:00
sramamoorthy 281c785f94 '--restoring' cmd line arg removed for fdbserver
The '--restoring' command line option was introduced to tell the
simulated fdbserver to restore from a snapshot and restart the cluster.
As part of this change, that option is removed and the restore
information is stored in restartInfo.ini.
2019-05-28 22:07:46 -07:00
sramamoorthy 6431513ad0 Fail exec req until the cluster is fully_recovered 2019-05-28 22:07:46 -07:00
sramamoorthy 4016f16c76 Fix a few compilation errors and bugs from the rebase 2019-05-28 22:07:46 -07:00
sramamoorthy 3d5998e9dd tlog: when pops are disabled, store them & replay
In TLogs, pops are disabled while taking snapshots. Earlier, TLogs
ignored any pop requests that arrived while pops were disabled.
With this change, instead of ignoring such a pop, the TLog remembers
the list of pops in memory and replays them once popping is
re-enabled.
2019-05-28 22:07:46 -07:00