Commit Graph

118 Commits

Author SHA1 Message Date
Steve Atherton d169875423 Add a knob for whether to allow guard pages in memory allocations done via mmapInternal(). The knob defaults to false. 2022-10-16 19:55:07 -07:00
Andrew Noyes da6cf36149 Only register SIGTERM with crashHandler for USE_GCOV
Otherwise SIGTERM (and rightly so) causes TSAN to lose its noodle when
fdbmonitor kills fdbserver in the normal way.
2022-10-07 13:42:23 -07:00
A.J. Beamon e1fe28b78b Switch some usages of LiteralStringRef to use the _sr suffix 2022-09-30 16:04:16 -07:00
Andrew Noyes fee77c8702
Change crashAndDie to abort instead of segv (#7827)
* Change crashAndDie to abort instead of segv

This way we still terminate early when `--crash` is set, but folks
inspecting error reports don't get mislead and confuse assertion
failures with genuine memory errors.

* Add SIGABRT to crash handler
2022-09-23 11:06:53 -07:00
Xiaoge Su 970463223c
Merge branch 'main' into main 2022-09-20 16:56:56 -07:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Xiaoge Su 3a9bf10d81 fixup! Try fix Windows build 2022-09-12 19:35:55 -07:00
Xiaoge Su 6b27f10a6c fixup! Fix windows compile 2022-09-12 14:44:38 -07:00
Markus Pilman 59ce49913a
Merge pull request #8146 from sfc-gh-tclinkenbeard/improve-code-coverage
Increase the number of unit tests run in `RandomUnitTests.toml`
2022-09-12 15:10:47 -06:00
Xiaoge Su 5996643f13 fixup! Fix the include file 2022-09-12 12:01:05 -07:00
Xiaoge Su 8130fce97f Update code per comments
Also sort #include in Platform.actor.cpp
2022-09-12 11:44:41 -07:00
sfc-gh-tclinkenbeard 5dbbd73879 Fix typo in __eraseDirectoryRecursiveCount variable name 2022-09-11 00:36:13 -07:00
Yi Wu d831c87d14
Add encryption metrics (#8070)
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* For each component that using encryption, they now need to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt) and those methods will help to log get cipher key latency samples and encryption/decryption cpu times accordingly.
2022-09-09 18:43:09 -07:00
Andrew Noyes fbf5830bb2 Rollback #7374
Valgrind is complaining about use of uninitialized memory in
absl::GetStackTrace, and the cases where it complains the backtraces are
incomplete. Note: this means that jemalloc heap profiling no longer
works out of the box. Advanced users who want to enable jemalloc heap
profiling will now have to revert this change and build from source.
2022-09-06 16:55:18 -07:00
Junhyun Shim 71486cccb1
Make envvar access in envvar-knob parsing cross-platform (#7999)
* Make envvar access in envvar-knob parsing cross-platform

Variable `environ' is defined for Linux,
does not exist in Mac, and is deprecated in Windows

* Prevent env knob strings from being copied at each iteration

* Add headers for {Get|Free}EnvironmentStrings()

* Adjust for different memory layout of environ vs GetEnvironmentStrings()

GetEnvironmentStrings returns a single byte string
with adjacent key=value strings delimited with null bytes,
whereas environ is an array of key=value strings.

* Fix missing return value for Windows build

* Have just one return statement
2022-08-29 14:15:03 +02:00
Dan Lambright 50be189a0f Merge remote-tracking branch 'origin' into env 2022-08-23 17:01:48 -04:00
Marian Dvorsky e268ae9a86 Run clang-format 2022-08-08 18:48:15 +02:00
Marian Dvorsky b8370e5129 Print SIGNAL output to stdout 2022-08-08 18:28:11 +02:00
Marian Dvorsky 77269d0506 Print to stderr only upon errors 2022-08-08 18:11:35 +02:00
Marian Dvorsky 722f66beb7 Flush gcov coverage upon SIGTERM 2022-08-08 17:11:20 +02:00
Dan Lambright 473feef246 Set knobs using environment variables 2022-08-02 14:29:37 -04:00
Renxuan Wang 570db7a7d2
Fix processDiskReadSeconds and processDiskWriteSeconds. (#7608) 2022-07-18 13:47:45 -07:00
Junhyun Shim 8b70c7050d Apply RTLD_NODELETE for USE_SANITIZER builds
This allows their stack trace to be symbolized in CI/Jenkins XSAN runs
2022-07-18 19:25:12 +02:00
Markus Pilman 03d913a1de Flow compiling 2022-06-27 17:05:55 -06:00
Markus Pilman 10e478dfc3 Flow is compiling 2022-06-23 16:35:19 -06:00
Markus Pilman d35445a868 enforce include modularization in cmake 2022-06-23 14:37:35 -06:00
Andrew Noyes 83aceb216c
Use absl::GetStackTrace for slow task profiler (#7374)
* Make SlowTask workload runnable in joshua

* Remove SignalSafeUnwind, and use absl::GetStackTrace for slow task profiler
2022-06-15 14:53:52 -07:00
Mohamed Oulmahdi 26ad8bc184 Format 2022-05-31 14:17:41 +02:00
Mohamed Oulmahdi 81e81c6799 Fix atomicPath and abspath for Windows 2022-05-31 12:01:39 +02:00
Ata E Husain Bohra 33ae398268
REST KmsConnector implementation (#6994)
* REST KmsConnector implementation

Description
  diff-1: Address review comments.
          Add utility interface to Platform namespace to
          create and operate on tmpfile
 diff-2: Address review comments
         Link Boost::filesystem to CMake build process

Major changes includes:
1. Implement REST based KmsConnector implementation.
2. Salient features of the connector:
 2.1. Two required configuration are:
   a. Discovery KMS URLs - enable KMS discovery on bootstrap
   b. Endpoint path configuration to construct URI to fetch/refresh
      encryption keys
   c. Configuration to provide "validationTokens" to connect with
      external KMS. Patch implements file-based token validation scheme.
 2.2. On startup, RESTKmsConnector discovers KMS Urls and caches
      them in-memory. Extracts "validationTokens" based on input config.
 2.3. Expose endpoints to allow fetch/refresh of encryption keys.
 2.4. Defines JSON format to interact with external KMS - request &
      response payload format.
3. Extend Platform namespace with an interface to create and operate on
   tmp files.
4. Update Platform 'readFileBytes' and 'writeFileBytes' to leverage
   fstream supported implementation.

NOTE: KMS URLs fetched after initial discovery will be persisted using
      DynamicKnobs. It is TODO at the moment and shall be completed
      once DynamicKnobs is feature complete

Testing

Unit test to validation following:
1. Parsing on "validation tokens" logic.
2. Construction and parsing of REST JSON request and response strings.
2022-05-07 13:18:35 -07:00
sfc-gh-tclinkenbeard 258ba462e1 Remove !defined(_WIN32) guards for encryption code 2022-05-03 09:48:24 -07:00
sfc-gh-tclinkenbeard 7f05221cfe Removed TLS_DISABLED macro 2022-05-02 22:15:27 -07:00
Andrew Noyes 7ed82c1ac5
Mac m1 has 16k pages (#7038)
Previously the page guard implementation assumed that the page size was
4k. Also check for mmap and mprotect returning errors.
2022-05-02 14:24:43 -07:00
Andrew Noyes 297d831192
Put guard pages next to fast alloc memory (#6885)
* Put guard pages next to fast alloc memory

I verified that we can now detect #6753 without creating tons of
threads.

* Use pageSize instead of 4096

* Don't include mmapInternal for windows
2022-04-19 11:22:35 -07:00
Yi Wu 994b8c92f8
Add option to limit resident memory and remove default memory limit (#6719)
Changing `memory` option to limit resident memory instead of virtual memory, in config file and fdbserver/fdbbackup/fdbcli command-line argument. Since `rlimit` doesn't support limiting virtual memory, the current implementation have both of fdbmonitor and the fdbserver/fdbbackup process checking process RSS periodically and kill and restart the process if the limit is exceeded.

Adding a new `memory_vsize` option to limit virtual memory, if backward-compatible behavior is desired.

closes #6671, closes #6672
2022-04-06 20:06:24 -07:00
Steve Atherton 38190ad7e7
Merge pull request #6737 from sfc-gh-satherton/fix-storage-timestamps
Change storage metadata and perpetual wiggle timestamps to double epoch seconds
2022-04-02 09:47:23 -07:00
Chaoguang Lin 7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
Steve Atherton 6744e9e4f9 Change timestamps used in storage server metadata and perpetual wiggle metrics to epoch seconds, stored as doubles, and stringified as either floating point epoch seconds or timestamp strings of the form "2013-04-28 20:57:01.000 +0000". 2022-03-30 18:57:06 -07:00
Ata E Husain Bohra 017709aec6
Introduce BlobCipher interface and cipher caching interface (#6391)
* Introduce BlobCipher interface and cipher caching interface

 diff-3: Update the code to avoid deriving encryption key periodically.
         Implement EncyrptBuf interface to limit memcpys.
         Improve both unit test and simulation to better code coverage.
 diff-2: Add specific error code for OpenSSL AES call failures
 diff-1: Update encryption scheme to AES-256-CTR. Minor
         updates to Header to capture more information.

Major changes proposed are:
1. Introduce encyrption header format.
2. Introduce a BlobCipher cipher key representation encoding
following information: baseCipher details, derived encryption cipher
details, creationTime and random salt.
3. Introduce interface to support block cipher encrytion and decrytion
operations. Encyrption populates encryption header allowing client to
persist them on-disk, this header is then read allowing decryption
on reads.
4. Introduce interface to allow in-memory caching of cipher keys. The
cache allowing mapping of "encryption domain" -> "base cipher id" ->
"derived cipher keys" (3D hash map). This cache interface will be used
by FDB processes participating in encryption to cache recently used
ciphers (performance optimization).

Testing:
1. Unit test to validate caching interface.
2. Update EncryptionOps simulation test to validate block cipher
operations.
2022-03-24 07:31:49 -07:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Andrew Noyes a827e1cbd5 Flush all output streams in flushAndExit
Also detect reading past the end of the infile for debug_determinism tracing
2022-03-07 13:25:52 -08:00
Jingyu Zhou 1a5bf25b5c Update code base to use fmt 8.1.1 2022-03-04 15:52:06 -08:00
Renxuan Wang 233c918ffb Replace printf() and fprintf() with fmt::print(). 2022-02-25 19:06:57 -08:00
Renxuan Wang f7eb66441d Try eliminating warnings in macOS and Windows CI builds.
MacOS warnings are format warnings, e.g., `format specifies type 'long' but the argument has type 'Version' (aka 'long long')`.
Windows warnings are `ACTOR does not contain a wait() statement`.
2022-02-25 19:06:57 -08:00
Mohamed Oulmahdi 5b6098b5be Fix Windows build broken by #6449 2022-02-25 16:25:45 +01:00
A.J. Beamon 250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
vikasgupta8 edfff755bf added support for ppc64le 2022-02-11 06:17:15 +00:00
Xiaoxi Wang 6dc5921575
createdTime based storage wiggler (#6219)
* add storagemetadata

* add StorageWiggler;

* fix serverMetadataKey bug

* add metadata tracker in storage tracker

* finish StorageWiggler

* update next storage ID

* change pid to server id

* write metadata when seed SS

* add status json fields

* remove pid based ppw iteration

* fix time expression

* fix tss metadata nonexistence; fix transaction retry when retrieving metadata

* fix checkMetadata bug when store type is wrong

* fix remove storage status json

* format code

* refactor updateNextWigglingStoragePID

* seperate storage metadata tracker and store type tracker

* rename pid

* wiggler stats

* fix completion between waitServerListChange and storageRecruiter

* solve review comments

* rename system key

* fix database lock timeout by adding lock_aware

* format code

* status json

* resolve code format/naming comments

* delete expireNow; change PerpetualStorageWiggleID's value to KeyBackedObjectMap<UID, StorageWiggleValue>

* fix omit start rount

* format code

* status json reset

* solve status json format

* improve status json latency; replace binarywriter/reader to objectwriter/reader; refactor storagewigglerstats transactions

* status timestamp
2022-02-04 15:04:30 -08:00
Ray Jenkins dd45805312
Merge branch 'apple:main' into threadname-issue-6064 2022-02-01 17:40:07 -06:00
Ata E Husain Bohra 591ef57857
Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor (#6314)
* Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor

Major changes proposed are:
1. Refactor StreamCipher code to enable instantiation of
   multiple encryption keys. However, code still retains
   a globalEncryption key semantics used in Backup file
   encryption usecase.
2. Enhance StreamCipher to provide HMAC signature digest
   generation. Further, the class implements HMAC encryption
   key derivation function.
3. Upgrade StreamCipher to use AES 256 GCM mode from currently
   supported AES 128 GCM mode.
   Note: The code changes the encryption key size, however, the
         feature is NOT currently in use, hence, should be OK.
3. Add EncryptionOps validation and benchmark toml supported
   workload, it does the following:
   a. Allow user to configure encrypt-decrypt of a fixed size
      buffer or variable size buffer [100, 512K]
   b. Allow user to configure number of interactions of the runs,
      in each iteration: generate random data, derive an encryption
      key using HMAC SHA256 method, encrypt data and
      then decrypt data. It collects following metrics:
    i) time taken to derive encryption key.
    ii) time taken to encrypt the buffer.
    iii) time taken to decrypt the buffer.
    iv) total bytes encrypted and/or decrypted
   c. Along with stats it basic basic validations on the encrypted
      and decrypted buffer
   d. On completion for test, records the above mentioned metrics
      in trace files.
2022-01-31 19:52:44 -06:00