Commit Graph

95 Commits

Author SHA1 Message Date
Markus Pilman 03d913a1de Flow compiling 2022-06-27 17:05:55 -06:00
Markus Pilman 10e478dfc3 Flow is compiling 2022-06-23 16:35:19 -06:00
Markus Pilman d35445a868 enforce include modularization in cmake 2022-06-23 14:37:35 -06:00
Andrew Noyes 83aceb216c
Use absl::GetStackTrace for slow task profiler (#7374)
* Make SlowTask workload runnable in joshua

* Remove SignalSafeUnwind, and use absl::GetStackTrace for slow task profiler
2022-06-15 14:53:52 -07:00
Mohamed Oulmahdi 26ad8bc184 Format 2022-05-31 14:17:41 +02:00
Mohamed Oulmahdi 81e81c6799 Fix atomicPath and abspath for Windows 2022-05-31 12:01:39 +02:00
Ata E Husain Bohra 33ae398268
REST KmsConnector implementation (#6994)
* REST KmsConnector implementation

Description
  diff-1: Address review comments.
          Add utility interface to Platform namespace to
          create and operate on tmpfile
 diff-2: Address review comments
         Link Boost::filesystem to CMake build process

Major changes includes:
1. Implement REST based KmsConnector implementation.
2. Salient features of the connector:
 2.1. Two required configuration are:
   a. Discovery KMS URLs - enable KMS discovery on bootstrap
   b. Endpoint path configuration to construct URI to fetch/refresh
      encryption keys
   c. Configuration to provide "validationTokens" to connect with
      external KMS. Patch implements file-based token validation scheme.
 2.2. On startup, RESTKmsConnector discovers KMS Urls and caches
      them in-memory. Extracts "validationTokens" based on input config.
 2.3. Expose endpoints to allow fetch/refresh of encryption keys.
 2.4. Defines JSON format to interact with external KMS - request &
      response payload format.
3. Extend Platform namespace with an interface to create and operate on
   tmp files.
4. Update Platform 'readFileBytes' and 'writeFileBytes' to leverage
   fstream supported implementation.

NOTE: KMS URLs fetched after initial discovery will be persisted using
      DynamicKnobs. It is TODO at the moment and shall be completed
      once DynamicKnobs is feature complete

Testing

Unit test to validation following:
1. Parsing on "validation tokens" logic.
2. Construction and parsing of REST JSON request and response strings.
2022-05-07 13:18:35 -07:00
sfc-gh-tclinkenbeard 258ba462e1 Remove !defined(_WIN32) guards for encryption code 2022-05-03 09:48:24 -07:00
sfc-gh-tclinkenbeard 7f05221cfe Removed TLS_DISABLED macro 2022-05-02 22:15:27 -07:00
Andrew Noyes 7ed82c1ac5
Mac m1 has 16k pages (#7038)
Previously the page guard implementation assumed that the page size was
4k. Also check for mmap and mprotect returning errors.
2022-05-02 14:24:43 -07:00
Andrew Noyes 297d831192
Put guard pages next to fast alloc memory (#6885)
* Put guard pages next to fast alloc memory

I verified that we can now detect #6753 without creating tons of
threads.

* Use pageSize instead of 4096

* Don't include mmapInternal for windows
2022-04-19 11:22:35 -07:00
Yi Wu 994b8c92f8
Add option to limit resident memory and remove default memory limit (#6719)
Changing `memory` option to limit resident memory instead of virtual memory, in config file and fdbserver/fdbbackup/fdbcli command-line argument. Since `rlimit` doesn't support limiting virtual memory, the current implementation have both of fdbmonitor and the fdbserver/fdbbackup process checking process RSS periodically and kill and restart the process if the limit is exceeded.

Adding a new `memory_vsize` option to limit virtual memory, if backward-compatible behavior is desired.

closes #6671, closes #6672
2022-04-06 20:06:24 -07:00
Steve Atherton 38190ad7e7
Merge pull request #6737 from sfc-gh-satherton/fix-storage-timestamps
Change storage metadata and perpetual wiggle timestamps to double epoch seconds
2022-04-02 09:47:23 -07:00
Chaoguang Lin 7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
Steve Atherton 6744e9e4f9 Change timestamps used in storage server metadata and perpetual wiggle metrics to epoch seconds, stored as doubles, and stringified as either floating point epoch seconds or timestamp strings of the form "2013-04-28 20:57:01.000 +0000". 2022-03-30 18:57:06 -07:00
Ata E Husain Bohra 017709aec6
Introduce BlobCipher interface and cipher caching interface (#6391)
* Introduce BlobCipher interface and cipher caching interface

 diff-3: Update the code to avoid deriving encryption key periodically.
         Implement EncyrptBuf interface to limit memcpys.
         Improve both unit test and simulation to better code coverage.
 diff-2: Add specific error code for OpenSSL AES call failures
 diff-1: Update encryption scheme to AES-256-CTR. Minor
         updates to Header to capture more information.

Major changes proposed are:
1. Introduce encyrption header format.
2. Introduce a BlobCipher cipher key representation encoding
following information: baseCipher details, derived encryption cipher
details, creationTime and random salt.
3. Introduce interface to support block cipher encrytion and decrytion
operations. Encyrption populates encryption header allowing client to
persist them on-disk, this header is then read allowing decryption
on reads.
4. Introduce interface to allow in-memory caching of cipher keys. The
cache allowing mapping of "encryption domain" -> "base cipher id" ->
"derived cipher keys" (3D hash map). This cache interface will be used
by FDB processes participating in encryption to cache recently used
ciphers (performance optimization).

Testing:
1. Unit test to validate caching interface.
2. Update EncryptionOps simulation test to validate block cipher
operations.
2022-03-24 07:31:49 -07:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Andrew Noyes a827e1cbd5 Flush all output streams in flushAndExit
Also detect reading past the end of the infile for debug_determinism tracing
2022-03-07 13:25:52 -08:00
Jingyu Zhou 1a5bf25b5c Update code base to use fmt 8.1.1 2022-03-04 15:52:06 -08:00
Renxuan Wang 233c918ffb Replace printf() and fprintf() with fmt::print(). 2022-02-25 19:06:57 -08:00
Renxuan Wang f7eb66441d Try eliminating warnings in macOS and Windows CI builds.
MacOS warnings are format warnings, e.g., `format specifies type 'long' but the argument has type 'Version' (aka 'long long')`.
Windows warnings are `ACTOR does not contain a wait() statement`.
2022-02-25 19:06:57 -08:00
Mohamed Oulmahdi 5b6098b5be Fix Windows build broken by #6449 2022-02-25 16:25:45 +01:00
A.J. Beamon 250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
vikasgupta8 edfff755bf added support for ppc64le 2022-02-11 06:17:15 +00:00
Xiaoxi Wang 6dc5921575
createdTime based storage wiggler (#6219)
* add storagemetadata

* add StorageWiggler;

* fix serverMetadataKey bug

* add metadata tracker in storage tracker

* finish StorageWiggler

* update next storage ID

* change pid to server id

* write metadata when seed SS

* add status json fields

* remove pid based ppw iteration

* fix time expression

* fix tss metadata nonexistence; fix transaction retry when retrieving metadata

* fix checkMetadata bug when store type is wrong

* fix remove storage status json

* format code

* refactor updateNextWigglingStoragePID

* seperate storage metadata tracker and store type tracker

* rename pid

* wiggler stats

* fix completion between waitServerListChange and storageRecruiter

* solve review comments

* rename system key

* fix database lock timeout by adding lock_aware

* format code

* status json

* resolve code format/naming comments

* delete expireNow; change PerpetualStorageWiggleID's value to KeyBackedObjectMap<UID, StorageWiggleValue>

* fix omit start rount

* format code

* status json reset

* solve status json format

* improve status json latency; replace binarywriter/reader to objectwriter/reader; refactor storagewigglerstats transactions

* status timestamp
2022-02-04 15:04:30 -08:00
Ray Jenkins dd45805312
Merge branch 'apple:main' into threadname-issue-6064 2022-02-01 17:40:07 -06:00
Ata E Husain Bohra 591ef57857
Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor (#6314)
* Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor

Major changes proposed are:
1. Refactor StreamCipher code to enable instantiation of
   multiple encryption keys. However, code still retains
   a globalEncryption key semantics used in Backup file
   encryption usecase.
2. Enhance StreamCipher to provide HMAC signature digest
   generation. Further, the class implements HMAC encryption
   key derivation function.
3. Upgrade StreamCipher to use AES 256 GCM mode from currently
   supported AES 128 GCM mode.
   Note: The code changes the encryption key size, however, the
         feature is NOT currently in use, hence, should be OK.
3. Add EncryptionOps validation and benchmark toml supported
   workload, it does the following:
   a. Allow user to configure encrypt-decrypt of a fixed size
      buffer or variable size buffer [100, 512K]
   b. Allow user to configure number of interactions of the runs,
      in each iteration: generate random data, derive an encryption
      key using HMAC SHA256 method, encrypt data and
      then decrypt data. It collects following metrics:
    i) time taken to derive encryption key.
    ii) time taken to encrypt the buffer.
    iii) time taken to decrypt the buffer.
    iv) total bytes encrypted and/or decrypted
   c. Along with stats it basic basic validations on the encrypted
      and decrypted buffer
   d. On completion for test, records the above mentioned metrics
      in trace files.
2022-01-31 19:52:44 -06:00
Ray Jenkins 6a1fe2489a format fixes 2022-01-28 18:21:42 -06:00
Ray Jenkins ef4e61ee33 Use GetLastError to include errno data in trace logs. 2022-01-28 17:27:41 -06:00
Ray Jenkins f9f0fb0781 Better error handling for pthread_setname_np.
In unit and simulation testing calls to pthread_setname_np may return errors,
as the threads may complete before calls to setname can be executed. This change
adds better error handling for cases where ENOENT or ESRCH is returned during testing.
Previously the ASSERT_EQ would cause tests to fail if a non-zero return value was encountered.

This change will trace log with a SevWarn when ENOENT or ESRCH is encountered. Otherwise
it will trace with SevError and throw a platform_error.
2022-01-28 16:18:22 -06:00
Renxuan Wang 4a8e2a80e6 Improve/fix disk metrics.
1. Introduce processDiskReadSeconds and processDiskWriteSeconds, which stands for disk read/write times `since the last logging`. They can only be obtained on Linux and macOS, and will be 0 on Windows and FreeBSD;
2. Rename `busyTicks` to `IOMilliSecs`;
3. On FreeBSD, the metrics should be collected among all devices.
2022-01-27 14:40:32 -08:00
Ray Jenkins 0d34ec0880 Add RunLoopProfiler thread name. 2022-01-25 13:22:22 -06:00
He Liu 2fb5c59440 Removed deprioritizeThread(). 2022-01-18 11:10:56 -08:00
He Liu 2c0c51dd6d Enabled setting thread poll priorities. 2022-01-10 17:50:56 -08:00
A.J. Beamon 264c75b9a6 Add some extra client logging details:
1. Add a trace event when a database is created and move the cluster file / connection string from ClientStart to the new trace event
2. Add a detail for the path to the image being loaded
3. Add a detail for whether a client library is primary or not
4. Set a thread name for each external client thread that includes the release version
2021-11-29 09:57:10 -08:00
A.J. Beamon 6cf731d5aa Add comment to run loop blocked message about timestamp order 2021-10-19 10:05:29 -07:00
A.J. Beamon abab45760d Add some additional logging if the network thread finishes, fails with an error, gets stopped, or is blocked. 2021-10-19 10:05:29 -07:00
Xiaoge Su e68b131e4a fixup! Reformat source code 2021-09-16 19:40:28 -07:00
Xiaoge Su abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Lukas Joswiak 5dc9a97230 Merge branch 'master' into fixes/alp6 2021-08-01 20:42:52 -07:00
sfc-gh-tclinkenbeard c74047c665 Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings 2021-07-28 11:51:02 -07:00
Lukas Joswiak 3eed4084e2 Merge branch 'master' into fixes/alp6 2021-07-27 11:26:53 -07:00
Lukas Joswiak 59d535149e Merge branch 'master' into fixes/alp6 2021-07-27 10:07:18 -07:00
Steve Atherton 507c1f11e3 Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use. 2021-07-26 19:55:10 -07:00
sfc-gh-tclinkenbeard e006e4fed4 Fix -Wreorder-ctor warnings in LogSystemPeekCursor.actor.cpp and several other files 2021-07-24 00:48:13 -07:00
sfc-gh-tclinkenbeard 64dc1dc185 Fix -Wreorder-ctor warnings in NativeAPI.actor.cpp and several other files 2021-07-24 00:23:06 -07:00
sfc-gh-tclinkenbeard b9a22a61ef Fix many -Wreorder-ctor warnings 2021-07-23 17:33:18 -07:00
Trevor Clinkenbeard f5ade03538
Merge pull request #4233 from sfc-gh-tclinkenbeard/encrypt-backup-files
Added AsyncFileEncrypted
2021-07-07 13:28:28 -07:00
sfc-gh-tclinkenbeard a73570e538 Added comment for crashHandler 2021-06-29 11:00:05 -07:00
sfc-gh-tclinkenbeard 8855c7ee8d Set error kind to BugDetected for Crash trace events 2021-06-27 16:47:54 -07:00