smithy-rs/rust-runtime
Russell Cohen 115638bebb
Fix stalled stream detection (#3238)
A number of small issues in StalledStreamDetection came together to make
it ineffective in practice:
1. We didn't push a throughput of `0` on `Poll::Pending`. This means that
if the stream returns `Poll::Pending`, the calculated throughput can never
get to 0.
2. Because we always considered the _entire_ throughput log, an actually
stalled stream creates a pathological case where most of the log is full
of fast moving data, but we are very slow to evict it and detect
slowness. With a 426 length log, it would effectively take 7 minutes for
it to actually get to 0.

For these two issues, I introduced a check-window (default 1 second) in
the throughput log. We only consider this much data when making the
calculation. To avoid undesirable interactions with the check interval,
I reduced the check interval to 500ms. I think this is likely to be
better in practice at detecting stalled streams.
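
To illustrate the check-window idea, here is a minimal sketch, not the
actual smithy-rs implementation. The names `WindowedLog`, `Entry`, `push`,
and `bytes_per_second` are hypothetical, and timestamps are modeled as
`Duration` offsets from the start of the stream (an assumption made to keep
the example deterministic). A `Poll::Pending` is recorded as a zero-byte
entry, and only entries inside the trailing window count toward the result.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical log entry: bytes observed at an offset from stream start.
#[derive(Debug, Clone, Copy)]
struct Entry {
    at: Duration,
    bytes: u64,
}

/// Hypothetical, simplified throughput log with a bounded length.
struct WindowedLog {
    entries: VecDeque<Entry>,
    capacity: usize,
    check_window: Duration,
}

impl WindowedLog {
    fn new(capacity: usize, check_window: Duration) -> Self {
        Self {
            entries: VecDeque::with_capacity(capacity),
            capacity,
            check_window,
        }
    }

    /// Record `bytes` at time `at`, evicting the oldest entry when full.
    /// A `Poll::Pending` would be recorded with `bytes == 0`.
    fn push(&mut self, at: Duration, bytes: u64) {
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(Entry { at, bytes });
    }

    /// Bytes per second over the trailing `check_window` ending at `now`.
    /// Entries older than the window are ignored, so stale fast-moving
    /// data cannot mask a stall.
    fn bytes_per_second(&self, now: Duration) -> u64 {
        let cutoff = now.saturating_sub(self.check_window);
        let total: u64 = self
            .entries
            .iter()
            .rev() // newest entries first
            .take_while(|e| e.at > cutoff)
            .map(|e| e.bytes)
            .sum();
        // Integer math only; `max(1)` guards against a zero-length window.
        let window_ms = self.check_window.as_millis().max(1);
        ((total as u128) * 1000 / window_ms) as u64
    }
}
```

In this sketch, if data arrives every 100ms and is then followed only by
zero-byte entries, the computed throughput reaches 0 one check window after
the last data, regardless of how long the log is.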

As a coincidence, this makes all the math exact so the tests are now
precise without floating epsilons.

Finally, in an unrelated issue, the `check_interval` wasn't actually
being used in the call to sleep. I fixed that as well and added a test.

In doing this, I also tried to simplify the sine wave test into a test that
could be logically reasoned about. The previous test was producing
nonsense data (integer input to `sine` which is in radians as one
example). I replaced it with several tests that send data at one rate,
then suddenly stop sending data, simulating a stalled stream. These have
the nice property that it's easy to determine mathematically when the
stream should stall.

Side note: It's possible these bugs are what was making it not work for
request bodies? Needs more investigation. I didn't rerun that test.

After those issues were resolved, I was a little concerned about
performance when maxing out the network connection. A quick benchmark
showed there was a small regression. When data is flowing normally, the
426 item buffer is usually shorter than our default check window of 1
second. This allows for an optimization where we track the current total
amount of data in the buffer, allowing us to return the calculated
throughput in constant time.
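
The running-total optimization can be sketched roughly as follows. This is
an illustration under assumptions, not the code from this change:
`RunningTotalLog`, `push`, and `total_bytes` are hypothetical names, and
timestamps are omitted for brevity. The point is that the sum is adjusted on
every insert and eviction, so reading it never requires walking the buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded log that keeps a running sum of its contents.
struct RunningTotalLog {
    entries: VecDeque<u64>,
    capacity: usize,
    total: u64,
}

impl RunningTotalLog {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            entries: VecDeque::with_capacity(capacity),
            capacity,
            total: 0,
        }
    }

    /// Add a byte count, evicting the oldest entry when the log is full.
    /// The running total is updated for both the eviction and the insert.
    fn push(&mut self, bytes: u64) {
        if self.entries.len() == self.capacity {
            if let Some(evicted) = self.entries.pop_front() {
                self.total -= evicted;
            }
        }
        self.entries.push_back(bytes);
        self.total += bytes;
    }

    /// Total bytes currently held in the log, read in constant time.
    fn total_bytes(&self) -> u64 {
        self.total
    }
}
```

When the whole buffer fits inside the check window, this total divided by
the elapsed time gives the throughput directly.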

While writing the test, I discovered that we didn't actually expose
`Throughput`, so I took that as an opportunity to change it to use `u64`
instead of `f64` for its byte count representation.

## Testing
- [x] Manual test of downloading a file then turning off the wifi.
Verify that the connection is aborted within the grace period.

## Checklist
<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [ ] I have updated `CHANGELOG.next.toml` if I made changes to the
smithy-rs codegen or runtime crates
- [ ] I have updated `CHANGELOG.next.toml` if I made changes to the AWS
SDK, generated SDK code, or SDK runtime crates

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
2023-11-20 21:32:08 -05:00
..
aws-smithy-async Remove #[doc(hidden)] from stable crates (#3226) 2023-11-17 14:34:17 -06:00
aws-smithy-checksums Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-client Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-eventstream Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-http Feature-gate http versions in aws-smithy-runtime-api (#3236) 2023-11-17 15:00:56 -08:00
aws-smithy-http-auth Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-http-server Feature-gate http versions in aws-smithy-runtime-api (#3236) 2023-11-17 15:00:56 -08:00
aws-smithy-http-server-python Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-http-server-typescript Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-http-tower Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-json Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-protocol-test Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-query Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-runtime Fix stalled stream detection (#3238) 2023-11-20 21:32:08 -05:00
aws-smithy-runtime-api Feature-gate http versions in aws-smithy-runtime-api (#3236) 2023-11-17 15:00:56 -08:00
aws-smithy-types Remove #[doc(hidden)] from stable crates (#3226) 2023-11-17 14:34:17 -06:00
aws-smithy-types-convert Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
aws-smithy-xml Fix repo org move issues (#3166) 2023-11-10 18:51:04 +00:00
inlineable Remove deprecations from rust-runtime (#3222) 2023-11-17 14:34:17 -06:00
.gitignore Partial HTTP protocol implementation (#1) 2020-10-29 15:49:22 -04:00
Cargo.toml Minimum throughput body timeouts Pt.1 (#3068) 2023-10-26 20:10:24 +00:00
build.gradle.kts Update release tooling to handle both stable and unstable crates (#3082) 2023-11-08 04:03:46 +00:00
clippy.toml Add clippy.toml with forbidden methods & fix SystemTime usages (#2882) 2023-07-28 17:16:44 +00:00