mirror of https://github.com/smithy-lang/smithy-rs
115638bebb
A number of small issues in StalledStreamDetection came together to make it ineffective in practice: 1. We didn't push a throughput `0` on `Poll::Pending`. This means that if the stream goes to poll pending, the calculate throughput can never get to 0. 2. Because we always considered the _entire_ throughput log, an actually stalled stream creates a pathological case where most of the log is full of fast moving data, but we are very slow to evict it and detect slowness. With a 426 length log, it would effectively take 7 minutes for it to actually get to 0. For these two issues, I introduced a check-window (default 1 second) in the throughput log. We only consider this much data when making the calculation. To avoid undeseriable interactions with the check interval, I reduced the check interval to 500ms. I think this is likely to be better in practice at detecting stalled streams. As a coincidence, this makes all the math exact so the tests are now precise without floating epsilons. Finally, in an unrelated issue, the `check_interval` wasn't actually being used in the call to sleep, I fixed that as well and added a test. In doing this, I tried to also simplify sine wave test into a test that could be logically reasoned about. The previous test was producing nonsense data (integer input to `sine` which is in radians as one example). I replaced it with several tests that send data at one rate, then suddenly stop sending data, simulating a stalled stream. These have the nice property that it's easy to determine mathematically when the stream should stall. Side note: It's possible these bugs are what was making it not work for request bodies? Needs more investigation. I didn't rerun that test. After those issues were resolved, I was a little concerned about performance when maxing out the network connection. A quick benchmark showed there was a small regression. When data is flowing normally, the 426 item buffer is usually shorter than our default check window of 1 second. This allows for an optimization where we track the current total amount of data in the buffer, allowing us to return the calculated throughput in constant time. During writing of the test, I discovered that we didn't actually expose `Throughput` so I took that as an opportunity to change it to use `u64` instead of `f64` for it's byte count representation. ## Testing - [x] Manual test of downloading a file then turning off the wifi. Verify that the connection is aborted within the grace period. ## Checklist <!--- If a checkbox below is not applicable, then please DELETE it rather than leaving it unchecked --> - [ ] I have updated `CHANGELOG.next.toml` if I made changes to the smithy-rs codegen or runtime crates - [ ] I have updated `CHANGELOG.next.toml` if I made changes to the AWS SDK, generated SDK code, or SDK runtime crates ---- _By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._ |
||
---|---|---|
.. | ||
aws-smithy-async | ||
aws-smithy-checksums | ||
aws-smithy-client | ||
aws-smithy-eventstream | ||
aws-smithy-http | ||
aws-smithy-http-auth | ||
aws-smithy-http-server | ||
aws-smithy-http-server-python | ||
aws-smithy-http-server-typescript | ||
aws-smithy-http-tower | ||
aws-smithy-json | ||
aws-smithy-protocol-test | ||
aws-smithy-query | ||
aws-smithy-runtime | ||
aws-smithy-runtime-api | ||
aws-smithy-types | ||
aws-smithy-types-convert | ||
aws-smithy-xml | ||
inlineable | ||
.gitignore | ||
Cargo.toml | ||
build.gradle.kts | ||
clippy.toml |