Add training support for nearest interpolation
---------
Co-authored-by: yurzhang <yurzhang.oi@gmail.com>
Co-authored-by: Dilshod Tadjibaev <939125+antimora@users.noreply.github.com>
* Add int_random to int tensor ops
* Int random for tch backend
* Int random for burn-fusion
* Int random for autodiff
* Int random for candle backend
* Int random for ndarray backend
* Int random for wgpu backend
* Merge imports
* Typo
* Shader file for int uniform distribution
* Create AutotuneOperationSet and public int_sum_dim_autotune
* Adjust bounds to 0..10
* Create uniform_int_kernel, unit tests, use new kernel
* Reduction kernels for regular and shared memory sum_dim int operations
* Macro that accommodates wgpu IntElement
* Add autotuning to int_mean_dim
* Use correct macro for Int autotuning
* Add int_mean_dim_shared_memory
* Add int_mean_dim and unit test
* Create autotunables for mean_dim
* Run fmt
* Remove comment
* Finish resolving merge conflict, fix doc
* Make the element trait bound a parameter to reduce_tune_ops macro
* Update book
* Fix requested change
* Change range to [0, 255] and update test accordingly
* Forgot to include candle in last commit
* Fix comment
* Use correct int autotune for mean dim
* Fix typo (not sure how this passed earlier)
* Resolve syntax issues from merge
* Fix cast_float
* Saving here
* Continue fixing merge conflicts, all tests pass locally
* Run fmt
* Change cast_float to cast_u32_to_float
* Make uniform_int_inner_loop safer
* Be even more explicit about u32 casts
* Skip an intermediate step and cast directly to u32
* Replace JitElement + Element with IntElement
* Run fmt
* This should fix the CI
* This time for sure