Commit Graph

17 Commits

Author SHA1 Message Date
Ethan Vizitei 1004e66540 get sentry into canvas
closes CNVS-6016

No more error reports!  (soon)

this commit builds up sentry integration through the new
Canvas::Errors module, along with other things that need
to happen on every exception.  ErrorReports
should now get pushed towards just being used for representing
a complaint a user filed via the get help form.

I fixed about half the things that got linted as well
while I was in here, but because this touches too much
I fear divergence from tackling too many (I think we
can safely say it's "better than we found it")

I left a lot of the infrastructure for error reports in place
until other commits for plugins can be merged

TEST PLAN:
 1) setup your raven.yml config file with the dsn for our
  sentry install
 2) force an error to happen in a request response cycle.
 3) see the error in sentry
 4) force an error to happen in a job
 5) see the error in sentry
 6) statsd increments should still fire
 7) for the moment, an error report should still get created.

Change-Id: I5a9dc7214598f8d5083451fd15f0423f8f939034
Reviewed-on: https://gerrit.instructure.com/51621
Reviewed-by: Simon Williams <simon@instructure.com>
Reviewed-by: Brian Palmer <brianp@instructure.com>
Tested-by: Jenkins
QA-Review: August Thornton <august@instructure.com>
Product-Review: Ethan Vizitei <evizitei@instructure.com>
2015-04-13 22:26:15 +00:00
Brian Palmer 71f0ea203b don't count ignored redis errors in stats
refs CNVS-15570

test plan: trigger an ignored redis error, such as running
`Canvas.redis.renamenx('no-such-key', 'new-key')` in console, and verify
that rails doesn't log a 'Failure handling redis' message, or send
anything to statsd.

Change-Id: Ie4480fcf053d626ba8dbdf26672b18815267fa86
Reviewed-on: https://gerrit.instructure.com/41805
Reviewed-by: Cody Cutrer <cody@instructure.com>
Tested-by: Jenkins <jenkins@instructure.com>
Product-Review: Brian Palmer <brianp@instructure.com>
QA-Review: Brian Palmer <brianp@instructure.com>
2014-09-25 23:57:25 +00:00
Brian Palmer 9b52af22df be more specific about max # of clients redis error
Matching on "ERR" isn't enough, as some normal operations such as
renaming a non-existent key return a CommandError with "ERR" in the
description.
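
A minimal sketch of the distinction, not the actual Canvas code. It uses a
stand-in exception class so it runs without the redis gem; the helper name
`transient_command_error?` and the "no such key" message are assumptions
made for illustration.

```ruby
# Stand-in for Redis::CommandError so the sketch runs without the redis gem.
class FakeCommandError < StandardError; end

# Hypothetical helper: only the "max number of clients" error counts as
# transient. Matching on "ERR" alone would also catch ordinary command errors.
def transient_command_error?(error)
  error.message.include?("max number of clients reached")
end

too_many = FakeCommandError.new("ERR max number of clients reached")
no_key   = FakeCommandError.new("ERR no such key")

puts transient_command_error?(too_many) # transient, treat like a connection failure
puts transient_command_error?(no_key)   # application-level error, leave it alone
```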

fixes CNVS-15570

test plan:
* Create a new assignment or update the due date of an existing
  assignment as a teacher. Make sure to publish the assignment.
* Go to the course dashboard / home page as both the student and
  teacher. The new assignment should show up in the recent activity.

Change-Id: Ia3c48d2bdb7e0efefb40e97af90db472a7799953
Reviewed-on: https://gerrit.instructure.com/41752
Reviewed-by: Cody Cutrer <cody@instructure.com>
Tested-by: Jenkins <jenkins@instructure.com>
QA-Review: Amber Taniuchi <amber@instructure.com>
Product-Review: Brian Palmer <brianp@instructure.com>
2014-09-25 20:30:00 +00:00
Brian Palmer 833124e8ce handle redis "command errors" that are transient
Some command errors aren't application logic errors, but things such as
"max number of clients reached", and so should be treated like a
connection failure. Since it's the same exception class, the best we can
do is match by error message.

fixes CNVS-14979

test plan: to really test the failure case, you'll have to limit max
clients. i did this locally by:

* start up canvas
* edit the redis config to set maxclients to 1
* restart redis
* open a redis-cli and execute 'get x' or something to grab the 1 conn
* use the canvas web ui or console and verify that you don't get an
  error, rather the cache is blacklisted

Change-Id: I64360d165575ab0ef54c9c6d08dec8aa1afebad4
Reviewed-on: https://gerrit.instructure.com/40051
Tested-by: Jenkins <jenkins@instructure.com>
QA-Review: August Thornton <august@instructure.com>
Reviewed-by: Jacob Fugal <jacob@instructure.com>
Product-Review: Brian Palmer <brianp@instructure.com>
2014-08-29 17:41:28 +00:00
Brian Palmer a8760a13ce rescue system errors like Errno::ETIMEDOUT in redis
turns out the redis client doesn't always translate these into
Redis::BaseConnectionError -- specifically, we're seeing
Errno::ETIMEDOUT on writes
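
A rough sketch of the idea, assuming a hypothetical wrapper named
`with_failure_handling` and a stand-in class in place of
Redis::BaseConnectionError. Rescuing SystemCallError covers the Errno::*
family, since those classes all inherit from it.

```ruby
# Stand-in for Redis::BaseConnectionError so the sketch runs without the gem.
class FakeConnectionError < StandardError; end

# Hypothetical wrapper: swallow connection-level failures, including raw
# Errno::* errors that the client did not translate, and return nil instead.
def with_failure_handling
  yield
rescue FakeConnectionError, SystemCallError => e
  warn "redis failure ignored: #{e.class}"
  nil
end

with_failure_handling { raise Errno::ETIMEDOUT } # rescued, returns nil
with_failure_handling { :ok }                    # passes the value through
```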

Change-Id: Ifa040f2768a1035a8a7e7557cd5642928e025bd3
Reviewed-on: https://gerrit.instructure.com/39899
Reviewed-by: Cody Cutrer <cody@instructure.com>
Tested-by: Jenkins <jenkins@instructure.com>
Product-Review: Brian Palmer <brianp@instructure.com>
QA-Review: Brian Palmer <brianp@instructure.com>
2014-08-25 20:14:39 +00:00
Nick Cloward 0216ac2018 extract canvas statsd gem
fixes: CNVS-11605

Change-Id: I44d708d77014d6c4d0f8d0b2f7bcedcdeb307829
Reviewed-on: https://gerrit.instructure.com/31261
QA-Review: August Thornton <august@instructure.com>
Tested-by: Jenkins <jenkins@instructure.com>
Reviewed-by: Nick Cloward <ncloward@instructure.com>
Product-Review: Nick Cloward <ncloward@instructure.com>
2014-03-14 15:03:23 +00:00
Cody Cutrer fd3e873a67 fix handling of redis del during blacklist of single node in ring
test plan:
 * configure a ring of redis cache servers
 * make sure at least one is not accessible
 * login to canvas
 * it should not error

Change-Id: I46c464649438225f080f7097dbfc996260aea6cd
Reviewed-on: https://gerrit.instructure.com/28470
Tested-by: Jenkins <jenkins@instructure.com>
QA-Review: August Thornton <august@instructure.com>
Reviewed-by: Brian Palmer <brianp@instructure.com>
Product-Review: Cody Cutrer <cody@instructure.com>
2014-01-16 21:07:23 +00:00
Cody Cutrer d935ab98b1 deprecate Setting.get_cached
now that we have SIGHUP, we were changing everything to it anyway,
so just let caching in-proc be the default

Change-Id: Id1b44722522ac9693b17695da7107c99a359d5ac
Reviewed-on: https://gerrit.instructure.com/25020
Reviewed-by: Cody Cutrer <cody@instructure.com>
Product-Review: Cody Cutrer <cody@instructure.com>
QA-Review: Cody Cutrer <cody@instructure.com>
Tested-by: Jenkins <jenkins@instructure.com>
2013-10-10 00:42:52 +00:00
Brian Palmer 5cb135741a statsd: escape "." to avoid creating graphite folders
The redis_name is usually a fqdn, which was creating very annoying
folder structures
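
A small sketch of the escaping, with assumptions stated: the helper name
`statsd_safe`, the underscore as the replacement character, and the
`redis.errors` metric prefix are all hypothetical, not taken from the commit.

```ruby
# Hypothetical helper: Graphite treats "." as a path separator, so replace
# dots in the host name before using it as part of a metric key.
def statsd_safe(name)
  name.gsub(".", "_")
end

# The metric prefix below is illustrative only.
puts "redis.errors.#{statsd_safe('redis01.example.com')}"
```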

Change-Id: If8aecd3a523673321406fedc8769a92eee6d1a78
Reviewed-on: https://gerrit.instructure.com/22744
Tested-by: Jenkins <jenkins@instructure.com>
Reviewed-by: Cody Cutrer <cody@instructure.com>
Product-Review: Brian Palmer <brianp@instructure.com>
QA-Review: Brian Palmer <brianp@instructure.com>
2013-07-31 21:45:50 +00:00
Cody Cutrer 86af8ba7ac keep track of redis failures per server
fixes CNVS-7021

test plan:
 * have two separate redis servers (one being localhost and one being
   something that doesn't exist is sufficient) configured in
   cache_store.yml
 * make sure one is inaccessible (i.e. it doesn't exist)
 * run canvas. always reload every page. inspect your logs - on the
   second request, approximately half of the cache lines should be
   a cache hit, and half a cache miss
 * you can be more fine grained by doing Rails.cache.write('key',
   true); Rails.cache.fetch('key') in script/console for different
   keys. Half of the time it should return true, and half of the time
   it should return nil.

Change-Id: I85898e9ac5e01c01d042ce7340ad463865a0ba73
Reviewed-on: https://gerrit.instructure.com/22661
Tested-by: Jenkins <jenkins@instructure.com>
Reviewed-by: Jacob Fugal <jacob@instructure.com>
Reviewed-by: Brian Palmer <brianp@instructure.com>
QA-Review: Jeremy Putnam <jeremyp@instructure.com>
Product-Review: Cody Cutrer <cody@instructure.com>
2013-07-30 18:25:05 +00:00
Brian Palmer 24b1b3036a setting to raise on redis error, rather than ignore
Change-Id: Iee475ef727ee087062f2f4cd84579c85ff5fca5a
Reviewed-on: https://gerrit.instructure.com/13591
Reviewed-by: Zach Wily <zach@instructure.com>
Tested-by: Brian Palmer <brianp@instructure.com>
2012-09-10 16:50:31 -06:00
Brian Palmer e66fa507cf update the redis gem to 3.0.1
This required building our own fork of the redis-store gem so that we
could update its dependency, and fix one small issue with redis connect
strings getting nil instead of the default value for the port number.

The redis 3.0.x gem now catches all Errno and Timeout errors and
re-raises them as subclasses of Redis::BaseConnectionError. It also now
handles EAGAIN internally, retrying when appropriate. So we've modified
our redis failure handling code to match.

test plan: verify the redis failure handling code still works (specs
pass). for instance, stop redis locally and see that canvas works in the
degraded state. make sure that redis still works for both caching and
non-caching code such as login attempts.

Change-Id: I9e8d3929afa06c522656d30f71efc0427e4ef7cc
Reviewed-on: https://gerrit.instructure.com/11521
Tested-by: Jenkins <jenkins@instructure.com>
Reviewed-by: Cody Cutrer <cody@instructure.com>
2012-07-10 09:42:44 -06:00
Brian Palmer 2211a8effa log exception on redis failure recovery
test plan: turn redis caching on but don't have redis running, hit a
page, verify the generated error

Change-Id: Iddd525ed468abdcf0cce2dba9becc65e2d5aaa84
Reviewed-on: https://gerrit.instructure.com/9309
Reviewed-by: Cody Cutrer <cody@instructure.com>
Tested-by: Hudson <hudson@instructure.com>
2012-03-08 15:39:34 -07:00
Brian Palmer 29a916c8b1 rescue and retry EAGAIN for redis failures
Change-Id: I7dea77ed6aeb4f69ac9166ff980e182b01852b9f
Reviewed-on: https://gerrit.instructure.com/9005
Tested-by: Hudson <hudson@instructure.com>
Reviewed-by: Zach Wily <zach@instructure.com>
2012-03-01 08:57:03 -07:00
Brian Palmer 23a9facbee handle Timeout::Error in redis caching
Hook into the redis library at a pretty low level, to try and do
everything we can to avoid erroring if redis goes down. This applies to
both redis-as-cache and redis-as-data-store.

test plan: Set up redis and caching in your local instance. Point it to
both an existing box on a port not running redis, and a non-existent IP.
In both situations, you should not see caching errors or redis data
errors. After the first error, it shouldn't attempt to hit redis again for 5
minutes.
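
The "stop trying for 5 minutes after the first error" behavior can be
sketched as a simple guard. This is an illustration under assumptions, not
the code from this commit: the class name `FailureGuard`, the injectable
clock, and rescuing StandardError (which includes Timeout::Error on current
Ruby versions) are choices made here.

```ruby
require "timeout"

# Hypothetical guard: after one failure, skip the wrapped call and return the
# default until the cooldown (300 seconds, i.e. 5 minutes) has elapsed.
class FailureGuard
  def initialize(cooldown: 300, clock: -> { Time.now.to_f })
    @cooldown = cooldown
    @clock = clock
    @last_failure = nil
  end

  def tripped?
    !@last_failure.nil? && (@clock.call - @last_failure) < @cooldown
  end

  def call(default = nil)
    return default if tripped?
    yield
  rescue StandardError
    @last_failure = @clock.call
    default
  end
end

now = 0.0
guard = FailureGuard.new(clock: -> { now })
attempts = 0

guard.call(:miss) { attempts += 1; raise Timeout::Error } # fails, trips the guard
guard.call(:miss) { attempts += 1; :hit }                 # skipped during cooldown
now = 301.0
guard.call(:miss) { attempts += 1; :hit }                 # cooldown over, tried again
puts attempts
```

Injecting the clock keeps the sketch testable without sleeping for five minutes.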

Change-Id: I101b2d3d2123151b244eb82ba78b176ed1f4d5ad
Reviewed-on: https://gerrit.instructure.com/8097
Tested-by: Hudson <hudson@instructure.com>
Reviewed-by: Cody Cutrer <cody@instructure.com>
2012-01-17 13:13:50 -07:00
Brian Palmer 20d6180dc4 enforce nonce and timestamp in lti outcome requests
This uses redis to store the nonces as locks that expire after 90
minutes. Timestamps are epoch UTC values, as per the oauth spec.
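
A minimal sketch of the check, using an in-memory hash as a stand-in for the
redis locks. The class name `NonceChecker` is hypothetical, and using the same
90-minute window for the timestamp age limit is an assumption; the commit only
states the nonce expiry.

```ruby
# Hypothetical checker: rejects reused nonces and too-old timestamps.
class NonceChecker
  WINDOW = 90 * 60 # seconds; nonces are remembered for 90 minutes

  def initialize
    @seen = {} # nonce => expiry time; stands in for redis keys with a TTL
  end

  # timestamp and now are epoch seconds in UTC.
  def valid?(nonce, timestamp, now = Time.now.to_i)
    return false if now - timestamp.to_i > WINDOW # timestamp too old

    @seen.delete_if { |_, expires_at| expires_at <= now } # drop expired nonces
    return false if @seen.key?(nonce)                     # replayed nonce

    @seen[nonce] = now + WINDOW
    true
  end
end

checker = NonceChecker.new
puts checker.valid?("abc", 1_000, 1_000) # first use of the nonce
puts checker.valid?("abc", 1_000, 1_001) # same nonce again is rejected
puts checker.valid?("def", 1_000, 7_000) # timestamp older than the window
```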

test plan: send oauth requests to the api endpoint with the same nonce
more than once, or with a too-old timestamp

refs #5892

Change-Id: Id6130c2a07e206dad716673aa6adbe9d36565a7c
Reviewed-on: https://gerrit.instructure.com/6683
Tested-by: Hudson <hudson@instructure.com>
Reviewed-by: Brian Whitmer <brian@instructure.com>
2011-11-04 09:42:51 -06:00
Brian Palmer 630200c32e support redis as well as memcache for a rails cache store
closes #4498

Change-Id: Icf29882d8c0d351574496ba0494c1d8c518a3e7f
Reviewed-on: https://gerrit.instructure.com/4580
Tested-by: Hudson <hudson@instructure.com>
Reviewed-by: Zach Wily <zach@instructure.com>
2011-07-20 14:59:03 -06:00