OpenCloudOS-Kernel/net/sctp/sysctl.c

512 lines
12 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-or-later
/* SCTP kernel implementation
* (C) Copyright IBM Corp. 2002, 2004
* Copyright (c) 2002 Intel Corp.
*
* This file is part of the SCTP kernel implementation
*
* Sysctl related interfaces for SCTP.
*
* Please send any bug reports or fixes you make to the
* email address(es):
* lksctp developers <linux-sctp@vger.kernel.org>
*
* Written or modified by:
* Mingqin Liu <liuming@us.ibm.com>
* Jon Grimm <jgrimm@us.ibm.com>
* Ardelle Fan <ardelle.fan@intel.com>
* Ryan Layer <rmlayer@us.ibm.com>
* Sridhar Samudrala <sri@us.ibm.com>
*/
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <net/sctp/structs.h>
#include <net/sctp/sctp.h>
#include <linux/sysctl.h>
static int timer_max = 86400000; /* ms in one day */
static int sack_timer_min = 1;
static int sack_timer_max = 500;
static int addr_scope_max = SCTP_SCOPE_POLICY_MAX;
static int rwnd_scale_max = 16;
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
static int rto_alpha_min = 0;
static int rto_beta_min = 0;
static int rto_alpha_max = 1000;
static int rto_beta_max = 1000;
static unsigned long max_autoclose_min = 0;
static unsigned long max_autoclose_max =
(MAX_SCHEDULE_TIMEOUT / HZ > UINT_MAX)
? UINT_MAX : MAX_SCHEDULE_TIMEOUT / HZ;
static int proc_sctp_do_hmac_alg(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-17 23:26:50 +08:00
static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
static struct ctl_table sctp_table[] = {
{
.procname = "sctp_mem",
.data = &sysctl_sctp_mem,
.maxlen = sizeof(sysctl_sctp_mem),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax
},
{
.procname = "sctp_rmem",
.data = &sysctl_sctp_rmem,
.maxlen = sizeof(sysctl_sctp_rmem),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "sctp_wmem",
.data = &sysctl_sctp_wmem,
.maxlen = sizeof(sysctl_sctp_wmem),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{ /* sentinel */ }
};
static struct ctl_table sctp_net_table[] = {
{
.procname = "rto_initial",
.data = &init_net.sctp.rto_initial,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = &timer_max
},
{
.procname = "rto_min",
.data = &init_net.sctp.rto_min,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_sctp_do_rto_min,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = &init_net.sctp.rto_max
},
{
.procname = "rto_max",
.data = &init_net.sctp.rto_max,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_sctp_do_rto_max,
.extra1 = &init_net.sctp.rto_min,
.extra2 = &timer_max
},
{
.procname = "rto_alpha_exp_divisor",
.data = &init_net.sctp.rto_alpha,
.maxlen = sizeof(int),
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
.mode = 0644,
.proc_handler = proc_sctp_do_alpha_beta,
.extra1 = &rto_alpha_min,
.extra2 = &rto_alpha_max,
},
{
.procname = "rto_beta_exp_divisor",
.data = &init_net.sctp.rto_beta,
.maxlen = sizeof(int),
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
.mode = 0644,
.proc_handler = proc_sctp_do_alpha_beta,
.extra1 = &rto_beta_min,
.extra2 = &rto_beta_max,
},
{
.procname = "max_burst",
.data = &init_net.sctp.max_burst,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_INT_MAX,
},
{
.procname = "cookie_preserve_enable",
.data = &init_net.sctp.cookie_preserve_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "cookie_hmac_alg",
.data = &init_net.sctp.sctp_hmac_alg,
.maxlen = 8,
.mode = 0644,
.proc_handler = proc_sctp_do_hmac_alg,
},
{
.procname = "valid_cookie_life",
.data = &init_net.sctp.valid_cookie_life,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = &timer_max
},
{
.procname = "sack_timeout",
.data = &init_net.sctp.sack_timeout,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &sack_timer_min,
.extra2 = &sack_timer_max,
},
{
.procname = "hb_interval",
.data = &init_net.sctp.hb_interval,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = &timer_max
},
{
.procname = "association_max_retrans",
.data = &init_net.sctp.max_retrans_association,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_INT_MAX,
},
{
.procname = "path_max_retrans",
.data = &init_net.sctp.max_retrans_path,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_INT_MAX,
},
{
.procname = "max_init_retransmits",
.data = &init_net.sctp.max_retrans_init,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_INT_MAX,
},
{
.procname = "pf_retrans",
.data = &init_net.sctp.pf_retrans,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_INT_MAX,
},
{
.procname = "sndbuf_policy",
.data = &init_net.sctp.sndbuf_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "rcvbuf_policy",
.data = &init_net.sctp.rcvbuf_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "default_auto_asconf",
.data = &init_net.sctp.default_auto_asconf,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "addip_enable",
.data = &init_net.sctp.addip_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "addip_noauth_enable",
.data = &init_net.sctp.addip_noauth,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "prsctp_enable",
.data = &init_net.sctp.prsctp_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "reconf_enable",
.data = &init_net.sctp.reconf_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "auth_enable",
.data = &init_net.sctp.auth_enable,
.maxlen = sizeof(int),
.mode = 0644,
net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-17 23:26:50 +08:00
.proc_handler = proc_sctp_do_auth,
},
{
.procname = "intl_enable",
.data = &init_net.sctp.intl_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "ecn_enable",
.data = &init_net.sctp.ecn_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "addr_scope_policy",
.data = &init_net.sctp.scope_policy,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ZERO,
.extra2 = &addr_scope_max,
},
{
.procname = "rwnd_update_shift",
.data = &init_net.sctp.rwnd_upd_shift,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dointvec_minmax,
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19 06:58:50 +08:00
.extra1 = SYSCTL_ONE,
.extra2 = &rwnd_scale_max,
},
{
.procname = "max_autoclose",
.data = &init_net.sctp.max_autoclose,
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = &proc_doulongvec_minmax,
.extra1 = &max_autoclose_min,
.extra2 = &max_autoclose_max,
},
{
.procname = "pf_enable",
.data = &init_net.sctp.pf_enable,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{ /* sentinel */ }
};
static int proc_sctp_do_hmac_alg(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
struct ctl_table tbl;
bool changed = false;
char *none = "none";
char tmp[8] = {0};
int ret;
memset(&tbl, 0, sizeof(struct ctl_table));
if (write) {
tbl.data = tmp;
tbl.maxlen = sizeof(tmp);
} else {
tbl.data = net->sctp.sctp_hmac_alg ? : none;
tbl.maxlen = strlen(tbl.data);
}
ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
#ifdef CONFIG_CRYPTO_MD5
if (!strncmp(tmp, "md5", 3)) {
net->sctp.sctp_hmac_alg = "md5";
changed = true;
}
#endif
#ifdef CONFIG_CRYPTO_SHA1
if (!strncmp(tmp, "sha1", 4)) {
net->sctp.sctp_hmac_alg = "sha1";
changed = true;
}
#endif
if (!strncmp(tmp, "none", 4)) {
net->sctp.sctp_hmac_alg = NULL;
changed = true;
}
if (!changed)
ret = -EINVAL;
}
return ret;
}
static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
int ret, new_value;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.rto_min;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
if (new_value > max || new_value < min)
return -EINVAL;
net->sctp.rto_min = new_value;
}
return ret;
}
static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
int ret, new_value;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.rto_max;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
if (new_value > max || new_value < min)
return -EINVAL;
net->sctp.rto_max = new_value;
}
return ret;
}
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
if (write)
pr_warn_once("Changing rto_alpha or rto_beta may lead to "
"suboptimal rtt/srtt estimations!\n");
net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 06:59:14 +08:00
return proc_dointvec_minmax(ctl, write, buffer, lenp, ppos);
}
net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-17 23:26:50 +08:00
static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
struct ctl_table tbl;
int new_value, ret;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.auth_enable;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-17 23:26:50 +08:00
struct sock *sk = net->sctp.ctl_sock;
net->sctp.auth_enable = new_value;
/* Update the value in the control socket */
lock_sock(sk);
sctp_sk(sk)->ep->auth_enable = new_value;
release_sock(sk);
}
return ret;
}
int sctp_sysctl_net_register(struct net *net)
{
Revert "sctp: optimize the sctp_sysctl_net_register" This revert commit efb842c45("sctp: optimize the sctp_sysctl_net_register"), Since it doesn't kmemdup a sysctl_table for init_net, so the init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table which is a static array pointer. So when doing sctp_sysctl_net_unregister, it will free sctp_net_table, then we will get a NULL pointer dereference like that: [ 262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c [ 262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948260] PGD db80a067 PUD dae12067 PMD 0 [ 262.948268] Oops: 0000 [#1] SMP [ 262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c ... [ 262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000 [ 262.948344] RIP: 0010:[<ffffffff81144b70>] [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948353] RSP: 0018:ffff8800dad01d88 EFLAGS: 00010046 [ 262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888 [ 262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940 [ 262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9 [ 262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940 [ 262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10 [ 262.948386] FS: 00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000 [ 262.948394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0 [ 262.948410] Stack: [ 262.948413] ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940 [ 262.948422] ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940 [ 262.948431] ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10 [ 262.948440] Call Trace: [ 262.948457] [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0 [ 262.948476] [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp] [ 262.948490] [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp] [ 262.948512] [<ffffffff81394f49>] ops_exit_list+0x39/0x60 [ 262.948522] [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70 [ 262.948530] [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40 [ 262.948544] [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp] [ 262.948562] [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210 [ 262.948577] [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 262.948587] [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b With this revert, it won't occur the Oops. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 20:55:01 +08:00
struct ctl_table *table;
int i;
Revert "sctp: optimize the sctp_sysctl_net_register" This revert commit efb842c45("sctp: optimize the sctp_sysctl_net_register"), Since it doesn't kmemdup a sysctl_table for init_net, so the init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table which is a static array pointer. So when doing sctp_sysctl_net_unregister, it will free sctp_net_table, then we will get a NULL pointer dereference like that: [ 262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c [ 262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948260] PGD db80a067 PUD dae12067 PMD 0 [ 262.948268] Oops: 0000 [#1] SMP [ 262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c ... [ 262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000 [ 262.948344] RIP: 0010:[<ffffffff81144b70>] [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948353] RSP: 0018:ffff8800dad01d88 EFLAGS: 00010046 [ 262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888 [ 262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940 [ 262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9 [ 262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940 [ 262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10 [ 262.948386] FS: 00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000 [ 262.948394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0 [ 262.948410] Stack: [ 262.948413] ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940 [ 262.948422] ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940 [ 262.948431] ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10 [ 262.948440] Call Trace: [ 262.948457] [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0 [ 262.948476] [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp] [ 262.948490] [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp] [ 262.948512] [<ffffffff81394f49>] ops_exit_list+0x39/0x60 [ 262.948522] [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70 [ 262.948530] [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40 [ 262.948544] [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp] [ 262.948562] [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210 [ 262.948577] [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 262.948587] [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b With this revert, it won't occur the Oops. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 20:55:01 +08:00
table = kmemdup(sctp_net_table, sizeof(sctp_net_table), GFP_KERNEL);
if (!table)
return -ENOMEM;
Revert "sctp: optimize the sctp_sysctl_net_register" This revert commit efb842c45("sctp: optimize the sctp_sysctl_net_register"), Since it doesn't kmemdup a sysctl_table for init_net, so the init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table which is a static array pointer. So when doing sctp_sysctl_net_unregister, it will free sctp_net_table, then we will get a NULL pointer dereference like that: [ 262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c [ 262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948260] PGD db80a067 PUD dae12067 PMD 0 [ 262.948268] Oops: 0000 [#1] SMP [ 262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c ... [ 262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000 [ 262.948344] RIP: 0010:[<ffffffff81144b70>] [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948353] RSP: 0018:ffff8800dad01d88 EFLAGS: 00010046 [ 262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888 [ 262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940 [ 262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9 [ 262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940 [ 262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10 [ 262.948386] FS: 00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000 [ 262.948394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0 [ 262.948410] Stack: [ 262.948413] ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940 [ 262.948422] ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940 [ 262.948431] ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10 [ 262.948440] Call Trace: [ 262.948457] [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0 [ 262.948476] [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp] [ 262.948490] [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp] [ 262.948512] [<ffffffff81394f49>] ops_exit_list+0x39/0x60 [ 262.948522] [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70 [ 262.948530] [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40 [ 262.948544] [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp] [ 262.948562] [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210 [ 262.948577] [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 262.948587] [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b With this revert, it won't occur the Oops. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 20:55:01 +08:00
for (i = 0; table[i].data; i++)
table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;
net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
if (net->sctp.sysctl_header == NULL) {
kfree(table);
return -ENOMEM;
}
return 0;
}
void sctp_sysctl_net_unregister(struct net *net)
{
SCTP: Free the per-net sysctl table on net exit. v2 Per-net sysctl table needs to be explicitly freed at net exit. Otherwise we see the following with kmemleak: unreferenced object 0xffff880402d08000 (size 2048): comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s) hex dump (first 32 bytes): b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff .h...... ....... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815b4aad>] kmemleak_alloc+0x21/0x3e [<ffffffff81110352>] slab_post_alloc_hook+0x28/0x2a [<ffffffff81113fad>] __kmalloc_track_caller+0xf1/0x104 [<ffffffff810f10c2>] kmemdup+0x1b/0x30 [<ffffffff81571e9f>] sctp_sysctl_net_register+0x1f/0x72 [<ffffffff8155d305>] sctp_net_init+0x100/0x39f [<ffffffff814ad53c>] ops_init+0xc6/0xf5 [<ffffffff814ad5b7>] setup_net+0x4c/0xd0 [<ffffffff814ada5e>] copy_net_ns+0x6d/0xd6 [<ffffffff810938b1>] create_new_namespaces+0xd7/0x147 [<ffffffff810939f4>] copy_namespaces+0x63/0x99 [<ffffffff81076733>] copy_process+0xa65/0x1233 [<ffffffff81077030>] do_fork+0x10b/0x271 [<ffffffff8100a0e9>] sys_clone+0x23/0x25 [<ffffffff815dda73>] stub_clone+0x13/0x20 [<ffffffffffffffff>] 0xffffffffffffffff I fixed the spelling of sysctl_header so the code actually compiles. -- EWB. Reported-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-25 00:02:47 +08:00
struct ctl_table *table;
table = net->sctp.sysctl_header->ctl_table_arg;
unregister_net_sysctl_table(net->sctp.sysctl_header);
SCTP: Free the per-net sysctl table on net exit. v2 Per-net sysctl table needs to be explicitly freed at net exit. Otherwise we see the following with kmemleak: unreferenced object 0xffff880402d08000 (size 2048): comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s) hex dump (first 32 bytes): b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff .h...... ....... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815b4aad>] kmemleak_alloc+0x21/0x3e [<ffffffff81110352>] slab_post_alloc_hook+0x28/0x2a [<ffffffff81113fad>] __kmalloc_track_caller+0xf1/0x104 [<ffffffff810f10c2>] kmemdup+0x1b/0x30 [<ffffffff81571e9f>] sctp_sysctl_net_register+0x1f/0x72 [<ffffffff8155d305>] sctp_net_init+0x100/0x39f [<ffffffff814ad53c>] ops_init+0xc6/0xf5 [<ffffffff814ad5b7>] setup_net+0x4c/0xd0 [<ffffffff814ada5e>] copy_net_ns+0x6d/0xd6 [<ffffffff810938b1>] create_new_namespaces+0xd7/0x147 [<ffffffff810939f4>] copy_namespaces+0x63/0x99 [<ffffffff81076733>] copy_process+0xa65/0x1233 [<ffffffff81077030>] do_fork+0x10b/0x271 [<ffffffff8100a0e9>] sys_clone+0x23/0x25 [<ffffffff815dda73>] stub_clone+0x13/0x20 [<ffffffffffffffff>] 0xffffffffffffffff I fixed the spelling of sysctl_header so the code actually compiles. -- EWB. Reported-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-25 00:02:47 +08:00
kfree(table);
}
static struct ctl_table_header *sctp_sysctl_header;
/* Sysctl registration. */
void sctp_sysctl_register(void)
{
sctp_sysctl_header = register_net_sysctl(&init_net, "net/sctp", sctp_table);
}
/* Sysctl deregistration. */
void sctp_sysctl_unregister(void)
{
unregister_net_sysctl_table(sctp_sysctl_header);
}