linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	16e5726269	af_unix: dont send SCM_CREDENTIALS by default Since commit `7361c36c52` (af_unix: Allow credentials to work across user and pid namespaces) af_unix performance dropped a lot. This is because we now take a reference on pid and cred in each write(), and release them in read(), usually done from another process, eventually from another cpu. This triggers false sharing. # Events: 154K cycles # # Overhead Command Shared Object Symbol # ........ ....... .................. ......................... # 10.40% hackbench [kernel.kallsyms] [k] put_pid 8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg 7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg 6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock 4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb 4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns 4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred 2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm 2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count 1.75% hackbench [kernel.kallsyms] [k] fget_light 1.51% hackbench [kernel.kallsyms] [k] __mutex_lock_interruptible_slowpath 1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb This patch includes SCM_CREDENTIALS information in a af_unix message/skb only if requested by the sender, [man 7 unix for details how to include ancillary data using sendmsg() system call] Note: This might break buggy applications that expected SCM_CREDENTIAL from an unaware write() system call, and receiver not using SO_PASSCRED socket option. If SOCK_PASSCRED is set on source or destination socket, we still include credentials for mere write() syscalls. Performance boost in hackbench : more than 50% gain on a 16 thread machine (2 quad-core cpus, 2 threads per core) hackbench 20 thread 2000 4.228 sec instead of 9.102 sec Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-28 13:29:50 -04:00
Eric Dumazet	33d480ce6d	net: cleanup some rcu_dereference_raw RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-12 02:55:28 -07:00
John W. Linville	36099365c7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/rtlwifi/pci.c include/linux/netlink.h	2011-06-24 15:25:51 -04:00
Johannes Berg	670dc2833d	netlink: advertise incomplete dumps Consider the following situation: * a dump that would show 8 entries, four in the first round, and four in the second * between the first and second rounds, 6 entries are removed * now the second round will not show any entry, and even if there is a sequence/generation counter the application will not know To solve this problem, add a new flag NLM_F_DUMP_INTR to the netlink header that indicates the dump wasn't consistent, this flag can also be set on the MSG_DONE message that terminates the dump, and as such above situation can be detected. To achieve this, add a sequence counter to the netlink callback struct. Of course, netlink code still needs to use this new functionality. The correct way to do that is to always set cb->seq when a dumpit callback is invoked and call nl_dump_check_consistent() for each new message. The core code will also call this function for the final MSG_DONE message. To make it usable with generic netlink, a new function genlmsg_nlhdr() is needed to obtain the netlink header from the genetlink user header. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-06-22 16:09:45 -04:00
Dan Carpenter	c63d6ea306	rtnetlink: unlock on error path in netlink_dump() In `c7ac8679be` "rtnetlink: Compute and store minimum ifinfo dump size", we moved the allocation under the lock so we need to unlock on error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>	2011-06-16 23:51:35 -04:00
Greg Rose	c7ac8679be	rtnetlink: Compute and store minimum ifinfo dump size The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-06-09 20:38:07 -07:00
Dan Rosenberg	71338aa7d0	net: convert %p usage to %pK The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 01:13:12 -04:00
Lai Jiangshan	37b6b935e9	net,rcu: convert call_rcu(listeners_free_rcu) to kfree_rcu() The rcu callback listeners_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(listeners_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-05-07 22:50:51 -07:00
David S. Miller	0a0e9ae1bd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h	2011-03-03 21:27:42 -08:00
Patrick McHardy	01a16b21d6	netlink: kill eff_cap from struct netlink_skb_parms Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: Patrick McHardy <kaber@trash.net> Reviewed-by: James Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-03 13:32:07 -08:00
Patrick McHardy	c53fa1ed92	netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms Netlink message processing in the kernel is synchronous these days, the session information can be collected when needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-03 10:55:40 -08:00
Andrey Vagin	b44d211e16	netlink: handle errors from netlink_dump() netlink_dump() may failed, but nobody handle its error. It generates output data, when a previous portion has been returned to user space. This mechanism works when all data isn't go in skb. If we enter in netlink_recvmsg() and skb is absent in the recv queue, the netlink_dump() will not been executed. So if netlink_dump() is failed one time, the new data never appear and the reader will sleep forever. netlink_dump() is called from two places: 1. from netlink_sendmsg->...->netlink_dump_start(). In this place we can report error directly and it will be returned by sendmsg(). 2. from netlink_recvmsg There we can't report error directly, because we have a portion of valid output data and call netlink_dump() for prepare the next portion. If netlink_dump() is failed, the socket will be mark as error and the next recvmsg will be failed. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-28 12:18:12 -08:00
David S. Miller	b8f3ab4290	Revert "netlink: test for all flags of the NLM_F_DUMP composite" This reverts commit `0ab03c2b14`. It breaks several things including the avahi daemon. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 13:34:20 -08:00
Jan Engelhardt	0ab03c2b14	netlink: test for all flags of the NLM_F_DUMP composite Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT \| NLM_F_MATCH, when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL, non-dump requests with NLM_F_EXCL set are mistaken as dump requests. Substitute the condition to test for _all_ bits being set. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 16:25:03 -08:00
Eric Dumazet	5c398dc8f5	netlink: fix netlink_change_ngroups() commit `6c04bb18dd` (netlink: use call_rcu for netlink_change_ngroups) used a somewhat convoluted and racy way to perform call_rcu(). The old block of memory is freed after a grace period, but the rcu_head used to track it is located in new block. This can clash if we call two times or more netlink_change_ngroups(), and a block is freed before another. call_rcu() called on different cpus makes no guarantee in order of callbacks. Fix this using a more standard way of handling this : Each block of memory contains its own rcu_head, so that no 'use after free' can happens. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Johannes Berg <johannes@sipsolutions.net> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 16:25:39 -07:00
John W. Linville	e9a68707d7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ipw2x00/ipw2200.c	2010-10-08 15:39:28 -04:00
Johannes Berg	ff4c92d85c	genetlink: introduce pre_doit/post_doit hooks Each family may have some amount of boilerplate locking code that applies to most, or even all, commands. This allows a family to handle such things in a more generic way, by allowing it to a) include private flags in each operation b) specify a pre_doit hook that is called, before an operation's doit() callback and may return an error directly, c) specify a post_doit hook that can undo locking or similar things done by pre_doit, and finally d) include two private pointers in each info struct passed between all these operations including doit(). (It's two because I'll need two in nl80211 -- can be extended.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:30 -04:00
David S. Miller	b963ea89f0	netlink: Make NETLINK_USERSOCK work again. Once we started enforcing the a nl_table[] entry exist for a protocol, NETLINK_USERSOCK stopped working. Add a dummy table entry so that it works again. Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-31 09:51:37 -07:00
Johannes Berg	68d6ac6d27	netlink: fix compat recvmsg Since commit `1dacc76d00` Author: Johannes Berg <johannes@sipsolutions.net> Date: Wed Jul 1 11:26:02 2009 +0000 net/compat/wext: send different messages to compat tasks we had a race condition when setting and then restoring frag_list. Eric attempted to fix it, but the fix created even worse problems. However, the original motivation I had when I added the code that turned out to be racy is no longer clear to me, since we only copy up to skb->len to userspace, which doesn't include the frag_list length. As a result, not doing any frag_list clearing and restoring avoids the race condition, while not introducing any other problems. Additionally, while preparing this patch I found that since none of the remaining netlink code is really aware of the frag_list, we need to use the original skb's information for packet information and credentials. This fixes, for example, the group information received by compat tasks. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert `1235f504aa`] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-18 23:35:58 -07:00
David S. Miller	daa3766e70	Revert "netlink: netlink_recvmsg() fix" This reverts commit `1235f504aa`. It causes regressions worse than the problem it was trying to fix. Eric will try to solve the problem another way. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-15 23:21:50 -07:00
Changli Gao	652c671746	genetlink: use genl_register_family_with_ops() Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-26 21:00:10 -07:00
Changli Gao	416c2f9cf5	genetlink: cleanup code according to CodingStyle If the function is exported, the EXPORT* macro for it should follow immediately after the closing function brace line. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/netlink/genetlink.c \| 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-26 20:53:49 -07:00
Eric Dumazet	1235f504aa	netlink: netlink_recvmsg() fix commit `1dacc76d00` (net/compat/wext: send different messages to compat tasks) introduced a race condition on netlink, in case MSG_PEEK is used. An skb given by skb_recv_datagram() might be shared, we must copy it before any modification, or risk fatal corruption. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-26 13:09:16 -07:00
Neil Horman	70d4bf6d46	drop_monitor: convert some kfree_skb call sites to consume_skb Convert a few calls from kfree_skb to consume_skb Noticed while I was working on dropwatch that I was detecting lots of internal skb drops in several places. While some are legitimate, several were not, freeing skbs that were at the end of their life, rather than being discarded due to an error. This patch converts those calls sites from using kfree_skb to consume_skb, which quiets the in-kernel drop_monitor code from detecting them as drops. Tested successfully by myself Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-20 13:28:05 -07:00
Eric W. Biederman	b47030c71d	af_netlink: Add needed scm_destroy after scm_send. scm_send occasionally allocates state in the scm_cookie, so I have modified netlink_sendmsg to guarantee that when scm_send succeeds scm_destory will be called to free that state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-16 14:55:56 -07:00
Eric W. Biederman	910a7e905f	netlink: Implment netlink_broadcast_filtered When netlink sockets are used to convey data that is in a namespace we need a way to select a subset of the listening sockets to deliver the packet to. For the network namespace we have been doing this by only transmitting packets in the correct network namespace. For data belonging to other namespaces netlink_bradcast_filtered provides a mechanism that allows us to examine the destination socket and to decide if we should transmit the specified packet to it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-05-21 09:37:32 -07:00
David S. Miller	871039f02f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c	2010-04-11 14:53:53 -07:00
David S. Miller	4a35ecf8bf	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bonding/bond_main.c drivers/net/via-velocity.c drivers/net/wireless/iwlwifi/iwl-agn.c	2010-04-06 23:53:30 -07:00
Linus Torvalds	cb4361c1dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits) smc91c92_cs: fix the problem of "Unable to find hardware address" r8169: clean up my printk uglyness net: Hook up cxgb4 to Kconfig and Makefile cxgb4: Add main driver file and driver Makefile cxgb4: Add remaining driver headers and L2T management cxgb4: Add packet queues and packet DMA code cxgb4: Add HW and FW support code cxgb4: Add register, message, and FW definitions netlabel: Fix several rcu_dereference() calls used without RCU read locks bonding: fix potential deadlock in bond_uninit() net: check the length of the socket address passed to connect(2) stmmac: add documentation for the driver. stmmac: fix kconfig for crc32 build error be2net: fix bug in vlan rx path for big endian architecture be2net: fix flashing on big endian architectures be2net: fix a bug in flashing the redboot section bonding: bond_xmit_roundrobin() fix drivers/net: Add missing unlock net: gianfar - align BD ring size console messages net: gianfar - initialize per-queue statistics ...	2010-04-06 08:34:06 -07:00
James Chapman	f408e0ce40	netlink: Export genl_lock() API for use by modules This lets kernel modules which use genl netlink APIs serialize netlink processing. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-03 14:56:05 -07:00
Changli Gao	6503d96168	net: check the length of the socket address passed to connect(2) check the length of the socket address passed to connect(2). Check the length of the socket address passed to connect(2). If the length is invalid, -EINVAL will be returned. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/bluetooth/l2cap.c \| 3 ++- net/bluetooth/rfcomm/sock.c \| 3 ++- net/bluetooth/sco.c \| 3 ++- net/can/bcm.c \| 3 +++ net/ieee802154/af_ieee802154.c \| 3 +++ net/ipv4/af_inet.c \| 5 +++++ net/netlink/af_netlink.c \| 3 +++ 7 files changed, 20 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-01 17:26:01 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Tom Goff	66aa4a55fe	netlink: use the appropriate namespace pid This was included in OpenVZ kernels but wasn't integrated upstream. >From git://git.openvz.org/pub/linux-2.6.24-openvz: commit 5c69402f18adf7276352e051ece2cf31feefab02 Author: Alexey Dobriyan <adobriyan@openvz.org> Date: Mon Dec 24 14:37:45 2007 +0300 netlink: fixup ->tgid to work in multiple PID namespaces Signed-off-by: Tom Goff <thomas.goff@boeing.com> Acked-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-26 20:13:58 -07:00
Pablo Neira Ayuso	1a50307ba1	netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err() Currently, ENOBUFS errors are reported to the socket via netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However, that should not happen. This fixes this problem and it changes the prototype of netlink_set_err() to return the number of sockets that have set the NETLINK_RECV_NO_ENOBUFS socket option. This return value is used in the next patch in these bugfix series. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-20 14:29:03 -07:00
Masatake YAMATO	cf0aa4e07c	netlink: Adding inode field to /proc/net/netlink The Inode field in /proc/net/{tcp,udp,packet,raw,...} is useful to know the types of file descriptors associated to a process. Actually lsof utility uses the field. Unfortunately, unlike /proc/net/{tcp,udp,packet,raw,...}, /proc/net/netlink doesn't have the field. This patch adds the field to /proc/net/netlink. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-28 01:29:49 -08:00
David S. Miller	9c119ba54c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-02-03 19:38:22 -08:00
Alexey Dobriyan	974c37e9d8	netlink: fix for too early rmmod Netlink code does module autoload if protocol userspace is asking for is not ready. However, module can dissapear right after it was autoloaded. Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM. netlink_create() in such situation _will_ create userspace socket and _will_not_ pin module. Now if module was removed and we're going to call ->netlink_rcv into nothing: BUG: unable to handle kernel paging request at ffffffffa02f842a ^^^^^^^^^^^^^^^^ modules are loaded near these addresses here IP: [<ffffffffa02f842a>] 0xffffffffa02f842a PGD 161f067 PUD 1623063 PMD baa12067 PTE 0 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent CPU 1 Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E RIP: 0010:[<ffffffffa02f842a>] [<ffffffffa02f842a>] 0xffffffffa02f842a RSP: 0018:ffff8800baa3db48 EFLAGS: 00010292 RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000 RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001 RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011 R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010 FS: 00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30) Stack: ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d <0> 0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000 <0> ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000 Call Trace: [<ffffffff813637c0>] ? netlink_unicast+0x100/0x2d0 [<ffffffff8136397d>] netlink_unicast+0x2bd/0x2d0 netlink_unicast_kernel: nlk->netlink_rcv(skb); [<ffffffff81344adc>] ? memcpy_fromiovec+0x6c/0x90 [<ffffffff81364263>] netlink_sendmsg+0x1d3/0x2d0 [<ffffffff8133975b>] sock_sendmsg+0xbb/0xf0 [<ffffffff8106cdeb>] ? __lock_acquire+0x27b/0xa60 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff8106db22>] ? __lock_release+0x82/0x170 [<ffffffff810a190e>] ? might_fault+0xbe/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff81344c77>] ? verify_iovec+0x47/0xd0 [<ffffffff8133a509>] sys_sendmsg+0x1a9/0x360 [<ffffffff813c2be5>] ? _raw_spin_unlock_irqrestore+0x65/0x70 [<ffffffff8106aced>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff813c2bc2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 [<ffffffff81197004>] ? __up_read+0x84/0xb0 [<ffffffff8106ac95>] ? trace_hardirqs_on_caller+0x145/0x190 [<ffffffff813c207f>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8100262b>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [<ffffffffa02f842a>] 0xffffffffa02f842a RSP <ffff8800baa3db48> CR2: ffffffffa02f842a If module was quickly removed after autoloading, return -E. Return -EPROTONOSUPPORT if module was quickly removed after autoloading. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-03 18:13:43 -08:00
Samir Bellabes	e1d5a01072	genetlink: optimize ctrl_dumpfamily() there is a unnecessary test which can be replaced by a good initialization in the 'for' statement Noticed by Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Samir Bellabes <sam@synack.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-13 20:37:45 -08:00
Octavian Purdila	09ad9bc752	net: use net_eq to compare nets Generated with the following semantic patch @@ struct net n1; struct net n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net n1; struct net n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-25 15:14:13 -08:00
Johannes Berg	649300b927	netlink: remove subscriptions check on notifier The netlink URELEASE notifier doesn't notify for sockets that have been used to receive multicast but it should be called for such sockets as well since they might _also_ be used for sending and not solely for receiving multicast. We will need that for nl80211 (generic netlink sockets) in the future. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-17 04:08:49 -08:00
Cyrill Gorcunov	13cfa97bef	net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guard Use guard DECLARE_SOCKADDR in a few more places which allow us to catch if the structure copied back is too big. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-10 20:54:41 -08:00
Eric Paris	3f378b6844	net: pass kern to net_proto_family create function The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-05 22:18:14 -08:00
Krishna Kumar	988ade6b8e	genetlink: Optimize and one bug fix in genl_generate_id() 1. GENL_MIN_ID is a valid id -> no need to start at GENL_MIN_ID + 1. 2. Avoid going through the ids two times: If we start at GENL_MIN_ID+1 (or bigger) and all ids are over!, the code iterates through the list twice (or lesser). 3. Simplify code - no need to start at idx=0 which gets reset to GENL_MIN_ID. Patch on net-next-2.6. Reboot test shows that first id passed to genl_register_family was 16, next two were GENL_ID_GENERATE and genl_generate_id returned 17 & 18 (user level testing of same code shows expected values across entire range of MIN/MAX). Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-17 23:57:26 -07:00
Krishna Kumar	93860b08e3	genetlink: Optimize genl_register_family() genl_register_family() doesn't need to call genl_family_find_byid when GENL_ID_GENERATE is passed during register. Patch on net-next-2.6, compile and reboot testing only. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-17 23:57:26 -07:00
Stephen Hemminger	ec1b4cf74c	net: mark net_proto_ops as const All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-07 01:10:46 -07:00
David S. Miller	b7058842c9	net: Make setsockopt() optlen be unsigned. This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-30 16:12:20 -07:00
John Fastabend	5dba93aedf	net: fix nlmsg len size for skb when error bit is set. Currently, the nlmsg->len field is not set correctly in netlink_ack() for ack messages that include the nlmsg of the error frame. This corrects the length field passed to __nlmsg_put to use the correct payload size. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-26 20:16:11 -07:00
Johannes Berg	b8273570f8	genetlink: fix netns vs. netlink table locking (2) Similar to commit `d136f1bd36`, there's a bug when unregistering a generic netlink family, which is caught by the might_sleep() added in that commit: BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183 in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod 2 locks held by rmmod/1510: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120 Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444 Call Trace: [<ffffffff81044ff9>] __might_sleep+0x119/0x150 [<ffffffff81380501>] netlink_table_grab+0x21/0x100 [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60 [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120 [<ffffffff81382866>] genl_unregister_family+0x56/0x130 [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211] [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211] Fix in the same way by grabbing the netlink table lock before doing rcu_read_lock(). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-24 15:44:05 -07:00
Jan Beulich	4481374ce8	mm: replace various uses of num_physpages by totalram_pages Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what actually is usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:38 -07:00
Johannes Berg	d136f1bd36	genetlink: fix netns vs. netlink table locking Since my commits introducing netns awareness into genetlink we can get this problem: BUG: scheduling while atomic: modprobe/1178/0x00000002 2 locks held by modprobe/1178: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty # Call Trace: [<ffffffff8103e285>] __schedule_bug+0x85/0x90 [<ffffffff81403138>] schedule+0x108/0x588 [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0 [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100 [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290 because I overlooked that netlink_table_grab() will schedule, thinking it was just the rwlock. However, in the contention case, that isn't actually true. Fix this by letting the code grab the netlink table lock first and then the RCU for netns protection. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-14 17:02:50 -07:00

1 2 3 4 5

233 Commits