OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
stephen hemminger	b2ad5e5fcc	tipc: spelling errors Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:56:41 -04:00
Ying Xue	1a194c2d59	tipc: fix lockdep warning when intra-node messages are delivered When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 1109.998762] * DEADLOCK * [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so headlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Ying Xue	7b8613e0a1	tipc: fix a potential deadlock Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametble write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Jon Paul Maloy	643566d4b4	tipc: fix bug in bundled buffer reception In commit `ec8a2e5621` ("tipc: same receive code path for connection protocol and data messages") we omitted the the possiblilty that an arriving message extracted from a bundle buffer may be a multicast message. Such messages need to be to be delivered to the socket via a separate function, tipc_sk_mcast_rcv(). As a result, small multicast messages arriving as members of a bundle buffer will be silently dropped. This commit corrects the error by considering this case in the function tipc_link_bundle_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:50:53 -04:00
Jon Maloy	908344cdda	tipc: fix bug in multicast congestion handling One aim of commit `50100a5e39` ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 14:50:15 -04:00
Erik Hugne	0fc4dffad1	tipc: fix sparse warnings This fixes the following sparse warnings: sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static? Also, the function is changed to return bool upon success, rather than a potentially freed pointer. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 14:00:58 -07:00
Erik Hugne	a5325ae5b8	tipc: add name distributor resiliency queue TIPC name table updates are distributed asynchronously in a cluster, entailing a risk of certain race conditions. E.g., if two nodes simultaneously issue conflicting (overlapping) publications, this may not be detected until both publications have reached a third node, in which case one of the publications will be silently dropped on that node. Hence, we end up with an inconsistent name table. In most cases this conflict is just a temporary race, e.g., one node is issuing a publication under the assumption that a previous, conflicting, publication has already been withdrawn by the other node. However, because of the (rtt related) distributed update delay, this may not yet hold true on all nodes. The symptom of this failure is a syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error". In this commit we add a resiliency queue at the receiving end of the name table distributor. When insertion of an arriving publication fails, we retain it in this queue for a short amount of time, assuming that another update will arrive very soon and clear the conflict. If so happens, we insert the publication, otherwise we drop it. The (configurable) retention value defaults to 2000 ms. Knowing from experience that the situation described above is extremely rare, there is no risk that the queue will accumulate any large number of items. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Erik Hugne	f4ad8a4b8b	tipc: refactor name table updates out of named packet receive routine We need to perform the same actions when processing deferred name table updates, so this functionality is moved to a separate function. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Ying Xue	cc086fcf92	tipc: fix a potential oops Commit `6c9808ce09` ("tipc: remove port_lock") accidentally involves a potential bug: when tipc socket instance(tsk) is not got with given reference number in tipc_sk_get(), tsk is set to NULL. Subsequently we jump to exit label where to decrease socket reference counter pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL, oops may happen because of touching a NULL pointer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:22:43 -07:00
Jon Paul Maloy	301bae56f2	tipc: merge struct tipc_port into struct tipc_sock We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c. We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly. Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due of the removal of struct tipc_port. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	808d90f9c5	tipc: remove files ref.h and ref.c The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	2e84c60b77	tipc: remove include file port.h We move the inline functions in the file port.h to socket.c, and modify their names accordingly. We move struct tipc_port and some macros to socket.h. Finally, we remove the file port.h. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	0fc87aaebd	tipc: remove source file port.c In this commit, we move the remaining functions in port.c to socket.c, and give them new names that correspond to their new location. We then remove the file port.c. There are only cosmetic changes to the moved functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	6c9808ce09	tipc: remove port_lock In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	9b50fd087a	tipc: replace port pointer with socket pointer in registry In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5a9ee0be33	tipc: use registry when scanning sockets The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() repectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5b8fa7ce82	tipc: eliminate functions tipc_port_init and tipc_port_destroy After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	739f5e4efc	tipc: redefine message acknowledge function The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	dadebc0029	tipc: eliminate port_connect()/port_disconnect() functions tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port exection now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() is so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	80e44c2225	tipc: eliminate function tipc_port_shutdown() tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5728901581	tipc: clean up socket timer function The last remaining BH upcall to the socket, apart for the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read-acces to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	02be61a981	tipc: use message to abort connections when losing contact to node In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	50100a5e39	tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	1dd0bd2b14	tipc: introduce new function tipc_msg_create() The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
David S. Miller	02784f1b05	tipc: Fix build. Missing semicolon in range check fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-19 11:16:38 -07:00
Erik Hugne	ac32c7f705	tipc: fix message importance range check Commit `3b4f302d85` ("tipc: eliminate redundant locking") introduced a bug by removing the sanity check for message importance, allowing programs to assign any value to the msg_user field. This will mess up the packet reception logic and may cause random link resets. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-16 20:17:34 -07:00
Wei Yongjun	ad025a56a5	tipc: remove duplicated include from socket.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:51:14 -07:00
Jon Paul Maloy	13e9b9972f	tipc: make tipc_buf_append() more robust As per comment from David Miller, we try to make the buffer reassembly function more resilient to user errors than it is today. - We check that the "buf" parameter always is set, since this is mandatory input. - We ensure that buf->next always is set to NULL before linking in the buffer, instead of relying of the caller to have done this. - We ensure that the "tail" pointer in the head buffer's control block is initialized to NULL when the first fragment arrives. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 18:34:01 -07:00
Wei Yongjun	52f50ce556	tipc: fix sparse non static symbol warnings Fixes the following sparse warnings: net/tipc/socket.c:545:5: warning: symbol 'tipc_sk_proto_rcv' was not declared. Should it be static? net/tipc/socket.c:2015:5: warning: symbol 'tipc_ioctl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 22:19:04 -07:00
Jon Paul Maloy	6f92ee54b3	tipc: ensure sequential message delivery across dual bearers When we run broadcast packets over dual bearers/interfaces, the current transmission code is flipping bearers between each sent packet, with the purpose of leveraging the double bandwidth available. The receiving bclink is resequencing the packets if needed, so all messages are delivered upwards from the broadcast link in the correct order, even if they may arrive in concurrent interrupts. However, at the moment of delivery upwards to the socket, we release all spinlocks (bclink_lock, node_lock), so it is still possible that arriving messages bypass each other before they reach the socket queue. We fix this by applying the same technique we are using for unicast traffic. We use a link selector (i.e., the last bit of sending port number) to ensure that messages from the same sender socket always are sent over the same bearer. This guarantees sequential delivery between socket pairs, which is sufficient to satisfy the protocol spec, as well as all known user requirements. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	9fbfb8b120	tipc: rename temporarily named functions After the previous commit, we can now give the functions with temporary names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper names. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	c4116e1057	tipc: remove unreferenced functions We can now remove a number of functions which have become obsolete and unreferenced through this commit series. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	0abd8ff21f	tipc: start using the new multicast functions In this commit, we convert the socket multicast send function to directly call the new multicast/broadcast function (tipc_bclink_xmit2()) introduced in the previous commit. We do this instead of letting the call go via the now obsolete tipc_port_mcast_xmit(), hence saving a call level and some code complexity. We also remove the initial destination lookup at the message sending side, and replace that with an unconditional lookup at the receiving side, including on the sending node itself. This makes the destination lookup and message transfer more uniform than before. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	078bec826f	tipc: add new functions for multicast and broadcast distribution We add a new broadcast link transmit function in bclink.c and a new receive function in socket.c. The purpose is to move the branching between external and internal destination down to the link layer, just as we have done with unicast in earlier commits. We also make use of the new link-independent fragmentation support that was introduced in an earlier commit series. This gives a shorter and simpler code path, and makes it possible to obtain copy-free buffer delivery to all node local destination sockets. The new transmission code is added in parallel with the existing one, and will be used by the socket multicast send function in the next commit in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	25b660c7e2	tipc: let internal link users call the new link send function We convert the link internal users (changeover protocol, broadcast synchronization) to use the new packet send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	dbdf6d24ad	tipc: make name table distributor use new send function In a previous commit series ("tipc: new unicast transmission code") we introduced a new message sending function, tipc_link_xmit2(), and moved the unicast data users over to use that function. We now let the internal name table distributor do the same. The interaction between the name distributor and the node/link layer also becomes significantly simpler, so we can eliminate the function tipc_link_names_xmit(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
David S. Miller	1a98c69af1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:09:34 -07:00
Fabian Frederick	7ceaa583be	tipc: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:26:59 -07:00
Jon Paul Maloy	999417549c	tipc: clear 'next'-pointer of message fragments before reassembly If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since `40ba3cdf54` ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-11 15:02:10 -07:00
Erik Hugne	70452dcb6d	tipc: fix a memleak when sending data This fixes a regression bug caused by: `067608e9d0` ("tipc: introduce direct iovec to buffer chain fragmentation function") If data is sent on a nonblocking socket and the destination link is congested, the buffer chain is leaked. We fix this by freeing the chain in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 16:10:01 -07:00
Jon Paul Maloy	29322d0db9	tipc: fix bug in multicast/broadcast message reassembly Since commit `37e22164a8` ("tipc: rename and move message reassembly function") reassembly of long broadcast messages has been broken. This is because we test for a non-NULL return value of the *buf parameter as criteria for succesful reassembly. However, this parameter is left defined even after reception of the first fragment, when reassebly is still incomplete. This leads to a kernel crash as soon as a the first fragment of a long broadcast message is received. We fix this with this commit, by implementing a stricter behavior of the function and its return values. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 15:55:09 -07:00
Erik Hugne	3f53bd8f8b	tipc: fix link acknowledge logic in receive path Link state acks triggered from the receive path is done before the last received packet have been processed by the link layer. The effect of this is that the last received packet will not be included in the ack. This causes problems if the link window is set to TIPC_MIN_LINK_WIN, where the ack interval will be equal to the link tolerance, and the link enters a stop-and-go behavior. We move the ack logic to after link state processing, just before the packet is delivered to higher layers. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Carl Sigurjonsson <carl.sigurjonsson@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:55:07 -07:00
Erik Hugne	7ae934bebe	tipc: refactor message delivery out of tipc_rcv This is a cosmetic change, separating message delivery from the link state processing. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:55:07 -07:00
Jon Paul Maloy	60120526c2	tipc: simplify connection congestion handling As a consequence of the recently introduced serialized access to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa ("tipc: same receive code path for connection protocol and data messages") we can make a number of simplifications in the detection and handling of connection congestion situations. - We don't need to keep two counters, one for sent messages and one for acked messages. There is no longer any risk for races between acknowledge messages arriving in BH and data message sending running in user context. So we merge this into one counter, 'sent_unacked', which is incremented at sending and subtracted from at acknowledge reception. - We don't need to set the 'congested' field in tipc_port to true before we sent the message, and clear it when sending is successful. (As a matter of fact, it was never necessary; the field was set in link_schedule_port() before any wakeup could arrive anyway.) - We keep the conditions for link congestion and connection connection congestion separated. There would otherwise be a risk that an arriving acknowledge message may wake up a user sleeping because of link congestion. - We can simplify reception of acknowledge messages. We also make some cosmetic/structural changes: - We rename the 'congested' field to the more correct 'link_cong´. - We rename 'conn_unacked' to 'rcv_unacked' - We move the above mentioned fields from struct tipc_port to struct tipc_sock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ac0074ee70	tipc: clean up connection protocol reception function We simplify the code for receiving connection probes, leveraging the recently introduced tipc_msg_reverse() function. We also stick to the principle of sending a possible response message directly from the calling (tipc_sk_rcv or backlog_rcv) functions, hence making the call chain shallower and easier to follow. We make one small protocol change here, allowed according to the spec. If a protocol message arrives from a remote socket that is not the one we are connected to, we are currently generating a connection abort message and send it to the source. This behavior is unnecessary, and might even be a security risk, so instead we now choose to only ignore the message. The consequnce for the sender is that he will need longer time to discover his mistake (until the next timeout), but this is an extreme corner case, and may happen anyway under other circumstances, so we deem this change acceptable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ec8a2e5621	tipc: same receive code path for connection protocol and data messages As a preparation to eliminate port_lock we need to bring reception of connection protocol messages under proper protection of bh_lock_sock or socket owner. We fix this by letting those messages follow the same code path as incoming data messages. As a side effect of this change, the last reference to the function net_route_msg() disappears, and we can eliminate that function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	b786e2b0fa	tipc: let port protocol senders use new link send function Several functions in port.c, related to the port protocol and connection shutdown, need to send messages. We now convert them to use the new link send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4ccfe5e041	tipc: connection oriented transport uses new send functions We move the message sending across established connections to use the message preparation and send functions introduced earlier in this series. We now do the message preparation and call to the link send function directly from the socket, instead of going via the port layer. As a consequence of this change, the functions tipc_send(), tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg() become unreferenced and can be eliminated from port.c. For the same reason, the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long() and tipc_link_iovec_fast() can be eliminated from link.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	e2dafe87d3	tipc: RDM/DGRAM transport uses new fragmenting and sending functions We merge the code for sending port name and port identity addressed messages into the corresponding send functions in socket.c, and start using the new fragmenting and transmit functions we just have introduced. This saves a call level and quite a few code lines, as well as making this part of the code easier to follow. As a consequence, the functions tipc_send2name() and tipc_send2port() in port.c can be removed. For practical reasons, we break out the code for sending multicast messages from tipc_sendmsg() and move it into a separate function, tipc_sendmcast(), but we do not yet convert it into using the new build/send functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	5a379074a7	tipc: introduce message evaluation function When a message arrives in a node and finds no destination socket, we may need to drop it, reject it, or forward it after a secondary destination lookup. The latter two cases currently results in a code path that is perceived as complex, because it follows a deep call chain via obscure functions such as net_route_named_msg() and net_route_msg(). We now introduce a function, tipc_msg_eval(), that takes the decision about whether such a message should be rejected or forwarded, but leaves it to the caller to actually perform the indicated action. If the decision is 'reject', it is still the task of the recently introduced function tipc_msg_reverse() to take the final decision about whether the message is rejectable or not. In the latter case it drops the message. As a result of this change, we can finally eliminate the function net_route_named_msg(), and hence become independent of net_route_msg(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	8db1bae30b	tipc: separate building and sending of rejected messages The way we build and send rejected message is currenty perceived as hard to follow, partly because we let the transmission go via deep call chains through functions such as tipc_reject_msg() and net_route_msg(). We want to remove those functions, and make the call sequences shallower and simpler. For this purpose, we separate building and sending of rejected messages. We build the reject message using the new function tipc_msg_reverse(), and let the transmission go via the newly introduced tipc_link_xmit2() function, as all transmission eventually will do. We also ensure that all calls to tipc_link_xmit2() are made outside port_lock/bh_lock_sock. Finally, we replace all calls to tipc_reject_msg() with the two new calls at all locations in the code that we want to keep. The remaining calls are made from code that we are planning to remove, along with tipc_reject_msg() itself. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	067608e9d0	tipc: introduce direct iovec to buffer chain fragmentation function Fragmentation at message sending is currently performed in two places in link.c, depending on whether data to be transmitted is delivered in the form of an iovec or as a big sk_buff. Those functions are also tightly entangled with the send functions that are using them. We now introduce a re-entrant, standalone function, tipc_msg_build2(), that builds a packet chain directly from an iovec. Each fragment is sized according to the MTU value given by the caller, and is prepended with a correctly built fragment header, when needed. The function is independent from who is calling and where the chain will be delivered, as long as the caller is able to indicate a correct MTU. The function is tested, but not called by anybody yet. Since it is incompatible with the existing tipc_msg_build(), and we cannot yet remove that function, we have given it a temporary name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	16e166b88c	tipc: make link mtu easily accessible from socket Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4f1688b2c6	tipc: introduce send functions for chained buffers in link The current link implementation provides several different transmit functions, depending on the characteristics of the message to be sent: if it is an iovec or an sk_buff, if it needs fragmentation or not, if the caller holds the node_lock or not. The permutation of these options gives us an unwanted amount of unnecessarily complex code. As a first step towards simplifying the send path for all messages, we introduce two new send functions at link level, tipc_link_xmit2() and __tipc_link_xmit2(). The former looks up a link to the message destination, and if one is found, it grabs the node lock and calls the second function, which works exclusively inside the node lock protection. If no link is found, and the destination is on the same node, it delivers the message directly to the local destination socket. The new functions take a buffer chain where all packet headers are already prepared, and the correct MTU has been used. These two functions will later replace all other link-level transmit functions. The functions are not backwards compatible, so we have added them as new functions with temporary names. They are tested, but have no users yet. Those will be added later in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	e4de5fab80	tipc: use negative error return values in functions In some places, TIPC functions returns positive integers as return codes. This goes against standard Linux coding practice, and may even cause problems in some cases. We now change the return values of the functions filter_rcv() and filter_connect() to become signed integers, and return negative error codes when needed. The codes we use in these particular cases are still TIPC specific, since they are both part of the TIPC API and have no correspondence in errno.h Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	3d09fc4244	tipc: eliminate case of writing to freed memory In the function tipc_nodesub_notify() we call a function pointer aggregated into the object to be notified, whereafter we set the function pointer to NULL. However, in some cases the function pointed to will free the struct containing the function pointer, resulting in a write to already freed memory. This bug seems to always have been there, without causing any notable harm. In this commit we fix the problem by inverting the order of the zeroing and the function call. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Octavian Purdila	bad93e9d4e	net: add __pskb_copy_fclone and pskb_copy_for_clone There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:38:02 -07:00
Jon Paul Maloy	02c00c2ab0	tipc: fix potential bug in function tipc_backlog_rcv In commit `4f4482dcd9` ("tipc: compensate for double accounting in socket rcv buffer") we access 'truesize' of a received buffer after it might have been released by the function filter_rcv(). In this commit we correct this by reading the value of 'truesize' to the stack before delivering the buffer to filter_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
Arnaldo Carvalho de Melo	85d3fc9418	tipc: Don't reset the timeout when restarting As it may then take longer than what the user specified using setsockopt(SO_RCVTIMEO). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 14:11:41 -04:00
Jon Paul Maloy	9816f0615d	tipc: merge port message reception into socket reception function In order to reduce complexity and save a call level during message reception at port/socket level, we remove the function tipc_port_rcv() and merge its functionality into tipc_sk_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	c82910e2a8	tipc: clean up neigbor discovery message reception The function tipc_disc_rcv(), which is handling received neighbor discovery messages, is perceived as messy, and it is hard to verify its correctness by code inspection. The fact that the task it is set to resolve is fairly complex does not make the situation better. In this commit we try to take a more systematic approach to the problem. We define a decision machine which takes three state flags as input, and produces three action flags as output. We then walk through all permutations of the state flags, and for each of them we describe verbally what is going on, plus that we set zero or more of the action flags. The action flags indicate what should be done once the decision machine has finished its job, while the last part of the function deals with performing those actions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	38504c28a2	tipc: improve and extend media address conversion functions TIPC currently handles two media specific addresses: Ethernet MAC addresses and InfiniBand addresses. Those are kept in three different formats: 1) A "raw" format as obtained from the device. This format is known only by the media specific adapter code in eth_media.c and ib_media.c. 2) A "generic" internal format, in the form of struct tipc_media_addr, which can be referenced and passed around by the generic media- unaware code. 3) A serialized version of the latter, to be conveyed in neighbor discovery messages. Conversion between the three formats can only be done by the media specific code, so we have function pointers for this purpose in struct tipc_media. Here, the media adapters can install their own conversion functions at startup. We now introduce a new such function, 'raw2addr()', whose purpose is to convert from format 1 to format 2 above. We also try to as far as possible uniform commenting, variable names and usage of these functions, with the purpose of making them more comprehensible. We can now also remove the function tipc_l2_media_addr_set(), whose job is done better by the new function. Finally, we expand the field for serialized addresses (format 3) in discovery messages from 20 to 32 bytes. This is permitted according to the spec, and reduces the risk of problems when we add new media in the future. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	37e22164a8	tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	5074ab89c5	tipc: mark head of reassembly buffer as non-linear The message reassembly function does not update the 'len' and 'data_len' fields of the head skbuff correctly when fragments are chained to it. This may sometimes lead to obsure errors, such as fragment reordering when we receive fragments which are cloned buffers. This commit fixes this, by ensuring that the two fields are updated correctly. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	ec37dcd382	tipc: don't record link RESET or ACTIVATE messages as traffic In the current code, all incoming LINK_PROTOCOL messages, irrespective of type, nudge the "last message received" checkpoint, informing the link state machine that a message was received from the peer since last supervision timeout event. This inhibits the link from starting probing the peer unnecessarily. However, not only STATE messages are recorded as legitimate incoming traffic this way, but even RESET and ACTIVATE messages, which in reality are there to inform the link that the peer endpoint has been reset. At the same time, some RESET messages may be dropped instead of causing a link reset. This happens when the link endpoint thinks it is fully up and working, and the session number of the RESET is lower than or equal to the current link session. In such cases the RESET is perceived as a delayed remnant from an earlier session, or the current one, and dropped. Now, if a TIPC module is removed and then immediately reinserted, e.g. when using a script, RESET messages may arrive at the peer link endpoint before this one has had time to discover the failure. The RESET may be dropped because of the session number, but only after it has been recorded as a legitimate traffic event. Hence, the receiving link will not start probing, and not discover that the peer endpoint is down, at the same time ignoring the periodic RESET messages coming from that endpoint. We have ended up in a stale state where a failed link cannot be re-established. In this commit, we remedy this by nudging the checkpoint only for received STATE messages, not for RESET or ACTIVATE messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	4f4482dcd9	tipc: compensate for double accounting in socket rcv buffer The function net/core/sock.c::__release_sock() runs a tight loop to move buffers from the socket backlog queue to the receive queue. As a security measure, sk_backlog.len of the receiving socket is not set to zero until after the loop is finished, i.e., until the whole backlog queue has been transferred to the receive queue. During this transfer, the data that has already been moved is counted both in the backlog queue and the receive queue, hence giving an incorrect picture of the available queue space for new arriving buffers. This leads to unnecessary rejection of buffers by sk_add_backlog(), which in TIPC leads to unnecessarily broken connections. In this commit, we compensate for this double accounting by adding a counter that keeps track of it. The function socket.c::backlog_rcv() receives buffers one by one from __release_sock(), and adds them to the socket receive queue. If the transfer is successful, it increases a new atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the transferred buffer. If a new buffer arrives during this transfer and finds the socket busy (owned), we attempt to add it to the backlog. However, when sk_add_backlog() is called, we adjust the 'limit' parameter with the value of the new counter, so that the risk of inadvertent rejection is eliminated. It should be noted that this change does not invalidate the original purpose of zeroing 'sk_backlog.len' after the full transfer. We set an upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that doesn't respect the send window) keeps pumping in buffers to sk_add_backlog(), he will eventually reach an upper limit, (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added to the backlog, and the connection will be broken. Ordinary, well- behaved senders will never reach this buffer limit at all. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
Jon Paul Maloy	6163a194e0	tipc: decrease connection flow control window Memory overhead when allocating big buffers for data transfer may be quite significant. E.g., truesize of a 64 KB buffer turns out to be 132 KB, 2 x the requested size. This invalidates the "worst case" calculation we have been using to determine the default socket receive buffer limit, which is based on the assumption that 1024x64KB = 67MB buffers may be queued up on a socket. Since TIPC connections cannot survive hitting the buffer limit, we have to compensate for this overhead. We do that in this commit by dividing the fix connection flow control window from 1024 (2512) messages to 512 (2256). Since older version nodes send out acks at 512 message intervals, compatibility with such nodes is guaranteed, although performance may be non-optimal in such cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
David S. Miller	5f013c9bc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 13:19:14 -04:00
Ying Xue	ca9cf06a06	tipc: don't directly overwrite node action_flags Each node action flag should be set or cleared separately, instead we now set the whole flags variable in one shot, and it's turned out to be hard to see which other flags are affected. Therefore, for instance, we explicitly clear TIPC_WAIT_OWN_LINKS_DOWN bit in node_lost_contact(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Ying Xue	aecb9bb89c	tipc: rename enum names of node flags Rename node flags to action_flags as well as its enum names so that they can reflect its real meanings. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Ying Xue	52ff872055	tipc: purge signal handler infrastructure In the previous commits of this series, we removed all asynchronous actions which were based on the tasklet handler - "tipc_k_signal()". So the moment has now come when we can completely remove the tasklet handler infrastructure. That is done with this commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	3f5a12bd9f	tipc: avoid to asynchronously reset all links Postpone the actions of resetting all links until after bclink lock is released, avoiding to asynchronously reset all links. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	eb8b00f5f2	tipc: convert allocations of global variables associated with bclink Convert allocations of global variables associated with bclink from static way to dynamical way for the convenience of bclink instance initialisation. Meanwhile, this also helps TIPC support name space in the future easily. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	d69afc90b8	tipc: define new functions to operate bc_lock As we are going to do more jobs when bc_lock is released, the two operations of holding/releasing the lock should be encapsulated with functions. In addition, we move bc_lock spin lock into tipc_bclink structure avoiding to define the global variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	ca0c42732c	tipc: avoid to asynchronously deliver name tables to peer node Postpone the actions of delivering name tables until after node lock is released, avoiding to do it under asynchronous context. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9d56194968	tipc: remove TIPC_NAMES_GONE node flag Since previously what all publications pertaining to the lost node were removed from name table was finished in tasklet context asynchronously, we need to TIPC_NAMES_GONE flag indicating whether the node cleanup work is finished or not. But now as the cleanup work has been finished when node lock is released, the flag becomes meaningless for us. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9db9fdd198	tipc: avoid to asynchronously notify subscriptions Postpone the actions of notifying subscriptions until after node lock is released, avoiding to asynchronously execute registered handlers when node is lost. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	10f465c496	tipc: rename setup_blocked variable of node struct to flags Rename setup_blocked variable of node struct to a more common name called "flags", which will be used to represent kinds of node states. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	486f930ac5	tipc: adjust order of variables in tipc_node structure Move more frequently used variables up to the head of tipc_node structure, hopefully improving a bit performance. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	5356f3d7d4	tipc: always use tipc_node_lock() to hold node lock Although we obtain node lock with tipc_node_lock() in most time, there are still places where we directly use native spin lock interface to grab node lock. But as we will do more jobs in the future when node lock is released, we should ensure that tipc_node_lock() is always called when node lock is taken. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:43 -04:00
Ying Xue	1621b94d2a	tipc: fix memory leak of publications Commit `1bb8dce57f` ("tipc: fix memory leak during module removal") introduced a memory leak issue: when name table is stopped, it's forgotten that publication instances are freed properly. Additionally the useless "continue" statement in tipc_nametbl_stop() is removed as well. Reported-by: Jason <huzhijiang@gmail.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 13:31:26 -04:00
Ying Xue	eab8c04573	tipc: move the delivery of named messages out of nametbl lock Commit `a89778d8ba` ("tipc: add support for link state subscriptions") introduced below possible deadlock scenario: CPU0 CPU1 T0: tipc_publish() link_timeout() T1: tipc_nametbl_publish() [grab node lock]* T2: [grab nametbl write lock]* link_state_event() T3: named_cluster_distribute() link_activate() T4: [grab node lock]* tipc_node_link_up() T5: tipc_nametbl_publish() T6: [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the delivery of named messages via link out of name nametbl lock, the reverse order of holding locks will be eliminated, as a result, the deadlock will be killed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:49:54 -04:00
Erik Hugne	d7bb74c38c	tipc: fix out of bounds indexing Commit `78acb1f9b8` ("tipc: add ioctl to fetch link names") introduced a buffer overflow bug where specially crafted ioctl requests could cause out-of-bounds indexing of the node->links array. This was caused by an incorrect check vs MAX_BEARERS, and the static code checker complaint is: net/tipc/node.c:459 tipc_node_get_linkname() error: buffer overflow 'node->links' 2 <= 2 Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:43:35 -04:00
Ying Xue	22e7987ae7	tipc: fix a possible memory leak The commit `a8b9b96e95` ("tipc: fix race in disc create/delete") leads to the following static checker warning: net/tipc/discover.c:352 tipc_disc_create() warn: possible memory leak of 'req' The risk of memory leak really exists in practice. Especially when it's failed to allocate memory for "req->buf", tipc_disc_create() doesn't free its allocated memory, instead just directly returns with ENOMEM error code. In this situation, memory leak, of course, happens. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:08:06 -04:00
Erik Hugne	78acb1f9b8	tipc: add ioctl to fetch link names We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Erik Hugne	a89778d8ba	tipc: add support for link state subscriptions When links are established over a bearer plane, we create a node local publication containing information about the peer node and bearer plane. This allows TIPC applications to use the standard TIPC topology server subscription mechanism to get notifications when a link goes up or down. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Ying Xue	a8b9b96e95	tipc: fix race in disc create/delete Commit `a21a584d67` (tipc: fix neighbor detection problem after hw address change) introduces a race condition involving tipc_disc_delete() and tipc_disc_add/remove_dest that can cause TIPC to dereference the pointer to the bearer discovery request structure after it has been freed since a stray pointer is left in the bearer structure. In order to fix the issue, the process of resetting the discovery request handler is optimized: the discovery request handler and request buffer are just reset instead of being freed, allocated and initialized. As the request point is always valid and the request's lock is taken while the request handler is reset, the race doesn't happen any more. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	28dd94187a	tipc: use bc_lock to protect node map in bearer structure The node map variable - 'nodes' in bearer structure is only used by bclink. When bclink accesses it, bc_lock is held. But when change it, for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest() the bc_lock is not taken at all. To avoid any inconsistent data, we should always grab bc_lock while accessing node map variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	4ae88c94d3	tipc: use bearer_disable to disable bearer in tipc_l2_device_event As bearer pointer is known in tipc_l2_device_event(), it's unnecessary to search it again in tipc_disable_bearer(). If tipc_disable_bearer() is replaced with bearer_disable() in tipc_l2_device_event(), this will help us save a bit time when bearer is disabled. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f1c8d8cb82	tipc: make media_ptr pointed netdevice valid The 'media_ptr' pointer in bearer structure which points to network device, is protected by RCU. So, before netdevice is released, synchronize_net() should be involved to prevent no any user of the netdevice on read side from accessing it after it is freed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7216cd949c	tipc: purge tipc_net_lock lock Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. Until now above all changes have been done, so tipc_net_lock can be removed safely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	2231c5af45	tipc: use RCU to protect media_ptr pointer Now the media_ptr pointer is protected with tipc_net_lock write lock on write side; tipc_net_lock read lock is used to read side. As the part of effort of eliminating tipc_net_lock, we decide to adjust the locking policy of media_ptr pointer protection: on write side, RTNL lock is use while on read side RCU read lock is applied. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7a2f7d18e7	tipc: decouple the relationship between bearer and link Currently on both paths of message transmission and reception, the read lock of tipc_net_lock must be held before bearer is accessed, while the write lock of tipc_net_lock has to be taken before bearer is configured. Although it can ensure that bearer is always valid on the two data paths, link and bearer is closely bound together. So as the part of effort of removing tipc_net_lock, the locking policy of bearer protection will be adjusted as below: on the two data paths, RCU is used, and on the configuration path of bearer, RTNL lock is applied. Now RCU just covers the path of message reception. To make it possible to protect the path of message transmission with RCU, link should not use its stored bearer pointer to access bearer, but it should use the bearer identity of its attached bearer as index to get bearer instance from bearer_list array, which can help us decouple the relationship between bearer and link. As a result, bearer on the path of message transmission can be safely protected by RCU when we access bearer_list array within RCU lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f8322dfce5	tipc: convert bearer_list to RCU list Convert bearer_list to RCU list. It's protected by RTNL lock on update side, and RCU read lock is applied to read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	f97e455abf	tipc: use RTNL lock to protect tipc_net_stop routine As the tipc network initialization(ie, tipc_net_start routine) is under RTNL protection, its corresponding deinitialization part(ie, tipc_net_stop routine) should be protected by RTNL too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ca07fb07c9	tipc: adjust locking policy of protecting tipc_ptr pointer of net_device Currently the 'tipc_ptr' pointer is protected by tipc_net_lock write lock on write side, and RCU read lock is applied to read side. In addition, there have two paths on write side where we may change variables pointed by the 'tipc_ptr' pointer: one is to configure bearer by tipc-config tool and another one is that bearer status is changed by notification events of its attached interface. But on the latter path, we improperly deem that accessing 'tipc_ptr' pointer happens on read side with rcu_read_lock() although some variables pointed by the 'tipc_ptr' pointer are changed possibly. Moreover, as now the both paths are guarded by RTNL lock, it's better to adjust the locking policy of 'tipc_ptr' pointer protection, allowing RTNL instead of tipc_net_lock write lock to protect it on write side, which will help us purge tipc_net_lock in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ef13a262c3	tipc: replace config_mutex lock with RTNL lock There have two paths where we can configure or change bearer status: one is that bearer is configured from user space with tipc-config tool; another one is that bearer is changed by notification events from its attached interface. On the first path, one dedicated config_mutex lock is guarded; on the latter path, RTNL lock has been placed to serialize the process of dealing with interface events. So, if RTNL lock is also used to protect the first path, this will not only extremely help us simplify current locking policy, but also config_mutex lock can be deleted as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Geert Uytterhoeven	065d7e3956	tipc: Let tipc_release() return 0 net/tipc/socket.c: In function ‘tipc_release’: net/tipc/socket.c:352: warning: ‘res’ is used uninitialized in this function Introduced by commit `24be34b5a0` ("tipc: eliminate upcall function pointers between port and socket"), which removed the sole initializer of "res". Just return 0 to fix it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 15:08:17 -04:00
Erik Hugne	a5e7ac5ce1	tipc: fix regression bug where node events are not being generated Commit `5902385a24` ("tipc: obsolete the remote management feature") introduces a regression where node topology events are not being generated because the publication that triggers this: {0, <z.c.n>, <z.c.n>} is no longer available. This will break applications that rely on node events to discover when nodes join/leave a cluster. We fix this by advertising the node publication when TIPC enters networking mode, and withdraws it upon shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 16:03:57 -04:00
Erik Hugne	16470111ed	tipc: make discovery domain a bearer attribute The node discovery domain is assigned when a bearer is enabled. In the previous commit we reflect this attribute directly in the bearer structure since it's needed to reinitialize the node discovery mechanism after a hardware address change. There's no need to replicate this attribute anywhere else, so we remove it from the tipc_link_req structure. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 14:46:29 -04:00
Erik Hugne	a21a584d67	tipc: fix neighbor detection problem after hw address change If the hardware address of a underlying netdevice is changed, it is not enough to simply reset the bearer/links over this device. We also need to reflect this change in the TIPC bearer and node discovery structures aswell. This patch adds the necessary reinitialization of the node disovery mechanism following a hardware address change so that the correct originating media address is advertised in the discovery messages. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dong Liu <dliu.cn@gmail.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 14:46:29 -04:00
Ying Xue	dde2026608	tipc: use node list lock to protect tipc_num_links variable Without properly implicit or explicit read memory barrier, it's unsafe to read an atomic variable with atomic_read() from another thread which is different with the thread of changing the atomic variable with atomic_inc() or atomic_dec(). So a stale tipc_num_links may be got with atomic_read() in tipc_node_get_links(). If the tipc_num_links variable type is converted from atomic to unsigned integer and node list lock is used to protect it, the issue would be avoided. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:38 -04:00
Ying Xue	2220646a53	tipc: use node_list_lock to protect tipc_num_nodes variable As tipc_node_list is protected by rcu read lock on read side, it's unnecessary to hold node_list_lock to protect tipc_node_list in tipc_node_get_links(). Instead, node_list_lock should just protects tipc_num_nodes in the function. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	6c7a762e70	tipc: tipc: convert node list and node hlist to RCU lists Convert tipc_node_list list and node_htable hash list to RCU lists. On read side, the two lists are protected with RCU read lock, and on update side, node_list_lock is applied to them. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	46651c59c4	tipc: rename node create lock to protect node list and hlist When a node is created, tipc_net_lock read lock is first held and then node_create_lock is grabbed in order to prevent the same node from being created and inserted into both node list and hlist twice. But when we query node from the two node lists, we only hold tipc_net_lock read lock without grabbing node_create_lock. Obviously this locking policy is unable to guarantee that the two node lists are always synchronized especially when the operation of changing and accessing them occurs in different contexts like currently doing. Therefore, rename node_create_lock to node_list_lock to protect the two node lists, that is, whenever node is inserted into them or node is queried from them, the node_list_lock should be always held. As a result, tipc_net_lock read lock becomes redundant and then can be removed from the node query functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	987b58be37	tipc: make broadcast bearer store in bearer_list array Now unicast bearer is dynamically allocated and placed into its identity specified slot of bearer_list array. When we search bearer_list array with a bearer identity, the corresponding bearer instance can be found. But broadcast bearer is statically allocated and it is not located in the bearer_list array yet. So we decide to enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last slot stores the broadcast bearer so that the broadcast bearer can be found from bearer_list array with MAX_BEARERS as index. The change will help us reduce the complex relationship between bearer and link in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	f47de12b06	tipc: remove active flag from tipc_bearer structure After the allocation of tipc_bearer structure instance is converted from statical way to dynamical way, we identify whether a certain tipc_bearer structure pointer is valid by checking whether the pointer is NULL or not. So the active flag in tipc_bearer structure becomes redundant. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	3874ccbba8	tipc: convert tipc_bearers array to pointer list As part of the effort to introduce RCU protection for the bearer list, we first need to change it to a list of pointers. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	78dfb789b6	tipc: acquire necessary locks in named_cluster_distribute routine The 'tipc_node_list' is guarded by tipc_net_lock and 'links' array defined in 'tipc_node' structure is protected by node lock as well. Without acquiring the two locks in named_cluster_distribute() a fatal oops may happen in case that a destroyed link might be got and then accessed. Therefore, above mentioned two locks must be held in named_cluster_distribute() to prevent the issue from happening accidentally. As 'links' array in node struct must be protected by node lock, we have to move the code of selecting an active link from tipc_link_xmit() to named_cluster_distribute() and then call __tipc_link_xmit() with the selected link to deliver name messages. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
Ying Xue	5902385a24	tipc: obsolete the remote management feature Due to the lacking of any credential, it's allowed to accept commands requested from remote nodes to query the local node status, which is prone to involve potential security risks. Instead, if we login to a remote node with ssh command, this approach is not only more safe than the remote management feature, but also it can give us more permissions like changing the remote node configuration. So it's reasonable for us to obsolete the remote management feature now. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
Ying Xue	76d7882420	tipc: remove unnecessary checking for node object tipc_node_create routine doesn't need to check whether a node object specified with a node address exists or not because its caller(ie, tipc_disc_recv_msg routine) has checked this before calling it. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
David S. Miller	04f58c8854	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-25 20:29:20 -04:00
Erik Hugne	a5d0e7c037	tipc: fix spinlock recursion bug for failed subscriptions If a topology event subscription fails for any reason, such as out of memory, max number reached or because we received an invalid request the correct behavior is to terminate the subscribers connection to the topology server. This is currently broken and produces the following oops: [27.953662] tipc: Subscription rejected, illegal request [27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6 [27.957066] lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1 [27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5 [27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc] [27.961430] ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050 [27.962292] ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a [27.963152] ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520 [27.964023] Call Trace: [27.964292] [<ffffffff815c0207>] dump_stack+0x45/0x56 [27.964874] [<ffffffff815beec5>] spin_dump+0x8c/0x91 [27.965420] [<ffffffff815beeeb>] spin_bug+0x21/0x26 [27.965995] [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140 [27.966631] [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20 [27.967256] [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc] [27.968051] [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc] [27.968722] [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc] [27.969436] [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc] [27.970209] [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc] [27.970972] [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc] [27.971633] [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0 [27.972267] [<ffffffff8105e869>] worker_thread+0x119/0x3a0 [27.972896] [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0 [27.973622] [<ffffffff810648af>] kthread+0xdf/0x100 [27.974168] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 [27.974893] [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0 [27.975466] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 The recursion occurs when subscr_terminate tries to grab the subscriber lock, which is already taken by subscr_conn_msg_event. We fix this by checking if the request to establish a new subscription was successful, and if not we initiate termination of the subscriber after we have released the subscriber lock. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-24 15:36:56 -04:00
David S. Miller	85dcce7a73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/r8152.c drivers/net/xen-netback/netback.c Both the r8152 and netback conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:31:55 -04:00
Jon Paul Maloy	5c311421a2	tipc: eliminate redundant lookups in registry As an artefact from the native interface, the message sending functions in the port takes a port ref as first parameter, and then looks up in the registry to find the corresponding port pointer. This despite the fact that the only currently existing caller, tipc_sock, already knows this pointer. We change the signature of these functions to take a struct tipc_port* argument, and remove the redundant lookups. We also remove an unmotivated extra lookup in the function socket.c:auto_connect(), and, as the lookup functions tipc_port_deref() and ref_deref() now become unused, we remove these two functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	58ed944241	tipc: align usage of variable names and macros in socket The practice of naming variables in TIPC is inconistent, sometimes even within the same file. In this commit we align variable names and declarations within socket.c, and function and macro names within socket.h. We also reduce the number of conversion macros to two, in order to make usage less obsure. These changes are purely cosmetic. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	3b4f302d85	tipc: eliminate redundant locking The three functions tipc_portimportance(), tipc_portunreliable() and tipc_portunreturnable() and their corresponding tipc_set* functions, are all grabbing port_lock when accessing the targeted port. This is unnecessary in the current code, since these calls only are made from within socket downcalls, already protected by sock_lock. We remove the redundant locking. Also, since the functions now become trivial one-liners, we move them to port.h and make them inline. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	24be34b5a0	tipc: eliminate upcall function pointers between port and socket Due to the original one-to-many relation between port and user API layers, upcalls to the API have been performed via function pointers, installed in struct tipc_port at creation. Since this relation now always is one-to-one, we can instead use ordinary function calls. We remove the function pointers 'dispatcher' and ´wakeup' from struct tipc_port, and replace them with calls to the renamed functions tipc_sk_rcv() and tipc_sk_wakeup(). At the same time we change the name and signature of the functions tipc_createport() and tipc_deleteport() to reflect their new role as mere initialization/destruction functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	8826cde655	tipc: aggregate port structure into socket structure After the removal of the tipc native API the relation between a tipc_port and its API types is strictly one-to-one, i.e, the latter can now only be a socket API. There is therefore no need to allocate struct tipc_port and struct sock independently. In this commit, we aggregate struct tipc_port into struct tipc_sock, hence saving both CPU cycles and structure complexity. There are no functional changes in this commit, except for the elimination of the separate allocation/freeing of tipc_port. All other changes are just adaptatons to the new data structure. This commit also opens up for further code simplifications and code volume reduction, something we will do in later commits. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	f9fef18c6d	tipc: remove redundant 'peer_name' field in struct tipc_sock The field 'peer_name' in struct tipc_sock is redundant, since this information already is available from tipc_port, to which tipc_sock has a reference. We remove the field, and ensure that peer node and peer port info instead is fetched via the functions that already exist for this purpose. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	978813ee89	tipc: replace reference table rwlock with spinlock The lock for protecting the reference table is declared as an RWLOCK, although it is only used in write mode, never in read mode. We redefine it to become a spinlock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Joe Perches	184593c734	tipc: Convert uses of __constant_<foo> to <foo> The use of __constant_<foo> has been unnecessary for quite awhile now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:28:06 -04:00
Erik Hugne	2892505ea1	tipc: don't log disabled tasklet handler errors Failure to schedule a TIPC tasklet with tipc_k_signal because the tasklet handler is disabled is not an error. It means TIPC is currently in the process of shutting down. We remove the error logging in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:24 -05:00
Erik Hugne	1bb8dce57f	tipc: fix memory leak during module removal When the TIPC module is removed, the tasklet handler is disabled before all other subsystems. This will cause lingering publications in the name table because the node_down tasklets responsible to clean up publications from an unreachable node will never run. When the name table is shut down, these publications are detected and an error message is logged: tipc: nametbl_stop(): orphaned hash chain detected This is actually a memory leak, introduced with commit `993b858e37` ("tipc: correct the order of stopping services at rmmod") Instead of just logging an error and leaking memory, we free the orphaned entries during nametable shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:24 -05:00
Erik Hugne	edcc0511b5	tipc: drop subscriber connection id invalidation When a topology server subscriber is disconnected, the associated connection id is set to zero. A check vs zero is then done in the subscription timeout function to see if the subscriber have been shut down. This is unnecessary, because all subscription timers will be cancelled when a subscriber terminates. Setting the connection id to zero is actually harmful because id zero is the identity of the topology server listening socket, and can cause a race that leads to this socket being closed instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	fe8e464939	tipc: avoid to unnecessary process switch under non-block mode When messages are received via tipc socket under non-block mode, schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is, the process of receiving messages will be scheduled once although timeout value passed to schedule_timeout() is 0. The same issue exists in accept()/wait_for_accept(). To avoid this unnecessary process switch, we only call schedule_timeout() if the timeout value is non-zero. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	4652edb70e	tipc: fix connection refcount leak When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a connection instance, its reference count value is increased if it's found. But subsequently if it's found that the connection is closed, the work of sending message is not queued into its server send workqueue, and the connection reference count is not decreased. This will cause a reference count leak. To reproduce this problem, an application would need to open and closes topology server connections with high intensity. We fix this by immediately decrementing the connection reference count if a send fails due to the connection being closed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	6d4ebeb4df	tipc: allow connection shutdown callback to be invoked in advance Currently connection shutdown callback function is called when connection instance is released in tipc_conn_kref_release(), and receiving packets and sending packets are running in different threads. Even if connection is closed by the thread of receiving packets, its shutdown callback may not be called immediately as the connection reference count is non-zero at that moment. So, although the connection is shut down by the thread of receiving packets, the thread of sending packets doesn't know it. Before its shutdown callback is invoked to tell the sending thread its connection has been closed, the sending thread may deliver messages by tipc_conn_sendmsg(), this is why the following error information appears: "Sending subscription event failed, no memory" To eliminate it, allow connection shutdown callback function to be called before connection id is removed in tipc_close_conn(), which makes the sending thread know the truth in time that its socket is closed so that it doesn't send message to it. We also remove the "Sending XXX failed..." error reporting for topology and config services. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
David S. Miller	67ddc87f16	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-05 20:32:02 -05:00
Ying Xue	970122fdf4	tipc: make bearer set up in module insertion stage Accidentally a side effect is involved by commit 6e967adf7(tipc: relocate common functions from media to bearer). Now tipc stack handler of receiving packets from netdevices as well as netdevice notification handler are registered when bearer is enabled rather than tipc module initialization stage, but the two handlers are both unregistered in tipc module exit phase. If tipc module is inserted and then immediately removed, the following warning message will appear: "dev_remove_pack: ffffffffa0380940 not found" This is because in module insertion stage tipc stack packet handler is not registered at all, but in module exit phase dev_remove_pack() needs to remove it. Of course, dev_remove_pack() cannot find tipc protocol handler from the kernel protocol handler list so that the warning message is printed out. But if registering the two handlers is adjusted from enabling bearer phase into inserting module stage, the warning message will be eliminated. Due to this change, tipc_core_start_net() and tipc_core_stop_net() can be deleted as well. Reported-by: Wang Weidong <wangweidong1@huawei.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Ying Xue	9fe7ed4749	tipc: remove all enabled flags from all tipc components When tipc module is inserted, many tipc components are initialized one by one. During the initialization period, if one of them is failed, tipc_core_stop() will be called to stop all components whatever corresponding components are created or not. To avoid to release uncreated ones, relevant components have to add necessary enabled flags indicating whether they are created or not. But in the initialization stage, if one component is unsuccessfully created, we will just destroy successfully created components before the failed component instead of all components. All enabled flags defined in components, in turn, become redundant. Additionally it's also unnecessary to identify whether table.types is NULL in tipc_nametbl_stop() because name stable has been definitely created successfully when tipc_nametbl_stop() is called. Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Erik Hugne	63fa01c147	tipc: failed transmissions should return error When a message could not be sent out because the destination node or link could not be found, the full message size is returned from sendmsg() as if it had been sent successfully. An application will then get a false indication that it's making forward progress. This problem has existed since the initial commit in 2.6.16. We change this to return -ENETUNREACH if the message cannot be delivered due to the destination node/link being unavailable. We also get rid of the redundant tipc_reject_msg call since freeing the buffer and doing a tipc_port_iovec_reject accomplishes exactly the same thing. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-19 16:40:57 -05:00
David S. Miller	1e8d6421cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_3ad.h drivers/net/bonding/bond_main.c Two minor conflicts in bonding, both of which were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-19 01:24:22 -05:00
Ying Xue	247f0f3c31	tipc: align tipc function names with common naming practice in the network Rename the following functions, which are shorter and more in line with common naming practice in the network subsystem. tipc_bclink_send_msg->tipc_bclink_xmit tipc_bclink_recv_pkt->tipc_bclink_rcv tipc_disc_recv_msg->tipc_disc_rcv tipc_link_send_proto_msg->tipc_link_proto_xmit link_recv_proto_msg->tipc_link_proto_rcv link_send_sections_long->tipc_link_iovec_long_xmit tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast tipc_link_send_sync->tipc_link_sync_xmit tipc_link_recv_sync->tipc_link_sync_rcv tipc_link_send_buf->__tipc_link_xmit tipc_link_send->tipc_link_xmit tipc_link_send_names->tipc_link_names_xmit tipc_named_recv->tipc_named_rcv tipc_link_recv_bundle->tipc_link_bundle_rcv tipc_link_dup_send_queue->tipc_link_dup_queue_xmit link_send_long_buf->tipc_link_frag_xmit tipc_multicast->tipc_port_mcast_xmit tipc_port_recv_mcast->tipc_port_mcast_rcv tipc_port_reject_sections->tipc_port_iovec_reject tipc_port_recv_proto_msg->tipc_port_proto_rcv tipc_connect->tipc_port_connect __tipc_connect->__tipc_port_connect __tipc_disconnect->__tipc_port_disconnect tipc_disconnect->tipc_port_disconnect tipc_shutdown->tipc_port_shutdown tipc_port_recv_msg->tipc_port_rcv tipc_port_recv_sections->tipc_port_iovec_rcv release->tipc_release accept->tipc_accept bind->tipc_bind get_name->tipc_getname poll->tipc_poll send_msg->tipc_sendmsg send_packet->tipc_send_packet send_stream->tipc_send_stream recv_msg->tipc_recvmsg recv_stream->tipc_recv_stream connect->tipc_connect listen->tipc_listen shutdown->tipc_shutdown setsockopt->tipc_setsockopt getsockopt->tipc_getsockopt Above changes have no impact on current users of the functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 17:31:59 -05:00
Jon Paul Maloy	a11607f5a1	tipc: correct usage of spin_lock() vs spin_lock_bh() I commit `e099e86c9e` ("tipc: add node_lock protection to link lookup function") we are calling spin_lock(&node->lock) directly instead of indirectly via the tipc_node_lock(node) function. However, tipc_node_lock() is using spin_lock_bh(), not spin_lock(), something leading to unbalanced usage in one place, and a smatch warning. We fix this by consistently using tipc_node_lock()/unlock() in in the places touched by the mentioned commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:26:34 -05:00
Jon Paul Maloy	074bb43e9e	tipc: fix a loop style problem In commit `7d33939f47` ("tipc: delay delete of link when failover is needed") we introduced a loop for finding and removing a link pointer in an array. The removal is done after we have left the loop, giving the impression that one may remove the wrong pointer if no matching element is found. This is not really a bug, since we know that there will always be a matching element, but it looks wrong, and causes a smatch warning. We fix this loop with this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:26:26 -05:00
Jon Paul Maloy	e099e86c9e	tipc: add node_lock protection to link lookup function In an earlier commit, ("tipc: remove links list from bearer struct") we described three issues that need to be pre-emptively resolved before we can remove tipc_net_lock. Here we resolve issue a) described in that commit: "a) In access method #2, we access the link before taking the protecting node_lock. This will not work once net_lock is gone, so we will have to change the access order. We will deal with this in a later commit in this series." Here, we change that access order, by ensuring that the function link_find_link() returns only a safe reference for finding the link, i.e., a node pointer and an index into its 'links' array, not the link pointer itself. We also change all callers of this function to first take the node lock before they can check if there still is a valid link pointer at the returned index. Since the function now returns a node pointer rather than a link pointer, we rename it to the more appropriate 'tipc_link_find_owner(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Ying Xue	a83045292d	tipc: remove bearer_lock from tipc_bearer struct After the earlier commits ("tipc: remove 'links' list from tipc_bearer struct") and ("tipc: introduce new spinlock to protect struct link_req"), there is no longer any need to protect struct link_req or or any link list by use of bearer_lock. Furthermore, we have eliminated the need for using bearer_lock during downcalls (send) from the link to the bearer, since we have ensured that bearers always have a longer life cycle that their associated links, and always contain valid data. So, the only need now for a lock protecting bearers is for guaranteeing consistency of the bearer list itself. For this, it is sufficient, at least for the time being, to continue applying 'net_lock´ in write mode. By removing bearer_lock we also pre-empt introduction of issue b) descibed in the previous commit "tipc: remove 'links' list from tipc_bearer struct": "b) When the outer protection from net_lock is gone, taking bearer_lock and node_lock in opposite order of method 1) and 2) will become an obvious deadlock hazard". Therefore, we now eliminate the bearer_lock spinlock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Jon Paul Maloy	7d33939f47	tipc: delay delete of link when failover is needed When a bearer is disabled, all its attached links are deleted. Ideally, we should do link failover to redundant links on other bearers, if there are any, in such cases. This would be consistent with current behavior when a link is reset, but not deleted. However, due to the complexity involved, and the (wrongly) perceived low demand for this feature, it was never implemented until now. We mark the doomed link for deletion with a new flag, but wait until the failover process is finished before we actually delete it. With the improved link tunnelling/failover code introduced earlier in this commit series, it is now easy to identify a spot in the code where the failover is finished and it is safe to delete the marked link. Moreover, the test for the flag and the deletion can be done synchronously, and outside the most time critical data path. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Jon Paul Maloy	a5377831eb	tipc: changes to general packet reception algorithm We change the order of checking for destination users when processing incoming packets. By placing the checks for users that may potentially replace the processed buffer, i.e., CHANGEOVER_PROTOCOL and MSG_FRAGMENTER, in a separate step before we check for the true end users, we get rid of a label and a 'goto', at the same time making the code more comprehensible and easy to follow. This commit does not change any functionality, it is just a cosmetic code reshuffle. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	02842f718d	tipc: rename stack variables in function tipc_link_tunnel_rcv After the previous redesign of the tunnel reception algorithm and functions, we finalize it by renaming a couple of stack variables in tipc_tunnel_rcv(). This makes it more consistent with the naming scheme elsewhere in this part of the code. This change is purely cosmetic, with no functional changes. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	1e9d47a948	tipc: more cleanup of tunnelling reception function We simplify and slim down the code in function tipc_tunnel_rcv() No impact on the users of this function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	3bb533800c	tipc: change signature of tunnelling reception function After the earlier commits in this series related to the function tipc_link_tunnel_rcv(), we can now go further and simplify its signature. The function now consumes all DUPLICATE packets, and only returns such ORIGINAL packets that are ready for immediate delivery, i.e., no more link level protocol processing needs to be done by the caller. As a consequence, the the caller, tipc_rcv(), does not access the link pointer after call return, and it becomes unnecessary to pass a link pointer reference in the call. Instead, we now only pass it the tunnel link's owner node, which is sufficient to find the destination link for the tunnelled packet. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	f006c9c70f	tipc: change reception of tunnelled failover packets When a link is reset, and there is a redundant link available, all sender sockets will steer their subsequent traffic through the remaining link. In order to guarantee preserved packet order and cardinality during the transition, we tunnel the failing link's send queue through the remaining link before we allow any sockets to use it. In this commit, we change the algorithm for receiving failover ("ORIGINAL_MSG") packets in tipc_link_tunnel_rcv(), at the same time delegating it to a new subfuncton, tipc_link_failover_rcv(). Instead of directly returning an extracted inner packet to the packet reception loop in tipc_rcv(), we first check if it is a message fragment, in which case we append it to the reset link's fragment chain. If the fragment chain is complete, we return the whole chain instead of the individual buffer, eliminating any need for the tipc_rcv() loop to do reassembly of tunneled packets. This change makes it possible to further simplify tipc_link_tunnel_rcv(), as well as the calling tipc_rcv() loop. We will do that in later commits. It also makes it possible to identify a single spot in the code where we can tell that a failover procedure is finished, something that is useful when we are deleting links after a failover. This will also be done in a later commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	1dab3d5ac2	tipc: change reception of tunnelled duplicate packets When a second link to a destination comes up, some sender sockets will steer their subsequent traffic through the new link. In order to guarantee preserved packet order and cardinality for those sockets, we tunnel a duplicate of the old link's send queue through the new link before we open it for regular traffic. The last arriving packet copy, on whichever link, will be dropped at the receiving end based on the original sequence number, to ensure that only one copy is delivered to the end receiver. In this commit, we change the algorithm for receiving DUPLICATE_MSG packets, at the same time delegating it to a new subfunction, tipc_link_dup_rcv(). Instead of returning an extracted inner packet to the packet reception loop in tipc_rcv(), we just add it to the receiving (new) link's deferred packet queue. The packet will then be processed by that link when it receives its first non-tunneled packet, i.e., at latest when the changeover procedure is finished. Because tipc_link_tunnel_rcv()/tipc_link_dup_rcv() now is consuming all packets of type DUPLICATE_MSG, the calling tipc_rcv() function can omit testing for this. This in turn means that the current conditional jump to the label 'protocol_check' becomes redundant, and we can remove that label. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Ying Xue	c61dd61dec	tipc: remove 'links' list from tipc_bearer struct In our ongoing effort to simplify the TIPC locking structure, we see a need to remove the linked list for tipc_links in the bearer. This can be explained as follows. Currently, we have three different ways to access a link, via three different lists/tables: 1: Via a node hash table: Used by the time-critical outgoing/incoming data paths. (e.g. link_send_sections_fast() and tipc_recv_msg() ): grab net_lock(read) find node from node hash table grab node_lock select link grab bearer_lock send_msg() release bearer_lock release node lock release net_lock 2: Via a global linked list for nodes: Used by configuration commands (link_cmd_set_value()) grab net_lock(read) find node and link from global node list (using link name) grab node_lock update link release node lock release net_lock (Same locking order as above. No problem.) 3: Via the bearer's linked link list: Used by notifications from interface (e.g. tipc_disable_bearer() ) grab net_lock(write) grab bearer_lock get link ptr from bearer's link list get node from link grab node_lock delete link release node lock release bearer_lock release net_lock (Different order from above, but works because we grab the outer net_lock in write mode first, excluding all other access.) The first major goal in our simplification effort is to get rid of the "big" net_lock, replacing it with rcu-locks when accessing the node list and node hash array. This will come in a later patch series. But to get there we first need to rewrite access methods ##2 and 3, since removal of net_lock would introduce three major problems: a) In access method #2, we access the link before taking the protecting node_lock. This will not work once net_lock is gone, so we will have to change the access order. We will deal with this in a later commit in this series, "tipc: add node lock protection to link found by link_find_link()". b) When the outer protection from net_lock is gone, taking bearer_lock and node_lock in opposite order of method 1) and 2) will become an obvious deadlock hazard. This is fixed in the commit ("tipc: remove bearer_lock from tipc_bearer struct") later in this series. c) Similar to what is described in problem a), access method #3 starts with using a link pointer that is unprotected by node_lock, in order to via that pointer find the correct node struct and lock it. Before we remove net_lock, this access order must be altered. This is what we do with this commit. We can avoid introducing problem problem c) by even here using the global node list to find the node, before accessing its links. When we loop though the node list we use the own bearer identity as search criteria, thus easily finding the links that are associated to the resetting/disabling bearer. It should be noted that although this method is somewhat slower than the current list traversal, it is in no way time critical. This is only about resetting or deleting links, something that must be considered relatively infrequent events. As a bonus, we can get rid of the mutual pointers between links and bearers. After this commit, pointer dependency go in one direction only: from the link to the bearer. This commit pre-empts introduction of problem c) as described above. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Ying Xue	135daee6d3	tipc: redefine 'started' flag in struct link to bitmap Currently, the 'started' field in struct tipc_link represents only a binary state, 'started' or 'not started'. We need it to represent more link execution states in the coming commits in this series. Hence, we rename the field to 'flags', and define the current started/non-started state to be represented by the LSB bit of that field. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Ying Xue	8d8439b686	tipc: move code for deleting links from bearer.c to link.c We break out the code for deleting attached links in the function bearer_disable(), and define a new function named tipc_link_delete_list() to do this job. This commit incurs no functional changes, but makes the code of function bearer_disable() cleaner. It is also a preparation for a more important change to the bearer code, in a subsequent commit in this series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00

1 2 3 4 5 ...

894 Commits