OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Hariprasad Shenai	fc5ab02096	cxgb4: Replaced the backdoor mechanism to access the HW memory with PCIe Window method Rip out a bunch of redundant PCI-E Memory Window Read/Write routines, collapse the more general purpose routines into a single routine thereby eliminating the need for a large stack frame (and extra data copying) in the outer routine, change everything to use the improved routine t4_memory_rw. Based on origninal work by Casey Leedom <leedom@chelsio.com> and Steve Wise <swise@opengridcomputing.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	0abfd1524b	cxgb4: Use FW interface to get BAR0 value Use the firmware interface to get the BAR0 value since we really don't want to use the PCI-E Configuration Space Backdoor access which is owned by the firmware. Set up PCI-E Memory Window registers using the true values programmed into BAR registers. When the PF4 "Master Function" is exported to a Virtual Machine, the values returned by pci_resource_start() will be for the synthetic PCI-E Configuration Space and not the real addresses. But we need to program the PCI-E Memory Window address decoders with the real addresses that we're going to be using in order to have accesses through the Memory Windows work. Based on origninal work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	35b1de5579	rdma/cxgb4: Fixes cxgb4 probe failure in VM when PF is exposed through PCI Passthrough Change logic which determines our Physical Function at PCI Probe time. Now we read the PL_WHOAMI register and get the Physical Function. Pass Physical Function to Upper Layer Drivers in lld_info structure in the new field "pf" added to lld_info. This is useful for the cases where the PF, say PF4, is attached to a Virtual Machine via some form of "PCI Pass Through" technology and the PCI Function shows up as PF0 in the VM. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
David S. Miller	2eb27a16b5	Merge branch 'dp83640-next' Stefan Sørensen says: ==================== dp83640: Increase support perout pins This patch series increases the number of periodic output pins supported on the dp83640 to 7, and allows for reprogramming the calibration pin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:53:01 -07:00
Stefan Sørensen	72df7a7244	ptp: Allow reassigning calibration pin function The ptp pin function programming does not allow calibration pin to change function. This is problematic on hardware that uses the default calibration pin for other purposes. Removing this limitation does not impact calibration if userspace does not reprogram the calibration pin. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:54 -07:00
Stefan Sørensen	e0155950f0	dp83640: Get calibration pin with ptp_find_pin For consistency, use the ptp_find_pin function to get the calibration pin, not gpio_tab. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:54 -07:00
Stefan Sørensen	6f39eb87de	dp83640: Verify calibration pin assignment This constraints the pin assignment to not allow the calibration function to be reassigned and only allow reassigning the calibratin pin if only one phy is connected. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
Stefan Sørensen	ad01577aeb	dp83640: Increase supported perout pins to 7 This patch increases the number of supported periodic output pins from 1 to 7. The last pin is reserved for sync. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
Stefan Sørensen	35e872ae63	dp83640: Program pulsewidth2 values of perout triggers 0 and 1 Periodic output triggers 0 and 1 of the dp83640 has a programmable duty-cycle which is controlled by the Pulsewidth2 field of the trigger data register. This field is not documented in the datasheet, but it is described in the "PHYTER Software Development Guide" section 3.1.4.1. Failing to set the field can result in unstable/no trigger output. Add programming of the Pulsewidth2 field, setting it to the same value as the Pulsewidth field for a 50% duty cycle. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
David S. Miller	b6fd8b7fa1	Merge branch 'bnx2x-next' Yuval Mintz says: ==================== bnx2x: Enhancement patch series This patch series introduces the ability to propagate link parameters to VFs as well as control the VF link via hypervisor. In addition, it contains 2 small improvements [one IOV-related and the other improves performance on machines with short cache lines]. Please consider applying these patches to `net-next'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:37 -07:00
Yuval Mintz	ebf457f931	bnx2x: Fail probe of VFs using an old incompatible driver There are linux distributions where the inbox bnx2x driver contains SRIOV support but doesn't contain the changes introduced in `b9871bcf` "bnx2x: VF RSS support - PF side". A VF in a VM running that distribution over a new hypervisor will access incorrect addresses when trying to transmit packets, causing an attention in the hypervisor and making that VF inactive until FLRed. The driver in the VM has to ne upgraded [no real way to overcome this], but due to the HW attention currently arising upgrading the driver in the VM would not suffice [since the VF needs also be FLRed if the previous driver was already loaded]. This patch causes the PF to fail the acquire message from a VF running an old problematic driver; The VF will then gracefully fail it's probe preventing the HW attention [and allow clean upgrade of driver in VM]. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:30 -07:00
Dmitry Kravkov	9927b51469	bnx2x: enlarge minimal alignemnt of data offset This improves the performance of driver on machine with L1_CACHE_SHIFT of at most 32 bytes [HW was planned for 64-byte aligned fastpath data]. Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:29 -07:00
Dmitry Kravkov	6495d15a7c	bnx2x: VF can report link speed Until now VFs were oblvious to the actual configured link parameters. This patch does 2 things: 1. It enables a PF to inform its VF using the bulletin board of the link configured, and allows the VF to present that information. 2. It adds support of `ndo_set_vf_link_state', allowing the hypervisor to set the VF link state. Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:29 -07:00
David S. Miller	edd79ca8b3	Merge branch 'pktgen' Jesper Dangaard Brouer says: ==================== Optimizing pktgen for single CPU performance This series focus on optimizing "pktgen" for single CPU performance. V2-series: - Removed some patches - Doc real reason for TX ring buffer filling up NIC tuning for pktgen: http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html General overload setup according to: http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html Hardware: System: CPU E5-2630 NIC: Intel ixgbe/82599 chip Testing done with net-next git tree on top of commit `6623b41944` ("Merge branch 'master' of...jkirsher/net-next") Pktgen script exercising race condition: https://github.com/netoptimizer/network-testing/blob/master/pktgen/unit_test01_race_add_rem_device_loop.sh Tool for measuring LOCK overhead: https://github.com/netoptimizer/network-testing/blob/master/src/overhead_cmpxchg.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:56 -07:00
Jesper Dangaard Brouer	8788370a1d	pktgen: RCU-ify "if_list" to remove lock in next_to_run() The if_lock()/if_unlock() in next_to_run() adds a significant overhead, because its called for every packet in busy loop of pktgen_thread_worker(). (Thomas Graf originally pointed me at this lock problem). Removing these two "LOCK" operations should in theory save us approx 16ns (8ns x 2), as illustrated below we do save 16ns when removing the locks and introducing RCU protection. Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30: (single CPU performance, ixgbe 10Gbit/s, E5-2630) * Prev : 5684009 pps --> 175.93ns (1/568400910^9) RCU-fix: 6272204 pps --> 159.43ns (1/627220410^9) Diff : +588195 pps --> -16.50ns To understand this RCU patch, I describe the pktgen thread model below. In pktgen there is several kernel threads, but there is only one CPU running each kernel thread. Communication with the kernel threads are done through some thread control flags. This allow the thread to change data structures at a know synchronization point, see main thread func pktgen_thread_worker(). Userspace changes are communicated through proc-file writes. There are three types of changes, general control changes "pgctrl" (func:pgctrl_write), thread changes "kpktgend_X" (func:pktgen_thread_write), and interface config changes "etcX@N" (func:pktgen_if_write). Userspace "pgctrl" and "thread" changes are synchronized via the mutex pktgen_thread_lock, thus only a single userspace instance can run. The mutex is taken while the packet generator is running, by pgctrl "start". Thus e.g. "add_device" cannot be invoked when pktgen is running/started. All "pgctrl" and all "thread" changes, except thread "add_device", communicate via the thread control flags. The main problem is the exception "add_device", that modifies threads "if_list" directly. Fortunately "add_device" cannot be invoked while pktgen is running. But there exists a race between "rem_device_all" and "add_device" (which normally don't occur, because "rem_device_all" waits 125ms before returning). Background'ing "rem_device_all" and running "add_device" immediately allow the race to occur. The race affects the threads (list of devices) "if_list". The if_lock is used for protecting this "if_list". Other readers are given lock-free access to the list under RCU read sections. Note, interface config changes (via proc) can occur while pktgen is running, which worries me a bit. I'm assuming proc_remove() takes appropriate locks, to assure no writers exists after proc_remove() finish. I've been running a script exercising the race condition (leading me to fix the proc_remove order), without any issues. The script also exercises concurrent proc writes, while the interface config is getting removed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
Jesper Dangaard Brouer	baac167b70	pktgen: avoid expensive set_current_state() call in loop Avoid calling set_current_state() inside the busy-loop in pktgen_thread_worker(). In case of pkt_dev->delay, then it is still used/enabled in pktgen_xmit() via the spin() call. The set_current_state(TASK_INTERRUPTIBLE) uses a xchg, which implicit is LOCK prefixed. I've measured the asm LOCK operation to take approx 8ns on this E5-2630 CPU. Performance increase corrolate with this measurement. Performance data with CLONE_SKB==100000, rx-usecs=30: (single CPU performance, ixgbe 10Gbit/s, E5-2630) * Prev: 5454050 pps --> 183.35ns (1/545405010^9) Now: 5684009 pps --> 175.93ns (1/568400910^9) Diff: +229959 pps --> -7.42ns Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
Jesper Dangaard Brouer	9ceb87fcea	pktgen: document tuning for max NIC performance Using pktgen I'm seeing the ixgbe driver "push-back", due TX ring running full. Thus, the TX ring is artificially limiting pktgen. (Diagnose via "ethtool -S", look for "tx_restart_queue" or "tx_busy" counters.) Using ixgbe, the real reason behind the TX ring running full, is due to TX ring not being cleaned up fast enough. The ixgbe driver combines TX+RX ring cleanups, and the cleanup interval is affected by the ethtool --coalesce setting of parameter "rx-usecs". Do not increase the default NIC TX ring buffer or default cleanup interval. Instead simply document that pktgen needs special NIC tuning for maximum packet per sec performance. Performance results with pktgen with clone_skb=100000. TX ring size 512 (default), adjusting "rx-usecs": (Single CPU performance, E5-2630, ixgbe) - 3935002 pps - rx-usecs: 1 (irqs: 9346) - 5132350 pps - rx-usecs: 10 (irqs: 99157) - 5375111 pps - rx-usecs: 20 (irqs: 50154) - 5454050 pps - rx-usecs: 30 (irqs: 33872) - 5496320 pps - rx-usecs: 40 (irqs: 26197) - 5502510 pps - rx-usecs: 50 (irqs: 21527) TX ring size adjusting (ethtool -G), "rx-usecs==1" (default): - 3935002 pps - tx-size: 512 - 5354401 pps - tx-size: 768 - 5356847 pps - tx-size: 1024 - 5327595 pps - tx-size: 1536 - 5356779 pps - tx-size: 2048 - 5353438 pps - tx-size: 4096 Notice after commit `6f25cd47d` (pktgen: fix xmit test for BQL enabled devices) pktgen uses netif_xmit_frozen_or_drv_stopped() and ignores the BQL "stack" pause (QUEUE_STATE_STACK_XOFF) flag. This allow us to put more pressure on the TX ring buffers. It is the ixgbe_maybe_stop_tx() call that stops the transmits, and pktgen respecting this in the call to netif_xmit_frozen_or_drv_stopped(txq). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
Jiri Pirko	5b9e7e1607	openvswitch: introduce rtnl ops stub This stub now allows userspace to see IFLA_INFO_KIND for ovs master and IFLA_INFO_SLAVE_KIND for slave. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:40:17 -07:00
Jiri Pirko	b0ab2fabb5	rtnetlink: allow to register ops without ops->setup set So far, it is assumed that ops->setup is filled up. But there might be case that ops might make sense even without ->setup. In that case, forbid to newlink and dellink. This allows to register simple rtnl link ops containing only ->kind. That allows consistent way of passing device kind (either device-kind or slave-kind) to userspace. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:40:17 -07:00
Ying Xue	9bf2b8c280	net: fix some typos in comment In commit 371121057607e3127e19b3fa094330181b5b031e("net: QDISC_STATE_RUNNING dont need atomic bit ops") the __QDISC_STATE_RUNNING is renamed to __QDISC___STATE_RUNNING, but the old names existing in comment are not replaced with the new name completely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:20:32 -07:00
Ben Greear	d933319657	ipv6: Allow accepting RA from local IP addresses. This can be used in virtual networking applications, and may have other uses as well. The option is disabled by default. A specific use case is setting up virtual routers, bridges, and hosts on a single OS without the use of network namespaces or virtual machines. With proper use of ip rules, routing tables, veth interface pairs and/or other virtual interfaces, and applications that can bind to interfaces and/or IP addresses, it is possibly to create one or more virtual routers with multiple hosts attached. The host interfaces can act as IPv6 systems, with radvd running on the ports in the virtual routers. With the option provided in this patch enabled, those hosts can now properly obtain IPv6 addresses from the radvd. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 12:16:24 -07:00
Ben Greear	f2a762d8a9	ipv6: Add more debugging around accept-ra logic. This is disabled by default, just like similar debug info already in this module. But, makes it easier to find out why RA is not being accepted when debugging strange behaviour. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 12:16:24 -07:00
Octavian Purdila	4135ab8208	tcp: tcp_conn_request: fix build error when IPv6 is disabled Fixes build error introduced by commit `1fb6f159fd` (tcp: add tcp_conn_request): net/ipv4/tcp_input.c: In function 'pr_drop_req': net/ipv4/tcp_input.c:5889:130: error: 'struct sock_common' has no member named 'skc_v6_daddr' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-29 23:46:38 -07:00
David S. Miller	9e1a21b693	Merge branch 'tcp_conn_request_unification' Octavian Purdila says: ==================== tcp: remove code duplication in tcp_v[46]_conn_request This patch series unifies the TCPv4 and TCPv6 connection request flow in a single new function (tcp_conn_request). The first 3 patches are small cleanups and fixes found during the code merge process. The next patches add new methods in tcp_request_sock_ops to abstract the IPv4/IPv6 operations and keep the TCP connection request flow common. To identify potential performance issues this patch has been tested by measuring the connection per second rate with nginx and a httperf like client (to allow for concurrent connection requests - 256 CC were used during testing) using the loopback interface. A dual-core i5 Ivy Bridge processor was used and each process was bounded to a different core to make results consistent. Results for IPv4, unit is connections per second, higher is better, 20 measurements have been collected: before after min 27917 27962 max 28262 28366 avg 28094.1 28212.75 stdev 87.35 97.26 Results for IPv6, unit is connections per second, higher is better, 20 measurements have been collected: before after min 24813 24877 max 25029 25119 avg 24935.5 25017 stdev 64.13 62.93 Changes since v1: * add benchmarking datapoints * fix a few issues in the last patch (IPv6 related) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:54 -07:00
Octavian Purdila	1fb6f159fd	tcp: add tcp_conn_request Create tcp_conn_request and remove most of the code from tcp_v4_conn_request and tcp_v6_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:37 -07:00
Octavian Purdila	695da14eb0	tcp: add queue_add_hash to tcp_request_sock_ops Add queue_add_hash member to tcp_request_sock_ops so that we can later unify tcp_v4_conn_request and tcp_v6_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	2aec4a297b	tcp: add mss_clamp to tcp_request_sock_ops Add mss_clamp member to tcp_request_sock_ops so that we can later unify tcp_v4_conn_request and tcp_v6_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	5db92c9949	tcp: unify tcp_v4_rtx_synack and tcp_v6_rtx_synack Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	d6274bd8d6	tcp: add send_synack method to tcp_request_sock_ops Create a new tcp_request_sock_ops method to unify the IPv4/IPv6 signature for tcp_v[46]_send_synack. This allows us to later unify tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with tcp_v4_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	936b8bdb53	tcp: add init_seq method to tcp_request_sock_ops More work in preparation of unifying tcp_v4_conn_request and tcp_v6_conn_request: indirect the init sequence calls via the tcp_request_sock_ops. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	9403715977	tcp: move around a few calls in tcp_v6_conn_request Make the tcp_v6_conn_request calls flow similar with that of tcp_v4_conn_request. Note that want_cookie can be true only if isn is zero and that is why we can move the if (want_cookie) block out of the if (!isn) block. Moving security_inet_conn_request() has a couple of side effects: missing inet_rsk(req)->ecn_ok update and the req->cookie_ts update. However, neither SELinux nor Smack security hooks seems to check them. This change should also avoid future different behaviour for IPv4 and IPv6 in the security hooks. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	d94e0417ad	tcp: add route_req method to tcp_request_sock_ops Create wrappers with same signature for the IPv4/IPv6 request routing calls and use these wrappers (via route_req method from tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request with the purpose of unifying the two functions in a later patch. We can later drop the wrapper functions and modify inet_csk_route_req and inet6_cks_route_req to use the same signature. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:36 -07:00
Octavian Purdila	fb7b37a7f3	tcp: add init_cookie_seq method to tcp_request_sock_ops Move the specific IPv4/IPv6 cookie sequence initialization to a new method in tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request and tcp_v6_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:35 -07:00
Octavian Purdila	16bea70aa7	tcp: add init_req method to tcp_request_sock_ops Move the specific IPv4/IPv6 intializations to a new method in tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request and tcp_v6_conn_request. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:35 -07:00
Octavian Purdila	476eab8251	net: remove inet6_reqsk_alloc Since pktops is only used for IPv6 only and opts is used for IPv4 only, we can move these fields into a union and this allows us to drop the inet6_reqsk_alloc function as after this change it becomes equivalent with inet_reqsk_alloc. This patch also fixes a kmemcheck issue in the IPv6 stack: the flags field was not annotated after a request_sock was allocated. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:35 -07:00
Octavian Purdila	aa27fc5018	tcp: tcp_v[46]_conn_request: fix snt_synack initialization Commit `016818d07` (tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS) changes the code to only take a snt_synack timestamp when a SYNACK transmit or retransmit succeeds. This behaviour is later broken by commit `843f4a55e` (tcp: use tcp_v4_send_synack on first SYN-ACK), as snt_synack is now updated even if tcp_v4_send_synack fails. Also, commit `3a19ce0ee` (tcp: IPv6 support for fastopen server) misses the required IPv6 updates for `016818d07`. This patch makes sure that snt_synack is updated only when the SYNACK trasnmit/retransmit succeeds, for both IPv4 and IPv6. Cc: Cardwell <ncardwell@google.com> Cc: Daniel Lee <longinus00@gmail.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:35 -07:00
Octavian Purdila	57b47553f6	tcp: cookie_v4_init_sequence: skb should be const Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 15:53:35 -07:00
David S. Miller	c1c27fb9b3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-06-26 This series contains updates to i40e and i40evf. Kamil provides a cleanup patch to i40e where we do not need to acquire the NVM for shadow RAM checksum calculation, since we only read the shadow RAM through SRCTL register. Paul provides a fix for handling HMC for big endian architectures for i40e and i40evf. Mitch provides four cleanup and fixes for i40evf. Fix an issue where if the VF driver fails to complete early init, then rmmod can cause a softlock when the driver tries to stop a watchdog timer that never got initialized. So add a check to see if the timer is actually initialized before stopping it. Make the function i40evf_send_api_ver() return more useful information, instead of just returning -EIO by propagating firmware errors back to the caller and log a message if the PF sends an invalid reply. Fix up a log message that was missing a word, which makes the log message more readable. Fix an initialization failure if many VFs are instantiated at the same time and the VF module is autoloaded by simply resending firmware request if there is no response the first time. Jacob does a rename of the function i40e_ptp_enable() to i40e_ptp_feature_enable(), like he did for ixgbe, to reduce possible confusion and ambugity in the purpose of the function. Does follow on PTP work on i40e, like he did for ixgbe, by breaking the PTP hardware control from the ioctl command for timestamping mode. By doing this, we can maintain state about the 1588 timestamping mode and properly re-enable to the last known mode during a re-initialization of 1588 bits. Anjali cleans up the i40e driver where TCP-IPv4 filters were being added twice, which seems to be left over from when we had to add two PTYPEs for one filter. Fixes the flow director sideband logic to detect when there is a full flow director table. Also fixes the programming of FDIR where a couple of fields in the descriptor setup that were not being programmed, which left the opportunity for stale data to be pushed as part of the descriptor next time it was used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:59:38 -07:00
David S. Miller	0ff9275a4b	Merge branch 'tipc-next' Jon Maloy says: ==================== tipc: new unicast transmission code As a step towards making the data transmission code more maintainable and performant, we introduce a number of new functions, both for building, sending and rejecting messages. The new functions will eventually be used for alla data transmission, user data unicast, service internal messaging, and multicast/broadcast. We start with this series, where we introduce the functions, and let user data unicast and the internal connection protocol use them. The remaining users will come in a later series. There are only minor changes to data structures, and no protocol changes, so the older functions can still be used in parallel for some time. Until the old functions are removed, we use temporary names for the new functions, such as tipc_build_msg2, tipc_link_xmit2. It should be noted that the first two commits are unrelated to the rest of the series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:51:00 -07:00
Jon Paul Maloy	60120526c2	tipc: simplify connection congestion handling As a consequence of the recently introduced serialized access to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa ("tipc: same receive code path for connection protocol and data messages") we can make a number of simplifications in the detection and handling of connection congestion situations. - We don't need to keep two counters, one for sent messages and one for acked messages. There is no longer any risk for races between acknowledge messages arriving in BH and data message sending running in user context. So we merge this into one counter, 'sent_unacked', which is incremented at sending and subtracted from at acknowledge reception. - We don't need to set the 'congested' field in tipc_port to true before we sent the message, and clear it when sending is successful. (As a matter of fact, it was never necessary; the field was set in link_schedule_port() before any wakeup could arrive anyway.) - We keep the conditions for link congestion and connection connection congestion separated. There would otherwise be a risk that an arriving acknowledge message may wake up a user sleeping because of link congestion. - We can simplify reception of acknowledge messages. We also make some cosmetic/structural changes: - We rename the 'congested' field to the more correct 'link_cong´. - We rename 'conn_unacked' to 'rcv_unacked' - We move the above mentioned fields from struct tipc_port to struct tipc_sock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ac0074ee70	tipc: clean up connection protocol reception function We simplify the code for receiving connection probes, leveraging the recently introduced tipc_msg_reverse() function. We also stick to the principle of sending a possible response message directly from the calling (tipc_sk_rcv or backlog_rcv) functions, hence making the call chain shallower and easier to follow. We make one small protocol change here, allowed according to the spec. If a protocol message arrives from a remote socket that is not the one we are connected to, we are currently generating a connection abort message and send it to the source. This behavior is unnecessary, and might even be a security risk, so instead we now choose to only ignore the message. The consequnce for the sender is that he will need longer time to discover his mistake (until the next timeout), but this is an extreme corner case, and may happen anyway under other circumstances, so we deem this change acceptable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ec8a2e5621	tipc: same receive code path for connection protocol and data messages As a preparation to eliminate port_lock we need to bring reception of connection protocol messages under proper protection of bh_lock_sock or socket owner. We fix this by letting those messages follow the same code path as incoming data messages. As a side effect of this change, the last reference to the function net_route_msg() disappears, and we can eliminate that function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	b786e2b0fa	tipc: let port protocol senders use new link send function Several functions in port.c, related to the port protocol and connection shutdown, need to send messages. We now convert them to use the new link send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4ccfe5e041	tipc: connection oriented transport uses new send functions We move the message sending across established connections to use the message preparation and send functions introduced earlier in this series. We now do the message preparation and call to the link send function directly from the socket, instead of going via the port layer. As a consequence of this change, the functions tipc_send(), tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg() become unreferenced and can be eliminated from port.c. For the same reason, the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long() and tipc_link_iovec_fast() can be eliminated from link.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	e2dafe87d3	tipc: RDM/DGRAM transport uses new fragmenting and sending functions We merge the code for sending port name and port identity addressed messages into the corresponding send functions in socket.c, and start using the new fragmenting and transmit functions we just have introduced. This saves a call level and quite a few code lines, as well as making this part of the code easier to follow. As a consequence, the functions tipc_send2name() and tipc_send2port() in port.c can be removed. For practical reasons, we break out the code for sending multicast messages from tipc_sendmsg() and move it into a separate function, tipc_sendmcast(), but we do not yet convert it into using the new build/send functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	5a379074a7	tipc: introduce message evaluation function When a message arrives in a node and finds no destination socket, we may need to drop it, reject it, or forward it after a secondary destination lookup. The latter two cases currently results in a code path that is perceived as complex, because it follows a deep call chain via obscure functions such as net_route_named_msg() and net_route_msg(). We now introduce a function, tipc_msg_eval(), that takes the decision about whether such a message should be rejected or forwarded, but leaves it to the caller to actually perform the indicated action. If the decision is 'reject', it is still the task of the recently introduced function tipc_msg_reverse() to take the final decision about whether the message is rejectable or not. In the latter case it drops the message. As a result of this change, we can finally eliminate the function net_route_named_msg(), and hence become independent of net_route_msg(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	8db1bae30b	tipc: separate building and sending of rejected messages The way we build and send rejected message is currenty perceived as hard to follow, partly because we let the transmission go via deep call chains through functions such as tipc_reject_msg() and net_route_msg(). We want to remove those functions, and make the call sequences shallower and simpler. For this purpose, we separate building and sending of rejected messages. We build the reject message using the new function tipc_msg_reverse(), and let the transmission go via the newly introduced tipc_link_xmit2() function, as all transmission eventually will do. We also ensure that all calls to tipc_link_xmit2() are made outside port_lock/bh_lock_sock. Finally, we replace all calls to tipc_reject_msg() with the two new calls at all locations in the code that we want to keep. The remaining calls are made from code that we are planning to remove, along with tipc_reject_msg() itself. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	067608e9d0	tipc: introduce direct iovec to buffer chain fragmentation function Fragmentation at message sending is currently performed in two places in link.c, depending on whether data to be transmitted is delivered in the form of an iovec or as a big sk_buff. Those functions are also tightly entangled with the send functions that are using them. We now introduce a re-entrant, standalone function, tipc_msg_build2(), that builds a packet chain directly from an iovec. Each fragment is sized according to the MTU value given by the caller, and is prepended with a correctly built fragment header, when needed. The function is independent from who is calling and where the chain will be delivered, as long as the caller is able to indicate a correct MTU. The function is tested, but not called by anybody yet. Since it is incompatible with the existing tipc_msg_build(), and we cannot yet remove that function, we have given it a temporary name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	16e166b88c	tipc: make link mtu easily accessible from socket Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4f1688b2c6	tipc: introduce send functions for chained buffers in link The current link implementation provides several different transmit functions, depending on the characteristics of the message to be sent: if it is an iovec or an sk_buff, if it needs fragmentation or not, if the caller holds the node_lock or not. The permutation of these options gives us an unwanted amount of unnecessarily complex code. As a first step towards simplifying the send path for all messages, we introduce two new send functions at link level, tipc_link_xmit2() and __tipc_link_xmit2(). The former looks up a link to the message destination, and if one is found, it grabs the node lock and calls the second function, which works exclusively inside the node lock protection. If no link is found, and the destination is on the same node, it delivers the message directly to the local destination socket. The new functions take a buffer chain where all packet headers are already prepared, and the correct MTU has been used. These two functions will later replace all other link-level transmit functions. The functions are not backwards compatible, so we have added them as new functions with temporary names. They are tested, but have no users yet. Those will be added later in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00

1 2 3 4 5 ...

455614 Commits All Branches Search

455614 Commits

All Branches