It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:
https://github.com/zfsonlinux/spl/issues/241http://tracker.ceph.com/issues/7790
The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.
This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.
Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
conn_info_age value is calculated in ms, so need to be converted to
jiffies.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for max_tx_power in Get Connection Information
request. Value is read only once for given connection and then always
returned in response as parameter.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support to store local maximum TX power level for
connection when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
TX power for LE links is immutable thus we do not need to query for it
if already have value.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for Get Connection Information mgmt command
which can be used to query for information about connection, i.e. RSSI
and local TX power level.
In general values cached in hci_conn are returned as long as they are
considered valid, i.e. do not exceed age limit set in hdev. This limit
is calculated as random value between min/max values to avoid client
trying to guess when to poll for updated information.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds conn_info_min_age and conn_info_max_age parameters to
debugfs which determine lifetime of connection information. Actual
lifetime will be random value between min and max age.
Default values for min and max age are 1000ms and 3000ms respectively.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
RFC 4861 states in 7.2.5:
The IsRouter flag in the cache entry MUST be set based on the
Router flag in the received advertisement. In those cases
where the IsRouter flag changes from TRUE to FALSE as a result
of this update, the node MUST remove that router from the
Default Router List and update the Destination Cache entries
for all destinations using that neighbor as a router as
specified in Section 7.3.3. This is needed to detect when a
node that is used as a router stops forwarding packets due to
being configured as a host.
Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.
Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv4/ip_vti.c
Steffen Klassert says:
====================
pull request (net): ipsec 2014-05-15
This pull request has a merge conflict in net/ipv4/ip_vti.c
between commit 8d89dcdf80 ("vti: don't allow to add the same
tunnel twice") and commit a32452366b ("vti4:Don't count header
length twice"). It can be solved like it is done in linux-next.
1) Fix a ipv6 xfrm output crash when a packet is rerouted
by netfilter to not use IPsec.
2) vti4 counts some header lengths twice leading to an incorrect
device mtu. Fix this by counting these headers only once.
3) We don't catch the case if an unsupported protocol is submitted
to the xfrm protocol handlers, this can lead to NULL pointer
dereferences. Fix this by adding the appropriate checks.
4) vti6 may unregister pernet ops twice on init errors.
Fix this by removing one of the calls to do it only once.
From Mathias Krause.
5) Set the vti tunnel mark before doing a lookup in the error
handlers. Otherwise we don't find the correct xfrm state.
====================
The conflict in ip_vti.c was simple, 'net' had a commit
removing a line from vti_tunnel_init() and this tree
being merged had a commit adding a line to the same
location.
Signed-off-by: David S. Miller <davem@davemloft.net>
Netdev_priv is an accessor function, and has no purpose if its result is
not used.
A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@ local idexpression x; @@
-x = netdev_priv(...);
... when != x
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netdev_priv is an accessor function, and has no purpose if its result is
not used.
A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@ local idexpression x; @@
-x = netdev_priv(...);
... when != x
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maps all internal BPF instructions into x86_64 instructions.
This patch replaces original BPF x64 JIT with internal BPF x64 JIT.
sysctl net.core.bpf_jit_enable is reused as on/off switch.
Performance:
1. old BPF JIT and internal BPF JIT generate equivalent x86_64 code.
No performance difference is observed for filters that were JIT-able before
Example assembler code for BPF filter "tcpdump port 22"
original BPF -> old JIT: original BPF -> internal BPF -> new JIT:
0: push %rbp 0: push %rbp
1: mov %rsp,%rbp 1: mov %rsp,%rbp
4: sub $0x60,%rsp 4: sub $0x228,%rsp
8: mov %rbx,-0x8(%rbp) b: mov %rbx,-0x228(%rbp) // prologue
12: mov %r13,-0x220(%rbp)
19: mov %r14,-0x218(%rbp)
20: mov %r15,-0x210(%rbp)
27: xor %eax,%eax // clear A
c: xor %ebx,%ebx 29: xor %r13,%r13 // clear X
e: mov 0x68(%rdi),%r9d 2c: mov 0x68(%rdi),%r9d
12: sub 0x6c(%rdi),%r9d 30: sub 0x6c(%rdi),%r9d
16: mov 0xd8(%rdi),%r8 34: mov 0xd8(%rdi),%r10
3b: mov %rdi,%rbx
1d: mov $0xc,%esi 3e: mov $0xc,%esi
22: callq 0xffffffffe1021e15 43: callq 0xffffffffe102bd75
27: cmp $0x86dd,%eax 48: cmp $0x86dd,%rax
2c: jne 0x0000000000000069 4f: jne 0x000000000000009a
2e: mov $0x14,%esi 51: mov $0x14,%esi
33: callq 0xffffffffe1021e31 56: callq 0xffffffffe102bd91
38: cmp $0x84,%eax 5b: cmp $0x84,%rax
3d: je 0x0000000000000049 62: je 0x0000000000000074
3f: cmp $0x6,%eax 64: cmp $0x6,%rax
42: je 0x0000000000000049 68: je 0x0000000000000074
44: cmp $0x11,%eax 6a: cmp $0x11,%rax
47: jne 0x00000000000000c6 6e: jne 0x0000000000000117
49: mov $0x36,%esi 74: mov $0x36,%esi
4e: callq 0xffffffffe1021e15 79: callq 0xffffffffe102bd75
53: cmp $0x16,%eax 7e: cmp $0x16,%rax
56: je 0x00000000000000bf 82: je 0x0000000000000110
58: mov $0x38,%esi 88: mov $0x38,%esi
5d: callq 0xffffffffe1021e15 8d: callq 0xffffffffe102bd75
62: cmp $0x16,%eax 92: cmp $0x16,%rax
65: je 0x00000000000000bf 96: je 0x0000000000000110
67: jmp 0x00000000000000c6 98: jmp 0x0000000000000117
69: cmp $0x800,%eax 9a: cmp $0x800,%rax
6e: jne 0x00000000000000c6 a1: jne 0x0000000000000117
70: mov $0x17,%esi a3: mov $0x17,%esi
75: callq 0xffffffffe1021e31 a8: callq 0xffffffffe102bd91
7a: cmp $0x84,%eax ad: cmp $0x84,%rax
7f: je 0x000000000000008b b4: je 0x00000000000000c2
81: cmp $0x6,%eax b6: cmp $0x6,%rax
84: je 0x000000000000008b ba: je 0x00000000000000c2
86: cmp $0x11,%eax bc: cmp $0x11,%rax
89: jne 0x00000000000000c6 c0: jne 0x0000000000000117
8b: mov $0x14,%esi c2: mov $0x14,%esi
90: callq 0xffffffffe1021e15 c7: callq 0xffffffffe102bd75
95: test $0x1fff,%ax cc: test $0x1fff,%rax
99: jne 0x00000000000000c6 d3: jne 0x0000000000000117
d5: mov %rax,%r14
9b: mov $0xe,%esi d8: mov $0xe,%esi
a0: callq 0xffffffffe1021e44 dd: callq 0xffffffffe102bd91 // MSH
e2: and $0xf,%eax
e5: shl $0x2,%eax
e8: mov %rax,%r13
eb: mov %r14,%rax
ee: mov %r13,%rsi
a5: lea 0xe(%rbx),%esi f1: add $0xe,%esi
a8: callq 0xffffffffe1021e0d f4: callq 0xffffffffe102bd6d
ad: cmp $0x16,%eax f9: cmp $0x16,%rax
b0: je 0x00000000000000bf fd: je 0x0000000000000110
ff: mov %r13,%rsi
b2: lea 0x10(%rbx),%esi 102: add $0x10,%esi
b5: callq 0xffffffffe1021e0d 105: callq 0xffffffffe102bd6d
ba: cmp $0x16,%eax 10a: cmp $0x16,%rax
bd: jne 0x00000000000000c6 10e: jne 0x0000000000000117
bf: mov $0xffff,%eax 110: mov $0xffff,%eax
c4: jmp 0x00000000000000c8 115: jmp 0x000000000000011c
c6: xor %eax,%eax 117: mov $0x0,%eax
c8: mov -0x8(%rbp),%rbx 11c: mov -0x228(%rbp),%rbx // epilogue
cc: leaveq 123: mov -0x220(%rbp),%r13
cd: retq 12a: mov -0x218(%rbp),%r14
131: mov -0x210(%rbp),%r15
138: leaveq
139: retq
On fully cached SKBs both JITed functions take 12 nsec to execute.
BPF interpreter executes the program in 30 nsec.
The difference in generated assembler is due to the following:
Old BPF imlements LDX_MSH instruction via sk_load_byte_msh() helper function
inside bpf_jit.S.
New JIT removes the helper and does it explicitly, so ldx_msh cost
is the same for both JITs, but generated code looks longer.
New JIT has 4 registers to save, so prologue/epilogue are larger,
but the cost is within noise on x64.
Old JIT checks whether first insn clears A and if not emits 'xor %eax,%eax'.
New JIT clears %rax unconditionally.
2. old BPF JIT doesn't support ANC_NLATTR, ANC_PAY_OFFSET, ANC_RANDOM
extensions. New JIT supports all BPF extensions.
Performance of such filters improves 2-4 times depending on a filter.
The longer the filter the higher performance gain.
Synthetic benchmarks with many ancillary loads see 20x speedup
which seems to be the maximum gain from JIT
Notes:
. net.core.bpf_jit_enable=2 + tools/net/bpf_jit_disasm is still functional
and can be used to see generated assembler
. there are two jit_compile() functions and code flow for classic filters is:
sk_attach_filter() - load classic BPF
bpf_jit_compile() - try to JIT from classic BPF
sk_convert_filter() - convert classic to internal
bpf_int_jit_compile() - JIT from internal BPF
seccomp and tracing filters will just call bpf_int_jit_compile()
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is only used within the same translation unit, so mark it
static.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
802.15.4 datagram sockets do not currently have a compliant sendmsg().
The destination address supplied is always ignored, and in unconnected
mode, packets are broadcast instead of dropped with -EDESTADDRREQ. This
patch fixes 802.15.4 dgram sockets to be compliant, i.e.
!conn && !msg_name => -EDESTADDRREQ
!conn && msg_name => send to msg_name
conn && !msg_name => send to connected
conn && msg_name => -EISCONN
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, 6lowpan creates one 802.15.4 MAC header for the original
packet the device was given by upper layers and reuses this header for
all fragments, if fragmentation is required. This also reuses frame
sequence numbers, which must not happen. 6lowpan also has issues with
fragmentation in the presence of security headers, since those may imply
the presence of trailing fields that are not accounted for by the
fragmentation code right now.
Fix both of these issues by properly allocating fragment skbs with
headromm and tailroom as specified by the underlying device, create one
header for each skb instead of reusing the original header, let the
underlying device do the rest.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current mac_cb handling of ieee802154 is rather awkward and limited.
Decompose the single flags field into multiple fields with the meanings
of each subfield of the flags field to make future extensions (for
example, link-layer security) easier. Also don't set the frame sequence
number in upper layers, since that's a thing the MAC is supposed to set
on frame transmit - we set it on header creation, but assuming that
upper layers do not blindly duplicate our headers, this is fine.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current WPAN header creation code checks for EMSGSIZE conditions,
but does not account for the MIC field that link layer security may add
at the end of the frame. Now that we can accurately calculate the
maximum payload size of packets, use that to check for EMSGSIZE
conditions.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dealing with 802.15.4, one often has to know the maximum payload
size for a given packet. This depends on many factors, one of which is
whether or not a security header is present in the frame. These
definitions and functions provide an easy way for any upper layer to
calculate the maximum payload size for a packet. The first obvious user
for this is 6lowpan, which duplicates this calculation and gets it
partially wrong because it ignores security headers.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Cong Wang <cwang@twopensource.com>
commit 50624c934d (net: Delay default_device_exit_batch until no
devices are unregistering) introduced rtnl_lock_unregistering() for
default_device_exit_batch(). Same race could happen we when rmmod a driver
which calls rtnl_link_unregister() as we call dev->destructor without rtnl
lock.
For long term, I think we should clean up the mess of netdev_run_todo()
and net namespce exit code.
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change introduced by 88e48d7b33
("batman-adv: make DAT drop ARP requests targeting local clients")
implements a check that prevents DAT from using the caching
mechanism when the client that is supposed to provide a reply
to an arp request is local.
However change brought by be1db4f661
("batman-adv: make the Distributed ARP Table vlan aware")
has not converted the above check into its vlan aware version
thus making it useless when the local client is behind a vlan.
Fix the behaviour by properly specifying the vlan when
checking for a client being local or not.
Reported-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
A pointer to the orig_node representing a bat-gateway is
stored in the gw_node->orig_node member, but the refcount
for such orig_node is never increased.
This leads to memory faults when gw_node->orig_node is accessed
and the originator has already been freed.
Fix this by increasing the refcount on gw_node creation
and decreasing it on gw_node free.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
In the new fragmentation code the batadv_frag_send_packet()
function obtains a reference to the primary_if, but it does
not release it upon return.
This reference imbalance prevents the primary_if (and then
the related netdevice) to be properly released on shut down.
Fix this by releasing the primary_if in batadv_frag_send_packet().
Introduced by ee75ed8887
("batman-adv: Fragment and send skbs larger than mtu")
Cc: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
If hard_iface is NULL and goto out is made batadv_hardif_free_ref()
doesn't check for NULL before dereferencing it to get to refcount.
Introduced in cb1c92ec37
("batman-adv: add debugfs support to view multiif tables").
Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Add the corresponding trace if we have a full match in a non-terminal
rule. Note that the traces will look slightly different than in
x_tables since the log message after all expressions have been
evaluated (contrary to x_tables, that emits it before the target
action). This manifests in two differences in nf_tables wrt. x_tables:
1) The rule that enables the tracing is included in the trace.
2) If the rule emits some log message, that is shown before the
trace log message.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make the beacon CSA counters part of ieee80211_mutable_offsets and don't
decrement CSA counters when generating a beacon template. This permits the
driver to offload the CSA counters handling. Since mac80211 updates the probe
responses with the correct counter, the driver should sync the counter's value
with mac80211 using ieee80211_csa_update_counter function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new API ieee80211_beacon_get_template, which doesn't
affect DTIM counter and should be used if the device generates beacon
frames, and new beacon template is needed. In addition set the offsets
to TIM IE for MESH interface.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Support up to IEEE80211_MAX_CSA_COUNTERS_NUM csa counters.
This is defined to be 2 now, to support both CSA and eCSA
counters.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change the type of NL80211_ATTR_CSA_C_OFF_BEACON and
NL80211_ATTR_CSA_C_OFF_PRESP to be NLA_BINARY which allows
userspace to use beacons and probe responses with
multiple CSA counters.
This isn't breaking the API since userspace can
continue to use nla_put_u16 for this attributes, which
is equivalent to a single element u16 array.
In addition advertise max number of supported CSA counters.
This is needed when using CSA and eCSA IEs together.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Track current csa counter value and use it
to update mgmt frames at the provided offsets.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add NL80211_ATTR_CSA_C_OFFSETS_TX which holds an array
of offsets to the CSA counters which should be updated
when sending a management frames with NL80211_CMD_FRAME.
This API should be used by the drivers that wish to keep the
CSA counter updated in probe responses, but do not implement
probe response offloading and so, do not use
ieee80211_proberesp_get function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There is no need to pass NL80211_IFTYPE_UNSPECIFIED when calling
cfg80211_chandef_dfs_required() since we always already have the
interface type. So, pass the actual interface type instead.
Additionally, have cfg80211_chandef_dfs_required() WARN if the passed
interface type is NL80211_IFTYPE_UNSPECIFIED, so we can detect
problems more easily.
Tested-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reported-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Missing a colon on definition use is a bit odd so
change the macro for the 32 bit case to declare an
__attribute__((unused)) and __deprecated variable.
The __deprecated attribute will cause gcc to emit
an error if the variable is actually used.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending data through IUCV a MESSAGE COMPLETE interrupt
signals that sent data memory can be freed or reused again.
With commit f9c41a62bb
"af_iucv: fix recvmsg by replacing skb_pull() function" the
MESSAGE COMPLETE callback iucv_callback_txdone() identifies
the wrong skb as being confirmed, which leads to data corruption.
This patch fixes the skb mapping logic in iucv_callback_txdone().
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Documentation/networking/dccp.txt points that request_retries
should be greater than 0. So make the extra1 to be &one instead
of &zero.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fengguang reported the following sparse warning:
>> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces)
net/ipv6/proc.c:198:41: expected void [noderef] <asn:3>*mib
net/ipv6/proc.c:198:41: got void [noderef] <asn:3>**pcpumib
Fixes: commit 698365fa18 (net: clean up snmp stats code)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.
By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to reduce complexity and save a call level during message
reception at port/socket level, we remove the function tipc_port_rcv()
and merge its functionality into tipc_sk_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_disc_rcv(), which is handling received neighbor
discovery messages, is perceived as messy, and it is hard to verify
its correctness by code inspection. The fact that the task it is set
to resolve is fairly complex does not make the situation better.
In this commit we try to take a more systematic approach to the
problem. We define a decision machine which takes three state flags
as input, and produces three action flags as output. We then walk
through all permutations of the state flags, and for each of them we
describe verbally what is going on, plus that we set zero or more of
the action flags. The action flags indicate what should be done once
the decision machine has finished its job, while the last part of the
function deals with performing those actions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:
1) A "raw" format as obtained from the device. This format is known
only by the media specific adapter code in eth_media.c and
ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
which can be referenced and passed around by the generic media-
unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
discovery messages.
Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.
We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try to as far
as possible uniform commenting, variable names and usage of these
functions, with the purpose of making them more comprehensible.
We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.
Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassemby function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.
In this commit rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message reassembly function does not update the 'len' and 'data_len'
fields of the head skbuff correctly when fragments are chained to it.
This may sometimes lead to obsure errors, such as fragment reordering
when we receive fragments which are cloned buffers.
This commit fixes this, by ensuring that the two fields are updated
correctly.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current code, all incoming LINK_PROTOCOL messages, irrespective
of type, nudge the "last message received" checkpoint, informing the
link state machine that a message was received from the peer since last
supervision timeout event. This inhibits the link from starting probing
the peer unnecessarily.
However, not only STATE messages are recorded as legitimate incoming
traffic this way, but even RESET and ACTIVATE messages, which in
reality are there to inform the link that the peer endpoint has been
reset. At the same time, some RESET messages may be dropped instead
of causing a link reset. This happens when the link endpoint thinks
it is fully up and working, and the session number of the RESET is
lower than or equal to the current link session. In such cases the
RESET is perceived as a delayed remnant from an earlier session, or
the current one, and dropped.
Now, if a TIPC module is removed and then immediately reinserted, e.g.
when using a script, RESET messages may arrive at the peer link endpoint
before this one has had time to discover the failure. The RESET may be
dropped because of the session number, but only after it has been
recorded as a legitimate traffic event. Hence, the receiving link will
not start probing, and not discover that the peer endpoint is down, at
the same time ignoring the periodic RESET messages coming from that
endpoint. We have ended up in a stale state where a failed link cannot
be re-established.
In this commit, we remedy this by nudging the checkpoint only for
received STATE messages, not for RESET or ACTIVATE messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory overhead when allocating big buffers for data transfer may
be quite significant. E.g., truesize of a 64 KB buffer turns out
to be 132 KB, 2 x the requested size.
This invalidates the "worst case" calculation we have been
using to determine the default socket receive buffer limit,
which is based on the assumption that 1024x64KB = 67MB buffers
may be queued up on a socket.
Since TIPC connections cannot survive hitting the buffer limit,
we have to compensate for this overhead.
We do that in this commit by dividing the fix connection flow
control window from 1024 (2*512) messages to 512 (2*256). Since
older version nodes send out acks at 512 message intervals,
compatibility with such nodes is guaranteed, although performance
may be non-optimal in such cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0466 was probably meant to be 0644, there's no reason why everyone
except root could write there.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Jouni reported that if a remain-on-channel was active on the
same channel as the current operating channel, then the ROC
would start, but any frames transmitted using mgmt-tx on the
same channel would get delayed until after the ROC.
The reason for this is that the ROC starts, but doesn't have
any handling for "remain on the same channel", so it stops
the interface queues. The later mgmt-tx then puts the frame
on the interface queues (since it's on the current operating
channel) and thus they get delayed until after the ROC.
To fix this, add some logic to handle remaining on the same
channel specially and not stop the queues etc. in this case.
This not only fixes the bug but also improves behaviour in
this case as data frames etc. can continue to flow.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
opt_nflen to calculate the overall length of additional ipv6 extensions.
I found this while auditing the ipv6 output path for a memory corruption
reported by Alexey Preobrazhensky while he fuzzed an instrumented
AddressSanitizer kernel with trinity. This may or may not be the cause
of the original bug.
Fixes: 4df98e76cd ("ipv6: pmtudisc setting not respected with UFO/CORK")
Reported-by: Alexey Preobrazhensky <preobr@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_get_random_once depends on the static keys infrastructure to patch up
the branch to the slow path during boot. This was realized by abusing the
static keys api and defining a new initializer to not enable the call
site while still indicating that the branch point should get patched
up. This was needed to have the fast path considered likely by gcc.
The static key initialization during boot up normally walks through all
the registered keys and either patches in ideal nops or enables the jump
site but omitted that step on x86 if ideal nops where already placed at
static_key branch points. Thus net_get_random_once branches not always
became active.
This patch switches net_get_random_once to the ordinary static_key
api and thus places the kernel fast path in the - by gcc considered -
unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that
the unlikely path actually beats the likely path in terms of cycle cost
and that different nop patterns did not make much difference, thus this
switch should not be noticeable.
Fixes: a48e42920f ("net: introduce new macro net_get_random_once")
Reported-by: Tuomas Räsänen <tuomasjjrasanen@tjjr.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.
This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.
This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established. If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.
Black-box tested using user-mode linux:
- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, routing lookups used for Path PMTU Discovery in
absence of a socket or on unmarked sockets use a mark of 0.
This causes PMTUD not to work when using routing based on
netfilter fwmark mangling and fwmark ip rules, such as:
iptables -j MARK --set-mark 17
ip rule add fwmark 17 lookup 100
This patch causes these route lookups to use the fwmark from the
received ICMP error when the fwmark_reflect sysctl is enabled.
This allows the administrator to make PMTUD work by configuring
appropriate fwmark rules to mark the inbound ICMP packets.
Black-box tested using user-mode linux by pointing different
fwmarks at routing tables egressing on different interfaces, and
using iptables mangling to mark packets inbound on each interface
with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery
work as expected when mark reflection is enabled and fail when
it is disabled.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.
This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.
Tested using user-mode linux:
- ICMP/ICMPv6 echo replies and errors.
- TCP RST packets (IPv4 and IPv6).
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After all the preparatory works, supporting IPv6 in Fast Open is now easy.
We pretty much just mirror v4 code. The only difference is how we
generate the Fast Open cookie for IPv6 sockets. Since Fast Open cookie
is 128 bits and we use AES 128, we use CBC-MAC to encrypt both the
source and destination IPv6 addresses since the cookie is a MAC tag.
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a fast open socket is already accepted by the user, it should
be treated like a connected socket to record the ICMP error in
sk_softerr, so the user can fetch it. Do that in both tcp_v4_err
and tcp_v6_err.
Also refactor the sequence window check to improve readability
(e.g., there were two local variables named 'req').
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid large code duplication in IPv6, we need to first simplify
the complicate SYN-ACK sending code in tcp_v4_conn_request().
To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to
initialize the mini socket's receive window before trying to
create the child socket and/or building the SYN-ACK packet. So we move
that initialization from tcp_make_synack() to tcp_v4_conn_request()
as a new function tcp_openreq_init_req_rwin().
After this refactoring the SYN-ACK sending code is simpler and easier
to implement Fast Open for IPv6.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate various cookie checking and generation code to simplify
the fast open processing. The main goal is to reduce code duplication
in tcp_v4_conn_request() for IPv6 support.
Removes two experimental sysctl flags TFO_SERVER_ALWAYS and
TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging
purposes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: get rid of SET_ETHTOOL_OPS
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.
Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
- SET_ETHTOOL_OPS(dev, ops);
+ dev->ethtool_ops = ops;
Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_unattached_filter_create() will copy the filter's instructions so we
don't need to have the master copy hanging around after initialization.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not collide with the x86-64 PTRACE user API namespace.
net/core/filter.c:57:0: warning: "R8" redefined [enabled by default]
arch/x86/include/uapi/asm/ptrace-abi.h:38:0: note: this is the location of the previous definition
Fix by adding a BPF_ prefix to the register macros.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif()
- properly release orig_ifinfo->router when freeing orig_ifinfo
- properly release neigh_node objects during periodic check
- properly release neigh_info objects when the related hard_iface
is free'd
These changes are all very important because they fix some
reference counting imbalances that lead to the
impossibility of releasing the netdev object used by
batman-adv on shutdown.
The consequence is that such object cannot be destroyed by
the networking stack (the refcounter does not reach zero)
thus bringing the system in hanging state during a normal
reboot operation or a network reconfiguration.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTbyNlAAoJEEKTMo6mOh1VzeUQAJmcP73MMwdFDGfI+3DUD43Z
ziaWzHK1/NAkERIJMYu/Nj9BPhFJ/JgYNoYGd4eZ+0IVzIidBKffpGvZYLKJaBBb
kVzDt8sHgm7T+bmJdGK5zBCkCrQ66T1/7jF7evzWCtdmzAj9Ld+cJha6sZ6OLY4v
WusFFHH2yQgzOGML52HdM99lIfZJu53sdQtYrMI7FpmObwmoBw1VQsmLsJbbFj0A
XbFWYNOtQ0s8JvuHPnHB2gsczMXG6AdDuYdG1douOUryjsdg4AsKVWbPWaSuIyS9
ED6TiNsxtRt3A2YDgKrYmcGWHIc7CR4TE97DpdaB1xOEe/h0JPy8NEXaTiXifVi0
yWXaDZAl0J1gEKxda5foqIJZEScQyqWnAGFIIMVsxWxMpv9V3C+XaMgpgC5yQdoQ
hgs6lv8U/w7Qevu4oaU2oq64C5ipyzheLuL+l9Ykwig9brJ9pqvBhEr34VDyyLnK
l1VVQP5Y94gsPX2FuBaFgQ6oN3xjAkzFWDVKPtdYhMW7l93ER31KWgyJ53zK0Avk
wl/h5Xvep7vgA1pvyiu7Lom47QX2SVY3Xt6vsJ42qrR9bp1sLZ+piZaSBPTSuNmo
YySwgku6QlQfCFThh09zjuQ8+zwlq5Enjp+fvy/NtzEhTzK1gmknrQo0QF+Fj1Fj
5yz30/XWjUTn1dtBNeBw
=GsPT
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif()
- properly release orig_ifinfo->router when freeing orig_ifinfo
- properly release neigh_node objects during periodic check
- properly release neigh_info objects when the related hard_iface
is free'd
These changes are all very important because they fix some
reference counting imbalances that lead to the
impossibility of releasing the netdev object used by
batman-adv on shutdown.
The consequence is that such object cannot be destroyed by
the networking stack (the refcounter does not reach zero)
thus bringing the system in hanging state during a normal
reboot operation or a network reconfiguration.
Since commit 7e98056964("ipv6: router reachability probing"), a router falls
into NUD_FAILED will be probed.
Now if function rt6_select() selects a router which neighbour state is NUD_FAILED,
and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE,
then function dst_neigh_output() can directly send packets, but actually the
neighbour still is unreachable. If we set nud_state to NUD_INCOMPLETE instead
NUD_PROBE, packets will not be sent out until the neihbour is reachable.
In addition, because the route should be probes with a single NS, so we must
set neigh->probes to neigh_max_probes(), then the neigh timer timeout and function
neigh_timer_handler() will not send other NS Messages.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, the ath9k/ath10k DFS module only supports detecting ETSI
radar patterns.
Add a bitmap in the interface combinations, indicating which DFS regions
are supported by the detector. If unset, support for all regions is
assumed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the association is in progress while we suspend, the
stack will be in a messed up state. Clean it before we
suspend.
This patch completes Johannes's patch:
1a1cb744de
Author: Johannes Berg <johannes.berg@intel.com>
mac80211: fix suspend vs. authentication race
Cc: <stable@vger.kernel.org>
Fixes: 12e7f51702 ("mac80211: cleanup generic suspend/resume procedures")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function ip6_tnl_validate assumes that the rtnl
attribute IFLA_IPTUN_PROTO always be filled . If this
attribute is not filled by the userspace application
kernel get crashed with NULL pointer dereference. This
patch fixes the potential kernel crash when
IFLA_IPTUN_PROTO is missing .
Signed-off-by: Susant Sahani <susant@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I use the following command, eth0 cannot send any packets.
#tc qdisc add dev eth0 root handle 1: hhf limit 1
Because qlen need be smaller than limit, all packets were dropped.
Fix this by qlen *<=* limit.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __vlan_find_dev_deep should always called in RCU, according
David's suggestion, rename to __vlan_find_dev_deep_rcu looks more
reasonable.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Display "return" for implicit rule at the end of a non-base chain,
instead of when popping chain from the stack.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After returning from the chain that we just went to with no matchings,
we get a bogus rule number in the trace. To fix this, we would need
to iterate over the list of remaining rules in the chain to update the
rule number counter.
Patrick suggested to set this to the maximum value since the default
base chain policy is the very last action when the processing the base
chain is over.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes a crash when trying to access the counters and the
default chain policy from the non-base chain that we have reached
via the goto chain. Fix this by falling back on the original base
chain after returning from the custom chain.
While fixing this, kill the inline function to account chain statistics
to improve source code readability.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to use the mark we get from the tunnels o_key to
lookup the right vti state in the error handlers. This patch
ensures that.
Fixes: df3893c1 ("vti: Update the ipv4 side to use it's own receive hook.")
Fixes: fa9ad96d ("vti6: Update the ipv6 side to use its own receive hook.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If we fail to register one of the xfrm protocol handlers we will
unregister the pernet ops twice on the error exit path. This will
probably lead to a kernel panic as the double deregistration
leads to a double kfree().
Fix this by removing one of the calls to do it only once.
Fixes: fa9ad96d49 ("vti6: Update the ipv6 side to use its own...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
the parameter rt will be assigned to c.arg in function fib6_clean_tree(),
but function fib6_prune_clone() doesn't use c.arg, so we can remove it
safely.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce BPF helper macros to define instructions
(similar to old BPF_STMT/BPF_JUMP macros)
Use them while converting classic BPF to internal
and in BPF testsuite later.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an interface is removed separately, all neighbors need to be
checked if they have a neigh_ifinfo structure for that particular
interface. If that is the case, remove that ifinfo so any references to
a hard interface can be freed.
This is a regression introduced by
89652331c0
("batman-adv: split tq information in neigh_node struct")
Reported-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The current code will not execute batadv_purge_orig_neighbors() when an
orig_ifinfo has already been purged. However we need to run it in any
case. Fix that.
This is a regression introduced by
7351a4822d
("batman-adv: split out router from orig_node")
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
When an interface is removed from batman-adv, the orig_ifinfo of a
orig_node may be removed without releasing the router first.
This will prevent the reference for the neighbor pointed at by the
orig_ifinfo->router to be released, and this leak may result in
reference leaks for the interface used by this neighbor. Fix that.
This is a regression introduced by
7351a4822d
("batman-adv: split out router from orig_node").
Reported-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The neigh_ifinfo object must be freed if it has been used in
batadv_iv_ogm_process_per_outif().
This is a regression introduced by
89652331c0
("batman-adv: split tq information in neigh_node struct")
Reported-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
This patch adds support to store local TX power level for connection
when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
John W. Linville says:
====================
pull request: wireless 2014-05-08
This one is all from Johannes:
"Here are a few small fixes for the current cycle: radiotap TX flags were
wrong (fix by Bob), Chun-Yeow fixes an SMPS issue with mesh interfaces,
Eliad fixes a locking bug and a cfg80211 state problem and finally
Henning sent me a fix for IBSS rate information."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When register_net_sysctl failed, we should free the
sysctl_table.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
unregister_net_sysctl_table will check the ctl_table_header,
so remove the unneed checking
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains netfilter fixes for your net tree, they are:
1) Fix use after free in nfnetlink when sending a batch for some
unsupported subsystem, from Denys Fedoryshchenko.
2) Skip autoload of the nat module if no binding is specified via
ctnetlink, from Florian Westphal.
3) Set local_df after netfilter defragmentation to avoid a bogus ICMP
fragmentation needed in the forwarding path, also from Florian.
4) Fix potential user after free in ip6_route_me_harder() when returning
the error code to the upper layers, from Sergey Popovich.
5) Skip possible bogus ICMP time exceeded emitted from the router (not
valid according to RFC) if conntrack zones are used, from Vasily Averin.
6) Fix fragment handling when nf_defrag_ipv4 is loaded but nf_conntrack
is not present, also from Vasily.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If sdata doesn't have a valid dev (e.g. in case of monitor
vif), the vif_name field was initialized with (a length of)
some short string, but later was set to a different,
potentially larger one.
This resulted in out-of-bounds write, which usually
appeared as garbage in the trace log.
Simply trace sdata->name, as it should always have the
correct name for both cases.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch increments the management interface revision due to the
changes with the Device Found management event and other fixes.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the driver fails during HW restart or resume, the whole
stack goes into a very confused state with interfaces being
up while the hardware is down etc.
Address this by shutting down everything; we'll run into a
lot of warnings in the process but that's better than having
the whole stack get messed up.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are two (related) issues with this.
One case, reported by Michal, is related to hostap: it unsets the
20/40 capability bit for stations that associate when it's in 20
MHz mode.
The other case, reported by Eyal, is that some APs like Netgear
R6300v2 and probably others based on the BCM4360 chipset can be
configured for doing VHT at 20Mhz. In this case the beacon has
a VHT IE but the HT cap indicates transmitter only support 20Mhz.
In both of these cases, we currently avoid VHT and use only HT
this means we can't use the highest rates (MCS8), so fixing this
leads to throughput improvements.
Reported-by: Michal Kazior <michal.kazior@tieto.com>
Reported-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Each node action flag should be set or cleared separately, instead
we now set the whole flags variable in one shot, and it's turned
out to be hard to see which other flags are affected. Therefore,
for instance, we explicitly clear TIPC_WAIT_OWN_LINKS_DOWN bit in
node_lost_contact().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename node flags to action_flags as well as its enum names so
that they can reflect its real meanings.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validating the UDP checksum is now done in UDP before handing
packets to the encapsulation layer. Note that this also eliminates
the "feature" where L2TP can ignore a non-zero UDP checksum (doing
this was contrary to RFC 1122).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moving validation of UDP checksum to be done in UDP not encap layer.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the callers hold RTNL lock, so there is no need to use inet_addr_hash_lock
to protect the hash list.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.
BTW, rename it to ->ping_group_range instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dst is released one line before we access it again with dst->error.
Fixes: 58e35d1471 netfilter: ipv6: propagate routing errors from
ip6_route_me_harder()
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds support to store RSSI for connection when reply for
HCI_Read_RSSI is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Invalid Parameters error code is used to indicate that the command
length is invalid or that a parameter is outside of the specified range.
This error code wasn't clearly specified in the Bluetooth 4.0
specification but since 4.1 this has been fixed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 59af6928 (mac80211: fix CSA tx queue stopping) introduced a
sparse warning:
net/mac80211/cfg.c:3274:5: warning: symbol '__ieee80211_channel_switch' was not declared. Should it be static?
Fix it by declaring the function static.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It doesn't make much sense to leave a crippled
interface running.
As a side effect this will unblock tx queues with
CSA reason immediately after failure instead of
until after userspace requests interface to stop.
This also gives userspace an opportunity to
indirectly see CSA failure.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[small code cleanup]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:
commit 933393f58f
Date: Thu Dec 22 11:58:51 2011 -0600
percpu: Remove irqsafe_cpu_xxx variants
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.
probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.
So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d206940319,
there are no more callers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing the segmentation in the forward path has one major drawback:
When using virtio, we may process gso udp packets coming
from host network stack. In that case, netfilter POSTROUTING
will see one packet with udp header followed by multiple ip
fragments.
Delay the segmentation and do it after POSTROUTING invocation
to avoid this.
Fixes: fe6cc55f3a ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If conntrack defragments incoming ipv6 frags it stores largest original
frag size in ip6cb and sets ->local_df.
We must thus first test the largest original frag size vs. mtu, and not
vice versa.
Without this patch PKTTOOBIG is still generated in ip6_fragment() later
in the stack, but
1) IPSTATS_MIB_INTOOBIGERRORS won't increment
2) packet did (needlessly) traverse netfilter postrouting hook.
Fixes: fe6cc55f3a ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
local_df means 'ignore DF bit if set', so if its set we're
allowed to perform ip fragmentation.
This wasn't noticed earlier because the output path also drops such skbs
(and emits needed icmp error) and because netfilter ip defrag did not
set local_df until couple of days ago.
Only difference is that DF-packets-larger-than MTU now discarded
earlier (f.e. we avoid pointless netfilter postrouting trip).
While at it, drop the repeated test ip_exceeds_mtu, checking it once
is enough...
Fixes: fe6cc55f3a ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If gpio controller requires waiting for read and write
GPIO values, then we have to use the gpio cansleep api.
Fix the rfkill_gpio_set_power which calls only the
nonsleep version (causing kernel warning).
There is no problem to use the cansleep version here
because we are not in IRQ handler or similar context
(cf rfkill_set_block).
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It is not guaranteed that multi-vif channel
switching is tightly synchronized. It makes sense
to ignore cqm (missing beacons, et al) while csa
is progressing and re-check it after it completes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This exports a new cfg80211_stop_iface() function.
This is intended for driver internal interface
combination management and channel switching.
Due to locking issues (it re-enters driver) the
call is asynchronous and uses cfg80211 event
list/worker.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It was possible for tx queues to be stuck stopped
if AP CSA finalization failed. In that case
neither stop_ap nor do_stop woke the queues up.
This means it was impossible to perform tx at all
until driver was reloaded or a successful CSA was
performed later.
It was possible to solve this in a simpler manner
however this is more robust and future proof
(having multi-vif CSA in mind).
New sdata->csa_block_tx is introduced to keep
track of which interfaces requested tx to be
blocked for CSA. This is required because mac80211
stops all tx queues for that purpose. This means
queues must be awoken only when last tx-blocking
CSA interface is finished.
It is still possible to have tx queues stopped
after CSA failure but as soon as offending
interfaces are stopped from userspace (stop_ap or
ifdown) tx queues are woken up properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We don't catch the case if an unsupported protocol is submitted
to the xfrm6 protocol handlers, this can lead to NULL pointer
dereferences. Fix this by adding the appropriate checks.
Fixes: 7e14ea15 ("xfrm6: Add IPsec protocol multiplexer")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit 4068579e1e ("net: Implmement
RFC 6936 (zero RX csums for UDP/IPv6)") introduced zero checksums
being allowed for IPv6, but in the case that a socket disallows a
zero checksum on RX we need to sock_put.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions
acquiring spinlocks call sleeping locks further in the chain. Converting
the offending spinlocks into mutexes makes sleeping safe.
Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Pull networking fixes from David Miller:
1) e1000e computes header length incorrectly wrt vlans, fix from Vlad
Yasevich.
2) ns_capable() check in sock_diag netlink code, from Andrew
Lutomirski.
3) Fix invalid queue pairs handling in virtio_net, from Amos Kong.
4) Checksum offloading busted in sxgbe driver due to incorrect
descriptor layout, fix from Byungho An.
5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen
Lim.
6) Fix uninitialized A and X registers in BPF interpreter, from Alexei
Starovoitov.
7) Fix arch dependencies of candence driver.
8) Fix netlink capabilities checking tree-wide, from Eric W Biederman.
9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in
IFLA_EXT_MASK, from David Gibson.
10) IPV6 FIB dump restart doesn't handle table changes that happen
meanwhile, causing the code to loop forever or emit dups, fix from
Kumar Sandararajan.
11) Memory leak on VF removal in bnx2x, from Yuval Mintz.
12) Bug fixes for new Altera TSE driver from Vince Bridgers.
13) Fix route lookup key in SCTP, from Xugeng Zhang.
14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN
driver. From Oliver Hartkopp.
15) TCP doesn't bump retransmit counters in some code paths, fix from
Eric Dumazet.
16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by
zero. Fix from Liu Yu.
17) Fix locking imbalance in error paths of HHF packet scheduler, from
John Fastabend.
18) Properly reference the transport module when vsock_core_init() runs,
from Andy King.
19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork.
20) IP_ECN_decapsulate() doesn't see a correct SKB network header in
ip_tunnel_rcv(), fix from Ying Cai.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
net: macb: Fix race between HW and driver
net: macb: Remove 'unlikely' optimization
net: macb: Re-enable RX interrupt only when RX is done
net: macb: Clear interrupt flags
net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP
ip_tunnel: Set network header properly for IP_ECN_decapsulate()
e1000e: Restrict MDIO Slow Mode workaround to relevant parts
e1000e: Fix issue with link flap on 82579
e1000e: Expand workaround for 10Mb HD throughput bug
e1000e: Workaround for dropped packets in Gig/100 speeds on 82579
net/mlx4_core: Don't issue PCIe speed/width checks for VFs
net/mlx4_core: Load the Eth driver first
net/mlx4_core: Fix slave id computation for single port VF
net/mlx4_core: Adjust port number in qp_attach wrapper when detaching
net: cdc_ncm: fix buffer overflow
Altera TSE: ALTERA_TSE should depend on HAS_DMA
vsock: Make transport the proto owner
net: sched: lock imbalance in hhf qdisc
net: mvmdio: Check for a valid interrupt instead of an error
net phy: Check for aneg completion before setting state to PHY_RUNNING
...
Pull Ceph fixes from Sage Weil:
"First, there is a critical fix for the new primary-affinity function
that went into -rc1.
The second batch of patches from Zheng fix a range of problems with
directory fragmentation, readdir, and a few odds and ends for cephfs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: reserve caps for file layout/lock MDS requests
ceph: avoid releasing caps that are being used
ceph: clear directory's completeness when creating file
libceph: fix non-default values check in apply_primary_affinity()
ceph: use fpos_cmp() to compare dentry positions
ceph: check directory's completeness before emitting directory entry
In the previous commits of this series, we removed all asynchronous
actions which were based on the tasklet handler - "tipc_k_signal()".
So the moment has now come when we can completely remove the tasklet
handler infrastructure. That is done with this commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Postpone the actions of resetting all links until after bclink
lock is released, avoiding to asynchronously reset all links.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert allocations of global variables associated with bclink from
static way to dynamical way for the convenience of bclink instance
initialisation. Meanwhile, this also helps TIPC support name space
in the future easily.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we are going to do more jobs when bc_lock is released, the two
operations of holding/releasing the lock should be encapsulated with
functions. In addition, we move bc_lock spin lock into tipc_bclink
structure avoiding to define the global variable.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Postpone the actions of delivering name tables until after node
lock is released, avoiding to do it under asynchronous context.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since previously what all publications pertaining to the lost node
were removed from name table was finished in tasklet context
asynchronously, we need to TIPC_NAMES_GONE flag indicating whether
the node cleanup work is finished or not. But now as the cleanup work
has been finished when node lock is released, the flag becomes
meaningless for us.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Postpone the actions of notifying subscriptions until after node lock
is released, avoiding to asynchronously execute registered handlers
when node is lost.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename setup_blocked variable of node struct to a more common
name called "flags", which will be used to represent kinds of
node states.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move more frequently used variables up to the head of tipc_node
structure, hopefully improving a bit performance.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although we obtain node lock with tipc_node_lock() in most time, there
are still places where we directly use native spin lock interface
to grab node lock. But as we will do more jobs in the future when node
lock is released, we should ensure that tipc_node_lock() is always
called when node lock is taken.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ip_tunnel_rcv(), set skb->network_header to inner IP header
before IP_ECN_decapsulate().
Without the fix, IP_ECN_decapsulate() takes outer IP header as
inner IP header, possibly causing error messages or packet drops.
Note that this skb_reset_network_header() call was in this spot when
the original feature for checking consistency of ECN bits through
tunnels was added in eccc1bb8d4 ("tunnel: drop packet if ECN present
with not-ECT"). It was only removed from this spot in 3d7b46cd20
("ip_tunnel: push generic protocol handling to ip_tunnel module.").
Fixes: 3d7b46cd20 ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Ying Cai <ycai@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is no longer used after commit e837735ec4
(ip6_tunnel: ensure to always have a link local address).
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 6936 relaxes the requirement of RFC 2460 that UDP/IPv6 packets which
are received with a zero UDP checksum value must be dropped. RFC 6936
allows zero checksums to support tunnels over UDP.
When sk_no_check is set we allow on a socket we allow a zero IPv6
UDP checksum. This is for both sending zero checksum and accepting
a zero checksum on receive.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-05-02
Please pull this batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"In this round we have a large number of small features and
improvements from people too numerous to list here. The only really
bit thing is Michał and Luca's CSA work (including changing how
interface combination verification is done)."
For the Bluetooth bits, Gustavo says:
"Here goes some patches for the -next release. There is nothing
really special for this pull request, just a bunch of refactors,
fixes and clean ups."
For the ath10k/ath6kl bits, Kalle says:
"For ath6kl Kalle fixed a bunch of checkpatch warnings.
In ath10k we had more changes, major ones being:
* fix memory allocation failures after a firmware crash (Michal)
* some rework of DFS configuration to enable it correctly in all cases
(Michal)
* add a new firmware crash option to make it possible to crash 10.1
firmware for testing purposes (Marek P)
* fix RTS/CTS protection in certain cases (Marek K)
* fix wrong RSSI and rate reporting in some cases (Janusz)
* fix firmware stats reporting (Chun, Ben & Bartosz)"
For the iwlwifi bits, Emmanuel says:
"I have here a bunch of unrelated things. I disabled support for
-7.ucode which means that I can removed a lot of code. Eliad has
a brand new feature: we reduce the Tx power when the link allows -
this reduces our power consumption. The regular changes in power and
scan area. One interesting thing though is the patches from Johannes,
we have now GRO which allows to increase our throughput in TCP Rx. The
main advantage is that it reduces the number of TCP Acks - these TCP
Acks are completely useless when we are using A-MPDU since the first
packet of the A-MPDU generates a TCP Ack which is made obsolete by
the next packets."
Along with that, there are a variety of updates to b43, mwifiex,
rtl8180 and wil6210 drivers and a handful of other updates here
and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.
Includes version bump to 1.0.1.0-k
Passes checkpatch this time, I swear...
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes ordering of rtnl notifications during unregister_netdevice
by moving RTM_DELLINK notification to until after ndo_uninit.
The problem was seen with unregistering bond netdevices.
bond ndo_uninit callback generates a few RTM_NEWLINK notifications for
NETDEV_CHANGEADDR and NETDEV_FEAT_CHANGE. This is seen mostly when the
bond is deleted with slaves still enslaved to the bond.
During unregister netdevice (rollback_registered_many to be specific)
bond ndo_uninit is called after RTM_DELLINK notification goes out.
This results in userspace seeing RTM_DELLINK followed by a couple of
RTM_NEWLINK's.
In userspace problem was seen with libnl. libnl cache deletes the bond
when it sees RTM_DELLINK and re-adds the bond with the following
RTM_NEWLINK. Resulting in a stale bond entry in libnl cache when the kernel
has already deleted the bond.
This patch has been tested for bond, bridges and vlan devices.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-05-01
Please pull the following batch of fixes intended for the 3.15 stream!
For the Bluetooth bits, Gustavo says:
"Some fixes for 3.15. There is a revert for the intel driver, a new
device id, and two important SSP fixes from Johan."
On top of that...
Ben Hutchings gives us a fix for an unbalanced irq enable in an
rtl8192cu error path.
Colin Ian King provides an rtlwifi fix for an uninitialized variable.
Felix Fietkau brings a pair of ath9k fixes, one that corrects a
hardware initialization value and another that removes an (unnecessary)
flag that was being used in a way that led to a software tx queue
hang in ath9k.
Gertjan van Wingerde pushes a MAINTAINERS change to remove himself
from the rt2x00 maintainer team.
Hans de Goede fixes a brcmfmac firmware load hang.
Larry Finger changes rtlwifi to use the correct queue for V0 traffic
on rtl8192se.
Rajkumar Manoharan corrects a race in ath9k driver initialization.
Stanislaw Gruszka fixes an rt2x00 bug in which disabling beaconing
once on USB devices led to permanently disabling beaconing for those
devices.
Tim Harvey provides fixes for a pair of ath9k issues that can lead
to soft lockups in that driver.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bridge can silently drop ipv4 fragments.
If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
br_nf_pre_routing defragments incoming ipv4 fragments
but nfct check in br_nf_dev_queue_xmit does not allow re-fragment combined
packet back, and therefore it is dropped in br_dev_queue_push_xmit without
incrementing of any failcounters
It seems the only way to hit the ip_fragment code in the bridge xmit
path is to have a fragment list whose reassembled fragments go over
the mtu. This only happens if nf_defrag is enabled. Thanks to
Florian Westphal for providing feedback to clarify this.
Defragmentation ipv4 is required not only in conntracks but at least in
TPROXY target and socket match, therefore #ifdef is changed from
NF_CONNTRACK_IPV4 to NF_DEFRAG_IPV4
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The 'local' variable in __ieee80211_vif_copy_chanctx_to_vlans()
is only used/needed when lockdep is compiled in, mark it as such
to avoid compile warnings in the other case.
While at it, fix some indentation where it's used.
Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Defrag user check in ip_expire was not updated after adding support for
"conntrack zones".
This bug manifests as a RFC violation, since the router will send
the icmp time exceeeded message when using conntrack zones.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With new additions planned, this code is getting too big for cfg.c.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Expose a new tdls flag for the public ieee80211_sta struct.
This can be used in some rate control decisions.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ieee80211_reconfig already holds rtnl, so calling
cfg80211_sched_scan_stopped results in deadlock.
Use the rtnl-version of this function instead.
Fixes: d43c6b6 ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add locked-version for cfg80211_sched_scan_stopped.
This is used for some users that might want to
call it when rtnl is already locked.
Fixes: d43c6b6 ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211 is notified about connection failures by
__cfg80211_connect_result() call. However, this
function currently does not free cfg80211 sme.
This results in hanging connection attempts in some cases
e.g. when mac80211 authentication attempt is denied,
we have this function call:
ieee80211_rx_mgmt_auth() -> cfg80211_rx_mlme_mgmt() ->
cfg80211_process_auth() -> cfg80211_sme_rx_auth() ->
__cfg80211_connect_result()
but cfg80211_sme_free() is never get called.
Fixes: ceca7b712 ("cfg80211: separate internal SME implementation")
Cc: stable@vger.kernel.org (3.10+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Filter out incoming multicast packages before applying their bitrate
to the rx bitrate station info field to prevent them from setting the
rx bitrate to the basic multicast rate.
Signed-off-by: Henning Rogge <hrogge@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This contains only some minor misc cleanpus. We can spare us the
extra variable declaration in __skb_get_pay_offset(), the cast in
__get_random_u32() is rather unnecessary and in __sk_migrate_realloc()
we can remove the memcpy() and do a direct assignment of the structs.
Latter was suggested by Fengguang Wu found with coccinelle. Also,
remaining pointer casts of long should be unsigned long instead.
Suggested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code is a bit hard to parse on which registers can be used,
how they are mapped and all play together. It makes much more sense to
define this a bit more clearly so that the code is a bit more intuitive.
This patch cleans this up, and makes naming a bit more consistent among
the code. This also allows for moving some of the defines into the header
file. Clearing of A and X registers in __sk_run_filter() do not get a
particular register name assigned as they have not an 'official' function,
but rather just result from the concrete initial mapping of old BPF
programs. Since for BPF helper functions for BPF_CALL we already use
small letters, so be consistent here as well. No functional changes.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch simplifies label naming for the BPF jump-table.
When we define labels via DL(), we just concatenate/textify
the combination of instruction opcode which consists of the
class, subclass, word size, target register and so on. Each
time we leave BPF_ prefix intact, so that e.g. the preprocessor
generates a label BPF_ALU_BPF_ADD_BPF_X for DL(BPF_ALU, BPF_ADD,
BPF_X) whereas a label name of ALU_ADD_X is much more easy
to grasp. Pure cleanup only.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hhf_change() takes the sch_tree_lock and releases it but misses the
error cases. Fix the missed case here.
To reproduce try a command like this,
# tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
else we may fail to forward skb even if original fragments do fit
outgoing link mtu:
1. remote sends 2k packets in two 1000 byte frags, DF set
2. we want to forward but only see '2k > mtu and DF set'
3. we then send icmp error saying that outgoing link is 1500
But original sender never sent a packet that would not fit
the outgoing link.
Setting local_df makes outgoing path test size vs.
IPCB(skb)->frag_max_size, so we will still send the correct
error in case the largest original size did not fit
outgoing link mtu.
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Suggested-by: Maxime Bizon <mbizon@freebox.fr>
Fixes: 5f2d04f1f9 (ipv4: fix path MTU discovery with connection tracking)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit e114a710aa ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.
This patch does the removal as promised.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.
All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.
The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.
So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.
With TCP Small Queues and FQ/pacing, this issue is more visible.
We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.
For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).
This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.
Note the in_flight parameter can be removed in a followup cleanup
patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This switches a few remaining capable(CAP_NET_ADMIN) to ns_capable so
that root in a user namespace may set tc rules inside that namespace.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit b9f47a3aae (tcp_cubic: limit delayed_ack ratio to prevent
divide error) try to prevent divide error, but there is still a little
chance that delayed_ack can reach zero. In case the param cnt get
negative value, then ratio+cnt would overflow and may happen to be zero.
As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero.
In some old kernels, such as 2.6.32, there is a bug that would
pass negative param, which then ultimately leads to this divide error.
commit 5b35e1e6e9 (tcp: fix tcp_trim_head() to adjust segment count
with skb MSS) fixed the negative param issue. However,
it's safe that we fix the range of delayed_ack as well,
to make sure we do not hit a divide by zero.
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Liu Yu <allanyuliu@tencent.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both TLP and Fast Open call __tcp_retransmit_skb() instead of
tcp_retransmit_skb() to avoid changing tp->retrans_out.
This has the side effect of missing SNMP counters increments as well
as tcp_info tcpi_total_retrans updates.
Fix this by moving the stats increments of into __tcp_retransmit_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1bb8dce57f ("tipc: fix memory
leak during module removal") introduced a memory leak issue: when
name table is stopped, it's forgotten that publication instances are
freed properly. Additionally the useless "continue" statement in
tipc_nametbl_stop() is removed as well.
Reported-by: Jason <huzhijiang@gmail.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces 6 identical code snippets with a call to a new
static inline function.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce copy-past a bit by adding a common helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit 0eba801b64 tried to fix a race
where nat initialisation can happen after ctnetlink-created conntrack
has been created.
However, it causes the nat module(s) to be loaded needlessly on
systems that are not using NAT.
Fortunately, we do not have to create null bindings in that case.
conntracks injected via ctnetlink always have the CONFIRMED bit set,
which prevents addition of the nat extension in nf_nat_ipv4/6_fn().
We only need to make sure that either no nat extension is added
or that we've created both src and dst manips.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nfacct objects already support accounting at the byte and packet
level. As such it is a natural extension to add the possiblity to
define a ceiling limit for both metrics.
All the support for quotas itself is added to nfnetlink acctounting
framework to stay coherent with current accounting object management.
Quota limit checks are implemented in xt_nfacct filter where
statistic collection is already done.
Pablo Neira Ayuso has also contributed to this feature.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These BUG_ON statements should never trigger, but in the unlikely
event that somebody does manage don't stop everything but simply
exit the code path with an error.
Leave the one BUG_ON where changing it would result in a NULL
pointer dereference.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These really can't trigger unless somebody messes up the code,
but don't make debugging it needlessly complicated, WARN and
return instead of BUG_ON().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We don't catch the case if an unsupported protocol is submitted
to the xfrm4 protocol handlers, this can lead to NULL pointer
dereferences. Fix this by adding the appropriate checks.
Fixes: 3328715e ("xfrm4: Add IPsec protocol multiplexer")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
osd_primary_affinity array is indexed into incorrectly when checking
for non-default primary-affinity values. This nullifies the impact of
the rest of the apply_primary_affinity() and results in misdirected
requests.
if (osds[i] != CRUSH_ITEM_NONE &&
osdmap->osd_primary_affinity[i] !=
^^^
CEPH_OSD_DEFAULT_PRIMARY_AFFINITY) {
For a pool with size 2, this always ends up checking osd0 and osd1
primary_affinity values, instead of the values that correspond to the
osds in question. E.g., given a [2,3] up set and a [max,max,0,max]
primary affinity vector, requests are still sent to osd2, because both
osd0 and osd1 happen to have max primary_affinity values and therefore
we return from apply_primary_affinity() early on the premise that all
osds in the given set have max (default) values. Fix it.
Fixes: http://tracker.ceph.com/issues/7954
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Commit a89778d8ba ("tipc: add support
for link state subscriptions") introduced below possible deadlock
scenario:
CPU0 CPU1
T0: tipc_publish() link_timeout()
T1: tipc_nametbl_publish() [grab node lock]*
T2: [grab nametbl write lock]* link_state_event()
T3: named_cluster_distribute() link_activate()
T4: [grab node lock]* tipc_node_link_up()
T5: tipc_nametbl_publish()
T6: [grab nametble write lock]*
The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
the delivery of named messages via link out of name nametbl lock,
the reverse order of holding locks will be eliminated, as a result,
the deadlock will be killed as well.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To properly match iif in ip rules we have to provide
LOOPBACK_IFINDEX in flowi6_iif, not 0. Some ip6mr_fib_lookup
and fib6_rule_lookup callers need such fix.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 78acb1f9b8 ("tipc: add
ioctl to fetch link names") introduced a buffer overflow bug where
specially crafted ioctl requests could cause out-of-bounds indexing
of the node->links array. This was caused by an incorrect check vs
MAX_BEARERS, and the static code checker complaint is:
net/tipc/node.c:459 tipc_node_get_linkname() error: buffer overflow 'node->links' 2 <= 2
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 3de0b59239 ("ethtool: Support for configurable RSS hash
key") introduced a new function ethtool_copy_validate_indir() with
full iteration of the loop to validate the ring indices, which could
be an overkill. To minimize the impact, we ought to exit the loop as
soon as the invalid index occurs for the very first time. The
remaining loop simply doesn't serve any more purpose.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Venkata Duvvuru <VenkatKumar.Duvvuru@Emulex.Com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the new cfg80211 capability to enable mac80211-based drivers
to support for dynamic channel bandwidth changes (e.g., HT 20/40 MHz
changes).
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This extends NL80211_CMD_SET_CHANNEL to allow dynamic channel bandwidth
changes in AP mode (including P2P GO) during a lifetime of the BSS. This
can be used to implement, e.g., HT 20/40 MHz co-existence rules on the
2.4 GHz band.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
P2P_DEVICE doesn't support ieee80211_bss_info_change_notify() for now,
so it's not needed to set changed flags for P2P_DEVICE.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ieee80211_assign_chanctx() checks if local->use_chanctx is true, so
the two code block related to ieee80211_assign_chanctx() can be moved
into above if clause, emphasize that these code are executed only if
local->use_chanctx is true.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
[change subject]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we change the list of action on a given filter, currently we don't
change it to empty. This is a bug, we should allow to change to whatever
users given.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.
In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.
This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't transition to the PF state on every strike after 'Path.Max.Retrans'.
Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6:
Additional (PMR - PFMR) consecutive timeouts on a PF destination
confirm the path failure, upon which the destination transitions to the
Inactive state. As described in [RFC4960], the sender (i) SHOULD notify
ULP about this state transition, and (ii) transmit heartbeats to the
Inactive destination at a lower frequency as described in Section 8.3 of
[RFC4960].
This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state
bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike.
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 813b3b5db8 (ipv4: Use caller's on-stack flowi as-is
in output route lookups.) introduces another regression which
is very similar to the problem of commit e6b45241c (ipv4: reset
flowi parameters on route connect) wants to fix:
Before we call ip_route_output_key() in sctp_v4_get_dst() to
get a dst that matches a bind address as the source address,
we have already called this function previously and the flowi
parameters have been initialized including flowi4_oif, so when
we call this function again, the process in __ip_route_output_key()
will be different because of the setting of flowi4_oif, and we'll
get a networking device which corresponds to the inputted flowi4_oif
as the output device, this is wrong because we'll never hit this
place if the previously returned source address of dst match one
of the bound addresses.
To reproduce this problem, a vlan setting is enough:
# ifconfig eth0 up
# route del default
# vconfig add eth0 2
# vconfig add eth0 3
# ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
# route add default gw 10.0.1.254 dev eth0.2
# ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
# ip rule add from 10.0.0.14 table 4
# ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
# sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
You'll detect that all the flow are routed to eth0.2(10.0.1.254).
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bridge device is created with IFLA_ADDRESS, we are not calling
br_stp_change_bridge_id(), which leads to incorrect local fdb
management and bridge id calculation, and prevents us from receiving
frames on the bridge device.
Reported-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit a8b9b96e95 ("tipc: fix race
in disc create/delete") leads to the following static checker warning:
net/tipc/discover.c:352 tipc_disc_create()
warn: possible memory leak of 'req'
The risk of memory leak really exists in practice. Especially when
it's failed to allocate memory for "req->buf", tipc_disc_create()
doesn't free its allocated memory, instead just directly returns
with ENOMEM error code. In this situation, memory leak, of course,
happens.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not initialize list twice.
list_replace_init() already takes care of initializing list.
We don't need to initialize it with LIST_HEAD() beforehand.
Signed-off-by: xiao jin <jin.xiao@intel.com>
Reviewed-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not initialize net_kill_list twice.
list_replace_init() already takes care of initializing net_kill_list.
We don't need to initialize it with LIST_HEAD() beforehand.
Signed-off-by: xiao jin <jin.xiao@intel.com>
Reviewed-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We add a new ioctl for AF_TIPC that can be used to fetch the
logical name for a link to a remote node on a given bearer. This
should be used in combination with link state subscriptions.
The logical name size limit definitions are moved to tipc.h, as
they are now also needed by the new ioctl.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When links are established over a bearer plane, we create a node
local publication containing information about the peer node and
bearer plane. This allows TIPC applications to use the standard
TIPC topology server subscription mechanism to get notifications
when a link goes up or down.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code checks if the 20MHz bandwidth is allowed for
particular channel -- if it is not, the channel is disabled.
Since we need to use 5/10 MHz channels, this code is modified in
the way that the default bandwidth to check is 5MHz. If the
maximum bandwidth allowed by the channel is smaller than 5MHz,
the channel is disabled. Otherwise the channel is used and the
flags are set according to the bandwidth allowed by the channel.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since there are frequency bands (e.g. 5.9GHz) allowing channels
with only 10 or 5 MHz bandwidth, this patch adds attributes that
allow keeping track about this information.
When channel attributes are reported to user-space, make sure to
not break old tools, i.e. if the 'split wiphy dump' is enabled,
report the extra attributes (if present) describing the bandwidth
restrictions. If the 'split wiphy dump' is not enabled,
completely omit those channels that have flags set to either
IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ.
Add the check for new bandwidth restriction flags in
cfg80211_chandef_usable() to comply with the restrictions.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Return NOTIFY_DONE if we don't care this time's notification, return
NOTIFY_OK if we successfully handled this time's notification. That's
the formal way to do it.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Return NOTIFY_DONE if we don't care this time's notification, return
NOTIFY_OK if we successfully handled this time's notification. That's
the formal way to do it.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Name wiphy_to_rdev is more accurate to describe what the function
does, i.e., return a pointer pointing to struct
cfg80211_registered_device.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Name "dev" is too common and ambiguous, let all the pointer name
pointing to struct cfg80211_registered_device be "rdev". This can
improve code readability and consistency(since other places have
already called it rdev).
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The BUG_ON(!err) can't be triggered in the code path, so remove
it.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
And some code style changes in the function, and correct a typo in
comment.
Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some chips can encrypt managment frames in HW, but
require generated IV in the frame. Add a key flag
that allows us to achieve this.
Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[use BIT(0) to fill that spot, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It doesn't make much sense to store refcount in
the chanctx structure. One still needs to hold
chanctx_mtx to get the value safely. Besides,
refcount isn't on performance critical paths.
This will make implementing chanctx reservation
refcounting a little easier.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channel context refcount is protected by
chanctx_mtx. Accessing the value without holding
the mutex is racy. RCU section didn't guarantee
anything here.
Theoretically ieee80211_channel_switch() could
fail to see refcount change and read "1" instead
of, e.g. "2". This means mac80211 could accept CSA
even though it shouldn't have.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function did a little too much. Split it up so
the code can be easily reused in the future.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function did a little too much. Split it up so
the code can be easily reused in the future.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use a separate function to look for reservation
chanctx. For multi-interface/channel reservation
search sematics differ slightly.
The new routine allows reservations to be merged
with chanctx that are already reserved by other
interface(s).
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows new vifs to be assigned to a chanctx
as long as chanctx's reservation chandefs (if any)
and chanctx's current chandef (implied by assigned
vifs at the time, if any) and the new vif chandef
are all compatible.
This implies it is impossible to assign a new vif
to an in-place reservation chanctx.
This gives no advantages for single-channel
hardware. It makes sense for multi-channel
hardware only.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can be useful. Provides a more straghtforward
way to iterate over interfaces taking part in
chanctx reservation and allows tracking chanctx
usage explicitly.
The structure is protected by local->chanctx_mtx.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can be useful. Provides a more straghtforward
way to iterate over interfaces bound to a given
chanctx and allows tracking chanctx usage
explicitly.
The structure is protected by local->chanctx_mtx.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Initial chanctx reservation code wasn't aware of
radar detection requirements. This is necessary
for chanctx reservations to be used for channel
switching in the future.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Do not allocate more channel contexts than a
driver is capable for currently matching interface
combination.
This allows the ieee80211_vif_reserve_chanctx() to
act as a guard against breaking interface
combinations.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The utility function has no uses yet. It is aimed
at future chanctx reservation management and
channel switching.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The patch splits cfg80211_check_combinations()
into an iterator function and a simple iteration
user.
This makes it possible for drivers to asses how
many channels can use given iftype setup. This in
turn can be used for future
multi-interface/multi-channel channel switching.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
At some locations, channels 149-165 are considered a single
bundle, while at some other locations, e.g., Indonesia, channels
149-161 are considered a single bundle, while channel 165 belongs
to a different bundle. This means that:
1. A station interface connection to an AP on channel 165 allows
the instantiation of a P2P GO on channels 149-165.
2. A station interface connection to an AP on channels 149-161
does NOT allow the instantiation of a P2P GO on channel 165.
Fix this.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we're performing reauthentication (in order to elevate the
security level from an unauthenticated key to an authenticated one) we
do not need to issue any encryption command once authentication
completes. Since the trigger for the encryption HCI command is the
ENCRYPT_PEND flag this flag should not be set in this scenario.
Instead, the REAUTH_PEND flag takes care of all necessary steps for
reauthentication.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Commit 1c2e004183 introduced an event handler for the encryption key
refresh complete event with the intent of fixing some LE/SMP cases.
However, this event is shared with BR/EDR and there we actually want to
act only on the auth_complete event (which comes after the key refresh).
If we do not do this we may trigger an L2CAP Connect Request too early
and cause the remote side to return a security block error.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
When the ipv6 fib changes during a table dump, the walk is
restarted and the number of nodes dumped are skipped. But the existing
code doesn't advance to the next node after a node is skipped. This can
cause the dump to loop or produce lots of duplicates when the fib
is modified during the dump.
This change advances the walk to the next node if the current node is
skipped after a restart.
Signed-off-by: Kumar Sundararajan <kumar@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated.
The vxlan socket is openned into the i/o netns, ie into the netns where
encapsulated packets are received. The socket lookup is done into this netns to
find the corresponding vxlan tunnel. After decapsulation, the packet is
injecting into the corresponding interface which may stand to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Configuration example:
ip netns add netns1
ip netns exec netns1 ip link set lo up
ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
ip link set vxlan10 netns netns1
ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
ip netns exec netns1 ip link set vxlan10 up
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 115c9b8192 (rtnetlink: Fix problem with
buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
attribute if they were solicited by a GETLINK message containing an
IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.
That was done because some user programs broke when they received more data
than expected - because IFLA_VFINFO_LIST contains information for each VF
it can become large if there are many VFs.
However, the IFLA_VF_PORTS attribute, supplied for devices which implement
ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
It supplies per-VF information and can therefore become large, but it is
not currently conditional on the IFLA_EXT_MASK value.
Worse, it interacts badly with the existing EXT_MASK handling. When
IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then
rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
netlink_dump() will misinterpret this as having finished the listing and
omit data for this interface and all subsequent ones. That can cause
getifaddrs(3) to enter an infinite loop.
This patch addresses the problem by only supplying IFLA_VF_PORTS when
IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.
If it doesn't, however, things will go badly wrong, When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones. This can cause getifaddrs(3) to
enter an infinite loop.
This patch won't fix the problem, but it will WARN_ON() making it easier to
track down what's going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’:
net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable]
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.
To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_net_capable - The common case use, for operations that are safe on a network namespace
netlink_capable - For operations that are only known to be safe for the global root
netlink_ns_capable - The general case of capable used to handle special cases
__netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
the skbuff of a netlink message.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
from it's sources it is not clear why it is wrong. Move the computation
into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.
This does not yet correct the capability check but instead simply moves it to make
it clear what is going on.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_capable is a static internal function in af_netlink.c and we
have better uses for the name netlink_capable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/intel/igb/e1000_mac.c
net/core/filter.c
Both conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The HCISETRAW ioctl command is not really useful. To utilize raw and
direct access to the HCI controller, the HCI User Channel feature has
been introduced. Return EOPNOTSUPP to indicate missing support for
this command.
For legacy reasons hcidump used to use HCISETRAW for permission check
to return proper error codes to users. To keep backwards compability
return EPERM in case the caller does not have CAP_NET_ADMIN capability.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
NFT_META_BRI_IIFNAME to get packet input bridge interface name
NFT_META_BRI_OIFNAME to get packet output bridge interface name
Such meta key are accessible only through NFPROTO_BRIDGE family, on a
dedicated nft meta module: nft_meta_bridge.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
exisiting BPF verifier allows uninitialized access to registers,
'ret A' is considered to be a valid filter.
So initialize A and X to zero to prevent leaking kernel memory
In the future BPF verifier will be rejecting such filters
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be useful to create network family dedicated META expression
as for NFPROTO_BRIDGE for instance.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
To ensure family tight expression gets selected in priority to family
agnostic ones.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.
This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Call the per-protocol unbind function rather than bind function on
NETLINK_DROP_MEMBERSHIP in netlink_setsockopt().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>