ip6gre_tunnel_xmit() is leaking the skb when we hit this error branch,
and the -1 return value from this function is bogus. Use the error
handling we already have in place in ip6gre_tunnel_xmit() for this error
case to fix this.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling icmpv6_send() on a local message size error leads to an
incorrect update of the path mtu in the case when IPsec is used.
So use ipv6_local_error() instead to notify the socket about the
error.
Reported-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
drivers/net/ethernet/intel/e1000e/ethtool.c
drivers/net/vmxnet3/vmxnet3_drv.c
drivers/net/wireless/iwlwifi/dvm/tx.c
net/ipv6/route.c
The ipv6 route.c conflict is simple, just ignore the 'net' side change
as we fixed the same problem in 'net-next' by eliminating cached
neighbours from ipv6 routes.
The e1000e conflict is an addition of a new statistic in the ethtool
code, trivial.
The vmxnet3 conflict is about one change in 'net' removing a guarding
conditional, whilst in 'net-next' we had a netdev_info() conversion.
The iwlwifi conflict is dealing with a WARN_ON() conversion in
'net-next' vs. a revert happening in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
With the loop, don't check 'rv' twice in a row. Without the loop, 'rv'
doesn't even need to be checked.
Make the comment more grammar-friendly.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in
tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can
drop SYNs for various reasons which are not currently tracked.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6
ancillary data. Since ip6_datagram_send_ctl is already publicly exported for
use in modules, ip6_datagram_recv_ctl should also be available to support
ancillary data in the receive path.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all users are write-lock, it does not make sense to use
rwlock here. Use simple spinlock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.
Note that IPv4 already handles it this way. No neighbor entries are
created for local input.
Tested by myself and customer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "Universal/Local" (U/L) bit must be complmented according to RFC4944
and RFC2464.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds anti-spoofing checks in sit.c as specified in RFC3964
section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the
checks which could easily be implemented with netfilter.
Specifically this patch adds following logic (based loosely on the
pseudocode in RFC3964 section 5.2):
if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
and outer_src_v4 != embedded_ipv4 (inner_src_v6)
drop
if prefix (inner_dst_v6) == rd6_prefix (or 2002::/16 is the default)
and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
drop
accept
To accomplish the specified security checks proposed by above RFCs,
it is still necessary to employ uRPF filters with netfilter. These new
checks only kick in if the employed addresses are within the 2002::/16 or
another range specified by the 6rd-prefix (which defaults to 2002::/16).
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When attempting to build linux-next with user namespaces enabled I ran
into this fun build error.
CC net/ipv6/inet6_connection_sock.o
.../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
.../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using
type ‘kuid_t’
.../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
.../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
make[2]: *** [net/ipv6] Error 2
make[2]: *** Waiting for unfinished jobs....
Using kuid_t instead of int to hold the uid fixes this.
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some usecase when lifetime of ipv4 addresses might be helpful.
For example:
1) initramfs networkmanager uses a DHCP daemon to learn network
configuration parameters
2) initramfs networkmanager addresses, routes and DNS configuration
3) initramfs networkmanager is requested to stop
4) initramfs networkmanager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If
the system doesn't start networkmanager for some reason, addresses and
routes will be used forever, which violates RFC 2131.
This patch is essentially a backport of ivp6 address lifetime mechanism
for ipv4 addresses.
Current "ip" tool supports this without any patch (since it does not
distinguish between ipv4 and ipv6 addresses in this perspective.
Also, this should be back-compatible with all current netlink users.
Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock. However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.
Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is primarily a preparation to ease the extension of memory
limit tracking.
The change does reduce the number atomic operation, during freeing of
a frag queue. This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin Shelar mentioned that GSO could potentially generate
wrong TX checksum if skb has fragments that are overwritten
by the user between the checksum computation and transmit.
He suggested to linearize skbs but this extra copy can be
avoided for normal tcp skbs cooked by tcp_sendmsg().
This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
in skb_shinfo(skb)->gso_type if at least one frag can be
modified by the user.
Typical sources of such possible overwrites are {vm}splice(),
sendfile(), and macvtap/tun/virtio_net drivers.
Tested:
$ netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
7.7.8.84 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 3959.52
$ netperf -H 7.7.8.84 -t TCP_SENDFILE
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 3216.80
Performance of the SENDFILE is impacted by the extra allocation and
copy, and because we use order-0 pages, while the TCP_STREAM uses
bigger pages.
Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE
identifiers" but we need to do it for IPv6 as well. On IPv6 the name
is "pim6reg" instead of "pimreg" so there is one less digit allowed.
The strcpy() is in ip6mr_reg_vif().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This batch contains netfilter updates for you net-next tree, they are:
* The new connlabel extension for x_tables, that allows us to attach
labels to each conntrack flow. The kernel implementation uses a
bitmask and there's a file in user-space that maps the bits with the
corresponding string for each existing label. By now, you can attach
up to 128 overlapping labels. From Florian Westphal.
* A new round of improvements for the netns support for conntrack.
Gao feng has moved many of the initialization code of each module
of the netns init path. He also made several code refactoring, that
code looks cleaner to me now.
* Added documentation for all possible tweaks for nf_conntrack via
sysctl, from Jiri Pirko.
* Cisco 7941/7945 IP phone support for our SIP conntrack helper,
from Kevin Cernekee.
* Missing header file in the snmp helper, from Stephen Hemminger.
* Finally, a couple of fixes to resolve minor issues with these
changes, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Motivation for soreuseport would be something like a DNS server. An
alternative would be to recv on the same socket from multiple threads.
As in the case of TCP, the load across these threads tends to be
disproportionate and we also see a lot of contection on the socket lock.
Note that SO_REUSEADDR already allows multiple UDP sockets to bind to
the same port, however there is no provision to prevent hijacking and
nothing to distribute packets across all the sockets sharing the same
bound port. This patch does not change the semantics of SO_REUSEADDR,
but provides usable functionality of it for unicast.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Motivation for soreuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket. This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads. In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets. We have seen the disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest. With so_reusport the distribution is
uniform.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the code that register/unregister l4proto to the
module_init/exit context.
Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:
nf_ct_l4proto_register
nf_ct_l4proto_pernet_register
nf_ct_l4proto_unregister
nf_ct_l4proto_pernet_unregister
We same many line breaks with it.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the code that register/unregister l3proto to the
module_init/exit context.
Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:
nf_ct_l3proto_register
nf_ct_l3proto_pernet_register
nf_ct_l3proto_unregister
nf_ct_l3proto_pernet_unregister
We same many line breaks with it.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is declared in:
include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
and net/ip6_route.h is already included.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) The transport header did not point to the right place after
esp/ah processing on tunnel mode in the receive path. As a
result, the ECN field of the inner header was not set correctly,
fixes from Li RongQing.
2) We did a null check too late in one of the xfrm_replay advance
functions. This can lead to a division by zero, fix from
Nickolai Zeldovich.
3) The size calculation of the hash table missed the muiltplication
with the actual struct size when the hash table is freed.
We might call the wrong free function, fix from Michal Kubecek.
4) On IPsec pmtu events we can't access the transport headers of
the original packet, so force a relookup for all routes
to notify about the pmtu event.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
introduce a bug to try to update "updated" time in neighbour
structure.
Update the "updated" time only if neighbour is available.
Bug was found by Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add the support of proxy multicast, ie being able to build a static
multicast tree. It adds the support of (*,*) and (*,G) entries.
The user should define an (*,*) entry which is not used for real forwarding.
This entry defines the upstream in iif and contains all interfaces from the
static tree in its oifs. It will be used to forward packet upstream when they
come from an interface belonging to the static tree.
Hence, the user should define (*,G) entries to build its static tree. Note that
upstream interface must be part of oifs: packets are sent to all oifs
interfaces except the input interface. This ensures to always join the whole
static tree, even if the packet is not coming from the upstream interface.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Construct NS/NA/RS message directly using C99 compound literals.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_transport_header() (thus icmp6_hdr()) is available here,
use it.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build ICMPv6 message first and make buffer management easier;
we can use skb->len when filling checksum in ICMPv6 header,
and then build IP header with length field.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- move ip6_nd_hdr() to its users' source files.
In net/ipv6/mcast.c, it will be called ip6_mc_hdr().
- make return type to void since this function never fails.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested by Eric Dumazet <edumazet@google.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option()
use ndisc_opt_addr_space().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pointer to struct net_device (dev) and remove
data_len (= dev->addr_len) and addr_type (= dev->type).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is already checked by the caller (tunnel64_rcv) and brings ipip6_rcv
in line with ipip_rcv.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check message length before accessing "target" field,
as we do for other types.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of rt->n removal, we do not need neigh argument any more.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pmtu and redirect events are now handled in the protocols error handler,
so add an error handler for icmp6 to do this. It is needed in the case
when we have no socket context. Based on a patch by Duan Jiong.
Reported-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh->nud_state and neigh->updated are under protection of
neigh->lock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.
rt->rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt->dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.
This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.
Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
Documentation/networking/ip-sysctl.txt
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
Both conflicts were simply overlapping context.
A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user is cxgb3 driver.
old_neigh is used to check device change, but it must not happen
on redirect. In this sense, we can remove old_neigh argument.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Router Alert option is very small and we can store the value
itself in the skb.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced
ipv6_tclass(), but similar function is already available as
ipv6_get_dsfield().
We might be able to call ipv6_tclass() from ipv6_get_dsfield(),
but it is confusing to have two versions.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
Replace ip6_route_lookup() with addrconf_get_prefix_route() when
looking up for a prefix route. This ensures that the connected prefix
is looked up in the main table, and avoids the selection of other
matching routes located in different tables as well as blackhole
or prohibited entries.
In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6:
del unreachable route when an addr is deleted on lo), that would occur
when a blackhole or prohibited entry is selected by ip6_route_lookup().
Such entries have a NULL rt6i_table argument, which is accessed by
__ip6_del_rt() when trying to lock rt6i_table->tb6_lock.
The function addrconf_is_prefix_route() is not used anymore and is
removed.
[v2] Minor indentation cleanup and log updates.
Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tests on the flags in addrconf_get_prefix_route() does no make
much sense: the 'noflags' parameter contains the set of flags that
must not match with the route flags, so the test must be done
against 'noflags', and not against 'flags'.
Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ipv6_recv_error(), addr_offset points to daddr field of the ip header.
To get ipv6 header, use container_of() macro instead of substracting magic
number (24).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by David, udp6_csum_init() is too big to be inlined,
move it to ipv6 static library, net/ipv6/ip6_checksum.c.
And the generic csum_ipv6_magic() too.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPsec tunnel does not set ECN field to CE in inner header when
the ECN field in the outer header is CE, and the ECN field in
the inner header is ECT(0) or ECT(1).
The cause is ipip6_hdr() does not return the correct address of
inner header since skb->transport-header is not the inner header
after esp6_input_done2(), or ah6_input().
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware. The reason behind this patch is to ease the testing
of ecn problems on the internet and allows applications to tune their
own use of ecn.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the size of skb allocated for NDISC is MAX_HEADER +
LL_RESERVED_SPACE(dev) + packet length + dev->needed_tailroom,
but only LL_RESERVED_SPACE(dev) bytes is "reserved" for headers.
As a result, the skb looks like this (after construction of the
message):
head data tail end
+--------------------------------------------------------------+
+ | | | |
+--------------------------------------------------------------+
|<-hlen---->|<---ipv6 packet------>|<--tlen-->|<--MAX_HEADER-->|
=LL_ = dev
RESERVED_ ->needed_
SPACE(dev) tailroom
As the name implies, "MAX_HEADER" is used for headers, and should
be "reserved" in prior to packet construction. Or, if some space
is really required at the tail of ther skb, it should be
explicitly documented.
We have several option after construction of NDISC message:
Option 1:
head data tail end
+---------------------------------------------+
+ | | |
+---------------------------------------------+
|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
=LL_ = dev
RESERVED_ ->needed_
SPACE(dev) tailroom
Option 2:
head data tail end
+--------------------------------------------------+
+ | | |
+--------------------------------------------------+
|<--MAX_HEADER-->|<---ipv6 packet------>|<--tlen-->|
= dev
->needed_
tailroom
Option 3:
head data tail end
+--------------------------------------------------------------+
+ | | | |
+--------------------------------------------------------------+
|<--MAX_HEADER-->|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
=LL_ = dev
RESERVED_ ->needed_
SPACE(dev) tailroom
Our tunnel drivers try expanding headroom and the space for tunnel
encapsulation was not a mandatory space -- so we are not seeing
bugs here --, but just for optimization for performance critial
situations.
Since NDISC messages are not performance critical unlike TCP,
and as we know outgoing device, LL_RESERVED_SPACE(dev) should be
just enough for the device in most (if not all) cases:
LL_RESERVED_SPACE(dev) <= LL_MAX_HEADER <= MAX_HEADER
Note that LL_RESERVED_SPACE(dev) is also enough for NDISC over
SIT (e.g., ISATAP).
So, I think Option 1 is just fine here.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
csum16_add() has a broken carry detection, should be:
sum += sum < (__force u16)b;
Instead of fixing csum16_add, remove the custom checksum
functions and use the generic csum_add/csum_sub ones.
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
The following batch contains Netfilter fixes for 3.8-rc1. They are
a mixture of old bugs that have passed unnoticed (I'll pass these to
stable) and more fresh ones from the previous merge window, they are:
* Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd
showing up wrong address, from Bob Hockney.
* Fix a comment in nf_conntrack_ipv6, from Florent Fourcot.
* Fix a leak an error path in ctnetlink while creating an expectation,
from Jesper Juhl.
* Fix missing ICMP time exceeded in the IPv6 defragmentation code, from
Haibo Xi.
* Fix inconsistent handling of routing changes in MASQUERADE for the
new connections case, from Andrew Collins.
* Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to
crashes in the ixgbe driver (since it seems to access the transport
header with TSO enabled), from Mukund Jampala.
* Recover obsoleted NOTRACK target by including it into the CT and spot
a warning via printk about being obsoleted. Many people don't check the
scheduled to be removal file under Documentation, so we follow some
less agressive approach to kill this in a year or so. Spotted by Florian
Westphal, patch from myself.
* Fix race condition in xt_hashlimit that allows to create two or more
entries, from myself.
* Fix crash if the CT is used due to the recently added facilities to
consult the dying and unconfirmed conntrack lists, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6gre_xmit2() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
Set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)
Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
the value of err is always negative if it goes to errout, so we don't need to
check the value of err.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b836c99fd6 (ipv6: unify conntrack reassembly expire
code with standard one) use the standard IPv6 reassembly
code(ip6_expire_frag_queue) to handle conntrack reassembly expire.
In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get
which device received this expired packet.so we must save ifindex
when NF_conntrack get this packet.
With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 1/2 fragmented
IPv6 packet.
Signed-off-by: Haibo Xi <haibbo@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Remove ambiguity of double negation.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE
target), the MASQUERADE target handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections. It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.
This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.
Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The following commit breaks IPv6 TCP transmission for me:
Commit 75fe83c322
Author: Vlad Yasevich <vyasevic@redhat.com>
Date: Fri Nov 16 09:41:21 2012 +0000
ipv6: Preserve ipv6 functionality needed by NET
This patch fixes the typo "ipv6_offload" which should be
"ipv6-offload".
I don't know why not including the offload modules should
break TCP. Disabling all offload options on the NIC didn't
help. Outgoing pulseaudio traffic kept stalling.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>