Commit Graph

2284 Commits

Author SHA1 Message Date
Phil Oester 1205e1fa61 netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet
In commit b396966c4 (netfilter: xt_TCPMSS: Fix missing fragmentation handling),
I attempted to add safe fragment handling to xt_TCPMSS.  However, Andy Padavan
of Project N56U correctly points out that returning XT_CONTINUE in this
function does not work.  The callers (tcpmss_tg[46]) expect to receive a value
of 0 in order to return XT_CONTINUE.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-04 14:20:03 +02:00
Patrick McHardy f4de4c89d8 netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length()
With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init:

[   80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]()

The reason is that the conntrack template is set to confirmed before adding
the extension and it is invalid to add extensions to already confirmed
conntracks. Fix by adding the extensions before setting the conntrack to
confirmed.

Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-04 11:43:36 +02:00
Florian Westphal b7e092c05b netfilter: ctnetlink: fix uninitialized variable
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect':
'helper' may be used uninitialized in this function

It was only initialized in if CTA_EXPECT_HELP_NAME attribute was
present, it must be NULL otherwise.

Problem added recently in bd077937
(netfilter: nfnetlink_queue: allow to attach expectations to conntracks).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:28:19 +02:00
Patrick McHardy 48b1de4c11 netfilter: add SYNPROXY core/target
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to now break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:27:54 +02:00
Patrick McHardy 41d73ec053 netfilter: nf_conntrack: make sequence number adjustments usuable without NAT
Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a seperate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:26:48 +02:00
David S. Miller 89d5e23210 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_conntrack_proto_tcp.c

The conflict had to do with overlapping changes dealing with
fixing the use of an "s32" to hold the value returned by
NAT_OFFSET().

Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for your net-next tree.
More specifically, they are:

* Trivial typo fix in xt_addrtype, from Phil Oester.

* Remove net_ratelimit in the conntrack logging for consistency with other
  logging subsystem, from Patrick McHardy.

* Remove unneeded includes from the recently added xt_connlabel support, from
  Florian Westphal.

* Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
  this, from Florian Westphal.

* Remove tproxy core, now that we have socket early demux, from Florian
  Westphal.

* A couple of patches to refactor conntrack event reporting to save a good
  bunch of lines, from Florian Westphal.

* Fix missing locking in NAT sequence adjustment, it did not manifested in
  any known bug so far, from Patrick McHardy.

* Change sequence number adjustment variable to 32 bits, to delay the
  possible early overflow in long standing connections, also from Patrick.

* Comestic cleanups for IPVS, from Dragos Foianu.

* Fix possible null dereference in IPVS in the SH scheduler, from Daniel
  Borkmann.

* Allow to attach conntrack expectations via nfqueue. Before this patch, you
  had to use ctnetlink instead, thus, we save the conntrack lookup.

* Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:30:54 -07:00
David S. Miller 2ff1cf12c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
Pablo Neira Ayuso bd07793705 netfilter: nfnetlink_queue: allow to attach expectations to conntracks
This patch adds the capability to attach expectations via nfnetlink_queue.
This is required by conntrack helpers that trigger expectations based on
the first packet seen like the TFTP and the DHCPv6 user-space helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13 16:32:10 +02:00
Pablo Neira Ayuso 0ef71ee1a5 netfilter: ctnetlink: refactor ctnetlink_create_expect
This patch refactors ctnetlink_create_expect by spliting it in two
chunks. As a result, we have a new function ctnetlink_alloc_expect
to allocate and to setup the expectation from ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13 15:48:20 +02:00
Yuchung Cheng 356d7d88e0 netfilter: nf_conntrack: fix tcp_in_window for Fast Open
Currently the conntrack checks if the ending sequence of a packet
falls within the observed receive window. However it does so even
if it has not observe any packet from the remote yet and uses an
uninitialized receive window (td_maxwin).

If a connection uses Fast Open to send a SYN-data packet which is
dropped afterward in the network. The subsequent SYNs retransmits
will all fail this check and be discarded, leading to a connection
timeout. This is because the SYN retransmit does not contain data
payload so

end == initial sequence number (isn) + 1
sender->td_end == isn + syn_data_len
receiver->td_maxwin == 0

The fix is to only apply this check after td_maxwin is initialized.

Reported-by: Michael Chan <mcfchan@stanford.edu>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10 18:36:22 +02:00
Florian Westphal c655bc6896 netfilter: nf_conntrack: don't send destroy events from iterator
Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09 12:03:33 +02:00
Daniel Borkmann 54e35cc523 ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL
skb_header_pointer could return NULL, so check for it as we do it
everywhere else in ipvs code. This fixes a coverity warning.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-07 08:57:57 +09:00
Dragos Foianu 70e3ca79cd ipvs: fixed spacing at for statements
found using checkpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-06 15:09:58 +09:00
Dan Carpenter e4d091d7bf netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message
These structs have a "_pad" member.  Also the "phw" structs have an 8
byte "hw_addr[]" array but sometimes only the first 6 bytes are
initialized.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 17:36:04 +02:00
Florian Westphal d8b3bfc253 netfilter: tproxy: fix build with IP6_NF_IPTABLES=n
after commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h)

CONFIG_IPV6=y
CONFIG_IP6_NF_IPTABLES=n

gives us:

net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6':
net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 12:57:38 +02:00
David S. Miller 0e76a3a587 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 21:36:46 -07:00
Pablo Neira Ayuso a206bcb3b0 netfilter: xt_TCPOPTSTRIP: fix possible off by one access
Fix a possible off by one access since optlen()
touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1.

This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen
that stores the TCP header length, to save some cycles.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01 11:45:15 +02:00
Pablo Neira Ayuso 71ffe9c77d netfilter: xt_TCPMSS: fix handling of malformed TCP header and options
Make sure the packet has enough room for the TCP header and
that it is not malformed.

While at it, store tcph->doff*4 in a variable, as it is used
several times.

This patch also fixes a possible off by one in case of malformed
TCP options.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01 11:42:53 +02:00
Patrick McHardy 12e7ada385 netfilter: nf_nat: use per-conntrack locking for sequence number adjustments
Get rid of the global lock and use per-conntrack locks for protecting the
sequencen number adjustment data. Additionally saves one lock/unlock
operation for every TCP packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 19:54:59 +02:00
Patrick McHardy 2d89c68ac7 netfilter: nf_nat: change sequence number adjustments to 32 bits
Using 16 bits is too small, when many adjustments happen the offsets might
overflow and break the connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 19:54:51 +02:00
Patrick McHardy 0658cdc8f3 netfilter: nf_nat: fix locking in nf_nat_seq_adjust()
nf_nat_seq_adjust() needs to grab nf_nat_seqofs_lock to protect against
concurrent changes to the sequence adjustment data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 19:54:24 +02:00
Florian Westphal 02982c27ba netfilter: nf_conntrack: remove duplicate code in ctnetlink
ctnetlink contains copy-paste code from death_by_timeout.  In order to
avoid changing both places in upcoming event delivery patch,
export death_by_timeout functionality and use it in the ctnetlink code.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 18:51:23 +02:00
Florian Westphal 93742cf8af netfilter: tproxy: remove nf_tproxy_core.h
We've removed nf_tproxy_core.ko, so also remove its header.
The lookup helpers are split and then moved to tproxy target/socket match.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 18:43:45 +02:00
Florian Westphal fd158d79d3 netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb
The module was "permanent", due to the special tproxy skb->destructor.
Nowadays we have tcp early demux and its sock_edemux destructor in
networking core which can be used instead.

Thanks to early demux changes the input path now also handles
"skb->sk is tw socket" correctly, so this no longer needs the special
handling introduced with commit d503b30bd6
(netfilter: tproxy: do not assign timewait sockets to skb->sk).

Thus:
- move assign_sock function to where its needed
- don't prevent timewait sockets from being assigned to the skb
- remove nf_tproxy_core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:39:40 +02:00
Florian Westphal 957bec3685 netfilter: nf_queue: relax NFQA_CT attribute check
Allow modifying attributes of the conntrack associated with a packet
without first requesting ct data via CFG_F_CONNTRACK or extra
nfnetlink_conntrack socket.

Also remove unneded rcu_read_lock; the entire function is already
protected by rcu.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:39:29 +02:00
Florian Westphal 5813a8eb47 netfilter: connlabels: remove unneeded includes
leftovers from the (never merged) v1 patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:39:18 +02:00
Patrick McHardy 312a0c16c1 netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:37:38 +02:00
Phil Oester 5774c94ace netfilter: xt_addrtype: fix trivial typo
Fix typo in error message.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31 16:36:25 +02:00
Joe Stringer 024ec3deac net/sctp: Refactor SCTP skb checksum computation
This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 20:07:15 -07:00
Eric Dumazet baf60efa58 netfilter: xt_socket: fix broken v0 support
commit 681f130f39 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD
flag") added a potential NULL dereference if an old iptables package
uses v0 of the match.

Fix this by removing the test on @info in fast path.

IPv6 can remove the test as well, as it uses v1 or v2.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:15:21 +02:00
Pablo Neira Ayuso f09eca8db0 netfilter: ctnetlink: fix incorrect NAT expectation dumping
nf_ct_expect_alloc leaves unset the expectation NAT fields. However,
ctnetlink_exp_dump_expect expects them to be zeroed in case they are
not used, which may not be the case. This results in dumping the NAT
tuple of the expectation when it should not.

Fix it by zeroing the NAT fields of the expectation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:14:51 +02:00
David S. Miller 0c1072ae02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/renesas/sh_eth.c
	net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:55:13 -07:00
Florian Westphal 496e4ae7dc netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flag
The common case is that TCP/IP checksums have already been
verified, e.g. by hardware (rx checksum offload), or conntrack.

Userspace can use this flag to determine when the checksum
has not been validated yet.

If the flag is set, this doesn't necessarily mean that the packet has
an invalid checksum, e.g. if NIC doesn't support rx checksum.

Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can
infer that IP/TCP checksum has already been validated if either the
SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED
flag is unset.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-30 18:15:48 +02:00
Julian Anastasov 4d0c875dcc ipvs: add sync_persist_mode flag
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Alexander Frolkin eba3b5a787 ipvs: SH fallback and L4 hashing
By default the SH scheduler rejects connections that are hashed onto a
realserver of weight 0.  This patch adds a flag to make SH choose a
different realserver in this case, instead of rejecting the connection.

The patch also adds a flag to make SH include the source port (TCP, UDP,
SCTP) in the hash as well as the source address.  This basically allows
for deterministic round-robin load balancing (i.e., where any director
in a cluster of directors with identical config will send the same
packet the same way).

The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
can be set per service.  They are set using a new option to ipvsadm.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov acaac5d8bb ipvs: drop SCTP connections depending on state
Drop SCTP connections under load (dropentry context) depending
on the protocol state, just like for TCP: INIT conns are
dropped immediately, established are dropped randomly while
connections in progress or shutdown are skipped.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov 61e7c420b4 ipvs: replace the SCTP state machine
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.

With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.

The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a sequence of the TCP implementation where the longest
state name is ESTABLISHED.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Alexander Frolkin c6c96c1883 ipvs: sloppy TCP and SCTP
This adds support for sloppy TCP and SCTP modes to IPVS.

When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).

This allows connections to fail over from one IPVS director to another
mid-flight.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov bba54de5bd ipvs: provide iph to schedulers
Before now the schedulers needed access only to IP
addresses and it was easy to get them from skb by
using ip_vs_fill_iph_addr_only.

New changes for the SH scheduler will need the protocol
and ports which is difficult to get from skb for the
IPv6 case. As we have all the data in the iph structure,
to avoid the same slow lookups provide the iph to schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:45 +09:00
Florian Westphal 797a7d66d2 netfilter: ctnetlink: send event when conntrack label was modified
commit 0ceabd8387
(netfilter: ctnetlink: deliver labels to userspace) sets the event bit
when we raced with another packet, instead of raising the event bit
when the label bit is set for the first time.

commit 9b21f6a909
(netfilter: ctnetlink: allow userspace to modify labels) forgot to update
the event mask in the "conntrack already exists" case.

Both issues result in CTA_LABELS attribute not getting included in the
conntrack event.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-24 11:32:56 +02:00
Balazs Peter Odor 5aed93875c netfilter: nf_nat_sip: fix mangling
In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets)
there were some missing brackets around the logging information, thus
always returning drop.

Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061

Signed-off-by: Balazs Peter Odor <balazs@obiserver.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-24 11:32:40 +02:00
Eric Dumazet 681f130f39 netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag
xt_socket module can be a nice replacement to conntrack module
in some cases (SYN filtering for example)

But it lacks the ability to match the 3rd packet of TCP
handshake (ACK coming from the client).

Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.

The wildcard is the legacy socket match behavior, that ignores
LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent)

iptables -I INPUT -p tcp --syn -j SYN_CHAIN
iptables -I INPUT -m socket --nowildcard -j ACCEPT

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 20:28:49 +02:00
Florian Westphal 6547a22187 netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
When loose tracking is enabled (default), non-syn packets cause
creation of new conntracks in established state with default timeout for
established state (5 days).  This causes the table to fill up with UNREPLIED
when the 'new ack' packet happened to be the last-ack of a previous,
already timed-out connection.

Consider:

A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123
<61 second pause>
C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255

B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout,
C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout.

Use UNACK timeout (5 minutes) instead to get rid of these entries sooner
when in ESTABLISHED state without having seen traffic in both directions.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 11:20:13 +02:00
Daniel Borkmann 130ffbc263 netfilter: check return code from nla_parse_tested
These are the only calls under net/ that do not check nla_parse_nested()
for its error code, but simply continue execution. If parsing of netlink
attributes fails, we should return with an error instead of continuing.
In nearly all of these calls we have a policy attached, that is being
type verified during nla_parse_nested(), which we would miss checking
for otherwise.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 11:20:13 +02:00
David S. Miller d98cae64e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 16:49:39 -07:00
Julian Anastasov 06f3d7f973 ipvs: SCTP ports should be writable in ICMP packets
Make sure that SCTP ports are writable when embedded in ICMP
from client, so that ip_vs_nat_icmp can translate them safely.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-19 09:53:52 +09:00
Joe Perches fe2c6338fd net: Convert uses of typedef ctl_table to struct ctl_table
Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 02:36:09 -07:00
Phil Oester b396966c46 netfilter: xt_TCPMSS: Fix missing fragmentation handling
Similar to commit bc6bcb59 ("netfilter: xt_TCPOPTSTRIP: fix
possible mangling beyond packet boundary"), add safe fragment
handling to xt_TCPMSS.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-12 11:06:19 +02:00
Phil Oester 70d19f805f netfilter: xt_TCPMSS: Fix IPv6 default MSS too
As a followup to commit 409b545a ("netfilter: xt_TCPMSS: Fix violation
of RFC879 in absence of MSS option"), John Heffner points out that IPv6
has a higher MTU than IPv4, and thus a higher minimum MSS. Update TCPMSS
target to account for this, and update RFC comment.

While at it, point to more recent reference RFC1122 instead of RFC879.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-12 11:04:41 +02:00
Eric Dumazet 45203a3b38 net_sched: add 64bit rate estimators
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.

Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.

This structure is dumped to user space as a new attribute :

TCA_STATS_RATE_EST64

Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.

Old tc command output, after patch :

eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
 Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
 rate 34360Mbit 189696pps backlog 0b 0p requeues 0

This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:51:03 -07:00