In this case, the SCTP association transmits an ASCONF packet
including addition of the new IP address and deletion of the old
address. This patch implements this functionality.
In this case, the ASCONF chunk is added to the beginning of the
queue, because the other chunks cannot be transmitted in this state.
Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the peer restart the asoc, we should not only fail any unsent/unacked
data, but also stop the T3-rtx, SACK, T4-rto timers, and teardown ASCONF
queues.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an ASCONF chunk is outstanding, then the following ASCONF
chunk will be queued for later transmission. But when we free
the asoc, we forget to free the ASCONF queue at the same time,
this will cause memory leak.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we can not update retran path to unconfirmed transports,
when we remove a peer, the retran path may not be update if the
other transports are all unconfirmed, and we will still using
the removed transport as the retran path. This may cause panic
if retrasnmit happen.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit fbdf501c93
sctp: Do no select unconfirmed transports for retransmissions
Introduced the initial falt.
commit d598b166ce
sctp: Make sure we always return valid retransmit path
Solved the problem, but forgot to change the DEBUG statement.
Thus it was still possible to dereference a NULL pointer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
use do { print } while (0) guards.
Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
lines were continued.
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add a missing newline in "Failed bind hash alloc"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.
It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.
Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rwnd_press tracks the pressure on the recieve window. Every
timer the receive buffer overlows, we truncate the receive
window and then grow it back. However, if we don't track
the cumulative presser, it's possible to reach a situation
when receive buffer is empty, but rwnd stays truncated.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes. So we switch to calling kzalloc()
which makes things much simpler.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
commit 4951feda0c60d1ef681f1a270afdd617924ab041
sctp: Do no select unconfirmed transports for retransmissions
added code to make sure that we do not select unconfirmed paths
for data transmission. This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable. In that case, the next
retransmit path returned is NULL and that causes a kernel crash.
The solution is to only change retransmit paths if we found one to use.
Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>
An unconfirmed transport is one that we have not been
able to reach since the beginning. There is no point in
trying to retrasnmit data on those transports. Also, the
specification forbids it due to security issues.
Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When sctp attempts to update an assocition, it removes any
addresses that were not in the updated INITs. However, the loop
may attempt to refrence a transport with address after removing it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use the idr subsystem and always ask for an id
at or above 1. This results in a id reuse when one
association is terminated while another is created.
To prevent re-use, we keep track of the last id returned
and ask for that id + 1 as a base for each query. We let
the idr spin lock protect this base id as well.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When setting the autoclose timeout in jiffies there is a possible
integer overflow if the value in seconds is very large
(e.g. for 2^22 s with HZ=1024). The problem appears even on
64-bit due to the integer promotion rules. The fix is just a cast
to unsigned long.
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Current implementation of max.burst ends up limiting new
data during cwnd decay period. The decay is happening becuase
the connection is idle and we are allowed to fill the congestion
window. The point of max.burst is to limit micro-bursts in response
to large acks. This still happens, as max.burst is still applied
to each transmit opportunity. It will also apply if a very large
send is made (greater then allowed by burst).
Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We currently send window update SACKs every time we free up 1 PMTU
worth of data. That a lot more SACKs then necessary. Instead, we'll
now send back the actuall window every time we send a sack, and do
window-update SACKs when a fraction of the receive buffer has been
opened. The fraction is controlled with a sysctl.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When sctp_connectx() is used, we pick the first address as
primary, even though it may not have worked. This results
in excessive retransmits and poor performance. We should
select the address that the association was established with.
Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Recent commit 8da645e101
sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup. The behavior was
different between IPv4 and IPv6. IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded. In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet. Thus resulted in a hung connection.
The solution is to set the source addresses on the association prior to
adding peers.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet. There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.
We now do just the one call to do both route to destination and get
path mtu updates.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Add-IP feature allows users to delete an active transport. If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.
Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association. Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit. Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.
Now, we store the user 'maxseg' value along with the computed
'frag_point'. We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages. To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything. This way, if a better route
or source address is available, we'll try to use it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Prior implementation of the new sctp_connectx() call that returns
an association ID did not work correctly on non-blocking socket.
This is because we could not return both a EINPROGRESS error and
an association id. This is a new implementation that supports this.
Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk
Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC 5061 Section 5.1 ASCONF Chunk Procedures said:
B4) Re-transmit the ASCONF Chunk last sent and if possible choose an
alternate destination address (please refer to [RFC4960],
Section 6.4.1). An endpoint MUST NOT add new parameters to this
chunk; it MUST be the same (including its Sequence Number) as
the last ASCONF sent. An endpoint MAY, however, bundle an
additional ASCONF with new ASCONF parameters with the next
Sequence Number. For details, see Section 5.5.
This patch fix to choose an alternate destination address when
re-transmit the ASCONF chunk, with some dup codes cleanup.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If T4-rto timer is expired on a removed transport, kernel panic
will occur when we do failure management on that transport.
You can reproduce this use the following sequence:
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
<----------------- ASCONF
(SRC=X)
ASCONF ----------------->
(Delete IP Address = X)
<----------------- ASCONF-ACK
(Success Indication)
<----------------- ASCONF
(T4-rto timer expire)
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If T2-shutdown timer is expired on a removed transport, kernel
panic will occur when we do failure management on that transport.
You can reproduce this use the following sequence:
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
<----------------- SHUTDOWN
(SRC=X)
ASCONF ----------------->
(Delete IP Address = X)
<----------------- ASCONF-ACK
(Success Indication)
<----------------- SHUTDOWN
(T2-shutdown timer expire)
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If socket is create by PF_INET type, it can not used IPv6 address
to send/recv DATA. So only enable IPv6 address support on PF_INET6
socket.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The tsn map currently use is 4K large and is stuck inside
the sctp_association structure making memory references REALLY
expensive. What we really need is at most 4K worth of bits
so the biggest map we would have is 512 bytes. Also, the
map is only really usefull when we have gaps to store and
report. As such, starting with minimal map of say 32 TSNs (bits)
should be enough for normal low-loss operations. We can grow
the map by some multiple of 32 along with some extra room any
time we receive the TSN which would put us outside of the map
boundry. As we close gaps, we can shift the map to rebase
it on the latest TSN we've seen. This saves 4088 bytes per
association just in the map alone along savings from the now
unnecessary structure members.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If INIT-ACK is received with SupportedExtensions parameter which
indicates that the peer does not support AUTH, the packet will be
silently ignore, and sctp_process_init() do cleanup all of the
transports in the association.
When T1-Init timer is expires, OOPS happen while we try to choose
a different init transport.
The solution is to only clean up the non-active transports, i.e
the ones that the peer added. However, that introduces a problem
with sctp_connectx(), because we don't mark the proper state for
the transports provided by the user. So, we'll simply mark
user-provided transports as ACTIVE. That will allow INIT
retransmissions to work properly in the sctp_connectx() context
and prevent the crash.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
valgrind reports uninizialized memory accesses when running
sctp inside the network simulation cradle simulator:
Conditional jump or move depends on uninitialised value(s)
at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324)
by 0x57427DA: sctp_packet_transmit (output.c:403)
by 0x5710EFF: sctp_outq_flush (outqueue.c:824)
by 0x5710B88: sctp_outq_uncork (outqueue.c:701)
by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548)
by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976)
by 0x5744460: sctp_do_sm (sm_sideeffect.c:945)
by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94)
by 0x5725C04: __sctp_connect (socket.c:1094)
by 0x57297DC: sctp_connect (socket.c:3297)
Conditional jump or move depends on uninitialised value(s)
at 0x575D3A5: mod_timer (timer.c:630)
by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555)
by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448)
by 0x5753607: sctp_side_effects (sm_sideeffect.c:976)
by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945)
by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474)
by 0x573347F: sctp_inq_push (inqueue.c:104)
by 0x572EF93: sctp_rcv (input.c:256)
by 0x5689623: ip_local_deliver_finish (ip_input.c:230)
by 0x5689759: ip_local_deliver (ip_input.c:268)
by 0x5689CAC: ip_rcv_finish (dst.h:246)
#1 is due to "if (t->pmtu_pending)".
8a4794914f "[SCTP] Flag a pmtu change request"
suggests it should be initialized to 0.
#2 is the heartbeat timer 'expires' value, which is uninizialised, but
test by mod_timer().
T3_rtx_timer seems to be affected by the same problem, so initialize it, too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, any time we set a primary transport we set
the changeover_active flag. As a result, we invoke SFR-CACC
even when there has been no changeover events.
Only set changeover_active, when there is a true changeover
event, i.e. we had a primary path and we are changing to
another transport.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings delayed_ack socket option set/get into line with the latest ietf
socket extensions API draft, while maintaining backwards compatibility.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replacing (almost) all invocations of list_for_each() with
list_for_each_entry() tightens up the code and allows for the deletion
of numerous list iterator variables that are no longer necessary.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While recevied ASCONF chunk with serial number less then needed, kernel
will treat this chunk as a retransmitted ASCONF chunk and find cached
ASCONF-ACK chunk used sctp_assoc_lookup_asconf_ack(). But this function
will always return NO-NULL. So response with cached ASCONF-ACKs chunk
will cause kernel panic.
In function sctp_assoc_lookup_asconf_ack(), if the cached ASCONF-ACKs
list asconf_ack_list is empty, or if the serial being requested does not
exists, the function as it currectly stands returns the actuall
list_head asoc->asconf_ack_list, this is not a cache ASCONF-ACK chunk
but a bogus pointer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation". First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The processing of the ASCONF chunks has changed a lot in the
spec. New items are:
1. A list of ASCONF-ACK chunks is now cached
2. The source of the packet is used in response.
3. New handling for unexpect ASCONF chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Address Parameter in the parameter list of the ASCONF chunk
may be a wildcard address. In this case special processing
is required. For the 'add' case, the source IP of the packet is
added. In the 'del' case, all addresses except the source IP
of packet are removed. In the "mark primary" case, the source
address is marked as primary.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.
The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>