146 Commits

Author SHA1 Message Date
Gerrit Renker 6fdd34d43b dccp ccid-2: Phase out the use of boolean Ack Vector sysctl
This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, as it now is handled fully dynamically via feature negotiation
(i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
 RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

dccps_hc_rx_ackvec is allocated by dccp_hdlr_ackvec() once feature
negotiation has concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.
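
As a rough illustration of why the pointer test is sufficient, here is a
minimal userspace sketch. The struct and function names are hypothetical
stand-ins, not the kernel definitions; only the NULL-until-activated
behaviour is taken from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's Ack Vector record. */
struct ackvec_sketch {
	int placeholder;
};

/* Hypothetical stand-in for struct dccp_sock; only the relevant field. */
struct dccp_sock_sketch {
	struct ackvec_sketch *hc_rx_ackvec;	/* NULL until activated */
};

/*
 * The feature is in use exactly when the activation handler has
 * allocated the Ack Vector, so no separate boolean can go stale.
 */
static bool ackvec_in_use(const struct dccp_sock_sketch *dp)
{
	return dp->hc_rx_ackvec != NULL;
}
```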

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
 * server when the Ack marking the transition RESPOND => OPEN arrives;
 * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
 (a) connection establishment implies that the Response has been received;
 (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
     this entry will always be ignored;
 (c) it cannot be used for anything useful - to detect loss for instance, only
     packets received after the loss can serve as pseudo-dupacks.

There was a FIXME to change the error code when dccp_ackvec_add() fails.
I removed this after finding out that:
 * the check whether ackno < ISN is already made earlier,
 * this Response is likely the 1st packet with an Ackno that the client gets,
 * so when dccp_ackvec_add() fails, the reason is likely not a packet error.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-08 01:19:06 -08:00
Gerrit Renker 4098dce5be dccp: Remove manual influence on NDP Count feature
Updating the NDP count feature is handled automatically now:
 * for CCID-2 it is disabled, since the code does not use NDP counts;
 * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature uses the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-08 01:18:37 -08:00
Gerrit Renker 422d9cdcb8 dccp: Feature activation handlers
This patch provides the post-processing of feature negotiation state, after
the negotiation has completed.

To this purpose, handlers are used and added to the dccp_feat_table. Each
handler is passed a boolean flag whether the RX or TX side of the feature
is meant.

Several handlers are provided already; new handlers can easily be added.
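
The handler scheme might look roughly like the sketch below. All type,
field and function names are illustrative assumptions and do not mirror
the actual dccp_feat_table layout; the feature number 5 for Ack Ratio is
the one assigned in RFC 4340.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket state touched by the handlers. */
struct sock_sketch {
	unsigned long ack_ratio_rx;
	unsigned long ack_ratio_tx;
};

/* Each handler is told whether the RX or the TX side is meant. */
typedef int (*feat_hdlr_t)(struct sock_sketch *sk, unsigned long val, bool rx);

static int hdlr_ack_ratio(struct sock_sketch *sk, unsigned long val, bool rx)
{
	if (rx)
		sk->ack_ratio_rx = val;
	else
		sk->ack_ratio_tx = val;
	return 0;
}

enum { FEAT_ACK_RATIO = 5 };	/* feature number per RFC 4340 */

struct feat_entry_sketch {
	unsigned char feat_num;
	feat_hdlr_t   activate;	/* NULL: nothing to do after negotiation */
};

static const struct feat_entry_sketch feat_table[] = {
	{ FEAT_ACK_RATIO, hdlr_ack_ratio },
};

/* Look up the negotiated feature and run its activation handler. */
static int feat_activate(struct sock_sketch *sk, unsigned char feat_num,
			 unsigned long val, bool rx)
{
	size_t i;

	for (i = 0; i < sizeof(feat_table) / sizeof(feat_table[0]); i++)
		if (feat_table[i].feat_num == feat_num)
			return feat_table[i].activate ?
			       feat_table[i].activate(sk, val, rx) : 0;
	return -1;	/* unknown feature */
}
```

Adding a handler in this sketch means writing one function and one table row.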

The initialisation is now fully dynamic, i.e. CCIDs are activated only
after the feature negotiation. The integration of this dynamic activation
is done in the subsequent patches.

Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
Confirm options while copying the negotiated feature values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-01 23:34:01 -08:00
Gerrit Renker 0971d17ca3 dccp: Insert feature-negotiation options into skb
This patch replaces the earlier insertion routine from options.c, so that
code specific to feature negotiation can remain in feat.c. This is possible
by calling a function already existing in options.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-01 23:27:31 -08:00
Eric Dumazet dd24c00191 net: Use a percpu_counter for orphan_count
Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers. 

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 21:17:14 -08:00
Gerrit Renker dd9c0e363c dccp: Deprecate Ack Ratio sysctl
This patch deprecates the Ack Ratio sysctl, since
 * Ack Ratio is entirely ignored by CCID-3 and CCID-4;
 * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
 * even if it worked in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
     (since it waits for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
 * Ack Ratio is resolved as a dependency of the selected CCID;
 * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
 * what happens then is part of another patch set, since it concerns the
   dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 22:55:08 -08:00
Gerrit Renker 0c1168398e dccp: Mechanism to resolve CCID dependencies
This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.

The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
				implementation_notes.html#ccid_dependencies

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16 22:49:52 -08:00
Gerrit Renker 9eca0a47de dccp: Resolve dependencies of features on choice of CCID
This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
 * since CCID2 mandates the use of Ack Vectors, there is no point in allowing
   endpoints which use CCID2 to disable Ack Vector features on such a connection;

 * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

 * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length; the reserved feature number `0' (which is hence
invalid for feature negotiation, as confirmed by section 19.4 of RFC 4340)
is used to mark the end of the table.
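
A zero-terminated dependency table of this kind could be walked as in the
following sketch. The struct layout and the example entries are assumptions
for illustration only; the feature numbers 6 (Send Ack Vector) and 7 (Send
NDP Count) are those of RFC 4340.

```c
#include <stddef.h>

/* Hypothetical table row: a feature number and the value the CCID implies. */
struct ccid_dep_sketch {
	unsigned char feat_num;	/* 0 is reserved, so it can end the table */
	unsigned char val;
};

/* Illustrative entries for one CCID; not the kernel's actual table. */
static const struct ccid_dep_sketch ccid2_deps[] = {
	{ 6, 1 },	/* Send Ack Vector: on  */
	{ 7, 0 },	/* Send NDP Count:  off */
	{ 0, 0 },	/* end marker           */
};

/* Count the entries by scanning up to the reserved feature number 0. */
static size_t dep_table_len(const struct ccid_dep_sketch *table)
{
	size_t n = 0;

	while (table[n].feat_num != 0)
		n++;
	return n;
}
```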

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 00:48:44 -08:00
Gerrit Renker ac75773c27 dccp: Per-socket initialisation of feature negotiation
This provides feature-negotiation initialisation for both DCCP sockets
and DCCP request_sockets, to support feature negotiation during
connection setup.

It also resolves a FIXME regarding the congestion control
initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:55:49 -08:00
Gerrit Renker 61e6473efb dccp: List management for new feature negotiation
This adds the initial list fields and list management functions for the
new feature negotiation implementation.

Thanks to Arnaldo for suggestions and improvements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:54:04 -08:00
Gui Jianfeng 6edafaaf6f tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup
If the following packet flow happens, the kernel will panic.
MachineA			MachineB
		SYN
	---------------------->    
        	SYN+ACK
	<----------------------
		ACK(bad seq)
	---------------------->
When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter (skb->sk) is
NULL at that moment, so the kernel panics.
This patch fixes this bug.

The Oops output is as follows:
[  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[  302.817075] Oops: 0000 [#1] SMP 
[  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[  302.849946] 
[  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
[  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
[  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
[  302.900140] Call Trace:
[  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
[  302.910082]  [<c04250a3>] printk+0x14/0x18
[  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
[  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
[  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
[  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
[  302.948999]  [<c040564b>] do_softirq+0x55/0x88
[  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[  302.954986]  [<c04284da>] irq_exit+0x35/0x69
[  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
[  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
[  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
[  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
[  302.972169]  =======================
[  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
[  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[  303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 23:50:04 -07:00
Gerrit Renker 59435444a1 dccp: Allow to distinguish original and retransmitted packets
This patch allows the sender to distinguish original and retransmitted packets,
which is in particular needed for the retransmission of DCCP-Requests:
 * the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS;
 * all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted
   Request has sequence number ISS + n (mod 2^48).

To add generic support, the patch reorganises existing code so that:
 * icsk_retransmits == 0     for the original packet and
 * icsk_retransmits = n > 0  for the n-th retransmitted packet
at the time dccp_transmit_skb() is called, via dccp_retransmit_skb().
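
The numbering rule can be sketched in a few lines, assuming only that DCCP
sequence numbers are 48 bits wide so that the addition wraps modulo 2^48.
The helper names are hypothetical.

```c
#include <stdint.h>

#define SEQNO48_MASK ((UINT64_C(1) << 48) - 1)

/* Addition in 48-bit circular sequence space. */
static uint64_t add48(uint64_t a, uint64_t b)
{
	return (a + b) & SEQNO48_MASK;
}

/*
 * Sequence number of a Request: n == 0 is the original packet,
 * n > 0 the n-th retransmission (each one advances GSS by 1).
 */
static uint64_t request_seqno(uint64_t iss, unsigned int n)
{
	return add48(iss, n);
}
```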
 
Thanks to Wei Yongjun for pointing this problem out.

Further changes:
----------------
 * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head
   is used for all retransmissions (the exception is client-Acks in PARTOPEN
   state, but these do not use sk_send_head);
 * since sk_send_head always contains the original skb (via dccp_entail()),
   skb_cloned() never evaluated to true and thus pskb_copy() was never used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-26 11:59:09 +01:00
Ilpo Järvinen 547b792cac net: convert BUG_TRAP to generic WARN_ON
Removes a legacy reinvent-the-wheel construct. The generic
machinery integrates much better with automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON(), though some might actually be
promoted to BUG_ON(); I left that for the future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 21:43:18 -07:00
Gerrit Renker 2013c7e35a dccp ccid-3: Fix error in loss detection
The TFRC loss detection code used the wrong loss condition (RFC 4340, 7.7.1):
 * the difference between sequence numbers s1 and s2 instead of 
 * the number of packets missing between s1 and s2 (one less than the distance).

Since this condition appears in many places of the code, it has been put into a
separate function, dccp_loss_free().
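
The corrected condition can be illustrated as follows. This is a simplified
userspace sketch with made-up helper names: it ignores anything beyond the
plain sequence-number distance and is not the kernel's dccp_loss_free().

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)
#define SEQ48_SIGN (UINT64_C(1) << 47)

/* Signed distance from s1 to s2 in 48-bit circular sequence space. */
static int64_t seq_distance48(uint64_t s1, uint64_t s2)
{
	uint64_t d = (s2 - s1) & SEQ48_MASK;

	return (d & SEQ48_SIGN) ? (int64_t)d - (INT64_C(1) << 48) : (int64_t)d;
}

/* Packets missing between s1 and s2: one less than the distance. */
static uint64_t loss_count(uint64_t s1, uint64_t s2)
{
	int64_t missing = seq_distance48(s1, s2) - 1;

	return missing > 0 ? (uint64_t)missing : 0;
}

/* No loss if s2 directly follows s1 (or does not lie ahead of it). */
static bool loss_free(uint64_t s1, uint64_t s2)
{
	return loss_count(s1, s2) == 0;
}
```

With s1 = 10 and s2 = 13 the distance is 3 but only packets 11 and 12 are
missing, which is the off-by-one that the wrong condition got wrong.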

Further changes:
----------------
 * tidied up incorrect typing (it was using `int' for u64/s64 types);
 * optimised conditional statements for common case of non-reordered packets;
 * rewrote comments/documentation to match the changes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-13 11:51:40 +01:00
Brian Haley 7d06b2e053 net: change proto destroy method to return void
Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-14 17:04:49 -07:00
David S. Miller df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Patrick McHardy 028b027524 [DCCP]: Fix skb->cb conflicts with IP
dev_queue_xmit() and the other IP output functions expect to get a skb
with clear or properly initialized skb->cb. Unlike TCP and UDP, the
dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
so the DCCP-specific data is interpreted by the IP output functions.
This can cause false negatives for the conditional POST_ROUTING hook
invocation, making the packet bypass the hook.

Add a inet_skb_parm/inet6_skb_parm union to the beginning of
dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
sure it fits in the cb.
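
A userspace sketch of the layout idea follows. The struct names, member
sizes and the 48-byte control-buffer size are assumptions for illustration,
and C11 _Static_assert stands in for the kernel's BUILD_BUG_ON.

```c
#include <stddef.h>
#include <stdint.h>

#define CB_SIZE_SKETCH 48	/* assumed size of the skb control buffer */

/* Hypothetical stand-ins for inet_skb_parm / inet6_skb_parm. */
struct ip4_parm_sketch { uint8_t bytes[16]; };
struct ip6_parm_sketch { uint8_t bytes[24]; };

struct dccp_cb_sketch {
	/* Must come first: the IP output path reads cb from offset 0. */
	union {
		struct ip4_parm_sketch h4;
		struct ip6_parm_sketch h6;
	} header;
	/* Protocol-private data follows and can no longer be misread. */
	uint8_t  pkt_type;
	uint64_t seq;
};

/* Compile-time checks: correct placement, and the struct still fits. */
_Static_assert(offsetof(struct dccp_cb_sketch, header) == 0,
	       "IP parameters must sit at the start of the cb");
_Static_assert(sizeof(struct dccp_cb_sketch) <= CB_SIZE_SKETCH,
	       "dccp cb does not fit into the control buffer");
```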

[ Combined with patch from Gerrit Renker to remove two now unnecessary
  memsets of IPCB(skb)->opt ]

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-12 18:35:41 -07:00
Denis V. Lunev 7630f02681 [DCCP]: Replace socket with sock for reset sending.
Replace dccp_v(4|6)_ctl_socket with sock to unify the code with TCP/ICMP.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:20:52 -07:00
Harvey Harrison 0dc47877a3 net: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 20:47:47 -08:00
Arnaldo Carvalho de Melo ab1e0a13d7 [SOCK] proto: Add hashinfo member to struct proto
This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 needs
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash                       |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Gerrit Renker a07a5a86d0 [DCCP]: Remove unused inline function
The function follows48(), which is a special-case of dccp_delta_seqno(),
is nowhere used in the DCCP code, thus removed by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:24 -08:00
Gerrit Renker af3b867e2f [DCCP]: Support inserting options during the 3-way handshake
This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation; for the moment the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared among
the two functions. This could also be used as a generic routine to finish inserting
options.

Also removed an `XXX' comment since its content was obvious.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:52 -08:00
Gerrit Renker 28be544004 [DCCP]: Use maximum-RTO backoff from DCCP spec
This removes another FIXME: the code used the TCP maximum RTO rather than the
value given by the DCCP specification. Across the sections in RFC 4340, 64
seconds is consistently suggested as maximum RTO backoff value; and this is
the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmits = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:47 -08:00
Gerrit Renker 954c2db868 [CCID3]: Interface CCID3 code with newer Loss Intervals Database
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from [RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:20 -08:00
Gerrit Renker 2180c41ca5 [DCCP]: Introduce generic function to test for `data packets'
as per RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:40 -08:00
Gerrit Renker e356d37a09 [DCCP]: Factor out common code for generating Resets
This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function,
and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6.

It is useful to have this separate, since the following Reset codes will always
be generated from the control socket rather than via dccp_send_reset:
 * Code 3, "No Connection", cf. 8.3.1;
 * Code 4, "Packet Error" (identification for Data 1 added);
 * Code 5, "Option Error" (identification for Data 1..3 added, will be used later);
 * Code 6, "Mandatory Error" (same as Option Error);
 * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?);
 * Code 8, "Bad Service Code";
 * Code 9, "Too Busy";
 * Code 10, "Bad Init Cookie" (not used).

Code 0 is not recommended by the RFC; the following codes would be used in
dccp_send_reset() instead, since they all relate to an established DCCP connection:
 * Code 1, "Closed";
 * Code 2, "Aborted";
 * Code 11, "Aggression Penalty" (12.3).
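
The split between the two code paths can be written down as a small
sketch. The enum and function names are hypothetical; the numeric values
and names of the codes are the ones listed above (code 0 is "Unspecified"
in RFC 4340).

```c
#include <stdbool.h>

/* Reset codes as enumerated in the lists above. */
enum reset_code_sketch {
	RESET_UNSPECIFIED        = 0,
	RESET_CLOSED             = 1,
	RESET_ABORTED            = 2,
	RESET_NO_CONNECTION      = 3,
	RESET_PACKET_ERROR       = 4,
	RESET_OPTION_ERROR       = 5,
	RESET_MANDATORY_ERROR    = 6,
	RESET_CONNECTION_REFUSED = 7,
	RESET_BAD_SERVICE_CODE   = 8,
	RESET_TOO_BUSY           = 9,
	RESET_BAD_INIT_COOKIE    = 10,
	RESET_AGGRESSION_PENALTY = 11,
};

/*
 * True for codes 3..10, which per the text are always generated from
 * the control socket; codes 1, 2 and 11 belong to an established
 * connection and go through the per-socket path instead.
 */
static bool reset_from_ctl_socket(enum reset_code_sketch code)
{
	return code >= RESET_NO_CONNECTION && code <= RESET_BAD_INIT_COOKIE;
}
```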

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:44 -07:00
Gerrit Renker a94f0f9705 [DCCP]: Rate-limit DCCP-Syncs
This implements a SHOULD from RFC 4340, 7.5.4:
 "To protect against denial-of-service attacks, DCCP implementations SHOULD
  impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
  such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported.  This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
the xrlim_allow algorithm for Syncs; permitting bursts just increases the problems.)
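
A burst-free per-socket limiter of this kind could be as small as the
sketch below. The names, the tick unit and the struct are assumptions; the
wrap-safe unsigned subtraction is one common way to handle counter
wrap-around and is not necessarily what the patch itself does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket limiter state. */
struct sync_limiter_sketch {
	uint32_t last_sent;	/* tick at which the last Sync was sent */
	uint32_t interval;	/* minimum ticks between Syncs; 0 = disabled */
};

/*
 * Allow at most one Sync per interval. No credit is accumulated while
 * idle, so bursts are impossible. Unsigned subtraction keeps the
 * comparison valid when the tick counter wraps.
 */
static bool sync_allowed(struct sync_limiter_sketch *rl, uint32_t now)
{
	if (rl->interval != 0 && (uint32_t)(now - rl->last_sent) < rl->interval)
		return false;
	rl->last_sent = now;
	return true;
}
```

With millisecond ticks, an interval of 125 corresponds to the "not more
than eight DCCP-Syncs per second" suggested by the RFC.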

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Committer note: Rate limited the step 6 DCCP_WARN too, as it says we're
               sending a sync.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker 0430ee3451 [DCCP]: Add Support for Data 1 .. 3 fields of Reset packets
This adds fields to support the informational Data 1..3 fields of the
DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes
to documentation.
Code which fills in these fields follows in subsequent patches; it is
primarily used for reporting option-processing and feature-negotiation
errors.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker 727ecc5faa [DCCP]: Add FIXME for send_delayed_ack
This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere
used in the entire DCCP/CCID code.

Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also
rather subtle implications for the Ack-Ratio-accounting.

CCID2 does not use this (maybe it should).

I think leaving the function in is good, in case someone wants to implement
this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker 3393da8241 [DCCP]: Simplify interface of dccp_sample_rtt
The third parameter of dccp_sample_rtt now becomes useless and is removed.

Also combined the subtraction of the timestamp echo and the elapsed time.
This is safe, since (a) presence of timestamp echo is tested first and (b)
elapsed time is either present and non-zero or it is not set and equals 0
due to the memset in dccp_parse_options.

To avoid measuring option-processing time, the timestamp for measuring the
initial Request/Response RTT sample is taken directly when the function is
called (the Linux implementation always adds a timestamp on the Request,
so there is no loss in doing this).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker 4c70f383e0 [DCCP]: Provide 10s of microsecond timesource
This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.
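
The wrap-around figure can be checked with a few lines of arithmetic. The
sketch assumes a 32-bit counter, which the text does not state but which
matches the quoted 11.9 hours.

```c
/*
 * Hours until a counter of 10-microsecond ticks wraps around,
 * assuming the counter is 32 bits wide.
 */
static double wraparound_hours(void)
{
	const double ticks   = 4294967296.0;	/* 2^32 */
	const double seconds = ticks * 10e-6;	/* 10 us per tick */

	return seconds / 3600.0;
}
```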

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Arnaldo Carvalho de Melo 6168b96c07 [DCCP]: Nuke the timeval helpers now that we fully converted to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo 8fb8354af9 [DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo 9823b7b554 [DCCP]: Convert dccp_sample_rtt to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:13 -07:00
Ian McDonald e961811fcd Fix dccp_sum_coverage
Compiling with EXTRA_CFLAGS=-W reveals a signed/unsigned issue
in dccp.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2007-07-10 22:15:05 -07:00
Gerrit Renker 4712a792ee [DCCP]: Provide function for RTT sampling
A recurring problem, in particular in the CCID code, is that RTT samples
from packets with timestamp echo and elapsed time options need to be taken.

This service is provided via a new function dccp_sample_rtt in this patch.
Furthermore, to protect against `insane' RTT samples, the sampled value
is bounded between 100 microseconds and 4 seconds - for which u32 is sufficient.
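
The bounding step alone might look like the sketch below. The function and
macro names are hypothetical, and the sketch covers only the clamping, not
the timestamp handling of dccp_sample_rtt.

```c
#include <stdint.h>

#define RTT_SAMPLE_MIN_US INT64_C(100)		/* 100 microseconds */
#define RTT_SAMPLE_MAX_US INT64_C(4000000)	/* 4 seconds */

/*
 * Clamp a raw RTT measurement (in microseconds, possibly negative or
 * absurdly large) into the sane range; the result fits into 32 bits.
 */
static uint32_t clamp_rtt_sample(int64_t delta_us)
{
	if (delta_us < RTT_SAMPLE_MIN_US)
		return (uint32_t)RTT_SAMPLE_MIN_US;
	if (delta_us > RTT_SAMPLE_MAX_US)
		return (uint32_t)RTT_SAMPLE_MAX_US;
	return (uint32_t)delta_us;
}
```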

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:00 -07:00
Gerrit Renker ac12b0c495 [DCCP]: Always use debug-toggle parameters
Currently debugging output (when configured) is automatically enabled when
DCCP modules are compiled into the kernel rather than built as loadable modules.
This is not necessary, since the module parameters in this case become kernel
commandline parameters, e.g. DCCP or CCID3 debug output can be enabled for a
static build by appending the following at the boot prompt:

	dccp.dccp_debug=1 	dccp_ccid3.ccid3_debug=1

This patch therefore does away with the more complicated way of always enabling
debug output for static builds.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:53 -07:00
Gerrit Renker b16be51b5e [DCCP]: Fix for follows48
The follows48 relation identifies whether 48-bit sequence number
x is the direct successor of y. Currently, it does not handle cases
of the following type correctly:

	follows48(0x(prefix)10000LL, 0x(prefix)0FFFFLL)

where prefix is an arbitrary hex sequence of up to 7 digits.

This is fixed by reusing the new dccp_delta_seqno function.
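
To illustrate the idea, here is a stand-alone sketch of a successor test built on a signed modulo-2^48 distance. The names delta48 and follows48_sketch are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

#define SEQ48_MASK_SKETCH ((1ULL << 48) - 1)

/* Signed circular distance from a to b modulo 2^48, in [-2^47, 2^47). */
static int64_t delta48(uint64_t a, uint64_t b)
{
	uint64_t d = (b - a) & SEQ48_MASK_SKETCH;

	if (d & (1ULL << 47))	/* upper half of the circle: negative */
		return (int64_t)d - (int64_t)(1ULL << 48);
	return (int64_t)d;
}

/* True if x is the direct successor of y in 48-bit sequence space. */
static int follows48_sketch(uint64_t x, uint64_t y)
{
	return delta48(y, x) == 1;
}
```

Because the test compares the full 48-bit distance with 1, a carry out of the low 16 bits (as in the example above) is handled like any other increment.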

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:45 -07:00
Gerrit Renker d52de17b8c [DCCP]: Make `before' relation unambiguous
Problem:
2007-04-25 22:26:44 -07:00
Gerrit Renker 0aec51c869 [DCCP]: Make dccp_delta_seqno return signed numbers
Problem:
2007-04-25 22:26:43 -07:00
Gerrit Renker 6b811d43f6 [DCCP]: 48-bit sequence number arithmetic
This patch
 * organizes the sequence arithmetic functions into one corner of dccp.h
 * performs a small modification of dccp_set_seqno to make it more widely reusable
   (now it is safe to use any number, since it performs modulo-2^48 assignment)
 * adds functions and generic macros for 48-bit sequence arithmetic:
 	--48 bit complement
 	--modulo-2^48 addition and modulo-2^48 subtraction
	--dccp_inc_seqno now a special case of add48
Constants renamed following a suggestion by Arnaldo.
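
The operations listed above can be sketched as plain functions. This is an illustration under assumed names (complement48, add48, sub48, inc48), not the macros added to dccp.h.

```c
#include <stdint.h>

#define UINT48_MAX_SKETCH 0xFFFFFFFFFFFFULL

/* 48-bit complement: the value that adds up with x to 0 modulo 2^48. */
static uint64_t complement48(uint64_t x)
{
	return ((1ULL << 48) - (x & UINT48_MAX_SKETCH)) & UINT48_MAX_SKETCH;
}

/* Addition modulo 2^48. */
static uint64_t add48(uint64_t a, uint64_t b)
{
	return (a + b) & UINT48_MAX_SKETCH;
}

/* Subtraction modulo 2^48, expressed as adding the complement. */
static uint64_t sub48(uint64_t a, uint64_t b)
{
	return add48(a, complement48(b));
}

/* Incrementing a sequence number is add48 with an increment of 1. */
static uint64_t inc48(uint64_t x)
{
	return add48(x, 1);
}
```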

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:42 -07:00
Adrian Bunk c93a882ebe [DCCP]: make dccp_write_xmit_timer() static again
dccp_write_xmit_timer() needlessly became global.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:10 -07:00
Gerrit Renker aabb601b0f [DCCP]: Initialise write_xmit_timer also on passive sockets
The TX CCID needs the write_xmit_timer for delaying packet sends. Previously
this timer was only activated on active (connecting) sockets.

This patch initialises the write_xmit_timer in sync with the other timers, i.e.
the timer will be ready on any socket. This is used by applications with a
listening socket which start to stream after receiving an initiation by the
client.  The write_xmit_timer is stopped when the application closes, as before.

Was tested to work and to remove the timer bug reported on dccp@vger.

Also moved timer initialisation into timer.c (static). 

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-09 13:47:58 -08:00
YOSHIFUJI Hideaki c9eaf17341 [NET] DCCP: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:19:27 -08:00
Gerrit Renker 0f9e5b573f [DCCP]: Debug timeval operations
Problem:

 Most target types in the CCID3 code are u32, so subtle conversion errors
 can occur if signed time calculations yield negative results: the original
 values are lost in the conversion to unsigned, calculation errors go undetected.

 This patch therefore
   * sets all critical time types from unsigned to suseconds_t
   * avoids comparison between signed/unsigned via type-casting
   * provides ample warning messages in case time calculations are negative

 These warning messages can be removed at a later stage when the code
 has undergone more testing.
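
 The conversion problem can be reproduced in isolation. The sketch below
 uses hypothetical helper names and plain C integer types rather than the
 kernel's; it only shows why a negative delta must be detected before it is
 stored in an unsigned variable.

```c
#include <stdint.h>
#include <stdio.h>

/* Signed delta in microseconds, with a warning when it is negative. */
static int64_t usec_delta_checked(int64_t later_usec, int64_t earlier_usec)
{
	int64_t delta = later_usec - earlier_usec;

	if (delta < 0)
		fprintf(stderr, "warning: negative time delta %lld us\n",
			(long long)delta);
	return delta;
}

/*
 * The same subtraction done in u32: a negative result silently turns
 * into a huge positive value and the error goes undetected.
 */
static uint32_t usec_delta_unchecked(uint32_t later_usec, uint32_t earlier_usec)
{
	return later_usec - earlier_usec;
}
```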

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:45 -08:00
Ian McDonald 5cc3741d6c [DCCP]: Remove timeo from output.c
It simplifies waiting for the CCID module to signal that a packet
is ready to be sent.  Other simplifications flow on from this such as
removing constants.

As a result of this EAGAIN is not returned any more by dccp_wait_for_ccid
(which would otherwise lead to unnecessarily discarding the packet in
dccp_write_xmit).

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:37 -08:00
Gerrit Renker 59348b19ef [DCCP]: Simplified conditions due to use of enum:8 states
This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.

In a few instances, this also allowed pre-conditions to be simplified; care
has been taken to retain logical equivalence.

[DCCP]: Introduce a consistent BUG/WARN message scheme

This refines the existing set of DCCP messages so that
 * BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
 * DCCP_CRIT (for severe warnings) is not rate-limited
 * DCCP_WARN() is introduced as rate-limited wrapper

Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:38 -08:00
Ian McDonald b1308dc015 [DCCP]: Set TX Queue Length Bounds via Sysctl
Previously the transmit queue was unbounded.

This patch:
	* puts a limit on transmit queue length
	  and sends back EAGAIN if the buffer is full
	* sets the TX queue length to a sensible default
	* implements tx buffer sysctls for DCCP
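
A bounded enqueue of this kind could be sketched as follows. The struct and function names are hypothetical, and treating a limit of 0 as "unbounded" is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

struct tx_queue_sketch {
	size_t len;	/* packets currently queued */
	size_t max_len;	/* limit, e.g. taken from a sysctl; 0 = no limit */
};

/* Returns 0 on success or -EAGAIN when the queue is already full. */
static int tx_enqueue(struct tx_queue_sketch *q)
{
	if (q->max_len != 0 && q->len >= q->max_len)
		return -EAGAIN;
	q->len++;
	return 0;
}
```

A caller that sees -EAGAIN can retry later instead of letting the queue grow without limit.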

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:37 -08:00
Gerrit Renker 84116716cc [DCCP]: enable debug messages also for static builds
This patch
  * makes debugging (when configured) work both for static / module build
  * provides generic debugging macros for use in other DCCP / CCID modules
  * adds missing information about debug parameters to Kconfig
  * performs some code tidy-up

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:24:35 -08:00
Gerrit Renker 3c6952624a [DCCP]: Introduce DCCP_{BUG{_ON},CRIT} macros, use enum:8 for the ccid3 states
This patch tackles the following problem:
       * the ccid3_hc_{t,r}x_sock define ccid3hc{t,r}x_state as `u8', but
         in reality there can only be a few, pre-defined enum names
       * this necessitates additional checking for unexpected values
         which would otherwise be caught by the compiler
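
As a rough illustration of the enum:8 idea, the sketch below stores an enum in an 8-bit bit-field. All names are hypothetical, and bit-fields of enum type are an implementation-defined extension that GCC and Clang accept.

```c
/* Hypothetical state names; the real CCID3 states are not reproduced here. */
enum tx_state_sketch {
	TX_NO_SENT = 1,
	TX_NO_FBACK,
	TX_FBACK,
	TX_TERM
};

struct tx_sock_sketch {
	/* Occupies 8 bits like a u8, but keeps the enum type. */
	enum tx_state_sketch state:8;
};

static const char *tx_state_name(enum tx_state_sketch s)
{
	/* With -Wswitch the compiler flags any enumerator missing here. */
	switch (s) {
	case TX_NO_SENT:  return "NO_SENT";
	case TX_NO_FBACK: return "NO_FBACK";
	case TX_FBACK:    return "FBACK";
	case TX_TERM:     return "TERM";
	}
	return "UNKNOWN";
}
```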

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:49 -08:00
Gerrit Renker afb0a34dd3 [DCCP]: Introduce a consistent naming scheme for sysctls
In order to make their function clearer and obtain a consistent naming
scheme to identify sysctls, all existing DCCP sysctls have been prefixed
with `sysctl_dccp', following the same convention as used by TCP.

Feature-specific sysctls retain the `feat' in the middle, although the
`default' has been dropped, since it is obvious from use.

Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:19 -08:00
Gerrit Renker 2e2e9e92bd [DCCP]: Add sysctls to control retransmission behaviour
This adds 3 sysctls which govern the retransmission behaviour of DCCP control
packets (3way handshake, feature negotiation).

It removes 4 FIXMEs from the code.

The close resemblance of sysctl variables to their TCP analogues is emphasised
not only by their name, but also by giving them the same initial values.
This is useful since there is not much practical experience with DCCP yet.

Furthermore, with regard to the previous patch, it is now possible to limit
the number of keepalive-Responses by setting net.dccp.default.request_retries
(also a bit like in TCP).

Lastly, added documentation of all existing DCCP sysctls.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:18 -08:00
Gerrit Renker 6f4e5fff1e [DCCP]: Support for partial checksums (RFC 4340, sec. 9.2)
This patch does the following:
  a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
  b) provides necessary socket options and documentation as to how to use them
  c) basic support and infrastructure for the Minimum Checksum Coverage feature
     [RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
     interface

In addition, it

 (1) fixes two bugs in the DCCPv4 checksum computation:
 	* pseudo-header used checksum_len instead of skb->len
	* incorrect checksum coverage calculation based on dccph_x
 (2) removes dccp_v4_verify_checksum() since it duplicates code of the
     checksum computation; code calling this function is updated accordingly.
 (3) now uses skb_checksum(), which is safer than checksum_partial() if the
     sk_buff is a non-linear buffer (has pages attached to it).
 (4) fixes an outstanding TODO item:
        * If P.CsCov is too large for the packet size, drop packet and return.

The code has been tested with applications, the latest version of tcpdump now
comes with support for partial DCCP checksums.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:09 -08:00
Gerrit Renker cf557926f6 [DCCP]: tidy up dccp_v{4,6}_conn_request
This is a code simplification to remove duplicated code
by concentrating and abstracting shared code.

Detailed Changes:
2006-12-02 21:22:03 -08:00
Ian McDonald f45b3ec481 [DCCP]: Fix logfile overflow
This patch fixes data being spewed into the logs continually. As the
code stood if there was a large queue and long delays timeo would go
down to zero and never get reset.

This fixes it by resetting timeo. Put constant into header as well.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:02 -08:00
Gerrit Renker 8a73cd09d9 [DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG
This patch removes two functions, the send_ack functions of request_sock,
which are not called/used by the DCCP code. It is correct that these
functions are not called, below is a justification why calling these
functions (on a passive socket in the LISTEN/RESPOND state) would mean
a DCCP protocol violation.

A) Background: using request_sock in TCP:
2006-12-02 21:21:58 -08:00
Arnaldo Carvalho de Melo f6484f7c7a [DCCP] timewait: Remove leftover extern declarations
Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME,
but as he suggests in the same patch, the best thing is to just ditch this
declaration. While doing that I also noticed that tcp_tw_count is not
defined anywhere either, so ditch it too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:21:57 -08:00
Gerrit Renker 60361be1be [DCCP]: set safe upper bound for option length
This is a re-send from
http://www.mail-archive.com/dccp@vger.kernel.org/msg00553.html

It is the same patch as before, but I have built in Arnaldo's suggestions
pointed out in that posting.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:21:53 -08:00
Gerrit Renker 0e64e94e47 [DCCP]: Update documentation references.
Updates the references to spec documents throughout the code, taking into
account that

* the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year

* RFC 1063 was obsoleted by RFC 1191

* draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational
  RFC, RFC 2923 on 2000-09-22.

All references verified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-24 16:17:51 -07:00
Ian McDonald 97e5848dd3 [DCCP]: Introduce tx buffering
This adds transmit buffering to DCCP.

I have tested with CCID2/3 and with loss and rate limiting.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:17 -07:00
Ian McDonald 837d107cd1 [DCCP]: Introduces follows48 function
This adds a new function to see if two sequence numbers follow each
other.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 19:06:42 -07:00
Ian McDonald e6bccd3573 [DCCP]: Update contact details and copyright
Just updating copyright and contacts

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-26 19:01:30 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Arnaldo Carvalho de Melo a4bf390242 [DCCP] minisock: Rename struct dccp_options to struct dccp_minisock
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.

Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:50:58 -08:00
Dmitry Mishin 3fdadf7d27 [NET]: {get|set}sockopt compatibility layer
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:45:21 -08:00
Arnaldo Carvalho de Melo 2d0817d11e [DCCP] options: Make dccp_insert_options & friends yell on error
And not the silly LIMIT_NETDEBUG and silently return without inserting
the option requested.

Also drop some old debugging messages associated to option insertion.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:06 -08:00
Arnaldo Carvalho de Melo 110bae4efb [DCCP]: Remove leftover dccp_send_response prototype
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:31:46 -08:00
Arnaldo Carvalho de Melo 7247887357 [DCCP] ipv6: Add missing ipv6 control socket
I guess I forgot to add it, nah, now it just works:

18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)

Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:00:37 -08:00
Arnaldo Carvalho de Melo c25a18ba34 [DCCP]: Uninline some functions
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:58:56 -08:00
Arnaldo Carvalho de Melo b61fafc4ef [DCCP]: Move the IPv4 specific bits from proto.c to ipv4.c
With this patch in place we can break down the complexity by better
compartmentalizing the code that is common to ipv6 and ipv4.

Now we have these modules:
Module                  Size  Used by
dccp_diag               1344  0
inet_diag               9448  1 dccp_diag
dccp_ccid3             15856  0
dccp_tfrc_lib          12320  1 dccp_ccid3
dccp_ccid2              5764  0
dccp_ipv4              16996  2
dccp                   48208  4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4

dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is
the next target to work on the "hey, ipv4 is legacy, I only want ipv6
dude!" direction.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:25:11 -08:00
Arnaldo Carvalho de Melo c985ed705f [DCCP]: Move dccp_[un]hash from ipv4.c to the core
As this is used by both ipv4 and ipv6 and is not ipv4 specific.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:23:39 -08:00
Arnaldo Carvalho de Melo 3e0fadc51f [DCCP]: Move dccp_v4_{init,destroy}_sock to the core
Removing one more case of ipv6 using ipv4 stuff in dccp land.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:23:15 -08:00
Arnaldo Carvalho de Melo 017487d7d1 [DCCP]: Generalize dccp_v4_send_reset
Renaming it to dccp_send_reset and moving it from the ipv4 specific
code to the core dccp code.

This fixes some bugs in IPV6 where timers would send v4 resets, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:25:24 -08:00
Arnaldo Carvalho de Melo e55d912f5b [DCCP] feat: Introduce sysctls for the default features
[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#

So if wanting to test ccid3 as the tx CCID one can just do:

[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#

Of course we also need the setsockopt for each app to tell its preferences, but
for testing or defining something other than CCID2 as the default for apps that
don't explicitly set their preference the sysctl interface is handy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:25:02 -08:00
Andrea Bittau 60fe62e789 [DCCP]: sparse endianness annotations
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:32 -08:00
Arnaldo Carvalho de Melo f21e68caa0 [DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6
Basically exports a similar set of functions as the one exported by
the non-AF specific TCP code.

In the process moved some non-AF specific code from dccp_v4_connect to
dccp_connect_init and moved the checksum verification from
dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv
too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:50 -08:00
Arnaldo Carvalho de Melo 34ca686081 [DCCP]: Just rename dccp_v4_prot to dccp_prot
To match TCP equivalent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:49 -08:00
Herbert Xu 48918a4dbd [DCCP]: Simplify skb_set_owner_w semantics
While we're at it let's reorganise the set_owner_w calls a little so that:
  
1) dccp_transmit_skb sets the owner for all packets except data packets.
2) Add dccp_skb_entail to set owner for packets queued for retransmission.
3) Make dccp_transmit_skb static.
  
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 19:26:17 -02:00
Arnaldo Carvalho de Melo ae31c3399d [DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]
Isolating it, that will be used when we introduce a CCID2 (TCP-Like)
implementation.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:17:51 -07:00
Arnaldo Carvalho de Melo 67e6b62921 [DCCP]: Introduce DCCP_SOCKOPT_SERVICE
As discussed in the dccp@vger mailing list:

Now applications have to use setsockopt(DCCP_SOCKOPT_SERVICE, service[s]),
prior to calling listen() and connect().

An array of unsigned ints can be passed meaning that the listening sock accepts
connection requests for several services.

With this we can ditch struct sockaddr_dccp and use only sockaddr_in (and
sockaddr_in6 in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16 16:58:40 -07:00
Arnaldo Carvalho de Melo b0e567806d [DCCP] Introduce dccp_timestamp
To start the timestamps with 0.0ms, easing the integer maths in the CCIDs, this
probably will be reworked to use the to be introduced struct timeval_offset
infrastructure out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-09-09 02:38:35 -03:00
Arnaldo Carvalho de Melo c530cfb1ce [CCID3]: Call sk->sk_write_space(sk) when receiving a feedback packet
This makes the send rate calculations behave way more closely to what
is specified, with the jitter previously seen on x and x_recv
disappearing completely on non lossy setups.

This resembles the tcp_data_snd_check code, that possibly we'll end up
using in DCCP as well, perhaps moving this code to
inet_connection_sock.

For now I'm doing the simplest implementation tho.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:13:46 -07:00
Arnaldo Carvalho de Melo b6ee3d4ada [CCID3]: Reorganise timeval handling
Introducing functions to add to or subtract from a timeval variable
and renaming now_delta to timeval_new_delta that calls do_gettimeofday
and then timeval_delta, that should be used when there are several
deltas made relative to the current time or setting variables to it,
so as to avoid calling do_gettimeofday excessively.

I'm leaving these "timeval_" prefixed functions internal to DCCP for a
while till we're sure there are no subtle bugs in it.

It also is more correct as it checks if the number of usecs added to
or subtracted from a tv_usec field is more than 2 seconds.
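
A sketch of such add/subtract helpers is shown below, using a stand-in struct and hypothetical names rather than the kernel's struct timeval. The loops normalise tv_usec even when the amount spans more than one second.

```c
#include <stdint.h>

#define USEC_PER_SEC_SKETCH 1000000LL

/* Stand-in for struct timeval, kept local so the sketch is portable. */
struct tv_sketch {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Add a non-negative number of microseconds, carrying into tv_sec. */
static void tv_add_usecs(struct tv_sketch *tv, int64_t usecs)
{
	tv->tv_usec += usecs;
	while (tv->tv_usec >= USEC_PER_SEC_SKETCH) {
		tv->tv_sec++;
		tv->tv_usec -= USEC_PER_SEC_SKETCH;
	}
}

/* Subtract a non-negative number of microseconds, borrowing from tv_sec. */
static void tv_sub_usecs(struct tv_sketch *tv, int64_t usecs)
{
	tv->tv_usec -= usecs;
	while (tv->tv_usec < 0) {
		tv->tv_sec--;
		tv->tv_usec += USEC_PER_SEC_SKETCH;
	}
}
```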

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:56 -07:00
Arnaldo Carvalho de Melo d6809c12b3 [DCCP]: Introduce dccp_wait_for_ccid and use it in dccp_write_xmit
This is not quite what I think we should have long term but improves
performance for now, so let's use it till we get CCID3 working well,
then we can think about using sk_write_queue, perhaps using some ideas
from Juwen Lai's old stack for 2.4.20.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:11:38 -07:00
Arnaldo Carvalho de Melo d4b81ff705 [DCCP]: Export dccp_insert_option_timestamp to CCIDs
And don't insert a TIMESTAMP option in all packets, leave the decision
to the CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:04:53 -07:00
Arnaldo Carvalho de Melo 7ad07e7cf3 [DCCP]: Implement the CLOSING timer
So that we retransmit CLOSE/CLOSEREQ packets till they elicit an
answer or we hit a timeout.

Most of the machinery uses TCP approaches, this code has to be
polished & audited, but this is better than we had before.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:04:31 -07:00
Arnaldo Carvalho de Melo 03ace394ac [DCCP]: Fix the ACK and SEQ window variables settings
This is from a first audit, more eyeballs are more than welcome.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:03:42 -07:00
Ian McDonald 1bc0986957 [DCCP]: Fix the timestamp options
This changes timestamp, timestamp echo, and elapsed time to use units of 10
usecs as per DCCP spec. This has been tested to verify that times are correct.
Also fixed up length and used hton/ntoh more.

Still to add in later patches:
- actually use elapsed time to adjust RTT
(commented out as was prior to this patch)
- send options at times more closely following the spec
(content is now correct)

Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:02:34 -07:00
Arnaldo Carvalho de Melo e92ae93a8a [DCCP]: Send SYNCACK packets in response to SYNC packets
Also fix step 6 when receiving SYNC or SYNCACK packets, i.e. we were not using
the updated swl.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:50 -07:00
Patrick McHardy a10cedd4b9 [DCCP]: Fix compiler warnings
may be a false warning if there always is something on ccid3hcrx_hist:

net/dccp/ccids/ccid3.c: In function 'ccid3_hc_rx_packet_recv':
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_usec' may be used uninitialized in this function
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_sec' may be used uninitialized in this function

const on inline functions doesn't have any effect:

net/dccp/dccp.h:64: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:70: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:76: warning: type qualifiers ignored on function return type

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:00:12 -07:00
Arnaldo Carvalho de Melo a1d3a35518 [DCCP]: Fix sparse warnings
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:59 -07:00
Arnaldo Carvalho de Melo 725ba8eee3 [DCCP]: Introduce the DCCP Kernel hacking menu
Only available if CONFIG_DEBUG_KERNEL is enabled in the "Kernel
Hacking" Menu.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:43 -07:00
Arnaldo Carvalho de Melo 7690af3fff [DCCP]: Just reflow the source code to fit in 80 columns
Andrew Morton should be happy now 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:59:26 -07:00
Arnaldo Carvalho de Melo 27258ee54f [DCCP]: Introduce dccp_write_xmit from code in dccp_sendmsg
This way it gets closer to the TCP flow, where congestion window
checks are done, it seems we can map ccid_hc_tx_send_packet in
dccp_write_xmit to tcp_snd_wnd_test in tcp_write_xmit, a CCID2
decision should just fit in here as well...

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:55:18 -07:00
Yoshifumi Nishida 95b81ef794 [DCCP]: Fix checksum routines
Signed-off-by: Yoshifumi Nishida <nishida@csl.sony.co.jp>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:55 -07:00
Arnaldo Carvalho de Melo 7c657876b6 [DCCP]: Initial implementation
Development to this point was done on a subversion repository at:

http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/

This repository will be kept at this site for the foreseeable future,
so that interested parties can see the history of this code,
attributions, etc.

If I ever decide to take this offline I'll provide the full history at
some other suitable place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:49:46 -07:00