path: root/net/sctp
Commit message | Author | Age | Files | Lines
* sctp: fix overrun in sctp_diag_dump_one()
    Author: Lance Richardson | Age: 2016-08-23 | Files: 1 | Lines: -2/+4

    The function sctp_diag_dump_one() currently performs a memcpy()
    of 64 bytes from a 16 byte field into another 16 byte field. Fix
    by using correct size, use sizeof to obtain correct size instead
    of using a hard-coded constant.

    Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
    Signed-off-by: Lance Richardson <lrichard@redhat.com>
    Reviewed-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

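The sizeof-based copy described in this commit can be illustrated with a minimal user-space sketch. The structure and function names below (demo_addr_src, demo_addr_dst, demo_copy_addr) are hypothetical stand-ins, not the kernel's own definitions; only the idea of sizing the copy from the field itself comes from the commit message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real structures; only the 16-byte
 * address fields matter for this illustration. */
struct demo_addr_src { uint8_t addr[16]; };
struct demo_addr_dst { uint8_t addr[16]; uint8_t guard[4]; };

/* Copy the address using the destination field's own size rather than a
 * hard-coded byte count, so the copy can never run past the field. */
static void demo_copy_addr(struct demo_addr_dst *dst,
                           const struct demo_addr_src *src)
{
    _Static_assert(sizeof(dst->addr) == sizeof(src->addr),
                   "address fields must have the same size");
    memcpy(dst->addr, src->addr, sizeof(dst->addr));
}
```

With a hard-coded 64 in place of sizeof(dst->addr), the same call would read 48 bytes past the source field and write 48 bytes past the destination field.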
* sctp: linearize early if it's not GSO
    Author: Marcelo Ricardo Leitner | Age: 2016-08-19 | Files: 2 | Lines: -17/+7

    Because otherwise when crc computation is still needed it's way more
    expensive than on a linear buffer to the point that it affects
    performance.

    It's so expensive that netperf test gives a perf output as below:

    Overhead  Command    Shared Object     Symbol
      18,62%  netserver  [kernel.vmlinux]  [k] crc32_generic_shift
       2,57%  netserver  [kernel.vmlinux]  [k] __pskb_pull_tail
       1,94%  netserver  [kernel.vmlinux]  [k] fib_table_lookup
       1,90%  netserver  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
       1,66%  swapper    [kernel.vmlinux]  [k] intel_idle
       1,63%  netserver  [kernel.vmlinux]  [k] _raw_spin_lock
       1,59%  netserver  [sctp]            [k] sctp_packet_transmit
       1,55%  netserver  [kernel.vmlinux]  [k] memcpy_erms
       1,42%  netserver  [sctp]            [k] sctp_rcv

    # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462

    After patch:

    Overhead  Command    Shared Object     Symbol
       2,75%  netserver  [kernel.vmlinux]  [k] memcpy_erms
       2,63%  netserver  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
       2,39%  netserver  [kernel.vmlinux]  [k] fib_table_lookup
       2,04%  netserver  [kernel.vmlinux]  [k] __pskb_pull_tail
       1,91%  netserver  [kernel.vmlinux]  [k] _raw_spin_lock
       1,91%  netserver  [sctp]            [k] sctp_packet_transmit
       1,72%  netserver  [mlx4_en]         [k] mlx4_en_process_rx_cq
       1,68%  netserver  [sctp]            [k] sctp_rcv

    # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
    SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

    212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849

    Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* net/sctp: always initialise sctp_ht_iter::start_fail
    Author: Vegard Nossum | Age: 2016-08-13 | Files: 1 | Lines: -0/+1

    sctp_transport_seq_start() does not currently clear iter->start_fail on
    success, but relies on it being zero when it is allocated (by
    seq_open_net()).

    This can be a problem in the following sequence:

        open() // allocates iter (and implicitly sets iter->start_fail = 0)
        read()
         - iter->start() // fails and sets iter->start_fail = 1
         - iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
        read() again
         - iter->start() // succeeds, but doesn't change iter->start_fail
         - iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)

    We should initialize sctp_ht_iter::start_fail to zero if ->start()
    succeeds, otherwise it's possible that we leave an old value of 1 there,
    which will cause ->stop() to not call sctp_transport_walk_stop(), which
    causes all sorts of problems like not calling rcu_read_unlock() (and
    preempt_enable()), eventually leading to more warnings like this:

        BUG: sleeping function called from invalid context at mm/slab.h:388
        in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
        Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150

         [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
         [<ffffffff83295892>] _raw_spin_lock+0x12/0x40
         [<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
         [<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60
         [<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150
         [<ffffffff81439e50>] traverse+0x170/0x850
         [<ffffffff8143aeec>] seq_read+0x7cc/0x1180
         [<ffffffff814f996c>] proc_reg_read+0xbc/0x180
         [<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210
         [<ffffffff813d2a95>] do_readv_writev+0x565/0x660
         [<ffffffff813d6857>] vfs_readv+0x67/0xa0
         [<ffffffff813d6c16>] do_preadv+0x126/0x170
         [<ffffffff813d710c>] SyS_preadv+0xc/0x10
         [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
         [<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a
         [<ffffffffffffffff>] 0xffffffffffffffff

    Notice that this is a subtly different stacktrace from the one in commit
    5fc382d875 ("net/sctp: terminate rhashtable walk correctly").

    Cc: Xin Long <lucien.xin@gmail.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Acked-By: Neil Horman <nhorman@tuxdriver.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

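The open/read/read sequence in this commit message can be modelled in user space. The sketch below is only a toy model under stated assumptions: demo_iter, demo_seq_start and demo_seq_stop are hypothetical names, and walk_depth stands in for the RCU/preempt state that the real walk_start/walk_stop pair manages.

```c
#include <stdbool.h>

/* Hypothetical iterator state, modelled loosely on the commit message. */
struct demo_iter {
    int start_fail;   /* 1 if the most recent start() failed */
    int walk_depth;   /* walks started but not yet stopped */
};

/* start(): record the outcome on both paths, including success. */
static bool demo_seq_start(struct demo_iter *it, bool walk_ok)
{
    if (!walk_ok) {
        it->start_fail = 1;
        return false;
    }
    it->walk_depth++;
    it->start_fail = 0;   /* the fix: clear any stale failure flag */
    return true;
}

/* stop(): only undo a walk that start() actually began. */
static void demo_seq_stop(struct demo_iter *it)
{
    if (it->start_fail)
        return;
    it->walk_depth--;
}
```

Without the `it->start_fail = 0` line, a failed read followed by a successful one would leave walk_depth at 1 after the second stop(), which is the toy equivalent of the missing rcu_read_unlock().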
* sctp: use event->chunk when it's valid
    Author: Xin Long | Age: 2016-08-08 | Files: 1 | Lines: -2/+2

    Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
    it's available") used event->chunk->head_skb to get the head_skb in
    sctp_ulpevent_set_owner().

    But at that moment, the event->chunk was NULL, as it cloned the skb
    in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
    work.

    This patch is to move the event->chunk initialization before calling
    sctp_ulpevent_receive_data() so that it uses event->chunk when it's
    valid.

    Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states
    Author: Phil Sutter | Age: 2016-08-08 | Files: 1 | Lines: -2/+2

    Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
    rely upon TCPF_LISTEN flag solely being present when listening sockets
    are requested.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>

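One way to express the check this commit describes is to mask out both bits before testing whether anything else was requested. The sketch below is an assumption about the shape of such a test, not the patch itself; the DEMO_ macros and demo_only_listeners() are hypothetical, with bit positions mirroring the kernel's TCP_CLOSE (7) and TCP_LISTEN (10) state numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror the kernel's TCP_CLOSE (7) and TCP_LISTEN (10);
 * the DEMO_ names themselves are hypothetical. */
#define DEMO_TCPF_CLOSE  (1u << 7)
#define DEMO_TCPF_LISTEN (1u << 10)

/* True when the request asks for listening sockets only, tolerating the
 * CLOSE bit that ss always sets alongside it. */
static bool demo_only_listeners(uint32_t idiag_states)
{
    if (!(idiag_states & DEMO_TCPF_LISTEN))
        return false;
    return (idiag_states & ~(DEMO_TCPF_LISTEN | DEMO_TCPF_CLOSE)) == 0;
}
```

A strict equality test against DEMO_TCPF_LISTEN alone would reject the LISTEN|CLOSE mask that ss actually sends.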
* sctp_diag: Fix T3_rtx timer export
    Author: Phil Sutter | Age: 2016-08-08 | Files: 1 | Lines: -4/+10

    The asoc's timer value is not kept in asoc->timeouts array but in its
    primary transport instead.

    Furthermore, we must export the timer only if it is pending, otherwise
    the value will underrun when stored in an unsigned variable and user
    space will only see a very large timeout value.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>

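The underrun mentioned in this commit is ordinary unsigned wraparound. The sketch below shows it with a hypothetical timer model (demo_timer, demo_export_naive, demo_export_guarded are illustrative names, and plain tick counters replace the kernel's jiffies handling).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer model: 'expires' and 'now' are tick counts. */
struct demo_timer {
    bool pending;
    uint64_t expires;
};

/* Naive export: subtracts unconditionally, so a timer whose expiry lies
 * in the past wraps around to a huge unsigned value. */
static uint32_t demo_export_naive(const struct demo_timer *t, uint64_t now)
{
    return (uint32_t)(t->expires - now);
}

/* Guarded export: report a remaining time only while the timer is pending. */
static bool demo_export_guarded(const struct demo_timer *t, uint64_t now,
                                uint32_t *out)
{
    if (!t->pending)
        return false;
    *out = (uint32_t)(t->expires - now);
    return true;
}
```

For a non-pending timer with expires = 100 and now = 150, the naive version yields 4294967246 ticks, which is the "very large timeout value" user space would see.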
* sctp: allow receiving msg when TCP-style sk is in CLOSED state
    Author: Xin Long | Age: 2016-07-30 | Files: 1 | Lines: -1/+1

    Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of
    CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the
    assoc is closed when sctp_accept clones a new sk.

    If there is still data in sk receive queue, users will not be able
    to read it any more, as sctp_recvmsg returns directly if sk state
    is CLOSED.

    This patch is to add CLOSED state check in sctp_recvmsg to allow
    reading data from TCP-style sk with CLOSED state as what TCP does.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: allow delivering notifications after receiving SHUTDOWN
    Author: Xin Long | Age: 2016-07-30 | Files: 1 | Lines: -1/+3

    Prior to this patch, once sctp received SHUTDOWN or shutdown with RD,
    sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would be
    dropped in sctp_ulpq_tail_event(). It would cause:

    1. some notifications couldn't be received by users. like
       SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C().

    2. sctp would also never trigger sk_data_ready when the association was
       closed, making it harder to identify the end of the association by
       calling recvmsg() and getting an EOF. It was not convenient for kernel
       users.

    The check here should be stopping delivering DATA chunks after receiving
    SHUTDOWN, and stopping delivering ANY chunks after sctp_close().

    So this patch is to allow notifications to enqueue into receive queue even
    if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event, but if
    sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all events.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

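The drop rule stated in the last paragraph of this commit message reduces to a small predicate. The following is a sketch under assumptions: demo_should_drop() is a hypothetical helper, and the two macros mirror the kernel's RCV_SHUTDOWN (1) and SEND_SHUTDOWN (2) bit values.

```c
#include <stdbool.h>

/* Values mirror the kernel's RCV_SHUTDOWN and SEND_SHUTDOWN bits;
 * the DEMO_ names are hypothetical. */
#define DEMO_RCV_SHUTDOWN  1u
#define DEMO_SEND_SHUTDOWN 2u

/* Decide whether an incoming event should be discarded. */
static bool demo_should_drop(unsigned int sk_shutdown, bool is_notification)
{
    if (!(sk_shutdown & DEMO_RCV_SHUTDOWN))
        return false;            /* receive side still open: keep everything */
    if (sk_shutdown & DEMO_SEND_SHUTDOWN)
        return true;             /* both directions shut: drop all events */
    return !is_notification;     /* RCV_SHUTDOWN only: drop data, keep notifications */
}
```

The middle case is what lets a notification such as SCTP_SHUTDOWN_COMP still reach the receive queue after SHUTDOWN has been received.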
* sctp: fix the issue sctp requeue auth chunk incorrectly
    Author: Xin Long | Age: 2016-07-30 | Files: 1 | Lines: -1/+2

    sctp needs to queue auth chunk back when we know that we are going
    to generate another segment. But commit f1533cce60d1 ("sctp: fix
    panic when sending auth chunks") requeues the last chunk processed
    which is probably not the auth chunk.

    It causes panic when calculating the MAC in sctp_auth_calculate_hmac(),
    as the incorrect offset of the auth chunk in skb->data.

    This fix is to requeue it by using packet->auth.

    Fixes: f1533cce60d1 ("sctp: fix panic when sending auth chunks")
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* net/sctp: terminate rhashtable walk correctly
    Author: Vegard Nossum | Age: 2016-07-25 | Files: 1 | Lines: -0/+1

    I was seeing a lot of these:

        BUG: sleeping function called from invalid context at mm/slab.h:388
        in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
        Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150

         [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
         [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
         [<ffffffff811aac87>] console_unlock+0x2f7/0x930
         [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
         [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
         [<ffffffff812c171a>] printk+0x94/0xb0
         [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
         [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
         [<ffffffff81158490>] __might_sleep+0x90/0x1a0
         [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
         [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
         [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
         [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
         [<ffffffff8143a82b>] seq_read+0x27b/0x1180
         [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
         [<ffffffff813d471b>] __vfs_read+0xdb/0x610
         [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
         [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
         [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
         [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
         [<ffffffffffffffff>] 0xffffffffffffffff

    Apparently we always need to call rhashtable_walk_stop(), even when
    rhashtable_walk_start() fails:

     * rhashtable_walk_start - Start a hash table walk
     * @iter:       Hash table iterator
     *
     * Start a hash table walk.  Note that we take the RCU lock in all
     * cases including when we return an error.  So you must always call
     * rhashtable_walk_stop to clean up.

    otherwise we never call rcu_read_unlock() and we get the splat above.

    Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
    See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
    See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
    Cc: Xin Long <lucien.xin@gmail.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: also point GSO head_skb to the sk when it's available
    Author: Marcelo Ricardo Leitner | Age: 2016-07-25 | Files: 1 | Lines: -0/+3

    The head skb for GSO packets won't travel through the inner depths of
    SCTP stack as it doesn't contain any chunks on it. That means skb->sk
    doesn't get set and then when sctp_recvmsg() calls
    sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs
    to check flags at the socket (sp->v4mapped).

    The fix is to initialize skb->sk for the head skb once we are able to do
    it. That is, when the first chunk is processed.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: fix BH handling on socket backlog
    Author: Marcelo Ricardo Leitner | Age: 2016-07-25 | Files: 2 | Lines: -2/+2

    Now that the backlog processing is called with BH enabled, we have to
    disable BH before taking the socket lock via bh_lock_sock() otherwise
    it may deadlock:

    sctp_backlog_rcv()
            bh_lock_sock(sk);

            if (sock_owned_by_user(sk)) {
                    if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                            sctp_chunk_free(chunk);
                    else
                            backloged = 1;
            } else
                    sctp_inq_push(inqueue, chunk);

            bh_unlock_sock(sk);

    while sctp_inq_push() was disabling/enabling BH, but enabling BH
    triggers pending softirq, which then may try to re-lock the socket in
    sctp_rcv().

    [  219.187215]  <IRQ>
    [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
    [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
    [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
    [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
    [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
    [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
    [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
    [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
    [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
    [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
    [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
    [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
    [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
    [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
    [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
    [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
    [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
    [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
    [  219.187247]  <EOI>
    [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
    [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
    [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
    [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
    [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
    [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
    [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
    [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
    [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
    [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
    [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
    [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
    [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
    [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20

    Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change")
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: use inet_recvmsg to support sctp RFS well
    Author: Xin Long | Age: 2016-07-25 | Files: 2 | Lines: -2/+2

    Commit 486bdee0134c ("sctp: add support for RPS and RFS") saves
    skb->hash into sk->sk_rxhash so that the inet_* can record it to
    flow table.

    But sctp uses sock_common_recvmsg as .recvmsg instead of
    inet_recvmsg, sock_common_recvmsg doesn't invoke
    sock_rps_record_flow to record the flow. It may cause that the
    receiver has no chances to record the flow if it doesn't send msg
    or poll the socket.

    So this patch fixes it by using inet_recvmsg as .recvmsg in sctp.

    Fixes: 486bdee0134c ("sctp: add support for RPS and RFS")
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: support ipv6 nonlocal bind
    Author: Xin Long | Age: 2016-07-25 | Files: 1 | Lines: -1/+3

    This patch makes sctp support ipv6 nonlocal bind by adding
    sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
    check in sctp_v6_available as what sctp did to support
    ipv4 nonlocal bind (commit cdac4e077489).

    Reported-by: Shijoe George <spanjikk@redhat.com>
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: fix GSO for IPv6
    Author: Marcelo Ricardo Leitner | Age: 2016-07-16 | Files: 1 | Lines: -1/+22

    commit 90017accff61 ("sctp: Add GSO support") didn't register SCTP GSO
    offloading for IPv6 and yet didn't put any restrictions on generating
    GSO packets while in IPv6, which causes all IPv6 GSO'ed packets to be
    silently dropped.

    The fix is to properly register the offload this time.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: recvmsg should be able to run even if sock is in closing state
    Author: Marcelo Ricardo Leitner | Age: 2016-07-16 | Files: 1 | Lines: -15/+17

    Commit d46e416c11c8 missed to update some other places which checked for
    the socket being TCP-style AND Established state, as Closing state has
    some overlapping with the previous understanding of Established.

    Without this fix, one of the effects is that some already queued rx
    messages may not be readable anymore depending on how the association
    teared down, and sending may also not be possible if peer initiated the
    shutdown.

    Also merge two if() blocks into one condition on sctp_sendmsg().

    Cc: Xin Long <lucien.xin@gmail.com>
    Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: only check for ECN if peer is using it
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 1 | Lines: -3/+2

    Currently only read-only checks are performed up to the point on where
    we check if peer is ECN capable, checks which we can avoid otherwise.
    The flag ecn_ce_done is only used to perform this check once per
    incoming packet, and nothing more.

    Thus this patch moves the peer check up.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: do not clear chunk->ecn_ce_done flag
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 1 | Lines: -1/+0

    We should not clear that flag when switching to a new skb from a GSO skb
    because it would cause ECN processing to happen multiple times per GSO
    skb, which is not wanted. Instead, let it be processed once per chunk.
    That is, in other words, once per IP header available.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: avoid identifying address family many times for a chunk
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 4 | Lines: -21/+8

    Identifying address family operations during rx path is not something
    expensive but it's ugly to the eye to have it done multiple times,
    specially when we already validated it during initial rx processing.

    This patch takes advantage of the now shared sctp_input_cb and make the
    pointer to the operations readily available.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: allow GSO frags to access the chunk too
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 6 | Lines: -10/+30

    SCTP will try to access original IP headers on sctp_recvmsg in order to
    copy the addresses used. There are also other places that do similar
    access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add
    GSO support") they aren't always there because they are only present in
    the header skb.

    SCTP handles the queueing of incoming data by cloning the incoming skb
    and limiting to only the relevant payload. This clone has its cb updated
    to something different and it's then queued on socket rx queue. Thus we
    need to fix this in two moments.

    For rx path, not related to socket queue yet, this patch uses a
    partially copied sctp_input_cb to such GSO frags. This restores the
    ability to access the headers for this part of the code.

    Regarding the socket rx queue, it removes iif member from sctp_event and
    also add a chunk pointer on it.

    With these changes we're always able to reach the headers again.

    The biggest change here is that now the sctp_chunk struct and the
    original skb are only freed after the application consumed the buffer.
    Note however that the original payload was already like this due to the
    skb cloning.

    For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
    IPv6 now can fetch it directly from original's IPv6 CB as the original
    skb is still accessible.

    In the future we probably can simplify sctp_v*_skb_iif() stuff, as
    sctp_v4_skb_iif() was called but its return value not used, and now
    it's not even called, but such cleanup is out of scope for this change.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: reorder sctp_ulpevent and shrink msg_flags
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 1 | Lines: -2/+2

    The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to
    bad alignment; msg_flags is using 4 bytes while it actually uses only 2,
    so we shrink it, and iif member (4 bytes) which can be easily fetched
    from another place once the next patch is there, so we remove it and
    thus creating space for 8 bytes.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: allow others to use sctp_input_cb
    Author: Marcelo Ricardo Leitner | Age: 2016-07-13 | Files: 1 | Lines: -11/+0

    We process input path in other files too and having access to it is
    nice, so move it to a header where it's shared.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: implement prsctp PRIO policy
    Author: Xin Long | Age: 2016-07-11 | Files: 4 | Lines: -1/+105

    prsctp PRIO policy is a policy to abandon lower priority chunks when
    asoc doesn't have enough snd buffer, so that the current chunk with
    higher priority can be queued successfully.

    Similar to TTL/RTX policy, we will set the priority of the chunk to
    prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
    So if PRIO policy is enabled, msg->expire_at won't work.

    asoc->sent_cnt_removable will record how many chunks can be checked to
    remove. If priority policy is enabled, when the chunk is queued into
    the out_queue, we will increase sent_cnt_removable. When the chunk is
    moved to abandon_queue or dequeue and free, we will decrease
    sent_cnt_removable.

    In sctp_sendmsg, we will check if there is enough snd buffer for current
    msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
    sctp_prune_prsctp when sendmsg from the retransmit/transmitted queue, and
    free chunks from out_queue in right order until the abandon+free size >
    msg_len - sctp_wfree. For the abandon size, we have to wait until it
    sends FORWARD TSN, receives the sack and the chunks are really freed.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: implement prsctp RTX policy
    Author: Xin Long | Age: 2016-07-11 | Files: 2 | Lines: -0/+6

    prsctp RTX policy is a policy to abandon chunks when they are
    retransmitted beyond the max count.

    This patch uses sent_count to count how many times one chunk has
    been sent, and prsctp_param is the max rtx count, which is from
    sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So similar
    to TTL policy, if RTX policy is enabled, msg->expire_at won't
    work.

    Then in sctp_chunk_abandoned, this patch checks if chunk->sent_count
    is bigger than chunk->prsctp_param to abandon this chunk.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

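The RTX rule described in this commit (abandon once sent_count exceeds prsctp_param) can be sketched in a few lines. The field names follow the commit message, but demo_chunk, demo_rtx_abandoned and demo_try_send are hypothetical and ignore everything else the real sctp_chunk_abandoned() may check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chunk carrying only the two counters named in the message. */
struct demo_chunk {
    uint32_t sent_count;    /* how many times the chunk has been sent */
    uint32_t prsctp_param;  /* max count allowed by the RTX policy */
};

/* A chunk is abandoned once it has been sent more often than allowed. */
static bool demo_rtx_abandoned(const struct demo_chunk *c)
{
    return c->sent_count > c->prsctp_param;
}

/* Try to (re)transmit: returns false when the chunk must be given up. */
static bool demo_try_send(struct demo_chunk *c)
{
    if (demo_rtx_abandoned(c))
        return false;
    c->sent_count++;
    return true;
}
```

With prsctp_param = 2, the sketch allows three sends in total before the fourth attempt is refused, since the comparison is strictly "bigger than".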
* sctp: implement prsctp TTL policy
    Author: Xin Long | Age: 2016-07-11 | Files: 4 | Lines: -5/+33

    prsctp TTL policy is a policy to abandon chunks when they expire
    at the specific time in local stack. It's similar with expires_at
    in struct sctp_datamsg.

    This patch uses sinfo->sinfo_timetolive to set the specific time for
    TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
    So if prsctp_enable or TTL policy is not enabled, msg->expires_at
    still works as before.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
    Author: Xin Long | Age: 2016-07-11 | Files: 1 | Lines: -0/+62

    This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
    to dump the prsctp statistics info from the asoc. The prsctp statistics
    includes abandoned_sent/unsent from the asoc. abandoned_sent is the
    count of the packets we drop from the retransmit/transmitted queue, and
    abandoned_unsent is the count of the packets we drop from out_queue
    according to the policy.

    Note: another option for prsctp statistics dump described in rfc is
    SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
    info from each stream. But by now, linux doesn't yet have per stream
    statistics info, it needs rfc6525 to be implemented. As the prsctp
    statistics for each stream has to be based on per stream statistics,
    we will delay it until rfc6525 is done in linux.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt
    Author: Xin Long | Age: 2016-07-11 | Files: 1 | Lines: -0/+91

    This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
    to set/get sctp Partially Reliable Policies' default params,
    which includes 3 policies (ttl, rtx, prio) and their values.

    Still, if we set policy params in sndinfo, we will use the params
    of sndinfo against chunks, instead of the default params.

    In this patch, we will use 5-8bit of sp/asoc->default_flags
    to store prsctp policies, and reuse asoc->default_timetolive
    to store their values. It means if we enable and set prsctp
    policy, prior ttl timeout in sctp will not work any more.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

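Storing a policy selector inside a shared flags word, as this commit describes for default_flags, is a standard mask-and-merge operation. The sketch below is illustrative only: the DEMO_PR_ values and the exact mask position are assumptions chosen for the example, and demo_set_policy/demo_get_policy are hypothetical helpers rather than the kernel's macros.

```c
#include <stdint.h>

/* Assumed encoding for illustration: the policy occupies a small bit
 * field inside the flags word; other bits belong to unrelated flags. */
#define DEMO_PR_NONE 0x0000u
#define DEMO_PR_TTL  0x0010u
#define DEMO_PR_RTX  0x0020u
#define DEMO_PR_PRIO 0x0030u
#define DEMO_PR_MASK 0x0030u

/* Replace the policy bits while leaving every other flag untouched. */
static uint16_t demo_set_policy(uint16_t flags, uint16_t policy)
{
    return (uint16_t)((flags & ~DEMO_PR_MASK) | (policy & DEMO_PR_MASK));
}

/* Extract only the policy bits from the flags word. */
static uint16_t demo_get_policy(uint16_t flags)
{
    return (uint16_t)(flags & DEMO_PR_MASK);
}
```

Clearing the field before OR-ing in the new value matters: switching from PRIO to TTL with a plain OR would leave PRIO's bits set.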
* sctp: add SCTP_PR_SUPPORTED on sctp sockopt
    Author: Xin Long | Age: 2016-07-11 | Files: 4 | Lines: -6/+88

    According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
    We will add prsctp_enable to both asoc and ep, and replace the places
    where it used net.sctp->prsctp_enable with asoc->prsctp_enable.

    ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
    asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
    also modify its value through sockopt SCTP_PR_SUPPORTED.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* sctp: fix panic when sending auth chunks
    Author: Marcelo Ricardo Leitner | Age: 2016-07-09 | Files: 1 | Lines: -3/+15

    When we introduced GSO support, if using auth the auth chunk was being
    left queued on the packet even after the final segment was generated.
    Later on sctp_transmit_packet it calls sctp_packet_reset, which zeroed
    the packet len while not accounting for this left-over. This caused more
    space to be used the next packet due to the chunk still being queued,
    but space which wasn't allocated as its size wasn't accounted.

    The fix is to only queue it back when we know that we are going to
    generate another segment.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
|\
    Author: David S. Miller | Age: 2016-06-30 | Files: 1 | Lines: -6/+0

    Several cases of overlapping changes, except the packet scheduler
    conflicts which deal with the addition of the free list parameter
    to qdisc_enqueue().

    Signed-off-by: David S. Miller <davem@davemloft.net>

| * net: diag: add missing declarations
    Author: Ben Dooks | Age: 2016-06-10 | Files: 1 | Lines: -6/+0

    The functions inet_diag_msg_common_fill and inet_diag_msg_attrs_fill
    seem to have been missed from the include/linux/inet_diag.h header
    file. Add them to fix the following warnings:

    net/ipv4/inet_diag.c:69:6: warning: symbol 'inet_diag_msg_common_fill' was not declared. Should it be static?
    net/ipv4/inet_diag.c:108:5: warning: symbol 'inet_diag_msg_attrs_fill' was not declared. Should it be static?

    Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* | sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate
    Author: Xin Long | Age: 2016-06-16 | Files: 1 | Lines: -1/+1

    Commit d46e416c11c8 ("sctp: sctp should change socket state when
    shutdown is received") may set sk_state CLOSING in sctp_sock_migrate,
    but inet_accept doesn't allow the sk_state other than ESTABLISHED/
    CLOSED for sctp. So we will change sk_state to CLOSED, instead of
    CLOSING, as actually sk is closed already there.

    Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received")
    Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* | sctp: fix error return code in sctp_init()
    Author: Wei Yongjun | Age: 2016-06-14 | Files: 1 | Lines: -1/+2

    Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
    Acked-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

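The class of bug this commit fixes is easy to reproduce in any function that uses a shared status variable with goto-based cleanup. The sketch below is a generic illustration, not the body of sctp_init(); demo_register() and demo_init() are hypothetical, and the allocation is simulated with a flag so both paths can be exercised.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical earlier step that succeeds and leaves status at 0. */
static int demo_register(void)
{
    return 0;
}

/* Init routine with a single exit label, in the usual kernel style. */
static int demo_init(bool alloc_ok)
{
    int status;
    void *buf;

    status = demo_register();
    if (status)
        goto out;

    buf = alloc_ok ? malloc(64) : NULL;
    if (!buf) {
        /* status still holds 0 from the step above, so it must be set
         * explicitly or the caller would see a false success. */
        status = -ENOMEM;
        goto out;
    }
    free(buf);
out:
    return status;
}
```

Dropping the `status = -ENOMEM;` line makes demo_init(false) return 0, which is the symptom the commit message describes.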
* | sctp: sctp should change socket state when shutdown is received
    Author: Xin Long | Age: 2016-06-10 | Files: 2 | Lines: -3/+9

    Now sctp doesn't change socket state upon shutdown reception. It changes
    just the assoc state, even though it's a TCP-style socket.

    For some cases, if we really need to check sk->sk_state, it's necessary to
    fix this issue, at least when we use ss or netstat to dump, we can get a
    more exact information.

    As an improvement, we will change sk->sk_state when we change asoc->state
    to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent
    with sctp_close.

    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* | sctp: Fix warning in sctp_packet_transmit_chunk()
    Author: David S. Miller | Age: 2016-06-03 | Files: 1 | Lines: -1/+1

    size_t objects should be printed with %Z printf format.

    Reported-by: kbuild test robot <fengguang.wu@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* | sctp: improve debug message to also log curr pkt and new chunk size
    Author: Marcelo Ricardo Leitner | Age: 2016-06-03 | Files: 1 | Lines: -1/+2

    This is useful for debugging packet sizes.

    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Tested-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* | sctp: Add GSO supportMarcelo Ricardo Leitner2016-06-037-124/+408
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCTP has this peculiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to the IP layer. This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented in a way that respects SCTP RFCs. With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance. This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet. v2: - Added support for receiving GSO frames, as requested by Dave Miller. - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP) - Added heuristics similar to what we have in TCP for not generating single GSO packets that fill cwnd. v3: - consider sctphdr size in skb_gso_transport_seglen() - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
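The constraint described above (whole chunks per segment, padding respected) can be sketched in plain userspace C. This is an illustration only, not the kernel implementation: pad4() and pack_chunks() are hypothetical names, headers are ignored, and chunks too large for one segment are rejected rather than fragmented. The one fact assumed from outside the commit message is that SCTP chunks are padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round a chunk length up to the next multiple of 4 bytes. */
static size_t pad4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: place whole, padded chunks into segments so that
 * no segment payload exceeds max_seg bytes. Chunks are never split.
 * Writes each segment's payload size to seg_len[] (capacity max_segs)
 * and returns the number of segments used, or 0 if a chunk cannot fit
 * or the capacity is exhausted.
 */
static size_t pack_chunks(const size_t *chunk_len, size_t n_chunks,
                          size_t max_seg, size_t *seg_len, size_t max_segs)
{
    size_t used = 0;

    assert(max_seg > 0 && max_segs > 0);

    for (size_t i = 0; i < n_chunks; i++) {
        size_t padded = pad4(chunk_len[i]);

        if (padded > max_seg)
            return 0;   /* would need fragmentation, out of scope here */

        /* Open a new segment when the current one cannot take the chunk. */
        if (used == 0 || seg_len[used - 1] + padded > max_seg) {
            if (used == max_segs)
                return 0;
            seg_len[used++] = 0;
        }
        seg_len[used - 1] += padded;
    }
    return used;
}
```

Because the resulting segments have uneven sizes decided by chunk boundaries, a single gso_size cut point cannot describe them, which is why the patch hands skb_segment() a frag_list of pre-sized children instead.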
* | sctp: delay as much as possible skb_linearizeMarcelo Ricardo Leitner2016-06-032-31/+43
|/ | | | | | | | | | | | | | | | | | | | | | This patch is a preparation for the GSO one. In order to successfully handle GSO packets on rx path we must not call skb_linearize, otherwise it defeats any gain GSO may have had. This patch thus delays as much as possible the call to skb_linearize, leaving it to sctp_inq_pop() moment. For that the sanity checks performed now know how to deal with fragments. One positive side-effect of this is that if the socket is backlogged it will have the chance of doing it on backlog processing instead of during softirq. With this move, it's evident that a check for non-linearity in sctp_inq_pop was ineffective and is now removed. Note that a similar check is performed a bit below this one. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_diag should dump sctp socket typeXin Long2016-05-311-0/+1
| | | | | | | | | | | | | | | | | | | | | Now we cannot distinguish whether a sk is udp or sctp style when we use ss to dump sctp_info. It's necessary to dump it as well. For sctp_diag, ss support is not officially available, thus there are no official users of this yet, so we can add this field in the middle of sctp_info without breaking user API. v1->v2: - move 'sctpi_s_type' field to the end of struct sctp_info, so that it won't cause incompatibility with applications already built. - add __reserved3 in sctp_info to make sure sctp_info is 8-byte aligned. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix double EPs display in sctp_diagXin Long2016-05-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | We have this situation: the EP hash table contains only the EPs that are listening, while the transport one has the opposite. We have to traverse both to dump everything. But when we traverse the transport hash we will also get EPs that are in the EP hash if they are listening. In this case, the EP is dumped twice. We fix it by checking whether an endpoint found in the endpoint hash table has any ep->asoc, as that means we will also find it via the transport hash, and thus we can/should skip it, depending on the filters used, like 'ss -l'. Still, we should NOT skip it if the user is listing only listening endpoints, because then we are not traversing the transport hash. So we have to check idiag_states there as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
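The skip rule described above can be written as a small predicate. This is a sketch under stated assumptions, not the code in sctp_diag.c: the ST_* bit values and the name skip_ep_in_ep_hash() are hypothetical, and it assumes the transport hash is walked whenever the requested state mask contains anything besides LISTEN and CLOSE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative state bits only; the real request uses an idiag_states mask. */
enum {
    ST_LISTEN      = 1 << 0,
    ST_CLOSE       = 1 << 1,
    ST_ESTABLISHED = 1 << 2
};

/*
 * Decide whether an endpoint found in the endpoint hash should be skipped.
 * It is skipped only when both hold:
 *   - it has associations, so the transport hash walk will report it, and
 *   - the request asks for more than LISTEN/CLOSE, so that walk happens.
 */
static bool skip_ep_in_ep_hash(bool ep_has_asocs, uint32_t idiag_states)
{
    bool walks_transport_hash =
        (idiag_states & ~(uint32_t)(ST_LISTEN | ST_CLOSE)) != 0;

    return ep_has_asocs && walks_transport_hash;
}
```

With a listening-only request (the 'ss -l' case) the predicate returns false even for an endpoint that has associations, so the endpoint is still dumped exactly once.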
* sctp: prepare for socket backlog behavior changeEric Dumazet2016-05-021-0/+2
| | | | | | | | | | | | sctp_inq_push() will soon be called without BH being blocked when generic socket code flushes the socket backlog. It is very possible SCTP can be converted to not rely on BH, but this needs to be done by SCTP experts. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: signal sk_data_ready earlier on data chunks receptionMarcelo Ricardo Leitner2016-05-012-13/+19
| | | | | | | | | | | | | | | Dave Miller pointed out that fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") may insert latency, especially if the receiving application is running on another CPU, and that it would be better if we signalled as early as possible. This patch thus basically inverts the logic of fb586f25300f and signals it as early as possible, similar to what we had before. Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") Reported-by: Dave Miller <davem@davemloft.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename NET_{ADD|INC}_STATS_BH()Eric Dumazet2016-04-271-1/+1
| | | | | | | | Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename ICMP6_INC_STATS_BH()Eric Dumazet2016-04-271-1/+1
| | | | | | | Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sctp: rename SCTP_INC_STATS_BH()Eric Dumazet2016-04-271-6/+6
| | | | | | | Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename ICMP_INC_STATS_BH()Eric Dumazet2016-04-271-1/+1
| | | | | | | Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: snmp: kill various STATS_USER() helpersEric Dumazet2016-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | In the old days (before linux-3.0), SNMP counters were duplicated, one for user context, and one for BH context. After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%") we have a single copy, and what really matters is preemption being enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc() respectively. We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(), NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(), SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(), UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER() Following patches will rename __BH helpers to make clear their usage is not tied to BH being disabled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_diag should fill RMEM_ALLOC with asoc->rmem_alloc when rcvbuf_policy is setXin Long2016-04-261-1/+5
| | | | | | | | | | | For an sctp assoc, when rcvbuf_policy is set, it will have its own rmem_alloc. When we dump asoc info in sctp_diag, we should use that value for RMEM_ALLOC as well, just like WMEM_ALLOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
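A minimal sketch of the selection this commit describes, using stand-in types: struct sock_sketch, struct asoc_sketch and diag_rmem_alloc() are hypothetical and only mirror the field names mentioned in the message. The assumption is that the per-association counter is reported when rcvbuf_policy is set and the socket-wide counter otherwise.

```c
#include <stdbool.h>

/* Stand-in for the socket: holds the socket-wide receive memory counter. */
struct sock_sketch {
    int sk_rmem_alloc;
};

/* Stand-in for the association: has its own counter under rcvbuf_policy. */
struct asoc_sketch {
    bool rcvbuf_policy;
    int rmem_alloc;
    const struct sock_sketch *sk;
};

/* Pick the value to report as RMEM_ALLOC when dumping one association. */
static int diag_rmem_alloc(const struct asoc_sketch *asoc)
{
    if (asoc->rcvbuf_policy)
        return asoc->rmem_alloc;       /* per-association accounting */
    return asoc->sk->sk_rmem_alloc;    /* shared, per-socket accounting */
}
```

Without this distinction, every association on a socket with rcvbuf_policy set would report the same socket-wide figure, hiding which association is actually holding receive memory.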
* sock_diag: align nlattr properly when neededNicolas Dichtel2016-04-261-2/+3
| | | | | | | | | I also fix the value of INET_DIAG_MAX. It's wrong since commit 8f840e47f190 which is only in net-next right now, thus I didn't make a separate patch. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-04-234-33/+40
|\ | | | | | | | | | | | | | | | | | | | | Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>