summaryrefslogtreecommitdiffstats
path: root/net/netfilter
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: nft_set_bitmap: incorrect bitmap sizePablo Neira Ayuso2017-02-261-1/+1
| | | | | | | | priv->bitmap_size stores the real bitmap size, instead of the full struct nft_bitmap object. Fixes: 665153ff5752 ("netfilter: nf_tables: add bitmap set type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.Jarno Rajahalme2017-02-261-2/+2
| | | | | | | | | | | | | Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void") inadvertently changed the successful return value of nf_ct_expect_related_report() from 0 to 1 due to __nf_ct_expect_check() returning 1 on success. Prevent this regression in the future by changing the return value of __nf_ct_expect_check() to 0 on success. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_ct_expect: nf_ct_expect_related_report(): Return zero on success.Jarno Rajahalme2017-02-251-1/+1
| | | | | | | | | | | | | | Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void") inadvertently changed the successful return value of nf_ct_expect_related_report() from 0 to 1, which caused openvswitch conntrack integration fail in FTP test cases. Fix this by always returning zero on the success code path. Fixes: 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nft_ct: fix random validation errors for zone set supportFlorian Westphal2017-02-231-0/+1
| | | | | | | | | | | Dan reports: net/netfilter/nft_ct.c:549 nft_ct_set_init() error: uninitialized symbol 'len'. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: edee4f1e924582 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2017-02-236-39/+81
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Revisit warning logic when not applying default helper assignment. Jiri Kosina considers we are breaking existing setups and not warning our users accordinly now that automatic helper assignment has been turned off by default. So let's make him happy by spotting the warning by when we find a helper but we cannot attach, instead of warning on the former deprecated behaviour. Patch from Jiri Kosina. 2) Two patches to fix regression in ctnetlink interfaces with nfnetlink_queue. Specifically, perform more relaxed in CTA_STATUS and do not bail out if CTA_HELP indicates the same helper that we already have. Patches from Kevin Cernekee. 3) A couple of bugfixes for ipset via Jozsef Kadlecsik. Due to wrong index logic in hash set types and null pointer exception in the list:set type. 4) hashlimit bails out with correct userspace parameters due to wrong arithmetics in the code that avoids "divide by zero" when transforming the userspace timing in milliseconds to token credits. Patch from Alban Browaeys. 5) Fix incorrect NFQA_VLAN_MAX definition, patch from Ken-ichirou MATSUZAWA. 6) Don't not declare nfnetlink batch error list as static, since this may be used by several subsystems at the same time. Patch from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso2017-02-212-4/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jozsef Kadlecsik says: ==================== ipset patches for nf Please apply the next patches for ipset in your nf branch. Both patches should go into the stable kernel branches as well, because these are important bugfixes: * Sometimes valid entries in hash:* types of sets were evicted due to a typo in an index. The wrong evictions happen when entries are deleted from the set and the bucket is shrinked. Bug was reported by Eric Ewanco and the patch fixes netfilter bugzilla id #1119. * Fixing of a null pointer exception when someone wants to add an entry to an empty list type of set and specifies an add before/after option. The fix is from Vishwanath Pai. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: ipset: Null pointer exception in ipset list:setVishwanath Pai2017-02-191-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we use before/after to add an element to an empty list it will cause a kernel panic. $> cat crash.restore create a hash:ip create b hash:ip create test list:set timeout 5 size 4 add test b before a $> ipset -R < crash.restore Executing the above will crash the kernel. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| | * Fix bug: sometimes valid entries in hash:* types of sets were evictedJozsef Kadlecsik2017-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrong index was used and therefore when shrinking a hash bucket at deleting an entry, valid entries could be evicted as well. Thanks to Eric Ewanco for the thorough bugreport. Fixes netfilter bugzilla #1119 Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| * | netfilter: nfnetlink: remove static declaration from err_listLiping Zhang2017-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise, different subsys will race to access the err_list, with holding the different nfnl_lock(subsys_id). But this will not happen now, since ->call_batch is only implemented by nftables, so the err_list is protected by nfnl_lock(NFNL_SUBSYS_NFTABLES). Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: xt_hashlimit: Fix integer divide round to zero.Alban Browaeys2017-02-191-16/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Diving the divider by the multiplier before applying to the input. When this would "divide by zero", divide the multiplier by the divider first then multiply the input by this value. Currently user2creds outputs zero when input value is bigger than the number of slices and lower than scale. This as then user input is applied an integer divide operation to a number greater than itself (scale). That rounds up to zero, then we multiply zero by the credits slice size. iptables -t filter -I INPUT --protocol tcp --match hashlimit --hashlimit 40/second --hashlimit-burst 20 --hashlimit-mode srcip --hashlimit-name syn-flood --jump RETURN thus trigger the overflow detection code: xt_hashlimit: overflow, try lower: 25000/20 (25000 as hashlimit avg and 20 the burst) Here: 134217 slices of (HZ * CREDITS_PER_JIFFY) size. 500000 is user input value 1000000 is XT_HASHLIMIT_SCALE_v2 gives: 0 as user2creds output Setting burst to "1" typically solve the issue ... but setting it to "40" does too ! This is on 32bit arch calling into revision 2 of hashlimit. Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: Fix regression in CTA_HELP processingKevin Cernekee2017-02-061-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to Linux 4.4, it was usually harmless to send a CTA_HELP attribute containing the name of the current helper. That is no longer the case: as of Linux 4.4, if ctnetlink_change_helper() returns an error from the ct->master check, processing of the request will fail, skipping the NFQA_EXP attribute (if present). This patch changes the behavior to improve compatibility with user programs that expect the kernel interface to work the way it did prior to Linux 4.4. If a user program specifies CTA_HELP but the argument matches the current conntrack helper name, ignore it instead of generating an error. Signed-off-by: Kevin Cernekee <cernekee@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ctnetlink: Fix regression in CTA_STATUS processingKevin Cernekee2017-02-061-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The libnetfilter_conntrack userland library always sets IPS_CONFIRMED when building a CTA_STATUS attribute. If this toggles the bit from 0->1, the parser will return an error. On Linux 4.4+ this will cause any NFQA_EXP attribute in the packet to be ignored. This breaks conntrackd's userland helpers because they operate on unconfirmed connections. Instead of returning -EBUSY if the user program asks to modify an unchangeable bit, simply ignore the change. Also, fix the logic so that user programs are allowed to clear the bits that they are allowed to change. Signed-off-by: Kevin Cernekee <cernekee@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_ct_helper: warn when not applying default helper assignmentJiri Kosina2017-02-061-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3bb398d925 ("netfilter: nf_ct_helper: disable automatic helper assignment") is causing behavior regressions in firewalls, as traffic handled by conntrack helpers is now by default not passed through even though it was before due to missing CT targets (which were not necessary before this commit). The default had to be switched off due to security reasons [1] [2] and therefore should stay the way it is, but let's be friendly to firewall admins and issue a warning the first time we're in situation where packet would be likely passed through with the old default but we're likely going to drop it on the floor now. Rewrite the code a little bit as suggested by Linus, so that we avoid spaghettiing the code even more -- namely the whole decision making process regarding helper selection (either automatic or not) is being separated, so that the whole logic can be simplified and code (condition) duplication reduced. [1] https://cansecwest.com/csw12/conntrack-attack.pdf [2] https://home.regit.org/netfilter-en/secure-use-of-helpers/ Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: honor NFT_SET_OBJECT in set backend selectionPablo Neira Ayuso2017-02-123-3/+4
| | | | | | | | | | | | | | Check for NFT_SET_OBJECT feature flag, otherwise we may end up selecting the wrong set backend. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: add NFTA_RULE_ID attributePablo Neira Ayuso2017-02-121-0/+26
| | | | | | | | | | | | | | | | | | | | This new attribute allows us to uniquely identify a rule in transaction. Robots may trigger an insertion followed by deletion in a batch, in that scenario we still don't have a public rule handle that we can use to delete the rule. This is similar to the NFTA_SET_ID attribute that allows us to refer to an anonymous set from a batch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: add check_genid to the nfnetlink subsystemPablo Neira Ayuso2017-02-121-0/+6
| | | | | | | | | | | | | | | | This patch implements the check generation id as provided by nfnetlink. This allows us to reject ruleset updates against stale baseline, so userspace can retry update with a fresh ruleset cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nfnetlink: allow to check for generation IDPablo Neira Ayuso2017-02-121-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows userspace to specify the generation ID that has been used to build an incremental batch update. If userspace specifies the generation ID in the batch message as attribute, then nfnetlink compares it to the current generation ID so you make sure that you work against the right baseline. Otherwise, bail out with ERESTART so userspace knows that its changeset is stale and needs to respin. Userspace can do this transparently at the cost of taking slightly more time to refresh caches and rework the changeset. This check is optional, if there is no NFNL_BATCH_GENID attribute in the batch begin message, then no check is performed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nfnetlink: add nfnetlink_rcv_skb_batch()Pablo Neira Ayuso2017-02-121-23/+28
| | | | | | | | | | | | | | Add new nfnetlink_rcv_skb_batch() to wrap initial nfnetlink batch handling. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nfnetlink: get rid of u_intX_t typesPablo Neira Ayuso2017-02-121-8/+8
| | | | | | | | | | | | Use uX types instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_ct_expect: nf_ct_expect_insert() returns voidGao Feng2017-02-121-5/+3
| | | | | | | | | | | | | | | | | | Because nf_ct_expect_insert() always succeeds now, its return value can be just void instead of int. And remove code that checks for its return value. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_ct_sip: Use mod_timer_pending()Gao Feng2017-02-121-7/+5
| | | | | | | | | | | | | | | | timer_del() followed by timer_add() can be replaced by mod_timer_pending(). Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nft_exthdr: add TCP option matchingManuel Messner2017-02-082-15/+108
| | | | | | | | | | | | | | | | | | This patch implements the kernel side of the TCP option patch. Signed-off-by: Manuel Messner <mm@skelett.io> Reviewed-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nft_ct: add zone id set supportFlorian Westphal2017-02-081-1/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zones allow tracking multiple connections sharing identical tuples, this is needed e.g. when tracking distinct vlans with overlapping ip addresses (conntrack is l2 agnostic). Thus the zone has to be set before the packet is picked up by the connection tracker. This is done by means of 'conntrack templates' which are conntrack structures used solely to pass this info from one netfilter hook to the next. The iptables CT target instantiates these connection tracking templates once per rule, i.e. the template is fixed/tied to particular zone, can be read-only and therefore be re-used by as many skbs simultaneously as needed. We can't follow this model because we want to take the zone id from an sreg at rule eval time so we could e.g. fill in the zone id from the packets vlan id or a e.g. nftables key : value maps. To avoid cost of per packet alloc/free of the template, use a percpu template 'scratch' object and use the refcount to detect the (unlikely) case where the template is still attached to another skb (i.e., previous skb was nfqueued ...). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nft_ct: prepare for key-dependent error unwindFlorian Westphal2017-02-081-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Next patch will add ZONE_ID set support which will need similar error unwind (put operation) as conntrack labels. Prepare for this: remove the 'label_got' boolean in favor of a switch statement that can be extended in next patch. As we already have that in the set_destroy function place that in a separate function and call it from the set init function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nft_ct: add zone id get supportFlorian Westphal2017-02-081-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | Just like with counters the direction attribute is optional. We set priv->dir to MAX unconditionally to avoid duplicating the assignment for all keys with optional direction. For keys where direction is mandatory, existing code already returns an error. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: add bitmap set typePablo Neira Ayuso2017-02-083-0/+321
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new bitmap set type. This bitmap uses two bits to represent one element. These two bits determine the element state in the current and the future generation that fits into the nf_tables commit protocol. When dumping elements back to userspace, the two bits are expanded into a struct nft_set_ext object. If no NFTA_SET_DESC_SIZE is specified, the existing automatic set backend selection prefers bitmap over hash in case of keys whose size is <= 16 bit. If the set size is know, the bitmap set type is selected if with 16 bit kets and more than 390 elements in the set, otherwise the hash table set implementation is used. For 8 bit keys, the bitmap consumes 66 bytes. For 16 bit keys, the bitmap takes 16388 bytes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: add space notation to setsPablo Neira Ayuso2017-02-083-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | The space notation allows us to classify the set backend implementation based on the amount of required memory. This provides an order of the set representation scalability in terms of memory. The size field is still left in place so use this if the userspace provides no explicit number of elements, so we cannot calculate the real memory that this set needs. This also helps us break ties in the set backend selection routine, eg. two backend implementations provide the same performance. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: rename struct nft_set_estimate class fieldPablo Neira Ayuso2017-02-083-8/+8
| | | | | | | | | | | | | | Use lookup as field name instead, to prepare the introduction of the memory class in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: add flush field to struct nft_set_iterPablo Neira Ayuso2017-02-081-0/+4
| | | | | | | | | | | | | | | | | | | | This provides context to walk callback iterator, thus, we know if the walk happens from the set flush path. This is required by the new bitmap set type coming in a follow up patch which has no real struct nft_set_ext, so it has to allocate it based on the two bit compact element representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: rename deactivate_one() to flush()Pablo Neira Ayuso2017-02-083-9/+9
| | | | | | | | | | | | | | | | Although semantics are similar to deactivate() with no implicit element lookup, this is only called from the set flush path, so better rename this to flush(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: use struct nft_set_iter in set element flushPablo Neira Ayuso2017-02-081-7/+5
| | | | | | | | | | | | | | Instead of struct nft_set_dump_args, remove unnecessary wrapper structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: pass netns to set->ops->remove()Pablo Neira Ayuso2017-02-083-5/+7
| | | | | | | | | | | | | | This new parameter is required by the new bitmap set type that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nft_exthdr: Add support for existence checkPhil Sutter2017-02-081-2/+20
| | | | | | | | | | | | | | | | | | If NFT_EXTHDR_F_PRESENT is set, exthdr will not copy any header field data into *dest, but instead set it to 1 if the header is found and 0 otherwise. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2017-02-0331-513/+461
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from sk_buff so we only access one single cacheline in the conntrack hotpath. Patchset from Florian Westphal. 2) Don't leak pointer to internal structures when exporting x_tables ruleset back to userspace, from Willem DeBruijn. This includes new helper functions to copy data to userspace such as xt_data_to_user() as well as conversions of our ip_tables, ip6_tables and arp_tables clients to use it. Not surprinsingly, ebtables requires an ad-hoc update. There is also a new field in x_tables extensions to indicate the amount of bytes that we copy to userspace. 3) Add nf_log_all_netns sysctl: This new knob allows you to enable logging via nf_log infrastructure for all existing netnamespaces. Given the effort to provide pernet syslog has been discontinued, let's provide a way to restore logging using netfilter kernel logging facilities in trusted environments. Patch from Michal Kubecek. 4) Validate SCTP checksum from conntrack helper, from Davide Caratti. 5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly a copy&paste from the original helper, from Florian Westphal. 6) Reset netfilter state when duplicating packets, also from Florian. 7) Remove unnecessary check for broadcast in IPv6 in pkttype match and nft_meta, from Liping Zhang. 8) Add missing code to deal with loopback packets from nft_meta when used by the netdev family, also from Liping. 9) Several cleanups on nf_tables, one to remove unnecessary check from the netlink control plane path to add table, set and stateful objects and code consolidation when unregister chain hooks, from Gao Feng. 10) Fix harmless reference counter underflow in IPVS that, however, results in problems with the introduction of the new refcount_t type, from David Windsor. 11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp, from Davide Caratti. 12) Missing documentation on nf_tables uapi header, from Liping Zhang. 13) Use rb_entry() helper in xt_connlimit, from Geliang Tang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: allow logging from non-init namespacesMichal Kubeček2017-02-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 69b34fb996b2 ("netfilter: xt_LOG: add net namespace support for xt_LOG") disabled logging packets using the LOG target from non-init namespaces. The motivation was to prevent containers from flooding kernel log of the host. The plan was to keep it that way until syslog namespace implementation allows containers to log in a safe way. However, the work on syslog namespace seems to have hit a dead end somewhere in 2013 and there are users who want to use xt_LOG in all network namespaces. This patch allows to do so by setting /proc/sys/net/netfilter/nf_log_all_netns to a nonzero value. This sysctl is only accessible from init_net so that one cannot switch the behaviour from inside a container. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | ipvs: free ip_vs_dest structs when refcnt=0David Windsor2017-02-021-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the ip_vs_dest cache frees ip_vs_dest objects when their reference count becomes < 0. Aside from not being semantically sound, this is problematic for the new type refcount_t, which will be introduced shortly in a separate patch. refcount_t is the new kernel type for holding reference counts, and provides overflow protection and a constrained interface relative to atomic_t (the type currently being used for kernel reference counts). Per Julian Anastasov: "The problem is that dest_trash currently holds deleted dests (unlinked from RCU lists) with refcnt=0." Changing dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest structs when their refcnt=0, in ip_vs_dest_put_and_free(). Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: merge ctinfo into nfct pointer storage areaFlorian Westphal2017-02-024-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After this change conntrack operations (lookup, creation, matching from ruleset) only access one instead of two sk_buff cache lines. This works for normal conntracks because those are allocated from a slab that guarantees hw cacheline or 8byte alignment (whatever is larger) so the 3 bits needed for ctinfo won't overlap with nf_conn addresses. Template allocation now does manual address alignment (see previous change) on arches that don't have sufficent kmalloc min alignment. Some spots intentionally use skb->_nfct instead of skb_nfct() helpers, this is to avoid undoing the skb_nfct() use when we remove untracked conntrack object in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: guarantee 8 byte minalign for template addressesFlorian Westphal2017-02-021-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next change will merge skb->nfct pointer and skb->nfctinfo status bits into single skb->_nfct (unsigned long) area. For this to work nf_conn addresses must always be aligned at least on an 8 byte boundary since we will need the lower 3bits to store nfctinfo. Conntrack templates are allocated via kmalloc. kbuild test robot reported BUILD_BUG_ON failed: NFCT_INFOMASK >= ARCH_KMALLOC_MINALIGN on v1 of this patchset, so not all platforms meet this requirement. Do manual alignment if needed, the alignment offset is stored in the nf_conn entry protocol area. This works because templates are not handed off to L4 protocol trackers. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: add and use nf_ct_set helperFlorian Westphal2017-02-023-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff. This avoids changing code in followup patch that merges skb->nfct and skb->nfctinfo into skb->_nfct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | skbuff: add and use skb_nfct helperFlorian Westphal2017-02-023-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Followup patch renames skb->nfct and changes its type so add a helper to avoid intrusive rename change later. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: reduce direct skb->nfct usageFlorian Westphal2017-02-021-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | Next patch makes direct skb->nfct access illegal, reduce noise in next patch by using accessors we already have. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: no need to pass ctinfo to error handlerFlorian Westphal2017-02-025-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is never accessed for reading and the only places that write to it are the icmp(6) handlers, which also set skb->nfct (and skb->nfctinfo). The conntrack core specifically checks for attached skb->nfct after ->error() invocation and returns early in this case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()Feng2017-02-021-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If something fails in nf_tables_table_enable(), it unregisters the chains. But the rollback code is the same as nf_tables_table_disable() almostly, except there is one counter check. Now create one wrapper function to eliminate the duplicated codes. Signed-off-by: Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: eliminate useless condition checksGao Feng2017-01-181-12/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value of nf_tables_table_lookup() is valid pointer or one pointer error. There are two cases: 1) IS_ERR(table) is true, it would return the error or reset the table as NULL, it is unnecessary to perform the latter check "table != NULL". 2) IS_ERR(obj) is false, the table is one valid pointer. It is also unnecessary to perform that check. The nf_tables_newset() and nf_tables_newobj() have same logic codes. In summary, we could move the block of condition check "table != NULL" in the else block to eliminate the original condition checks. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev familyLiping Zhang2017-01-181-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After adding the following nft rule, then ping 224.0.0.1: # nft add rule netdev t c pkttype host counter The warning complain message will be printed out again and again: WARNING: CPU: 0 PID: 10182 at net/netfilter/nft_meta.c:163 \ nft_meta_get_eval+0x3fe/0x460 [nft_meta] [...] Call Trace: <IRQ> dump_stack+0x85/0xc2 __warn+0xcb/0xf0 warn_slowpath_null+0x1d/0x20 nft_meta_get_eval+0x3fe/0x460 [nft_meta] nft_do_chain+0xff/0x5e0 [nf_tables] So we should deal with PACKET_LOOPBACK in netdev family too. For ipv4, convert it to PACKET_BROADCAST/MULTICAST according to the destination address's type; For ipv6, convert it to PACKET_MULTICAST directly. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: pkttype: unnecessary to check ipv6 multicast addressLiping Zhang2017-01-182-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since there's no broadcast address in IPV6, so in ipv6 family, the PACKET_LOOPBACK must be multicast packets, there's no need to check it again. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | xtables: extend matches and targets with .usersizeWillem de Bruijn2017-01-0911-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In matches and targets that define a kernel-only tail to their xt_match and xt_target data structs, add a field .usersize that specifies up to where data is to be shared with userspace. Performed a search for comment "Used internally by the kernel" to find relevant matches and targets. Manually inspected the structs to derive a valid offsetof. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | xtables: use match, target and data copy_to_user helpers in compatWillem de Bruijn2017-01-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | Convert compat to copying entries, matches and targets one by one, using the xt_match_to_user and xt_target_to_user helper functions. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | xtables: add xt_match, xt_target and data copy_to_user functionsWillem de Bruijn2017-01-091-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xt_entry_target, xt_entry_match and their private data may contain kernel data. Introduce helper functions xt_match_to_user, xt_target_to_user and xt_data_to_user that copy only the expected fields. These replace existing logic that calls copy_to_user on entire structs, then overwrites select fields. Private data is defined in xt_match and xt_target. All matches and targets that maintain kernel data store this at the tail of their private structure. Extend xt_match and xt_target with .usersize to limit how many bytes of data are copied. The remainder is cleared. If compatsize is specified, usersize can only safely be used if all fields up to usersize use platform-independent types. Otherwise, the compat_to_user callback must be defined. This patch does not yet enable the support logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: xt_connlimit: use rb_entry()Geliang Tang2017-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>