path: root/drivers/net/ethernet/mellanox
Commit message (Author, Age, Files, Lines)
* mlxsw: spectrum_router: Avoid potential packets loss (Ido Schimmel, 2017-03-01, 1 file, -10/+20)
    When the structure of the LPM tree changes (e.g., due to the addition of a new prefix), we unbind the old tree and then bind the new one. This may result in temporary packet loss.

    Instead, overwrite the old binding with the new one.

    Fixes: 6b75c4807db3 ("mlxsw: spectrum_router: Add virtual router management")
    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: fix overflow in mlx4_en_init_timestamp() (Eric Dumazet, 2017-02-26, 2 files, -11/+8)
    The cited commit does a great job of finding optimal shift/multiplier values assuming a 10 second wraparound, but forgot to change the overflow_period computation.

    It overflows in cyclecounter_cyc2ns(), and the final result is 804 ms, which is silly.

    Let's simply use 5 seconds; there is no need to recompute this, given how it is supposed to work.

    Later, we will use a timer instead of a work queue, since the new RX allocation scheme will no longer need mlx4_en_recover_from_oom() and the service_task firing every 250 ms.

    Fixes: 31c128b66e5b ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Tariq Toukan <tariqt@mellanox.com>
    Cc: Eugenia Emantayev <eugenia@mellanox.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
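The overflow described in this message can be illustrated with a small model of a 64-bit `(cycles * mult) >> shift` conversion. This is only a sketch: the shift, multiplier and counter width below are hypothetical values chosen to trigger the wrap, not the values the mlx4 driver computes, and the function names are made up for the illustration.

```python
U64_MASK = (1 << 64) - 1


def cyc2ns_u64(cycles: int, mult: int, shift: int) -> int:
    """Model a u64 multiply-then-shift: the product wraps modulo 2**64."""
    return ((cycles * mult) & U64_MASK) >> shift


def cyc2ns_exact(cycles: int, mult: int, shift: int) -> int:
    """Same conversion with unbounded integers, for comparison."""
    return (cycles * mult) >> shift


# Hypothetical clock parameters (assumptions, not taken from the driver).
SHIFT = 20
MULT = 2 << SHIFT        # 2 ns per cycle, i.e. a 500 MHz counter
CYCLES = (1 << 48) - 1   # a 48-bit counter at its maximum value

wrapped = cyc2ns_u64(CYCLES, MULT, SHIFT)
exact = cyc2ns_exact(CYCLES, MULT, SHIFT)
```

With these inputs the wrapped result is far smaller than the exact one, which is the kind of error that makes a period derived from the full counter range come out much too short.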
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Linus Torvalds, 2017-02-23, 16 files, -49/+73)
|\
    Pull networking fixes from David Miller:

    1) Some 'const'ing in qlogic networking drivers, from Bhumika Goyal.
    2) Fix scheduling while atomic in l2tp network namespace exit by deferring the work to the workqueue. From Ridge Kennedy.
    3) Fix use after free in dccp timewait handling, from Andrey Ryabinin.
    4) mlx5e CQE compression engine not initialized properly, from Tariq Toukan.
    5) Some UAPI header fixes from Dmitry V. Levin.
    6) Don't overwrite module parameter value in mlx4 driver, from Majd Dibbiny.
    7) Fix divide by zero in xt_hashlimit netfilter module, from Alban Browaeys.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
      bpf: Fix bpf_xdp_event_output
      net/mlx4_en: Use __skb_fill_page_desc()
      net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
      net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
      net/mlx4: Spoofcheck and zero MAC can't coexist
      net/mlx4: Change ENOTSUPP to EOPNOTSUPP
      uapi: fix linux/rds.h userspace compilation errors
      uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors
      lib: Remove string from parman config selection
      forcedeth: Remove return from a void function
      bpf: fix spelling mistake: "proccessed" -> "processed"
      uapi: fix linux/llc.h userspace compilation error
      uapi: fix linux/ip6_tunnel.h userspace compilation errors
      net/mlx5e: Fix wrong CQE decompression
      net/mlx5e: Update MPWQE stride size when modifying CQE compress state
      net/mlx5e: Fix broken CQE compression initialization
      net/mlx5e: Do not reduce LRO WQE size when not using build_skb
      net/mlx5e: Register/unregister vport representors on interface attach/detach
      net/mlx5e: s390 system compilation fix
      tcp: account for ts offset only if tsecr not zero
      ...
| * net/mlx4_en: Use __skb_fill_page_desc() (Eric Dumazet, 2017-02-23, 1 file, -4/+4)
    Or we might miss the fact that a page was allocated from memory reserves.

    Fixes: dceeab0e5258 ("mlx4: support __GFP_MEMALLOC for rx")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Use cq quota in SRIOV when creating completion EQs (Jack Morgenstein, 2017-02-23, 2 files, -4/+4)
    When creating EQs to handle CQ completion events for the PF or for VFs, we create enough EQE entries to handle completions for the max number of CQs that can use that EQ.

    When SRIOV is activated, the max number of CQs a VF (or the PF) can obtain is its CQ quota (determined by the Hypervisor resource tracker). Therefore, when creating an EQ, the number of EQE entries that the VF should request for that EQ is the CQ quota value (and not the total number of CQs available in the FW).

    Under SRIOV, the PF must also use its CQ quota, because the resource tracker also controls how many CQs the PF can obtain.

    Using the FW total CQs instead of the CQ quota when creating EQs resulted in wasting MTT entries, due to allocating more EQEs than were needed.

    Fixes: 5a0d0a6161ae ("mlx4: Structures and init/teardown for VF resource quotas")
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Reported-by: Dexuan Cui <decui@microsoft.com>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs (Majd Dibbiny, 2017-02-23, 1 file, -2/+0)
    In the VF driver, module parameter mlx4_log_num_mgm_entry_size was mistakenly overwritten -- and in a manner which overrode the device-managed flow steering option encoded in the parameter.

    log_num_mgm_entry_size is a global module parameter which affects all ConnectX-3 PFs installed on that host. If a VF changes log_num_mgm_entry_size, this will affect all PFs which are probed subsequent to the change (by disabling DMFS for those PFs).

    Fixes: 3c439b5586e9 ("mlx4_core: Allow choosing flow steering mode")
    Signed-off-by: Majd Dibbiny <majd@mellanox.com>
    Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4: Spoofcheck and zero MAC can't coexist (Eugenia Emantayev, 2017-02-23, 2 files, -7/+21)
    Spoofcheck can't be enabled if the VF MAC is zero. Conversely, the MAC can't be zeroed if spoofcheck is on.

    Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support')
    Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
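The mutual exclusion stated in this message can be modelled as two setters that each check the other's state. This is a sketch only: `VfPort`, `set_vf_mac` and `set_vf_spoofchk` are hypothetical names for the illustration, and the use of `ValueError` stands in for whatever error code the driver actually returns.

```python
class VfPort:
    """Hypothetical per-VF admin state: a MAC (0 means unset) and a spoofcheck flag."""

    def __init__(self) -> None:
        self.mac = 0
        self.spoofchk = False


def set_vf_mac(vf: VfPort, mac: int) -> None:
    # Zeroing the MAC is refused while spoofcheck is on.
    if mac == 0 and vf.spoofchk:
        raise ValueError("cannot zero the MAC while spoofcheck is enabled")
    vf.mac = mac


def set_vf_spoofchk(vf: VfPort, enable: bool) -> None:
    # Spoofcheck cannot be enabled while the MAC is zero.
    if enable and vf.mac == 0:
        raise ValueError("cannot enable spoofcheck with a zero MAC")
    vf.spoofchk = enable
```

Either order of operations then fails at the second step, so the two settings never coexist.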
| * net/mlx4: Change ENOTSUPP to EOPNOTSUPP (Or Gerlitz, 2017-02-23, 7 files, -9/+9)
    As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlx4 driver.

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Suggested-by: Yotam Gigi <yotamg@mellanox.com>
    Reviewed-by: Matan Barak <matanb@mellanox.com>
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix wrong CQE decompression (Tariq Toukan, 2017-02-23, 1 file, -7/+6)
    In CQE compression with striding RQ, the decompression of the CQE field wqe_counter was done with a wrong wraparound value. This caused CQEs to be handled with a wrong pointer to the WQE (rx descriptor), creating SKBs with wrong data that pointed to wrong (and already consumed) strides/pages.

    In striding RQ, the CQE field wqe_counter holds the stride index instead of the WQE index. Hence, when decompressing a CQE, wqe_counter should wrap around at the number of strides in a single multi-packet WQE.

    We drop this wrap-around mask entirely in CQE decompression of striding RQ. It is not needed, as in such cases the CQE compression session would break because of a different value of the wqe_id field, starting a new compression session.

    Tested:
      ethtool -K ethxx lro off/on
      ethtool --set-priv-flags ethxx rx_cqe_compress on
      super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
      verified no csum errors and no page refcount issues.

    Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Reported-by: Tom Herbert <tom@herbertland.com>
    Cc: kernel-team@fb.com
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Update MPWQE stride size when modifying CQE compress state (Saeed Mahameed, 2017-02-23, 4 files, -1/+4)
    When the admin enables/disables CQE compression, updating the MPWQE stride size is required:
      CQE compress ON  ==> stride size = 256B
      CQE compress OFF ==> stride size = 64B

    This is already done on driver load via mlx5e_set_rq_type_params; all we need is to call it on arbitrary admin changes of the CQE compression state via priv flags, or when changing the timestamping state (as it is mutually exclusive with CQE compression).

    This bug introduces no functional damage; it only makes CQE compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than the stride size.

    Tested:
      ethtool --set-priv-flags ethxx rx_cqe_compress on
      pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
      verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly)

    Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Cc: kernel-team@fb.com
    Signed-off-by: David S. Miller <davem@davemloft.net>
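The effect of a stale stride size can be sketched from the two rules this message gives: the ON/OFF stride sizes, and compression applying only to packets smaller than the stride. The helper names below are hypothetical and the model ignores everything else the hardware considers.

```python
STRIDE_SIZE_COMPRESS_ON = 256   # bytes, per the commit message
STRIDE_SIZE_COMPRESS_OFF = 64   # bytes, per the commit message


def mpwqe_stride_size(cqe_compress: bool) -> int:
    """Stride size that should be in effect for a given compression state."""
    return STRIDE_SIZE_COMPRESS_ON if cqe_compress else STRIDE_SIZE_COMPRESS_OFF


def compressible(pkt_len: int, stride_size: int) -> bool:
    """Per the message, only packets smaller than the stride size are compressed."""
    return pkt_len < stride_size


# Before the fix: compression is switched on but the stride size is still 64B.
stale = compressible(128, mpwqe_stride_size(False))
# After the fix: the stride size is refreshed when the flag changes.
fresh = compressible(128, mpwqe_stride_size(True))
```

A 128-byte packet is compressible only once the stride size has been refreshed, which matches the "occurs less often" symptom rather than a functional failure.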
| * net/mlx5e: Fix broken CQE compression initialization (Tariq Toukan, 2017-02-23, 1 file, -1/+3)
    Some of the RQ type parameters are derived from the CQE compression state flag, but the CQE compression flag was initialized only after the RQ type parameters were set up. This leads to loading the RQ with a stride size smaller than what we want when CQE compression is on.

    This bug introduces no functional damage; it only makes CQE compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than the stride size.

    Fix this by marking the default status of CQE compression in PFLAG prior to calling mlx5e_set_rq_priv_params(), as it initializes some fields based on it.

    Tested:
      load driver on systems where rx CQE compress will be on (MH)
      pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
      verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly)

    Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure")
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Cc: kernel-team@fb.com
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Do not reduce LRO WQE size when not using build_skb (Tariq Toukan, 2017-02-23, 1 file, -6/+5)
    When rq_type is Striding RQ, no room for SKB_RESERVE is needed, as SKB allocation is not done via build_skb.

    Fixes: e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size")
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Register/unregister vport representors on interface attach/detach (Saeed Mahameed, 2017-02-23, 1 file, -8/+15)
    Currently vport representors are added only on driver load and removed on driver unload. Apparently we forgot to handle them when we added the seamless reset flow feature. This left the representor netdevs alive and active with open HW resources on PCI shutdown and on error reset flows.

    To overcome this, we move their handling to interface attach/detach, so they are cleaned up on shutdown and recreated on reset flows.

    Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
    Reviewed-by: Roi Dayan <roid@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: s390 system compilation fix (Mohamad Haj Yahia, 2017-02-23, 2 files, -0/+2)
    Add the header includes needed for s390 arch compilation.

    Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
    Fixes: d605d6686dc7 ("net/mlx5e: Add support for ethtool self..")
    Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma (Linus Torvalds, 2017-02-23, 1 file, -2/+10)
|\ \
    Pull Mellanox rdma updates from Doug Ledford:

     "Mellanox specific updates for 4.11 merge window

      Because the Mellanox code required being based on a net-next tree, I kept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3.

      This branch contains:

       - Various mlx4 and mlx5 fixes and minor changes
       - Support for adding a tag match rule to flow specs
       - Support for cvlan offload operation for raw ethernet QPs
       - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code)
       - Implicit On-Demand Paging memory registration support"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
      IB/mlx5: Fix configuration of port capabilities
      IB/mlx4: Take source GID by index from HW GID table
      IB/mlx5: Fix blue flame buffer size calculation
      IB/mlx4: Remove unused variable from function declaration
      IB: Query ports via the core instead of direct into the driver
      IB: Add protocol for USNIC
      IB/mlx4: Support raw packet protocol
      IB/mlx5: Support raw packet protocol
      IB/core: Add raw packet protocol
      IB/mlx5: Add implicit MR support
      IB/mlx5: Expose MR cache for mlx5_ib
      IB/mlx5: Add null_mkey access
      IB/umem: Indicate that process is being terminated
      IB/umem: Update on demand page (ODP) support
      IB/core: Add implicit MR flag
      IB/mlx5: Support creation of a WQ with scatter FCS offload
      IB/mlx5: Enable QP creation with cvlan offload
      IB/mlx5: Enable WQ creation and modification with cvlan offload
      IB/mlx5: Expose vlan offloads capabilities
      IB/uverbs: Enable QP creation with cvlan offload
      ...
| * net/mlx5: Consolidate flow rules regardless their flow tag (Maor Gottlieb, 2017-02-14, 1 file, -2/+10)
    Flow rules with the same match criteria and value should be mapped to the same flow table entry regardless of the flow tag identifier.

    The flow tag is part of the flow table entry context and not of the destination; therefore we should return an error when we try to add a destination to a flow table entry with a different flow tag.

    Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
* | mlx4: reduce OOM risk on arches with large pages (Eric Dumazet, 2017-02-20, 1 file, -1/+2)
    Since mlx4 NICs are used on PowerPC with 64K pages, we need to adapt the MLX4_EN_ALLOC_PREFER_ORDER definition.

    Otherwise, a fragment sitting in an out-of-order TCP queue can hold 0.5 Mbytes, which is a serious OOM risk.

    Fixes: 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: fix potential divide by 0 in mlx4_en_auto_moderation() (Eric Dumazet, 2017-02-19, 1 file, -10/+14)
    1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high, mlx4_en_auto_moderation() does a divide by zero.

    2) We want to properly change the moderation parameters if rx_frames was changed (like in ethtool -C eth0 rx-frames 16)

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
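Point 1 can be illustrated with a linear interpolation between two rate thresholds that clamps at both ends before dividing. This is a sketch under assumptions: the interpolation formula and the parameter names are a plausible reading of an adaptive moderation scheme, not a copy of the driver's code, and the numbers in the checks are arbitrary.

```python
def moder_time(rate: int, pkt_rate_low: int, pkt_rate_high: int,
               rx_usecs_low: int, rx_usecs_high: int) -> int:
    """Interpolate a moderation time between two thresholds.

    Clamping first guarantees the division is reached only when
    pkt_rate_low < rate < pkt_rate_high, so the divisor is never zero.
    """
    if rate <= pkt_rate_low:
        return rx_usecs_low
    if rate >= pkt_rate_high:
        return rx_usecs_high
    return ((rate - pkt_rate_low) * (rx_usecs_high - rx_usecs_low)
            // (pkt_rate_high - pkt_rate_low)) + rx_usecs_low
```

With `rate == pkt_rate_low == pkt_rate_high` the first branch returns before any division happens, which is the case the message calls out.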
* | mlx4: do not fire tasklet unless necessary (Eric Dumazet, 2017-02-17, 2 files, -9/+6)
    All rx and tx netdev interrupts are handled respectively by mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.

    But mlx4_eq_int() also fires a tasklet to service all items that were queued via mlx4_add_cq_to_tasklet(), but this handler was not called unless user cqe was handled.

    This is very confusing, as "mpstat -I SCPU ..." shows a huge number of tasklet invocations.

    This patch saves this overhead by carefully firing the tasklet directly from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Tariq Toukan <tariqt@mellanox.com>
    Cc: Saeed Mahameed <saeedm@mellanox.com>
    Acked-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2017-02-16, 1 file, -0/+4)
|\ \
| * | net/mlx5e: Disable preemption when doing TC statistics upcall (Or Gerlitz, 2017-02-14, 1 file, -0/+4)
    When called by HW offloading drivers, the TC action (e.g. net/sched/act_mirred.c) code uses this_cpu logic, e.g.

      _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)

    Per the kernel documentation, preemption should be disabled; add that.

    Before the fix, when running with CONFIG_PREEMPT set, we get a

      BUG: using smp_processor_id() in preemptible [00000000] code: tc/3793

    assertion from the TC action (mirred) stats_update callback.

    Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: acl: Use PBS type for forward action (Jiri Pirko, 2017-02-15, 1 file, -11/+9)
    The current behaviour of the "mirred redirect" action (forward) offload is a bit odd. For matched packets, the action forwards them to the desired destination, but it also lets duplicates of the packets continue down the original path (bridge, router, etc). That is more like "mirred mirror". Fix this by using the PBS type, which behaves exactly like "mirred redirect". Note that PBS does not support loopback mode.

    Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlx4: do not use rwlock in fast path (Eric Dumazet, 2017-02-15, 2 files, -18/+19)
    Using a reader-writer lock in fast path is silly, when we can instead use RCU or a seqlock.

    For mlx4 hwstamp clock, a seqlock is the way to go, removing two atomic operations and false sharing.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Tariq Toukan <tariqt@mellanox.com>
    Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
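The seqlock idea mentioned here is that readers take no lock: they sample a sequence counter, read the data, and retry if the counter changed or was odd. The class below is a single-threaded model of that protocol for illustration only; it is not thread-safe Python, and the names (`SeqClock`, `read_begin`, `read_retry`) are hypothetical rather than the kernel's API.

```python
class SeqClock:
    """Single-threaded model of a sequence-counter-protected value."""

    def __init__(self) -> None:
        self.seq = 0      # even: stable, odd: write in progress
        self.value = 0

    def write(self, value: int) -> None:
        self.seq += 1     # becomes odd, readers that started earlier must retry
        self.value = value
        self.seq += 1     # becomes even again

    def read_begin(self) -> int:
        return self.seq

    def read_retry(self, start: int) -> bool:
        # Retry if a write was in progress at the start or happened since.
        return bool(start & 1) or self.seq != start

    def read(self) -> int:
        while True:
            start = self.read_begin()
            value = self.value
            if not self.read_retry(start):
                return value
```

The reader path only loads the counter and the data, which is why it avoids the atomic read-modify-write operations a reader-writer lock needs on every read.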
* | | mlxsw: spectrum: Change ipv6 unregistered mc table (Nogah Frankel, 2017-02-14, 1 file, -1/+0)
    Point the unregistered IPv6 mc table back to the bc table. This is done because IPv6 mcast snooping is not yet supported for Spectrum.

    Reported-by: Jiri Pirko <jiri@mellanox.com>
    Fixes: 71c365bdc439 ("mlxsw: spectrum: Separate bc and mc floods")
    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Tested-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW (Jiri Pirko, 2017-02-10, 1 file, -3/+10)
    HW does not understand ETH_P_ALL. So treat this special case differently and translate to 0/0 key/mask. That will allow HW to match all ethertypes.

    Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
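The translation described here can be sketched as a key/mask pair where a zero mask matches every ethertype. The helper names and the full-width default mask are assumptions for the illustration; only the ETH_P_ALL to 0/0 mapping comes from the message.

```python
ETH_P_ALL = 0x0003   # "every packet" pseudo-protocol from if_ether.h
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD


def ethertype_key_mask(n_proto: int, n_proto_mask: int = 0xFFFF) -> tuple:
    """Return the (key, mask) pair to program for a requested protocol."""
    if n_proto == ETH_P_ALL:
        return 0, 0          # wildcard: a zero mask ignores every bit
    return n_proto, n_proto_mask


def matches(ethertype: int, key: int, mask: int) -> bool:
    """Masked comparison, as a TCAM-style matcher would do it."""
    return (ethertype & mask) == (key & mask)
```

Programming ETH_P_ALL literally would instead match only frames whose ethertype is 0x0003, which is why the special case is needed.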
* | | mlxsw: spectrum: Update mc_disabled flag by switchdev attr (Nogah Frankel, 2017-02-10, 1 file, -0/+28)
    Add a function to update mc_disabled from switchdev attr SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum: Extend port_orig_get for bridge devices (Nogah Frankel, 2017-02-10, 1 file, -0/+13)
    The function mlxsw_sp_port_orig_get returns the vport from the physical port if needed, based on the original device. This patch addresses the case where the original device is a bridge. If it is a VLAN-unaware bridge, it returns the matching vport. If it is a VLAN-aware bridge, there is no matching vport, and it returns the original port.

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum: Add an option to flood mc by mc_router_port (Nogah Frankel, 2017-02-10, 3 files, -3/+41)
    The decision whether to flood a multicast packet to a port depends on three flags: mc_disabled, mc_router_port, mc_flood.

    If mc_disabled is on, the port will be flooded according to mc_flood; otherwise, according to mc_router_port.

    To accomplish that, add those flags into the mlxsw_sp_port struct and update the mc flood table accordingly.

    Update mc_router_port by switchdev attribute SWITCHDEV_ATTR_ID_PORT_MC_ROUTER_PORT.

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
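The three-flag rule in this message reduces to a single selection. The function below is a direct restatement of that rule; its name is hypothetical and it says nothing about how the driver writes the result into the flood table.

```python
def should_flood_mc(mc_disabled: bool, mc_router_port: bool, mc_flood: bool) -> bool:
    """Decide whether unknown multicast is flooded to a port.

    With multicast snooping disabled, the per-port mc_flood setting decides;
    with snooping enabled, only multicast router ports are flooded.
    """
    return mc_flood if mc_disabled else mc_router_port
```

For example, a port with `mc_flood` set but not marked as a router port is flooded only while `mc_disabled` is on.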
* | | mlxsw: spectrum: Separate bc and mc floods (Nogah Frankel, 2017-02-10, 3 files, -13/+35)
    Break the bm (broadcast-multicast) table into two tables: one for broadcast (and link-local multicast, which behaves like bc) and one for unknown multicast.

    Add a bool named mc_flood to mlxsw_sp_port that reflects the value this port should have in the mc flood table (currently, always 1).

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum: Change max vfid (Nogah Frankel, 2017-02-10, 1 file, -1/+1)
    A user that wants many bridges will use 1.Q bridges, which are scalable. One can have as many 1.Q bridges as vfids. This patch sets their number to 1k, which is a reasonably large number.

    This change is done here because the next patches will add a new flood table, and without this change the new table would increase the overall size of the flood tables dramatically.

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum: Make port flood update more generic (Nogah Frankel, 2017-02-10, 1 file, -13/+13)
    Currently, there is a per port flood update function only for the UC table. Make the function more generic by changing the table type to be an input.

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum: Break flood set func to be per table (Nogah Frankel, 2017-02-10, 1 file, -20/+34)
    Currently, the flood set function can't operate on only one table, but sets both uc_flood and mb_flood together. This patch creates a function that sets the flood state per table.

    Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Add support for route replace (Ido Schimmel, 2017-02-10, 1 file, -7/+37)
    Upon the reception of an ENTRY_REPLACE notification, resolve the FIB node corresponding to the prefix and length and insert the new route before the first matching entry.

    Since the notification also signals the deletion of the replaced route, delete it from the driver's cache.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Add support for route append (Ido Schimmel, 2017-02-10, 1 file, -6/+37)
    When a new route is appended, it's placed after existing routes sharing the same parameters (prefix, length, table ID, TOS and priority).

    While the device supports only one route with the same prefix and length in a single table, it's important to correctly place the appended route in the driver's cache, as when a route is deleted the next one is programmed into the device.

    Following the reception of an ENTRY_APPEND notification, resolve the FIB node corresponding to the prefix and length and correctly place the new entry in its entry list.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Correctly handle identical routes (Ido Schimmel, 2017-02-10, 1 file, -178/+403)
    In the device, routes are indexed in a routing table based on the prefix and its length. This is in contrast to the kernel's FIB where several FIB aliases can exist with these parameters being identical. In such cases, the routes will be sorted by table ID (LOCAL first, then MAIN), TOS and finally priority (metric).

    During lookup, these routes will be evaluated in order. In case the packet's TOS field is non-zero and a FIB alias with a matching TOS is found, then it's selected. Otherwise, the lookup defaults to the route with TOS 0 (if it exists). However, if the requested scope is narrower than the one found, then the lookup continues.

    To best reflect the kernel's datapath we should take the above into account. Given a prefix and its length, the reflected route will always be the first one in the FIB alias list. However, if the route has a non-zero TOS then its action will be converted to trap instead of forward, since we currently don't support TOS-based routing. If this turns out to be a real issue, we can add support for that using policy-based switching.

    The route's scope can be effectively ignored as any packet being routed by the device would've been looked-up using the widest scope (UNIVERSE).

    To achieve that we need to do two changes. Firstly, we need to create another struct (FIB node) that will hold the list of FIB entries sharing the same prefix and length. This struct will be hashed using these two parameters. Secondly, we need to change the route reflection to match the above logic, so that the first FIB entry in the list will be programmed into the device while the rest will remain in the driver's cache in case of subsequent changes.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
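The FIB node described above can be sketched as a per-prefix list kept in lookup order, with only the head reflected to the device. This is an illustrative model: the class and field names are hypothetical, and the sort directions for TOS (descending) and priority (ascending) are assumptions; the message itself only states that LOCAL comes before MAIN, then TOS, then priority.

```python
from dataclasses import dataclass, field

RT_TABLE_LOCAL = 255
RT_TABLE_MAIN = 254


@dataclass(frozen=True)
class FibEntry:
    tb_id: int
    tos: int
    prio: int
    name: str


def sort_key(entry: FibEntry) -> tuple:
    # LOCAL (255) before MAIN (254); assumed: higher TOS first, lower metric first.
    return (-entry.tb_id, -entry.tos, entry.prio)


@dataclass
class FibNode:
    """All entries sharing one prefix and prefix length."""
    prefix: str
    prefix_len: int
    entries: list = field(default_factory=list)

    def insert(self, entry: FibEntry) -> None:
        self.entries.append(entry)
        self.entries.sort(key=sort_key)  # stable: equal keys keep insertion order

    def remove(self, entry: FibEntry) -> None:
        self.entries.remove(entry)

    def programmed(self):
        """Return (entry, action) for the single route reflected to the device."""
        if not self.entries:
            return None
        first = self.entries[0]
        # TOS-based routing is not offloaded, so such routes trap to the CPU.
        return first, ("trap" if first.tos != 0 else "forward")
```

Keeping the non-head entries in the list is what lets the next route be programmed immediately when the head is deleted.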
* | | mlxsw: spectrum_router: Don't reflect LINKDOWN nexthops (Ido Schimmel, 2017-02-08, 1 file, -1/+10)
    The kernel resolves the nexthops for a given route using FIB_LOOKUP_IGNORE_LINKSTATE which means a notification can be sent for a route with one of its nexthops being LINKDOWN.

    In case IGNORE_ROUTES_WITH_LINKDOWN is set for the nexthop netdev, then we shouldn't reflect the nexthop to the device's table.

    Once the nexthop netdev's carrier goes up we'll be notified using NH_ADD and reflect it to the device.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Flush resources when RIF is deleted (Ido Schimmel, 2017-02-08, 3 files, -3/+93)
    When the last IP address is removed from a netdev, its RIF is deleted. However, if the user didn't first remove neighbours and nexthops using this interface, then they would still be present in the device's tables.

    Therefore, whenever a RIF is deleted, make sure all the neighbours and nexthops (adjacency entries) using it are removed from the relevant tables as well.

    The action associated with any route using this RIF would be refreshed, most likely to trap. If the kernel decides to remove the route (e.g., because all the nexthops are now DEAD), then an event would be sent, causing the route to be removed from the device.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Reflect nexthop status changes (Ido Schimmel, 2017-02-08, 1 file, -2/+57)
    When a packet hits a multipath route in the device's routing table, a hash is computed over its headers, which is then used to select the appropriate nexthop from the device's adjacency table.

    There are situations in which the kernel removes a nexthop from a multipath route (e.g., no carrier) and the device should do the same.

    Upon the reception of NH_{ADD,DEL} events, add or remove a nexthop from the device's adjacency table and refresh all the routes using the nexthop group.

    If all the nexthops of a multipath route are invalid, then any packet hitting the route would be trapped to the CPU for forwarding. If all the nexthops are DEAD, then the kernel would remove the route entirely. On the other hand, if all the nexthops are merely LINKDOWN, then the kernel would keep the route and forward any incoming packet using a different route.

    While the last case might sound like a problem, it's expected that a routing daemon running in user space would remove such a route from the FIB as it's dumped with the DEAD flag set.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
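The event handling in this message can be modelled as a nexthop group whose membership follows NH_ADD and NH_DEL, with an empty group falling back to trapping. This is a sketch under assumptions: `NexthopGroup` and its methods are hypothetical names, and the modulo over a precomputed flow hash is only a stand-in for however the device maps its header hash onto adjacency entries.

```python
class NexthopGroup:
    """Hypothetical model of the usable nexthops behind one multipath route."""

    TRAP = "trap-to-cpu"

    def __init__(self, nexthops=()):
        self.valid = list(nexthops)

    def nh_add(self, nexthop: str) -> None:
        # NH_ADD: the nexthop became usable again (e.g. carrier is back).
        if nexthop not in self.valid:
            self.valid.append(nexthop)

    def nh_del(self, nexthop: str) -> None:
        # NH_DEL: the kernel dropped the nexthop (e.g. no carrier).
        if nexthop in self.valid:
            self.valid.remove(nexthop)

    def select(self, flow_hash: int) -> str:
        # With no valid nexthop left, packets are trapped to the CPU.
        if not self.valid:
            return self.TRAP
        return self.valid[flow_hash % len(self.valid)]
```

After every add or delete, routes using the group would be refreshed so that their action reflects whether any nexthop is still usable.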
* | | mlxsw: spectrum_router: Use trap action only for some route types (Ido Schimmel, 2017-02-08, 1 file, -29/+13)
    The device can have one of three actions associated with a route:

    1) Remote - packets continue to the adjacency table
    2) Local - packets continue to the neighbour table
    3) Trap - packets continue to the CPU

    The first two actions can also trap packets to the CPU, but they do so using a different trap ID, which has a lower traffic class and less allotted bandwidth.

    We currently use the third action for both RTN_{LOCAL,BROADCAST} routes and RTN_UNICAST routes not pointing to the switch ports. However, packets that merely need to be forwarded by the switch are likely not control packets and can be therefore scheduled towards the CPU using a lower traffic class.

    Achieve the above by assigning the third action only to local and broadcast routes and have any other route use either of the first two actions, based on whether the route is gatewayed or not.

    This will also allow us to refresh routes using the local action and have them trap packets when their RIF is no longer valid following a NH_DEL event.

    One side effect of this patch is that we no longer give special treatment to multipath routes using both switch and non-switch ports towards their nexthops. If at least one of the nexthops can be resolved, then the device will forward the packets instead of trapping them.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Determine offload status using generic function  Ido Schimmel  2017-02-08  1  -1/+1
| | |
| | | The previous patch introduced a generic function to determine whether a
| | | route should be offloaded or not. Make use of it here.
| | |
| | | In the future we're going to add more conditions to this test (e.g.,
| | | whether TOS is non-zero), so it makes sense to centralize it instead of
| | | open coding it in a few places.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: More accurately set offload flag  Ido Schimmel  2017-02-08  1  -20/+80
| | |
| | | We currently set the RTNH_F_OFFLOAD flag for all routes using remote
| | | action, but this isn't always correct. If none of the nexthops
| | | associated with a gatewayed route can be offloaded into the device, then
| | | any packet hitting it would be trapped to the CPU and forwarded by the
| | | kernel.
| | |
| | | Solve this by pushing the setting of the offload flag to after the route
| | | was programmed into the device, thereby allowing us to take all the
| | | parameters into account.
| | |
| | | This change will also help us further in the patchset, when we refresh
| | | routes following the reception of NH_{ADD,DEL} events.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Refactor nexthop init routine  Ido Schimmel  2017-02-08  1  -31/+49
| | |
| | | The nexthop init and de-init functions both have symmetric parts
| | | concerned with the reflection of the neighbour entry into the device's
| | | adjacency table, in case it's used by a gatewayed route.
| | |
| | | These sections of code also need to be called when a nexthop is marked
| | | as valid / invalid following NH_{ADD,DEL} events.
| | |
| | | Break these out into appropriate functions, so that they could be
| | | invoked following the reception of above events.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Remove FIB info from FIB entry struct  Ido Schimmel  2017-02-08  1  -3/+1
| | |
| | | After the previous changes, the FIB info is embedded in every nexthop
| | | group struct, which in turn is embedded in every FIB entry struct.
| | |
| | | We can therefore safely remove the FIB info from the entry struct. This
| | | has the added advantage of making the router-related structs more
| | | generic and suitable for use with IPv6 offloads.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Store routes in a more generic way  Ido Schimmel  2017-02-08  1  -13/+27
| | |
| | | Up until now, the only FIB entries that were associated with a nexthop
| | | group were routes to remote networks where all the nexthop devices had a
| | | valid router interface (RIF).
| | |
| | | This is in contrast to the FIB code, where all the routes are associated
| | | with a FIB info. The same design choice needs to be applied to the
| | | driver's cache.
| | |
| | | Based on the NH_{ADD,DEL} events which will be added later in the
| | | patchset, we need to be able to change the action (forward / trap)
| | | associated with all the routes using the nexthop group. However, if we
| | | can't link between the nexthop and the routes using it, then the above
| | | is impossible.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Add gateway indication to nexthop group  Ido Schimmel  2017-02-08  1  -1/+8
| | |
| | | The next patch is going to generalize the way in which we store routes.
| | | Instead of attaching a nexthop group only to gatewayed routes, one will
| | | be attached to each route, in a similar way to the way the FIB code
| | | stores its routes.
| | |
| | | The above means that any function operating on a nexthop group cannot
| | | assume the group represents only gatewayed nexthops. One such function
| | | is the one that refreshes a nexthop group and updates the adjacency
| | | table following nexthop changes.
| | |
| | | For a nexthop group that doesn't represent any gateways this function
| | | would essentially be a NOP, but it would be useful if it did update the
| | | action associated with any route using it. This will allow us to later
| | | consolidate code paths when a nexthop changes following NH_{ADD,DEL}
| | | events.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Use nexthop's scope to set action type  Ido Schimmel  2017-02-08  1  -1/+1
| | |
| | | We currently use the scope of the FIB info to distinguish between a
| | | direct unicast route and a gatewayed one. However, the kernel is
| | | perfectly happy to configure a route with scope UNIVERSE to a directly
| | | connected network.
| | |
| | | Instead, we can rely on the first nexthop's scope to check if the route
| | | is gatewayed or not.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Store nexthops in a hash table  Ido Schimmel  2017-02-08  2  -4/+55
| | |
| | | Later in the patchset we'll add the NH_{ADD,DEL} events which will let
| | | us know when a nexthop is considered to be dead. Based on these events
| | | we need to be able to add or remove the nexthop from the device's
| | | tables.
| | |
| | | Therefore, store the private nexthop structs in a hash table and use the
| | | kernel's fib_nh struct as the key, so that we'll be able to easily find
| | | them when the events are received.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Store nexthop groups in a hash table  Ido Schimmel  2017-02-08  2  -52/+54
| | |
| | | Currently, when we're notified about a new RTN_UNICAST route we perform
| | | a lookup on the nexthop group list looking for a group with a matching
| | | configuration to that found in the FIB info. This is quite inefficient.
| | |
| | | Instead, we can simply rely on the kernel to consolidate several FIB
| | | configurations into the same FIB info and use the FIB info as the key
| | | for our private nexthop group struct.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: spectrum_router: Nullify nexthop's neigh pointer  Ido Schimmel  2017-02-08  1  -2/+3
| | |
| | | When we invalidate a nexthop we should also invalidate its neighbour
| | | entry pointer as it might be destroyed later on.
| | |
| | | This makes the nexthop de-init function symmetric with its init and also
| | | ensures nobody will try to access the neighbour entry.
| | |
| | | Signed-off-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlxsw: acl: Fix mlxsw_afa_block_commit error path  Jiri Pirko  2017-02-08  1  -11/+7
| | |
| | | No rollback is needed since the chain is in consistent state and
| | | mlxsw_afa_block_destroy() will take care of putting it away. So remove
| | | the one we have now which is wrong.
| | |
| | | Also move the set of 'finished' flag to the beginning of the function,
| | | because the block is certainly unusable for future action addition no
| | | matter if the function succeeds or not.
| | |
| | | Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
| | | Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
| | | Signed-off-by: Jiri Pirko <jiri@mellanox.com>
| | | Acked-by: Ido Schimmel <idosch@mellanox.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>