summaryrefslogtreecommitdiffstats
path: root/drivers/nvdimm/nd.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds2018-04-101-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
| * libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'Dan Williams2018-04-031-1/+0
| | | | | | | | | | | | | | | | | | | | | | For debug, it is useful for bus providers to be able to retrieve the 'struct device' associated with an nd_region instance that it registered. We already have to_nd_region() to perform the reverse cast operation, in fact its duplicate declaration can be removed from the private drivers/nvdimm/nd.h header. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>Bart Van Assche2018-03-171-1/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It happens often while I'm preparing a patch for a block driver that I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT available for this driver? Do I have to introduce definitions of these constants before I can use these constants? To avoid this confusion, move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the <linux/blkdev.h> header file such that these become available for all block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h header file conditional to avoid that including that header file after <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Cc: David S. Miller <davem@davemloft.net> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* memremap: change devm_memremap_pages interface to use struct dev_pagemapChristoph Hellwig2018-01-081-5/+4
| | | | | | | | | | | | | | | | | This new interface is similar to how struct device (and many others) work. The caller initializes a 'struct dev_pagemap' as required and calls 'devm_memremap_pages'. This allows the pagemap structure to be embedded in another structure and thus container_of can be used. In this way application specific members can be stored in a containing struct. This will be used by the P2P infrastructure and HMM could probably be cleaned up to use it as well (instead of having it's own, similar 'hmm_devmem_pages_create' function). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: move poison list functions to a new 'badrange' fileDave Jiang2017-11-021-6/+0
| | | | | | | | | | | | | nfit_test needs to use the poison list manipulation code as well. Make it more generic and in the process rename poison to badrange, and move all the related helpers to a new file. Signed-off-by: Dave Jiang <dave.jiang@intel.com> [vishal: Add badrange.o to nfit_test's Kbuild] [vishal: add a missed include in bus.c for the new badrange functions] [vishal: rename all instances of 'be' to 'bre'] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, dimm: clear 'locked' status on successful DIMM enableDan Williams2017-09-281-0/+1
| | | | | | | | | | | If we successfully enable a DIMM then it must not be locked and we can clear the label-read failure condition. Otherwise, we need to reload the entire bus provider driver to achieve the same effect, and that can disrupt unrelated DIMMs and namespaces. Fixes: 9d62ed965118 ("libnvdimm: handle locked label storage areas") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Merge tag 'libnvdimm-for-4.14' of ↵Linus Torvalds2017-09-111-4/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...
| * libnvdimm, label: fix index block size calculationDan Williams2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old calculation assumed that the label space was 128k and the label size is 128. With v1.2 labels where the label size is 256 this calculation will return zero. We are saved by the fact that the nsindex_size is always pre-initialized from a previous 128 byte assumption and we are lucky that the index sizes turn out the same. Fix this going forward in case we start encountering different geometries of label areas besides 128k. Since the label size can change from one call to the next, drop the caching of nsindex_size. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm: rename nd_sector_size_{show,store} to nd_size_select_{show,store}Dan Williams2017-08-111-3/+3
| | | | | | | | | | | | | | | | Prepare for other another consumer of this size selection scheme that is not a 'sector size'. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * nfit, libnvdimm, region: export 'position' in mapping infoDan Williams2017-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | It is useful to be able to know the position of a DIMM in an interleave-set. Consider the case where the order of the DIMMs changes causing a namespace to be invalidated because the interleave-set cookie no longer matches. If the before and after state of each DIMM position is known this state debugged by the system owner. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm: Stop using HPAGE_SIZEOliver O'Halloran2017-07-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more in common with THP than hugetlbfs we should proably be using HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just give it a new name. The other usage of HPAGE_SIZE in libnvdimm is when determining how large the altmap should be. For the reasons mentioned above it doesn't really make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to use in generic code and it happens to match the vmemmap allocation block on x86 and Power. It's still a hack, but it's a slightly nicer hack. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig2017-08-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: pass in queue to inflight accountingJens Axboe2017-08-091-2/+3
|/ | | | | | | | | No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* libnvdimm, btt: BTT updates for UEFI 2.7 formatVishal Verma2017-06-291-0/+3
| | | | | | | | | | | | The UEFI 2.7 specification defines an updated BTT metadata format, bumping the revision to 2.0. Add support for the new format, while retaining compatibility for the old 1.1 format. Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Linda Knippers <linda.knippers@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pmem: Add sysfs notifications to badblocksToshi Kani2017-06-151-0/+1
| | | | | | | | | | | | | | | Sysfs "badblocks" information may be updated during run-time that: - MCE, SCI, and sysfs "scrub" may add new bad blocks - Writes and ioctl() may clear bad blocks Add support to send sysfs notifications to sysfs "badblocks" file under region and pmem directories when their badblocks information is re-evaluated (but is not necessarily changed) during run-time. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, label: add address abstraction identifiersDan Williams2017-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | Starting with v1.2 labels, 'address abstractions' can be hinted via an address abstraction id that implies an info-block format. The standard address abstraction in the specification is the v2 format of the Block-Translation-Table (BTT). Support for that is saved for a later patch, for now we add support for the Linux supported address abstractions BTT (v1), PFN, and DAX. The new 'holder_class' attribute for namespace devices is added for tooling to specify the 'abstraction_guid' to store in the namespace label. For v1.1 labels this field is undefined and any setting of 'holder_class' away from the default 'none' value will only have effect until the driver is unloaded. Setting 'holder_class' requires that whatever device tries to claim the namespace must be of the specified class. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, label: honor the lba size specified in v1.2 labelsDan Williams2017-06-151-0/+1
| | | | | | | | | | | Previously we only honored the lba size for blk-aperture mode namespaces. For pmem namespaces the lba size was just assumed to be 512. With the new v1.2 label definition and compatibility with other operating environments, the ->lbasize property is now respected for pmem namespaces. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, label: add v1.2 interleave-set-cookie algorithmDan Williams2017-06-151-1/+2
| | | | | | | | | | The interleave-set-cookie algorithm is extended to incorporate all the same components that are used to generate an nvdimm unique-id. For backwards compatibility we still maintain the old v1.1 definition. Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com> Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, label: add v1.2 nvdimm label definitionsDan Williams2017-06-151-1/+7
| | | | | | | | | | | | | | | | | | | | | | In support of improved interoperability between operating systems and pre-boot environments the Intel proposed NVDIMM Namespace Specification [1], has been adopted and modified to the the UEFI 2.7 NVDIMM Label Protocol [2]. Update the definitions of the namespace label data structures so that the new format can be supported alongside the existing label format. The new specification changes the default label size to 256 bytes, so everywhere that relied on sizeof(struct nd_namespace_label) must now use the sizeof_namespace_label() helper. There should be no functional differences from these changes as the default is still the v1.1 128-byte format. Future patches will move the default to the v1.2 definition. [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf [2]: http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: add an atomic vs process context flag to rw_bytesVishal Verma2017-05-101-0/+1
| | | | | | | | | | | | | | | nsio_rw_bytes can clear media errors, but this cannot be done while we are in an atomic context due to locking within ACPI. From the BTT, ->rw_bytes may be called either from atomic or process context depending on whether the calls happen during initialization or during IO. During init, we want to ensure error clearing happens, and the flag marking process context allows nsio_rw_bytes to do that. When called during IO, we're in atomic context, and error clearing can be skipped. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKEDDan Williams2017-05-041-0/+1
| | | | | | | | | | | | This is a preparation patch for handling locked nvdimm label regions, a new concept as introduced by the latest DSM document on pmem.io [1]. A future patch will leverage nvdimm_set_locked() at DIMM probe time to flag regions that can not be enabled. There should be no functional difference resulting from this change. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: add mechanism to publish badblocks at the region levelDave Jiang2017-04-121-0/+1
| | | | | | | | | | badblocks sysfs file will be export at region level. When nvdimm event notifier happens for NVDIMM_REVALIATE_POISON, the badblocks in the region will be updated. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* nfit, libnvdimm: fix interleave set cookie calculationDan Williams2017-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interleave-set cookie is a sum that sanity checks the composition of an interleave set has not changed from when the namespace was initially created. The checksum is calculated by sorting the DIMMs by their location in the interleave-set. The comparison for the sort must be 64-bit wide, not byte-by-byte as performed by memcmp() in the broken case. Fix the implementation to accept correct cookie values in addition to the Linux "memcmp" order cookies, but only allow correct cookies to be generated going forward. It does mean that namespaces created by third-party-tooling, or created by newer kernels with this fix, will not validate on older kernels. However, there are a couple mitigating conditions: 1/ platforms with namespace-label capable NVDIMMs are not widely available. 2/ interleave-sets with a single-dimm are by definition not affected (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case. The cookie stored in the namespace label will be fixed by any write the namespace label, the most straightforward way to achieve this is to write to the "alt_name" attribute of a namespace in sysfs. Cc: <stable@vger.kernel.org> Fixes: eaf961536e16 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure") Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: allow a platform to force enable label supportDan Williams2016-10-191-0/+1
| | | | | | | | | | | | | | | Platforms like QEMU-KVM implement an NFIT table and label DSMs. However, since that environment does not define an aliased configuration, the labels are currently ignored and the kernel registers a single full-sized pmem-namespace per region. Now that the kernel supports sub-divisions of pmem regions the labels have a purpose. Arrange for the labels to be honored when we find an existing / valid namespace index block. Cc: <qemu-devel@nongnu.org> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: use generic iostat interfacesToshi Kani2016-10-191-2/+9
| | | | | | | | | | | | | | | nd_iostat_start() and nd_iostat_end() implement the same functionality that generic_start_io_acct() and generic_end_io_acct() already provide. Change nd_iostat_start() and nd_iostat_end() to call the generic iostat interfaces. There is no change in the nd interfaces. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Merge branch 'for-4.9/libnvdimm' into libnvdimm-for-nextDan Williams2016-10-071-3/+26
|\
| * libnvdimm, label: convert label tracking to a linked listDan Williams2016-09-301-4/+12
| | | | | | | | | | | | | | | | | | In preparation for enabling multiple namespaces per pmem region, convert the label tracking to use a linked list. In particular this will allow select_pmem_id() to move labels from the unvalidated state to the validated state. Currently we only track one validated set per-region. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm, region: move region-mapping input-paramters to nd_mapping_descDan Williams2016-09-301-0/+14
| | | | | | | | | | | | | | | | | | Before we add more libnvdimm-private fields to nd_mapping make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm: Fix nvdimm_probe error on NVDIMM-NToshi Kani2016-09-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'ndctl list --buses --dimms' does not list any NVDIMM-Ns since they are considered as idle. ndctl checks if any driver is attached to nmem device. nvdimm_probe() always fails in nvdimm_init_nsarea() since NVDIMM-Ns do not implement optinal ND_CMD_GET_CONFIG_DATA command. Change nvdimm_probe() to accept the case that the CONFIG_DATA command is not implemented for NVDIMM-Ns. The driver attaches without ndd, which keeps it no-op to the device. Reported-by: Brian Boylston <brian.boylston@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | libnvdimm, region: fix flush hint table thinkoDan Williams2016-09-241-2/+20
|/ | | | | | | | | | | | | | | | | The definition of the flush hint table as: void __iomem *flush_wpq[0][0]; ...passed the unit test, but is broken as flush_wpq[0][1] and flush_wpq[1][0] refer to the same entry. Fix this to use a helper that calculates a slot in the table based on the geometry of flush hints in the region. This is important to get right since virtualization solutions use this mechanism to trigger hypervisor flushes to platform persistence. Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* nvdimm, btt: add a size attribute for BTTsVishal Verma2016-08-081-0/+1
| | | | | | | | | | To be consistent with other namespaces, expose a 'size' attribute for BTT devices also. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: cycle flush hintsDan Williams2016-07-111-0/+1
| | | | | | | | | | | | | | | When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, let the cost be shared in parallel on multiple cpus. Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, nfit: move flush hint mapping to region-device driver-dataDan Williams2016-07-111-3/+5
| | | | | | | | | | | | | | In preparation for triggering flushes of a DIMM's writes-posted-queue (WPQ) via the pmem driver move mapping of flush hint addresses to the region driver. Since this uses devm_nvdimm_memremap() the flush addresses will remain mapped while any region to which the dimm belongs is active. We need to communicate more information to the nvdimm core to facilitate this mapping, namely each dimm object now carries an array of flush hint address resources. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, nfit: remove nfit_spa_map() infrastructureDan Williams2016-07-111-1/+0
| | | | | | | | Now that all shared mappings are handled by devm_nvdimm_memremap() we no longer need nfit_spa_map() nor do we need to trigger a callback to the bus provider at region disable time. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, dax: autodetect supportDan Williams2016-05-201-2/+9
| | | | | | | | | For autodetecting a previously established dax configuration we need the info block to indicate block-device vs device-dax mode, and we need to have the default namespace probe hand-off the configuration to the dax_pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, dax: introduce device-dax infrastructureDan Williams2016-05-091-0/+25
| | | | | | | | | | | Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and mapped without need of an intervening file system. This initial infrastructure arranges for a libnvdimm pfn-device to be represented as a different device-type so that it can be attached to a driver other than the pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pmem, pfn: move pfn setup to the coreDan Williams2016-04-221-0/+7
| | | | | | | | | Now that pmem internals have been disentangled from pfn setup, that code can move to the core. This is in preparation for adding another user of the pfn-device capabilities. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pmem, pfn: make pmem_rw_bytes generic and refactor pfn setupDan Williams2016-04-221-7/+33
| | | | | | | | | | | | | | | | | In preparation for providing an alternative (to block device) access mechanism to persistent memory, convert pmem_rw_bytes() to nsio_rw_bytes(). This allows ->rw_bytes() functionality without requiring a 'struct pmem_device' to be instantiated. In other words, when ->rw_bytes() is in use i/o is driven through 'struct nd_namespace_io', otherwise it is driven through 'struct pmem_device' and the block layer. This consolidates the disjoint calls to devm_exit_badblocks() and devm_memunmap() into a common devm_nsio_disable() and cleans up the init path to use a unified pmem_attach_disk() implementation. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, btt, convert nd_btt_probe() to devmDan Williams2016-04-221-2/+4
| | | | | | | | | Pass the device performing the probe so we can use a devm allocation for the btt superblock. Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pfn, convert nd_pfn_probe() to devmDan Williams2016-04-221-2/+4
| | | | | | | | | Pass the device performing the probe so we can use a devm allocation for the pfn superblock. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pmem: kill pmem->ndnsDan Williams2016-04-221-1/+1
| | | | | | | | We can derive the common namespace from other information. We also do not need to cache it because all the usages are in slow paths. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignmentDan Williams2016-04-071-2/+2
| | | | | | | | | | | | | | | When section alignment padding is in effect we need to shift / truncate the range that is queried for poison by the 'start_pad' or 'end_trunc' reservations. It's easiest if we just pass in an adjusted resource range rather than deriving it from the passed in namespace. With the resource range resolution pushed out to the caller we can also push the namespace-to-region lookup to the caller and drop the implicit pmem-type assumption about the passed in namespace object. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm, pmem: clear poison on writeDan Williams2016-03-091-0/+2
| | | | | | | | | | | | | | | If a write is directed at a known bad block perform the following: 1/ write the data 2/ send a clear poison command 3/ invalidate the poison out of the cache hierarchy Cc: <x86@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* libnvdimm: async notification supportDan Williams2016-03-051-0/+2
| | | | | | | | In preparation for asynchronous address range scrub support add an ability for the pmem driver to dynamically consume address range scrub results. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Merge branch 'for-4.5/block-dax' into for-4.5/libnvdimmDan Williams2016-01-101-0/+8
|\
| * libnvdimm: convert to statically allocated badblocksDan Williams2016-01-091-2/+2
| | | | | | | | | | | | | | | | | | | | If a device will ever have badblocks it should always have a badblocks instance available. So, similar to md, embed a badblocks instance in pmem_device. This reduces pointer chasing in the i/o fast path, and simplifies the init path. Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.hDan Williams2016-01-091-0/+2
| | | | | | | | | | | | | | nd-core.h is private to the libnvdimm core internals and should not be used by drivers. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm: Add a poison list and export badblocksVishal Verma2016-01-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | During region creation, perform Address Range Scrubs (ARS) for the SPA (System Physical Address) ranges to retrieve known poison locations from firmware. Add a new data structure 'nd_poison' which is used as a list in nvdimm_bus to store these poison locations. When creating a pmem namespace, if there is any known poison associated with its physical address space, convert the poison ranges to bad sectors that are exposed using the badblocks interface. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZEDan Williams2015-12-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | When setting aside capacity for struct page it must be aligned to the largest mapping size that is to be made available via DAX. Make the alignment configurable to enable support for 1GiB page-size mappings. The offset for PFN_MODE_RAM may now be larger than SZ_8K, so fixup the offset check in nvdimm_namespace_attach_pfn(). Reported-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | libnvdimm, pfn: kill ND_PFN_ALIGNDan Williams2015-12-101-7/+0
|/ | | | | | | The alignment constraint isn't necessary now that devm_memremap_pages() allows for unaligned mappings. Signed-off-by: Dan Williams <dan.j.williams@intel.com>