summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* nfs: Fix bdi handling for cloned superblocksJan Kara2017-05-042-21/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I have wrongly cloned bdi reference in nfs_clone_super(). Further inspection has shown that originally the code was actually allocating a new bdi (in ->clone_server callback) which was later registered in nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb(). This could later result in bdi for the original superblock not getting unregistered when that superblock got shutdown (as the cloned sb still held bdi reference) and later when a new superblock was created under the same anonymous device number, a clash in sysfs has happened on bdi registration: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74 sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32' Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14 Hardware name: Allwinner sun7i (A20) Family [<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14) [<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c) [<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100) [<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48) [<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74) [<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94) [<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec) [<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98) [<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0) [<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc) [<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28) [<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc) [<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4) [<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204) [<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8) [<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58) [<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4) [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128) [<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0) [<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c) [<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4) [<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4) [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128) [<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c) [<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4) [<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c) Fix the problem by always creating new bdi for a superblock as we used to do. Reported-and-tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-05-0139-257/+186
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
| * block: get rid of blk_integrity_revalidate()Ilya Dryomov2017-04-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") introduced blk_integrity_revalidate(), which seems to assume ownership of the stable pages flag and unilaterally clears it if no blk_integrity profile is registered: if (bi->profile) disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; else disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; It's called from revalidate_disk() and rescan_partitions(), making it impossible to enable stable pages for drivers that support partitions and don't use blk_integrity: while the call in revalidate_disk() can be trivially worked around (see zram, which doesn't support partitions and hence gets away with zram_revalidate_disk()), rescan_partitions() can be triggered from userspace at any time. This breaks rbd, where the ceph messenger is responsible for generating/verifying CRCs. Since blk_integrity_{un,}register() "must" be used for (un)registering the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES setting there. This way drivers that call blk_integrity_register() and use integrity infrastructure won't interfere with drivers that don't but still want stable pages. Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.4+, needs backporting Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * scsi: introduce a result field in struct scsi_requestChristoph Hellwig2017-04-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This passes on the scsi_cmnd result field to users of passthrough requests. Currently we abuse req->errors for this purpose, but that field will go away in its current form. Note that the old IDE code abuses the errors field in very creative ways and stores all kinds of different values in it. I didn't dare to touch this magic, so the abuses are brought forward 1:1. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: remove the blk_execute_rq return valueChristoph Hellwig2017-04-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | The function only returns -EIO if rq->errors is non-zero, which is not very useful and lets a large number of callers ignore the return value. Just let the callers figure out their error themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * bdi: Drop 'parent' argument from bdi_register[_va]()Jan Kara2017-04-201-1/+1
| | | | | | | | | | | | | | | | | | Drop 'parent' argument of bdi_register() and bdi_register_va(). It is always NULL. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * fs: Remove SB_I_DYNBDI flagJan Kara2017-04-204-7/+1
| | | | | | | | | | | | | | | | | | Now that all bdi structures filesystems use are properly refcounted, we can remove the SB_I_DYNBDI flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * ubifs: Convert to separately allocated bdiJan Kara2017-04-202-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Richard Weinberger <richard@nod.at> CC: Artem Bityutskiy <dedekind1@gmail.com> CC: Adrian Hunter <adrian.hunter@intel.com> CC: linux-mtd@lists.infradead.org Acked-by: Richard Weinberger <richard@nod.at> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nfs: Convert to separately allocated bdiJan Kara2017-04-204-35/+28
| | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Anna Schumaker <anna.schumaker@netapp.com> CC: linux-nfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * ncpfs: Convert to separately allocated bdiJan Kara2017-04-202-7/+2
| | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Petr Vandrovec <petr@vandrovec.name> Acked-by: Petr Vandrovec <petr@vandrovec.name> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nilfs2: Convert to properly refcounting bdiJan Kara2017-04-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Similarly to set_bdev_super() NILFS2 just used block device reference to bdi. Convert it to properly getting bdi reference. The reference will get automatically dropped on superblock destruction. CC: linux-nilfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Jens Axboe <axboe@fb.com>
| * gfs2: Convert to properly refcounting bdiJan Kara2017-04-201-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Similarly to set_bdev_super() GFS2 just used block device reference to bdi. Convert it to properly getting bdi reference. The reference will get automatically dropped on superblock destruction. CC: Steven Whitehouse <swhiteho@redhat.com> CC: Bob Peterson <rpeterso@redhat.com> CC: cluster-devel@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * fuse: Get rid of bdi_initializedJan Kara2017-04-203-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | It is not needed anymore since bdi is initialized whenever superblock exists. CC: Miklos Szeredi <miklos@szeredi.hu> CC: linux-fsdevel@vger.kernel.org Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * fuse: Convert to separately allocated bdiJan Kara2017-04-203-36/+17
| | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Miklos Szeredi <miklos@szeredi.hu> CC: linux-fsdevel@vger.kernel.org Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * exofs: Convert to separately allocated bdiJan Kara2017-04-202-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Boaz Harrosh <ooo@electrozaur.com> CC: Benny Halevy <bhalevy@primarydata.com> Acked-by: Boaz Harrosh <ooo@electrozaur.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * coda: Convert to separately allocated bdiJan Kara2017-04-201-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Jan Harkes <jaharkes@cs.cmu.edu> CC: coda@cs.cmu.edu CC: codalist@coda.cs.cmu.edu Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * afs: Convert to separately allocated bdiJan Kara2017-04-203-10/+4
| | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: David Howells <dhowells@redhat.com> CC: linux-afs@lists.infradead.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * ecryptfs: Convert to separately allocated bdiJan Kara2017-04-202-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Tyler Hicks <tyhicks@canonical.com> CC: ecryptfs@vger.kernel.org Acked-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * cifs: Convert to separately allocated bdiJan Kara2017-04-203-12/+6
| | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Steve French <sfrench@samba.org> CC: linux-cifs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * ceph: Convert to separately allocated bdiJan Kara2017-04-204-28/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside client structure. This unifies handling of bdi among users. CC: Ilya Dryomov <idryomov@gmail.com> CC: "Yan, Zheng" <zyan@redhat.com> CC: Sage Weil <sage@redhat.com> CC: ceph-devel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * btrfs: Convert to separately allocated bdiJan Kara2017-04-203-30/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Chris Mason <clm@fb.com> CC: Josef Bacik <jbacik@fb.com> CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * 9p: Convert to separately allocated bdiJan Kara2017-04-203-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Allocate struct backing_dev_info separately instead of embedding it inside session. This unifies handling of bdi among users. CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * fs: Get proper reference for s_bdiJan Kara2017-04-201-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | So far we just relied on block device to hold a bdi reference for us while the filesystem is mounted. While that works perfectly fine, it is a bit awkward that we have a pointer to a refcounted structure in the superblock without proper reference. So make s_bdi hold a proper reference to block device's BDI. No filesystem using mount_bdev() actually changes s_bdi so this is safe and will make bdev filesystems work the same way as filesystems needing to set up their private bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * fs: Provide infrastructure for dynamic BDIs in filesystemsJan Kara2017-04-201-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide helper functions for setting up dynamically allocated backing_dev_info structures for filesystems and cleaning them up on superblock destruction. CC: linux-mtd@lists.infradead.org CC: linux-nfs@vger.kernel.org CC: Petr Vandrovec <petr@vandrovec.name> CC: linux-nilfs@vger.kernel.org CC: cluster-devel@redhat.com CC: osd-dev@open-osd.org CC: codalist@coda.cs.cmu.edu CC: linux-afs@lists.infradead.org CC: ecryptfs@vger.kernel.org CC: linux-cifs@vger.kernel.org CC: ceph-devel@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: v9fs-developer@lists.sourceforge.net CC: lustre-devel@lists.lustre.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block_dev: use blkdev_issue_zerout for hole punchesChristoph Hellwig2017-04-081-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This gets us support for non-discard efficient write of zeroes (e.g. NVMe) and prepares for removing the discard_zeroes_data flag. Also remove a pointless discard support check, which is done in blkdev_issue_discard already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: add a flags argument to (__)blkdev_issue_zerooutChristoph Hellwig2017-04-083-3/+3
| | | | | | | | | | | | | | | | | | | | Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with similar semantics, but without referring to diѕcard. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * Merge branch 'for-linus' into for-4.12/blockJens Axboe2017-04-0720-209/+167
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | We've added a considerable amount of fixes for stalls and issues with the blk-mq scheduling in the 4.11 series since forking off the for-4.12/block branch. We need to do improvements on top of that for 4.12, so pull in the previous fixes to make our lives easier going forward. Signed-off-by: Jens Axboe <axboe@fb.com>
| * | block: Fix oops in locked_inode_to_wb_and_lock_list()Jan Kara2017-03-221-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When block device is closed, we call inode_detach_wb() in __blkdev_put() which sets inode->i_wb to NULL. That is contrary to expectations that inode->i_wb stays valid once set during the whole inode's lifetime and leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because inode_to_wb() returned NULL. The reason why we called inode_detach_wb() is not valid anymore though. BDI is guaranteed to stay along until we call bdi_put() from bdev_evict_inode() so we can postpone calling inode_detach_wb() to that moment. Also add a warning to catch if someone uses inode_detach_wb() in a dangerous way. Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | block: Fix bdi assignment to bdev inode when racing with disk deleteJan Kara2017-03-221-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When disk->fops->open() in __blkdev_get() returns -ERESTARTSYS, we restart the process of opening the block device. However we forget to switch bdev->bd_bdi back to noop_backing_dev_info and as a result bdev inode will be pointing to a stale bdi. Fix the problem by setting bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | Merge branch 'for-linus-4.11' of ↵Linus Torvalds2017-04-281-4/+7
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "We have one more fix for btrfs. This gets rid of a new WARN_ON from rc1 that ended up making more noise than we really want. The larger fix for the underflow got delayed a bit and it's better for now to put it under CONFIG_BTRFS_DEBUG" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: qgroup: move noisy underflow warning to debugging build
| * | | btrfs: qgroup: move noisy underflow warning to debugging buildDavid Sterba2017-04-191-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The WARN_ON and warning from report_reserved_underflow can become very noisy and is visible unconditionally although this is namely for debugging. The patch "btrfs: Add WARN_ON for qgroup reserved underflow" (18dc22c19bef520cca11ce4c0807ac9dec48d31f) went to 4.11-rc1 and the plan was to get the fix as well, but this hasn't happened. CC: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-04-273-8/+51
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd fixes from Bruce Fields: "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs in our NFSv2/v3 xdr code that could crash the server or leak memory" * tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux: nfsd: stricter decoding of write-like NFSv2/v3 ops nfsd4: minor NFSv2/v3 write decoding cleanup nfsd: check for oversized NFSv2/v3 arguments
| * | | | nfsd: stricter decoding of write-like NFSv2/v3 opsJ. Bruce Fields2017-04-252-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFSv2/v3 code does not systematically check whether we decode past the end of the buffer. This generally appears to be harmless, but there are a few places where we do arithmetic on the pointers involved and don't account for the possibility that a length could be negative. Add checks to catch these. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Reviewed-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd4: minor NFSv2/v3 write decoding cleanupJ. Bruce Fields2017-04-252-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a couple shortcuts that will simplify a following bugfix. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd: check for oversized NFSv2/v3 argumentsJ. Bruce Fields2017-04-251-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A client can append random data to the end of an NFSv2 or NFSv3 RPC call without our complaining; we'll just stop parsing at the end of the expected data and ignore the rest. Encoded arguments and replies are stored together in an array of pages, and if a call is too large it could leave inadequate space for the reply. This is normally OK because NFS RPC's typically have either short arguments and long replies (like READ) or long arguments and short replies (like WRITE). But a client that sends an incorrectly long reply can violate those assumptions. This was observed to cause crashes. Also, several operations increment rq_next_page in the decode routine before checking the argument size, which can leave rq_next_page pointing well past the end of the page array, causing trouble later in svc_free_pages. So, following a suggestion from Neil Brown, add a central check to enforce our expectation that no NFSv2/v3 call has both a large call and a large reply. As followup we may also want to rewrite the encoding routines to check more carefully that they aren't running off the end of the page array. We may also consider rejecting calls that have any extra garbage appended. That would be safer, and within our rights by spec, but given the age of our server and the NFS protocol, and the fact that we've never enforced this before, we may need to balance that against the possibility of breaking some oddball client. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-clientLinus Torvalds2017-04-271-12/+10
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fix from Ilya Dryomov: "A fix for a kernel stack overflow bug in ceph setattr code, marked for stable" * tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client: ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
| * | | | | ceph: fix recursion between ceph_set_acl() and __ceph_setattr()Yan, Zheng2017-04-251-12/+10
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_set_acl() calls __ceph_setattr() if the setacl operation needs to modify inode's i_mode. __ceph_setattr() updates inode's i_mode, then calls posix_acl_chmod(). The problem is that __ceph_setattr() calls posix_acl_chmod() before sending the setattr request. The get_acl() call in posix_acl_chmod() can trigger a getxattr request. The reply of the getxattr request can restore inode's i_mode to its old value. The set_acl() call in posix_acl_chmod() sees old value of inode's i_mode, so it calls __ceph_setattr() again. Cc: stable@vger.kernel.org # needs backporting for < 4.9 Link: http://tracker.ceph.com/issues/19688 Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Tested-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2017-04-272-12/+19
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - fix orangefs handling of faults on write() - I'd missed that one back when orangefs was going through review. - readdir counterpart of "9p: cope with bogus responses from server in p9_client_{read,write}" - server might be lying or broken, and we'd better not overrun the kmalloc'ed buffer we are copying the results into. - NFS O_DIRECT read/write can leave iov_iter advanced by too much; that's what had been causing iov_iter_pipe() warnings davej had been seeing. - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should go in before 4.11. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: change the type of struct statx_timestamp.tv_nsec to unsigned fix nfs O_DIRECT advancing iov_iter too much p9_client_readdir() fix orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
| * | | | | fix nfs O_DIRECT advancing iov_iter too muchAl Viro2017-04-171-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It leaves the iterator advanced by the amount of IO it has requested instead of the amount actually transferred. Among other things, that confuses the hell out of generic_file_splice_read(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | orangefs_bufmap_copy_from_iovec(): fix EFAULT handlingAl Viro2017-04-171-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | short copy here should mean instant EFAULT, not "move to the next page and hope it fails there, this time with nothing copied" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | statx: correct error handling of NULL pathnameMichael Kerrisk (man-pages)2017-04-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The change in commit 1e2f82d1e9d1 ("statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH") to error on a NULL pathname to statx() is inconsistent. It results in the error EINVAL for a NULL pathname. Other system calls with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT. The solution is simply to remove the EINVAL check. As I already pointed out in [1], user_path_at*() and filename_lookup() will handle the NULL pathname as per the other APIs, to correctly produce the error EFAULT. [1] https://lkml.org/lkml/2017/4/26/561 Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATHDavid Howells2017-04-261-7/+6
| |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new statx() syscall, the following both allow the attributes of the file attached to a file descriptor to be retrieved: statx(dfd, NULL, 0, ...); and: statx(dfd, "", AT_EMPTY_PATH, ...); Change the code to reject the first option, though this means copying the path and engaging pathwalk for the fstat() equivalent. dfd can be a non-directory provided path is "". [ The timing of this isn't wonderful, but applying this now before we have statx() in any released kernel, before anybody starts using the NULL special case. - Linus ] Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available") Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Sandeen <sandeen@sandeen.net> cc: fstests@vger.kernel.org cc: linux-api@vger.kernel.org cc: linux-man@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifsLinus Torvalds2017-04-232-9/+19
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull UBI/UBIFS fixes from Richard Weinberger: "This contains fixes for issues in both UBI and UBIFS: - more O_TMPFILE fallout - RENAME_WHITEOUT regression due to a mis-merge - memory leak in ubifs_mknod() - power-cut problem in UBI's update volume feature" * tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs: ubifs: Fix O_TMPFILE corner case in ubifs_link() ubifs: Fix RENAME_WHITEOUT support ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode ubifs: Fix debug messages for an invalid filename in ubifs_dump_node ubifs: Remove filename from debug messages in ubifs_readdir ubifs: Fix memory leak in error path in ubifs_mknod ubi/upd: Always flush after prepared for an update
| * | | | | ubifs: Fix O_TMPFILE corner case in ubifs_link()Richard Weinberger2017-04-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is perfectly fine to link a tmpfile back using linkat(). Since tmpfiles are created with a link count of 0 they appear on the orphan list, upon re-linking the inode has to be removed from the orphan list again. Ralph faced a filesystem corruption in combination with overlayfs due to this bug. Cc: <stable@vger.kernel.org> Cc: Ralph Sennhauser <ralph.sennhauser@gmail.com> Cc: Amir Goldstein <amir73il@gmail.com> Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Reported-by: Amir Goldstein <amir73il@gmail.com> Fixes: 474b93704f321 ("ubifs: Implement O_TMPFILE") Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | ubifs: Fix RENAME_WHITEOUT supportFelix Fietkau2017-03-301-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove faulty leftover check in do_rename(), apparently introduced in a merge that combined whiteout support changes with commit f03b8ad8d386 ("fs: support RENAME_NOREPLACE for local filesystems") Fixes: f03b8ad8d386 ("fs: support RENAME_NOREPLACE for local filesystems") Fixes: 9e0a1fff8db5 ("ubifs: Implement RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | ubifs: Fix debug messages for an invalid filename in ubifs_dump_inodeHyunchul Lee2017-03-301-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of filenames, print inode numbers, file types, and length. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | ubifs: Fix debug messages for an invalid filename in ubifs_dump_nodeHyunchul Lee2017-03-301-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if a character is not printable, print '?' instead of that. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | ubifs: Remove filename from debug messages in ubifs_readdirHyunchul Lee2017-03-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if filename is encrypted, filename could have no printable characters. so remove it. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | ubifs: Fix memory leak in error path in ubifs_mknodRichard Weinberger2017-03-301-1/+3
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | When fscrypt_setup_filename() fails we have to free dev. Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | | Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-04-211-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd bugfix from Bruce Fields: "Fix a 4.11 regression that triggers a BUG() on an attempt to use an unsupported NFSv4 compound op" * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix oops on unsupported operation