summaryrefslogtreecommitdiffstats
path: root/drivers/md
Commit message (Collapse)AuthorAgeFilesLines
* dm cache: fix race causing dirty blocks to be marked as cleanAnssi Hannula2014-09-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a writeback or a promotion of a block is completed, the cell of that block is removed from the prison, the block is marked as clean, and the clear_dirty() callback of the cache policy is called. Unfortunately, performing those actions in this order allows an incoming new write bio for that block to come in before clearing the dirty status is completed and therefore possibly causing one of these two scenarios: Scenario A: Thread 1 Thread 2 cell_defer() . - cell removed from prison . - detained bios queued . . incoming write bio . remapped to cache . set_dirty() called, . but block already dirty . => it does nothing clear_dirty() . - block marked clean . - policy clear_dirty() called . Result: Block is marked clean even though it is actually dirty. No writeback will occur. Scenario B: Thread 1 Thread 2 cell_defer() . - cell removed from prison . - detained bios queued . clear_dirty() . - block marked clean . . incoming write bio . remapped to cache . set_dirty() called . - block marked dirty . - policy set_dirty() called - policy clear_dirty() called . Result: Block is properly marked as dirty, but policy thinks it is clean and therefore never asks us to writeback it. This case is visible in "dmsetup status" dirty block count (which normally decreases to 0 on a quiet device). Fix these issues by calling clear_dirty() before calling cell_defer(). Incoming bios for that block will then be detained in the cell and released only after clear_dirty() has completed, so the race will not occur. Found by inspecting the code after noticing spurious dirty counts (scenario B). Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm crypt: fix access beyond the end of allocated spaceMikulas Patocka2014-08-281-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DM crypt target accesses memory beyond allocated space resulting in a crash on 32 bit x86 systems. This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm crypt: use async crypto"). However, this bug was masked by the fact that kmalloc rounds the size up to the next power of two. This bug wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio data"). By switching to using per-bio data there was no longer any padding beyond the end of a dm-crypt allocated memory block. To minimize allocation overhead dm-crypt puts several structures into one block allocated with kmalloc. The block holds struct ablkcipher_request, cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))), struct dm_crypt_request and an initialization vector. The variable dmreq_start is set to offset of struct dm_crypt_request within this memory block. dm-crypt allocates the block with this size: cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size. When accessing the initialization vector, dm-crypt uses the function iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1). dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request structure. However, when dm-crypt accesses the initialization vector, it takes a pointer to the end of dm_crypt_request, aligns it, and then uses it as the initialization vector. If the end of dm_crypt_request is not aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the alignment causes the initialization vector to point beyond the allocated space. Fix this bug by calculating the variable iv_size_padding and adding it to the allocated size. Also correct the alignment of dm_crypt_request. struct dm_crypt_request is specific to dm-crypt (it isn't used by the crypto subsystem at all), so it is aligned on __alignof__(struct dm_crypt_request). Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is aligned as if the block was allocated with kmalloc. Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* md/raid10: always initialise ->state on newly allocated r10_bioNeilBrown2014-08-191-0/+3
| | | | | | | | | | | | | | Most places which allocate an r10_bio zero the ->state, some don't. As the r10_bio comes from a mempool, and the allocation function uses kzalloc it is often zero anyway. But sometimes it isn't and it is best to be safe. I only noticed this because of the bug fixed by an earlier patch where the r10_bios allocated for a reshape were left around to be used by a subsequent resync. In that case the R10BIO_IsReshape flag caused problems. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid10: avoid memory leak on error path during reshape.NeilBrown2014-08-191-0/+1
| | | | | | | If raid10 reshape fails to find somewhere to read a block from, it returns without freeing memory... Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid10: Fix memory leak when raid10 reshape completes.NeilBrown2014-08-191-0/+1
| | | | | | | | | | | | | | | | | | When a raid10 commences a resync/recovery/reshape it allocates some buffer space. When a resync/recovery completes the buffer space is freed. But not when the reshape completes. This can result in a small memory leak. There is a subtle side-effect of this bug. When a RAID10 is reshaped to a larger array (more devices), the reshape is immediately followed by a "resync" of the new space. This "resync" will use the buffer space which was allocated for "reshape". This can cause problems including a "BUG" in the SCSI layer. So this is suitable for -stable. Cc: stable@vger.kernel.org (v3.5+) Fixes: 3ea7daa5d7fde47cd41f4d56c2deb949114da9d6 Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid10: fix memory leak when reshaping a RAID10.NeilBrown2014-08-191-1/+1
| | | | | | | | | | | | | | | | | | raid10 reshape clears unwanted bits from a bio->bi_flags using a method which, while clumsy, worked until 3.10 when BIO_OWNS_VEC was added. Since then it clears that bit but shouldn't. This results in a memory leak. So change to used the approved method of clearing unwanted bits. As this causes a memory leak which can consume all of memory the fix is suitable for -stable. Fixes: a38352e0ac02dbbd4fa464dc22d1352b5fbd06fd Cc: stable@vger.kernel.org (v3.10+) Reported-by: mdraid.pkoch@dfgh.net (Peter Koch) Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid6: avoid data corruption during recovery of double-degraded RAID6NeilBrown2014-08-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | During recovery of a double-degraded RAID6 it is possible for some blocks not to be recovered properly, leading to corruption. If a write happens to one block in a stripe that would be written to a missing device, and at the same time that stripe is recovering data to the other missing device, then that recovered data may not be written. This patch skips, in the double-degraded case, an optimisation that is only safe for single-degraded arrays. Bug was introduced in 2.6.32 and fix is suitable for any kernel since then. In an older kernel with separate handle_stripe5() and handle_stripe6() functions the patch must change handle_stripe6(). Cc: stable@vger.kernel.org (2.6.32+) Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8 Cc: Yuri Tikhonov <yur@emcraft.com> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in> Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423 Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Dan Williams <dan.j.williams@intel.com>
* md/raid5: avoid livelock caused by non-aligned writes.NeilBrown2014-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | If a stripe in a raid6 array received a write to each data block while the array is degraded, and if any of these writes to a missing device are not page-aligned, then a live-lock happens. In this case the P and Q blocks need to be read so that the part of the missing block which is *not* being updated by the write can be constructed. Due to a logic error, these blocks are not loaded, so the update cannot proceed and the stripe is 'handled' repeatedly in an infinite loop. This bug is unlikely as most writes are page aligned. However as it can lead to a livelock it is suitable for -stable. It was introduced in 3.16. Cc: stable@vger.kernel.org (v3.16) Fixed: 67f455486d2ea20b2d94d6adf5b9b783d079e321 Signed-off-by: NeilBrown <neilb@suse.de>
* Merge tag 'dm-3.17-changes' of ↵Linus Torvalds2014-08-1410-205/+394
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper changes from Mike Snitzer: - Allow the thin target to paired with any size external origin; also allow thin snapshots to be larger than the external origin. - Add support for quickly loading a repetitive pattern into the dm-switch target. - Use per-bio data in the dm-crypt target instead of always using a mempool for each allocation. Required switching to kmalloc alignment for the bio slab. - Fix DM core to properly stack the QUEUE_FLAG_NO_SG_MERGE flag - Fix the dm-cache and dm-thin targets' export of the minimum_io_size to match the data block size -- this fixes an issue where mkfs.xfs would improperly infer raid striping was in place on the underlying storage. - Small cleanups in dm-io, dm-mpath and dm-cache * tag 'dm-3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: propagate QUEUE_FLAG_NO_SG_MERGE dm switch: efficiently support repetitive patterns dm switch: factor out switch_region_table_read dm cache: set minimum_io_size to cache's data block size dm thin: set minimum_io_size to pool's data block size dm crypt: use per-bio data block: use kmalloc alignment for bio slab dm table: make dm_table_supports_discards static dm cache metadata: use dm-space-map-metadata.h defined size limits dm cache: fail migrations in the do_worker error path dm cache: simplify deferred set reference count increments dm thin: relax external origin size constraints dm thin: switch to an atomic_t for tracking pending new block preparations dm mpath: eliminate pg_ready() wrapper dm io: simplify dec_count and sync_io
| * dm table: propagate QUEUE_FLAG_NO_SG_MERGEJeff Moyer2014-08-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 05f1dd5 ("block: add queue flag for disabling SG merging") introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE. This gets set by default in blk_mq_init_queue for mq-enabled devices. The effect of the flag is to bypass the SG segment merging. Instead, the bio->bi_vcnt is used as the number of hardware segments. With a device mapper target on top of a device with QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments than a driver is prepared to handle. I ran into this when backporting the virtio_blk mq support. It triggerred this BUG_ON, in virtio_queue_rq: BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); The queue's max is set here: blk_queue_max_segments(q, vblk->sg_elems-2); Basically, what happens is that a bio is built up for the dm device (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using bio_add_page. That path will call into __blk_recalc_rq_segments, so what you end up with is bi_phys_segments being much smaller than bi_vcnt (and bi_vcnt grows beyond the maximum sg elements). Then, when the bio is submitted, it gets cloned. When the cloned bio is submitted, it will end up in blk_recount_segments, here: if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags)) bio->bi_phys_segments = bio->bi_vcnt; and now we've set bio->bi_phys_segments to a number that is beyond what was registered as queue_max_segments by the driver. The right way to fix this is to propagate the queue flag up the stack. The rules for propagating the flag are simple: - if the flag is set for any underlying device, it must be set for the upper device - consequently, if the flag is not set for any underlying device, it should not be set for the upper device. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.16+
| * dm switch: efficiently support repetitive patternsMikulas Patocka2014-08-011-2/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for quickly loading a repetitive pattern into the dm-switch target. In the "set_regions_mappings" message, the user may now use "Rn,m" as one of the arguments. "n" and "m" are hexadecimal numbers. The "Rn,m" argument repeats the last "n" arguments in the following "m" slots. For example: dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10 is equivalent to dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \ :1 :2 :1 :2 :1 :2 :1 :2 :1 :2 Requested-by: Jay Wang <jwang@nimblestorage.com> Tested-by: Jay Wang <jwang@nimblestorage.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm switch: factor out switch_region_table_readMikulas Patocka2014-08-011-5/+13
| | | | | | | | | | | | | | | | | | Move code that reads the table to a switch_region_table_read. It will be needed for the next commit. No functional change. Tested-by: Jay Wang <jwang@nimblestorage.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache: set minimum_io_size to cache's data block sizeMike Snitzer2014-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, if the block layer's limit stacking didn't establish an optimal_io_size that was compatible with the cache's data block size we'd set optimal_io_size to the data block size and minimum_io_size to 0 (which the block layer adjusts to be physical_block_size). Update cache_io_hints() to set both minimum_io_size and optimal_io_size to the cache's data block size. This fixes an issue where mkfs.xfs would create more XFS Allocation Groups on cache volumes than on a normal linear LV of comparable size. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: set minimum_io_size to pool's data block sizeMike Snitzer2014-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, if the block layer's limit stacking didn't establish an optimal_io_size that was compatible with the thin-pool's data block size we'd set optimal_io_size to the data block size and minimum_io_size to 0 (which the block layer adjusts to be physical_block_size). Update pool_io_hints() to set both minimum_io_size and optimal_io_size to the thin-pool's data block size. This fixes an issue reported where mkfs.xfs would create more XFS Allocation Groups on thinp volumes than on a normal linear LV of comparable size, see: https://bugzilla.redhat.com/show_bug.cgi?id=1003227 Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm crypt: use per-bio dataMikulas Patocka2014-08-011-14/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change dm-crypt so that it uses auxiliary data allocated with the bio. Dm-crypt requires two allocations per request - struct dm_crypt_io and struct ablkcipher_request (with other data appended to it). It previously only used mempool allocations. Some requests may require more dm_crypt_ios and ablkcipher_requests, however most requests need just one of each of these two structures to complete. This patch changes it so that the first dm_crypt_io and ablkcipher_request are allocated with the bio (using target per_bio_data_size option). If the request needs additional values, they are allocated from the mempool. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm table: make dm_table_supports_discards staticMikulas Patocka2014-08-012-37/+37
| | | | | | | | | | | | | | | | | | The function dm_table_supports_discards is only called from dm-table.c:dm_table_set_restrictions(). So move it above dm_table_set_restrictions and make it static. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache metadata: use dm-space-map-metadata.h defined size limitsMike Snitzer2014-08-013-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7d48935e cleaned up the persistent-data's space-map-metadata limits by elevating them to dm-space-map-metadata.h. Update dm-cache-metadata to use these same limits. The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the sizeof the disk_bitmap_header. So the supported maximum metadata size is a bit smaller (reduced from 33423360 to 33292800 sectors). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
| * dm cache: fail migrations in the do_worker error pathJoe Thornber2014-08-011-0/+1
| | | | | | | | | | Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache: simplify deferred set reference count incrementsJoe Thornber2014-08-011-46/+77
| | | | | | | | | | | | | | | | | | | | | | Factor out inc_and_issue and inc_ds helpers to simplify deferred set reference count increments. Also cleanup cache_map to consistently call cell_defer and inc_ds when the bio is DM_MAPIO_REMAPPED. No functional change. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: relax external origin size constraintsJoe Thornber2014-08-011-43/+115
| | | | | | | | | | | | | | | | | | | | Track the size of any external origin. Previously the external origin's size had to be a multiple of the thin-pool's block size, that is no longer a requirement. In addition, snapshots that are larger than the external origin are now supported. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: switch to an atomic_t for tracking pending new block preparationsJoe Thornber2014-08-011-13/+16
| | | | | | | | | | | | | | | | | | Previously we used separate boolean values to track quiescing and copying actions. By switching to an atomic_t we can support blocks that need a partial copy and partial zero. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm mpath: eliminate pg_ready() wrapperMike Snitzer2014-08-011-4/+2
| | | | | | | | | | | | | | | | pg_ready() is not comprehensive in its logic and only serves to obfuscate code. Replace pg_ready() with the appropriate logic in multipath_map(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm io: simplify dec_count and sync_ioJoe Thornber2014-08-011-35/+42
| | | | | | | | | | | | | | | | | | | | | | Remove the io struct off the stack in sync_io() and allocate it from the mempool like is done in async_io(). dec_count() now always calls a callback function and always frees the io struct back to the mempool (so sync_io and async_io share this pattern). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge branch 'for-3.17/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2014-08-1414-65/+119
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver changes from Jens Axboe: "Nothing out of the ordinary here, this pull request contains: - A big round of fixes for bcache from Kent Overstreet, Slava Pestov, and Surbhi Palande. No new features, just a lot of fixes. - The usual round of drbd updates from Andreas Gruenbacher, Lars Ellenberg, and Philipp Reisner. - virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei has taken it one step further and added support for actually using more than one queue. - Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to compliment the the default behavior of adding to the tail of the queue. From Douglas Gilbert" * 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits) bcache: Drop unneeded blk_sync_queue() calls bcache: add mutex lock for bch_is_open bcache: Correct printing of btree_gc_max_duration_ms bcache: try to set b->parent properly bcache: fix memory corruption in init error path bcache: fix crash with incomplete cache set bcache: Fix more early shutdown bugs bcache: fix use-after-free in btree_gc_coalesce() bcache: Fix an infinite loop in journal replay bcache: fix crash in bcache_btree_node_alloc_fail tracepoint bcache: bcache_write tracepoint was crashing bcache: fix typo in bch_bkey_equal_header bcache: Allocate bounce buffers with GFP_NOWAIT bcache: Make sure to pass GFP_WAIT to mempool_alloc() bcache: fix uninterruptible sleep in writeback thread bcache: wait for buckets when allocating new btree root bcache: fix crash on shutdown in passthrough mode bcache: fix lockdep warnings on shutdown bcache allocator: send discards with correct size bcache: Fix to remove the rcu_sched stalls. ...
| * | bcache: Drop unneeded blk_sync_queue() callsKent Overstreet2014-08-041-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | this is needed for the queue/block device we created (it's done by blk_cleanup_queue() which we do call) - but calling it for the block devices we only opened is pointless. Change-Id: I53dfded14ed15b9581d10ca8399d5e1b3abbf9f2
| * | bcache: add mutex lock for bch_is_openJianjian Huo2014-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | Since bch_is_open will iterate linked list bch_cache_sets and uncached_devices, it needs bch_register_lock. Signed-off-by: Jianjian Huo <samuel.huo@gmail.com>
| * | bcache: Correct printing of btree_gc_max_duration_msSurbhi Palande2014-08-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | time_stats::btree_gc_max_duration_mc is not bit shifted by 8 Fixes BUG #138 Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a Signed-off-by: Surbhi Palande <sap@daterainc.com>
| * | bcache: try to set b->parent properlySlava Pestov2014-08-043-20/+25
| | | | | | | | | | | | | | | | | | | | | bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size before; now it passes. Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
| * | bcache: fix memory corruption in init error pathSlava Pestov2014-08-041-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | If register_cache_set() failed, we would touch ca->set after it had already been freed. Also, fix an assertion to catch this. Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d
| * | bcache: fix crash with incomplete cache setSlava Pestov2014-08-042-0/+8
| | | | | | | | | | | | Change-Id: I6abde52afe917633480caaf4e2518f42a816d886
| * | bcache: Fix more early shutdown bugsKent Overstreet2014-08-041-3/+4
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix use-after-free in btree_gc_coalesce()Slava Pestov2014-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we goto out_nocoalesce after we free new_nodes[0], we end up freeing new_nodes[0] again. This was generating a lockdep warning. The fix is to set new_nodes[0] to NULL, since the out_nocoalesce path safely ignores NULL entries in the new_nodes array. This regression was introduced in 2d7f9531. Change-Id: I76564d7257800583214376b4bacf236cda90c89c
| * | bcache: Fix an infinite loop in journal replayKent Overstreet2014-08-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | When running with multiple cache devices, if one of the devices has a completely empty journal but we'd already found some journal entries on a previosu device we'd go into an infinite loop. Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix crash in bcache_btree_node_alloc_fail tracepointSlava Pestov2014-08-041-1/+1
| | | | | | | | | | | | | | | | | | 'b' was NULL. Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
| * | bcache: bcache_write tracepoint was crashingSlava Pestov2014-08-041-1/+2
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix typo in bch_bkey_equal_headerSlava Pestov2014-08-041-1/+1
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Allocate bounce buffers with GFP_NOWAITKent Overstreet2014-08-042-2/+2
| | | | | | | | | | | | | | | | | | | | | There's no point in blocking on these allocations, since our fallback paths will probably go faster than blocking. Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
| * | bcache: Make sure to pass GFP_WAIT to mempool_alloc()Kent Overstreet2014-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | this was very wrong - mempool_alloc() only guarantees success with GFP_WAIT. bcache uses GFP_NOWAIT in various other places where we have a fallback, circuits must've gotten crossed when writing this code or something. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix uninterruptible sleep in writeback threadSlava Pestov2014-08-043-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were two issues here: - writeback thread did not start until the device first became dirty - writeback thread used uninterruptible sleep once running Without this patch I see kernel warnings printed and a load average of 1.52 after booting my test VM. With this patch the warnings are gone and the load average is near 0.00 as expected. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: wait for buckets when allocating new btree rootSlava Pestov2014-08-043-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | Tested: - sometimes bcache_tier test would hang on startup with a failure to allocate the btree root -- no longer seeing this Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix crash on shutdown in passthrough modeSlava Pestov2014-08-041-1/+2
| | | | | | | | | | | | We never started the writeback thread in this case, so don't stop it.
| * | bcache: fix lockdep warnings on shutdownSlava Pestov2014-08-041-0/+4
| | |
| * | bcache allocator: send discards with correct sizeSlava Pestov2014-08-041-1/+1
| | |
| * | bcache: Fix to remove the rcu_sched stalls.Surbhi Palande2014-08-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | while loop was executing infinitely. This fix ends the while loop gracefully. Signed-off-by: Surbhi Palande <sap@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Fix a journal replay bugKent Overstreet2014-08-043-11/+19
| | | | | | | | | | | | | | | | | | | | | journal replay wansn't validating pointers with bch_extent_invalid() before derefing, fixed Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Fix a bug when detachingKent Overstreet2014-08-041-3/+6
| | | | | | | | | | | | | | | | | | | | | After detaching a backing device from a cache set, a bit wasn't getting reset meaning the second detach wouldn't work correctly. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* | | Merge tag 'md/3.17' of git://neil.brown.name/mdLinus Torvalds2014-08-114-16/+22
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md updates from Neil Brown: "Most interesting is that md devices (major == 9) with minor numbers of 512 or more will no longer be created simply by opening a block device file. They can only be created by writing to /sys/module/md_mod/parameters/new_array The 'auto-create-on-open' semantic is cumbersome and we need to start moving away from it" * tag 'md/3.17' of git://neil.brown.name/md: md: don't allow bitmap file to be added to raid0/linear. md/raid0: check for bitmap compatability when changing raid levels. md: Recovery speed is wrong md: disable probing for md devices 512 and over. md/raid1,raid10: always abort recover on write error.
| * | | md: don't allow bitmap file to be added to raid0/linear.NeilBrown2014-08-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An array can only accept a bitmap if it will call bitmap_daemon_work periodically, which means it needs a thread running. If there is no thread, don't allow a bitmap to be added. Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md/raid0: check for bitmap compatability when changing raid levels.NeilBrown2014-08-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If an array has a bitmap, then it cannot be converted to raid0. Reported-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md: Recovery speed is wrongXiao Ni2014-08-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we calculate the speed of recovery, the numerator that contains the recovery done sectors. It's need to subtract the sectors which don't finish recovery. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>