path: root/fs
Commit message | Author | Age | Files | Lines
* ubifs: authentication: Authenticate master nodeSascha Hauer2018-10-233-10/+61
| | | | | | | | The master node contains hashes over the root index node and the LPT. This patch adds a HMAC to authenticate the master node itself. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: authentication: Authenticate LPTSascha Hauer2018-10-233-0/+134
| | | | | | | | | | | The LPT needs to be authenticated as well. Since the LPT is only written during commit, it is enough to authenticate the whole LPT with a single hash which is stored in the master node. Only the leaf nodes (pnodes) are hashed, which makes the implementation much simpler than it would be to hash the complete LPT. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Authenticate replayed journalSascha Hauer2018-10-231-2/+144
| | | | | | | | | | | | | | Make sure that during replay all buds can be authenticated. To do this we calculate the hash chain until we find an authentication node and check the HMAC in that node against the current status of the hash chain. After a power cut it can happen that some nodes have been written, but not yet the authentication node for them. These nodes have to be discarded during replay. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Add auth nodes to garbage collector journal headSascha Hauer2018-10-231-3/+43
| | | | | | | | To be able to authenticate the garbage collector journal head, add authentication nodes to the buds the garbage collector creates. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Add authentication nodes to journalSascha Hauer2018-10-236-18/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nodes that are written to flash can only be authenticated through the index after the next commit. When a journal replay is necessary the nodes are not yet referenced by the index and thus can't be authenticated. This patch overcomes this situation by creating a hash over all nodes beginning from the commit start node over the reference node(s) and the buds themselves. From time to time we insert authentication nodes. Authentication nodes contain a HMAC from the current hash state, so that they can be used to authenticate a journal replay up to the point where the authentication node is. The hash is continued afterwards so that theoretically we would only have to check the HMAC of the last authentication node we find. Overall we get this picture:

    ,,,,,,,,
    ,......,...........................................
    ,. CS  ,               hash1.----.           hash2.----.
    ,.  |  ,                    .    |hmac            .    |hmac
    ,.  v  ,                    .    v                .    v
    ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
    ,..|...,...........................................
      ,  |  ,
      ,  |   ,,,,,,,,,,,,,,,
      .  |             hash3,----.
      ,  |                  ,    |hmac
      ,  v                  ,    v
      , REF#1 -> bud -> bud,-> auth ...
      ,,,|,,,,,,,,,,,,,,,,,,
         v
       REF#2 -> ...
         |
         V
        ...

Note how hash3 covers CS, REF#0 and REF#1 so that it is not possible to exchange or skip any reference nodes. Unlike the picture suggests the auth nodes themselves are not hashed. With this it is possible for an offline attacker to cut each journal head or to drop the last reference node(s), but not to skip any journal heads or to reorder any operations. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
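A minimal userspace sketch of the chained-hash-plus-auth-node idea described in the commit above, written against OpenSSL rather than the kernel crypto API; the node names, the key and the every-third-node cadence are made up for illustration and are not UBIFS's actual layout:

    /* Sketch: hash the journal nodes in order; every few nodes emit an
     * "auth node" carrying an HMAC over the running hash, so a replay can
     * be verified up to the last auth node found. Illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/sha.h>

    int main(void)
    {
        const unsigned char key[] = "demo-authentication-key";  /* made up */
        const char *nodes[] = { "CS", "REF#0", "bud1", "bud2", "bud3", "bud4" };
        unsigned char chain[SHA256_DIGEST_LENGTH] = { 0 };

        for (size_t i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++) {
            SHA256_CTX ctx;

            /* continue the running hash: chain = SHA256(chain || node) */
            SHA256_Init(&ctx);
            SHA256_Update(&ctx, chain, sizeof(chain));
            SHA256_Update(&ctx, nodes[i], strlen(nodes[i]));
            SHA256_Final(chain, &ctx);

            if ((i + 1) % 3 == 0) {
                /* "auth node": HMAC over the current hash state */
                unsigned char mac[SHA256_DIGEST_LENGTH];
                unsigned int mac_len;

                HMAC(EVP_sha256(), key, sizeof(key) - 1,
                     chain, sizeof(chain), mac, &mac_len);
                printf("auth node after %s: hmac %02x%02x...\n",
                       nodes[i], mac[0], mac[1]);
            }
        }
        return 0;
    }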
* ubifs: authentication: Add hashes to index nodesSascha Hauer2018-10-237-14/+81
| | | | | | | | | | | | | With this patch the hashes over the index nodes stored in the tree node cache are written to flash and are checked when read back from flash. The hash of the root index node is stored in the master node. During journal replay the hashes are regenerated from the read nodes and stored in the tree node cache. This means the nodes must previously be authenticated by other means. This is done in a later patch. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Add hashes to the tree node cacheSascha Hauer2018-10-234-30/+135
| | | | | | | | | | | | | | As part of the UBIFS authentication support every branch in the index gets a hash covering the referenced node. To make that happen the tree node cache needs hashes over the nodes. This patch adds a hash argument to ubifs_tnc_add() and ubifs_tnc_add_nm(). The hashes are calculated by the callers of these functions, which actually prepare the nodes. With this patch all the leaf nodes of the index tree get hashes, but currently nothing is done with these hashes; this is left for a later patch. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Create functions to embed a HMAC in a nodeSascha Hauer2018-10-232-6/+70
| | | | | | | | | | | | | With authentication support some nodes (master node, super block node) get a HMAC embedded into them. This patch adds functions to prepare and write such a node. The difficulty is that besides the HMAC the nodes also have a CRC which must stay valid. This means we first have to initialize all fields in the node, then calculate the HMAC (not covering the CRC) and finally calculate the CRC. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
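A small userspace sketch of the ordering described above (fill the node, compute the HMAC without covering the CRC, then compute the CRC last); struct demo_node and prepare_node() are hypothetical stand-ins, not the UBIFS on-disk format, and OpenSSL/zlib stand in for the kernel crypto and CRC helpers:

    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    struct demo_node {            /* hypothetical layout, not UBIFS's */
        uint32_t crc;             /* covers everything after this field */
        uint32_t len;
        uint8_t  payload[64];
        uint8_t  hmac[32];
    };

    static void prepare_node(struct demo_node *n, const void *key, int keylen)
    {
        unsigned int maclen;

        /* 1. all other fields are already initialized by the caller */
        n->crc = 0;
        memset(n->hmac, 0, sizeof(n->hmac));

        /* 2. HMAC over the node while crc (and the hmac field) are zero */
        HMAC(EVP_sha256(), key, keylen,
             (const unsigned char *)n, sizeof(*n), n->hmac, &maclen);

        /* 3. CRC last, over everything after the crc field */
        n->crc = crc32(0L, (const unsigned char *)n + sizeof(n->crc),
                       sizeof(*n) - sizeof(n->crc));
    }

    int main(void)
    {
        struct demo_node n = { .len = 4 };

        memcpy(n.payload, "data", 4);
        prepare_node(&n, "demo-key", 8);
        return n.crc == 0;
    }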
* ubifs: Add helper functions for authentication supportSascha Hauer2018-10-234-0/+722
| | | | | | | | | | | This patch adds the various helper functions needed for authentication support. We need functions to hash nodes, to embed HMACs into a node and to compare hashes and HMACs. Most functions first check if this filesystem is authenticated and bail out early if not, which makes the functions safe to be called with disabled authentication. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Add separate functions to init/crc a nodeSascha Hauer2018-10-232-15/+29
| | | | | | | | | | When adding authentication support we will embed a HMAC into some nodes. To prepare these nodes we have to first initialize the nodes, then add a HMAC and finally add a CRC. To accomplish this add separate ubifs_init_node/ubifs_crc_node functions. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Format changes for authentication supportSascha Hauer2018-10-233-3/+50
| | | | | | | | | | | | | | | | | | | | | | | This patch adds the changes to the on disk format needed for authentication support. We'll add: * a HMAC covering the super block node * a HMAC covering the master node * a hash over the root index node to the master node * a hash over the LPT to the master node * a flag to the filesystem flags indicating the filesystem is authenticated * an authentication node necessary to authenticate the nodes written to the journal heads while they are written. * a HMAC of a well known message to the super block node to be able to check if the correct key is provided And finally, not visible in this patch, nevertheless explained here: * hashes over the referenced child nodes in each branch of an index node Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Store read superblock nodeSascha Hauer2018-10-233-22/+8
| | | | | | | | | | | | The superblock node is read/modified/written several times throughout the UBIFS code. Instead of reading it from the device each time, just keep a copy in memory and write back the modified copy when necessary. This patch helps with authentication support: here we not only have to read the superblock node but also have to authenticate it, which is easier if we do it once during initialization. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Drop write_nodeSascha Hauer2018-10-231-34/+5
| | | | | | | | write_node() is used only once and can easily be replaced with calls to ubifs_prepare_node()/write_head() which makes the code a bit shorter. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Implement ubifs_lpt_lookup using ubifs_pnode_lookupSascha Hauer2018-10-231-18/+2
| | | | | | | | | ubifs_lpt_lookup() starts by looking up the nth pnode in the LPT. We already have this functionality in ubifs_pnode_lookup(). Use this function rather than open coding its functionality. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Export pnode_lookup as ubifs_pnode_lookupSascha Hauer2018-10-233-36/+37
| | | | | | | | | ubifs_lpt_lookup could be implemented using pnode_lookup. To make that possible move pnode_lookup from lpt.c to lpt_commit.c. Rename it to ubifs_pnode_lookup since it's now exported. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Pass ubifs_zbranch to read_znode()Sascha Hauer2018-10-231-5/+6
| | | | | | | | | | | read_znode() takes len, lnum and offs arguments, all of which the caller extracts from the same struct ubifs_zbranch *. When adding authentication support we would have to add a pointer to a hash, which is also part of struct ubifs_zbranch, to the arguments. Pass the ubifs_zbranch * instead so that we do not have to add another argument. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubifs: Pass ubifs_zbranch to try_read_node()Sascha Hauer2018-10-231-7/+7
| | | | | | | | | | | try_read_node() takes len, lnum and offs arguments, all of which the caller extracts from the same struct ubifs_zbranch *. When adding authentication support we would have to add a pointer to a hash, which is also part of struct ubifs_zbranch, to the arguments. Pass the ubifs_zbranch * instead so that we do not have to add another argument. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
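A toy illustration of the refactor pattern used by these two patches: pass the structure that already groups lnum/offs/len instead of exploding it into separate arguments, so a later change can reach the hash stored next to them without touching every caller. demo_zbranch and the function names are hypothetical:

    #include <stdio.h>

    struct demo_zbranch {              /* stand-in for struct ubifs_zbranch */
        int lnum, offs, len;
        unsigned char hash[32];
    };

    /* before: the caller explodes the zbranch into separate arguments */
    static int read_node_args(int lnum, int offs, int len)
    {
        printf("read %d bytes at %d:%d\n", len, lnum, offs);
        return 0;
    }

    /* after: hand over the zbranch itself; a later patch can then also
     * look at zbr->hash without growing the argument list again */
    static int read_node_zbr(const struct demo_zbranch *zbr)
    {
        return read_node_args(zbr->lnum, zbr->offs, zbr->len);
    }

    int main(void)
    {
        struct demo_zbranch zbr = { .lnum = 3, .offs = 128, .len = 4096 };

        return read_node_zbr(&zbr);
    }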
* ubifs: Refactor create_default_filesystem()Sascha Hauer2018-10-231-48/+47
| | | | | | | | | | | | create_default_filesystem() allocates memory for a node, writes that node and frees the memory directly afterwards. With this patch we allocate memory for all nodes at the beginning of the function and free the memory at the end. This makes it easier to implement authentication support since with authentication support we'll need the contents of some nodes when creating other nodes. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
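A small sketch of the resulting allocation pattern, with made-up node names: allocate every buffer up front, keep them alive for the whole function so later nodes can refer to earlier ones, and free everything on a single exit path:

    #include <stdlib.h>

    static int create_default_fs_demo(void)
    {
        void *sup = NULL, *mst = NULL, *idx = NULL;
        int err = -1;

        sup = calloc(1, 4096);
        mst = calloc(1, 4096);
        idx = calloc(1, 4096);
        if (!sup || !mst || !idx)
            goto out;

        /* ... fill in the superblock node ... */
        /* ... fill in the master node, which may now refer back to data
         *     already placed in the superblock node ... */
        /* ... fill in the index node, write everything out ... */
        err = 0;
    out:
        free(idx);
        free(mst);
        free(sup);
        return err;
    }

    int main(void)
    {
        return create_default_fs_demo();
    }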
* fscache: Fix out of bound read in long cookie keysEric Sandeen2018-10-181-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fscache_set_key() can incur an out-of-bounds read, reported by KASAN: BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache] Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615 and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236 BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline] BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171 Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466 This happens for any index_key_len which is not divisible by 4 and is larger than the size of the inline key, because the code allocates exactly index_key_len for the key buffer, but the hashing loop is stepping through it 4 bytes (u32) at a time in the buf[] array. Fix this by calculating how many u32 buffers we'll need by using DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation buffer to hold the index_key, then using that same count as the hashing index limit. Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
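A userspace sketch of the fix idea (not the fscache code itself): size the scratch buffer in whole u32 words with DIV_ROUND_UP, preclear it, and bound the hashing loop by the same word count so an index_key_len that is not a multiple of 4 never reads past the allocation. The key and the toy hash below are made up:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
        const char key[] = "example-index-key";          /* any odd length */
        size_t len = sizeof(key);
        size_t words = DIV_ROUND_UP(len, sizeof(uint32_t));
        uint32_t *buf = calloc(words, sizeof(uint32_t)); /* precleared */
        uint32_t hash = 0;

        if (!buf)
            return 1;
        memcpy(buf, key, len);             /* tail of the last word stays 0 */

        for (size_t i = 0; i < words; i++) /* bounded by words, not len/4 */
            hash = hash * 31 + buf[i];     /* toy hash, not fscache's */

        printf("hash=%08x over %zu words\n", (unsigned)hash, words);
        free(buf);
        return 0;
    }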
* fscache: Fix incomplete initialisation of inline key spaceDavid Howells2018-10-183-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | The inline key in struct fscache_cookie is insufficiently initialised, zeroing only 3 of the 4 slots; therefore an index_key_len between 13 and 15 bytes will end up hashing uninitialised memory because the memcpy only partially fills the last buf[] element. Fix this by clearing fscache_cookie objects on allocation rather than using the slab constructor to initialise them. We're going to pretty much fill in the entire struct anyway, so bringing it into our dcache writably shouldn't incur much overhead. This removes the need to do clearance in fscache_set_key() (where we aren't doing it correctly anyway). Also, we don't need to set cookie->key_len in fscache_set_key() as we already did it in the only caller, so remove that. Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Reported-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)Al Viro2018-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | the victim might've been rmdir'ed just before the lock_rename(); unlike the normal callers, we do not look the source up after the parents are locked - we know it beforehand and just recheck that it's still the child of what used to be its parent. Unfortunately, the check is too weak - we don't spot a dead directory since its ->d_parent is unchanged, dentry is positive, etc. So we sail all the way to ->rename(), with hosting filesystems _not_ expecting to be asked renaming an rmdir'ed subdirectory. The fix is easy, fortunately - the lock on parent is sufficient for making IS_DEADDIR() on child safe. Cc: stable@vger.kernel.org Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* afs: Fix clearance of replyDavid Howells2018-10-152-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent patch to fix the afs_server struct leak didn't actually fix the bug, but rather fixed some of the symptoms. The problem is that an asynchronous call that holds a resource pointed to by call->reply[0] will find the pointer cleared in the call destructor, thereby preventing the resource from being cleaned up. In the case of the server record leak, the afs_fs_get_capabilities() function in devel code sets up a call with reply[0] pointing at the server record that should be altered when the result is obtained, but this was being cleared before the destructor was called, so the put in the destructor does nothing and the record is leaked. Commit f014ffb025c1 removed the additional ref obtained by afs_install_server(), but the removal of this ref is actually used by the garbage collector to mark a server record as being defunct after the record has expired through lack of use. The offending clearance of call->reply[0] upon completion in afs_process_async_call() has been there from the origin of the code, but none of the asynchronous calls actually use that pointer currently, so it should be safe to remove (note that synchronous calls don't involve this function). Fix this by the following means: (1) Revert commit f014ffb025c1. (2) Remove the clearance of reply[0] from afs_process_async_call(). Without this, afs_manage_servers() will suffer an assertion failure if it sees a server record that didn't get used because the usage count is not 1. Fixes: f014ffb025c1 ("afs: Fix afs_server struct leak") Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'libnvdimm-fixes-4.19-rc8' of ↵Greg Kroah-Hartman2018-10-141-2/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Dan writes: "libnvdimm/dax 4.19-rc8 * Fix a livelock in dax_layout_busy_page() present since v4.18. The lockup triggers when truncating an actively mapped huge page out of a mapping pinned for direct-I/O. * Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5 mprotect() clears this flag that is needed to communicate the liveness of device pages to the get_user_pages() path." * tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: mm: Preserve _PAGE_DEVMAP across mprotect() calls filesystem-dax: Fix dax_layout_busy_page() livelock
| * filesystem-dax: Fix dax_layout_busy_page() livelockDan Williams2018-10-081-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the presence of multi-order entries the typical pagevec_lookup_entries() pattern may loop forever: while (index < end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { ... for (i = 0; i < pagevec_count(&pvec); i++) { index = indices[i]; ... } index++; /* BUG */ } The loop updates 'index' for each index found and then increments to the next possible page to continue the lookup. However, if the last entry in the pagevec is multi-order then the next possible page index is more than 1 page away. Fix this locally for the filesystem-dax case by checking for dax-multi-order entries. Going forward new users of multi-order entries need to be similarly careful, or we need a generic way to report the page increment in the radix iterator. Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax...") Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
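A toy model of the livelock and of the fix direction described above: a "multi-order" entry spans many consecutive indices, so the scan has to advance past the whole span instead of snapping back to the entry's start and incrementing by one. The entry layout below is invented for illustration:

    #include <stdio.h>

    struct entry { unsigned long start, order; };   /* covers 2^order pages */

    int main(void)
    {
        struct entry map[] = { { 0, 0 }, { 1, 0 }, { 2, 9 } };
        unsigned long index = 0, end = 1024;

        while (index < end) {
            struct entry *found = NULL;

            for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                if (map[i].start <= index &&
                    index < map[i].start + (1UL << map[i].order))
                    found = &map[i];
            if (!found)
                break;
            printf("visit entry at %lu (order %lu)\n",
                   found->start, found->order);
            /* buggy pattern: index = found->start; index++;  -- keeps
             * snapping back into the order-9 entry and never finishes */
            index = found->start + (1UL << found->order);  /* skip the span */
        }
        return 0;
    }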
* | ubifs: Fix WARN_ON logic in exit pathRichard Weinberger2018-10-131-2/+2
| | | | | | | | | | | | | | ubifs_assert() is not WARN_ON(), so we have to invert the checks. Randy faced this warning with UBIFS built as a module; since most users have UBIFS built in (UBIFS usually is the rootfs), nobody noticed so far. :-( Including me. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 54169ddd382d ("ubifs: Turn two ubifs_assert() into a WARN_ON()") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
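The inversion in a nutshell, with a toy WARN_ON() macro standing in for the kernel's: assert() fires when its condition is false, WARN_ON() fires when its condition is true, so converting one into the other means negating the expression:

    #include <assert.h>
    #include <stdio.h>

    #define WARN_ON(cond) \
        ((cond) ? (printf("WARNING at %s:%d\n", __FILE__, __LINE__), 1) : 0)

    int main(void)
    {
        int list_empty = 1;

        assert(list_empty);      /* fires when the condition is FALSE */
        WARN_ON(!list_empty);    /* fires when the condition is TRUE  */
        return 0;
    }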
* | Merge branch 'akpm'Greg Kroah-Hartman2018-10-132-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes from Andrew: * akpm: fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2 mm/mmap.c: don't clobber partially overlapping VMA with MAP_FIXED_NOREPLACE ocfs2: fix a GCC warning
| * | fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()Khazhismel Kumykov2018-10-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On non-preempt kernels this loop can take a long time (more than 50 ticks) processing through entries. Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | ocfs2: fix a GCC warningzhong jiang2018-10-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following compile warning: fs/ocfs2/dlmglue.c:99:30: warning: ‘lockdep_keys’ defined but not used [-Wunused-variable] static struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES]; Link: http://lkml.kernel.org/r/1536938148-32110-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag 'gfs2-4.19.fixes3' of ↵Greg Kroah-Hartman2018-10-131-5/+1
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Andreas writes: "gfs2 4.19 fixes Fix iomap buffered write support for journaled files" * tag 'gfs2-4.19.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix iomap buffered write support for journaled files (2)
| * | gfs2: Fix iomap buffered write support for journaled files (2)Andreas Gruenbacher2018-10-121-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that the fix in commit 6636c3cc56 is bad; the assertion that the iomap code no longer creates buffer heads is incorrect for filesystems that set the IOMAP_F_BUFFER_HEAD flag. Instead, what's happening is that gfs2_iomap_begin_write treats all files that have the jdata flag set as journaled files, which is incorrect as long as those files are inline ("stuffed"). We're handling stuffed files directly via the page cache, which is why we ended up with pages without buffer heads in gfs2_page_add_databufs. Fix this by handling stuffed journaled files correctly in gfs2_iomap_begin_write. This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | | afs: Fix afs_server struct leakDavid Howells2018-10-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a leak of afs_server structs. The routine that installs them in the various lookup lists and trees gets a ref on leaving the function, whether it added the server or a server already exists. It shouldn't increment the refcount if it added the server. The effect of this that "rmmod kafs" will hang waiting for the leaked server to become unused. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | afs: Fix cell proc listDavid Howells2018-10-125-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Access to the list of cells by /proc/net/afs/cells has a couple of problems: (1) It should be checking against SEQ_START_TOKEN for the keying the header line. (2) It's only holding the RCU read lock, so it can't just walk over the list without following the proper RCU methods. Fix these by using an hlist instead of an ordinary list and using the appropriate accessor functions to follow it with RCU. Since the code that adds a cell to the list must also necessarily change, sort the list on insertion whilst we're at it. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag 'xfs-fixes-for-4.19-rc7' of ↵Greg Kroah-Hartman2018-10-111-35/+165
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Dave writes: "xfs: fixes for 4.19-rc7 Update for 4.19-rc7 to fix numerous file clone and deduplication issues." * tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix data corruption w/ unaligned reflink ranges xfs: fix data corruption w/ unaligned dedupe ranges xfs: update ctime and remove suid before cloning files xfs: zero posteof blocks when cloning above eof xfs: refactor clonerange preparation into a separate helper
| * | xfs: fix data corruption w/ unaligned reflink rangesDave Chinner2018-10-061-13/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. XFS only supports whole block sharing, but we still need to support whole file reflink correctly. Hence if the reflink request includes the last block of the source file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. This avoids the data corruption vector, but also avoids disruption of returning EINVAL to userspace for the common case of whole file cloning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
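A userspace sketch of the policy described above, with invented numbers and a hypothetical helper rather than the XFS implementation: a clone that includes the partial block at the source EOF is only allowed when that block also ends up at or past the destination EOF:

    #include <errno.h>
    #include <stdio.h>

    #define BLKSZ 4096UL

    static int clone_range_ok(unsigned long src_isize, unsigned long src_off,
                              unsigned long len, unsigned long dst_isize,
                              unsigned long dst_off)
    {
        unsigned long src_end = src_off + len;

        /* does the request cover the partial block at the source EOF? */
        if (src_end == src_isize && (src_isize % BLKSZ) != 0) {
            /* sharing that block is only safe if it also becomes the
             * destination's EOF block */
            if (dst_off + len < dst_isize)
                return -EINVAL;
        }
        return 0;
    }

    int main(void)
    {
        /* whole-file clone into an empty file: allowed (0) */
        printf("%d\n", clone_range_ok(10000, 0, 10000, 0, 0));
        /* cloning the EOF block into the middle of a bigger file: -EINVAL */
        printf("%d\n", clone_range_ok(10000, 8192, 1808, 50000, 4096));
        return 0;
    }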
| * | xfs: fix data corruption w/ unaligned dedupe rangesDave Chinner2018-10-061-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A deduplication data corruption is exposed by fstests generic/505 on XFS. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. XFS only supports whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the source file, don't include it in the actual XFS dedupe operation. If the rest of the range dedupes successfully, then report the partial last block as deduped, too, so that userspace sees it as a successful dedupe rather than return EINVAL because we can't dedupe unaligned blocks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: update ctime and remove suid before cloning filesDarrick J. Wong2018-10-051-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before cloning into a file, update the ctime and remove sensitive attributes like suid, just like we'd do for a regular file write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: zero posteof blocks when cloning above eofDarrick J. Wong2018-10-051-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we're reflinking between two files and the destination file range is well beyond the destination file's EOF marker, zero any posteof speculative preallocations in the destination file so that we don't expose stale disk contents. The previous strategy of trying to clear the preallocations does not work if the destination file has the PREALLOC flag set. Uncovered by shared/010. Reported-by: Zorro Lang <zlang@redhat.com> Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: refactor clonerange preparation into a separate helperDarrick J. Wong2018-10-051-27/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor all the reflink preparation steps into a separate helper that we'll use to land all the upcoming fixes for insufficient input checks. This rework also moves the invalidation of the destination range to the prep function so that it is done before the range is remapped. This ensures that nobody can access the data in range being remapped until the remap is complete. [dgc: fix xfs_reflink_remap_prep() return value and caller check to handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to do". ] [dgc: make sure length changed by vfs_clone_file_prep_inodes() gets propagated back to XFS code that does the remapping. ] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | | gfs2: Fix iomap buffered write support for journaled filesAndreas Gruenbacher2018-10-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 64bc06bb32ee broke buffered writes to journaled files (chattr +j): we'll try to journal the buffer heads of the page being written to in gfs2_iomap_journaled_page_done. However, the iomap code no longer creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix that by creating buffer heads ourself when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | | ocfs2: fix locking for res->tracking and dlm->tracking_listAshish Samant2018-10-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dlm_init_lockres() we access and modify res->tracking and dlm->tracking_list without holding dlm->track_lock. This can cause list corruptions and can end up in kernel panic. Fix this by locking res->tracking and dlm->tracking_list with dlm->track_lock instead of dlm->spinlock. Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | proc: restrict kernel stack dumps to rootJann Horn2018-10-051-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()Larry Chen2018-10-051-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages is dirty. When a page has not been written back, it is still in dirty state. If ocfs2_duplicate_clusters_by_page() is called against the dirty page, the crash happens. To fix this bug, we can just unlock the page and wait until the page is not dirty. The following is the backtrace: kernel BUG at /root/code/ocfs2/refcounttree.c:2961! [exception RIP: ocfs2_duplicate_clusters_by_page+822] __ocfs2_move_extent+0x80/0x450 [ocfs2] ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] ocfs2_move_extents+0x180/0x3b0 [ocfs2] ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] ocfs2_ioctl+0x253/0x640 [ocfs2] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Once we find the page is dirty, we do not wait until it's clean, rather we use write_one_page() to write it back. Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com [lchen@suse.com: update comments] Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Larry Chen <lchen@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Greg Kroah-Hartman2018-10-054-6/+31
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steve writes: "SMB3 fixes four small SMB3 fixes: one for stable, the others to address a more recent regression" * tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix lease break problem introduced by compounding cifs: only wake the thread for the very last PDU in a compound cifs: add a warning if we try to to dequeue a deleted mid smb2: fix missing files in root share directory listing
| * | | smb3: fix lease break problem introduced by compoundingSteve French2018-10-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes problem (discovered by Aurelien) introduced by recent commit: commit b24df3e30cbf48255db866720fb71f14bf9d2f39 ("cifs: update receive_encrypted_standard to handle compounded responses") which broke the ability to respond to some lease breaks (lease breaks being ignored is a problem since can block server response for duration of the lease break timeout). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | cifs: only wake the thread for the very last PDU in a compoundRonnie Sahlberg2018-10-021-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For compounded PDUs we should only wake the waiting thread for the very last PDU of the compound. We do this so that we are guaranteed that the demultiplex_thread will not process or access any of those MIDs any more once the send/recv thread starts processing. Else there is a race where at the end of the send/recv processing we will try to delete all the mids of the compound. If the multiplex thread still has other mids to process at this point for this compound, this can lead to an oops. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | cifs: add a warning if we try to dequeue a deleted midRonnie Sahlberg2018-10-023-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cifs_delete_mid() is called once we are finished handling a mid and we expect no more work done on this mid. Needed to fix recent commit: commit 730928c8f4be88e9d6a027a16b1e8fa9c59fc077 ("cifs: update smb2_queryfs() to use compounding") Add a warning if someone tries to dequeue a mid that has already been flagged to be deleted. Also change list_del() to list_del_init() so that if we have similar bugs resurface in the future we will not oops. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | smb2: fix missing files in root share directory listingAurelien Aptel2018-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When mounting a Windows share that is the root of a drive (eg. C$) the server does not return . and .. directory entries. This results in the smb2 code path erroneously skipping the 2 first entries. Pseudo-code of the readdir() code path: cifs_readdir(struct file, struct dir_context) initiate_cifs_search <-- if no response cached yet server->ops->query_dir_first dir_emit_dots dir_emit <-- adds "." and ".." if we're at pos=0 find_cifs_entry initiate_cifs_search <-- if pos < start of current response (restart search) server->ops->query_dir_next <-- if pos > end of current response (fetch next search res) for(...) <-- loops over cur response entries starting at pos cifs_filldir <-- skip . and .., emit entry cifs_fill_dirent dir_emit pos++ A) dir_emit_dots() always adds . & .. and sets the current dir pos to 2 (0 and 1 are done). Therefore we always want the index_to_find to be 2 regardless of if the response has . and .. B) smb1 code initializes index_of_last_entry with a +2 offset in cifssmb.c CIFSFindFirst(): psrch_inf->index_of_last_entry = 2 /* skip . and .. */ + psrch_inf->entries_in_buffer; Later in find_cifs_entry() we want to find the next dir entry at pos=2 as a result of (A) first_entry_in_buffer = cfile->srch_inf.index_of_last_entry - cfile->srch_inf.entries_in_buffer; This var is the dir pos that the first entry in the buffer will have therefore it must be 2 in the first call. If we don't offset index_of_last_entry by 2 (like in (B)), first_entry_in_buffer=0 but we were instructed to get pos=2 so this code in find_cifs_entry() skips the 2 first which is ok for non-root shares, as it skips . and .. from the response but is not ok for root shares where the 2 first are actual files pos_in_buf = index_to_find - first_entry_in_buffer; // pos_in_buf=2 // we skip 2 first response entries :( for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) { /* go entry by entry figuring out which is first */ cur_ent = nxt_dir_entry(cur_ent, end_of_smb, cfile->srch_inf.info_level); } C) cifs_filldir() skips . and .. so we can safely ignore them for now. Sample program: int main(int argc, char **argv) { const char *path = argc >= 2 ? argv[1] : "."; DIR *dh; struct dirent *de; printf("listing path <%s>\n", path); dh = opendir(path); if (!dh) { printf("opendir error %d\n", errno); return 1; } while (1) { de = readdir(dh); if (!de) { if (errno) { printf("readdir error %d\n", errno); return 1; } printf("end of listing\n"); break; } printf("off=%lu <%s>\n", de->d_off, de->d_name); } return 0; } Before the fix with SMB1 on root shares: <.> off=1 <..> off=2 <$Recycle.Bin> off=3 <bootmgr> off=4 and on non-root shares: <.> off=1 <..> off=4 <-- after adding .., the offsets jumps to +2 because <2536> off=5 we skipped . and .. from response buffer (C) <411> off=6 but still incremented pos <file> off=7 <fsx> off=8 Therefore the fix for smb2 is to mimic smb1 behaviour and offset the index_of_last_entry by 2. Test results comparing smb1 and smb2 before/after the fix on root share, non-root shares and on large directories (ie. multi-response dir listing): PRE FIX ======= pre-1-root VS pre-2-root: ERR pre-2-root is missing [bootmgr, $Recycle.Bin] pre-1-nonroot VS pre-2-nonroot: OK~ same files, same order, different offsets pre-1-nonroot-large VS pre-2-nonroot-large: OK~ same files, same order, different offsets POST FIX ======== post-1-root VS post-2-root: OK same files, same order, same offsets post-1-nonroot VS post-2-nonroot: OK same files, same order, same offsets post-1-nonroot-large VS post-2-nonroot-large: OK same files, same order, same offsets REGRESSION? =========== pre-1-root VS post-1-root: OK same files, same order, same offsets pre-1-nonroot VS post-1-nonroot: OK same files, same order, same offsets BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
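A toy model of the position bookkeeping explained in the message above; the directory contents and variable values are illustrative, but the arithmetic mirrors the described smb1 "+ 2 for . and .." convention and shows why the missing offset skips the first two real entries on root shares:

    #include <stdio.h>

    int main(void)
    {
        /* root share response: no "." or ".." entries come back */
        const char *resp[] = { "$Recycle.Bin", "bootmgr", "file1" };
        int entries_in_buffer = 3;
        int index_to_find = 2;          /* dir_emit_dots() used pos 0 and 1 */
        int index_of_last_entry, first_entry_in_buffer, pos_in_buf;

        /* buggy smb2 path: no "+ 2" offset */
        index_of_last_entry = entries_in_buffer;
        first_entry_in_buffer = index_of_last_entry - entries_in_buffer; /* 0 */
        pos_in_buf = index_to_find - first_entry_in_buffer;              /* 2 */
        printf("without offset: start at slot %d, %s and %s are skipped\n",
               pos_in_buf, resp[0], resp[1]);

        /* fixed path, mirroring the smb1 "+ 2 for . and .." convention */
        index_of_last_entry = 2 + entries_in_buffer;
        first_entry_in_buffer = index_of_last_entry - entries_in_buffer; /* 2 */
        pos_in_buf = index_to_find - first_entry_in_buffer;              /* 0 */
        printf("with offset: start at slot %d, %s is listed\n",
               pos_in_buf, resp[0]);
        return 0;
    }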
* | | | Merge tag 'ovl-fixes-4.19-rc7' of ↵Greg Kroah-Hartman2018-10-049-10/+27
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Miklos writes: "overlayfs fixes for 4.19-rc7 This update fixes a couple of regressions in the stacked file update added in this cycle, as well as some older bugs uncovered by syzkaller. There's also one trivial naming change that touches other parts of the fs subsystem." * tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix format of setxattr debug ovl: fix access beyond unterminated strings ovl: make symbol 'ovl_aops' static vfs: swap names of {do,vfs}_clone_file_range() ovl: fix freeze protection bypass in ovl_clone_file_range() ovl: fix freeze protection bypass in ovl_write_iter() ovl: fix memory leak on unlink of indexed file
| * | | | ovl: fix format of setxattr debugMiklos Szeredi2018-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12
| * | | | ovl: fix access beyond unterminated stringsAmir Goldstein2018-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN detected slab-out-of-bounds access in printk from overlayfs, because string format used %*s instead of %.*s. > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604 > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811 > > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36 ... > printk+0xa7/0xcf kernel/printk/printk.c:1996 > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689 Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13
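A standalone illustration of the printf format difference behind both of these fixes: "%*s" is a field width and keeps reading until a NUL byte, while "%.*s" is a precision and stops after the given number of bytes, which is what an unterminated buffer needs (the kernel-only "%*pE" extension is not shown here):

    #include <stdio.h>

    int main(void)
    {
        char name[4] = { 'a', 'b', 'c', 'd' };   /* NOT NUL-terminated */
        int len = sizeof(name);

        printf("[%.*s]\n", len, name);  /* safe: reads exactly 4 bytes */
        /* printf("[%*s]\n", len, name);   field width only pads; it would
         *                                 keep reading until a NUL byte  */
        return 0;
    }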