| Commit message (Collapse) | Author | Age | Files | Lines |
Pull CIFS/SMB3 updates from Steve French:
"Includes support for a critical SMB3 security feature: per-share
encryption from Pavel, and a cleanup from Jean Delvare.
Will have another cifs/smb3 merge next week"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Allow to switch on encryption with seal mount option
CIFS: Add capability to decrypt big read responses
CIFS: Decrypt and process small encrypted packets
CIFS: Add copy into pages callback for a read operation
CIFS: Add mid handle callback
CIFS: Add transform header handling callbacks
CIFS: Encrypt SMB3 requests before sending
CIFS: Enable encryption during session setup phase
CIFS: Add capability to transform requests before sending
CIFS: Separate RFC1001 length processing for SMB2 read
CIFS: Separate SMB2 sync header processing
CIFS: Send RFC1001 length in a separate iov
CIFS: Make send_cancel take rqst as argument
CIFS: Make SendReceive2() takes resp iov
CIFS: Separate SMB2 header structure
CIFS: Fix splice read for non-cached files
cifs: Add soft dependencies
cifs: Only select the required crypto modules
cifs: Simplify SMB2 and SMB311 dependencies
|
This allows users to enforce encryption for SMB3 shares if the server
supports it.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Allow decrypting transformed packets that are bigger than the big
buffer size. In particular, this is used for read responses, which
are the only responses that can exceed the big buffer size.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Decrypt transformed packets, find the corresponding mid, and then
process them as usual.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Since we have two different types of reads (pagecache and direct),
we need to process such responses differently after decrypting a
packet. This change allows specifying a callback that copies the read
payload data into preallocated pages.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
We need to process read responses differently because the data
should go directly into preallocated pages. This can be done
by specifying a mid handle callback.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
We need to recognize and parse transformed packets in the demultiplex
thread to find the corresponding mid and process it further.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
This change allows encrypting packets if a server requires it for
SMB sessions or tree connections.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
In order to allow encryption on an SMB connection, we need to exchange
a session key and generate encryption and decryption keys.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
This will allow us to do protocol-specific transformations of packets
before sending them to the server. For SMB3 it can be used to support
encryption.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Allocate and initialize the SMB2 read request without the RFC1001
length field so that the read codepath can call cifs_send_recv()
directly rather than SendReceive2().
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Do not process the RFC1001 length in smb2_hdr_assemble() because
it is not part of the SMB2 header. This allows cleaning up the code
and makes it possible to combine several SMB2 packets into one
for compounding.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
In order to simplify further encryption support, we need to separate
the RFC1001 length and the SMB2 header when sending a request. Put the
length field in iov[0] and the rest of the packet into the following
iovs.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
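The iov split described in the commit above can be sketched in userspace
C. This is only an illustration under stated assumptions, not the
kernel code: build_request_iov() and its parameters are hypothetical
names, and the sketch assumes a plain 4-byte big-endian length field
that counts only the bytes following it.

```c
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>     /* struct iovec */

/*
 * Hypothetical helper, not the kernel's API: describe a request as two
 * iovecs. iov[0] covers only the 4-byte big-endian length field and
 * iov[1] covers the packet that starts with the SMB2 header.
 */
size_t build_request_iov(struct iovec iov[2], uint32_t *len_field,
                         void *packet, size_t packet_len)
{
    /* The length counts the packet only, not the length field itself. */
    *len_field = htonl((uint32_t)packet_len);

    iov[0].iov_base = len_field;
    iov[0].iov_len = sizeof(*len_field);
    iov[1].iov_base = packet;
    iov[1].iov_len = packet_len;

    /* Total number of bytes that would go on the wire. */
    return iov[0].iov_len + iov[1].iov_len;
}
```

Keeping the length in its own iov means that code which rewrites or
wraps the packet (for example an encryption step) can operate on
iov[1] onwards and recompute iov[0] separately.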
|
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Currently SendReceive2 frees the first iov and returns a response
buffer in it, which increases code complexity. Simplify this by making
the caller responsible for freeing the request buffer itself and by
returning the response buffer in a separate iov.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
In order to support compounding and encryption, we need to separate
the RFC1001 length field and the SMB2 header structure because the
protocol treats them differently. This change will simplify further
parsing of such complex SMB2 packets.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
Currently we call copy_page_to_iter() for uncached reading into a pipe.
This is wrong because it treats pages as VFS cache pages and copies references
rather than actual data. When we are trying to read from the pipe we end up
calling page_cache_pipe_buf_confirm() which returns -ENODATA. This error
is translated into 0 which is returned to a user.
This issue is reproduced by running xfs-tests suite (generic test #249)
against mount points with "cache=none". Fix it by mapping pages manually
and calling copy_to_iter() that copies data into the pipe.
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
|
List soft dependencies of cifs so that mkinitrd and dracut can include
the required helper modules.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve French <sfrench@samba.org>
|
The sha256 and cmac crypto modules are only needed for SMB2+, so move
the select statements to config CIFS_SMB2. Also select CRYPTO_AES
there as SMB2+ needs it.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve French <sfrench@samba.org>
|
* CIFS_SMB2 depends on CIFS, which depends on INET and selects NLS. So
these dependencies do not need to be repeated for CIFS_SMB2.
* CIFS_SMB311 depends on CIFS_SMB2, which depends on INET. So this
dependency doesn't need to be repeated for CIFS_SMB311.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve French <sfrench@samba.org>
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"For this cycle we add support for the shutdown ioctl, which is
primarily used for testing, but which can be useful on production
systems when a scratch volume is being destroyed and the data on it
doesn't need to be saved.
This found (and we fixed) a number of bugs with ext4's recovery to
corrupted file system --- the bugs increased the amount of data that
could be potentially lost, and in the case of the inline data feature,
could cause the kernel to BUG.
Also included are a number of other bug fixes, including in ext4's
fscrypt, DAX, inline data support"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
ext4: fix fencepost in s_first_meta_bg validation
ext4: don't BUG when truncating encrypted inodes on the orphan list
ext4: do not use stripe_width if it is not set
ext4: fix stripe-unaligned allocations
dax: assert that i_rwsem is held exclusive for writes
ext4: fix DAX write locking
ext4: add EXT4_IOC_GOINGDOWN ioctl
ext4: add shutdown bit and check for it
ext4: rename s_resize_flags to s_ext4_flags
ext4: return EROFS if device is r/o and journal replay is needed
ext4: preserve the needs_recovery flag when the journal is aborted
jbd2: don't leak modified metadata buffers on an aborted journal
ext4: fix inline data error paths
ext4: move halfmd4 into hash.c directly
ext4: fix use-after-iput when fscrypt contexts are inconsistent
jbd2: fix use after free in kjournald2()
ext4: fix data corruption in data=journal mode
ext4: trim allocation requests to group size
ext4: replace BUG_ON with WARN_ON in mb_find_extent()
...
|
It's very likely the file system independent ioctl name will be
FS_IOC_SHUTDOWN, so let's use the same name for the ext4 ioctl name.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks. (It rarely happens, but it shouldn't cause any
problems.)
https://bugzilla.kernel.org/show_bug.cgi?id=194567
Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
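The fencepost described above can be illustrated with a minimal sketch.
The function name and parameters are hypothetical and only model the
comparison; they are not the kernel's actual validation code. The
assumption is that db_count stands for the number of block group
descriptor blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock check; names are illustrative. */
bool first_meta_bg_is_valid(uint32_t first_meta_bg, uint32_t db_count)
{
    /* Equality is allowed; only values past db_count are rejected. */
    return first_meta_bg <= db_count;
}
```

A check written as "first_meta_bg < db_count" would wrongly reject the
rare but harmless case where the two values are equal.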
|
Fix a BUG when the kernel tries to mount a file system constructed as
follows:
echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF
root@kvm-xfstests:~# mount -o loop foo.img /mnt
[ 160.238770] ------------[ cut here ]------------
[ 160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[ 160.240106] invalid opcode: 0000 [#1] SMP
[ 160.240106] Modules linked in:
[ 160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G W 4.10.0-rc3-00034-gcdd33b941b67 #227
[ 160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[ 160.240106] task: f4518000 task.stack: f47b6000
[ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[ 160.240106] EFLAGS: 00010246 CPU: 0
[ 160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[ 160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[ 160.240106] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[ 160.240106] Call Trace:
[ 160.240106] ext4_truncate+0x1e9/0x3e5
[ 160.240106] ext4_fill_super+0x286f/0x2b1e
[ 160.240106] ? set_blocksize+0x2e/0x7e
[ 160.240106] mount_bdev+0x114/0x15f
[ 160.240106] ext4_mount+0x15/0x17
[ 160.240106] ? ext4_calculate_overhead+0x39d/0x39d
[ 160.240106] mount_fs+0x58/0x115
[ 160.240106] vfs_kern_mount+0x4b/0xae
[ 160.240106] do_mount+0x671/0x8c3
[ 160.240106] ? _copy_from_user+0x70/0x83
[ 160.240106] ? strndup_user+0x31/0x46
[ 160.240106] SyS_mount+0x57/0x7b
[ 160.240106] do_int80_syscall_32+0x4f/0x61
[ 160.240106] entry_INT80_32+0x2f/0x2f
[ 160.240106] EIP: 0xb76b919e
[ 160.240106] EFLAGS: 00000246 CPU: 0
[ 160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[ 160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[ 160.240106] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[ 160.317241] ---[ end trace d6a773a375c810a5 ]---
The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.
E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).
An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size is causing problems --- and if we've lived with it
not doing it for so long, can we drop this from the kernel replay of
the orphan list in all cases (not just when we don't have the key for
encrypted inodes)?
Addresses-Google-Bug: #35209576
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Avoid using stripe_width for the sbi->s_stripe value if it is not
actually set. Otherwise the unset stripe_width prevents the stride from
being used for sbi->s_stripe.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
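The fallback order implied by the commit above can be sketched as
follows. This is a simplified model under assumptions, not the ext4
source: pick_stripe() and its parameter names are hypothetical, and a
value of 0 is taken to mean "not set".

```c
#include <assert.h>

/*
 * Hypothetical model of choosing a stripe size: prefer an explicit
 * mount option, then the superblock stripe width, then the stride.
 * A candidate is used only if it is set (non-zero) and fits in a group.
 */
unsigned long pick_stripe(unsigned long mount_stripe,
                          unsigned long stripe_width,
                          unsigned long stride,
                          unsigned long blocks_per_group)
{
    if (mount_stripe && mount_stripe <= blocks_per_group)
        return mount_stripe;
    if (stripe_width && stripe_width <= blocks_per_group)
        return stripe_width;   /* skipped when stripe_width is unset */
    if (stride && stride <= blocks_per_group)
        return stride;         /* now reachable when only stride is set */
    return 0;                  /* no usable stripe information */
}
```

Without the non-zero test, an unset stripe_width of 0 would pass the
"fits in a group" comparison and be returned, so the stride would never
be considered.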
|
When a filesystem is created using:
mkfs.ext4 -b 4096 -E stride=512 <dev>
and we try to allocate a 64MB extent, we will end up directly in
ext4_mb_complex_scan_group(). This is because the request is detected
as a power-of-two allocation (so we start in ext4_mb_regular_allocator()
with ac_criteria == 0), but the check before
ext4_mb_simple_scan_group() refuses the direct buddy scan because the
allocation request is too large. Since cr == 0, the check whether we
should use ext4_mb_scan_aligned() fails as well and we fall back to
ext4_mb_complex_scan_group().
Fix the problem by checking for the upper limit on power-of-two requests
directly when detecting them.
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Make sure all callers follow the same locking protocol, given that DAX
transparently replaced the normal buffered I/O path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
Unlike O_DIRECT, DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronization
of overlapping writes as we do for buffered writes.
This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion. Fix up ext4 as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
This ioctl is modeled after xfs's XFS_IOC_GOINGDOWN ioctl. (In
fact, it uses the same code points.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
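A userspace sketch of the idea above: a single flag bit that every
operation checks first and, if set, fails with EIO. All names here
(SKETCH_FLAG_SHUTDOWN, struct sketch_sb, sketch_write) are hypothetical
and only illustrate the pattern; they are not ext4 identifiers.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_FLAG_SHUTDOWN (1UL << 0)   /* hypothetical bit position */

/* Hypothetical stand-in for per-filesystem state. */
struct sketch_sb {
    unsigned long flags;
};

/* Non-zero once the filesystem has been shut down. */
int sketch_forced_shutdown(const struct sketch_sb *sb)
{
    return (sb->flags & SKETCH_FLAG_SHUTDOWN) != 0;
}

/* Stand-in for any filesystem operation: bail out before doing work. */
int sketch_write(const struct sketch_sb *sb)
{
    if (sketch_forced_shutdown(sb))
        return -EIO;
    /* ... the real operation would run here ... */
    return 0;
}
```

Checking the bit at the top of each entry point keeps the failure
immediate and avoids touching on-disk state after shutdown.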
|
We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long to be used for other
purposes.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
If the file system requires journal recovery, and the device is
read-only, return EROFS to the mount system call. This allows xfstests
generic/050 to pass.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
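The rule stated above reduces to a small decision, sketched here with
hypothetical names. It assumes the caller already knows whether the
journal needs recovery and whether the underlying device is read-only;
how the kernel determines either fact is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mount-time check; returns 0 or a negative errno. */
int sketch_check_journal_replay(bool needs_recovery, bool device_read_only)
{
    /* Replaying the journal requires writing to the device. */
    if (needs_recovery && device_read_only)
        return -EROFS;
    return 0;
}
```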
|
If the journal is aborted, the needs_recovery feature flag should not
be removed. Otherwise, the journal might not get replayed and
this could lead to more data getting lost.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified. And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
The write_end() function must always unlock the page and drop its ref
count, even on an error.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
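The requirement above is commonly met with a single exit path. The
sketch below models it in userspace with hypothetical types
(struct sketch_page, sketch_write_end); the "fail" parameter merely
simulates an error and is not part of any real interface.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for a locked, referenced page. */
struct sketch_page {
    int locked;
    int refcount;
};

/* Returns bytes accepted or a negative errno; always releases the page. */
int sketch_write_end(struct sketch_page *page, int copied, int fail)
{
    int ret = copied;

    if (fail) {
        ret = -EIO;     /* simulated failure */
        goto out;       /* do not return early: cleanup must still run */
    }
    /* ... normal completion work would go here ... */
out:
    page->locked = 0;   /* unlock the page */
    page->refcount--;   /* drop the reference held for this write */
    return ret;
}
```

Routing every return through the same label makes it hard to add a new
error path that forgets the unlock or the reference drop.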
|
The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext4 can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
In the case where the child's encryption context was inconsistent with
its parent directory, we were using inode->i_sb and inode->i_ino after
the inode had already been iput(). Fix this by doing the iput() in the
correct places.
Note: only ext4 had this bug, not f2fs and ubifs.
Fixes: d9cdc9033181 ("ext4 crypto: enforce context consistency")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Below is the synchronization issue between the unmount and kjournald2
contexts, which results in a use-after-free in kjournald2().
Fix this issue by using journal->j_state_lock to synchronize the
wait_event() done in journal_kill_thread() and the wake_up() done
in kjournald2().
TASK 1:
umount cmd:
|--jbd2_journal_destroy() {
|--journal_kill_thread() {
write_lock(&journal->j_state_lock);
journal->j_flags |= JBD2_UNMOUNT;
...
write_unlock(&journal->j_state_lock);
wake_up(&journal->j_wait_commit); TASK 2 wakes up here:
kjournald2() {
...
checks JBD2_UNMOUNT flag and calls goto end-loop;
...
end_loop:
write_unlock(&journal->j_state_lock);
journal->j_task = NULL; --> If this thread gets
pre-empted here, then TASK 1 wait_event will
exit even before this thread is completely
done.
wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
...
write_lock(&journal->j_state_lock);
write_unlock(&journal->j_state_lock);
}
|--kfree(journal);
}
}
wake_up(&journal->j_wait_done_commit); --> this step
now results in a use-after-free.
}
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
ext4_journalled_write_end() did not properly handle all the cases when
generic_perform_write() did not copy all the data into the target page,
and could mark buffers with uninitialized contents as uptodate and dirty,
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write, but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.
CC: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
If filesystem groups are artificially small (using the -g parameter to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size so as not to confuse
the allocation code.
Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
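The trimming described above amounts to a clamp. A minimal sketch,
assuming sizes are expressed in blocks; trim_request() is a
hypothetical name, not the function the patch touches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cap a normalized request at one block group. */
uint32_t trim_request(uint32_t size_blocks, uint32_t blocks_per_group)
{
    if (size_blocks > blocks_per_group)
        size_blocks = blocks_per_group;
    return size_blocks;
}
```

For example, with groups of 256 blocks a normalized request of 2048
blocks would be trimmed to 256, while a request of 100 blocks is left
unchanged.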
|
The last BUG_ON in mb_find_extent() is apparently triggering in some
rare cases. Most of the time it indicates a bug in the buddy bitmap
algorithms, but there are some weird cases where it can trigger when
the buddy bitmap is still in memory, but the block bitmap has to be read
from disk, and there is disk or memory corruption such that the block
bitmap and the buddy bitmap are out of sync.
Google-Bug-Id: #33702157
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
There is no need to call ext4_mark_inode_dirty while holding xattr_sem
or i_data_sem, so where it's easy to avoid it, move it out from the
critical region.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c. With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4eeedca.
The deadlock can be reproduced via:
dmesg -n 7
mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
mkdir /vdc/a
umount /vdc
mount -t ext4 /dev/vdc /vdc
echo foo > /vdc/a/foo
and looks like this:
[ 11.158815]
[ 11.160276] =============================================
[ 11.161960] [ INFO: possible recursive locking detected ]
[ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W
[ 11.161960] ---------------------------------------------
[ 11.161960] bash/2519 is trying to acquire lock:
[ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]
[ 11.161960] but task is already holding lock:
[ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] other info that might help us debug this:
[ 11.161960] Possible unsafe locking scenario:
[ 11.161960]
[ 11.161960] CPU0
[ 11.161960] ----
[ 11.161960] lock(&ei->xattr_sem);
[ 11.161960] lock(&ei->xattr_sem);
[ 11.161960]
[ 11.161960] *** DEADLOCK ***
[ 11.161960]
[ 11.161960] May be due to missing lock nesting notation
[ 11.161960]
[ 11.161960] 4 locks held by bash/2519:
[ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
[ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
[ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
[ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] stack backtrace:
[ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160
[ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[ 11.161960] Call Trace:
[ 11.161960] dump_stack+0x72/0xa3
[ 11.161960] __lock_acquire+0xb7c/0xcb9
[ 11.161960] ? kvm_clock_read+0x1f/0x29
[ 11.161960] ? __lock_is_held+0x36/0x66
[ 11.161960] ? __lock_is_held+0x36/0x66
[ 11.161960] lock_acquire+0x106/0x18a
[ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] down_write+0x39/0x72
[ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] ? _raw_read_unlock+0x22/0x2c
[ 11.161960] ? jbd2_journal_extend+0x1e2/0x262
[ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60
[ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d
[ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960] ext4_try_add_inline_entry+0x69/0x152
[ 11.161960] ext4_add_entry+0xa3/0x848
[ 11.161960] ? __brelse+0x14/0x2f
[ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f
[ 11.161960] ext4_add_nondir+0x17/0x5b
[ 11.161960] ext4_create+0xcf/0x133
[ 11.161960] ? ext4_mknod+0x12f/0x12f
[ 11.161960] lookup_open+0x39e/0x3fb
[ 11.161960] ? __wake_up+0x1a/0x40
[ 11.161960] ? lock_acquire+0x11e/0x18a
[ 11.161960] path_openat+0x35c/0x67a
[ 11.161960] ? sched_clock_cpu+0xd7/0xf2
[ 11.161960] do_filp_open+0x36/0x7c
[ 11.161960] ? _raw_spin_unlock+0x22/0x2c
[ 11.161960] ? __alloc_fd+0x169/0x173
[ 11.161960] do_sys_open+0x59/0xcc
[ 11.161960] SyS_open+0x1d/0x1f
[ 11.161960] do_int80_syscall_32+0x4f/0x61
[ 11.161960] entry_INT80_32+0x2f/0x2f
[ 11.161960] EIP: 0xb76ad469
[ 11.161960] EFLAGS: 00000286 CPU: 0
[ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq)
Reported-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
In order to test the inode extra isize expansion code, it is useful to
be able to easily create file systems that have inodes with extra
isize values smaller than the current desired value.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
Inside the ext4_ext_shift_extents() function, ext4_find_extent() is
called without the EXT4_EX_NOCACHE flag that would prevent cache
population. This leads to outdated offsets in the extents tree and wrong
blocks afterwards.
This patch fixes the problem by providing the EXT4_EX_NOCACHE flag for
each ext4_find_extent() call inside the ext4_ext_shift_extents()
function.
Fixes: 331573febb6a2
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org
|
While doing 'insert range', the start block should also be shifted right.
The bug can be easily reproduced by the following test:
ptr = malloc(4096);
assert(ptr);
fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
assert(fd >= 0);
rc = fallocate(fd, 0, 0, 8192);
assert(rc == 0);
for (i = 0; i < 2048; i++)
        *((unsigned short *)ptr + i) = 0xbeef;
rc = pwrite(fd, ptr, 4096, 0);
assert(rc == 4096);
rc = pwrite(fd, ptr, 4096, 4096);
assert(rc == 4096);
for (block = 2; block < 1000; block++) {
        rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
        assert(rc == 0);
        for (i = 0; i < 2048; i++)
                *((unsigned short *)ptr + i) = block;
        rc = pwrite(fd, ptr, 4096, 4096);
        assert(rc == 4096);
}
Because the start block is not included in the range, the hole appears
at the wrong offset (just after the desired offset) and the following
pwrite() overwrites an already existing block, leaving the hole
untouched.
A simple way to verify the wrong behaviour is to check for zeroed blocks
after the test:
$ hexdump ./ext4.file | grep '0000 0000'
The root cause of the bug is a wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].
This patch fixes the problem by including start in the range. But so as
not to break left shift (range collapse), stop points to the beginning
of a block, not to the end.
The other non-obvious change is a check on the iterator's validity in
the main loop. Because the iterator is unsigned, the following corner
case should be considered with care: when a block is inserted at offset
0, the stop variable overflows and never becomes less than start, which
is 0. To handle this special case, the iterator is set to NULL to
indicate that the end of the loop has been reached.
Fixes: 331573febb6a2
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org
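The unsigned corner case described in the commit above can be shown
with a small userspace analogue. This is not the patch's code: the
kernel change works on an extent iterator, whereas this sketch uses
hypothetical names and a plain counter to show one way to walk the
inclusive range [start, stop] downwards without wrapping below zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: visit every block in [start, stop], from stop
 * down to start, and return how many blocks were visited.
 * A loop of the form "for (i = stop; i >= start; i--)" would never end
 * when start == 0, because an unsigned i wraps around instead of
 * becoming negative.
 */
size_t visit_descending(uint32_t start, uint32_t stop)
{
    size_t visited = 0;
    uint32_t i = stop;

    assert(start <= stop);
    for (;;) {
        visited++;          /* block i is inside [start, stop] */
        if (i == start)
            break;          /* stop before i-- could wrap below 0 */
        i--;
    }
    return visited;
}
```

Testing for the end of the range before decrementing plays the same
role as the explicit end-of-loop marker mentioned in the commit
message.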
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Various cleanups for the file system encryption feature"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: constify struct fscrypt_operations
fscrypt: properly declare on-stack completion
fscrypt: split supp and notsupp declarations into their own headers
fscrypt: remove redundant assignment of res
fscrypt: make fscrypt_operations.key_prefix a string
fscrypt: remove unused 'mode' member of fscrypt_ctx
ext4: don't allow encrypted operations without keys
fscrypt: make test_dummy_encryption require a keyring key
fscrypt: factor out bio specific functions
fscrypt: pass up error codes from ->get_context()
fscrypt: remove user-triggerable warning messages
fscrypt: use EEXIST when file already uses different policy
fscrypt: use ENOTDIR when setting encryption policy on nondirectory
fscrypt: use ENOKEY when file cannot be created w/o key
|
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
|