| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The single caller passes a string to delta_ipc_open, which copies with a
fixed size larger than the string. So it copies some random data after
the original string the ro segment.
If the string was at the end of a page it may fault.
Just copy the string with a normal strcpy after clearing the field.
Found by a LTO build (which errors out)
because the compiler inlines the functions and can resolve
the string sizes and triggers the compile time checks in memcpy.
In function `memcpy',
inlined from `delta_ipc_open.constprop' at linux/drivers/media/platform/sti/delta/delta-ipc.c:178:0,
inlined from `delta_mjpeg_ipc_open' at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:227:0,
inlined from `delta_mjpeg_decode' at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:403:0:
/home/andi/lsrc/linux/include/linux/string.h:337:0: error: call to `__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
__read_overflow2();
Link: http://lkml.kernel.org/r/20171222001212.1850-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Hugues FRUCHET <hugues.fruchet@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
current_time is the last remaining caller of current_kernel_time64(),
which is a wrapper around ktime_get_coarse_real_ts64(). This calls the
latter directly for consistency with the rest of the kernel that is moving
to the ktime_get_ family of time accessors, as now documented in
Documentation/core-api/timekeeping.rst.
An open questions is whether we may want to actually call the more
accurate ktime_get_real_ts64() for file systems that save high-resolution
timestamps in their on-disk format. This would add a small overhead to
each update of the inode stamps but lead to inode timestamps to actually
have a usable resolution better than one jiffy (1 to 10 milliseconds
normally). Experiments on a variety of hardware platforms show a typical
time of around 100 CPU cycles to read the cycle counter and calculate the
accurate time from that. On old platforms without a cycle counter, this
can be signiciantly higher, up to several microseconds to access a
hardware clock, but those have become very rare by now.
I traced the original addition of the current_kernel_time() call to set
the nanosecond fields back to linux-2.5.48, where Andi Kleen added a patch
with subject "nanosecond stat timefields". Andi explains that the
motivation was to introduce as little overhead as possible back then. At
this time, reading the clock hardware was also more expensive when most
architectures did not have a cycle counter.
One side effect of having more accurate inode timestamp would be having to
write out the inode every time that mtime/ctime/atime get touched on most
systems, whereas many file systems today only write it when the timestamps
have changed, i.e. at most once per jiffy unless something else changes
as well. That change would certainly be noticed in some workloads, which
is enough reason to not do it without a good reason, regardless of the
cost of reading the time.
One thing we could still consider however would be to round the timestamps
from current_time() to multiples of NSEC_PER_JIFFY, e.g. full
milliseconds rather than having six or seven meaningless but confusing
digits at the end of the timestamp.
Link: http://lkml.kernel.org/r/20180726130820.4174359-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
syzbot is reporting too large memory allocation at bfs_fill_super() [1].
Since file system image is corrupted such that bfs_sb->s_start == 0,
bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this
by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf().
[1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96
Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
fill_with_dentries() failed to propagate errors up to
reiserfs_for_each_xattr() properly. Plumb them through.
Note that reiserfs_for_each_xattr() is only used by
reiserfs_delete_xattrs() and reiserfs_chown_xattrs(). The result of
reiserfs_delete_xattrs() is discarded anyway, the only difference there is
whether a warning is printed to dmesg. The result of
reiserfs_chown_xattrs() does matter because it can block chowning of the
file to which the xattrs belong; but either way, the resulting state can
have misaligned ownership, so my patch doesn't improve things greatly.
Credit for making me look at this code goes to Al Viro, who pointed out
that the ->actor calling convention is suboptimal and should be changed.
Link: http://lkml.kernel.org/r/20180802163335.83312-1-jannh@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Inserting a new record in a btree may require splitting several of its
nodes. If we hit ENOSPC halfway through, the new nodes will be left
orphaned and their records will be lost. This could mean lost inodes
or extents.
>From now on, check the available disk space before making any changes.
This still leaves the potential problem of corruption on ENOMEM.
There is no need to reserve space before deleting a catalog record, as
we do for hfsplus. This difference is because hfs index nodes have
fixed length keys.
Link: http://lkml.kernel.org/r/68e77996d38baf789e8dc108ca6d210c5987f2ea.1535682464.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
hfs_brec_update_parent() may hit BUG_ON() if the first record of both a
leaf node and its parent are changed, and if this forces the parent to
be split. It is not possible for this to happen on a valid hfs
filesystem because the index nodes have fixed length keys.
For reasons I ignore, the hfs module does have support for a number of
hfsplus features. A corrupt btree header may report variable length
keys and trigger this BUG, so it's better to fix it.
Link: http://lkml.kernel.org/r/cf9b02d57f806217a2b1bf5db8c3e39730d8f603.1535682463.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This bug is triggered whenever hfs_brec_update_parent() needs to split
the root node. The height of the btree is not increased, which leaves
the new node orphaned and its records lost. It is not possible for this
to happen on a valid hfs filesystem because the index nodes have fixed
length keys.
For reasons I ignore, the hfs module does have support for a number of
hfsplus features. A corrupt btree header may report variable length
keys and trigger this bug, so it's better to fix it.
Link: http://lkml.kernel.org/r/9750b1415685c4adca10766895f6d5ef12babdb0.1535682463.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Inserting or deleting a record in a btree may require splitting several of
its nodes. If we hit ENOSPC halfway through, the new nodes will be left
orphaned and their records will be lost. This could mean lost inodes,
extents or xattrs.
Henceforth, check the available disk space before making any changes.
This still leaves the potential problem of corruption on ENOMEM.
The patch can be tested with xfstests generic/027.
Link: http://lkml.kernel.org/r/4ecd0a1d09756aadb173809e18285e807994e90b.1535682462.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Creating, renaming or deleting a file may hit BUG_ON() if the first
record of both a leaf node and its parent are changed, and if this
forces the parent to be split. This bug is triggered by xfstests
generic/027, somewhat rarely; here is a more reliable reproducer:
truncate -s 50M fs.iso
mkfs.hfsplus fs.iso
mount fs.iso /mnt
i=1000
while [ $i -le 2400 ]; do
touch /mnt/$i &>/dev/null
((++i))
done
i=2400
while [ $i -ge 1000 ]; do
mv /mnt/$i /mnt/$(perl -e "print $i x61") &>/dev/null
((--i))
done
The issue is that a newly created bnode is being put twice. Reset
new_node to NULL in hfs_brec_update_parent() before reaching goto again.
Link: http://lkml.kernel.org/r/5ee1db09b60373a15890f6a7c835d00e76bf601d.1535682461.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Creating, renaming or deleting a file may cause catalog corruption and
data loss. This bug is randomly triggered by xfstests generic/027, but
here is a faster reproducer:
truncate -s 50M fs.iso
mkfs.hfsplus fs.iso
mount fs.iso /mnt
i=100
while [ $i -le 150 ]; do
touch /mnt/$i &>/dev/null
((++i))
done
i=100
while [ $i -le 150 ]; do
mv /mnt/$i /mnt/$(perl -e "print $i x82") &>/dev/null
((++i))
done
umount /mnt
fsck.hfsplus -n fs.iso
The bug is triggered whenever hfs_brec_update_parent() needs to split the
root node. The height of the btree is not increased, which leaves the new
node orphaned and its records lost.
Link: http://lkml.kernel.org/r/26d882184fc43043a810114258f45277752186c7.1535682461.git.ernesto.mnd.fernandez@gmail.com
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support referencing the root partition label from GPT as argument
to the root= option on the kernel command line in analogy to
referencing the partition uuid as root=PARTUUID=<uuid>.
Specifying the partition label instead of the uuid is often much
easier, e.g. in embedded environments when there is an
A/B rootfs partition scheme for interruptible firmware updates
(i.e. rootfsA/ rootfsB).
The partition label can be queried with the blkid command.
Link: http://lkml.kernel.org/r/20180822060904.828E510665E@pc-niv.weinmann.com
Signed-off-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This warning was there to avoid the use of 0bxxx values as they are not
supported by gcc prior to v4.3
Since cafa0010cd51f ("Raise the minimum required gcc version to 4.6"),
it's not an issue anymore and using such values can increase readability
of code.
Joe said:
: Seems sensible as the other compilers also support binary literals from
: relatively old versions.
: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3472.pdf
: https://software.intel.com/en-us/articles/c14-features-supported-by-intel-c-compiler
Link: http://lkml.kernel.org/r/392eeae782302ee8812a3c932a602035deed1609.1535351453.git.christophe.leroy@c-s.fr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This simplifies the code. No change in behavior.
Link: http://lkml.kernel.org/r/20180830194814.192880-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This simplifies the code. No change in behavior.
Link: http://lkml.kernel.org/r/20180830194436.188867-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
len is guaranteed to lie in [1, PAGE_SIZE]. If scnprintf is called with a
buffer size of 1, it is guaranteed to return 0. So in the extremely
unlikely case of having just one byte remaining in the page, let's just
call scnprintf anyway. The only difference is that this will write a '\0'
to that final byte in the page, but that's an improvement: We now
guarantee that after the call, buf is a properly terminated C string of
length exactly the return value.
Link: http://lkml.kernel.org/r/20180818131623.8755-8-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
include mm.h for offset_in_page()
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
use offset_in_page(), per Andy
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For various alignments of buf, the current expression computes
4096 ok
4095 ok
8190
8189
...
4097
i.e., if the caller has already written two bytes into the page buffer,
len is 8190 rather than 4094, because PTR_ALIGN aligns up to the next
boundary. So if the printed version of the bitmap is huge, scnprintf()
ends up writing beyond the page boundary.
I don't think any current callers actually write anything before
bitmap_print_to_pagebuf, but the API seems to be designed to allow it.
Link: http://lkml.kernel.org/r/20180818131623.8755-7-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It's not clear what's so horrible about emitting a function call to handle
a run-time sized bitmap. Moreover, gcc also emits a function call for a
compile-time-constant-but-huge nbits, so the comment isn't even accurate.
Link: http://lkml.kernel.org/r/20180818131623.8755-6-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Most other bitmap API, including the OOL version __bitmap_shift_right,
take unsigned nbits. This was accidentally left out from 2fbad29917c98.
Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk
Fixes: 2fbad29917c98 ("lib: bitmap: change bitmap_shift_right to take unsigned parameters")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the _zero, _fill and _copy functions, the small_const_nbits branch is
redundant. If nbits is small and const, gcc knows full well that
BITS_TO_LONGS(nbits) is 1, so len is also a compile-time constant
(sizeof(long)), and calling memset or memcpy with a length argument of
sizeof(long) makes gcc generate the expected code anyway:
#include <string.h>
void a(unsigned long *x) { memset(x, 0, 8); }
void b(unsigned long *x) { memset(x, 0xff, 8); }
void c(unsigned long *x, const unsigned long *y) { memcpy(x, y, 8); }
turns into
0000000000000000 <a>:
0: 48 c7 07 00 00 00 00 movq $0x0,(%rdi)
7: c3 retq
8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
f: 00
0000000000000010 <b>:
10: 48 c7 07 ff ff ff ff movq $0xffffffffffffffff,(%rdi)
17: c3 retq
18: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
1f: 00
0000000000000020 <c>:
20: 48 8b 06 mov (%rsi),%rax
23: 48 89 07 mov %rax,(%rdi)
26: c3 retq
Link: http://lkml.kernel.org/r/20180818131623.8755-4-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The static inlines in bitmap.h do not handle a compile-time constant
nbits==0 correctly (they dereference the passed src or dst pointers,
despite only 0 words being valid to access). I had the 0-day buildbot
chew on a patch [1] that would cause build failures for such cases without
complaining, suggesting that we don't have any such users currently, at
least for the 70 .config/arch combinations that was built. Should any
turn up, make sure they use the out-of-line versions, which do handle
nbits==0 correctly.
This is of course not the most efficient, but it's much less churn than
teaching all the static inlines an "if (zero_const_nbits())", and since we
don't have any current instances, this doesn't affect existing code at
all.
[1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk
Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This promise is violated in a number of places, e.g. already in the
second function below this paragraph. Since I don't think anybody relies
on this being true, and since actually honouring it would hurt performance
and code size in various places, just remove the paragraph.
Link: http://lkml.kernel.org/r/20180818131623.8755-2-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
__getblk_gfp() line in the backtrace. Since there is a comment block that
says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
attempt from grow_dev_page() is failing, let's start from checking whether
syzbot is hitting that case. This change will be removed after the bug is
fixed.
Link: http://lkml.kernel.org/r/9b9fcdda-c347-53ee-fdbb-8a7d11cf430e@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: <syzkaller-bugs@googlegroups.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When pfn_valid(pfn) returns false, pfn should be aligned with
pageblock_nr_pages other than MAX_ORDER_NR_PAGES in init_pages_in_zone,
because the skipped 2M may be valid pfn, as a result, early allocated
count will not be accurate.
Link: http://lkml.kernel.org/r/1468938136-24228-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, init_pages_in_zone() walks the zone in pageblock_nr_pages
steps. MAX_ORDER_NR_PAGES is possible to have holes when
CONFIG_HOLES_IN_ZONE is set. it is likely to be different between
MAX_ORDER_NR_PAGES and pageblock_nr_pages. if we skip the size of
MAX_ORDER_NR_PAGES, it will result in the second 2M memroy leak.
Meanwhile, the change will make the code consistent. because the entire
function is based on the pageblock_nr_pages steps.
Link: http://lkml.kernel.org/r/1512395284-13588-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We don't want to expose page before it's properly setup. During page
setup, we may call page_add_new_anon_rmap() which uses non- atomic bit op.
If page is exposed before it's done, we could overwrite page flags that
are set by get_user_pages_fast() or its callers. Here is a non-fatal
scenario (there might be other fatal problems that I didn't look into):
CPU 1 CPU1
set_pte_at() get_user_pages_fast()
page_add_new_anon_rmap() gup_pte_range()
__SetPageSwapBacked() SetPageReferenced()
Fix the problem by delaying set_pte_at() until page is ready.
I didn't observe the race directly. But I did get few crashes when
trying to access mem_cgroup of pages returned by get_user_pages_fast().
Those page were charged and they showed valid mem_cgroup in kdumps.
So this led me to think the problem came from premature set_pte_at().
I think the fact that nobody complained about this problem is because
the race only happens when using ksm+swap, and it might not cause any
fatal problem even so. Nevertheless, it's nice to have set_pte_at()
done consistently after rmap is added and page is charged.
Link: http://lkml.kernel.org/r/20180108225632.16332-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The "strictlimit" feature was introduced to enforce per-bdi dirty limits
for FUSE which sets bdi max_ratio to 1% by default:
http://article.gmane.org/gmane.linux.kernel.mm/105809
However the feature can be useful for other relatively slow or untrusted
BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable
the feature:
echo 1 > /sys/class/bdi/X:Y/strictlimit
Being enabled, the feature enforces bdi max_ratio limit even if global
(10%) dirty limit is not reached. Of course, the effect is not visible
until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
Jan said:
: In principle I have nothing against this and the usecase sounds reasonable
: (in fact I believe the lack of a feature like this is one of reasons why
: desktop automounters usually mount USB devices with 'sync' mount option).
: So feel free to add:
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Artem S. Tashkinov" <t.artem@lycos.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
include prefetch.h, remove unlocked list_empty() test, per Dave
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
list_lru_del() removes the given item from the LRU list. The operation
looks simple, but it involves writing into the cachelines of the two
neighboring list entries in order to get the deletion done. That can take
a while if the cachelines aren't there yet, thus prolonging the lock hold
time.
To reduce the lock hold time, the cachelines of the two neighboring list
entries are now prefetched before acquiring the list_lru_node's lock.
Using a multi-threaded test program that created a large number of
dentries and then killed them, the execution time was reduced from 38.5s
to 36.6s after applying the patch on a 2-socket 36-core 72-thread x86-64
system.
Link: http://lkml.kernel.org/r/1511965054-6328-1-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Via commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks") on,
after swapoff, the address_space associated with the swap device will be
freed. So swap_address_space() users which touch the address_space need
some kind of mechanism to prevent the address_space from being freed
during accessing.
When mincore process unmapped range for swapped shmem pages, it doesn't
hold the lock to prevent swap device from being swapoff. So the following
race is possible,
CPU1 CPU2
do_mincore() swapoff()
walk_page_range()
mincore_unmapped_range()
__mincore_unmapped_range
mincore_page
as = swap_address_space()
... exit_swap_address_space()
... kvfree(spaces)
find_get_page(as)
The address space may be accessed after being freed.
To fix the race, get_swap_device()/put_swap_device() is used to enclose
find_get_page() to check whether the swap entry is valid and prevent the
swap device from being swapoff during accessing.
Link: http://lkml.kernel.org/r/20180313012036.1597-1-ying.huang@intel.com
Fixes: 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Add more comments to get_swap_device() to make it more clear about
possible swapoff or swapoff+swapon.
Link: http://lkml.kernel.org/r/20180223060010.954-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When swapin is performed, after getting the swap entry information from
the page table, system will swap in the swap entry, without any lock held
to prevent the swap device from being swapoff. This may cause the race
like below,
CPU 1 CPU 2
----- -----
do_swap_page
swapin_readahead
__read_swap_cache_async
swapoff swapcache_prepare
p->swap_map = NULL __swap_duplicate
p->swap_map[?] /* !!! NULL pointer access */
Because swapoff is usually done when system shutdown only, the race may
not hit many people in practice. But it is still a race need to be fixed.
To fix the race, get_swap_device() is added to check whether the specified
swap entry is valid in its swap device. If so, it will keep the swap
entry valid via preventing the swap device from being swapoff, until
put_swap_device() is called.
Because swapoff() is very rare code path, to make the normal path runs as
fast as possible, disabling preemption + stop_machine() instead of
reference count is used to implement get/put_swap_device(). From
get_swap_device() to put_swap_device(), the preemption is disabled, so
stop_machine() in swapoff() will wait until put_swap_device() is called.
In addition to swap_map, cluster_info, etc. data structure in the struct
swap_info_struct, the swap cache radix tree will be freed after swapoff,
so this patch fixes the race between swap cache looking up and swapoff
too.
Races between some other swap cache usages protected via disabling
preemption and swapoff are fixed too via calling stop_machine() between
clearing PageSwapCache() and freeing swap cache data structure.
Alternative implementation could be replacing disable preemption with
rcu_read_lock_sched and stop_machine() with synchronize_sched().
Link: http://lkml.kernel.org/r/20180213014220.2464-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
uninline overlap_memmap_init()
Link: http://lkml.kernel.org/r/20180726193509.3326-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
memmap_init_zone, is getting complex, because it is called from different
contexts: hotplug, and during boot, and also because it must handle some
architecture quirks. One of them is mirrored memory.
Move the code that decides whether to skip mirrored memory outside of
memmap_init_zone, into a separate function.
Link: http://lkml.kernel.org/r/20180724235520.10200-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
make defer_init non-inline, __meminit
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
v2: add comment about defer_init's lack of locking
Link: http://lkml.kernel.org/r/20180726193509.3326-3-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
update_defer_init() should be called only when struct page is about to be
initialized. Because it counts number of initialized struct pages, but
there we may skip struct pages if there is some mirrored memory.
So move, update_defer_init() after checking for mirrored memory.
Also, rename update_defer_init() to defer_init() and reverse the return
boolean to emphasize that this is a boolean function, that tells that the
reset of memmap initialization should be deferred.
Make this function self-contained: do not pass number of already
initialized pages in this zone by using static counters.
I found this bug by reading the code. The effect is that fewer than
expected struct pages are initialized early in boot, and it is possible
that in some corner cases we may fail to boot when mirrored pages are
used. The deferred on demand code should somewhat mitigate this. But
this still brings some inconsistencies compared to when booting without
mirrored pages, so it is better to fix.
Link: http://lkml.kernel.org/r/20180724235520.10200-3-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
memmap_init is sometimes a macro sometimes a function based on
__HAVE_ARCH_MEMMAP_INIT. It is only a function on ia64. Make memmap_init
a weak function instead, and let ia64 redefine it.
Link: http://lkml.kernel.org/r/20180724235520.10200-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
During the processing of headless pages in z3fold_reclaim_page(), there
was a problem that the zhdr pointed to another page or a page was already
released in z3fold_free(). So, the wrong page is encoded in headless, or
test_bit does not work properly in z3fold_reclaim_page(). This patch
fixed these problems.
Link: http://lkml.kernel.org/r/1530853846-30215-1-git-send-email-ks77sj@gmail.com
Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") optimized the loop in memmap_init_zone(). But there is still
some room for improvement. E.g. in early_pfn_valid(), if pfn and pfn+1
are in the same memblock region, we can record the last returned memblock
region index and check whether pfn++ is still in the same region.
Currently it only improve the performance on arm/arm64 and will have no
impact on other arches.
For the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 27956us to 13537us in my armv8a
server(QDF2400 with 96G memory, pagesize 64k).
Link: http://lkml.kernel.org/r/1530867675-9018-7-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: James Morse <james.morse@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Philip Derrin <philip@cog.systems>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") optimized the loop in memmap_init_zone(). But there is still
some room for improvement. E.g. in early_pfn_valid(), we can record the
last returned memblock region. If current pfn and last pfn are in the
same memory region, we needn't do the unnecessary binary searches because
memblock_is_nomap is the same result for whole memory region.
Link: http://lkml.kernel.org/r/1530867675-9018-6-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: James Morse <james.morse@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Philip Derrin <philip@cog.systems>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
simplify code
Cc: Jia He <jia.he@hxt-semitech.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This helper is to find the memory region index of input pfn.
Link: http://lkml.kernel.org/r/1530867675-9018-5-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: James Morse <james.morse@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Philip Derrin <philip@cog.systems>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
x-fix
fix bogus fix
Cc: Jia He <jia.he@hxt-semitech.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
s/ulong/unsigned long/, make early_region_idx local to memblock_next_valid_pfn()
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jia He <jia.he@hxt-semitech.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Philip Derrin <philip@cog.systems>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") optimized the loop in memmap_init_zone(). But there is still
some room for improvement.
E.g. if pfn and pfn+1 are in the same memblock region, we can simply
pfn++ instead of doing the binary search in memblock_next_valid_pfn.
Furthermore, if the pfn is in a gap of two memory region, skip to next
region directly if possible.
Attached the memblock region information in my server.
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000000200000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x00000017ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000200000-0x000000000021ffff]
[ 0.000000] node 0: [mem 0x0000000000820000-0x000000000307ffff]
[ 0.000000] node 0: [mem 0x0000000003080000-0x000000000308ffff]
[ 0.000000] node 0: [mem 0x0000000003090000-0x00000000031fffff]
[ 0.000000] node 0: [mem 0x0000000003200000-0x00000000033fffff]
[ 0.000000] node 0: [mem 0x0000000003410000-0x00000000034fffff]
[ 0.000000] node 0: [mem 0x0000000003500000-0x000000000351ffff]
[ 0.000000] node 0: [mem 0x0000000003520000-0x000000000353ffff]
[ 0.000000] node 0: [mem 0x0000000003540000-0x0000000003e3ffff]
[ 0.000000] node 0: [mem 0x0000000003e40000-0x0000000003e7ffff]
[ 0.000000] node 0: [mem 0x0000000003e80000-0x0000000003ecffff]
[ 0.000000] node 0: [mem 0x0000000003ed0000-0x0000000003ed5fff]
[ 0.000000] node 0: [mem 0x0000000003ed6000-0x0000000006eeafff]
[ 0.000000] node 0: [mem 0x0000000006eeb000-0x000000000710ffff]
[ 0.000000] node 0: [mem 0x0000000007110000-0x0000000007f0ffff]
[ 0.000000] node 0: [mem 0x0000000007f10000-0x0000000007faffff]
[ 0.000000] node 0: [mem 0x0000000007fb0000-0x000000000806ffff]
[ 0.000000] node 0: [mem 0x0000000008070000-0x00000000080affff]
[ 0.000000] node 0: [mem 0x00000000080b0000-0x000000000832ffff]
[ 0.000000] node 0: [mem 0x0000000008330000-0x000000000836ffff]
[ 0.000000] node 0: [mem 0x0000000008370000-0x000000000838ffff]
[ 0.000000] node 0: [mem 0x0000000008390000-0x00000000083a9fff]
[ 0.000000] node 0: [mem 0x00000000083aa000-0x00000000083bbfff]
[ 0.000000] node 0: [mem 0x00000000083bc000-0x00000000083fffff]
[ 0.000000] node 0: [mem 0x0000000008400000-0x000000000841ffff]
[ 0.000000] node 0: [mem 0x0000000008420000-0x000000000843ffff]
[ 0.000000] node 0: [mem 0x0000000008440000-0x000000000865ffff]
[ 0.000000] node 0: [mem 0x0000000008660000-0x000000000869ffff]
[ 0.000000] node 0: [mem 0x00000000086a0000-0x00000000086affff]
[ 0.000000] node 0: [mem 0x00000000086b0000-0x00000000086effff]
[ 0.000000] node 0: [mem 0x00000000086f0000-0x0000000008b6ffff]
[ 0.000000] node 0: [mem 0x0000000008b70000-0x0000000008bbffff]
[ 0.000000] node 0: [mem 0x0000000008bc0000-0x0000000008edffff]
[ 0.000000] node 0: [mem 0x0000000008ee0000-0x0000000008ee0fff]
[ 0.000000] node 0: [mem 0x0000000008ee1000-0x0000000008ee2fff]
[ 0.000000] node 0: [mem 0x0000000008ee3000-0x000000000decffff]
[ 0.000000] node 0: [mem 0x000000000ded0000-0x000000000defffff]
[ 0.000000] node 0: [mem 0x000000000df00000-0x000000000fffffff]
[ 0.000000] node 0: [mem 0x0000000010800000-0x0000000017feffff]
[ 0.000000] node 0: [mem 0x000000001c000000-0x000000001c00ffff]
[ 0.000000] node 0: [mem 0x000000001c010000-0x000000001c7fffff]
[ 0.000000] node 0: [mem 0x000000001c810000-0x000000007efbffff]
[ 0.000000] node 0: [mem 0x000000007efc0000-0x000000007efdffff]
[ 0.000000] node 0: [mem 0x000000007efe0000-0x000000007efeffff]
[ 0.000000] node 0: [mem 0x000000007eff0000-0x000000007effffff]
[ 0.000000] node 0: [mem 0x000000007f000000-0x00000017ffffffff]
[ 0.000000] Initmem setup node 0 [mem
0x0000000000200000-0x00000017ffffffff]
[ 0.000000] On node 0 totalpages: 25145296
[ 0.000000] DMA32 zone: 16376 pages used for memmap
[ 0.000000] DMA32 zone: 0 pages reserved
[ 0.000000] DMA32 zone: 1028048 pages, LIFO batch:31
[ 0.000000] Normal zone: 376832 pages used for memmap
[ 0.000000] Normal zone: 24117248 pages, LIFO batch:31
Link: http://lkml.kernel.org/r/1530867675-9018-4-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: James Morse <james.morse@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Philip Derrin <philip@cog.systems>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|