summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
* | | | Merge tag 'for_linus-3.7' of ↵Linus Torvalds2012-10-134-5/+50
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB fixes and cleanups from Jason Wessel: "Cleanups - Clean up compile warnings in kgdboc.c and x86/kernel/kgdb.c - Add module event hooks for simplified debugging with gdb Fixes - Fix kdb to stop paging with 'q' on bta and dmesg - Fix for data that scrolls off the vga console due to line wrapping when using the kdb pager New - The debug core registers for kernel module events which allows a kernel aware gdb to automatically load symbols and break on entry to a kernel module - Allow kgdboc=kdb to setup kdb on the vga console" * tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: tty/console: fix warnings in drivers/tty/serial/kgdboc.c kdb,vt_console: Fix missed data due to pager overruns kdb: Fix dmesg/bta scroll to quit with 'q' kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell kgdb,x86: fix warning about unused variable mips,kgdb: fix recursive page fault with CONFIG_KPROBES kgdb: Add module event hooks
| * | | | kdb,vt_console: Fix missed data due to pager overrunsJason Wessel2012-10-121-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to miss data when using the kdb pager. The kdb pager does not pay attention to the maximum column constraint of the screen or serial terminal. This result is not incrementing the shown lines correctly and the pager will print more lines that fit on the screen. Obviously that is less than useful when using a VGA console where you cannot scroll back. The pager will now look at the kdb_buffer string to see how many characters are printed. It might not be perfect considering you can output ASCII that might move the cursor position, but it is a substantially better approximation for viewing dmesg and trace logs. This also means that the vt screen needs to set the kdb COLUMNS variable. Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | kdb: Fix dmesg/bta scroll to quit with 'q'Jason Wessel2012-10-122-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you press 'q' the pager should exit instead of printing everything from dmesg which can really bog down a 9600 baud serial link. The same is true for the bta command. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | kgdb: Add module event hooksJason Wessel2012-10-121-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow gdb to auto load kernel modules when it is attached, which makes it trivially easy to debug module init functions or pre-set breakpoints in a kernel module that has not loaded yet. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2012-10-131-9/+12
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes some late late perf items that missed the first round: tools: - Bash auto completion improvements, now we can auto complete the tools long options, tracepoint event names, etc, from Namhyung Kim. - Look up thread using tid instead of pid in 'perf sched'. - Move global variables into a perf_kvm struct, from David Ahern. - Hists refactorings, preparatory for improved 'diff' command, from Jiri Olsa. - Hists refactorings, preparatory for event group viewieng work, from Namhyung Kim. - Remove double negation on optional feature macro definitions, from Namhyung Kim. - Remove several cases of needless global variables, on most builtins. - misc fixes kernel: - sysfs support for IBS on AMD CPUs, from Robert Richter. - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner HPC blade PMU, from Vince Weaver. - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) perf: Fix perf_cgroup_switch for sw-events perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu perf/AMD/IBS: Add sysfs support perf hists: Add more helpers for hist entry stat perf hists: Move he->stat.nr_events initialization to a template perf hists: Introduce struct he_stat perf diff: Removing the total_period argument from output code perf tool: Add hpp interface to enable/disable hpp column perf tools: Removing hists pair argument from output path perf hists: Separate overhead and baseline columns perf diff: Refactor diff displacement possition info perf hists: Add struct hists pointer to struct hist_entry perf tools: Complete tracepoint event names perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU perf evlist: Remove some unused methods perf evlist: Introduce add_newtp method perf kvm: Move global variables into a perf_kvm struct perf tools: Convert to BACKTRACE_SUPPORT perf tools: Long option completion support for each subcommands perf tools: Complete long option names of perf command ...
| * | | | perf: Fix perf_cgroup_switch for sw-eventsPeter Zijlstra2012-10-051-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported that he could trigger the WARN_ON_ONCE() in perf_cgroup_switch() using sw-events. This is because sw-events share a cpuctx with multiple PMUs. Use the ->unique_pmu pointer to limit the pmu iteration to unique cpuctx instances. Reported-and-Tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmuPeter Zijlstra2012-10-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephane thought the perf_cpu_context::active_pmu name confusing and suggested using 'unique_pmu' instead. This pointer is a pointer to a 'random' pmu sharing the cpuctx instance, therefore limiting a for_each_pmu loop to those where cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx instances. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-132-2/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling
| * | | | | infrastructure for saner ret_from_kernel_thread semanticsAl Viro2012-10-122-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | make sure that kernel_thread() callbacks call do_exit() themselvesAl Viro2012-10-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of them never returned anyway - only two functions had to be changed. That allows to simplify their callers a whole lot. Note that this does *not* apply to kthread_run() callbacks - all of those had been called from the same kernel_thread() callback, which did do_exit() already. This is strictly about very few low-level kernel_thread() callbacks (there are only 6 of those, mostly as part of kthread.h and kmod.h exported mechanisms, plus kernel_init() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-135-99/+199
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull third pile of VFS updates from Al Viro: "Stuff from Jeff Layton, mostly. Sanitizing interplay between audit and namei, removing a lot of insanity from audit_inode() mess and getting things ready for his ESTALE patchset." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it vfs: unexport getname and putname symbols acct: constify the name arg to acct_on vfs: allocate page instead of names_cache buffer in mount_block_root audit: overhaul __audit_inode_child to accomodate retrying audit: optimize audit_compare_dname_path audit: make audit_compare_dname_path use parent_len helper audit: remove dirlen argument to audit_compare_dname_path audit: set the name_len in audit_inode for parent lookups audit: add a new "type" field to audit_names struct audit: reverse arguments to audit_inode_child audit: no need to walk list in audit_inode if name is NULL audit: pass in dentry to audit_copy_inode wherever possible audit: remove unnecessary NULL ptr checks from do_path_lookup
| * | | | | | audit: make audit_inode take struct filenameJeff Layton2012-10-121-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep a pointer to the audit_names "slot" in struct filename. Have all of the audit_inode callers pass a struct filename ponter to audit_inode instead of a string pointer. If the aname field is already populated, then we can skip walking the list altogether and just use it directly. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: make path_openat take a struct filename pointerJeff Layton2012-10-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: allow audit code to satisfy getname requests from its names_listJeff Layton2012-10-121-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if we call getname() on a userland string more than once, we'll get multiple copies of the string and multiple audit_names records. Add a function that will allow the audit_names code to satisfy getname requests using info from the audit_names list, avoiding a new allocation and audit_names records. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | vfs: define struct filename and have getname() return itJeff Layton2012-10-122-31/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | acct: constify the name arg to acct_onJeff Layton2012-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: overhaul __audit_inode_child to accomodate retryingJeff Layton2012-10-121-27/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to accomodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: optimize audit_compare_dname_pathJeff Layton2012-10-124-12/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the cases where we already know the length of the parent, pass it as a parm so we don't need to recompute it. In the cases where we don't know the length, pass in AUDIT_NAME_FULL (-1) to indicate that it should be determined. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: make audit_compare_dname_path use parent_len helperEric Paris2012-10-121-20/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: remove dirlen argument to audit_compare_dname_pathJeff Layton2012-10-124-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the callers set this to NULL now. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: set the name_len in audit_inode for parent lookupsJeff Layton2012-10-123-12/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: add a new "type" field to audit_names structJeff Layton2012-10-121-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For now, we just have two possibilities: UNKNOWN: for a new audit_names record that we don't know anything about yet NORMAL: for everything else In later patches, we'll add other types so we can distinguish and update records created under different circumstances. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: reverse arguments to audit_inode_childJeff Layton2012-10-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: no need to walk list in audit_inode if name is NULLJeff Layton2012-10-121-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If name is NULL then the condition in the loop will never be true. Also, with this change, we can eliminate the check for n->name == NULL since the equivalence check will never be true if it is. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | audit: pass in dentry to audit_copy_inode wherever possibleJeff Layton2012-10-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases, we were passing in NULL even when we have a dentry. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2012-10-126-164/+119
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core update from Thomas Gleixner: - Bug fixes (one for a longstanding dead loop issue) - Rework of time related vsyscalls - Alarm timer updates - Jiffies updates to remove compile time dependencies * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Cast raw_interval to u64 to avoid shift overflow timers: Fix endless looping between cascade() and internal_add_timer() time/jiffies: bring back unconditional LATCH definition time: Convert x86_64 to using new update_vsyscall time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems time: Introduce new GENERIC_TIME_VSYSCALL time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Move update_vsyscall definitions to timekeeper_internal.h time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes jiffies: Remove compile time assumptions about CLOCK_TICK_RATE jiffies: Kill unused TICK_USEC_TO_NSEC alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue alarmtimer: Remove unused helpers & defines alarmtimer: Use hrtimer per-alarm instead of per-base alarmtimer: Implement minimum alarm interval for allowing suspend
| * | | | | | | timekeeping: Cast raw_interval to u64 to avoid shift overflowDan Carpenter2012-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We fixed a bunch of integer overflows in timekeeping code during the 3.6 cycle. I did an audit based on that and found this potential overflow. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20121009071823.GA19159@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
| * | | | | | | timers: Fix endless looping between cascade() and internal_add_timer()Hildner, Christian2012-10-091-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding two (or more) timers with large values for "expires" (they have to reside within tv5 in the same list) leads to endless looping between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL is one and jiffies are crossing the value 1 << 18. The bug was introduced between 2.6.11 and 2.6.12 (and survived for quite some time). This patch ensures that when cascade() is called timers within tv5 are not added endlessly to their own list again, instead they are added to the next lower tv level tv4 (as expected). Signed-off-by: Christian Hildner <christian.hildner@siemens.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
| * | | | | | | Merge branch 'fortglx/3.7/time' of git://git.linaro.org/people/jstultz/linux ↵Thomas Gleixner2012-10-0978-2670/+3421
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | into timers/core
| | * | | | | | | time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systemsJohn Stultz2012-09-241-14/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only do rounding to the next nanosecond so we don't see minor 1ns inconsistencies in the vsyscall implementations. Since we're changing the vsyscall implementations to avoid this, conditionalize the rounding only to the GENERIC_TIME_VSYSCALL_OLD architectures. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | time: Introduce new GENERIC_TIME_VSYSCALLJohn Stultz2012-09-242-13/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we moved everyone over to GENERIC_TIME_VSYSCALL_OLD, introduce the new declaration and config option for the new update_vsyscall method. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLDJohn Stultz2012-09-242-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To help migrate archtectures over to the new update_vsyscall method, redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | time: Move update_vsyscall definitions to timekeeper_internal.hJohn Stultz2012-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since users will need to include timekeeper_internal.h, move update_vsyscall definitions to timekeeper_internal.h. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | time: Move timekeeper structure to timekeeper_internal.h for vsyscall changesJohn Stultz2012-09-241-55/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're going to need to access the timekeeper in update_vsyscall, so make the structure available for those who need it. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | jiffies: Remove compile time assumptions about CLOCK_TICK_RATEJohn Stultz2012-09-241-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CLOCK_TICK_RATE is used to accurately caclulate exactly how a tick will be at a given HZ. This is useful, because while we'd expect NSEC_PER_SEC/HZ, the underlying hardware will have some granularity limit, so we won't be able to have exactly HZ ticks per second. This slight error can cause timekeeping quality problems when using the jiffies or other jiffies driven clocksources. Thus we currently use compile time CLOCK_TICK_RATE value to generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use to adjust the jiffies clocksource to correct this error. Unfortunately though, since CLOCK_TICK_RATE is a compile time value, and the jiffies clocksource is registered very early during boot, there are a number of cases where there are different possible hardware timers that have different tick rates. This causes problems in cases like ARM where there are numerous different types of hardware, each having their own compile-time CLOCK_TICK_RATE, making it hard to accurately support different hardware with a single kernel. For the most part, this doesn't matter all that much, as not too many systems actually utilize the jiffies or jiffies driven clocksource. Usually there are other highres clocksources who's granularity error is negligable. Even so, we have some complicated calcualtions that we do everywhere to handle these edge cases. This patch removes the compile time SHIFTED_HZ value, and introduces a register_refined_jiffies() function. This results in the default jiffies clock as being assumed a perfect HZ freq, and allows archtectures that care about jiffies accuracy to call register_refined_jiffies() with the tick rate, specified dynamically at boot. This allows us, where necessary, to not have a compile time CLOCK_TICK_RATE constant, simplifies the jiffies code, and still provides a way to have an accurate jiffies clock. NOTE: Since this patch does not add register_refinied_jiffies() calls for every arch, it may cause time quality regressions in some cases. Its likely these will not be noticable, but if they are an issue, adding the following to the end of setup_arch() should resolve the regression: register_refinied_jiffies(CLOCK_TICK_RATE) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeueJohn Stultz2012-09-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that alarmtimer_remove has been simplified, change its name to _dequeue to better match its paired _enqueue function. Cc: Arve Hjønnevåg <arve@android.com> Cc: Colin Cross <ccross@android.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | alarmtimer: Use hrtimer per-alarm instead of per-baseJohn Stultz2012-09-241-67/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arve Hjønnevåg reported numerous crashes from the "BUG_ON(timer->state != HRTIMER_STATE_CALLBACK)" check in __run_hrtimer after it called alarmtimer_fired. It ends up the alarmtimer code was not properly handling possible failures of hrtimer_try_to_cancel, and because these faulres occur when the underlying base hrtimer is being run, this limits the ability to properly handle modifications to any alarmtimers on that base. Because much of the logic duplicates the hrtimer logic, it seems that we might as well have a per-alarmtimer hrtimer, and avoid the extra complextity of trying to multiplex many alarmtimers off of one hrtimer. Thus this patch moves the hrtimer to the alarm structure and simplifies the management logic. Changelog: v2: * Includes a fix for double alarm_start calls found by Arve Cc: Arve Hjønnevåg <arve@android.com> Cc: Colin Cross <ccross@android.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Arve Hjønnevåg <arve@android.com> Tested-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | | alarmtimer: Implement minimum alarm interval for allowing suspendTodd Poynor2012-09-241-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | alarmtimer suspend return -EBUSY if the next alarm will fire in less than 2 seconds. This allows one RTC seconds tick to occur subsequent to this check before the alarm wakeup time is set, ensuring the wakeup time is still in the future (assuming the RTC does not tick one more second prior to setting the alarm). If suspend is rejected due to an imminent alarm, hold a wakeup source for 2 seconds to process the alarm prior to reattempting suspend. If setting the alarm incurs an -ETIME for an alarm set in the past, or any other problem setting the alarm, abort suspend and hold a wakelock for 1 second while the alarm is allowed to be serviced or other hopefully transient conditions preventing the alarm clear up. Signed-off-by: Todd Poynor <toddpoynor@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
* | | | | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2012-10-122-1/+70
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A CPU hotplug related crash fix and a nohz accounting fixlet." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Update sched_domains_numa_masks[][] when new cpus are onlined sched: Ensure 'sched_domains_numa_levels' is safe to use in other functions nohz: Fix one jiffy count too far in idle cputime
| * | | | | | | | | sched: Update sched_domains_numa_masks[][] when new cpus are onlinedTang Chen2012-10-051-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once array sched_domains_numa_masks[] []is defined, it is never updated. When a new cpu on a new node is onlined, the coincident member in sched_domains_numa_masks[][] is not initialized, and all the masks are 0. As a result, the build_overlap_sched_groups() will initialize a NULL sched_group for the new cpu on the new node, which will lead to kernel panic: [ 3189.403280] Call Trace: [ 3189.403286] [<ffffffff8106c36f>] warn_slowpath_common+0x7f/0xc0 [ 3189.403289] [<ffffffff8106c3ca>] warn_slowpath_null+0x1a/0x20 [ 3189.403292] [<ffffffff810b1d57>] build_sched_domains+0x467/0x470 [ 3189.403296] [<ffffffff810b2067>] partition_sched_domains+0x307/0x510 [ 3189.403299] [<ffffffff810b1ea2>] ? partition_sched_domains+0x142/0x510 [ 3189.403305] [<ffffffff810fcc93>] cpuset_update_active_cpus+0x83/0x90 [ 3189.403308] [<ffffffff810b22a8>] cpuset_cpu_active+0x38/0x70 [ 3189.403316] [<ffffffff81674b87>] notifier_call_chain+0x67/0x150 [ 3189.403320] [<ffffffff81664647>] ? native_cpu_up+0x18a/0x1b5 [ 3189.403328] [<ffffffff810a044e>] __raw_notifier_call_chain+0xe/0x10 [ 3189.403333] [<ffffffff81070470>] __cpu_notify+0x20/0x40 [ 3189.403337] [<ffffffff8166663e>] _cpu_up+0xe9/0x131 [ 3189.403340] [<ffffffff81666761>] cpu_up+0xdb/0xee [ 3189.403348] [<ffffffff8165667c>] store_online+0x9c/0xd0 [ 3189.403355] [<ffffffff81437640>] dev_attr_store+0x20/0x30 [ 3189.403361] [<ffffffff8124aa63>] sysfs_write_file+0xa3/0x100 [ 3189.403368] [<ffffffff811ccbe0>] vfs_write+0xd0/0x1a0 [ 3189.403371] [<ffffffff811ccdb4>] sys_write+0x54/0xa0 [ 3189.403375] [<ffffffff81679c69>] system_call_fastpath+0x16/0x1b [ 3189.403377] ---[ end trace 1e6cf85d0859c941 ]--- [ 3189.403398] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 This patch registers a new notifier for cpu hotplug notify chain, and updates sched_domains_numa_masks every time a new cpu is onlined or offlined. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> [ fixed compile warning ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1348578751-16904-3-git-send-email-tangchen@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | | sched: Ensure 'sched_domains_numa_levels' is safe to use in other functionsTang Chen2012-10-051-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should temporarily reset 'sched_domains_numa_levels' to 0 after it is reset to 'level' in sched_init_numa(). If it fails to allocate memory for array sched_domains_numa_masks[][], the array will contain less then 'level' members. This could be dangerous when we use it to iterate array sched_domains_numa_masks[][] in other functions. This patch set sched_domains_numa_levels to 0 before initializing array sched_domains_numa_masks[][], and reset it to 'level' when sched_domains_numa_masks[][] is fully initialized. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1348578751-16904-2-git-send-email-tangchen@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | | nohz: Fix one jiffy count too far in idle cputimeFrederic Weisbecker2012-10-051-1/+1
| | |_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we stop the tick in idle, we save the current jiffies value in ts->idle_jiffies. This snapshot is substracted from the later value of jiffies when the tick is restarted and the resulting delta is accounted as idle cputime. This is how we handle the idle cputime accounting without the tick. But sometimes we need to schedule the next tick to some time in the future instead of completely stopping it. In this case, a tick may happen before we restart the periodic behaviour and from that tick we account one jiffy to idle cputime as usual but we also increment the ts->idle_jiffies snapshot by one so that when we compute the delta to account, we substract the one jiffy we just accounted. To prepare for stopping the tick outside idle, we introduced a check that prevents from fixing up that ts->idle_jiffies if we are not running the idle task. But we use idle_cpu() for that and this is a problem if we run the tick while another CPU remotely enqueues a ttwu to our runqueue: CPU 0: CPU 1: tick_sched_timer() { ttwu_queue_remote() if (idle_cpu(CPU 0)) ts->idle_jiffies++; } Here, idle_cpu() notes that &rq->wake_list is not empty and hence won't consider the CPU as idle. As a result, ts->idle_jiffies won't be incremented. But this is wrong because we actually account the current jiffy to idle cputime. And that jiffy won't get substracted from the nohz time delta. So in the end, this jiffy is accounted twice. Fix this by changing idle_cpu(smp_processor_id()) with is_idle_task(current). This way the jiffy is substracted correctly even if a ttwu operation is enqueued on the CPU. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 3.5+ Link: http://lkml.kernel.org/r/1349308004-3482-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2012-10-122-11/+16
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fixes from Ingo Molnar: "This tree includes a shutdown/cpu-hotplug deadlock fix and a documentation fix." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Advise most users not to enable RCU user mode rcu: Grace-period initialization excludes only RCU notifier
| * | | | | | | | | rcu: Grace-period initialization excludes only RCU notifierPaul E. McKenney2012-10-082-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kirill noted the following deadlock cycle on shutdown involving padata: > With commit 755609a9087fa983f567dc5452b2fa7b089b591f I've got deadlock on > poweroff. > > It guess it happens because of race for cpu_hotplug.lock: > > CPU A CPU B > disable_nonboot_cpus() > _cpu_down() > cpu_hotplug_begin() > mutex_lock(&cpu_hotplug.lock); > __cpu_notify() > padata_cpu_callback() > __padata_remove_cpu() > padata_replace() > synchronize_rcu() > rcu_gp_kthread() > get_online_cpus(); > mutex_lock(&cpu_hotplug.lock); It would of course be good to eliminate grace-period delays from CPU-hotplug notifiers, but that is a separate issue. Deadlock is not an appropriate diagnostic for excessive CPU-hotplug latency. Fortunately, grace-period initialization does not actually need to exclude all of the CPU-hotplug operation, but rather only RCU's own CPU_UP_PREPARE and CPU_DEAD CPU-hotplug notifiers. This commit therefore introduces a new per-rcu_state onoff_mutex that provides the required concurrency control in place of the get_online_cpus() that was previously in rcu_gp_init(). Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
* | | | | | | | | | Merge tag 'pinctrl-for-3.7-late' of ↵Linus Torvalds2012-10-121-5/+28
|\ \ \ \ \ \ \ \ \ \ | |_|_|_|_|/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull second set of pinctrl patches from Linus Walleij: "Here is a late pinctrl pull request with stuff that wasn't quite tested at the first pull request. The main reason to not hold off is that the modifications to irq_domain_add_simple() as reviewed by Rob Herring introduce new infrastructure for irqdomains that will be useful for the next cycle: instead of sprinkling irq descriptor allocation all over the kernel wherever a "legacy" domain is registered, which is necessary for any platform using sparse IRQs, and many irq chips are say GPIO controllers which may be used with several systems, some with sparse IRQs some not, we push this into the irq_domain_add_simple() so we can atleast do mistakes in one place. The irq_domain_add_simple() is currently unused in the kernel, so I need to provide a user. The Nomadik stuff that goes with are changes to the driver I use day-to-day to make use of this facility (and a dependency), so see it as a way to eat my own dogfood: if this blows up the egg hits my face. A second round of pinctrl patches for v3.7: - Complement the Nomadik pinctrl driver with alternate Cx functions so it handles all oddities. - A patch to the IRQdomain to reform the simple irqdomain to handle IRQ descriptor allocation dynamically. - Use the above feature in the Nomadik pin controller." * tag 'pinctrl-for-3.7-late' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl/nomadik: use simple or linear IRQ domain irqdomain: augment add_simple() to allocate descs pinctrl/nomadik: support other alternate-C functions
| * | | | | | | | | irqdomain: augment add_simple() to allocate descsLinus Walleij2012-10-101-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we rely on all IRQ chip instances to dynamically allocate their IRQ descriptors unless they use the linear IRQ domain. So for irqdomain_add_legacy() and irqdomain_add_simple() the caller need to make sure that descriptors are allocated. Let's slightly augment the yet unused irqdomain_add_simple() to also allocate descriptors as a means to simplify usage and avoid code duplication throughout the kernel. We warn if descriptors cannot be allocated, e.g. if a platform has the bad habit of hogging descriptors at boot time. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Lee Jones <lee.jones@linaro.org> Reviewed-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-121-0/+2
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull pile 2 of vfs updates from Al Viro: "Stuff in this one - assorted fixes, lglock tidy-up, death to lock_super(). There'll be a VFS pile tomorrow (with patches from Jeff Layton, sanitizing getname() and related parts of audit and preparing for ESTALE fixes), but I'd rather push the stuff in this one ASAP - some of the bugs closed here are quite unpleasant." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: bogus warnings in fs/namei.c consitify do_mount() arguments lglock: add DEFINE_STATIC_LGLOCK() lglock: make the per_cpu locks static lglock: remove unused DEFINE_LGLOCK_LOCKDEP() MAX_LFS_FILESIZE definition for 64bit needs LL... tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking vfs: drop lock/unlock super ufs: drop lock/unlock super sysv: drop lock/unlock super hpfs: drop lock/unlock super fat: drop lock/unlock super ext3: drop lock/unlock super exofs: drop lock/unlock super dup3: Return an error when oldfd == newfd. fs: handle failed audit_log_start properly fs: prevent use after free in auditing when symlink following was denied
| * | | | | | | | | | fs: handle failed audit_log_start properlySasha Levin2012-10-091-0/+2
| |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | audit_log_start() may return NULL, this is unchecked by the caller in audit_log_link_denied() and could cause a NULL ptr deref. Introduced by commit a51d9eaa ("fs: add link restriction audit reporting"). Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-121-1/+1
|\ \ \ \ \ \ \ \ \ \ | | |_|_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull pile 2 of execve and kernel_thread unification work from Al Viro: "Stuff in there: kernel_thread/kernel_execve/sys_execve conversions for several more architectures plus assorted signal fixes and cleanups. There'll be more (in particular, real fixes for the alpha do_notify_resume() irq mess)..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (43 commits) alpha: don't open-code trace_report_syscall_{enter,exit} Uninclude linux/freezer.h m32r: trim masks avr32: trim masks tile: don't bother with SIGTRAP in setup_frame microblaze: don't bother with SIGTRAP in setup_rt_frame() mn10300: don't bother with SIGTRAP in setup_frame() frv: no need to raise SIGTRAP in setup_frame() x86: get rid of duplicate code in case of CONFIG_VM86 unicore32: remove pointless test h8300: trim _TIF_WORK_MASK parisc: decide whether to go to slow path (tracesys) based on thread flags parisc: don't bother looping in do_signal() parisc: fix double restarts bury the rest of TIF_IRET sanitize tsk_is_polling() bury _TIF_RESTORE_SIGMASK unicore32: unobfuscate _TIF_WORK_MASK mips: NOTIFY_RESUME is not needed in TIF masks mips: merge the identical "return from syscall" per-ABI code ... Conflicts: arch/arm/include/asm/thread_info.h
| * | | | | | | | | sanitize tsk_is_polling()Al Viro2012-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make default just return 0. The current default (checking TIF_POLLING_NRFLAG) is taken to architectures that need it; ones that don't do polling in their idle threads don't need to defined TIF_POLLING_NRFLAG at all. ia64 defined both TS_POLLING (used by its tsk_is_polling()) and TIF_POLLING_NRFLAG (not used at all). Killed the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>