summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* kernel: delete __cpuinit usage from all core kernel filesPaul Gortmaker2013-07-1415-33/+33
| | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* rcu: delete __cpuinit usage from all rcu filesPaul Gortmaker2013-07-144-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the drivers/rcu uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-07-141-10/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs stuff from Al Viro: "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including making simple_lookup() usable for filesystems with non-NULL s_d_op, which allows us to get rid of quite a bit of ugliness" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sunrpc: now we can just set ->s_d_op cgroup: we can use simple_lookup() now efivarfs: we can use simple_lookup() now make simple_lookup() usable for filesystems that set ->s_d_op configfs: don't open-code d_alloc_name() __rpc_lookup_create_exclusive: pass string instead of qstr rpc_create_*_dir: don't bother with qstr llist: llist_add() can use llist_add_batch() llist: fix/simplify llist_add() and llist_add_batch() fput: turn "list_head delayed_fput_list" into llist_head fs/file_table.c:fput(): add comment Safer ABI for O_TMPFILE
| * cgroup: we can use simple_lookup() nowAl Viro2013-07-141-10/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2013-07-131-9/+11
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Fix a potential deadlock versus hrtimers" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix HRTICK
| * | sched: Fix HRTICKPeter Zijlstra2013-07-121-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David reported that the HRTICK sched feature was borken; which was enough motivation for me to finally fix it ;-) We should not allow hrtimer code to do softirq wakeups while holding scheduler locks. The hrtimer code only needs this when we accidentally try to program an expired time. We don't much care about those anyway since we have the regular tick to fall back to. Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2013-07-131-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: - core fix for missing round up in the generic irq chip implementation - new irq chip for MOXA SoCs - a few fixes and cleanups in the irqchip drivers * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Add support for MOXA ART SoCs genirq: generic chip: Use DIV_ROUND_UP to calculate numchips irqchip: nvic: Fix wrong num_ct argument for irq_alloc_domain_generic_chips() irqchip: sun4i: Staticize sun4i_irq_ack() irqchip: vt8500: Staticize local symbols
| * | | genirq: generic chip: Use DIV_ROUND_UP to calculate numchipsAxel Lin2013-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The number of interrupts in a domain may be not divisible by the number of interrupts each chip handles. Integer division may truncate the result, thus use DIV_ROUND_UP to count numchips. Seems all users of irq_alloc_domain_generic_chips() in current code do not have this issue. I just found the issue while reading the code. Signed-off-by: Axel Lin <axel.lin@ingics.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Tony Lindgren <tony@atomide.com> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/1373015592.18252.2.camel@phoenix Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2013-07-134-62/+75
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - watchdog fixes for full dynticks - improved debug output for full dynticks - remove an obsolete full dynticks check - two ARM SoC clocksource drivers for sharing across SoCs - tick broadcast fix for CPU hotplug * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: broadcast: Check broadcast mode on CPU hotplug clocksource: arm_global_timer: Add ARM global timer support clocksource: Add Marvell Orion SoC timer nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs watchdog: Boot-disable by default on full dynticks watchdog: Rename confusing state variable watchdog: Register / unregister watchdog kthreads on sysctl control nohz: Warn if the machine can not perform nohz_full
| * | | | tick: broadcast: Check broadcast mode on CPU hotplugStephen Boyd2013-07-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ARM systems the dummy clockevent is registered with the cpu hotplug notifier chain before any other per-cpu clockevent. This has the side-effect of causing the dummy clockevent to be registered first in every hotplug sequence. Because the dummy is first, we'll try to turn the broadcast source on but the code in tick_device_uses_broadcast() assumes the broadcast source is in periodic mode and calls tick_broadcast_start_periodic() unconditionally. On boot this isn't a problem because we typically haven't switched into oneshot mode yet (if at all). During hotplug, if the broadcast source isn't in periodic mode we'll replace the broadcast oneshot handler with the broadcast periodic handler and start emulating oneshot mode when we shouldn't. Due to the way the broadcast oneshot handler programs the next_event it's possible for it to contain KTIME_MAX and cause us to hang the system when the periodic handler tries to program the next tick. Fix this by using the appropriate function to start the broadcast source. Reported-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joseph Lo <josephl@nvidia.com> Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge branch 'linus' into timers/urgentThomas Gleixner2013-07-1262-2553/+3927
| |\ \ \ \ | | | |/ / | | |/| | | | | | | | | | | | | | | | | Get upstream changes so we can apply fixes against them Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge branch 'timers/core' of ↵Ingo Molnar2013-07-103-61/+71
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent Pull nohz updates/fixes from Frederic Weisbecker: ' Note that "watchdog: Boot-disable by default on full dynticks" is a temporary solution to solve the issue with the watchdog that prevents the tick from stopping. This is to make sure that 3.11 doesn't have that problem as several people complained about it. A proper and longer term solution has been proposed by Peterz: http://lkml.kernel.org/r/20130618103632.GO3204@twins.programming.kicks-ass.net ' Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbsFrederic Weisbecker2013-06-201-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building full dynticks now implies that all CPUs are forced into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL. The dynamic check has become useless. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de>
| | * | | | watchdog: Boot-disable by default on full dynticksFrederic Weisbecker2013-06-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the watchdog runs, it prevents the full dynticks CPUs from stopping their tick because the hard lockup detector uses perf events internally, which in turn rely on the periodic tick. Since this is a rather confusing behaviour that is not easy to track down and identify for those who want to test CONFIG_NO_HZ_FULL, let's default disable the watchdog on boot time when full dynticks is enabled. The user can still enable it later on runtime using proc or sysctl. Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com>
| | * | | | watchdog: Rename confusing state variableFrederic Weisbecker2013-06-202-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have two very conflicting state variable names in the watchdog: * watchdog_enabled: This one reflects the user interface. It's set to 1 by default and can be overriden with boot options or sysctl/procfs interface. * watchdog_disabled: This is the internal toggle state that tells if watchdog threads, timers and NMI events are currently running or not. This state mostly depends on the user settings. It's a convenient state latch. Now we really need to find clearer names because those are just too confusing to encourage deep review. watchdog_enabled now becomes watchdog_user_enabled to reflect its purpose as an interface. watchdog_disabled becomes watchdog_running to suggest its role as a pure internal state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Don Zickus <dzickus@redhat.com>
| | * | | | watchdog: Register / unregister watchdog kthreads on sysctl controlFrederic Weisbecker2013-06-201-40/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The user activation/deactivation of the watchdog through boot parameters or systcl is currently implemented with a dance involving kthreads parking and unparking methods: the threads are unconditionally registered on boot and they park as soon as the user want the watchdog to be disabled. This method involves a few noisy details to handle though: the watchdog kthreads may be unparked anytime due to hotplug operations, after which the watchdog internals have to decide to park again if it is user-disabled. As a result the setup() and unpark() methods need to be able to request a reparking. This is not currently supported in the kthread infrastructure so this piece of the watchdog code only works halfway. Besides, unparking/reparking the watchdog kthreads consume unnecessary cputime on hotplug operations when those could be simply ignored in the first place. As suggested by Srivatsa, let's instead only register the watchdog threads when they are needed. This way we don't need to think about hotplug operations and we don't burden the CPU onlining when the watchdog is simply disabled. Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Don Zickus <dzickus@redhat.com>
| | * | | | nohz: Warn if the machine can not perform nohz_fullSteven Rostedt2013-06-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user configures NO_HZ_FULL and defines nohz_full=XXX on the kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails due to the machine having a unstable clock, warn about it. We do not want users thinking that they are getting the benefit of nohz when their machine can not support it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2013-07-131-3/+25
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: - fix for do_div() abuse on x86 - locking fix in perf core - a pile of (build) fixes and cleanups in perf tools * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf/x86: Fix incorrect use of do_div() in NMI warning perf: Fix perf_lock_task_context() vs RCU perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario perf: Clone child context from parent context pmu perf script: Fix broken include in Context.xs perf tools: Fix -ldw/-lelf link test when static linking perf tools: Revert regression in configuration of Python support perf tools: Fix perf version generation perf stat: Fix per-socket output bug for uncore events perf symbols: Fix vdso list searching perf evsel: Fix missing increment in sample parsing perf tools: Update symbol_conf.nr_events when processing attribute events perf tools: Fix new_term() missing free on error path perf tools: Fix parse_events_terms() segfault on error path perf evsel: Fix count parameter to read call in event_format__new perf tools: fix a typo of a Power7 event name perf tools: Fix -x/--exclude-other option for report command perf evlist: Enhance perf_evlist__start_workload() perf record: Remove -f/--force option perf record: Remove -A/--append option ...
| * | | | | | perf: Fix perf_lock_task_context() vs RCUPeter Zijlstra2013-07-121-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri managed to trigger this warning: [] ====================================================== [] [ INFO: possible circular locking dependency detected ] [] 3.10.0+ #228 Tainted: G W [] ------------------------------------------------------- [] p/6613 is trying to acquire lock: [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250 [] [] but task is already holding lock: [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0 [] [] which lock already depends on the new lock. [] [] the existing dependency chain (in reverse order) is: [] [] -> #4 (&ctx->lock){-.-...}: [] -> #3 (&rq->lock){-.-.-.}: [] -> #2 (&p->pi_lock){-.-.-.}: [] -> #1 (&rnp->nocb_gp_wq[1]){......}: [] -> #0 (rcu_node_0){..-...}: Paul was quick to explain that due to preemptible RCU we cannot call rcu_read_unlock() while holding scheduler (or nested) locks when part of the read side critical section was preemptible. Therefore solve it by making the entire RCU read side non-preemptible. Also pull out the retry from under the non-preempt to play nice with RT. Reported-by: Jiri Olsa <jolsa@redhat.com> Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenarioJiri Olsa2013-07-121-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The '!ctx->is_active' check has a valid scenario, so there's no need for the warning. The reason is that there's a time window between the 'ctx->is_active' check in the perf_event_enable() function and the __perf_event_enable() function having: - IRQs on - ctx->lock unlocked where the task could be killed and 'ctx' deactivated by perf_event_exit_task(), ending up with the warning below. So remove the WARN_ON_ONCE() check and add comments to explain it all. This addresses the following warning reported by Vince Weaver: [ 324.983534] ------------[ cut here ]------------ [ 324.984420] WARNING: at kernel/events/core.c:1953 __perf_event_enable+0x187/0x190() [ 324.984420] Modules linked in: [ 324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246 [ 324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ 324.984420] 0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00 [ 324.984420] ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860 [ 324.984420] 0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a [ 324.984420] Call Trace: [ 324.984420] <IRQ> [<ffffffff8160ea0b>] dump_stack+0x19/0x1b [ 324.984420] [<ffffffff81080ff0>] warn_slowpath_common+0x70/0xa0 [ 324.984420] [<ffffffff8108103a>] warn_slowpath_null+0x1a/0x20 [ 324.984420] [<ffffffff81134437>] __perf_event_enable+0x187/0x190 [ 324.984420] [<ffffffff81130030>] remote_function+0x40/0x50 [ 324.984420] [<ffffffff810e51de>] generic_smp_call_function_single_interrupt+0xbe/0x130 [ 324.984420] [<ffffffff81066a47>] smp_call_function_single_interrupt+0x27/0x40 [ 324.984420] [<ffffffff8161fd2f>] call_function_single_interrupt+0x6f/0x80 [ 324.984420] <EOI> [<ffffffff816161a1>] ? _raw_spin_unlock_irqrestore+0x41/0x70 [ 324.984420] [<ffffffff8113799d>] perf_event_exit_task+0x14d/0x210 [ 324.984420] [<ffffffff810acd04>] ? switch_task_namespaces+0x24/0x60 [ 324.984420] [<ffffffff81086946>] do_exit+0x2b6/0xa40 [ 324.984420] [<ffffffff8161615c>] ? _raw_spin_unlock_irq+0x2c/0x30 [ 324.984420] [<ffffffff81087279>] do_group_exit+0x49/0xc0 [ 324.984420] [<ffffffff81096854>] get_signal_to_deliver+0x254/0x620 [ 324.984420] [<ffffffff81043057>] do_signal+0x57/0x5a0 [ 324.984420] [<ffffffff8161a164>] ? __do_page_fault+0x2a4/0x4e0 [ 324.984420] [<ffffffff8161665c>] ? retint_restore_args+0xe/0xe [ 324.984420] [<ffffffff816166cd>] ? retint_signal+0x11/0x84 [ 324.984420] [<ffffffff81043605>] do_notify_resume+0x65/0x80 [ 324.984420] [<ffffffff81616702>] retint_signal+0x46/0x84 [ 324.984420] ---[ end trace 442ec2f04db3771a ]--- Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1373384651-6109-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | perf: Clone child context from parent context pmuJiri Olsa2013-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when the child context for inherited events is created, it's based on the pmu object of the first event of the parent context. This is wrong for the following scenario: - HW context having HW and SW event - HW event got removed (closed) - SW event stays in HW context as the only event and its pmu is used to clone the child context The issue starts when the cpu context object is touched based on the pmu context object (__get_cpu_context). In this case the HW context will work with SW cpu context ending up with following WARN below. Fixing this by using parent context pmu object to clone from child context. Addresses the following warning reported by Vince Weaver: [ 2716.472065] ------------[ cut here ]------------ [ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x) [ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn [ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2 [ 2716.476035] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2 [ 2716.476035] 0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18 [ 2716.476035] ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad [ 2716.476035] ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550 [ 2716.476035] Call Trace: [ 2716.476035] [<ffffffff8102e215>] ? warn_slowpath_common+0x5b/0x70 [ 2716.476035] [<ffffffff810ab2bd>] ? task_ctx_sched_out+0x3c/0x5f [ 2716.476035] [<ffffffff810af02a>] ? perf_event_exit_task+0xbf/0x194 [ 2716.476035] [<ffffffff81032a37>] ? do_exit+0x3e7/0x90c [ 2716.476035] [<ffffffff810cd5ab>] ? __do_fault+0x359/0x394 [ 2716.476035] [<ffffffff81032fe6>] ? do_group_exit+0x66/0x98 [ 2716.476035] [<ffffffff8103dbcd>] ? get_signal_to_deliver+0x479/0x4ad [ 2716.476035] [<ffffffff810ac05c>] ? __perf_event_task_sched_out+0x230/0x2d1 [ 2716.476035] [<ffffffff8100205d>] ? do_signal+0x3c/0x432 [ 2716.476035] [<ffffffff810abbf9>] ? ctx_sched_in+0x43/0x141 [ 2716.476035] [<ffffffff810ac2ca>] ? perf_event_context_sched_in+0x7a/0x90 [ 2716.476035] [<ffffffff810ac311>] ? __perf_event_task_sched_in+0x31/0x118 [ 2716.476035] [<ffffffff81050dd9>] ? mmdrop+0xd/0x1c [ 2716.476035] [<ffffffff81051a39>] ? finish_task_switch+0x7d/0xa6 [ 2716.476035] [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d [ 2716.476035] [<ffffffff813654f5>] ? retint_signal+0x3d/0x78 [ 2716.476035] ---[ end trace 827178d8a5966c3d ]--- Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1373384651-6109-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2013-07-131-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Thomas Gleixner: "Header cleanup as requested by Linus" (This is the "don't include support for ww_mutex in a header file that everybody wants, when almost nobody wants the ww part" change) * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Move ww_mutex definitions to ww_mutex.h
| * | | | | | | mutex: Move ww_mutex definitions to ww_mutex.hMaarten Lankhorst2013-07-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the definitions for wound/wait mutexes out to a separate header, ww_mutex.h. This reduces clutter in mutex.h, and increases readability. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Dave Airlie <airlied@gmail.com> Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com [ Tidied up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2013-07-131-12/+0
|\ \ \ \ \ \ \ \ | |_|_|_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MIPS updates from Ralf Baechle: "MIPS updates: - All the things that didn't make 3.10. - Removes the Windriver PPMC platform. Nobody will miss it. - Remove a workaround from kernel/irq/irqdomain.c which was there exclusivly for MIPS. Patch by Grant Likely. - More small improvments for the SEAD 3 platform - Improvments on the BMIPS / SMP support for the BCM63xx series. - Various cleanups of dead leftovers. - Platform support for the Cavium Octeon-based EdgeRouter Lite. Two large KVM patchsets didn't make it for this pull request because their respective authors are vacationing" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits) MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions MIPS: SEAD3: Disable L2 cache on SEAD-3. MIPS: BCM63xx: Enable second core SMP on BCM6328 if available MIPS: BCM63xx: Add SMP support to prom.c MIPS: define write{b,w,l,q}_relaxed MIPS: Expose missing pci_io{map,unmap} declarations MIPS: Malta: Update GCMP detection. Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET" MIPS: APSP: Remove <asm/kspd.h> SSB: Kconfig: Amend SSB_EMBEDDED dependencies MIPS: microMIPS: Fix improper definition of ISA exception bit. MIPS: Don't try to decode microMIPS branch instructions where they cannot exist. MIPS: Declare emulate_load_store_microMIPS as a static function. MIPS: Fix typos and cleanup comment MIPS: Cleanup indentation and whitespace MIPS: BMIPS: support booting from physical CPU other than 0 MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS MIPS: GIC: Fix gic_set_affinity infinite loop MIPS: Don't save/restore OCTEON wide multiplier state on syscalls. ...
| * | | | | | | irqdomain: Remove temporary MIPS workaround codeGrant Likely2013-06-181-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MIPS interrupt controllers are all registering their own irq_domains now. Drop the MIPS specific code because it is no longer needed. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/5458/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* | | | | | | | Merge tag 'trace-3.11' of ↵Linus Torvalds2013-07-1113-265/+650
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing changes from Steven Rostedt: "The majority of the changes here are cleanups for the large changes that were added to 3.10, which includes several bug fixes that have been marked for stable. As for new features, there were a few, but nothing to write to LWN about. These include: New function trigger called "dump" and "cpudump" that will cause ftrace to dump its buffer to the console when the function is called. The difference between "dump" and "cpudump" is that "dump" will dump the entire contents of the ftrace buffer, where as "cpudump" will only dump the contents of the ftrace buffer for the CPU that called the function. Another small enhancement is a new sysctl switch called "traceoff_on_warning" which, when enabled, will disable tracing if any WARN_ON() is triggered. This is useful if you want to debug what caused a warning and do not want to risk losing your trace data by the ring buffer overwriting the data before you can disable it. There's also a kernel command line option that will make this enabled at boot up called the same thing" * tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits) tracing: Make tracing_open_generic_{tr,tc}() static tracing: Remove ftrace() function tracing: Remove TRACE_EVENT_TYPE enum definition tracing: Make tracer_tracing_{off,on,is_on}() static tracing: Fix irqs-off tag display in syscall tracing uprobes: Fix return value in error handling path tracing: Fix race between deleting buffer and setting events tracing: Add trace_array_get/put() to event handling tracing: Get trace_array ref counts when accessing trace files tracing: Add trace_array_get/put() to handle instance refs better tracing: Protect ftrace_trace_arrays list in trace_events.c tracing: Make trace_marker use the correct per-instance buffer ftrace: Do not run selftest if command line parameter is set tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit() tracing: Use flag buffer_disabled for irqsoff tracer tracing/kprobes: Turn trace_probe->files into list_head tracing: Fix disabling of soft disable tracing: Add missing syscall_metadata comment tracing: Simplify code for showing of soft disabled flag tracing/kprobes: Kill probe_enable_lock ...
| * | | | | | | | tracing: Make tracing_open_generic_{tr,tc}() staticSteven Rostedt (Red Hat)2013-07-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have patches that will use tracing_open_generic_tr/tc() in other files, but as they are not ready to be merged yet, and Fengguang Wu's sparse scripts pointed out that these functions were not declared anywhere, I'll make them static for now. When these functions are required to be used elsewhere, I'll remove the static then. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Remove ftrace() functionzhangwei(Jovi)2013-07-022-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only caller of function ftrace(...) was removed a long time ago, so remove the function body as well. Link: http://lkml.kernel.org/r/1365564393-10972-10-git-send-email-jovi.zhangwei@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Remove TRACE_EVENT_TYPE enum definitionzhangwei(Jovi)2013-07-021-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TRACE_EVENT_TYPE enum is not used at present, remove it. Link: http://lkml.kernel.org/r/1365564393-10972-8-git-send-email-jovi.zhangwei@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Make tracer_tracing_{off,on,is_on}() staticSteven Rostedt (Red Hat)2013-07-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have patches that will use tracer_tracing_on/off/is_on() in other files, but as they are not ready to be merged yet, and Fengguang Wu's sparse scripts pointed out that these functions were not declared anywhere, I'll make them static for now. When these functions are required to be used elsewhere, I'll remove the static then. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Fix irqs-off tag display in syscall tracingzhangwei(Jovi)2013-07-021-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All syscall tracing irqs-off tags are wrong, the syscall enter entry doesn't disable irqs. [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event [root@jovi tracing]# cat trace # tracer: nop # # entries-in-buffer/entries-written: 13/13 #P:2 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | irqbalance-513 [000] d... 56115.496766: sys_open(filename: 804e1a6, flags: 0, mode: 1b6) irqbalance-513 [000] d... 56115.497008: sys_open(filename: 804e1bb, flags: 0, mode: 1b6) sendmail-771 [000] d... 56115.827982: sys_open(filename: b770e6d1, flags: 0, mode: 1b6) The reason is syscall tracing doesn't record irq_flags into buffer. The proper display is: [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event [root@jovi tracing]# cat trace # tracer: nop # # entries-in-buffer/entries-written: 14/14 #P:2 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | irqbalance-514 [001] .... 46.213921: sys_open(filename: 804e1a6, flags: 0, mode: 1b6) irqbalance-514 [001] .... 46.214160: sys_open(filename: 804e1bb, flags: 0, mode: 1b6) <...>-920 [001] .... 47.307260: sys_open(filename: 4e82a0c5, flags: 80000, mode: 0) Link: http://lkml.kernel.org/r/1365564393-10972-3-git-send-email-jovi.zhangwei@huawei.com Cc: stable@vger.kernel.org # 2.6.35 Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | uprobes: Fix return value in error handling pathzhangwei(Jovi)2013-07-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When wrong argument is passed into uprobe_events it does not return an error: [root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events [root@jovi tracing]# The proper response is: [root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events -bash: echo: write error: Invalid argument Link: http://lkml.kernel.org/r/51B964FF.5000106@huawei.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 3.5+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Fix race between deleting buffer and setting eventsSteven Rostedt (Red Hat)2013-07-021-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While analyzing the code, I discovered that there's a potential race between deleting a trace instance and setting events. There are a few races that can occur if events are being traced as the buffer is being deleted. Mostly the problem comes with freeing the descriptor used by the trace event callback. To prevent problems like this, the events are disabled before the buffer is deleted. The problem with the current solution is that the event_mutex is let go between disabling the events and freeing the files, which means that the events could be enabled again while the freeing takes place. Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Add trace_array_get/put() to event handlingSteven Rostedt (Red Hat)2013-07-022-4/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a695cb58162 "tracing: Prevent deleting instances when they are being read" tried to fix a race between deleting a trace instance and reading contents of a trace file. But it wasn't good enough. The following could crash the kernel: # cd /sys/kernel/debug/tracing/instances # ( while :; do mkdir foo; rmdir foo; done ) & # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) & Luckily this can only be done by root user, but it should be fixed regardless. The problem is that a delete of the file can happen after the write to the event is opened, but before the enabling happens. The solution is to make sure the trace_array is available before succeeding in opening for write, and incerment the ref counter while opened. Now the instance can be deleted when the events are writing to the buffer, but the deletion of the instance will disable all events before the instance is actually deleted. Cc: stable@vger.kernel.org # 3.10 Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Get trace_array ref counts when accessing trace filesSteven Rostedt (Red Hat)2013-07-021-9/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a trace file is opened that may access a trace array, it must increment its ref count to prevent it from being deleted. Cc: stable@vger.kernel.org # 3.10 Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Add trace_array_get/put() to handle instance refs betterSteven Rostedt (Red Hat)2013-07-021-18/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a695cb58162 "tracing: Prevent deleting instances when they are being read" tried to fix a race between deleting a trace instance and reading contents of a trace file. But it wasn't good enough. The following could crash the kernel: # cd /sys/kernel/debug/tracing/instances # ( while :; do mkdir foo; rmdir foo; done ) & # ( while :; do cat foo/trace &> /dev/null; done ) & Luckily this can only be done by root user, but it should be fixed regardless. The problem is that a delete of the file can happen after the reader starts to open the file but before it grabs the trace_types_mutex. The solution is to validate the trace array before using it. If the trace array does not exist in the list of trace arrays, then it returns -ENODEV. There's a possibility that a trace_array could be deleted and a new one created and the open would open its file instead. But that is very minor as it will just return the data of the new trace array, it may confuse the user but it will not crash the system. As this can only be done by root anyway, the race will only occur if root is deleting what its trying to read at the same time. Cc: stable@vger.kernel.org # 3.10 Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Protect ftrace_trace_arrays list in trace_events.cAlexander Z Lam2013-07-013-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple places where the ftrace_trace_arrays list is accessed in trace_events.c without the trace_types_lock held. Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Make trace_marker use the correct per-instance bufferAlexander Z Lam2013-07-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace_marker file was present for each new instance created, but it added the trace mark to the global trace buffer instead of to the instance's buffer. Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.com Cc: David Sharp <dhsharp@google.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | ftrace: Do not run selftest if command line parameter is setSteven Rostedt (Red Hat)2013-07-013-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the kernel command line ftrace filter parameters are set (ftrace_filter or ftrace_notrace), force the function self test to pass, with a warning why it was forced. If the user adds a filter to the kernel command line, it is assumed that they know what they are doing, and the self test should just not run instead of failing (which disables function tracing) or clearing the filter, as that will probably annoy the user. If the user wants the selftest to run, the message will tell them why it did not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()Oleg Nesterov2013-07-011-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kprobe_perf_func() and kretprobe_perf_func() pass addr=ip to perf_trace_buf_submit() for no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR, we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP. Link: http://lkml.kernel.org/r/20130620173811.GA13161@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Use flag buffer_disabled for irqsoff tracerSteven Rostedt (Red Hat)2013-07-012-33/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the ring buffer is disabled and the irqsoff tracer records a trace it will clear out its buffer and lose the data it had previously recorded. Currently there's a callback when writing to the tracing_of file, but if tracing is disabled via the function tracer trigger, it will not inform the irqsoff tracer to stop recording. By using the "mirror" flag (buffer_disabled) in the trace_array, that keeps track of the status of the trace_array's buffer, it gives the irqsoff tracer a fast way to know if it should record a new trace or not. The flag may be a little behind the real state of the buffer, but it should not affect the trace too much. It's more important for the irqsoff tracer to be fast. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing/kprobes: Turn trace_probe->files into list_headOleg Nesterov2013-07-011-101/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I think that "ftrace_event_file *trace_probe[]" complicates the code for no reason, turn it into list_head to simplify the code. enable_trace_probe() no longer needs synchronize_sched(). This needs the extra sizeof(list_head) memory for every attached ftrace_event_file, hopefully not a problem in this case. Link: http://lkml.kernel.org/r/20130620173814.GA13165@redhat.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Fix disabling of soft disableTom Zanussi2013-07-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment on the soft disable 'disable' case of __ftrace_event_enable_disable() states that the soft disable bit should be cleared in that case, but currently only the soft mode bit is actually cleared. This essentially leaves the standard non-soft-enable enable/disable paths as the only way to clear the soft disable flag, but the soft disable bit should also be cleared when removing a trigger with '!'. Also, the SOFT_DISABLED bit should never be set if SOFT_MODE is cleared. This fixes the above discrepancies. Link: http://lkml.kernel.org/r/b9c68dd50bc07019e6c67d3f9b29be4ef1b2badb.1372479499.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Simplify code for showing of soft disabled flagTom Zanussi2013-07-011-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than enumerating each permutation, build the enable state string up from the combination of states. This also allows for the simpler addition of more states. Link: http://lkml.kernel.org/r/9aff5af6dee2f5a40ca30df41c39d5f33e998d7a.1372479499.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing/kprobes: Kill probe_enable_lockOleg Nesterov2013-07-011-23/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enable_trace_probe() and disable_trace_probe() should not worry about serialization, the caller (perf_trace_init or __ftrace_set_clr_event) holds event_mutex. They are also called by kprobe_trace_self_tests_init(), but this __init function can't race with itself or trace_events.c And note that this code depended on event_mutex even before 41a7dd420c which introduced probe_enable_lock. In fact it assumes that the caller kprobe_register() can never race with itself. Otherwise, say, tp->flags manipulations are racy. Link: http://lkml.kernel.org/r/20130620173809.GA13158@redhat.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing/kprobes: Avoid perf_trace_buf_*() if ->perf_events is emptyOleg Nesterov2013-07-011-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_trace_buf_prepare() + perf_trace_buf_submit() make no sense if this task/CPU has no active counters. Change kprobe_perf_func() and kretprobe_perf_func() to check call->perf_events beforehand and return if this list is empty. For example, "perf record -e some_probe -p1". Only /sbin/init will report, all other threads which hit the same probe will do perf_trace_buf_prepare/perf_trace_buf_submit just to realize that nobody wants perf_swevent_event(). Link: http://lkml.kernel.org/r/20130620173806.GA13151@redhat.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Failed to create system directorySteven Rostedt2013-07-011-6/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running the following: # cd /sys/kernel/debug/tracing # echo p:i do_sys_open > kprobe_events # echo p:j schedule >> kprobe_events # cat kprobe_events p:kprobes/i do_sys_open p:kprobes/j schedule # echo p:i do_sys_open >> kprobe_events # cat kprobe_events p:kprobes/j schedule p:kprobes/i do_sys_open # ls /sys/kernel/debug/tracing/events/kprobes/ enable filter j Notice that the 'i' is missing from the kprobes directory. The console produces: "Failed to create system directory kprobes" This is because kprobes passes in a allocated name for the system and the ftrace event subsystem saves off that name instead of creating a duplicate for it. But the kprobes may free the system name making the pointer to it invalid. This bug was introduced by 92edca073c37 "tracing: Use direct field, type and system names" which switched from using kstrdup() on the system name in favor of just keeping apointer to it, as the internal ftrace event system names are static and exist for the life of the computer being booted. Instead of reverting back to duplicating system names again, we can use core_kernel_data() to determine if the passed in name was allocated or static. Then use the MSB of the ref_count to be a flag to keep track if the name was allocated or not. Then we can still save from having to duplicate strings that will always exist, but still copy the ones that may be freed. Cc: stable@vger.kernel.org # 3.10 Reported-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | ftrace: Fix stddev calculation in function profilerJuri Lelli2013-06-191-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When FUNCTION_GRAPH_TRACER is enabled, ftrace can profile kernel functions and print basic statistics about them. Unfortunately, running stddev calculation is wrong. This patch corrects it implementing Welford’s method: s^2 = 1 / (n * (n-1)) * (n * \Sum (x_i)^2 - (\Sum x_i)^2) . Link: http://lkml.kernel.org/r/1371031398-24048-1-git-send-email-juri.lelli@gmail.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing/kprobes: Remove unnecessary checking of trace_probe_is_enabledzhangwei(Jovi)2013-06-191-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since tp->flags assignment was moved into function enable_trace_probe(), there is no need to use trace_probe_is_enabled to check flags in the same function. Remove the unnecessary checking. Link: http://lkml.kernel.org/r/51BA7B9E.3040807@huawei.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | tracing: Disable tracing on warningSteven Rostedt (Red Hat)2013-06-193-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a traceoff_on_warning option in both the kernel command line as well as a sysctl option. When set, any WARN*() function that is hit will cause the tracing_on variable to be cleared, which disables writing to the ring buffer. This is useful especially when tracing a bug with function tracing. When a warning is hit, the print caused by the warning can flood the trace with the functions that producing the output for the warning. This can make the resulting trace useless by either hiding where the bug happened, or worse, by overflowing the buffer and losing the trace of the bug totally. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>