-rw-r--r--  Documentation/trace/events.txt | 1555
-rw-r--r--  Documentation/trace/ftrace.txt | 34
-rw-r--r--  include/linux/trace_events.h | 120
-rw-r--r--  kernel/trace/Kconfig | 26
-rw-r--r--  kernel/trace/Makefile | 2
-rw-r--r--  kernel/trace/trace.c | 275
-rw-r--r--  kernel/trace/trace.h | 190
-rw-r--r--  kernel/trace/trace_events.c | 329
-rw-r--r--  kernel/trace/trace_events_filter.c | 77
-rw-r--r--  kernel/trace/trace_events_hist.c | 1755
-rw-r--r--  kernel/trace/trace_events_trigger.c | 215
-rw-r--r--  kernel/trace/tracing_map.c | 1062
-rw-r--r--  kernel/trace/tracing_map.h | 283
-rw-r--r--  tools/testing/selftests/ftrace/test.d/functions | 9
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc | 64
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc | 59
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc | 75
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc | 83
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc | 73
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc | 56
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc | 53
-rw-r--r--  tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc | 58
22 files changed, 6041 insertions(+), 412 deletions(-)
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index c010be8c85d7..08d74d75150d 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -512,3 +512,1558 @@ The following commands are supported:
Note that there can be only one traceon or traceoff trigger per
triggering event.
+
+- hist
+
+ This command aggregates event hits into a hash table keyed on one or
+ more trace event format fields (or stacktrace) and a set of running
+ totals derived from one or more trace event format fields and/or
+ event counts (hitcount).
+
+ The format of a hist trigger is as follows:
+
+ hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
+ [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
+ [:clear][:name=histname1] [if <filter>]
+
+ When a matching event is hit, an entry is added to a hash table
+ using the key(s) and value(s) named. Keys and values correspond to
+ fields in the event's format description. Values must correspond to
+ numeric fields - on an event hit, the value(s) will be added to a
+ sum kept for that field. The special string 'hitcount' can be used
+ in place of an explicit value field - this is simply a count of
+ event hits. If 'values' isn't specified, an implicit 'hitcount'
+ value will be automatically created and used as the only value.
+ Keys can be any field, or the special string 'stacktrace', which
+ will use the event's kernel stacktrace as the key. The keywords
+ 'keys' or 'key' can be used to specify keys, and the keywords
+ 'values', 'vals', or 'val' can be used to specify values. Compound
+ keys consisting of up to two fields can be specified by the 'keys'
+ keyword. Hashing a compound key produces a unique entry in the
+ table for each unique combination of component keys, and can be
+ useful for providing more fine-grained summaries of event data.
+ Additionally, sort keys consisting of up to two fields can be
+ specified by the 'sort' keyword. If more than one field is
+ specified, the result will be a 'sort within a sort': the first key
+ is taken to be the primary sort key and the second the secondary
+ key. If a hist trigger is given a name using the 'name' parameter,
+ its histogram data will be shared with other triggers of the same
+ name, and trigger hits will update this common data. Only triggers
+ with 'compatible' fields can be combined in this way; triggers are
+ 'compatible' if the fields they name are the same in number and
+ type, and those fields also have the same names.
+ Note that any two events always share the compatible 'hitcount' and
+ 'stacktrace' fields and can therefore be combined using those
+ fields, however pointless that may be.
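+
+ For example (an illustrative sketch - the 'foo' name and the two
+ sched events below are arbitrary choices), two compatible events can
+ update a shared named histogram like this:
+
+ # echo 'hist:name=foo:keys=common_pid.execname' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # echo 'hist:name=foo:keys=common_pid.execname' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger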
+
+ 'hist' triggers add a 'hist' file to each event's subdirectory.
+ Reading the 'hist' file for the event will dump the hash table in
+ its entirety to stdout. If there are multiple hist triggers
+ attached to an event, there will be a table for each trigger in the
+ output. The table displayed for a named trigger will be the same as
+ any other instance having the same name. Each printed hash table
+ entry is a simple list of the keys and values comprising the entry;
+ keys are printed first and are delineated by curly braces, and are
+ followed by the set of value fields for the entry. By default,
+ numeric fields are displayed as base-10 integers. This can be
+ modified by appending any of the following modifiers to the field
+ name:
+
+ .hex display a number as a hex value
+ .sym display an address as a symbol
+ .sym-offset display an address as a symbol and offset
+ .syscall display a syscall id as a system call name
+ .execname display a common_pid as a program name
+
+ Note that in general the semantics of a given field aren't
+ interpreted when applying a modifier to it, but there are some
+ restrictions to be aware of in this regard:
+
+ - only the 'hex' modifier can be used for values (because values
+ are essentially sums, and the other modifiers don't make sense
+ in that context).
+ - the 'execname' modifier can only be used on a 'common_pid'. The
+ reason for this is that the execname is simply the 'comm' value
+ saved for the 'current' process when an event was triggered,
+ which is the same as the common_pid value saved by the event
+ tracing code. Trying to apply that comm value to other pid
+ values wouldn't be correct; events that care about other pids
+ typically save pid-specific comm fields in the event itself.
+
+ A typical usage scenario would be the following to enable a hist
+ trigger, read its current contents, and then turn it off:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
+
+ # echo '!hist:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ The trigger file itself can be read to show the details of the
+ currently attached hist trigger. This information is also displayed
+ at the top of the 'hist' file when read.
+
+ By default, the size of the hash table is 2048 entries. The 'size'
+ parameter can be used to specify more or fewer than that. The units
+ are in terms of hashtable entries - if a run uses more entries than
+ specified, the results will show the number of 'drops', the number
+ of hits that were ignored. The size should be a power of 2 between
+ 128 and 131072 (any non-power-of-2 number specified will be rounded
+ up).
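+
+ For example (an illustrative variant of the netif_rx trigger shown
+ above, with an arbitrarily chosen size), a larger table can be
+ requested like this:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:size=4096' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger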
+
+ The 'sort' parameter can be used to specify a value field to sort
+ on. The default if unspecified is 'hitcount' and the default sort
+ order is 'ascending'. To sort in the opposite direction, append
+ '.descending' to the sort key.
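+
+ For example (again reusing the netif_rx trigger shown above), the
+ following displays the entries with the largest total lengths first:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:sort=len.descending' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger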
+
+ The 'pause' parameter can be used to pause an existing hist trigger
+ or to start a hist trigger but not log any events until told to do
+ so. 'continue' or 'cont' can be used to start or restart a paused
+ hist trigger.
+
+ The 'clear' parameter will clear the contents of a running hist
+ trigger and leave its current paused/active state unchanged.
+
+ Note that the 'pause', 'cont', and 'clear' parameters should be
+ applied using the 'append' shell operator ('>>') when modifying an
+ existing trigger, rather than via the '>' operator, which will cause
+ the trigger to be removed through truncation.
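+
+ For example (an illustrative sequence using the netif_rx trigger
+ shown above), the trigger can be paused and later resumed without
+ removing it:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:pause' >> \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:cont' >> \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger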
+
+- enable_hist/disable_hist
+
+ The enable_hist and disable_hist triggers can be used to have one
+ event conditionally start and stop another event's already-attached
+ hist trigger. Any number of enable_hist and disable_hist triggers
+ can be attached to a given event, allowing that event to kick off
+ and stop aggregations on a host of other events.
+
+ The format is very similar to the enable/disable_event triggers:
+
+ enable_hist:<system>:<event>[:count]
+ disable_hist:<system>:<event>[:count]
+
+ Instead of enabling or disabling the tracing of the target event
+ into the trace buffer as the enable/disable_event triggers do, the
+ enable/disable_hist triggers enable or disable the aggregation of
+ the target event into a hash table.
+
+ A typical usage scenario for the enable_hist/disable_hist triggers
+ would be to first set up a paused hist trigger on some event,
+ followed by an enable_hist/disable_hist pair that turns the hist
+ aggregation on and off when conditions of interest are hit:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ The above sets up an initially paused hist trigger which is unpaused
+ and starts aggregating events when a given program is executed, and
+ which stops aggregating when the process exits and the hist trigger
+ is paused again.
+
+ The examples below provide a more concrete illustration of the
+ concepts and typical usage patterns discussed above.
+
+
+6.2 'hist' trigger examples
+---------------------------
+
+ The first set of examples creates aggregations using the kmalloc
+ event. The fields that can be used for the hist trigger are listed
+ in the kmalloc event's format file:
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
+ name: kmalloc
+ ID: 374
+ format:
+ field:unsigned short common_type; offset:0; size:2; signed:0;
+ field:unsigned char common_flags; offset:2; size:1; signed:0;
+ field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
+ field:int common_pid; offset:4; size:4; signed:1;
+
+ field:unsigned long call_site; offset:8; size:8; signed:0;
+ field:const void * ptr; offset:16; size:8; signed:0;
+ field:size_t bytes_req; offset:24; size:8; signed:0;
+ field:size_t bytes_alloc; offset:32; size:8; signed:0;
+ field:gfp_t gfp_flags; offset:40; size:4; signed:0;
+
+ We'll start by creating a hist trigger that generates a simple table
+ that lists the total number of bytes requested for each function in
+ the kernel that made one or more calls to kmalloc:
+
+ # echo 'hist:key=call_site:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ This tells the tracing system to create a 'hist' trigger using the
+ call_site field of the kmalloc event as the key for the table, which
+ just means that each unique call_site address will have an entry
+ created for it in the table. The 'val=bytes_req' parameter tells
+ the hist trigger that for each unique entry (call_site) in the
+ table, it should keep a running total of the number of bytes
+ requested by that call_site.
+
+ We'll let it run for awhile and then dump the contents of the 'hist'
+ file in the kmalloc event's subdirectory (for readability, a number
+ of entries have been omitted):
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176
+ { call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024
+ { call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384
+ { call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24
+ { call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8
+ { call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152
+ { call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144
+ { call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144
+ { call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560
+ { call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736
+ .
+ .
+ .
+ { call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576
+ { call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336
+ { call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504
+ { call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584
+ { call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448
+ { call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720
+ { call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088
+ { call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920
+ { call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716
+ { call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712
+ { call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160
+ { call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520
+
+ Totals:
+ Hits: 4610
+ Entries: 45
+ Dropped: 0
+
+ The output displays a line for each entry, beginning with the key
+ specified in the trigger, followed by the value(s) also specified in
+ the trigger. At the beginning of the output is a line that displays
+ the trigger info, which can also be displayed by reading the
+ 'trigger' file:
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ At the end of the output are a few lines that display the overall
+ totals for the run. The 'Hits' field shows the total number of
+ times the event trigger was hit, the 'Entries' field shows the total
+ number of used entries in the hash table, and the 'Dropped' field
+ shows the number of hits that were dropped because the number of
+ used entries for the run exceeded the maximum number of entries
+ allowed for the table (normally 0, but if not, it's a hint that you
+ may want to increase the size of the table using the 'size' parameter).
+
+ Notice in the above output that there's an extra field, 'hitcount',
+ which wasn't specified in the trigger. Also notice that in the
+ trigger info output, there's a parameter, 'sort=hitcount', which
+ wasn't specified in the trigger either. The reason for that is that
+ every trigger implicitly keeps a count of the total number of hits
+ attributed to a given entry, called the 'hitcount'. That hitcount
+ information is explicitly displayed in the output, and in the
+ absence of a user-specified sort parameter, is used as the default
+ sort field.
+
+ The value 'hitcount' can be used in place of an explicit value in
+ the 'values' parameter if you don't really need to have any
+ particular field summed and are mainly interested in hit
+ frequencies.
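+
+ For example (an illustrative variant of the kmalloc trigger above),
+ the following simply counts hits per call site without summing any
+ event field:
+
+ # echo 'hist:key=call_site:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger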
+
+ To turn the hist trigger off, simply call up the trigger in the
+ command history and re-execute it with a '!' prepended:
+
+ # echo '!hist:key=call_site:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ Finally, notice that the call_site as displayed in the output above
+ isn't really very useful. It's an address, but normally addresses
+ are displayed in hex. To have a numeric field displayed as a hex
+ value, simply append '.hex' to the field name in the trigger:
+
+ # echo 'hist:key=call_site.hex:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433
+ { call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176
+ { call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384
+ { call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8
+ { call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511
+ { call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12
+ { call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152
+ { call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24
+ { call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144
+ { call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648
+ { call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144
+ { call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544
+ .
+ .
+ .
+ { call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024
+ { call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680
+ { call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112
+ { call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232
+ { call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360
+ { call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640
+ { call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600
+ { call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584
+ { call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656
+ { call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456
+ { call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600
+
+ Totals:
+ Hits: 4775
+ Entries: 46
+ Dropped: 0
+
+ Even that's only marginally more useful - while hex values do look
+ more like addresses, what users are typically more interested in
+ when looking at text addresses are the corresponding symbols. To
+ have an address displayed as a symbolic value instead, simply
+ append '.sym' or '.sym-offset' to the field name in the
+ trigger:
+
+ # echo 'hist:key=call_site.sym:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024
+ { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
+ { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
+ { call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192
+ { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
+ { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
+ { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
+ { call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528
+ { call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624
+ { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96
+ { call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464
+ { call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304
+ { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
+ { call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424
+ .
+ .
+ .
+ { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240
+ { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280
+ { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672
+ { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208
+ { call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840
+ { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312
+ { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152
+ { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576
+ { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248
+ { call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384
+ { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584
+ { call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176
+ { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265
+
+ Totals:
+ Hits: 109928
+ Entries: 71
+ Dropped: 0
+
+ Because the default sort key above is 'hitcount', the above shows
+ the list of call_sites by increasing hitcount, so that at the bottom
+ we see the functions that made the most kmalloc calls during the
+ run. If instead we wanted to see the top kmalloc callers in
+ terms of the number of bytes requested rather than the number of
+ calls, and we wanted the top caller to appear at the top, we can use
+ the 'sort' parameter, along with the 'descending' modifier:
+
+ # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
+
+ { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464
+ { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176
+ { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135
+ { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128
+ { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784
+ { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992
+ { call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072
+ { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824
+ { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704
+ { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088
+ { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536
+ { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664
+ { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632
+ .
+ .
+ .
+ { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
+ { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
+ { call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48
+ { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48
+ { call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48
+ { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
+ { call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16
+ { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
+ { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
+ { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
+
+ Totals:
+ Hits: 32133
+ Entries: 81
+ Dropped: 0
+
+ To display the offset and size information in addition to the symbol
+ name, just use 'sym-offset' instead:
+
+ # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
+
+ { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720
+ { call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936
+ { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936
+ { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832
+ { call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384
+ { call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040
+ { call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072
+ { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880
+ { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488
+ { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696
+ { call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640
+ { call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456
+ .
+ .
+ .
+ { call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128
+ { call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96
+ { call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96
+ { call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84
+ { call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8
+ { call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7
+ { call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7
+
+ Totals:
+ Hits: 26098
+ Entries: 64
+ Dropped: 0
+
+ We can also add multiple fields to the 'values' parameter. For
+ example, we might want to see the total number of bytes allocated
+ alongside bytes requested, and display the result sorted by bytes
+ allocated in descending order:
+
+ # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active]
+
+ { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016
+ { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224
+ { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568
+ { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760
+ { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744
+ { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400
+ { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496
+ { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304
+ { call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640
+ { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760
+ { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312
+ { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432
+ .
+ .
+ .
+ { call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192
+ { call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
+ { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
+ { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
+ { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
+ { call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96
+ { call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64
+ { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
+ { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8
+ { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
+
+ Totals:
+ Hits: 66598
+ Entries: 65
+ Dropped: 0
+
+ Finally, to finish off our kmalloc example, instead of simply having
+ the hist trigger display symbolic call_sites, we can have the hist
+ trigger additionally display the complete set of kernel stack traces
+ that led to each call_site. To do that, we simply use the special
+ value 'stacktrace' for the key parameter:
+
+ # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The above trigger will use the kernel stack trace in effect when an
+ event is triggered as the key for the hash table. This allows the
+ enumeration of every kernel callpath that led up to a particular
+ event, along with a running total of any of the event fields for
+ that event. Here we tally bytes requested and bytes allocated for
+ every callpath in the system that led up to a kmalloc (in this case
+ every callpath to a kmalloc for a kernel compile):
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
+
+ { stacktrace:
+ __kmalloc_track_caller+0x10b/0x1a0
+ kmemdup+0x20/0x50
+ hidraw_report_event+0x8a/0x120 [hid]
+ hid_report_raw_event+0x3ea/0x440 [hid]
+ hid_input_report+0x112/0x190 [hid]
+ hid_irq_in+0xc2/0x260 [usbhid]
+ __usb_hcd_giveback_urb+0x72/0x120
+ usb_giveback_urb_bh+0x9e/0xe0
+ tasklet_hi_action+0xf8/0x100
+ __do_softirq+0x114/0x2c0
+ irq_exit+0xa5/0xb0
+ do_IRQ+0x5a/0xf0
+ ret_from_intr+0x0/0x30
+ cpuidle_enter+0x17/0x20
+ cpu_startup_entry+0x315/0x3e0
+ rest_init+0x7c/0x80
+ } hitcount: 3 bytes_req: 21 bytes_alloc: 24
+ { stacktrace:
+ __kmalloc_track_caller+0x10b/0x1a0
+ kmemdup+0x20/0x50
+ hidraw_report_event+0x8a/0x120 [hid]
+ hid_report_raw_event+0x3ea/0x440 [hid]
+ hid_input_report+0x112/0x190 [hid]
+ hid_irq_in+0xc2/0x260 [usbhid]
+ __usb_hcd_giveback_urb+0x72/0x120
+ usb_giveback_urb_bh+0x9e/0xe0
+ tasklet_hi_action+0xf8/0x100
+ __do_softirq+0x114/0x2c0
+ irq_exit+0xa5/0xb0
+ do_IRQ+0x5a/0xf0
+ ret_from_intr+0x0/0x30
+ } hitcount: 3 bytes_req: 21 bytes_alloc: 24
+ { stacktrace:
+ kmem_cache_alloc_trace+0xeb/0x150
+ aa_alloc_task_context+0x27/0x40
+ apparmor_cred_prepare+0x1f/0x50
+ security_prepare_creds+0x16/0x20
+ prepare_creds+0xdf/0x1a0
+ SyS_capset+0xb5/0x200
+ system_call_fastpath+0x12/0x6a
+ } hitcount: 1 bytes_req: 32 bytes_alloc: 32
+ .
+ .
+ .
+ { stacktrace:
+ __kmalloc+0x11b/0x1b0
+ i915_gem_execbuffer2+0x6c/0x2c0 [i915]
+ drm_ioctl+0x349/0x670 [drm]
+ do_vfs_ioctl+0x2f0/0x4f0
+ SyS_ioctl+0x81/0xa0
+ system_call_fastpath+0x12/0x6a
+ } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
+ { stacktrace:
+ __kmalloc+0x11b/0x1b0
+ load_elf_phdrs+0x76/0xa0
+ load_elf_binary+0x102/0x1650
+ search_binary_handler+0x97/0x1d0
+ do_execveat_common.isra.34+0x551/0x6e0
+ SyS_execve+0x3a/0x50
+ return_from_execve+0x0/0x23
+ } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
+ { stacktrace:
+ kmem_cache_alloc_trace+0xeb/0x150
+ apparmor_file_alloc_security+0x27/0x40
+ security_file_alloc+0x16/0x20
+ get_empty_filp+0x93/0x1c0
+ path_openat+0x31/0x5f0
+ do_filp_open+0x3a/0x90
+ do_sys_open+0x128/0x220
+ SyS_open+0x1e/0x20
+ system_call_fastpath+0x12/0x6a
+ } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
+ { stacktrace:
+ __kmalloc+0x11b/0x1b0
+ seq_buf_alloc+0x1b/0x50
+ seq_read+0x2cc/0x370
+ proc_reg_read+0x3d/0x80
+ __vfs_read+0x28/0xe0
+ vfs_read+0x86/0x140
+ SyS_read+0x46/0xb0
+ system_call_fastpath+0x12/0x6a
+ } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
+
+ Totals:
+ Hits: 6085872
+ Entries: 253
+ Dropped: 0
+
+ If you key a hist trigger on common_pid, for example in order to
+ gather and display sorted totals for each process, you can use the
+ special .execname modifier to display the executable names for the
+ processes in the table rather than raw pids. The example below
+ keeps a per-process sum of total bytes read:
+
+ # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
+ # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]
+
+ { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
+ { common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
+ { common_pid: compiz [ 2889] } hitcount: 59 count: 254400
+ { common_pid: bash [ 8710] } hitcount: 3 count: 66369
+ { common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
+ { common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
+ { common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
+ { common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
+ { common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
+ { common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
+ { common_pid: gmain [ 1315] } hitcount: 18 count: 6336
+ .
+ .
+ .
+ { common_pid: postgres [ 1892] } hitcount: 2 count: 32
+ { common_pid: postgres [ 1891] } hitcount: 2 count: 32
+ { common_pid: gmain [ 8704] } hitcount: 2 count: 32
+ { common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
+ { common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
+ { common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
+ { common_pid: gdbus [ 2998] } hitcount: 1 count: 16
+ { common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
+ { common_pid: init [ 1] } hitcount: 2 count: 2
+
+ Totals:
+ Hits: 2116
+ Entries: 51
+ Dropped: 0
+
+ Similarly, if you key a hist trigger on syscall id, for example to
+ gather and display a list of systemwide syscall hits, you can use
+ the special .syscall modifier to display the syscall names rather
+ than raw ids. The example below keeps a running total of syscall
+ counts for the system during the run:
+
+ # echo 'hist:key=id.syscall:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
+
+ # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
+ # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]
+
+ { id: sys_fsync [ 74] } hitcount: 1
+ { id: sys_newuname [ 63] } hitcount: 1
+ { id: sys_prctl [157] } hitcount: 1
+ { id: sys_statfs [137] } hitcount: 1
+ { id: sys_symlink [ 88] } hitcount: 1
+ { id: sys_sendmmsg [307] } hitcount: 1
+ { id: sys_semctl [ 66] } hitcount: 1
+ { id: sys_readlink [ 89] } hitcount: 3
+ { id: sys_bind [ 49] } hitcount: 3
+ { id: sys_getsockname [ 51] } hitcount: 3
+ { id: sys_unlink [ 87] } hitcount: 3
+ { id: sys_rename [ 82] } hitcount: 4
+ { id: unknown_syscall [ 58] } hitcount: 4
+ { id: sys_connect [ 42] } hitcount: 4
+ { id: sys_getpid [ 39] } hitcount: 4
+ .
+ .
+ .
+ { id: sys_rt_sigprocmask [ 14] } hitcount: 952
+ { id: sys_futex [202] } hitcount: 1534
+ { id: sys_write [ 1] } hitcount: 2689
+ { id: sys_setitimer [ 38] } hitcount: 2797
+ { id: sys_read [ 0] } hitcount: 3202
+ { id: sys_select [ 23] } hitcount: 3773
+ { id: sys_writev [ 20] } hitcount: 4531
+ { id: sys_poll [ 7] } hitcount: 8314
+ { id: sys_recvmsg [ 47] } hitcount: 13738
+ { id: sys_ioctl [ 16] } hitcount: 21843
+
+ Totals:
+ Hits: 67612
+ Entries: 72
+ Dropped: 0
+
+ The syscall counts above provide a rough overall picture of system
+ call activity on the system; we can see for example that the most
+ popular system call on this system was the 'sys_ioctl' system call.
+
+ We can use 'compound' keys to refine that number and provide some
+ further insight as to which processes exactly contribute to the
+ overall ioctl count.
+
+ The command below keeps a hitcount for every unique combination of
+ system call id and pid - the end result is essentially a table
+ that keeps a per-pid sum of system call hits. The results are
+ sorted using the system call id as the primary key, and the
+ hitcount sum as the secondary key:
+
+ # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \
+ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
+
+ # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
+ # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active]
+
+ { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1
+ { id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1
+ { id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1
+ { id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1
+ { id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2
+ { id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2
+ { id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2
+ { id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2
+ { id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2
+ { id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2
+ .
+ .
+ .
+ { id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12
+ { id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16
+ { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808
+ { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580
+ .
+ .
+ .
+ { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3
+ { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4
+ { id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6
+ { id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2
+ { id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4
+ { id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6
+
+ Totals:
+ Hits: 31536
+ Entries: 323
+ Dropped: 0
+
+ The above list does give us a breakdown of the ioctl syscall by
+ pid, but it also gives us quite a bit more than that, which we
+ don't really care about at the moment. Since we know the syscall
+ id for sys_ioctl (16, displayed next to the sys_ioctl name), we
+ can use that to filter out all the other syscalls:
+
+ # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \
+ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
+
+ # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
+ # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active]
+
+ { id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1
+ { id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1
+ .
+ .
+ .
+ { id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45
+ { id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48
+ { id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48
+ { id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66
+ { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674
+ { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443
+
+ Totals:
+ Hits: 101162
+ Entries: 103
+ Dropped: 0
+
+ The above output shows that 'compiz' and 'Xorg' are far and away
+ the heaviest ioctl callers (which might lead to questions about
+ whether they really need to be making all those calls and to
+ possible avenues for further investigation.)
+
+ The compound key examples used a key and a sum value (hitcount) to
+ sort the output, but we can just as easily use two keys instead.
+ Here's an example where we use a compound key composed of the
+ common_pid and size event fields. Sorting with pid as the primary
+ key and 'size' as the secondary key allows us to display an
+ ordered summary of the recvfrom sizes, with counts, received by
+ each process:
+
+ # echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger
+
+ # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
+ # trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active]
+
+ { common_pid: smbd [ 784], size: 4 } hitcount: 1
+ { common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672
+ { common_pid: postgres [ 1796], size: 1000 } hitcount: 6
+ { common_pid: postgres [ 1867], size: 1000 } hitcount: 10
+ { common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2
+ { common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1
+ { common_pid: compiz [ 2994], size: 8 } hitcount: 1
+ { common_pid: compiz [ 2994], size: 20 } hitcount: 11
+ { common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2
+ { common_pid: firefox [ 8817], size: 4 } hitcount: 1
+ { common_pid: firefox [ 8817], size: 8 } hitcount: 5
+ { common_pid: firefox [ 8817], size: 588 } hitcount: 2
+ { common_pid: firefox [ 8817], size: 628 } hitcount: 1
+ { common_pid: firefox [ 8817], size: 6944 } hitcount: 1
+ { common_pid: firefox [ 8817], size: 408880 } hitcount: 2
+ { common_pid: firefox [ 8822], size: 8 } hitcount: 2
+ { common_pid: firefox [ 8822], size: 160 } hitcount: 2
+ { common_pid: firefox [ 8822], size: 320 } hitcount: 2
+ { common_pid: firefox [ 8822], size: 352 } hitcount: 1
+ .
+ .
+ .
+ { common_pid: pool [ 8923], size: 1960 } hitcount: 10
+ { common_pid: pool [ 8923], size: 2048 } hitcount: 10
+ { common_pid: pool [ 8924], size: 1960 } hitcount: 10
+ { common_pid: pool [ 8924], size: 2048 } hitcount: 10
+ { common_pid: pool [ 8928], size: 1964 } hitcount: 4
+ { common_pid: pool [ 8928], size: 1965 } hitcount: 2
+ { common_pid: pool [ 8928], size: 2048 } hitcount: 6
+ { common_pid: pool [ 8929], size: 1982 } hitcount: 1
+ { common_pid: pool [ 8929], size: 2048 } hitcount: 1
+
+ Totals:
+ Hits: 2016
+ Entries: 224
+ Dropped: 0
+
+ The above example also illustrates the fact that although a compound
+ key is treated as a single entity for hashing purposes, the sub-keys
+ it's composed of can be accessed independently.
+
+ The next example uses a string field as the hash key and
+ demonstrates how you can manually pause and continue a hist trigger.
+ In this example, we'll aggregate fork counts and don't expect a
+ large number of entries in the hash table, so we'll drop it to a
+ much smaller number, say 256:
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
+
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: ibus-daemon } hitcount: 1
+ { child_comm: whoopsie } hitcount: 1
+ { child_comm: smbd } hitcount: 1
+ { child_comm: gdbus } hitcount: 1
+ { child_comm: kthreadd } hitcount: 1
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: evolution-alarm } hitcount: 2
+ { child_comm: Socket Thread } hitcount: 2
+ { child_comm: postgres } hitcount: 2
+ { child_comm: bash } hitcount: 3
+ { child_comm: compiz } hitcount: 3
+ { child_comm: evolution-sourc } hitcount: 4
+ { child_comm: dhclient } hitcount: 4
+ { child_comm: pool } hitcount: 5
+ { child_comm: nm-dispatcher.a } hitcount: 8
+ { child_comm: firefox } hitcount: 8
+ { child_comm: dbus-daemon } hitcount: 8
+ { child_comm: glib-pacrunner } hitcount: 10
+ { child_comm: evolution } hitcount: 23
+
+ Totals:
+ Hits: 89
+ Entries: 20
+ Dropped: 0
+
+ If we want to pause the hist trigger, we can simply append :pause to
+ the command that started the trigger. Notice that the trigger info
+ displays as [paused]:
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused]
+
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: kthreadd } hitcount: 1
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: gdbus } hitcount: 1
+ { child_comm: ibus-daemon } hitcount: 1
+ { child_comm: Socket Thread } hitcount: 2
+ { child_comm: evolution-alarm } hitcount: 2
+ { child_comm: smbd } hitcount: 2
+ { child_comm: bash } hitcount: 3
+ { child_comm: whoopsie } hitcount: 3
+ { child_comm: compiz } hitcount: 3
+ { child_comm: evolution-sourc } hitcount: 4
+ { child_comm: pool } hitcount: 5
+ { child_comm: postgres } hitcount: 6
+ { child_comm: firefox } hitcount: 8
+ { child_comm: dhclient } hitcount: 10
+ { child_comm: emacs } hitcount: 12
+ { child_comm: dbus-daemon } hitcount: 20
+ { child_comm: nm-dispatcher.a } hitcount: 20
+ { child_comm: evolution } hitcount: 35
+ { child_comm: glib-pacrunner } hitcount: 59
+
+ Totals:
+ Hits: 199
+ Entries: 21
+ Dropped: 0
+
+ To manually continue having the trigger aggregate events, append
+ :cont instead. Notice that the trigger info displays as [active]
+ again, and the data has changed:
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
+
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: dconf worker } hitcount: 1
+ { child_comm: kthreadd } hitcount: 1
+ { child_comm: gdbus } hitcount: 1
+ { child_comm: ibus-daemon } hitcount: 1
+ { child_comm: Socket Thread } hitcount: 2
+ { child_comm: evolution-alarm } hitcount: 2
+ { child_comm: smbd } hitcount: 2
+ { child_comm: whoopsie } hitcount: 3
+ { child_comm: compiz } hitcount: 3
+ { child_comm: evolution-sourc } hitcount: 4
+ { child_comm: bash } hitcount: 5
+ { child_comm: pool } hitcount: 5
+ { child_comm: postgres } hitcount: 6
+ { child_comm: firefox } hitcount: 8
+ { child_comm: dhclient } hitcount: 11
+ { child_comm: emacs } hitcount: 12
+ { child_comm: dbus-daemon } hitcount: 22
+ { child_comm: nm-dispatcher.a } hitcount: 22
+ { child_comm: evolution } hitcount: 35
+ { child_comm: glib-pacrunner } hitcount: 59
+
+ Totals:
+ Hits: 206
+ Entries: 21
+ Dropped: 0
+
+ The previous example showed how to start and stop a hist trigger by
+ appending 'pause' and 'continue' to the hist trigger command. A
+ hist trigger can also be started in a paused state by initially
+ starting the trigger with ':pause' appended. This allows you to
+ start the trigger only when you're ready to start collecting data
+ and not before. For example, you could start the trigger in a
+ paused state, then unpause it and do something you want to measure,
+ then pause the trigger again when done.
+
+ Of course, doing this manually can be difficult and error-prone, but
+ it is possible to automatically start and stop a hist trigger based
+ on some condition, via the enable_hist and disable_hist triggers.
+
+ For example, suppose we wanted to take a look at the relative
+ weights in terms of skb length for each callpath that leads to a
+ netif_receive_skb event when downloading a decent-sized file using
+ wget.
+
+ First we set up an initially paused stacktrace trigger on the
+ netif_receive_skb event:
+
+ # echo 'hist:key=stacktrace:vals=len:pause' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ Next, we set up an 'enable_hist' trigger on the sched_process_exec
+ event, with an 'if filename==/usr/bin/wget' filter. The effect of
+ this new trigger is that it will 'unpause' the hist trigger we just
+ set up on netif_receive_skb if and only if it sees a
+ sched_process_exec event with a filename of '/usr/bin/wget'. When
+ that happens, all netif_receive_skb events are aggregated into a
+ hash table keyed on stacktrace:
+
+ # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ The aggregation continues until the netif_receive_skb hist trigger
+ is paused again, which is what the following disable_hist trigger
+ does by creating a similar setup on the sched_process_exit event,
+ using the filter 'comm==wget':
+
+ # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ Whenever a process exits and the comm field of the disable_hist
+ trigger filter matches 'comm==wget', the netif_receive_skb hist
+ trigger is disabled.
+
+ The overall effect is that netif_receive_skb events are aggregated
+ into the hash table for only the duration of the wget. Executing a
+ wget command and then listing the 'hist' file will display the
+ output generated by the wget command:
+
+ $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
+ # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+
+ { stacktrace:
+ __netif_receive_skb_core+0x46d/0x990
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_receive+0xc8/0x100
+ ieee80211_deliver_skb+0xd6/0x270 [mac80211]
+ ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
+ ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
+ ieee80211_rx+0x31d/0x900 [mac80211]
+ iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
+ iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
+ iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ ret_from_fork+0x42/0x70
+ } hitcount: 85 len: 28884
+ { stacktrace:
+ __netif_receive_skb_core+0x46d/0x990
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_complete+0xa4/0xe0
+ dev_gro_receive+0x23a/0x360
+ napi_gro_receive+0x30/0x100
+ ieee80211_deliver_skb+0xd6/0x270 [mac80211]
+ ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
+ ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
+ ieee80211_rx+0x31d/0x900 [mac80211]
+ iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
+ iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
+ iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ } hitcount: 98 len: 664329
+ { stacktrace:
+ __netif_receive_skb_core+0x46d/0x990
+ __netif_receive_skb+0x18/0x60
+ process_backlog+0xa8/0x150
+ net_rx_action+0x15d/0x340
+ __do_softirq+0x114/0x2c0
+ do_softirq_own_stack+0x1c/0x30
+ do_softirq+0x65/0x70
+ __local_bh_enable_ip+0xb5/0xc0
+ ip_finish_output+0x1f4/0x840
+ ip_output+0x6b/0xc0
+ ip_local_out_sk+0x31/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x173/0x2a0
+ udp_sendmsg+0x2bf/0x9f0
+ inet_sendmsg+0x64/0xa0
+ sock_sendmsg+0x3d/0x50
+ } hitcount: 115 len: 13030
+ { stacktrace:
+ __netif_receive_skb_core+0x46d/0x990
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_complete+0xa4/0xe0
+ napi_gro_flush+0x6d/0x90
+ iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ ret_from_fork+0x42/0x70
+ } hitcount: 934 len: 5512212
+
+ Totals:
+ Hits: 1232
+ Entries: 4
+ Dropped: 0
+
+ The above shows all the netif_receive_skb callpaths and their total
+ lengths for the duration of the wget command.
+
+ The 'clear' hist trigger param can be used to clear the hash table.
+ Suppose we wanted to try another run of the previous example but
+ this time also wanted to see the complete list of events that went
+ into the histogram. In order to avoid having to set everything up
+ again, we can just clear the histogram first:
+
+ # echo 'hist:key=stacktrace:vals=len:clear' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ Just to verify that it is in fact cleared, here's what we now see in
+ the hist file:
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
+ # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+
+ Totals:
+ Hits: 0
+ Entries: 0
+ Dropped: 0
+
+ Since we want to see the detailed list of every netif_receive_skb
+ event occurring during the new run - which are in fact the same
+ events being aggregated into the hash table - we add additional
+ 'enable_event' and 'disable_event' triggers to the triggering
+ sched_process_exec and sched_process_exit events, as follows:
+
+ # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ # echo 'disable_event:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ If you read the trigger files for the sched_process_exec and
+ sched_process_exit triggers, you should see two triggers for each:
+ one enabling/disabling the hist aggregation and the other
+ enabling/disabling the logging of events:
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+ enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
+ enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+ disable_event:net:netif_receive_skb:unlimited if comm==wget
+ disable_hist:net:netif_receive_skb:unlimited if comm==wget
+
+ In other words, whenever either of the sched_process_exec or
+ sched_process_exit events is hit and matches 'wget', it enables or
+ disables both the histogram and the event log, and what you end up
+ with is a hash table and set of events just covering the specified
+ duration. Run the wget command again:
+
+ $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
+
+ Displaying the 'hist' file should show something similar to what you
+ saw in the last run, but this time you should also see the
+ individual events in the trace file:
+
+ # cat /sys/kernel/debug/tracing/trace
+
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 183/1426 #P:4
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
+ wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
+ dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
+ dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
+ ##### CPU 2 buffer started ####
+ irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
+ irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
+ .
+ .
+ .
+
+ The following example demonstrates how multiple hist triggers can be
+ attached to a given event. This capability can be useful for
+ creating a set of different summaries derived from the same set of
+ events, or for comparing the effects of different filters, among
+ other things.
+
+ # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+ # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+ # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+ # echo 'hist:keys=skbaddr.hex:vals=len' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+ # echo 'hist:keys=len:vals=common_preempt_count' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ The above set of commands creates four triggers differing only in
+ their filters, along with a completely different though fairly
+ nonsensical trigger. Note that to append multiple hist triggers to
+ the same file, you should use the '>>' operator ('>' will also add
+ the new hist trigger, but will remove any existing hist triggers
+ beforehand). An example of removing one of these triggers again is
+ shown below, after the output.
+
+ Displaying the contents of the 'hist' file for the event shows the
+ contents of all five histograms:
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
+
+ # event histogram
+ #
+ # trigger info: hist:keys=len:vals=hitcount,common_preempt_count:sort=hitcount:size=2048 [active]
+ #
+
+ { len: 176 } hitcount: 1 common_preempt_count: 0
+ { len: 223 } hitcount: 1 common_preempt_count: 0
+ { len: 4854 } hitcount: 1 common_preempt_count: 0
+ { len: 395 } hitcount: 1 common_preempt_count: 0
+ { len: 177 } hitcount: 1 common_preempt_count: 0
+ { len: 446 } hitcount: 1 common_preempt_count: 0
+ { len: 1601 } hitcount: 1 common_preempt_count: 0
+ .
+ .
+ .
+ { len: 1280 } hitcount: 66 common_preempt_count: 0
+ { len: 116 } hitcount: 81 common_preempt_count: 40
+ { len: 708 } hitcount: 112 common_preempt_count: 0
+ { len: 46 } hitcount: 221 common_preempt_count: 0
+ { len: 1264 } hitcount: 458 common_preempt_count: 0
+
+ Totals:
+ Hits: 1428
+ Entries: 147
+ Dropped: 0
+
+
+ # event histogram
+ #
+ # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
+ #
+
+ { skbaddr: ffff8800baee5e00 } hitcount: 1 len: 130
+ { skbaddr: ffff88005f3d5600 } hitcount: 1 len: 1280
+ { skbaddr: ffff88005f3d4900 } hitcount: 1 len: 1280
+ { skbaddr: ffff88009fed6300 } hitcount: 1 len: 115
+ { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 115
+ { skbaddr: ffff88008cdb1900 } hitcount: 1 len: 46
+ { skbaddr: ffff880064b5ef00 } hitcount: 1 len: 118
+ { skbaddr: ffff880044e3c700 } hitcount: 1 len: 60
+ { skbaddr: ffff880100065900 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d46bd500 } hitcount: 1 len: 116
+ { skbaddr: ffff88005f3d5f00 } hitcount: 1 len: 1280
+ { skbaddr: ffff880100064700 } hitcount: 1 len: 365
+ { skbaddr: ffff8800badb6f00 } hitcount: 1 len: 60
+ .
+ .
+ .
+ { skbaddr: ffff88009fe0be00 } hitcount: 27 len: 24677
+ { skbaddr: ffff88009fe0a400 } hitcount: 27 len: 23052
+ { skbaddr: ffff88009fe0b700 } hitcount: 31 len: 25589
+ { skbaddr: ffff88009fe0b600 } hitcount: 32 len: 27326
+ { skbaddr: ffff88006a462800 } hitcount: 68 len: 71678
+ { skbaddr: ffff88006a463700 } hitcount: 70 len: 72678
+ { skbaddr: ffff88006a462b00 } hitcount: 71 len: 77589
+ { skbaddr: ffff88006a463600 } hitcount: 73 len: 71307
+ { skbaddr: ffff88006a462200 } hitcount: 81 len: 81032
+
+ Totals:
+ Hits: 1451
+ Entries: 318
+ Dropped: 0
+
+
+ # event histogram
+ #
+ # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len == 256 [active]
+ #
+
+
+ Totals:
+ Hits: 0
+ Entries: 0
+ Dropped: 0
+
+
+ # event histogram
+ #
+ # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len > 4096 [active]
+ #
+
+ { skbaddr: ffff88009fd2c300 } hitcount: 1 len: 7212
+ { skbaddr: ffff8800d2bcce00 } hitcount: 1 len: 7212
+ { skbaddr: ffff8800d2bcd700 } hitcount: 1 len: 7212
+ { skbaddr: ffff8800d2bcda00 } hitcount: 1 len: 21492
+ { skbaddr: ffff8800ae2e2d00 } hitcount: 1 len: 7212
+ { skbaddr: ffff8800d2bcdb00 } hitcount: 1 len: 7212
+ { skbaddr: ffff88006a4df500 } hitcount: 1 len: 4854
+ { skbaddr: ffff88008ce47b00 } hitcount: 1 len: 18636
+ { skbaddr: ffff8800ae2e2200 } hitcount: 1 len: 12924
+ { skbaddr: ffff88005f3e1000 } hitcount: 1 len: 4356
+ { skbaddr: ffff8800d2bcdc00 } hitcount: 2 len: 24420
+ { skbaddr: ffff8800d2bcc200 } hitcount: 2 len: 12996
+
+ Totals:
+ Hits: 14
+ Entries: 12
+ Dropped: 0
+
+
+ # event histogram
+ #
+ # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len < 0 [active]
+ #
+
+
+ Totals:
+ Hits: 0
+ Entries: 0
+ Dropped: 0
+
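+ Any of the hist triggers appended above can later be removed again
+ by echoing the same trigger specification prefixed with '!'. As an
+ illustrative sketch (not actually run as part of this example, and
+ assuming the usual '!' removal syntax for triggers), the following
+ should remove just the nonsensical len-keyed trigger while leaving
+ the skbaddr-keyed triggers in place:
+
+ # echo '!hist:keys=len:vals=common_preempt_count' >> \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+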
+ Named triggers can be used to have triggers share a common set of
+ histogram data. This capability is mostly useful for combining the
+ output of events generated by tracepoints contained inside inline
+ functions, but names can be used in a hist trigger on any event.
+ For example, these two triggers when hit will update the same 'len'
+ field in the shared 'foo' histogram data:
+
+ # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+ # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ You can see that they're updating common histogram data by reading
+ each event's hist files at the same time:
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist;
+ cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
+
+ # event histogram
+ #
+ # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
+ #
+
+ { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
+ { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
+ { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
+ { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
+ { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
+ { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
+ { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
+ { skbaddr: ffff880064505000 } hitcount: 1 len: 46
+ { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
+ { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
+ { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
+ { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
+ { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
+ { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
+ { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
+ { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
+ { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
+ { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
+ { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
+ { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
+ { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
+ { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
+ { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
+ { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
+ { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
+ { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
+ { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
+ { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
+ { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
+ { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
+ { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
+ { skbaddr: ffff880064504400 } hitcount: 4 len: 184
+ { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
+ { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
+ { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
+ { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
+ { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
+
+ Totals:
+ Hits: 81
+ Entries: 42
+ Dropped: 0
+ # event histogram
+ #
+ # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
+ #
+
+ { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
+ { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
+ { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
+ { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
+ { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
+ { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
+ { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
+ { skbaddr: ffff880064505000 } hitcount: 1 len: 46
+ { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
+ { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
+ { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
+ { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
+ { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
+ { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
+ { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
+ { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
+ { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
+ { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
+ { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
+ { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
+ { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
+ { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
+ { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
+ { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
+ { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
+ { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
+ { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
+ { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
+ { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
+ { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
+ { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
+ { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
+ { skbaddr: ffff880064504400 } hitcount: 4 len: 184
+ { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
+ { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
+ { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
+ { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
+ { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
+
+ Totals:
+ Hits: 81
+ Entries: 42
+ Dropped: 0
+
+ And here's an example that shows how to combine histogram data from
+ any two events even if they don't share any 'compatible' fields
+ other than 'hitcount' and 'stacktrace'. These commands create a
+ couple of triggers named 'bar' using those fields:
+
+ # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+ # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ And displaying the output of either shows some interesting if
+ somewhat confusing output:
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
+
+ # event histogram
+ #
+ # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
+ #
+
+ { stacktrace:
+ _do_fork+0x18e/0x330
+ kernel_thread+0x29/0x30
+ kthreadd+0x154/0x1b0
+ ret_from_fork+0x3f/0x70
+ } hitcount: 1
+ { stacktrace:
+ netif_rx_internal+0xb2/0xd0
+ netif_rx_ni+0x20/0x70
+ dev_loopback_xmit+0xaa/0xd0
+ ip_mc_output+0x126/0x240
+ ip_local_out_sk+0x31/0x40
+ igmp_send_report+0x1e9/0x230
+ igmp_timer_expire+0xe9/0x120
+ call_timer_fn+0x39/0xf0
+ run_timer_softirq+0x1e1/0x290
+ __do_softirq+0xfd/0x290
+ irq_exit+0x98/0xb0
+ smp_apic_timer_interrupt+0x4a/0x60
+ apic_timer_interrupt+0x6d/0x80
+ cpuidle_enter+0x17/0x20
+ call_cpuidle+0x3b/0x60
+ cpu_startup_entry+0x22d/0x310
+ } hitcount: 1
+ { stacktrace:
+ netif_rx_internal+0xb2/0xd0
+ netif_rx_ni+0x20/0x70
+ dev_loopback_xmit+0xaa/0xd0
+ ip_mc_output+0x17f/0x240
+ ip_local_out_sk+0x31/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x13e/0x270
+ udp_sendmsg+0x2bf/0x980
+ inet_sendmsg+0x67/0xa0
+ sock_sendmsg+0x38/0x50
+ SYSC_sendto+0xef/0x170
+ SyS_sendto+0xe/0x10
+ entry_SYSCALL_64_fastpath+0x12/0x6a
+ } hitcount: 2
+ { stacktrace:
+ netif_rx_internal+0xb2/0xd0
+ netif_rx+0x1c/0x60
+ loopback_xmit+0x6c/0xb0
+ dev_hard_start_xmit+0x219/0x3a0
+ __dev_queue_xmit+0x415/0x4f0
+ dev_queue_xmit_sk+0x13/0x20
+ ip_finish_output2+0x237/0x340
+ ip_finish_output+0x113/0x1d0
+ ip_output+0x66/0xc0
+ ip_local_out_sk+0x31/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x16d/0x270
+ udp_sendmsg+0x2bf/0x980
+ inet_sendmsg+0x67/0xa0
+ sock_sendmsg+0x38/0x50
+ ___sys_sendmsg+0x14e/0x270
+ } hitcount: 76
+ { stacktrace:
+ netif_rx_internal+0xb2/0xd0
+ netif_rx+0x1c/0x60
+ loopback_xmit+0x6c/0xb0
+ dev_hard_start_xmit+0x219/0x3a0
+ __dev_queue_xmit+0x415/0x4f0
+ dev_queue_xmit_sk+0x13/0x20
+ ip_finish_output2+0x237/0x340
+ ip_finish_output+0x113/0x1d0
+ ip_output+0x66/0xc0
+ ip_local_out_sk+0x31/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x16d/0x270
+ udp_sendmsg+0x2bf/0x980
+ inet_sendmsg+0x67/0xa0
+ sock_sendmsg+0x38/0x50
+ ___sys_sendmsg+0x269/0x270
+ } hitcount: 77
+ { stacktrace:
+ netif_rx_internal+0xb2/0xd0
+ netif_rx+0x1c/0x60
+ loopback_xmit+0x6c/0xb0
+ dev_hard_start_xmit+0x219/0x3a0
+ __dev_queue_xmit+0x415/0x4f0
+ dev_queue_xmit_sk+0x13/0x20
+ ip_finish_output2+0x237/0x340
+ ip_finish_output+0x113/0x1d0
+ ip_output+0x66/0xc0
+ ip_local_out_sk+0x31/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x16d/0x270
+ udp_sendmsg+0x2bf/0x980
+ inet_sendmsg+0x67/0xa0
+ sock_sendmsg+0x38/0x50
+ SYSC_sendto+0xef/0x170
+ } hitcount: 88
+ { stacktrace:
+ _do_fork+0x18e/0x330
+ SyS_clone+0x19/0x20
+ entry_SYSCALL_64_fastpath+0x12/0x6a
+ } hitcount: 244
+
+ Totals:
+ Hits: 489
+ Entries: 7
+ Dropped: 0
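+
+ When you're done experimenting, named triggers can be removed like
+ any other hist trigger, by echoing the same specification prefixed
+ with '!' to each event's trigger file. This is an illustrative
+ sketch assuming the standard removal syntax applies to named
+ triggers as well:
+
+ # echo '!hist:name=bar:key=stacktrace:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+ # echo '!hist:name=bar:key=stacktrace:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger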
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 9857606dd7b7..a6b3705e62a6 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -210,6 +210,11 @@ of ftrace. Here is a list of some of the key files:
Note, sched_switch and sched_wake_up will also trace events
listed in this file.
+ To have the PIDs of children of tasks with their PID in this file
+ added on fork, enable the "event-fork" option. That option will also
+ cause the PIDs of tasks to be removed from this file when the task
+ exits.
+
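+ As an illustration (the exact path assumes the usual debugfs mount
+ point), the option can be enabled with:
+
+ # echo event-fork > /sys/kernel/debug/tracing/trace_options
+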
set_graph_function:
Set a "trigger" function where tracing should start
@@ -725,16 +730,14 @@ noraw
nohex
nobin
noblock
-nostacktrace
trace_printk
-noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj
noprintk-msg-only
context-info
-latency-format
+nolatency-format
sleep-time
graph-time
record-cmd
@@ -742,7 +745,10 @@ overwrite
nodisable_on_free
irq-info
markers
+noevent-fork
function-trace
+nodisplay-graph
+nostacktrace
To disable one of the options, echo in the option prepended with
"no".
@@ -796,11 +802,6 @@ Here are the available options:
block - When set, reading trace_pipe will not block when polled.
- stacktrace - This is one of the options that changes the trace
- itself. When a trace is recorded, so is the stack
- of functions. This allows for back traces of
- trace sites.
-
trace_printk - Can disable trace_printk() from writing into the buffer.
branch - Enable branch tracing with the tracer.
@@ -897,6 +898,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
When disabled, the trace_marker will error with EINVAL
on write.
+ event-fork - When set, tasks with PIDs listed in set_event_pid will have
+ the PIDs of their children added to set_event_pid when those
+ tasks fork. Also, when tasks with PIDs in set_event_pid exit,
+ their PIDs will be removed from the file.
function-trace - The latency tracers will enable function tracing
if this option is enabled (default it is). When
@@ -904,8 +909,17 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
functions. This keeps the overhead of the tracer down
when performing latency tests.
- Note: Some tracers have their own options. They only appear
- when the tracer is active.
+ display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will
+ use function graph tracing instead of function tracing.
+
+ stacktrace - This is one of the options that changes the trace
+ itself. When a trace is recorded, so is the stack
+ of functions. This allows for back traces of
+ trace sites.
+
+ Note: Some tracers have their own options. They only appear in this
+ file when the tracer is active. They always appear in the
+ options directory.
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 222f6aa0418f..be007610ceb0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -154,21 +154,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_buffer,
struct trace_event_file *trace_file,
int type, unsigned long len,
unsigned long flags, int pc);
-struct ring_buffer_event *
-trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
- int type, unsigned long len,
- unsigned long flags, int pc);
-void trace_buffer_unlock_commit(struct trace_array *tr,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- unsigned long flags, int pc);
-void trace_buffer_unlock_commit_regs(struct trace_array *tr,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- unsigned long flags, int pc,
- struct pt_regs *regs);
-void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
- struct ring_buffer_event *event);
void tracing_record_cmdline(struct task_struct *tsk);
@@ -229,7 +214,6 @@ enum {
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
TRACE_EVENT_FL_WAS_ENABLED_BIT,
- TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
@@ -244,7 +228,6 @@ enum {
* WAS_ENABLED - Set and stays set when an event was ever enabled
* (used for module unloading, if a module event is enabled,
* it is best to clear the buffers that used it).
- * USE_CALL_FILTER - For trace internal events, don't use file filter
* TRACEPOINT - Event is a tracepoint
* KPROBE - Event is a kprobe
* UPROBE - Event is a uprobe
@@ -255,7 +238,6 @@ enum {
TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
- TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
@@ -407,16 +389,12 @@ enum event_trigger_type {
ETT_SNAPSHOT = (1 << 1),
ETT_STACKTRACE = (1 << 2),
ETT_EVENT_ENABLE = (1 << 3),
+ ETT_EVENT_HIST = (1 << 4),
+ ETT_HIST_ENABLE = (1 << 5),
};
extern int filter_match_preds(struct event_filter *filter, void *rec);
-extern int filter_check_discard(struct trace_event_file *file, void *rec,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event);
-extern int call_filter_check_discard(struct trace_event_call *call, void *rec,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event);
extern enum event_trigger_type event_triggers_call(struct trace_event_file *file,
void *rec);
extern void event_triggers_post_call(struct trace_event_file *file,
@@ -450,100 +428,6 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
return false;
}
-/*
- * Helper function for event_trigger_unlock_commit{_regs}().
- * If there are event triggers attached to this event that requires
- * filtering against its fields, then they wil be called as the
- * entry already holds the field information of the current event.
- *
- * It also checks if the event should be discarded or not.
- * It is to be discarded if the event is soft disabled and the
- * event was only recorded to process triggers, or if the event
- * filter is active and this event did not match the filters.
- *
- * Returns true if the event is discarded, false otherwise.
- */
-static inline bool
-__event_trigger_test_discard(struct trace_event_file *file,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- void *entry,
- enum event_trigger_type *tt)
-{
- unsigned long eflags = file->flags;
-
- if (eflags & EVENT_FILE_FL_TRIGGER_COND)
- *tt = event_triggers_call(file, entry);
-
- if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags))
- ring_buffer_discard_commit(buffer, event);
- else if (!filter_check_discard(file, entry, buffer, event))
- return false;
-
- return true;
-}
-
-/**
- * event_trigger_unlock_commit - handle triggers and finish event commit
- * @file: The file pointer assoctiated to the event
- * @buffer: The ring buffer that the event is being written to
- * @event: The event meta data in the ring buffer
- * @entry: The event itself
- * @irq_flags: The state of the interrupts at the start of the event
- * @pc: The state of the preempt count at the start of the event.
- *
- * This is a helper function to handle triggers that require data
- * from the event itself. It also tests the event against filters and
- * if the event is soft disabled and should be discarded.
- */
-static inline void
-event_trigger_unlock_commit(struct trace_event_file *file,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- void *entry, unsigned long irq_flags, int pc)
-{
- enum event_trigger_type tt = ETT_NONE;
-
- if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
- trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);
-
- if (tt)
- event_triggers_post_call(file, tt, entry);
-}
-
-/**
- * event_trigger_unlock_commit_regs - handle triggers and finish event commit
- * @file: The file pointer assoctiated to the event
- * @buffer: The ring buffer that the event is being written to
- * @event: The event meta data in the ring buffer
- * @entry: The event itself
- * @irq_flags: The state of the interrupts at the start of the event
- * @pc: The state of the preempt count at the start of the event.
- *
- * This is a helper function to handle triggers that require data
- * from the event itself. It also tests the event against filters and
- * if the event is soft disabled and should be discarded.
- *
- * Same as event_trigger_unlock_commit() but calls
- * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit().
- */
-static inline void
-event_trigger_unlock_commit_regs(struct trace_event_file *file,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- void *entry, unsigned long irq_flags, int pc,
- struct pt_regs *regs)
-{
- enum event_trigger_type tt = ETT_NONE;
-
- if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
- trace_buffer_unlock_commit_regs(file->tr, buffer, event,
- irq_flags, pc, regs);
-
- if (tt)
- event_triggers_post_call(file, tt, entry);
-}
-
#ifdef CONFIG_BPF_EVENTS
unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
#else
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e45db6b0d878..fafeaf803bd0 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -528,6 +528,32 @@ config MMIOTRACE
See Documentation/trace/mmiotrace.txt.
If you are not helping to develop drivers, say N.
+config TRACING_MAP
+ bool
+ depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
+ help
+ tracing_map is a special-purpose lock-free map for tracing,
+ separated out as a stand-alone facility in order to allow it
+ to be shared between multiple tracers. It isn't meant to be
+ generally used outside of that context, and is normally
+ selected by tracers that use it.
+
+config HIST_TRIGGERS
+ bool "Histogram triggers"
+ depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
+ select TRACING_MAP
+ default n
+ help
+ Hist triggers allow one or more arbitrary trace event fields
+ to be aggregated into hash tables and dumped to stdout by
+ reading a debugfs/tracefs file. They're useful for
+ gathering quick and dirty (though precise) summaries of
+ event activity as an initial guide for further investigation
+ using more advanced tools.
+
+ See Documentation/trace/events.txt.
+ If in doubt, say N.
+
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 9b1044e936a6..979e7bfbde7a 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_TRACING) += trace_output.o
obj-$(CONFIG_TRACING) += trace_seq.o
obj-$(CONFIG_TRACING) += trace_stat.o
obj-$(CONFIG_TRACING) += trace_printk.o
+obj-$(CONFIG_TRACING_MAP) += tracing_map.o
obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
@@ -53,6 +54,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
+obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
obj-$(CONFIG_TRACEPOINTS) += power-traces.o
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a2f0b9f33e9b..8a4bd6b68a0b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -253,6 +253,9 @@ unsigned long long ns2usecs(cycle_t nsec)
#define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \
TRACE_ITER_PRINTK_MSGONLY | TRACE_ITER_RECORD_CMD)
+/* trace_flags that are default zero for instances */
+#define ZEROED_TRACE_FLAGS \
+ TRACE_ITER_EVENT_FORK
/*
* The global_trace is the descriptor that holds the tracing
@@ -303,33 +306,18 @@ void trace_array_put(struct trace_array *this_tr)
mutex_unlock(&trace_types_lock);
}
-int filter_check_discard(struct trace_event_file *file, void *rec,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event)
-{
- if (unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
- !filter_match_preds(file->filter, rec)) {
- ring_buffer_discard_commit(buffer, event);
- return 1;
- }
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(filter_check_discard);
-
int call_filter_check_discard(struct trace_event_call *call, void *rec,
struct ring_buffer *buffer,
struct ring_buffer_event *event)
{
if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) &&
!filter_match_preds(call->filter, rec)) {
- ring_buffer_discard_commit(buffer, event);
+ __trace_event_discard_commit(buffer, event);
return 1;
}
return 0;
}
-EXPORT_SYMBOL_GPL(call_filter_check_discard);
static cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu)
{
@@ -1672,6 +1660,16 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
}
EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
+static __always_inline void
+trace_event_setup(struct ring_buffer_event *event,
+ int type, unsigned long flags, int pc)
+{
+ struct trace_entry *ent = ring_buffer_event_data(event);
+
+ tracing_generic_entry_update(ent, flags, pc);
+ ent->type = type;
+}
+
struct ring_buffer_event *
trace_buffer_lock_reserve(struct ring_buffer *buffer,
int type,
@@ -1681,34 +1679,137 @@ trace_buffer_lock_reserve(struct ring_buffer *buffer,
struct ring_buffer_event *event;
event = ring_buffer_lock_reserve(buffer, len);
- if (event != NULL) {
- struct trace_entry *ent = ring_buffer_event_data(event);
+ if (event != NULL)
+ trace_event_setup(event, type, flags, pc);
+
+ return event;
+}
+
+DEFINE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
+DEFINE_PER_CPU(int, trace_buffered_event_cnt);
+static int trace_buffered_event_ref;
+
+/**
+ * trace_buffered_event_enable - enable buffering events
+ *
+ * When events are being filtered, it is quicker to use a temporary
+ * buffer to write the event data into if there's a likely chance
+ * that it will not be committed. The discard of the ring buffer
+ * is not as fast as committing, and is much slower than copying
+ * a commit.
+ *
+ * When an event is to be filtered, allocate per cpu buffers to
+ * write the event data into, and if the event is filtered and discarded
+ * it is simply dropped, otherwise, the entire data is to be committed
+ * in one shot.
+ */
+void trace_buffered_event_enable(void)
+{
+ struct ring_buffer_event *event;
+ struct page *page;
+ int cpu;
- tracing_generic_entry_update(ent, flags, pc);
- ent->type = type;
+ WARN_ON_ONCE(!mutex_is_locked(&event_mutex));
+
+ if (trace_buffered_event_ref++)
+ return;
+
+ for_each_tracing_cpu(cpu) {
+ page = alloc_pages_node(cpu_to_node(cpu),
+ GFP_KERNEL | __GFP_NORETRY, 0);
+ if (!page)
+ goto failed;
+
+ event = page_address(page);
+ memset(event, 0, sizeof(*event));
+
+ per_cpu(trace_buffered_event, cpu) = event;
+
+ preempt_disable();
+ if (cpu == smp_processor_id() &&
+ this_cpu_read(trace_buffered_event) !=
+ per_cpu(trace_buffered_event, cpu))
+ WARN_ON_ONCE(1);
+ preempt_enable();
}
- return event;
+ return;
+ failed:
+ trace_buffered_event_disable();
}
-void
-__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event)
+static void enable_trace_buffered_event(void *data)
{
- __this_cpu_write(trace_cmdline_save, true);
- ring_buffer_unlock_commit(buffer, event);
+ /* Probably not needed, but do it anyway */
+ smp_rmb();
+ this_cpu_dec(trace_buffered_event_cnt);
}
-void trace_buffer_unlock_commit(struct trace_array *tr,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event,
- unsigned long flags, int pc)
+static void disable_trace_buffered_event(void *data)
{
- __buffer_unlock_commit(buffer, event);
+ this_cpu_inc(trace_buffered_event_cnt);
+}
- ftrace_trace_stack(tr, buffer, flags, 6, pc, NULL);
- ftrace_trace_userstack(buffer, flags, pc);
+/**
+ * trace_buffered_event_disable - disable buffering events
+ *
+ * When a filter is removed, it is faster to not use the buffered
+ * events, and to commit directly into the ring buffer. Free up
+ * the temp buffers when there are no more users. This requires
+ * special synchronization with current events.
+ */
+void trace_buffered_event_disable(void)
+{
+ int cpu;
+
+ WARN_ON_ONCE(!mutex_is_locked(&event_mutex));
+
+ if (WARN_ON_ONCE(!trace_buffered_event_ref))
+ return;
+
+ if (--trace_buffered_event_ref)
+ return;
+
+ preempt_disable();
+ /* For each CPU, set the buffer as used. */
+ smp_call_function_many(tracing_buffer_mask,
+ disable_trace_buffered_event, NULL, 1);
+ preempt_enable();
+
+ /* Wait for all current users to finish */
+ synchronize_sched();
+
+ for_each_tracing_cpu(cpu) {
+ free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
+ per_cpu(trace_buffered_event, cpu) = NULL;
+ }
+ /*
+ * Make sure trace_buffered_event is NULL before clearing
+ * trace_buffered_event_cnt.
+ */
+ smp_wmb();
+
+ preempt_disable();
+ /* Do the work on each cpu */
+ smp_call_function_many(tracing_buffer_mask,
+ enable_trace_buffered_event, NULL, 1);
+ preempt_enable();
+}
+
+void
+__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event)
+{
+ __this_cpu_write(trace_cmdline_save, true);
+
+ /* If this is the temp buffer, we need to commit fully */
+ if (this_cpu_read(trace_buffered_event) == event) {
+ /* Length is in event->array[0] */
+ ring_buffer_write(buffer, event->array[0], &event->array[1]);
+ /* Release the temp buffer */
+ this_cpu_dec(trace_buffered_event_cnt);
+ } else
+ ring_buffer_unlock_commit(buffer, event);
}
-EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit);
static struct ring_buffer *temp_buffer;
@@ -1719,8 +1820,23 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
unsigned long flags, int pc)
{
struct ring_buffer_event *entry;
+ int val;
*current_rb = trace_file->tr->trace_buffer.buffer;
+
+ if ((trace_file->flags &
+ (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
+ (entry = this_cpu_read(trace_buffered_event))) {
+ /* Try to use the per cpu buffer first */
+ val = this_cpu_inc_return(trace_buffered_event_cnt);
+ if (val == 1) {
+ trace_event_setup(entry, type, flags, pc);
+ entry->array[0] = len;
+ return entry;
+ }
+ this_cpu_dec(trace_buffered_event_cnt);
+ }
+
entry = trace_buffer_lock_reserve(*current_rb,
type, len, flags, pc);
/*
@@ -1738,17 +1854,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
}
EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve);
-struct ring_buffer_event *
-trace_current_buffer_lock_reserve(struct ring_buffer **current_rb,
- int type, unsigned long len,
- unsigned long flags, int pc)
-{
- *current_rb = global_trace.trace_buffer.buffer;
- return trace_buffer_lock_reserve(*current_rb,
- type, len, flags, pc);
-}
-EXPORT_SYMBOL_GPL(trace_current_buffer_lock_reserve);
-
void trace_buffer_unlock_commit_regs(struct trace_array *tr,
struct ring_buffer *buffer,
struct ring_buffer_event *event,
@@ -1760,14 +1865,6 @@ void trace_buffer_unlock_commit_regs(struct trace_array *tr,
ftrace_trace_stack(tr, buffer, flags, 0, pc, regs);
ftrace_trace_userstack(buffer, flags, pc);
}
-EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit_regs);
-
-void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
- struct ring_buffer_event *event)
-{
- ring_buffer_discard_commit(buffer, event);
-}
-EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit);
void
trace_function(struct trace_array *tr,
@@ -3571,6 +3668,9 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
if (mask == TRACE_ITER_RECORD_CMD)
trace_event_enable_cmd_record(enabled);
+ if (mask == TRACE_ITER_EVENT_FORK)
+ trace_event_follow_fork(tr, enabled);
+
if (mask == TRACE_ITER_OVERWRITE) {
ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled);
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -3658,7 +3758,7 @@ tracing_trace_options_write(struct file *filp, const char __user *ubuf,
if (cnt >= sizeof(buf))
return -EINVAL;
- if (copy_from_user(&buf, ubuf, cnt))
+ if (copy_from_user(buf, ubuf, cnt))
return -EFAULT;
buf[cnt] = 0;
@@ -3804,12 +3904,19 @@ static const char readme_msg[] =
"\t trigger: traceon, traceoff\n"
"\t enable_event:<system>:<event>\n"
"\t disable_event:<system>:<event>\n"
+#ifdef CONFIG_HIST_TRIGGERS
+ "\t enable_hist:<system>:<event>\n"
+ "\t disable_hist:<system>:<event>\n"
+#endif
#ifdef CONFIG_STACKTRACE
"\t\t stacktrace\n"
#endif
#ifdef CONFIG_TRACER_SNAPSHOT
"\t\t snapshot\n"
#endif
+#ifdef CONFIG_HIST_TRIGGERS
+ "\t\t hist (see below)\n"
+#endif
"\t example: echo traceoff > events/block/block_unplug/trigger\n"
"\t echo traceoff:3 > events/block/block_unplug/trigger\n"
"\t echo 'enable_event:kmem:kmalloc:3 if nr_rq > 1' > \\\n"
@@ -3825,6 +3932,56 @@ static const char readme_msg[] =
"\t To remove a trigger with a count:\n"
"\t echo '!<trigger>:0 > <system>/<event>/trigger\n"
"\t Filters can be ignored when removing a trigger.\n"
+#ifdef CONFIG_HIST_TRIGGERS
+ " hist trigger\t- If set, event hits are aggregated into a hash table\n"
+ "\t Format: hist:keys=<field1[,field2,...]>\n"
+ "\t [:values=<field1[,field2,...]>]\n"
+ "\t [:sort=<field1[,field2,...]>]\n"
+ "\t [:size=#entries]\n"
+ "\t [:pause][:continue][:clear]\n"
+ "\t [:name=histname1]\n"
+ "\t [if <filter>]\n\n"
+ "\t When a matching event is hit, an entry is added to a hash\n"
+ "\t table using the key(s) and value(s) named, and the value of a\n"
+ "\t sum called 'hitcount' is incremented. Keys and values\n"
+ "\t correspond to fields in the event's format description. Keys\n"
+ "\t can be any field, or the special string 'stacktrace'.\n"
+ "\t Compound keys consisting of up to two fields can be specified\n"
+ "\t by the 'keys' keyword. Values must correspond to numeric\n"
+ "\t fields. Sort keys consisting of up to two fields can be\n"
+ "\t specified using the 'sort' keyword. The sort direction can\n"
+ "\t be modified by appending '.descending' or '.ascending' to a\n"
+ "\t sort field. The 'size' parameter can be used to specify more\n"
+ "\t or fewer than the default 2048 entries for the hashtable size.\n"
+ "\t If a hist trigger is given a name using the 'name' parameter,\n"
+ "\t its histogram data will be shared with other triggers of the\n"
+ "\t same name, and trigger hits will update this common data.\n\n"
+ "\t Reading the 'hist' file for the event will dump the hash\n"
+ "\t table in its entirety to stdout. If there are multiple hist\n"
+ "\t triggers attached to an event, there will be a table for each\n"
+ "\t trigger in the output. The table displayed for a named\n"
+ "\t trigger will be the same as any other instance having the\n"
+ "\t same name. The default format used to display a given field\n"
+ "\t can be modified by appending any of the following modifiers\n"
+ "\t to the field name, as applicable:\n\n"
+ "\t .hex display a number as a hex value\n"
+ "\t .sym display an address as a symbol\n"
+ "\t .sym-offset display an address as a symbol and offset\n"
+ "\t .execname display a common_pid as a program name\n"
+ "\t .syscall display a syscall id as a syscall name\n\n"
+ "\t .log2 display log2 value rather than raw number\n\n"
+ "\t The 'pause' parameter can be used to pause an existing hist\n"
+ "\t trigger or to start a hist trigger but not log any events\n"
+ "\t until told to do so. 'continue' can be used to start or\n"
+ "\t restart a paused hist trigger.\n\n"
+ "\t The 'clear' parameter will clear the contents of a running\n"
+ "\t hist trigger and leave its current paused/active state\n"
+ "\t unchanged.\n\n"
+ "\t The enable_hist and disable_hist triggers can be used to\n"
+ "\t have one event conditionally start and stop another event's\n"
+ "\t already-attached hist trigger. The syntax is analagous to\n"
+ "\t the enable_event and disable_event triggers.\n"
+#endif
;
static ssize_t
@@ -4474,7 +4631,7 @@ tracing_set_trace_write(struct file *filp, const char __user *ubuf,
if (cnt > MAX_TRACER_SIZE)
cnt = MAX_TRACER_SIZE;
- if (copy_from_user(&buf, ubuf, cnt))
+ if (copy_from_user(buf, ubuf, cnt))
return -EFAULT;
buf[cnt] = 0;
@@ -5264,7 +5421,7 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
if (cnt >= sizeof(buf))
return -EINVAL;
- if (copy_from_user(&buf, ubuf, cnt))
+ if (copy_from_user(buf, ubuf, cnt))
return -EFAULT;
buf[cnt] = 0;
@@ -6650,7 +6807,7 @@ static int instance_mkdir(const char *name)
if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL))
goto out_free_tr;
- tr->trace_flags = global_trace.trace_flags;
+ tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS;
cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
@@ -6724,6 +6881,12 @@ static int instance_rmdir(const char *name)
list_del(&tr->list);
+ /* Disable all the flags that were enabled coming in */
+ for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
+ if ((1 << i) & ZEROED_TRACE_FLAGS)
+ set_tracer_flag(tr, 1 << i, 0);
+ }
+
tracing_set_nop(tr);
event_trace_del_tracer(tr);
ftrace_destroy_function_files(tr);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3fff4adfd431..5167c366d6b7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -177,9 +177,8 @@ struct trace_options {
};
struct trace_pid_list {
- unsigned int nr_pids;
- int order;
- pid_t *pids;
+ int pid_max;
+ unsigned long *pids;
};
/*
@@ -656,6 +655,7 @@ static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
extern cycle_t ftrace_now(int cpu);
extern void trace_find_cmdline(int pid, char comm[]);
+extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
#ifdef CONFIG_DYNAMIC_FTRACE
extern unsigned long ftrace_update_tot_cnt;
@@ -967,6 +967,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
C(STOP_ON_FREE, "disable_on_free"), \
C(IRQ_INFO, "irq-info"), \
C(MARKERS, "markers"), \
+ C(EVENT_FORK, "event-fork"), \
FUNCTION_FLAGS \
FGRAPH_FLAGS \
STACK_FLAGS \
@@ -1064,6 +1065,137 @@ struct trace_subsystem_dir {
int nr_events;
};
+extern int call_filter_check_discard(struct trace_event_call *call, void *rec,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event);
+
+void trace_buffer_unlock_commit_regs(struct trace_array *tr,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ unsigned long flags, int pc,
+ struct pt_regs *regs);
+
+static inline void trace_buffer_unlock_commit(struct trace_array *tr,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ unsigned long flags, int pc)
+{
+ trace_buffer_unlock_commit_regs(tr, buffer, event, flags, pc, NULL);
+}
+
+DECLARE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
+DECLARE_PER_CPU(int, trace_buffered_event_cnt);
+void trace_buffered_event_disable(void);
+void trace_buffered_event_enable(void);
+
+static inline void
+__trace_event_discard_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event)
+{
+ if (this_cpu_read(trace_buffered_event) == event) {
+ /* Simply release the temp buffer */
+ this_cpu_dec(trace_buffered_event_cnt);
+ return;
+ }
+ ring_buffer_discard_commit(buffer, event);
+}
+
+/*
+ * Helper function for event_trigger_unlock_commit{_regs}().
+ * If there are event triggers attached to this event that require
+ * filtering against its fields, then they will be called as the
+ * entry already holds the field information of the current event.
+ *
+ * It also checks if the event should be discarded or not.
+ * It is to be discarded if the event is soft disabled and the
+ * event was only recorded to process triggers, or if the event
+ * filter is active and this event did not match the filters.
+ *
+ * Returns true if the event is discarded, false otherwise.
+ */
+static inline bool
+__event_trigger_test_discard(struct trace_event_file *file,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ void *entry,
+ enum event_trigger_type *tt)
+{
+ unsigned long eflags = file->flags;
+
+ if (eflags & EVENT_FILE_FL_TRIGGER_COND)
+ *tt = event_triggers_call(file, entry);
+
+ if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) ||
+ (unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
+ !filter_match_preds(file->filter, entry))) {
+ __trace_event_discard_commit(buffer, event);
+ return true;
+ }
+
+ return false;
+}
+
+/**
+ * event_trigger_unlock_commit - handle triggers and finish event commit
+ * @file: The file pointer associated with the event
+ * @buffer: The ring buffer that the event is being written to
+ * @event: The event meta data in the ring buffer
+ * @entry: The event itself
+ * @irq_flags: The state of the interrupts at the start of the event
+ * @pc: The state of the preempt count at the start of the event.
+ *
+ * This is a helper function to handle triggers that require data
+ * from the event itself. It also tests the event against filters and
+ * if the event is soft disabled and should be discarded.
+ */
+static inline void
+event_trigger_unlock_commit(struct trace_event_file *file,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ void *entry, unsigned long irq_flags, int pc)
+{
+ enum event_trigger_type tt = ETT_NONE;
+
+ if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
+ trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);
+
+ if (tt)
+ event_triggers_post_call(file, tt, entry);
+}
+
+/**
+ * event_trigger_unlock_commit_regs - handle triggers and finish event commit
+ * @file: The file pointer associated with the event
+ * @buffer: The ring buffer that the event is being written to
+ * @event: The event meta data in the ring buffer
+ * @entry: The event itself
+ * @irq_flags: The state of the interrupts at the start of the event
+ * @pc: The state of the preempt count at the start of the event.
+ *
+ * This is a helper function to handle triggers that require data
+ * from the event itself. It also tests the event against filters and
+ * if the event is soft disabled and should be discarded.
+ *
+ * Same as event_trigger_unlock_commit() but calls
+ * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit().
+ */
+static inline void
+event_trigger_unlock_commit_regs(struct trace_event_file *file,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ void *entry, unsigned long irq_flags, int pc,
+ struct pt_regs *regs)
+{
+ enum event_trigger_type tt = ETT_NONE;
+
+ if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
+ trace_buffer_unlock_commit_regs(file->tr, buffer, event,
+ irq_flags, pc, regs);
+
+ if (tt)
+ event_triggers_post_call(file, tt, entry);
+}
+
#define FILTER_PRED_INVALID ((unsigned short)-1)
#define FILTER_PRED_IS_RIGHT (1 << 15)
#define FILTER_PRED_FOLD (1 << 15)
@@ -1161,6 +1293,15 @@ extern struct mutex event_mutex;
extern struct list_head ftrace_events;
extern const struct file_operations event_trigger_fops;
+extern const struct file_operations event_hist_fops;
+
+#ifdef CONFIG_HIST_TRIGGERS
+extern int register_trigger_hist_cmd(void);
+extern int register_trigger_hist_enable_disable_cmds(void);
+#else
+static inline int register_trigger_hist_cmd(void) { return 0; }
+static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
+#endif
extern int register_trigger_cmds(void);
extern void clear_event_triggers(struct trace_array *tr);
@@ -1174,9 +1315,41 @@ struct event_trigger_data {
char *filter_str;
void *private_data;
bool paused;
+ bool paused_tmp;
struct list_head list;
+ char *name;
+ struct list_head named_list;
+ struct event_trigger_data *named_data;
+};
+
+/* Avoid typos */
+#define ENABLE_EVENT_STR "enable_event"
+#define DISABLE_EVENT_STR "disable_event"
+#define ENABLE_HIST_STR "enable_hist"
+#define DISABLE_HIST_STR "disable_hist"
+
+struct enable_trigger_data {
+ struct trace_event_file *file;
+ bool enable;
+ bool hist;
};
+extern int event_enable_trigger_print(struct seq_file *m,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *data);
+extern void event_enable_trigger_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data);
+extern int event_enable_trigger_func(struct event_command *cmd_ops,
+ struct trace_event_file *file,
+ char *glob, char *cmd, char *param);
+extern int event_enable_register_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *data,
+ struct trace_event_file *file);
+extern void event_enable_unregister_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *test,
+ struct trace_event_file *file);
extern void trigger_data_free(struct event_trigger_data *data);
extern int event_trigger_init(struct event_trigger_ops *ops,
struct event_trigger_data *data);
@@ -1189,7 +1362,18 @@ extern void unregister_trigger(char *glob, struct event_trigger_ops *ops,
extern int set_trigger_filter(char *filter_str,
struct event_trigger_data *trigger_data,
struct trace_event_file *file);
+extern struct event_trigger_data *find_named_trigger(const char *name);
+extern bool is_named_trigger(struct event_trigger_data *test);
+extern int save_named_trigger(const char *name,
+ struct event_trigger_data *data);
+extern void del_named_trigger(struct event_trigger_data *data);
+extern void pause_named_trigger(struct event_trigger_data *data);
+extern void unpause_named_trigger(struct event_trigger_data *data);
+extern void set_named_trigger_data(struct event_trigger_data *data,
+ struct event_trigger_data *named_data);
extern int register_event_command(struct event_command *cmd);
+extern int unregister_event_command(struct event_command *cmd);
+extern int register_trigger_hist_enable_disable_cmds(void);
/**
* struct event_trigger_ops - callbacks for trace event triggers
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b7b0760ba6ee..3d4155892a1e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -15,7 +15,7 @@
#include <linux/kthread.h>
#include <linux/tracefs.h>
#include <linux/uaccess.h>
-#include <linux/bsearch.h>
+#include <linux/vmalloc.h>
#include <linux/module.h>
#include <linux/ctype.h>
#include <linux/sort.h>
@@ -381,6 +381,7 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file,
{
struct trace_event_call *call = file->event_call;
struct trace_array *tr = file->tr;
+ unsigned long file_flags = file->flags;
int ret = 0;
int disable;
@@ -463,6 +464,15 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file,
break;
}
+ /* Enable or disable use of trace_buffered_event */
+ if ((file_flags & EVENT_FILE_FL_SOFT_DISABLED) !=
+ (file->flags & EVENT_FILE_FL_SOFT_DISABLED)) {
+ if (file->flags & EVENT_FILE_FL_SOFT_DISABLED)
+ trace_buffered_event_enable();
+ else
+ trace_buffered_event_disable();
+ }
+
return ret;
}
@@ -489,24 +499,26 @@ static void ftrace_clear_events(struct trace_array *tr)
mutex_unlock(&event_mutex);
}
-static int cmp_pid(const void *key, const void *elt)
+/* Shouldn't this be in a header? */
+extern int pid_max;
+
+/* Returns true if found in filter */
+static bool
+find_filtered_pid(struct trace_pid_list *filtered_pids, pid_t search_pid)
{
- const pid_t *search_pid = key;
- const pid_t *pid = elt;
+ /*
+ * If pid_max changed after filtered_pids was created, we
+ * by default ignore all pids greater than the previous pid_max.
+ */
+ if (search_pid >= filtered_pids->pid_max)
+ return false;
- if (*search_pid == *pid)
- return 0;
- if (*search_pid < *pid)
- return -1;
- return 1;
+ return test_bit(search_pid, filtered_pids->pids);
}
static bool
-check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task)
+ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
{
- pid_t search_pid;
- pid_t *pid;
-
/*
* Return false, because if filtered_pids does not exist,
* all pids are good to trace.
@@ -514,15 +526,68 @@ check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task)
if (!filtered_pids)
return false;
- search_pid = task->pid;
+ return !find_filtered_pid(filtered_pids, task->pid);
+}
+
+static void filter_add_remove_task(struct trace_pid_list *pid_list,
+ struct task_struct *self,
+ struct task_struct *task)
+{
+ if (!pid_list)
+ return;
+
+ /* For forks, we only add if the forking task is listed */
+ if (self) {
+ if (!find_filtered_pid(pid_list, self->pid))
+ return;
+ }
- pid = bsearch(&search_pid, filtered_pids->pids,
- filtered_pids->nr_pids, sizeof(pid_t),
- cmp_pid);
- if (!pid)
- return true;
+ /* Sorry, but we don't support pid_max changing after setting */
+ if (task->pid >= pid_list->pid_max)
+ return;
- return false;
+ /* "self" is set for forks, and NULL for exits */
+ if (self)
+ set_bit(task->pid, pid_list->pids);
+ else
+ clear_bit(task->pid, pid_list->pids);
+}
+
+static void
+event_filter_pid_sched_process_exit(void *data, struct task_struct *task)
+{
+ struct trace_pid_list *pid_list;
+ struct trace_array *tr = data;
+
+ pid_list = rcu_dereference_sched(tr->filtered_pids);
+ filter_add_remove_task(pid_list, NULL, task);
+}
+
+static void
+event_filter_pid_sched_process_fork(void *data,
+ struct task_struct *self,
+ struct task_struct *task)
+{
+ struct trace_pid_list *pid_list;
+ struct trace_array *tr = data;
+
+ pid_list = rcu_dereference_sched(tr->filtered_pids);
+ filter_add_remove_task(pid_list, self, task);
+}
+
+void trace_event_follow_fork(struct trace_array *tr, bool enable)
+{
+ if (enable) {
+ register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
+ tr, INT_MIN);
+ register_trace_prio_sched_process_exit(event_filter_pid_sched_process_exit,
+ tr, INT_MAX);
+ } else {
+ unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
+ tr);
+ unregister_trace_sched_process_exit(event_filter_pid_sched_process_exit,
+ tr);
+ }
}
static void
@@ -535,8 +600,8 @@ event_filter_pid_sched_switch_probe_pre(void *data, bool preempt,
pid_list = rcu_dereference_sched(tr->filtered_pids);
this_cpu_write(tr->trace_buffer.data->ignore_pid,
- check_ignore_pid(pid_list, prev) &&
- check_ignore_pid(pid_list, next));
+ ignore_this_task(pid_list, prev) &&
+ ignore_this_task(pid_list, next));
}
static void
@@ -549,7 +614,7 @@ event_filter_pid_sched_switch_probe_post(void *data, bool preempt,
pid_list = rcu_dereference_sched(tr->filtered_pids);
this_cpu_write(tr->trace_buffer.data->ignore_pid,
- check_ignore_pid(pid_list, next));
+ ignore_this_task(pid_list, next));
}
static void
@@ -565,7 +630,7 @@ event_filter_pid_sched_wakeup_probe_pre(void *data, struct task_struct *task)
pid_list = rcu_dereference_sched(tr->filtered_pids);
this_cpu_write(tr->trace_buffer.data->ignore_pid,
- check_ignore_pid(pid_list, task));
+ ignore_this_task(pid_list, task));
}
static void
@@ -582,7 +647,7 @@ event_filter_pid_sched_wakeup_probe_post(void *data, struct task_struct *task)
/* Set tracing if current is enabled */
this_cpu_write(tr->trace_buffer.data->ignore_pid,
- check_ignore_pid(pid_list, current));
+ ignore_this_task(pid_list, current));
}
static void __ftrace_clear_event_pids(struct trace_array *tr)
@@ -620,7 +685,7 @@ static void __ftrace_clear_event_pids(struct trace_array *tr)
/* Wait till all users are no longer using pid filtering */
synchronize_sched();
- free_pages((unsigned long)pid_list->pids, pid_list->order);
+ vfree(pid_list->pids);
kfree(pid_list);
}
@@ -964,11 +1029,32 @@ static void t_stop(struct seq_file *m, void *p)
mutex_unlock(&event_mutex);
}
+static void *
+p_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct trace_array *tr = m->private;
+ struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
+ unsigned long pid = (unsigned long)v;
+
+ (*pos)++;
+
+ /* pid is already +1 of the actual previous bit */
+ pid = find_next_bit(pid_list->pids, pid_list->pid_max, pid);
+
+ /* Return pid + 1 to allow zero to be represented */
+ if (pid < pid_list->pid_max)
+ return (void *)(pid + 1);
+
+ return NULL;
+}
+
static void *p_start(struct seq_file *m, loff_t *pos)
__acquires(RCU)
{
struct trace_pid_list *pid_list;
struct trace_array *tr = m->private;
+ unsigned long pid;
+ loff_t l = 0;
/*
* Grab the mutex, to keep calls to p_next() having the same
@@ -981,10 +1067,18 @@ static void *p_start(struct seq_file *m, loff_t *pos)
pid_list = rcu_dereference_sched(tr->filtered_pids);
- if (!pid_list || *pos >= pid_list->nr_pids)
+ if (!pid_list)
+ return NULL;
+
+ pid = find_first_bit(pid_list->pids, pid_list->pid_max);
+ if (pid >= pid_list->pid_max)
return NULL;
- return (void *)&pid_list->pids[*pos];
+ /* Return pid + 1 so that zero can be the exit value */
+ for (pid++; pid && l < *pos;
+ pid = (unsigned long)p_next(m, (void *)pid, &l))
+ ;
+ return (void *)pid;
}
static void p_stop(struct seq_file *m, void *p)
@@ -994,25 +1088,11 @@ static void p_stop(struct seq_file *m, void *p)
mutex_unlock(&event_mutex);
}
-static void *
-p_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct trace_array *tr = m->private;
- struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
-
- (*pos)++;
-
- if (*pos >= pid_list->nr_pids)
- return NULL;
-
- return (void *)&pid_list->pids[*pos];
-}
-
static int p_show(struct seq_file *m, void *v)
{
- pid_t *pid = v;
+ unsigned long pid = (unsigned long)v - 1;
- seq_printf(m, "%d\n", *pid);
+ seq_printf(m, "%lu\n", pid);
return 0;
}
@@ -1561,11 +1641,6 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
return r;
}
-static int max_pids(struct trace_pid_list *pid_list)
-{
- return (PAGE_SIZE << pid_list->order) / sizeof(pid_t);
-}
-
static void ignore_task_cpu(void *data)
{
struct trace_array *tr = data;
@@ -1579,7 +1654,7 @@ static void ignore_task_cpu(void *data)
mutex_is_locked(&event_mutex));
this_cpu_write(tr->trace_buffer.data->ignore_pid,
- check_ignore_pid(pid_list, current));
+ ignore_this_task(pid_list, current));
}
static ssize_t
@@ -1589,7 +1664,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
struct seq_file *m = filp->private_data;
struct trace_array *tr = m->private;
struct trace_pid_list *filtered_pids = NULL;
- struct trace_pid_list *pid_list = NULL;
+ struct trace_pid_list *pid_list;
struct trace_event_file *file;
struct trace_parser parser;
unsigned long val;
@@ -1597,7 +1672,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
ssize_t read = 0;
ssize_t ret = 0;
pid_t pid;
- int i;
+ int nr_pids = 0;
if (!cnt)
return 0;
@@ -1610,10 +1685,43 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
return -ENOMEM;
mutex_lock(&event_mutex);
+ filtered_pids = rcu_dereference_protected(tr->filtered_pids,
+ lockdep_is_held(&event_mutex));
+
/*
- * Load as many pids into the array before doing a
- * swap from the tr->filtered_pids to the new list.
+ * Always recreate a new array. The write is an all or nothing
+ * operation. Always create a new array when adding new pids by
+ * the user. If the operation fails, then the current list is
+ * not modified.
*/
+ pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
+ if (!pid_list) {
+ read = -ENOMEM;
+ goto out;
+ }
+ pid_list->pid_max = READ_ONCE(pid_max);
+ /* Only truncating will shrink pid_max */
+ if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max)
+ pid_list->pid_max = filtered_pids->pid_max;
+ pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
+ if (!pid_list->pids) {
+ kfree(pid_list);
+ read = -ENOMEM;
+ goto out;
+ }
+ if (filtered_pids) {
+ /* copy the current bits to the new max */
+ pid = find_first_bit(filtered_pids->pids,
+ filtered_pids->pid_max);
+ while (pid < filtered_pids->pid_max) {
+ set_bit(pid, pid_list->pids);
+ pid = find_next_bit(filtered_pids->pids,
+ filtered_pids->pid_max,
+ pid + 1);
+ nr_pids++;
+ }
+ }
+
while (cnt > 0) {
this_pos = 0;
@@ -1631,92 +1739,35 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
- if (val > INT_MAX)
+ if (val >= pid_list->pid_max)
break;
pid = (pid_t)val;
- ret = -ENOMEM;
- if (!pid_list) {
- pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
- if (!pid_list)
- break;
+ set_bit(pid, pid_list->pids);
+ nr_pids++;
- filtered_pids = rcu_dereference_protected(tr->filtered_pids,
- lockdep_is_held(&event_mutex));
- if (filtered_pids)
- pid_list->order = filtered_pids->order;
- else
- pid_list->order = 0;
-
- pid_list->pids = (void *)__get_free_pages(GFP_KERNEL,
- pid_list->order);
- if (!pid_list->pids)
- break;
-
- if (filtered_pids) {
- pid_list->nr_pids = filtered_pids->nr_pids;
- memcpy(pid_list->pids, filtered_pids->pids,
- pid_list->nr_pids * sizeof(pid_t));
- } else
- pid_list->nr_pids = 0;
- }
-
- if (pid_list->nr_pids >= max_pids(pid_list)) {
- pid_t *pid_page;
-
- pid_page = (void *)__get_free_pages(GFP_KERNEL,
- pid_list->order + 1);
- if (!pid_page)
- break;
- memcpy(pid_page, pid_list->pids,
- pid_list->nr_pids * sizeof(pid_t));
- free_pages((unsigned long)pid_list->pids, pid_list->order);
-
- pid_list->order++;
- pid_list->pids = pid_page;
- }
-
- pid_list->pids[pid_list->nr_pids++] = pid;
trace_parser_clear(&parser);
ret = 0;
}
trace_parser_put(&parser);
if (ret < 0) {
- if (pid_list)
- free_pages((unsigned long)pid_list->pids, pid_list->order);
+ vfree(pid_list->pids);
kfree(pid_list);
- mutex_unlock(&event_mutex);
- return ret;
- }
-
- if (!pid_list) {
- mutex_unlock(&event_mutex);
- return ret;
+ read = ret;
+ goto out;
}
- sort(pid_list->pids, pid_list->nr_pids, sizeof(pid_t), cmp_pid, NULL);
-
- /* Remove duplicates */
- for (i = 1; i < pid_list->nr_pids; i++) {
- int start = i;
-
- while (i < pid_list->nr_pids &&
- pid_list->pids[i - 1] == pid_list->pids[i])
- i++;
-
- if (start != i) {
- if (i < pid_list->nr_pids) {
- memmove(&pid_list->pids[start], &pid_list->pids[i],
- (pid_list->nr_pids - i) * sizeof(pid_t));
- pid_list->nr_pids -= i - start;
- i = start;
- } else
- pid_list->nr_pids = start;
- }
+ if (!nr_pids) {
+ /* Cleared the list of pids */
+ vfree(pid_list->pids);
+ kfree(pid_list);
+ read = ret;
+ if (!filtered_pids)
+ goto out;
+ pid_list = NULL;
}
-
rcu_assign_pointer(tr->filtered_pids, pid_list);
list_for_each_entry(file, &tr->events, list) {
@@ -1726,7 +1777,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
if (filtered_pids) {
synchronize_sched();
- free_pages((unsigned long)filtered_pids->pids, filtered_pids->order);
+ vfree(filtered_pids->pids);
kfree(filtered_pids);
} else {
/*
@@ -1763,10 +1814,12 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
*/
on_each_cpu(ignore_task_cpu, tr, 1);
+ out:
mutex_unlock(&event_mutex);
ret = read;
- *ppos += read;
+ if (read > 0)
+ *ppos += read;
return ret;
}
@@ -2121,6 +2174,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
trace_create_file("trigger", 0644, file->dir, file,
&event_trigger_fops);
+#ifdef CONFIG_HIST_TRIGGERS
+ trace_create_file("hist", 0444, file->dir, file,
+ &event_hist_fops);
+#endif
trace_create_file("format", 0444, file->dir, call,
&ftrace_event_format_fops);
@@ -3368,7 +3425,7 @@ static __init void event_trace_self_tests(void)
static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable);
-static struct trace_array *event_tr;
+static struct trace_event_file event_trace_file __initdata;
static void __init
function_test_events_call(unsigned long ip, unsigned long parent_ip,
@@ -3392,17 +3449,17 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip,
local_save_flags(flags);
- event = trace_current_buffer_lock_reserve(&buffer,
- TRACE_FN, sizeof(*entry),
- flags, pc);
+ event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
+ TRACE_FN, sizeof(*entry),
+ flags, pc);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->ip = ip;
entry->parent_ip = parent_ip;
- trace_buffer_unlock_commit(event_tr, buffer, event, flags, pc);
-
+ event_trigger_unlock_commit(&event_trace_file, buffer, event,
+ entry, flags, pc);
out:
atomic_dec(&per_cpu(ftrace_test_event_disable, cpu));
preempt_enable_notrace();
@@ -3417,9 +3474,11 @@ static struct ftrace_ops trace_ops __initdata =
static __init void event_trace_self_test_with_function(void)
{
int ret;
- event_tr = top_trace_array();
- if (WARN_ON(!event_tr))
+
+ event_trace_file.tr = top_trace_array();
+ if (WARN_ON(!event_trace_file.tr))
return;
+
ret = register_ftrace_function(&trace_ops);
if (WARN_ON(ret < 0)) {
pr_info("Failed to enable function tracer for event tests\n");
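
The trace_events.c hunks above drop the sorted, page-backed pid array in
favour of a bitmap indexed by pid and sized from pid_max: setting a bit
records a pid, duplicates collapse for free, and the seq_file iterators
simply walk the set bits. The standalone userspace sketch below illustrates
the same idea; PID_MAX, set_pid() and test_pid() are illustrative stand-ins
for this sketch only, not kernel API:

  #include <stdio.h>
  #include <stdlib.h>
  #include <limits.h>

  #define PID_MAX	32768	/* assumed pid_max for the sketch */
  #define LONG_BITS	(sizeof(unsigned long) * CHAR_BIT)
  #define MAP_LONGS	((PID_MAX + LONG_BITS - 1) / LONG_BITS)

  static void set_pid(unsigned long *map, unsigned int pid)
  {
  	map[pid / LONG_BITS] |= 1UL << (pid % LONG_BITS);
  }

  static int test_pid(const unsigned long *map, unsigned int pid)
  {
  	return (map[pid / LONG_BITS] >> (pid % LONG_BITS)) & 1;
  }

  int main(void)
  {
  	unsigned long *filtered = calloc(MAP_LONGS, sizeof(*filtered));
  	unsigned int pids[] = { 1, 42, 42, 4097 };	/* duplicates are harmless */
  	unsigned int i;

  	if (!filtered)
  		return 1;

  	for (i = 0; i < sizeof(pids) / sizeof(pids[0]); i++)
  		set_pid(filtered, pids[i]);	/* like set_bit() per parsed pid */

  	/* walk the set bits, as p_start()/p_next() now walk tr->filtered_pids */
  	for (i = 0; i < PID_MAX; i++)
  		if (test_pid(filtered, i))
  			printf("%u\n", i);

  	free(filtered);
  	return 0;
  }

The trade-off is the same one the patch makes: memory is proportional to
pid_max rather than to the number of filtered pids, in exchange for O(1)
membership tests and no sorting or deduplication on write.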
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index b3f5051cd4e9..9daa9b3bc6d9 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -689,10 +689,7 @@ static void append_filter_err(struct filter_parse_state *ps,
static inline struct event_filter *event_filter(struct trace_event_file *file)
{
- if (file->event_call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- return file->event_call->filter;
- else
- return file->filter;
+ return file->filter;
}
/* caller must hold event_mutex */
@@ -826,12 +823,12 @@ static void __free_preds(struct event_filter *filter)
static void filter_disable(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
+ unsigned long old_flags = file->flags;
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- call->flags &= ~TRACE_EVENT_FL_FILTERED;
- else
- file->flags &= ~EVENT_FILE_FL_FILTERED;
+ file->flags &= ~EVENT_FILE_FL_FILTERED;
+
+ if (old_flags != file->flags)
+ trace_buffered_event_disable();
}
static void __free_filter(struct event_filter *filter)
@@ -883,13 +880,8 @@ static int __alloc_preds(struct event_filter *filter, int n_preds)
static inline void __remove_filter(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
filter_disable(file);
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- remove_filter_string(call->filter);
- else
- remove_filter_string(file->filter);
+ remove_filter_string(file->filter);
}
static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir,
@@ -906,15 +898,8 @@ static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir,
static inline void __free_subsystem_filter(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
- __free_filter(call->filter);
- call->filter = NULL;
- } else {
- __free_filter(file->filter);
- file->filter = NULL;
- }
+ __free_filter(file->filter);
+ file->filter = NULL;
}
static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir,
@@ -1718,69 +1703,43 @@ fail:
static inline void event_set_filtered_flag(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
+ unsigned long old_flags = file->flags;
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- call->flags |= TRACE_EVENT_FL_FILTERED;
- else
- file->flags |= EVENT_FILE_FL_FILTERED;
+ file->flags |= EVENT_FILE_FL_FILTERED;
+
+ if (old_flags != file->flags)
+ trace_buffered_event_enable();
}
static inline void event_set_filter(struct trace_event_file *file,
struct event_filter *filter)
{
- struct trace_event_call *call = file->event_call;
-
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- rcu_assign_pointer(call->filter, filter);
- else
- rcu_assign_pointer(file->filter, filter);
+ rcu_assign_pointer(file->filter, filter);
}
static inline void event_clear_filter(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- RCU_INIT_POINTER(call->filter, NULL);
- else
- RCU_INIT_POINTER(file->filter, NULL);
+ RCU_INIT_POINTER(file->filter, NULL);
}
static inline void
event_set_no_set_filter_flag(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
- else
- file->flags |= EVENT_FILE_FL_NO_SET_FILTER;
+ file->flags |= EVENT_FILE_FL_NO_SET_FILTER;
}
static inline void
event_clear_no_set_filter_flag(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
- if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
- call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
- else
- file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER;
+ file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER;
}
static inline bool
event_no_set_filter_flag(struct trace_event_file *file)
{
- struct trace_event_call *call = file->event_call;
-
if (file->flags & EVENT_FILE_FL_NO_SET_FILTER)
return true;
- if ((call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) &&
- (call->flags & TRACE_EVENT_FL_NO_SET_FILTER))
- return true;
-
return false;
}
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
new file mode 100644
index 000000000000..0c05b8a99806
--- /dev/null
+++ b/kernel/trace/trace_events_hist.c
@@ -0,0 +1,1755 @@
+/*
+ * trace_events_hist - trace event hist triggers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kallsyms.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/stacktrace.h>
+
+#include "tracing_map.h"
+#include "trace.h"
+
+struct hist_field;
+
+typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);
+
+struct hist_field {
+ struct ftrace_event_field *field;
+ unsigned long flags;
+ hist_field_fn_t fn;
+ unsigned int size;
+ unsigned int offset;
+};
+
+static u64 hist_field_none(struct hist_field *field, void *event)
+{
+ return 0;
+}
+
+static u64 hist_field_counter(struct hist_field *field, void *event)
+{
+ return 1;
+}
+
+static u64 hist_field_string(struct hist_field *hist_field, void *event)
+{
+ char *addr = (char *)(event + hist_field->field->offset);
+
+ return (u64)(unsigned long)addr;
+}
+
+static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
+{
+ u32 str_item = *(u32 *)(event + hist_field->field->offset);
+ int str_loc = str_item & 0xffff;
+ char *addr = (char *)(event + str_loc);
+
+ return (u64)(unsigned long)addr;
+}
+
+static u64 hist_field_pstring(struct hist_field *hist_field, void *event)
+{
+ char **addr = (char **)(event + hist_field->field->offset);
+
+ return (u64)(unsigned long)*addr;
+}
+
+static u64 hist_field_log2(struct hist_field *hist_field, void *event)
+{
+ u64 val = *(u64 *)(event + hist_field->field->offset);
+
+ return (u64) ilog2(roundup_pow_of_two(val));
+}
+
+#define DEFINE_HIST_FIELD_FN(type) \
+static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
+{ \
+ type *addr = (type *)(event + hist_field->field->offset); \
+ \
+ return (u64)(unsigned long)*addr; \
+}
+
+DEFINE_HIST_FIELD_FN(s64);
+DEFINE_HIST_FIELD_FN(u64);
+DEFINE_HIST_FIELD_FN(s32);
+DEFINE_HIST_FIELD_FN(u32);
+DEFINE_HIST_FIELD_FN(s16);
+DEFINE_HIST_FIELD_FN(u16);
+DEFINE_HIST_FIELD_FN(s8);
+DEFINE_HIST_FIELD_FN(u8);
+
+#define for_each_hist_field(i, hist_data) \
+ for ((i) = 0; (i) < (hist_data)->n_fields; (i)++)
+
+#define for_each_hist_val_field(i, hist_data) \
+ for ((i) = 0; (i) < (hist_data)->n_vals; (i)++)
+
+#define for_each_hist_key_field(i, hist_data) \
+ for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++)
+
+#define HIST_STACKTRACE_DEPTH 16
+#define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long))
+#define HIST_STACKTRACE_SKIP 5
+
+#define HITCOUNT_IDX 0
+#define HIST_KEY_SIZE_MAX (MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE)
+
+enum hist_field_flags {
+ HIST_FIELD_FL_HITCOUNT = 1,
+ HIST_FIELD_FL_KEY = 2,
+ HIST_FIELD_FL_STRING = 4,
+ HIST_FIELD_FL_HEX = 8,
+ HIST_FIELD_FL_SYM = 16,
+ HIST_FIELD_FL_SYM_OFFSET = 32,
+ HIST_FIELD_FL_EXECNAME = 64,
+ HIST_FIELD_FL_SYSCALL = 128,
+ HIST_FIELD_FL_STACKTRACE = 256,
+ HIST_FIELD_FL_LOG2 = 512,
+};
+
+struct hist_trigger_attrs {
+ char *keys_str;
+ char *vals_str;
+ char *sort_key_str;
+ char *name;
+ bool pause;
+ bool cont;
+ bool clear;
+ unsigned int map_bits;
+};
+
+struct hist_trigger_data {
+ struct hist_field *fields[TRACING_MAP_FIELDS_MAX];
+ unsigned int n_vals;
+ unsigned int n_keys;
+ unsigned int n_fields;
+ unsigned int key_size;
+ struct tracing_map_sort_key sort_keys[TRACING_MAP_SORT_KEYS_MAX];
+ unsigned int n_sort_keys;
+ struct trace_event_file *event_file;
+ struct hist_trigger_attrs *attrs;
+ struct tracing_map *map;
+};
+
+static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
+{
+ hist_field_fn_t fn = NULL;
+
+ switch (field_size) {
+ case 8:
+ if (field_is_signed)
+ fn = hist_field_s64;
+ else
+ fn = hist_field_u64;
+ break;
+ case 4:
+ if (field_is_signed)
+ fn = hist_field_s32;
+ else
+ fn = hist_field_u32;
+ break;
+ case 2:
+ if (field_is_signed)
+ fn = hist_field_s16;
+ else
+ fn = hist_field_u16;
+ break;
+ case 1:
+ if (field_is_signed)
+ fn = hist_field_s8;
+ else
+ fn = hist_field_u8;
+ break;
+ }
+
+ return fn;
+}
+
+static int parse_map_size(char *str)
+{
+ unsigned long size, map_bits;
+ int ret;
+
+ strsep(&str, "=");
+ if (!str) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = kstrtoul(str, 0, &size);
+ if (ret)
+ goto out;
+
+ map_bits = ilog2(roundup_pow_of_two(size));
+ if (map_bits < TRACING_MAP_BITS_MIN ||
+ map_bits > TRACING_MAP_BITS_MAX)
+ ret = -EINVAL;
+ else
+ ret = map_bits;
+ out:
+ return ret;
+}
+
+static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
+{
+ if (!attrs)
+ return;
+
+ kfree(attrs->name);
+ kfree(attrs->sort_key_str);
+ kfree(attrs->keys_str);
+ kfree(attrs->vals_str);
+ kfree(attrs);
+}
+
+static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
+{
+ struct hist_trigger_attrs *attrs;
+ int ret = 0;
+
+ attrs = kzalloc(sizeof(*attrs), GFP_KERNEL);
+ if (!attrs)
+ return ERR_PTR(-ENOMEM);
+
+ while (trigger_str) {
+ char *str = strsep(&trigger_str, ":");
+
+ if ((strncmp(str, "key=", strlen("key=")) == 0) ||
+ (strncmp(str, "keys=", strlen("keys=")) == 0))
+ attrs->keys_str = kstrdup(str, GFP_KERNEL);
+ else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
+ (strncmp(str, "vals=", strlen("vals=")) == 0) ||
+ (strncmp(str, "values=", strlen("values=")) == 0))
+ attrs->vals_str = kstrdup(str, GFP_KERNEL);
+ else if (strncmp(str, "sort=", strlen("sort=")) == 0)
+ attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
+ else if (strncmp(str, "name=", strlen("name=")) == 0)
+ attrs->name = kstrdup(str, GFP_KERNEL);
+ else if (strcmp(str, "pause") == 0)
+ attrs->pause = true;
+ else if ((strcmp(str, "cont") == 0) ||
+ (strcmp(str, "continue") == 0))
+ attrs->cont = true;
+ else if (strcmp(str, "clear") == 0)
+ attrs->clear = true;
+ else if (strncmp(str, "size=", strlen("size=")) == 0) {
+ int map_bits = parse_map_size(str);
+
+ if (map_bits < 0) {
+ ret = map_bits;
+ goto free;
+ }
+ attrs->map_bits = map_bits;
+ } else {
+ ret = -EINVAL;
+ goto free;
+ }
+ }
+
+ if (!attrs->keys_str) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ return attrs;
+ free:
+ destroy_hist_trigger_attrs(attrs);
+
+ return ERR_PTR(ret);
+}
+
+static inline void save_comm(char *comm, struct task_struct *task)
+{
+ if (!task->pid) {
+ strcpy(comm, "<idle>");
+ return;
+ }
+
+ if (WARN_ON_ONCE(task->pid < 0)) {
+ strcpy(comm, "<XXX>");
+ return;
+ }
+
+ memcpy(comm, task->comm, TASK_COMM_LEN);
+}
+
+static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt)
+{
+ kfree((char *)elt->private_data);
+}
+
+static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
+{
+ struct hist_trigger_data *hist_data = elt->map->private_data;
+ struct hist_field *key_field;
+ unsigned int i;
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+
+ if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
+ unsigned int size = TASK_COMM_LEN + 1;
+
+ elt->private_data = kzalloc(size, GFP_KERNEL);
+ if (!elt->private_data)
+ return -ENOMEM;
+ break;
+ }
+ }
+
+ return 0;
+}
+
+static void hist_trigger_elt_comm_copy(struct tracing_map_elt *to,
+ struct tracing_map_elt *from)
+{
+ char *comm_from = from->private_data;
+ char *comm_to = to->private_data;
+
+ if (comm_from)
+ memcpy(comm_to, comm_from, TASK_COMM_LEN + 1);
+}
+
+static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
+{
+ char *comm = elt->private_data;
+
+ if (comm)
+ save_comm(comm, current);
+}
+
+static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
+ .elt_alloc = hist_trigger_elt_comm_alloc,
+ .elt_copy = hist_trigger_elt_comm_copy,
+ .elt_free = hist_trigger_elt_comm_free,
+ .elt_init = hist_trigger_elt_comm_init,
+};
+
+static void destroy_hist_field(struct hist_field *hist_field)
+{
+ kfree(hist_field);
+}
+
+static struct hist_field *create_hist_field(struct ftrace_event_field *field,
+ unsigned long flags)
+{
+ struct hist_field *hist_field;
+
+ if (field && is_function_field(field))
+ return NULL;
+
+ hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
+ if (!hist_field)
+ return NULL;
+
+ if (flags & HIST_FIELD_FL_HITCOUNT) {
+ hist_field->fn = hist_field_counter;
+ goto out;
+ }
+
+ if (flags & HIST_FIELD_FL_STACKTRACE) {
+ hist_field->fn = hist_field_none;
+ goto out;
+ }
+
+ if (flags & HIST_FIELD_FL_LOG2) {
+ hist_field->fn = hist_field_log2;
+ goto out;
+ }
+
+ if (WARN_ON_ONCE(!field))
+ goto out;
+
+ if (is_string_field(field)) {
+ flags |= HIST_FIELD_FL_STRING;
+
+ if (field->filter_type == FILTER_STATIC_STRING)
+ hist_field->fn = hist_field_string;
+ else if (field->filter_type == FILTER_DYN_STRING)
+ hist_field->fn = hist_field_dynstring;
+ else
+ hist_field->fn = hist_field_pstring;
+ } else {
+ hist_field->fn = select_value_fn(field->size,
+ field->is_signed);
+ if (!hist_field->fn) {
+ destroy_hist_field(hist_field);
+ return NULL;
+ }
+ }
+ out:
+ hist_field->field = field;
+ hist_field->flags = flags;
+
+ return hist_field;
+}
+
+static void destroy_hist_fields(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) {
+ if (hist_data->fields[i]) {
+ destroy_hist_field(hist_data->fields[i]);
+ hist_data->fields[i] = NULL;
+ }
+ }
+}
+
+static int create_hitcount_val(struct hist_trigger_data *hist_data)
+{
+ hist_data->fields[HITCOUNT_IDX] =
+ create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT);
+ if (!hist_data->fields[HITCOUNT_IDX])
+ return -ENOMEM;
+
+ hist_data->n_vals++;
+
+ if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int create_val_field(struct hist_trigger_data *hist_data,
+ unsigned int val_idx,
+ struct trace_event_file *file,
+ char *field_str)
+{
+ struct ftrace_event_field *field = NULL;
+ unsigned long flags = 0;
+ char *field_name;
+ int ret = 0;
+
+ if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
+ return -EINVAL;
+
+ field_name = strsep(&field_str, ".");
+ if (field_str) {
+ if (strcmp(field_str, "hex") == 0)
+ flags |= HIST_FIELD_FL_HEX;
+ else {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ hist_data->fields[val_idx] = create_hist_field(field, flags);
+ if (!hist_data->fields[val_idx]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ++hist_data->n_vals;
+
+ if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
+ ret = -EINVAL;
+ out:
+ return ret;
+}
+
+static int create_val_fields(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ char *fields_str, *field_str;
+ unsigned int i, j;
+ int ret;
+
+ ret = create_hitcount_val(hist_data);
+ if (ret)
+ goto out;
+
+ fields_str = hist_data->attrs->vals_str;
+ if (!fields_str)
+ goto out;
+
+ strsep(&fields_str, "=");
+ if (!fields_str)
+ goto out;
+
+ for (i = 0, j = 1; i < TRACING_MAP_VALS_MAX &&
+ j < TRACING_MAP_VALS_MAX; i++) {
+ field_str = strsep(&fields_str, ",");
+ if (!field_str)
+ break;
+ if (strcmp(field_str, "hitcount") == 0)
+ continue;
+ ret = create_val_field(hist_data, j++, file, field_str);
+ if (ret)
+ goto out;
+ }
+ if (fields_str && (strcmp(fields_str, "hitcount") != 0))
+ ret = -EINVAL;
+ out:
+ return ret;
+}
+
+static int create_key_field(struct hist_trigger_data *hist_data,
+ unsigned int key_idx,
+ unsigned int key_offset,
+ struct trace_event_file *file,
+ char *field_str)
+{
+ struct ftrace_event_field *field = NULL;
+ unsigned long flags = 0;
+ unsigned int key_size;
+ int ret = 0;
+
+ if (WARN_ON(key_idx >= TRACING_MAP_FIELDS_MAX))
+ return -EINVAL;
+
+ flags |= HIST_FIELD_FL_KEY;
+
+ if (strcmp(field_str, "stacktrace") == 0) {
+ flags |= HIST_FIELD_FL_STACKTRACE;
+ key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
+ } else {
+ char *field_name = strsep(&field_str, ".");
+
+ if (field_str) {
+ if (strcmp(field_str, "hex") == 0)
+ flags |= HIST_FIELD_FL_HEX;
+ else if (strcmp(field_str, "sym") == 0)
+ flags |= HIST_FIELD_FL_SYM;
+ else if (strcmp(field_str, "sym-offset") == 0)
+ flags |= HIST_FIELD_FL_SYM_OFFSET;
+ else if ((strcmp(field_str, "execname") == 0) &&
+ (strcmp(field_name, "common_pid") == 0))
+ flags |= HIST_FIELD_FL_EXECNAME;
+ else if (strcmp(field_str, "syscall") == 0)
+ flags |= HIST_FIELD_FL_SYSCALL;
+ else if (strcmp(field_str, "log2") == 0)
+ flags |= HIST_FIELD_FL_LOG2;
+ else {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (is_string_field(field))
+ key_size = MAX_FILTER_STR_VAL;
+ else
+ key_size = field->size;
+ }
+
+ hist_data->fields[key_idx] = create_hist_field(field, flags);
+ if (!hist_data->fields[key_idx]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ key_size = ALIGN(key_size, sizeof(u64));
+ hist_data->fields[key_idx]->size = key_size;
+ hist_data->fields[key_idx]->offset = key_offset;
+ hist_data->key_size += key_size;
+ if (hist_data->key_size > HIST_KEY_SIZE_MAX) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ hist_data->n_keys++;
+
+ if (WARN_ON(hist_data->n_keys > TRACING_MAP_KEYS_MAX))
+ return -EINVAL;
+
+ ret = key_size;
+ out:
+ return ret;
+}
+
+static int create_key_fields(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ unsigned int i, key_offset = 0, n_vals = hist_data->n_vals;
+ char *fields_str, *field_str;
+ int ret = -EINVAL;
+
+ fields_str = hist_data->attrs->keys_str;
+ if (!fields_str)
+ goto out;
+
+ strsep(&fields_str, "=");
+ if (!fields_str)
+ goto out;
+
+ for (i = n_vals; i < n_vals + TRACING_MAP_KEYS_MAX; i++) {
+ field_str = strsep(&fields_str, ",");
+ if (!field_str)
+ break;
+ ret = create_key_field(hist_data, i, key_offset,
+ file, field_str);
+ if (ret < 0)
+ goto out;
+ key_offset += ret;
+ }
+ if (fields_str) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = 0;
+ out:
+ return ret;
+}
+
+static int create_hist_fields(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ int ret;
+
+ ret = create_val_fields(hist_data, file);
+ if (ret)
+ goto out;
+
+ ret = create_key_fields(hist_data, file);
+ if (ret)
+ goto out;
+
+ hist_data->n_fields = hist_data->n_vals + hist_data->n_keys;
+ out:
+ return ret;
+}
+
+static int is_descending(const char *str)
+{
+ if (!str)
+ return 0;
+
+ if (strcmp(str, "descending") == 0)
+ return 1;
+
+ if (strcmp(str, "ascending") == 0)
+ return 0;
+
+ return -EINVAL;
+}
+
+static int create_sort_keys(struct hist_trigger_data *hist_data)
+{
+ char *fields_str = hist_data->attrs->sort_key_str;
+ struct ftrace_event_field *field = NULL;
+ struct tracing_map_sort_key *sort_key;
+ int descending, ret = 0;
+ unsigned int i, j;
+
+ hist_data->n_sort_keys = 1; /* we always have at least one, hitcount */
+
+ if (!fields_str)
+ goto out;
+
+ strsep(&fields_str, "=");
+ if (!fields_str) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
+ char *field_str, *field_name;
+
+ sort_key = &hist_data->sort_keys[i];
+
+ field_str = strsep(&fields_str, ",");
+ if (!field_str) {
+ if (i == 0)
+ ret = -EINVAL;
+ break;
+ }
+
+ if ((i == TRACING_MAP_SORT_KEYS_MAX - 1) && fields_str) {
+ ret = -EINVAL;
+ break;
+ }
+
+ field_name = strsep(&field_str, ".");
+ if (!field_name) {
+ ret = -EINVAL;
+ break;
+ }
+
+ if (strcmp(field_name, "hitcount") == 0) {
+ descending = is_descending(field_str);
+ if (descending < 0) {
+ ret = descending;
+ break;
+ }
+ sort_key->descending = descending;
+ continue;
+ }
+
+ for (j = 1; j < hist_data->n_fields; j++) {
+ field = hist_data->fields[j]->field;
+ if (field && (strcmp(field_name, field->name) == 0)) {
+ sort_key->field_idx = j;
+ descending = is_descending(field_str);
+ if (descending < 0) {
+ ret = descending;
+ goto out;
+ }
+ sort_key->descending = descending;
+ break;
+ }
+ }
+ if (j == hist_data->n_fields) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ hist_data->n_sort_keys = i;
+ out:
+ return ret;
+}
+
+static void destroy_hist_data(struct hist_trigger_data *hist_data)
+{
+ destroy_hist_trigger_attrs(hist_data->attrs);
+ destroy_hist_fields(hist_data);
+ tracing_map_destroy(hist_data->map);
+ kfree(hist_data);
+}
+
+static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
+{
+ struct tracing_map *map = hist_data->map;
+ struct ftrace_event_field *field;
+ struct hist_field *hist_field;
+ int i, idx;
+
+ for_each_hist_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ if (hist_field->flags & HIST_FIELD_FL_KEY) {
+ tracing_map_cmp_fn_t cmp_fn;
+
+ field = hist_field->field;
+
+ if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
+ cmp_fn = tracing_map_cmp_none;
+ else if (is_string_field(field))
+ cmp_fn = tracing_map_cmp_string;
+ else
+ cmp_fn = tracing_map_cmp_num(field->size,
+ field->is_signed);
+ idx = tracing_map_add_key_field(map,
+ hist_field->offset,
+ cmp_fn);
+
+ } else
+ idx = tracing_map_add_sum_field(map);
+
+ if (idx < 0)
+ return idx;
+ }
+
+ return 0;
+}
+
+static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
+{
+ struct hist_field *key_field;
+ unsigned int i;
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+
+ if (key_field->flags & HIST_FIELD_FL_EXECNAME)
+ return true;
+ }
+
+ return false;
+}
+
+static struct hist_trigger_data *
+create_hist_data(unsigned int map_bits,
+ struct hist_trigger_attrs *attrs,
+ struct trace_event_file *file)
+{
+ const struct tracing_map_ops *map_ops = NULL;
+ struct hist_trigger_data *hist_data;
+ int ret = 0;
+
+ hist_data = kzalloc(sizeof(*hist_data), GFP_KERNEL);
+ if (!hist_data)
+ return ERR_PTR(-ENOMEM);
+
+ hist_data->attrs = attrs;
+
+ ret = create_hist_fields(hist_data, file);
+ if (ret)
+ goto free;
+
+ ret = create_sort_keys(hist_data);
+ if (ret)
+ goto free;
+
+ if (need_tracing_map_ops(hist_data))
+ map_ops = &hist_trigger_elt_comm_ops;
+
+ hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
+ map_ops, hist_data);
+ if (IS_ERR(hist_data->map)) {
+ ret = PTR_ERR(hist_data->map);
+ hist_data->map = NULL;
+ goto free;
+ }
+
+ ret = create_tracing_map_fields(hist_data);
+ if (ret)
+ goto free;
+
+ ret = tracing_map_init(hist_data->map);
+ if (ret)
+ goto free;
+
+ hist_data->event_file = file;
+ out:
+ return hist_data;
+ free:
+ hist_data->attrs = NULL;
+
+ destroy_hist_data(hist_data);
+
+ hist_data = ERR_PTR(ret);
+
+ goto out;
+}
+
+static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt,
+ void *rec)
+{
+ struct hist_field *hist_field;
+ unsigned int i;
+ u64 hist_val;
+
+ for_each_hist_val_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ hist_val = hist_field->fn(hist_field, rec);
+ tracing_map_update_sum(elt, i, hist_val);
+ }
+}
+
+static inline void add_to_key(char *compound_key, void *key,
+ struct hist_field *key_field, void *rec)
+{
+ size_t size = key_field->size;
+
+ if (key_field->flags & HIST_FIELD_FL_STRING) {
+ struct ftrace_event_field *field;
+
+ field = key_field->field;
+ if (field->filter_type == FILTER_DYN_STRING)
+ size = *(u32 *)(rec + field->offset) >> 16;
+ else if (field->filter_type == FILTER_PTR_STRING)
+ size = strlen(key);
+ else if (field->filter_type == FILTER_STATIC_STRING)
+ size = field->size;
+
+ /* ensure NULL-termination */
+ if (size > key_field->size - 1)
+ size = key_field->size - 1;
+ }
+
+ memcpy(compound_key + key_field->offset, key, size);
+}
+
+static void event_hist_trigger(struct event_trigger_data *data, void *rec)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+ bool use_compound_key = (hist_data->n_keys > 1);
+ unsigned long entries[HIST_STACKTRACE_DEPTH];
+ char compound_key[HIST_KEY_SIZE_MAX];
+ struct stack_trace stacktrace;
+ struct hist_field *key_field;
+ struct tracing_map_elt *elt;
+ u64 field_contents;
+ void *key = NULL;
+ unsigned int i;
+
+ memset(compound_key, 0, hist_data->key_size);
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+
+ if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
+ stacktrace.max_entries = HIST_STACKTRACE_DEPTH;
+ stacktrace.entries = entries;
+ stacktrace.nr_entries = 0;
+ stacktrace.skip = HIST_STACKTRACE_SKIP;
+
+ memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE);
+ save_stack_trace(&stacktrace);
+
+ key = entries;
+ } else {
+ field_contents = key_field->fn(key_field, rec);
+ if (key_field->flags & HIST_FIELD_FL_STRING) {
+ key = (void *)(unsigned long)field_contents;
+ use_compound_key = true;
+ } else
+ key = (void *)&field_contents;
+ }
+
+ if (use_compound_key)
+ add_to_key(compound_key, key, key_field, rec);
+ }
+
+ if (use_compound_key)
+ key = compound_key;
+
+ elt = tracing_map_insert(hist_data->map, key);
+ if (elt)
+ hist_trigger_elt_update(hist_data, elt, rec);
+}
+
+static void hist_trigger_stacktrace_print(struct seq_file *m,
+ unsigned long *stacktrace_entries,
+ unsigned int max_entries)
+{
+ char str[KSYM_SYMBOL_LEN];
+ unsigned int spaces = 8;
+ unsigned int i;
+
+ for (i = 0; i < max_entries; i++) {
+ if (stacktrace_entries[i] == ULONG_MAX)
+ return;
+
+ seq_printf(m, "%*c", 1 + spaces, ' ');
+ sprint_symbol(str, stacktrace_entries[i]);
+ seq_printf(m, "%s\n", str);
+ }
+}
+
+static void
+hist_trigger_entry_print(struct seq_file *m,
+ struct hist_trigger_data *hist_data, void *key,
+ struct tracing_map_elt *elt)
+{
+ struct hist_field *key_field;
+ char str[KSYM_SYMBOL_LEN];
+ bool multiline = false;
+ unsigned int i;
+ u64 uval;
+
+ seq_puts(m, "{ ");
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+
+ if (i > hist_data->n_vals)
+ seq_puts(m, ", ");
+
+ if (key_field->flags & HIST_FIELD_FL_HEX) {
+ uval = *(u64 *)(key + key_field->offset);
+ seq_printf(m, "%s: %llx",
+ key_field->field->name, uval);
+ } else if (key_field->flags & HIST_FIELD_FL_SYM) {
+ uval = *(u64 *)(key + key_field->offset);
+ sprint_symbol_no_offset(str, uval);
+ seq_printf(m, "%s: [%llx] %-45s",
+ key_field->field->name, uval, str);
+ } else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
+ uval = *(u64 *)(key + key_field->offset);
+ sprint_symbol(str, uval);
+ seq_printf(m, "%s: [%llx] %-55s",
+ key_field->field->name, uval, str);
+ } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
+ char *comm = elt->private_data;
+
+ uval = *(u64 *)(key + key_field->offset);
+ seq_printf(m, "%s: %-16s[%10llu]",
+ key_field->field->name, comm, uval);
+ } else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
+ const char *syscall_name;
+
+ uval = *(u64 *)(key + key_field->offset);
+ syscall_name = get_syscall_name(uval);
+ if (!syscall_name)
+ syscall_name = "unknown_syscall";
+
+ seq_printf(m, "%s: %-30s[%3llu]",
+ key_field->field->name, syscall_name, uval);
+ } else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
+ seq_puts(m, "stacktrace:\n");
+ hist_trigger_stacktrace_print(m,
+ key + key_field->offset,
+ HIST_STACKTRACE_DEPTH);
+ multiline = true;
+ } else if (key_field->flags & HIST_FIELD_FL_LOG2) {
+ seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name,
+ *(u64 *)(key + key_field->offset));
+ } else if (key_field->flags & HIST_FIELD_FL_STRING) {
+ seq_printf(m, "%s: %-50s", key_field->field->name,
+ (char *)(key + key_field->offset));
+ } else {
+ uval = *(u64 *)(key + key_field->offset);
+ seq_printf(m, "%s: %10llu", key_field->field->name,
+ uval);
+ }
+ }
+
+ if (!multiline)
+ seq_puts(m, " ");
+
+ seq_puts(m, "}");
+
+ seq_printf(m, " hitcount: %10llu",
+ tracing_map_read_sum(elt, HITCOUNT_IDX));
+
+ for (i = 1; i < hist_data->n_vals; i++) {
+ if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
+ seq_printf(m, " %s: %10llx",
+ hist_data->fields[i]->field->name,
+ tracing_map_read_sum(elt, i));
+ } else {
+ seq_printf(m, " %s: %10llu",
+ hist_data->fields[i]->field->name,
+ tracing_map_read_sum(elt, i));
+ }
+ }
+
+ seq_puts(m, "\n");
+}
+
+static int print_entries(struct seq_file *m,
+ struct hist_trigger_data *hist_data)
+{
+ struct tracing_map_sort_entry **sort_entries = NULL;
+ struct tracing_map *map = hist_data->map;
+ int i, n_entries;
+
+ n_entries = tracing_map_sort_entries(map, hist_data->sort_keys,
+ hist_data->n_sort_keys,
+ &sort_entries);
+ if (n_entries < 0)
+ return n_entries;
+
+ for (i = 0; i < n_entries; i++)
+ hist_trigger_entry_print(m, hist_data,
+ sort_entries[i]->key,
+ sort_entries[i]->elt);
+
+ tracing_map_destroy_sort_entries(sort_entries, n_entries);
+
+ return n_entries;
+}
+
+static void hist_trigger_show(struct seq_file *m,
+ struct event_trigger_data *data, int n)
+{
+ struct hist_trigger_data *hist_data;
+ int n_entries, ret = 0;
+
+ if (n > 0)
+ seq_puts(m, "\n\n");
+
+ seq_puts(m, "# event histogram\n#\n# trigger info: ");
+ data->ops->print(m, data->ops, data);
+ seq_puts(m, "#\n\n");
+
+ hist_data = data->private_data;
+ n_entries = print_entries(m, hist_data);
+ if (n_entries < 0) {
+ ret = n_entries;
+ n_entries = 0;
+ }
+
+ seq_printf(m, "\nTotals:\n Hits: %llu\n Entries: %u\n Dropped: %llu\n",
+ (u64)atomic64_read(&hist_data->map->hits),
+ n_entries, (u64)atomic64_read(&hist_data->map->drops));
+}
+
+static int hist_show(struct seq_file *m, void *v)
+{
+ struct event_trigger_data *data;
+ struct trace_event_file *event_file;
+ int n = 0, ret = 0;
+
+ mutex_lock(&event_mutex);
+
+ event_file = event_file_data(m->private);
+ if (unlikely(!event_file)) {
+ ret = -ENODEV;
+ goto out_unlock;
+ }
+
+ list_for_each_entry_rcu(data, &event_file->triggers, list) {
+ if (data->cmd_ops->trigger_type == ETT_EVENT_HIST)
+ hist_trigger_show(m, data, n++);
+ }
+
+ out_unlock:
+ mutex_unlock(&event_mutex);
+
+ return ret;
+}
+
+static int event_hist_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, hist_show, file);
+}
+
+const struct file_operations event_hist_fops = {
+ .open = event_hist_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static const char *get_hist_field_flags(struct hist_field *hist_field)
+{
+ const char *flags_str = NULL;
+
+ if (hist_field->flags & HIST_FIELD_FL_HEX)
+ flags_str = "hex";
+ else if (hist_field->flags & HIST_FIELD_FL_SYM)
+ flags_str = "sym";
+ else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
+ flags_str = "sym-offset";
+ else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
+ flags_str = "execname";
+ else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
+ flags_str = "syscall";
+ else if (hist_field->flags & HIST_FIELD_FL_LOG2)
+ flags_str = "log2";
+
+ return flags_str;
+}
+
+static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
+{
+ seq_printf(m, "%s", hist_field->field->name);
+ if (hist_field->flags) {
+ const char *flags_str = get_hist_field_flags(hist_field);
+
+ if (flags_str)
+ seq_printf(m, ".%s", flags_str);
+ }
+}
+
+static int event_hist_trigger_print(struct seq_file *m,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+ struct hist_field *key_field;
+ unsigned int i;
+
+ seq_puts(m, "hist:");
+
+ if (data->name)
+ seq_printf(m, "%s:", data->name);
+
+ seq_puts(m, "keys=");
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+
+ if (i > hist_data->n_vals)
+ seq_puts(m, ",");
+
+ if (key_field->flags & HIST_FIELD_FL_STACKTRACE)
+ seq_puts(m, "stacktrace");
+ else
+ hist_field_print(m, key_field);
+ }
+
+ seq_puts(m, ":vals=");
+
+ for_each_hist_val_field(i, hist_data) {
+ if (i == HITCOUNT_IDX)
+ seq_puts(m, "hitcount");
+ else {
+ seq_puts(m, ",");
+ hist_field_print(m, hist_data->fields[i]);
+ }
+ }
+
+ seq_puts(m, ":sort=");
+
+ for (i = 0; i < hist_data->n_sort_keys; i++) {
+ struct tracing_map_sort_key *sort_key;
+
+ sort_key = &hist_data->sort_keys[i];
+
+ if (i > 0)
+ seq_puts(m, ",");
+
+ if (sort_key->field_idx == HITCOUNT_IDX)
+ seq_puts(m, "hitcount");
+ else {
+ unsigned int idx = sort_key->field_idx;
+
+ if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
+ return -EINVAL;
+
+ hist_field_print(m, hist_data->fields[idx]);
+ }
+
+ if (sort_key->descending)
+ seq_puts(m, ".descending");
+ }
+
+ seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
+
+ if (data->filter_str)
+ seq_printf(m, " if %s", data->filter_str);
+
+ if (data->paused)
+ seq_puts(m, " [paused]");
+ else
+ seq_puts(m, " [active]");
+
+ seq_putc(m, '\n');
+
+ return 0;
+}
+
+static int event_hist_trigger_init(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+
+ if (!data->ref && hist_data->attrs->name)
+ save_named_trigger(hist_data->attrs->name, data);
+
+ data->ref++;
+
+ return 0;
+}
+
+static void event_hist_trigger_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref) {
+ if (data->name)
+ del_named_trigger(data);
+ trigger_data_free(data);
+ destroy_hist_data(hist_data);
+ }
+}
+
+static struct event_trigger_ops event_hist_trigger_ops = {
+ .func = event_hist_trigger,
+ .print = event_hist_trigger_print,
+ .init = event_hist_trigger_init,
+ .free = event_hist_trigger_free,
+};
+
+static int event_hist_trigger_named_init(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ data->ref++;
+
+ save_named_trigger(data->named_data->name, data);
+
+ event_hist_trigger_init(ops, data->named_data);
+
+ return 0;
+}
+
+static void event_hist_trigger_named_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ event_hist_trigger_free(ops, data->named_data);
+
+ data->ref--;
+ if (!data->ref) {
+ del_named_trigger(data);
+ trigger_data_free(data);
+ }
+}
+
+static struct event_trigger_ops event_hist_trigger_named_ops = {
+ .func = event_hist_trigger,
+ .print = event_hist_trigger_print,
+ .init = event_hist_trigger_named_init,
+ .free = event_hist_trigger_named_free,
+};
+
+static struct event_trigger_ops *event_hist_get_trigger_ops(char *cmd,
+ char *param)
+{
+ return &event_hist_trigger_ops;
+}
+
+static void hist_clear(struct event_trigger_data *data)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+
+ if (data->name)
+ pause_named_trigger(data);
+
+ synchronize_sched();
+
+ tracing_map_clear(hist_data->map);
+
+ if (data->name)
+ unpause_named_trigger(data);
+}
+
+static bool compatible_field(struct ftrace_event_field *field,
+ struct ftrace_event_field *test_field)
+{
+ if (field == test_field)
+ return true;
+ if (field == NULL || test_field == NULL)
+ return false;
+ if (strcmp(field->name, test_field->name) != 0)
+ return false;
+ if (strcmp(field->type, test_field->type) != 0)
+ return false;
+ if (field->size != test_field->size)
+ return false;
+ if (field->is_signed != test_field->is_signed)
+ return false;
+
+ return true;
+}
+
+static bool hist_trigger_match(struct event_trigger_data *data,
+ struct event_trigger_data *data_test,
+ struct event_trigger_data *named_data,
+ bool ignore_filter)
+{
+ struct tracing_map_sort_key *sort_key, *sort_key_test;
+ struct hist_trigger_data *hist_data, *hist_data_test;
+ struct hist_field *key_field, *key_field_test;
+ unsigned int i;
+
+ if (named_data && (named_data != data_test) &&
+ (named_data != data_test->named_data))
+ return false;
+
+ if (!named_data && is_named_trigger(data_test))
+ return false;
+
+ hist_data = data->private_data;
+ hist_data_test = data_test->private_data;
+
+ if (hist_data->n_vals != hist_data_test->n_vals ||
+ hist_data->n_fields != hist_data_test->n_fields ||
+ hist_data->n_sort_keys != hist_data_test->n_sort_keys)
+ return false;
+
+ if (!ignore_filter) {
+ if ((data->filter_str && !data_test->filter_str) ||
+ (!data->filter_str && data_test->filter_str))
+ return false;
+ }
+
+ for_each_hist_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+ key_field_test = hist_data_test->fields[i];
+
+ if (key_field->flags != key_field_test->flags)
+ return false;
+ if (!compatible_field(key_field->field, key_field_test->field))
+ return false;
+ if (key_field->offset != key_field_test->offset)
+ return false;
+ }
+
+ for (i = 0; i < hist_data->n_sort_keys; i++) {
+ sort_key = &hist_data->sort_keys[i];
+ sort_key_test = &hist_data_test->sort_keys[i];
+
+ if (sort_key->field_idx != sort_key_test->field_idx ||
+ sort_key->descending != sort_key_test->descending)
+ return false;
+ }
+
+ if (!ignore_filter && data->filter_str &&
+ (strcmp(data->filter_str, data_test->filter_str) != 0))
+ return false;
+
+ return true;
+}
+
+static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
+ struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+ struct event_trigger_data *test, *named_data = NULL;
+ int ret = 0;
+
+ if (hist_data->attrs->name) {
+ named_data = find_named_trigger(hist_data->attrs->name);
+ if (named_data) {
+ if (!hist_trigger_match(data, named_data, named_data,
+ true)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+ }
+
+ if (hist_data->attrs->name && !named_data)
+ goto new;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ if (!hist_trigger_match(data, test, named_data, false))
+ continue;
+ if (hist_data->attrs->pause)
+ test->paused = true;
+ else if (hist_data->attrs->cont)
+ test->paused = false;
+ else if (hist_data->attrs->clear)
+ hist_clear(test);
+ else
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+ new:
+ if (hist_data->attrs->cont || hist_data->attrs->clear) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ if (named_data) {
+ destroy_hist_data(data->private_data);
+ data->private_data = named_data->private_data;
+ set_named_trigger_data(data, named_data);
+ data->ops = &event_hist_trigger_named_ops;
+ }
+
+ if (hist_data->attrs->pause)
+ data->paused = true;
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &file->triggers);
+ ret++;
+
+ update_cond_flag(file);
+
+ if (trace_event_trigger_enable_disable(file, 1) < 0) {
+ list_del_rcu(&data->list);
+ update_cond_flag(file);
+ ret--;
+ }
+ out:
+ return ret;
+}
+
+static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
+ struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+ struct event_trigger_data *test, *named_data = NULL;
+ bool unregistered = false;
+
+ if (hist_data->attrs->name)
+ named_data = find_named_trigger(hist_data->attrs->name);
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ if (!hist_trigger_match(data, test, named_data, false))
+ continue;
+ unregistered = true;
+ list_del_rcu(&test->list);
+ trace_event_trigger_enable_disable(file, 0);
+ update_cond_flag(file);
+ break;
+ }
+ }
+
+ if (unregistered && test->ops->free)
+ test->ops->free(test->ops, test);
+}
+
+static void hist_unreg_all(struct trace_event_file *file)
+{
+ struct event_trigger_data *test;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ list_del_rcu(&test->list);
+ trace_event_trigger_enable_disable(file, 0);
+ update_cond_flag(file);
+ if (test->ops->free)
+ test->ops->free(test->ops, test);
+ }
+ }
+}
+
+static int event_hist_trigger_func(struct event_command *cmd_ops,
+ struct trace_event_file *file,
+ char *glob, char *cmd, char *param)
+{
+ unsigned int hist_trigger_bits = TRACING_MAP_BITS_DEFAULT;
+ struct event_trigger_data *trigger_data;
+ struct hist_trigger_attrs *attrs;
+ struct event_trigger_ops *trigger_ops;
+ struct hist_trigger_data *hist_data;
+ char *trigger;
+ int ret = 0;
+
+ if (!param)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (k:v [if filter]) */
+ trigger = strsep(&param, " \t");
+ if (!trigger)
+ return -EINVAL;
+
+ attrs = parse_hist_trigger_attrs(trigger);
+ if (IS_ERR(attrs))
+ return PTR_ERR(attrs);
+
+ if (attrs->map_bits)
+ hist_trigger_bits = attrs->map_bits;
+
+ hist_data = create_hist_data(hist_trigger_bits, attrs, file);
+ if (IS_ERR(hist_data)) {
+ destroy_hist_trigger_attrs(attrs);
+ return PTR_ERR(hist_data);
+ }
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out_free;
+
+ trigger_data->count = -1;
+ trigger_data->ops = trigger_ops;
+ trigger_data->cmd_ops = cmd_ops;
+
+ INIT_LIST_HEAD(&trigger_data->list);
+ RCU_INIT_POINTER(trigger_data->filter, NULL);
+
+ trigger_data->private_data = hist_data;
+
+ /* if param is non-empty, it's supposed to be a filter */
+ if (param && cmd_ops->set_filter) {
+ ret = cmd_ops->set_filter(param, trigger_data, file);
+ if (ret < 0)
+ goto out_free;
+ }
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ ret = 0;
+ goto out_free;
+ }
+
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+ /*
+ * The above returns on success the # of triggers registered,
+ * but if it didn't register any it returns zero. Consider no
+ * triggers registered a failure too.
+ */
+ if (!ret) {
+ if (!(attrs->pause || attrs->cont || attrs->clear))
+ ret = -ENOENT;
+ goto out_free;
+ } else if (ret < 0)
+ goto out_free;
+ /* Just return zero, not the number of registered triggers */
+ ret = 0;
+ out:
+ return ret;
+ out_free:
+ if (cmd_ops->set_filter)
+ cmd_ops->set_filter(NULL, trigger_data, NULL);
+
+ kfree(trigger_data);
+
+ destroy_hist_data(hist_data);
+ goto out;
+}
+
+static struct event_command trigger_hist_cmd = {
+ .name = "hist",
+ .trigger_type = ETT_EVENT_HIST,
+ .flags = EVENT_CMD_FL_NEEDS_REC,
+ .func = event_hist_trigger_func,
+ .reg = hist_register_trigger,
+ .unreg = hist_unregister_trigger,
+ .unreg_all = hist_unreg_all,
+ .get_trigger_ops = event_hist_get_trigger_ops,
+ .set_filter = set_trigger_filter,
+};
+
+__init int register_trigger_hist_cmd(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_hist_cmd);
+ WARN_ON(ret < 0);
+
+ return ret;
+}
+
+static void
+hist_enable_trigger(struct event_trigger_data *data, void *rec)
+{
+ struct enable_trigger_data *enable_data = data->private_data;
+ struct event_trigger_data *test;
+
+ list_for_each_entry_rcu(test, &enable_data->file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ if (enable_data->enable)
+ test->paused = false;
+ else
+ test->paused = true;
+ }
+ }
+}
+
+static void
+hist_enable_count_trigger(struct event_trigger_data *data, void *rec)
+{
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ hist_enable_trigger(data, rec);
+}
+
+static struct event_trigger_ops hist_enable_trigger_ops = {
+ .func = hist_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops hist_enable_count_trigger_ops = {
+ .func = hist_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops hist_disable_trigger_ops = {
+ .func = hist_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops hist_disable_count_trigger_ops = {
+ .func = hist_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops *
+hist_enable_get_trigger_ops(char *cmd, char *param)
+{
+ struct event_trigger_ops *ops;
+ bool enable;
+
+ enable = (strcmp(cmd, ENABLE_HIST_STR) == 0);
+
+ if (enable)
+ ops = param ? &hist_enable_count_trigger_ops :
+ &hist_enable_trigger_ops;
+ else
+ ops = param ? &hist_disable_count_trigger_ops :
+ &hist_disable_trigger_ops;
+
+ return ops;
+}
+
+static void hist_enable_unreg_all(struct trace_event_file *file)
+{
+ struct event_trigger_data *test;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_HIST_ENABLE) {
+ list_del_rcu(&test->list);
+ update_cond_flag(file);
+ trace_event_trigger_enable_disable(file, 0);
+ if (test->ops->free)
+ test->ops->free(test->ops, test);
+ }
+ }
+}
+
+static struct event_command trigger_hist_enable_cmd = {
+ .name = ENABLE_HIST_STR,
+ .trigger_type = ETT_HIST_ENABLE,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .unreg_all = hist_enable_unreg_all,
+ .get_trigger_ops = hist_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
+};
+
+static struct event_command trigger_hist_disable_cmd = {
+ .name = DISABLE_HIST_STR,
+ .trigger_type = ETT_HIST_ENABLE,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .unreg_all = hist_enable_unreg_all,
+ .get_trigger_ops = hist_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
+};
+
+static __init void unregister_trigger_hist_enable_disable_cmds(void)
+{
+ unregister_event_command(&trigger_hist_enable_cmd);
+ unregister_event_command(&trigger_hist_disable_cmd);
+}
+
+__init int register_trigger_hist_enable_disable_cmds(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_hist_enable_cmd);
+ if (WARN_ON(ret < 0))
+ return ret;
+ ret = register_event_command(&trigger_hist_disable_cmd);
+ if (WARN_ON(ret < 0))
+ unregister_trigger_hist_enable_disable_cmds();
+
+ return ret;
+}
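
At its core, the new hist trigger file above does one thing per event hit:
build a key from the configured key field(s), insert or look up an entry in
a tracing_map, and fold each value field into that entry's running sums
(plus the implicit hitcount). The standalone sketch below captures that flow
with a fixed-size, open-addressed table and u64 keys only; map_insert() and
hist_event() are made-up names for illustration, not the tracing_map API:

  #include <stdint.h>
  #include <stdio.h>

  #define MAP_BITS	7			/* like 'size=128', i.e. 2^7 entries */
  #define MAP_SIZE	(1u << MAP_BITS)

  struct entry {
  	int		used;
  	uint64_t	key;		/* e.g. a common_pid key field */
  	uint64_t	hitcount;	/* implicit 'hitcount' value */
  	uint64_t	sum;		/* one explicit 'values=' field */
  };

  static struct entry map[MAP_SIZE];

  static struct entry *map_insert(uint64_t key)
  {
  	uint32_t idx = (uint32_t)(key * 2654435761u) & (MAP_SIZE - 1);
  	uint32_t n;

  	for (n = 0; n < MAP_SIZE; n++, idx = (idx + 1) & (MAP_SIZE - 1)) {
  		if (!map[idx].used) {
  			map[idx].used = 1;
  			map[idx].key = key;
  			return &map[idx];
  		}
  		if (map[idx].key == key)
  			return &map[idx];
  	}
  	return NULL;			/* table full: the hit is dropped */
  }

  static void hist_event(uint64_t key, uint64_t val)
  {
  	struct entry *e = map_insert(key);

  	if (e) {
  		e->hitcount++;
  		e->sum += val;
  	}
  }

  int main(void)
  {
  	unsigned int i;

  	hist_event(1234, 4096);		/* e.g. keys=common_pid, vals=bytes_req */
  	hist_event(1234, 512);
  	hist_event(5678, 128);

  	for (i = 0; i < MAP_SIZE; i++)
  		if (map[i].used)
  			printf("{ key: %llu } hitcount: %llu sum: %llu\n",
  			       (unsigned long long)map[i].key,
  			       (unsigned long long)map[i].hitcount,
  			       (unsigned long long)map[i].sum);
  	return 0;
  }

The real code differs in that keys may be compound or a kernel stacktrace,
insertion is designed for use from trace event context, and entries are
sorted on read, but the per-hit accounting is the same.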
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d67992f3bb0e..a975571cde24 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -347,7 +347,7 @@ __init int register_event_command(struct event_command *cmd)
* Currently we only unregister event commands from __init, so mark
* this __init too.
*/
-static __init int unregister_event_command(struct event_command *cmd)
+__init int unregister_event_command(struct event_command *cmd)
{
struct event_command *p, *n;
int ret = -ENODEV;
@@ -641,6 +641,7 @@ event_trigger_callback(struct event_command *cmd_ops,
trigger_data->ops = trigger_ops;
trigger_data->cmd_ops = cmd_ops;
INIT_LIST_HEAD(&trigger_data->list);
+ INIT_LIST_HEAD(&trigger_data->named_list);
if (glob[0] == '!') {
cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
@@ -764,6 +765,148 @@ int set_trigger_filter(char *filter_str,
return ret;
}
+static LIST_HEAD(named_triggers);
+
+/**
+ * find_named_trigger - Find the common named trigger associated with @name
+ * @name: The name of the set of named triggers to find the common data for
+ *
+ * Named triggers are sets of triggers that share a common set of
+ * trigger data. The first named trigger registered with a given name
+ * owns the common trigger data that the others subsequently
+ * registered with the same name will reference. This function
+ * returns the common trigger data associated with that first
+ * registered instance.
+ *
+ * Return: the common trigger data for the given named trigger on
+ * success, NULL otherwise.
+ */
+struct event_trigger_data *find_named_trigger(const char *name)
+{
+ struct event_trigger_data *data;
+
+ if (!name)
+ return NULL;
+
+ list_for_each_entry(data, &named_triggers, named_list) {
+ if (data->named_data)
+ continue;
+ if (strcmp(data->name, name) == 0)
+ return data;
+ }
+
+ return NULL;
+}
+
+/**
+ * is_named_trigger - determine if a given trigger is a named trigger
+ * @test: The trigger data to test
+ *
+ * Return: true if 'test' is a named trigger, false otherwise.
+ */
+bool is_named_trigger(struct event_trigger_data *test)
+{
+ struct event_trigger_data *data;
+
+ list_for_each_entry(data, &named_triggers, named_list) {
+ if (test == data)
+ return true;
+ }
+
+ return false;
+}
+
+/**
+ * save_named_trigger - save the trigger in the named trigger list
+ * @name: The name of the named trigger set
+ * @data: The trigger data to save
+ *
+ * Return: 0 if successful, negative error otherwise.
+ */
+int save_named_trigger(const char *name, struct event_trigger_data *data)
+{
+ data->name = kstrdup(name, GFP_KERNEL);
+ if (!data->name)
+ return -ENOMEM;
+
+ list_add(&data->named_list, &named_triggers);
+
+ return 0;
+}
+
+/**
+ * del_named_trigger - delete a trigger from the named trigger list
+ * @data: The trigger data to delete
+ */
+void del_named_trigger(struct event_trigger_data *data)
+{
+ kfree(data->name);
+ data->name = NULL;
+
+ list_del(&data->named_list);
+}
+
+static void __pause_named_trigger(struct event_trigger_data *data, bool pause)
+{
+ struct event_trigger_data *test;
+
+ list_for_each_entry(test, &named_triggers, named_list) {
+ if (strcmp(test->name, data->name) == 0) {
+ if (pause) {
+ test->paused_tmp = test->paused;
+ test->paused = true;
+ } else {
+ test->paused = test->paused_tmp;
+ }
+ }
+ }
+}
+
+/**
+ * pause_named_trigger - Pause all named triggers with the same name
+ * @data: The trigger data of a named trigger to pause
+ *
+ * Pauses a named trigger along with all other triggers having the
+ * same name. Because named triggers share a common set of data,
+ * pausing only one is meaningless, so pausing one named trigger needs
+ * to pause all triggers with the same name.
+ */
+void pause_named_trigger(struct event_trigger_data *data)
+{
+ __pause_named_trigger(data, true);
+}
+
+/**
+ * unpause_named_trigger - Un-pause all named triggers with the same name
+ * @data: The trigger data of a named trigger to unpause
+ *
+ * Un-pauses a named trigger along with all other triggers having the
+ * same name. Because named triggers share a common set of data,
+ * unpausing only one is meaningless, so unpausing one named trigger
+ * needs to unpause all triggers with the same name.
+ */
+void unpause_named_trigger(struct event_trigger_data *data)
+{
+ __pause_named_trigger(data, false);
+}
+
+/**
+ * set_named_trigger_data - Associate common named trigger data
+ * @data: The trigger data to associate the common named trigger data with
+ *
+ * Named triggers are sets of triggers that share a common set of
+ * trigger data. The first named trigger registered with a given name
+ * owns the common trigger data that the others subsequently
+ * registered with the same name will reference. This function
+ * associates the common trigger data from the first trigger with the
+ * given trigger.
+ */
+void set_named_trigger_data(struct event_trigger_data *data,
+ struct event_trigger_data *named_data)
+{
+ data->named_data = named_data;
+}
+
static void
traceon_trigger(struct event_trigger_data *data, void *rec)
{
@@ -1062,15 +1205,6 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
unregister_event_command(&trigger_traceoff_cmd);
}
-/* Avoid typos */
-#define ENABLE_EVENT_STR "enable_event"
-#define DISABLE_EVENT_STR "disable_event"
-
-struct enable_trigger_data {
- struct trace_event_file *file;
- bool enable;
-};
-
static void
event_enable_trigger(struct event_trigger_data *data, void *rec)
{
@@ -1100,14 +1234,16 @@ event_enable_count_trigger(struct event_trigger_data *data, void *rec)
event_enable_trigger(data, rec);
}
-static int
-event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
- struct event_trigger_data *data)
+int event_enable_trigger_print(struct seq_file *m,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
{
struct enable_trigger_data *enable_data = data->private_data;
seq_printf(m, "%s:%s:%s",
- enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR,
+ enable_data->hist ?
+ (enable_data->enable ? ENABLE_HIST_STR : DISABLE_HIST_STR) :
+ (enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR),
enable_data->file->event_call->class->system,
trace_event_name(enable_data->file->event_call));
@@ -1124,9 +1260,8 @@ event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
return 0;
}
-static void
-event_enable_trigger_free(struct event_trigger_ops *ops,
- struct event_trigger_data *data)
+void event_enable_trigger_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
{
struct enable_trigger_data *enable_data = data->private_data;
@@ -1171,10 +1306,9 @@ static struct event_trigger_ops event_disable_count_trigger_ops = {
.free = event_enable_trigger_free,
};
-static int
-event_enable_trigger_func(struct event_command *cmd_ops,
- struct trace_event_file *file,
- char *glob, char *cmd, char *param)
+int event_enable_trigger_func(struct event_command *cmd_ops,
+ struct trace_event_file *file,
+ char *glob, char *cmd, char *param)
{
struct trace_event_file *event_enable_file;
struct enable_trigger_data *enable_data;
@@ -1183,6 +1317,7 @@ event_enable_trigger_func(struct event_command *cmd_ops,
struct trace_array *tr = file->tr;
const char *system;
const char *event;
+ bool hist = false;
char *trigger;
char *number;
bool enable;
@@ -1207,8 +1342,15 @@ event_enable_trigger_func(struct event_command *cmd_ops,
if (!event_enable_file)
goto out;
- enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+#ifdef CONFIG_HIST_TRIGGERS
+ hist = ((strcmp(cmd, ENABLE_HIST_STR) == 0) ||
+ (strcmp(cmd, DISABLE_HIST_STR) == 0));
+ enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) ||
+ (strcmp(cmd, ENABLE_HIST_STR) == 0));
+#else
+ enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+#endif
trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
ret = -ENOMEM;
@@ -1228,6 +1370,7 @@ event_enable_trigger_func(struct event_command *cmd_ops,
INIT_LIST_HEAD(&trigger_data->list);
RCU_INIT_POINTER(trigger_data->filter, NULL);
+ enable_data->hist = hist;
enable_data->enable = enable;
enable_data->file = event_enable_file;
trigger_data->private_data = enable_data;
@@ -1305,10 +1448,10 @@ event_enable_trigger_func(struct event_command *cmd_ops,
goto out;
}
-static int event_enable_register_trigger(char *glob,
- struct event_trigger_ops *ops,
- struct event_trigger_data *data,
- struct trace_event_file *file)
+int event_enable_register_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *data,
+ struct trace_event_file *file)
{
struct enable_trigger_data *enable_data = data->private_data;
struct enable_trigger_data *test_enable_data;
@@ -1318,6 +1461,8 @@ static int event_enable_register_trigger(char *glob,
list_for_each_entry_rcu(test, &file->triggers, list) {
test_enable_data = test->private_data;
if (test_enable_data &&
+ (test->cmd_ops->trigger_type ==
+ data->cmd_ops->trigger_type) &&
(test_enable_data->file == enable_data->file)) {
ret = -EEXIST;
goto out;
@@ -1343,10 +1488,10 @@ out:
return ret;
}
-static void event_enable_unregister_trigger(char *glob,
- struct event_trigger_ops *ops,
- struct event_trigger_data *test,
- struct trace_event_file *file)
+void event_enable_unregister_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ struct event_trigger_data *test,
+ struct trace_event_file *file)
{
struct enable_trigger_data *test_enable_data = test->private_data;
struct enable_trigger_data *enable_data;
@@ -1356,6 +1501,8 @@ static void event_enable_unregister_trigger(char *glob,
list_for_each_entry_rcu(data, &file->triggers, list) {
enable_data = data->private_data;
if (enable_data &&
+ (data->cmd_ops->trigger_type ==
+ test->cmd_ops->trigger_type) &&
(enable_data->file == test_enable_data->file)) {
unregistered = true;
list_del_rcu(&data->list);
@@ -1375,8 +1522,12 @@ event_enable_get_trigger_ops(char *cmd, char *param)
struct event_trigger_ops *ops;
bool enable;
+#ifdef CONFIG_HIST_TRIGGERS
+ enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) ||
+ (strcmp(cmd, ENABLE_HIST_STR) == 0));
+#else
enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
-
+#endif
if (enable)
ops = param ? &event_enable_count_trigger_ops :
&event_enable_trigger_ops;
@@ -1447,6 +1598,8 @@ __init int register_trigger_cmds(void)
register_trigger_snapshot_cmd();
register_trigger_stacktrace_cmd();
register_trigger_enable_disable_cmds();
+ register_trigger_hist_enable_disable_cmds();
+ register_trigger_hist_cmd();
return 0;
}
diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
new file mode 100644
index 000000000000..0a689bbb78ef
--- /dev/null
+++ b/kernel/trace/tracing_map.c
@@ -0,0 +1,1062 @@
+/*
+ * tracing_map - lock-free map for tracing
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com>
+ *
+ * tracing_map implementation inspired by lock-free map algorithms
+ * originated by Dr. Cliff Click:
+ *
+ * http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
+ * http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
+ */
+
+#include <linux/vmalloc.h>
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+#include "tracing_map.h"
+#include "trace.h"
+
+/*
+ * NOTE: For a detailed description of the data structures used by
+ * these functions (such as tracing_map_elt) please see the overview
+ * of tracing_map data structures at the beginning of tracing_map.h.
+ */
+
+/**
+ * tracing_map_update_sum - Add a value to a tracing_map_elt's sum field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given sum associated with the tracing_map_elt
+ * @n: The value to add to the sum
+ *
+ * Add n to sum i associated with the specified tracing_map_elt
+ * instance. The index i is the index returned by the call to
+ * tracing_map_add_sum_field() when the tracing map was set up.
+ */
+void tracing_map_update_sum(struct tracing_map_elt *elt, unsigned int i, u64 n)
+{
+ atomic64_add(n, &elt->fields[i].sum);
+}
+
+/**
+ * tracing_map_read_sum - Return the value of a tracing_map_elt's sum field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given sum associated with the tracing_map_elt
+ *
+ * Retrieve the value of the sum i associated with the specified
+ * tracing_map_elt instance. The index i is the index returned by the
+ * call to tracing_map_add_sum_field() when the tracing map was set
+ * up.
+ *
+ * Return: The sum associated with field i for elt.
+ */
+u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i)
+{
+ return (u64)atomic64_read(&elt->fields[i].sum);
+}
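
A hedged usage sketch: 'bytes_idx' is assumed to be an index previously
returned by tracing_map_add_sum_field(), and 'len' an event field value:

	tracing_map_update_sum(elt, bytes_idx, len);	/* add len to the sum  */
	total = tracing_map_read_sum(elt, bytes_idx);	/* read running total  */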
+
+int tracing_map_cmp_string(void *val_a, void *val_b)
+{
+ char *a = val_a;
+ char *b = val_b;
+
+ return strcmp(a, b);
+}
+
+int tracing_map_cmp_none(void *val_a, void *val_b)
+{
+ return 0;
+}
+
+static int tracing_map_cmp_atomic64(void *val_a, void *val_b)
+{
+ u64 a = atomic64_read((atomic64_t *)val_a);
+ u64 b = atomic64_read((atomic64_t *)val_b);
+
+ return (a > b) ? 1 : ((a < b) ? -1 : 0);
+}
+
+#define DEFINE_TRACING_MAP_CMP_FN(type) \
+static int tracing_map_cmp_##type(void *val_a, void *val_b) \
+{ \
+ type a = *(type *)val_a; \
+ type b = *(type *)val_b; \
+ \
+ return (a > b) ? 1 : ((a < b) ? -1 : 0); \
+}
+
+DEFINE_TRACING_MAP_CMP_FN(s64);
+DEFINE_TRACING_MAP_CMP_FN(u64);
+DEFINE_TRACING_MAP_CMP_FN(s32);
+DEFINE_TRACING_MAP_CMP_FN(u32);
+DEFINE_TRACING_MAP_CMP_FN(s16);
+DEFINE_TRACING_MAP_CMP_FN(u16);
+DEFINE_TRACING_MAP_CMP_FN(s8);
+DEFINE_TRACING_MAP_CMP_FN(u8);
+
+tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,
+ int field_is_signed)
+{
+ tracing_map_cmp_fn_t fn = tracing_map_cmp_none;
+
+ switch (field_size) {
+ case 8:
+ if (field_is_signed)
+ fn = tracing_map_cmp_s64;
+ else
+ fn = tracing_map_cmp_u64;
+ break;
+ case 4:
+ if (field_is_signed)
+ fn = tracing_map_cmp_s32;
+ else
+ fn = tracing_map_cmp_u32;
+ break;
+ case 2:
+ if (field_is_signed)
+ fn = tracing_map_cmp_s16;
+ else
+ fn = tracing_map_cmp_u16;
+ break;
+ case 1:
+ if (field_is_signed)
+ fn = tracing_map_cmp_s8;
+ else
+ fn = tracing_map_cmp_u8;
+ break;
+ }
+
+ return fn;
+}
+
+static int tracing_map_add_field(struct tracing_map *map,
+ tracing_map_cmp_fn_t cmp_fn)
+{
+ int ret = -EINVAL;
+
+ if (map->n_fields < TRACING_MAP_FIELDS_MAX) {
+ ret = map->n_fields;
+ map->fields[map->n_fields++].cmp_fn = cmp_fn;
+ }
+
+ return ret;
+}
+
+/**
+ * tracing_map_add_sum_field - Add a field describing a tracing_map sum
+ * @map: The tracing_map
+ *
+ * Add a sum field to the map and return the index identifying it in
+ * the map and associated tracing_map_elts. This is the index used,
+ * for instance, to update a sum for a particular tracing_map_elt via
+ * tracing_map_update_sum() or to read it via tracing_map_read_sum().
+ *
+ * Return: The index identifying the field in the map and associated
+ * tracing_map_elts, or -EINVAL on error.
+ */
+int tracing_map_add_sum_field(struct tracing_map *map)
+{
+ return tracing_map_add_field(map, tracing_map_cmp_atomic64);
+}
+
+/**
+ * tracing_map_add_key_field - Add a field describing a tracing_map key
+ * @map: The tracing_map
+ * @offset: The offset within the key
+ * @cmp_fn: The comparison function that will be used to sort on the key
+ *
+ * Let the map know there is a key and, if it's used as a sort key,
+ * which cmp_fn to use when sorting on it.
+ *
+ * A key can be a subset of a compound key; for that purpose, the
+ * offset param is used to describe where within the compound key
+ * the key referenced by this key field resides.
+ *
+ * Return: The index identifying the field in the map and associated
+ * tracing_map_elts, or -EINVAL on error.
+ */
+int tracing_map_add_key_field(struct tracing_map *map,
+ unsigned int offset,
+ tracing_map_cmp_fn_t cmp_fn)
+
+{
+ int idx = tracing_map_add_field(map, cmp_fn);
+
+ if (idx < 0)
+ return idx;
+
+ map->fields[idx].offset = offset;
+
+ map->key_idx[map->n_keys++] = idx;
+
+ return idx;
+}
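
For example, a compound key made of two 32-bit signed fields could be described
as below; the key layout and field names are hypothetical:

	/* key layout (hypothetical): bytes 0..3 = pid, bytes 4..7 = prio */
	pid_idx  = tracing_map_add_key_field(map, 0, tracing_map_cmp_num(4, 1));
	prio_idx = tracing_map_add_key_field(map, 4, tracing_map_cmp_num(4, 1));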
+
+void tracing_map_array_clear(struct tracing_map_array *a)
+{
+ unsigned int i;
+
+ if (!a->pages)
+ return;
+
+ for (i = 0; i < a->n_pages; i++)
+ memset(a->pages[i], 0, PAGE_SIZE);
+}
+
+void tracing_map_array_free(struct tracing_map_array *a)
+{
+ unsigned int i;
+
+ if (!a)
+ return;
+
+ if (!a->pages) {
+ kfree(a);
+ return;
+ }
+
+ for (i = 0; i < a->n_pages; i++) {
+ if (!a->pages[i])
+ break;
+ free_page((unsigned long)a->pages[i]);
+ }
+}
+
+struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts,
+ unsigned int entry_size)
+{
+ struct tracing_map_array *a;
+ unsigned int i;
+
+ a = kzalloc(sizeof(*a), GFP_KERNEL);
+ if (!a)
+ return NULL;
+
+ a->entry_size_shift = fls(roundup_pow_of_two(entry_size) - 1);
+ a->entries_per_page = PAGE_SIZE / (1 << a->entry_size_shift);
+ a->n_pages = n_elts / a->entries_per_page;
+ if (!a->n_pages)
+ a->n_pages = 1;
+ a->entry_shift = fls(a->entries_per_page) - 1;
+ a->entry_mask = (1 << a->entry_shift) - 1;
+
+ a->pages = kcalloc(a->n_pages, sizeof(void *), GFP_KERNEL);
+ if (!a->pages)
+ goto free;
+
+ for (i = 0; i < a->n_pages; i++) {
+ a->pages[i] = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!a->pages[i])
+ goto free;
+ }
+ out:
+ return a;
+ free:
+ tracing_map_array_free(a);
+ a = NULL;
+
+ goto out;
+}
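
As a worked example of the sizing above (assuming 4096-byte pages), an
entry_size of 12 bytes produces:

	entry_size_shift = fls(roundup_pow_of_two(12) - 1);	/* fls(15) = 4  */
	entries_per_page = 4096 / (1 << entry_size_shift);	/* 256 entries  */
	entry_shift	 = fls(entries_per_page) - 1;		/* 8            */
	entry_mask	 = (1 << entry_shift) - 1;		/* 0xff         */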
+
+static void tracing_map_elt_clear(struct tracing_map_elt *elt)
+{
+ unsigned i;
+
+ for (i = 0; i < elt->map->n_fields; i++)
+ if (elt->fields[i].cmp_fn == tracing_map_cmp_atomic64)
+ atomic64_set(&elt->fields[i].sum, 0);
+
+ if (elt->map->ops && elt->map->ops->elt_clear)
+ elt->map->ops->elt_clear(elt);
+}
+
+static void tracing_map_elt_init_fields(struct tracing_map_elt *elt)
+{
+ unsigned int i;
+
+ tracing_map_elt_clear(elt);
+
+ for (i = 0; i < elt->map->n_fields; i++) {
+ elt->fields[i].cmp_fn = elt->map->fields[i].cmp_fn;
+
+ if (elt->fields[i].cmp_fn != tracing_map_cmp_atomic64)
+ elt->fields[i].offset = elt->map->fields[i].offset;
+ }
+}
+
+static void tracing_map_elt_free(struct tracing_map_elt *elt)
+{
+ if (!elt)
+ return;
+
+ if (elt->map->ops && elt->map->ops->elt_free)
+ elt->map->ops->elt_free(elt);
+ kfree(elt->fields);
+ kfree(elt->key);
+ kfree(elt);
+}
+
+static struct tracing_map_elt *tracing_map_elt_alloc(struct tracing_map *map)
+{
+ struct tracing_map_elt *elt;
+ int err = 0;
+
+ elt = kzalloc(sizeof(*elt), GFP_KERNEL);
+ if (!elt)
+ return ERR_PTR(-ENOMEM);
+
+ elt->map = map;
+
+ elt->key = kzalloc(map->key_size, GFP_KERNEL);
+ if (!elt->key) {
+ err = -ENOMEM;
+ goto free;
+ }
+
+ elt->fields = kcalloc(map->n_fields, sizeof(*elt->fields), GFP_KERNEL);
+ if (!elt->fields) {
+ err = -ENOMEM;
+ goto free;
+ }
+
+ tracing_map_elt_init_fields(elt);
+
+ if (map->ops && map->ops->elt_alloc) {
+ err = map->ops->elt_alloc(elt);
+ if (err)
+ goto free;
+ }
+ return elt;
+ free:
+ tracing_map_elt_free(elt);
+
+ return ERR_PTR(err);
+}
+
+static struct tracing_map_elt *get_free_elt(struct tracing_map *map)
+{
+ struct tracing_map_elt *elt = NULL;
+ int idx;
+
+ idx = atomic_inc_return(&map->next_elt);
+ if (idx < map->max_elts) {
+ elt = *(TRACING_MAP_ELT(map->elts, idx));
+ if (map->ops && map->ops->elt_init)
+ map->ops->elt_init(elt);
+ }
+
+ return elt;
+}
+
+static void tracing_map_free_elts(struct tracing_map *map)
+{
+ unsigned int i;
+
+ if (!map->elts)
+ return;
+
+ for (i = 0; i < map->max_elts; i++) {
+ tracing_map_elt_free(*(TRACING_MAP_ELT(map->elts, i)));
+ *(TRACING_MAP_ELT(map->elts, i)) = NULL;
+ }
+
+ tracing_map_array_free(map->elts);
+ map->elts = NULL;
+}
+
+static int tracing_map_alloc_elts(struct tracing_map *map)
+{
+ unsigned int i;
+
+ map->elts = tracing_map_array_alloc(map->max_elts,
+ sizeof(struct tracing_map_elt *));
+ if (!map->elts)
+ return -ENOMEM;
+
+ for (i = 0; i < map->max_elts; i++) {
+ *(TRACING_MAP_ELT(map->elts, i)) = tracing_map_elt_alloc(map);
+ if (IS_ERR(*(TRACING_MAP_ELT(map->elts, i)))) {
+ *(TRACING_MAP_ELT(map->elts, i)) = NULL;
+ tracing_map_free_elts(map);
+
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+static inline bool keys_match(void *key, void *test_key, unsigned key_size)
+{
+ bool match = true;
+
+ if (memcmp(key, test_key, key_size))
+ match = false;
+
+ return match;
+}
+
+static inline struct tracing_map_elt *
+__tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
+{
+ u32 idx, key_hash, test_key;
+ struct tracing_map_entry *entry;
+
+ key_hash = jhash(key, map->key_size, 0);
+ if (key_hash == 0)
+ key_hash = 1;
+ idx = key_hash >> (32 - (map->map_bits + 1));
+
+ while (1) {
+ idx &= (map->map_size - 1);
+ entry = TRACING_MAP_ENTRY(map->map, idx);
+ test_key = entry->key;
+
+ if (test_key && test_key == key_hash && entry->val &&
+ keys_match(key, entry->val->key, map->key_size)) {
+ atomic64_inc(&map->hits);
+ return entry->val;
+ }
+
+ if (!test_key) {
+ if (lookup_only)
+ break;
+
+ if (!cmpxchg(&entry->key, 0, key_hash)) {
+ struct tracing_map_elt *elt;
+
+ elt = get_free_elt(map);
+ if (!elt) {
+ atomic64_inc(&map->drops);
+ entry->key = 0;
+ break;
+ }
+
+ memcpy(elt->key, key, map->key_size);
+ entry->val = elt;
+ atomic64_inc(&map->hits);
+
+ return entry->val;
+ }
+ }
+
+ idx++;
+ }
+
+ return NULL;
+}
+
+/**
+ * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
+ * @map: The tracing_map to insert into
+ * @key: The key to insert
+ *
+ * Inserts a key into a tracing_map and creates and returns a new
+ * tracing_map_elt for it, or if the key has already been inserted by
+ * a previous call, returns the tracing_map_elt already associated
+ * with it. When the map was created, the number of elements to be
+ * allocated for the map was specified (internally maintained as
+ * 'max_elts' in struct tracing_map), and that number of
+ * tracing_map_elts was created by tracing_map_init(). This is the
+ * pre-allocated pool of tracing_map_elts that tracing_map_insert()
+ * will allocate from when adding new keys. Once that pool is
+ * exhausted, tracing_map_insert() is useless and will return NULL to
+ * signal that state. There are two user-visible tracing_map
+ * variables, 'hits' and 'drops', which are updated by this function.
+ * Every time an element is either successfully inserted or retrieved,
+ * the 'hits' value is incremented. Every time an element insertion
+ * fails, the 'drops' value is incremented.
+ *
+ * This is a lock-free tracing map insertion function implementing a
+ * modified form of Cliff Click's basic insertion algorithm. It
+ * requires the table size be a power of two. To prevent any
+ * possibility of an infinite loop we always make the internal table
+ * size double the size of the requested table size (max_elts * 2).
+ * Likewise, we never reuse a slot or resize or delete elements - when
+ * we've reached max_elts entries, we simply return NULL once we've
+ * run out of entries. Readers can at any point in time traverse the
+ * tracing map and safely access the key/val pairs.
+ *
+ * Return: the tracing_map_elt pointer val associated with the key.
+ * If this was a newly inserted key, the val will be a newly allocated
+ * and associated tracing_map_elt pointer val. If the key wasn't
+ * found and the pool of tracing_map_elts has been exhausted, NULL is
+ * returned and no further insertions will succeed.
+ */
+struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
+{
+ return __tracing_map_insert(map, key, false);
+}
+
+/**
+ * tracing_map_lookup - Retrieve val from a tracing_map
+ * @map: The tracing_map to perform the lookup on
+ * @key: The key to look up
+ *
+ * Looks up key in tracing_map and if found returns the matching
+ * tracing_map_elt. This is a lock-free lookup; see
+ * tracing_map_insert() for details on tracing_map and how it works.
+ * There is one user-visible tracing_map variable, 'hits', which is
+ * updated by this function: every time an element is successfully
+ * retrieved, the 'hits' value is incremented. The 'drops' value is
+ * never updated by this function.
+ *
+ * Return: the tracing_map_elt pointer val associated with the key.
+ * If the key wasn't found, NULL is returned.
+ */
+struct tracing_map_elt *tracing_map_lookup(struct tracing_map *map, void *key)
+{
+ return __tracing_map_insert(map, key, true);
+}
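
Taken together, a typical insert-or-update path can be sketched as below;
'key' is assumed to have been packed by the caller and 'hitcount_idx' is a
hypothetical sum index obtained from tracing_map_add_sum_field():

	elt = tracing_map_insert(map, key);
	if (!elt)
		return;				/* pool exhausted, counted in 'drops' */

	tracing_map_update_sum(elt, hitcount_idx, 1);	/* bump the per-key count */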
+
+/**
+ * tracing_map_destroy - Destroy a tracing_map
+ * @map: The tracing_map to destroy
+ *
+ * Frees a tracing_map along with its associated array of
+ * tracing_map_elts.
+ *
+ * Callers should make sure there are no readers or writers actively
+ * reading or inserting into the map before calling this.
+ */
+void tracing_map_destroy(struct tracing_map *map)
+{
+ if (!map)
+ return;
+
+ tracing_map_free_elts(map);
+
+ tracing_map_array_free(map->map);
+ kfree(map);
+}
+
+/**
+ * tracing_map_clear - Clear a tracing_map
+ * @map: The tracing_map to clear
+ *
+ * Resets the tracing map to a cleared or initial state. The
+ * tracing_map_elts are all cleared, and the array of struct
+ * tracing_map_entry is reset to an initialized state.
+ *
+ * Callers should make sure there are no writers actively inserting
+ * into the map before calling this.
+ */
+void tracing_map_clear(struct tracing_map *map)
+{
+ unsigned int i;
+
+ atomic_set(&map->next_elt, -1);
+ atomic64_set(&map->hits, 0);
+ atomic64_set(&map->drops, 0);
+
+ tracing_map_array_clear(map->map);
+
+ for (i = 0; i < map->max_elts; i++)
+ tracing_map_elt_clear(*(TRACING_MAP_ELT(map->elts, i)));
+}
+
+static void set_sort_key(struct tracing_map *map,
+ struct tracing_map_sort_key *sort_key)
+{
+ map->sort_key = *sort_key;
+}
+
+/**
+ * tracing_map_create - Create a lock-free map and element pool
+ * @map_bits: The size of the map (2 ** map_bits)
+ * @key_size: The size of the key for the map in bytes
+ * @ops: Optional client-defined tracing_map_ops instance
+ * @private_data: Client data associated with the map
+ *
+ * Creates and sets up a map to contain 2 ** map_bits number of
+ * elements (internally maintained as 'max_elts' in struct
+ * tracing_map). Before using, map fields should be added to the map
+ * with tracing_map_add_sum_field() and tracing_map_add_key_field().
+ * tracing_map_init() should then be called to allocate the array of
+ * tracing_map_elts, in order to avoid allocating anything in the map
+ * insertion path. The user-specified map size reflects the maximum
+ * number of elements that can be contained in the table requested by
+ * the user - internally we double that in order to keep the table
+ * sparse and keep collisions manageable.
+ *
+ * A tracing_map is a special-purpose map designed to aggregate or
+ * 'sum' one or more values associated with a specific object of type
+ * tracing_map_elt, which is attached by the map to a given key.
+ *
+ * tracing_map_create() sets up the map itself, and provides
+ * operations for inserting tracing_map_elts, but doesn't allocate the
+ * tracing_map_elts themselves, or provide a means for describing the
+ * keys or sums associated with the tracing_map_elts. All
+ * tracing_map_elts for a given map have the same set of sums and
+ * keys, which are defined by the client using the functions
+ * tracing_map_add_key_field() and tracing_map_add_sum_field(). Once
+ * the fields are defined, the pool of elements allocated for the map
+ * can be created, which occurs when the client code calls
+ * tracing_map_init().
+ *
+ * When tracing_map_init() returns, tracing_map_elt elements can be
+ * inserted into the map using tracing_map_insert(). When called,
+ * tracing_map_insert() grabs a free tracing_map_elt from the pool, or
+ * finds an existing match in the map and in either case returns it.
+ * The client can then use tracing_map_update_sum() and
+ * tracing_map_read_sum() to update or read a given sum field for the
+ * tracing_map_elt.
+ *
+ * The client can at any point retrieve and traverse the current set
+ * of inserted tracing_map_elts in a tracing_map, via
+ * tracing_map_sort_entries(). Sorting can be done on any field,
+ * including keys.
+ *
+ * See tracing_map.h for a description of tracing_map_ops.
+ *
+ * Return: the tracing_map pointer if successful, ERR_PTR if not.
+ */
+struct tracing_map *tracing_map_create(unsigned int map_bits,
+ unsigned int key_size,
+ const struct tracing_map_ops *ops,
+ void *private_data)
+{
+ struct tracing_map *map;
+ unsigned int i;
+
+ if (map_bits < TRACING_MAP_BITS_MIN ||
+ map_bits > TRACING_MAP_BITS_MAX)
+ return ERR_PTR(-EINVAL);
+
+ map = kzalloc(sizeof(*map), GFP_KERNEL);
+ if (!map)
+ return ERR_PTR(-ENOMEM);
+
+ map->map_bits = map_bits;
+ map->max_elts = (1 << map_bits);
+ atomic_set(&map->next_elt, -1);
+
+ map->map_size = (1 << (map_bits + 1));
+ map->ops = ops;
+
+ map->private_data = private_data;
+
+ map->map = tracing_map_array_alloc(map->map_size,
+ sizeof(struct tracing_map_entry));
+ if (!map->map)
+ goto free;
+
+ map->key_size = key_size;
+ for (i = 0; i < TRACING_MAP_KEYS_MAX; i++)
+ map->key_idx[i] = -1;
+ out:
+ return map;
+ free:
+ tracing_map_destroy(map);
+ map = ERR_PTR(-ENOMEM);
+
+ goto out;
+}
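
A minimal setup sketch, assuming a single u64 key at offset 0 and one sum per
element (error handling abbreviated, index names hypothetical):

	struct tracing_map *map;
	int key_idx, sum_idx, ret;

	map = tracing_map_create(TRACING_MAP_BITS_DEFAULT, sizeof(u64),
				 NULL, NULL);
	if (IS_ERR(map))
		return PTR_ERR(map);

	key_idx = tracing_map_add_key_field(map, 0,
					    tracing_map_cmp_num(sizeof(u64), 0));
	sum_idx = tracing_map_add_sum_field(map);

	ret = tracing_map_init(map);
	if (ret) {
		tracing_map_destroy(map);
		return ret;
	}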
+
+/**
+ * tracing_map_init - Allocate and clear a map's tracing_map_elts
+ * @map: The tracing_map to initialize
+ *
+ * Allocates and clears a pool of tracing_map_elts equal to the
+ * user-specified size of 2 ** map_bits (internally maintained as
+ * 'max_elts' in struct tracing_map). Before using, the map fields
+ * should be added to the map with tracing_map_add_sum_field() and
+ * tracing_map_add_key_field(). tracing_map_init() should then be
+ * called to allocate the array of tracing_map_elts, in order to avoid
+ * allocating anything in the map insertion path. The user-specified
+ * map size reflects the max number of elements requested by the user
+ * - internally we double that in order to keep the table sparse and
+ * keep collisions manageable.
+ *
+ * See tracing_map.h for a description of tracing_map_ops.
+ *
+ * Return: 0 if successful, a negative error value otherwise.
+ */
+int tracing_map_init(struct tracing_map *map)
+{
+ int err;
+
+ if (map->n_fields < 2)
+ return -EINVAL; /* need at least 1 key and 1 val */
+
+ err = tracing_map_alloc_elts(map);
+ if (err)
+ return err;
+
+ tracing_map_clear(map);
+
+ return err;
+}
+
+static int cmp_entries_dup(const struct tracing_map_sort_entry **a,
+ const struct tracing_map_sort_entry **b)
+{
+ int ret = 0;
+
+ if (memcmp((*a)->key, (*b)->key, (*a)->elt->map->key_size))
+ ret = 1;
+
+ return ret;
+}
+
+static int cmp_entries_sum(const struct tracing_map_sort_entry **a,
+ const struct tracing_map_sort_entry **b)
+{
+ const struct tracing_map_elt *elt_a, *elt_b;
+ struct tracing_map_sort_key *sort_key;
+ struct tracing_map_field *field;
+ tracing_map_cmp_fn_t cmp_fn;
+ void *val_a, *val_b;
+ int ret = 0;
+
+ elt_a = (*a)->elt;
+ elt_b = (*b)->elt;
+
+ sort_key = &elt_a->map->sort_key;
+
+ field = &elt_a->fields[sort_key->field_idx];
+ cmp_fn = field->cmp_fn;
+
+ val_a = &elt_a->fields[sort_key->field_idx].sum;
+ val_b = &elt_b->fields[sort_key->field_idx].sum;
+
+ ret = cmp_fn(val_a, val_b);
+ if (sort_key->descending)
+ ret = -ret;
+
+ return ret;
+}
+
+static int cmp_entries_key(const struct tracing_map_sort_entry **a,
+ const struct tracing_map_sort_entry **b)
+{
+ const struct tracing_map_elt *elt_a, *elt_b;
+ struct tracing_map_sort_key *sort_key;
+ struct tracing_map_field *field;
+ tracing_map_cmp_fn_t cmp_fn;
+ void *val_a, *val_b;
+ int ret = 0;
+
+ elt_a = (*a)->elt;
+ elt_b = (*b)->elt;
+
+ sort_key = &elt_a->map->sort_key;
+
+ field = &elt_a->fields[sort_key->field_idx];
+
+ cmp_fn = field->cmp_fn;
+
+ val_a = elt_a->key + field->offset;
+ val_b = elt_b->key + field->offset;
+
+ ret = cmp_fn(val_a, val_b);
+ if (sort_key->descending)
+ ret = -ret;
+
+ return ret;
+}
+
+static void destroy_sort_entry(struct tracing_map_sort_entry *entry)
+{
+ if (!entry)
+ return;
+
+ if (entry->elt_copied)
+ tracing_map_elt_free(entry->elt);
+
+ kfree(entry);
+}
+
+/**
+ * tracing_map_destroy_sort_entries - Destroy an array of sort entries
+ * @entries: The entries to destroy
+ * @n_entries: The number of entries in the array
+ *
+ * Destroy the elements returned by a tracing_map_sort_entries() call.
+ */
+void tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries,
+ unsigned int n_entries)
+{
+ unsigned int i;
+
+ for (i = 0; i < n_entries; i++)
+ destroy_sort_entry(entries[i]);
+
+ vfree(entries);
+}
+
+static struct tracing_map_sort_entry *
+create_sort_entry(void *key, struct tracing_map_elt *elt)
+{
+ struct tracing_map_sort_entry *sort_entry;
+
+ sort_entry = kzalloc(sizeof(*sort_entry), GFP_KERNEL);
+ if (!sort_entry)
+ return NULL;
+
+ sort_entry->key = key;
+ sort_entry->elt = elt;
+
+ return sort_entry;
+}
+
+static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt)
+{
+ struct tracing_map_elt *dup_elt;
+ unsigned int i;
+
+ dup_elt = tracing_map_elt_alloc(elt->map);
+ if (IS_ERR(dup_elt))
+ return NULL;
+
+ if (elt->map->ops && elt->map->ops->elt_copy)
+ elt->map->ops->elt_copy(dup_elt, elt);
+
+ dup_elt->private_data = elt->private_data;
+ memcpy(dup_elt->key, elt->key, elt->map->key_size);
+
+ for (i = 0; i < elt->map->n_fields; i++) {
+ atomic64_set(&dup_elt->fields[i].sum,
+ atomic64_read(&elt->fields[i].sum));
+ dup_elt->fields[i].cmp_fn = elt->fields[i].cmp_fn;
+ }
+
+ return dup_elt;
+}
+
+static int merge_dup(struct tracing_map_sort_entry **sort_entries,
+ unsigned int target, unsigned int dup)
+{
+ struct tracing_map_elt *target_elt, *elt;
+ bool first_dup = (target - dup) == 1;
+ int i;
+
+ if (first_dup) {
+ elt = sort_entries[target]->elt;
+ target_elt = copy_elt(elt);
+ if (!target_elt)
+ return -ENOMEM;
+ sort_entries[target]->elt = target_elt;
+ sort_entries[target]->elt_copied = true;
+ } else
+ target_elt = sort_entries[target]->elt;
+
+ elt = sort_entries[dup]->elt;
+
+ for (i = 0; i < elt->map->n_fields; i++)
+ atomic64_add(atomic64_read(&elt->fields[i].sum),
+ &target_elt->fields[i].sum);
+
+ sort_entries[dup]->dup = true;
+
+ return 0;
+}
+
+static int merge_dups(struct tracing_map_sort_entry **sort_entries,
+ int n_entries, unsigned int key_size)
+{
+ unsigned int dups = 0, total_dups = 0;
+ int err, i, j;
+ void *key;
+
+ if (n_entries < 2)
+ return total_dups;
+
+ sort(sort_entries, n_entries, sizeof(struct tracing_map_sort_entry *),
+ (int (*)(const void *, const void *))cmp_entries_dup, NULL);
+
+ key = sort_entries[0]->key;
+ for (i = 1; i < n_entries; i++) {
+ if (!memcmp(sort_entries[i]->key, key, key_size)) {
+ dups++; total_dups++;
+ err = merge_dup(sort_entries, i - dups, i);
+ if (err)
+ return err;
+ continue;
+ }
+ key = sort_entries[i]->key;
+ dups = 0;
+ }
+
+ if (!total_dups)
+ return total_dups;
+
+ for (i = 0, j = 0; i < n_entries; i++) {
+ if (!sort_entries[i]->dup) {
+ sort_entries[j] = sort_entries[i];
+ if (j++ != i)
+ sort_entries[i] = NULL;
+ } else {
+ destroy_sort_entry(sort_entries[i]);
+ sort_entries[i] = NULL;
+ }
+ }
+
+ return total_dups;
+}
+
+static bool is_key(struct tracing_map *map, unsigned int field_idx)
+{
+ unsigned int i;
+
+ for (i = 0; i < map->n_keys; i++)
+ if (map->key_idx[i] == field_idx)
+ return true;
+ return false;
+}
+
+static void sort_secondary(struct tracing_map *map,
+ const struct tracing_map_sort_entry **entries,
+ unsigned int n_entries,
+ struct tracing_map_sort_key *primary_key,
+ struct tracing_map_sort_key *secondary_key)
+{
+ int (*primary_fn)(const struct tracing_map_sort_entry **,
+ const struct tracing_map_sort_entry **);
+ int (*secondary_fn)(const struct tracing_map_sort_entry **,
+ const struct tracing_map_sort_entry **);
+ unsigned i, start = 0, n_sub = 1;
+
+ if (is_key(map, primary_key->field_idx))
+ primary_fn = cmp_entries_key;
+ else
+ primary_fn = cmp_entries_sum;
+
+ if (is_key(map, secondary_key->field_idx))
+ secondary_fn = cmp_entries_key;
+ else
+ secondary_fn = cmp_entries_sum;
+
+ for (i = 0; i < n_entries - 1; i++) {
+ const struct tracing_map_sort_entry **a = &entries[i];
+ const struct tracing_map_sort_entry **b = &entries[i + 1];
+
+ if (primary_fn(a, b) == 0) {
+ n_sub++;
+ if (i < n_entries - 2)
+ continue;
+ }
+
+ if (n_sub < 2) {
+ start = i + 1;
+ n_sub = 1;
+ continue;
+ }
+
+ set_sort_key(map, secondary_key);
+ sort(&entries[start], n_sub,
+ sizeof(struct tracing_map_sort_entry *),
+ (int (*)(const void *, const void *))secondary_fn, NULL);
+ set_sort_key(map, primary_key);
+
+ start = i + 1;
+ n_sub = 1;
+ }
+}
+
+/**
+ * tracing_map_sort_entries - Sort the current set of tracing_map_elts in a map
+ * @map: The tracing_map
+ * @sort_keys: The sort keys to use for sorting (primary key first)
+ * @n_sort_keys: The number of sort keys in the sort_keys array
+ * @sort_entries: outval: pointer to allocated and sorted array of entries
+ *
+ * tracing_map_sort_entries() sorts the current set of entries in the
+ * map and returns the list of tracing_map_sort_entries containing
+ * them to the client in the sort_entries param. The client can
+ * access the struct tracing_map_elt element of interest directly as
+ * the 'elt' field of a returned struct tracing_map_sort_entry object.
+ *
+ * The sort_key has only two fields: field_idx and descending. 'field_idx' refers
+ * to the index of the field added via tracing_map_add_sum_field() or
+ * tracing_map_add_key_field() when the tracing_map was initialized.
+ * 'descending' is a flag that if set reverses the sort order, which
+ * by default is ascending.
+ *
+ * The client should not hold on to the returned array but should use
+ * it and call tracing_map_destroy_sort_entries() when done.
+ *
+ * Return: the number of sort_entries in the struct tracing_map_sort_entry
+ * array, negative on error
+ */
+int tracing_map_sort_entries(struct tracing_map *map,
+ struct tracing_map_sort_key *sort_keys,
+ unsigned int n_sort_keys,
+ struct tracing_map_sort_entry ***sort_entries)
+{
+ int (*cmp_entries_fn)(const struct tracing_map_sort_entry **,
+ const struct tracing_map_sort_entry **);
+ struct tracing_map_sort_entry *sort_entry, **entries;
+ int i, n_entries, ret;
+
+ entries = vmalloc(map->max_elts * sizeof(sort_entry));
+ if (!entries)
+ return -ENOMEM;
+
+ for (i = 0, n_entries = 0; i < map->map_size; i++) {
+ struct tracing_map_entry *entry;
+
+ entry = TRACING_MAP_ENTRY(map->map, i);
+
+ if (!entry->key || !entry->val)
+ continue;
+
+ entries[n_entries] = create_sort_entry(entry->val->key,
+ entry->val);
+ if (!entries[n_entries++]) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ }
+
+ if (n_entries == 0) {
+ ret = 0;
+ goto free;
+ }
+
+ if (n_entries == 1) {
+ *sort_entries = entries;
+ return 1;
+ }
+
+ ret = merge_dups(entries, n_entries, map->key_size);
+ if (ret < 0)
+ goto free;
+ n_entries -= ret;
+
+ if (is_key(map, sort_keys[0].field_idx))
+ cmp_entries_fn = cmp_entries_key;
+ else
+ cmp_entries_fn = cmp_entries_sum;
+
+ set_sort_key(map, &sort_keys[0]);
+
+ sort(entries, n_entries, sizeof(struct tracing_map_sort_entry *),
+ (int (*)(const void *, const void *))cmp_entries_fn, NULL);
+
+ if (n_sort_keys > 1)
+ sort_secondary(map,
+ (const struct tracing_map_sort_entry **)entries,
+ n_entries,
+ &sort_keys[0],
+ &sort_keys[1]);
+
+ *sort_entries = entries;
+
+ return n_entries;
+ free:
+ tracing_map_destroy_sort_entries(entries, n_entries);
+
+ return ret;
+}
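
A hedged usage sketch, sorting descending on a sum field ('sum_idx' is assumed
to be an index returned by tracing_map_add_sum_field(), and consume() a
hypothetical consumer of each element):

	struct tracing_map_sort_entry **entries;
	struct tracing_map_sort_key sort_key = { .field_idx = sum_idx,
						 .descending = true };
	int i, n;

	n = tracing_map_sort_entries(map, &sort_key, 1, &entries);
	if (n <= 0)
		return n;

	for (i = 0; i < n; i++)
		consume(entries[i]->elt);

	tracing_map_destroy_sort_entries(entries, n);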
diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
new file mode 100644
index 000000000000..618838f5f30a
--- /dev/null
+++ b/kernel/trace/tracing_map.h
@@ -0,0 +1,283 @@
+#ifndef __TRACING_MAP_H
+#define __TRACING_MAP_H
+
+#define TRACING_MAP_BITS_DEFAULT 11
+#define TRACING_MAP_BITS_MAX 17
+#define TRACING_MAP_BITS_MIN 7
+
+#define TRACING_MAP_KEYS_MAX 2
+#define TRACING_MAP_VALS_MAX 3
+#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \
+ TRACING_MAP_VALS_MAX)
+#define TRACING_MAP_SORT_KEYS_MAX 2
+
+typedef int (*tracing_map_cmp_fn_t) (void *val_a, void *val_b);
+
+/*
+ * This is an overview of the tracing_map data structures and how they
+ * relate to the tracing_map API. The details of the algorithms
+ * aren't discussed here - this is just a general overview of the data
+ * structures and how they interact with the API.
+ *
+ * The central data structure of the tracing_map is an initially
+ * zeroed array of struct tracing_map_entry (stored in the map field
+ * of struct tracing_map). tracing_map_entry is a very simple data
+ * structure containing only two fields: a 32-bit unsigned 'key'
+ * variable and a pointer named 'val'. This array of struct
+ * tracing_map_entry is essentially a hash table which will be
+ * modified by a single function, tracing_map_insert(), but which can
+ * be traversed and read by a user at any time (though the user does
+ * this indirectly via an array of tracing_map_sort_entry - see the
+ * explanation of that data structure in the discussion of the
+ * sorting-related data structures below).
+ *
+ * The central function of the tracing_map API is
+ * tracing_map_insert(). tracing_map_insert() hashes the
+ * arbitrarily-sized key passed into it into a 32-bit unsigned key.
+ * It then uses this key, truncated to the array size, as an index
+ * into the array of tracing_map_entries. If the value of the 'key'
+ * field of the tracing_map_entry found at that location is 0, then
+ * that entry is considered to be free and can be claimed, by
+ * replacing the 0 in the 'key' field of the tracing_map_entry with
+ * the new 32-bit hashed key. Once claimed, that tracing_map_entry's
+ * 'val' field is then used to store a unique element which will be
+ * forever associated with that 32-bit hashed key in the
+ * tracing_map_entry.
+ *
+ * That unique element now in the tracing_map_entry's 'val' field is
+ * an instance of tracing_map_elt, where 'elt' in the latter part of
+ * that variable name is short for 'element'. The purpose of a
+ * tracing_map_elt is to hold values specific to the particular
+ * 32-bit hashed key it's associated with - things such as the unique
+ * set of aggregated sums associated with the 32-bit hashed key, along
+ * with a copy of the full key associated with the entry, which was
+ * used to produce the 32-bit hashed key in the first place.
+ *
+ * When tracing_map_create() is called to create the tracing map, the
+ * user specifies (indirectly via the map_bits param, the details are
+ * unimportant for this discussion) the maximum number of elements
+ * that the map can hold (stored in the max_elts field of struct
+ * tracing_map). This is the maximum possible number of
+ * tracing_map_entries in the tracing_map_entry array which can be
+ * 'claimed' as described in the above discussion, and therefore is
+ * also the maximum number of tracing_map_elts that can be associated
+ * with the tracing_map_entry array in the tracing_map. Because of
+ * the way the insertion algorithm works, the size of the allocated
+ * tracing_map_entry array is always twice the maximum number of
+ * elements (2 * max_elts). This value is stored in the map_size
+ * field of struct tracing_map.
+ *
+ * Because tracing_map_insert() needs to work from any context,
+ * including from within the memory allocation functions themselves,
+ * both the tracing_map_entry array and a pool of max_elts
+ * tracing_map_elts are pre-allocated before any call is made to
+ * tracing_map_insert().
+ *
+ * The tracing_map_entry array is allocated as a single block by
+ * tracing_map_create().
+ *
+ * Because the tracing_map_elts are much larger objects and can't
+ * generally be allocated together as a single large array without
+ * failure, they're allocated individually, by tracing_map_init().
+ *
+ * The pool of tracing_map_elts are allocated by tracing_map_init()
+ * rather than by tracing_map_create() because at the time
+ * tracing_map_create() is called, there isn't enough information to
+ * create the tracing_map_elts. Specifically, the user first needs to
+ * tell the tracing_map implementation how many fields the
+ * tracing_map_elts contain, and which types of fields they are (key
+ * or sum). The user does this via the tracing_map_add_sum_field()
+ * and tracing_map_add_key_field() functions, following which the user
+ * calls tracing_map_init() to finish up the tracing map setup. The
+ * array holding the pointers which make up the pre-allocated pool of
+ * tracing_map_elts is allocated as a single block and is stored in
+ * the elts field of struct tracing_map.
+ *
+ * There is also a set of structures used for sorting that might
+ * benefit from some minimal explanation.
+ *
+ * struct tracing_map_sort_key is used to drive the sort at any given
+ * time. By 'any given time' we mean that a different
+ * tracing_map_sort_key will be used at different times depending on
+ * whether the sort currently being performed is a primary or a
+ * secondary sort.
+ *
+ * The sort key is very simple, consisting of the field index of the
+ * tracing_map_elt field to sort on (which the user saved when adding
+ * the field), and whether the sort should be done in an ascending or
+ * descending order.
+ *
+ * For the convenience of the sorting code, a tracing_map_sort_entry
+ * is created for each tracing_map_elt, again individually allocated
+ * to avoid failures that might be expected if allocated as a single
+ * large array of struct tracing_map_sort_entry.
+ * tracing_map_sort_entry instances are the objects expected by the
+ * various internal sorting functions, and are also what the user
+ * ultimately receives after calling tracing_map_sort_entries().
+ * Because it doesn't make sense for users to access an unordered and
+ * sparsely populated tracing_map directly, the
+ * tracing_map_sort_entries() function is provided so that users can
+ * retrieve a sorted list of all existing elements. In addition to
+ * the associated tracing_map_elt 'elt' field contained within the
+ * tracing_map_sort_entry, which is the object of interest to the
+ * user, tracing_map_sort_entry objects contain a number of additional
+ * fields which are used for caching and internal purposes and can
+ * safely be ignored.
+*/
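
To make the read side concrete, a hedged sketch of walking the sparse entry
array (essentially what tracing_map_sort_entries() does before sorting;
consume() is a hypothetical consumer):

	for (i = 0; i < map->map_size; i++) {
		struct tracing_map_entry *entry = TRACING_MAP_ENTRY(map->map, i);

		if (!entry->key || !entry->val)
			continue;	/* unclaimed, or claimed but not yet populated */

		consume(entry->val);	/* entry->val is the tracing_map_elt */
	}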
+
+struct tracing_map_field {
+ tracing_map_cmp_fn_t cmp_fn;
+ union {
+ atomic64_t sum;
+ unsigned int offset;
+ };
+};
+
+struct tracing_map_elt {
+ struct tracing_map *map;
+ struct tracing_map_field *fields;
+ void *key;
+ void *private_data;
+};
+
+struct tracing_map_entry {
+ u32 key;
+ struct tracing_map_elt *val;
+};
+
+struct tracing_map_sort_key {
+ unsigned int field_idx;
+ bool descending;
+};
+
+struct tracing_map_sort_entry {
+ void *key;
+ struct tracing_map_elt *elt;
+ bool elt_copied;
+ bool dup;
+};
+
+struct tracing_map_array {
+ unsigned int entries_per_page;
+ unsigned int entry_size_shift;
+ unsigned int entry_shift;
+ unsigned int entry_mask;
+ unsigned int n_pages;
+ void **pages;
+};
+
+#define TRACING_MAP_ARRAY_ELT(array, idx) \
+ (array->pages[idx >> array->entry_shift] + \
+ ((idx & array->entry_mask) << array->entry_size_shift))
+
+#define TRACING_MAP_ENTRY(array, idx) \
+ ((struct tracing_map_entry *)TRACING_MAP_ARRAY_ELT(array, idx))
+
+#define TRACING_MAP_ELT(array, idx) \
+ ((struct tracing_map_elt **)TRACING_MAP_ARRAY_ELT(array, idx))
+
+struct tracing_map {
+ unsigned int key_size;
+ unsigned int map_bits;
+ unsigned int map_size;
+ unsigned int max_elts;
+ atomic_t next_elt;
+ struct tracing_map_array *elts;
+ struct tracing_map_array *map;
+ const struct tracing_map_ops *ops;
+ void *private_data;
+ struct tracing_map_field fields[TRACING_MAP_FIELDS_MAX];
+ unsigned int n_fields;
+ int key_idx[TRACING_MAP_KEYS_MAX];
+ unsigned int n_keys;
+ struct tracing_map_sort_key sort_key;
+ atomic64_t hits;
+ atomic64_t drops;
+};
+
+/**
+ * struct tracing_map_ops - callbacks for tracing_map
+ *
+ * The methods in this structure define callback functions for various
+ * operations on a tracing_map or objects related to a tracing_map.
+ *
+ * For a detailed description of tracing_map_elt objects please see
+ * the overview of tracing_map data structures at the beginning of
+ * this file.
+ *
+ * All the methods below are optional.
+ *
+ * @elt_alloc: When a tracing_map_elt is allocated, this function, if
+ * defined, will be called and gives clients the opportunity to
+ * allocate additional data and attach it to the element
+ * (tracing_map_elt->private_data is meant for that purpose).
+ * Element allocation occurs before tracing begins, when the
+ * tracing_map_init() call is made by client code.
+ *
+ * @elt_copy: At certain points in the lifetime of an element, it may
+ * need to be copied. The copy should include a copy of the
+ * client-allocated data, which can be copied into the 'to'
+ * element from the 'from' element.
+ *
+ * @elt_free: When a tracing_map_elt is freed, this function is called
+ * and allows client-allocated per-element data to be freed.
+ *
+ * @elt_clear: This callback allows per-element client-defined data to
+ * be cleared, if applicable.
+ *
+ * @elt_init: This callback allows per-element client-defined data to
+ * be initialized when used, i.e. when the element is actually
+ * claimed by tracing_map_insert() in the context of the map
+ * insertion.
+ */
+struct tracing_map_ops {
+ int (*elt_alloc)(struct tracing_map_elt *elt);
+ void (*elt_copy)(struct tracing_map_elt *to,
+ struct tracing_map_elt *from);
+ void (*elt_free)(struct tracing_map_elt *elt);
+ void (*elt_clear)(struct tracing_map_elt *elt);
+ void (*elt_init)(struct tracing_map_elt *elt);
+};
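
A minimal client ops sketch attaching private per-element data (the struct and
function names are hypothetical):

	struct my_elt_data {
		u64 last_ts;
	};

	static int my_elt_alloc(struct tracing_map_elt *elt)
	{
		elt->private_data = kzalloc(sizeof(struct my_elt_data),
					    GFP_KERNEL);
		return elt->private_data ? 0 : -ENOMEM;
	}

	static void my_elt_free(struct tracing_map_elt *elt)
	{
		kfree(elt->private_data);
	}

	static const struct tracing_map_ops my_map_ops = {
		.elt_alloc	= my_elt_alloc,
		.elt_free	= my_elt_free,
	};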
+
+extern struct tracing_map *
+tracing_map_create(unsigned int map_bits,
+ unsigned int key_size,
+ const struct tracing_map_ops *ops,
+ void *private_data);
+extern int tracing_map_init(struct tracing_map *map);
+
+extern int tracing_map_add_sum_field(struct tracing_map *map);
+extern int tracing_map_add_key_field(struct tracing_map *map,
+ unsigned int offset,
+ tracing_map_cmp_fn_t cmp_fn);
+
+extern void tracing_map_destroy(struct tracing_map *map);
+extern void tracing_map_clear(struct tracing_map *map);
+
+extern struct tracing_map_elt *
+tracing_map_insert(struct tracing_map *map, void *key);
+extern struct tracing_map_elt *
+tracing_map_lookup(struct tracing_map *map, void *key);
+
+extern tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,
+ int field_is_signed);
+extern int tracing_map_cmp_string(void *val_a, void *val_b);
+extern int tracing_map_cmp_none(void *val_a, void *val_b);
+
+extern void tracing_map_update_sum(struct tracing_map_elt *elt,
+ unsigned int i, u64 n);
+extern u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i);
+extern void tracing_map_set_field_descr(struct tracing_map *map,
+ unsigned int i,
+ unsigned int key_offset,
+ tracing_map_cmp_fn_t cmp_fn);
+extern int
+tracing_map_sort_entries(struct tracing_map *map,
+ struct tracing_map_sort_key *sort_keys,
+ unsigned int n_sort_keys,
+ struct tracing_map_sort_entry ***sort_entries);
+
+extern void
+tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries,
+ unsigned int n_entries);
+#endif /* __TRACING_MAP_H */
diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions
index 5d8cd06d920f..c37262f6c269 100644
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -14,3 +14,12 @@ enable_tracing() { # start trace recording
reset_tracer() { # reset the current tracer
echo nop > current_tracer
}
+
+reset_trigger() { # reset all current setting triggers
+ grep -v ^# events/*/*/trigger |
+ while read line; do
+ cmd=`echo $line | cut -f2- -d: | cut -f1 -d" "`
+ echo "!$cmd" > `echo $line | cut -f1 -d:`
+ done
+}
+
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc
new file mode 100644
index 000000000000..1a9445021bf1
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc
@@ -0,0 +1,64 @@
+#!/bin/sh
+# description: event trigger - test event enable/disable trigger
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep enable_event events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "event enable/disable trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test enable_event trigger"
+echo 0 > events/sched/sched_switch/enable
+echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
+( echo "forked")
+if [ `cat events/sched/sched_switch/enable` != '1*' ]; then
+ fail "enable_event trigger on sched_process_fork did not work"
+fi
+
+reset_trigger
+
+echo "Test disable_event trigger"
+echo 1 > events/sched/sched_switch/enable
+echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
+( echo "forked")
+if [ `cat events/sched/sched_switch/enable` != '0*' ]; then
+ fail "disable_event trigger on sched_process_fork did not work"
+fi
+
+reset_trigger
+
+echo "Test semantic error for event enable/disable trigger"
+! echo 'enable_event:nogroup:noevent' > events/sched/sched_process_fork/trigger
+! echo 'disable_event+1' > events/sched/sched_process_fork/trigger
+echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
+! echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
+! echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc
new file mode 100644
index 000000000000..514e466e198b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc
@@ -0,0 +1,59 @@
+#!/bin/sh
+# description: event trigger - test trigger filter
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+echo "Test trigger filter"
+echo 1 > tracing_on
+echo 'traceoff if child_pid == 0' > events/sched/sched_process_fork/trigger
+( echo "forked")
+if [ `cat tracing_on` -ne 1 ]; then
+ fail "traceoff trigger on sched_process_fork did not work"
+fi
+
+reset_trigger
+
+echo "Test semantic error for trigger filter"
+! echo 'traceoff if a' > events/sched/sched_process_fork/trigger
+! echo 'traceoff if common_pid=0' > events/sched/sched_process_fork/trigger
+! echo 'traceoff if common_pid==b' > events/sched/sched_process_fork/trigger
+echo 'traceoff if common_pid == 0' > events/sched/sched_process_fork/trigger
+echo '!traceoff' > events/sched/sched_process_fork/trigger
+! echo 'traceoff if common_pid == child_pid' > events/sched/sched_process_fork/trigger
+echo 'traceoff if common_pid <= 0' > events/sched/sched_process_fork/trigger
+echo '!traceoff' > events/sched/sched_process_fork/trigger
+echo 'traceoff if common_pid >= 0' > events/sched/sched_process_fork/trigger
+echo '!traceoff' > events/sched/sched_process_fork/trigger
+echo 'traceoff if parent_pid >= 0 && child_pid >= 0' > events/sched/sched_process_fork/trigger
+echo '!traceoff' > events/sched/sched_process_fork/trigger
+echo 'traceoff if parent_pid >= 0 || child_pid >= 0' > events/sched/sched_process_fork/trigger
+echo '!traceoff' > events/sched/sched_process_fork/trigger
+
+
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
new file mode 100644
index 000000000000..c2b61c4fda11
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
@@ -0,0 +1,75 @@
+#!/bin/sh
+# description: event trigger - test histogram modifiers
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep hist events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test histogram with execname modifier"
+
+echo 'hist:keys=common_pid.execname' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+COMM=`cat /proc/$$/comm`
+grep "common_pid: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
+ fail "execname modifier on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with hex modifier"
+
+echo 'hist:keys=parent_pid.hex' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+# $$ is the pid of this shell, which is the parent of the forked subshells.
+HEX=`printf %x $$`
+grep "parent_pid: $HEX" events/sched/sched_process_fork/hist > /dev/null || \
+ fail "hex modifier on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with syscall modifier"
+
+echo 'hist:keys=id.syscall' > events/raw_syscalls/sys_exit/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep "id: sys_" events/raw_syscalls/sys_exit/hist > /dev/null || \
+ fail "syscall modifier on raw_syscalls/sys_exit did not work"
+
+
+reset_trigger
+
+echo "Test histgram with log2 modifier"
+
+echo 'hist:keys=bytes_req.log2' > events/kmem/kmalloc/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep 'bytes_req: ~ 2^[0-9]*' events/kmem/kmalloc/hist > /dev/null || \
+ fail "log2 modifier on kmem/kmalloc did not work"
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc
new file mode 100644
index 000000000000..b2902d42a537
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc
@@ -0,0 +1,83 @@
+#!/bin/sh
+# description: event trigger - test histogram trigger
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep hist events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test histogram basic tigger"
+
+echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \
+ fail "hist trigger on sched_process_fork did not work"
+grep child events/sched/sched_process_fork/hist > /dev/null || \
+ fail "hist trigger on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with compound keys"
+
+echo 'hist:keys=parent_pid,child_pid' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep '^{ parent_pid:.*, child_pid:.*}' events/sched/sched_process_fork/hist > /dev/null || \
+ fail "compound keys on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with string key"
+
+echo 'hist:keys=parent_comm' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+COMM=`cat /proc/$$/comm`
+grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
+ fail "string key on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with sort key"
+
+echo 'hist:keys=parent_pid,child_pid:sort=child_pid.ascending' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+
+check_inc() {
+ while [ $# -gt 1 ]; do
+ [ $1 -gt $2 ] && return 1
+ shift 1
+ done
+ return 0
+}
+check_inc `grep -o "child_pid:[[:space:]]*[[:digit:]]*" \
+ events/sched/sched_process_fork/hist | cut -d: -f2 ` ||
+ fail "sort param on sched_process_fork did not work"
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
new file mode 100644
index 000000000000..03c4a46561fc
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
@@ -0,0 +1,73 @@
+#!/bin/sh
+# description: event trigger - test multiple histogram triggers
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep hist events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
+reset_trigger
+
+echo "Test histogram multiple tiggers"
+
+echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger
+echo 'hist:keys=parent_comm:vals=child_pid' >> events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \
+ fail "hist trigger on sched_process_fork did not work"
+grep child events/sched/sched_process_fork/hist > /dev/null || \
+ fail "hist trigger on sched_process_fork did not work"
+COMM=`cat /proc/$$/comm`
+grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
+ fail "string key on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test histogram with its name"
+
+echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_fork/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep test_hist events/sched/sched_process_fork/hist > /dev/null || \
+ fail "named event on sched_process_fork did not work"
+
+echo "Test same named histogram on different events"
+
+echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_exit/trigger
+for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
+grep test_hist events/sched/sched_process_exit/hist > /dev/null || \
+    fail "named event on sched_process_exit did not work"
+
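+# Histograms that share a name share their data, so the two hist
+# files should be identical.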
+diffs=`diff events/sched/sched_process_exit/hist events/sched/sched_process_fork/hist | wc -l`
+test $diffs -eq 0 || fail "Same-name histograms are not identical"
+
+reset_trigger
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc
new file mode 100644
index 000000000000..f84b80d551a2
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc
@@ -0,0 +1,56 @@
+#!/bin/sh
+# description: event trigger - test snapshot-trigger
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep snapshot events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "snapshot trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test snapshot tigger"
+echo 0 > snapshot
+echo 1 > events/sched/sched_process_fork/enable
+( echo "forked")
+echo 'snapshot:1' > events/sched/sched_process_fork/trigger
+( echo "forked")
+grep sched_process_fork snapshot > /dev/null || \
+ fail "snapshot trigger on sched_process_fork did not work"
+
+reset_trigger
+echo 0 > snapshot
+echo 0 > events/sched/sched_process_fork/enable
+
+echo "Test snapshot semantic errors"
+
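+# A malformed trigger spec must be rejected, and adding the same
+# trigger a second time must fail.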
+! echo "snapshot+1" > events/sched/sched_process_fork/trigger
+echo "snapshot" > events/sched/sched_process_fork/trigger
+! echo "snapshot" > events/sched/sched_process_fork/trigger
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc
new file mode 100644
index 000000000000..9fa23b085def
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc
@@ -0,0 +1,53 @@
+#!/bin/sh
+# description: event trigger - test stacktrace-trigger
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+FEATURE=`grep stacktrace events/sched/sched_process_fork/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "stacktrace trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test stacktrace tigger"
+echo 0 > trace
+echo 0 > options/stacktrace
+echo 'stacktrace' > events/sched/sched_process_fork/trigger
+( echo "forked")
+grep "<stack trace>" trace > /dev/null || \
+ fail "stacktrace trigger on sched_process_fork did not work"
+
+reset_trigger
+
+echo "Test stacktrace semantic errors"
+
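+# A bogus parameter must be rejected, and adding the same trigger
+# twice must fail.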
+! echo "stacktrace:foo" > events/sched/sched_process_fork/trigger
+echo "stacktrace" > events/sched/sched_process_fork/trigger
+! echo "stacktrace" > events/sched/sched_process_fork/trigger
+
+do_reset
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc
new file mode 100644
index 000000000000..87648e5f987c
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc
@@ -0,0 +1,58 @@
+#!/bin/sh
+# description: event trigger - test traceon/off trigger
+
+do_reset() {
+ reset_trigger
+ echo > set_event
+ clear_trace
+}
+
+fail() { #msg
+ do_reset
+ echo $1
+ exit $FAIL
+}
+
+if [ ! -f set_event -o ! -d events/sched ]; then
+ echo "event tracing is not supported"
+ exit_unsupported
+fi
+
+if [ ! -f events/sched/sched_process_fork/trigger ]; then
+ echo "event trigger is not supported"
+ exit_unsupported
+fi
+
+reset_tracer
+do_reset
+
+echo "Test traceoff trigger"
+echo 1 > tracing_on
+echo 'traceoff' > events/sched/sched_process_fork/trigger
+( echo "forked")
+if [ `cat tracing_on` -ne 0 ]; then
+ fail "traceoff trigger on sched_process_fork did not work"
+fi
+
+reset_trigger
+
+echo "Test traceon trigger"
+echo 0 > tracing_on
+echo 'traceon' > events/sched/sched_process_fork/trigger
+( echo "forked")
+if [ `cat tracing_on` -ne 1 ]; then
+    fail "traceon trigger on sched_process_fork did not work"
+fi
+
+reset_trigger
+
+echo "Test semantic error for traceoff/on trigger"
+! echo 'traceoff:badparam' > events/sched/sched_process_fork/trigger
+! echo 'traceoff+0' > events/sched/sched_process_fork/trigger
+echo 'traceon' > events/sched/sched_process_fork/trigger
+! echo 'traceon' > events/sched/sched_process_fork/trigger
+! echo 'traceoff' > events/sched/sched_process_fork/trigger
+
+do_reset
+
+exit 0