author: Clark Williams <williams@redhat.com> 2009-05-12 16:53:59 -0500
committer: Clark Williams <williams@redhat.com> 2009-05-12 16:53:59 -0500
commit: 337bf38f80a5af14a0927285fa7329d7626467f2
tree: 1d0e39e9ca0d1d8b72b4b06b970763f0c6934aa9
parent: 9153c383ec7f7b08fd7d50f35d39a1a6c933b579
initial checkin
-rw-r--r-- src/smidetect/smi_detector.txt | 137
1 files changed, 137 insertions, 0 deletions
diff --git a/src/smidetect/smi_detector.txt b/src/smidetect/smi_detector.txt
new file mode 100644
index 0000000..411bedc
--- /dev/null
+++ b/src/smidetect/smi_detector.txt
@@ -0,0 +1,137 @@
+Introduction:
+-------------
+
+The module smi_detector is a special purpose kernel module that is
+used to detect if System Management Interrupts (SMIs) are causing event
+latencies in the Linux RT kernel.
+
+SMIs are usually not serviced by the Linux kernel. They are set up by
+BIOS code and are serviced by BIOS code, usually for critical events
+such as management of thermal sensors and fans. Sometimes, though, SMIs
+are used for other tasks, and those tasks can spend an inordinate
+amount of time in the handler (sometimes measured in milliseconds).
+Obviously, if you are trying to keep event service latencies down in the
+microsecond range, this is a problem.
+
+The SMI detector works by hogging the CPU for configurable amounts of
+time (by calling stop_machine()), polling the Time Stamp Counter (TSC)
+register for some period, then looking for gaps in the TSC data. Any
+gap indicates a time when the polling was interrupted, and since the
+machine is stopped and interrupts are turned off, the only thing that
+could do that is an SMI.
+
+Note that the SMI detector should *NEVER* be used in a production
+environment. It is intended to be run manually to determine if the
+hardware platform has a problem with long SMI service routines.
+
+Usage:
+------
+
+Loading the smi_detector module with the parameter "enabled=1" is the only
+step required to start the detector. A threshold in microseconds (us) can be
+set with the parameter "threshold="; only latency spikes above it are taken
+into account.
+
+Example:
+
+ # insmod ./smi_detector.ko enabled=1 threshold=100
+
+After the module is loaded, it creates a directory named "smi_detector" under
+the debugfs mountpoint (debugfs must be mounted; this text assumes /debugfs,
+giving "/debugfs/smi_detector"). The directory contains these files:
+
+avg_smi_interval_us - average interval (usecs) between SMI latency spikes
+latency_threshold_us - minimum latency value to be considered (usecs)
+max_sample_us - maximum SMI latency spike observed (usecs)
+ms_between_samples - interval between samples (ms)
+ms_per_sample - sampling time (ms)
+sample_us - last sample (usecs). Continuously updated, may be used
+ to plot graphs or to create histograms
+smi_count - how many latency spikes have been observed
+
+
+ # cd /debugfs/smi_detector
+
+ # cat *
+ 0 (avg_smi_interval_us)
+ 100 (latency_threshold_us)
+ 0 (max_sample_us)
+ 5000 (ms_between_samples)
+ 1 (ms_per_sample)
+ 3 (sample_us)
+ 0 (smi_count)
+
+
+With the default values of ms_between_samples and ms_per_sample, samples are
+collected for 1ms every 5000ms (5s). The sampling time and the sampling
+interval can be changed by writing to the corresponding files. To collect
+samples for 5ms every 50ms:
+
+ # echo 5 > ms_per_sample
+ # echo 50 > ms_between_samples
+
+
+ # cat ms_between_samples
+ 50
+ # cat ms_per_sample
+ 5
+
+After a while, the data can be checked to confirm the existence of SMI-induced
+latencies:
+
+ # cat smi_count
+ 2
+ # cat max_sample_us
+ 468
+ # cat avg_smi_interval_us
+ 1356306050
+
+Depending on the latencies observed, the sampling intervals may need to be
+adjusted to obtain more accurate measurements. Take care not to create a
+continuous sampling situation, which the kernel might perceive as a
+deadlock.
+
+Python Script
+-------------
+
+A Python script named 'smidetect' is provided in the kernel scripts
+directory. This script handles mounting/unmounting the debugfs,
+loading/unloading the smi_detector module, setting various parameters
+for the SMI detector and then starting and stopping the detection. The
+smidetect script handles the following arguments:
+
+Usage: smidetect [options]
+
+Options:
+ -h, --help show this help message and exit
+ --duration=DURATION total time to test for SMIs (<n>{smdw})
+ --threshold=THRESHOLD
+ value above which is considered an SMI (microseconds)
+ --interval=INTERVAL time between samples (milliseconds)
+ --sample_width=SAMPLE_WIDTH
+ time to actually measure (milliseconds)
+ --report=REPORT filename for sample data
+ --cleanup force unload of module and umount of debugfs
+ --debug turn on debugging prints
+ --quiet turn off all screen output
+
+
+An example run looks like this:
+
+ $ sudo scripts/smidetect --duration=1m --report=smi.txt
+ smidetect: test duration 60 seconds
+ parameters:
+ Latency threshold: 100us
+ Non-sampling gap: 5ms
+ Sample length: 20ms
+ Output File: smi.txt
+ Starting test
+ test finished
+ Max Latency: 427us
+ Samples recorded: 2201
+ Samples exceeding threshold: 7
+ sample data written to smi.txt
+ $
+
+The above run generated 2201 samples of which 7 exceeded the 100
+microsecond threshold.
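
The detection principle described in the Introduction (poll a timestamp source, then look for gaps larger than a threshold) can be illustrated with a small user-space sketch. This is not the module's code: the real detector polls the TSC inside the kernel under stop_machine(), whereas the function names find_gaps and summarize and the list of timestamps below are hypothetical and invented purely for illustration. The sketch only shows how a count and a maximum, analogous to smi_count and max_sample_us, could be derived from a series of readings.

```python
def find_gaps(timestamps_us, threshold_us):
    """Return every gap (in microseconds) between consecutive readings
    that is strictly greater than threshold_us.

    Hypothetical helper: the real module compares TSC readings taken
    in kernel space; here the readings are plain integers.
    """
    gaps = []
    for earlier, later in zip(timestamps_us, timestamps_us[1:]):
        delta = later - earlier
        if delta > threshold_us:
            gaps.append(delta)
    return gaps


def summarize(timestamps_us, threshold_us):
    """Summarize the gaps as a spike count and the largest spike."""
    gaps = find_gaps(timestamps_us, threshold_us)
    return {"count": len(gaps), "max_us": max(gaps, default=0)}


# Invented readings (microseconds): polling normally advances by 1us,
# but two readings arrive late, as if the poller had been interrupted.
readings = [0, 1, 2, 3, 471, 472, 473, 623, 624]

summary = summarize(readings, threshold_us=100)
print(summary)  # {'count': 2, 'max_us': 468}
```

With a threshold of 100us, the two late readings (gaps of 468us and 150us) are counted and the 1us steps are ignored. Raising the threshold above 150 would leave only the larger gap, which is the same trade-off the "threshold=" parameter controls.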