diff options
author | Clark Williams <williams@redhat.com> | 2009-05-12 16:53:59 -0500 |
---|---|---|
committer | Clark Williams <williams@redhat.com> | 2009-05-12 16:53:59 -0500 |
commit | 337bf38f80a5af14a0927285fa7329d7626467f2 (patch) | |
tree | 1d0e39e9ca0d1d8b72b4b06b970763f0c6934aa9 | |
parent | 9153c383ec7f7b08fd7d50f35d39a1a6c933b579 (diff) | |
download | rt-tests-337bf38f80a5af14a0927285fa7329d7626467f2.tar.gz rt-tests-337bf38f80a5af14a0927285fa7329d7626467f2.tar.xz |
initial checkin
-rw-r--r-- | src/smidetect/smi_detector.txt | 137 |
1 files changed, 137 insertions, 0 deletions
diff --git a/src/smidetect/smi_detector.txt b/src/smidetect/smi_detector.txt new file mode 100644 index 0000000..411bedc --- /dev/null +++ b/src/smidetect/smi_detector.txt @@ -0,0 +1,137 @@ +Introduction: +------------- + +The module smi_detector is a special purpose kernel module that is +used to detect if System Management Interrupts (SMIs) are causing event +latencies in the Linux RT kernel. + +SMIs are usually not serviced by the Linux kernel. They are set up by +BIOS code and are serviced by BIOS code, usually for critical events +such as management of thermal sensors and fans. Sometimes though, SMIs +are used for other tasks and those tasks can spend an inordinate +amount of time in the handler (sometimes measured in milliseconds). +Obviously if you are trying to keep event service latencies down in the +microsecond range, this is a problem. + +The SMI detector works by hogging the cpu for configurable amounts of +time (by calling stop_machine()), polling the Time Stamp Counter (TSC) +register for some period, then looking for gaps in the TSC data. Any +gap indicates a time when the polling was interrupted and since the +machine is stopped and interrupts turned off the only thing that could +do that would be an SMI. + +Note that the SMI detector should *NEVER* be used in a production +environment. It is intended to be run manually to determine if the +hardware platform has a problem with long SMI service routines. + +Usage: +------ + +Loading the module smi_detector passing the parameter "enabled=1" is the only +step required to start the smi_detector. It is possible to define a threshold +in microseconds (us) above which latency spikes will be taken in account +(parameter "threshold="). + +Example: + + # insmod ./smi_detector.ko enabled=1 threshold=100 + +After the module is loaded, it creates a directory named "smi_detector" under +the debugfs mountpoint, "/debugfs/smi_detector" for this text. It is necessary +to have debugfs mounted. + +avg_smi_interval_us - average interval (usecs) between SMI latency spikes +latency_threshold_us - minimum latency value to be considered (usecs) +max_sample_us - maximum SMI latency spike observed (usecs) +ms_between_samples - interval between samples (ms) +ms_per_sample - sampling time (ms) +sample_us - last sample (usecs). Continously updated, may be used + to plot graphs or to create histograms +smi_count - how many latency spikes have been observed + + + # cd /debugfs/smi_detector + + # cat * + 0 (avg_smi_interval_us) + 100 (latency_threshold_us) + 0 (latency_threshold_us) + 5000 (ms_between_samples) + 1 (ms_per_sample) + 3 (sample_us) + 0 (smi_count) + + +The default values for ms_between_samples and ms_per_sample define that every +5000ms (5s) samples will be collected during 1ms. It is possible to change the +sampling time or the sampling interval writing to the related files. To collect +samples during 5ms every 50ms: + + # echo 5 > ms_per_sample + # echo 50 > ms_between_samples + + + # cat ms_between_samples + 50 + # cat ms_per_sample + 5 + +After a while, data may be verified to atest the existence of SMI induced +latencies: + + # cat smi_count + 2 + # cat max_sample_us + 468 + # cat avg_smi_interval_us + 1356306050 + +Depending on the latencies observed it is important to better adjust the +sampling intervals to obtain more accurate measurements. Care must be taken +to not create a continous sampling situation, that might be perceived by the +kernel as a deadlock. + +Python Script +------------- + +A python script is provided in in the kernel scripts directory named +'smidetect'. This script handles mounting/unmounting the debugfs, +loading/unloading the smi_detector module, setting various parameters +for the SMI detector and then starting and stopping the detection. The +smidetect script handles the following arguments: + +Usage: smidetect [options] + +Options: + -h, --help show this help message and exit + --duration=DURATION total time to test for SMIs (<n>{smdw}) + --threshold=THRESHOLD + value above which is considered an SMI (microseconds + --interval=INTERVAL time between samples (milliseconds) + --sample_width=SAMPLE_WIDTH + time to actually measure (milliseconds) + --report=REPORT filename for sample data + --cleanup force unload of module and umount of debugfs + --debug turn on debugging prints + --quiet turn off all screen output + + +An example run looks like this: + + $ sudo scripts/smidetect --duration=1m --report=smi.txt + smidetect: test duration 60 seconds + parameters: + Latency threshold: 100us + Non-sampling gap: 5ms + Sample length: 20ms + Output File: smi.txt + Starting test + test finished + Max Latency: 427us + Samples recorded: 2201 + Samples exceeding threshold: 7 + sample data written to smi.txt + $ + +The above run generated 2201 samples of which 27 exceeded the 100 +microsecond threshold. |