From 2a0e49279850d28c450f27e51b419ce90bacdcdc Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki"
Date: Mon, 13 Mar 2017 23:59:57 +0100
Subject: cpufreq: User/admin documentation update and consolidation

The user/admin documentation of cpufreq is badly outdated. It conains
stale and/or inaccurate information along with things that are not
particularly useful. Also, some of the important pieces are missing
from it.

For this reason, add a new user/admin document for cpufreq containing
current information to admin-guide and drop the old outdated .txt
documents it is replacing.

Since there will be more PM documents in admin-guide going forward,
create a separate directory for them and put the cpufreq document
in there right away.

Signed-off-by: Rafael J. Wysocki
Acked-by: Viresh Kumar
Signed-off-by: Jonathan Corbet
---
 Documentation/cpu-freq/boost.txt      |  93 -----------
 Documentation/cpu-freq/governors.txt  | 301 ----------------------------------
 Documentation/cpu-freq/index.txt      |   7 -
 Documentation/cpu-freq/user-guide.txt | 228 -------------------------
 4 files changed, 629 deletions(-)
 delete mode 100644 Documentation/cpu-freq/boost.txt
 delete mode 100644 Documentation/cpu-freq/governors.txt
 delete mode 100644 Documentation/cpu-freq/user-guide.txt

(limited to 'Documentation/cpu-freq')

diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt
deleted file mode 100644
index dd62e1334f0a..000000000000
--- a/Documentation/cpu-freq/boost.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-Processor boosting control
-
-	- information for users -
-
-Quick guide for the impatient:
---------------------
-/sys/devices/system/cpu/cpufreq/boost
-controls the boost setting for the whole system. You can read and write
-that file with either "0" (boosting disabled) or "1" (boosting allowed).
-Reading or writing 1 does not mean that the system is boosting at this
-very moment, but only that the CPU _may_ raise the frequency at it's
-discretion.
---------------------
-
-Introduction
--------------
-Some CPUs support a functionality to raise the operating frequency of
-some cores in a multi-core package if certain conditions apply, mostly
-if the whole chip is not fully utilized and below it's intended thermal
-budget. The decision about boost disable/enable is made either at hardware
-(e.g. x86) or software (e.g ARM).
-On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
-in technical documentation "Core performance boost". In Linux we use
-the term "boost" for convenience.
-
-Rationale for disable switch
-----------------------------
-
-Though the idea is to just give better performance without any user
-intervention, sometimes the need arises to disable this functionality.
-Most systems offer a switch in the (BIOS) firmware to disable the
-functionality at all, but a more fine-grained and dynamic control would
-be desirable:
-1. While running benchmarks, reproducible results are important. Since
-   the boosting functionality depends on the load of the whole package,
-   single thread performance can vary. By explicitly disabling the boost
-   functionality at least for the benchmark's run-time the system will run
-   at a fixed frequency and results are reproducible again.
-2. To examine the impact of the boosting functionality it is helpful
-   to do tests with and without boosting.
-3. Boosting means overclocking the processor, though under controlled
-   conditions. By raising the frequency and the voltage the processor
-   will consume more power than without the boosting, which may be
-   undesirable for instance for mobile users. Disabling boosting may
-   save power here, though this depends on the workload.
-
-
-User controlled switch
-----------------------
-
-To allow the user to toggle the boosting functionality, the cpufreq core
-driver exports a sysfs knob to enable or disable it. There is a file:
-/sys/devices/system/cpu/cpufreq/boost
-which can either read "0" (boosting disabled) or "1" (boosting enabled).
-The file is exported only when cpufreq driver supports boosting.
-Explicitly changing the permissions and writing to that file anyway will
-return EINVAL.
-
-On supported CPUs one can write either a "0" or a "1" into this file.
-This will either disable the boost functionality on all cores in the
-whole system (0) or will allow the software or hardware to boost at will
-(1).
-
-Writing a "1" does not explicitly boost the system, but just allows the
-CPU to boost at their discretion. Some implementations take external
-factors like the chip's temperature into account, so boosting once does
-not necessarily mean that it will occur every time even using the exact
-same software setup.
-
-
-AMD legacy cpb switch
----------------------
-The AMD powernow-k8 driver used to support a very similar switch to
-disable or enable the "Core Performance Boost" feature of some AMD CPUs.
-This switch was instantiated in each CPU's cpufreq directory
-(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
-Though the per CPU existence hints at a more fine grained control, the
-actual implementation only supported a system-global switch semantics,
-which was simply reflected into each CPU's file. Writing a 0 or 1 into it
-would pull the other CPUs to the same state.
-For compatibility reasons this file and its behavior is still supported
-on AMD CPUs, though it is now protected by a config switch
-(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
-even with the config option set.
-This functionality is considered legacy and will be removed in some future
-kernel version.
-
-More fine grained boosting control
-----------------------------------
-
-Technically it is possible to switch the boosting functionality at least
-on a per package basis, for some CPUs even per core. Currently the driver
-does not support it, but this may be implemented in the future.
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
deleted file mode 100644
index 61b3184b6c24..000000000000
--- a/Documentation/cpu-freq/governors.txt
+++ /dev/null
@@ -1,301 +0,0 @@
-     CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
-                         L i n u x    C P U F r e q
-
-                      C P U F r e q   G o v e r n o r s
-
-                   - information for users and developers -
-
-
-                    Dominik Brodowski
-            some additions and corrections by Nico Golde
-                Rafael J. Wysocki
-                   Viresh Kumar
-
-
-
-   Clock scaling allows you to change the clock speed of the CPUs on the
-    fly. This is a nice method to save battery power, because the lower
-            the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. What is a CPUFreq Governor?
-
-2. Governors In the Linux Kernel
-2.1 Performance
-2.2 Powersave
-2.3 Userspace
-2.4 Ondemand
-2.5 Conservative
-2.6 Schedutil
-
-3. The Governor Interface in the CPUfreq Core
-
-4. References
-
-
-1. What Is A CPUFreq Governor?
-==============================
-
-Most cpufreq drivers (except the intel_pstate and longrun) or even most
-cpu frequency scaling algorithms only allow the CPU frequency to be set
-to predefined fixed values. In order to offer dynamic frequency
-scaling, the cpufreq core must be able to tell these drivers of a
-"target frequency". So these specific drivers will be transformed to
-offer a "->target/target_index/fast_switch()" call instead of the
-"->setpolicy()" call. For set_policy drivers, all stays the same,
-though.
-
-How to decide what frequency within the CPUfreq policy should be used?
-That's done using "cpufreq governors".
-
-Basically, it's the following flow graph:
-
-CPU can be set to switch independently   |         CPU can only be set
-      within specific "limits"           |       to specific frequencies
-
-                                 "CPUfreq policy"
-                consists of frequency limits (policy->{min,max})
-                     and CPUfreq governor to be used
-                         /                    \
-                        /                      \
-                       /                       the cpufreq governor decides
-                      /                        (dynamically or statically)
-                     /                         what target_freq to set within
-                    /                          the limits of policy->{min,max}
-                   /                                \
-                  /                                  \
-        Using the ->setpolicy call,              Using the ->target/target_index/fast_switch call,
-            the limits and the                    the frequency closest
-             "policy" is set.                     to target_freq is set.
-                                                  It is assured that it
-                                                  is within policy->{min,max}
-
-
-2. Governors In the Linux Kernel
-================================
-
-2.1 Performance
----------------
-
-The CPUfreq governor "performance" sets the CPU statically to the
-highest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.2 Powersave
--------------
-
-The CPUfreq governor "powersave" sets the CPU statically to the
-lowest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.3 Userspace
--------------
-
-The CPUfreq governor "userspace" allows the user, or any userspace
-program running with UID "root", to set the CPU to a specific frequency
-by making a sysfs file "scaling_setspeed" available in the CPU-device
-directory.
-
-
-2.4 Ondemand
-------------
-
-The CPUfreq governor "ondemand" sets the CPU frequency depending on the
-current system load. Load estimation is triggered by the scheduler
-through the update_util_data->func hook; when triggered, cpufreq checks
-the CPU-usage statistics over the last period and the governor sets the
-CPU accordingly. The CPU must have the capability to switch the
-frequency very quickly.
-
-Sysfs files:
-
-* sampling_rate:
-
-  Measured in uS (10^-6 seconds), this is how often you want the kernel
-  to look at the CPU usage and to make decisions on what to do about the
-  frequency. Typically this is set to values of around '10000' or more.
-  It's default value is (cmp. with users-guide.txt): transition_latency
-  * 1000. Be aware that transition latency is in ns and sampling_rate
-  is in us, so you get the same sysfs value by default. Sampling rate
-  should always get adjusted considering the transition latency to set
-  the sampling rate 750 times as high as the transition latency in the
-  bash (as said, 1000 is default), do:
-
-  $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
-
-* sampling_rate_min:
-
-  The sampling rate is limited by the HW transition latency:
-  transition_latency * 100
-
-  Or by kernel restrictions:
-  - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
-  - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is
-    used, the limits depend on the CONFIG_HZ option:
-    HZ=1000: min=20000us (20ms)
-    HZ=250: min=80000us (80ms)
-    HZ=100: min=200000us (200ms)
-
-  The highest value of kernel and HW latency restrictions is shown and
-  used as the minimum sampling rate.
-
-* up_threshold:
-
-  This defines what the average CPU usage between the samplings of
-  'sampling_rate' needs to be for the kernel to make a decision on
-  whether it should increase the frequency. For example when it is set
-  to its default value of '95' it means that between the checking
-  intervals the CPU needs to be on average more than 95% in use to then
-  decide that the CPU frequency needs to be increased.
-
-* ignore_nice_load:
-
-  This parameter takes a value of '0' or '1'. When set to '0' (its
-  default), all processes are counted towards the 'cpu utilisation'
-  value. When set to '1', the processes that are run with a 'nice'
-  value will not count (and thus be ignored) in the overall usage
-  calculation. This is useful if you are running a CPU intensive
-  calculation on your laptop that you do not care how long it takes to
-  complete as you can 'nice' it and prevent it from taking part in the
-  deciding process of whether to increase your CPU frequency.
-
-* sampling_down_factor:
-
-  This parameter controls the rate at which the kernel makes a decision
-  on when to decrease the frequency while running at top speed. When set
-  to 1 (the default) decisions to reevaluate load are made at the same
-  interval regardless of current clock speed. But when set to greater
-  than 1 (e.g. 100) it acts as a multiplier for the scheduling interval
-  for reevaluating load when the CPU is at its top speed due to high
-  load. This improves performance by reducing the overhead of load
-  evaluation and helping the CPU stay at its top speed when truly busy,
-  rather than shifting back and forth in speed. This tunable has no
-  effect on behavior at lower speeds/lower CPU loads.
-
-* powersave_bias:
-
-  This parameter takes a value between 0 to 1000. It defines the
-  percentage (times 10) value of the target frequency that will be
-  shaved off of the target. For example, when set to 100 -- 10%, when
-  ondemand governor would have targeted 1000 MHz, it will target
-  1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
-  (disabled) by default.
-
-  When AMD frequency sensitivity powersave bias driver --
-  drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
-  defines the workload frequency sensitivity threshold in which a lower
-  frequency is chosen instead of ondemand governor's original target.
-  The frequency sensitivity is a hardware reported (on AMD Family 16h
-  Processors and above) value between 0 to 100% that tells software how
-  the performance of the workload running on a CPU will change when
-  frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
-  will not perform any better on higher core frequency, whereas a
-  workload with sensitivity of 100% (CPU-bound) will perform better
-  higher the frequency. When the driver is loaded, this is set to 400 by
-  default -- for CPUs running workloads with sensitivity value below
-  40%, a lower frequency is chosen. Unloading the driver or writing 0
-  will disable this feature.
-
-
-2.5 Conservative
-----------------
-
-The CPUfreq governor "conservative", much like the "ondemand"
-governor, sets the CPU frequency depending on the current usage. It
-differs in behaviour in that it gracefully increases and decreases the
-CPU speed rather than jumping to max speed the moment there is any load
-on the CPU. This behaviour is more suitable in a battery powered
-environment. The governor is tweaked in the same manner as the
-"ondemand" governor through sysfs with the addition of:
-
-* freq_step:
-
-  This describes what percentage steps the cpu freq should be increased
-  and decreased smoothly by. By default the cpu frequency will increase
-  in 5% chunks of your maximum cpu frequency. You can change this value
-  to anywhere between 0 and 100 where '0' will effectively lock your CPU
-  at a speed regardless of its load whilst '100' will, in theory, make
-  it behave identically to the "ondemand" governor.
-
-* down_threshold:
-
-  Same as the 'up_threshold' found for the "ondemand" governor but for
-  the opposite direction. For example when set to its default value of
-  '20' it means that if the CPU usage needs to be below 20% between
-  samples to have the frequency decreased.
-
-* sampling_down_factor:
-
-  Similar functionality as in "ondemand" governor. But in
-  "conservative", it controls the rate at which the kernel makes a
-  decision on when to decrease the frequency while running in any speed.
-  Load for frequency increase is still evaluated every sampling rate.
-
-
-2.6 Schedutil
--------------
-
-The "schedutil" governor aims at better integration with the Linux
-kernel scheduler. Load estimation is achieved through the scheduler's
-Per-Entity Load Tracking (PELT) mechanism, which also provides
-information about the recent load [1]. This governor currently does
-load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
-are always run at the highest frequency. Unlike all the other
-governors, the code is located under the kernel/sched/ directory.
-
-Sysfs files:
-
-* rate_limit_us:
-
-  This contains a value in microseconds. The governor waits for
-  rate_limit_us time before reevaluating the load again, after it has
-  evaluated the load once.
-
-For an in-depth comparison with the other governors refer to [2].
-
-
-3. The Governor Interface in the CPUfreq Core
-=============================================
-
-A new governor must register itself with the CPUfreq core using
-"cpufreq_register_governor". The struct cpufreq_governor, which has to
-be passed to that function, must contain the following values:
-
-governor->name - A unique name for this governor.
-governor->owner - .THIS_MODULE for the governor module (if appropriate).
-
-plus a set of hooks to the functions implementing the governor's logic.
-
-The CPUfreq governor may call the CPU processor driver using one of
-these two functions:
-
-int cpufreq_driver_target(struct cpufreq_policy *policy,
-                                 unsigned int target_freq,
-                                 unsigned int relation);
-
-int __cpufreq_driver_target(struct cpufreq_policy *policy,
-                                   unsigned int target_freq,
-                                   unsigned int relation);
-
-target_freq must be within policy->min and policy->max, of course.
-What's the difference between these two functions? When your governor is
-in a direct code path of a call to governor callbacks, like
-governor->start(), the policy->rwsem is still held in the cpufreq core,
-and there's no need to lock it again (in fact, this would cause a
-deadlock). So use __cpufreq_driver_target only in these cases. In all
-other cases (for example, when there's a "daemonized" function that
-wakes up every second), use cpufreq_driver_target to take policy->rwsem
-before the command is passed to the cpufreq driver.
-
-4. References
-=============
-
-[1] Per-entity load tracking: https://lwn.net/Articles/531853/
-[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/
-
diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt
index ef1d39247b05..03a7cee6ac73 100644
--- a/Documentation/cpu-freq/index.txt
+++ b/Documentation/cpu-freq/index.txt
@@ -21,8 +21,6 @@ Documents in this directory:
 
 amd-powernow.txt - AMD powernow driver specific file.
 
-boost.txt - Frequency boosting support.
-
 core.txt - General description of the CPUFreq core and
 	of CPUFreq notifiers.
 
@@ -32,17 +30,12 @@ cpufreq-nforce2.txt - nVidia nForce2 platform specific file.
 
 cpufreq-stats.txt - General description of sysfs cpufreq stats.
 
-governors.txt - What are cpufreq governors and how to
-	implement them?
-
 index.txt - File index, Mailing list and Links (this document)
 
 intel-pstate.txt - Intel pstate cpufreq driver specific file.
 
 pcc-cpufreq.txt - PCC cpufreq driver specific file.
 
-user-guide.txt - User Guide to CPUFreq
-
 
 Mailing List
 ------------
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
deleted file mode 100644
index 391da64e9492..000000000000
--- a/Documentation/cpu-freq/user-guide.txt
+++ /dev/null
@@ -1,228 +0,0 @@
-     CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
-                         L i n u x    C P U F r e q
-
-                             U S E R   G U I D E
-
-
-                    Dominik Brodowski
-
-
-
-   Clock scaling allows you to change the clock speed of the CPUs on the
-    fly. This is a nice method to save battery power, because the lower
-            the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. Supported Architectures and Processors
-1.1 ARM and ARM64
-1.2 x86
-1.3 sparc64
-1.4 ppc
-1.5 SuperH
-1.6 Blackfin
-
-2. "Policy" / "Governor"?
-2.1 Policy
-2.2 Governor
-
-3. How to change the CPU cpufreq policy and/or speed
-3.1 Preferred interface: sysfs
-
-
-
-1. Supported Architectures and Processors
-=========================================
-
-1.1 ARM and ARM64
------------------
-
-Almost all ARM and ARM64 platforms support CPU frequency scaling.
-
-1.2 x86
--------
-
-The following processors for the x86 architecture are supported by cpufreq:
-
-AMD Elan - SC400, SC410
-AMD mobile K6-2+
-AMD mobile K6-3+
-AMD mobile Duron
-AMD mobile Athlon
-AMD Opteron
-AMD Athlon 64
-Cyrix Media GXm
-Intel mobile PIII and Intel mobile PIII-M on certain chipsets
-Intel Pentium 4, Intel Xeon
-Intel Pentium M (Centrino)
-National Semiconductors Geode GX
-Transmeta Crusoe
-Transmeta Efficeon
-VIA Cyrix 3 / C3
-various processors on some ACPI 2.0-compatible systems [*]
-And many more
-
-[*] Only if "ACPI Processor Performance States" are available
-to the ACPI<->BIOS interface.
-
-
-1.3 sparc64
------------
-
-The following processors for the sparc64 architecture are supported by
-cpufreq:
-
-UltraSPARC-III
-
-
-1.4 ppc
--------
-
-Several "PowerBook" and "iBook2" notebooks are supported.
-The following POWER processors are supported in powernv mode:
-POWER8
-POWER9
-
-1.5 SuperH
-----------
-
-All SuperH processors supporting rate rounding through the clock
-framework are supported by cpufreq.
-
-1.6 Blackfin
-------------
-
-The following Blackfin processors are supported by cpufreq:
-
-BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
-BF531, BF532, BF533, Rev 0.3 or higher
-BF534, BF536, BF537, Rev 0.2 or higher
-BF561, Rev 0.3 or higher
-BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher
-
-
-2. "Policy" / "Governor" ?
-==========================
-
-Some CPU frequency scaling-capable processor switch between various
-frequencies and operating voltages "on the fly" without any kernel or
-user involvement. This guarantees very fast switching to a frequency
-which is high enough to serve the user's needs, but low enough to save
-power.
-
-
-2.1 Policy
-----------
-
-On these systems, all you can do is select the lower and upper
-frequency limit as well as whether you want more aggressive
-power-saving or more instantly available processing power.
-
-
-2.2 Governor
-------------
-
-On all other cpufreq implementations, these boundaries still need to
-be set. Then, a "governor" must be selected. Such a "governor" decides
-what speed the processor shall run within the boundaries. One such
-"governor" is the "userspace" governor. This one allows the user - or
-a yet-to-implement userspace program - to decide what specific speed
-the processor shall run at.
-
-
-3. How to change the CPU cpufreq policy and/or speed
-====================================================
-
-3.1 Preferred Interface: sysfs
-------------------------------
-
-The preferred interface is located in the sysfs filesystem. If you
-mounted it at /sys, the cpufreq interface is located in a subdirectory
-"cpufreq" within the cpu-device directory
-(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).
-
-affected_cpus :                 List of Online CPUs that require software
-                                coordination of frequency.
-
-cpuinfo_cur_freq :              Current frequency of the CPU as obtained from
-                                the hardware, in KHz. This is the frequency
-                                the CPU actually runs at.
-
-cpuinfo_min_freq :              this file shows the minimum operating
-                                frequency the processor can run at(in kHz)
-
-cpuinfo_max_freq :              this file shows the maximum operating
-                                frequency the processor can run at(in kHz)
-
-cpuinfo_transition_latency      The time it takes on this CPU to
-                                switch between two frequencies in nano
-                                seconds. If unknown or known to be
-                                that high that the driver does not
-                                work with the ondemand governor, -1
-                                (CPUFREQ_ETERNAL) will be returned.
-                                Using this information can be useful
-                                to choose an appropriate polling
-                                frequency for a kernel governor or
-                                userspace daemon. Make sure to not
-                                switch the frequency too often
-                                resulting in performance loss.
-
-related_cpus :                  List of Online + Offline CPUs that need software
-                                coordination of frequency.
-
-scaling_available_frequencies : List of available frequencies, in KHz.
-
-scaling_available_governors :   this file shows the CPUfreq governors
-                                available in this kernel. You can see the
-                                currently activated governor in
-
-scaling_cur_freq :              Current frequency of the CPU as determined by
-                                the governor and cpufreq core, in KHz. This is
-                                the frequency the kernel thinks the CPU runs
-                                at.
-
-scaling_driver :                this file shows what cpufreq driver is
-                                used to set the frequency on this CPU
-
-scaling_governor,               and by "echoing" the name of another
-                                governor you can change it. Please note
-                                that some governors won't load - they only
-                                work on some specific architectures or
-                                processors.
-
-scaling_min_freq and
-scaling_max_freq                show the current "policy limits" (in
-                                kHz). By echoing new values into these
-                                files, you can change these limits.
-                                NOTE: when setting a policy you need to
-                                first set scaling_max_freq, then
-                                scaling_min_freq.
-
-scaling_setspeed                This can be read to get the currently programmed
-                                value by the governor. This can be written to
-                                change the current frequency for a group of
-                                CPUs, represented by a policy. This is supported
-                                currently only by the userspace governor.
-
-bios_limit :                    If the BIOS tells the OS to limit a CPU to
-                                lower frequencies, the user can read out the
-                                maximum available frequency from this file.
-                                This typically can happen through (often not
-                                intended) BIOS settings, restrictions
-                                triggered through a service processor or other
-                                BIOS/HW based implementations.
-                                This does not cover thermal ACPI limitations
-                                which can be detected through the generic
-                                thermal driver.
-
-If you have selected the "userspace" governor which allows you to
-set the CPU operating frequency to a specific value, you can read out
-the current frequency in
-
-scaling_setspeed.               By "echoing" a new frequency into this
-                                you can change the speed of the CPU,
-                                but only within the limits of
-                                scaling_min_freq and scaling_max_freq.
-- 
cgit v1.2.3