From 49978be705dd7c202f58bb401ce82bdca6cf9756 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 6 Mar 2017 11:13:51 -0800 Subject: docs: Clarify details for reporting security bugs The kernel security team is regularly asked to provide CVE identifiers, which we don't normally do. This updates the documentation to mention this and adds some more details about coordination and patch handling that come up regularly. Based on an earlier draft by Willy Tarreau. Signed-off-by: Kees Cook Acked-by: Willy Tarreau Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/security-bugs.rst | 39 +++++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 5 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/security-bugs.rst b/Documentation/admin-guide/security-bugs.rst index 4f7414cad5861..47574b382d758 100644 --- a/Documentation/admin-guide/security-bugs.rst +++ b/Documentation/admin-guide/security-bugs.rst @@ -14,14 +14,17 @@ Contact The Linux kernel security team can be contacted by email at . This is a private list of security officers who will help verify the bug report and develop and release a fix. -It is possible that the security team will bring in extra help from -area maintainers to understand and fix the security vulnerability. +If you already have a fix, please include it with your report, as +that can speed up the process considerably. It is possible that the +security team will bring in extra help from area maintainers to +understand and fix the security vulnerability. As it is with any bug, the more information provided the easier it will be to diagnose and fix. Please review the procedure outlined in -admin-guide/reporting-bugs.rst if you are unclear about what information is helpful. -Any exploit code is very helpful and will not be released without -consent from the reporter unless it has already been made public. +admin-guide/reporting-bugs.rst if you are unclear about what +information is helpful. Any exploit code is very helpful and will not +be released without consent from the reporter unless it has already been +made public. Disclosure ---------- @@ -39,6 +42,32 @@ disclosure is from immediate (esp. if it's already publicly known) to a few weeks. As a basic default policy, we expect report date to disclosure date to be on the order of 7 days. +Coordination +------------ + +Fixes for sensitive bugs, such as those that might lead to privilege +escalations, may need to be coordinated with the private + mailing list so that distribution vendors +are well prepared to issue a fixed kernel upon public disclosure of the +upstream fix. Distros will need some time to test the proposed patch and +will generally request at least a few days of embargo, and vendor update +publication prefers to happen Tuesday through Thursday. When appropriate, +the security team can assist with this coordination, or the reporter can +include linux-distros from the start. In this case, remember to prefix +the email Subject line with "[vs]" as described in the linux-distros wiki: + + +CVE assignment +-------------- + +The security team does not normally assign CVEs, nor do we require them +for reports or fixes, as this can needlessly complicate the process and +may delay the bug handling. If a reporter wishes to have a CVE identifier +assigned ahead of public disclosure, they will need to contact the private +linux-distros list, described above. When such a CVE identifier is known +before a patch is provided, it is desirable to mention it in the commit +message, though. + Non-disclosure agreements ------------------------- -- cgit v1.2.3 From 2a0e49279850d28c450f27e51b419ce90bacdcdc Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 13 Mar 2017 23:59:57 +0100 Subject: cpufreq: User/admin documentation update and consolidation The user/admin documentation of cpufreq is badly outdated. It conains stale and/or inaccurate information along with things that are not particularly useful. Also, some of the important pieces are missing from it. For this reason, add a new user/admin document for cpufreq containing current information to admin-guide and drop the old outdated .txt documents it is replacing. Since there will be more PM documents in admin-guide going forward, create a separate directory for them and put the cpufreq document in there right away. Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/index.rst | 1 + Documentation/admin-guide/pm/cpufreq.rst | 700 +++++++++++++++++++++++++++++++ Documentation/admin-guide/pm/index.rst | 15 + Documentation/cpu-freq/boost.txt | 93 ---- Documentation/cpu-freq/governors.txt | 301 ------------- Documentation/cpu-freq/index.txt | 7 - Documentation/cpu-freq/user-guide.txt | 228 ---------- 7 files changed, 716 insertions(+), 629 deletions(-) create mode 100644 Documentation/admin-guide/pm/cpufreq.rst create mode 100644 Documentation/admin-guide/pm/index.rst delete mode 100644 Documentation/cpu-freq/boost.txt delete mode 100644 Documentation/cpu-freq/governors.txt delete mode 100644 Documentation/cpu-freq/user-guide.txt (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 8ddae4e4299aa..8c60a8a32a1a1 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -60,6 +60,7 @@ configure specific aspects of kernel behavior to your liking. mono java ras + pm/index .. only:: subproject and html diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst new file mode 100644 index 0000000000000..289c80f7760eb --- /dev/null +++ b/Documentation/admin-guide/pm/cpufreq.rst @@ -0,0 +1,700 @@ +.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy ` + +======================= +CPU Performance Scaling +======================= + +:: + + Copyright (c) 2017 Intel Corp., Rafael J. Wysocki + +The Concept of CPU Performance Scaling +====================================== + +The majority of modern processors are capable of operating in a number of +different clock frequency and voltage configurations, often referred to as +Operating Performance Points or P-states (in ACPI terminology). As a rule, +the higher the clock frequency and the higher the voltage, the more instructions +can be retired by the CPU over a unit of time, but also the higher the clock +frequency and the higher the voltage, the more energy is consumed over a unit of +time (or the more power is drawn) by the CPU in the given P-state. Therefore +there is a natural tradeoff between the CPU capacity (the number of instructions +that can be executed over a unit of time) and the power drawn by the CPU. + +In some situations it is desirable or even necessary to run the program as fast +as possible and then there is no reason to use any P-states different from the +highest one (i.e. the highest-performance frequency/voltage configuration +available). In some other cases, however, it may not be necessary to execute +instructions so quickly and maintaining the highest available CPU capacity for a +relatively long time without utilizing it entirely may be regarded as wasteful. +It also may not be physically possible to maintain maximum CPU capacity for too +long for thermal or power supply capacity reasons or similar. To cover those +cases, there are hardware interfaces allowing CPUs to be switched between +different frequency/voltage configurations or (in the ACPI terminology) to be +put into different P-states. + +Typically, they are used along with algorithms to estimate the required CPU +capacity, so as to decide which P-states to put the CPUs into. Of course, since +the utilization of the system generally changes over time, that has to be done +repeatedly on a regular basis. The activity by which this happens is referred +to as CPU performance scaling or CPU frequency scaling (because it involves +adjusting the CPU clock frequency). + + +CPU Performance Scaling in Linux +================================ + +The Linux kernel supports CPU performance scaling by means of the ``CPUFreq`` +(CPU Frequency scaling) subsystem that consists of three layers of code: the +core, scaling governors and scaling drivers. + +The ``CPUFreq`` core provides the common code infrastructure and user space +interfaces for all platforms that support CPU performance scaling. It defines +the basic framework in which the other components operate. + +Scaling governors implement algorithms to estimate the required CPU capacity. +As a rule, each governor implements one, possibly parametrized, scaling +algorithm. + +Scaling drivers talk to the hardware. They provide scaling governors with +information on the available P-states (or P-state ranges in some cases) and +access platform-specific hardware interfaces to change CPU P-states as requested +by scaling governors. + +In principle, all available scaling governors can be used with every scaling +driver. That design is based on the observation that the information used by +performance scaling algorithms for P-state selection can be represented in a +platform-independent form in the majority of cases, so it should be possible +to use the same performance scaling algorithm implemented in exactly the same +way regardless of which scaling driver is used. Consequently, the same set of +scaling governors should be suitable for every supported platform. + +However, that observation may not hold for performance scaling algorithms +based on information provided by the hardware itself, for example through +feedback registers, as that information is typically specific to the hardware +interface it comes from and may not be easily represented in an abstract, +platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers +to bypass the governor layer and implement their own performance scaling +algorithms. That is done by the ``intel_pstate`` scaling driver. + + +``CPUFreq`` Policy Objects +========================== + +In some cases the hardware interface for P-state control is shared by multiple +CPUs. That is, for example, the same register (or set of registers) is used to +control the P-state of multiple CPUs at the same time and writing to it affects +all of those CPUs simultaneously. + +Sets of CPUs sharing hardware P-state control interfaces are represented by +``CPUFreq`` as |struct cpufreq_policy| objects. For consistency, +|struct cpufreq_policy| is also used when there is only one CPU in the given +set. + +The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for +every CPU in the system, including CPUs that are currently offline. If multiple +CPUs share the same hardware P-state control interface, all of the pointers +corresponding to them point to the same |struct cpufreq_policy| object. + +``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design +of its user space interface is based on the policy concept. + + +CPU Initialization +================== + +First of all, a scaling driver has to be registered for ``CPUFreq`` to work. +It is only possible to register one scaling driver at a time, so the scaling +driver is expected to be able to handle all CPUs in the system. + +The scaling driver may be registered before or after CPU registration. If +CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to +take a note of all of the already registered CPUs during the registration of the +scaling driver. In turn, if any CPUs are registered after the registration of +the scaling driver, the ``CPUFreq`` core will be invoked to take note of them +at their registration time. + +In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it +has not seen so far as soon as it is ready to handle that CPU. [Note that the +logical CPU may be a physical single-core processor, or a single core in a +multicore processor, or a hardware thread in a physical processor or processor +core. In what follows "CPU" always means "logical CPU" unless explicitly stated +otherwise and the word "processor" is used to refer to the physical part +possibly including multiple logical CPUs.] + +Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set +for the given CPU and if so, it skips the policy object creation. Otherwise, +a new policy object is created and initialized, which involves the creation of +a new policy directory in ``sysfs``, and the policy pointer corresponding to +the given CPU is set to the new policy object's address in memory. + +Next, the scaling driver's ``->init()`` callback is invoked with the policy +pointer of the new CPU passed to it as the argument. That callback is expected +to initialize the performance scaling hardware interface for the given CPU (or, +more precisely, for the set of CPUs sharing the hardware interface it belongs +to, represented by its policy object) and, if the policy object it has been +called for is new, to set parameters of the policy, like the minimum and maximum +frequencies supported by the hardware, the table of available frequencies (if +the set of supported P-states is not a continuous range), and the mask of CPUs +that belong to the same policy (including both online and offline CPUs). That +mask is then used by the core to populate the policy pointers for all of the +CPUs in it. + +The next major initialization step for a new policy object is to attach a +scaling governor to it (to begin with, that is the default scaling governor +determined by the kernel configuration, but it may be changed later +via ``sysfs``). First, a pointer to the new policy object is passed to the +governor's ``->init()`` callback which is expected to initialize all of the +data structures necessary to handle the given policy and, possibly, to add +a governor ``sysfs`` interface to it. Next, the governor is started by +invoking its ``->start()`` callback. + +That callback it expected to register per-CPU utilization update callbacks for +all of the online CPUs belonging to the given policy with the CPU scheduler. +The utilization update callbacks will be invoked by the CPU scheduler on +important events, like task enqueue and dequeue, on every iteration of the +scheduler tick or generally whenever the CPU utilization may change (from the +scheduler's perspective). They are expected to carry out computations needed +to determine the P-state to use for the given policy going forward and to +invoke the scaling driver to make changes to the hardware in accordance with +the P-state selection. The scaling driver may be invoked directly from +scheduler context or asynchronously, via a kernel thread or workqueue, depending +on the configuration and capabilities of the scaling driver and the governor. + +Similar steps are taken for policy objects that are not new, but were "inactive" +previously, meaning that all of the CPUs belonging to them were offline. The +only practical difference in that case is that the ``CPUFreq`` core will attempt +to use the scaling governor previously used with the policy that became +"inactive" (and is re-initialized now) instead of the default governor. + +In turn, if a previously offline CPU is being brought back online, but some +other CPUs sharing the policy object with it are online already, there is no +need to re-initialize the policy object at all. In that case, it only is +necessary to restart the scaling governor so that it can take the new online CPU +into account. That is achieved by invoking the governor's ``->stop`` and +``->start()`` callbacks, in this order, for the entire policy. + +As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling +governor layer of ``CPUFreq`` and provides its own P-state selection algorithms. +Consequently, if ``intel_pstate`` is used, scaling governors are not attached to +new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked +to register per-CPU utilization update callbacks for each policy. These +callbacks are invoked by the CPU scheduler in the same way as for scaling +governors, but in the ``intel_pstate`` case they both determine the P-state to +use and change the hardware configuration accordingly in one go from scheduler +context. + +The policy objects created during CPU initialization and other data structures +associated with them are torn down when the scaling driver is unregistered +(which happens when the kernel module containing it is unloaded, for example) or +when the last CPU belonging to the given policy in unregistered. + + +Policy Interface in ``sysfs`` +============================= + +During the initialization of the kernel, the ``CPUFreq`` core creates a +``sysfs`` directory (kobject) called ``cpufreq`` under +:file:`/sys/devices/system/cpu/`. + +That directory contains a ``policyX`` subdirectory (where ``X`` represents an +integer number) for every policy object maintained by the ``CPUFreq`` core. +Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links +under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer +that may be different from the one represented by ``X``) for all of the CPUs +associated with (or belonging to) the given policy. The ``policyX`` directories +in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific +attributes (files) to control ``CPUFreq`` behavior for the corresponding policy +objects (that is, for all of the CPUs associated with them). + +Some of those attributes are generic. They are created by the ``CPUFreq`` core +and their behavior generally does not depend on what scaling driver is in use +and what scaling governor is attached to the given policy. Some scaling drivers +also add driver-specific attributes to the policy directories in ``sysfs`` to +control policy-specific aspects of driver behavior. + +The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/` +are the following: + +``affected_cpus`` + List of online CPUs belonging to this policy (i.e. sharing the hardware + performance scaling interface represented by the ``policyX`` policy + object). + +``bios_limit`` + If the platform firmware (BIOS) tells the OS to apply an upper limit to + CPU frequencies, that limit will be reported through this attribute (if + present). + + The existence of the limit may be a result of some (often unintentional) + BIOS settings, restrictions coming from a service processor or another + BIOS/HW-based mechanisms. + + This does not cover ACPI thermal limitations which can be discovered + through a generic thermal driver. + + This attribute is not present if the scaling driver in use does not + support it. + +``cpuinfo_max_freq`` + Maximum possible operating frequency the CPUs belonging to this policy + can run at (in kHz). + +``cpuinfo_min_freq`` + Minimum possible operating frequency the CPUs belonging to this policy + can run at (in kHz). + +``cpuinfo_transition_latency`` + The time it takes to switch the CPUs belonging to this policy from one + P-state to another, in nanoseconds. + + If unknown or if known to be so high that the scaling driver does not + work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`) + will be returned by reads from this attribute. + +``related_cpus`` + List of all (online and offline) CPUs belonging to this policy. + +``scaling_available_governors`` + List of ``CPUFreq`` scaling governors present in the kernel that can + be attached to this policy or (if the ``intel_pstate`` scaling driver is + in use) list of scaling algorithms provided by the driver that can be + applied to this policy. + + [Note that some governors are modular and it may be necessary to load a + kernel module for the governor held by it to become available and be + listed by this attribute.] + +``scaling_cur_freq`` + Current frequency of all of the CPUs belonging to this policy (in kHz). + + For the majority of scaling drivers, this is the frequency of the last + P-state requested by the driver from the hardware using the scaling + interface provided by it, which may or may not reflect the frequency + the CPU is actually running at (due to hardware design and other + limitations). + + Some scaling drivers (e.g. ``intel_pstate``) attempt to provide + information more precisely reflecting the current CPU frequency through + this attribute, but that still may not be the exact current CPU + frequency as seen by the hardware at the moment. + +``scaling_driver`` + The scaling driver currently in use. + +``scaling_governor`` + The scaling governor currently attached to this policy or (if the + ``intel_pstate`` scaling driver is in use) the scaling algorithm + provided by the driver that is currently applied to this policy. + + This attribute is read-write and writing to it will cause a new scaling + governor to be attached to this policy or a new scaling algorithm + provided by the scaling driver to be applied to it (in the + ``intel_pstate`` case), as indicated by the string written to this + attribute (which must be one of the names listed by the + ``scaling_available_governors`` attribute described above). + +``scaling_max_freq`` + Maximum frequency the CPUs belonging to this policy are allowed to be + running at (in kHz). + + This attribute is read-write and writing a string representing an + integer to it will cause a new limit to be set (it must not be lower + than the value of the ``scaling_min_freq`` attribute). + +``scaling_min_freq`` + Minimum frequency the CPUs belonging to this policy are allowed to be + running at (in kHz). + + This attribute is read-write and writing a string representing a + non-negative integer to it will cause a new limit to be set (it must not + be higher than the value of the ``scaling_max_freq`` attribute). + +``scaling_setspeed`` + This attribute is functional only if the `userspace`_ scaling governor + is attached to the given policy. + + It returns the last frequency requested by the governor (in kHz) or can + be written to in order to set a new frequency for the policy. + + +Generic Scaling Governors +========================= + +``CPUFreq`` provides generic scaling governors that can be used with all +scaling drivers. As stated before, each of them implements a single, possibly +parametrized, performance scaling algorithm. + +Scaling governors are attached to policy objects and different policy objects +can be handled by different scaling governors at the same time (although that +may lead to suboptimal results in some cases). + +The scaling governor for a given policy object can be changed at any time with +the help of the ``scaling_governor`` policy attribute in ``sysfs``. + +Some governors expose ``sysfs`` attributes to control or fine-tune the scaling +algorithms implemented by them. Those attributes, referred to as governor +tunables, can be either global (system-wide) or per-policy, depending on the +scaling driver in use. If the driver requires governor tunables to be +per-policy, they are located in a subdirectory of each policy directory. +Otherwise, they are located in a subdirectory under +:file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the +subdirectory containing the governor tunables is the name of the governor +providing them. + +``performance`` +--------------- + +When attached to a policy object, this governor causes the highest frequency, +within the ``scaling_max_freq`` policy limit, to be requested for that policy. + +The request is made once at that time the governor for the policy is set to +``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq`` +policy limits change after that. + +``powersave`` +------------- + +When attached to a policy object, this governor causes the lowest frequency, +within the ``scaling_min_freq`` policy limit, to be requested for that policy. + +The request is made once at that time the governor for the policy is set to +``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq`` +policy limits change after that. + +``userspace`` +------------- + +This governor does not do anything by itself. Instead, it allows user space +to set the CPU frequency for the policy it is attached to by writing to the +``scaling_setspeed`` attribute of that policy. + +``schedutil`` +------------- + +This governor uses CPU utilization data available from the CPU scheduler. It +generally is regarded as a part of the CPU scheduler, so it can access the +scheduler's internal data structures directly. + +It runs entirely in scheduler context, although in some cases it may need to +invoke the scaling driver asynchronously when it decides that the CPU frequency +should be changed for a given policy (that depends on whether or not the driver +is capable of changing the CPU frequency from scheduler context). + +The actions of this governor for a particular CPU depend on the scheduling class +invoking its utilization update callback for that CPU. If it is invoked by the +RT or deadline scheduling classes, the governor will increase the frequency to +the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn, +if it is invoked by the CFS scheduling class, the governor will use the +Per-Entity Load Tracking (PELT) metric for the root control group of the +given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_ +LWN.net article for a description of the PELT mechanism). Then, the new +CPU frequency to apply is computed in accordance with the formula + + f = 1.25 * ``f_0`` * ``util`` / ``max`` + +where ``util`` is the PELT number, ``max`` is the theoretical maximum of +``util``, and ``f_0`` is either the maximum possible CPU frequency for the given +policy (if the PELT number is frequency-invariant), or the current CPU frequency +(otherwise). + +This governor also employs a mechanism allowing it to temporarily bump up the +CPU frequency for tasks that have been waiting on I/O most recently, called +"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag +is passed by the scheduler to the governor callback which causes the frequency +to go up to the allowed maximum immediately and then draw back to the value +returned by the above formula over time. + +This governor exposes only one tunable: + +``rate_limit_us`` + Minimum time (in microseconds) that has to pass between two consecutive + runs of governor computations (default: 1000 times the scaling driver's + transition latency). + + The purpose of this tunable is to reduce the scheduler context overhead + of the governor which might be excessive without it. + +This governor generally is regarded as a replacement for the older `ondemand`_ +and `conservative`_ governors (described below), as it is simpler and more +tightly integrated with the CPU scheduler, its overhead in terms of CPU context +switches and similar is less significant, and it uses the scheduler's own CPU +utilization metric, so in principle its decisions should not contradict the +decisions made by the other parts of the scheduler. + +``ondemand`` +------------ + +This governor uses CPU load as a CPU frequency selection metric. + +In order to estimate the current CPU load, it measures the time elapsed between +consecutive invocations of its worker routine and computes the fraction of that +time in which the given CPU was not idle. The ratio of the non-idle (active) +time to the total CPU time is taken as an estimate of the load. + +If this governor is attached to a policy shared by multiple CPUs, the load is +estimated for all of them and the greatest result is taken as the load estimate +for the entire policy. + +The worker routine of this governor has to run in process context, so it is +invoked asynchronously (via a workqueue) and CPU P-states are updated from +there if necessary. As a result, the scheduler context overhead from this +governor is minimum, but it causes additional CPU context switches to happen +relatively often and the CPU P-state updates triggered by it can be relatively +irregular. Also, it affects its own CPU load metric by running code that +reduces the CPU idle time (even though the CPU idle time is only reduced very +slightly by it). + +It generally selects CPU frequencies proportional to the estimated load, so that +the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of +1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute +corresponds to the load of 0, unless when the load exceeds a (configurable) +speedup threshold, in which case it will go straight for the highest frequency +it is allowed to use (the ``scaling_max_freq`` policy limit). + +This governor exposes the following tunables: + +``sampling_rate`` + This is how often the governor's worker routine should run, in + microseconds. + + Typically, it is set to values of the order of 10000 (10 ms). Its + default value is equal to the value of ``cpuinfo_transition_latency`` + for each policy this governor is attached to (but since the unit here + is greater by 1000, this means that the time represented by + ``sampling_rate`` is 1000 times greater than the transition latency by + default). + + If this tunable is per-policy, the following shell command sets the time + represented by it to be 750 times as high as the transition latency:: + + # echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate + + +``min_sampling_rate`` + The minimum value of ``sampling_rate``. + + Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and + :c:data:`tick_nohz_active` are both set or to 20 times the value of + :c:data:`jiffies` in microseconds otherwise. + +``up_threshold`` + If the estimated CPU load is above this value (in percent), the governor + will set the frequency to the maximum value allowed for the policy. + Otherwise, the selected frequency will be proportional to the estimated + CPU load. + +``ignore_nice_load`` + If set to 1 (default 0), it will cause the CPU load estimation code to + treat the CPU time spent on executing tasks with "nice" levels greater + than 0 as CPU idle time. + + This may be useful if there are tasks in the system that should not be + taken into account when deciding what frequency to run the CPUs at. + Then, to make that happen it is sufficient to increase the "nice" level + of those tasks above 0 and set this attribute to 1. + +``sampling_down_factor`` + Temporary multiplier, between 1 (default) and 100 inclusive, to apply to + the ``sampling_rate`` value if the CPU load goes above ``up_threshold``. + + This causes the next execution of the governor's worker routine (after + setting the frequency to the allowed maximum) to be delayed, so the + frequency stays at the maximum level for a longer time. + + Frequency fluctuations in some bursty workloads may be avoided this way + at the cost of additional energy spent on maintaining the maximum CPU + capacity. + +``powersave_bias`` + Reduction factor to apply to the original frequency target of the + governor (including the maximum value used when the ``up_threshold`` + value is exceeded by the estimated CPU load) or sensitivity threshold + for the AMD frequency sensitivity powersave bias driver + (:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000 + inclusive. + + If the AMD frequency sensitivity powersave bias driver is not loaded, + the effective frequency to apply is given by + + f * (1 - ``powersave_bias`` / 1000) + + where f is the governor's original frequency target. The default value + of this attribute is 0 in that case. + + If the AMD frequency sensitivity powersave bias driver is loaded, the + value of this attribute is 400 by default and it is used in a different + way. + + On Family 16h (and later) AMD processors there is a mechanism to get a + measured workload sensitivity, between 0 and 100% inclusive, from the + hardware. That value can be used to estimate how the performance of the + workload running on a CPU will change in response to frequency changes. + + The performance of a workload with the sensitivity of 0 (memory-bound or + IO-bound) is not expected to increase at all as a result of increasing + the CPU frequency, whereas workloads with the sensitivity of 100% + (CPU-bound) are expected to perform much better if the CPU frequency is + increased. + + If the workload sensitivity is less than the threshold represented by + the ``powersave_bias`` value, the sensitivity powersave bias driver + will cause the governor to select a frequency lower than its original + target, so as to avoid over-provisioning workloads that will not benefit + from running at higher CPU frequencies. + +``conservative`` +---------------- + +This governor uses CPU load as a CPU frequency selection metric. + +It estimates the CPU load in the same way as the `ondemand`_ governor described +above, but the CPU frequency selection algorithm implemented by it is different. + +Namely, it avoids changing the frequency significantly over short time intervals +which may not be suitable for systems with limited power supply capacity (e.g. +battery-powered). To achieve that, it changes the frequency in relatively +small steps, one step at a time, up or down - depending on whether or not a +(configurable) threshold has been exceeded by the estimated CPU load. + +This governor exposes the following tunables: + +``freq_step`` + Frequency step in percent of the maximum frequency the governor is + allowed to set (the ``scaling_max_freq`` policy limit), between 0 and + 100 (5 by default). + + This is how much the frequency is allowed to change in one go. Setting + it to 0 will cause the default frequency step (5 percent) to be used + and setting it to 100 effectively causes the governor to periodically + switch the frequency between the ``scaling_min_freq`` and + ``scaling_max_freq`` policy limits. + +``down_threshold`` + Threshold value (in percent, 20 by default) used to determine the + frequency change direction. + + If the estimated CPU load is greater than this value, the frequency will + go up (by ``freq_step``). If the load is less than this value (and the + ``sampling_down_factor`` mechanism is not in effect), the frequency will + go down. Otherwise, the frequency will not be changed. + +``sampling_down_factor`` + Frequency decrease deferral factor, between 1 (default) and 10 + inclusive. + + It effectively causes the frequency to go down ``sampling_down_factor`` + times slower than it ramps up. + + +Frequency Boost Support +======================= + +Background +---------- + +Some processors support a mechanism to raise the operating frequency of some +cores in a multicore package temporarily (and above the sustainable frequency +threshold for the whole package) under certain conditions, for example if the +whole chip is not fully utilized and below its intended thermal or power budget. + +Different names are used by different vendors to refer to this functionality. +For Intel processors it is referred to as "Turbo Boost", AMD calls it +"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on. +As a rule, it also is implemented differently by different vendors. The simple +term "frequency boost" is used here for brevity to refer to all of those +implementations. + +The frequency boost mechanism may be either hardware-based or software-based. +If it is hardware-based (e.g. on x86), the decision to trigger the boosting is +made by the hardware (although in general it requires the hardware to be put +into a special state in which it can control the CPU frequency within certain +limits). If it is software-based (e.g. on ARM), the scaling driver decides +whether or not to trigger boosting and when to do that. + +The ``boost`` File in ``sysfs`` +------------------------------- + +This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls +the "boost" setting for the whole system. It is not present if the underlying +scaling driver does not support the frequency boost mechanism (or supports it, +but provides a driver-specific interface for controlling it, like +``intel_pstate``). + +If the value in this file is 1, the frequency boost mechanism is enabled. This +means that either the hardware can be put into states in which it is able to +trigger boosting (in the hardware-based case), or the software is allowed to +trigger boosting (in the software-based case). It does not mean that boosting +is actually in use at the moment on any CPUs in the system. It only means a +permission to use the frequency boost mechanism (which still may never be used +for other reasons). + +If the value in this file is 0, the frequency boost mechanism is disabled and +cannot be used at all. + +The only values that can be written to this file are 0 and 1. + +Rationale for Boost Control Knob +-------------------------------- + +The frequency boost mechanism is generally intended to help to achieve optimum +CPU performance on time scales below software resolution (e.g. below the +scheduler tick interval) and it is demonstrably suitable for many workloads, but +it may lead to problems in certain situations. + +For this reason, many systems make it possible to disable the frequency boost +mechanism in the platform firmware (BIOS) setup, but that requires the system to +be restarted for the setting to be adjusted as desired, which may not be +practical at least in some cases. For example: + + 1. Boosting means overclocking the processor, although under controlled + conditions. Generally, the processor's energy consumption increases + as a result of increasing its frequency and voltage, even temporarily. + That may not be desirable on systems that switch to power sources of + limited capacity, such as batteries, so the ability to disable the boost + mechanism while the system is running may help there (but that depends on + the workload too). + + 2. In some situations deterministic behavior is more important than + performance or energy consumption (or both) and the ability to disable + boosting while the system is running may be useful then. + + 3. To examine the impact of the frequency boost mechanism itself, it is useful + to be able to run tests with and without boosting, preferably without + restarting the system in the meantime. + + 4. Reproducible results are important when running benchmarks. Since + the boosting functionality depends on the load of the whole package, + single-thread performance may vary because of it which may lead to + unreproducible results sometimes. That can be avoided by disabling the + frequency boost mechanism before running benchmarks sensitive to that + issue. + +Legacy AMD ``cpb`` Knob +----------------------- + +The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to +the global ``boost`` one. It is used for disabling/enabling the "Core +Performance Boost" feature of some AMD processors. + +If present, that knob is located in every ``CPUFreq`` policy directory in +``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called +``cpb``, which indicates a more fine grained control interface. The actual +implementation, however, works on the system-wide basis and setting that knob +for one policy causes the same value of it to be set for all of the other +policies at the same time. + +That knob is still supported on AMD processors that support its underlying +hardware feature, but it may be configured out of the kernel (via the +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global +``boost`` knob is present regardless. Thus it is always possible use the +``boost`` knob instead of the ``cpb`` one which is highly recommended, as that +is more consistent with what all of the other systems do (and the ``cpb`` knob +may not be supported any more in the future). + +The ``cpb`` knob is never present for any processors without the underlying +hardware feature (e.g. all Intel ones), even if the +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set. + + +.. _Per-entity load tracking: https://lwn.net/Articles/531853/ diff --git a/Documentation/admin-guide/pm/index.rst b/Documentation/admin-guide/pm/index.rst new file mode 100644 index 0000000000000..c80f087321fcb --- /dev/null +++ b/Documentation/admin-guide/pm/index.rst @@ -0,0 +1,15 @@ +================ +Power Management +================ + +.. toctree:: + :maxdepth: 2 + + cpufreq + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt deleted file mode 100644 index dd62e1334f0a8..0000000000000 --- a/Documentation/cpu-freq/boost.txt +++ /dev/null @@ -1,93 +0,0 @@ -Processor boosting control - - - information for users - - -Quick guide for the impatient: --------------------- -/sys/devices/system/cpu/cpufreq/boost -controls the boost setting for the whole system. You can read and write -that file with either "0" (boosting disabled) or "1" (boosting allowed). -Reading or writing 1 does not mean that the system is boosting at this -very moment, but only that the CPU _may_ raise the frequency at it's -discretion. --------------------- - -Introduction -------------- -Some CPUs support a functionality to raise the operating frequency of -some cores in a multi-core package if certain conditions apply, mostly -if the whole chip is not fully utilized and below it's intended thermal -budget. The decision about boost disable/enable is made either at hardware -(e.g. x86) or software (e.g ARM). -On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core", -in technical documentation "Core performance boost". In Linux we use -the term "boost" for convenience. - -Rationale for disable switch ----------------------------- - -Though the idea is to just give better performance without any user -intervention, sometimes the need arises to disable this functionality. -Most systems offer a switch in the (BIOS) firmware to disable the -functionality at all, but a more fine-grained and dynamic control would -be desirable: -1. While running benchmarks, reproducible results are important. Since - the boosting functionality depends on the load of the whole package, - single thread performance can vary. By explicitly disabling the boost - functionality at least for the benchmark's run-time the system will run - at a fixed frequency and results are reproducible again. -2. To examine the impact of the boosting functionality it is helpful - to do tests with and without boosting. -3. Boosting means overclocking the processor, though under controlled - conditions. By raising the frequency and the voltage the processor - will consume more power than without the boosting, which may be - undesirable for instance for mobile users. Disabling boosting may - save power here, though this depends on the workload. - - -User controlled switch ----------------------- - -To allow the user to toggle the boosting functionality, the cpufreq core -driver exports a sysfs knob to enable or disable it. There is a file: -/sys/devices/system/cpu/cpufreq/boost -which can either read "0" (boosting disabled) or "1" (boosting enabled). -The file is exported only when cpufreq driver supports boosting. -Explicitly changing the permissions and writing to that file anyway will -return EINVAL. - -On supported CPUs one can write either a "0" or a "1" into this file. -This will either disable the boost functionality on all cores in the -whole system (0) or will allow the software or hardware to boost at will -(1). - -Writing a "1" does not explicitly boost the system, but just allows the -CPU to boost at their discretion. Some implementations take external -factors like the chip's temperature into account, so boosting once does -not necessarily mean that it will occur every time even using the exact -same software setup. - - -AMD legacy cpb switch ---------------------- -The AMD powernow-k8 driver used to support a very similar switch to -disable or enable the "Core Performance Boost" feature of some AMD CPUs. -This switch was instantiated in each CPU's cpufreq directory -(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb". -Though the per CPU existence hints at a more fine grained control, the -actual implementation only supported a system-global switch semantics, -which was simply reflected into each CPU's file. Writing a 0 or 1 into it -would pull the other CPUs to the same state. -For compatibility reasons this file and its behavior is still supported -on AMD CPUs, though it is now protected by a config switch -(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created, -even with the config option set. -This functionality is considered legacy and will be removed in some future -kernel version. - -More fine grained boosting control ----------------------------------- - -Technically it is possible to switch the boosting functionality at least -on a per package basis, for some CPUs even per core. Currently the driver -does not support it, but this may be implemented in the future. diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt deleted file mode 100644 index 61b3184b6c24e..0000000000000 --- a/Documentation/cpu-freq/governors.txt +++ /dev/null @@ -1,301 +0,0 @@ - CPU frequency and voltage scaling code in the Linux(TM) kernel - - - L i n u x C P U F r e q - - C P U F r e q G o v e r n o r s - - - information for users and developers - - - - Dominik Brodowski - some additions and corrections by Nico Golde - Rafael J. Wysocki - Viresh Kumar - - - - Clock scaling allows you to change the clock speed of the CPUs on the - fly. This is a nice method to save battery power, because the lower - the clock speed, the less power the CPU consumes. - - -Contents: ---------- -1. What is a CPUFreq Governor? - -2. Governors In the Linux Kernel -2.1 Performance -2.2 Powersave -2.3 Userspace -2.4 Ondemand -2.5 Conservative -2.6 Schedutil - -3. The Governor Interface in the CPUfreq Core - -4. References - - -1. What Is A CPUFreq Governor? -============================== - -Most cpufreq drivers (except the intel_pstate and longrun) or even most -cpu frequency scaling algorithms only allow the CPU frequency to be set -to predefined fixed values. In order to offer dynamic frequency -scaling, the cpufreq core must be able to tell these drivers of a -"target frequency". So these specific drivers will be transformed to -offer a "->target/target_index/fast_switch()" call instead of the -"->setpolicy()" call. For set_policy drivers, all stays the same, -though. - -How to decide what frequency within the CPUfreq policy should be used? -That's done using "cpufreq governors". - -Basically, it's the following flow graph: - -CPU can be set to switch independently | CPU can only be set - within specific "limits" | to specific frequencies - - "CPUfreq policy" - consists of frequency limits (policy->{min,max}) - and CPUfreq governor to be used - / \ - / \ - / the cpufreq governor decides - / (dynamically or statically) - / what target_freq to set within - / the limits of policy->{min,max} - / \ - / \ - Using the ->setpolicy call, Using the ->target/target_index/fast_switch call, - the limits and the the frequency closest - "policy" is set. to target_freq is set. - It is assured that it - is within policy->{min,max} - - -2. Governors In the Linux Kernel -================================ - -2.1 Performance ---------------- - -The CPUfreq governor "performance" sets the CPU statically to the -highest frequency within the borders of scaling_min_freq and -scaling_max_freq. - - -2.2 Powersave -------------- - -The CPUfreq governor "powersave" sets the CPU statically to the -lowest frequency within the borders of scaling_min_freq and -scaling_max_freq. - - -2.3 Userspace -------------- - -The CPUfreq governor "userspace" allows the user, or any userspace -program running with UID "root", to set the CPU to a specific frequency -by making a sysfs file "scaling_setspeed" available in the CPU-device -directory. - - -2.4 Ondemand ------------- - -The CPUfreq governor "ondemand" sets the CPU frequency depending on the -current system load. Load estimation is triggered by the scheduler -through the update_util_data->func hook; when triggered, cpufreq checks -the CPU-usage statistics over the last period and the governor sets the -CPU accordingly. The CPU must have the capability to switch the -frequency very quickly. - -Sysfs files: - -* sampling_rate: - - Measured in uS (10^-6 seconds), this is how often you want the kernel - to look at the CPU usage and to make decisions on what to do about the - frequency. Typically this is set to values of around '10000' or more. - It's default value is (cmp. with users-guide.txt): transition_latency - * 1000. Be aware that transition latency is in ns and sampling_rate - is in us, so you get the same sysfs value by default. Sampling rate - should always get adjusted considering the transition latency to set - the sampling rate 750 times as high as the transition latency in the - bash (as said, 1000 is default), do: - - $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate - -* sampling_rate_min: - - The sampling rate is limited by the HW transition latency: - transition_latency * 100 - - Or by kernel restrictions: - - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed. - - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is - used, the limits depend on the CONFIG_HZ option: - HZ=1000: min=20000us (20ms) - HZ=250: min=80000us (80ms) - HZ=100: min=200000us (200ms) - - The highest value of kernel and HW latency restrictions is shown and - used as the minimum sampling rate. - -* up_threshold: - - This defines what the average CPU usage between the samplings of - 'sampling_rate' needs to be for the kernel to make a decision on - whether it should increase the frequency. For example when it is set - to its default value of '95' it means that between the checking - intervals the CPU needs to be on average more than 95% in use to then - decide that the CPU frequency needs to be increased. - -* ignore_nice_load: - - This parameter takes a value of '0' or '1'. When set to '0' (its - default), all processes are counted towards the 'cpu utilisation' - value. When set to '1', the processes that are run with a 'nice' - value will not count (and thus be ignored) in the overall usage - calculation. This is useful if you are running a CPU intensive - calculation on your laptop that you do not care how long it takes to - complete as you can 'nice' it and prevent it from taking part in the - deciding process of whether to increase your CPU frequency. - -* sampling_down_factor: - - This parameter controls the rate at which the kernel makes a decision - on when to decrease the frequency while running at top speed. When set - to 1 (the default) decisions to reevaluate load are made at the same - interval regardless of current clock speed. But when set to greater - than 1 (e.g. 100) it acts as a multiplier for the scheduling interval - for reevaluating load when the CPU is at its top speed due to high - load. This improves performance by reducing the overhead of load - evaluation and helping the CPU stay at its top speed when truly busy, - rather than shifting back and forth in speed. This tunable has no - effect on behavior at lower speeds/lower CPU loads. - -* powersave_bias: - - This parameter takes a value between 0 to 1000. It defines the - percentage (times 10) value of the target frequency that will be - shaved off of the target. For example, when set to 100 -- 10%, when - ondemand governor would have targeted 1000 MHz, it will target - 1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0 - (disabled) by default. - - When AMD frequency sensitivity powersave bias driver -- - drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter - defines the workload frequency sensitivity threshold in which a lower - frequency is chosen instead of ondemand governor's original target. - The frequency sensitivity is a hardware reported (on AMD Family 16h - Processors and above) value between 0 to 100% that tells software how - the performance of the workload running on a CPU will change when - frequency changes. A workload with sensitivity of 0% (memory/IO-bound) - will not perform any better on higher core frequency, whereas a - workload with sensitivity of 100% (CPU-bound) will perform better - higher the frequency. When the driver is loaded, this is set to 400 by - default -- for CPUs running workloads with sensitivity value below - 40%, a lower frequency is chosen. Unloading the driver or writing 0 - will disable this feature. - - -2.5 Conservative ----------------- - -The CPUfreq governor "conservative", much like the "ondemand" -governor, sets the CPU frequency depending on the current usage. It -differs in behaviour in that it gracefully increases and decreases the -CPU speed rather than jumping to max speed the moment there is any load -on the CPU. This behaviour is more suitable in a battery powered -environment. The governor is tweaked in the same manner as the -"ondemand" governor through sysfs with the addition of: - -* freq_step: - - This describes what percentage steps the cpu freq should be increased - and decreased smoothly by. By default the cpu frequency will increase - in 5% chunks of your maximum cpu frequency. You can change this value - to anywhere between 0 and 100 where '0' will effectively lock your CPU - at a speed regardless of its load whilst '100' will, in theory, make - it behave identically to the "ondemand" governor. - -* down_threshold: - - Same as the 'up_threshold' found for the "ondemand" governor but for - the opposite direction. For example when set to its default value of - '20' it means that if the CPU usage needs to be below 20% between - samples to have the frequency decreased. - -* sampling_down_factor: - - Similar functionality as in "ondemand" governor. But in - "conservative", it controls the rate at which the kernel makes a - decision on when to decrease the frequency while running in any speed. - Load for frequency increase is still evaluated every sampling rate. - - -2.6 Schedutil -------------- - -The "schedutil" governor aims at better integration with the Linux -kernel scheduler. Load estimation is achieved through the scheduler's -Per-Entity Load Tracking (PELT) mechanism, which also provides -information about the recent load [1]. This governor currently does -load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks -are always run at the highest frequency. Unlike all the other -governors, the code is located under the kernel/sched/ directory. - -Sysfs files: - -* rate_limit_us: - - This contains a value in microseconds. The governor waits for - rate_limit_us time before reevaluating the load again, after it has - evaluated the load once. - -For an in-depth comparison with the other governors refer to [2]. - - -3. The Governor Interface in the CPUfreq Core -============================================= - -A new governor must register itself with the CPUfreq core using -"cpufreq_register_governor". The struct cpufreq_governor, which has to -be passed to that function, must contain the following values: - -governor->name - A unique name for this governor. -governor->owner - .THIS_MODULE for the governor module (if appropriate). - -plus a set of hooks to the functions implementing the governor's logic. - -The CPUfreq governor may call the CPU processor driver using one of -these two functions: - -int cpufreq_driver_target(struct cpufreq_policy *policy, - unsigned int target_freq, - unsigned int relation); - -int __cpufreq_driver_target(struct cpufreq_policy *policy, - unsigned int target_freq, - unsigned int relation); - -target_freq must be within policy->min and policy->max, of course. -What's the difference between these two functions? When your governor is -in a direct code path of a call to governor callbacks, like -governor->start(), the policy->rwsem is still held in the cpufreq core, -and there's no need to lock it again (in fact, this would cause a -deadlock). So use __cpufreq_driver_target only in these cases. In all -other cases (for example, when there's a "daemonized" function that -wakes up every second), use cpufreq_driver_target to take policy->rwsem -before the command is passed to the cpufreq driver. - -4. References -============= - -[1] Per-entity load tracking: https://lwn.net/Articles/531853/ -[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/ - diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt index ef1d39247b054..03a7cee6ac73a 100644 --- a/Documentation/cpu-freq/index.txt +++ b/Documentation/cpu-freq/index.txt @@ -21,8 +21,6 @@ Documents in this directory: amd-powernow.txt - AMD powernow driver specific file. -boost.txt - Frequency boosting support. - core.txt - General description of the CPUFreq core and of CPUFreq notifiers. @@ -32,17 +30,12 @@ cpufreq-nforce2.txt - nVidia nForce2 platform specific file. cpufreq-stats.txt - General description of sysfs cpufreq stats. -governors.txt - What are cpufreq governors and how to - implement them? - index.txt - File index, Mailing list and Links (this document) intel-pstate.txt - Intel pstate cpufreq driver specific file. pcc-cpufreq.txt - PCC cpufreq driver specific file. -user-guide.txt - User Guide to CPUFreq - Mailing List ------------ diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt deleted file mode 100644 index 391da64e94925..0000000000000 --- a/Documentation/cpu-freq/user-guide.txt +++ /dev/null @@ -1,228 +0,0 @@ - CPU frequency and voltage scaling code in the Linux(TM) kernel - - - L i n u x C P U F r e q - - U S E R G U I D E - - - Dominik Brodowski - - - - Clock scaling allows you to change the clock speed of the CPUs on the - fly. This is a nice method to save battery power, because the lower - the clock speed, the less power the CPU consumes. - - -Contents: ---------- -1. Supported Architectures and Processors -1.1 ARM and ARM64 -1.2 x86 -1.3 sparc64 -1.4 ppc -1.5 SuperH -1.6 Blackfin - -2. "Policy" / "Governor"? -2.1 Policy -2.2 Governor - -3. How to change the CPU cpufreq policy and/or speed -3.1 Preferred interface: sysfs - - - -1. Supported Architectures and Processors -========================================= - -1.1 ARM and ARM64 ------------------ - -Almost all ARM and ARM64 platforms support CPU frequency scaling. - -1.2 x86 -------- - -The following processors for the x86 architecture are supported by cpufreq: - -AMD Elan - SC400, SC410 -AMD mobile K6-2+ -AMD mobile K6-3+ -AMD mobile Duron -AMD mobile Athlon -AMD Opteron -AMD Athlon 64 -Cyrix Media GXm -Intel mobile PIII and Intel mobile PIII-M on certain chipsets -Intel Pentium 4, Intel Xeon -Intel Pentium M (Centrino) -National Semiconductors Geode GX -Transmeta Crusoe -Transmeta Efficeon -VIA Cyrix 3 / C3 -various processors on some ACPI 2.0-compatible systems [*] -And many more - -[*] Only if "ACPI Processor Performance States" are available -to the ACPI<->BIOS interface. - - -1.3 sparc64 ------------ - -The following processors for the sparc64 architecture are supported by -cpufreq: - -UltraSPARC-III - - -1.4 ppc -------- - -Several "PowerBook" and "iBook2" notebooks are supported. -The following POWER processors are supported in powernv mode: -POWER8 -POWER9 - -1.5 SuperH ----------- - -All SuperH processors supporting rate rounding through the clock -framework are supported by cpufreq. - -1.6 Blackfin ------------- - -The following Blackfin processors are supported by cpufreq: - -BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher -BF531, BF532, BF533, Rev 0.3 or higher -BF534, BF536, BF537, Rev 0.2 or higher -BF561, Rev 0.3 or higher -BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher - - -2. "Policy" / "Governor" ? -========================== - -Some CPU frequency scaling-capable processor switch between various -frequencies and operating voltages "on the fly" without any kernel or -user involvement. This guarantees very fast switching to a frequency -which is high enough to serve the user's needs, but low enough to save -power. - - -2.1 Policy ----------- - -On these systems, all you can do is select the lower and upper -frequency limit as well as whether you want more aggressive -power-saving or more instantly available processing power. - - -2.2 Governor ------------- - -On all other cpufreq implementations, these boundaries still need to -be set. Then, a "governor" must be selected. Such a "governor" decides -what speed the processor shall run within the boundaries. One such -"governor" is the "userspace" governor. This one allows the user - or -a yet-to-implement userspace program - to decide what specific speed -the processor shall run at. - - -3. How to change the CPU cpufreq policy and/or speed -==================================================== - -3.1 Preferred Interface: sysfs ------------------------------- - -The preferred interface is located in the sysfs filesystem. If you -mounted it at /sys, the cpufreq interface is located in a subdirectory -"cpufreq" within the cpu-device directory -(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU). - -affected_cpus : List of Online CPUs that require software - coordination of frequency. - -cpuinfo_cur_freq : Current frequency of the CPU as obtained from - the hardware, in KHz. This is the frequency - the CPU actually runs at. - -cpuinfo_min_freq : this file shows the minimum operating - frequency the processor can run at(in kHz) - -cpuinfo_max_freq : this file shows the maximum operating - frequency the processor can run at(in kHz) - -cpuinfo_transition_latency The time it takes on this CPU to - switch between two frequencies in nano - seconds. If unknown or known to be - that high that the driver does not - work with the ondemand governor, -1 - (CPUFREQ_ETERNAL) will be returned. - Using this information can be useful - to choose an appropriate polling - frequency for a kernel governor or - userspace daemon. Make sure to not - switch the frequency too often - resulting in performance loss. - -related_cpus : List of Online + Offline CPUs that need software - coordination of frequency. - -scaling_available_frequencies : List of available frequencies, in KHz. - -scaling_available_governors : this file shows the CPUfreq governors - available in this kernel. You can see the - currently activated governor in - -scaling_cur_freq : Current frequency of the CPU as determined by - the governor and cpufreq core, in KHz. This is - the frequency the kernel thinks the CPU runs - at. - -scaling_driver : this file shows what cpufreq driver is - used to set the frequency on this CPU - -scaling_governor, and by "echoing" the name of another - governor you can change it. Please note - that some governors won't load - they only - work on some specific architectures or - processors. - -scaling_min_freq and -scaling_max_freq show the current "policy limits" (in - kHz). By echoing new values into these - files, you can change these limits. - NOTE: when setting a policy you need to - first set scaling_max_freq, then - scaling_min_freq. - -scaling_setspeed This can be read to get the currently programmed - value by the governor. This can be written to - change the current frequency for a group of - CPUs, represented by a policy. This is supported - currently only by the userspace governor. - -bios_limit : If the BIOS tells the OS to limit a CPU to - lower frequencies, the user can read out the - maximum available frequency from this file. - This typically can happen through (often not - intended) BIOS settings, restrictions - triggered through a service processor or other - BIOS/HW based implementations. - This does not cover thermal ACPI limitations - which can be detected through the generic - thermal driver. - -If you have selected the "userspace" governor which allows you to -set the CPU operating frequency to a specific value, you can read out -the current frequency in - -scaling_setspeed. By "echoing" a new frequency into this - you can change the speed of the CPU, - but only within the limits of - scaling_min_freq and scaling_max_freq. -- cgit v1.2.3 From 30e010f64a9da9928dc3df205b96155e68c2eff1 Mon Sep 17 00:00:00 2001 From: Martin Kepplinger Date: Sun, 12 Mar 2017 12:54:23 +0100 Subject: Documentation: admin-guide: fix path to input key definitions The UAPI header split failed to update the documentation here; fix things accordingly. Signed-off-by: Martin Kepplinger Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/sysrq.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst index d1712ea2d314d..7b9035c01a2e1 100644 --- a/Documentation/admin-guide/sysrq.rst +++ b/Documentation/admin-guide/sysrq.rst @@ -212,7 +212,8 @@ I hit SysRq, but nothing seems to happen, what's wrong? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are some keyboards that produce a different keycode for SysRq than the -pre-defined value of 99 (see ``KEY_SYSRQ`` in ``include/linux/input.h``), or +pre-defined value of 99 +(see ``KEY_SYSRQ`` in ``include/uapi/linux/input-event-codes.h``), or which don't have a SysRq key at all. In these cases, run ``showkey -s`` to find an appropriate scancode sequence, and use ``setkeycodes 99`` to map this sequence to the usual SysRq code (e.g., ``setkeycodes e05b 99``). It's -- cgit v1.2.3 From 9f02a486da2a06687460fb2fd5303ca29b711f87 Mon Sep 17 00:00:00 2001 From: Tamara Diaconita Date: Tue, 14 Mar 2017 10:38:35 +0200 Subject: Documentation: admin-guide: Fix typos Fix typos in admin-guide directory. Make documentation clear and grammatically correct. Signed-off-by: Tamara Diaconita Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.rst | 2 +- Documentation/admin-guide/ras.rst | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index b516164999a80..c5eae20d5691e 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -197,7 +197,7 @@ and is between 256 and 4096 characters. It is defined in the file Finally, the [KMG] suffix is commonly described after a number of kernel parameter values. These 'K', 'M', and 'G' letters represent the _binary_ -multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30 +multipliers 'Kilo', 'Mega', and 'Giga', equaling 2^10, 2^20, and 2^30 bytes respectively. Such letter suffixes can also be entirely omitted: .. include:: kernel-parameters.txt diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst index 1b90c6f00a929..8c7bbf2c88d23 100644 --- a/Documentation/admin-guide/ras.rst +++ b/Documentation/admin-guide/ras.rst @@ -8,7 +8,7 @@ RAS concepts ************ Reliability, Availability and Serviceability (RAS) is a concept used on -servers meant to measure their robusteness. +servers meant to measure their robustness. Reliability is the probability that a system will produce correct outputs. @@ -42,13 +42,13 @@ Among the monitoring measures, the most usual ones include: * CPU – detect errors at instruction execution and at L1/L2/L3 caches; * Memory – add error correction logic (ECC) to detect and correct errors; -* I/O – add CRC checksums for tranfered data; +* I/O – add CRC checksums for transferred data; * Storage – RAID, journal file systems, checksums, Self-Monitoring, Analysis and Reporting Technology (SMART). By monitoring the number of occurrences of error detections, it is possible to identify if the probability of hardware errors is increasing, and, on such -case, do a preventive maintainance to replace a degrated component while +case, do a preventive maintenance to replace a degraded component while those errors are correctable. Types of errors @@ -121,7 +121,7 @@ using the ``dmidecode`` tool. For example, on a desktop machine, it shows:: On the above example, a DDR4 SO-DIMM memory module is located at the system's memory labeled as "BANK 0", as given by the *bank locator* field. Please notice that, on such system, the *total width* is equal to the -*data witdh*. It means that such memory module doesn't have error +*data width*. It means that such memory module doesn't have error detection/correction mechanisms. Unfortunately, not all systems use the same field to specify the memory @@ -145,7 +145,7 @@ bank. On this example, from an older server, ``dmidecode`` shows:: There, the DDR3 RDIMM memory module is located at the system's memory labeled as "DIMM_A1", as given by the *locator* field. Please notice that this -memory module has 64 bits of *data witdh* and 72 bits of *total width*. So, +memory module has 64 bits of *data width* and 72 bits of *total width*. So, it has 8 extra bits to be used by error detection and correction mechanisms. Such kind of memory is called Error-correcting code memory (ECC memory). @@ -186,7 +186,7 @@ Architecture (MCA)\ [#f3]_. .. [#f1] Please notice that several memory controllers allow operation on a mode called "Lock-Step", where it groups two memory modules together, doing 128-bit reads/writes. That gives 16 bits for error correction, with - significatively improves the error correction mechanism, at the expense + significantly improves the error correction mechanism, at the expense that, when an error happens, there's no way to know what memory module is to blame. So, it has to blame both memory modules. -- cgit v1.2.3 From 9b9355a2692621be80f95ac31e16d2c16c1c07c9 Mon Sep 17 00:00:00 2001 From: Andrew Clayton Date: Thu, 20 Apr 2017 14:04:15 +0100 Subject: docs: process/4.Coding.rst: Fix a couple of document refs In Documentation/process/4.Coding.rst there were a couple of paragraphs that spilled over the 80 character line length. This was likely caused when the document was converted to reStructuredText. Re-flow the paragraphs and make the document references proper reStructuredText :ref: links. This also adds the appropriate reStructuredText file heading to kernel-parameters.rst as referenced by the kernel-parameters link in this patch. Signed-off-by: Andrew Clayton Reviewed-by: Jani Nikula Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.rst | 2 ++ Documentation/process/4.Coding.rst | 17 +++++++++-------- 2 files changed, 11 insertions(+), 8 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index c5eae20d5691e..33b90e273489e 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -1,3 +1,5 @@ +.. _kernelparameters: + The kernel's command-line parameters ==================================== diff --git a/Documentation/process/4.Coding.rst b/Documentation/process/4.Coding.rst index 2a728d898fc5a..6df19943dd4dc 100644 --- a/Documentation/process/4.Coding.rst +++ b/Documentation/process/4.Coding.rst @@ -22,11 +22,11 @@ Coding style ************ The kernel has long had a standard coding style, described in -Documentation/process/coding-style.rst. For much of that time, the policies described -in that file were taken as being, at most, advisory. As a result, there is -a substantial amount of code in the kernel which does not meet the coding -style guidelines. The presence of that code leads to two independent -hazards for kernel developers. +:ref:`Documentation/process/coding-style.rst `. For much of +that time, the policies described in that file were taken as being, at most, +advisory. As a result, there is a substantial amount of code in the kernel +which does not meet the coding style guidelines. The presence of that code +leads to two independent hazards for kernel developers. The first of these is to believe that the kernel coding standards do not matter and are not enforced. The truth of the matter is that adding new @@ -343,9 +343,10 @@ user-space developers to know what they are working with. See Documentation/ABI/README for a description of how this documentation should be formatted and what information needs to be provided. -The file Documentation/admin-guide/kernel-parameters.rst describes all of the kernel's -boot-time parameters. Any patch which adds new parameters should add the -appropriate entries to this file. +The file :ref:`Documentation/admin-guide/kernel-parameters.rst +` describes all of the kernel's boot-time parameters. +Any patch which adds new parameters should add the appropriate entries to +this file. Any new configuration options must be accompanied by help text which clearly explains the options and when the user might want to select them. -- cgit v1.2.3