1
linux/drivers/cpuidle/governors/teo.c

552 lines
17 KiB
C
Raw Normal View History

cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
// SPDX-License-Identifier: GPL-2.0
/*
* Timer events oriented CPU idle governor
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Copyright (C) 2018 - 2021 Intel Corporation
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
* Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*/
/**
* DOC: teo-description
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
* The idea of this governor is based on the observation that on many systems
* timer events are two or more orders of magnitude more frequent than any
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* other interrupts, so they are likely to be the most significant cause of CPU
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
* wakeups from idle states. Moreover, information about what happened in the
* (relatively recent) past can be used to estimate whether or not the deepest
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* idle state with target residency within the (known) time till the closest
* timer event, referred to as the sleep length, is likely to be suitable for
* the upcoming CPU idle period and, if not, then which of the shallower idle
* states to choose instead of it.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Of course, non-timer wakeup sources are more important in some use cases
* which can be covered by taking a few most recent idle time intervals of the
* CPU into account. However, even in that context it is not necessary to
* consider idle duration values greater than the sleep length, because the
* closest timer will ultimately wake up the CPU anyway unless it is woken up
* earlier.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Thus this governor estimates whether or not the prospective idle duration of
* a CPU is likely to be significantly shorter than the sleep length and selects
* an idle state for it accordingly.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* The computations carried out by this governor are based on using bins whose
* boundaries are aligned with the target residency parameter values of the CPU
* idle states provided by the %CPUIdle driver in the ascending order. That is,
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* the first bin spans from 0 up to, but not including, the target residency of
* the second idle state (idle state 1), the second bin spans from the target
* residency of idle state 1 up to, but not including, the target residency of
* idle state 2, the third bin spans from the target residency of idle state 2
* up to, but not including, the target residency of idle state 3 and so on.
* The last bin spans from the target residency of the deepest idle state
* supplied by the driver to infinity.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Two metrics called "hits" and "intercepts" are associated with each bin.
* They are updated every time before selecting an idle state for the given CPU
* in accordance with what happened last time.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* The "hits" metric reflects the relative frequency of situations in which the
* sleep length and the idle duration measured after CPU wakeup fall into the
* same bin (that is, the CPU appears to wake up "on time" relative to the sleep
* length). In turn, the "intercepts" metric reflects the relative frequency of
* situations in which the measured idle duration is so much shorter than the
* sleep length that the bin it falls into corresponds to an idle state
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
* shallower than the one whose bin is fallen into by the sleep length (these
* situations are referred to as "intercepts" below).
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* In order to select an idle state for a CPU, the governor takes the following
* steps (modulo the possible latency constraint that must be taken into account
* too):
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* 1. Find the deepest CPU idle state whose target residency does not exceed
* the current sleep length (the candidate idle state) and compute 2 sums as
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
* follows:
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*
* - The sum of the "hits" and "intercepts" metrics for the candidate state
* and all of the deeper idle states (it represents the cases in which the
* CPU was idle long enough to avoid being intercepted if the sleep length
* had been equal to the current one).
*
* - The sum of the "intercepts" metrics for all of the idle states shallower
* than the candidate one (it represents the cases in which the CPU was not
* idle long enough to avoid being intercepted if the sleep length had been
* equal to the current one).
*
* 2. If the second sum is greater than the first one the CPU is likely to wake
* up early, so look for an alternative idle state to select.
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*
* - Traverse the idle states shallower than the candidate one in the
* descending order.
*
* - For each of them compute the sum of the "intercepts" metrics over all
* of the idle states between it and the candidate one (including the
* former and excluding the latter).
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
* - If each of these sums that needs to be taken into account (because the
* check related to it has indicated that the CPU is likely to wake up
* early) is greater than a half of the corresponding sum computed in step
* 1 (which means that the target residency of the state in question had
* not exceeded the idle duration in over a half of the relevant cases),
* select the given idle state instead of the candidate one.
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
* 3. By default, select the candidate state.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
#include <linux/cpuidle.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/sched/clock.h>
#include <linux/tick.h>
#include "gov.h"
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
* The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
* is used for decreasing metrics on a regular basis.
*/
#define PULSE 1024
#define DECAY_SHIFT 3
/**
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* struct teo_bin - Metrics used by the TEO cpuidle governor.
* @intercepts: The "intercepts" metric.
* @hits: The "hits" metric.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
struct teo_bin {
unsigned int intercepts;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
unsigned int hits;
};
/**
* struct teo_cpu - CPU data used by the TEO cpuidle governor.
* @time_span_ns: Time between idle state selection and post-wakeup update.
* @sleep_length_ns: Time till the closest timer event (at the selection time).
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* @state_bins: Idle state data bins for this CPU.
cpuidle: teo: Introduce util-awareness Modern interactive systems, such as recent Android phones, tend to have power efficient shallow idle states. Selecting deeper idle states on a device while a latency-sensitive workload is running can adversely impact performance due to increased latency. Additionally, if the CPU wakes up from a deeper sleep before its target residency as is often the case, it results in a waste of energy on top of that. At the moment, none of the available idle governors take any scheduling information into account. They also tend to overestimate the idle duration quite often, which causes them to select excessively deep idle states, thus leading to increased wakeup latency and lower performance with no power saving. For 'menu' while web browsing on Android for instance, those types of wakeups ('too deep') account for over 24% of all wakeups. At the same time, on some platforms idle state 0 can be power efficient enough to warrant wanting to prefer it over idle state 1. This is because the power usage of the two states can be so close that sufficient amounts of too deep state 1 sleeps can completely offset the state 1 power saving to the point where it would've been more power efficient to just use state 0 instead. This is, of course, for systems where state 0 is not a polling state, such as arm-based devices. Sleeps that happened in state 0 while they could have used state 1 ('too shallow') only save less power than they otherwise could have. Too deep sleeps, on the other hand, harm performance and nullify the potential power saving from using state 1 in the first place. While taking this into account, it is clear that on balance it is preferable for an idle governor to have more too shallow sleeps instead of more too deep sleeps on those kinds of platforms. This patch specifically tunes TEO to prefer shallower idle states in order to reduce wakeup latency and achieve better performance. To this end, before selecting the next idle state it uses the avg_util signal of a CPU's runqueue in order to determine to what extent the CPU is being utilized. This util value is then compared to a threshold defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5% in the current implementation). If the util is above the threshold, the index of the idle state selected by TEO metrics will be reduced by 1, thus selecting a shallower state. If the util is below the threshold, the governor defaults to the TEO metrics mechanism to try to select the deepest available idle state based on the closest timer event and its own correctness. The main goal of this is to reduce latency and increase performance for some workloads. Under some workloads it will result in an increase in power usage (Geekbench 5) while for other workloads it will also result in a decrease in power usage compared to TEO (PCMark Web, Jankbench, Speedometer). It can provide drastically decreased latency and performance benefits in certain types of workloads that are sensitive to latency. Example test results: 1. GB5 (better score, latency & more power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) | | gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) | | gmean too deep % | 14.99% | 9.65% | 4.02% | | gmean too shallow % | 2.5% | 5.96% | 14.59% | | gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) | 2. Jankbench (better score, latency & less power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) | | gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) | | gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) | | gmean too deep % | 26.00% | 11.00% | 2.54% | | gmean too shallow % | 4.74% | 11.89% | 21.93% | | gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) | | gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) | Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com> [ rjw: Comment edits and white space adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 07:51:59 -07:00
* @total: Grand total of the "intercepts" and "hits" metrics for all bins.
* @tick_hits: Number of "hits" after TICK_NSEC.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
struct teo_cpu {
s64 time_span_ns;
s64 sleep_length_ns;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
unsigned int tick_hits;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
/**
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* teo_update - Update CPU metrics after wakeup.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
* @drv: cpuidle driver containing state data.
* @dev: Target CPU.
*/
static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
int i, idx_timer = 0, idx_duration = 0;
s64 target_residency_ns;
u64 measured_ns;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
/*
* One of the safety nets has triggered or the wakeup was close
* enough to the closest timer event expected at the idle state
* selection time to be discarded.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
measured_ns = U64_MAX;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
} else {
u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
/*
* The computations below are to determine whether or not the
* (saved) time till the next timer event and the measured idle
* duration fall into the same "bin", so use last_residency_ns
* for that instead of time_span_ns which includes the cpuidle
* overhead.
*/
measured_ns = dev->last_residency_ns;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
* The delay between the wakeup and the first instruction
* executed by the CPU is not likely to be worst-case every
* time, so take 1/2 of the exit latency as a very rough
* approximation of the average of it.
*/
if (measured_ns >= lat_ns)
measured_ns -= lat_ns / 2;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
else
measured_ns /= 2;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
cpu_data->total = 0;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Decay the "hits" and "intercepts" metrics for all of the bins and
* find the bins that the sleep length and the measured idle duration
* fall into.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
for (i = 0; i < drv->state_count; i++) {
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
struct teo_bin *bin = &cpu_data->state_bins[i];
bin->hits -= bin->hits >> DECAY_SHIFT;
bin->intercepts -= bin->intercepts >> DECAY_SHIFT;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
cpu_data->total += bin->hits + bin->intercepts;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
target_residency_ns = drv->states[i].target_residency_ns;
if (target_residency_ns <= cpu_data->sleep_length_ns) {
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
idx_timer = i;
if (target_residency_ns <= measured_ns)
idx_duration = i;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
}
/*
* If the deepest state's target residency is below the tick length,
* make a record of it to help teo_select() decide whether or not
* to stop the tick. This effectively adds an extra hits-only bin
* beyond the last state-related one.
*/
if (target_residency_ns < TICK_NSEC) {
cpu_data->tick_hits -= cpu_data->tick_hits >> DECAY_SHIFT;
cpu_data->total += cpu_data->tick_hits;
if (TICK_NSEC <= cpu_data->sleep_length_ns) {
idx_timer = drv->state_count;
if (TICK_NSEC <= measured_ns) {
cpu_data->tick_hits += PULSE;
goto end;
}
}
}
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* If the measured idle duration falls into the same bin as the sleep
* length, this is a "hit", so update the "hits" metric for that bin.
* Otherwise, update the "intercepts" metric for the bin fallen into by
* the measured idle duration.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
if (idx_timer == idx_duration)
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
cpu_data->state_bins[idx_timer].hits += PULSE;
else
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
cpu_data->state_bins[idx_duration].intercepts += PULSE;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
end:
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
cpu_data->total += PULSE;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
static bool teo_state_ok(int i, struct cpuidle_driver *drv)
{
return !tick_nohz_tick_stopped() ||
drv->states[i].target_residency_ns >= TICK_NSEC;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
}
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/**
* teo_find_shallower_state - Find shallower idle state matching given duration.
* @drv: cpuidle driver containing state data.
* @dev: Target CPU.
* @state_idx: Index of the capping idle state.
* @duration_ns: Idle duration value to match.
* @no_poll: Don't consider polling states.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
static int teo_find_shallower_state(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int state_idx,
s64 duration_ns, bool no_poll)
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
{
int i;
for (i = state_idx - 1; i >= 0; i--) {
if (dev->states_usage[i].disable ||
(no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING))
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
continue;
state_idx = i;
if (drv->states[i].target_residency_ns <= duration_ns)
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
break;
}
return state_idx;
}
/**
* teo_select - Selects the next idle state to enter.
* @drv: cpuidle driver containing state data.
* @dev: Target CPU.
* @stop_tick: Indication on whether or not to stop the scheduler tick.
*/
static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
bool *stop_tick)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
ktime_t delta_tick = TICK_NSEC / 2;
unsigned int tick_intercept_sum = 0;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
unsigned int idx_hit_sum = 0;
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
int prev_intercept_idx;
s64 duration_ns;
int i;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
if (dev->last_state_idx >= 0) {
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
teo_update(drv, dev);
dev->last_state_idx = -1;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
cpu_data->time_span_ns = local_clock();
/*
* Set the expected sleep length to infinity in case of an early
* return.
*/
cpu_data->sleep_length_ns = KTIME_MAX;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
/* Check if there is any choice in the first place. */
if (drv->state_count < 2) {
idx = 0;
goto out_tick;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
}
if (!dev->states_usage[0].disable)
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
idx = 0;
cpuidle: teo: Fix "early hits" handling for disabled idle states The TEO governor uses idle duration "bins" defined in accordance with the CPU idle states table provided by the driver, so that each "bin" covers the idle duration range between the target residency of the idle state corresponding to it and the target residency of the closest deeper idle state. The governor collects statistics for each bin regardless of whether or not the idle state corresponding to it is currently enabled. In particular, the "early hits" metric measures the likelihood of a situation in which the idle duration measured after wakeup falls into to given bin, but the time till the next timer (sleep length) falls into a bin corresponding to one of the deeper idle states. It is used when the "hits" and "misses" metrics indicate that the state "matching" the sleep length should not be selected, so that the state with the maximum "early hits" value is selected instead of it. If the idle state corresponding to the given bin is disabled, it cannot be selected and if it turns out to be the one that should be selected, a shallower idle state needs to be used instead of it. Nevertheless, the metrics collected for the bin corresponding to it are still valid and need to be taken into account as though that state had not been disabled. As far as the "early hits" metric is concerned, teo_select() tries to take disabled states into account, but the state index corresponding to the maximum "early hits" value computed by it may be incorrect. Namely, it always uses the index of the previous maximum "early hits" state then, but there may be enabled idle states closer to the disabled one in question. In particular, if the current candidate state (whose index is the idx value) is closer to the disabled one and the "early hits" value of the disabled state is greater than the current maximum, the index of the current candidate state (idx) should replace the "maximum early hits state" index. Modify the code to handle that case correctly. Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-10 14:37:39 -07:00
/* Compute the sums of metrics for early wakeup pattern detection. */
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
for (i = 1; i < drv->state_count; i++) {
struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
struct cpuidle_state *s = &drv->states[i];
cpuidle: teo: Fix "early hits" handling for disabled idle states The TEO governor uses idle duration "bins" defined in accordance with the CPU idle states table provided by the driver, so that each "bin" covers the idle duration range between the target residency of the idle state corresponding to it and the target residency of the closest deeper idle state. The governor collects statistics for each bin regardless of whether or not the idle state corresponding to it is currently enabled. In particular, the "early hits" metric measures the likelihood of a situation in which the idle duration measured after wakeup falls into to given bin, but the time till the next timer (sleep length) falls into a bin corresponding to one of the deeper idle states. It is used when the "hits" and "misses" metrics indicate that the state "matching" the sleep length should not be selected, so that the state with the maximum "early hits" value is selected instead of it. If the idle state corresponding to the given bin is disabled, it cannot be selected and if it turns out to be the one that should be selected, a shallower idle state needs to be used instead of it. Nevertheless, the metrics collected for the bin corresponding to it are still valid and need to be taken into account as though that state had not been disabled. As far as the "early hits" metric is concerned, teo_select() tries to take disabled states into account, but the state index corresponding to the maximum "early hits" value computed by it may be incorrect. Namely, it always uses the index of the previous maximum "early hits" state then, but there may be enabled idle states closer to the disabled one in question. In particular, if the current candidate state (whose index is the idx value) is closer to the disabled one and the "early hits" value of the disabled state is greater than the current maximum, the index of the current candidate state (idx) should replace the "maximum early hits state" index. Modify the code to handle that case correctly. Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-10 14:37:39 -07:00
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
/*
* Update the sums of idle state mertics for all of the states
* shallower than the current one.
*/
intercept_sum += prev_bin->intercepts;
hit_sum += prev_bin->hits;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
if (dev->states_usage[i].disable)
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
continue;
if (idx < 0)
idx0 = i; /* first enabled state */
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
idx = i;
if (s->exit_latency_ns <= latency_req)
constraint_idx = i;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/* Save the sums for the current state. */
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
idx_intercept_sum = intercept_sum;
idx_hit_sum = hit_sum;
}
/* Avoid unnecessary overhead. */
if (idx < 0) {
idx = 0; /* No states enabled, must use 0. */
goto out_tick;
}
if (idx == idx0) {
/*
* Only one idle state is enabled, so use it, but do not
* allow the tick to be stopped it is shallow enough.
*/
duration_ns = drv->states[idx].target_residency_ns;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
goto end;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
tick_intercept_sum = intercept_sum +
cpu_data->state_bins[drv->state_count-1].intercepts;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one (idx) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
* all of the deeper states a shallower idle state is likely to be a
* better choice.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
prev_intercept_idx = idx;
if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
int first_suitable_idx = idx;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
/*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* Look for the deepest idle state whose target residency had
* not exceeded the idle duration in over a half of the relevant
* cases in the past.
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*
* Take the possible duration limitation present if the tick
* has been stopped already into account.
*/
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
intercept_sum = 0;
for (i = idx - 1; i >= 0; i--) {
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
struct teo_bin *bin = &cpu_data->state_bins[i];
cpuidle: teo: Rework most recent idle duration values treatment The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:17:18 -07:00
intercept_sum += bin->intercepts;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
if (2 * intercept_sum > idx_intercept_sum) {
/*
* Use the current state unless it is too
* shallow or disabled, in which case take the
* first enabled state that is deep enough.
*/
if (teo_state_ok(i, drv) &&
!dev->states_usage[i].disable)
idx = i;
else
idx = first_suitable_idx;
break;
}
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
if (dev->states_usage[i].disable)
continue;
if (!teo_state_ok(i, drv)) {
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
/*
* The current state is too shallow, but if an
* alternative candidate state has been found,
* it may still turn out to be a better choice.
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
*/
if (first_suitable_idx != idx)
continue;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
break;
}
first_suitable_idx = i;
}
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
}
if (!idx && prev_intercept_idx) {
/*
* We have to query the sleep length here otherwise we don't
* know after wakeup if our guess was correct.
*/
duration_ns = tick_nohz_get_sleep_length(&delta_tick);
cpu_data->sleep_length_ns = duration_ns;
goto out_tick;
}
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
* If there is a latency constraint, it may be necessary to select an
* idle state shallower than the current candidate one.
*/
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
if (idx > constraint_idx)
idx = constraint_idx;
/*
* Skip the timers check if state 0 is the current candidate one,
* because an immediate non-timer wakeup is expected in that case.
*/
if (!idx)
goto out_tick;
/*
* If state 0 is a polling one, check if the target residency of
* the current candidate state is low enough and skip the timers
* check in that case too.
*/
if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
goto out_tick;
duration_ns = tick_nohz_get_sleep_length(&delta_tick);
cpu_data->sleep_length_ns = duration_ns;
/*
* If the closest expected timer is before the target residency of the
* candidate state, a shallower one needs to be found.
*/
if (drv->states[idx].target_residency_ns > duration_ns) {
i = teo_find_shallower_state(drv, dev, idx, duration_ns, false);
if (teo_state_ok(i, drv))
idx = i;
}
cpuidle: teo: Introduce util-awareness Modern interactive systems, such as recent Android phones, tend to have power efficient shallow idle states. Selecting deeper idle states on a device while a latency-sensitive workload is running can adversely impact performance due to increased latency. Additionally, if the CPU wakes up from a deeper sleep before its target residency as is often the case, it results in a waste of energy on top of that. At the moment, none of the available idle governors take any scheduling information into account. They also tend to overestimate the idle duration quite often, which causes them to select excessively deep idle states, thus leading to increased wakeup latency and lower performance with no power saving. For 'menu' while web browsing on Android for instance, those types of wakeups ('too deep') account for over 24% of all wakeups. At the same time, on some platforms idle state 0 can be power efficient enough to warrant wanting to prefer it over idle state 1. This is because the power usage of the two states can be so close that sufficient amounts of too deep state 1 sleeps can completely offset the state 1 power saving to the point where it would've been more power efficient to just use state 0 instead. This is, of course, for systems where state 0 is not a polling state, such as arm-based devices. Sleeps that happened in state 0 while they could have used state 1 ('too shallow') only save less power than they otherwise could have. Too deep sleeps, on the other hand, harm performance and nullify the potential power saving from using state 1 in the first place. While taking this into account, it is clear that on balance it is preferable for an idle governor to have more too shallow sleeps instead of more too deep sleeps on those kinds of platforms. This patch specifically tunes TEO to prefer shallower idle states in order to reduce wakeup latency and achieve better performance. To this end, before selecting the next idle state it uses the avg_util signal of a CPU's runqueue in order to determine to what extent the CPU is being utilized. This util value is then compared to a threshold defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5% in the current implementation). If the util is above the threshold, the index of the idle state selected by TEO metrics will be reduced by 1, thus selecting a shallower state. If the util is below the threshold, the governor defaults to the TEO metrics mechanism to try to select the deepest available idle state based on the closest timer event and its own correctness. The main goal of this is to reduce latency and increase performance for some workloads. Under some workloads it will result in an increase in power usage (Geekbench 5) while for other workloads it will also result in a decrease in power usage compared to TEO (PCMark Web, Jankbench, Speedometer). It can provide drastically decreased latency and performance benefits in certain types of workloads that are sensitive to latency. Example test results: 1. GB5 (better score, latency & more power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) | | gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) | | gmean too deep % | 14.99% | 9.65% | 4.02% | | gmean too shallow % | 2.5% | 5.96% | 14.59% | | gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) | 2. Jankbench (better score, latency & less power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) | | gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) | | gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) | | gmean too deep % | 26.00% | 11.00% | 2.54% | | gmean too shallow % | 4.74% | 11.89% | 21.93% | | gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) | | gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) | Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com> [ rjw: Comment edits and white space adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 07:51:59 -07:00
/*
* If the selected state's target residency is below the tick length
* and intercepts occurring before the tick length are the majority of
* total wakeup events, do not stop the tick.
*/
if (drv->states[idx].target_residency_ns < TICK_NSEC &&
tick_intercept_sum > cpu_data->total / 2 + cpu_data->total / 8)
duration_ns = TICK_NSEC / 2;
cpuidle: teo: Change the main idle state selection logic Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 11:16:32 -07:00
end:
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
* Allow the tick to be stopped unless the selected state is a polling
* one or the expected idle duration is shorter than the tick period
* length.
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
*/
if ((!(drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
duration_ns >= TICK_NSEC) || tick_nohz_tick_stopped())
return idx;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
* The tick is not going to be stopped, so if the target residency of
* the state to be returned is not within the time till the closest
* timer including the tick, try to correct that.
*/
if (idx > idx0 &&
drv->states[idx].target_residency_ns > delta_tick)
idx = teo_find_shallower_state(drv, dev, idx, delta_tick, false);
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
out_tick:
*stop_tick = false;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
return idx;
}
/**
* teo_reflect - Note that governor data for the CPU need to be updated.
* @dev: Target CPU.
* @state: Entered state.
*/
static void teo_reflect(struct cpuidle_device *dev, int state)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
dev->last_state_idx = state;
cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 04:30:47 -07:00
/*
* If the wakeup was not "natural", but triggered by one of the safety
* nets, assume that the CPU might have been idle for the entire sleep
* length time.
*/
if (dev->poll_time_limit ||
(tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
dev->poll_time_limit = false;
cpu_data->time_span_ns = cpu_data->sleep_length_ns;
} else {
cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
}
}
/**
* teo_enable_device - Initialize the governor's data for the target CPU.
* @drv: cpuidle driver (not used).
* @dev: Target CPU.
*/
static int teo_enable_device(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
memset(cpu_data, 0, sizeof(*cpu_data));
return 0;
}
static struct cpuidle_governor teo_governor = {
.name = "teo",
.rating = 19,
.enable = teo_enable_device,
.select = teo_select,
.reflect = teo_reflect,
};
static int __init teo_governor_init(void)
{
return cpuidle_register_governor(&teo_governor);
}
postcore_initcall(teo_governor_init);