Commit graph

511 commits

Author SHA1 Message Date
Hakjoo Kim
af0d18c9a4 Merge branch 'linux-3.0.y' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroid-3.0.y
Conflicts:
	fs/block_dev.c
2013-08-13 00:13:19 +09:00
Thomas Gleixner
9ffff08f8a tick: Prevent uncontrolled switch to oneshot mode
commit 1f73a9806b upstream.

When the system switches from periodic to oneshot mode, the broadcast
logic causes a possibility that a CPU which has not yet switched to
oneshot mode puts its own clock event device into oneshot mode without
updating the state and the timer handler.

CPU0				CPU1
				per cpu tickdev is in periodic mode
				and switched to broadcast

Switch to oneshot mode
 tick_broadcast_switch_to_oneshot()
  cpumask_copy(tick_oneshot_broacast_mask,
	       tick_broadcast_mask);

  broadcast device mode = oneshot

				Timer interrupt

				irq_enter()
				 tick_check_oneshot_broadcast()
				  dev->set_mode(ONESHOT);

				tick_handle_periodic()
				 if (dev->mode == ONESHOT)
				   dev->next_event += period;
				   FAIL.

We fail, because dev->next_event contains KTIME_MAX, if the device was
in periodic mode before the uncontrolled switch to oneshot happened.

We must copy the broadcast bits over to the oneshot mask, because
otherwise a CPU which relies on the broadcast would not been woken up
anymore after the broadcast device switched to oneshot mode.

So we need to verify in tick_check_oneshot_broadcast() whether the CPU
has already switched to oneshot mode. If not, leave the device
untouched and let the CPU switch controlled into oneshot mode.

This is a long standing bug, which was never noticed, because the main
user of the broadcast x86 cannot run into that scenario, AFAICT. The
nonarchitected timer mess of ARM creates a gazillion of differently
broken abominations which trigger the shortcomings of that broadcast
code, which better had never been necessary in the first place.

Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:18:24 -07:00
Hakjoo Kim
b3bcd0f6f8 Merge branch 'linux-3.0.y' of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroid-3.0.y 2013-05-24 10:14:50 +09:00
Thomas Gleixner
b9cbfd2730 tick: Cleanup NOHZ per cpu data on cpu down
commit 4b0c0f294f upstream.

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-19 10:04:40 -07:00
Thomas Gleixner
a35089a9cc clockevents: Set dummy handler on CPU_DEAD shutdown
commit 6f7a05d701 upstream.

Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

There is no real good way to avoid that in the HPET code. The HPET
code alrady has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.

We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().

The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e76897 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.

Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: 700333@bugs.debian.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-07 19:57:26 -07:00
Hakjoo Kim
b0ceff27ee Merge branch 'linux-3.0.y' of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into odroid-3.0.y 2013-04-04 19:34:06 +09:00
Mark Rutland
611f2aaf1f clockevents: Don't allow dummy broadcast timers
commit a7dc19b865 upstream.

Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.

This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28 12:06:03 -07:00
Michal Hocko
73b883a6ba nohz: Make idle/iowait counter update conditional
get_cpu_{idle,iowait}_time_us update idle/iowait counters
unconditionally if the given CPU is in the idle loop.

This doesn't work well outside of CPU governors which are singletons
so nobody (except for IRQ) can race with them.

We will need to use both functions from /proc/stat handler to properly
handle nohz idle/iowait times.

Make the update depend on a non NULL last_update_time argument.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/11f23179472635ce52e78921d47a20216b872f23.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-01-17 18:30:54 +09:00
Michal Hocko
c06ffc37f0 nohz: Fix update_ts_time_stat idle accounting
update_ts_time_stat currently updates idle time even if we are in
iowait loop at the moment. The only real users of the idle counter
(via get_cpu_idle_time_us) are CPU governors and they expect to get
cumulative time for both idle and iowait times.
The value (idle_sleeptime) is also printed to userspace by print_cpu
but it prints both idle and iowait times so the idle part is misleading.

Let's clean this up and fix update_ts_time_stat to account both counters
properly and update consumers of idle to consider iowait time as well.
If we do this we might use get_cpu_{idle,iowait}_time_us from other
contexts as well and we will get expected values.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/e9c909c221a8da402c4da07e4cd968c3218f8eb1.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-01-17 18:30:51 +09:00
Heiko Carstens
8f02fa5741 nohz: Remove "Switched to NOHz mode" debugging messages
When performing cpu hotplug tests the kernel printk log buffer gets flooded
with pointless "Switched to NOHz mode..." messages. Especially when afterwards
analyzing a dump this might have removed more interesting stuff out of the
buffer.
Assuming that switching to NOHz mode simply works just remove the printk.

Change-Id: I1746f8c0119a512055716c3fd77a966b735ca49b
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
2013-01-17 18:30:47 +09:00
Shi, Alex
cc28f6c7b1 nohz: Remove nohz_cpu_mask
RCU no longer uses this global variable, nor does anyone else.  This
commit therefore removes this variable.  This reduces memory footprint
and also removes some atomic instructions and memory barriers from
the dyntick-idle path.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-01-17 18:30:42 +09:00
Frederic Weisbecker
95d6eff616 nohz: Remove update_ts_time_stat from tick_nohz_start_idle
From: Michal Hocko <mhocko@suse.cz>

There is no reason to call update_ts_time_stat from tick_nohz_start_idle
anymore (after e0e37c20 sched: Eliminate the ts->idle_lastupdate field)
when we updated idle_lastupdate unconditionally.

We haven't set idle_active yet and do not provide last_update_time so
the whole call end up being just 2 wasted branches.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-01-17 18:30:39 +09:00
Frederic Weisbecker
d3e7367488 nohz: Remove ts->inidle checks before restarting the tick
ts->inidle is set by tick_nohz_stop_sched_tick(1) and unset
by tick_nohz_restart_sched_tick(). These two calls are assumed
to be always paired. This means that by the time we call
tick_nohz_restart_sched_tick(), ts->inidle is supposed to
be always set to 1.

Remove the checks for ts->inidle in tick_nohz_restart_sched_tick().
This simplifies a bit the code and improves its debuggability
(ie: ensure the call is paired with a tick_nohz_stop_sched_tick() call).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
2013-01-17 18:30:34 +09:00
Charles Wang
cc03796685 sched: Add missing call to calc_load_exit_idle()
Azat Khuzhin reported high loadavg in Linux v3.6

After checking the upstream scheduler code, I found Peter's commit:

  5167e8d541 sched/nohz: Rewrite and fix load-avg computation -- again

not fully applied, missing the call to calc_load_exit_idle().

After that idle exit in sampling window will always be calculated
to non-idle, and the load will be higher than normal.

This patch adds the missing call to calc_load_exit_idle().

Signed-off-by: Charles Wang <muming.wq@taobao.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1345449754-27130-1-git-send-email-muming.wq@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Conflicts:

	kernel/time/tick-sched.c
2013-01-17 18:29:42 +09:00
Andrei F
c809f4d8d4 sched/nohz: Rewrite, fix and add more accurate load-avg computation
Part 1)

After patch 453494c3d4 (sched: Fix nohz load accounting -- again!), we can fold
the idle into calc_load_tasks_idle between the last cpu load calculating and
calc_global_load calling. However problem still exits between the first cpu
load calculating and the last cpu load calculating. Every time when we do load
calculating, calc_load_tasks_idle will be added into calc_load_tasks, even if
the idle load is caused by calculated cpus. Consider following case:
5HZ+1
| cpu0_load     cpu1    cpu2    cpu3    calc_load_tasks    tasks_idle
|    1           1       1       1
|  -->calc_load                             1                  0
|    1           1       1       1
|                    -->calc_load           2                  0
|    0           0       1       0
|                    -->calc_load           2+1-3=0            -3
|    1           1       0       1
|                            -->calc_load   1-1=0              -1
V
5HZ+11     -->calc_global_load              0                  0

actually the load should be around 3, but shows nearly 0.

This can be found in our work load. The average running processes number
is about 15, but the load only shows about 4.

We provides a solution, by taking those load not calculated cpus' idle out from
global idle as calc_unmask_cpu_load_idle. Then when calc_load execute on every
cpu, we only fold calc_unmask_cpu_load_idle. After this patch, case above
should be as follow:
5HZ+1
| cpu0_load     cpu1    cpu2    cpu3   calc_load_tasks tasks_idle  unmask_idle
|    1           1       1       1
|  -->calc_load                             1              0           0
|    1           1       1       1
|              -->calc_load                 2              0           0
|    0           0       1       0
|                    -->calc_load           2+1-1=2        -3          -1
|    1           1       0       1
|                            -->calc_load   2+1=3          -2-1=-3     0
V
5HZ+11     -->calc_global_load              3              -3          0

CC: Peter Zijlstra <peterz@infradead.org>
CC: Doug Smythies <dsmythies@telus.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Tao Ma <tm@tao.ma>
CC: Sha Zhengju <handai.szj@gmail.com>
Reported-by: Sha Zhengju <handai.szj@gmail.com>
Signed-off-by: Charles Wang <muming.wq@taobao.com>

Part 2)

Thanks to Charles Wang for spotting the defects in the current code:

 - If we go idle during the sample window -- after sampling, we get a
   negative bias because we can negate our own sample.

 - If we wake up during the sample window we get a positive bias
   because we push the sample to a known active period.

So rewrite the entire nohz load-avg muck once again, now adding
copious documentation to the code.

Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
Reported-and-tested-by: Charles Wang <muming.wq@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-17 18:29:37 +09:00
Hakjoo Kim
c3242d25d0 Merge 3.0.51 from Hardkernel svn repo 2012-12-31 14:20:49 +09:00
John Stultz
fe979e2c0a time: Move ktime_t overflow checking into timespec_valid_strict
commit cee58483cf upstream

Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526c ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:52 -07:00
John Stultz
4ffa9a8069 time: Avoid making adjustments if we haven't accumulated anything
commit bf2ac31219 upstream

If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:51 -07:00
John Stultz
44fa9a0111 time: Improve sanity checking of timekeeping inputs
commit 4e8b14526c upstream

Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger then the
year 2262).

Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.

So this patch improves the sanity check timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.

Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.

Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02 09:47:44 -07:00
John Stultz
dccecc646f ntp: Fix STA_INS/DEL clearing bug
commit 6b1859dba0 upstream.

In commit 6b43ae8a61, I
introduced a bug that kept the STA_INS or STA_DEL bit
from being cleared from time_status via adjtimex()
without forcing STA_PLL first.

Usually once the STA_INS is set, it isn't cleared
until the leap second is applied, so its unlikely this
affected anyone. However during testing I noticed it
took some effort to cancel a leap second once STA_INS
was set.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-01 12:26:53 -07:00
Thomas Gleixner
0851978b66 timekeeping: Add missing update call in timekeeping_resume()
This is a backport of 3e997130bd

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:46 -07:00
Thomas Gleixner
22f4bbcfb1 timekeeping: Provide hrtimer update function
This is a backport of f6c06abfb3

To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:46 -07:00
Thomas Gleixner
03a90b9a6f timekeeping: Maintain ktime_t based offsets for hrtimers
This is a backport of 5b9fe759a6

We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:46 -07:00
John Stultz
d21e4baf45 timekeeping: Fix leapsecond triggered load spike issue
This is a backport of 4873fa070a

The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.

The reported immediate workaround - $ data -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:43 -07:00
Thomas Gleixner
c7e2580578 time: Move common updates to a function
This is a backport of cc06268c6a

While not a bugfix itself, it allows following fixes to backport
in a more straightforward manner.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:32 -07:00
John Stultz
c33f2424c3 timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
This is a backport of fad0c66c4b
which resolves a bug the previous commit.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:27 -07:00
Richard Cochran
96bab736ba ntp: Correct TAI offset during leap second
This is a backport of dd48d708ff

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:22 -07:00
John Stultz
9c24771f84 ntp: Fix leap-second hrtimer livelock
This is a backport of 6b43ae8a61

This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
 write_seqlock_irq(&xtime_lock);
  process_adjtimex_modes()
   process_adj_status()
    ntp_start_leap_timer()
     hrtimer_start()
      hrtimer_reprogram()
       tick_program_event()
        clockevents_program_event()
         ktime_get()
          seq = req_seqbegin(xtime_lock); [DEADLOCK]

This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                                    CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();				 timer_interrupt()
      process_adj_status();                                do_timer()
        ntp_start_leap_timer();                             write_lock(&xtime_lock);
          hrtimer_start();                                  update_wall_time();
             hrtimer_reprogram();                            ntp_tick_length()
               tick_program_event()                            spin_lock(&ntp_lock);
                 clockevents_program_event()
		   ktime_get()
                     seq = req_seqbegin(xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnoised-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-19 08:58:22 -07:00
Neal Cardwell
2ca8877b56 nohz: Fix stale jiffies update in tick_nohz_restart()
commit 6f103929f8 upstream.

Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
calling tick_do_update_jiffies64(now).

If we reach this point in the loop it means that we crossed a tick
boundary since we grabbed the "now" timestamp, so at this point "now"
refers to a time in the old jiffy, so using the old value for "now" is
incorrect, and is likely to give us a stale jiffies value.

In particular, the first time through the loop the
tick_do_update_jiffies64(now) call is always a no-op, since the
caller, tick_nohz_restart_sched_tick(), will have already called
tick_do_update_jiffies64(now) with that "now" value.

Note that tick_nohz_stop_sched_tick() already uses the correct
approach: when we notice we cross a jiffy boundary, grab a new
timestamp with ktime_get(), and *then* update jiffies.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22 16:21:25 -07:00
Sasha Levin
d16d493a1c ntp: Fix integer overflow when setting time
commit a078c6d0e6 upstream.

'long secs' is passed as divisor to div_s64, which accepts a 32bit
divisor. On 64bit machines that value is trimmed back from 8 bytes
back to 4, causing a divide by zero when the number is bigger than
(1 << 32) - 1 and all 32 lower bits are 0.

Use div64_long() instead.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02 09:27:09 -07:00
Linus Torvalds
ef7386b5de Revert "clockevents: Set noop handler in clockevents_exchange_device()"
commit 3b87487ac5 upstream.

This reverts commit de28f25e82.

It results in resume problems for various people. See for example

  http://thread.gmane.org/gmane.linux.kernel/1233033
  http://thread.gmane.org/gmane.linux.kernel/1233389
  http://thread.gmane.org/gmane.linux.kernel/1233159
  http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

  https://bugzilla.redhat.com/show_bug.cgi?id=767248
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.

Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-03 10:42:00 -08:00
Thomas Gleixner
612e5dbc55 alarmtimers: Fix time comparison
commit c9c024b3f3 upstream.

The expiry function compares the timer against current time and does
not expire the timer when the expiry time is >= now. That's wrong. If
the timer is set for now, then it must expire.

Make the condition expiry > now for breaking out the loop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 12:57:34 -08:00
Thomas Gleixner
b01b383bbd clockevents: Set noop handler in clockevents_exchange_device()
commit de28f25e82 upstream.

If a device is shutdown, then there might be a pending interrupt,
which will be processed after we reenable interrupts, which causes the
original handler to be run. If the old handler is the (broadcast)
periodic handler the shutdown state might hang the kernel completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:53 -08:00
Yang Honggang (Joseph)
4078977c46 clocksource: Fix bug with max_deferment margin calculation
commit b1f919664d upstream.

In order to leave a margin of 12.5% we should >> 3 not >> 5.

Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com>
[jstultz: Modified commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:52 -08:00
Thomas Gleixner
d80dee5453 tick-broadcast: Stop active broadcast device when replacing it
commit c1be84309c upstream.

When a better rated broadcast device is installed, then the current
active device is not disabled, which results in two running broadcast
devices.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:48 -08:00
Hector Palacios
e1ef77bdad timekeeping: add arch_offset hook to ktime_get functions
commit d004e02405 upstream.

ktime_get and ktime_get_ts were calling timekeeping_get_ns()
but later they were not calling arch_gettimeoffset() so architectures
using this mechanism returned 0 ns when calling these functions.

This happened for example when running Busybox's ping which calls
syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually
calls ktime_get. As a result the returned ping travel time was zero.

Signed-off-by: Hector Palacios <hector.palacios@digi.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:28 -08:00
John Stultz
c3a44b4d23 alarmtimers: Avoid possible denial of service with high freq periodic timers
commit 6af7e471e5 upstream.

Its possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectivly lock up a box.

A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:07 -07:00
John Stultz
0898dd1603 alarmtimers: Memset itimerspec passed into alarm_timer_get
commit ea7802f630 upstream.

Following common_timer_get, zero out the itimerspec passed in.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:07 -07:00
John Stultz
26cf1a7ba1 alarmtimers: Avoid possible null pointer traversal
commit 971c90bfa2 upstream.

We don't check if old_setting is non null before assigning it, so
correct this.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:06 -07:00
John Stultz
1c6b39ad3f alarmtimers: Return -ENOTSUPP if no RTC device is present
Toralf Förster and Richard Weinberger noted that if there is
no RTC device, the alarm timers core prints out an annoying
"ALARM timers will not wake from suspend" message.

This warning has been removed in a previous patch, however
the issue still remains:  The original idea was to support
alarm timers even if there was no rtc device, as long as the
system didn't go into suspend.

However, after further consideration, communicating to the application
that alarmtimers are not fully functional seems like the better
solution.

So this patch makes it so we return -ENOTSUPP to any posix _ALARM
clockid calls if there is no backing RTC device on the system.

Further this changes the behavior where when there is no rtc device
we will check for one on clock_getres, clock_gettime, timer_create,
and timer_nsleep instead of on suspend.

CC: Toralf Förster <toralf.foerster@gmx.de>
CC: Richard Weinberger <richard@nod.at
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Reported by: Richard Weinberger <richard@nod.at>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-06-21 16:32:28 -07:00
John Stultz
c008ba58af alarmtimers: Handle late rtc module loading
The alarmtimers code currently picks a rtc device to use at
late init time. However, if your rtc driver is loaded as a module,
it may be registered after the alarmtimers late init code, leaving
the alarmtimers nonfunctional.

This patch moves the the rtcdevice selection to when we actually try
to use it, allowing us to make use of rtc modules that may have been
loaded at any point since bootup.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Meelis Roos <mroos@ut.ee>
Reported-by: Meelis Roos <mroos@ut.ee>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-06-21 15:38:33 -07:00
Thomas Gleixner
b5199515c2 clocksource: Make watchdog robust vs. interruption
The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.

The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.

Reported-and-tested-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2011-06-16 19:30:53 +02:00
Thomas Gleixner
1b054b67d3 clockevents: Handle empty cpumask gracefully
For UP it's stupid to request an initialized cpumask for the clock
event devices. Though we need the mask set even on UP to avoid a
horrible ifdeffery especially in the broadcast code.

For SMP we can at least try to survive with a warning and set the
cpumask of the cpu we're running on. That gives a decent chance to
bring the machine up and retrieve the debug info.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Stephen Boyd <sboyd@codeaurora.org>
2011-06-03 11:13:33 +02:00
Thomas Gleixner
ab8177bc53 hrtimers: Avoid touching inactive timer bases
Instead of iterating over all possible timer bases avoid it by marking
the active bases in the cpu base.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-23 13:59:54 +02:00
Thomas Gleixner
250f972d85 Merge branch 'timers/urgent' into timers/core
Reason: Get upstream fixes and kfree_rcu which is necessary for a
follow up patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-20 20:08:05 +02:00
Thomas Gleixner
c0e299b1a9 clockevents/source: Use u64 to make 32bit happy
unsigned long is not 64bit on 32bit machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-20 10:50:52 +02:00
Linus Torvalds
78c4def67e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Make lookup table const
  RTC: Disable CONFIG_RTC_CLASS from being built as a module
  timers: Fix alarmtimer build issues when CONFIG_RTC_CLASS=n
  timers: Remove delayed irqwork from alarmtimers implementation
  timers: Improve alarmtimer comments and minor fixes
  timers: Posix interface for alarm-timers
  timers: Introduce in-kernel alarm-timer interface
  timers: Add rb_init_node() to allow for stack allocated rb nodes
  time: Add timekeeping_inject_sleeptime
2011-05-19 17:45:08 -07:00
Linus Torvalds
7e6628e4bc Merge branch 'timers-clockevents-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-clockevents-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Cleanup the clockevents init and register code
  x86: Convert PIT to clockevents_config_and_register()
  clockevents: Provide interface to reconfigure an active clock event device
  clockevents: Provide combined configure and register function
  clockevents: Restructure clock_event_device members
  clocksource: Get rid of the hardcoded 5 seconds sleep time limit
  clocksource: Restructure clocksource struct members
2011-05-19 17:44:40 -07:00
Thomas Gleixner
80b816b736 clockevents: Provide interface to reconfigure an active clock event device
Some ARM SoCs have clock event devices which have their frequency
modified due to frequency scaling. Provide an interface which allows
to reconfigure an active device. After reconfiguration reprogram the
current pending event.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.437459958%40linutronix.de%3E
2011-05-19 14:24:16 +02:00
Thomas Gleixner
57f0fcbe1d clockevents: Provide combined configure and register function
All clockevent devices have the same open coded initialization
functions. Provide an interface which does all necessary
initialization in the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.331975870%40linutronix.de%3E
2011-05-19 14:24:15 +02:00