Fix `arch_timing_cycles_get()` to prevent overflow on 32bit cycles
rollover. Also make `arch_timing_counter_get()` to work 64bit when
`CONFIG_TIMER_HAS_64BIT_CYCLE_COUNTER` is set.
The issue was observable, for example when `tests/benchmarks/wait_queues`
or `tests/benchmarks/sched_queues` were executed on qemu for `mps2/an385`
and the benchmark has its iterations large enough as the default
BENCHMARK_NUM_ITERATIONS=1000.
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>