It now computes slightly more bytes to account for ATB size using
truncated division. In other words, any remaining block space that
doesn't fill an ATB byte won't be used. So, we round up our next
area size to use an exact number of ATB bytes.
Fixes#10451
By selectively collecting an allocation, we can skip scanning many
allocations for pointers because we know up front they won't have
them. This helps a ton when large buffers are being used and memory is
slow (PSRAM). In one Fruit Jam example GC times drop from 80+ms to
~25ms. The example uses a number of bitmaps that are now no longer
scanned.
- Renamed gc_sweep to gc_sweep_free_blocks.
- Call gc_sweep_run_finalisers from top level.
- Reordered the gc static functions to be in approximate
runtime sequence (with forward declarations) rather than
in declaration order.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
Do this by tracking being inside gc collection with a
separate flag, GC_COLLECT_FLAG. In gc_free(),
ignore this flag when determining if the heap is locked.
* For finalisers calling gc_free() when heap is otherwise unlocked,
this allows memory to be immediately freed (potentially
avoiding a MemoryError).
* Hard IRQs still can't call gc_free(), as heap will be locked via
gc_lock().
* If finalisers are disabled then all of this code can be compiled
out to save some code size.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
Currently a finalizer may run and access memory which has already been
freed. (This happens mostly during gc_sweep_all() but could happen during
any garbage collection pass.)
Includes some speed improvement tweaks to skip empty FTB blocks. These help
compensate for the inherent slowdown of having to walk the heap twice.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
These are old, unused, and most of them no longer compile. The `gc_test()`
function is superseded by the test suite.
Signed-off-by: Damien George <damien@micropython.org>
The STATIC macro was introduced a very long time ago in commit
d5df6cd44a. The original reason for this was
to have the option to define it to nothing so that all static functions
become global functions and therefore visible to certain debug tools, so
one could do function size comparison and other things.
This STATIC feature is rarely (if ever) used. And with the use of LTO and
heavy inline optimisation, analysing the size of individual functions when
they are not static is not a good representation of the size of code when
fully optimised.
So the macro does not have much use and it's simpler to just remove it.
Then you know exactly what it's doing. For example, newcomers don't have
to learn what the STATIC macro is and why it exists. Reading the code is
also less "loud" with a lowercase static.
One other minor point in favour of removing it, is that it stops bugs with
`STATIC inline`, which should always be `static inline`.
Methodology for this commit was:
1) git ls-files | egrep '\.[ch]$' | \
xargs sed -Ei "s/(^| )STATIC($| )/\1static\2/"
2) Do some manual cleanup in the diff by searching for the word STATIC in
comments and changing those back.
3) "git-grep STATIC docs/", manually fixed those cases.
4) "rg -t python STATIC", manually fixed codegen lines that used STATIC.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
There are two main changes here to improve the calculation of the size of
the next heap area when automatically expanding the heap:
- Compute the existing total size by counting the total number of GC
blocks, and then using that to compute the corresponding number of bytes.
- Round the bytes value up to the nearest multiple of BYTES_PER_BLOCK.
This makes the calculation slightly simpler and more accurate, and makes
sure that, in the case of growing from one area to two areas, the number
of bytes allocated from the system for the second area is the same as the
first. For example on esp32 with an initial area size of 65536 bytes, the
subsequent allocation is also 65536 bytes. Previously it was a number that
was not even a multiple of 2.
Signed-off-by: Damien George <damien@micropython.org>
This simplifies allocating outside of the VM because the VM doesn't
take up all remaining memory by default.
On ESP we delegate to the IDF for allocations. For all other ports,
we use TLSF to manage an outer "port" heap. The IDF uses TLSF
internally and we use their fork for the other ports.
This also removes the dynamic C stack sizing. It wasn't often used
and is not possible with a fixed outer heap.
Fixes#8512. Fixes#7334.
When set, the split heap is automatically extended with new areas on
demand, and shrunk if a heap area becomes empty during a GC pass or soft
reset.
To save code size the size allocation for a new heap block (including
metadata) is estimated at 103% of the failed allocation, rather than
working from the more complex algorithm in gc_try_add_heap(). This appears
to work well except in the extreme limit case when almost all RAM is
exhausted (~last few hundred bytes). However in this case some allocation
is likely to fail soon anyhow.
Currently there is no API to manually add a block of a given size to the
heap, although that could easily be added if necessary.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
This commit:
- Breaks up some long lines for readability.
- Fixes a potential macro argument expansion issue.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
In applications that use little memory and run GC regularly, the cost of
the sweep phase quickly becomes prohibitives as the amount of RAM
increases.
On an ESP32-S3 with 2 MB of external SPIRAM, for example, a trivial GC
cycle takes a minimum of 40ms, virtually all of it in the sweep phase.
Similarly, on the UNIX port with 1 GB of heap, a trivial GC takes 47 ms,
again virtually all of it in the sweep phase.
This commit speeds up the sweep phase in the case most of the heap is empty
by keeping track of the ID of the highest block we allocated in an area
since the last GC.
The performance benchmark run on PYBV10 shows between +0 and +2%
improvement across the existing performance tests. These tests don't
really stress the GC, so they were also run with gc.threshold(30000) and
gc.threshold(10000). For the 30000 case, performance improved by up to
+10% with this commit. For the 10000 case, performance improved by at
least +10% on 6 tests, and up to +25%.
Signed-off-by: Damien George <damien@micropython.org>
Changes in this commit:
- Add MICROPY_GC_HOOK_LOOP to gc_info() and gc_alloc(). Both of these can
be long running (many milliseconds) which is too long to be blocking in
some applications.
- Pass loop variable to MICROPY_GC_HOOK_LOOP(i) macro so that implementers
can use it, e.g. to improve performance by only calling a function every
X number of iterations.
- Drop outer call to MICROPY_GC_HOOK_LOOP in gc_mark_subtree().
This adds a mechanism to track a pending notify/indicate operation that
is deferred due to the send buffer being full. This uses a tracked alloc
that is passed as the content arg to the callback.
This replaces the previous mechanism that did this via the global pending
op queue, shared with client read/write ops.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
* Enable dcache for OCRAM where the VM heap lives.
* Add CIRCUITPY_SWO_TRACE for pushing program counters out over the
SWO pin via the ITM module in the CPU. Exempt some functions from
instrumentation to reduce traffic and allow inlining.
* Place more functions in ITCM to handle errors using code in RAM-only
and speed up CP.
* Use SET and CLEAR registers for digitalio. The SDK does read, mask
and write.
* Switch to 2MiB reserved for CircuitPython code. Up from 1MiB.
* Run USB interrupts during flash erase and write.
* Allow storage writes from CP if the USB drive is disabled.
* Get perf bench tests running on CircuitPython and increase timeouts
so it works when instrumentation is active.
Showing 8 digits instead of 5, supporting devices with more than 1 MByte of
RAM (which is common these days). The masking was never needed, and the
related commented-out line can go.
The assertion that is added here (to gc.c) fails when running this new test
if ALLOC_TABLE_GAP_BYTE is set to 0.
Signed-off-by: Jeff Epler <jepler@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
Prior to this fix the follow crash occurred. With a GC layout of:
GC layout:
alloc table at 0x3fd80428, length 32001 bytes, 128004 blocks
finaliser table at 0x3fd88129, length 16001 bytes, 128008 blocks
pool at 0x3fd8bfc0, length 2048064 bytes, 128004 blocks
Block 128003 is an AT_HEAD and eventually is passed to gc_mark_subtree.
This causes gc_mark_subtree to call ATB_GET_KIND(128004). When block 1 is
created with a finaliser, the first byte of the finaliser table becomes
0x2, but ATB_GET_KIND(128004) reads these bits as AT_TAIL, and then
gc_mark_subtree references past the end of the heap, which happened to be
past the end of PSRAM on the esp32-s2.
The fix in this commit is to ensure there is a one-byte gap after the ATB
filled permanently with AT_FREE.
Fixes issue #7116.
See also https://github.com/adafruit/circuitpython/issues/5021
Signed-off-by: Jeff Epler <jepler@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
When you want to use the valgrind memory analysis tool on MicroPython, you
can arrange to define MICROPY_DEBUG_VALGRIND to enable use of special
valgrind macros. For now, this only fixes `gc_get_ptr` so that it never
emits the diagnostic "Conditional jump or move depends on uninitialised
value(s)".
Signed-off-by: Jeff Epler <jepler@gmail.com>
Use C macros to reduce the size of firmware images when the GC split-heap
feature is disabled.
The code size difference of this commit versus HEAD~2 (ie the commit prior
to MICROPY_GC_SPLIT_HEAP being introduced) when split-heap is disabled is:
bare-arm: +0 +0.000%
minimal x86: +0 +0.000%
unix x64: -16 -0.003%
unix nanbox: -20 -0.004%
stm32: -8 -0.002% PYBV10
cc3200: +0 +0.000%
esp8266: +8 +0.001% GENERIC
esp32: +0 +0.000% GENERIC
nrf: -20 -0.011% pca10040
rp2: +0 +0.000% PICO
samd: -4 -0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS
The code size difference of this commit versus HEAD~2 split-heap is enabled
with MICROPY_GC_MULTIHEAP=1 (but no extra code to add more heaps):
unix x64: +1032 +0.197% [incl +544(bss)]
esp32: +592 +0.039% GENERIC[incl +16(data) +264(bss)]