Commit graph

106 commits

Author SHA1 Message Date
Scott Shawcroft
7f0cc9e7b4
Add Zephyr port
This port is meant to grow to encompass all existing boards. For
now, it is a separate port while we transition over.

It is named `zephyr-cp` to differentiate it from the MicroPython
`zephyr` port. They are separate implementations.
2025-02-04 11:24:13 -08:00
Dan Halbert
ac7e15f88a (only) resolve merge conflicts 2024-08-28 16:31:37 -04:00
Dan Halbert
be6fa2af21 merge from main 2024-07-29 17:41:46 -04:00
Dan Halbert
71f17b08fb wip: fixing compilation 2024-07-26 18:38:46 -04:00
Dan Halbert
69b667406b MPy v1.22 merge: initial merge; not compiled yet 2024-07-25 15:16:24 -04:00
Jim Mussared
d694ac6e1b py/makeqstrdata.py: Ensure that scope names get low qstr values.
Originally implemented in a patch file provided by @ironss-iotec.

Fixes issue #14093.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2024-03-26 22:52:25 +11:00
Jim Mussared
7ea503929a py/qstr: Add support for MICROPY_QSTR_BYTES_IN_HASH=0.
This disables using qstr hashes altogether, which saves RAM and flash
(two bytes per interned string on a typical build) as well as code size.
On PYBV11 this is worth over 3k flash.

qstr comparison will now be done just by length then data. This affects
qstr_find_strn, although this has a negligible performance impact as, for a
given comparison, the length and first character will ~usually be
different anyway.

String hashing (e.g. builtin `hash()` and map.c) now needs to compute the
hash dynamically, and for the map case this does come at a performance
cost.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2024-01-25 16:38:17 +11:00
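A minimal Python sketch of the lookup order this commit describes (illustrative only; `qstr_find_strn` and `compute_hash` here stand in for the real C code):

```
# Illustrative sketch, not MicroPython's C implementation.

def compute_hash(data):
    # Stand-in for the qstr hash that hash()/map.c now compute on demand.
    h = 5381
    for b in data:
        h = ((h * 33) ^ b) & 0xFFFF
    return h

def qstr_find_strn(pool, s):
    # With MICROPY_QSTR_BYTES_IN_HASH=0 there is no stored hash to check:
    # compare length, then data; the first byte usually differs anyway.
    for index, entry in enumerate(pool):
        if len(entry) == len(s) and entry == s:
            return index
    return None

pool = [b"update", b"append", b"extend"]
assert qstr_find_strn(pool, b"append") == 1
```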
51e3d5ecbf
Add a note about this makeqstrdata change 2023-12-24 10:41:54 -06:00
40722741dd
makeqstrdata: ensure certain qstrs are as early as possible
Some qstrs like those representing binary ops such as __add__ must have
qstr numbers that fit in 8 bits. Replace the former ad-hoc method,
which sorted *other* dunder identifiers early, with a list of all
the qstrs that have this requirement.

Before this, the unix coverage build was failing when I added certain
qstrs like "<input>" for a codeop filename default value.
2023-12-14 17:19:26 -06:00
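A hedged sketch of the mechanism: qstrs whose numbers must stay below 256 are emitted ahead of the normal ordering. The list below is abbreviated and illustrative, not the actual list in makeqstrdata.py:

```
# Abbreviated, illustrative version of the "emit these first" idea.
MUST_BE_EARLY = ["__add__", "__sub__", "__lt__", "__gt__", "<input>"]

def order_qstrs(qstrs):
    early = [q for q in MUST_BE_EARLY if q in qstrs]
    rest = sorted(q for q in qstrs if q not in MUST_BE_EARLY)
    return early + rest   # the early group gets the low (8-bit) numbers
```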
Jim Mussared
64c79a5423 py/qstr: Add support for sorted qstr pools.
This provides a significant performance boost for qstr_find_strn, which is
called a lot during parsing and loading of .mpy files, as well as interning
of string objects (which happens in most string methods that return new
strings).

Also adds comments to explain the "static" qstrs.  These are part of the
.mpy ABI and avoid needing to duplicate string data for QSTRs known to
already be in the firmware.  The static pool isn't currently sorted, but in
the future we could either split the static pool into the sorted regions,
or in the next .mpy version just sort them.

Based on initial work done by @amirgon in #6896.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-10-30 11:10:02 +11:00
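A sketch of why sorting pays off: lookup over a sorted pool can binary-search instead of scanning linearly. Pure illustration in Python; the real code is C over flash-resident pools:

```
import bisect

def find_in_sorted_pool(pool, s):
    # The pool is sorted, so lookup is O(log n) rather than a linear scan.
    i = bisect.bisect_left(pool, s)
    if i < len(pool) and pool[i] == s:
        return i
    return None

pool = sorted([b"append", b"extend", b"insert", b"update"])
assert find_in_sorted_pool(pool, b"insert") == 2
```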
Dan Halbert
367e13c69f change CIRCUITPY change markers to CIRCUITPY-CHANGE 2023-10-19 16:42:36 -04:00
Dan Halbert
f2ebe6839c Initial MicroPython v1.21.0 merge; not compiled yet 2023-10-18 17:49:14 -04:00
9104654930
makeqstrdata: ensure _lt and _gt qstrs are sorted early
This fixes a build error because their numbers have to be < 256.
2023-09-22 10:38:52 -05:00
Dan Halbert
d582407b06 pre-commit fixes 2023-08-14 00:59:22 -04:00
Dan Halbert
48e404ab90 wip; fix frozen; still need to fix -j1 for frozen 2023-08-12 15:44:27 -04:00
Dan Halbert
8a89a3d425 sort TRANSLATION()'s 2023-08-11 13:36:03 -04:00
Dan Halbert
fe0e2f13bc wip; fix qstr processing 2023-08-10 20:06:32 -04:00
Jim Mussared
8b27482692 top: Update Python formatting to black "2023 stable style".
See https://black.readthedocs.io/en/stable/the_black_code_style/index.html

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-02-02 12:51:03 +11:00
MicroDev
d9d94eacca
run updated pre-commit 2023-02-01 13:38:41 +05:30
Scott Shawcroft
fd5ef009a4
Move compressed strings into own object file
This removes the dependency of all the other objects on the translation
and therefore speeds up subsequent builds. Now, even when the big
translate() function is inlined in the header, it only needs to be
optimized once.
2022-06-02 11:48:56 -07:00
Scott Shawcroft
9d10a3da66
Conditionalize LTO 2022-05-27 12:59:54 -07:00
Artyom Skrobov
18b1ba086c py/qstr: Separate hash and len from string data.
This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update" will all point to the
same memory.

No functional change.

The size reduction depends on the number of qstrs in the build.  The change
this commit brings is:

   bare-arm:    -4 -0.007%
minimal x86:  +150 +0.092% [incl +48(data)]
   unix x64:  -608 -0.118%
unix nanbox:  -572 -0.126% [incl +32(data)]
      stm32: -1392 -0.352% PYBV10
     cc3200:  -448 -0.244%
    esp8266: -1208 -0.173% GENERIC
      esp32: -1028 -0.068% GENERIC [incl -1020(data)]
        nrf:  -440 -0.252% pca10040
        rp2: -1072 -0.217% PICO
       samd:  -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Performance is also improved (on bare metal at least) for the
core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py
performance benchmarks.

Originally at adafruit#4583

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 22:52:32 +11:00
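A small sketch of the layout change, with illustrative names: once hash and length live in side tables, the data pointers are plain C strings, and the toolchain can overlay one string onto the tail of another:

```
# Illustrative: separate hash/len tables let data pointers alias suffixes.
words = ["update", "difference_update", "symmetric_difference_update"]
blob = "symmetric_difference_update"    # one buffer backs all three

attrs = []
for w in words:
    off = blob.index(w)                 # "update" aliases the shared tail
    attrs.append((hash(w) & 0xFF, len(w), off))

for (h, n, off), w in zip(attrs, words):
    assert blob[off:off + n] == w
```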
Jeff Epler
d59a28db97 Compress word offset table
By storing "count of words by length", the long `wends` table can be
replaced with a short `wlencount` table.  This saves flash storage space.

Extend the range of string lengths that can be in the dictionary.
Originally it was 2 to 9; at one point it was changed to 3 to 9.
Putting the lower bound back at 2 has a positive impact on the French
translation (a bunch of them, such as "ch", "\r\n", "%q", are used).
Increasing the maximum length gets 'mpossible', ' doit être ',
and 'CircuitPyth' at the long end.  This adds a bit of processing time
to makeqstrdata. The specific 2/11 values are again empirical based on
the French translation on the adafruit_proxlight_trinkey_m0.
2021-08-07 09:23:35 -05:00
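A hedged sketch of the replacement, assuming (as the dictionary format does) that words are stored sorted by length: per-length counts are enough to recover any word's offset, so the long per-word table can go:

```
# Illustrative reconstruction of a word offset from length counts alone.
words = sorted(["%q", "ch", "err", "not", "input"], key=len)
MIN_LEN, MAX_LEN = 2, 11
wlencount = [sum(1 for w in words if len(w) == n)
             for n in range(MIN_LEN, MAX_LEN + 1)]

def word_offset(index):
    offset, i = 0, 0
    for n, count in enumerate(wlencount, start=MIN_LEN):
        for _ in range(count):
            if i == index:
                return offset
            offset += n
            i += 1
    raise IndexError(index)

assert word_offset(3) == 2 + 2 + 3    # skips "%q", "ch", "err"
```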
Jeff Epler
0b8b16f6ac increase accuracy of the comment on the net savings estimate function
Thanks to tyomitch for suggesting the comment could be more accurate.
2021-07-11 08:57:27 -05:00
52e75c645d makeqstrdata: Don't include strings that are a net loss! 2021-07-09 14:26:43 -05:00
8836198ff1 TextSplitter: don't mutate 'words'
I was puzzled by why the dictionary words were sorted by length.
It was because TextSplitter sorted its parameter, instead of a copy.

This doesn't affect encoding size, but does affect the encoding NUMBER
of the found words.  We'll deliberately restore sorting by length next,
for other reasons, but not by spooky action.
2021-07-09 14:02:31 -05:00
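The fix in miniature (illustrative, not the script's actual class): sort a copy, so the caller's list, and with it the words' encoding numbers, stays put:

```
class TextSplitter:
    def __init__(self, words):
        self.words = sorted(words, key=len)   # copy; was words.sort(key=len)

corpus_words = ["banana", "an", "na"]
TextSplitter(corpus_words)
assert corpus_words == ["banana", "an", "na"]   # caller's order untouched
```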
99abd03b7a makeqstrdata: use an extremely accurate dictionary heuristic
Try to accurately measure the costs of including a word in the dictionary
vs the gains from using it in messages.

This saves about 160 bytes on trinket_m0 ja, the fullest translation
for that board.  Other translations on the same board all have savings,
ranging from 24 to 228 bytes.

```
Translation     Before  After   Savings   (columns are bytes free in flash)
ja              1164    1324    160
de_DE           1260    1396    136
fr              1424    1652    228
zh_Latn_pinyin  1448    1520    72
pt_BR           1584    1736    152
pl              1592    1640    48
es              1724    1816    92
ko              1724    1816    92
fil             1764    1800    36
it_IT           1896    2040    144
nl              1956    2136    180
ID              2072    2180    108
cs              2124    2148    24
sv              2340    2448    108
en_x_pirate     2644    2740    96
en_GB           2652    2752    100
el              2656    2768    112
en_US           2656    2768    112
hi              2656    2768    112
```
2021-07-09 12:45:49 -05:00
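The shape of the heuristic, as a hedged sketch: a word earns its dictionary slot only when the bits saved across all its uses exceed the bits spent storing it. The flat bit costs below are stand-ins, not the script's Huffman-aware accounting:

```
def estimated_net_savings(word, uses, char_bits=7, code_bits=9):
    cost = len(word) * char_bits              # storing the word once
    saved_per_use = len(word) * char_bits - code_bits
    return uses * saved_per_use - cost

assert estimated_net_savings("cannot", uses=40) > 0   # frequent word: keep
assert estimated_net_savings("xz", uses=1) < 0        # rare short word: net loss
```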
45dc0953a5 makeqstrdata.py: Remove a problematic print
.. it contained non-ASCII characters, even when building the standard
English translation.

This may help resolve the build problems reported at #4750.
2021-05-11 21:48:21 -05:00
Scott Shawcroft
b35fa44c8a
Merge MicroPython 1.12 into CircuitPython 2021-05-03 14:01:18 -07:00
Jeff Epler
dfa7c3d32d codeformat: Fix handling of **
After discussing with danh, I noticed that `a/**/b` would not match `a/b`.

After correcting this and re-running "pre-commit run --all", additional
files were reindented, including the codeformat script itself.
2021-04-30 15:30:13 -05:00
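A sketch of the corner case: the translation of `**` must allow zero intermediate directories, so the whole "directories plus slash" group becomes optional. The regex below is illustrative, not the script's actual translation:

```
import re

# `a/**/b` must also match `a/b`, so `/**/` maps to an optional group.
pattern = re.compile(r"^a/(?:.*/)?b$")
assert pattern.match("a/b")
assert pattern.match("a/x/y/b")
assert not pattern.match("ab")
```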
Scott Shawcroft
76033d5115
Merge MicroPython v1.11 into CircuitPython 2021-04-26 15:47:41 -07:00
Scott Shawcroft
e54e5e3575
Merge pull request #4564 from tyomitch/patch-1
[build] simplify makeqstrdata heuristic
2021-04-19 14:50:42 -07:00
Artyom Skrobov
dcee89ade7 build: simplify compute_huffman_coding()
No functional change.
2021-04-09 08:36:26 -04:00
Artyom Skrobov
68920682b6 [build] simplify makeqstrdata heuristic
The simpler one saves, on average, 51 more bytes per translation;
the biggest translation per board is reduced, on average, by 85 bytes.
2021-04-09 07:18:40 -04:00
Artyom Skrobov
c3e40d50ab [qstr] Separate hash and len from string data
This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update"
will all point to the same memory.

Shaves ~1KB off the image size, and potentially allows
bigger savings if qstr attrs are initialized in qstr_init(),
and not stored in the image.
2021-04-06 12:58:42 -04:00
microDev
a52eb88031
run code formatting script 2021-03-15 19:27:36 +05:30
0318eb359f makeqstrdata: Work around python3.6 compatibility problem
Discord user Folknology encountered a problem building with Python 3.6.9,
`TypeError: ord() expected a character, but string of length 0 found`.

I was able to reproduce the problem using Python 3.5*, and discovered that
the meaning of the regular expression `"|."` had changed in 3.7.  Before,
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', '', '', '', '', '']
```
After:
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', 'h', '', 'e', '', 'l', '', 'l', '', 'o', '']
```
Check if `words` is empty and if so use `"."` as the regular expression
instead.  This gives the same result on both versions:
```
['h', 'e', 'l', 'l', 'o']
```
and fixes the generation of the huffman dictionary.

Folknology verified that this fix worked for them.

 * I could easily install 3.5 but not 3.6.  3.5 reproduced the same problem
2020-09-21 10:03:07 -05:00
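A sketch of the workaround (the function shape is illustrative): when `words` is empty, the alternation would start with `|`, which pre-3.7 Python matches differently, so fall back to `"."`:

```
import re

def tokenize(text, words):
    # "x|y|." with words, plain "." without: same result on 3.6 and 3.7+.
    pattern = "|".join(re.escape(w) for w in words) + "|." if words else "."
    return [m.group(0) for m in re.finditer(pattern, text)]

assert tokenize("hello", []) == ["h", "e", "l", "l", "o"]
assert tokenize("hello", ["ll"]) == ["h", "e", "ll", "o"]
```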
bfbbbd6c5c makeqstrdata: Work with older Python
This construct (which I added without sufficient testing,
apparently) is only supported in Python 3.7 and newer.  Make it
optional so that this script works on other Python versions.  This
means that if you have a system with non-UTF-8 encoding you will
need to use Python 3.7.

In particular, this affects a problem building circuitpython in
github's ubuntu-18.04 virtual environment when Python 3.7 is not
explicitly installed.  cookie-cuttered libraries call for Python
3.6:
```
    - name: Set up Python 3.6
      uses: actions/setup-python@v1
      with:
        python-version: 3.6
```
Since CircuitPython's own build calls for 3.8, this problem was not
detected.

This problem was also encountered by discord user mdroberts1243.

The failure I encountered was here:
https://github.com/jepler/Jepler_CircuitPython_udecimal/runs/1138045020?check_suite_focus=true
.. while my step of "clone and build circuitpython unix port" is
unusual, I think the same problem would have affected "build assets"
if that step had been reached.
2020-09-19 10:16:13 -05:00
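A sketch of the compatible form, combining this fix with the encoding setup from the commit below (treat the details as illustrative):

```
import sys

# sys.stdout.reconfigure() exists only on Python 3.7+, so guard it
# instead of requiring the newer interpreter.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(errors="backslashreplace")
```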
a8e98cda83 makeqstrdata: comment my understanding of @ciscorn's code 2020-09-16 08:28:15 -05:00
Taku Fukada
d18d79ac47 Small improvements to the dictionary compression 2020-09-14 01:50:01 +09:00
15964a4750 makeqstrdata: Avoid encoding problems
Most users and the CI system are running in configurations where Python
configures stdout and stderr in UTF-8 mode.  However, Windows is different,
setting values like CP1252.  This led to a build failure on Windows, because
makeqstrdata printed Unicode strings to its stdout, expecting them to be
encoded as UTF-8.

This script is writing (stdout) to a compiler input file and potentially
printing messages (stderr) to a log or console.  Explicitly configure stdout to
use utf-8 to get consistent behavior on all platforms, and configure stderr so
that if any log/diagnostic messages are printed that cannot be displayed
correctly, they are still displayed instead of creating an error while trying
to print the diagnostic information.

I considered setting the encodings both to ascii, but this would just be
occasionally inconvenient to developers like me who want to show diagnostic
info on stderr and in comments while working with the compression code.

Closes: #3408
2020-09-12 19:43:08 -05:00
40ab5c6b21 compression: Implement ciscorn's dictionary approach
Massive savings.  Thanks so much @ciscorn for providing the initial
code for choosing the dictionary.

This adds a bit of time to the build, both to find the dictionary
but also because (for reasons I don't fully understand), the binary
search in the compress() function no longer worked and had to be
replaced with a linear search.

I think this is because the intended invariant is that for codebook
entries that encode to the same number of bits, the entries are ordered
in ascending value.  However, I mis-placed the transition from "words"
to "byte/char values" so the codebook entries for words are in word-order
rather than their code order.

Because this price is only paid at build time, I didn't care to determine
exactly where the correct fix was.

I also commented out a line to produce the "estimated total memory size"
-- at least on the unix build with TRANSLATION=ja, this led to a build
time KeyError trying to compute the codebook size for all the strings.
I think this occurs because some single unicode code point ('ァ') is
no longer present as itself in the compressed strings, due to always
being replaced by a word.

As promised, this seems to save hundreds of bytes in the German translation
on the trinket m0.

Testing performed:
 - built trinket_m0 in several languages
 - built and ran unix port in several languages (en, de_DE, ja) and ran
   simple error-producing codes like ./micropython -c '1/0'
2020-09-12 10:10:45 -05:00
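A hedged sketch of the substitution stage this dictionary feeds (names and structure are illustrative): replace dictionary words greedily, longest first, then Huffman-code the resulting symbol stream:

```
def substitute_words(text, words):
    # Longest-first greedy substitution; each hit becomes one symbol.
    ordered = sorted(range(len(words)), key=lambda k: -len(words[k]))
    out, i = [], 0
    while i < len(text):
        for k in ordered:
            if text.startswith(words[k], i):
                out.append(("word", k))
                i += len(words[k])
                break
        else:
            out.append(("char", text[i]))
            i += 1
    return out

syms = substitute_words("index out of range", ["index", " out of "])
assert syms[0] == ("word", 0) and syms[1] == ("word", 1)
```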
bdb07adfcc translations: Make decompression clearer
Now this gets filled in with values e.g., 128 (0x80) and 159 (0x9f).
2020-09-08 19:07:53 -05:00
cbfd38d1ce Rename functions to encode_ngrams / decode_ngrams 2020-09-02 19:09:23 -05:00
c34cb82ecb makeqstrdata: correct range of low code points to 0x80..0x9f inclusive
The previous range was unintentionally big and overlaps some characters
we'd like to use (and also 0xa0, which we don't intentionally use)
2020-09-02 15:52:02 -05:00
Jeff Epler
07740d19f3 add bigram compression to makeqstrdata
Compress common unicode bigrams by making code points in the range
0x80 - 0xbf (inclusive) represent them.  Then, they can be greedily
encoded and the substituted code points handled by the existing Huffman
compression.  Normally code points in the range 0x80-0xbf are not used
in Unicode, so we stake our own claim.  Using the more arguably correct
"Private Use Area" (PUA) would mean that for scripts that only use
code points under 256 we would use more memory for the "values" table.

bigram means "two letters", and is also sometimes called a "digram".
It's nothing to do with "big RAM".  For our purposes, a bigram represents
two successive unicode code points, so for instance in our build on
trinket m0 for english the most frequent are:
['t ', 'e ', 'in', 'd ', ...].

The bigrams are selected based on frequency in the corpus, but the
selection is not necessarily optimal, for these reasons I can think of:
 * Suppose the corpus was just "tea" repeated 100 times.  The
   top bigrams would be "te" and "ea".  However, due to
   overlap, "ea" could never be used.  Thus, some bigrams might actually
   waste space
    * I _assume_ this has to be why e.g., bigram 0x86 "s " is more
      frequent than bigram 0x85 " a" in English for Trinket M0, because
      sequences like "can't add" would get the "t " digram and then
      be unable to use the " a" digram.

 * And generally, if a bigram is frequent then so are its constituents.
   Say that "i" and "n" both encode to just 5 or 6 bits, then the huffman
   code for "in" had better compress to 10 or fewer bits or it's a net
   loss!
    * I checked though!  "i" is 5 bits, "n" is 6 bits (lucky guess)
      but the bigram 0x83 is also just 6 bits, so this one is a win of
      5 bits for every "in" minus overhead.  Yay, this round goes to team
      compression.
    * On the other hand, the least frequent bigram 0x9d " n" is 10 bits
      long and its constituent code points are 4+6 bits so there's no
      savings, but there is the cost of the table entry.
    * and somehow 0x9f 'an' is never used at all!

With or without accounting for overlaps, there is some optimum number
of bigrams.  Adding one more bigram uses at least 2 bytes (for the
entry in the bigram table; 4 bytes if code points >255 are in the
source text) and also needs a slot in the Huffman dictionary, so
adding bigrams beyond the optimum number makes compression worse again.

If it's an improvement, the fact that it's not guaranteed optimal
doesn't seem to matter too much.  It just leaves a little more fruit
for the next sweep to pick up.  Perhaps try adding the most frequent
bigram not yet present, until it doesn't improve compression overall.

Right now, de_DE is again the "fullest" build on trinket_m0.  (It's
reclaimed that spot from the ja translation somehow)  This change saves
104 bytes there, increasing free space about 6.8%.  In the larger
(but not critically full) pyportal build it saves 324 bytes.

The specific number of bigrams used (32) was chosen as it is the max
number that fit within the 0x80..0xbf range.  Larger tables would
require the use of 16 bit code points in the de_DE build, losing savings
overall.

(Side note: The most frequent letters in English have been said
to be: ETA OIN SHRDLU; but we have UAC EIL MOPRST in our corpus)
2020-09-01 17:12:22 -05:00
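A compact sketch of the pass (illustrative; the range was later narrowed to 0x80..0x9f by c34cb82ecb above, which 32 entries exactly fill): count adjacent pairs, keep the most frequent, substitute greedily before Huffman coding:

```
from collections import Counter

def top_bigrams(corpus, n=32):
    counts = Counter(s[i:i + 2] for s in corpus for i in range(len(s) - 1))
    return [bg for bg, _ in counts.most_common(n)]

def substitute_bigrams(text, bigrams):
    out, i = "", 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in bigrams:
            out += chr(0x80 + bigrams.index(pair))   # claim an unused code point
            i += 2
        else:
            out += text[i]
            i += 1
    return out

bigrams = top_bigrams(["in index", "inner"])
assert len(substitute_bigrams("in index", bigrams)) < len("in index")
```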
Taku Fukada
79a3796b1c Calculate the Huffman codebook without MP_QSTRs 2020-08-18 23:21:14 +09:00
08ed09acc6 makeqstrdata: don't print "compression increased length" messages
This check as implemented is misleading, because it compares the
compressed size in bytes (including the length indication) with the source
string length in Unicode code points.  For English this is approximately
fair, but for Japanese this is quite unfair and produces an excess of
"increased length" messages.

This message might have existed for one of two reasons:
 * to alert to an improperly functioning huffman compression
 * to call attention to a need for a "string is stored uncompressed" case
We know by now that the huffman compression is functioning as designed and
effective in general.

Just to be on the safe side, I did some back-of-the-envelope estimates.
I considered these three replacements for "the true source string size, in bytes":
+    decompressed_len_utf8 = len(decompressed.encode('utf-8'))
+    decompressed_len_utf16 = len(decompressed.encode('utf-16be'))
+    decompressed_len_bitsize = ((1+len(decompressed)) * math.ceil(math.log(1+len(values), 2)) + 7) // 8

The third counts how many bits each character requires (fewer than 128
characters in the source character set = 7, fewer than 256 = 8, fewer than 512
= 9, etc, adding a string-terminating value) and is in some way representative
of the best way we would be able to store "uncompressed strings".  The Japanese
translation (largest as of writing) has just a few strings which increase by
this metric.  However, the amount of loss due to expansion in those cases is
outweighed by the cost of adding 1 bit per string to indicate whether it's
compressed or not.  For instance, in the BOARD=trinket_m0 TRANSLATION=ja build
the loss is 47 bytes over 300 strings.  Adding 1 bit to each of 300 strings will
cost about 37 bytes, leaving just 5 Thumb instructions to implement the code to
check and decode "uncompressed" strings in order to break even.
2020-08-16 20:50:48 -05:00
d0f9b5901e translations: document the compressed format 2020-05-28 11:30:46 -05:00
fe3e8d1589 string compression: save a few bits per string
Length was stored as a 16-bit number always.  Most translations have
a max length far less.  For example, US English translation lengths
always fit in just 8 bits.  Probably all languages fit in 9 bits.

This also has the side effect of reducing the alignment of
compressed_string_t from 2 bytes to 1.

Testing performed: ran in German and English on pyruler; printed messages
looked right.

Firmware size, en_US
Before: 3044 bytes free in flash
After: 3408 bytes free in flash

Firmware size, de_DE (with #2967 merged to restore translations)
Before: 1236 bytes free in flash
After: 1600 bytes free in flash
2020-05-28 08:36:08 -05:00
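A sketch of the sizing idea, with illustrative numbers: derive the width of the length field from the longest message instead of always spending 16 bits:

```
max_length = 150                        # e.g. longest en_US message (illustrative)
LENGTH_BITS = max_length.bit_length()   # 8 here; the commit expects <= 9 anywhere

def pack_length(n):
    assert 0 <= n <= max_length
    return format(n, "0{}b".format(LENGTH_BITS))

assert len(pack_length(150)) == 8
```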