Compare commits

...

31 commits

Author SHA1 Message Date
d901303671 Partially revert "tinf_decode_symbol: Don't traverse past end of TINF_TREE->table and ->trans."
.. I do not have any testcases demonstrating that the range check of
the access to t->trans is necessary.

This partially reverts commit cc278f73b6.
2018-07-07 13:19:13 -05:00
4c50ea3f9e TINF_PUT: move out of line for slight code size savings 2018-07-07 13:19:13 -05:00
188502c088 tinf_decode_trees: avoid out of range access to lengths[]
an example of a crash that this fixes is
explore-findings/crashes/id:000009,sig:11,src:000000,op:flip2,pos:12

this bug is more likely to be overt on a 64-bit platform where
'lengths[(unsigned)-1]' is 4GB away from 'lengths[0]'.
2018-07-07 13:19:13 -05:00
a2c98e6ba9 tinf_decode_trees: propagate error from tinf_decode_symbol
.. an example of a crash that this fixes is
explore-findings/crashes/id:000017,sig:11,src:000000,op:havoc,rep:4
2018-07-07 13:19:13 -05:00
2d8b399df8 tinf_inflate_block_data: Don't read outside of 'dest' buffer
.. given specially crafted input data, it was possible for lzOff to
point outside (specifically, before) the destination buffer.

this condition is possibly not quite stringent enough
2018-07-07 13:19:13 -05:00
4bc7825ffb tgunzip: properly initialize destStart 2018-07-07 13:19:13 -05:00
4ff6c9f96e tinf_inflate_block_data: Don't write outside of destination buffer
Because this write to d->dest is not via TPUT, the check has to be
made manually.
2018-07-07 13:19:12 -05:00
2f26f35302 TINF_PUT: don't write past the end of the destination buffer
.. instead returning TINF_DATA_ERROR in this case.
2018-07-07 13:19:12 -05:00
Jeff Epler
9967f1f773 fixup! tinflate, tinfgzip: Don't read past the end of the input buffer 2018-07-07 13:19:12 -05:00
4440911d35 tinflate, tinfgzip: Don't read past the end of the input buffer 2018-07-07 13:19:12 -05:00
7eae43028b tgunzip.c: explicitly give end of input and output buffers 2018-07-07 13:19:09 -05:00
abe71eb7cd tinf.h: add new fields for explicit end of source and destination buffers 2018-07-07 08:14:04 -05:00
09ec117dc1 tinf.h: remove unused 'destRemaining' field 2018-07-07 08:13:12 -05:00
Paul Sokolovsky
14eb5fd02c tests/Makefile: Add "clean" target. 2018-07-07 13:35:08 +03:00
Paul Sokolovsky
7bc1a9123c */makefile: Move -s (strip) to link flags. 2018-07-07 13:34:15 +03:00
Paul Sokolovsky
7838852c25 src/makefile: Remove -v flag from ar command line. 2018-07-07 13:33:39 +03:00
Paul Sokolovsky
07834aca65 */makefile.elf: Rename to just makefile. 2018-07-07 11:55:30 +03:00
Paul Sokolovsky
8aeabd3344 */makefile.elf: Don't hardcode gcc, use $(CC). 2018-07-07 11:55:16 +03:00
Paul Sokolovsky
f57a277c0f tinf_inflate_block_data: Check for EOF while decoding literal bytes.
This is an obvious case when decoding won't stop by other means on fake
zeros returned from uzlib_get_byte(), so need to check it explicitly.
2018-07-07 02:03:08 +03:00
Paul Sokolovsky
c36d540c5f tgunzip: Set TINF_DATA::source_limit.
Which is now required after "don't read past the end of source" refactor.
2018-07-07 01:00:02 +03:00
Paul Sokolovsky
50590fb5b6 tinflate: Don't read past the end of source.
For this, TINF_DATA::source_limit field was introduced, which should point
to the next byte after the source buffer (in other words,
source_limit = source + source_len). Also, readSource callback definition
was updated to allow to return -1 as EOF.

Inpsired by the patch by Jeff Epler (@jepler).
2018-07-07 00:59:57 +03:00
Paul Sokolovsky
1aa2b1161e tgunzip: Call uzlib_uncompress_init before any other ops on TINF_DATA
uzlib_uncompress_init() should be called first before setting any other
fields or running uzlib_gzip_parse_header().
2018-07-07 00:26:40 +03:00
6a47a55685 tgunzip: Don't crash on an impossibly small file.
Minimum gzip size is actually larger than 4 bytes. Use that figure here
just to avoid obvious array out of bounds access in tgunzip source.
2018-07-06 23:04:46 +03:00
Paul Sokolovsky
48c27e8783 tgunzip: On decompression error, exit with rc of the error code.
Instead of trying to print zero size and return with rc=0. The C error
codes, being negative, are actually negated to serve as exit codes of
the process.
2018-07-06 22:04:11 +03:00
cc278f73b6 tinf_decode_symbol: Don't traverse past end of TINF_TREE->table and ->trans. 2018-06-18 23:10:03 +03:00
Paul Sokolovsky
8168c3c40f tinf.h: Introduce TINF_ARRAY_SIZE().
Will be used for boundary checks on various arrays.
2018-06-17 16:20:41 +02:00
b29fca4a62 tinf_get_le_uint32: Don't invoke undefined behavior left shifting
clang ubsan says:
tinflate.c:186:44: runtime error: left shift of 223 by 24 places cannot be represented in type 'int'

this shift is defined for uint32_t, but is undefined behavior for int32_t.
2018-06-11 12:36:45 +03:00
Paul Sokolovsky
65a7f41de3 tinflate: tinf_decode_trees: Optimize for size and add overflow checks.
Factor out any common code/processing outside the switch branches. Check
that when doing repeated fills of the length table, we don't overflow
it (can happen if there's bad data).
2018-05-20 16:04:54 +03:00
Paul Sokolovsky
a46a67ca93 tinflate: Typo fix in comment. 2018-05-20 16:04:09 +03:00
Paul Sokolovsky
74b2a51c21 tests: Add test directory with a basic smoke test for decompressor.
Just include a compressed gzip file and check that decompressed result
matches the expected md5sum.

TODO: More corpus and test vectors.
2018-05-20 16:02:15 +03:00
Paul Sokolovsky
d4e4a4aa06 tinflate: Explicitly get length/invlength as little-endian.
Previosuly, the statements looked like:

length = uzlib_get_byte(d) + 256 * uzlib_get_byte(d);

But the C standard doesn't define in which order function calls are made,
and different compilers/their options may use different order. So, break
the statement above in two, to guarantee the proper computation order.

Reported in https://github.com/pfalcon/uzlib/issues/8.
2018-05-11 23:49:48 +03:00
9 changed files with 134 additions and 52 deletions

View file

@ -13,18 +13,18 @@ target = tgunzip
objects = tgunzip.o
libs = ../../lib/libtinf.a
cflags = -s -Wall -Os -I../../src
ldflags = $(cflags) -Wl,-Map,ld.map
cflags = -Wall -Os -I../../src
ldflags = $(cflags) -s -Wl,-Map,ld.map
.PHONY: all clean
all: $(target)
$(target): $(objects) $(libs)
gcc $(ldflags) -o $@ $^ $(libs)
$(CC) $(ldflags) -o $@ $^ $(libs)
%.o : %.c
gcc $(cflags) -c $<
$(CC) $(cflags) -c $<
clean:
$(RM) $(objects) $(target)

View file

@ -87,6 +87,8 @@ int main(int argc, char *argv[])
fclose(fin);
if (len < 4) exit_error("file too small");
/* -- get decompressed length -- */
dlen = source[len - 1];
@ -103,7 +105,11 @@ int main(int argc, char *argv[])
outlen = dlen;
TINF_DATA d;
// uzlib_uncompress_init(&d, malloc(32768), 32768);
uzlib_uncompress_init(&d, NULL, 0);
d.source = source;
d.source_limit = source + len - 4;
res = uzlib_gzip_parse_header(&d);
if (res != TINF_OK) {
@ -111,10 +117,8 @@ int main(int argc, char *argv[])
exit(1);
}
// uzlib_uncompress_init(&d, malloc(32768), 32768);
uzlib_uncompress_init(&d, NULL, 0);
d.dest = dest;
d.dest = d.destStart = dest;
d.edest = dest + dlen;
/* decompress byte by byte; can be any other length */
d.destSize = 1;
@ -124,6 +128,7 @@ int main(int argc, char *argv[])
if (res != TINF_DONE) {
printf("Error during decompression: %d\n", res);
exit(-res);
}
printf("decompressed %lu bytes\n", d.dest - dest);

View file

@ -13,18 +13,18 @@ target = tgzip
objects = tgzip.o
libs = ../../lib/libtinf.a
cflags = -s -Wall -Os -I../../src
ldflags = $(cflags) -Wl,-Map,ld.map
cflags = -Wall -Os -I../../src
ldflags = $(cflags) -s -Wl,-Map,ld.map
.PHONY: all clean
all: $(target)
$(target): $(objects) $(libs)
gcc $(ldflags) -o $@ $^ $(libs)
$(CC) $(ldflags) -o $@ $^ $(libs)
%.o : %.c
gcc $(cflags) -c $<
$(CC) $(cflags) -c $<
clean:
$(RM) $(objects) $(target)

View file

@ -13,8 +13,8 @@ target = ../lib/libtinf.a
objects = tinflate.o tinfgzip.o tinfzlib.o adler32.o crc32.o \
defl_static.o genlz77.o
cflags = -s -Wall -Os
ldflags = $(cflags)
cflags = -Wall -Os
ldflags = $(cflags) -s
.PHONY: all clean
@ -22,11 +22,11 @@ all: $(target)
$(target): $(objects)
$(RM) $@
ar -frsv $@ $^
ar -frs $@ $^
ranlib $@
%.o : %.c
gcc $(cflags) -o $@ -c $<
$(CC) $(cflags) -o $@ -c $<
%.o : %.nas
nasm -o $@ -f elf -D_ELF_ -O3 -Inasm/ $<

View file

@ -11,7 +11,9 @@
#ifndef TINF_H_INCLUDED
#define TINF_H_INCLUDED
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
/* calling convention */
#ifndef TINFCC
@ -39,6 +41,9 @@ extern "C" {
#define TINF_CHKSUM_ADLER 1
#define TINF_CHKSUM_CRC 2
/* helper macros */
#define TINF_ARRAY_SIZE(arr) (sizeof(arr) / sizeof(*(arr)))
/* data structures */
typedef struct {
@ -48,10 +53,17 @@ typedef struct {
struct TINF_DATA;
typedef struct TINF_DATA {
/* Pointer to the next byte in the input buffer */
const unsigned char *source;
/* If source above is NULL, this function will be used to read
next byte from source stream */
unsigned char (*readSource)(struct TINF_DATA *data);
/* Pointer to the next byte past the input buffer (source_limit = source + len) */
const unsigned char *source_limit;
/* If source_limit == NULL, or source >= source_limit, this function
will be used to read next byte from source stream. The function may
also return -1 in
case of EOF (or irrecoverable error). Note that besides returning
the next byte, it may also update source and sourceRemaining fields,
thus allowing for buffered operation. */
int (*readSource)(struct TINF_DATA *data);
unsigned int tag;
unsigned int bitcount;
@ -62,12 +74,13 @@ typedef struct TINF_DATA {
unsigned int destSize;
/* Current pointer in buffer */
unsigned char *dest;
/* Remaining bytes in buffer */
unsigned int destRemaining;
/* end of destination buffer */
unsigned char *edest;
/* Accumulating checksum */
unsigned int checksum;
char checksum_type;
bool eof;
int btype;
int bfinal;
@ -81,11 +94,10 @@ typedef struct TINF_DATA {
TINF_TREE dtree; /* dynamic distance tree */
} TINF_DATA;
void tinf_put(TINF_DATA *d, char c);
#define TINF_PUT(d, c) \
{ \
*d->dest++ = c; \
if (d->dict_ring) { d->dict_ring[d->dict_idx++] = c; if (d->dict_idx == d->dict_size) d->dict_idx = 0; } \
}
tinf_put(d, c)
unsigned char TINFCC uzlib_get_byte(TINF_DATA *d);

View file

@ -171,10 +171,28 @@ static void tinf_build_tree(TINF_TREE *t, const unsigned char *lengths, unsigned
unsigned char uzlib_get_byte(TINF_DATA *d)
{
if (d->source) {
/* If end of source buffer is not reached, return next byte from source
buffer. */
if (d->source < d->source_limit) {
return *d->source++;
}
return d->readSource(d);
/* Otherwise if there's callback and we haven't seen EOF yet, try to
read next byte using it. (Note: the callback can also update ->source
and ->source_limit). */
if (d->readSource && !d->eof) {
int val = d->readSource(d);
if (val >= 0) {
return (unsigned char)val;
}
}
/* Otherwise, we hit EOF (either from ->readSource() or from exhaustion
of the buffer), and it will be "sticky", i.e. further calls to this
function will end up here too. */
d->eof = true;
return 0;
}
uint32_t tinf_get_le_uint32(TINF_DATA *d)
@ -182,7 +200,7 @@ uint32_t tinf_get_le_uint32(TINF_DATA *d)
uint32_t val = 0;
int i;
for (i = 4; i--;) {
val = val >> 8 | uzlib_get_byte(d) << 24;
val = val >> 8 | ((uint32_t)uzlib_get_byte(d)) << 24;
}
return val;
}
@ -245,7 +263,9 @@ static int tinf_decode_symbol(TINF_DATA *d, TINF_TREE *t)
cur = 2*cur + tinf_getbit(d);
++len;
if (++len == TINF_ARRAY_SIZE(t->table)) {
return TINF_DATA_ERROR;
}
sum += t->table[len];
cur -= t->table[len];
@ -256,10 +276,10 @@ static int tinf_decode_symbol(TINF_DATA *d, TINF_TREE *t)
}
/* given a data stream, decode dynamic trees from it */
static void tinf_decode_trees(TINF_DATA *d, TINF_TREE *lt, TINF_TREE *dt)
static int tinf_decode_trees(TINF_DATA *d, TINF_TREE *lt, TINF_TREE *dt)
{
unsigned char lengths[288+32];
unsigned int hlit, hdist, hclen;
unsigned int hlit, hdist, hclen, hlimit;
unsigned int i, num, length;
/* get 5 bits HLIT (257-286) */
@ -286,46 +306,55 @@ static void tinf_decode_trees(TINF_DATA *d, TINF_TREE *lt, TINF_TREE *dt)
tinf_build_tree(lt, lengths, 19);
/* decode code lengths for the dynamic trees */
for (num = 0; num < hlit + hdist; )
hlimit = hlit + hdist;
for (num = 0; num < hlimit; )
{
int sym = tinf_decode_symbol(d, lt);
if (sym < 0) return sym; // e.g., TINF_DATA_ERROR
unsigned char fill_value = 0;
int lbits, lbase = 3;
/* error decoding */
if (sym < 0) return sym;
switch (sym)
{
case 16:
/* copy previous code length 3-6 times (read 2 bits) */
{
unsigned char prev = lengths[num - 1];
for (length = tinf_read_bits(d, 2, 3); length; --length)
{
lengths[num++] = prev;
}
}
if(num-1 >= sizeof(lengths)) return TINF_DATA_ERROR;
fill_value = lengths[num - 1];
lbits = 2;
break;
case 17:
/* repeat code length 0 for 3-10 times (read 3 bits) */
for (length = tinf_read_bits(d, 3, 3); length; --length)
{
lengths[num++] = 0;
}
lbits = 3;
break;
case 18:
/* repeat code length 0 for 11-138 times (read 7 bits) */
for (length = tinf_read_bits(d, 7, 11); length; --length)
{
lengths[num++] = 0;
}
lbits = 7;
lbase = 11;
break;
default:
/* values 0-15 represent the actual code lengths */
lengths[num++] = sym;
break;
/* continue the for loop */
continue;
}
/* special code length 16-18 are handled here */
length = tinf_read_bits(d, lbits, lbase);
if (num + length >= hlimit) return TINF_DATA_ERROR;
for (; length; --length)
{
lengths[num++] = fill_value;
}
}
/* build dynamic trees */
tinf_build_tree(lt, lengths, hlit);
tinf_build_tree(dt, lengths + hlit, hdist);
return TINF_OK;
}
/* ----------------------------- *
@ -341,6 +370,10 @@ static int tinf_inflate_block_data(TINF_DATA *d, TINF_TREE *lt, TINF_TREE *dt)
int sym = tinf_decode_symbol(d, lt);
//printf("huff sym: %02x\n", sym);
if (d->eof) {
return TINF_DATA_ERROR;
}
/* literal byte */
if (sym < 256) {
TINF_PUT(d, sym);
@ -380,6 +413,11 @@ static int tinf_inflate_block_data(TINF_DATA *d, TINF_TREE *lt, TINF_TREE *dt)
d->lzOff = 0;
}
} else {
if(d->dest >= d->edest) return TINF_DATA_ERROR;
if(d->lzOff >= 0) return TINF_DATA_ERROR;
// d->dest + d->lzOff >= d->destStart but without undefined behavior due to constructing a pointer to before the d->dest
// subtract d->dest from both sides
if(d->lzOff < d->destStart - d->dest) return TINF_DATA_ERROR;
d->dest[0] = d->dest[d->lzOff];
d->dest++;
}
@ -394,9 +432,11 @@ static int tinf_inflate_uncompressed_block(TINF_DATA *d)
unsigned int length, invlength;
/* get length */
length = uzlib_get_byte(d) + 256 * uzlib_get_byte(d);
length = uzlib_get_byte(d);
length += 256 * uzlib_get_byte(d);
/* get one's complement of length */
invlength = uzlib_get_byte(d) + 256 * uzlib_get_byte(d);
invlength = uzlib_get_byte(d);
invlength += 256 * uzlib_get_byte(d);
/* check length */
if (length != (~invlength & 0x0000ffff)) return TINF_DATA_ERROR;
@ -438,6 +478,9 @@ void uzlib_init(void)
/* initialize decompression structure */
void uzlib_uncompress_init(TINF_DATA *d, void *dict, unsigned int dictLen)
{
d->eof = 0;
d->source_limit = NULL;
d->readSource = NULL;
d->bitcount = 0;
d->bfinal = 0;
d->btype = -1;
@ -468,7 +511,10 @@ next_blk:
tinf_build_fixed_trees(&d->ltree, &d->dtree);
} else if (d->btype == 2) {
/* decode trees from stream */
tinf_decode_trees(d, &d->ltree, &d->dtree);
res = tinf_decode_trees(d, &d->ltree, &d->dtree);
if (res != TINF_OK) {
return res;
}
}
}
@ -481,7 +527,7 @@ next_blk:
break;
case 1:
case 2:
/* decompress block with fixed/dyanamic huffman trees */
/* decompress block with fixed/dynamic huffman trees */
/* trees were decoded previously, so it's the same routine for both */
res = tinf_inflate_block_data(d, &d->ltree, &d->dtree);
break;
@ -501,6 +547,7 @@ next_blk:
} while (--d->destSize);
if (d->eof) return TINF_DATA_ERROR;
return TINF_OK;
}
@ -549,3 +596,9 @@ int uzlib_uncompress_chksum(TINF_DATA *d)
return res;
}
void tinf_put(TINF_DATA *d, char c) {
if (d->dest >= d->edest) { d->eof = 1; return; }
*d->dest++ = c;
if (d->dict_ring) { d->dict_ring[d->dict_idx++] = c; if (d->dict_idx == d->dict_size) d->dict_idx = 0; }
}

11
tests/Makefile Normal file
View file

@ -0,0 +1,11 @@
# Very basic smoke test for decompressor
test:
$(MAKE) -C ../src
$(MAKE) -C ../examples/tgunzip
../examples/tgunzip/tgunzip corpus.tar.gz corpus-out.tar
md5sum -c corpus.md5sum
clean:
$(MAKE) -C ../src $@
$(MAKE) -C ../examples/tgzip $@
$(MAKE) -C ../examples/tgunzip $@

1
tests/corpus.md5sum Normal file
View file

@ -0,0 +1 @@
ec2a54ac0b37dd25f079db1f562471a2 corpus-out.tar

BIN
tests/corpus.tar.gz Normal file

Binary file not shown.