On Windows, when a file that actually contains Unix-style line
endings is opened in "text" mode, tellg() behaves unexpectedly.
Consider this program, which writes the (binary) contents "a\nb\n" to
a file, then opens it in text mode for reading. After each character
is read, it prints the value returned by tellg() followed by the
character's code:

    #include <iostream>
    #include <fstream>

    int main()
    {
        {
            std::ofstream f("myfile.txt", std::ios::binary);
            f << "a\nb\n";
        }
        std::ifstream f("myfile.txt");
        for (char c=0; f.get(c);)
            std::cout << f.tellg() << ' ' << int(c) << '\n';
    }

On a Unix platform, which does not distinguish between "text" and
"binary" files, the output is
1 97
2 10
3 98
4 10
because the file position simply advances one position after each
byte is read.
On Windows with the Visual Studio C and C++ runtime, the result is
instead
-1 97
1 10
2 98
4 10
While it is impossible to say exactly what the Windows runtime is
doing here, it appears to be trying to adjust for the mismatch
between "number of bytes read in byte-oriented mode" and "number of
bytes read in text mode".
Since "part21" files don't necessarily contain CRLF line endings
when viewed in binary mode, open the file in binary mode. This
fixes the failure of the "test_inverse_attr3" test seen on
AppVeyor CI.
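
The effect of the fix can be illustrated with a minimal sketch based on the
program above. The helper names write_sample() and
positions_after_each_byte() are hypothetical and exist only for this
illustration; they are not part of the code changed by this commit. With
std::ios::binary passed when opening for reading, the runtime does no line
ending translation, so tellg() should advance by one after each byte:

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: write the bytes "a\nb\n" with no newline translation.
void write_sample(const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    out << "a\nb\n";
}

// Hypothetical helper: read the file byte by byte in binary mode and
// record the value of tellg() after each successful read.
std::vector<std::streamoff> positions_after_each_byte(const std::string& path)
{
    std::vector<std::streamoff> positions;
    std::ifstream in(path, std::ios::binary);  // binary mode is the point of the fix
    for (char c = 0; in.get(c);)
        positions.push_back(static_cast<std::streamoff>(in.tellg()));
    return positions;
}
```

Calling write_sample("myfile.txt") and then
positions_after_each_byte("myfile.txt") yields the positions 1, 2, 3, 4,
matching the Unix output shown earlier.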
- change the way states are used, which could give a substantial performance improvement
- implement a more flexible approach for exchange_file start token search (more extensible for subclassing)
- rework/standardise keyword implementation for DATA token
Parser improvements
- implement error handling for duplicate entity instances
* parser catches the error, logs it
* resyncs and continues (the duplicate is ignored)
- rework the exchange_file structure detection
* added parser.reset() to allow a more flexible approach to subclassing
- replace another dict comprehension
- ensure new style classes are used in Python 2.6
- change the way the tokens list is used (improves ability to subclass)