[redland-dev] Win32 issue: Small BUFSIZ

Daniel Richard G. oss at teragram.com
Wed Sep 5 16:25:02 EDT 2012


Hello list,

A colleague of mine encountered an issue in the Raptor source that affects 
automatic parser guessing on Windows, leading to behavior inconsistent 
with Linux builds.

In raptor_internal.h, RAPTOR_READ_BUFFER_SIZE is defined as BUFSIZ if the 
latter is defined. On my Linux system, this value is 8192.

In raptor_parse.c, in raptor_world_guess_parser_name(), FIRSTN is the 
number of bytes at the beginning of a document that the code should 
analyze for syntax recognition. This is defined as 1024, a value small 
enough to avoid misidentifying documents that merely contain HTML/XML 
examples (per the preceding comment).

The problem is that on Windows, BUFSIZ is defined to 512, and thus so is 
RAPTOR_READ_BUFFER_SIZE. Raptor does not buffer more than this many bytes 
at a time (see struct raptor_parser_s.buffer), and so when syntax 
recognition is enabled, Raptor is only looking at the first 512 bytes of 
the document on Windows, compared to 1024 on Linux. This can lead to 
differing results, as my colleague found.
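
To illustrate the effect, here is a minimal sketch of how the read buffer 
size caps the number of bytes available for guessing. This is not Raptor's 
actual code: the helper name bytes_examined() is hypothetical, and only the 
value of FIRSTN is taken from raptor_parse.c.

```c
#include <stddef.h>

#define FIRSTN 1024  /* bytes the guesser wants to inspect */

/* Hypothetical helper, not part of Raptor: the guesser can only look at
 * what fits in one read buffer, so the effective window is the smaller
 * of FIRSTN and the read buffer size. */
size_t bytes_examined(size_t read_buffer_size)
{
  return read_buffer_size < FIRSTN ? read_buffer_size : FIRSTN;
}
```

With a buffer of 8192 bytes (BUFSIZ on my Linux system) this yields 1024; 
with 512 bytes (BUFSIZ on Windows) it yields only 512.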

The attached patch provides (1) a compile-time check in raptor_parse.c to 
ensure that RAPTOR_READ_BUFFER_SIZE is at least as large as FIRSTN, and 
(2) a change to raptor_internal.h to use BUFSIZ only if it is greater than 
4096 (this being the default value used if BUFSIZ is undefined).
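
For readers who cannot open the attachment, the two changes could look 
roughly like the sketch below. This is an illustration of the approach, not 
the patch itself, and it assumes BUFSIZ expands to a plain integer that the 
preprocessor can evaluate in an #if.

```c
#include <stdio.h>   /* provides BUFSIZ where available */

/* (2) Use BUFSIZ only when it exceeds the 4096-byte fallback.
 * Assumption: BUFSIZ is usable in a preprocessor expression. */
#if defined(BUFSIZ) && BUFSIZ > 4096
#define RAPTOR_READ_BUFFER_SIZE BUFSIZ
#else
#define RAPTOR_READ_BUFFER_SIZE 4096
#endif

#define FIRSTN 1024  /* as in raptor_world_guess_parser_name() */

/* (1) Compile-time check: refuse to build if the read buffer cannot
 * hold the FIRSTN bytes that syntax recognition wants to examine. */
#if RAPTOR_READ_BUFFER_SIZE < FIRSTN
#error "RAPTOR_READ_BUFFER_SIZE must be at least FIRSTN"
#endif
```

With this arrangement a platform that defines BUFSIZ as 512 would fall back 
to 4096, so the guesser sees the full 1024 bytes on every platform.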

Questions and comments are welcome.


--Daniel


-- 
Daniel Richard G. || danielg at teragram.com || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raptor-bufsiz-fix.patch
Type: text/x-diff
Size: 946 bytes
Desc: Patch against git master
URL: <http://lists.librdf.org/pipermail/redland-dev/attachments/20120905/7f7ec354/attachment.patch>

