Post by Elliott MitchellPost by Justin CormackPost by Elliott MitchellPost by Justin CormackPost by Elliott MitchellIf HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Problem is this turns back into a variable length string.
But log N bounded, which isnt very much. But it doesnt buy you anything.
I already suggested a numbering structure which would work. Keeps the
HAVE messages smaller.
They will be the same size regardless, they have to be able to count all the
nodes.
Post by Elliott MitchellPost by Justin CormackPost by Elliott MitchellPost by Justin CormackPost by Elliott MitchellPost by Justin CormackThe other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?
I'm working with block size divided by hash size as the branching factor.
This results in a node fitting into a single message. At the same time
a block of hashes can be accounted for like a payload block. Should also
mollify the folks looking for a flatish tree (2TB with only 3 levels).
Problem is computing the indicies correctly. %-)
Well you can get that right (and padding the tree with empty leaves (length 0,
hash to match) might help.
However why not have a binary tree and send more children and granchildren
with a request if you want to get the message size up? It pretty much amounts
to the same thing and is simpler. With a 32k message, and 20 byte hashes,
each request will get 10 levels of a binary tree in one message (slightly
smaller, so only 1TB fits in 3 levels...)
You wouldn't bother sending the direct children, only the grandchildren.
At this point it effectively reverts to an n-ary tree. Notably converting
to bitfield indicies once again becomes annoying. This does limit things
to power of 2 branching, but the computations are still annoying.
With things turning into an n-ary tree no matter what, I once again have
to question the wisdom to sticking to "THEX". Just seems completely
inappropriate here.
Explain why you need the grandchildren not the children?
It would be really nice to have a system that didnt have numbering, as then
you can just store everything in hash tables, and the protocol wouldnt
enforce numbering. Bloom filters would do instead of bitmaps (with occasional
problem of false positives, but you could build this into the protocol, just
make a small list of responses which come back dont have when you request).
Then you can add have messages into the bloom filter without having to know
anything about them or create an entry for them. Only problem is the size
is larger than a bitmap to send.
The false positives are easy to deal with, there wont be many, just check
against them before sending out requests, if get back a donthave message
you know you had a false positive, add to list. If you get a have check if
it is in the table of false positives first and delete it if it is, else
add to Bloom filter.
To save on sending Bloom filters you could
(a) add a seed message that means I have everything. Always help the seeds.
(b) if the representation of your filter is longer than the have messages
that make it up (ie you have nothing or very little) send haves instead.
Justin
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/
<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/