Discussion:
hash-based requests
Justin Cormack
2005-02-14 23:09:09 UTC
Permalink
Have been thinking about the idea of making all requests and have messages
refer to hashes rather than block offsets.

It all works pretty well except that at the beginning when you only have the
root hash, you can be sent a have message for a block you know nothing about
because you dont yet have the tree (you dont even know which torrent it is
from yet). Has anyone thought of a solution to this yet? You could store them
but this gives a potential DoS by just making up random numbers and saying you
have those pieces.

The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.

Or is there some better way?

j




Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Olaf van der Spek
2005-02-14 23:30:09 UTC
Permalink
Post by Justin Cormack
Have been thinking about the idea of making all requests and have messages
refer to hashes rather than block offsets.
It all works pretty well except that at the beginning when you only have the
root hash, you can be sent a have message for a block you know nothing about
because you dont yet have the tree (you dont even know which torrent it is
from yet). Has anyone thought of a solution to this yet? You could store them
Doesn't that happen more often then just the beginning?
For example, if I don't have a subtree yet.
Post by Justin Cormack
but this gives a potential DoS by just making up random numbers and saying you
have those pieces.
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
Or is there some better way?
Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Justin Cormack
2005-02-14 23:28:00 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Have been thinking about the idea of making all requests and have messages
refer to hashes rather than block offsets.
It all works pretty well except that at the beginning when you only have the
root hash, you can be sent a have message for a block you know nothing about
because you dont yet have the tree (you dont even know which torrent it is
from yet). Has anyone thought of a solution to this yet? You could store them
Doesn't that happen more often then just the beginning?
For example, if I don't have a subtree yet.
Yes, just was thinking of it from the start.
Post by Olaf van der Spek
Post by Justin Cormack
but this gives a potential DoS by just making up random numbers and saying you
have those pieces.
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
Or is there some better way?
Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Elliott Mitchell
2005-02-15 21:28:02 UTC
Permalink
Post by Justin Cormack
Have been thinking about the idea of making all requests and have messages
refer to hashes rather than block offsets.
Makes a lot of sense for REQUEST and PIECE messages, turns out to make
a lot less sense for HAVE. BITFIELD/HAVE are about maintaining a bitmap,
a data structure almost completely unrelated to large blobs of data
indexed by a 20 byte string.
Post by Justin Cormack
It all works pretty well except that at the beginning when you only have the
root hash, you can be sent a have message for a block you know nothing about
because you dont yet have the tree (you dont even know which torrent it is
from yet). Has anyone thought of a solution to this yet? You could store them
but this gives a potential DoS by just making up random numbers and saying you
have those pieces.
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Justin Cormack
2005-02-15 21:33:03 UTC
Permalink
Post by Elliott Mitchell
Post by Justin Cormack
Have been thinking about the idea of making all requests and have messages
refer to hashes rather than block offsets.
Makes a lot of sense for REQUEST and PIECE messages, turns out to make
a lot less sense for HAVE. BITFIELD/HAVE are about maintaining a bitmap,
a data structure almost completely unrelated to large blobs of data
indexed by a 20 byte string.
Maybe that is the case. Certainly havent come up with any other way of dealing
with the problems.
Post by Elliott Mitchell
Post by Justin Cormack
It all works pretty well except that at the beginning when you only have the
root hash, you can be sent a have message for a block you know nothing about
because you dont yet have the tree (you dont even know which torrent it is
from yet). Has anyone thought of a solution to this yet? You could store them
but this gives a potential DoS by just making up random numbers and saying you
have those pieces.
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Post by Elliott Mitchell
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?




Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Elliott Mitchell
2005-02-16 00:41:09 UTC
Permalink
Post by Justin Cormack
Post by Elliott Mitchell
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Problem is this turns back into a variable length string.
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?
I'm working with block size divided by hash size as the branching factor.
This results in a node fitting into a single message. At the same time
a block of hashes can be accounted for like a payload block. Should also
mollify the folks looking for a flatish tree (2TB with only 3 levels).

Problem is computing the indicies correctly. %-)
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Justin Cormack
2005-02-16 01:45:05 UTC
Permalink
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Problem is this turns back into a variable length string.
But log N bounded, which isnt very much. But it doesnt buy you anything.
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?
I'm working with block size divided by hash size as the branching factor.
This results in a node fitting into a single message. At the same time
a block of hashes can be accounted for like a payload block. Should also
mollify the folks looking for a flatish tree (2TB with only 3 levels).
Problem is computing the indicies correctly. %-)
Well you can get that right (and padding the tree with empty leaves (length 0,
hash to match) might help.

However why not have a binary tree and send more children and granchildren
with a request if you want to get the message size up? It pretty much amounts
to the same thing and is simpler. With a 32k message, and 20 byte hashes,
each request will get 10 levels of a binary tree in one message (slightly
smaller, so only 1TB fits in 3 levels...)



Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Elliott Mitchell
2005-02-16 04:19:36 UTC
Permalink
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Problem is this turns back into a variable length string.
But log N bounded, which isnt very much. But it doesnt buy you anything.
I already suggested a numbering structure which would work. Keeps the
HAVE messages smaller.
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?
I'm working with block size divided by hash size as the branching factor.
This results in a node fitting into a single message. At the same time
a block of hashes can be accounted for like a payload block. Should also
mollify the folks looking for a flatish tree (2TB with only 3 levels).
Problem is computing the indicies correctly. %-)
Well you can get that right (and padding the tree with empty leaves (length 0,
hash to match) might help.
However why not have a binary tree and send more children and granchildren
with a request if you want to get the message size up? It pretty much amounts
to the same thing and is simpler. With a 32k message, and 20 byte hashes,
each request will get 10 levels of a binary tree in one message (slightly
smaller, so only 1TB fits in 3 levels...)
You wouldn't bother sending the direct children, only the grandchildren.
At this point it effectively reverts to an n-ary tree. Notably converting
to bitfield indicies once again becomes annoying. This does limit things
to power of 2 branching, but the computations are still annoying.

With things turning into an n-ary tree no matter what, I once again have
to question the wisdom to sticking to "THEX". Just seems completely
inappropriate here.
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
Justin Cormack
2005-02-16 11:58:42 UTC
Permalink
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
If HAVE messages are somehow indexed to tree location, then it works
pretty well.
You can encode the path down a binary tree (left = 0, 1 = right) as the
HAVE message payload. Thats fairly independent of how (if at all) you
actually index your nodes. You can with non binary trees too if you insist...
Problem is this turns back into a variable length string.
But log N bounded, which isnt very much. But it doesnt buy you anything.
I already suggested a numbering structure which would work. Keeps the
HAVE messages smaller.
They will be the same size regardless, they have to be able to count all the
nodes.
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
Post by Elliott Mitchell
Post by Justin Cormack
The other alternative is a two-phase protocol where you first obtain the Merkle
tree before you can do anything else.
I'm working on a sample implementation which computes the hashes, and
uses a midsize branching factor. Turns out working with n-ary trees isn't
always easy. %-)
Its almost never worth it. Possbly never... Whats your reason for not using
binary?
I'm working with block size divided by hash size as the branching factor.
This results in a node fitting into a single message. At the same time
a block of hashes can be accounted for like a payload block. Should also
mollify the folks looking for a flatish tree (2TB with only 3 levels).
Problem is computing the indicies correctly. %-)
Well you can get that right (and padding the tree with empty leaves (length 0,
hash to match) might help.
However why not have a binary tree and send more children and granchildren
with a request if you want to get the message size up? It pretty much amounts
to the same thing and is simpler. With a 32k message, and 20 byte hashes,
each request will get 10 levels of a binary tree in one message (slightly
smaller, so only 1TB fits in 3 levels...)
You wouldn't bother sending the direct children, only the grandchildren.
At this point it effectively reverts to an n-ary tree. Notably converting
to bitfield indicies once again becomes annoying. This does limit things
to power of 2 branching, but the computations are still annoying.
With things turning into an n-ary tree no matter what, I once again have
to question the wisdom to sticking to "THEX". Just seems completely
inappropriate here.
Explain why you need the grandchildren not the children?

It would be really nice to have a system that didnt have numbering, as then
you can just store everything in hash tables, and the protocol wouldnt
enforce numbering. Bloom filters would do instead of bitmaps (with occasional
problem of false positives, but you could build this into the protocol, just
make a small list of responses which come back dont have when you request).
Then you can add have messages into the bloom filter without having to know
anything about them or create an entry for them. Only problem is the size
is larger than a bitmap to send.

The false positives are easy to deal with, there wont be many, just check
against them before sending out requests, if get back a donthave message
you know you had a false positive, add to list. If you get a have check if
it is in the table of false positives first and delete it if it is, else
add to Bloom filter.

To save on sending Bloom filters you could
(a) add a seed message that means I have everything. Always help the seeds.
(b) if the representation of your filter is longer than the have messages
that make it up (ie you have nothing or very little) send haves instead.

Justin



Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/BitTorrent/

<*> To unsubscribe from this group, send an email to:
BitTorrent-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/

Loading...