Discussion:
Back to Merkle Hash Trees...
sh4dowmatter
2005-02-04 00:38:59 UTC
Not to beat a dead horse, but...

So I was re-reading the description of Merkle Hash Trees over at the
THEX specification (see
http://www.open-content.net/specs/draft-jchapweske-thex-02.html), and
I was thinking of implementing an efficient, array-based version in C.

The current spec defines a five-leaf tree as:

                     ROOT=IH(H+E)
                     /           \
            H=IH(F+G)             E
           /         \             \
   F=IH(A+B)         G=IH(C+D)         E
    /     \           /     \          \
A=LH(S1) B=LH(S2) C=LH(S3) D=LH(S4) E=LH(S5)

That is, starting with the leaf hashes as the bottom level, I go
through and hash in pairs, appending each hash to a new list; if at
the end of the level I find an odd hash remaining, I simply append it
to the list. I then use the list as my next level up, and repeat the
process until I generate a list with only one hash -- the root.
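Roughly, in C (a sketch, assuming OpenSSL's SHA-1; the helper names are
mine, and the THEX 0x00/0x01 marker bytes are left out for brevity):

#include <string.h>
#include <openssl/sha.h>

#define HASH_LEN SHA_DIGEST_LENGTH  /* 20 bytes for SHA-1 */

/* parent = hash(left || right) */
static void internal_hash(const unsigned char *l, const unsigned char *r,
                          unsigned char *out)
{
    SHA_CTX ctx;
    SHA1_Init(&ctx);
    SHA1_Update(&ctx, l, HASH_LEN);
    SHA1_Update(&ctx, r, HASH_LEN);
    SHA1_Final(out, &ctx);
}

/* Collapse one level of `count` hashes into the level above it; a
   leftover odd hash is promoted as-is. Repeat until it returns 1. */
static size_t next_level(const unsigned char (*cur)[HASH_LEN], size_t count,
                         unsigned char (*up)[HASH_LEN])
{
    size_t out = 0, i;
    for (i = 0; i + 1 < count; i += 2)
        internal_hash(cur[i], cur[i + 1], up[out++]);
    if (i < count)
        memcpy(up[out++], cur[i], HASH_LEN);
    return out;
}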

But say I define it as a full binary tree instead; that is,

                ROOT=IH(F+G)
                 /         \
        F=IH(H+C)           G=IH(D+E)
         /     \             /     \
   H=IH(A+B)  C=LH(S3)  D=LH(S4)  E=LH(S5)
     /    \
A=LH(S1) B=LH(S2)

There are still only five leaf hashes and four internal hashes
(including the root), but I get the power of being able to use
array-based indexing in the tree (indexing with ROOT at 0, F at 1, G
at 2, etc.). That is, I get lchild[i] = 2 * i + 1 and rchild[i] = 2 * i + 2.
Also, the size of the underlying array has a simple closed form -- the
number of 'hash elements' to allocate is simply 2 * num_leaves - 1.
Finally, if I'm doing an array-based implementation, then I know that
the deepest internal node (the first one to compute) is at element
array_size - num_leaves - 1; assigning i this index and counting down
to 0, I can compute each internal hash as simply
hash(lchild[i] + rchild[i]). I'm not afforded this convenience so
easily by the format shown in THEX.
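As a sketch (reusing internal_hash() and HASH_LEN from the sketch
above), the whole build then collapses to one loop:

/* tree[] holds 2 * num_leaves - 1 digests, with the leaf hashes
   already filled into the last num_leaves slots (in the order the
   heap shape dictates). Walks from the last internal node down to
   the root at tree[0]. */
void build_tree(unsigned char (*tree)[HASH_LEN], size_t num_leaves)
{
    size_t array_size = 2 * num_leaves - 1;
    for (size_t i = array_size - num_leaves; i-- > 0; )
        internal_hash(tree[2 * i + 1],   /* lchild */
                      tree[2 * i + 2],   /* rchild */
                      tree[i]);
}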

This must all seem utterly trivial, but I was just wondering whether
the powers that be, whether it was originally Merkle or someone else,
ever said that "all Merkle trees shall be formed breadth-first, and
not as a full binary tree." If I pursue the idea above, can I still
call it a Merkle hash tree?

Thanks,
shadowmatter






Konstantin 'Kosta' Welke
2005-02-04 21:52:09 UTC
Post by sh4dowmatter
This must all seem utterly trivial, but I was just wondering whether
the powers that be, whether it was originally Merkle or someone else,
ever said that "all Merkle trees shall be formed breadth-first, and
not as a full binary tree." If I pursue the idea above, can I still
call it a Merkle hash tree?
Either way, it is a Merkle hash tree. The question is whether it
complies with the THEX format. I'd still say so, because in section 3.3
they say "Normal breadth-first serialization is the recommended manner
in which to serialize the hash tree." Note the *recommended*.

The structure of the hash tree depends on the order in which you
combine the leaves. There are a lot of variations possible. If your
program wants to build hash trees your way, that's no problem. But
remember that it should be able to read *any* hash tree correctly.

HTH,
Kosta
who still doesn't understand where the big advantage over a list
of hashed blocks is, anyways



Olaf van der Spek
2005-02-04 23:47:53 UTC
Post by Konstantin 'Kosta' Welke
Post by sh4dowmatter
This must all seem utterly trivial, but I was just wondering whether
the powers that be, whether it was originally Merkle or someone else,
ever said that "all Merkle trees shall be formed breadth-first, and
not as a full binary tree." If I pursue the idea above, can I still
call it a Merkle hash tree?
Either way, it is a Merkle hash tree. The question is whether it
complies with the THEX format. I'd still say so, because in section 3.3
they say "Normal breadth-first serialization is the recommended manner
in which to serialize the hash tree." Note the *recommended*.
The structure of the hash tree depends on the order in which you
combine the leaves. There are a lot of variations possible. If your
program wants to build hash trees your way, that's no problem. But
remember that it should be able to read *any* hash tree correctly.
Isn't there just a single way to combine two hashes? First the left, then
the right?
Post by Konstantin 'Kosta' Welke
HTH,
Kosta
who still doesn't understand where the big advantage over a list
of hashed blocks is, anyways
Tiny .torrent size, smaller verification granularity, compatibility with
other merkle tree hashes.




Joseph Ashwood
2005-02-05 03:11:01 UTC
----- Original Message -----
From: "Olaf van der Spek" <***@LIACS.NL>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
who still doesn't understand where the big advantage over a list
of hashed blocks is, anyways
Tiny .torrent size, smaller verification granularity, compatibility with
other merkle tree hashes.
And the associated disadvantages of not having those hashes immediately
available: harder to determine the bad parts of the file, higher overhead
in computing hashes, longer download times (extra hashes need to be
downloaded).

If you're building a filesystem then Merkle trees are useful in forming
the signed library, but otherwise rather useless.

Now for those that don't believe the statements above, one at a time:

Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).

Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.

Longer download: Currently the overhead is known. In the case of detecting
bad segments, Merkle overhead will necessarily include downloading the
entire Merkle tree, in addition to the current overhead. Where detecting
bad segments is unnecessary the overhead remains the same.

There are additional problems in balancing the signed library tree in order
to achieve maximum speed and in not propagating the bad segments.

For maximum processing speed the tree needs to be properly balanced; doing
this requires properly balancing across the hash. In the case of SHA-1 this
requires tuning in such a way that numHashes*512/160 is as close as possible
to an integer. While this is not immediately apparent as computationally
intensive, you have to understand that it is actually an O(nlogn) problem,
and while not overly costly it is not cheap.

Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you are
raising the computational overhead to perform verification, and in some
cases may be delaying verification until the complete file is downloaded
(to compute the non-downloaded complete tree), creating a situation where
there is a high likelihood of large-scale propagation of any bad data.

As I said earlier, only in a filesystem architecture are Merkle trees of
substantial use. In the file system each file is hashed, then categories,
etc., forming what is called a signed library. This allows the library to
be updated in O(logn) time instead of O(n), at the cost that the initial
creation is O(nlogn) instead of O(n). During download it is important to
realize that flat hashes require O(downloads*n) time and Merkle trees will
require O(downloads*nlogn) time; as the downloads become the dominant factor
the advantage of flat hashes becomes ever greater. Only in the transfer of
live data, or data that is often updated (which would require updating the
torrent file), does the Merkle tree pose any advantage.

For those that are truly security conscious there are a great many other
things that must be addressed to maintain security that I have not even
touched on, but suffice it to say that the original Merkle tree should
not be considered secure.
Joe


Trust Laboratories
Changing Software Development
http://www.trustlaboratories.com




Elliott Mitchell
2005-02-05 06:02:47 UTC
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
A legitimate concern. Two counterpoints, though. The smallest size chunk
that anyone has suggested making verifiable is 4K; at the high end you
might only make 256K chunks verifiable. There is also a question of how
branchy the tree should be. Given that the smallest requestable chunk is
likely to be 16K or 32K, it seems to me that the second layer might
consist of 512 hashes, as that many can be packed into a 16K chunk.

Consider a 4GB file with the smallest suggested chunk size of 4K and
the bottom-end branching factor of 2. The complete tree will consist of
2^21 hashes. If SHA256 is used, we'll have 64MB of hashes: an overhead
of less than 2%.
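As a back-of-the-envelope sketch (my own helper, nothing from any spec;
node counts are approximate):

/* Bytes of tree hashes relative to payload, for a given block size,
   branching factor, and digest size. */
double tree_overhead(double payload, double block,
                     double branch, double digest)
{
    double nodes = 0;
    for (double level = payload / block; level > 1; level /= branch)
        nodes += level;                      /* hashes at this level */
    return (nodes + 1) * digest / payload;   /* +1 for the root */
}

/* tree_overhead(4e9, 4096, 2, 32) comes out around 1.6%, matching the
   2^21 SHA256 hashes (64MB) figured above. */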
Post by Joseph Ashwood
Longer download: Currently the overhead is known. In the case of detecting
bad segments, Merkle overhead will necessarily include downloading the
entire Merkle tree, in addition to the current overhead. Where detecting
bad segments is unnecessary the overhead remains the same.
The V1 protocol already requires you to download hashes for the entire
file from a resource-constrained server (the torrent file). The V2
protocol will almost certainly require you to download hashes at a
minimum for every piece to prevent the spread of corrupt pieces. There is
then the problem of verifying the blocks of hashes, which is where the
extra overhead comes in.

As noted above I don't see it being worth transferring blocks of less than
512 hashes, as this number neatly fits in the smallest verifiable block.
At this point the tree's internal nodes are going to be less than 1% of the
size of the leaf nodes. Given that V1 already requires you to transfer
leaf node hashes, this additional overhead is minimal.
Post by Joseph Ashwood
For maximum processing speed the tree needs to be properly balanced; doing
this requires properly balancing across the hash. In the case of SHA-1 this
requires tuning in such a way that numHashes*512/160 is as close as possible
to an integer. While this is not immediately apparent as computationally
intensive, you have to understand that it is actually an O(nlogn) problem,
and while not overly costly it is not cheap.
The smallest verifiable unit will likely be fixed (Bram's writings
suggest this, but we haven't seen code yet). At this point the above
becomes futile. You can trade internal nodes in one location for internal
nodes in another location, but you're merely spreading the inefficiency
across the tree, not actually gaining anything (and spending the above
cost). Leaving it an unbalanced tree is simplest, and in fact
advantageous if you wish to be able to expand torrents.

More detail?
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you are
raising the computational overhead to perform verification, and in some
cases may be delaying verification until the complete file is downloaded
(to compute the non-downloaded complete tree), creating a situation where
there is a high likelihood of large-scale propagation of any bad data.
Wrong. V1 places hashes in the torrent file, so things get verified at
the piece level (256K by default). Due to this very concern I doubt Bram
would be so stupid as to not require the transfer and verification of
this granularity of hashes. By placing the transfer of hashes onto the
peers, you're removing a lot of strain from the centralized server
holding the torrent file.
Post by Joseph Ashwood
For those that are truly security conscious there are a great many other
things that must be addressed to maintain security that I have not even
touched on, but suffice it to say that the original Merkle tree should
not be considered secure.
Yes. Hopefully I've shot down these concerns?

One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)

http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Olaf van der Spek
2005-02-05 11:00:00 UTC
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
It's not. You do not have to put the byte physically in front of the
block. You can 'hash' the byte and then the block.

And it's required in general because otherwise you couldn't tell the
difference between hash data and user data (if you only know the root
hash and not the file size).
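Something like this, as a sketch against OpenSSL's incremental API
(the function name and marker values are just for illustration):

#include <openssl/sha.h>

void node_hash(unsigned char marker,  /* e.g. 0x00 leaf, 0x01 internal */
               const unsigned char *block, size_t len,
               unsigned char digest[SHA_DIGEST_LENGTH])
{
    SHA_CTX ctx;
    SHA1_Init(&ctx);
    SHA1_Update(&ctx, &marker, 1);  /* 'hash' the byte... */
    SHA1_Update(&ctx, block, len);  /* ...and then the block */
    SHA1_Final(digest, &ctx);
}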
Post by Elliott Mitchell
http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Justin Cormack
2005-02-05 12:28:45 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
It's not. You do not have to put the byte physically in front of the
block. You can 'hash' the byte and then the block.
No you can't: SHA-1 is block-based and you can't hash less than a block of
64 bytes at a time. So there is an alignment issue.
Post by Olaf van der Spek
And it's required in general because otherwise you couldn't tell the
difference between hash data and user data (if you only know the root
hash and not the file size).
There are more significant reasons than this; I believe you can spoof
blocks if this isn't done.
Post by Olaf van der Spek
Post by Elliott Mitchell
http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Olaf van der Spek
2005-02-05 12:43:21 UTC
Post by Justin Cormack
Post by Olaf van der Spek
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
It's not. You do not have to put the byte physically in front of the
block. You can 'hash' the byte and then the block.
No you can't: SHA-1 is block-based and you can't hash less than a block of
64 bytes at a time. So there is an alignment issue.
Hmm, you're right. The SHA1 implementation I use is copying the data
anyway, so in this case it wouldn't matter.



Joseph Ashwood
2005-02-06 21:53:54 UTC
----- Original Message -----
From: "Justin Cormack" <***@street-vision.com>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Justin Cormack
SHA-1 is block-based and you can't hash less than a block of
64 bytes at a time. So there is an alignment issue.
Actually there isn't. SHA-1 is bit-based, and aligns those bits into 512-bit
blocks. But in order to prevent certain attacks it also includes a required
termination that occupies at minimum 65 bits, at maximum 576 bits. If you
maintain the 512-bit alignment SHA-1 will actually be at its slowest; if
you align the last block at 447 bits SHA-1 will be at its fastest.
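A quick sketch of the arithmetic (65 bits being the minimum
termination):

/* Number of 512-bit compression calls for a message of `bits` bits:
   the data plus at least 65 bits of termination, rounded up to whole
   blocks. 447 bits of data still fit in one block; 448 force two. */
unsigned long long sha1_blocks(unsigned long long bits)
{
    return (bits + 65 + 511) / 512;
}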
Joe




Elliott Mitchell
2005-02-05 23:44:45 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
It's not. You do not have to put the byte physically in front of the
block. You can 'hash' the byte and then the block.
And it's required in general because otherwise you couldn't tell the
difference between hash data and user data (if you only know the root
hash and not the file size).
You completely missed my point.

I agree *some* form of verification is required. You've pointed to one
alternative, knowing the file size. If the file size is known I can
easily build the tree and verify. I'm doubtful that a torrent file
wouldn't include the size, so this verifier should be acceptable. Adding
extra data to the payload will also work though.

_Prefixing_ the marking byte (or word) is *totally* bogus.

Justin already mentioned that SHA1 is a block-based hash. This means
you've hurt performance by causing the payload to be misaligned (perhaps
not a lot, but some).


Well, you still haven't gotten my primary point despite my saying it
repeatedly, so it is time to be very explicit. For these purposes, let us
pretend we're poor C.S. students back at some University. This week a
professor issues an assignment where we are going to have to compute the
SHA1 hash of a block of data. Being reasonably intelligent students, we
notice the presence of OpenSSL's libcrypto on the system so we choose to
use that. We look at the man page and see the function SHA1(). Turns out
to be nice and easy to use: simply SHA1(somestr, strlen(somestr), NULL);
this week's homework is done, let's party.

Next week comes, and time to have more fun with SHA1. This time we're
going to need to get the SHA1 hash of an arbitrary blob of data that
we're going to read from standard input. The professor has stated this
will be tested on a blob of data of at least 100GB in size, and we're
assured that no machines in the department have that much memory (even
when accounting for swap and on ia32 we couldn't use it anyway). Well,
obviously SHA1() won't work since we can't fit the data into memory. So
we go searching the man pages again...

Oh, on the very same man page with SHA1() there turn out to be three
other functions. SHA1_Init(), SHA1_Update(), and SHA1_Final(). Looks like
the OpenSSL folks and designers of SHA1 thought ahead. They designed
things in such a way that you don't have to have everything in memory, in
fact they don't even require you to know the data size ahead of time.
SHA1_Init() initializes a SHA_CTX structure for use with the other two.
The SHA_CTX structure is opaque, but known to be of static size, notably
it has no pointers. We'd need to use memcpy() to move it around in
memory, but otherwise it is pretty basic. SHA1_Update() simply hashes
the blobs of data you give it, mixing them into the SHA_CTX structure.
SHA1_Final() simply extracts the SHA1 value from the structure. Well,
pretty easy to go from there, this week's homework is done, more party
time.
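The whole second assignment is a dozen lines (a sketch; OpenSSL exactly
as on that man page):

#include <stdio.h>
#include <openssl/sha.h>

int main(void)
{
    unsigned char buf[1 << 16], digest[SHA_DIGEST_LENGTH];
    SHA_CTX ctx;
    size_t n;

    SHA1_Init(&ctx);                 /* initialize the context */
    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
        SHA1_Update(&ctx, buf, n);   /* mix in each blob as it's read */
    SHA1_Final(digest, &ctx);        /* extract the SHA1 value */

    for (size_t i = 0; i < sizeof digest; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}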


There are two points here. First, all hashes can be done incrementally;
the data doesn't need to fit into memory, and going along with this is a
somewhat more advanced API that you may not have been aware of. The
crucial point here is that SHA_CTX structure. We're not supposed to peek
inside it, but we can do pretty well anything else we want with it. Of
note, we can freely move it around if we wish to do so. The key issue
here is that we can freely copy SHA_CTX structures; there is no state
kept outside of the structure nor any pointers to other structures. All
interfaces to SHA1 computations (notably Python's) are similar: there is
a blob of context and you can freely move and copy it.

Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
all). Since copy avoidance is a good thing, you create an SHA_CTX
structure, use SHA1_Update() to feed in the single marker byte for an
internal node, and then call SHA1_Update again to feed in the 16K of
payload. Finally call SHA1_Final() and we compare with the hash. No dice,
this isn't a valid internal node. So now we try almost the exact same set
of steps again, only using the leaf marker byte. For my point it doesn't
matter whether it turns out to be valid or not; more than likely it is,
though. See why I consider THEX bogus yet?

I want THEX dead so I'm going to presume I haven't made it obvious enough
yet. My main issue is that it is a byte _prefix_. In going with this, let
us again imagine we've got that 16K blob and the hash. Only this time the
marking byte will be appended to the blob, rather than prefixed. Now the
steps will be slightly different. Once again we create an SHA_CTX
structure and use SHA1_Update() to feed in the 16K blob of data. We then
use SHA1_Update() to append the marker byte and SHA1_Final() to get the
hash to compare. The key here is that once you've used SHA1_Update() with
the 16K blob, you can then copy the SHA_CTX structure and use
SHA1_Update()/SHA1_Final() on the original to test for it being payload
and again on the copy to test for it being internal nodes versus bogus.
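In code the suffix scheme looks like this (a sketch with made-up marker
values; the point is the single expensive SHA1_Update() over the blob):

#include <string.h>
#include <openssl/sha.h>

/* Test a blob against an expected hash as a leaf (0), internal node
   (1), or bogus (-1), running the hash over the payload only once. */
int classify_block(const unsigned char *blob, size_t len,
                   const unsigned char *expect)
{
    static const unsigned char markers[2] = { 0x00, 0x01 }; /* leaf, internal */
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA_CTX base, ctx;

    SHA1_Init(&base);
    SHA1_Update(&base, blob, len);   /* the expensive part, done once */

    for (int t = 0; t < 2; t++) {
        ctx = base;                  /* SHA_CTX is flat: freely copyable */
        SHA1_Update(&ctx, &markers[t], 1);
        SHA1_Final(digest, &ctx);
        if (memcmp(digest, expect, sizeof digest) == 0)
            return t;
    }
    return -1;
}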


Justin is correct. The secure hashes are relatively expensive. The
problem is that THEX *forces* you to run the hash over the *entirety* of
the payload twice to identify a bogus block. This is 32K + 2B of data
going through the hash function for every block. By moving the marker to
the end, we can save the hash context after running the payload through
and only rehash the marker. This is 16K + 2B of data going through the
hash function for every block.

The THEX folks have *doubled* their overhead by using an utterly bogus
design. I'm not sure I can call incompetence on the THEX designer for
this mistake, but I can STRONGLY RECOMMEND AVOIDING IT!
Post by Olaf van der Spek
Post by Elliott Mitchell
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Isn't that limited by disk transfer rate instead of CPU time?
sha1 (let alone sha256 - don't have an implementation lying around to test)
is pretty CPU intensive.
OK, my desktop can manage 130MB/s, though that's only the speed of a couple
of disks. A VIA C3 I have lying around can only manage 10MB/s, much slower
than its disk.
Merkle SHA256 is a significant amount more CPU time; it's a question of
whether it is worth it.
Already mentioned in my first message on this subject. Assuming you
provide for 4K block verification and a branching factor of 2, the amount
of data that gets hashed for the tree is less than 1% of the size of the
payload. This is measurable, but insignificant. If 16K block
verification is used and a branching factor of 512 (again, the minimum
block size is likely to be 16K, and 512 SHA256 hashes fit into 16K), the
tree overhead is much less than 0.1% for a 4GB payload.

The main factor is that block size. No reasonable block size will make
the tree verification expensive; the payload hashing is incomparably
more expensive.
Olaf van der Spek
2005-02-06 11:24:33 UTC
Post by Elliott Mitchell
Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
How can you be unsure about that?
It should be clear from the protocol.
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.



Konstantin 'Kosta' Welke
2005-02-06 13:17:03 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
How can you be unsure about that?
It should be clear from the protocol.
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
Okay, why use this 2-byte prefix then?

Kosta



Olaf van der Spek
2005-02-06 20:33:36 UTC
Post by Konstantin 'Kosta' Welke
Post by Olaf van der Spek
Post by Elliott Mitchell
Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
How can you be unsure about that?
It should be clear from the protocol.
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
Okay, why use this 2-byte prefix then?
Because the 'context' can't be verified with the root hash, while the
one-byte prefix can.



Elliott Mitchell
2005-02-06 21:27:22 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
How can you be unsure about that?
Easily. Notably different protocol design than you're thinking. Crucially
it never makes sense to transfer one hash at a time. If you transfer
one hash at a time you've generated thousands of extra protocol messages.
Given this you're likely to transfer blocks of hashes, at which point
blocks of hashes will likely be of similar size to payload blocks.
Post by Olaf van der Spek
It should be clear from the protocol.
Which is another reason I dislike THEX. The type is a protocol issue, not
a standards issue. You might settle on markers similar to THEX's, at which
point you've got to try both markers to distinguish the types.
Post by Olaf van der Spek
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
And a payload block can't be of this size? (notably the EOF)

You're still assuming transfer of one hash at a time, which is worthless
as you end up bloating the protocol overhead horrendously.
Olaf van der Spek
2005-02-06 22:08:24 UTC
Post by Elliott Mitchell
Post by Olaf van der Spek
Post by Elliott Mitchell
Imagine you're handed a 16K block of data and a hash value. You're unsure
whether this is internal nodes or a leaf (or even whether it is valid at
How can you be unsure about that?
Easily. Notably different protocol design than you're thinking. Crucially
it never makes sense to transfer one hash at a time. If you transfer
Did I say I'd transfer only a single hash at a time?
Post by Elliott Mitchell
one hash at a time you've generated thousands of extra protocol messages.
Given this you're likely to transfer blocks of hashes, at which point
blocks of hashes will likely be of similar size to payload blocks.
But with a different block/field/message type.
Post by Elliott Mitchell
Post by Olaf van der Spek
It should be clear from the protocol.
Which is another reason I dislike THEX. The type is a protocol issue, not
a standards issue. You might settle on markers similar THEX, at which
point you've got to try both markers to distinguish the types.
Why can't you add both the markers and something else?
Post by Elliott Mitchell
Post by Olaf van der Spek
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
And a payload block can't be of this size? (notably the EOF)
You're still assuming transfer of one hash at a time, which is worthless
Am I?
Post by Elliott Mitchell
as you end up bloating the protocol overhead horrendously.
Elliott Mitchell
2005-02-07 00:01:23 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
one hash at a time you've generated thousands of extra protocol messages.
Given this you're likely to transfer blocks of hashes, at which point
blocks of hashes will likely be of similar size to payload blocks.
But with a different block/field/message type.
My ideas have been otherwise: simply handling blocks of hashes as any
other block of data, pointing to payload versus internal nodes implicitly
via their indices, but otherwise identical to any other data. My
thoughts have also not been constrained by thinking of "THEX", so I may
have been exploring cases that didn't occur in your scenarios.

Having said that I think my ideas have solidified sufficiently that I may
write them up in both code and spec form soon. At which point it will be
time to see whose ideas get shot down.
Post by Olaf van der Spek
Post by Elliott Mitchell
Post by Olaf van der Spek
It should be clear from the protocol.
Which is another reason I dislike THEX. The type is a protocol issue, not
a standards issue. You might settle on markers similar THEX, at which
point you've got to try both markers to distinguish the types.
Why can't you add both the markers and something else?
Inefficiency. If one fails, is there any reason to believe the other will
survive? In cryptography adding a constant payload is a big no-no, as it
aids cryptanalysis. As lower levels (TCP) will take care of errors, why
add redundancy at the application layer?
Post by Olaf van der Spek
Post by Elliott Mitchell
Post by Olaf van der Spek
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
And a payload block can't be of this size? (notably the EOF)
You're still assuming transfer of one hash at a time, which is worthless
Am I?
Time for those write-ups.
Olaf van der Spek
2005-02-07 00:31:54 UTC
Post by Elliott Mitchell
Post by Olaf van der Spek
Post by Elliott Mitchell
one hash at a time you've generated thousands of extra protocol messages.
Given this you're likely to transfer blocks of hashes, at which point
blocks of hashes will likely be of similar size to payload blocks.
But with a different block/field/message type.
My ideas have been otherwise: simply handling blocks of hashes as any
other block of data, pointing to payload versus internal nodes implicitly
via their indices, but otherwise identical to any other data. My
thoughts have also not been constrained by thinking of "THEX", so I may
have been exploring cases that didn't occur in your scenarios.
Maybe.
Post by Elliott Mitchell
Having said that I think my ideas have solidified sufficiently that I may
write them up in both code and spec form soon. At which point it will be
time to see whose ideas get shot down.
Post by Olaf van der Spek
Post by Elliott Mitchell
Post by Olaf van der Spek
It should be clear from the protocol.
Which is another reason I dislike THEX. The type is a protocol issue, not
a standards issue. You might settle on markers similar THEX, at which
point you've got to try both markers to distinguish the types.
Why can't you add both the markers and something else?
Inefficiency. If one fails, is there any reason to believe the other will
survive? In cryptography adding a constant payload is a big no-no, as it
Doesn't that apply only to encryption and not to secure hashing?
Post by Elliott Mitchell
aids cryptanalysis. As lower levels (TCP) will take care of errors, why
add redundancy at the application layer?
Maybe because the TCP checksum isn't (very) strong.
Post by Elliott Mitchell
Post by Olaf van der Spek
Post by Elliott Mitchell
Post by Olaf van der Spek
If not, an internal node is 21 or 41 bytes (IIRC), so your 16 kbyte
block would not be an internal node.
And a payload block can't be of this size? (notably the EOF)
You're still assuming transfer of one hash at a time, which is worthless
Am I?
Time for those write-ups.
I doubt those are gonna say anything about my assumptions.



Elliott Mitchell
2005-02-07 02:26:42 UTC
Post by Olaf van der Spek
Post by Elliott Mitchell
aids cryptanalysis. As lower levels (TCP) will take care of errors, why
add redundancy at the application layer?
Maybe because the TCP checksum isn't (very) strong.
It is more than strong enough to fulfill its intended purpose. Mainly
guarding against errors due to line noise and other accidental errors
during packet handling.

The TCP checksum is sufficiently strong that if you get a packet where
the checksum verifies and yet has data different in any way from what the
other end sent, you can say beyond a reasonable doubt that it was caused
by deliberate action.

Note protocols like FTP which send large binary files across the Internet
without any additional checksums at all. Yet FTP works with very high
reliability. Sites publish MD5 checksums in an attempt to thwart
modification by attackers, not because there is even the slightest
concern of FTP/TCP failing.
Brian Dessent
2005-02-07 05:18:50 UTC
Post by Elliott Mitchell
It is more than strong enough to fulfill its intended purpose. Mainly
guarding against errors due to line noise and other accidental errors
during packet handling.
Yes, in a perfect world. But there are unfortunately lots and lots of
broken routers and TCP stacks out there.
http://citeseer.ist.psu.edu/stone00when.html

Brian



Elliott Mitchell
2005-02-07 06:06:30 UTC
Post by Brian Dessent
Post by Elliott Mitchell
It is more than strong enough to fulfill its intended purpose. Mainly
guarding against errors due to line noise and other accidental errors
during packet handling.
Yes, in a perfect world. But there are unfortunately lots and lots of
broken routers and TCP stacks out there.
http://citeseer.ist.psu.edu/stone00when.html
True, but that is irrelevant as long as the end points verify them. There
isn't any point in worrying about this condition, because if this fails
we're already dead.
Olaf van der Spek
2005-02-07 09:34:07 UTC
Post by Elliott Mitchell
Post by Brian Dessent
Post by Elliott Mitchell
It is more than strong enough to fulfill its intended purpose. Mainly
guarding against errors due to line noise and other accidental errors
during packet handling.
Yes, in a perfect world. But there are unfortunately lots and lots of
broken routers and TCP stacks out there.
http://citeseer.ist.psu.edu/stone00when.html
True, but that is irrelevant as long as the end points verify them. There
Them? What's them?
IIRC that article is about packets with a valid Ethernet CRC and a valid
TCP checksum, but data that's not the same as what the original sender sent.
Post by Elliott Mitchell
isn't any point in worrying about this condition, because if this fails
we're already dead.
Justin Cormack
2005-02-07 10:02:25 UTC
Post by Elliott Mitchell
Post by Olaf van der Spek
Post by Elliott Mitchell
aids cryptanalysis. As lower levels (TCP) will take care of errors, why
add redundancy at the application layer?
Maybe because the TCP checksum isn't (very) strong.
It is more than strong enough to fulfill its intended purpose. Mainly
guarding against errors due to line noise and other accidental errors
during packet handling.
The TCP checksum is sufficiently strong that if you get a packet where
the checksum verifies and yet has data different in any way from what the
other end sent, you can say beyond a reasonable doubt that it was caused
by deliberate action.
Note protocols like FTP which send large binary files across the Internet
without any additional checksums at all. Yet FTP works with very high
reliability. Sites publish MD5 checksums in an attempt to thwart
modification by attackers, not because there is even the slightest
concern of FTP/TCP failing.
This is not universally agreed on. See for example the adding of checksums
to iSCSI, accompanied by arguments about the error rate of the TCP checksum
for large transfers (RFC 3385).

Justin



Joseph Ashwood
2005-02-05 10:20:55 UTC
----- Original Message -----
From: "Elliott Mitchell" <***@m5p.com>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Elliott Mitchell
Yes. Hopefully I've shot down these concerns?
Not for me, because I think Merkle trees are a bad idea for most
purposes, but perhaps not in general.
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Actually I think that if Merkle trees are going to happen THEX is a tiny
baby step in the correct direction, although it lacks the serializability
that is necessary to maintain security. For serializability it is necessary
to know a number of things about each node, where nodeID = (level,
horizontal location on level):

Segment #
Size
Hash
Parent by nodeID
nodeID
number of direct children

All of these need to be hashed by the parent so instead of having a Merkle
tree as
Root = Hash(A|B)
A = hash(first part)
B = hash(second part)

you have
Root = {1,sizeof(A|B), hash(A|B), (0,0), (1, 1), 2}
A = {2,sizeof(first part), hash(first part), (1,1),(2,1), 0}
B = {3,sizeof(second part), hash(second part), (1,1), (2, 2), 0}

Assuming a secure hash this will allow complete serialization of the data,
making it necessary to break the underlying hash in order to fake a tree
(the originally proposed Merkle tree did not have this property). This will
also take care of your alignment issues: the {parent, level, location,
children} tuple can be constructed to take 128 bits, and sizeof(...) can be
constructed in 32 bits. This result is completely serializable and hence is
secure.*

Of course the order of the tuples is irrelevant, and perhaps it would be
helpful to move the most parsed information (the tuple) to the front to
ease parsing. The Segment # is only for indexing and request purposes; if
the requests are reworked to use a nodeID from above then the Segment # can
go away entirely.
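As a rough sketch of one node record (field widths are one possible
packing, nothing more):

#include <stdint.h>

struct merkle_node {
    uint32_t segment;        /* segment #, for indexing/requests only */
    uint32_t size;           /* sizeof(data covered by this node) */
    uint8_t  hash[20];       /* hash of children (or of the data segment) */
    uint16_t parent_level;   /* parent nodeID: (level, position) */
    uint16_t parent_pos;
    uint16_t level;          /* own nodeID: (level, position) */
    uint16_t pos;
    uint32_t nchildren;      /* number of direct children */
};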

This noticeably increases the transferred data, by adding at least 20 bytes.
Some may think this is trivial; I do not. But if Merkle trees are needed, at
least do them securely.
Joe


*The original library signing algorithm did in fact have this, although it
was somewhat hidden. Because the books were assumed to be in alphabetical
order, and classed based on subject, again alphabetized, the tree was
serializable.




Elliott Mitchell
2005-02-06 07:55:42 UTC
Post by Joseph Ashwood
----- Original Message -----
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by appending a byte depending on whether it is leaf or payload; *that*
is bogus! (kills alignment, makes testing expensive)
http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Actually I think that if Merkle trees are going to happen THEX is a tiny
baby step in the correct direction, although it lacks the serializability
Hopefully my previous message sufficiently explained why I consider THEX
to be a bad idea. Those reasons have nothing to do with lack of sufficient
data in the nodes. (Incidentally, the above should be "prefixing", not
"appending".)
Post by Joseph Ashwood
that is necessary to maintain security. For serializability it is necessary
to know a number of things about each node, where nodeID = (level,
horizontal location on level):
Segment #
Size
Hash
Parent by nodeID
nodeID
number of direct children
Some of these need to be known to verify a node, but they needn't be
verified by the root hash, because if they're wrong they'll fail to
verify.
Post by Joseph Ashwood
All of these need to be hashed by the parent so instead of having a Merkle
tree as
Root = Hash(A|B)
A = hash(first part)
B = hash(second part)
you have
Root = {1,sizeof(A|B), hash(A|B), (0,0), (1, 1), 2}
A = {2,sizeof(first part), hash(first part), (1,1),(2,1), 0}
B = {3,sizeof(second part), hash(second part), (1,1), (2, 2), 0}
Assuming a secure hash this will allow complete serialization of the data,
making it necessary to break the underlying hash in order to fake a tree
(the originally proposed Merkle tree did not have this property). This will
also take care of your alignment issues: the {parent, level, location,
children} tuple can be constructed to take 128 bits, and sizeof(...) can be
constructed in 32 bits. This result is completely serializable and hence is
secure.*
If there is only one way to construct the tree, the above is implicit in
the structure of the tree. If it is tampered with the hashes won't
verify.

Given hashes for a pair of siblings, you can compute the parent hash. So,
if you have the hashes for all direct children of the root you can verify
them as a group. Once these are verified you can confirm all of the next
level. This continues down to the leaves.

If the leaves are marked similarly to how THEX suggests, then you cannot
pass off a leaf as an internal node, nor can you pass an internal node off
as a leaf, without breaking the hash.

Given this, you can confirm the complete set of data against the root
hash with no additional data. Given a proper set of data and internal
nodes you can verify all of them against the root.
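A sketch of one step of that top-down confirmation (assuming a plain
hash(left || right) pairing and the odd-hash promotion rule; marker
bytes omitted):

#include <string.h>
#include <openssl/sha.h>

#define HASH_LEN SHA_DIGEST_LENGTH

static void pair_hash(const unsigned char *l, const unsigned char *r,
                      unsigned char *out)
{
    SHA_CTX c;
    SHA1_Init(&c);
    SHA1_Update(&c, l, HASH_LEN);
    SHA1_Update(&c, r, HASH_LEN);
    SHA1_Final(out, &c);
}

/* Check a level of child hashes against the already-trusted level
   above it; a leftover odd hash must appear verbatim as its parent. */
int verify_level(const unsigned char (*parent)[HASH_LEN], size_t nparent,
                 const unsigned char (*child)[HASH_LEN], size_t nchild)
{
    unsigned char d[HASH_LEN];
    size_t i, p = 0;
    for (i = 0; i + 1 < nchild; i += 2, p++) {
        pair_hash(child[i], child[i + 1], d);
        if (p >= nparent || memcmp(d, parent[p], HASH_LEN) != 0)
            return 0;
    }
    if (i < nchild) {
        if (p >= nparent || memcmp(child[i], parent[p], HASH_LEN) != 0)
            return 0;
        p++;
    }
    return p == nparent;
}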
Post by Joseph Ashwood
I'll reexplain it for those that have had a bit of time between their math
education and now.
Your math is fine. The problem is you're making incorrect assumptions.
Post by Joseph Ashwood
----- Original Message -----
Since the flat file hash is equivalent to a depth 2 Merkle tree (root level,
leaf level), the Merkle tree cannot have less overhead than the flat case,
where both are optimized to the same level.
Incorrect. For n data items, a tree has O(n) nodes. Lookup in an ideal
tree is O(log n), which leads to sorting data items requiring O(n log n)
work.

The tree still only needs O(n) nodes for n data. At this time I'll let
you work through the proof.
Post by Joseph Ashwood
I never claimed that the entire tree would need to be downloaded again, read
what I wrote, I wrote that the entire tree would have to be downloaded, by
proof #1 above the Merkle tree will NEVER be smaller than the flat hash, and
so it will take longer to download. The impact of this is open to
discussion, but the math does not lie.
Proof #1 showed that O(n log n) was larger than O(n). You failed to show
that the tree was O(n log n) size, so with John I must dissent with your
point.
Post by Joseph Ashwood
Post by Elliott Mitchell
It's not really necessary to balance the tree so perfectly. A
breadth-first parse is quite adequate.
Actually due to the computational cost of verification of the tree the
balance of the tree becomes increasingly important as sizes rise. I will
agree that when a verification covers 2 bytes the difference is irrelevant,
but when there are a number of nodes in the tree, simply verifying the tree
can become very costly. To verify this consider a Btree search, the Btree
search is identical to the full verification of a node. In a balanced Btree
the search is O(logn) time, in an unbalanced tree the search tends towards
O(n) time, anyone can do the math on those and see that the balance becomes
very important as the size increases. Again, when you are verifying a
2-byte file the difference is negligible, but as the tree size grows the
verification will become much more costly as the tree becomes unbalanced.
You've severely wounded the Merkle implementation that most folks have
been tossing around (incremental verification), but you haven't shot down
Merkle.

In order to verify a block of data knowing only the root, you need the
data plus O(log n) hashes and O(log n) work to verify them (all the
siblings up the tree). This then equates to O(m log n) overhead in both
data transfer and work for transferring m blocks of an n-block file.
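For a single block that check is just a walk up the tree (again a
sketch, reusing pair_hash() from above; the left[] flags and sibling
layout are hypothetical):

/* Recompute the root from one leaf hash plus its O(log n) siblings.
   left[i] says whether sibling[i] sits to the left at level i.
   (pair_hash() reads both inputs before writing, so cur may alias.) */
int verify_path(const unsigned char *leaf,
                const unsigned char (*sibling)[HASH_LEN], const int *left,
                size_t depth, const unsigned char *root)
{
    unsigned char cur[HASH_LEN];
    memcpy(cur, leaf, HASH_LEN);
    for (size_t i = 0; i < depth; i++) {
        if (left[i])
            pair_hash(sibling[i], cur, cur);
        else
            pair_hash(cur, sibling[i], cur);
    }
    return memcmp(cur, root, HASH_LEN) == 0;
}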


This is not the only possible handling of Merkle though. If blocks of
tree nodes are handled similarly to any other data blocks, you only
transfer the tree nodes once and only verify them once. At which point
the work and data overhead becomes O(n), where n is the size of the file.
The overhead will be slightly higher if you only do a partial transfer.
Post by Joseph Ashwood
Post by Elliott Mitchell
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you are
raising the computational overhead to perform verification, and in some
cases may be delaying verification until the complete file is downloaded
(to compute the non-downloaded complete tree), creating a situation where
there is a high likelihood of large-scale propagation of any bad data.
No, downloaded chunks will be verified as they come in.
Then explain why bad segments are received when the transmission medium
remains unchanged. There will be downloaded chunks that are not verified,
and they will be shipped around regardless of how many times we repeat to
ourselves that this won't happen. If each segment was verified before it was
sent on, there would be no need to verify the segment on receipt, simply
verify the communication medium. Instead the segments must be verified
before they can be trusted. Incidentally, you will find this listed as proof
by contradiction.
Verification on receipt implies verification before sending on. There
must be something I'm missing here.
Post by Joseph Ashwood
I didn't discuss it last time because I assumed it would be obvious, since
it appears there are a number of people on here without suitable math
educations, a far better solution would be a descendancy of hashes, for
My, my, we're really throwing down the gauntlet here. Watch out,
credential war winners are often not the initiator. Though I will admit
I've been repeatedly tempted to start one here.
Post by Joseph Ashwood
RootHash = hash(data, hash(for all Level1 hashes))
Level1 hash[i] = hash(datasegment[i], hash(for all Level2 hashes))
LevelM hash[i] = hash(datasegment[i], hash(for all LevelM+1 hashes))
LevelK hash[i] = hash(datasegment[i], 0000...0000)
(for known K) where for each level the datasegment length is decreased by a
given factor. This creates an O(1) lookup for each datasegment hash of any
of the established sizes, at the cost that the initial hash must be
performed K times, resulting in a creation time of O(Kn), as opposed to the
Merkle tree's O(nlogn).
The other downside is that it still maintains the size of a Merkle tree,
something which is simply unnecessary for BitTorrent's uses.
Interesting construction. I must suggest that this is very nearly
equivalent to a Merkle tree in every way. I note that it looks to be much
more expensive to compute, though. I also do not see it being easier to
verify, either incrementally or as a whole.

To verify a block, you appear to need the block, then you need the hashes
of all sibling blocks and the parent block for the next level up. You
then appear to need the same for all higher levels. You're likely to
transfer those and they appear much larger than Merkle's.
Joseph Ashwood
2005-02-06 22:39:33 UTC
----- Original Message -----
From: "Elliott Mitchell" <***@m5p.com>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Elliott Mitchell
If there is only one way to construct the tree, the above is implicit in
the structure of the tree. If it is tampered with the hashes won't
verify.
Not quite; there are cryptographic attacks that you're not considering. In
particular the preimage reorder attack, where carefully chosen input blocks
are reordered without changing the upper hash; see the latest SHA/MD5
attacks for examples of doing this, as well as the reordered preimage
attacks proposed as a result, which apply to all hashes and can be
addressed at the protocol level.
Post by Elliott Mitchell
Given hashes for a pair of siblings, you can compute the parent hash. So,
if you have the hashes for all direct children of the root you can verify
them as a group. Once these are verified you can confirm all of the next
level. This continues down to the leaves.
If the leaves are marked similar to how THEX suggests, then you cannot
pass off a leaf as an internal node nor can you pass an internal node off
as a leaf without breaking the hash.
Under the preimage reorder there's no need to pass off external/internal as
internal/external in order to break the hash.
Post by Elliott Mitchell
This is not the only possible handling of Merkle though. If blocks of
tree nodes are handled similarly to any other data blocks, you only
transfer the tree nodes once and only verify them once. At which point
the work and data overhead becomes O(n), where n is the size of the file.
The overhead will be slightly higher if you only do a partial transfer.
Most of this is good, but you're forgetting that there are nlogn hashes to
be verified, and that the hashes have significant overhead at termination,
so the time taken is actually O(n+fixed); for small n (e.g. internal nodes
with few children) the fixed part comes to dominate, and the fixed time
actually has a major impact on the overall hash speed for a few kilobytes.
16K chunks should be enough to dominate this as long as there is a
reasonable branching order to the Merkle tree (part of the reason I prefer
my proposal is that by setting the #children to a 32-bit number the
branching order can be up to 4 billion, enough to quickly dominate the
fixed cost).
Post by Elliott Mitchell
Verification on receipt implies verification before sending on. There
must be something I'm missing here.
This should happen, and almost always does happen, but seemingly in any
large download there are always a few bad blocks received, and so by
implication a few bad blocks have been sent by someone else. This
bad-block sending by someone else means that the other end did not verify
the block before sending it. This assumes a stable transfer medium, but
TCP is designed to supply exactly that.
Post by Elliott Mitchell
Post by Joseph Ashwood
RootHash = hash(data, hash( for all Level1 hash))
Level1 hash [i]= hash(datasegment[i], hash(for all Level2 hash))
LevelM hash[i] = hash(datasegment[i], hash(for all LevelM+1 hash))
LevelK hash[i] = hash(datasegment[i], 0000...0000)
(for known K) where for each level the datasegment length is decreased by
a given factor. This creates an O(1) lookup for each datasegment hash of
any of the established sizes, at the cost that the initial hash must be
performed K times, resulting in a creation time of O(Kn), as opposed to
the Merkle tree's O(nlogn).
The other downside is that it still maintains the size of a Merkle tree,
something which is simply unnecessary for BitTorrent's uses.
Interesting construction. I must suggest that this is very nearly
equivalent to a Merkle tree in every way. I note that it looks to be much
more expensive to compute, though. I also do not see it being easier to
verify, either incrementally or as a whole.
I didn't design it to necessarily be easier to compute; I designed it to
be easier to verify, and I will admit that I foolishly left out the
critical serialization information (see my other statements on preimage
reorder attacks).
Post by Elliott Mitchell
To verify a block, you appear to need the block, then you need the hashes
of all sibling blocks and the parent block for the next level up. You
then appear to need the same for all higher levels. You're likely to
transfer those and they appear much larger than Merkle's.
The cost of initial verification of a block (should be good enough to send
on) is reduced to O(blockSize), because the lookup takes O(1) and the
verification O(blockSize). Once a sufficient number of blocks have been
received to verify a hash in the next level it can be moved up, eventually
moving the entire file up to the top level. Assuming a sufficient
branching order, there will only be the need for 2 or 3 levels (assuming a
branching order of >2^16). Like I said before, it was not aimed at being
computationally cheap, and I actually pointed out initially that it is
computationally more expensive, being O(Kn) versus Merkle's O(nlogn), and
taking up the same space. It does however directly address the first
preimage problems by creating direct propagation of data instead of hash
propagation. This means a tighter security model, one where the attacker
cannot depend on the postimage data to rectify any problems, because a
substantial portion of the postimage data will change. That is not to say
that it is entirely secure; it does in fact suffer under the attacks
proposed against SHA-0 and MD5, unless the branching is continued down to
blocks of sizeof(hash(x)), in which case the unicity distance proofs apply
as long as it is a strong hash under other (easier) constraints. This same
argument cannot necessarily be applied to hash trees, because the
pre/postimage relationship is blinded by a hash function.

I would not suggest using my iterated hash construction, though, because
proofs of approximately the same strength can be derived for the Merkle
construction I posted elsewhere, without the extra computational overhead.
Joe




Elliott Mitchell
2005-02-08 02:54:02 UTC
Permalink
Post by Joseph Ashwood
----- Original Message -----
Post by sh4dowmatter
I don't know where you people learned your maths and fancy proofs...
But the last time I checked, a tree with k leaves requires k-1
"interior" nodes, regardless of how the tree was constructed --
whether it's maximally unbalanced, full, etc. The size of the tree is
thus k + k-1 = 2k - 1. And then, "trivially," 2k - 1 < 2k, or O(k).
But k is some fraction of n, so k = cn for some constant c <= 1. Hence
the size of the tree is O(n).
apologize if I made the mistake first. Second, your assumptions are
entirely fallacious as well. Your assumptions require that the branch rate
be fixed, and beyond that it requires that it be fixed at 2. Second, your
Is there a real case that is worse than a fixed branching rate of two? I
doubt anyone is planning to use a tree with large linear pieces if they
can avoid it, so I don't see a worse case.

There is the tail end of the file, but that will need no more than log(n)
nodes (the worst case being 2^k+1 chunks, where the extra chunk would need
one node for each level). Though the notation isn't right, this gives
O(n+log(n)) nodes. The linear factor completely overwhelms the logarithmic
portion, so the tree size is essentially linearly related to the size of
the payload.
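The node counts being argued over are easy to check numerically. A small
sketch that counts the hash computations a bottom-up pairing produces for
k leaf hashes (promoted odd nodes are reused, not recomputed):

    def pairing_node_count(k: int) -> int:
        total = k            # the k leaf hashes themselves
        level = k
        while level > 1:
            pairs = level // 2
            total += pairs               # one new internal hash per pair
            level = pairs + (level % 2)  # odd node promoted unchanged
        return total

    # Each internal hash merges two subtrees, and going from k roots down
    # to one takes exactly k-1 merges, so this prints 2k-1 for every k.
    for k in (1, 2, 3, 5, 1000):
        print(k, pairing_node_count(k))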
Post by Joseph Ashwood
Post by sh4dowmatter
To prove that reconstructing the tree takes O(n) time (assuming
top-down reconstruction), realize that you have to "verify" 2n-2 nodes
-- that is, every node in the tree except the root. To verify a node,
you just need to 1) hash it with its sibling and 2) check that it
equals the parent. (Actually, note that since this also verifies the
sibling simultaneously, this only happens for (2n-2)/2 = n-1 nodes.)
The hashes are a constant size, and there is only one sibling for each
verification. So verifying a node takes constant time. Once every node
in the tree is verified, it is constructed. There are 2n-2 (or n-1)
verifications, which is O(n). Which means constructing the tree is O(n).
Your proof is valid, except for one, not so minor, problem. Your n-space
equation is incorrect; the Merkle tree requires nlogn space, leading to
O(nlogn) time to verify. Beyond this, reconstruction will also require
Just one problem: the tree requires n+log(n) space; this is *very*
different from nlog(n) (n*log(n)). You're correct that O(n*log(n)) is
quite a bit larger than O(n); however, O(n+log(n)) is identical to O(n).
Post by Joseph Ashwood
O(nlogn) because the insertion requires that the tree be stepped through
downwards n times. Since the depth of the tree is logn, the time to insert
one node will be O(logn), multiply by the n times necessary, and we reach
O(nlogn) reconstruction time.
Creating a tree by iterative insertion is an O(nlog(n)) operation. This
is not the only way to (re)construct a tree though. In the case of Merkle
trees and BT2, breadth-first seems the obvious one. This doesn't require
the search and so is linear with the size of the payload.
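For concreteness, a minimal sketch of that breadth-first (bottom-up)
construction, assuming THEX-style markers and the promote-the-odd-node
rule; each level is built in one pass from the level below, with no
per-node search:

    import hashlib

    def leaf_hash(segment: bytes) -> bytes:
        return hashlib.sha1(b"\x00" + segment).digest()

    def internal_hash(left: bytes, right: bytes) -> bytes:
        return hashlib.sha1(b"\x01" + left + right).digest()

    def merkle_root(segments):
        # Assumes at least one segment; builds the tree level by level.
        level = [leaf_hash(s) for s in segments]
        while len(level) > 1:
            nxt = [internal_hash(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:           # odd node is promoted unchanged
                nxt.append(level[-1])
            level = nxt
        return level[0]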
Post by Joseph Ashwood
----- Original Message -----
Post by sh4dowmatter
If there is only one way to construct the tree, the above is implicit in
the structure of the tree. If it is tampered with, the hashes won't
verify.
Not quite; there are cryptographic attacks that you're not considering.
In particular the preimage reorder attack, where carefully chosen input
blocks are reordered without changing the upper hash. See the latest
SHA/MD5 attacks for examples of doing this, as well as the reordered
preimage attacks proposed as a result, which apply to all hashes and can
be addressed at the protocol level.
Luckily we aren't dealing with input blocks that have been carefully
chosen by an attacker.
Post by Joseph Ashwood
Post by sh4dowmatter
This is not the only possible handling of Merkle though. If blocks of
tree nodes are handled similarly to any other data blocks, you only
transfer the tree nodes once and only verify them once. At which point
the work and data overhead becomes O(n), where n is the size of the file.
The overhead will be slightly higher if you only do a partial transfer.
Most of this is good, but you're forgetting that there are nlogn hashes to
be verified, and that the hashes have significant overhead at termination,
As I said above, n+log(n), bringing the O(n) back.
Post by Joseph Ashwood
has a major impact on the overall hash speed for a few kilobytes. 16K chunks
should be enough to dominate this as long as there is a reasonable branching
order to the Merkle tree (part of the reason I prefer my proposal is that by
setting the #children to a 32-bit number the branching order can be 4
billion, enough to quickly dominate the fixed cost).
2^32 children is ridiculous; even 2^16 children is ridiculous. I think
the branching factor should be the size of the smallest transferable data
block (likely 16K or 32K) divided by the size of the hash. This leads to
blocks of hashes being handled similarly to payload blocks (reducing the
complexity of the over-the-wire protocol).
Post by Joseph Ashwood
Post by sh4dowmatter
To verify a block, you appear to need the block, then you need the hashes
of all sibling blocks and the parent block for the next level up. You
then appear to need the same for all higher levels. You're likely to
transfer those and they appear much larger than Merkle's.
The cost of initial verification of a block (should be good enough to send
on) is reduced to O(blockSize), because the lookup takes O(1) and the
verification O(blockSize). Once a sufficient number of blocks have been
received to verify a hash in the next level it can be moved up, eventually
But how do you verify a block knowing only the block, root hash, and some
portion of the tree? How much of the tree do you need to transfer to
start verifying blocks?
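For a plain binary Merkle tree (not necessarily Joe's scheme), the
conventional answer is: the block plus one sibling hash per level, so
ceil(log2(n)) hashes, checked upward against the root. A sketch, reusing
the THEX-style leaf_hash/internal_hash from the earlier sketch:

    def verify_block(block, proof, root):
        # proof: list of (sibling_digest, sibling_is_left) pairs, ordered
        # from the leaf level up to just below the root.
        h = leaf_hash(block)
        for sibling, sibling_is_left in proof:
            if sibling_is_left:
                h = internal_hash(sibling, h)
            else:
                h = internal_hash(h, sibling)
        return h == root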
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Olaf van der Spek
2005-02-06 11:07:02 UTC
Permalink
Post by Joseph Ashwood
Post by Elliott Mitchell
One you might think about helping me murder is the "THEX" design various
folks have been advocating. THEX distinguishes leaves from internal nodes
by prepending a byte depending on whether the node is a leaf or internal;
*that* is bogus! (kills alignment, makes testing expensive)
http://www.open-content.net/specs/draft-jchapweske-thex-02.html
Actually I think that if Merkle trees are going to happen THEX is a tiny
baby step in the correct direction, although it lacks the serializability
that is necessary to maintain security. For serializability it is necessary
to know a number of things about each node: nodeID = (level, horizontal
location on level)
Segment #
Size
Hash
Parent by id
nodeID
number of direct children
All of these need to be hashed by the parent, so instead of having a
Merkle tree as
Root = Hash(A|B)
A = hash(first part)
B = hash(second part)
you have
Root = {1,sizeof(A|B), hash(A|B), (0,0), (1, 1), 2}
A = {2,sizeof(first part), hash(first part), (1,1),(2,1), 0}
B = {3,sizeof(second part), hash(second part), (1,1), (2, 2), 0}
Assuming a secure hash, this will allow complete serialization of the
data, making it necessary to break the underlying hash in order to fake a
tree (the original Merkle tree proposal did not have this property).
Why not?
If I have file A, calculate the merkle tree and root hash rh(A), give
you rh(A), how can you give me file B with rh(B) = rh(A)?



Joseph Ashwood
2005-02-06 21:43:16 UTC
Permalink
----- Original Message -----
From: "Olaf van der Spek" <***@LIACS.NL>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Olaf van der Spek
Why not?
If I have file A, calculate the merkle tree and root hash rh(A), give
you rh(A), how can you give me file B with rh(B) = rh(A)?
Depends on the hash used. If you use SHA-0 yes, if you use MD5 yes, SHA-1 is
the next in that series and is getting close, SHA-256/384/512 already have
weaknesses in this area shown against them.

Actually, in an effort to get this as good as possible, I have to admit
that my proposal is not optimally secure either. While it does not suffer
from the preimage order attack of the original, it does suffer from a
first preimage attack. Correcting this is fairly easy: change my proposal
from:

Segment #
Size
Hash
Parent by id
nodeID
number of direct children

to:
{{Segment #, size, parentId, nodeId, number of direct children},
hash({Segment #, size, parentId, nodeId, number of direct children}|
children)}

This pushes the problem to a complete preimage compromise, also known as a
complete hash break (specifically SHA-0 would actually still be viable, MD5
would not, SHA-1 is in great shape).
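A rough sketch of how such a node might be hashed; the field widths and
byte order are my own assumptions, since the proposal does not pin down
an encoding:

    import hashlib
    import struct

    def node_hash(segment_no, size, parent_id, node_id, child_hashes):
        # Pack the metadata, then bind it and the children's digests into
        # one digest: hash(meta || children). The metadata itself would be
        # carried in the clear alongside this digest.
        meta = struct.pack(">IQIII", segment_no, size, parent_id,
                           node_id, len(child_hashes))
        return hashlib.sha1(meta + b"".join(child_hashes)).digest()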



I think it would be better if we move the security discussion to a thread
specifically on that subject; that should make filtering easier for those
that don't want to be bothered.
Joe




Olaf van der Spek
2005-02-07 19:04:20 UTC
Permalink
Post by Joseph Ashwood
----- Original Message -----
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Olaf van der Spek
Why not?
If I have file A, calculate the merkle tree and root hash rh(A), give
you rh(A), how can you give me file B with rh(B) = rh(A)?
Depends on the hash used.
Of course. I assumed that was safe and I thought somebody 'invented'
another method that'd work even with a safe hash.



Nick Johnson
2005-02-07 21:20:17 UTC
Permalink
Post by Joseph Ashwood
Post by Olaf van der Spek
Why not?
If I have file A, calculate the merkle tree and root hash rh(A), give
you rh(A), how can you give me file B with rh(B) = rh(A)?
Depends on the hash used. If you use SHA-0 yes, if you use MD5 yes, SHA-1 is
the next in that series and is getting close, SHA-256/384/512 already have
weaknesses in this area shown against them.
Actually, in an effort to get this as good as possible, I have to admit
that my proposal is not optimally secure either. While it does not suffer
from the preimage order attack of the original, it does suffer from a first
As far as I'm aware, there have been no preimage attacks found in MD5,
SHA-0, or SHA-1. See http://www.cryptography.com/cnews/hash.html.
Trying to build a merkle tree implementation assuming fundamental
weaknesses in the hash you're using seems a bad idea. If the hash is
broken, all bets are off.

-Nick Johnson




John Hoffman
2005-02-05 06:22:41 UTC
Permalink
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Well... the O(n) overhead is a lot heavier than the O(n log n)
overhead, so except on extremely large torrents (read >100G) you aren't
going to see a significant increase.
Post by Joseph Ashwood
Longer download: Currently the overhead is known. In the case of detecting
bad segments, Merkle overhead will necessarily include downloading the
entire Merkle tree, in addition to the current overhead. Where detecting
bad segments is unnecessary the overhead remains the same.
The idea is to both send opposing tree data, as requested, with chunks,
and to cache the data. It does not require redownloading the entire
tree in the case of an error, but merely the hashes received with a bad
chunk, which would not be a large number (30 or less, max 600 bytes,
probably much less).
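The arithmetic behind that estimate, assuming a binary tree and 20-byte
SHA-1 digests (assumptions of mine, not stated in the post): one sibling
hash per level from leaf to root, so a tree over 2^30 chunks needs 30
hashes, i.e. 600 bytes, per bad chunk.

    import math

    def proof_overhead(num_chunks: int, hash_len: int = 20) -> int:
        # One sibling digest per tree level.
        levels = math.ceil(math.log2(num_chunks))
        return levels * hash_len

    print(proof_overhead(2 ** 30))   # 30 levels * 20 bytes = 600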
Post by Joseph Ashwood
There are additional problems in balancing the signed library tree in
order to achieve maximum speed and in not propagating the bad segments.
For maximum processing speed the tree needs to be properly balanced;
doing this requires properly balancing across the hash. In the case of
SHA-1 this requires tuning in such a way that numHashes*512/160 is as
close as possible to an integer; while this is not immediately apparent
as computationally intensive, you have to understand that it is actually
an O(nlogn) problem, and while not overly costly it is not cheap.
It's not really necessary to balance the tree so perfectly. A
breadth-first parse is quite adequate.
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you
are raising the computational overhead to perform verification, and in
some cases may be delaying the verification until the complete file is
downloaded (to compute the non-downloaded complete tree), creating a
situation where there is a high likelihood of large scale propagation of
any bad data.
No, downloaded chunks will be verified as they come in.



Joseph Ashwood
2005-02-05 09:57:14 UTC
Permalink
I'll reexplain it for those that have had a bit of time between their math
education and now.

----- Original Message -----
From: "John Hoffman" <***@shambala.net>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by John Hoffman
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Well... the O(n) overhead is a lot heavier than the O(n log n)
overhead, so except on extremely large torrents (read >100G) you aren't
going to see a significant increase.
Actually it is quite the opposite; the change becomes immediate. The O(n)
hashing will require processing n bytes, and the O(nlogn) hashing will
require no less. Simple proof:
Assume: the number of bytes hashed depends on the input; if not all input
is hashed then the system collapses.
Assume: hashes take O(n) time or greater; otherwise the hash does not
depend on all the input and assumption 1 is broken.
Reality: Modern cryptographic hashes (e.g. SHA-1) are trivially more
expensive than O(n), but more so for smaller n
Assume: Generating a stream of hashes is needed regardless.
Assume: The straight hashing is actually equivalent to the Merkle tree in
the case where the log base is infinite.
Assume: Log(n) strictly increases as the log base decreases, for values
of n > 1.
The assumptions are trivially correct, but not worth proving.
Since all functional Merkle trees cannot have an infinite log base, we have
a clear math situation where:
nlogn > n for all values of n > base of log

For clarity log base 2:
n    O(n)    O(nlogn)
2    2       2
4    4       8

Second proof:
Since the flat file hash is equivalent to a depth 2 Merkle tree (root level,
leaf level), the Merkle tree cannot have less overhead than the flat case,
where both are optimized to the same level.

So clearly your claim has two completely incorrect statements:
1) The Merkle tree is computationally cheaper. Wrong by proof 1.
2) The Merkle tree overhead is lower than the flat. Wrong by proof 2.

While these differences may not be noticeable in small files, the
difference is very real, and will be noticeable well before the claimed
>100G.
Post by John Hoffman
Post by Joseph Ashwood
Longer download: Currently the overhead is known. In the case of detecting
bad segments, Merkle overhead will necessarily include downloading the
entire Merkle tree, in addition to the current overhead. Where detecting
bad segments is unnecessary the overhead remains the same.
The idea is to both send opposing tree data, as requested, with chunks,
and to cache the data. It does not require redownloading the entire
tree in the case of an error, but merely the hashes received with a bad
chunk, which would not be a large number (30 or less, max 600 bytes,
probably much less).
I never claimed that the entire tree would need to be downloaded again;
read what I wrote. I wrote that the entire tree would have to be
downloaded, and by proof #1 above the Merkle tree will NEVER be smaller
than the flat hash, so it will take longer to download. The impact of
this is open to discussion, but the math does not lie.
Post by John Hoffman
Post by Joseph Ashwood
There are additional problems in balancing the signed library tree in
order to achieve maximum speed and in not propagating the bad segments.
For maximum processing speed the tree needs to be properly balanced;
doing this requires properly balancing across the hash. In the case of
SHA-1 this requires tuning in such a way that numHashes*512/160 is as
close as possible to an integer; while this is not immediately apparent
as computationally intensive, you have to understand that it is actually
an O(nlogn) problem, and while not overly costly it is not cheap.
It's not really necessary to balance the tree so perfectly. A
breadth-first parse is quite adequate.
Actually, due to the computational cost of verification of the tree, the
balance of the tree becomes increasingly important as sizes rise. I will
agree that when a verification covers 2 bytes the difference is
irrelevant, but when there are a number of nodes in the tree, simply
verifying the tree can become very costly. To verify this, consider a
Btree search; the Btree search is identical to the full verification of a
node. In a balanced Btree the search is O(logn) time; in an unbalanced
tree the search tends towards O(n) time. Anyone can do the math on those
and see that the balance becomes very important as the size increases.
Again, when you are verifying a 2-byte file the difference is negligible,
but as the tree size grows the verification will become much more costly
as the tree becomes unbalanced.
Post by John Hoffman
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you
are raising the computational overhead to perform verification, and in
some cases may be delaying the verification until the complete file is
downloaded (to compute the non-downloaded complete tree), creating a
situation where there is a high likelihood of large scale propagation of
any bad data.
No, downloaded chunks will be verified as they come in.
Then explain why bad segments are received when the transmission medium
remains unchanged. There will be downloaded chunks that are not verified,
and they will be shipped around regardless of how many times we repeat to
ourselves that this won't happen. If each segment were verified before it
was sent on, there would be no need to verify the segment on receipt; you
could simply verify the communication medium. Instead the segments must
be verified before they can be trusted. Incidentally, you will find this
listed as proof by contradiction.

I stand behind what I said earlier: the Merkle tree/library signing
algorithm is not well suited to the purpose of BitTorrent. It is only
suitable when a filesystem is put in place, and attempting to wedge it
into the BitTorrent solution is simply not necessary.

I didn't discuss it last time because I assumed it would be obvious, but
since it appears there are a number of people on here without suitable
math educations: a far better solution would be a descendancy of hashes,
for example:
RootHash = hash(data, hash( for all Level1 hash))
Level1 hash [i]= hash(datasegment[i], hash(for all Level2 hash))
LevelM hash[i] = hash(datasegment[i], hash(for all LevelM+1 hash))
LevelK hash[i] = hash(datasegment[i], 0000...0000)
(for known K) where for each level the datasegment length is decreased by
a given factor. This creates an O(1) lookup for each datasegment hash of
any of the established sizes, at the cost that the initial hash must be
performed K times, resulting in a creation time of O(Kn), as opposed to
the Merkle tree's O(nlogn).
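One possible reading of this construction as code; several details are
underspecified (the exact serialization, and whether a level hash binds
all of the next level's hashes or only those of its own sub-segments), so
this is a sketch under the latter assumption, with SHA-1 standing in for
the hash and factor >= 2:

    import hashlib

    HASH_LEN = 20

    def H(x: bytes) -> bytes:
        return hashlib.sha1(x).digest()

    def layered_hashes(data: bytes, seg_len: int, factor: int):
        # Level K: hash(datasegment[i], 0000...0000)
        segs = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
        level = [H(s + b"\x00" * HASH_LEN) for s in segs]
        levels = [level]
        # Moving up, segments grow by 'factor'; each segment hashes its
        # raw data plus the hash of its children's hashes. Every level
        # rehashes the whole file, hence the O(Kn) creation time.
        while len(level) > 1:
            seg_len *= factor
            segs = [data[i:i + seg_len]
                    for i in range(0, len(data), seg_len)]
            kids = [level[i:i + factor]
                    for i in range(0, len(level), factor)]
            level = [H(s + H(b"".join(c))) for s, c in zip(segs, kids)]
            levels.append(level)
        return levels   # levels[-1][0] is the root hash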

The other downside is that it still maintains the size of a Merkle tree,
something which is simply unnecessary for BitTorrent's uses.
Joe




Olaf van der Spek
2005-02-06 10:44:42 UTC
Permalink
Post by Joseph Ashwood
I never claimed that the entire tree would need to be downloaded again;
read what I wrote. I wrote that the entire tree would have to be
downloaded, and by proof #1 above the Merkle tree will NEVER be smaller
than the flat hash, so it will take longer to download. The impact of
this is open to
Never smaller doesn't mean it's always bigger.
If the bottom of the tree is as wide as the number of pieces in BT1, the
overhead is (exactly) equal in terms of hashes transferred.
Post by Joseph Ashwood
Post by John Hoffman
No, downloaded chunks will be verified as they come in.
Then explain why bad segments are received when the transmission medium
remains unchanged. There will be downloaded chunks that are not verified,
Because the chunks are changed/edited after receiving (or torrent
creation) and before sending.



Olaf van der Spek
2005-02-05 10:53:45 UTC
Permalink
Post by Joseph Ashwood
----- Original Message -----
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
who still doesnt understand where the big advantage over a list
of hashed blocks is, anyways
Tiny .torrent size, smaller verification granularity, compatibility with
other merkle tree hashes.
And the associated disadvantages of not having those hashes immediately
available. Harder to determine the bad parts of the file, higher overhead in
Who said anything about not having the hashes available when needed?
Post by Joseph Ashwood
computing hashes, longer download times (extra hashes need to be
downloaded).
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Isn't that limited by network transfer rate instead of CPU time?
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Isn't that limited by disk transfer rate instead of CPU time?



Justin Cormack
2005-02-05 14:22:06 UTC
Permalink
Post by Olaf van der Spek
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Isn't that limited by network transfer rate instead of CPU time?
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Isn't that limited by disk transfer rate instead of CPU time?
sha1 (let alone sha256 - don't have an implementation lying around to
test) is pretty CPU intensive..

ok, my desktop can manage 130MB/s, though that's only the speed of a
couple of disks. A VIA C3 I have lying around can only manage 10MB/s,
much slower than its disk.

Merkle SHA256 is a significant amount more CPU time; it's a question of
whether it is worth it.
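For anyone who wants to reproduce that kind of number on their own
hardware, a quick-and-dirty throughput test (results vary wildly with CPU
and hash implementation):

    import hashlib
    import time

    def throughput_mib_s(algo: str, mib: int = 64) -> float:
        buf = b"\x00" * (1 << 20)        # 1 MiB buffer
        h = hashlib.new(algo)
        start = time.perf_counter()
        for _ in range(mib):
            h.update(buf)
        h.digest()
        return mib / (time.perf_counter() - start)

    for algo in ("sha1", "sha256"):
        print(algo, round(throughput_mib_s(algo)), "MiB/s")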

j



Olaf van der Spek
2005-02-05 14:59:47 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Isn't that limited by network transfer rate instead of CPU time?
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Isn't that limited by disk transfer rate instead of CPU time?
sha1 (let alone sha256 - don't have an implementation lying around to
test) is pretty CPU intensive..
ok, my desktop can manage 130MB/s, though that's only the speed of a
couple of disks. A VIA C3 I have lying around can only manage 10MB/s,
much slower than its disk.
Merkle SHA256 is a significant amount more CPU time; it's a question of
whether it is worth it.
But the extra hashing required to calculate the Merkle tree is tiny
compared to the normal hashing, so I still consider this point
irrelevant.



Joseph Ashwood
2005-02-06 21:20:44 UTC
Permalink
----- Original Message -----
From: "Olaf van der Spek" <***@LIACS.NL>
Subject: Re: [BitTorrent] Back to Merkle Hash Trees...
Post by Olaf van der Spek
Post by Joseph Ashwood
And the associated disadvantages of not having those hashes immediately
available. Harder to determine the bad parts of the file, higher overhead in
Who said anything about not having the hashes available when needed?
Unfortunately the lack of hashes available is a requirement for random
access downloading. If everyone downloads the hash tree first, the hash tree
becomes universally available, but the data may not be. In order to rectify
this the tree must be randomly accessed as well during the download. This
leads directly to a situation where subtrees may not be linked into the full
tree, and hence the hashes for verification are not available.
Post by Olaf van der Spek
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Isn't that limited by network transfer rate instead of CPU time?
As with any bottleneck, it moves around as assumptions change. Assuming
SHA-512, a 2.1 GHz Pentium 4, Windows XP, Crypto++ used for the
implementation, and a 100baseT connection, the situation changes
substantially. The system can receive at 12.5 MBytes/second, but can only
hash at 11.4 MB/s, so even the flat file would be unable to keep up.
Admittedly this situation is not in the near future for most uses, but my
systems spend most of their time at 100% computation load; the extra
overhead required for a non-flat traversal would have an impact even if I
was running on an 800 baud modem.
Post by Olaf van der Spek
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Isn't that limited by disk transfer rate instead of CPU time?
Most of the time, yes, but as above, the bottlenecks move. Already,
though, we're seeing some change in this. As SATA becomes more available
and pushes more towards 200 MBytes/second, even SHA-1 is having trouble
keeping up with a flat file.

It may seem trivial to try to optimize these other bottleneck points when
the other bottlenecks are dominant, but as those bottlenecks are
eliminated the other known bottlenecks will become evident.
Joe




Olaf van der Spek
2005-02-07 19:02:25 UTC
Permalink
Post by Joseph Ashwood
Post by Olaf van der Spek
Who said anything about not having the hashes available when needed?
Unfortunately the lack of hashes available is a requirement for random
access downloading.
Why? BT1 appears to do quite well, or is that not random order downloading?
Post by Joseph Ashwood
If everyone downloads the hash tree first, the hash tree
becomes universally available, but the data may not be. In order to rectify
this the tree must be randomly accessed as well during the download. This
leads directly to a situation where subtrees may not be linked into the full
tree, and hence the hashes for verification are not available.
Doesn't that 'may' depend on the way you exchange the hashes?
Post by Joseph Ashwood
Post by Olaf van der Spek
Post by Joseph Ashwood
Harder .... bad parts. Current process: lookup segment hash in O(1) time,
compare hash O(m) time, total time O(n). Merkle process: step through tree
to leaf O(log(n)) time, then compare hashes O(n) time, total time
O(nlog(m)).
Isn't that limited by network transfer rate instead of CPU time?
As with any bottleneck, it moves around as assumptions change. Assuming
SHA-512, a 2.1 GHz Pentium 4, Windows XP, Crypto++ used for the
implementation, and a 100baseT connection, the situation changes
substantially. The system can receive at 12.5 MBytes/second, but can only
hash at 11.4 MB/s, so even the flat file would be unable to keep up.
Admittedly this situation is not in the near future for most uses, but my
systems spend most of their time at 100% computation load; the extra
overhead required for a non-flat traversal would have an impact even if I
was running on an 800 baud modem.
True, true.



Olaf van der Spek
2005-02-05 10:56:08 UTC
Permalink
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you are
What/who is uploading unverified chunks/pieces?



Justin Cormack
2005-02-05 12:31:13 UTC
Permalink
Post by Olaf van der Spek
Post by Joseph Ashwood
Bad segment propagation is already a small problem, as parts may be
downloaded and uploaded before verification. By using a Merkle tree you are
What/who is uploading unverified chunks/pieces?
If the cost of verifying is high, and you rarely expect it to fail, there
might be a temptation to pass out pieces before you have verified them,
and queue them up for checking later.

However, the BT1 protocol doesn't actually allow that (it would require
an unhave message).
Konstantin 'Kosta' Welke
2005-02-05 13:31:16 UTC
Permalink
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Kosta
who still doesnt understand where the big advantage over a list
of hashed blocks is, anyways
Tiny .torrent size, smaller verification granularity, compatibility with
other merkle tree hashes.
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
people have very different ideas on how they are used exactly.
Most importantly: Does a .torrent file contain the whole hash
tree or just the root hash? Maybe a part of the tree?

If the complete tree is in the .torrent, where is the advantage
over just a list of hashes?

If the partial tree is in the .torrent, is the idea that you
only need those parts of the tree that are necessary to download
what you want of a file (or directory tree) and leave those other
parts out of it?

just my ideas how this might work,
Kosta



Olaf van der Spek
2005-02-06 11:13:05 UTC
Permalink
Post by Konstantin 'Kosta' Welke
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Kosta
who still doesnt understand where the big advantage over a list
of hashed blocks is, anyways
Tiny .torrent size, smaller verification granularity, compatibility with
other merkle tree hashes.
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Post by Konstantin 'Kosta' Welke
people have very different ideas on how they are used exactly.
Most importantly: Does a .torrent file contain the whole hash
tree or just the root hash? Maybe a part of the tree?
Just the root hash.



Konstantin 'Kosta' Welke
2005-02-06 13:37:52 UTC
Permalink
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Okay, that makes things much clearer to me. (Assuming that hash trees in
bt2 have about the same purpose.)

Thanks,
Kosta



Bill Cox
2005-02-07 12:17:17 UTC
Permalink
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,

This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.

While there are minor points that I would have liked to fine-tune, I
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.

This gets back to a fundamental problem I feel this group currently has.
Without active participation by the author of BT, we'll have trouble
making ANY advances in the BT protocol, since we don't have any
mechanism for officially agreeing on anything. With p2p networking
advancing so rapidly, I feel the future of the BT protocol is in doubt
if it cannot evolve.

So, I'd like to see Olaf's Merkle hash trees adopted. How can we
proceed in that direction?

Perhaps a new unmoderated mailing list could be formed, where permission
to post to the group would be given only to authors of BT clients, or
those who have at least published patches to existing clients. Such a
group might be able to make progress.

If there's interest in such a group, I could set it up, or perhaps
someone like Olaf should do it.

Bill





Brian Dessent
2005-02-07 19:14:46 UTC
Permalink
Post by Bill Cox
This gets back to a fundamental problem I feel this group currently has.
Without active participation by the author of BT, we'll have trouble
making ANY advances in the BT protocol, since we don't have any
mechanism for officially agreeing on anything. With p2p networking
advancing so rapidly, I feel the future of the BT protocol is in doubt
if it cannot evolve.
So, I'd like to see Olaf's Merkel hash trees adopted. How can we
proceed in that direction?
Perhaps a new unmoderated mailing list could be formed, where permission
to post to the group would be given only to authors of BT clients, or
those who have at least published patches to existing clients. Such a
group might be able to make progress.
Oh boy, this again. Were you not here the last time that happened?

The fact of the situation is that Bram is probably off working on his own
implementation of all of this and doesn't care what the mailing list
thinks. That's just the way he tends to be. If you want to try to make
something new then go ahead, but don't call it BitTorrent. He's been
quite clear in the past that BT is what it says it is and it's not a
group process. For better or worse...

Brian



Brian Dessent
2005-02-07 19:17:23 UTC
Permalink
Post by Brian Dessent
quite clear in the past that BT is what it says it is and it's not a
group process. For better or worse...
typo: s/it says it is/he says it is/



Olaf van der Spek
2005-02-07 22:02:09 UTC
Permalink
Post by Bill Cox
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,
This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.
While there are minor points that I would have liked to fine-tune, I
What would you like to tune?
Post by Bill Cox
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.
Justin Cormack
2005-02-07 22:18:17 UTC
Permalink
Post by Bill Cox
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,
This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.
While there are minor points that I would have liked to fine-tune, I
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.
Ugh, no, certainly not. It's a quick hack to try out some ideas; as a
"spec" it is neither specific enough nor well thought out.
Post by Bill Cox
This gets back to a fundamental problem I feel this group currently has.
Without active participation by the author of BT, we'll have trouble
making ANY advances in the BT protocol, since we don't have any
mechanism for officially agreeing on anything. With p2p networking
advancing so rapidly, I feel the future of the BT protocol is in doubt
if it cannot evolve.
So, I'd like to see Olaf's Merkle hash trees adopted. How can we
proceed in that direction?
Perhaps a new unmoderated mailing list could be formed, where permission
to post to the group would be given only to authors of BT clients, or
those who have at least published patches to existing clients. Such a
group might be able to make progress.
If there's interest in such a group, I could set it up, or perhaps
someone like Olaf should do it.
Writing good protocols, and writing good specs, is a difficult process;
there are lots of interests to be balanced and arguments to have, and it
is not really a job for one person.

As far as I am concerned, the most important long term thing is that
there needs to be an approved RFC for a protocol that fulfils the
function of bittorrent. De-facto "standards" (even disregarding the
ambiguities and implementation differences) are not going to legitimize
this sort of protocol in the way a real standard will.

The initial thought was simply to write up BT1 as it is in the right way
(despite the potential existence of BT2), but the more I have thought about
it the less likely it seems that it would actually be accepted as an RFC,
as there are problems with parts of the protocol.

Things that I think are significantly problematic are:
1. torrent files are not ASCII, so they can't be passed by email, irc,
whatever. Hence the almost universal use of html as a transport, although
it is completely unnecessary. As transparent compression is supported in
web servers and many other mechanisms, it should be possible to design a
file format that is just as compact but ASCII. The idea of encoding in a
URL is related, but also requires a shorter hash (eg Merkle trees).
2. Related to 1, the info hash depends on the torrent file format, not
just on, say, the hashes of the file (it depends on the filenames too,
which is not necessary either). Without this, 1 would be fairly
irrelevant, as you could make different encodings of the torrent without
having to recode to bencoding.
3. Parts of the protocol are not ipv6 compliant (note to implementors:
just say no to compact=1). This is not acceptable in an RFC.
4. The overlapping of multiple subfiles into single pieces is really
nasty. It is not clear to me how multiple files should be handled, as if
the users don't want them all (and their client knows that) then they
cease to be interested in large parts of the torrent, and assumptions
about the peers you are connected to being interested in what you are go
away. I believe that if trackers scaled well you would be better off
scrapping all multifile support.

Things that it would be nice to fix, as they are not very orthogonal:
1. The statistics gathering part of the tracker protocol is not very well
thought out (eg the start/stop/completed thing doesn't work with the UDP
tracker, doesn't give out much helpful information, and peers can spoof).
It would be best to separate the function of getting peers from
statistics gathering (and probably junk the stats gathering, replacing it
with a process that connects with peers, handshakes and reads the
bitmap). This would make it easier to develop and test different peer
distribution protocols, as a distributed one clearly makes sense.
2. Lack of a dont_have_any_more message limits many types of
applications, eg caches that need to drop data later. I can't think of
any reason not to add it, even if most clients won't use it (as it is a
rare message you could just send a new bitmap, although that uses more
bandwidth).
3. Piece size is all rather odd, as the verification and request units
are mismatched; as far as I could see when writing a client you might as
well always request a whole piece from one client at once. Nothing else
made any sense to me. So why not make requests whole pieces?

Generally the protocol works well (and the peer protocol needs very
little change, apart from a tightening up of the spec), as demonstrated
by experience; however, I do think a real standards track process would
be beneficial.

Then of course there are Merkle trees...

Justin



Brian Dessent
2005-02-07 22:36:50 UTC
Permalink
Post by Justin Cormack
3. Parts of the protocol are not ipv6 compliant (note to implementors: just
say no to compact=1). This is not acceptable in an RFC.
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
support is nowhere in the current BT protocol so it doesn't matter. The
bandwidth savings however are significant, and there are lots of
trackers that won't talk to a client that doesn't support "compact"
because of this.

Brian



Olaf van der Spek
2005-02-07 22:35:54 UTC
Permalink
Post by Brian Dessent
Post by Justin Cormack
3. Parts of the protocol are not ipv6 compliant (note to implementors: just
say no to compact=1). This is not acceptable in an RFC.
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
Wouldn't adding something like peers6 'fix' this?
Post by Brian Dessent
support is nowhere in the current BT protocol so it doesn't matter. The
bandwidth savings however are significant, and there are lots of
trackers that won't talk to a client that doesn't support "compact"
because of this.
Brian Dessent
2005-02-07 22:45:50 UTC
Permalink
Post by Olaf van der Spek
Post by Brian Dessent
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
Wouldn't adding something like peers6 'fix' this?
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)

Brian



Olaf van der Spek
2005-02-07 22:43:12 UTC
Permalink
Post by Brian Dessent
Post by Olaf van der Spek
Post by Brian Dessent
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
Wouldn't adding something like peers6 'fix' this?
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'not a very well thought
out extension that has no means of coping with ipv6' claim, given that
the compact extension can be modified to deal with IPv6 easily.



Brian Dessent
2005-02-07 22:56:38 UTC
Permalink
Post by Olaf van der Spek
Post by Brian Dessent
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'not a very well thought
out extension that has no means of coping with ipv6' claim, given that
the compact extension can be modified to deal with IPv6 easily.
The extension was well thought out in that it saves a lot of bandwidth
in the context of the current BT protocol. It's not well thought out in
that it assumes binary ipv4 addresses and nothing else. You can fix
that with any number of ad-hoc extensions, but I guess I'm just saying
that perhaps its successor might consider the idea of being both
extensible in the future and a compact binary format.

Brian



Justin Cormack
2005-02-07 23:11:31 UTC
Permalink
Post by Brian Dessent
Post by Olaf van der Spek
Post by Brian Dessent
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'not a very well thought
out extension that has no means of coping with ipv6' claim, given that
the compact extension can be modified to deal with IPv6 easily.
The extension was well thought out in that it saves a lot of bandwidth
in the context of the current BT protocol. It's not well thought out in
that it assumes binary ipv4 addresses and nothing else. You can fix
that with any number of ad-hoc extensions, but I guess I'm just saying
that perhaps its successor might consider the idea of being both
extensible in the future and a compact binary format.
I actually think that say using multitracker is probably much more significant.

And we seem to be getting off the point a bit.

It's great to try out loads of things, but a standards process is important,
and it should look at ways of solving the problems by better methods.




Elliott Mitchell
2005-02-07 23:53:00 UTC
Permalink
Post by Brian Dessent
Post by Olaf van der Spek
Post by Brian Dessent
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'The fact is it's not a
very well thought out extension that has no means of coping with ipv6'
part, given that the compact extension can be modified to deal with IPv6 easily.
The extension was well thought out in that it saves a lot of bandwidth
in the context of the current BT protocol. It's not well thought out in
that it assumes binary ipv4 addresses and nothing else. You can fix
that with any number of ad-hoc extensions, but I guess I'm just saying
that perhaps its successor might consider the idea of being both
extensible in the future and a compact binary format.
Might I suggest a "compact=2" data format consisting of tuples along
the lines of:

<byte:protocol#><4/16bytes:addr(size by prot)><2bytes:port>

Where the protocol# byte comes from RFC 1340 (available at
http://www.iana.org/assignments/protocol-numbers, and other locations).
This would be 0 for IPv4, and 41 for IPv6. The one issue is indicating
which types are acceptable, likely simply a list of bytes during the
request.
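
(For illustration, a minimal C sketch of a decoder for the proposed tuples. The
struct and function names are hypothetical, and the 0/41 values follow the
proposal above, which is disputed later in the thread.)

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical decoded form of one "compact=2" tuple:
       <byte:protocol#><4/16bytes:addr><2bytes:port>, as proposed above. */
    struct peer2 {
        uint8_t  proto;    /* protocol/family tag                   */
        uint8_t  addr[16]; /* 4 or 16 significant bytes, big-endian */
        uint16_t port;     /* host byte order after decoding        */
    };

    /* Decode the tuple at buf[*off]; returns 0 on success, -1 on
       truncation or an unknown tag. */
    static int decode_peer2(const uint8_t *buf, size_t len, size_t *off,
                            struct peer2 *out)
    {
        size_t alen, p = *off;
        if (p >= len) return -1;
        out->proto = buf[p];
        switch (out->proto) {
        case 0:  alen = 4;  break;  /* IPv4, per the proposal */
        case 41: alen = 16; break;  /* IPv6, per the proposal */
        default: return -1;
        }
        if (len - p < 1 + alen + 2) return -1;
        for (size_t i = 0; i < alen; i++) out->addr[i] = buf[p + 1 + i];
        out->port = (uint16_t)((buf[p + 1 + alen] << 8) | buf[p + 2 + alen]);
        *off = p + 1 + alen + 2;
        return 0;
    }

Note that an unknown tag stops the parse cold, since there is no length field
to skip by; a type-length-value layout would let old clients skip unrecognised
entries, which bears on the extensibility point above.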
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Justin Cormack
2005-02-08 00:32:16 UTC
Permalink
Post by Elliott Mitchell
Post by Brian Dessent
Post by Olaf van der Spek
Post by Brian Dessent
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'The fact is it's not a
very well thought out extension that has no means of coping with ipv6'
part, given that the compact extension can be modified to deal with IPv6 easily.
The extension was well thought out in that it saves a lot of bandwidth
in the context of the current BT protocol. It's not well thought out in
that it assumes binary ipv4 addresses and nothing else. You can fix
that with any number of ad-hoc extensions, but I guess I'm just saying
that perhaps its successor might consider the idea of being both
extensible in the future and a compact binary format.
Might I suggest a "compact=2" data format consisting of tuples along
<byte:protocol#><4/16bytes:addr(size by prot)><2bytes:port>
Where the protocol# byte comes from RFC 1340 (available at
http://www.iana.org/assignments/protocol-numbers, and other locations).
This would be 0 for IPv4, and 41 for IPv6. The one issue is indicating
which types are acceptable, likely simply a list of bytes during the
request.
Because this is minor tinkering at the edges of the protocol that makes
very little difference (the difference between binary ipv4 and compressed
ascii ipv4 addresses is only about 40%, so if that's what's limiting your
number of peers it doesn't help that much).

RFC 1340 is not a suitable choice (although maybe there are people
who want to run BT over CRTP (Combat Radio Transport Protocol)). 41 is for
ipv6-in-ipv4 tunnelling, and 0 is not ipv4: the IP protocol number is what
tells you whether your packet carries TCP (6) or UDP (17).

Justin



Justin Cormack
2005-02-07 23:07:27 UTC
Permalink
Post by Olaf van der Spek
Post by Brian Dessent
Post by Olaf van der Spek
Post by Brian Dessent
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
Wouldn't adding something like peers6 'fix' this?
Yeah, that'd work. But the other issues surrounding ipv6 have been
hashed out over and over on this list in the past so there's more than
just adapting the binary peers list in the tracker response. (as I'm
sure you're aware of...)
I know, I know, but I was wondering about the 'The fact is it's not a
very well thought out extension that has no means of coping with ipv6'
part, given that the compact extension can be modified to deal with IPv6 easily.
Doesn't make it less of a gross hack.

Actually this brings me to one of the strangely undocumented bits of the
protocol: the "ip" field (e.g. as received in the peers list) is never
actually documented anywhere I have seen, and while everyone uses numeric
addresses, they could potentially be names. Not that that really matters.

I happen to think that peer to peer apps are what will drive adoption
of ipv6, now that it is well supported in software and becoming easily
available. (Well, P2P and mobile phones.)

Justin



Justin Cormack
2005-02-07 22:49:53 UTC
Permalink
Post by Brian Dessent
Post by Justin Cormack
3. Parts of the protocol are not ipv6 compliant (note to implementors: just
say no to compact=1). This is not acceptable in an RFC.
The tracker admins that save heaps and heaps of bandwidth from that
extension disagree with you there. The fact is it's not a very well
thought out extension that has no means of coping with ipv6, but ipv6
support is nowhere in the current BT protocol so it doesn't matter. The
bandwidth savings however are significant, and there are lots of
trackers that won't talk to a client that doesn't support "compact"
because of this.
a. I have never found a tracker that won't let me connect without compact=1,
but I am sure things like that happen on the wilder shores of BT usage. I
don't mind if people do these things, but I don't think it is necessary, and
my main point is that it would not be accepted in an RFC.

b. ipv6 support is in the current protocol, and it works fine. I have tested
downloads from ipv6 clients through an ipv6 tracker; check out
http://6net.iif.hu/index.php?mn=3&sm=10&lg=en
The problem is mixing ipv6 and ipv4.

Justin




Olaf van der Spek
2005-02-07 22:37:12 UTC
Permalink
Post by Justin Cormack
Post by Bill Cox
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,
This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.
While there are minor points that I would have liked to fine-tune, I
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.
Ugh, no, certainly not. It's a quick hack to try out some ideas; as a "spec"
it is neither specific enough nor well thought out.
Any specific comments?



Justin Cormack
2005-02-07 22:58:45 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Bill Cox
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,
This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.
While there are minor points that I would have liked to fine-tune, I
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.
Ugh, no, certainly not. It's a quick hack to try out some ideas; as a "spec"
it is neither specific enough nor well thought out.
Any specific comments?
While you have a piece size in the torrent file, you have basically
hard-coded 32k in the protocol. Then you have limited the filesize by only
using 32 bits for the piece. I thought everyone had pretty much agreed that
1k/4k was a good size once we go Merkle?

The description of chunk_have is very unclear and I don't understand it.

Requiring clients to be able to read a gzipped torrent is really not
necessary; just recommend/mandate that they support gzipped HTTP.

Your xbt URL doesn't support a multifile torrent.

Justin



Olaf van der Spek
2005-02-07 23:12:15 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Bill Cox
Post by Olaf van der Spek
Post by Konstantin 'Kosta' Welke
Is there some URL somewhere that describes the way Merkle hash
trees are supposed to be used in bt2? It seems that a lot of
No (AFAIK), but I've got an experimental implementation described at
http://62.216.18.38/bt_merkle/
Olaf,
This spec is excellent. It's an outstanding document. Also, it sounds
like you've already tested it, so it shouldn't have too many surprises
for implementers.
While there are minor points that I would have liked to fine-tune, I
think having a working, well done spec like this is more important. If
I had a vote, I'd vote that we adopt this spec as-is.
Ugh, no, certainly not. It's a quick hack to try out some ideas; as a "spec"
it is neither specific enough nor well thought out.
Any specific comments?
While you have a piece size in the torrent file, you have basically
hard-coded 32k in the protocol. Then you have limited the filesize by only using
That's the chunk size, not the piece size.
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes (2^32 chunks of 32k) doesn't look like a 'big' limitation.
Post by Justin Cormack
was a good size once we go Merkle?
I can't remember any discussion about that.
Post by Justin Cormack
The description of chunk_have is very unclear and I don't understand it.
It's like have, but for a chunk instead of for a piece.
Post by Justin Cormack
Requiring clients to be able to read a gzipped torrent is really not
necessary; just recommend/mandate that they support gzipped HTTP.
Your xbt URL doesn't support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
--
Olaf van der Spek
http://xccu.sf.net/



Justin Cormack
2005-02-07 23:30:56 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
While you have a piece size in the torrent file, you have basically
hard-coded 32k in the protocol. Then you have limited the filesize by only using
That's the chunk size, not the piece size.
Ah I understand now. See below.
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
It's a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Post by Olaf van der Spek
Post by Justin Cormack
was a good size once we go Merkle?
I can't remember any discussion about that.
It has come up a few times I think.
Post by Olaf van der Spek
Post by Justin Cormack
The description of chunk_have is very unclear and I don't understand it.
It's like have, but for a chunk instead of for a piece.
Oh I see. See below.
Post by Olaf van der Spek
Post by Justin Cormack
Requiring clients to be able to read a gzipped torrent is really not
necessary; just recommend/mandate that they support gzipped HTTP.
Your xbt URL doesn't support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
But the info_hash isn't sufficient information to verify a torrent. Of
course, one logical thing would be to Merkle hash the individual file hashes
into one hash...

OK, re the 'see below's above.

One of the points of Merkle hashes is that you don't really need the
piece/chunk distinction.

I can see what you are doing: keeping the distinction means fewer have
messages (though you negate this somewhat, as the chunk_have messages clearly
have a useful purpose, even if only at some times).

There are a lot of options available once one gets this flexibility, and it
is probably best to scrap the piece/chunk distinction.

Call the smallest verifiable unit SVU (say 4k).

One option would be that if have messages are a range of SVUs (or SVU +
length) and requests have lengths, then you could have a standard algorithm
to ramp up request sizes as downloads progress, for example, or make this
rarity-based. This would amortise the slightly large have and request
messages for small requests against those for larger requests later.
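
(A minimal sketch of what such range-based messages might carry; the field
names and widths are hypothetical, not from any spec in this thread.)

    #include <stdint.h>

    /* "I have verified SVUs [start, start+count)" */
    struct have_range {
        uint64_t start; /* first SVU index                     */
        uint32_t count; /* length of the verified run, in SVUs */
    };

    /* "Send me SVUs [start, start+count)"; a client can ramp count up
       as the download progresses, or size it by rarity, as suggested. */
    struct request_range {
        uint64_t start;
        uint32_t count;
    };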




Olaf van der Spek
2005-02-08 09:47:56 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
your xbt url doesnt support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
But the info_hash isnt sufficient information to verify a torrent. Of course
Why not?
With the info_hash, you can verify the info key and with the info key
you can verify the rest.
Post by Justin Cormack
one logical thing would be to Merkle hash the individual file hashes into
one hash...
ok re see belows above.
One of the points of Merkle hashes is that you dont really need the piece/chunk
distinction.
I can see what you are doing, keeping the distinction means fewer have messages
(though you negate this somewhat as the chunk_have messages clearly have a
useful purpose, even if only at some times).
It's not just that. It's also a smaller bitfield message and smaller
in-memory bitfields.
Post by Justin Cormack
There are a lot of options available once one gets this flexibility, and it
is probably best to scrap the piece/chunk distinction.
Call the smallest verifiable unit SVU (say 4k).
One option would be that if have messages are a range of SVUs (or SVU + length)
and requests have lengths then you could have a standard algorithm to ramp
up request sizes as downloads progress, for example, or make this rarity
based. This would amortise the slightly large have and request messages for
small requests with those for larger requests later.
I thought about something like that, but someone else suggested I keep
the protocol as simple as possible. I wanted to include a 32-bit vector
in the request, along with a piece index. That'd allow you to request
between 0 and 32 chunks of one piece with one request. Another
optimization would be to allow multiple requests per request message.
The same could be done for have and cancel messages.
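
(A sketch of that request layout; the names are hypothetical, assuming up to
32 chunks per piece as described.)

    #include <stdint.h>

    /* One request: a piece index plus a 32-bit vector selecting which
       of that piece's chunks are wanted; bit i set => chunk i requested.
       E.g. chunk_mask = 0x0000000f asks for the first four chunks. */
    struct chunk_request {
        uint32_t piece;
        uint32_t chunk_mask;
    };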

Another issue is random-access IO. If a seek takes 10 ms, you can only
do 100 seeks per second, and with chunks of 4 kb that'd mean a top speed of
400 kb/s. If you don't have pieces and peers request chunks completely at
random, you can't do read-ahead.



Justin Cormack
2005-02-08 10:13:35 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Better safe than sorry.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
your xbt url doesnt support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
But the info_hash isnt sufficient information to verify a torrent. Of course
Why not?
With the info_hash, you can verify the info key and with the info key
you can verify the rest.
Yes, but without the Merkle root hashes you don't know where the error was.
Post by Olaf van der Spek
Post by Justin Cormack
one logical thing would be to Merkle hash the individual file hashes into
one hash...
ok re see belows above.
One of the points of Merkle hashes is that you dont really need the piece/chunk
distinction.
I can see what you are doing, keeping the distinction means fewer have messages
(though you negate this somewhat as the chunk_have messages clearly have a
useful purpose, even if only at some times).
It's not just that. It's also a smaller bitfield message and smaller
in-memory bitfields.
The overhead is not that great, especially if you give an offset and length
you have verified in the have message. You can store it internally like this too.
Post by Olaf van der Spek
Post by Justin Cormack
There are a lot of options available once one gets this flexibility, and it
is probably best to scrap the piece/chunk distinction.
Call the smallest verifiable unit SVU (say 4k).
One option would be that if have messages are a range of SVUs (or SVU + length)
and requests have lengths then you could have a standard algorithm to ramp
up request sizes as downloads progress, for example, or make this rarity
based. This would amortise the slightly large have and request messages for
small requests with those for larger requests later.
I thought about something like that, but someone else suggested I keep
the protocol as simple as possible. I wanted to include a 32-bit vector
in the request, along with a piece index. That'd allow you to request
between 0 and 32 chunks of one piece with one request. Another
optimization would be to allow multiple requests per request message.
The same could be done for have and cancel messages.
Another issue is random access IO. If a seek takes 10 ms, you can only
do 100 seeks per second and with chunks of 4 kb, that'd mean a top of
400 kb. If you don't have pieces and peers requests chunks completely at
random, you can't do read-ahead.
Sorry, was there an extra dont in that last sentence?

Most of the time it makes sense to do transfers in large amounts at a time;
it is just at the beginning that you might want a lower value.

j



Olaf van der Spek
2005-02-08 10:31:07 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Better safe than sorry.
The largest torrents I've seen were about 8 gb. That's a factor 16384
smaller than the limit. I doubt the protocol will still be in use when
this becomes an issue.
But using 64-bit indexes would be an easy solution.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
your xbt url doesnt support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
But the info_hash isnt sufficient information to verify a torrent. Of course
Why not?
With the info_hash, you can verify the info key and with the info key
you can verify the rest.
yes but without the Merkle root hashes you dont know where the error was.
But those root hashes are in the info key.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
one logical thing would be to Merkle hash the individual file hashes into
one hash...
ok re see belows above.
One of the points of Merkle hashes is that you dont really need the piece/chunk
distinction.
I can see what you are doing, keeping the distinction means fewer have messages
(though you negate this somewhat as the chunk_have messages clearly have a
useful purpose, even if only at some times).
It's not just that. It's also a smaller bitfield message and smaller
in-memory bitfields.
The overhead is not that great, especially if you give an offset and length
you have verified in the have message. You can store internally as this too.
That sounds a lot more complex than the flat bit vector in use now
(internally).
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
There are a lot of options available once one gets this flexibility, and it
is probably best to scrap the piece/chunk distinction.
Call the smallest verifiable unit SVU (say 4k).
One option would be that if have messages are a range of SVUs (or SVU + length)
and requests have lengths then you could have a standard algorithm to ramp
up request sizes as downloads progress, for example, or make this rarity
based. This would amortise the slightly large have and request messages for
small requests with those for larger requests later.
I thought about something like that, but someone else suggested I keep
the protocol as simple as possible. I wanted to include a 32-bit vector
in the request, along with a piece index. That'd allow you to request
between 0 and 32 chunks of one piece with one request. Another
optimization would be to allow multiple requests per request message.
The same could be done for have and cancel messages.
Another issue is random access IO. If a seek takes 10 ms, you can only
do 100 seeks per second and with chunks of 4 kb, that'd mean a top of
400 kb. If you don't have pieces and peers requests chunks completely at
random, you can't do read-ahead.
Sorry, was there an extra dont in that last sentence?
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.

With 4k chunks, do you keep the entire merkle tree in memory?



Justin Cormack
2005-02-08 10:44:57 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Better safe than sorry.
The largest torrents I've seen were about 8 gb. That's a factor 16384
smaller then the limit. I doubt the protocol will still be in use when
this becomes an issue.
I am expecting to be using 1TB+ by the end of this year, so it seems a lot
closer...
Post by Olaf van der Spek
But using 64-bit indexes would be an easy solution.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
your xbt url doesnt support a multifile torrent.
Why not?
It's the info_hash that's included, not the root hash of a single file.
But the info_hash isnt sufficient information to verify a torrent. Of course
Why not?
With the info_hash, you can verify the info key and with the info key
you can verify the rest.
yes but without the Merkle root hashes you dont know where the error was.
But those root hashes are in the info key.
Are we talking at cross purposes? I thought the idea was a URL that would
replace the torrent file by encoding all the information in it.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
one logical thing would be to Merkle hash the individual file hashes into
one hash...
ok re see belows above.
One of the points of Merkle hashes is that you dont really need the piece/chunk
distinction.
I can see what you are doing, keeping the distinction means fewer have messages
(though you negate this somewhat as the chunk_have messages clearly have a
useful purpose, even if only at some times).
It's not just that. It's also a smaller bitfield message and smaller
in-memory bitfields.
The overhead is not that great, especially if you give an offset and length
you have verified in the have message. You can store internally as this too.
That sounds a lot more complex then the flat bit vector in use now
(internally).
Not sure it is internally; I think it is in the peer protocol that it
becomes more complex. As you have a Merkle tree, your "what have I verified"
structure is also a tree. E.g. once you have verified the first half of the
content, all you need to store is the hash for it; the fact that your
data structure holds this hash (rather than branches) is all the information
you need. The more you download, the less overhead there is.
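
(A minimal sketch of such a collapsing structure, assuming 20-byte SHA-1
hashes and an internal-hash function ih() as in THEX; once both halves of a
node are verified, the children are freed and only the node's hash is kept.)

    #include <stdint.h>
    #include <stdlib.h>

    #define HASH_LEN 20 /* SHA-1 sized, by assumption */

    extern void ih(uint8_t out[HASH_LEN], const uint8_t left[HASH_LEN],
                   const uint8_t right[HASH_LEN]); /* assumed provided */

    /* Node covering a range of the content. Once 'verified' is set,
       the whole range hashed to 'hash' and the children are gone; the
       deeper structure only exists where verification is incomplete. */
    struct vnode {
        uint8_t hash[HASH_LEN];
        int verified;
        struct vnode *left, *right;
    };

    /* Collapse a node whose two halves have both been verified. */
    static void try_collapse(struct vnode *n)
    {
        if (n->left && n->right && n->left->verified && n->right->verified) {
            ih(n->hash, n->left->hash, n->right->hash);
            free(n->left);  n->left = NULL;
            free(n->right); n->right = NULL;
            n->verified = 1;
        }
    }

The more of the file is verified, the more of the tree collapses, matching
the shrinking overhead described above.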
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
There are a lot of options available once one gets this flexibility, and it
is probably best to scrap the piece/chunk distinction.
Call the smallest verifiable unit SVU (say 4k).
One option would be that if have messages are a range of SVUs (or SVU + length)
and requests have lengths then you could have a standard algorithm to ramp
up request sizes as downloads progress, for example, or make this rarity
based. This would amortise the slightly large have and request messages for
small requests with those for larger requests later.
I thought about something like that, but someone else suggested I keep
the protocol as simple as possible. I wanted to include a 32-bit vector
in the request, along with a piece index. That'd allow you to request
between 0 and 32 chunks of one piece with one request. Another
optimization would be to allow multiple requests per request message.
The same could be done for have and cancel messages.
Another issue is random access IO. If a seek takes 10 ms, you can only
do 100 seeks per second and with chunks of 4 kb, that'd mean a top of
400 kb. If you don't have pieces and peers requests chunks completely at
random, you can't do read-ahead.
Sorry, was there an extra dont in that last sentence?
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.
With 4k chunks, do you keep the entire merkle tree in memory?
If you don't want to use this much memory, you can choose a larger chunk size
yourself, never request less than that, and never use the small hashes
(though I suppose you might have to calculate them if requested, which is a
disadvantage).
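
(For scale, assuming 20-byte SHA-1 hashes: a 4G file at 4k per leaf has 2^20
leaves, and a full binary tree roughly doubles that, so about 2^21 x 20 bytes
= 40M of hashes; the same file with 64k leaves needs only about 2^17 x 20
bytes = 2.5M.)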
Olaf van der Spek
2005-02-08 11:22:18 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Better safe than sorry.
The largest torrents I've seen were about 8 gb. That's a factor 16384
smaller then the limit. I doubt the protocol will still be in use when
this becomes an issue.
I am expecting to be using 1TB+ by the end of this year, so it seems a lot
closer...
What's 'using'?
Do you expect to have a storage volume that large or to transfer
torrents that large?
Post by Justin Cormack
Post by Olaf van der Spek
But those root hashes are in the info key.
Are we talking at cross purposes? I thought the idea was a URL that would
replace the torrent file by encoding all the information in it.
No, it's only supposed to contain the info needed to join a torrent.
The rest of the info key would be retrieved via the get_info/info extension.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.
With 4k chunks, do you keep the entire merkle tree in memory?
If you dont want to use this much memory, you can choose a large chunk size
yourself, never request less than that and never use the small hashes (though
I suppose you might have to calculate them if requested which is a
disadvantage).
Could you post the URL of the post with the reasons why 4k was chosen
(or repost the reasons themselves)?



Justin Cormack
2005-02-08 11:30:36 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
32 bits for the piece. I thought everyone had pretty much agreed that 1k/4k
2^47 bytes doesn't look like a 'big' limitation.
Its a bit too small, 128TB, as it is within range of the size of filesystems
people have now, let alone a few years in the future.
Let's just say I'd love to hit that limitation.
Better safe than sorry.
The largest torrents I've seen were about 8 gb. That's a factor 16384
smaller then the limit. I doubt the protocol will still be in use when
this becomes an issue.
I am expecting to be using 1TB+ by the end of this year, so it seems a lot
closer...
What's 'using'?
Do you expect to have a storage volume that large or to transfer
torrents that large?
Torrents that large.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
But those root hashes are in the info key.
Are we talking at cross purposes? I thought the idea was a URL that would
replace the torrent file by encoding all the information in it.
No, it's only supposed to contain the info needed to join a torrent.
The rest of the info key would be retrieved via the get_info/info extension.
Ah I see.

I still think that (as per my original post at the start of the thread)
having a text-based torrent file is a better solution than this. The main
problem with get_info is that peers have to carry around enough information
to recreate the info from the torrent file (e.g. the filenames). The code
implementing the peer protocol shouldn't need to know about stuff like that.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.
With 4k chunks, do you keep the entire merkle tree in memory?
If you dont want to use this much memory, you can choose a large chunk size
yourself, never request less than that and never use the small hashes (though
I suppose you might have to calculate them if requested which is a
disadvantage).
Could you post the URL of the post with the reasons why 4k was chosen
(or repost the reasons themselves)?
E.g. see the THEX paper.
Olaf van der Spek
2005-02-08 13:07:51 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
What's 'using'?
Do you expect to have a storage volume that large or to transfer
torrents that large?
Torrents that large.
Amazing. But with size doubling every two years, it'd still be 14 years
until the limit is hit (seven doublings from 1 TB to 128 TB).
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
But those root hashes are in the info key.
Are we talking at cross purposes? I thought the idea was a URL that would
replace the torrent file by encoding all the information in it.
No, it's only supposed to contain the info needed to join a torrent.
The rest of the info key would be retrieved via the get_info/info extension.
Ah I see.
I still think that (as per my original post at the start of the thread)
having a text based torrent file is a better solution than this. The main
I consider an URL easier to pass around than a (text) file.
Post by Justin Cormack
problem about get_info is that peers have to carry around enough information
to recreate the info from the torrent file (eg the filenames). The code
implementing the peer protocol shouldnt need to know about stuff like that.
It doesn't need to know. It just has to pass the info blob around.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.
With 4k chunks, do you keep the entire merkle tree in memory?
If you dont want to use this much memory, you can choose a large chunk size
yourself, never request less than that and never use the small hashes (though
I suppose you might have to calculate them if requested which is a
disadvantage).
Could you post the URL of the post with the reasons why 4k was chosen
(or repost the reasons themselves)?
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as the base segment size, and my 'spec' uses 1 kbyte
as the base segment size too, but it doesn't use it as the minimal transfer size.



Justin Cormack
2005-02-08 13:15:53 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
What's 'using'?
Do you expect to have a storage volume that large or to transfer
torrents that large?
Torrents that large.
Amazing. But with size doubling every two years, it'd still be 14 years
until the limit is hit.
The doubling rate of data storage is often reckoned to be higher than this.
And other people have more data.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
But those root hashes are in the info key.
Are we talking at cross purposes? I thought the idea was a URL that would
replace the torrent file by encoding all the information in it.
No, it's only supposed to contain the info needed to join a torrent.
The rest of the info key would be retrieved via the get_info/info extension.
Ah I see.
I still think that (as per my original post at the start of the thread)
having a text based torrent file is a better solution than this. The main
I consider an URL easier to pass around than a (text) file.
A URL is a small text file. If we can make the .torrent small enough it can
fit in the url.
Post by Olaf van der Spek
Post by Justin Cormack
problem about get_info is that peers have to carry around enough information
to recreate the info from the torrent file (eg the filenames). The code
implementing the peer protocol shouldnt need to know about stuff like that.
It doesn't need to know. It just has to pass the info blob around.
It messes up the code, because you can't allocate buffers until you receive
it.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Most of the time it makes sense to do transfers in large amounts at a time
it is just at the beginning you might want a lower value.
True.
With 4k chunks, do you keep the entire merkle tree in memory?
If you dont want to use this much memory, you can choose a large chunk size
yourself, never request less than that and never use the small hashes (though
I suppose you might have to calculate them if requested which is a
disadvantage).
Could you post the URL of the post with the reasons why 4k was chosen
(or repost the reasons themselves)?
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you can't transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
Olaf van der Spek
2005-02-08 13:29:49 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
I consider an URL easier to pass around than a (text) file.
A URL is a small text file. If we can make the .torrent small enough it can
fit in the url.
No, an URL is a small piece of text. It's not a file.
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
problem about get_info is that peers have to carry around enough information
to recreate the info from the torrent file (eg the filenames). The code
implementing the peer protocol shouldnt need to know about stuff like that.
It doesn't need to know. It just has to pass the info blob around.
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
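
(To illustrate with THEX's 1 kbyte base segments: an aligned 32k chunk covers
a complete subtree of 32 leaves, so its subtree hash can be computed
bottom-up whatever transfer size peers use, and the tree above it, up to the
root, is unaffected. A sketch, with ih() standing for the internal-hash
function and ignoring the file's final partial chunk:)

    #include <stdint.h>
    #include <string.h>

    #define HASH_LEN 20 /* SHA-1 sized, by assumption */

    extern void ih(uint8_t out[HASH_LEN], const uint8_t left[HASH_LEN],
                   const uint8_t right[HASH_LEN]); /* assumed provided */

    /* Reduce the 32 leaf hashes of one aligned 32k chunk (1k base
       segments) to the chunk's subtree root: 32 -> 16 -> ... -> 1. */
    void chunk_root(const uint8_t leaves[32][HASH_LEN], uint8_t out[HASH_LEN])
    {
        uint8_t level[32][HASH_LEN], tmp[HASH_LEN];
        memcpy(level, leaves, sizeof(level));
        for (int n = 32; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                ih(tmp, level[2 * i], level[2 * i + 1]);
                memcpy(level[i], tmp, HASH_LEN);
            }
        }
        memcpy(out, level[0], HASH_LEN);
    }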



Justin Cormack
2005-02-08 13:44:48 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
I consider an URL easier to pass around then a (text) file.
A URL is a small text file. If we can make the .torrent small enough it can
fit in the url.
No, an URL is a small text. It's not a file.
ok, but it fulfils the same purpose. You can write it on a wall. Either
is better than a binary file like we have now.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
problem about get_info is that peers have to carry around enough information
to recreate the info from teh torrent file (eg the filenames). The code
implementing the peer protocol shouldnt need to know about stuff like that.
It doesn't need to know. It just has to pass the info blob around.
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
You start to get have messages, for example, before you know what they mean,
because you don't know the piece size. Even if you just connect to a peer
to get the info blob, it has to send you have messages, and a bitmap that
you don't know is the right length. You are going to have to
cache all this stuff and process it later.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
But your implementation has a fixed 32k chunk, so that doesn't matter.

Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?

Justin



Olaf van der Spek
2005-02-08 14:19:09 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
You start to get have messages for example before you know what they mean
because you dont know the piece size. Even if you just connect to a peer
to get the info blob it has to send you have messages. And a bitmap that
you dont know whether it is the right length. You are going to have to
cache all this stuff and process it later.
That'd be one option. You could also introduce a have_info message and
not send other stuff before you receive that.
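
(One way such a gate might look in a client's receive path; the message names
are hypothetical, sketching the ordering rule rather than any agreed wire
format.)

    #include <stdbool.h>

    enum msg_id { MSG_HAVE_INFO, MSG_INFO, MSG_BITFIELD, MSG_HAVE, MSG_REQUEST };

    /* Until the info blob has arrived and checked out against the
       info_hash (so piece size and counts are known), only the
       info-related messages are legal; nothing else ever has to be
       buffered. */
    static bool msg_allowed(enum msg_id id, bool info_known)
    {
        if (info_known)
            return true;
        return id == MSG_HAVE_INFO || id == MSG_INFO;
    }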
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
But your implementation has a fixed 32k chunk so that doesn't matter.
It does. I could change my implementation without invalidating old
.torrents.
Post by Justin Cormack
Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?
I don't know. I considered the cost of being compatible lower than using
another base segment size.



Justin Cormack
2005-02-08 16:02:25 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
You start to get have messages for example before you know what they mean
because you dont know the piece size. Even if you just connect to a peer
to get the info blob it has to send you have messages. And a bitmap that
you dont know whether it is the right length. You are going to have to
cache all this stuff and process it later.
That'd be one option. You could also introduce a have_info message and
not send other stuff before you receive that.
The other major problem is when connecting to a torrent: if most peers use
the URL encoding, it will be very hard to find a peer (i.e. the seed) that has
the info data. If there are a lot of clients all at once, there will be a large
delay before anyone can do useful work.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
But your implementation has a fixed 32k chunk so that doesn't matter.
It does. I could change my implementation without invalidating old
.torrents.
Post by Justin Cormack
Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?
I don't know. I considered the cost of being compatible lower than using
another base segment size.
Personally I think 4k as a chunk size and hashable size makes sense, as this
is the most common page size and file-system block size, so it is the amount
that will be written to disk anyway, and it's not too small.

But the issue of piece size may be relevant too.

Justin




Olaf van der Spek
2005-02-08 16:14:52 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
You start to get have messages for example before you know what they mean
because you dont know the piece size. Even if you just connect to a peer
to get the info blob it has to send you have messages. And a bitmap that
you dont know whether it is the right length. You are going to have to
cache all this stuff and process it later.
That'd be one option. You could also introduce a have_info message and
not send other stuff before you receive that.
The other major problem is when connecting to a torrent, if most peers use
the URL encoding it will be very hard to find a peer (ie the seed) that has
the info data. If there are a lot of clients all at once there will be a large
delay before anyone can do useful work.
The first thing any peer does is get the info. The info is probably
very small (size depends on the file count), so would that really be an
issue?
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
But your implementation has a fixed 32k chunk so that doesn't matter.
It does. I could change my implementation without invalidating old
.torrents.
Post by Justin Cormack
Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?
I don't know. I considered the cost of being compatible lower than using
another base segment size.
Personally I think 4k as a chunk size and hashable size makes sense as this
is the most common page size and file system block size so it is the amount
that will be written to disk anyway, and its not too small.
But what is the advantage over 32k? Do you consider 32k too large?



Justin Cormack
2005-02-08 16:45:03 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
It messes up the code, because you cant allocate buffers until you receive
it.
What's the disadvantage of the delayed buffer allocation?
You start to get have messages for example before you know what they mean
because you dont know the piece size. Even if you just connect to a peer
to get the info blob it has to send you have messages. And a bitmap that
you dont know whether it is the right length. You are going to have to
cache all this stuff and process it later.
That'd be one option. You could also introduce a have_info message and
not send other stuff before you receive that.
The other major problem is when connecting to a torrent, if most peers use
the URL encoding it will be very hard to find a peer (ie the seed) that has
the info data. If there are a lot of clients all at once there will be a large
delay before anyone can do useful work.
The first thing any peer does, is get the info. The info is probably
very small (size depends on the file count), so would that really be an
issue?
I think it can be. If a lot of peers connect to a torrent at once, most won't
have it. If you request it, what should they do? You don't mention this case.
Presumably you should only ask one peer at a time for it (or you run the risk
of eating bandwidth by getting multiple copies), so if a small proportion of
the peers have it, it can take a long time to find them.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
eg see the THEX paper.
No it doesn't. That paper says nothing about minimal transfer size.
It does mention 1 kbyte as base segment size, but my 'spec' uses 1 kbyte
as base segment size too but it doesn't use it as minimal transfer size.
But if you cant transfer less than 32k, there is no point having the
segment size less than 32k. As far as I can see.
The (only) point is to make the root hash (and top of the tree)
independent of the chunk/piece size and to maintain compatibility with
other uses of merkle hashes/THEX.
But your implementation has a fixed 32k chunk so that doesn't matter.
It does. I could change my implementation without invalidating old
.torrents.
Post by Justin Cormack
Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?
I don't know. I considered the cost of being compatible lower than using
another base segment size.
Personally I think 4k as a chunk size and hashable size makes sense as this
is the most common page size and file system block size so it is the amount
that will be written to disk anyway, and its not too small.
But what is the advantage over 32k? Do you consider 32k too large?
Well, 32k might be reasonable if piece = chunk = 32k and there is not too much
overhead in sending out have messages. Not having piece and chunk simplifies
things. Actually I would pick 65536, and then use 48-bit chunk numbers,
because I am tidy-minded that way... It's an improvement over the current
(typical) piece size of 256k or 1M; the main reason this was so high was
the size of the torrent file.

Doing the maths: a 4G file with 64k pieces/chunks will have 64k of them,
which means a bitmap is 8k, which is smaller than a request (64k), so
that's not a huge transfer. You will get 64k have messages (worst case),
which is about 512k. These figures are per peer.

Actually the file size for which the request size = bitmap size is:
64k chunk: 32G
32k chunk: 8G
4k chunk: 128M
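
(The rule behind those figures: a file of size S with chunk size c has S/c
chunks, so the bitmap is S/(8c) bytes; setting that equal to one request of c
bytes gives S = 8c^2. E.g. c = 64k: 8 x 2^32 = 2^35 = 32G; c = 32k: 2^33 =
8G; c = 4k: 2^27 = 128M.)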

With 4k pieces/chunks the bitmaps get too big (128k), especially compared
to the transfer size, which has gone down to 4k. You clearly need to change
how these are encoded.

In fact, if we don't want to change the encoding, you could specify the
piece and chunk size by this rule...

Your scheme, where you can transfer and verify 32k chunks but can't tell
anyone you have them until you get enough to make an arbitrary, larger
piece, doesn't make much sense to me. It is a kind of lossy compression
of have and bitmap messages, in effect.




Olaf van der Spek
2005-02-08 17:54:30 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
The first thing any peer does, is get the info. The info is probably
very small (size depends on the file count), so would that really be an
issue?
I think it can be. If a lot of peers connect to a torrent at once, most wont
have it. If you request it what should they do? You dont mention this case.
True, but my document isn't about that. The URL format is an idea for
further extensions.
Post by Justin Cormack
Presumably you should only ask one peer at a time for it (or you run the risk
of eating bandwidth by getting multiple copies), so if a small proportion of
the peers have it, it can take a long time to find them.
A have_info message would solve that.
Post by Justin Cormack
Post by Olaf van der Spek
But what is the advantage over 32k? Do you consider 32k too large?
Well 32k might be reasonable if piece = chunk = 32k, if there is not too much
overhead in sending out have messages. Not having separate piece and chunk
sizes simplifies things. Actually I would pick 65536, and then use 48-bit
chunk numbers, because I am tidy-minded that way... It's an improvement over
the current (typical) piece size of 256k or 1M. The main reason this was so
high was because of the size of the torrent file.
Doing the maths, a 4G file with 64k pieces/chunks will have 64k of them,
which means a bitmap is 8k, which is smaller than a request (64k), so
that's not a huge transfer. You will get 64k have messages (worst case),
which is 512k. These figures are per peer.
64k chunk: 32G
32k chunk: 8G
4k: 128M
With 4k pieces/chunks the bitmaps get too big (128k), especially compared
to the transfer size, which has gone down to 4k. You clearly need to change
how these are encoded.
In fact if we don't want to change the encoding you could specify the
piece and chunk size by this rule...
Your scheme where you can transfer and verify 32k chunks but can't tell
anyone you have them until you get enough to make an arbitrary larger
piece doesn't make much sense to me. It is a kind of lossy compression
of have and bitmap messages in effect.
You can tell everyone that you have them if necessary/useful, that's
what chunk_have is for.
And the disadvantage of a larger chunk size is that a peer is required
to send the entire chunk (or to close the connection). This is related
to the choking algorithm.

And you don't have to redownload 'chunks' if a 'piece' fails (advantage
over BT1).



Justin Cormack
2005-02-08 18:07:11 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Well 32k might be reasonable if piece = chunk = 32k, if there is not too much
overhead in sending out have messages. Not having separate piece and chunk
sizes simplifies things. Actually I would pick 65536, and then use 48-bit
chunk numbers, because I am tidy-minded that way... It's an improvement over
the current (typical) piece size of 256k or 1M. The main reason this was so
high was because of the size of the torrent file.
Doing the maths, a 4G file with 64k pieces/chunks will have 64k of them,
which means a bitmap is 8k, which is smaller than a request (64k), so
that's not a huge transfer. You will get 64k have messages (worst case),
which is 512k. These figures are per peer.
64k chunk: 32G
32k chunk: 8G
4k: 128M
With 4k pieces/chunks the bitmaps get too big (128k), especially compared
to the transfer size, which has gone down to 4k. You clearly need to change
how these are encoded.
In fact if we don't want to change the encoding you could specify the
piece and chunk size by this rule...
Your scheme where you can transfer and verify 32k chunks but can't tell
anyone you have them until you get enough to make an arbitrary larger
piece doesn't make much sense to me. It is a kind of lossy compression
of have and bitmap messages in effect.
You can tell everyone that you have them if necessary/useful, that's
what chunk_have is for.
But having *both* makes for complicated decisions (when do I use each one)
and optimisation. What's wrong with one kind of have message that refers
to something that just got transferred?
Post by Olaf van der Spek
And the disadvantage of a larger chunk size is that a peer is required
to send the entire chunk (or to close the connection). This is related
to the choking algorithm.
Well you shouldn't unchoke if you aren't prepared to feed one chunk. And
so 32k is clearly OK, as that's what BT1 uses. I think scaling it by file
size makes some sense, as, assuming the number of peers is not proportional
to the size of the file, you are going to have to feed them more.
Post by Olaf van der Spek
And you don't have to redownload 'chunks' if a 'piece' fails (advantage
over BT1).
Yes, that's a big advantage.



Olaf van der Spek
2005-02-08 18:18:58 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
You can tell everyone that you have them if necessary/useful, that's
what chunk_have is for.
But having *both* makes for complicated decisions (when do I use each one)
True.
Post by Justin Cormack
and optimisation. What's wrong with one kind of have message that refers
to something that just got transferred?
Network overhead. Which could be 'prevented' with the same kind of
complicated decisions.
Post by Justin Cormack
Post by Olaf van der Spek
And the disadvantage of a larger chunk size is that a peer is required
to send the entire chunk (or to close the connection). This is related
to the choking algorithm.
Well you shouldn't unchoke if you aren't prepared to feed one chunk. And
But if chunks were, for example, 1 Mbyte, that'd be an 'issue'.
Post by Justin Cormack
so 32k is clearly OK, as that's what BT1 uses. I think scaling it by file
Wasn't that 16k?
Post by Justin Cormack
size makes some sense, as, assuming the number of peers is not proportional
to the size of the file, you are going to have to feed them more.
But the granularity of chokes would change too (as it depends on chunk
size).




Elliott Mitchell
2005-02-09 00:10:39 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Being compatible with other uses, hmm, well I am not sure. Who else is
using THEX?
I don't know. I considered the cost of being compatible lower than using
another base segment size.
I've stated why I think THEX is bogus.
Post by Olaf van der Spek
Post by Justin Cormack
Post by Justin Cormack
Personally I think 4k as a chunk size and hashable size makes sense, as this
is the most common page size and file system block size, so it is the amount
that will be written to disk anyway, and it's not too small.
But what is the advantage over 32k? Do you consider 32k too large?
Nope, though the mainline BT1 client defaults to transfers of 16K.
Most other clients used 32K. *shrug*, either would work. 64K might be
worthy of consideration.
Post by Olaf van der Spek
Well 32k might be reasonable if piece = chunk = 32k, if there is not too much
overhead in sending out have messages. Not having separate piece and chunk
sizes simplifies things. Actually I would pick 65536, and then use 48-bit
chunk numbers, because I am tidy-minded that way... It's an improvement over
the current (typical) piece size of 256k or 1M. The main reason this was so
high was because of the size of the torrent file.
Torrent file size is one consideration, but it shares equal ground with
HAVE messages.
Post by Olaf van der Spek
Doing the maths, a 4G file with 64k pieces/chunks will have 64k of them,
which means a bitmap is 8k, which is smaller than a request (64k), so
that's not a huge transfer. You will get 64k have messages (worst case),
which is 512k. These figures are per peer.
There are two other things to consider here: data transfer per peer, and
overall BT overhead. Figuring the canonical 50 peers equally handling the
load, each peer will send you roughly 85MB of payload. At this point that
512K is 0.5% overhead, not large but not something to casually enlarge.
Overall, with BT1, HAVE messages come out pretty close to a full 50% of the
overhead.

Other factors here. Assuming the low-level encapsulation stays the same,
those messages have 4 bytes of length (silly seeing how no clients send
messages larger than 64K) and 1 command byte. You've made them have a
total size of 11 bytes per message, at which point the overhead is 704K, 1%.
Against this, on average peers will have 50% of the pieces, bringing it
back to 352K, 0.5%. A smart peer could also suppress HAVE messages for
pieces you're already known to have, bringing it all the way down to
0.25%.

This still represents nearly all of the BT overhead though.
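
The arithmetic behind those percentages, as a small C sketch (using the
4G file, 64k pieces and the canonical 50 peers from this thread):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t file_size = 4ULL << 30;              /* 4G example file */
    uint64_t pieces    = file_size / (64 * 1024); /* 64k pieces      */
    uint64_t payload   = file_size / 50;          /* ~85MB per peer  */
    uint64_t haves     = pieces * 11;             /* 11-byte HAVEs: 704K */

    /* Roughly 0.8% of payload (the 1% above is rounded); peers already
       holding 50% of the pieces halve it, and suppressing HAVEs for
       pieces the other side is known to have halves it again. */
    printf("HAVE bytes: %llu (%.2f%% of payload)\n",
           (unsigned long long)haves,
           100.0 * (double)haves / (double)payload);
    return 0;
}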

(note to others, I concede certain earlier arguments along similar lines
here; I now understand which way to go there)
Post by Olaf van der Spek
64k chunk: 32G
32k chunk: 8G
4k: 128M
With 4k pieces/chunks the bitmaps get too big (128k), especially compared
to the transfer size, which has gone down to 4k. You clearly need to change
how these are encoded.
Considering 1TB torrents, that bitmap hits 2MB (at 64k chunks), a very
hefty connection overhead indeed.
Post by Olaf van der Spek
Your scheme where you can transfer and verify 32k chunks but can't tell
anyone you have them until you get enough to make an arbitrary larger
piece doesn't make much sense to me. It is a kind of lossy compression
of have and bitmap messages in effect.
Does indeed amount to a compression algorithm of sorts. There is the one
other issue Olaf mentioned: by causing clustering of requests you greatly
reduce the load on your disks. Considering modern line speeds, this does
turn out to be a massive speed boost. In the light that HAVE messages are
the single largest portion of BitTorrent protocol overhead, it does make
sense.
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | ***@gremlin.m5p.com PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/





Justin Cormack
2005-02-09 10:12:26 UTC
Permalink
ok, so we are getting into a technical discussion, good.

My view is that we should be doing protocol design, rationally. Anyone can
make 3 protocols before breakfast and they might even work, but I am more
interested in designing good ones with rational reasons.

ok, so in summary, with Merkle trees and verification at 1k/4k or whatever
level, having a piece size specified is merely a way of compressing have
messages and bitmaps. It's a bit of a hangover from its previous use in BT1,
which was a combination of reducing have traffic and keeping the torrent
file small.

Olaf's protocol lets you request in 32k chunks (and verify) but you can only
send have messages in this predetermined size set by whoever made the
torrent. Let's junk the fixed piece size, and see what the options are.

Here is one suggestion

Let's change the have (and request etc) messages to look like
uint32_t clen
uint8_t message_type
uint8_t log
uint32_t piece

i.e. 10 bytes. The piece size is no longer fixed, but is specified by 1 << log
(with a minimum, e.g. 1k/4k). Have messages should be as compact as possible,
so if you have the whole file (the initial message from a seed) you send (for
a 4G file) log=32, piece=0. We can abolish the bitmap message and just send
initial have messages, as mostly clients will have significant locality [they
could be longer, but we want to encourage some locality for performance
reasons].
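
A minimal C sketch of that wire format (the big-endian packing and the
reading that a message covers the range starting at piece << log are my
assumptions, not part of the suggestion):

#include <stdint.h>

/* 10-byte message: 4-byte clen (6 = type + log + piece), 1-byte type,
   1-byte log, 4-byte piece index. The message covers bytes
   piece << log through ((piece + 1) << log) - 1, so a seed of a 4G
   file announces everything with log = 32, piece = 0. */
static void pack_have(uint8_t buf[10], uint8_t type, uint8_t log,
                      uint32_t piece)
{
    const uint32_t clen = 6;
    buf[0] = clen >> 24; buf[1] = clen >> 16;
    buf[2] = clen >> 8;  buf[3] = clen;
    buf[4] = type;
    buf[5] = log;
    buf[6] = piece >> 24; buf[7] = piece >> 16;
    buf[8] = piece >> 8;  buf[9] = piece;
}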

Now this doesn't buy us anything if we send a have message after each chunk,
but we now have the ability to dynamically vary piece size. The first thing we
think about is how large a piece we can request in one go. Clearly if we
request really big pieces (the whole file!) we send fewer have messages but
increase the latency of them.

First thing we notice is that we don't have to send messages at the same rate
to all peers. If we are choking them, we don't need to send any messages
until the point we unchoke, so we can batch them. Another strategy is to send
have messages for rare pieces immediately to the peers that don't have them,
while you coalesce common ones.

Another thing we notice is that it might now make sense to change the unchoke
message to have a single-byte log (meaning I will let you download at least
this much now, maybe more). This gives us a piece size guideline, and may
help the choking protocol.

End game is another time when making the piece size really small might help;
we could get rid of the endgame protocol and cancel messages if we can shrink
the piece size down.

In fact the beginning and end both symmetrically require small pieces,
which suggests that studying this further will give us a good insight into
how to select sizes.

Clearly this needs analysis to see how much space it can save, or how much
it can improve latency over a fixed piece size.

This is only a suggestion. We have isolated a problem, realised it is a
tradeoff (we can reduce overhead but only by increasing latency) and have
changed the protocol so that clients can dynamically make this tradeoff
rather than hard-coding it into the protocol. If we find optimum fixed values
(or as a function of, say, file size) we can recommend or mandate them in a
standard. If it turns out to depend on network conditions it can stay dynamic.



Olaf van der Spek
2005-02-09 10:54:07 UTC
Permalink
Post by Justin Cormack
Olaf's protocol lets you request in 32k chunks (and verify) but you can only
send have messages in this predetermined size set by whoever made the
torrent. Let's junk the fixed piece size, and see what the options are.
How often do I have to point out the chunk_have message?
Post by Justin Cormack
Here is one suggestion
Let's change the have (and request etc) messages to look like
uint32_t clen
uint8_t message_type
uint8_t log
uint32_t piece
Should response messages still match request messages or are you allowed
to send only part of the requested range?
Post by Justin Cormack
Now this doesn't buy us anything if we send a have message after each chunk,
but we now have the ability to dynamically vary piece size. The first thing we
think about is how large a piece we can request in one go. Clearly if we
request really big pieces (the whole file!) we send fewer have messages but
increase the latency of them.
What's the definition of latency here?
Post by Justin Cormack
First thing we notice is that we don't have to send messages at the same rate
to all peers. If we are choking them, we don't need to send any messages
until the point we unchoke, so we can batch them. Another strategy is to send
have messages for rare pieces immediately to the peers that don't have them,
What's rare for you doesn't have to be rare for somebody else.
Post by Justin Cormack
while you coalesce common ones.
Justin Cormack
2005-02-09 11:00:32 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Olaf's protocol lets you request in 32k chunks (and verify) but you can only
send have messages in this predetermined size set by whoever made the
torrent. Let's junk the fixed piece size, and see what the options are.
How often do I have to point out the chunk_have message?
You don't give any strategy for when to use it, and you suggested that
basically clients were expected to use the piece have message, so I ignored it.
Post by Olaf van der Spek
Post by Justin Cormack
Here is one suggestion
Let's change the have (and request etc) messages to look like
uint32_t clen
uint8_t message_type
uint8_t log
uint32_t piece
Should response messages still match request messages or are you allowed
to send only part of the requested range?
That's an interesting one. I was assuming they matched (hence my suggestion
to modify the unchoke message to say how much you would be allowed). You could
also allow them to differ and use this to measure how much you are being
choked. I think either way giving clients this information about choking
is a good thing.
Post by Olaf van der Spek
Post by Justin Cormack
Now this doesn't buy us anything if we send a have message after each chunk,
but we now have the ability to dynamically vary piece size. The first thing we
think about is how large a piece we can request in one go. Clearly if we
request really big pieces (the whole file!) we send fewer have messages but
increase the latency of them.
What's the definition of latency here?
Latency is the time between when you download something (the minimum
downloadable unit) and when you announce a have so another client can get it
from you. It clearly has a major effect on throughput. Note the BT1 way of
having a minimum downloadable unit (chunk) smaller than the verifiable unit
unavoidably increases latency, probably by too much, or the endgame stuff
wouldn't be needed.
Post by Olaf van der Spek
Post by Justin Cormack
First thing we notice is that we don't have to send messages at the same rate
to all peers. If we are choking them, we don't need to send any messages
until the point we unchoke, so we can batch them. Another strategy is to send
have messages for rare pieces immediately to the peers that don't have them,
What's rare for you doesn't have to be rare for somebody else.
Within the set of peers you can see. It should be a random sample, so you
should be able to make judgements that are probably correct.
Post by Olaf van der Spek
Post by Justin Cormack
while you coalesce common ones.
Olaf van der Spek
2005-02-09 11:12:16 UTC
Permalink
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Olaf's protocol lets you request in 32k chunks (and verify) but you can only
send have messages in this predetermined size set by whoever made the
torrent. Let's junk the fixed piece size, and see what the options are.
How often do I have to point out the chunk_have message?
You don't give any strategy for when to use it, and you suggested that
basically clients were expected to use the piece have message, so I ignored it.
In most cases, yes. That doesn't mean it's not there when needed
(begin-game mode).
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
Here is one suggestion
Let's change the have (and request etc) messages to look like
uint32_t clen
uint8_t message_type
uint8_t log
uint32_t piece
Should response messages still match request messages or are you allowed
to send only part of the requested range?
That's an interesting one. I was assuming they matched (hence my suggestion
to modify the unchoke message to say how much you would be allowed). You could
also allow them to differ and use this to measure how much you are being
choked. I think either way giving clients this information about choking
is a good thing.
Post by Olaf van der Spek
Post by Justin Cormack
Now this doesn't buy us anything if we send a have message after each chunk,
but we now have the ability to dynamically vary piece size. The first thing we
think about is how large a piece we can request in one go. Clearly if we
request really big pieces (the whole file!) we send fewer have messages but
increase the latency of them.
What's the definition of latency here?
Latency is the time between when you download something (the minimum
downloadable unit) and when you announce a have so another client can get it
from you. It clearly has a major effect on throughput. Note the BT1 way of
Clearly? Why?
Post by Justin Cormack
having a minimum downloadable unit (chunk) smaller than the verifiable unit
unavoidably increases latency, probably by too much, or the endgame stuff
wouldn't be needed.
As far as I know, end game mode has nothing to do with have messages.



Justin Cormack
2005-02-09 11:29:45 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Post by Olaf van der Spek
Post by Justin Cormack
increase the latency of them.
What's the definition of latency here?
Latency is the time between when you download something (the minimum
downloadable unit) and when you announce a have so another client can get it
from you. It clearly has a major effect on throughput. Note the BT1 way of
Clearly? Why?
Post by Justin Cormack
having a minimum downloadable unit (chunk) smaller than the verifiable unit
unavoidably increases latency, probably by too much, or the endgame stuff
wouldn't be needed.
As far as I know, end game mode has nothing to do with have messages.
It has to do with latency.

ok, some numbers.

Assume connections are 1Mbit, i.e. 128 kbyte/s.

Downloading a piece takes us best case 2s for 256k pieces, then we have to
checksum it. However in fact we might be downloading maybe 8 pieces at a
typical time (what's a typical number? clearly depends on the stage of your
download etc, not sure) from different peers, so on average each might take 16s
even assuming you are never choked (unlikely). So the time between you
downloading the (first) minimum download unit (16k say in BT1, which will
take 1s at the same contention) and being able to announce the piece is 15s.
Average latency is 8s, best case.
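
The same numbers in a few lines of C, for anyone who wants to play with
the assumptions:

#include <stdio.h>

int main(void)
{
    double rate_kbs = 128.0; /* 1Mbit link                 */
    double piece_kb = 256.0; /* BT1-style piece            */
    double chunk_kb = 16.0;  /* minimum download unit      */
    double parallel = 8.0;   /* concurrent piece downloads */

    double piece_s = piece_kb / (rate_kbs / parallel); /* 16s */
    double chunk_s = chunk_kb / (rate_kbs / parallel); /*  1s */
    /* A piece can only be announced once complete, so its first
       chunk waits piece_s - chunk_s before it becomes shareable. */
    printf("worst case %.1fs, average %.1fs\n",
           piece_s - chunk_s,
           (piece_s - chunk_s) / 2); /* 15.0s and 7.5s (~8s above) */
    return 0;
}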

If you have lots of things to download from your peers this really doesn't
matter; the arrival of a new have message is fairly unimportant. But at the
beginning (when most peers don't have anything) and the end (when most peers
have the same as you), you can cut this latency by a huge amount by sending
the have message for a smaller unit (like the minimum verifiable unit), so
that it can be sent along a chain of peers much faster.

This gives us another strategy for our have compression algorithm: if the peer
is interested, don't send haves; batch them up so they can be coalesced into
haves for larger amounts. After all, the peer is already saying it is happy
to download what we are offering. This reduces bandwidth used for have
messages even more.

Your solution of chunk-haves can be used for reducing latency, but doesn't
suggest strategies for how to use it, and when you use it, it always increases
bandwidth. My version starts to suggest how to use it, and has other side
effects like reducing the load on seeds and near-seeds, as they don't have
to send out bitmaps; they can send high-order have messages.

Justin






Olaf van der Spek
2005-02-09 11:45:00 UTC
Permalink
Post by Justin Cormack
beginning (when most peers don't have anything) and the end (when most peers
have the same as you), you can cut this latency by a huge amount by sending
Ah, you're assuming all peers are at the same stage. That's not what the
normal end game mode in the official client is about.
Post by Justin Cormack
the have message for a smaller unit (like the minimum verifiable unit), so
that it can be sent along a chain of peers much faster.
This gives us another strategy for our have compression algorithm: if the peer
is interested, don't send haves; batch them up so they can be coalesced into
haves for larger amounts. After all, the peer is already saying it is happy
to download what we are offering. This reduces bandwidth used for have
messages even more.
Your solution of chunk-haves can be used for reducing latency, but doesn't
suggest strategies for how to use it, and when you use it, it always increases
True, but as different usage strategies don't (always) require different
protocol syntax, I'd consider that less important.
Post by Justin Cormack
bandwidth. My version starts to suggest how to use it, and has other side
effects like reducing the load on seeds and near-seeds, as they don't have
to send out bitmaps; they can send high-order have messages.
Also true, but my extension was specifically designed to keep the
changes as simple as possible.

Changes in the p2p protocol are (IMO) easier to make (via extension
bits) than major incompatible changes in the .torrent format.



Justin Cormack
2005-02-09 12:06:00 UTC
Permalink
Post by Olaf van der Spek
Post by Justin Cormack
Your solution of chunk-haves can be used for reducing latency, but doesn't
suggest strategies for how to use it, and when you use it, it always increases
True, but as different usage strategies don't (always) require different
protocol syntax, I'd consider that less important.
But you add protocol syntax: 2 special cases. My suggestion only has 1.
Post by Olaf van der Spek
Post by Justin Cormack
bandwidth. My version starts to suggest how to use it, and has other side
effects like reducing the load on seeds and near-seeds, as they don't have
to send out bitmaps; they can send high-order have messages.
Also true, but my extension was specifically designed to keep the
changes as simple as possible.
Changes in the p2p protocol are (IMO) easier to make (via extension
bits) than major incompatible changes in the .torrent format.
Extension bits are generally not a very good way to change protocols, as any
client that doesn't understand them will probably just hang up. Actually,
with the BT peer protocol, you could mandate that commands you don't
understand are ignored (as they have a size this is pretty easy to do,
though you might penalise peers that send you lots of stuff you don't
understand...)

There are 2 issues here.

1. Merkle trees change that entire protocol in incompatible ways, so it
is a great opportunity to change everything. There is no advantage to
having a piece size in the torrent file for "small change reasons".

2. The reason for this initial thread is that I am interested in a
standards-track (i.e. accepted RFC) protocol that fulfills the functions
of BitTorrent. Currently there is a de-facto standard (even if it is not
well documented) in BT1, but as I said at the start of this thread I don't
think it is acceptable as an RFC in its current form. I don't think Bram's
BT2 will be either, as even outside the technical reasons, there needs to
be a standardisation process involved.

Now I am interested in standards because I work for a *very* small company
and we can't make up our own standards like large companies. Our potential
clients only want open standards (and often open source). So BT1 is kind
of OK as it's quite simple, de-facto, the best we have at the moment. BT2 is
of no interest whatsoever. If it becomes more widely used than BT1 in a
few years then, well, it's another de-facto standard perhaps.

I also think that it is very important in legitimising this type of protocol.
You can't really ban an IETF RFC...

It is less clear to me whether anyone else is interested in this.



Marcel Popescu
2005-02-09 12:48:08 UTC
Permalink
Post by Justin Cormack
It is less clear to me whether anyone else is interested in this.
If by "this" you mean this thread - well, not really <g>, but it serves a
very important purpose: it's archived. Oh, and it's also quite open - anyone
who has an opinion can voice it.

So don't take it off-the-list, please. (Unless the moderator, if there's
one, objects - of course.)

Marcel





Martin Nilsson
2005-02-09 05:21:57 UTC
Permalink
Post by Justin Cormack
Doing the maths, a 4G file with 64k pieces/chunks will have 64k of them,
which means a bitmap is 8k, which is smaller than a request (64k), so
that's not a huge transfer. You will get 64k have messages (worst case),
which is 512k. These figures are per peer.
Many times (for seeds and recently started peers) it is also possible to
apply high compression to the bitmap. Recursively divide the bitmap into
two parts and make a binary tree. Then encode the tree from the top with

00 - subtree all zeroes
01 - two nodes follows
10 - unencoded subtree follows
11 - subtree all ones

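A recursive C sketch of that encoding (the fall-back to code 10 for small
mixed subtrees and the one-byte-per-bit input format are choices of mine,
not part of the suggestion; the output buffer must be zero-initialised):

#include <stddef.h>
#include <stdint.h>

struct bitbuf { uint8_t *out; size_t bitpos; };

/* Append the n low bits of val, most significant bit first. */
static void put_bits(struct bitbuf *b, unsigned val, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        if ((val >> i) & 1)
            b->out[b->bitpos >> 3] |= 0x80 >> (b->bitpos & 7);
        b->bitpos++;
    }
}

static int all_equal(const uint8_t *bits, size_t n, uint8_t v)
{
    for (size_t i = 0; i < n; i++)
        if (bits[i] != v)
            return 0;
    return 1;
}

/* Encode bits[0..n): 00 = all zeroes, 11 = all ones,
   01 = two encoded halves follow, 10 = unencoded subtree follows. */
static void encode(struct bitbuf *b, const uint8_t *bits, size_t n)
{
    if (all_equal(bits, n, 0)) { put_bits(b, 0x0, 2); return; }
    if (all_equal(bits, n, 1)) { put_bits(b, 0x3, 2); return; }
    if (n <= 8) {                       /* mixed leaf: emit raw bits */
        put_bits(b, 0x2, 2);
        for (size_t i = 0; i < n; i++)
            put_bits(b, bits[i], 1);
        return;
    }
    put_bits(b, 0x1, 2);                /* two subtrees follow */
    encode(b, bits, n / 2);
    encode(b, bits + n / 2, n - n / 2);
}
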
/Martin Nilsson




Nick Johnson
2005-02-09 01:35:01 UTC
Permalink
Post by Justin Cormack
The other major problem is when connecting to a torrent: if most peers use
the URL encoding it will be very hard to find a peer (i.e. the seed) that has
the info data. If there are a lot of clients all at once there will be a large
delay before anyone can do useful work.
This is no different from the problem presented when a popular torrent
is created - one seed has all the torrent data, many peers have none. BT
solves that one admirably; why not this one?

-Nick Johnson



Brian Dessent
2005-02-08 12:03:46 UTC
Permalink
Post by Olaf van der Spek
The largest torrents I've seen were about 8 GB. That's a factor 16384
smaller than the limit. I doubt the protocol will still be in use when
this becomes an issue.
But using 64-bit indexes would be an easy solution.
For the record, I've observed torrents exceeding ~65GB "in the wild",
and that was in the 6mo-1yr ago timeframe. Not that that comes anywhere
close to 128TB, but it never hurts to plan ahead. Perhaps the clients might
exchange "multiplier" values at handshake (defaulting to Olaf's spec of
32k IIRC) that the 32-bit offset is referencing. That way future
expansion is possible, if needed.
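
In code the idea is just a per-connection scale factor (the names here
are illustrative only):

#include <stdint.h>

/* multiplier is exchanged at handshake (defaulting to the spec's 32k);
   a 32-bit chunk index then addresses multiplier * 2^32 bytes, which
   is the 128TB limit at 32k. */
static uint64_t chunk_offset(uint32_t index, uint32_t multiplier)
{
    return (uint64_t)index * multiplier;
}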

Brian



Bill Cox
2005-02-04 12:06:04 UTC
Permalink
On Fri, 2005-02-04 at 00:38 +0000, sh4dowmatter wrote:
...
Post by sh4dowmatter
There are still only five leaf hashes, and four internal hashes
(including root), but I get the power of being able to use array-based
indexing in the tree (so indexing with ROOT at 0, H at 1, G at 2,
etc). That is, I get lchild[i] = 2 * i + 1 and rchild[i] = 2 * i + 2.
Also, the size of the underlying array has a simple closed form -- the
number of 'hash elements' to allocate is simply 2 * num_leaves - 1.
Finally, given If I'm doing an array-based implementation, then I know
that the first internal node is at element array_size - num_leaves -
1; assigning i this index downto 0, I can compute each internal hash
as simply hash(lchild[i] + rchild[i]). I not afforded this convenience
so easily by using the format shown in THEX.
IMO, your improvement is worth doing. Speed and simplicity still
count, even if you have to compute a SHA1 for each internal node.
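
For concreteness, a small C sketch of that array layout; hash2() stands
in for SHA1(left || right) so the fragment stays self-contained, and
mapping leaf slots to file segments is left to the caller:

#include <stddef.h>
#include <stdint.h>

#define HASH_LEN 20  /* SHA1 digest size */

/* Stand-in for SHA1 over the concatenation of two child hashes. */
static void hash2(uint8_t *out, const uint8_t *l, const uint8_t *r)
{
    for (int i = 0; i < HASH_LEN; i++)
        out[i] = l[i] ^ r[i];   /* replace with a real SHA1 */
}

/* tree[] holds 2 * num_leaves - 1 hashes: root at 0, children of i at
   2i+1 and 2i+2, the leaf hashes filling the last num_leaves slots.
   Internal hashes are computed from index array_size - num_leaves - 1
   down to 0, exactly as described in the quoted post. */
static void build_tree(uint8_t tree[][HASH_LEN], size_t num_leaves)
{
    size_t size = 2 * num_leaves - 1;
    for (size_t i = size - num_leaves; i-- > 0; )
        hash2(tree[i], tree[2 * i + 1], tree[2 * i + 2]);
}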

Olaf, your client is testing Merkle hash trees now? If you can send me
a spec, I'll upgrade the client I'm playing around with (btslave), and
perhaps we can test them with each other. I like the upgrade proposed
by shadowmatter. Would it be hard to modify what you've done?

Thanks,
Bill





Olaf van der Spek
2005-02-05 15:49:37 UTC
Permalink
Post by Bill Cox
Olaf, your client is testing Merkle hash trees now? If you can send me
a spec, I'll upgrade the client I'm playing around with (btslave), and
Sure. It's at http://62.216.18.38/bt_merkle/
Post by Bill Cox
perhaps we can test them with each other. I like the upgrade proposed
by shadowmatter. Would it be hard to modify what you've done?
AFAIK it's an implementation issue and does not require changes to the
specification.



sh4dowmatter
2005-02-06 03:13:56 UTC
Permalink
I don't know where you people learned your maths and fancy proofs...
But the last time I checked, a tree with k leaves requires k-1
"interior" nodes, regardless of how the tree was constructed --
whether it's maximally unbalanced, full, etc. The size of the tree is
thus k + k-1 = 2k - 1. And then, "trivially," 2k - 1 < 2k, or O(k).
But k is some fraction of n, so k = cn for some constant c <= 1. Hence
the size of the tree is O(n).

Which leads me to... Constructing the root of a Merkle tree, and
reconstructing the entire tree just given the root, takes O(n) time.

To prove that reconstructing the tree takes O(n) time (assuming
top-down reconstruction), realize that you have to "verify" 2n-2 nodes
-- that is, every node in the tree except the root. To verify a node,
you just need to 1) hash it with its sibling and 2) check that it
equals the parent. (Actually, note that since this also verifies the
sibling simultaneously, this only happens for (2n-2)/2 = n-1 nodes.)
The hashes are a constant size, and there is only one sibling for each
verification. So verifying a node takes constant time. Once every node
in the tree is verified, it is constructed. There are 2n-2 (or n-1)
verifications, which is O(n). Which means reconstructing the tree is O(n).

To prove that constructing the tree takes O(n) time, apply a similar
argument.
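
For completeness, here is that verification pass over the same array
layout as in the earlier sketch (hash2() again standing in for a real
SHA1): one constant-time check per internal node, n - 1 of them, O(n):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20

static void hash2(uint8_t *out, const uint8_t *l, const uint8_t *r)
{
    for (int i = 0; i < HASH_LEN; i++)
        out[i] = l[i] ^ r[i];   /* stand-in for SHA1(l || r) */
}

/* Returns 1 iff every internal node equals the hash of its children. */
static int verify_tree(uint8_t tree[][HASH_LEN], size_t num_leaves)
{
    uint8_t h[HASH_LEN];
    for (size_t i = 0; i + 1 < num_leaves; i++) {   /* n - 1 internal */
        hash2(h, tree[2 * i + 1], tree[2 * i + 2]);
        if (memcmp(h, tree[i], HASH_LEN) != 0)
            return 0;
    }
    return 1;
}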

- shadowmatter
Post by Joseph Ashwood
Higher overhead...: Computing root hash using current method O(n) time.
Computing root hash for Merkle O(nlogn) time.
Joseph Ashwood
2005-02-06 21:00:38 UTC
Permalink
----- Original Message -----
From: "sh4dowmatter" <***@ucla.edu>
Subject: [BitTorrent] Re: Back to Merkle Hash Trees...
Post by sh4dowmatter
I don't know where you people learned your maths and fancy proofs...
But the last time I checked, a tree with k leaves requires k-1
"interior" nodes, regardless of how the tree was constructed --
whether it's maximally unbalanced, full, etc. The size of the tree is
thus k + k-1 = 2k - 1. And then, "trivially," 2k - 1 < 2k, or O(k).
But k is some fraction of n, so k = cn for some constant c <= 1. Hence
the size of the tree is O(n).
Completely incorrect. First and foremost O() cannot be applied to memory
constraints, there is an entirely separate notation reserved for that. I
apologize if I made the mistake first. Second your assumptions are entirely
fallacious as well. Your assumptions require that the branch rate be
fixed, and beyond that it requires that it be fixed at 2. Second, your
method of computation is invalid for O() notation, O() notation requires the
use of calculus in the summing, otherwise the variables become unusable. To
give a firm example the following is perfectly valid for a Merkle tree:
Root = hash(A,B,C)
A - leaf node
B - leaf node
C - leaf node

This has 4 nodes instead of the required 5 under your constructions. Next
O() requires that you keep those reals that are critical to the
understanding, or at the very least disclose them, in this case your
dropping of the 2 shows a situation where dropping the real eliminates
information that is necessary for proper understanding. So to be entirely
blunt, you used the wrong notation, you calculated wrong, and your
calculations were inaccurate anyway.
Post by sh4dowmatter
To prove that reconstructing the tree takes O(n) time (assuming
top-down reconstruction), realize that you have to "verify" 2n-2 nodes
-- that is, every node in the tree except the root. To verify a node,
you just need to 1) hash it with its sibling and 2) check that it
equals the parent. (Actually, note that since this also verifies the
sibling simultaneously, this only happens for (2n-2)/2 = n-1 nodes.)
The hashes are a constant size, and there is only one sibling for each
verification. So verifying a node takes constant time. Once every node
in the tree is verified, it is constructed. There are 2n-2 (or n-1)
verifications, which is O(n). Which means reconstructing the tree is O(n).
Your proof is valid, except for one not-so-minor problem. Your n-space
equation is incorrect, the Merkle tree requires nlogn space, leading to
O(nlogn) time to verify. Beyond this reconstruction will also require
O(nlogn) because the insertion requires that the tree be stepped through
downwards n times. Since the depth of the tree is logn, the time to insert
one node will be O(logn), multiply by the n times necessary, and we reach
O(nlogn) reconstruction time.
Post by sh4dowmatter
To prove that constructing the tree takes O(n) time, apply a similar
argument.
And the argument falls apart for exactly the same reason. You are reaching
your result assuming that your first result is correct, and you made major
errors in your argument, making such an argument invalid.

Your math is not as bad as I first assumed, and O() notation is horribly
confusing for a wide range of people. I have even seen professional
mathematicians mess up on it for the same reasons you have. The reason is
that they have been taught the non-calculus method that usually delivers
good answers, and they have been taught to drop all integers.

If you would like to further your education in this regard, I would suggest
reading most anything published by Knuth; he is well respected and maintains
nearly perfectly strict notation.
Joe




Olaf van der Spek
2005-02-07 18:58:01 UTC
Permalink
Post by Joseph Ashwood
apologize if I made the mistake first. Second your assumptions are entirely
fallacious as well. Your assumptions require that the branch rate be
fixed, and beyond that it requires that it be fixed at 2. Second, your
Which definition of Merkle trees are you using?



sh4dowmatter
2005-02-08 00:28:49 UTC
Permalink
Post by Joseph Ashwood
Completely incorrect. First and foremost O() cannot be applied to memory
constraints, there is an entirely separate notation reserved for that.
So I can't say that declaring an array of size N takes O(N) bytes?
Post by Joseph Ashwood
I apologize if I made the mistake first. Second your assumptions are
entirely fallacious as well. Your assumptions require that the branch
rate be fixed, and beyond that it requires that it be fixed at 2.
When I started this thread, I built/expanded upon the example found in
the THEX specification; it was a binary tree, and I proposed extending
it to a full binary tree. I apologize if my assumptions were not explicit.
Post by Joseph Ashwood
Second, your method of computation is invalid for O() notation, O()
notation requires the use of calculus in the summing, otherwise the
variables become unusable. To give a firm example the following is
perfectly valid for a Merkle tree:
Root = hash(A,B,C)
A - leaf node
B - leaf node
C - leaf node
This has 4 nodes instead of the required 5 under your constructions.
I understand that this is valid...
Post by Joseph Ashwood
Next O() requires that you keep those reals that are critical to the
understanding, or at the very least disclose them, in this case your
dropping of the 2 shows a situation where dropping the real eliminates
information that is necessary for proper understanding. So to be entirely
blunt, you used the wrong notation, you calculated wrong, and your
calculations were inaccurate anyway.
I don't see where you prove these assertions.
Post by Joseph Ashwood
Your proof is valid, except for one not-so-minor problem. Your n-space
equation is incorrect, the Merkle tree requires nlogn space, leading to
O(nlogn) time to verify. Beyond this reconstruction will also require
O(nlogn) because the insertion requires that the tree be stepped through
downwards n times. Since the depth of the tree is logn, the time to insert
one node will be O(logn), multiply by the n times necessary, and we reach
O(nlogn) reconstruction time.
I still fail to understand how a Merkle hash tree takes O(n logn)
space, assuming that the number of leaves (which we start with) is
proportional to the size of the file.

In any tree -- a binary tree, a ternary tree, any tree where every
interior node has more than one child (indeed, if this weren't the
case, it would be one lame tree, and you could construct it better) --
the number of internal nodes is always less than the number of leaves.
But the number of leaves is O(n), because they are all of a constant
size (say k) and n is the size of our file. So if the number of leaves
is O(n), and the number of internal nodes is bounded above by the
number of leaves, it must be O(n) too! So our total memory requirement
is O(n).

So if we agree (again, if...) that the size of a Merkle hash tree
(like all trees) is O(n), I provide a new argument that the
construction of such a tree takes O(n) time, based on the observations
that constructing such a tree (from scratch, given the file of length
n) takes two steps: computing the leaf hashes, and then computing the
internal nodes (and finally the root).

1) Computing the leaf hashes takes O(n) time: I think we agree on this
part. The file has size n, and each leaf hashes a segment of size k; thus
the number of leaf hashes is ceil(n/k), which is O(n).

2) Computing the internal hashes takes O(n) time: Okay, I lied before.
This doesn't really take O(n) time -- you caught me. I assumed that
the number of children per internal node was 2. It can be fixed at any
constant greater than 1 -- 3, 4, 5, whatever. (If you insist it is a
value less than or equal to 1, each internal node doesn't have two
children when it could, and you don't have a very efficient tree...)
Say each internal node has up to m children, where each child is a
hash of constant size. The hash of an internal node is simply computed
by hashing the concatenation of all its m children. It thus takes O(m)
time. Since there are O(n) internal nodes, the total time to compute
all internal nodes is O(nm). You could argue that m is large enough so
that O(nm) > O(n logn), but then you're totally abusing what
O-notation is: I can always find some n0 such that for n >= n0, O(nm)
< O(n logn). And the finale: m is a fixed constant -- so really this
is O(n) time.

The height of the tree is, quite simply, irrelevant. You could have a
full binary tree, of minimum height, or a completely unbalanced tree
of maximum height, but the time to compute the internal hashes in the
tree is the same. This is because the number of internal hashes does
not vary, and the number of children per internal node is fixed, and
that is all the computation depends on. So I don't think log n belongs
into the equation...
Post by Joseph Ashwood
Your math is not as bad as I first assumed, and O() notation is horribly
confusing for a wide range of people. I have even seen professional
mathematicians mess up on it for the same reasons you have. The reason is
that they have been taught the non-calculus method that usually delivers
good answers, and they have been taught to drop all integers.
If you're comparing my math to professional mathematicians, I'm in
better company than I had thought. Thank you for the compliment ;)
Post by Joseph Ashwood
If you would like to further your education in this regard, I would
suggest reading most anything published by Knuth; he is well respected
and maintains nearly perfectly strict notation.
Knuth is God, I admit. I keep meaning to read "Concrete Mathematics"
but can't find the time...

Finally, I'm sorry if my last response sounded demeaning or full of
vitriol. I'm just looking to find an answer we can both agree upon,
regardless of who (if either!) of us is right.

Regards,
shadowmatter







Marcel Popescu
2005-02-09 14:56:00 UTC
Permalink
Actually, I am interested in the topic, and I'd like to post, but this
group is not very open. I've posted three or four messages on the
topic, but they don't get on the list. I have to e-mail the authors
directly to participate (such as this e-mail).
Are you sure this is the address you used to subscribe to the group? You
might want to check the settings in Yahoo groups... I haven't had this
problem.

Marcel





Bill Cox
2005-02-09 15:05:02 UTC
Permalink
Post by Marcel Popescu
Actually, I am interested in the topic, and I'd like to post, but this
group is not very open. I've posted three or four messages on the
topic, but they don't get on the list. I have to e-mail the authors
directly to participate (such as this e-mail).
Are you sure this is the address you used to subscribe to the group? You
might want to check the settings in Yahoo groups... I haven't had this
problem.
Marcel
Yes, and my messages do get posted now and then. However, whoever is
the moderator (I think it's Bram) is no longer simply upgrading people
to unmoderated posters, and I think he very rarely checks the pending
posts.

My guess is that you subscribed to the group a few months ago at least,
and quickly got to be an unmoderated poster.

Bill





Justin Cormack
2005-02-09 15:30:44 UTC
Permalink
Post by Bill Cox
Post by Marcel Popescu
Actually, I am interested in the topic, and I'd like to post, but this
group is not very open. I've posted three or four messages on the
topic, but they don't get on the list. I have to e-mail the authors
directly to participate (such as this e-mail).
Are you sure this is the address you used to subscribe to the group? You
might want to check the settings in Yahoo groups... I haven't had this
problem.
Marcel
Yes, and my messages do get posted now and then. However, whoever is
the moderator (I think it's Bram) is no longer simply upgrading people
to unmoderated posters, and I think he very rarely checks the pending
posts.
My guess is that you subscribed to the group a few months ago at least,
and quickly got to be an unmoderated poster.
That's pretty useless.

There appears to be another moderator (equitar, jamuraa@ somewhere).

You can try to mail them directly from

http://groups.yahoo.com/group/BitTorrent/members?group=mod

and see if you get a response.

I am happy to forward mails to the list, but its a bit roundabout.

Anyone who wants me to have messages forwarded can use
justin at street-vision dot com




