Nick Johnson
2005-01-12 05:11:03 UTC
A project I'm working on at the moment requires fetching scrape data
from trackers. Often, I have multiple torrents on the same tracker that
I want scrape data for, so fetching scrape data for each file
individually is probably very inefficient. On the other hand, the
tracker's full scrape might be huge (multiple megs!) depending on how
many torrents are on the tracker that I don't care about.
With this in mind, I have a couple of proposals for how to improve this:
-Add a GET parameter 'hashes', a string whose length is a multiple of 20
bytes. Each 20 bytes corresponds to one info_hash that you want the
tracker to return. If the tracker recognises the extension, it returns
only the requested keys; otherwise it returns all keys (since this would
be the default behaviour of trackers getting unknown GET parameters
anyway - hopefully!)
Due to limitations on GET string length, this is probably limited to
about 100 torrents before problems start to crop up. URLs like this are
quite likely to result in messes in log files, too. :/
-Somewhat more complicated: The client supplies a Bloom Filter
(http://en.wikipedia.org/wiki/Bloom_filter) to the tracker (using a
defined set of hash functions), with the appropriate bits in the filter
set for each torrent it wants. The nature of the Bloom Filter guarantees
all the requested entries will be returned, but extra entries may be
returned as well. This way, an unlimited number of hashes can be
requested with a fixed request size, just with an increasing
false-positive rate.
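
To make the two proposals concrete, here is a rough client-side sketch in Python. Nothing here is an agreed protocol: the tracker URL, the filter size (FILTER_BITS), the number of hash functions (NUM_HASHES), and the salted-SHA-1 hash family are all assumptions picked for illustration, and the helper names are hypothetical. Only the 'hashes' parameter name and the 20-byte info_hash size come from the proposal above.

```python
import hashlib
from typing import List
from urllib.parse import quote_from_bytes

INFO_HASH_LEN = 20   # size of one info_hash, as in the proposal
FILTER_BITS = 1024   # assumed fixed filter size (128 bytes on the wire)
NUM_HASHES = 4       # assumed number of hash functions


def build_hashes_scrape_url(scrape_url: str, info_hashes: List[bytes]) -> str:
    """Proposal 1: pack all wanted info_hashes into one 'hashes' parameter."""
    for h in info_hashes:
        if len(h) != INFO_HASH_LEN:
            raise ValueError("each info_hash must be exactly 20 bytes")
    blob = b"".join(info_hashes)
    # Percent-encoding costs up to 3 characters per byte, so 100 hashes
    # can add up to 6000 characters to the URL.
    return scrape_url + "?hashes=" + quote_from_bytes(blob, safe="")


def bit_positions(info_hash: bytes) -> List[int]:
    """Assumed hash family: SHA-1 over a one-byte index plus the info_hash."""
    positions = []
    for i in range(NUM_HASHES):
        digest = hashlib.sha1(bytes([i]) + info_hash).digest()
        positions.append(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions


def build_bloom_filter(info_hashes: List[bytes]) -> bytes:
    """Proposal 2: set the filter bits for every torrent the client wants."""
    bits = bytearray(FILTER_BITS // 8)
    for h in info_hashes:
        for pos in bit_positions(h):
            bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)


def filter_matches(bloom: bytes, info_hash: bytes) -> bool:
    """Tracker side: True for every requested hash, sometimes for others."""
    return all(bloom[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(info_hash))


# Example: 100 made-up info_hashes on a hypothetical tracker.
wanted = [hashlib.sha1(b"torrent-%d" % i).digest() for i in range(100)]
url = build_hashes_scrape_url("http://tracker.example/scrape", wanted)
bloom = build_bloom_filter(wanted)
```

The 'hashes' URL grows with every torrent added, while the filter stays at FILTER_BITS / 8 bytes no matter how many hashes go into it. The tracker would run something like filter_matches over each torrent it tracks and return the ones that pass.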
Other suggestions are welcome. It just seems to me that, at the moment,
for fetching 100 torrents from a tracker that's tracking 1000, either
existing option (one scrape per torrent, or a full scrape) is rather
inefficient.
-Nick Johnson