NOTE: My cryptography-based solution above (accepted answer) is my preferred method, but since it is significantly different I am including my old answer here (I did not want to clutter my other answer with it).
Non-cryptographic solution:
The cryptographic extension to the BitTorrent protocol described above would require a lot more work to implement, but you can still nearly eliminate cheating without cryptography. The solution below is less elegant, but it both introduces proper accounting and removes the incentives to cheat.
If a special accounting flag is set in the torrent metadata, the client should follow this protocol:
Each client must maintain a list of peers with which it has transferred data (and how much), until it has a chance to report this to the tracker. In recording the number of bytes transferred, the client should only include data acknowledged by the other peers in some way (e.g. TCP ACK).
For example:
203.0.113.5:634 2459368 (to) 34347224 (from)
203.0.113.37:123 5954714 (to) 0 (from)
This would mean that since the client last successfully reported traffic to the tracker, it has received about 33MB from 203.0.113.5 and sent about 2MB and 6MB to 203.0.113.5 and 203.0.113.37, respectively.
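A rough sketch of how a client might keep these counters (Python; `PeerLedger` and its method names are my own invention for illustration, not part of any existing BitTorrent client). It assumes the transport layer tells the client how many sent bytes the peer has acknowledged.

```python
from collections import defaultdict


class PeerLedger:
    """Per-peer byte counters since the last successful tracker report (hypothetical)."""

    def __init__(self):
        # (ip, port) -> [bytes sent to peer, bytes received from peer]
        self.counters = defaultdict(lambda: [0, 0])

    def record_sent(self, ip, port, acked_bytes):
        # Only count bytes the peer has acknowledged in some way.
        self.counters[(ip, port)][0] += acked_bytes

    def record_received(self, ip, port, nbytes):
        self.counters[(ip, port)][1] += nbytes

    def lines(self):
        # Render in the same layout as the example list above.
        return [
            f"{ip}:{port} {sent} (to) {received} (from)"
            for (ip, port), (sent, received) in self.counters.items()
        ]


ledger = PeerLedger()
ledger.record_sent("203.0.113.5", 634, 2459368)
ledger.record_received("203.0.113.5", 634, 34347224)
ledger.record_sent("203.0.113.37", 123, 5954714)
```

Calling `ledger.lines()` here reproduces the two example lines shown above.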
Rather than issue a GET request to the tracker, the client will (when necessary) use a POST request containing this information.
In our example:
203.0.113.5|634|2459368|34347224
203.0.113.37|123|5954714|0
The tracker should then acknowledge that this information was received successfully, and the client should then reset the byte counters and may discard this information. If the tracker is unavailable or responds with a transient error, the client must continue to store and attempt to report this data.
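The report-then-reset rule could be sketched like this (Python). The `post` argument is a hypothetical callable standing in for whatever HTTP code the client uses; I assume it returns `True` only when the tracker acknowledges the report, and raises `OSError` on network failure.

```python
def build_report(counters):
    """Serialise {(ip, port): (sent, received)} as ip|port|sent|received lines."""
    return "\n".join(
        f"{ip}|{port}|{sent}|{received}"
        for (ip, port), (sent, received) in counters.items()
    )


def report(counters, post):
    """Send the counters; clear them only if the tracker acknowledged them."""
    if not counters:
        return True
    try:
        acknowledged = post(build_report(counters))
    except OSError:
        # Tracker unreachable: keep the data and retry later.
        acknowledged = False
    if acknowledged:
        counters.clear()
    return acknowledged
```

A real client would also need to snapshot the counters before sending, so that bytes transferred while the request is in flight are not wiped by the reset.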
The tracker should also return a list of any records which have yet to be corroborated by other peers. The client can use this to decide if some peers are not trustworthy.
The tracker should keep track of any disagreements amongst the records it receives from peers. The tracker can use this to determine if some peers are cheating with some degree of confidence.
Why this helps:
This is effectively a double-entry bookkeeping system. The tracker can now expect to receive totals for each peer which add up well in the long term (apart from small irregularities from clients disconnecting or crashing and losing data). The tracker can record the number of bytes outstanding by each IP.
For example, suppose the tracker receives the POST data above and is then tracking the following outstanding traffic:
REPORTER          PEER               UPLOADED   DOWNLOADED
203.0.113.1:800   203.0.113.5:634     2459368     34347224
203.0.113.1:800   203.0.113.37:123    5954714            0
The tracker would respond to 203.0.113.1 with success, and include a list of uncorroborated data for that reporter (currently all data):
203.0.113.5|634|2459368|34347224
203.0.113.37|123|5954714|0
A little later, it might receive a POST request from 203.0.113.5 with the data:
203.0.113.1|800|41943040|2459368
Now this is subtracted from previously outstanding traffic and the record now looks like:
REPORTER          PEER               UPLOADED   DOWNLOADED
203.0.113.1:800   203.0.113.5:634           0            0
203.0.113.1:800   203.0.113.37:123    5954714            0
203.0.113.5:634   203.0.113.1:800     7595816            0
The entry with zeros can now be removed. Note the new row: 203.0.113.5 is claiming to have sent 7595816 more bytes to 203.0.113.1 than the latter reported earlier (which is possible given the delay).
The tracker would respond to 203.0.113.5 with success, and include a list of uncorroborated data for that reporter:
203.0.113.1|800|7595816|0
This process continues, and the accounting should eventually balance with only small errors if any. Any client lying about its data transfer should be easy to identify in the long term.
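One way the tracker-side netting could work, as a minimal sketch (Python). The function name and the in-memory dict are assumptions for illustration; a real tracker would persist this and would need to deal with peers changing IP or port.

```python
def apply_report(outstanding, reporter, peer, uploaded, downloaded):
    """Net one reported line against the counterpart's outstanding claims.

    outstanding maps (reporter, peer) -> (uploaded, downloaded) in bytes.
    Returns what is still uncorroborated for this reporter and peer.
    """
    mine_up, mine_down = outstanding.get((reporter, peer), (0, 0))
    mine_up += uploaded
    mine_down += downloaded
    theirs_up, theirs_down = outstanding.get((peer, reporter), (0, 0))

    # Bytes I say I uploaded should match bytes the peer says it downloaded.
    matched = min(mine_up, theirs_down)
    mine_up -= matched
    theirs_down -= matched
    # Bytes I say I downloaded should match bytes the peer says it uploaded.
    matched = min(mine_down, theirs_up)
    mine_down -= matched
    theirs_up -= matched

    for key, value in (((reporter, peer), (mine_up, mine_down)),
                       ((peer, reporter), (theirs_up, theirs_down))):
        if value == (0, 0):
            outstanding.pop(key, None)  # fully corroborated, drop the row
        else:
            outstanding[key] = value
    return mine_up, mine_down


outstanding = {}
a, b, c = "203.0.113.1:800", "203.0.113.5:634", "203.0.113.37:123"
apply_report(outstanding, a, b, 2459368, 34347224)
apply_report(outstanding, a, c, 5954714, 0)
apply_report(outstanding, b, a, 41943040, 2459368)
```

After these three calls, `outstanding` holds only the two non-zero rows from the walkthrough: 5954714 bytes uploaded by 203.0.113.1 to 203.0.113.37, and 7595816 bytes uploaded by 203.0.113.5 to 203.0.113.1.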
Managing incentives:
The above accounting system removes both of the incentives to cheat.
If you claim to have uploaded more data than you really have, those extra bytes would sit unaccounted for in the tracker's table, and the tracker could choose to ignore them when calculating the ratio.
Alternatively, if you download a large amount of data, but only claim to have downloaded a smaller amount, the other peers will eventually work this out and blacklist you, preventing access to the swarm.
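To illustrate the first point, a tracker could compute the ratio from corroborated upload only. This is just a sketch (Python); `credited_ratio` is a made-up name, and how the download side is counted is a policy choice I am not specifying here.

```python
import math


def credited_ratio(claimed_upload, uncorroborated_upload, downloaded):
    """Upload/download ratio that ignores upload no other peer has confirmed."""
    credited = claimed_upload - uncorroborated_upload
    if downloaded == 0:
        return math.inf if credited > 0 else 0.0
    return credited / downloaded
```

With the numbers from the walkthrough, 203.0.113.5 claimed 41943040 bytes uploaded, of which 7595816 are still uncorroborated, so only 34347224 bytes would count towards its ratio until 203.0.113.1 confirms the rest.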