Downloader

Seed/download historical data

The Downloader is a service that seeds and downloads historical data over the BitTorrent protocol. The data is stored as snapshots, which are immutable .seg files.

The ETH core instructs the Downloader to download (and then seed) specific files from the BitTorrent network. The files are identified by their "info hashes", which are a form of content addressing. The files that the ETH core requests are block headers and block bodies. The Downloader then interacts with the BitTorrent network to retrieve the files the ETH core needs.

Info: While all Erigon components are separable and can be run on different machines, the downloader must run on the same machine as Erigon to be able to share downloaded and seeded files.

Start Erigon with snapshots support

Like many other Erigon components (txpool, sentry, rpc daemon), the downloader can be integrated into Erigon or run as a separate process.

By default, the downloader runs inside Erigon when the --snapshots flag is set:

./build/bin/erigon --internalcl --snapshots --datadir=<your_datadir>

Info: The --snapshots flag is compatible with the --prune flag.

Running downloader as a separate process

The Downloader can also run as an independent process. In that case Erigon connects to it through the --snapshots --downloader.api.addr=127.0.0.1:9093 flags.

Before using a separate downloader process the executable must be built:

cd erigon
make downloader

You can then start the downloader:

./build/bin/downloader --downloader.api.addr=127.0.0.1:9093 --torrent.port=42068 --datadir=<your_datadir>

--downloader.api.addr - the address used for internal communication with Erigon

--torrent.port=42068 - the public port on which the BitTorrent protocol listens

You can increase/limit the network usage by adding the following flags:

--torrent.download.rate=512mb --torrent.upload.rate=512mb

The default download rate is 16mb per second and the default upload rate is 4mb per second.

On startup, Erigon sends the list of .torrent files to the Downloader and waits until the download is 100% complete:

./build/bin/erigon --snapshots --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>
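Because the two processes must agree on the API address and share one data directory, it can help to derive both command lines from the same values. The sketch below only prints the commands; the function names downloader_cmd and erigon_cmd are hypothetical helpers, not part of Erigon, and the flags are the ones shown above.

```shell
# Sketch: build matching start commands for a separate downloader and Erigon.
# downloader_cmd and erigon_cmd are illustrative helpers, not Erigon tools.

downloader_cmd() {
  # $1 = datadir, $2 = downloader API address, $3 = BitTorrent port
  printf './build/bin/downloader --downloader.api.addr=%s --torrent.port=%s --datadir=%s\n' "$2" "$3" "$1"
}

erigon_cmd() {
  # $1 = datadir, $2 = downloader API address (must match the downloader's)
  printf './build/bin/erigon --snapshots --downloader.api.addr=%s --datadir=%s\n' "$2" "$1"
}

# Example (prints the two commands, does not run them):
#   downloader_cmd /data/erigon 127.0.0.1:9093 42068
#   erigon_cmd     /data/erigon 127.0.0.1:9093
```

Passing the same datadir and address to both helpers keeps the two invocations consistent when either value changes.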

Use --snap.keepblocks=true to keep retired blocks in the DB instead of deleting them.

Any network/chain can start with snapshot sync:

  • the node will only download snapshots registered in the following repository: https://github.com/ledgerwatch/erigon-snapshot

  • the node will move old blocks from the DB into snapshots of 1K blocks, then merge those snapshots into larger ranges up to 500K blocks, and then automatically start seeding the new snapshots
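The range arithmetic behind the 1K and 500K sizes can be sketched as follows. This is only an illustration: it assumes ranges are aligned to multiples of the segment size, start at block 0, and include the lower bound but not the upper one. The function name segment_range is hypothetical.

```shell
# Sketch: compute the snapshot range that contains a given block,
# assuming ranges are aligned to multiples of the segment size.

segment_range() {
  # $1 = block number, $2 = segment size in blocks
  from=$(( $1 / $2 * $2 ))   # integer division rounds down to the range start
  to=$(( from + $2 ))        # assumed exclusive upper bound
  echo "$from $to"
}

# Example:
#   segment_range 1234567 1000      # range of a freshly retired 1K snapshot
#   segment_range 1234567 500000    # range after merging up to 500K
```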

Creation of a new network or bootnode

You may need to create new snapshots and start seeding them.

Creating new snapshots dumps blocks from the database to .seg files:

erigon snapshots retire --datadir=<your_datadir>

You can change the snapshot size by adding the following flags to the command above:

--from=0 --to=1_000_000 --segment.size=500_000

Next, create the .torrent files, which the downloader will seed automatically. The output format of torrent_hashes is compatible with https://github.com/ledgerwatch/erigon-snapshot.

./build/bin/downloader torrent_hashes --rebuild --datadir=<your_datadir>

Start the downloader (it seeds automatically):

./build/bin/downloader --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>

Seeding snapshots does not require Erigon, although Erigon started with --snapshots also seeds.

Additional info

Creating snapshots does not require a fully synced Erigon; the first few stages are enough. For example:

STOP_AFTER_STAGE=Senders ./build/bin/erigon --snapshots=false --datadir=<your_datadir>

For security, however, it is better to use a fully synced Erigon.

Erigon can use snapshots only after indexing them. Erigon indexes them automatically, but you can also run the indexing yourself (this step is not required for seeding):

./build/bin/erigon snapshots index --datadir=<your_datadir>

Architecture

The Downloader works from the <your_datadir>/snapshots/*.torrent files. These files can be created in 4 ways:

  • Erigon can make the gRPC call downloader.Download(list_of_hashes), which triggers the creation of .torrent files

  • Erigon can create a new .seg file; the Downloader scans the .seg file and creates the .torrent file

  • an operator can manually copy .torrent files (rsync from another server or restore from a backup)

  • an operator can manually copy a .seg file; the Downloader scans the .seg file and creates the .torrent file
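After copying files by hand, it can be useful to see which .seg files do not yet have a .torrent file next to them. The sketch below assumes the .torrent file is named after the segment with ".torrent" appended (for example foo.seg.torrent); that naming and the function name missing_torrents are assumptions, so check them against your own snapshots directory.

```shell
# Sketch: list .seg files that have no matching .torrent file.
# Assumes the torrent for "x.seg" is stored as "x.seg.torrent".

missing_torrents() {
  # $1 = snapshots directory, e.g. <your_datadir>/snapshots
  for seg in "$1"/*.seg; do
    [ -e "$seg" ] || continue            # no .seg files at all: glob did not match
    [ -e "$seg.torrent" ] || basename "$seg"
  done
}

# Example:
#   missing_torrents /data/erigon/snapshots
```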

Erigon does:

  • connect to the Downloader

  • share the list of hashes (see https://github.com/ledgerwatch/erigon-snapshot)

  • wait for all snapshots to download

  • when a .seg file is available, automatically create .idx files (secondary indices, for example to find a block by hash)

  • then switch to normal staged sync (which doesn't require a connection to the Downloader)

  • ensure that snapshot downloading happens only once: even if a new Erigon version includes new pre-verified snapshot hashes, Erigon will not download them (to avoid unpredictable downtime), but it may produce them itself.

Downloader does:

Technical details

  • To prevent attacks, .idx creation uses a random seed, so every node has different .idx files (and the same .seg files)

  • If you add or remove any .seg file manually, you also need to remove the <your_datadir>/snapshots/db folder

How to verify that .seg files have the same checksum as current .torrent files

Use this check if you see strange behavior, bugs, bans, hardware problems, etc.

./build/bin/downloader --verify --datadir=<your_datadir>
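For intuition: a BitTorrent v1 .torrent file stores a SHA-1 hash for each fixed-size piece of the data, so a verification pass amounts to re-hashing the pieces on disk and comparing the results. The sketch below shows only the hashing half and is not how the downloader implements --verify. The names sha1_hex and piece_hashes are hypothetical, and the piece size is whatever the .torrent file declares.

```shell
# Sketch: print the SHA-1 of every fixed-size piece of a file.
# Illustrative only; not the downloader's own verification code.

sha1_hex() {
  # Use sha1sum where available (most Linux systems), else shasum (macOS).
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum | cut -d' ' -f1
  else
    shasum -a 1 | cut -d' ' -f1
  fi
}

piece_hashes() {
  # $1 = file, $2 = piece size in bytes
  file=$1
  piece=$2
  size=$(( $(wc -c < "$file") ))
  i=0
  while [ $(( i * piece )) -lt "$size" ]; do
    # Read piece number i and hash it; the last piece may be shorter.
    dd if="$file" bs="$piece" skip="$i" count=1 2>/dev/null | sha1_hex
    i=$(( i + 1 ))
  done
}
```

Running piece_hashes with the same piece size on two copies of a .seg file and comparing the output is a quick way to locate the first piece that differs.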

Faster rsync

rsync -aP --delete -e "ssh -T -o Compression=no -x"

Release details

Start automatic commit of new hashes to branch master

crontab -e

Then add the following entry in the editor that opens:

@hourly cd <erigon_source_dir> && ./cmd/downloader/torrent_hashes_update.sh <your_datadir> <network_name> 1>&2 2>> ~/erigon_cron.log

The script pushes to the auto branch. Before a release, merge auto into main manually.

Command line options

To display the available options for the downloader, run:

./build/bin/downloader --help

The --help flag listing is reproduced below for your convenience.

Commands

snapshot downloader

Usage:
   [flags]
   [command]

Examples:
go run ./cmd/downloader --datadir <your_datadir> --downloader.api.addr 127.0.0.1:9093

Available Commands:
  completion     Generate the autocompletion script for the specified shell
  help           Help about any command
  torrent_hashes 

Flags:
      --datadir string                 Data directory for the databases (default "/home/admin/.local/share/erigon")
      --downloader.api.addr string     external downloader api network address, for example: 127.0.0.1:9093 serves remote downloader interface (default "127.0.0.1:9093")
      --downloader.disable.ipv4        Turns off ipv4 for the downloader
      --downloader.disable.ipv6        Turns off ipv6 for the downloader
  -h, --help                           help for this command
      --log.console.json               Format console logs with JSON
      --log.console.verbosity string   Set the log level for console logs (default "info")
      --log.dir.json                   Format file logs with JSON
      --log.dir.path string            Path to store user and error logs to disk
      --log.dir.prefix string          The file name prefix for logs stored to disk
      --log.dir.verbosity string       Set the log verbosity for logs stored to disk (default "info")
      --log.json                       Format console logs with JSON
      --metrics                        Enable metrics collection and reporting
      --metrics.addr string            Enable stand-alone metrics HTTP server listening interface (default "127.0.0.1")
      --metrics.port int               Metrics HTTP server listening port (default 6060)
      --nat string                     NAT port mapping mechanism (any|none|upnp|pmp|stun|extip:<IP>)
                                       	     "" or "none"         default - do not nat
                                       	     "extip:77.12.33.4"   will assume the local machine is reachable on the given IP
                                       	     "any"                uses the first auto-detected mechanism
                                       	     "upnp"               uses the Universal Plug and Play protocol
                                       	     "pmp"                uses NAT-PMP with an auto-detected gateway address
                                       	     "pmp:192.168.0.1"    uses NAT-PMP with the given gateway address
                                       	     "stun"               uses STUN to detect an external IP using a default server
                                       	     "stun:<server>"      uses STUN to detect an external IP using the given server (host:port)
                                       
      --pprof                          Enable the pprof HTTP server
      --pprof.addr string              pprof HTTP server listening interface (default "127.0.0.1")
      --pprof.cpuprofile string        Write CPU profile to the given file
      --pprof.port int                 pprof HTTP server listening port (default 6060)
      --torrent.conns.perfile int      connections per file (default 10)
      --torrent.download.rate string   bytes per second, example: 32mb (default "16mb")
      --torrent.download.slots int     amount of files to download in parallel. If network has enough seeders 1-3 slot enough, if network has lack of seeders increase to 5-7 (too big value will slow down everything). (default 3)
      --torrent.maxpeers int           unused parameter (reserved for future use) (default 100)
      --torrent.port int               port to listen and serve BitTorrent protocol (default 42069)
      --torrent.staticpeers string     Comma separated enode URLs to connect to
      --torrent.upload.rate string     bytes per second, example: 32mb (default "4mb")
      --torrent.verbosity int          0=silent, 1=error, 2=warn, 3=info, 4=debug, 5=detail (must set --verbosity to equal or higher level and has default: 3) (default 2)
      --trace string                   Write execution trace to the given file
      --verbosity string               Set the log level for console logs (default "info")
      --verify                         Force verify data files if have .torrent files

Use " [command] --help" for more information about a command.
