How it works¶
This section gives an overview of the Reconfigure.io service. We’ll start by running through our workflow and tooling, and then take a look at our system architecture and the steps we go through to get your code into a suitable format for programming an FPGA instance.
Reconfigure.io is a platform as a service (PaaS), which takes Go code, compiles and optimizes it, and deploys to FPGA instances. Depending on which platform you’re using, FPGA instances are either cloud-based (AWS F1) or on-premises. Either way, you will code in Go and interact with our service using
reco, our simple command-line tool.
Workflow and tooling¶
Access to the Reconfigure.io service is through our command-line tool, reco, which you use to check, upload and simulate your code, manage builds and deploy to an FPGA instance. You will be guided to install reco when you sign up, but if you need guidance on installing or updating it, head here.
reco is a simple tool with several intuitive commands. We’ll look at some of these in the relevant sections below, where commands are described in bullet points. For a full list, see the Tool Usage Reference.
Let’s take a look at the workflow from coding to deployment:
All the code you write will be in Go. You can create projects in your Go workspace and edit with your favourite editor. A Reconfigure.io project is made up of at least two Go programs, one for the FPGA, and at least one for the host CPU, shown below within the
cmd directory (you may have multiple host side commands for benchmarking etc.). We use a subset of the Go language for FPGA-side code and any new additions to the scope will be flagged up in our Release Notes. Host-side code is written in standard Go.
├── cmd
│   └── test-my-project
│       └── main.go
├── main.go
Your Reconfigure.io projects are developed in your Go environment so you can use standard Go tooling throughout the process:
go build and
go test can be used to flag up any semantic or syntactic errors and run tests against your FPGA code. You can read more about the Go testing framework here. You can also benchmark your designs using the Go testing framework: the benchmark is written into your program and then run during deployment, so the measurement comes from the process running on hardware.
├── cmd
│   └── test-my-project
│   │   └── main.go
│   └── bench-my-project
│       └── main.go
├── main.go
├── main_test.go
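As a rough illustration of the testing and benchmarking workflow described above, here is a minimal sketch in standard Go. The function Sum and the benchmark BenchmarkSum are hypothetical names invented for this example and are not part of any Reconfigure.io API. In a real project the benchmark would normally live in a _test.go file and be run with go test -bench; the sketch uses testing.Benchmark from the standard library so that it can also run from an ordinary command.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum is a hypothetical stand-in for the logic you want to test.
// It adds up a slice of 32-bit values, wrapping on overflow.
func Sum(values []uint32) uint32 {
	var total uint32
	for _, v := range values {
		total += v
	}
	return total
}

// BenchmarkSum has the signature the Go testing framework expects
// for benchmarks. The framework chooses b.N for us.
func BenchmarkSum(b *testing.B) {
	input := make([]uint32, 1024)
	for i := range input {
		input[i] = uint32(i)
	}
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		Sum(input)
	}
}

func main() {
	// The kind of assertion a test in main_test.go would make.
	if got := Sum([]uint32{1, 2, 3}); got != 6 {
		panic(fmt.Sprintf("Sum returned %d, want 6", got))
	}

	// testing.Benchmark runs a benchmark function outside of `go test`
	// and returns timing information.
	result := testing.Benchmark(BenchmarkSum)
	fmt.Println(result)
}
```

This measures the function on the CPU only; it says nothing about how the same logic would perform once built for an FPGA.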
Once you are happy with your code you can perform a quick check to make sure it is compatible with our compiler. If your code contains any errors, or you’ve used elements of Go that are out of scope for FPGA-side code, these will be flagged up during this check.
- reco check locally type checks your FPGA code.
Next, you can simulate how your program will run on hardware. Any errors will be highlighted here, and simulating is considerably quicker than creating a build (minutes rather than hours), so it will save you time during the development process. Simulations will time out if they don’t complete within one hour.
- reco sim run <my_cmd> simulates how your program would run on an FPGA.
Our compiler takes your Go code through several stages to get it into a suitable format for programming an FPGA instance. First, your code is translated into a language called Teak; then, using the Teak output, we can generate dataflow graphs. Using the graph command you can generate a dataflow graph for your program at any time, allowing you to analyze and optimize its performance.
The ability to generate graphs is a temporary feature. Due to the complexity of the output we suggest you share your graphs with us on our forum so our engineers can assist you in optimizing your code.
- reco graph gen generates a dataflow graph from the program in your current directory.
- reco graph list lists all graphs in your project along with their unique IDs.
- reco graph open <graph_ID> lets you view any graph in your default PDF viewer.
Next, you can build your project. Our compiler will check compatibility and convert your code into an image suitable for deploying to an FPGA instance. Builds will time out if they don’t complete within 12 hours.
Build times are currently in the region of 4 hours. This is longer than we would like and is partly due to the underlying silicon vendor tools, which we are currently working to address. Although the build time is relatively long, building is not something you will have to do very often during program development; you will mostly use our hardware simulator, which takes minutes rather than hours.
- reco build run uploads the code from your current directory to the Reconfigure.io service. Building will automatically start once the upload has completed. Your Go code will be compiled and optimized to run on an FPGA instance. It’s a good idea to add a message to your build, just as you would with a git commit, so you can remember what it’s for later. To do this, use the --message flag followed by your short message, like this: reco build run -m "my helpful message".
- reco build list lists all builds for the current project along with their statuses. Each build is date-stamped and given a unique ID, and you can see any messages you have included, so you can always make sure you’re using the correct build when working on large and complex projects.
Once your build is complete you can deploy the image to an FPGA instance. This process programs the FPGA with your compiled and optimized code and runs your chosen host-side command on the CPU.
- reco deploy run <build_ID> <cmd> will deploy your build to the FPGA and run your chosen command on the host CPU.
Attention cloud users!
Live deployments are charged to your account (open-source users get 20 hours/month for free) and if you run out of allotted hours any live deployments you have running will be terminated. If your deployment is designed to run indefinitely as a service, it is important to remember to stop it:
reco deployment stop <deployment-ID> to avoid running out of hours. It is good practice to include a timeout for services, in case you forget to stop them. To do this you can run
reco deployment run <build-ID> timeout 30m <cmd> to ensure that the service is active for a maximum of 30 minutes. You can set whatever timeout you want, using hours, minutes (for example 1m) or seconds.
Reconfigure.io programs have a simple structure: code for the FPGA and code for the host CPU, all written in Go:
You can have multiple host-side commands per program, and once your code is built each host command will be available to run with the FPGA-side code during deployment. For example, as indicated in the diagram above, you may have one host-side command that just feeds data to the FPGA, receives the output and relays it, and another host-side command that, as well as feeding and receiving data, runs a benchmark (using the Go benchmarking framework) to check the performance of the FPGA code.
When you use reco to simulate, build and deploy your programs, you will work within a project. You can list items per project, which is really useful when you’ve got several work streams going at the same time, each with several builds and deployments.
You should create a new project for each program you work on. If you run a
deploy without setting which project to use first, you will be prompted to run
reco set-project <project name> before continuing. If it’s a new program you are working on you will need to run
reco create-project followed by a new project name.
- create-project is used to create a new project
- projects displays a list of all active projects for your account
- set-project sets a project to use for the program code you’re currently working on
Our software-defined chips are based on FPGA instances, each of which is made up of an FPGA, dedicated RAM (we call this shared memory) and a host CPU. For on-prem customers, other high-performance IO will be available; 2x 10 gigabit Ethernet is standard.
Data can be shared between the FPGA chip and host CPU via shared memory; the host can allocate blocks in shared memory and pass pointers to the FPGA, and the FPGA can read and write to and from those pointers. The FPGA also has on-chip block RAM, which it can allocate directly.
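To make the pointer-passing idea concrete, here is a small software model of it in plain Go. Everything in it is an assumption made for illustration: SharedMemory, Alloc and fpgaDouble are hypothetical names, the "pointers" are simply offsets into a slice, and none of this is the Reconfigure.io API. It only sketches the division of labour: the host allocates blocks and hands over their addresses, and the FPGA side reads from one block and writes to another.

```go
package main

import "fmt"

// SharedMemory is a hypothetical model of the RAM shared between
// the host CPU and the FPGA. Addresses are word offsets.
type SharedMemory struct {
	words []uint32
	next  int // offset of the first unallocated word
}

// NewSharedMemory creates a model memory holding size words.
func NewSharedMemory(size int) *SharedMemory {
	return &SharedMemory{words: make([]uint32, size)}
}

// Alloc reserves n words and returns the address of the block.
func (m *SharedMemory) Alloc(n int) (int, error) {
	if n < 0 || m.next+n > len(m.words) {
		return 0, fmt.Errorf("cannot allocate %d words", n)
	}
	addr := m.next
	m.next += n
	return addr, nil
}

// fpgaDouble stands in for FPGA-side code: it reads n words starting
// at address in and writes each value doubled starting at address out.
func fpgaDouble(m *SharedMemory, in, out, n int) {
	for i := 0; i < n; i++ {
		m.words[out+i] = m.words[in+i] * 2
	}
}

func main() {
	mem := NewSharedMemory(16)

	// Host side: allocate an input block and an output block.
	in, err := mem.Alloc(4)
	if err != nil {
		panic(err)
	}
	out, err := mem.Alloc(4)
	if err != nil {
		panic(err)
	}

	// Host side: write the input data into shared memory.
	copy(mem.words[in:in+4], []uint32{1, 2, 3, 4})

	// Hand both addresses to the "FPGA", then read back the result.
	fpgaDouble(mem, in, out, 4)
	fmt.Println(mem.words[out : out+4]) // prints [2 4 6 8]
}
```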
CPU vs FPGA¶
The Go language is designed for writing concurrent programs, which you can read more about here. Go is normally used to write for traditional CPUs, where concurrent programming can take advantage of multi-core CPUs to perform several operations in parallel. But, when we optimize your Go for an FPGA, this potential for parallel processing is drastically increased.
For example, a goroutine running on a CPU is a tiny light-weight thread running within a bigger thread, with just one big thread per CPU core. There is potential for parallelism here, but only one operation can happen per core per unit of time. On an FPGA, one goroutine translates to a small chunk of circuitry, running continuously, so you could create a million of them, and they could all do their work, all the time.
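The goroutine model described above can be seen in a small pipeline written in standard Go. The function names (generate, square, SumOfSquares) are made up for this sketch, and it runs on a CPU; it is not a statement about which constructs the FPGA-side subset of Go accepts. The point is only the shape: each stage is a goroutine that runs continuously and passes values along a channel, which is the kind of structure that maps naturally onto independent pieces of circuitry.

```go
package main

import "fmt"

// generate emits each value on a channel, then closes it.
func generate(values []int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, v := range values {
			out <- v
		}
	}()
	return out
}

// square reads values as they arrive and emits their squares.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

// SumOfSquares wires the two stages together and adds up the results.
func SumOfSquares(values []int) int {
	total := 0
	for v := range square(generate(values)) {
		total += v
	}
	return total
}

func main() {
	fmt.Println(SumOfSquares([]int{1, 2, 3, 4})) // prints 30
}
```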
A note about memory access – AXI / SMI¶
Our current standard way of having the FPGA talk to shared memory is using the AXI protocol (find more on this in our third tutorial). AXI is designed to work with multicore CPUs, with several cores accessing memory at the same time. But for us, as we’re using Go for FPGAs, the level of parallelism is much higher. We’re dealing with many goroutines, potentially thousands, all trying to access memory at the same time. Managing this with AXI is not straightforward.
Our engineers have developed a new protocol – SMI (Scalable Multiprotocol Infrastructure) – which addresses the issue of fine-grained parallelism, as well as simplifying code and reducing boilerplate for our users. It’s available for testing from Reconfigure.io v0.17.0 onwards and will be fully rolled out as our standard method for accessing memory very soon.
For more information, please see our blog post. You can also check out our examples: we’ve included a version of our histogram-array code that uses SMI rather than AXI. We’ve also included an SMI-ready version of our template so you can start playing around with your own applications.
You will notice that with SMI we have introduced a
reco.yml file per program. This contains some simple settings: Infrastructure (SMI or AXI), the memory access bandwidth (max 512 bit, min 64 bit) and the number of ports you require for your application. So, for a program using SMI, with one read and one write port, the settings should appear like this:
memory_interface: smi
memory_width: 512
ports: 2
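If you want to sanity-check these settings before uploading, something like the following sketch could be used. It is not part of reco and makes several assumptions: the Settings type and ParseSettings function are hypothetical, the lowercase value axi is assumed by analogy with smi, ports is assumed to be at least 1, and only the 64 to 512 bit range stated above is checked for the width. It reads simple key: value lines with the Go standard library rather than a full YAML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Settings mirrors the three keys shown in the example reco.yml.
type Settings struct {
	MemoryInterface string
	MemoryWidth     int
	Ports           int
}

// ParseSettings reads "key: value" lines and applies basic checks.
func ParseSettings(text string) (Settings, error) {
	var s Settings
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // skip blank lines
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return s, fmt.Errorf("malformed line %q", line)
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		switch key {
		case "memory_interface":
			// "axi" is an assumed spelling; only "smi" appears in the docs.
			if value != "smi" && value != "axi" {
				return s, fmt.Errorf("unknown memory interface %q", value)
			}
			s.MemoryInterface = value
		case "memory_width":
			n, err := strconv.Atoi(value)
			if err != nil || n < 64 || n > 512 {
				return s, fmt.Errorf("memory_width %q is not in the range 64 to 512", value)
			}
			s.MemoryWidth = n
		case "ports":
			n, err := strconv.Atoi(value)
			if err != nil || n < 1 {
				return s, fmt.Errorf("ports %q is not a positive number", value)
			}
			s.Ports = n
		default:
			return s, fmt.Errorf("unknown setting %q", key)
		}
	}
	return s, scanner.Err()
}

func main() {
	s, err := ParseSettings("memory_interface: smi\nmemory_width: 512\nports: 2\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```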
Go compilation stages¶
We take your code through several stages to get it ready to program an FPGA:
- Teak – first, your Go is translated into Teak, a data-flow language with its roots in research from the University of Manchester. This allows us (and you, using graphs) to optimize your code for the FPGA architecture.
- Verilog RTL representation - this ‘register transfer level’ description is suitable for taking your code into the traditional FPGA development process.
- Verilog netlist - we then use standard tooling to compile your code into a netlist which relates to the FPGA’s logic components.
- Place and route – this is where we decide where on the physical FPGA chip to place the components from the netlist.
- Bitstream - the last part of the process is using the place and route output to generate a bitstream capable of programming the FPGA.