Machine Learning PC Build

This page describes my most recent desktop PC build, which is designed specifically for training convolutional neural nets on a single GPU with TensorFlow.

Parts List

GPU Selection

TensorFlow-GPU runs on Nvidia graphics cards only. The main factors to consider when selecting a graphics card are memory capacity and memory bandwidth. Here are a few options that could be considered for a machine learning desktop PC:

  • GTX 1050 Ti – 4GB, 112GB/sec, around $160
  • GTX 1060 – 6GB, 192GB/sec, around $300
  • GTX 1070 – 8GB, 256GB/sec, around $400
  • GTX 1080 – 8GB, 320GB/sec, around $550
  • GTX 1080 Ti – 11GB, 484GB/sec, around $800
  • Titan Xp – 12GB, 547.7GB/sec, around $1100
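
To compare these options on value rather than raw specs, the list above can be turned into a small table of memory and bandwidth per $100. This is only a rough sketch: the figures are copied from the list, and the names CARDS and value_table are made up for illustration.

```python
# (name, memory in GB, memory bandwidth in GB/sec, approximate price in USD)
# All figures are copied from the list above.
CARDS = [
    ("GTX 1050 Ti", 4, 112.0, 160),
    ("GTX 1060", 6, 192.0, 300),
    ("GTX 1070", 8, 256.0, 400),
    ("GTX 1080", 8, 320.0, 550),
    ("GTX 1080 Ti", 11, 484.0, 800),
    ("Titan Xp", 12, 547.7, 1100),
]


def value_table(cards):
    """Return (name, GB per $100, GB/sec per $100) for each card."""
    rows = []
    for name, mem_gb, bw_gb_per_sec, price in cards:
        rows.append((name, 100.0 * mem_gb / price, 100.0 * bw_gb_per_sec / price))
    return rows


if __name__ == "__main__":
    for name, mem_per_100, bw_per_100 in value_table(CARDS):
        print("{:12s} {:5.2f} GB per $100   {:6.1f} GB/sec per $100".format(
            name, mem_per_100, bw_per_100))
```

Value per dollar is only part of the picture: a model that does not fit in GPU memory cannot be trained at the desired batch size no matter how cheap the card is.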

Effective cooling is very important when running a device with a 250W TDP. I selected the MSI GTX 1080 Ti Gaming X card because it is well reviewed for its cooling. Unfortunately, although the card does an excellent job of keeping itself cool, it does so by heating up the motherboard and other components! Some cards have a different type of cooler that blows hot air out the back of the case through the PCIe slots. It may be best to go with this type of design instead to ensure that the system as a whole runs cool. The only drawback is that these “squirrel cage” blower fans tend to be somewhat louder than open-air fans.

CPU Selection

Intel’s new i5-8400 6-core CPU provides a good balance of cost and performance. Although TensorFlow tries to use the GPU as much as possible, it still relies on the CPU for certain operations that lack GPU implementations. When running TensorFlow benchmarks, the CPU utilization is only around 20% to 30%.

However, more typical code is not going to be this efficient. Even if the code is well-written, there are often trade-offs between performance optimization and readability. During training of one particular model using lightly optimized code, the performance monitor indicated 100% CPU utilization. This seems to suggest that an even faster CPU would be helpful when experimenting with code that has not been fully optimized for performance. Unfortunately, Intel CPUs with more than 6 cores are not only expensive, but also require expensive LGA2066 motherboards. AMD Ryzen CPUs may be worth looking into.

One possible issue is that I am using a pre-compiled version of TensorFlow-GPU that does not utilize AVX and AVX2 extensions. By building TensorFlow-GPU from source and enabling these 256-bit vector instructions, it may be possible to speed up CPU performance by up to a factor of two. I have yet to try this.
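
Before spending time on a source build, it may be worth confirming that the CPU reports these extensions at all. The sketch below assumes a Linux system, where the flags are listed in /proc/cpuinfo; it does not work on Windows, and it says nothing about whether a given TensorFlow binary actually uses the instructions. The function names are my own.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags from /proc/cpuinfo (Linux only); empty set on failure."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    for ext in ("avx", "avx2"):
        print(ext, "supported" if ext in flags else "not reported")
```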

Motherboard

The Intel i5-8400 is only compatible with the new Z370 motherboards. It is very important to check that the motherboard is compatible with all components that are being purchased. I have had good results with the ASRock Z370M PRO4 motherboard. This motherboard also has extra memory slots and an extra M.2 slot for future expandability.

Memory

16GB is a good amount of memory to start out with. To get good memory bandwidth, it is necessary to split this into two 8GB sticks, since the CPU has dual memory channels and a single stick would use only one of them. The i5-8400 CPU is designed for use with DDR4-2666 memory.
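
The benefit of the second stick can be estimated from the theoretical peak bandwidth, which is the transfer rate times the bus width times the number of channels. The sketch below assumes the standard 64-bit DDR4 channel width and gives an upper bound only; real-world throughput is lower.

```python
def peak_bandwidth_gb_per_sec(mega_transfers_per_sec, channels, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    # DDR4-2666 performs roughly 2666 million transfers per second.
    print("one stick :", peak_bandwidth_gb_per_sec(2666, channels=1), "GB/sec")
    print("two sticks:", peak_bandwidth_gb_per_sec(2666, channels=2), "GB/sec")
```

With these assumptions, a single stick peaks at about 21.3GB/sec and two sticks at about 42.7GB/sec.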

SSD

I was very excited to try out the new Samsung 960 EVO M.2 SSD. This SSD is mounted directly onto the motherboard, so there are no cables or mounting brackets involved. More importantly, the M.2 slot is connected to the Z370 chipset with 4 PCIe 3.0 lanes. This high bandwidth connection enables sequential read and write speeds of 3200MB/sec and 1500MB/sec respectively. This is much faster than a SATA 3.0 SSD, which is limited by the interface to 600MB/sec.

Case

Money does not buy happiness when it comes to computer cases. Expensive “gaming” cases are built with heavy-gauge sheet metal and are therefore very heavy. It is better to find a lightweight microATX case that can be moved around easily.

The most important thing is to make sure that there is enough room for a full-length graphics card. The 2-fan MSI card that I used is about 10.5″ long. 3-fan models can easily reach 12.5″ and may not fit in the Rosewill FBM-05 case that I selected.

Good airflow is essential if you are using a GTX 1080 Ti with open-air fans. The FBM-05 case comes with front and rear fans, and the power supply also helps to remove hot air. If you are using a GPU with a squirrel-cage blower, case ventilation becomes much less important.

Power Supply

The GTX 1080 Ti is recommended for use with a power supply rated for at least 600W. The power supply also needs to have at least two 8-pin GPU power connectors.

Multiple GPUs?

I thought about building a computer with multiple GPUs, but this idea is not as good as it sounds. The GPU is connected directly to the CPU using 16 PCIe lanes. Mainstream CPUs such as the i5-8400 have enough lanes for only one GPU. If you use two GPUs, a special motherboard will be needed and each GPU will have access to only 8 PCIe lanes instead of 16. Transfers between the CPU and each GPU will therefore take up to twice as long. This could potentially create a bottleneck, and the CPU itself may also prove to be a weak link when using two GPUs.
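
To put the x16 versus x8 difference in numbers, here is a rough estimate of the time needed to copy one batch of images to the GPU. It assumes PCIe 3.0 (8 GT/sec per lane with 128b/130b encoding) and ignores protocol overhead, so real transfers will be slower. The batch shape is a hypothetical example: 32 float32 images of 299 x 299 x 3.

```python
def pcie3_lane_bytes_per_sec():
    """Theoretical PCIe 3.0 throughput per lane, in bytes/sec."""
    # 8 gigatransfers/sec, 128 payload bits per 130 bits sent, 8 bits per byte.
    return 8e9 * (128.0 / 130.0) / 8.0


def transfer_time_ms(num_bytes, lanes):
    """Best-case time in milliseconds to move num_bytes over the given lanes."""
    return 1000.0 * num_bytes / (lanes * pcie3_lane_bytes_per_sec())


# Hypothetical batch: 32 images, 299 x 299 pixels, 3 channels, 4 bytes per value.
BATCH_BYTES = 32 * 299 * 299 * 3 * 4

if __name__ == "__main__":
    for lanes in (16, 8):
        print("x{:<2d}: {:.2f} ms per batch".format(
            lanes, transfer_time_ms(BATCH_BYTES, lanes)))
```

Whether the doubled transfer time matters depends on how long the GPU spends computing on each batch; if compute dominates, the narrower link may barely be noticeable.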

The much more expensive Intel i9-7900X has 10 cores and 44 PCIe lanes, so it can definitely handle two GPUs. This may be a good solution for training a single model on two GPUs in parallel. However, if the idea is to train two experimental nets at the same time to see which one works better, it is more cost-effective to build two computers like the one on this page with one GPU each.

Assembly

The motherboard comes with an instruction manual that has most of the information needed for assembly. I started out by installing the CPU, memory, and SSD on the motherboard, and the power supply in the case. Then I attached the motherboard to the case and made all the connections. Installing the GPU is the final step.

CPU fans are always a little tricky. The instructions that come with the CPU should be followed very carefully.

The computer case should be grounded during assembly. One way to ground the case is to plug it into a surge protector that is switched off. An electrostatic wrist strap is also recommended. Static discharge can easily destroy a CPU or memory chip. It is also important to avoid touching the CPU contacts, which are very delicate. Finally, remember that sheet metal edges are very sharp!

TensorFlow Benchmarks

TensorFlow performance benchmarks can be found here: https://www.tensorflow.org/performance/benchmarks

The benchmarks are very easy to use. Download the repository from GitHub and cd into the scripts/tf_cnn_benchmarks directory. If you get an import error, you may need to comment out line 23 of preprocessing.py. It is OK to comment out this “interleave_ops” import because it is only needed when benchmarking with real ImageNet data, not synthetic data. Use the following command (all on one line) to run a benchmark with a given model and batch size:

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size={n} --model={inception3, resnet50, resnet152, alexnet, vgg16} --variable_update=parameter_server
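
To run all five models in one go, the command above can be wrapped in a small script. This is a sketch rather than part of the benchmark suite: the helper names are my own, the batch sizes (512 for alexnet, 32 otherwise) are the ones used in the results on this page, and the script is assumed to be run from the scripts/tf_cnn_benchmarks directory. By default it only prints the commands.

```python
import subprocess
import sys

MODELS = ["inception3", "resnet50", "resnet152", "alexnet", "vgg16"]


def batch_size_for(model):
    """Batch size per model: 512 for alexnet, 32 for everything else."""
    return 512 if model == "alexnet" else 32


def build_command(model, batch_size, num_gpus=1):
    """Build the argument list for one benchmark run."""
    return [
        sys.executable, "tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
        "--variable_update=parameter_server",
    ]


def run_all(dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    for model in MODELS:
        cmd = build_command(model, batch_size_for(model))
        print(" ".join(cmd))
        if not dry_run:
            subprocess.call(cmd)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Change dry_run to False to actually launch the runs. On a card with less memory, a smaller batch size may be needed for the larger models.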

TensorFlow Benchmark Results

  • TESLA K80 – Benchmark results from tensorflow.org, tested on Google Compute Engine with a single GPU. Amazon EC2 also uses TESLA K80 GPUs, so this is similar to what you would get with an Amazon EC2 p2.xlarge instance for about $0.90 per hour.
  • GTX 1060 6GB – Benchmark results from my old PC, with an AMD FX-8320e CPU. Tested on pre-built TensorFlow 1.4 for Windows. Due to limited GPU memory, batch_size 24 was used for resnet152.
  • GTX 1080 Ti 11GB – Benchmark results from the PC described on this page. Tested on pre-built TensorFlow 1.4 for Windows.
  • TESLA P100 – Benchmark results from tensorflow.org, tested on the Nvidia DGX-1 with a single GPU. The TESLA P100 is currently the top-of-the-line datacenter solution, soon to be replaced with TESLA V100. For comparison only. The DGX-1 with 8 P100 GPUs is priced at $129,000, so you would never actually use it with a single GPU.

Results are in images/sec; n is the batch size.

Model                  TESLA K80   GTX 1060 6GB   GTX 1080 Ti 11GB   TESLA P100
Inception V3 (n=32)    29.3        53.8           128.21             128
ResNet 50 (n=32)       49.5        66.18          187.8              195
ResNet 152 (n=32)      20          29.61          75.58              82.7
AlexNet (n=512)        656         1113.25        2714.54            2987
vgg16 (n=32)           35.4        42.54          107.44             144
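
The raw numbers are easier to interpret as ratios. The sketch below takes the TESLA K80, GTX 1080 Ti, and TESLA P100 columns from the results above and computes how the GTX 1080 Ti compares with the other two; the dictionary layout and the ratios helper are made up for this example.

```python
# Images/sec, copied from the benchmark results above.
RESULTS = {
    "Inception V3": {"K80": 29.3, "1080 Ti": 128.21, "P100": 128.0},
    "ResNet 50": {"K80": 49.5, "1080 Ti": 187.8, "P100": 195.0},
    "ResNet 152": {"K80": 20.0, "1080 Ti": 75.58, "P100": 82.7},
    "AlexNet": {"K80": 656.0, "1080 Ti": 2714.54, "P100": 2987.0},
    "vgg16": {"K80": 35.4, "1080 Ti": 107.44, "P100": 144.0},
}


def ratios(results, card, baseline):
    """Return throughput of card divided by throughput of baseline, per model."""
    return {model: row[card] / row[baseline] for model, row in results.items()}


if __name__ == "__main__":
    vs_k80 = ratios(RESULTS, "1080 Ti", "K80")
    vs_p100 = ratios(RESULTS, "1080 Ti", "P100")
    for model in RESULTS:
        print("{:13s} {:.2f}x K80   {:.2f}x P100".format(
            model, vs_k80[model], vs_p100[model]))
```

Keep in mind that the four columns were measured on different machines and operating systems, so the ratios are indicative only.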

Update – Linux Installation

I recently installed Ubuntu 16.04 LTS on this machine. Here are a few notes:

  • Update UEFI to the latest version (otherwise, Ubuntu may not load properly).
  • It may be necessary to add the “nomodeset” option to the linux command when loading the installer, and the first time that the installed OS is loaded. This temporary solution allows Linux to load before the proprietary NVIDIA GTX 1080 Ti drivers are installed. Here is a video that I found helpful: https://www.youtube.com/watch?v=OTmZYzaxR_k
  • Here is a video I found that shows how to install TensorFlow GPU and its dependencies: https://www.youtube.com/watch?v=rILtTjrecQc. This video is just slightly out of date. I installed TensorFlow 1.4, cuDNN 6.0, and the latest version of Anaconda. It is important to install versions of CUDA and cuDNN that are compatible with TensorFlow – not necessarily the latest versions.
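
Once everything is installed, a quick way to confirm that TensorFlow can see the GPU is to list the local devices. This is a minimal sketch: gpu_device_names is a helper name of my own, and it returns an empty list if TensorFlow is not installed at all.

```python
def gpu_device_names():
    """Return the names of the GPU devices that TensorFlow reports."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    if names:
        print("GPU devices found:", ", ".join(names))
    else:
        print("No GPU devices found; check the driver, CUDA, and cuDNN versions.")
```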

NOTE: This post was originally published November 2017 (Old Website)
