Worth Reading: Cherry picker

For big data analytics jobs, especially recurring jobs, finding a good cloud configuration (number and type of machines, CPU, memory, disk and network options) can make a big difference to overall cost and runtimes. Likewise, a poor choice can seriously degrade performance and/or increase costs. … How do you find the good configurations though? Just on instance types, Amazon EC2 and Microsoft Azure each offer over 40 (GCP offered 18 at the time of writing, plus the ability to customise VM memory and cores). And that’s before we’ve started to experiment with cluster size… An exhaustive search of the space is clearly way too expensive, especially as each data point requires a run of the job. —The Morning Paper
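
To get a feel for why exhaustive search is impractical, here is a minimal back-of-the-envelope sketch. Only the figure of roughly 40 instance types comes from the quote above. The cluster sizes, the cost per trial run, and the function names are hypothetical values picked purely for illustration.

```python
from itertools import product


def count_configurations(instance_types, cluster_sizes):
    """Count every (instance type, cluster size) pair in the search space."""
    return sum(1 for _ in product(instance_types, cluster_sizes))


def exhaustive_search_cost(num_configs, cost_per_run):
    """Total cost if each configuration needs one full run of the job."""
    return num_configs * cost_per_run


# About 40 instance types, per the quote; the names are placeholders.
instance_types = [f"type-{i}" for i in range(40)]

# Hypothetical cluster sizes to try (an assumption, not from the source).
cluster_sizes = [2, 4, 8, 16, 32]

# Hypothetical cost of one trial run in dollars (an assumption).
cost_per_run = 10.0

num_configs = count_configurations(instance_types, cluster_sizes)
total_cost = exhaustive_search_cost(num_configs, cost_per_run)

print(f"{num_configs} configurations, ${total_cost:.2f} to try them all once")
```

Even with only two dimensions and a handful of cluster sizes, the number of trial runs grows multiplicatively, and adding disk or network options as further dimensions would multiply it again.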