A decision tree recursively splits a feature space ℝ^d and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work studies heuristic algorithms that compute a decision tree from training data, typically aiming to minimize the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number d of dimensions of the feature space. We show that it can be solved in O(n^{2d+1}) time, but under reasonable complexity-theoretic assumptions it is not possible to achieve f(d)·n^{o(d/log d)} running time, where n is the number of training examples. The problem is solvable in (dR)^{O(dR)}·n^{1+o(1)} time if there are exactly two classes and R is an upper bound on the number of tree leaves labeled with the smallest class.
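To make the model concrete, the following is a minimal sketch of such a decision tree with axis-aligned threshold splits over ℝ^d; the particular tree, thresholds, and class labels are hypothetical illustrations, not the paper's algorithm or data.

    # Minimal sketch of the decision-tree model described above: each inner
    # node splits the feature space with an axis-aligned threshold test,
    # and each leaf assigns a class label. The concrete tree below is a
    # hypothetical example.
    from dataclasses import dataclass
    from typing import Optional, Sequence


    @dataclass
    class Node:
        # Leaf: only `label` is set. Inner node: `feature`, `threshold`,
        # and both children are set; points with x[feature] <= threshold
        # go to the left child.
        label: Optional[int] = None
        feature: int = 0
        threshold: float = 0.0
        left: Optional["Node"] = None
        right: Optional["Node"] = None


    def classify(tree: Node, x: Sequence[float]) -> int:
        """Follow threshold tests from the root to a leaf; return its label."""
        while tree.label is None:
            tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
        return tree.label


    # A tree over R^2 with two inner nodes and three leaves. With two
    # classes, R would bound the number of leaves carrying the rarer label;
    # here one leaf is labeled 1, so R = 1 suffices for this tree.
    tree = Node(feature=0, threshold=0.5,
                left=Node(label=0),
                right=Node(feature=1, threshold=1.0,
                           left=Node(label=1), right=Node(label=0)))

    print(classify(tree, (0.2, 3.0)))  # -> 0 (first test sends it left)
    print(classify(tree, (0.9, 0.4)))  # -> 1 (right, then left)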