A decision tree is a supervised learning method for predicting the output of a target variable, effective in both classification and regression problems. It uses the iterative dichotomiser 3 (ID3) algorithm to calculate information gain for features to optimize class separation and build nodes until all features are used or leaf nodes remain. Key concepts include entropy, which measures disorder, and information gain, which indicates the effectiveness of features in classification.