# Decision Trees : Part 2

To understand in a better way what the two formulas described in the previous post mean really let's build a practical use case:  building simple rules to determine whether an account will be recovered.

The table below contains a table of 10 accounts. For example; the first account is Under 30 days DSO having a High Amount due, Credit Score and has been recovered.

In our example, as the root node contains 6 recovered account and 4 not recovered, we need to split on the root node. This process is then repeated to create child nodes with the least impurity.

If we split the root node based on DSO, we will get the following entropies :

Entropy(Under 30) = −[(4 / 4) * (Log2 (4 /43) + (0/4) * Log2 (0 / 4)] = 0

Entropy(Between 31 and 60) = −[(1 / 4) * Log2 (1 / 4) + (3 / 4) * Log2 (3 / 4)] = 0.811

Entropy(Over 61 days) = −[(1 / 2) * Log2 (1 / 2) + (1 / 2) * Log2 (1 / 2)] = 1

Because the dataset contains four accounts Under 30, four between 31 and 60 days and 2 accounts over 61 days, the impurity of this split is calculated :

Impurity (DSO)  (4 / 10) * 0 + (4 / 10) * (0.811) + (2 / 10) * (1) = 0.52

To build this calculation in Excel we will use :

• SUM to compute the number of accounts for a given attribute value,
• DIV to compute the fraction of observations having each possible attribute value.
• LOG to calculate the impurity by also using IFERROR  to ensure that the undefined value P(0)Log2(0)=0  the undefined value is replaced by 0.
• SUM to compute the entropy for each possible node split
• SUMPRODUCT to calculate the impurity for each split

As shown in the result above the impurity of DSO is the smallest (0.324), among the 3 features. We should then start by splitting over the DSO feature.

The process same Process can be repeated on the new subtree. The DSO Under 30 and DSO Over 61 are pure nodes, we don't need to split them.

DSO between 31 and 60 is impure. Splitting on DSO gives an impurity of 0.8, whereas Credit Score or Amount due yield impurity of 0. So we can either split over the Credit Score or the Amount due variables.

Finally, we are able to build the following decision tree, holding a simple classification rule of Accounts Receivables.

This simple rule-based classification tree might help Receivables Analysts to prioritize their actions towards receivables that might need more effort to collect.