This will happen sometimes; you can count on it. ML, like statistics, is all about the most efficient way to predict Y given X1, X2, X3, and so on. If you're training a filter to block people whose musical tastes you don't like, and that taste is highly correlated with being black, the filter will totally use "black" as a shortcut to improve its accuracy (sketch below). There are countermeasures, but the model can evolve new proxy variables for race much faster than its trainer can construct new reward schedules to penalize them. Cost of doing business, I'm afraid. Err on the side of tolerance, as much as your individual patience can stand.
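Here's a minimal sketch of that shortcut, on made-up data with hypothetical feature names (genre_x stands in for a taste signal correlated with the protected attribute):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute: never shown to the model.
protected = rng.integers(0, 2, size=n)

# "Musical taste" features: genre_x is strongly correlated with the
# protected attribute, genre_y is independent noise.
genre_x = protected + rng.normal(0, 0.3, size=n)
genre_y = rng.normal(0, 1, size=n)

# Labels the trainer actually wants: block by taste, which here
# (unfairly) tracks the protected attribute.
block = (protected + rng.normal(0, 0.5, size=n)) > 0.5

# Train with the protected column explicitly dropped from the inputs.
X = np.column_stack([genre_x, genre_y])
clf = LogisticRegression().fit(X, block)

# The model never saw `protected`, yet its decisions reconstruct it,
# because genre_x is a ready-made proxy.
pred = clf.predict(X)
agreement = (pred == protected.astype(bool)).mean()
print(f"decisions match the protected attribute {agreement:.0%} of the time")
```

This toy has only one proxy; in a real feature set, penalizing genre_x just pushes the model onto the next-most-correlated feature, which is the whack-a-mole game the trainer loses.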