Bias in AI arises when algorithms categorize, that is, derive abstract patterns, from data that is too sparse or unbalanced. Hence the need for a broad range of diverse data that covers human behavior in all its facets in a fair, unbiased way.
When we lack the quantity to capture diversity, we as linguists and designers should provide the quality and balance in the training data so that machines do not learn faulty abstractions and behavior. So when it comes to speech recognition training, make sure you give every user a voice when collecting data, and even more so when scripting it. Ideally, cover as many sociolinguistic parameters in the training data set as possible: age group, gender, origin, register, dialect, conversational situation, and pitch.
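One practical way to keep an eye on that balance is to attach these parameters as metadata to every recording and audit their distribution before training. Below is a minimal sketch of such an audit, assuming each recording carries a small metadata record; the field names, values, and the `max_ratio` threshold are illustrative, not a standard schema.

```python
from collections import Counter

# Hypothetical metadata for each recording in a speech corpus;
# the field names and values below are illustrative only.
recordings = [
    {"age_group": "18-29", "gender": "f", "dialect": "southern", "register": "casual"},
    {"age_group": "60+",   "gender": "m", "dialect": "standard", "register": "formal"},
    # ... one record per collected utterance
]

PARAMETERS = ["age_group", "gender", "dialect", "register"]

def audit_balance(records, parameters, max_ratio=2.0):
    """Flag any parameter whose most frequent value outnumbers
    its least frequent value by more than max_ratio."""
    for param in parameters:
        counts = Counter(r[param] for r in records)
        most, least = max(counts.values()), min(counts.values())
        if most / least > max_ratio:
            print(f"{param}: unbalanced {dict(counts)}")

audit_balance(recordings, PARAMETERS)
```

A check like this will not make a data set representative on its own, but it surfaces skewed parameters early, while it is still cheap to collect or script additional material for the underrepresented groups.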
The same principle applies to the audio data used to train an ASR, to the utterances that serve as the basis for intent design, recognition, and NLU, and to the interaction patterns you base your design on.
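For NLU training utterances, that balance can be built into the scripting itself. A small illustration with a hypothetical intent and made-up example phrases, deliberately spread across registers and regional variants so no single speaker group dominates the model's view of the intent:

```python
# Hypothetical training utterances for one intent; the phrases
# and bucket names are illustrative, not from any real data set.
TURN_ON_LIGHTS = {
    "formal":   ["Could you please turn on the lights?",
                 "Would you switch the lights on?"],
    "casual":   ["lights on", "hit the lights", "turn 'em on"],
    "regional": ["put the big light on",   # British English
                 "cut the lights on"],     # Southern US English
}

# Quick sanity check: every bucket contributes a comparable
# number of examples to the intent.
sizes = {bucket: len(phrases) for bucket, phrases in TURN_ON_LIGHTS.items()}
print(sizes)  # {'formal': 2, 'casual': 3, 'regional': 2}
```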