Imagine doing for the sounds people make what you are doing for EEG:
Once per second, measure the sound they are pronouncing, and try to find a correlation between the sounds made 1, 2, and 3 seconds ago and the sound being made now. On that timescale, the only thing that stays the same is the person's timbre, and timbre differs from person to person: exactly what you found in your dataset!
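
To make the analogy concrete, here is a minimal toy simulation. It is only a sketch under a strong assumption that is mine, not something taken from your data: each once-per-second sample is modelled as a fixed per-speaker "timbre" vector plus content that is independent from one second to the next. The function names (`simulate_speakers`, `mean_lag_correlation`, `lag_report`) and all parameter values are hypothetical.

```python
import numpy as np

def simulate_speakers(n_speakers=5, n_seconds=200, n_features=32, seed=0):
    """Toy model (an assumption): every once-per-second sample is a fixed
    per-speaker "timbre" vector plus content that is independent each second."""
    rng = np.random.default_rng(seed)
    timbre = rng.normal(size=(n_speakers, 1, n_features))
    content = rng.normal(size=(n_speakers, n_seconds, n_features))
    return timbre + content  # shape: (speakers, seconds, features)

def mean_lag_correlation(a, b, lag):
    """Average Pearson correlation, taken across features, between a[t] and
    b[t + lag]. Requires lag >= 1."""
    x, y = a[:-lag], b[lag:]
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    r = (x * y).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
    return float(r.mean())

def lag_report(samples, lags=(1, 2, 3)):
    """For each lag, return (same-speaker, different-speaker) mean correlation."""
    n = len(samples)
    report = {}
    for lag in lags:
        same = np.mean([mean_lag_correlation(samples[i], samples[i], lag)
                        for i in range(n)])
        other = np.mean([mean_lag_correlation(samples[i], samples[j], lag)
                         for i in range(n) for j in range(n) if i != j])
        report[lag] = (float(same), float(other))
    return report

report = lag_report(simulate_speakers())
for lag, (same, other) in report.items():
    print(f"lag {lag}s: same speaker {same:.2f}, different speakers {other:.2f}")
```

In this toy model the same-speaker correlation should stay roughly flat across the three lags and sit well above the cross-speaker one, because the only thing the lagged samples share is the fixed timbre term. Whether real EEG behaves this way is a separate question; the sketch only shows that such a correlation pattern can arise from a per-person constant alone.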