Listening to the street while waiting for a local library to open…
I recorded this soundscape while waiting for a local library to open.
Sounds of wind, footsteps, music from afar, languages, laughter, and drums. Each settles into its own frequency range without interfering with the others. An orchestra in its own right. When I tried to transcribe this soundscape into a script using one of the most advanced audio-to-text tools, this is what I received:
我現在要去看一看那些小朋友們 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍 他們在那邊玩耍
(In English: "I'm going to go and have a look at those children now. They're playing over there. They're playing over there…", the last phrase repeated again and again.)
I am not sure where this result comes from or what it means. The AI tool seems confused, thrown as it was into messy information, messy reality, messy data. And the result suggests that transducing a soundscape into a linguistic text is beyond its capacity. Perhaps we can start from there and do something fun with the sonic world?
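For anyone who wants to repeat the experiment: the post does not name the tool, but a minimal sketch, assuming the open-source Whisper speech-recognition model and a hypothetical recording file soundscape.wav, might look like this. Models like this are trained on speech, so a non-speech soundscape sits outside what they expect, and they are known to respond with exactly the kind of repetitive, hallucinated phrases quoted above.

```python
# A minimal sketch of the transcription experiment.
# Assumptions (not from the original post): the tool is OpenAI's
# open-source Whisper model, and the recording is "soundscape.wav".
import whisper

# Load a small general-purpose speech-recognition model.
model = whisper.load_model("base")

# Transcribe the recording. Whisper expects speech; fed a street
# soundscape, it often hallucinates repetitive text in some language.
result = model.transcribe("soundscape.wav")

print(result["text"])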
I am Yu. I listen to sound.