Distracted by Design] loves gear from the 1980s, though some of it isn’t as useful as it used to be. He happened across a ...
Abstract: End-to-end time-domain speech separation with masking strategy has shown its performance advantage, where a 1-D convolutional layer is used as the speech encoder to encode a sliding window ...
Abstract: Understanding videos, especially aligning them with textual data, presents a significant challenge in computer vision. The advent of vision-language models (VLMs) like CLIP has sparked ...