Robots Learn to Act from Video

Video2Act departs from conventional video-language action models by using an asynchronous dual-system architecture: a slow perceptual system extracts nuanced spatial and motion information, while a fast-system action decoder leverages that information to deliver high-frequency responsiveness and stable robotic control. This design sidesteps the limitations of static image-token concatenation and of direct feature conditioning.

A new framework empowers robots to understand and replicate actions observed in videos, bridging the gap between visual perception and physical manipulation.
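The summary implies a control loop in which the slow and fast modules run at different rates. The PyTorch sketch below shows one plausible reading of that asynchronous dual-system pattern; the module names, dimensions, and update period are illustrative assumptions, not the actual Video2Act design.

```python
# Hypothetical sketch of an asynchronous dual-system controller (names and
# shapes are illustrative, not the Video2Act implementation). A slow perceptual
# module refreshes a spatio-motion latent every K control steps, while a fast
# action decoder reuses the most recent latent to emit an action at every step.
import torch
import torch.nn as nn

class SlowPerceiver(nn.Module):
    """Low-frequency module: encodes a short video clip into a compact latent."""
    def __init__(self, frame_dim=512, latent_dim=128):
        super().__init__()
        self.encoder = nn.GRU(frame_dim, latent_dim, batch_first=True)

    def forward(self, frames):                  # frames: (B, T, frame_dim)
        _, h = self.encoder(frames)
        return h[-1]                            # latent: (B, latent_dim)

class FastActionDecoder(nn.Module):
    """High-frequency module: maps (latent, proprioception) to an action."""
    def __init__(self, latent_dim=128, proprio_dim=16, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, latent, proprio):
        return self.net(torch.cat([latent, proprio], dim=-1))

# Control loop: the slow system runs every K steps; the fast system runs every step.
K = 10
perceiver, decoder = SlowPerceiver(), FastActionDecoder()
latent = torch.zeros(1, 128)
for step in range(100):
    if step % K == 0:                           # slow update with fresh video context
        frames = torch.randn(1, 8, 512)         # placeholder observation clip
        latent = perceiver(frames)
    proprio = torch.randn(1, 16)                # current robot state
    action = decoder(latent, proprio)           # high-frequency action output
```

The key point of the pattern is that the fast decoder never blocks on perception: it always consumes the most recently cached latent, which is what allows high control frequency alongside a heavier perceptual model.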

Evolving Phylogenies with Deep Learning

A six-layer ELU network with 779 parameters, trained on Jukes-Cantor (JC) alignments, effectively approximates the true JC distance function, $-\tfrac{3}{4}\ln\!\left(1-\tfrac{4x}{3}\right)$. It behaves sensibly near saturation, predicting a constant value beyond $x = 0.75$, where the true distance diverges, and it even outperforms a 50-term Maclaurin series approximation. Its ceiling value of 4.840, which exceeds the mean tree diameter of the training dataset (3.697), suggests the learned transformations capture implicit information about the underlying evolutionary process.

New research demonstrates how artificial neural networks can learn effective distance metrics from sequence data, potentially accelerating and improving the accuracy of evolutionary relationship inference.
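For reference, the quantity the network learns to reproduce is the JC distance $d(x) = -\tfrac{3}{4}\ln(1-\tfrac{4x}{3})$, which diverges as the proportion of differing sites $x$ approaches $0.75$. The sketch below evaluates that formula, a clamped variant standing in for the network's constant prediction past the divergence point, and a 50-term Maclaurin truncation; the clamping mechanism and the evaluation points are illustrative assumptions, with only the 4.84 ceiling and the 50-term count taken from the summary above.

```python
# Minimal sketch of the target function a distance-learning network must
# approximate: the Jukes-Cantor distance, which diverges at x = 0.75. The
# ceiling clamp mimics the reported constant prediction past that point.
import math

def jc_distance(x, ceiling=4.84):
    """True JC distance, clamped to a finite ceiling at and beyond x = 0.75."""
    if x >= 0.75:
        return ceiling
    return min(-0.75 * math.log(1.0 - 4.0 * x / 3.0), ceiling)

def jc_maclaurin(x, terms=50):
    """Truncated Maclaurin series of -(3/4) ln(1 - 4x/3) around x = 0."""
    u = 4.0 * x / 3.0
    return 0.75 * sum(u**k / k for k in range(1, terms + 1))

for x in (0.1, 0.3, 0.5, 0.7, 0.74, 0.8):
    print(f"x={x:.2f}  exact/clamped={jc_distance(x):.4f}  "
          f"50-term series={jc_maclaurin(x):.4f}")
```

Running this makes the comparison concrete: the truncated series tracks the exact distance well for small $x$ but degrades as $x$ nears 0.75, which is where a learned, saturating approximation has room to do better.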