References

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv Preprint arXiv:1409.0473. https://arxiv.org/abs/1409.0473.
Bronstein, Michael M., Joan Bruna, Taco Cohen, and Petar Veličković. 2021. “Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.” arXiv Preprint arXiv:2104.13478. https://arxiv.org/abs/2104.13478.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” arXiv Preprint arXiv:2005.14165. https://arxiv.org/abs/2005.14165.
Copeland, B. Jack. 2002. “Hypercomputation: Computing More Than the Turing Machine.” Minds and Machines 12 (4): 461–502. https://doi.org/10.1023/A:1021111623943.
Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. 2015. “BinaryConnect: Training Deep Neural Networks with Binary Weights During Propagations.” arXiv Preprint arXiv:1511.00363. https://arxiv.org/abs/1511.00363.
Dziri, Nouha, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, et al. 2023. “Faith and Fate: Limits of Transformers on Compositionality.” arXiv Preprint arXiv:2305.18654. https://arxiv.org/abs/2305.18654.
Gödel, Kurt. 1931. “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I.” Monatshefte für Mathematik und Physik 38 (1): 173–98. https://doi.org/10.1007/BF01700692.
Gu, Albert, and Tri Dao. 2023. “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” arXiv Preprint arXiv:2312.00752. https://arxiv.org/abs/2312.00752.
Hamkins, Joel David, and Andy Lewis. 2000. “Infinite Time Turing Machines.” The Journal of Symbolic Logic 65 (2): 567–604. https://doi.org/10.2307/2586556.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling Laws for Neural Language Models.” arXiv Preprint arXiv:2001.08361. https://arxiv.org/abs/2001.08361.
LeCun, Yann. 2022. “A Path Towards Autonomous Machine Intelligence.” OpenReview. https://openreview.net/pdf?id=BZ5a1r-kVsf.
———. 2024. “Objective-Driven AI: Towards AI Systems That Can Learn, Remember, Reason, and Plan.” Seattle, WA: Dean W. Lytle Electrical & Computer Engineering Endowed Lecture, University of Washington. https://www.ece.uw.edu/wp-content/uploads/2024/01/lecun-20240124-uw-lyttle.pdf.
Penrose, Roger. 1994. Shadows of the Mind: A Search for the Missing Science of Consciousness. Oxford, UK: Oxford University Press.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36. https://doi.org/10.1038/323533a0.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” arXiv Preprint arXiv:1701.06538. https://arxiv.org/abs/1701.06538.
Su, Jianlin, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. 2021. “RoFormer: Enhanced Transformer with Rotary Position Embedding.” arXiv Preprint arXiv:2104.09864. https://arxiv.org/abs/2104.09864.
Turing, A. M. 1936. “On Computable Numbers, with an Application to the Entscheidungsproblem.” Proceedings of the London Mathematical Society s2-42 (1): 230–65. https://doi.org/10.1112/plms/s2-42.1.230.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” arXiv Preprint arXiv:1706.03762. https://arxiv.org/abs/1706.03762.