This study evaluates the performance of large language models (LLMs), such as GPT-4, on a new cipher dataset of 654 test cases, focusing on their ability to unscramble text and handle a variety of ciphers. Results indicate that the models perform well on low-difficulty ciphers, successfully unscrambling tokens in 77% of attempts, which suggests a degree of generalization beyond reliance on memorized training data. The study positions cipher solving as a challenging benchmark for LLMs, arguing that the ability to solve such problems may reflect a novel form of reasoning.
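The evaluation harness itself is not described here, but a minimal sketch of how a low-difficulty token-unscrambling test case might be generated and scored is shown below. The `scramble_tokens` and `exact_match_accuracy` helpers, the prompt wording, and the `query_model` stub are illustrative assumptions, not the paper's actual setup.

```python
import random


def scramble_tokens(sentence: str, seed: int = 0) -> str:
    """Shuffle the characters within each word while keeping word order
    (a simple stand-in for a low-difficulty cipher)."""
    rng = random.Random(seed)
    scrambled = []
    for word in sentence.split():
        chars = list(word)
        rng.shuffle(chars)
        scrambled.append("".join(chars))
    return " ".join(scrambled)


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of test cases where the model output exactly recovers the original text."""
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def query_model(prompt: str) -> str:
    """Placeholder for an LLM API call; replace with a real client."""
    raise NotImplementedError


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    scrambled = scramble_tokens(reference, seed=42)
    prompt = f"Unscramble each word to recover the original sentence:\n{scrambled}"
    # prediction = query_model(prompt)
    # print(exact_match_accuracy([prediction], [reference]))
    print(scrambled)
```

Under this kind of setup, the reported 77% figure would correspond to the exact-match accuracy computed over the low-difficulty subset of the 654 test cases.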