SUMMARY

The talk focuses on AI and its application to The New York Times game "Connections"
The presenter clarifies that the work is independent and not based on internal NYT research; the findings are preliminary and investigative rather than authoritative
Connections launched in beta in June 2023, was officially released in August 2023, and quickly became the NYT's second most-played game after Wordle
All Connections puzzles and mechanics are human-made and will remain so

In each daily puzzle, players sort 16 words into four groups of four; each group shares a distinct relation, and no word belongs to more than one group
Players lose after their fourth incorrect guess
The game uses a difficulty structure:
- Yellow: most obvious groupings
- Green: less obvious but still accessible
- Blue: tricky themes, idioms, or trivia
- Purple: the most difficult, with intentional decoys to mislead
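
The rules above can be captured in a small state machine. The sketch below illustrates the rules under stated assumptions and is not the game's actual code: the `ConnectionsGame` class, its dict-of-colors input format, and its method names are hypothetical, and UI behavior beyond the rules listed here is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionsGame:
    """Minimal model of the rules: 16 words, four disjoint groups of four, four mistakes."""

    groups: dict  # difficulty color -> the four words in that group (hypothetical format)
    max_mistakes: int = 4
    mistakes: int = 0
    solved: list = field(default_factory=list)  # colors found so far, in order

    def __post_init__(self):
        self.groups = {color: frozenset(words) for color, words in self.groups.items()}
        if len(self.groups) != 4 or any(len(g) != 4 for g in self.groups.values()):
            raise ValueError("a puzzle has exactly four groups of four distinct words")
        all_words = [w for group in self.groups.values() for w in group]
        if len(set(all_words)) != 16:
            raise ValueError("groups may not overlap")

    def on_board(self):
        """Words whose group has not been found yet."""
        return {w for c, g in self.groups.items() if c not in self.solved for w in g}

    def guess(self, words):
        """Return the group's color on a correct guess, or None on a mistake."""
        if self.is_over():
            raise RuntimeError("the game is over")
        attempt = frozenset(words)
        if len(attempt) != 4 or not attempt <= self.on_board():
            raise ValueError("a guess is four distinct words still on the board")
        for color, group in self.groups.items():
            if group == attempt:
                self.solved.append(color)
                return color
        self.mistakes += 1
        return None

    def won(self):
        return len(self.solved) == len(self.groups)

    def lost(self):
        return self.mistakes >= self.max_mistakes

    def is_over(self):
        return self.won() or self.lost()
```

A correct guess returns the group's color and takes its words off the board; a wrong guess only increments `mistakes`, and the game ends on a full solve or on the fourth mistake.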

Connections is challenging for AI systems, especially when the goal is a 100% solve rate, because the game tests abstraction and resistance to overfitting
Because the inputs are fixed, Connections is useful as a reproducible AI benchmark
An example shows lower-tier ChatGPT models giving wrong solutions to puzzles
Human problem-solving involves System 1 (fast, intuitive) and System 2 (slow, deliberate) thinking; effective play often combines both
Common human errors include overreliance on intuition (System 1) or overthinking (System 2)
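
Fixed inputs make scoring straightforward to reproduce. A minimal sketch, assuming a hypothetical puzzle format (a dict with `"words"` and `"groups"` keys) and a solver that returns one complete grouping per puzzle. The four-mistake budget is not modeled, and the `evaluate` and `as_grouping` helpers are illustrative rather than any published benchmark's harness:

```python
def as_grouping(groups):
    """Order-insensitive form of an answer: a set of four-word sets."""
    return frozenset(frozenset(g) for g in groups)


def evaluate(puzzles, solver):
    """Score a solver on a fixed puzzle list.

    Returns (solve_rate, mean_groups_correct). With fixed puzzles and a
    deterministic solver, rerunning yields the same numbers.
    """
    solved, groups_correct = 0, 0
    for puzzle in puzzles:
        predicted = as_grouping(solver(puzzle["words"]))
        answer = as_grouping(puzzle["groups"])
        solved += predicted == answer              # full solve counts as 1
        groups_correct += len(predicted & answer)  # partial credit per exact group
    n = len(puzzles)
    return solved / n, groups_correct / n
```

Comparing answers as sets of sets makes the score independent of group and word order; `mean_groups_correct` gives partial credit where a strict solve rate would report zero.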

Third-party benchmarks show progressive improvement in LLMs' ability to solve Connections, but no model yet solves every puzzle
Random guessing gives almost no chance of winning; once a category is found, the odds of a random guess improve slightly but remain very low
Most players get stuck on the last two categories
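
The odds can be made concrete with a little combinatorics. This sketch assumes a guesser that picks a uniformly random set of four words from those still on the board and ignores all feedback; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def total_groupings(n_words=16, size=4):
    """Number of ways to split n_words into unordered groups of `size`."""
    k = n_words // size
    return factorial(n_words) // (factorial(size) ** k * factorial(k))


def random_guess_odds(words_left, groups_left, size=4):
    """Chance that one uniformly random guess is exactly one of the remaining groups."""
    return Fraction(groups_left, comb(words_left, size))


for groups_left in (4, 3, 2, 1):
    odds = random_guess_odds(4 * groups_left, groups_left)
    print(f"{groups_left} of 4 groups left: {odds}")
# 4 of 4 groups left: 1/455
# 3 of 4 groups left: 1/165
# 2 of 4 groups left: 1/35
# 1 of 4 groups left: 1
```

Multiplying 1/455, 1/165, and 1/35 gives 1/2,627,625, which is exactly one over `total_groupings()`, so a mistake-free random solve is roughly a one-in-2.6-million event.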

The Connections puzzle can be modeled as a graph coloring problem from computer science, where vertices represent words and colors represent group assignments
Each word (vertex) is assigned a category (color); weighted edges represent the strength of hypothesized relationships between words
Modeling Connections this way lets algorithms and AIs search for solutions more effectively than random guessing
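
A minimal sketch of the search view, assuming that a valid coloring uses each color exactly four times and that the best coloring maximizes total intra-group edge weight. Both are assumptions for illustration, not the presenter's stated objective. The search is exhaustive, so it is shown on a hypothetical eight-word, two-group toy graph with made-up weights:

```python
from itertools import combinations


def group_score(group, weights):
    """Sum of edge weights between every pair of words placed in the same group."""
    return sum(weights.get(frozenset(pair), 0.0) for pair in combinations(group, 2))


def partitions(words, size):
    """Yield every split of `words` into unordered groups of `size`, without repeats."""
    if not words:
        yield []
        return
    first, rest = words[0], words[1:]
    for mates in combinations(rest, size - 1):
        remaining = [w for w in rest if w not in mates]
        for tail in partitions(remaining, size):
            yield [(first, *mates)] + tail


def best_coloring(words, weights, size=4):
    """Exhaustive search for the grouping with the largest total intra-group weight."""
    return max(partitions(list(words), size),
               key=lambda groups: sum(group_score(g, weights) for g in groups))


# Hypothetical weights: two true groups plus a decoy edge (BASS is also an instrument).
fish = ["BASS", "TROUT", "PIKE", "CARP"]
instruments = ["DRUM", "FLUTE", "HARP", "TUBA"]
weights = {frozenset(p): 1.0 for g in (fish, instruments) for p in combinations(g, 2)}
weights[frozenset(("BASS", "DRUM"))] = 0.8

print(best_coloring(fish + instruments, weights))
# [('BASS', 'TROUT', 'PIKE', 'CARP'), ('DRUM', 'FLUTE', 'HARP', 'TUBA')]
```

Enumeration visits 35 groupings for eight words but 2,627,625 for a full sixteen-word puzzle, which is why pruning the search, for example with the cluster analysis discussed later in the talk, matters.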

Semantic similarity is helpful but insufficient; relationships among words are multifaceted:
- Anagrammatic (orthography)
- Morphological (word forms)
- Encyclopedic (factual/knowledge-based)
- Associative (e.g., color associations)
Words with multiple meanings (polysemy) are especially challenging for AIs and humans
The presenter introduces "relational alignment" scores to assess computationally how easy or hard a puzzle is
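
Because each relation type needs its own detector, a pair of words can carry several labeled edges at once. A sketch assuming crude, hypothetical detectors: an anagram check stands in for orthographic relations, a shared ending for morphological ones, and a tiny hand-written table (`ASSOCIATIONS`) for the associative and encyclopedic knowledge a real system would draw from a lexical database or embeddings:

```python
from collections import Counter

# Hypothetical stand-in for an associative/encyclopedic knowledge source.
ASSOCIATIONS = {
    frozenset(("RUBY", "CHERRY")): "red things",
}


def is_anagram(a, b):
    """Orthographic relation: same letters, different words."""
    a, b = a.upper(), b.upper()
    return a != b and Counter(a) == Counter(b)


def shares_suffix(a, b, min_len=3):
    """Crude morphological cue: the words share their last min_len (or more) letters."""
    a, b = a.upper(), b.upper()
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n >= min_len


def relations(a, b):
    """Return every relation type detected between two words (a pair may have several)."""
    found = []
    if is_anagram(a, b):
        found.append("orthographic: anagram")
    if shares_suffix(a, b):
        found.append("morphological: shared suffix")
    label = ASSOCIATIONS.get(frozenset((a.upper(), b.upper())))
    if label:
        found.append(f"associative: {label}")
    return found


print(relations("LISTEN", "SILENT"))   # ['orthographic: anagram']
print(relations("running", "jumping"))  # ['morphological: shared suffix']
print(relations("Ruby", "Cherry"))      # ['associative: red things']
```

Keeping the relation types as separate labeled layers, rather than collapsing them into one similarity number, is what lets a polysemous word sit in several candidate groups until the rest of the board rules them out.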

Computational metrics can highlight difficulty; the data show that easier puzzles have higher relational alignment scores
Puzzle categories are classified by relation type (e.g., hypernymy, orthography), and category trends over time are analyzed with histograms and counts
Multi-dimensional and time-varying analysis is necessary since words and puzzles can span many relational metrics
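
This summary does not give the formula behind relational alignment. As a hypothetical stand-in, one could compare how strongly words are linked inside their intended groups versus across groups; the `relational_alignment` function and the toy weights below are illustrative assumptions only:

```python
from itertools import combinations
from statistics import mean


def relational_alignment(groups, weight):
    """Hypothetical score: mean intra-group edge weight minus mean inter-group edge weight.

    High values mean the intended groups stand out from cross-group links;
    values near zero mean decoy links compete with the answer.
    """
    group_of = {word: i for i, group in enumerate(groups) for word in group}
    intra, inter = [], []
    for a, b in combinations(group_of, 2):
        (intra if group_of[a] == group_of[b] else inter).append(weight(a, b))
    return mean(intra) - mean(inter)


def make_weight(table):
    """Wrap a {frozenset(pair): weight} table as a weight function (missing pairs are 0)."""
    return lambda a, b: table.get(frozenset((a, b)), 0.0)


groups = [["BASS", "TROUT", "PIKE", "CARP"], ["DRUM", "FLUTE", "HARP", "TUBA"]]
clean = {frozenset(p): 1.0 for g in groups for p in combinations(g, 2)}
decoyed = dict(clean)
decoyed.update({frozenset(("BASS", "DRUM")): 0.8, frozenset(("BASS", "TUBA")): 0.8})

print(relational_alignment(groups, make_weight(clean)))  # 1.0
print(relational_alignment(groups, make_weight(decoyed)))
```

Adding two decoy edges from BASS to the instruments lowers the score from 1.0 to roughly 0.9, in line with the observation that harder puzzles score lower.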

Search spaces can be reduced further by cluster analysis (graph clustering) on multi-dimensional/hypergraph models that integrate semantic relationships into the clusters
The graphs become increasingly complex, modeling both inter- and intra-cluster relationship strengths among words
Semantic graphs can be constructed from lexical and knowledge resources such as WordNet and ConceptNet, as well as from word embeddings
Building such explainable models allows for more transparent AI reasoning
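
A minimal sketch of building a semantic graph from embeddings and clustering it. It assumes hypothetical three-dimensional vectors in place of real embeddings and uses connected components after a similarity threshold as the simplest stand-in for the graph clustering described above; the talk's hypergraph models are not reproduced here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def similarity_graph(vectors, threshold=0.9):
    """Weighted adjacency dict keeping only edges at or above `threshold`."""
    graph = {word: {} for word in vectors}
    for a, b in combinations(vectors, 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            graph[a][b] = graph[b][a] = s
    return graph


def clusters(graph):
    """Connected components via depth-first search: the simplest form of graph clustering."""
    seen, found = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        found.append(sorted(component))
    return found


# Hypothetical 3-d "embeddings"; BASS sits between the fish and instrument directions.
vectors = {
    "TROUT": (1.0, 0.1, 0.0), "PIKE": (0.9, 0.0, 0.1), "CARP": (1.0, 0.0, 0.0),
    "DRUM": (0.0, 1.0, 0.1), "FLUTE": (0.1, 0.9, 0.0), "HARP": (0.0, 1.0, 0.0),
    "BASS": (0.7, 0.7, 0.0),
}
print(clusters(similarity_graph(vectors)))
# [['CARP', 'PIKE', 'TROUT'], ['DRUM', 'FLUTE', 'HARP'], ['BASS']]
```

The two unambiguous groups fall out as clusters, while the polysemous BASS stays isolated at this threshold; a grouping search then only has to decide where the leftover words go. The edges and their weights also remain inspectable, which is the kind of transparency the explainable-model point above refers to.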

Graph convolutional neural networks (GCNNs) can process word relationship graphs and output candidate puzzle solutions
The approach combines GCNNs with a reinforcement learning system for searching optimal groupings
The system diagram and 3D visualization demonstrate how semantic graphs and clusters enable navigation to solutions
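
One graph-convolution layer can be sketched with the widely used propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). Whether the presenter's GCNN uses this exact normalization is not stated, the reinforcement-learning search is not sketched, and the adjacency, features, and weight matrix below are hypothetical (in a trained model W is learned):

```python
import math


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy graph: words 0 and 1 are linked, word 2 is isolated; W is the identity.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, features, identity))
# [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
```

The two linked words end up with identical representations while the isolated word keeps its own. This smoothing effect is what lets later stages treat strongly connected words as candidates for the same group.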

Preliminary tests on a small set of hard puzzles show increased solvability after applying this AI framework
The approach is still being developed and tested; future plans include applying it to more puzzles and integrating findings into game development
Limitations exist: LLMs may draw on answers published online rather than reasoning through the puzzle, and their solutions can be black boxes
Next steps involve aligning this work with established AI benchmarks (e.g., the ARC-AGI benchmark) to compare performance
Presentation ends with an invitation for audience follow-up and discussion

AI and Game Theory: A Case Study on NYT's Connections — Shafik Quoraishee, NYT Games