Abstract: Large language models (LLMs) can interpret diverse webpages, but feeding them raw HTML is impractical because of excessive token counts and noisy markup. We propose a lightweight DOM-aware ...