Tweet was retrieved from the Twitter API on 2023-05-11T15:14:28.055564 and is presented in pure HTML+CSS, without Twitter's official styles or official tracking.
First, lexing breaks up the input into a stream of tokens. PCRE has comments, written (?#...). That's information we can drop right there in the lexer: read a comment, don't produce a token, and move on to read the next thing.
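Here's a rough sketch of that idea in Python. It's not how any real regex engine does it; the `Token` type, the three token kinds, and the `lex` function are all made up for illustration, and the sketch ignores character classes (where `(?#` would not start a comment) and extended-mode `#` comments. A comment runs up to the next `)`.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical kinds: "ESCAPE", "META", "LITERAL"
    text: str


# Characters treated as metacharacters in this simplified sketch.
META_CHARS = set("()[]{}|*+?.^$")


def lex(pattern: str) -> Iterator[Token]:
    """Yield tokens for a regex pattern, silently dropping (?#...) comments."""
    i = 0
    while i < len(pattern):
        # Comment: skip to the closing paren and emit nothing.
        if pattern.startswith("(?#", i):
            end = pattern.find(")", i + 3)
            if end == -1:
                raise ValueError(f"unterminated comment starting at offset {i}")
            i = end + 1
            continue

        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash and the character after it form one token.
            yield Token("ESCAPE", pattern[i:i + 2])
            i += 2
        elif ch in META_CHARS:
            yield Token("META", ch)
            i += 1
        else:
            yield Token("LITERAL", ch)
            i += 1
```

Lexing `a(?#note)b+` with this yields tokens for `a`, `b`, and `+` only. The comment never reaches the parser, so the parser doesn't need a rule for it.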