Monday, September 09

Geek

Daily News Stuff 9 September 2019

Regexes All The Way Down Edition

Tech News

  • The Cult of Kubernetes.  (christine.website)



  • It’s not wrong that "🤦🏼‍♂️".length == 7

    Well, it's 7 in Java and JavaScript, 5 in Python 3, and 17 in Rust.  Oh, and 1 in Swift.  Because fuck programmers trying to write applications that actually work anyway.

  • Urban Dictionary - yes, that Urban Dictionary - has scrubbed the term "GamerGate" from its site except for the entomological definition of the crown princess of an anthill.  (One Angry Gamer)

    It really does mean that as well as the other thing.  (Wikipedia)

    Walking talking Superfund site Zoe Quinn was also deemed too outré for the famously staid lexicographers at UD.

  • Is there a regular expression to detect a valid regular expression?  (StackOverflow)

    Not only is the answer no, but you are going to hell for having asked and I am going with you for having answered.

    Well, technically.  But since regular expressions haven't been regular since the incident of which we shall remain silent, you can just use 
    /^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

  • Regarding people disappearing into paintings...



  • So what the heck is going on with Thirdripper?  (Gamers Nexus)

    The latest leaks claim that it will appear in 4 and 8 channel versions - the latter is possible with the TR4 socket, though not with current motherboards - and 64 or 128 lanes of PCIe 4.0, with one quarter switchable to SATA mode so you can have 32 SSDs connected directly to the CPU.  Which is very not possible with current motherboards.

    In the video Steve mentions "surface-mount LGA sockets" and for a minute I thought he was referring to BGA, i.e. surface-mount CPUs.  But no, it's LGA, and current Threadrippers are LGA already.

    So most likely the four-channel sTRX4 models will drop straight in to existing motherboards, though they'll need new motherboards for the PCIe 4.0 and SATA extensions, while the high-end sWRX8 models will really need a new motherboard.
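
On the emoji length item above: the competing answers come from counting different units of the same string.  Here is a minimal Python sketch of that, assuming the emoji is the facepalm sequence (face palm, skin tone modifier, zero-width joiner, male sign, variation selector).  The names FACEPALM and length_report are made up for illustration.

```python
# The facepalm emoji, spelled out as code point escapes so that the source
# file's own encoding cannot mangle it.
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def length_report(text: str) -> dict:
    """Return the length of text under three common definitions."""
    return {
        # Unicode code points: what Python 3's len() counts.
        "code_points": len(text),
        # UTF-16 code units: what Java and JavaScript count.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 bytes: what Rust's str::len() counts.
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    print(length_report(FACEPALM))
    # {'code_points': 5, 'utf16_units': 7, 'utf8_bytes': 17}
```

Swift's answer of 1 counts extended grapheme clusters.  Python's standard library has no grapheme segmenter, so that number would need a third-party package and is left out of the sketch.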



Disclaimer:  Some people, when confronted with a problem with regular expressions, think "I know, I'll use regular expressions to validate my regular expressions." Now they have aleph-one problems.
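
For what it's worth, the less cursed way to check a pattern is usually to hand it to the regex engine and see whether it compiles.  A minimal Python sketch of that idea follows.  The helper name is_valid_regex is hypothetical, and "valid" here only means "accepted by Python's re module".

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # The engine's own parser rejected the pattern.
        return False
    return True


if __name__ == "__main__":
    for candidate in [r"a(b|c)*", "(unclosed", "[a-z"]:
        print(repr(candidate), is_valid_regex(candidate))
```

Validity is per flavour.  The PCRE pattern quoted in the item above relies on recursion, which Python's re module does not support, so a check like this only answers the question for the engine that actually runs it.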

Posted by: Pixy Misa at 06:42 PM

1 Cult link missing: https://christine.website/blog/the-cult-of-kubernetes-2019-09-07

Posted by: Rick C at Tuesday, September 10 2019 12:18 AM (Iwkd4)

2 I don't see what's wrong with the h--
Oh. Oh my.

Posted by: Jay at Tuesday, September 10 2019 12:24 AM (mrlXS)

3 Rick C - thanks, fixed!

Jay - yep.

Posted by: Pixy Misa at Tuesday, September 10 2019 03:32 AM (PiXy!)

4 That house actually--other than the fact it has some water damage or something--looks like it'd be pretty great for a fairly large family, after you got rid of the more eclectic furnishings.  I'd keep the Cooper painting, personally.

Posted by: Rick C at Tuesday, September 10 2019 07:25 AM (Iwkd4)

5 "7 in Java"
Unicode, SMH.  The only question was going to be why there was more than one answer.
Apple's "1" is useful from a non-programmer's perspective, sort of.  It's one...I would've used the word "glyph" but I'm not sure that's right in this context, which I guess is why the word the article uses is "grapheme."  (Just read some SO and Quora about the differences, and (of course) there's somewhat-picky but understandable nuance.)

Posted by: Rick C at Wednesday, September 11 2019 02:30 AM (Iwkd4)

6 I like the fact that cutting and pasting the emoji can remove some of the modifiers and change the answer. And testing interactively from Bash, it breaks it into multiple characters when the line wraps. 😀

Perl gives fun answers, because it still defaults to legacy encodings unless you tell it otherwise, and the only truly reliable method to guarantee that everything is handled as Unicode is the utf8::all module. With that loaded, the script returns 5 instead of 17. The Unicode::GCString module is supposed to handle extended grapheme clusters, but since it hasn't been updated since 2013, it thinks the answer for that emoji is 3 rather than 1.

-j

Posted by: J Greely at Wednesday, September 11 2019 03:28 AM (ZlYZd)

7 "cutting and pasting the emoji can remove some of the modifiers"
I used to read Mitch Kaplan's blog from time to time back in the day, so I wasn't surprised about this--there's a bunch of ways to compose and decompose complex characters.  IIRC most "common" Western accented vowels have multiple forms:  a composite glyph consisting of the letter itself with the accent (e.g., à), but also there's the two-character a-followed-by-`, and I gather there's multiple ways to decompose the first form into the second.  And, of course, the chosen grapheme is an extreme example of composition. (I wonder if anyone's done reverse code golf to see how many code points you can stack to get something "reasonable" outside of Zalgotext.)

Posted by: Rick C at Wednesday, September 11 2019 03:34 AM (Iwkd4)
