Discussion about this post

User's avatar
Cody Sedler's avatar

big fan of what you're working on!

The AI Architect's avatar

The Stripe example is particularly illuminating. Moving from discrete feature flagging to embedding-based detection delivers a 38 point jump in accuracy but loses human-parseable explanations. This tradeoff is gonna become more common as we shift from handcrafted features to learned representations. I've noticed similar challenges in healthcare where clinicians want feature-level attribution but foundation models are learning patterns across entire patient timelines. The CoT unfaithfulness research is eye-opening too, especially around RLHF incentivizing persuasiveness over transparency.

No posts

Ready for more?