Under the hood, this model resembles LDA, but replaces its Dirichlet priors with Pitman–Yor Processes (PYPs), which better capture the power-law behavior of word distributions. It also supports arbitrary hierarchical priors, allowing metadata-aware modeling.
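To make the power-law point concrete, here's a toy sketch (plain Python, not the product's actual code) of the Pitman–Yor Chinese Restaurant Process: with a positive discount, the cluster sizes it produces are heavy-tailed and roughly power-law, while a discount of zero recovers the ordinary Dirichlet/CRP behavior.

```python
# Toy Pitman-Yor CRP sampler (illustrative only, not the library's API).
# discount > 0 gives power-law cluster sizes; discount = 0 is the plain CRP.
import random


def pitman_yor_crp(n_tokens, discount=0.5, strength=1.0, seed=0):
    """Return cluster ("table") sizes after seating n_tokens customers."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for n in range(n_tokens):
        # Probability of opening a new table grows with the number of
        # existing tables when discount > 0, producing a heavy tail.
        p_new = (strength + discount * len(tables)) / (strength + n)
        if not tables or rng.random() < p_new:
            tables.append(1)
        else:
            # Join an existing table with probability proportional to
            # (size - discount): popular tables stay popular.
            weights = [size - discount for size in tables]
            r = rng.random() * sum(weights)
            for k, w in enumerate(weights):
                r -= w
                if r <= 0:
                    tables[k] += 1
                    break
            else:
                tables[-1] += 1  # guard against floating-point round-off
    return sorted(tables, reverse=True)


if __name__ == "__main__":
    pyp = pitman_yor_crp(10_000, discount=0.7)   # heavy-tailed sizes
    crp = pitman_yor_crp(10_000, discount=0.0)   # plain CRP for comparison
    print("PYP:", len(pyp), "clusters, top sizes", pyp[:5])
    print("CRP:", len(crp), "clusters, top sizes", crp[:5])
```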
For example, in an earnings-transcript corpus, a typical LDA might have a flat structure: Prior → Document
Our model instead uses a hierarchical graph: Uniform Prior → Global Topics → Ticker → Quarter → Paragraph
This hierarchical structure, combined with the PYP statistics, consistently yields more coherent and fine-grained topic structures than standard LDA does. There's also a "fast mode" that collapses some hierarchy levels for quicker runs; it's a handy option if you're curious to see the impact the hierarchy has on the results, or if you're simply in a rush.
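The key mechanic behind a hierarchy like this is back-off: each node's word distribution is smoothed toward its parent's, so a paragraph backs off to its quarter, then its ticker, then the global topics, and finally a uniform prior. Below is a minimal, generic hierarchical-PYP back-off sketch written under that assumption; the class names and the count-propagation shortcut are illustrative simplifications, not the library's implementation.

```python
# Generic hierarchical PYP back-off sketch (hypothetical names, not the product's code).
from collections import Counter


class PYPNode:
    """One node in a prior hierarchy; its word distribution backs off to its parent."""

    def __init__(self, parent=None, discount=0.5, strength=1.0, vocab_size=50_000):
        self.parent = parent
        self.d = discount          # PYP discount (0 recovers Dirichlet-style smoothing)
        self.theta = strength      # PYP strength / concentration
        self.counts = Counter()    # word -> count observed at this node
        self.vocab_size = vocab_size

    def observe(self, word):
        # Simplification: push every count up the hierarchy. A real PYP sampler
        # only promotes a fraction of counts ("tables") to the parent.
        self.counts[word] += 1
        if self.parent is not None:
            self.parent.observe(word)

    def prob(self, word):
        total = sum(self.counts.values())
        types = len(self.counts)   # distinct words seen here (~ number of tables)
        backoff = (self.parent.prob(word) if self.parent
                   else 1.0 / self.vocab_size)   # uniform prior at the root
        if total == 0:
            return backoff
        # PYP predictive form: discounted local count plus back-off mass
        # routed to the parent distribution.
        discounted = max(self.counts[word] - self.d, 0.0) / (self.theta + total)
        backoff_mass = (self.theta + self.d * types) / (self.theta + total)
        return discounted + backoff_mass * backoff


# Hypothetical hierarchy mirroring the post:
# Uniform -> Global Topics -> Ticker -> Quarter -> Paragraph
global_topics = PYPNode()
ticker = PYPNode(parent=global_topics)
quarter = PYPNode(parent=ticker)
paragraph = PYPNode(parent=quarter)

for w in "revenue guidance margin revenue buyback".split():
    paragraph.observe(w)

print(paragraph.prob("revenue"))   # boosted by local counts
print(paragraph.prob("ebitda"))    # falls back through the hierarchy to uniform
```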
I did a Google search for "camping with dogs" and it organized the results into a set of roughly 30 spanning everything I'd want to know on the topic, from safety and policies to products and travel logistics.
Does this work on any type of data?
https://sturdystatistics.com/deepdive?fast=0&q=reinforcement...
I think only about 1 in 10 of the articles is really on topic.
BTW, the circular graphics of the results are really cool! How did you do this?