Hacker News

OpenAI model for masking personally identifiable information (PII) in text

39 points by tanelpoder

by hiAndrewQuinn

2 subcomments

I'm surprised nobody else has commented on this. This is a very straightforward and useful thing for a small locally runnable model to do.

by stratos123

0 subcomment

There are some interesting technical details in this release:
> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.
> The released model has 1.5B total parameters with 50M active parameters.
> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
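To make the "constrained Viterbi" part of the quote concrete, here is a minimal sketch of span decoding over per-token label scores. It assumes a BIO tagging scheme (`O`, `B-X`, `I-X`) and a single hard constraint (an `I-X` tag may only follow `B-X` or `I-X`); the release notes quoted above do not say which label scheme or transition rules the model actually uses, so the function names, labels, and scores below are purely illustrative.

```python
import math

def allowed(prev, curr):
    """Assumed BIO constraint: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True

def constrained_viterbi(scores, labels):
    """scores[t][j] is the log-score of labels[j] at token t.
    Returns the highest-scoring label sequence that respects `allowed`."""
    n = len(labels)
    # A sequence may not start inside a span, so I-* gets -inf at position 0.
    best = [scores[0][j] if not labels[j].startswith("I-") else -math.inf
            for j in range(n)]
    back = []  # back[t-1][j] = best previous label index for label j at token t
    for t in range(1, len(scores)):
        new_best, ptr = [], []
        for j in range(n):
            cands = [(best[i], i) for i in range(n) if allowed(labels[i], labels[j])]
            prev_score, prev_i = max(cands)
            new_best.append(prev_score + scores[t][j])
            ptr.append(prev_i)
        best = new_best
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    j = max(range(n), key=lambda k: best[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[j] for j in reversed(path)]

# Hypothetical scores for three tokens; per-token argmax would give the
# invalid sequence O, I-EMAIL, I-EMAIL.
labels = ["O", "B-EMAIL", "I-EMAIL"]
scores = [
    [-0.2, -2.0, -3.0],
    [-1.5, -1.2, -0.4],
    [-2.0, -2.5, -0.3],
]
print(constrained_viterbi(scores, labels))
```

The point of the constraint is that the decoder cannot emit a span continuation without a span start, so the output always maps cleanly onto whole spans instead of stray mid-span tokens.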

by mplanchard

0 subcomment

It would be nice if their examples weren’t mostly things that are easy to catch with regex, but it’s cool to see it released as an open, local model.
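For a sense of what "easy to catch with regex" means here, a rough sketch follows. The two patterns (a simplified email matcher and a US-style phone matcher) and the `mask_with_regex` helper are illustrative assumptions, not anything taken from the release; real-world PII patterns are considerably messier.

```python
import re

# Deliberately simplified patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def mask_with_regex(text):
    """Replace email addresses and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(mask_with_regex("Mail jane.doe@example.com or call 555-867-5309."))
print(mask_with_regex("Ask Jane Doe about it"))  # the name is left untouched
```

Structured identifiers like these have a fixed shape, which is why regex handles them. Free-form PII such as personal names or street addresses has no such shape, and that is where a learned token classifier would be expected to earn its keep.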

by Havoc

0 subcomment

50M active parameters is impressively light. Is there a similarly light model on the prompt injection side? Most of the mainstream ones seem heavier.

by 7777777phil

0 subcomment

> The model is available today under the Apache 2.0 license on Hugging Face and GitHub.
Bringing back the Open to OpenAI...