Say you have a table of things that you want to be able to apply some set of tags to. It might be blog posts, for example. The tags might be subject classifications, or reader ratings. Let's say for the sake of argument that it's going to be subject classifications, and that we actually have a (large) enum type to standardize that.
The classic way to do this is to have a table of {blog_post_id, tag} which will be unique on the combination. Up to now adding a tag is a nuisance - you need to make sure the tag doesn't already exist or you'll get a uniqueness violation. In 9.5 that's improved with INSERT ... ON CONFLICT IGNORE. Even then it's a bit more work that I really want to do. What I really want is something like this:
and then I'll be able to doUPDATE blog_posts SET subject_tags = subject_tags + 'my_new_subject' WHERE post_id = 12345
It's also possible to do this, somewhat inefficiently, by using hstore or jsonb fields (in jsonb you'd just have a top level object). In both cases you would use dummy values - say make everything have a value of 1. But that's ugly, and error prone, and rather hackish.SELECT * FROM blog_posts WHERE subject_tags ? 'some subject'
Now I don't know if there is enough extra value here to justify the work and code maintenance involved. I've managed to live without it all these years without attacking what little hair I have left. But it's something where I have occasionally thought "I could do this so much more simply and elegantly if I had a set type." So I'd like to see if there is enough interest to make it into a real project.
You can also use TEXT[] to store tags. It allows indexing but does not provide unique checks.
ReplyDeletewith https://github.com/JDBurnZ/postgresql-anyarray you can have one:
UPDATE blog_posts
SET subject_tags = anyarray_uniq(subject_tags || 'my_new_subject')
WHERE post_id = 12345
Also with JSONB you can use JSON array instead of 'top level object', avoiding 'dummy values'.
DeleteUsing json arrays is a terrible idea. They aren't even guaranteed to be homogeneous. They won't be unique and the steps you'd have to go through to make them unique would be horrible. Using native arrays is not much better. The module you pointed me at will behave very inefficiently and a large part of the point of this is efficiency. If you're serious about writing performant array functions they need to be written in C, not in SQL. But in any case, the array interface is not really suited to set operations. Arrays are ordered sequences, not unordered collections. Yes, you can implement some set operations on top of some array functions, but it's lipstick on a pig. That's not to say that under the hood a set implementation couldn't make use of the array infrastructure - it probably could.
ReplyDeleteAndrew, I would definitely prefer true SET type. For now there are some inefficient options, I just listed some to comprehend your blog post. If I can ask for a feature, it would be nice to have default JSON ARRAY output format.
DeleteSELECT subject_tags FROM blog_posts WHERE post_id = 12345;
'["first-tag","my_new_subject"]'
That would be really helpful for us.
ReplyDelete