Tuesday, January 22, 2013

Handling Redis Hashes

IVC, a great boutique software house that I have worked with for quite a few years, has been shifting to a new, more modern, software stack, part of which is Redis. They have asked me to work on improving the interface between Redis and PostgreSQL. Last week they mentioned that they are doing a lot with Redis hashes, and asked me to improve access to them. These are basically sets of key/value pairs. In PostgreSQL terms think of hstore for an equivalent.

So I came up with a simple idea of mapping a foreign table to a Redis hash. This actually isn't too hard to implement, and I got it working more or less yesterday:

andrew=# \! echo type foo | redis-cli
hash
andrew=# \d xxx
       Foreign table "public.xxx"
 Column | Type | Modifiers | FDW Options 
--------+------+-----------+-------------
 key    | text |           | 
 value  | text |           | 
Server: localredis
FDW Options: (hashkey 'foo')

andrew=# select * from xxx;
   key   | value 
---------+-------
 bar     | baz
 blurflk | asdf
 qqq     | ttt
(3 rows)
So far so good. You can treat a Redis hash as a little table and select from it. And we could perform similar operations for the other Redis data structures (lists, sets and zsets).  But I'm wondering if this is going to fit well with actual Redis practice. Let's say we're dealing with web session objects. In PostgreSQL you would probably have a table for these with a session id and an hstore or maybe a piece of json (especially after my json extractor functions get committed :-) ). But in Redis I think you're at least as likely to have one hash per session object rather than a hash of session objects, and Redis data structures don't nest, so there's no hash of hashes. So if you have thousands or millions of tiny session object hashes in Redis, with names like "web.session.a9f3c1763d" where the last part of the name is some fairly opaque session name, do you really want to be having to map each one of those to its own foreign table in order to be able to read the values nicely? I don't think so.

Following this thought I came up with the idea of a related FDW which would give you access to the whole of the hashspace in Redis. The tables would have three fields, not two, being, say, (hashname, key value). We could even impose a restriction on queries that would require you to specify a hashname, although that's possibly a bit ugly, and maybe we should just warn you against not supplying a hashkey query parameter.

So, which of these patterns is likely to be most useful? They are not actually mutually exclusive, and it's perfectly possible to implement both. But I would like to know what seems most useful to Redis users generally, especially if they are also using PostgreSQL.

Another possibility would be not to use the Foreign Data Wrapper interface at all, and just provide a set returning function which would take some connection parameters and a hash name and return the key value pairs. This too would be reasonably easy to do, as I already have (and will soon publish) a framework for generalised Redis access. On the other hand, one significant advantage of using Foreign Data Wrappers is that with FDWs handling pushed down quals ("where" conditions in effect) is rather easier to do, from what I can see.

5 comments:

  1. Nice... I'd like to improve lhe ldap fdw to return more nicelly results i'm thinking about hashes but the ldap hierarchical behaviour is making me tired. . :(

    ReplyDelete
  2. A "more common" practice in Redis would be to treat a Hash as an analog to a record in a Postgres table, i.e. the actual Redis hash would be a record, the keys Columns and the values, well, values. A Redis SET would be the common primitive related to a table (really a materialized view). It would store the key names of each of the hashes in the relation. When used this way you can use the Redis SORT command for some level of SELECT-like querying.

    In terms of sessions, you're right that a more typical practice would be one Redis key per session.

    I'm definitely interested to see what you come up with!

    ReplyDelete
  3. In Redis, there are few things that you can do without knowing the key that you are looking for. You can fetch a random key, or you can get all keys that match a pattern (this will be important in a minute).

    Being that most users of Redis use it as name-prefixed keys like 'session:', offering the ability to say "Use 'session:' as a prefix to a foreign table called 'sessions'" would probably be useful for the majority of users. Then, if you want the session for , you would use "select ... from sessions where name = 'identifier';" (I've renamed hashname to name here). If you ever needed to "select ... from sessions", then that would map naturally to a "KEYS session:*" request to Redis, followed by subsequent HGET, HGETALL, etc., calls.

    If you could provide a "foreign table type" to specify STRINGs, LISTs, SETs, or ZSETs additionally (using the same name prefixing), then each data type data could be exposed in other ways (like saying STRING values actually hold JSON blobs, LISTs hold JSON blobs or otherwise, ZSETs can be queried by value/score efficiently, etc.).

    It might not be unreasonable (from a performance perspective) to *require* that any select/update include exact 'name' as a where clause (regardless of the key type). It also might not be unreasonable for Postgres to create a local index on 'name' data that was created, read, or updated through Postgres, attached to the foreign table for subsequent use as part of 'where name like ...' clauses. Those indexes could also be kept in Redis in a few different ways (I'm thinking ZSETs for the fast incremental access properties, which SORT + SET doesn't offer).

    If you'd like to hop on the Redis mailing list, others there will no doubt have comments and answers for you: https://groups.google.com/forum/?fromgroups#!forum/redis-db

    As a user of Redis, thank you for this work. I believe that it will make a lot of Redis + Postgres users very happy.

    ReplyDelete
    Replies
    1. Google didn't like my markup...

      Then, if you want the session for <identifier>, you would use "select ... from sessions where name = 'identifier';"

      Delete
  4. Having a way to interact with sorted sets in Redis from PostgreSQL is very interesting to me. I have leaderboards that I like to maintain ranks for. Ranks are expensive to be updated in storage when an entry jumps many positions and alternatively can be updated nicely in redis. Similar to your thoughts for a Redis hash a score could be an additional field in the postgres table. There would be several ways to maintain the sorted set in Redis and to join the score back to a view in PostgreSQL.

    Your work on the Redis FDW is a great foundation to what could be a very interesting synergy between PostgreSQL and Redis. I see much potential for a series of functions that can interact with Redis in addition to the PostgreSQL FDW future.

    ReplyDelete