An alt text

Need advice on how to convert a filter string query on tags in SQL, for example "(tag1 and tag2) or tag3" in SQL?

Blog post posted on PostgreSQL subreddit, Javascript subreddit and AskProgramming Subreddit:

In a closed source project I’m currently working on, I’ve implemented a system that lets you associate tags with resources.

My implementation looks like the following Proof of Concept postgres-tags-model-poc, itself inspired by the article Tags and Postgres Arrays, a Purrrfect Combination.

For the moment the implementation is working pretty well, I can apply filters (see examples in the README) on tag-based resources, examples :

a.

    SELECT *
    FROM main.contacts_with_tag_names
    WHERE (
        tags && (
            SELECT ARRAY_AGG(id) FROM main.contact_tags WHERE name = ANY(ARRAY['tag2'])
        )
    );

or

b.

SELECT *
FROM main.contacts_with_tag_names
WHERE (
  (
    tags <@ (
      SELECT ARRAY_AGG(id) FROM main.contact_tags WHERE name = ANY(ARRAY['tag2', 'tag3'])
    )
  ) OR (
    tags && (
      SELECT ARRAY_AGG(id) FROM main.contact_tags WHERE name = ANY(ARRAY['tag1'])
    )
);

Now that this implementation is in place, I’d like to allow application users to enter “query filter string” to configure tag-based filters.

This is how the query filter strings in examples a. and b. would look:

  • a. tag2
  • b. (tag2 and tag3) or tag1

My next objective is to implement this “query filter string”.
I learned from past experience that using simple regexes isn’t the best idea for implementing a parser, and I’ve found in practice that using a lexer library is much simpler and more rigorous.

I started by looking for equivalents to Flex, GNU Bison… in Javascript:

I also spent some time researching whether there was a “standard” query filter syntax, I looked at the query search syntax of Google, Duckduckgo, Melisearch, Loki, GitHub, Jira and I couldn’t find anything “standard” for making searches based on tags alone.
The query filter syntax that comes closest to my goal is the Google one. The other syntaxes I’ve studied are often too powerful, handling a lot of extra functionality.

Questions

Before I start implementing this query filter system, I’d like to ask you the following questions:

  • a. Do you know of a standard “query filter syntax” that I could use as a reference?
  • b. Is there an Open Source library that transforms this type of query filter syntax into SQL?
  • c. Do you have a Javascript lexer library to recommend that would be suitable for this rather minimalist requirement?
  • d. Side question: would a tool like pgvector be useful and more suitable than the implementation described in the article Tags and Postgres Arrays, a Purrrfect Combination?

Thanks in advance for your suggestions.
Stéphane