Wildcard purge nginx cache

Post Written by
Ivan Dabić
Last modified on June 30th, 2020 at 1:51 pm

While looking for a good solution, at least for my use case, I stumbled upon an interesting approach: finding and removing nginx cached files by regex pattern. It works the same whether the files were cached via proxy_pass or fastcgi. The repository with the source code can be found on GitHub: https://github.com/perusio/nginx-cache-purge

The script takes a regular expression as an argument and matches it against the caching directory nginx uses, so it is very important that the script can both read and write that directory.

To get back to the common use case with nginx as a caching system, we have to define nginx vhosts. For this showcase I'll keep both vhosts in nginx.conf to mimic a potential multi-user environment; in the real world each host would have its own vhost file. For test purposes I've created a directory "/var/cache/idabic/" for nginx to use for caching and included this path in nginx.conf. Each server block within nginx.conf, or within a vhost file, is a separate server entity with separate cache keys, caching rules, server names, etc.

When there is a strict requirement to purge a large number of files and the only known criterion is something that can be mapped via the cache key, it's wildcard purge support that saves the day. Usually this is triggered by an API call, so the API server can initiate the purge/delete from cache based solely on a regex. What I need to implement the above solution is:

  1. Nginx installation
  2. Bash script for purging by regular expression
  3. Custom Pseudo API server
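For the purge script itself I use the one from the repository above. As a rough illustration of how that approach works, here is a minimal sketch resting on one fact: nginx records the cache key inside every cached file on a line starting with "KEY:", so matching files can be found with grep and deleted. The function name purge_by_regex is mine, and the sketch assumes a flat cache directory; the real script in the repository is considerably more careful.

```shell
# Minimal sketch of purging nginx cache files by regex. nginx writes the
# cache key into each cache file on a line beginning with "KEY:", so we
# grep for the pattern and delete every file whose key matches.
purge_by_regex() {
    pattern="$1"    # extended regex matched against the cache key
    cache_dir="$2"  # nginx cache directory, e.g. /var/cache/idabic/
    # -E: extended regex, -l: print matching file names, -a: treat the
    # binary cache files as text, -r: recurse into any subdirectories.
    grep -Elar "KEY: .*${pattern}" "$cache_dir" | while read -r f; do
        rm -f "$f"
    done
}
```

Saved as e.g. /tmp/purge with a small argument wrapper, it would be invoked just like the call the API server makes later: a regex first, the cache directory second.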

Nginx

The installation on my test box isn't utilizing vhost files (I'm too lazy to re-organize the current installation); instead, it uses the main nginx.conf file located at "/etc/nginx/nginx.conf" for every server block needed to create the multi-host environment. First, create the caching space for nginx to use:

~$ mkdir /var/cache/idabic

Then define the caching space and caching zone in nginx.conf by adding the following line under its http block:

proxy_cache_path /var/cache/idabic keys_zone=idabic:10m inactive=10h;

This creates a caching "zone" named "idabic" and tells nginx to expire any cached entry that hasn't been requested for more than 10 hours. It also sets the size of the shared memory zone holding the cache keys to 10MB. To create the multi-host environment, create two server blocks as follows:

server {
      listen     80;
      listen     [::]:80;
      server_name local;
      root       /usr/share/nginx/html;
      include /etc/nginx/default.d/*.conf;
      location / {
              resolver 8.8.8.8;
              proxy_pass https://www.maxcdn.com$request_uri;
              add_header ID $upstream_cache_status;
              proxy_cache_min_uses 2;
              proxy_cache idabic;
      }
      error_page 404 /404.html;
          location = /404.html {
      }
      error_page 500 502 503 504 /50x.html;
          location = /50x.html {
      }
      proxy_cache_key $http_host$uri$is_args$args;
   }

And another one with slightly different parameters:

server {
      listen     80 default_server;
      listen     [::]:80 default_server;
      server_name localhost;
      root       /usr/share/nginx/html;
      include /etc/nginx/default.d/*.conf;
      location / {
              resolver 8.8.8.8;
              proxy_pass https://www.maxcdn.com$request_uri;
              add_header ID $upstream_cache_status;
              proxy_cache_min_uses 2;
              proxy_cache idabic;
      }
      error_page 404 /404.html;
          location = /404.html {
      }
      error_page 500 502 503 504 /50x.html;
          location = /50x.html {
      }
      proxy_cache_key $http_host$uri$is_args$args;
   }

You'll notice that the cache keys are custom, in order to meet a common requirement, by combining the Host, the requested URI (without the query string) and the query arguments (if any). The final nginx.conf looks like this:

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
events {
   worker_connections 1024;
}
http {
   log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                   '$status $body_bytes_sent "$http_referer" '
                   '"$http_user_agent" "$http_x_forwarded_for"';
   proxy_cache_path /var/cache/idabic keys_zone=idabic:10m inactive=10h;
   access_log /var/log/nginx/access.log main;
   sendfile          on;
   tcp_nopush        on;
   tcp_nodelay       on;
   keepalive_timeout  65;
   types_hash_max_size 2048;
   include           /etc/nginx/mime.types;
   default_type      application/octet-stream;
   include /etc/nginx/conf.d/*.conf;
   server {
      listen     80;
      listen     [::]:80;
      server_name local;
      root       /usr/share/nginx/html;
      include /etc/nginx/default.d/*.conf;
      location / {
              resolver 8.8.8.8;
              proxy_pass https://www.maxcdn.com$request_uri;
              add_header ID $upstream_cache_status;
              proxy_cache_min_uses 2;
              proxy_cache idabic;
      }
      error_page 404 /404.html;
          location = /404.html {
      }
      error_page 500 502 503 504 /50x.html;
          location = /50x.html {
      }
      proxy_cache_key $http_host$uri$is_args$args;
   }
   server {
      listen     80 default_server;
      listen     [::]:80 default_server;
      server_name localhost;
      root       /usr/share/nginx/html;
      include /etc/nginx/default.d/*.conf;
      location / {
              resolver 8.8.8.8;
              proxy_pass https://www.maxcdn.com$request_uri;
              add_header ID $upstream_cache_status;
              proxy_cache_min_uses 2;
              proxy_cache idabic;
      }
      error_page 404 /404.html;
          location = /404.html {
      }
      error_page 500 502 503 504 /50x.html;
          location = /50x.html {
      }
      proxy_cache_key $http_host$uri$is_args$args;
   }
}
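Knowing the cache key also tells you exactly where a response lives on disk: nginx names each cache file after the MD5 hash of its key, and since the proxy_cache_path above sets no levels= parameter, the files sit directly under /var/cache/idabic/. As an example, using the cache key format from the config:

```shell
# nginx names each cached file after the MD5 of its cache key. With
# proxy_cache_key $http_host$uri$is_args$args, a request for
# /wp-content/uploads/2015/06/splash-cp.png with Host "localhost"
# is stored under the file name printed here:
printf '%s' 'localhost/wp-content/uploads/2015/06/splash-cp.png' | md5sum | awk '{print $1}'
```

This is what makes a wildcard purge necessary in the first place: the file names are one-way hashes, so there is no way to go from a URL pattern to a list of files without inspecting the keys stored inside them.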

API Pseudo Server

This is a pseudo server because building a proper API server for the sake of a showcase (no OAuth required here) would take more time than it's worth. I'll be using a small Python service I wrote just for this purpose. It utilizes no authentication, as its sole purpose is to accept a DELETE request with arguments and call the purge script accordingly:

from urlparse import urlparse
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn
import threading
import SocketServer
import SimpleHTTPServer
import HTMLParser
import sys
import hashlib
import json
import cgi
import logging
import time
import BaseHTTPServer
import urllib
import subprocess

class Handler(BaseHTTPRequestHandler):
   def do_DELETE(self):
         # Extract the "match" parameter from the query string first, so a
         # malformed request is rejected instead of killing the whole server.
         match = None
         for name in self.path.split("&"):
             if name.split("=")[0] == "match" or name.split("=")[0] == "/?match":
                 match = name.split("=")[1]
         if match is None:
             self.send_error(400, 'Missing "match" parameter')
             return
         self.send_response(200)
         self.end_headers()
         self.wfile.write(threading.currentThread().getName())
         self.wfile.write('Calling: /tmp/purge "' + match + '" /var/cache/idabic/')
         # Pass the regex as a list element rather than through a shell,
         # so shell metacharacters in the pattern cannot be interpreted.
         subprocess.call(['/tmp/purge', match, '/var/cache/idabic/'])
         self.wfile.write('OK.1')
         return
class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
   """Handle requests in a separate thread."""
if __name__ == '__main__':
   server = ThreadedHTTPServer(('', 6969), Handler)
   print 'Starting server, use <Ctrl-C> to stop'
   server.serve_forever()

NOTE: some of the imports belong to parts of this script I've removed, as they are used by the GET and POST handlers. Run the server and call it on port 6969 (defined in the script where ThreadedHTTPServer is instantiated) as follows:

curl "http://localhost:6969/?match=(localhost\/).*(\.png)" -X DELETE
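The handler's query parsing is plain string splitting, which can be sketched in shell to show what the curl call above delivers to the purge script (the extract_match function below is only an illustration, not part of the actual service):

```shell
# Extract the "match" parameter the way the pseudo server does: split
# the request path on "&", then split each pair on "=" and accept the
# value when the name is "match" (or "/?match" for the first pair).
extract_match() {
    query="$1"
    for pair in $(printf '%s' "$query" | tr '&' ' '); do
        name=${pair%%=*}
        if [ "$name" = "match" ] || [ "$name" = "/?match" ]; then
            printf '%s' "${pair#*=}"
        fi
    done
}
```

So a request path of /?match=(localhost\/).*(\.png) hands the raw regex straight to the purge script, which is why the pattern in the curl call is written to match the cache key, not the URL alone.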

Test

At the beginning I created two server blocks to handle two virtual hosts, where each responds to requests carrying either "localhost" or "local" as the Host. I intentionally used the same backend server so I can show the relation between the cache keys and the purging script. First, let's make sure two sample files are cached for each vhost:

~$ curl -I localhost/wp-content/uploads/2015/06/splash-cp.png -H 'Host: localhost'

HTTP/1.1 200 OK
ID: HIT

~$ curl -I localhost/wp-content/uploads/2015/06/splash-cp.png -H 'Host: local'

HTTP/1.1 200 OK
ID: HIT

~$ curl -I localhost/wp-content/uploads/2015/06/home-map-1.png -H 'Host: localhost'

HTTP/1.1 200 OK
ID: HIT

$ curl -I localhost/wp-content/uploads/2015/06/home-map-1.png -H 'Host: local'

HTTP/1.1 200 OK
ID: HIT

Now to execute the purge call:

curl "localhost:6969/?match=(local\/).*(\.png)" -X DELETE

The expected result is that the requests made with Host "local" are removed from cache and now show ID: MISS, while the rest stay intact:

$ curl -I localhost/wp-content/uploads/2015/06/splash-cp.png -H 'Host: localhost'

HTTP/1.1 200 OK
ID: HIT

$ curl -I localhost/wp-content/uploads/2015/06/splash-cp.png -H 'Host: local'

HTTP/1.1 200 OK
ID: MISS

$ curl -I localhost/wp-content/uploads/2015/06/home-map-1.png -H 'Host: localhost'

HTTP/1.1 200 OK
ID: HIT

$ curl -I localhost/wp-content/uploads/2015/06/home-map-1.png -H 'Host: local'

HTTP/1.1 200 OK
ID: MISS

I hope this comes in handy to whoever needs it.
