A Python tool to watch directories on filesystems.

The Watcher is a script I wrote that watches a directory for changes and then copies the changed files to Swarm. It's written for Python 3 and got a little bigger than it needed to be. Full disclosure: I'm a self-taught programmer, and while I tested what follows, there are likely all sorts of scenarios I've not gone through. But if I can do it, anyone can, so don't be afraid to give it a try and improve it.

The use case is pretty simple. Say you have a desktop or server that regularly needs to write a copy of some files to a Swarm cluster or Content-Gateway endpoint, but you want the process to be non-interactive. There are a ton of S3 clients out there that can mount a bucket as a pseudo-filesystem, but that's not really what we're looking for. You could also use something like rclone, or even just a periodic s3 sync via the aws-cli, but that would again require manual intervention. The goal here is a background process that acts as a drop zone, giving users a very simple backup location.

Ideally I wanted to write something that was cross-platform, or at least something that could be made cross-platform. When it comes to watching the filesystem, you can subscribe to the operating system's filesystem API, which differs from one OS to the next: on Windows it's the WinAPI, on Linux it's inotify, and on macOS it's kqueue or FSEvents, and they all have different approaches and drawbacks. Alternatively, you can poll the filesystem periodically for changes and act on those changes. Thankfully, there is a very robust Python module called watchdog that does both.
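To make the polling approach concrete, here is a minimal, stdlib-only sketch of what a polling watcher does: record every file's modification time and size, wait, record them again, and diff the two snapshots. It's an illustration of the idea under simplified assumptions (no directory events, no rename detection), not how watchdog itself is implemented; snapshot, diff and poll are our own hypothetical names.

```python
import os
import time


def snapshot(path):
    """Map every file under `path` to its (mtime in ns, size) right now."""
    state = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                continue
            state[full] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Compare two snapshots; return sorted created, deleted and modified paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A file counts as modified if its mtime or size changed.
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, deleted, modified


def poll(path, interval=1.0):
    """Yield (created, deleted, modified) lists whenever something changes."""
    previous = snapshot(path)
    while True:
        time.sleep(interval)
        current = snapshot(path)
        changes = diff(previous, current)
        if any(changes):
            yield changes
        previous = current
```

Usage would be along the lines of `for created, deleted, modified in poll("."): print(created, deleted, modified)`. The trade-off is visible here: every interval walks the whole tree, which is cheap for a small drop zone but gets expensive for large trees, and that's what makes the native OS APIs attractive.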

https://pypi.org/project/watchdog/

The documentation is here:

https://python-watchdog.readthedocs.io/en/v0.10.2/

They also detail some of the pitfalls when using the os based apis here:

https://python-watchdog.readthedocs.io/en/v0.10.2/installation.html#supported-platforms-and-caveats.
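If one of those caveats bites (watching a CIFS/SMB network share is a common example, since changes made remotely may never produce native events), watchdog also ships a PollingObserver that rescans the directory instead of relying on the OS API. Below is a small sketch of picking between the two at startup; make_observer and its use_polling and poll_seconds parameters are hypothetical names of our own, not watchdog options.

```python
from watchdog.observers import Observer
from watchdog.observers.polling import PollingObserver


def make_observer(use_polling=False, poll_seconds=1.0):
    """Return a native-API observer, or a polling one when asked for."""
    if use_polling:
        # Rescans the watched tree every `poll_seconds`; slower, but works
        # where native change notifications aren't delivered.
        return PollingObserver(timeout=poll_seconds)
    # Observer is the best native implementation watchdog found for this OS
    # (inotify on Linux, FSEvents/kqueue on macOS, the WinAPI on Windows).
    return Observer()
```

Either object is then used the same way, with schedule(), start(), stop() and join(), so the rest of the script doesn't need to care which one it got.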

So let's get started. First we'll install the modules we need and import them into our script for testing. We'll use the Python module requests to handle the HTTP POSTs to Swarm, and watchdog will be the main workhorse for watching the filesystem. We may also want to query the OS at some point to get file info.

pip3 install requests

pip3 install watchdog
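Before going further, it can help to confirm the modules are actually importable. Here's a hedged sketch using the standard library's importlib.util.find_spec; the REQUIRED mapping (import name to pip package, including python-magic for the magic import used below) and the missing_packages helper are our own illustrative names.

```python
import importlib.util

# Import name -> pip package that provides it.
REQUIRED = {
    "requests": "requests",
    "watchdog": "watchdog",
    "magic": "python-magic",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules can't be located."""
    return [pkg for module, pkg in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip3 install " + " ".join(missing))
    else:
        print("All required modules are available.")
```

Note that find_spec only checks that a module can be found, not that it imports cleanly, so magic can still fail at import time if the libmagic system library is missing.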

Then we can add our imports at the top of the file:

import sys
import time
# used to read the file; we call os.stat() to get the file size
import os
# will add logging later
import logging
# used to get the MIME type to add as header data for the upload
import magic
# watchdog is the module that scans the directories; the Observer is the watching part of watchdog
from watchdog.observers import Observer
# This is the filesystem event handler. There are a few different types; this one just takes action on filesystem events
from watchdog.events import FileSystemEventHandler
# requests is used for the HTTP upload.
import requests
import json

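We may never use some of the imports above, but we'll clean them up as we go. For example, "magic" is a Python module for figuring out the content type of a file, which we can use as header metadata. It comes from the python-magic package (pip3 install python-magic), which in turn needs the libmagic library on the system.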
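As a sketch of how magic could feed the upload headers, the function below asks python-magic for the MIME type and, if the module (or libmagic) isn't available, falls back to the standard library's mimetypes, which only guesses from the file extension. content_type is a hypothetical helper, and the hasattr check is just defensive in case a different module named magic is installed.

```python
import mimetypes

try:
    # python-magic inspects the file's actual bytes via libmagic.
    import magic
except ImportError:
    # python-magic or libmagic isn't available; fall back to mimetypes below.
    magic = None


def content_type(path):
    """Best-effort MIME type for `path`, suitable for a Content-Type header."""
    if magic is not None and hasattr(magic, "from_file"):
        return magic.from_file(path, mime=True)
    # mimetypes looks at the file extension only, not the contents.
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"
```

For an existing file, the upload headers could then be built as `{"Content-Type": content_type(path)}`.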

The watchdog documentation comes with a nice, simple example of how to use it for logging events:

import sys
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while observer.isAlive():
            observer.join(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

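This example takes a directory path as its first argument (or falls back to the directory the script is run from) and monitors it for changes. It runs in the console and prints to the terminal. Note that it calls observer.isAlive(), which Python deprecated in favor of is_alive() and removed entirely in Python 3.9, hence the warning in the output below.
So if we create a Python script from the example code, run it, and then create a file in the watched directory, the output looks like this: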

tony@tony-NUC8i7HVK:~/watcher3/watchertests$ touch docsexample.py
tony@tony-NUC8i7HVK:~/watcher3/watchertests$ vi docsexample.py 
tony@tony-NUC8i7HVK:~/watcher3/watchertests$ chmod +x docsexample.py
tony@tony-NUC8i7HVK:~/watcher3/watchertests$ python3 docsexample.py 
docsexample.py:16: DeprecationWarning: isAlive() is deprecated, use is_alive() instead
  while observer.isAlive():
2020-10-13 08:58:21 - Created file: ./newfile11
2020-10-13 08:58:21 - Modified directory: .
2020-10-13 08:58:21 - Modified file: ./newfile11

The watchdog API documentation describes several event handler classes, each with a separate method for each type of operation (created, modified, deleted, moved). See https://pythonhosted.org/watchdog/api.html
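To see how those per-operation methods fit together, here is a sketch of a handler that overrides four of them and simply records what happened. ChangeRecorder is our own illustrative name; on_created, on_modified, on_deleted and on_moved are the FileSystemEventHandler methods meant to be overridden, and moved events also carry a dest_path.

```python
from watchdog.events import FileSystemEventHandler


class ChangeRecorder(FileSystemEventHandler):
    """Remembers which paths were created, modified, deleted or moved."""

    def __init__(self):
        super().__init__()
        self.changes = []

    def on_created(self, event):
        self.changes.append(("created", event.src_path))

    def on_modified(self, event):
        self.changes.append(("modified", event.src_path))

    def on_deleted(self, event):
        self.changes.append(("deleted", event.src_path))

    def on_moved(self, event):
        # A move has both the old path and the new one.
        self.changes.append(("moved", event.src_path, event.dest_path))
```

It's scheduled like any other handler, e.g. observer.schedule(ChangeRecorder(), path, recursive=True). These methods fire for directories as well as files; event.is_directory tells them apart.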

So let's start with the FileSystemEventHandler.

from watchdog.events import FileSystemEventHandler

# this class subclasses watchdog's FileSystemEventHandler so we can have it do more stuff.
class Event(FileSystemEventHandler):
    # on_created is called only when something new is created (a file or a directory).
    # there are also methods for modified, deleted, moved etc.
    def on_created(self, event):
        # here we're getting the name of the file / filepath for the new file.
        filenamefromwatchdog = event.src_path
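One thing to watch for: on_created also fires when a directory is created, and we'll eventually want file details such as the size (the os.stat call mentioned in the imports). Here's a sketch of how the class could be extended; the pending list is a hypothetical stand-in for whatever upload queue Part 2 ends up using.

```python
import os

from watchdog.events import FileSystemEventHandler


class Event(FileSystemEventHandler):
    """Queues newly created files, with their size, for a later upload step."""

    def __init__(self):
        super().__init__()
        # Hypothetical work queue of (path, size) tuples.
        self.pending = []

    def on_created(self, event):
        # on_created also fires for new directories; we only upload files.
        if event.is_directory:
            return
        filenamefromwatchdog = event.src_path
        try:
            size = os.stat(filenamefromwatchdog).st_size
        except FileNotFoundError:
            # The file was removed again before we got to look at it.
            return
        self.pending.append((filenamefromwatchdog, size))
```

Bear in mind that on_created can fire while the program writing the file is still writing it, so the size recorded here may not be final; how to handle that is a design decision for the upload step.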

We can plug our new event handler into the basic example as below:

import sys
import logging
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class Event(FileSystemEventHandler):
    def on_created(self, event):
        filenamefromwatchdog = str(event.src_path)
        # the leading space keeps the note separate from the path
        print(filenamefromwatchdog + " This came from our new class")



if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = Event()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while observer.is_alive():
            observer.join(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

Here we have changed the watchdog.events import to FileSystemEventHandler, added a new class that subclasses it (and passed an instance of it to observer.schedule()), and swapped the deprecated isAlive() for is_alive(). The new class doesn't do much yet: it stores the filename as a string and prints it to the console with an extra note, so we know the output came from our class rather than from the LoggingEventHandler.

tony@tony-NUC8i7HVK:~/watcher3/watchertests$ python3 docsexample.py 
./Prometheus Monitoring with Swarm.pptx This came from our new class

But from here you can see the scaffold that we'll build on.

We'll make it do something useful in Part 2!


By Tony Lokko