# Grimhilde
## Use case
At dotsrc, mirror statistics are currently generated by the [mirror_stats](https://gitlab.com/dotSRC/mirror_stats) Go program. While this approach works decently, its granularity could be improved, and the very large JSON blobs it generates are taxing to work with.
Grimhilde is a proposed replacement, currently in development.
## Planning & Structure
By using the `log_format` directive in our nginx configuration, the access log output can be
customized to include only the data we need, in a machine-readable form rather than the default
human-readable form. Different logging facilities can also be configured, including logging over
the syslog protocol. Grimhilde emulates a syslog server by opening a local UDP socket, enabling fast
and reliable communication between nginx and Grimhilde.
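
As a rough illustration of the syslog path described above, the following Python sketch binds a local UDP socket and extracts the log payload from each datagram. It is not Grimhilde's actual code. It assumes an nginx configuration like the one in the comment, where the `grimhilde` log format name, the `grimhilde` syslog tag, port 5514, and the JSON field names are all hypothetical choices made for this example.

```python
import json
import socket

# Assumed nginx configuration (names, fields and port are illustrative):
#   log_format grimhilde escape=json
#       '{"path":"$uri","referrer":"$http_referer","host":"$host"}';
#   access_log syslog:server=127.0.0.1:5514,tag=grimhilde grimhilde;
#
# nginx then sends one datagram per request, shaped roughly like:
#   <190>Jan  1 00:00:00 mirror grimhilde: {"path":"/debian/", ...}


def parse_syslog_datagram(datagram: bytes, tag: str = "grimhilde") -> dict:
    """Strip the syslog header and decode the JSON payload that follows the tag."""
    text = datagram.decode("utf-8", errors="replace")
    _header, separator, payload = text.partition(f"{tag}: ")
    if not separator:
        raise ValueError("syslog tag not found in datagram")
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(payload)


def serve(host: str = "127.0.0.1", port: int = 5514) -> None:
    """Receive datagrams forever and print each parsed request."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, _sender = sock.recvfrom(65535)
            try:
                print(parse_syslog_datagram(datagram))
            except ValueError:
                continue  # skip datagrams that are not in the expected format
```

Calling `serve()` starts the listener. A real implementation would hand each parsed request to the database layer instead of printing it.
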

Each request processed by nginx is sent to Grimhilde, where its details are stored in a PostgreSQL
database. As the dotsrc mirror processes millions of requests each day, the amount of data we need
to store would quickly become untenable if every request were stored in full. The proposed way
around this is to assign each unique piece of data in every request (request path, referrer,
hostname, etc.) an ID and store it only once, replacing any subsequent identical data with
references to the initial copy.
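
The deduplication idea above could be sketched as follows. This is an illustration under stated assumptions, not Grimhilde's schema: the table names, column names, and helper functions are hypothetical, and SQLite is used as a stand-in so the example runs without a database server. In PostgreSQL, the equivalent of `INSERT OR IGNORE` would be `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

# Hypothetical lookup tables, one per kind of repeated value.
LOOKUP_TABLES = ("paths", "referrers", "hostnames")


def create_schema(conn: sqlite3.Connection) -> None:
    for table in LOOKUP_TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            "id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE)"
        )
    conn.execute(
        "CREATE TABLE requests ("
        "id INTEGER PRIMARY KEY, "
        "time TEXT NOT NULL, "
        "path_id INTEGER NOT NULL REFERENCES paths(id), "
        "referrer_id INTEGER NOT NULL REFERENCES referrers(id), "
        "hostname_id INTEGER NOT NULL REFERENCES hostnames(id))"
    )


def intern_value(conn: sqlite3.Connection, table: str, value: str) -> int:
    """Return the ID for value, inserting it only if it has not been seen before."""
    if table not in LOOKUP_TABLES:
        raise ValueError(f"unknown lookup table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    return row[0]


def store_request(conn, time: str, path: str, referrer: str, hostname: str) -> None:
    """Store one request as a row of small integer references."""
    conn.execute(
        "INSERT INTO requests (time, path_id, referrer_id, hostname_id) "
        "VALUES (?, ?, ?, ?)",
        (
            time,
            intern_value(conn, "paths", path),
            intern_value(conn, "referrers", referrer),
            intern_value(conn, "hostnames", hostname),
        ),
    )
```

With this layout, a path requested a million times costs one text row plus a million integer references, rather than a million copies of the string.
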

While this will massively decrease the amount of data stored, the database will still grow large
over time. The proposed solution is to limit how long the data is stored, and to use the API to
generate static views of interest before purging data older than, for example, three months.
This will allow staff members to generate advanced reports on real-time data to see and respond to
problematic trends, while also retaining interesting historical usage data to be published
for years to come.
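
One possible shape for the retention step is sketched below: summarize old rows into a small static table, then delete them in the same transaction. Everything here is an assumption for illustration, including the simplified `requests` table, the hypothetical `daily_hits` summary table, the approximation of three months as 90 days, and the use of SQLite in place of PostgreSQL.

```python
import sqlite3
from datetime import date, timedelta


def create_schema(conn: sqlite3.Connection) -> None:
    # Simplified stand-ins for the real tables.
    conn.execute(
        "CREATE TABLE requests (time TEXT NOT NULL, path_id INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE daily_hits ("
        "day TEXT NOT NULL, path_id INTEGER NOT NULL, hits INTEGER NOT NULL, "
        "PRIMARY KEY (day, path_id))"
    )


def cutoff_day(today: date, keep_days: int = 90) -> str:
    """Return the first day (ISO format) that is still kept in full detail."""
    return (today - timedelta(days=keep_days)).isoformat()


def archive_and_purge(conn: sqlite3.Connection, cutoff: str) -> None:
    """Summarize requests from days before cutoff, then delete them.

    Only whole days are archived, so a day is never summarized twice.
    """
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO daily_hits (day, path_id, hits) "
            "SELECT date(time), path_id, COUNT(*) FROM requests "
            "WHERE date(time) < ? GROUP BY date(time), path_id",
            (cutoff,),
        )
        conn.execute("DELETE FROM requests WHERE date(time) < ?", (cutoff,))
```

Running `archive_and_purge(conn, cutoff_day(date.today()))` on a schedule would keep the detailed table bounded while the summary table keeps growing slowly, one row per path per day.
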
## Todo
* [ ] Write tests
* [ ] Implement the GraphQL API for fetching statistics
* [ ] Find an appropriate solution for generating reports based on data from the API