Open
Commits
183 commits
a0dea5c
Remove useless indirection
OriPekelman Nov 21, 2024
5b3f8e2
Further remove useless indirection and be more explicit
OriPekelman Nov 21, 2024
b02371b
update requirements
OriPekelman Nov 22, 2024
4ded1df
Refactor both SpecialMeta and HtmlVizualisations into data classes. I…
OriPekelman Nov 22, 2024
408181e
Finish refactoring the template. Move out js.
OriPekelman Nov 22, 2024
b539060
merge with upstream
OriPekelman Dec 2, 2024
1e2ff3d
Add local ingestion
OriPekelman Nov 22, 2024
763fe82
PAULA zip files actually contain a zip file. Recurse. Correct Excepti…
OriPekelman Nov 27, 2024
c516d4d
remove api
OriPekelman Nov 29, 2024
e1a6162
test for urn
OriPekelman Nov 29, 2024
37cdaf3
add htmlvis tests
OriPekelman Nov 29, 2024
9d33183
Cleanup syntax for 3.10 bring back delete method from api views
OriPekelman Nov 29, 2024
f94f51c
consistent format
OriPekelman Nov 29, 2024
091fa32
cleanup templates
OriPekelman Nov 29, 2024
4148d3c
Everything to latest LTS
OriPekelman Nov 29, 2024
f825387
correct missing page title
OriPekelman Nov 30, 2024
03b21e9
update for recent Django
OriPekelman Nov 30, 2024
13ddd0c
Add some notes. And stop using exceptions for flow control. Make sure…
OriPekelman Nov 30, 2024
033ce30
These are now in the code, they cant be managed in the admin
OriPekelman Nov 30, 2024
b0dd3b5
make corpus slugs unique. Still need to figure out texts slugs
OriPekelman Nov 30, 2024
8ac5c7e
add notes about requirement freezing and cache clearing
OriPekelman Nov 30, 2024
dffe7ac
add cacheclear command
OriPekelman Nov 30, 2024
aab68d3
Make corpus slug unique
OriPekelman Nov 30, 2024
e62ae8a
deprecate the old scraper while keeping backward compatibility
OriPekelman Dec 2, 2024
f0a826d
correct frozen requirements
OriPekelman Dec 2, 2024
db11b98
Remove deprecated requirements
OriPekelman Dec 2, 2024
977e6a1
Update for Python 3.x
OriPekelman Dec 2, 2024
8fb523a
Run Nav script to update HTML
OriPekelman Dec 2, 2024
32d426f
Rewrote some parts for clarity, added inline docs and optimized a bit…
OriPekelman Dec 2, 2024
a1fb298
add github ci
OriPekelman Dec 2, 2024
080cfa1
Add local config files and htmlvis readme
OriPekelman Dec 2, 2024
cac9763
Correct Cache
OriPekelman Dec 2, 2024
09a7b89
remove cruft
OriPekelman Dec 2, 2024
56a4608
nginx config
OriPekelman Dec 2, 2024
6fee0fb
Only add corpora that are on live
OriPekelman Dec 2, 2024
1503cf6
add Upsun configuration
OriPekelman Dec 3, 2024
78be746
we need to ensure local corpora for ..local too
OriPekelman Dec 3, 2024
33331e0
cleanliness later
OriPekelman Dec 3, 2024
1d78bb8
Cleanliness later
OriPekelman Dec 3, 2024
930efd7
change to stdout
OriPekelman Dec 3, 2024
2359ec0
static content
OriPekelman Dec 3, 2024
adc8c54
I still need to decide where to put the repo..
OriPekelman Dec 11, 2024
97d3f39
This now 'correctly' generates invalid markup
OriPekelman Dec 4, 2024
3022f5e
Preserve format ordering. And simplify json command separated
OriPekelman Dec 11, 2024
92fdd26
correct test
OriPekelman Dec 4, 2024
e037a9d
change the way we determine which config to use. It shouldn't be in code.
OriPekelman Dec 4, 2024
9111ce1
make sure we clear cache after deployment
OriPekelman Dec 4, 2024
c5b1903
moving complex logic into model - and use prefetching of relationships
OriPekelman Dec 10, 2024
e6b75fa
reduce html diff
OriPekelman Dec 11, 2024
7dab342
correct value pairs
OriPekelman Dec 11, 2024
06316bd
Already sorting
OriPekelman Dec 11, 2024
6d33400
reduce visual delta, but also create proper markup removing newlines
OriPekelman Dec 11, 2024
65a034d
Add logging instead of print
OriPekelman Dec 11, 2024
77d0b6a
Logging settings
OriPekelman Dec 11, 2024
c898ed3
Update tests
OriPekelman Dec 11, 2024
4466f35
visually align output
OriPekelman Dec 11, 2024
91752ef
performance
OriPekelman Dec 11, 2024
b8b399c
sanity check script
OriPekelman Dec 11, 2024
a01c1f1
Move js to own file and hardcode css for viz
OriPekelman Dec 17, 2024
c0fa103
Reproducible install
OriPekelman Dec 17, 2024
a8665c9
More VS Code Launch configs
OriPekelman Dec 17, 2024
0077b85
Add path parameter
OriPekelman Dec 17, 2024
d81c047
add path parameter
OriPekelman Dec 17, 2024
f38b47c
WIP:Refactor ingest and view to be lazy
OriPekelman Dec 17, 2024
292949e
Implement lazy HTML generation(not active); Refactor CSS.
OriPekelman Dec 18, 2024
bad4fc4
Implemented basic FT search
OriPekelman Dec 18, 2024
5781ea3
basic FT search implementation
OriPekelman Dec 21, 2024
9c8e752
finish initial search implementation with highlighting
OriPekelman Jan 6, 2025
7f2fcaa
refactor scraper for indexing and move all maps to settings
OriPekelman Jan 6, 2025
2f3f580
finish initial search implementation with highlighting
OriPekelman Jan 6, 2025
4b84c08
Update tests for refactored scraper
OriPekelman Jan 6, 2025
0b220c4
move all mapping to settings
OriPekelman Jan 6, 2025
5bf126c
Integrate keyman keyboard, finish first version of search make lazine…
OriPekelman Jan 7, 2025
6c06d13
update launch config to add profiling back
OriPekelman Jan 7, 2025
3f7d0df
make this resilient to meilisearch not working. add docs.
OriPekelman Jan 7, 2025
5aac6af
update upsun config
OriPekelman Jan 9, 2025
0a5ebf9
correct git path handling
OriPekelman Jan 9, 2025
4deb1e1
Lets also be resilient to no search during deployment - ooh at any ra…
OriPekelman Jan 9, 2025
8dc99f4
prod settings
OriPekelman Jan 9, 2025
aff279c
separate search instance
OriPekelman Jan 9, 2025
13a69eb
lowercase
OriPekelman Jan 9, 2025
bc3d8b6
move to mount
OriPekelman Jan 9, 2025
eb84785
shallow clone
OriPekelman Jan 9, 2025
be350c4
Higher memory
OriPekelman Jan 9, 2025
0b91ae5
Higher memory
OriPekelman Jan 9, 2025
c2be5b8
forgot relationship
OriPekelman Jan 10, 2025
e5b55ee
add runtime operations
OriPekelman Jan 10, 2025
b3b8296
dumps dir for meilisearch
OriPekelman Jan 10, 2025
365daeb
Run on port 8888
OriPekelman Jan 10, 2025
0a60d90
meiliurl
OriPekelman Jan 10, 2025
3b4316c
high mem
OriPekelman Jan 13, 2025
696cb10
explicitly run blackfire?
OriPekelman Jan 13, 2025
ca462f1
Simplify the pairs logic, move to query
OriPekelman Jan 13, 2025
54ffcec
remove unneeded test
OriPekelman Jan 13, 2025
0644991
correct authors
OriPekelman Jan 13, 2025
4568b84
Dont run FT if text empty
OriPekelman Jan 13, 2025
9ec6781
cleanup before import
OriPekelman Jan 17, 2025
5afb127
Doc on search
OriPekelman Jan 17, 2025
42c6d2e
Refactor meta values and html vis formats
OriPekelman Jan 17, 2025
f647217
Finish first version of FT and start faceted search
OriPekelman Jan 17, 2025
5c0a182
Minor logging
OriPekelman Jan 17, 2025
37423c9
MetaOrder was only used in the deprecated API
OriPekelman Jan 18, 2025
debc870
back to using individual css and config
OriPekelman Jan 19, 2025
9843b03
back to using individual css and config
OriPekelman Jan 19, 2025
914f9cc
Cleanup and prepare for production
OriPekelman Jan 20, 2025
151aeef
Implement search in meta fields
OriPekelman Jan 20, 2025
18d9836
going lazy in production
OriPekelman Jan 20, 2025
c8c21ce
correct typo
OriPekelman Jan 20, 2025
afb632f
make indexing more efficient
OriPekelman Jan 20, 2025
9ce14c1
add caching to get_text
OriPekelman Jan 20, 2025
973a6ce
correct regression in format display
OriPekelman Jan 20, 2025
cf5410f
correct typos
OriPekelman Jan 20, 2025
2bf7a8d
Resolve bug from inconsistent visualisation handling
OriPekelman Jan 21, 2025
4cfeda4
Resolve bug from inconsistent visualisation handling
OriPekelman Jan 21, 2025
1901fd1
Dont load repo other than in ingestion. Correct ellipsis logic
OriPekelman Jan 21, 2025
5787735
add the css hack for sahidica too. Correct docs
OriPekelman Jan 21, 2025
5ee5d4b
This is very bad code .. but even worse when there are no results.
OriPekelman Jan 21, 2025
26a8222
we are no longer removing the urls at the origin
OriPekelman Jan 21, 2025
13fb3f6
Remove non-lazy generation. Resolve encoding bug
OriPekelman Jan 21, 2025
5313b34
Update test for html removal
OriPekelman Jan 21, 2025
5281d04
faceted search not ready yet. but minimal cleanup
OriPekelman Jan 21, 2025
4aa4017
Now that we manage splittables correctly lets also group them
OriPekelman Jan 21, 2025
083b216
Fix regression on visualisation order
OriPekelman Jan 22, 2025
2d3defe
Implement search in English translation
OriPekelman Jan 22, 2025
f4eee66
Implement faceted search
OriPekelman Jan 23, 2025
5cc17e3
Add urn handling to faceted search implementation
OriPekelman Jan 23, 2025
1388467
correct author urn and regression
OriPekelman Jan 23, 2025
002f4aa
We are setting an API key we can expose
OriPekelman Jan 23, 2025
a9f9a7a
Correct typo
OriPekelman Jan 23, 2025
4f1e50a
Correct typo
OriPekelman Jan 23, 2025
10a7099
Lower memory profile
OriPekelman Jan 23, 2025
e951470
back to higher memory
OriPekelman Jan 23, 2025
5d9aee6
cleanliness
OriPekelman Jan 23, 2025
715dd92
Try uv
OriPekelman Jan 24, 2025
ceb6289
get into env
OriPekelman Jan 24, 2025
abff6bd
get into env
OriPekelman Jan 24, 2025
013bfd9
get into env
OriPekelman Jan 24, 2025
0dddaa1
get into env
OriPekelman Jan 24, 2025
422c854
use uv for start command
OriPekelman Jan 24, 2025
6ebe44a
Limit available memory to meiliindexing
OriPekelman Jan 24, 2025
ad33bd6
limit meili memory
OriPekelman Jan 24, 2025
d0e307a
Minimize slug and try generic image for meilisearch
OriPekelman Jan 25, 2025
33607aa
Remove deprecated CSS files for various visualization modes and centr…
OriPekelman Feb 4, 2025
898dd2d
Correct vizbar inclusion bug
OriPekelman Feb 4, 2025
9142358
Remove dependency on jQuery
OriPekelman Feb 4, 2025
e0fca7e
Add virtual keyboard and search preferences functionality
OriPekelman Feb 4, 2025
48d9dc7
Enhance search functionality and update UI components
OriPekelman Feb 4, 2025
83d5447
Remove django-grappelli dependency from project
OriPekelman Feb 4, 2025
7f37ffa
Using the built-in meilisearch ellipsis .. no need for own test
OriPekelman Feb 4, 2025
9a16bc7
Lock meilisearch version
OriPekelman Feb 4, 2025
66695ad
Revert Lock meilisearch version
OriPekelman Feb 4, 2025
11c92bd
just for a rebuild
OriPekelman Feb 4, 2025
c4dbed9
Lets download the specific version
OriPekelman Feb 4, 2025
8cb9b1c
Lets download the specific version
OriPekelman Feb 4, 2025
cd77685
Lets download the specific version
OriPekelman Feb 4, 2025
e86c1c5
allow serving from https
OriPekelman Feb 4, 2025
7372edc
correct link
OriPekelman Feb 4, 2025
8a2581a
Add uv instructions
OriPekelman Feb 4, 2025
736ad87
Handle empty search better
OriPekelman Feb 4, 2025
743553d
rewrite wikipedia.js in modern style so it can be debugged. Add https…
OriPekelman Feb 4, 2025
ba05f4f
Correct filtering and make it prettier
OriPekelman Feb 4, 2025
189b8be
Enhance faceted search with improved URL handling and UI
OriPekelman Feb 5, 2025
ba39c17
Implement pagination for faceted search results
OriPekelman Feb 5, 2025
d35cc40
Improve pagination layout with consistent button sizing
OriPekelman Feb 5, 2025
310f459
Enhance search results explanation with interactive details and impro…
OriPekelman Feb 5, 2025
a733b6d
Make this robust to old style urls with places=instead of text_mata.p…
OriPekelman Feb 5, 2025
fd4f1c6
Refactor text metadata handling and improve template filter
OriPekelman Feb 5, 2025
a5bf8cc
Improve search result display with more precise text matching and hig…
OriPekelman Feb 11, 2025
6a3a01a
Simplify preferences UI and refactor keyboard/search behavior.
OriPekelman Feb 12, 2025
4ed39ad
Streamline preferences layout and search options
OriPekelman Feb 12, 2025
e8c1b9b
Resolve HTML structure regression issue
OriPekelman Feb 12, 2025
38f3c33
Add HTML sanitization and fix custom template filter and resolve css …
OriPekelman Feb 12, 2025
81603a8
Implement native text ordering and simplify corpus view query
OriPekelman Feb 12, 2025
c18f4b3
Update HTML markup: Remove self-closing tags in templates
OriPekelman Feb 12, 2025
25f5de7
adapt to uv
OriPekelman Feb 12, 2025
3e51b53
adapt to uv
OriPekelman Feb 12, 2025
0f12139
adapt to uv
OriPekelman Feb 12, 2025
6a8f7eb
Correct ExactSearch regression
OriPekelman Feb 19, 2025
c6df89f
Better highlighting
OriPekelman Feb 25, 2025
b8409a4
Quote the whole shebang, not word by word
OriPekelman Feb 26, 2025
f180d1a
Correct pacific edu url
OriPekelman Mar 5, 2025
960a8d0
Update README and requirements: Adjust installation instructions for …
OriPekelman Apr 10, 2025
86167b6
Update uv.lock: Increment revision
OriPekelman Apr 10, 2025
35 changes: 35 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,35 @@
name: Django CI

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements_django_5.txt

      - name: Run addcorpora.sh
        run: |
          chmod +x ./addcorpora.sh
          ./addcorpora.sh

      - name: Run tests
        run: python manage.py test -t .
205 changes: 205 additions & 0 deletions .upsun/config.yaml
@@ -0,0 +1,205 @@
# Complete list of all available properties: https://docs.upsun.com/create-apps/app-reference.html
applications:
  search:
    source:
      root: "/meili_empty_root"
    container_profile: HIGH_MEMORY
    stack: [ "meilisearch" ]
    hooks:
      build: |
        wget https://github.com/meilisearch/meilisearch/releases/download/v1.12.8/meilisearch-linux-amd64
        mv meilisearch-linux-amd64 meilisearch
        chmod +x meilisearch
    web:
      # Commands are run once after deployment to start the application process.
      # More information: https://docs.upsun.com/create-apps/app-reference.html#web-commands
      commands:
        # The command to launch your app. If it terminates, it’s restarted immediately.
        # You can use the $PORT or the $SOCKET environment variable depending on the socket family of your upstream
        start: "./meilisearch --master-key=$PLATFORM_PROJECT_ENTROPY --http-addr=localhost:$PORT"
      # You can listen to a UNIX socket (unix) or a TCP port (tcp, default).
      # Whether your app should speak to the webserver via TCP or Unix socket. Defaults to tcp
      # More information: https://docs.upsun.com/create-apps/app-reference.html#where-to-listen
      upstream:
        # Whether your app should speak to the webserver via TCP or Unix socket. Defaults to tcp
        # More information: https://docs.upsun.com/create-apps/app-reference.html#where-to-listen
        socket_family: tcp
      # Each key in locations is a path on your site with a leading /.
      # More information: https://docs.upsun.com/create-apps/app-reference.html#locations
      locations:
        "/":
          passthru: true
    mounts:
      "/data.ms":
        source: "storage"
        source_path: "data.ms"
      "/dumps":
        source: "storage"
        source_path: "dumps"

  cts:
    # Application source code directory
    source:
      root: "/coptic"
    container_profile: HIGHER_MEMORY
    # The runtime the application uses.
    # Complete list of available runtimes: https://docs.upsun.com/create-apps/app-reference.html#types
    type: "python:3.12"

    # Choose which container profile (ratio CPU+RAM) your app will use. Default value comes from the image itself.
    # More information: https://docs.upsun.com/manage-resources/adjust-resources.html#adjust-a-container-profile
    # container_profile:

    # The relationships of the application with services or other applications.
    # The left-hand side is the name of the relationship as it will be exposed
    # to the application in the PLATFORM_RELATIONSHIPS variable. The right-hand
    # side is in the form `<service name>:<endpoint name>`.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#relationships

    relationships:
      search: "search:http"

    # Mounts define directories that are writable after the build is complete.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#mounts
    mounts:
      "/db": # Represents the path in the app.
        source: "storage" # "storage" sources are unique to the app, but shared among instances of the app. "service" sources can be shared among apps.
        source_path: "db" # The subdirectory within the mounted disk (the source) where the mount should point.
      "/corpora": # Represents the path in the app.
        source: "storage" # "storage" sources are unique to the app, but shared among instances of the app. "service" sources can be shared among apps.
        source_path: "corpora" # The subdirectory within the mounted disk (the source) where the mount should point.
    # The web key configures the web server running in front of your app.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#web
    web:
      # Commands are run once after deployment to start the application process.
      # More information: https://docs.upsun.com/create-apps/app-reference.html#web-commands
      commands:
        # The command to launch your app. If it terminates, it’s restarted immediately.
        # You can use the $PORT or the $SOCKET environment variable depending on the socket family of your upstream
        start: ". $HOME/.local/bin/env;uv run gunicorn --workers 3 --bind unix:$SOCKET coptic.wsgi:application"
      # You can listen to a UNIX socket (unix) or a TCP port (tcp, default).
      # Whether your app should speak to the webserver via TCP or Unix socket. Defaults to tcp
      # More information: https://docs.upsun.com/create-apps/app-reference.html#where-to-listen
      upstream:
        # Whether your app should speak to the webserver via TCP or Unix socket. Defaults to tcp
        # More information: https://docs.upsun.com/create-apps/app-reference.html#where-to-listen
        socket_family: unix
      # Each key in locations is a path on your site with a leading /.
      # More information: https://docs.upsun.com/create-apps/app-reference.html#locations
      locations:
        "/":
          passthru: true
        "/static":
          "allow": true
          "expires": "1h"
          "root": "static"


    # Alternate copies of the application to run as background processes.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#workers
    # workers:

    # The timezone for crons to run. Format: a TZ database name. Defaults to UTC, which is the timezone used for all logs
    # no matter the value here. More information: https://docs.upsun.com/create-apps/timezone.html
    # timezone: <time-zone>

    # Access control for roles accessing app environments.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#access
    # access:

    # Variables to control the environment. More information: https://docs.upsun.com/create-apps/app-reference.html#variables
    # variables:
    #   env:
    #     # Add environment variables here that are static.
    #     PYTHONUNBUFFERED: "1"

    # Outbound firewall rules for the application. More information: https://docs.upsun.com/create-apps/app-reference.html#firewall
    # firewall:

    # Specifies a default set of build tasks to run. Flavors are language-specific.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#build
    build:
      flavor: none

    # Installs global dependencies as part of the build process. They’re independent of your app’s dependencies and
    # are available in the PATH during the build process and in the runtime environment. They’re installed before
    # the build hook runs using a package manager for the language.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#dependencies
    # dependencies:
    #   python3: # Specify one Python 3 package per line.
    #     numpy: '*'

    # Hooks allow you to customize your code/environment as the project moves through the build and deploy stages
    # More information: https://docs.upsun.com/create-apps/app-reference.html#hooks
    hooks:
      # The build hook is run after any build flavor.
      # More information: https://docs.upsun.com/create-apps/hooks/hooks-comparison.html#build-hook
      build: |
        set -eux
        curl -LsSf https://astral.sh/uv/install.sh | sh
        . $HOME/.local/bin/env
        uv sync
        #./manage.py collectstatic --noinput

      # The deploy hook is run after the app container has been started, but before it has started accepting requests.
      # More information: https://docs.upsun.com/create-apps/hooks/hooks-comparison.html#deploy-hook
      deploy: |
        set -eux
        # echo 'Put your deploy command here'
        uvm migrate

      # The post_deploy hook is run after the app container has been started and after it has started accepting requests.
      # More information: https://docs.upsun.com/create-apps/hooks/hooks-comparison.html#deploy-hook
      post_deploy: |
        uvm clearcache
    operations:
      import:
        role: admin
        commands:
          start: uv run ./addcorpora.sh corpora
      index:
        role: admin
        commands:
          start: uvm index_corpora
      clearcache:
        role: admin
        commands:
          start: uvm clearcache
    # Scheduled tasks for the app.
    # More information: https://docs.upsun.com/create-apps/app-reference.html#crons
    # crons:

    # Customizations to your PHP or Lisp runtime. More information: https://docs.upsun.com/create-apps/app-reference.html#runtime
    # runtime:

    # More information: https://docs.upsun.com/create-apps/app-reference.html#additional-hosts
    # additional_hosts:

# The services of the project.
#
# Each service listed will be deployed
# to power your Upsun project.
# More information: https://docs.upsun.com/add-services.html
# Full list of available services: https://docs.upsun.com/add-services.html#available-services
# services:
#   db:
#     type: postgresql:14


# The routes of the project.
#
# Each route describes how an incoming URL is going
# to be processed by Upsun.
# More information: https://docs.upsun.com/define-routes.html
routes:
  "https://{default}/":
    type: upstream
    upstream: "cts:http"
  "https://search.{default}/":
    type: upstream
    upstream: "search:http"
  # A basic redirect definition
  # More information: https://docs.upsun.com/define-routes.html#basic-redirect-definition
  "https://www.{default}":
    type: redirect
    to: "https://{default}/"
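
Note on the `relationships` block above: the config comments say the `search` relationship is exposed to the `cts` app through the `PLATFORM_RELATIONSHIPS` variable. A minimal sketch of how the Django side might read it, assuming the variable holds base64-encoded JSON that maps each relationship name to a list of endpoints with `scheme`, `host` and `port` fields. The helper name `relationship_url` is hypothetical and is not taken from this PR.

```python
import base64
import json
import os


def relationship_url(name, environ=None):
    """Return 'scheme://host:port' for a relationship's first endpoint, or None.

    Hypothetical helper: the field names (scheme, host, port) are assumptions
    about the decoded PLATFORM_RELATIONSHIPS payload, not code from this PR.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("PLATFORM_RELATIONSHIPS")
    if not raw:
        # Variable absent, e.g. when running locally outside the platform.
        return None
    relationships = json.loads(base64.b64decode(raw).decode("utf-8"))
    endpoints = relationships.get(name) or []
    if not endpoints:
        return None
    endpoint = endpoints[0]
    scheme = endpoint.get("scheme", "http")
    return f"{scheme}://{endpoint['host']}:{endpoint['port']}"
```

Returning `None` instead of raising would let the settings module fall back to a local Meilisearch URL, in line with the commits that make the app resilient to search being unavailable.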
1 change: 0 additions & 1 deletion ansible/.gitignore

This file was deleted.

58 changes: 0 additions & 58 deletions ansible/README.md

This file was deleted.

11 changes: 0 additions & 11 deletions ansible/host_vars/cloud-test-scriptorium

This file was deleted.

11 changes: 0 additions & 11 deletions ansible/host_vars/test-scriptorium

This file was deleted.

4 changes: 0 additions & 4 deletions ansible/roles/scriptorium/defaults/main.yml

This file was deleted.

26 changes: 0 additions & 26 deletions ansible/roles/scriptorium/files/scriptorium.conf

This file was deleted.
