Projects - JOLPA LIMITED

Curated 2.1B‑token dataset from UK legal sources for a London‑based legal AI startup. Includes synthetic summarisation labels.

[legal_v1.4.jol]

500k annotated product images with bounding boxes and attributes for computer vision shelf analysis.

[retail_synth_v2.jol]

Real‑time structured news feed with sentiment labels and entity extraction, powering a financial sentiment model.

[news_feed_live.jol]

Synthetic labels and segmentation masks for 200k chest X‑rays, enabling rare pathology detection.

[med_img_v1.jol]

Curated dataset of 10M code snippets with synthetic natural language descriptions for code‑LLM training.

[code_synth.jol]

Cleaned and deduplicated web crawl data across 12 languages with quality scoring and topic classification.

[web_crawl_v3.jol]

> Recent Pipelines