Bilingual Youth Crisis Detection Guardrail

This talk presents a guardrail for youth mental health conversations that detects gradually building crises with a two-stage system. Learn how a novel LLM prompting technique improves crisis detection accuracy.

mmBERT, Cohere c4ai-command-a-03-2025, PyTorch, Hugging Face Transformers, gpt-oss-120b

Overview

This is a stateful, multi-turn input guardrail for sensitive youth mental health conversations. It screens the entire conversation arc, not just the latest input, to catch crises that build up gradually (“slow drift”): the failure mode in which each turn looks benign but the cumulative trajectory is high-risk.
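
To make the slow-drift failure mode concrete, here is a minimal sketch. It assumes hypothetical per-turn risk scores in [0, 1] and made-up thresholds; the `ConversationState` class, the window size, and the averaging rule are illustrative inventions, not the talk's classifier. It only shows why a per-turn check can stay silent while a check over the recent arc fires.

```python
from dataclasses import dataclass, field

TURN_THRESHOLD = 0.5   # hypothetical: flag a single turn at or above this score
ARC_THRESHOLD = 0.35   # hypothetical: flag the arc when its recent average reaches this
WINDOW = 4             # hypothetical: number of most recent turns that are averaged


@dataclass
class ConversationState:
    """Keeps the risk scores of every turn seen so far (the 'stateful' part)."""
    scores: list = field(default_factory=list)

    def add_turn(self, risk_score: float) -> dict:
        self.scores.append(risk_score)
        recent = self.scores[-WINDOW:]
        arc_score = sum(recent) / len(recent)
        return {
            # Per-turn view: looks only at the newest message.
            "turn_flag": risk_score >= TURN_THRESHOLD,
            # Arc view: needs a full window, then looks at the trajectory.
            "arc_flag": len(recent) == WINDOW and arc_score >= ARC_THRESHOLD,
            "arc_score": arc_score,
        }


# Hypothetical scores that creep upward without any single turn crossing 0.5.
drifting_scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.48]
state = ConversationState()
results = [state.add_turn(score) for score in drifting_scores]

for turn, result in enumerate(results, start=1):
    print(turn, result["turn_flag"], result["arc_flag"], round(result["arc_score"], 3))
```

In this toy run no individual turn is flagged, but the arc check fires on the final turn once the recent average climbs past the arc threshold.
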
Live, I’ll walk through the actual system: the two-stage stack (fine-tuned mmBERT classifier → Cohere c4ai chain-of-thought judge), the 5-question reasoning prompt that made the difference, and the evaluation harness output on a hidden validation set (F1 0.899, recall 0.954, at roughly 1.16 s per sample). I’ll show the architecture diagram, the prompt engineering, and the red-team CSVs used to train the classifier.
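
One way to read the two-stage stack is as a cascade, sketched below under stated assumptions. The source does not say how the classifier and the judge are combined, so the routing rule (confident classifier scores are final, uncertain ones escalate to the judge), the `low`/`high` thresholds, and the function names are all hypothetical. The two stub functions merely stand in for the fine-tuned mmBERT classifier and the Cohere judge so the sketch runs without any model.

```python
from typing import Callable, List

Conversation = List[str]


def guardrail(
    turns: Conversation,
    classifier: Callable[[str], float],
    judge: Callable[[str], bool],
    low: float = 0.2,    # hypothetical: below this, stage one says "safe"
    high: float = 0.9,   # hypothetical: at or above this, stage one says "crisis"
) -> dict:
    """Screen the whole conversation arc, escalating uncertain cases to the judge."""
    arc = "\n".join(turns)  # both stages see every turn, not just the last one
    score = classifier(arc)
    if score < low:
        return {"crisis": False, "stage": "classifier", "score": score}
    if score >= high:
        return {"crisis": True, "stage": "classifier", "score": score}
    # Uncertain band: pay the latency cost of the slower second stage.
    return {"crisis": judge(arc), "stage": "judge", "score": score}


def stub_classifier(text: str) -> float:
    """Placeholder for the fine-tuned encoder: counts a few toy cue words."""
    cues = ("hopeless", "burden", "goodbye")
    hits = sum(cue in text.lower() for cue in cues)
    return min(1.0, 0.1 + 0.3 * hits)


def stub_judge(text: str) -> bool:
    """Placeholder for the chain-of-thought judge: a trivial keyword rule."""
    return "goodbye" in text.lower()


benign = guardrail(["School was fine today.", "I have a test tomorrow."],
                   stub_classifier, stub_judge)
uncertain = guardrail(["I feel hopeless lately.", "Anyway, goodbye everyone."],
                      stub_classifier, stub_judge)
print(benign["stage"], benign["crisis"])
print(uncertain["stage"], uncertain["crisis"])
```

A design like this keeps average latency low because only the uncertain band reaches the larger model; whether the actual system routes this way is not stated in the source.
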

Links

https://github.com/yanischalel/multilingual-guardrail-mila-hackathon
Fine-tuned mmBERT-base and Cohere-adjudicated pipelines secure multilingual, multi-turn conversations.

Tech stack

mmBERT

mmBERT is an open-source, massively multilingual encoder-only language model trained on 3 trillion tokens across 1,833 languages.

Developed at Johns Hopkins University, mmBERT is positioned as a modern successor to the aging XLM-RoBERTa, bringing recent transformer optimizations to encoder-only models. Built on the ModernBERT architecture, it reportedly delivers 2 to 4 times faster inference than such older multilingual encoders and natively supports an 8,192-token context window. The core innovation is its annealed language learning training strategy: a three-phase schedule that prevents overfitting on high-resource languages and ensures robust representation for low-resource languages. This approach makes mmBERT an efficient, production-ready option for multilingual classification, retrieval, and semantic search.
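
The intuition behind a phased schedule that protects low-resource languages can be sketched with temperature-scaled sampling, where a language with n tokens is sampled with probability proportional to n raised to a temperature tau. This is an assumption about how such schedules are commonly built, not a description taken from the text above: the token counts, the three tau values, and the `sampling_probs` helper are all hypothetical.

```python
def sampling_probs(token_counts: dict, tau: float) -> dict:
    """Return per-language sampling probabilities proportional to count ** tau.

    tau = 1.0 samples in proportion to corpus size; smaller tau flattens the
    distribution, giving low-resource languages a larger share.
    """
    weights = {lang: count ** tau for lang, count in token_counts.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical token counts for a high-, mid-, and low-resource language.
corpus = {"en": 1_000_000, "fr": 100_000, "fo": 1_000}

# Illustrative three-phase schedule: the temperature is lowered phase by phase.
phases = [("phase 1", 0.7), ("phase 2", 0.5), ("phase 3", 0.3)]

schedule = {name: sampling_probs(corpus, tau) for name, tau in phases}
for name, probs in schedule.items():
    print(name, {lang: round(p, 3) for lang, p in probs.items()})
```

With these toy numbers the low-resource language's share grows from phase to phase while the high-resource share shrinks, which is the effect the schedule is meant to achieve.
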

https://github.com/JHU-CLSP/mmBERT

Cohere c4ai-command-a-03-2025

Cohere's 111-billion-parameter open-weights model built for high-throughput enterprise tasks, advanced tool use, and multilingual operations across 23 languages.

Developed by Cohere and Cohere For AI, Command A is a 111-billion-parameter open-weights model designed to deliver maximum performance with minimal hardware overhead (deployable on just two GPUs). It features a massive 256,000-token context length and delivers a 150% throughput boost over its predecessor, Command R+ 08-2024. Optimized for business-critical workflows, the model excels at agentic tasks, multi-step tool use, and retrieval-augmented generation (RAG) with built-in document citation, making it a highly efficient, secure choice for demanding enterprise environments.

https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

PyTorch

PyTorch is an open-source machine learning framework that provides a Python-first tensor library with strong GPU acceleration and a dynamic computation graph for building deep neural networks.

PyTorch, originally developed by Meta AI, is an open-source deep learning framework widely used in both research and production. Its core is a tensor library (similar to NumPy) with GPU acceleration, which can deliver speedups of 50x or more over CPU execution on well-suited workloads. The key differentiator is its 'Pythonic' design and dynamic computation graph (eager execution), which allows rapid prototyping and simpler debugging than static-graph frameworks. Using its Autograd system for automatic differentiation, practitioners build and train models for computer vision and NLP; major companies such as Tesla (Autopilot) and Microsoft use PyTorch for critical AI applications.
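
The idea of a dynamic computation graph with automatic differentiation can be illustrated without PyTorch itself. The toy `Value` class below is a hypothetical, scalar-only stand-in written for this sketch; it is not PyTorch's API or implementation. It records each operation as ordinary Python code executes, then walks the recorded graph backwards to accumulate gradients.

```python
class Value:
    """Scalar that records the operations applied to it as they execute."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # set by the operation that produced this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph so each node's gradient is complete
        # before it is pushed on to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


x = Value(3.0)
w = Value(2.0)
y = x * w + x   # the graph is built while this line runs
y.backward()
print(y.data, x.grad, w.grad)
```

Because the graph is just a by-product of running the code, ordinary Python control flow (loops, conditionals, print statements) works in the middle of a model, which is the debugging convenience the paragraph above refers to.
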

https://pytorch.org

Hugging Face Transformers

The Hugging Face Transformers library is an open-source Python toolkit that provides a unified API for more than 1 million state-of-the-art pre-trained model checkpoints (such as BERT, GPT-2, and T5) across NLP, vision, and audio tasks.

Hugging Face Transformers is a widely used open-source Python library for democratizing state-of-the-art machine learning. It delivers a unified, framework-agnostic API (PyTorch, TensorFlow) for accessing and using more than 1 million pre-trained model checkpoints, including industry standards like BERT, GPT-2, and T5. Developers use the high-level `Pipeline` class for rapid, optimized inference (e.g., text generation, sentiment analysis) and the `Trainer` class for efficient fine-tuning and distributed training. This core library connects the ML community to the Hugging Face Hub, accelerating the deployment of models across text, vision, and audio modalities with minimal code.

https://huggingface.co/docs/transformers/index

gpt-oss-120b

OpenAI's open-weight Mixture-of-Experts model built for local, high-reasoning production workloads on a single 80GB GPU.

Released by OpenAI, gpt-oss-120b is a 117-billion parameter open-weight language model designed to deliver advanced reasoning and agentic capabilities directly on local infrastructure. By leveraging a Mixture-of-Experts (MoE) architecture, the model activates only 5.1 billion parameters per token, allowing developers to run high-performance workloads efficiently on a single 80GB GPU (such as an NVIDIA H100 or AMD MI300X). It features a 131,072-token context window and integrates natively with Hugging Face Transformers and Ollama, making it a highly accessible option for complex instruction following and tool-use tasks without the cloud overhead.
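
The way a Mixture-of-Experts layer touches only a fraction of its parameters per token can be sketched with a toy top-k router. Everything here apart from the 5.1 billion and 117 billion figures is hypothetical: the four scalar "experts", the router logits, `k=2`, and the helper names are chosen for illustration and do not describe gpt-oss-120b's actual layer layout.

```python
import math


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exp_scores = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exp_scores.values())
    return {i: score / total for i, score in exp_scores.items()}


def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in weights.items())


# Four toy experts, each just scaling its input by a different factor.
experts = [lambda x, scale=s: scale * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]   # hypothetical router scores for one token

chosen = route_top_k(logits, k=2)
output = moe_layer(5.0, experts, logits, k=2)

# Share of parameters active per token, using the figures quoted above.
active_fraction = 5.1 / 117
print(sorted(chosen), output, round(active_fraction, 4))
```

Only the two selected experts are evaluated for this token; the other two are skipped entirely, which is why the active parameter count stays far below the total.
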

https://huggingface.co/openai/gpt-oss-120b
