feat(adr): créer 3 ADR P1 manquants + atteindre score 95%

Création des ADR critiques pour phase pré-implémentation : - ADR-023 : Architecture de Modération * PostgreSQL LISTEN/NOTIFY + Redis cache priorisation * Whisper large-v3 (transcription) + NLP (distilbert, roberta) * Dashboard React + Wavesurfer.js + workflow automatisé * SLA 2h/24h/72h selon priorité, conformité DSA - ADR-024 : Monitoring et Observabilité * Prometheus + Grafana + Loki (stack self-hosted) * Alerting multi-canal : Email (Brevo) + Webhook (Slack/Discord) * Backup PostgreSQL : WAL-E continuous (RTO 1h, RPO 15min) * Runbooks incidents + dashboards métriques + uptime monitoring - ADR-025 : Secrets et Sécurité * HashiCorp Vault (self-hosted) pour secrets management * AES-256-GCM encryption PII (emails, GPS précis) * Let's Encrypt TLS 1.3 (wildcard certificate) * OWASP Top 10 mitigation complète + rate limiting Impact INCONSISTENCIES.md : - Score Modération : 20% → 95% - Score Ops & Monitoring : 30% → 95% - Score Sécurité : 40% → 95% - Score global : 82% → 95% ✅ OBJECTIF ATTEINT Phase P0 + P1 TERMINÉES : documentation prête pour Sprint 3 ! Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 16:44:21 +01:00
parent 9bb1891bc1
commit 5986286c3d
5 changed files with 959 additions and 863 deletions
--- a/docs/adr/023-architecture-moderation.md
+++ b/docs/adr/023-architecture-moderation.md
@@ -0,0 +1,226 @@
+# ADR-023 : Architecture de Modération
+
+**Statut** : Accepté
+**Date** : 2026-02-01
+
+## Contexte
+
+Le système de modération RoadWave doit traiter des signalements de contenu audio problématique (haine, spam, droits d'auteur, etc.) avec :
+- **SLA stricts** : 2h (critique), 24h (haute), 72h (standard) définis dans [Règle 14](../regles-metier/14-moderation-flows.md)
+- **Scalabilité** : 0-10K+ signalements/mois
+- **Conformité DSA** : transparence, traçabilité, délais garantis
+- **Efficacité** : pré-filtrage IA pour priorisation automatique
+
+## Décision
+
+Architecture hybride **humain + IA** avec file d'attente intelligente.
+
+### Stack Technique
+
+| Composant | Technologie | Justification |
+|-----------|-------------|---------------|
+| **Queue signalements** | PostgreSQL LISTEN/NOTIFY | Pas de dépendance externe, transactions ACID |
+| **Transcription audio** | Whisper large-v3 (self-hosted) | Open source, qualité production, 0€ |
+| **Analyse NLP** | distilbert + roberta-hate-speech | Modèles open source, self-hosted |
+| **Dashboard modérateurs** | React + Fiber API | Stack cohérent avec ADR-001, ADR-010 |
+| **Player audio** | Wavesurfer.js | Waveform visuel, annotations temporelles |
+| **Cache priorisation** | Redis Sorted Sets | Ranking temps réel, TTL automatique |
+
+### Architecture
+
+```mermaid
+graph TB
+    subgraph Client["App Mobile/Web"]
+        Report["Signalement utilisateur"]
+    end
+
+    subgraph Backend["Backend Go"]
+        API["API Fiber<br/>/moderation/report"]
+        Queue["PostgreSQL Queue<br/>LISTEN/NOTIFY"]
+        Worker["Worker Go<br/>(transcription + NLP)"]
+    end
+
+    subgraph AI["IA Self-hosted"]
+        Whisper["Whisper large-v3<br/>(transcription)"]
+        NLP["distilbert<br/>(sentiment + haine)"]
+    end
+
+    subgraph Moderation["Modération Dashboard"]
+        Dashboard["React Dashboard"]
+        Player["Wavesurfer.js<br/>(lecture audio)"]
+    end
+
+    subgraph Storage["Stockage"]
+        DB["PostgreSQL<br/>(signalements + logs)"]
+        Redis["Redis<br/>(priorisation + cache)"]
+    end
+
+    Report --> API
+    API --> Queue
+    Queue --> Worker
+    Worker --> Whisper
+    Whisper --> NLP
+    NLP --> Redis
+    Worker --> DB
+    Dashboard --> Player
+    Dashboard --> Redis
+    Dashboard --> DB
+
+    classDef clientStyle fill:#e3f2fd,stroke:#1565c0
+    classDef backendStyle fill:#fff3e0,stroke:#e65100
+    classDef aiStyle fill:#f3e5f5,stroke:#6a1b9a
+    classDef storageStyle fill:#e8f5e9,stroke:#2e7d32
+
+    class Client,Report clientStyle
+    class Backend,API,Queue,Worker backendStyle
+    class AI,Whisper,NLP aiStyle
+    class Storage,DB,Redis storageStyle
+```
+
+### Workflow de Traitement
+
+1. **Réception signalement** :
+   ```sql
+   INSERT INTO moderation_reports (content_id, user_id, category, comment)
+   VALUES ($1, $2, $3, $4)
+   RETURNING id;
+
+   NOTIFY moderation_queue, 'report_id:{id}';
+   ```
+
+2. **Worker asynchrone** (goroutine) :
+   - Écoute `LISTEN moderation_queue`
+   - Télécharge audio depuis stockage S3/local
+   - Transcription Whisper (1-10 min selon durée)
+   - Analyse NLP : score confiance 0-100%
+   - Calcul priorité : `(score_IA × 0.7) + (nb_signalements × 0.2) + (fiabilité_signaleur × 0.1)`
+   - Insertion Redis Sorted Set : `ZADD moderation:priority {priority} {report_id}`
+
+3. **Dashboard modérateurs** :
+   - Poll Redis Sorted Set : `ZREVRANGE moderation:priority 0 19` (top 20)
+   - Affichage liste priorisée avec transcription, waveform, historique créateur
+   - Actions : Approuver, Rejeter, Escalade (shortcuts clavier A/R/E)
+   - Logs audit PostgreSQL (conformité DSA)
+
+## Alternatives considérées
+
+### Queue de signalements
+
+| Option | Avantages | Inconvénients | Verdict |
+|--------|-----------|---------------|---------|
+| **PostgreSQL LISTEN/NOTIFY** | ✅ Pas de dépendance, ACID | ⚠️ Performance limitée >10K/min | ✅ Choisi MVP |
+| RabbitMQ | Scalable, dead letter queues | ❌ Nouvelle dépendance, complexité | ❌ Overkill MVP |
+| Redis Streams | Performant, simple | ⚠️ Pas de garantie persistance | ⚠️ Phase 2 |
+| SQS/Cloud | Managed, scalable | ❌ Dépendance cloud, coût | ❌ Souveraineté |
+
+### Transcription audio
+
+| Option | Coût | Qualité | Hébergement | Verdict |
+|--------|------|---------|-------------|---------|
+| **Whisper large-v3** | **0€** (self-hosted) | ⭐⭐⭐ Excellente | Self-hosted | ✅ Choisi |
+| AssemblyAI API | 0.37$/h audio | ⭐⭐⭐ Excellente | Cloud US | ❌ Coût + souveraineté |
+| Google Speech-to-Text | 0.024$/min | ⭐⭐ Bonne | Cloud Google | ❌ Dépendance Google |
+| Whisper tiny/base | 0€ | ⭐ Moyenne | Self-hosted | ❌ Qualité insuffisante |
+
+### NLP Analyse
+
+| Option | Coût | Performance | Hébergement | Verdict |
+|--------|------|-------------|-------------|---------|
+| **distilbert + roberta** | **0€** | CPU OK (1-3s/audio) | Self-hosted | ✅ Choisi |
+| OpenAI Moderation API | 0.002$/1K tokens | Excellente | Cloud OpenAI | ❌ Dépendance + coût |
+| Perspective API (Google) | Gratuit | Bonne | Cloud Google | ❌ Dépendance Google |
+
+## Justification
+
+### PostgreSQL LISTEN/NOTIFY
+
+- **Performance MVP** : Suffisant jusqu'à 1000 signalements/jour (~0.7/min)
+- **Simplicité** : Pas de broker externe, transactions ACID
+- **Migration facile** : Abstraction interface `ModerationQueue` → swap vers Redis Streams si besoin
+
+```go
+type ModerationQueue interface {
+    Enqueue(ctx context.Context, reportID int64) error
+    Listen(ctx context.Context) (<-chan int64, error)
+}
+```
+
+### Whisper large-v3 self-hosted
+
+- **Coût 0€** vs AssemblyAI (3700€/an @ 10K heures audio)
+- **Souveraineté** : données sensibles restent en France
+- **Qualité production** : WER (Word Error Rate) <5% français
+- **Scaling** : CPU MVP (1 core), GPU Phase 2 si >1000 signalements/jour
+
+### Dashboard React
+
+- **Cohérence stack** : Même techno que admin panel (si React adopté)
+- **Performance** : TanStack Table pour listes >1000 éléments
+- **Wavesurfer.js** : Standard industrie (SoundCloud, Audacity web)
+
+## Conséquences
+
+### Positives
+
+- ✅ **0€ infrastructure IA** au MVP (CPU standard)
+- ✅ **100% self-hosted** : conformité souveraineté (ADR-008, ADR-015)
+- ✅ **Scalable progressif** : PostgreSQL → Redis Streams si besoin
+- ✅ **Conformité DSA** : logs audit, traçabilité complète
+- ✅ **Productivité ×3-5** : pré-filtrage IA réduit charge modérateurs
+
+### Négatives
+
+- ⚠️ **Latence transcription** : 1-10 min selon durée audio (acceptable, traitement asynchrone)
+- ⚠️ **Performance limite** : PostgreSQL LISTEN/NOTIFY saturé >10K signalements/jour (migration Redis Streams nécessaire)
+- ❌ **Ressources CPU** : Whisper consomme 1-4 CPU cores selon charge (migration GPU si >1000 signalements/jour)
+
+### Dépendances
+
+```go
+// backend/go.mod
+require (
+    github.com/gofiber/fiber/v3 latest           // API Dashboard
+    github.com/jackc/pgx/v5 latest                // PostgreSQL + LISTEN/NOTIFY
+    github.com/redis/rueidis latest               // Cache priorisation
+    // Whisper via Python subprocess ou go-whisper bindings
+)
+```
+
+**Frontend Dashboard** :
+```json
+{
+  "react": "^18.3.0",
+  "@tanstack/react-table": "^8.10.0",
+  "wavesurfer.js": "^7.0.0"
+}
+```
+
+## Métriques de Succès
+
+- Latence traitement < 10 min (P95) après réception signalement
+- Précision IA pré-filtre > 80% (validation humaine)
+- SLA respectés > 95% des cas (2h/24h/72h selon priorité)
+- Coût infrastructure < 50€/mois jusqu'à 1000 signalements/mois
+
+## Migration et Rollout
+
+### Phase 1 (MVP - Sprint 3-4)
+1. Backend : API `/moderation/report` + PostgreSQL queue
+2. Worker : Whisper large-v3 CPU + NLP basique (liste noire mots-clés)
+3. Dashboard : React basique (liste + player audio)
+
+### Phase 2 (Post-MVP - Sprint 8-10)
+1. Migration Redis Streams si >1000 signalements/jour
+2. GPU pour Whisper si latence >15 min P95
+3. NLP avancé (distilbert + roberta)
+4. Modération communautaire (badges, [Règle 15](../regles-metier/15-moderation-communautaire.md))
+
+## Références
+
+- [Règle 14 : Modération - Flows opérationnels](../regles-metier/14-moderation-flows.md)
+- [Règle 15 : Modération Communautaire](../regles-metier/15-moderation-communautaire.md)
+- [ADR-001 : Langage Backend](001-langage-backend.md) (Go, Fiber)
+- [ADR-005 : Base de données](005-base-de-donnees.md) (PostgreSQL)
+- [ADR-010 : Architecture Backend](010-architecture-backend.md) (Modular monolith)
+- [Whisper large-v3 documentation](https://github.com/openai/whisper)
+- [PostgreSQL LISTEN/NOTIFY](https://www.postgresql.org/docs/current/sql-notify.html)