π― Overview
This research project introduces A Marketplace for Edge Federated ML, addressing critical challenges in building federated machine learning ecosystems where multiple stakeholdersβdata providers, model consumers, and service providersβcollaborate while preserving data privacy and ensuring transparent quality of training.
π Research Highlights
Our work tackles fundamental problems in federated learning marketplaces:
- Explainable Quality of Training (eQoT): A comprehensive framework to measure and explain training quality based on data quality and individual contributions from distributed data sources
- Privacy-Preserving Mechanisms: Data remains at the edge with selective trust models
- Real-time Cost & Quality Monitoring: Multi-dimensional cost models with transparent evaluation
- Data Integrity Detection: Novel approaches to detect malicious data modifications in marketplace settings
- Asynchronous Training: Efficient federated learning with bidirectional model aggregation
ποΈ Core Components
The research encompasses several interconnected systems:
- EADRAN Platform: Edge marketplAce for DistRibuted AI/ML traiNing
- ASYN2F Framework: ASYNchronous Federated learning Framework with bidirectional aggregation
- Data Modification Detection: Mechanisms to identify fraudulent data changes
- Cost & Quality Models: Comprehensive evaluation frameworks for federated ML
π Key Features
1. Explainable Quality of Training (eQoT)
A novel approach to provide transparency and explainability in federated ML training:
Quality of Data (QoD) Metrics:
- Class overlap and class parity
- Label purity and feature correlation
- Feature relevance and completeness
- Market context and compatibility
Contribution Analysis:
- Individual data provider contribution tracking
- Real-time performance impact measurement
- Dynamic cost evaluation based on contribution
- Transparent reward mechanisms
Multi-dimensional Cost Model:
Total_Cost = Cost_QoD + Cost_Context + Cost_Performance + Cost_Resources
where:
Cost_QoD = f(data_quantity, data_quality)
Cost_Context = f(market_reputation, compatibility)
Cost_Performance = f(accuracy_improvement, convergence_rate)
Cost_Resources = f(CPU, GPU, RAM, Storage, Network)
2. Asynchronous Federated Learning (ASYN2F)
Innovative framework addressing heterogeneity in distributed training:
Bidirectional Model Aggregation:
- Server-side: Asynchronous aggregation of local models without waiting for stragglers
- Worker-side: Mid-epoch aggregation of global model updates to reduce staleness
Key Advantages:
- Addresses obsolete information problem
- Handles heterogeneous training workers
- Adaptive learning rate synchronization
- Improved convergence speed and model performance
Convergence Analysis:
- Proven convergence rate: O(1/βT) for convex objectives
- Convergence rate: O(1/T) for strongly convex objectives
- Theoretical guarantees with practical effectiveness
3. Data Modification Detection
Advanced techniques to maintain marketplace integrity:
Detection Mechanisms:
- Weight Movement Analysis: Measuring Wasserstein distance across model layers
- Time Series Anomaly Detection: ARIMA-based behavioral pattern analysis
- Cross-client Correlation: Isolation Forest for outlier identification
- Enhanced Monitoring: Comprehensive feature engineering for detection
Capabilities:
- Detect synthetic data injection
- Identify dataset replacement
- Recognize data processing manipulation
- Handle privacy-preserving techniques (Differential Privacy, MPC)
4. Edge-Cloud Architecture
Scalable and practical implementation design:
Edge Sites:
- Local data storage and processing
- Containerized training environments
- Resource monitoring and reporting
- Privacy-preserving computation
Cloud Infrastructure:
- Orchestration and coordination
- Model aggregation services
- Real-time monitoring and analytics
- Cost computation and billing
Communication:
- Message queue-based coordination (RabbitMQ, Redis)
- Model storage (MinIO, AWS S3)
- Streaming analytics (Kafka, Spark)
- Visualization (InfluxDB, Grafana)
ποΈ System Architecture
High-Level Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Market Consumer (MC) β
β β
β ββββββββββββββββ βββββββββββββββ βββββββββββββββββ β
β β Pre-trained β βββΆ β Training β βββΆ β Trained β β
β β Model β β Request β β Model β β
β ββββββββββββββββ ββββββββ¬βββββββ βββββββββββββββββ β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ
β
ββββββββββΌβββββββββ
β Orchestrator β
β & Federated β
β Server β
ββββββββββ¬βββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββ
β β β
βββββββββΌβββββββββ βββββββββΌβββββββββ βββββββββΌβββββββββ
β Edge Node 1 β β Edge Node 2 β β Edge Node N β
β β β β β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
β β Data β β β β Data β β β β Data β β
β β Provider β β β β Provider β β β β Provider β β
β β (Private β β β β (Private β β β β (Private β β
β β Data) β β β β Data) β β β β Data) β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
β β β β β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
β β Local β β β β Local β β β β Local β β
β β Model β β β β Model β β β β Model β β
β β Training β β β β Training β β β β Training β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
β β β β β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
β β Monitoring β β β β Monitoring β β β β Monitoring β β
β β Probe β β β β Probe β β β β Probe β β
β ββββββββββββββ β β ββββββββββββββ β β ββββββββββββββ β
ββββββββββ¬ββββββββ ββββββββββ¬ββββββββ ββββββββββ¬ββββββββ
β β β
βββββββββββββββββββββββββΌββββββββββββββββββββββββ
β
ββββββββββΌβββββββββ
β QoT Analysis β
β & Cost β
β Computation β
βββββββββββββββββββ
β
ββββββββββΌβββββββββ
β Visualization β
β Dashboard β
βββββββββββββββββββ
Key Components
Marketplace Services
- Data Service: Metadata management, data quality evaluation
- Training Service: Orchestration, model aggregation
- QoT Analysis Service: Real-time cost and quality computation
- Monitoring Service: Metrics collection and visualization
Edge Infrastructure
- Orchestration Service: Receives and executes training tasks
- Data Processing: Local data extraction and preparation
- Model Training: Containerized execution environments
- Resource Monitoring: CPU, GPU, RAM, storage tracking
Communication Layer
- Message Queuing: RabbitMQ for orchestration, Redis for control
- Streaming: Kafka for monitoring data, Spark for analytics
- Storage: MinIO/S3 for model artifacts
π Publications
This research has resulted in four peer-reviewed publications covering different aspects of federated ML marketplaces:
π Published Papers
1. ASYN2F: Asynchronous Federated Learning Framework with Bidirectional Model Aggregation
Authors: Tien-Dung Cao, Nguyen T. Vuong, Thai Q. Le, Hoang V.N. Dao, Tram Truong-Huu
Published: IEEE Transactions on Emerging Topics in Computing (TETC), Vol. 13, No. 4, October-December 2025
DOI: 10.1109/TETC.2025.3609004
Pages: 1618-1632
Key Contributions:
- Bidirectional aggregation algorithms for server and workers
- Handling obsolete information in asynchronous settings
- Practical implementation with real-time monitoring
- Extensive experiments on CIFAR-10, CIFAR-100, and EMBER datasets
π GitHub Repository: https://github.com/soeai/asyn2f
2. EADRAN: An Edge Marketplace for Federated Learning
Authors: Tien-Dung Cao, Hong-Tri Nguyen, Minh-Tri Nguyen, Tram Truong-Huu, Hong-Linh Truong
Published: Future Generation Computer Systems, Vol. 175, 2026
DOI: 10.1016/j.future.2025.108046
Pages: Article 108046
Key Contributions:
- Conceptual architecture for federated ML marketplaces
- Explainable Quality of Training (eQoT) framework
- Multi-dimensional cost model with four components
- Integration with Flower FL framework for adaptability
- Comprehensive experiments demonstrating eQoT benefits
π GitHub Repository: https://github.com/soeai/eadran
3. Detecting Data Modification in Marketplace of Federated Learning
Authors: Tien-Dung Cao, Ngan T.T. Pham, Hoang-Duc Le, Binh T. Nguyen
Published: International Conference on Machine Learning and Cybernetics (ICMLC), Lecture Notes in Networks and Systems, Vol. 1475, 2025
DOI: 10.1007/978-3-031-94892-3_42
Pages: 568-581
Key Contributions:
- Enhanced monitoring features for malicious behavior detection
- Weight movement analysis using Wasserstein distance
- Time series anomaly detection (ARIMA) combined with correlation analysis
- 100% recall rate in detecting data modification events
- Evaluation on CNN (CIFAR-10) and LSTM (text classification) scenarios
4. Enabling Awareness of Quality of Training and Costs in Federated Machine Learning Marketplaces
Authors: Tien-Dung Cao, Hong-Linh Truong, Tram Truong-Huu, Minh-Tri Nguyen
Published: 15th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), 2022
DOI: 10.1109/UCC56403.2022.00015
Pages: 41-50
Key Contributions:
- Conceptualization of federated ML marketplace stakeholders
- Definition of quality of training (QoT) metrics
- Four-component cost model for comprehensive evaluation
- Real-time cost computation and monitoring framework
- Practical experiments demonstrating cost transparency
π Research Impact
Publication Venues:
- IEEE TETC (Q1)
- Future Generation Computer Systems (Q1)
- IEEE/ACM UCC (Rank A Conference)
- ICMLC (International Conference)
Research Coverage:
- Asynchronous federated learning algorithms
- Edge computing and marketplace design
- Data quality assessment and contribution tracking
- Security and data integrity in collaborative ML
- Cost models and real-time monitoring
Technology Stack:
- Federated Learning Frameworks: Flower, custom implementations
- Edge Computing: Docker, containerization
- Message Queuing: RabbitMQ, Redis, Kafka
- Storage: MinIO, AWS S3, MongoDB
- Monitoring: Spark Streaming, InfluxDB, Grafana
- ML Frameworks: TensorFlow, PyTorch, scikit-learn
π₯ Research Team
Principal Investigator
Tien-Dung Cao, PhD
π§ dung.cao@ttu.edu.vn
ποΈ School of Information Technology, Tan Tao University, Vietnam
π¬ Research Interests: Federated Learning, Edge Computing, Machine Learning Marketplaces, Data Quality, Distributed Systems
Role: Project lead, conceptualization, methodology, implementation, and writing
Collaborators
Prof. Hong-Linh Truong, PhD
ποΈ Department of Computer Science, Aalto University, Finland
π¬ Research Interests: Cloud Computing, Service Engineering, Data Engineering
Contribution: Conceptual architecture, marketplace design, cost models
A.Prof. Tram Truong-Huu, PhD
ποΈ Singapore Institute of Technology & Agency for Science, Technology and Research (A*STAR), Singapore
π¬ Research Interests: Cybersecurity, Federated Learning, Distributed Systems
Contribution: Algorithm design, convergence analysis, security aspects
A.Prof. Binh T. Nguyen, PhD
ποΈ Faculty of Mathematics, University of Science, VNU-HCMC, Vietnam
π¬ Research Interests: Machine Learning, Computer Vision, and Scientific Computing
Contribution: Statistical analysis, methodology
Graduate Students & Research Assistants
Nguyen T. Vuong
ποΈ Tan Tao University, Vietnam & Aalto University, Finland (Research Intern)
Contribution: ASYN2F implementation, experiments, analysis
Hong-Tri Nguyen, PhD
ποΈ Aalto University, Finland
Contribution: EADRAN platform development, integration
Minh-Tri Nguyen, PhD
ποΈ Aalto University, Finland
Contribution: System implementation, monitoring services
Thai Q. Le
ποΈ Tan Tao University, Vietnam
Contribution: Software development, testing
Hoang V.N. Dao
ποΈ Tan Tao University, Vietnam
Contribution: Implementation, experiments
Ngan T.T. Pham
ποΈ Tan Tao University, Vietnam
Contribution: Data modification detection research
Hoang-Duc Le
ποΈ Faculty of Mathematics, University of Science, VNU-HCMC, Vietnam
Contribution: Anomaly detection algorithms
π Acknowledgments
This research is supported by:
-
Tan Tao University Foundation for Science and Technology Development
Grant No. TTU.RS.22.102.001 -
CSC IT Center for Science, Finland
Cloud computing resources and infrastructure
We would like to express our gratitude to all students and staff at Tan Tao University who contributed to the implementation and testing of the platforms.
π License
This research project and associated code are released under the MIT License. See individual repositories for specific licensing details.
π Contact & Collaboration
We welcome collaboration opportunities, questions, and feedback:
- Principal Investigator: dung.cao@ttu.edu.vn
- Project Issues: GitHub Issues
- Research Inquiries: Via email to the principal investigator
Open Source Projects
- π EADRAN Repository
- π ASYN2F Repository
Related Frameworks & Tools
π Research in Federated Machine Learning Marketplaces π
Advancing Privacy-Preserving, Explainable, and Cost-Transparent Machine Learning
β Star EADRAN β’ β Star ASYN2F
Made with β€οΈ by the Tan Tao University Research Team and International Collaborators