ABSTRACT
The rapid growth of Internet of Things (IoT) networks poses serious security challenges, as resource-constrained devices become potential targets for Distributed Denial-of-Service (DDoS) attacks. This thesis proposes a Deep Q-Learning (DQL) model that integrates K-means clustering (K-Means) and K-fold cross-validation (K-Fold), enabling adaptive clustering decisions and robust evaluation across different data splits. The model formulates detection as a Markov decision process in which the agent learns optimal actions from reward signals derived from accuracy and F1-score. Reinforcement learning (RL) techniques, including experience replay and epsilon-greedy exploration, are used to iteratively refine the detection policy. Unlike conventional machine learning (ML) models, the proposed approach dynamically adapts to evolving attack patterns and imbalanced datasets, enhancing resilience and generalization. The model effectively classifies malicious behaviors, including scan attacks, TCP Synchronize (TCP-SYN) floods, and Transmission Control Protocol (TCP), Hypertext Transfer Protocol (HTTP), Internet Control Message Protocol (ICMP), and User Datagram Protocol (UDP) floods.
The primary contribution is the integration of K-Means clustering into the DQL agent’s learning cycle, which allows dynamic, adaptive feature organization and significantly enhances the model’s generalizability.
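
The RL components named above (a reward derived from accuracy and F1-score, epsilon-greedy exploration, and experience replay) can be illustrated with a minimal sketch. It is not the thesis implementation: the function and class names, the equal weighting of accuracy and F1-score, and the binary attack/benign labelling are all assumptions made for illustration.

```python
import random
from collections import deque


def reward(y_true, y_pred, positive=1, w_acc=0.5, w_f1=0.5):
    """Combine accuracy and F1-score into one scalar reward.

    The weights w_acc and w_f1 are hypothetical; the source only states
    that the reward is derived from accuracy and F1-score.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return w_acc * accuracy + w_f1 * f1


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted first

    def push(self, state, action, reward_value, next_state, done):
        self.buffer.append((state, action, reward_value, next_state, done))

    def sample(self, batch_size, rng=random):
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In a full agent, transitions drawn from the buffer would be used to update the Q-network, and epsilon would typically be decayed over training so that the policy shifts from exploration to exploitation.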









