2020/Undergraduate Researcher/archived

Intrusion Detection via Deep Learning

→ Published at an IEEE conference; improved detection of rare attack classes

Published IEEE research on network intrusion detection using deep neural networks, benchmarked on the NSL-KDD dataset.

Stack

Python, TensorFlow, Keras, NumPy, NSL-KDD

Problem

Classical ML baselines such as random forests struggle with the rare attack categories in intrusion detection: U2R (user-to-root) and R2L (remote-to-local). These categories are the most dangerous because they cover privilege escalation and remote exploitation, yet their scarcity in the training data causes models to under-detect them. The question I set out to answer was whether a deeper neural network could close that gap without sacrificing performance on common attack types.

My Contribution

Designed the study end-to-end as my undergraduate final-year project:

Selected NSL-KDD as the benchmark dataset (it addresses the duplicate-record issues in the original KDD'99 that inflate accuracy metrics)
Implemented and trained three models: a baseline random forest, a shallow feed-forward network, and a deeper architecture with dropout regularisation
Evaluated across all attack categories with confusion matrices to surface where each model succeeded and failed
Wrote up the findings and published them in IEEE conference proceedings
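A minimal NumPy sketch of the per-category evaluation step follows. It assumes the commonly used KDD'99 grouping of the 22 training-set attack names into DoS, Probe, R2L and U2R; the helper names (`to_category_ids`, `confusion_matrix`) are hypothetical and not taken from the project code.

```python
import numpy as np

# Commonly used KDD'99 grouping of the 22 attack labels in the NSL-KDD training
# set; "normal" traffic keeps its own class. Test-set-only attack names are
# deliberately left out, so they raise a KeyError instead of being mislabelled.
CATEGORY_OF = {
    "normal": "normal",
    **{a: "DoS" for a in ("back", "land", "neptune", "pod", "smurf", "teardrop")},
    **{a: "Probe" for a in ("ipsweep", "nmap", "portsweep", "satan")},
    **{a: "R2L" for a in ("ftp_write", "guess_passwd", "imap", "multihop",
                          "phf", "spy", "warezclient", "warezmaster")},
    **{a: "U2R" for a in ("buffer_overflow", "loadmodule", "perl", "rootkit")},
}
CATEGORIES = ["normal", "DoS", "Probe", "R2L", "U2R"]


def to_category_ids(labels):
    """Map raw attack names to integer category ids (KeyError if unmapped)."""
    index = {c: i for i, c in enumerate(CATEGORIES)}
    return np.array([index[CATEGORY_OF[label]] for label in labels])


def confusion_matrix(y_true, y_pred, n_classes=len(CATEGORIES)):
    """Count predictions: rows are true categories, columns are predicted ones."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated pairs
    return cm


y_true = to_category_ids(["normal", "neptune", "rootkit", "guess_passwd"])
y_pred = to_category_ids(["normal", "smurf", "normal", "guess_passwd"])
cm = confusion_matrix(y_true, y_pred)
```

Reading down the U2R row of `cm` shows where rare-class samples went; here the `rootkit` record lands in the `normal` column, which is exactly the kind of miss the study was looking for.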

Architecture

Three-model comparison study. The random forest served as the classical ML baseline. The shallow network used a single hidden layer, testing whether a learned non-linear representation helped before adding depth. The deep network added layers and dropout regularisation to test whether more capacity improved detection of rare classes without overfitting.
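The difference between the two neural models can be sketched as a NumPy forward pass. This is an illustration under assumptions, not the project's Keras code: the layer widths (64 for the shallow model; 256, 128, 64 for the deep one), the 122-feature input width, ReLU activations and He initialisation are all hypothetical choices, and training is omitted. In Keras the same shapes would be expressed with `Dense` and `Dropout` layers.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """He-initialised weight matrix and zero bias for each dense layer."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, X, dropout_rate=0.0, training=False, rng=None):
    """ReLU hidden layers and softmax output; inverted dropout only while training."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
        if training and dropout_rate > 0.0:
            keep = rng.random(h.shape) >= dropout_rate
            h = h * keep / (1.0 - dropout_rate)  # rescale so expected activation is unchanged
    W, b = params[-1]
    return softmax(h @ W + b)


rng = np.random.default_rng(0)
n_features, n_classes = 122, 5  # assumed input width after one-hot encoding
shallow = init_mlp([n_features, 64, n_classes], rng)          # one hidden layer
deep = init_mlp([n_features, 256, 128, 64, n_classes], rng)   # deeper stack, used with dropout
```

Dropout only changes behaviour when `training=True`; at evaluation time the deep model is a plain feed-forward pass, which is why the rescaling by `1 / (1 - dropout_rate)` is applied during training rather than at inference.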

NSL-KDD preprocessing: normalised continuous features, one-hot encoded categorical fields, stratified train/test split to preserve rare-class representation. All implemented in Python with TensorFlow and Keras.
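The three preprocessing steps can be sketched in plain NumPy. This assumes min-max scaling for "normalised" (z-scoring would be an equally plausible reading), and the function names are hypothetical. The sketch splits first and fits the scaler and the one-hot vocabulary on the training split only, so no test-set statistics leak into training.

```python
import numpy as np


def stratified_split(y, test_frac, rng):
    """Return train/test indices with every class split in the same proportion."""
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def one_hot(values, vocabulary):
    """One column per vocabulary entry; the vocabulary comes from the training split."""
    index = {v: i for i, v in enumerate(vocabulary)}
    out = np.zeros((len(values), len(vocabulary)))
    for row, v in enumerate(values):
        if v in index:  # categories unseen in training stay all-zero
            out[row, index[v]] = 1.0
    return out


def fit_min_max(X):
    """Per-feature minimum and range, computed on the training split only."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return lo, span


def apply_min_max(X, lo, span):
    """Scale features with training statistics (test values may fall outside [0, 1])."""
    return (X - lo) / span


rng = np.random.default_rng(42)
y = np.array([0] * 90 + [4] * 10)  # toy labels: 10% rare class
train_idx, test_idx = stratified_split(y, 0.2, rng)
```

With a 0.2 test fraction, the toy rare class contributes exactly 2 of its 10 samples to the test split; a plain random split could easily leave it with none.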

Outcomes

The deeper network noticeably improved detection of the U2R and R2L attack classes over the random forest; these are the categories where classical models consistently underperform. Performance on the common attack types (DoS, Probe) remained competitive. Full confusion matrices and precision/recall breakdowns are in the paper.
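Per-class precision and recall fall straight out of a confusion matrix. The following is a small NumPy sketch (the function name is hypothetical) that returns 0.0 instead of NaN for classes with no predictions or no true samples.

```python
import numpy as np


def per_class_precision_recall(cm):
    """Precision and recall per class; rows are true classes, columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)            # correctly classified samples per class
    predicted = cm.sum(axis=0)  # everything the model assigned to each class
    actual = cm.sum(axis=1)     # everything that truly belongs to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall


# Made-up 3-class matrix (not results from the paper); the last class is rare.
toy_cm = np.array([[90, 5, 5],
                   [10, 80, 10],
                   [6, 0, 4]])
precision, recall = per_class_precision_recall(toy_cm)
```

In this toy matrix overall accuracy is 174/210 ≈ 0.83, while the rare class has a recall of only 0.4, which is why per-class figures matter more than headline accuracy for U2R and R2L.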

Published in IEEE conference proceedings. This was my first exposure to the full research cycle: problem definition, related-work review, methodology, experiments, analysis, and write-up.

Learnings

The preprocessing pipeline matters as much as the model. Choosing NSL-KDD over the raw KDD'99 dataset was the first real decision, and it changed the experimental results meaningfully by removing inflated accuracy from duplicate records. I've carried that lesson forward: garbage in, garbage out, regardless of how sophisticated the model is.
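The duplicate-record problem is easy to reproduce on any tabular dataset. This NumPy sketch shows only the basic idea of dropping exact repeats (the function name is hypothetical); it is not how NSL-KDD itself was constructed, since that dataset ships already cleaned.

```python
import numpy as np


def drop_duplicate_records(X, y):
    """Keep the first occurrence of each identical (features, label) row.

    Heavily repeated records let a model score well by memorising a few
    frequent patterns, which inflates accuracy on a test set with the same repeats.
    """
    rows = np.column_stack([X, y])
    _, first_idx = np.unique(rows, axis=0, return_index=True)
    keep = np.sort(first_idx)  # restore the original record order
    return X[keep], y[keep]


X = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
X_dedup, y_dedup = drop_duplicate_records(X, y)
```

Including the label in the comparison keeps rows with identical features but different labels, so deduplication does not silently resolve label noise.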