Why Insider Threats Need a New Approach
Insider threats have always been one of the most difficult challenges in cybersecurity. Traditional tools (signature-based antivirus, static DLP, basic SIEM rules) often fail to catch a trusted user quietly exfiltrating sensitive data. An insider can stay within static rules, but it’s much harder to hide a departure from your own established behavior. That’s where behavioral analytics, and specifically AI-based anomaly detection, comes in.
My Approach
As part of my PhD research, I built an anomaly detection pipeline using Long Short-Term Memory (LSTM) neural networks. LSTMs are great at modeling sequences—exactly what you need when you want to understand how user actions evolve over time.
The idea is simple:
- Collect raw security event data (USB usage, file reads/writes, logins, etc.)
- Turn this into a time-series dataset
- Train an LSTM to “learn normal” behavior over sequences of activity
- Flag anything that the model can’t predict well as an anomaly
If the model is surprised by an action (high prediction error), it’s probably something worth investigating.
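To make "surprise" concrete, here is a minimal sketch of the scoring step in plain Python. It is an illustration, not the code from the repository: the function names (`mse`, `percentile`, `flag_anomalies`) are hypothetical, the predictions are assumed to come from an already trained model, and the nearest-rank percentile is just one simple way to pick a threshold.

```python
from statistics import mean


def mse(predicted, actual):
    """Mean squared error between two equal-length feature vectors."""
    return mean((p - a) ** 2 for p, a in zip(predicted, actual))


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list; q is an integer from 0 to 100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling of q * n / 100
    return ordered[rank - 1]


def flag_anomalies(predictions, actuals, threshold):
    """Return (index, error) for every step whose error exceeds the threshold."""
    errors = [mse(p, a) for p, a in zip(predictions, actuals)]
    return [(i, e) for i, e in enumerate(errors) if e > threshold]
```

In practice the threshold would be derived from errors on data believed to be mostly normal, for example `percentile(baseline_errors, 99)`, and then passed to `flag_anomalies` together with the model's predictions on new activity.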
How I Built It
- Feature Selection:
For this example, I use USB events, file read counts, and file write counts as the main features. But you can easily expand this to include logins, network transfers, or even clipboard activity.
- Data Preprocessing:
I normalize the features and split them into overlapping sequences (think of it as a moving window over the data).
- Model Architecture:
A simple two-layer LSTM, with dropout to reduce overfitting, predicts the next set of activity features based on the recent history.
- Training:
The LSTM is trained without labels: the real (mostly normal) data supplies both the input sequences and the next-step values the model learns to predict.
- Anomaly Scoring:
After training, the model predicts on new sequences. I calculate the mean squared error (MSE) between predicted and actual values; if the error crosses a chosen threshold (e.g., the 99th percentile of errors), that event is flagged as an anomaly.
- Visualization and Export:
I visualize anomaly scores over time, flag high-risk periods, and export detected anomalies for further investigation.
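The preprocessing step is the part most easily shown in a few lines. The sketch below is a dependency-free approximation, not the repository code: it assumes min-max scaling as the normalization (the post does not say which method is used), and the helper names `min_max_scale` and `make_windows` are hypothetical. Each window of consecutive rows becomes one input sequence, and the row right after it becomes the target the model should predict.

```python
def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def make_windows(rows, window):
    """Pair each run of `window` consecutive rows with the row that follows it."""
    inputs, targets = [], []
    for start in range(len(rows) - window):
        inputs.append(rows[start:start + window])  # recent history
        targets.append(rows[start + window])       # next step to predict
    return inputs, targets
```

With a window of 3 over 5 time steps, this yields two overlapping sequences (steps 1 to 3 and steps 2 to 4), each paired with the step that follows it. A real pipeline would typically do the same thing with NumPy arrays so the result can be fed straight into the LSTM.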
Why This Matters
- Unsupervised = Less Manual Work:
No need to label mountains of insider data or wait for the next big attack.
- Portable and Generic:
Works on any time-series behavioral log—USB, file, cloud, email, you name it.
- Explainable:
It’s easy to show the anomaly score, and highlight “why” an action was flagged.
Sample Code (GitHub)
I’ve open-sourced a generic, ready-to-use version of this code here.
You can plug in your own logs and feature sets, tweak the model as needed, and use it for everything from research projects to internal monitoring POCs.
Try It Yourself
- Download your endpoint or USB activity logs as CSV
- Adjust the feature names in the script
- Run, visualize, and see if you can spot unusual behaviors
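As a rough idea of what the first two steps might look like, here is a small sketch that reads a CSV export into feature rows using only the standard library. The column names in `FEATURES` and the sample rows are made up for illustration; your own logs will use different headers, and the actual script may load data differently.

```python
import csv
import io

# Hypothetical column names; adjust these to match your own CSV headers.
FEATURES = ["usb_events", "file_reads", "file_writes"]


def load_features(handle, features=FEATURES):
    """Read a CSV file object and return one list of floats per row."""
    reader = csv.DictReader(handle)
    return [[float(record[name]) for name in features] for record in reader]


# Made-up sample data standing in for a real export.
SAMPLE_CSV = (
    "timestamp,usb_events,file_reads,file_writes\n"
    "2024-01-01T09:00,0,12,3\n"
    "2024-01-01T10:00,1,40,7\n"
)
rows = load_features(io.StringIO(SAMPLE_CSV))
```

For a real file, you would pass `open("your_log.csv", newline="")` instead of the `io.StringIO` wrapper. Columns not listed in `FEATURES`, such as the timestamp, are simply ignored.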
If you have ideas for improvements (e.g., adding transformer models, new features, or more advanced anomaly scoring), I’d love to hear your feedback.
Final Thoughts
Behavioral AI isn’t a silver bullet, but it’s a necessary step for any modern SOC or security-conscious organization. LSTM-based anomaly detection lets you find the things static rules will always miss—especially when the attacker already has a valid login.
Want to see more projects like this? Check the Research section of my site, or reach out if you want to collaborate on AI-driven security.
