Incident handling for small sec teams
This article is a write-up of a talk I gave at bSides Ljubljana 2026.
Incident management is one of those things that is rather simple to do even with a very small team, but that quickly gets overcomplicated if you try googling for it. I looked at what is available online, and it mostly falls into the following camps:
- Formal incident response for professional incident responders (firefighters, emergency services, SAR…)
- Military, intelligence, and similar government level organizations.
- Complicated bundles of services or consultancies geared towards you buying security in a box (Threat Intelligence + SOC + MISP + Incidents + SIEM + …)
What they all lack is a simple way to add incident response to an organization without doubling your team size. In reality, you are already doing it.
So yes, you are already doing it, and now you might be wondering what the fuss is about. Why even bother changing it? Because of something called unplanned work. Unplanned work is any time you have to drop your actual tasks in order to fix something that is suddenly on fire. All your careful planning, grooming, sprints, and waterfalls get ignored because production is down and everyone from the CEO to their dog is yelling at you to fix it. If you want to look deeper into thinking systematically about IT department work, I wholeheartedly recommend reading The Phoenix Project.
A few reasons to care about incidents:
- It's useful
- You should
- You have to
The simplest reason is that some law, security standard, or management decision forces you to implement and maintain incident response. The easiest way to see why we would actually want it is to look at the goals of an incident response system.
Goals of incident response
- Calling attention to a problem - A way for your employees to point their finger and say "this seems off, someone should check it out".
- Maintaining SLAs - Besides measuring your nines, you also need a way of tackling anything that is affecting your SLA.
- Formalize reassigning resources to urgent tasks - A common side effect of ad-hoc incident response is that either one person keeps getting bothered with every problem, or people waste time hunting through public and private channels for someone to look at the problem.
- Document unplanned work - It's very hard to estimate how much unplanned work is happening in your organization, and who the MVPs doing all the hard work are, if you don't measure it. Also, saying something like "we spent 400h on this last quarter; if we don't put 80h into making XYZ more stable, this will repeat every quarter and potentially get worse" is the best way of getting management approval for tech-debt work.
- Record anything that affects CIA(A) - For a more formal definition of what to record: SOC 2, ISO, and ENISA guidelines all require you to keep track of any event that affects the Confidentiality, Integrity, Availability, and Authenticity of data and systems.
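To make the "measuring your nines" goal concrete, here is a small sketch (my own illustration, not from the talk) that converts an SLA percentage into the downtime budget it leaves. Incidents that eat into that budget are exactly the ones your response process needs to catch quickly.

```python
# Sketch: how much downtime each SLA level actually allows per period.
# Function name and the 30-day period are illustrative choices.

def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime a given SLA leaves over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

for nines in (99.0, 99.9, 99.99):
    print(f"{nines}% over 30 days -> {allowed_downtime_minutes(nines):.1f} min of downtime")
```

Three nines over a 30-day month leaves you roughly 43 minutes; a single slow incident response can blow through that on its own.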
Key parts of incident response
- Report Culture
- What to report
- Responding to incidents
- Learning from incidents
Report Culture
Two major pitfalls companies fall into are not explaining to employees (the whole company) what is expected of them, and employees being afraid to report due to potential consequences.
For the first pitfall, make sure you have a simple way for employees to report something. If you use an internal ticketing system (Jira, YouTrack, Trello, Asana...), use that; if not, set up a Google Form or a mailing list where people can report incidents. This step should take the least tech-savvy employee less than a minute. At this point you are just calling attention to a potential problem, so a few sentences and a contact point are enough. Somebody from your team can follow up later for more details if necessary.
The second is a bit trickier: first you need management buy-in that the things employees report will not be used against the reporter. As your organization grows, phishing, malware, and misconfigurations of random tools will happen; the question is how soon you catch and fix them.
Things you don't know about are things you can't respond to.
Both of these require periodic education of all employees, and they will take time. Blame-free reporting goes against many old-school work cultures, so it will take a while for your organization to start trusting you and reporting incidents.
If you can't build that trust, you will have to limit your system to "production is down" and similar show-stopper-grade events.
What to report
- Technical issues with the infrastructure/codebase
- Phishing
- Malware on devices
- Physical safety issues
- Security breaches
- Unlocked PC
- ...
Start with something, then every 6-12 months review past events and reevaluate what sort of events you want employees to report and what sort doesn't need reporting. Also keep in mind to go easy on automatic reporting systems, as a bad signal-to-noise ratio will prevent you from responding to incidents effectively.
Responding to incidents
Anything reported to your team is an event for you to check out; once you have to allocate resources to fix the issue, it becomes an incident. Some of the key aspects to think through are:
- who should respond and when (some can wait till Monday and some need to be fixed on Saturday at 3AM)
- prioritization of incidents based on type and severity
- Special authorization for incident responders to act if they cannot reach higher-ups (if you notice a cryptolocker, stopping all systems is the correct answer, but if you need to wake up the CEO to sign off on it first, it might be too late)
- On-call and standby schemes, and the associated additional pay or benefits for participating employees. If you expect employees to respond 24/7, look into setting up call trees and rotation schedules for teams.
- Incident Commander - the person through whom all communication regarding an incident flows within the organization. In major incidents, communication flow and decision-making friction can severely affect the organization's ability to respond.
- Training the responders to document actions and findings while responding to the incident.
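The prioritization point above can be sketched as a simple type-times-severity matrix. The categories, weights, and the paging threshold below are all invented for illustration; tune them to your own tolerance for 3AM phone calls.

```python
# Sketch: deriving priority from incident type and severity.
# All categories and weights here are illustrative, not a standard.

SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}
TYPE_WEIGHT = {"availability": 2, "security": 3, "phishing": 1, "other": 1}

def priority(incident_type: str, severity: str) -> int:
    """Higher number = respond sooner. Unknown types get the lowest weight."""
    return SEVERITY[severity] * TYPE_WEIGHT.get(incident_type, 1)

def wake_someone_up(incident_type: str, severity: str) -> bool:
    # Only page out of hours above a threshold; everything else waits.
    return priority(incident_type, severity) >= 6

print(wake_someone_up("security", "critical"))  # cryptolocker: page now
print(wake_someone_up("other", "low"))          # can wait till Monday
```

Even a crude matrix like this beats arguing severity from scratch at 3AM, because the argument already happened once, in daylight.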
In the end it's a ticket board: someone should be responsible for looking at it, triaging, and forwarding to the relevant domain experts. In some cases you can automate the routing (infra directly gets the "Database is offline" alert).
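For the on-call rotation mentioned above, even a tiny team can start with a deterministic weekly schedule before buying any paging tooling. A minimal sketch (roster and anchor date are placeholders):

```python
# Sketch: a simple weekly on-call rotation ("who is on call today?").
# TEAM and ROTATION_START are hypothetical; anchor the start on a Monday.
from datetime import date

TEAM = ["alice", "bob", "carol"]      # placeholder roster, in rotation order
ROTATION_START = date(2026, 1, 5)     # a Monday; pick your own anchor date

def on_call(day: date) -> str:
    """Return who is on call on the given day, rotating weekly."""
    weeks_since_start = (day - ROTATION_START).days // 7
    return TEAM[weeks_since_start % len(TEAM)]

print(on_call(date(2026, 1, 14)))  # second week of the rotation
```

Because the schedule is a pure function of the date, everyone can compute it independently and there is never an argument about whose week it is.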
Learning from incidents
Everything above is just writing things down for the sake of writing them down if you don't do anything to learn from the incident-resolution experience. Here are a few things the data can be used for:
Incident log and statistics
You can learn a lot about what is happening to your organization by reviewing past incidents: things like which systems keep breaking or what sort of phishing employees keep falling for. This lets you figure out which systems and classes of problems require your attention and resources.
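Mining the incident log for those trends takes very little code. A sketch, assuming a log with an invented `type`/`system`/`hours` shape (your ticketing system's export will look different):

```python
# Sketch: basic statistics over an incident log.
# The log entries below are made-up examples of the assumed record shape.
from collections import Counter

incident_log = [
    {"type": "outage",   "system": "db",    "hours": 6},
    {"type": "phishing", "system": "email", "hours": 1},
    {"type": "outage",   "system": "db",    "hours": 4},
    {"type": "outage",   "system": "ci",    "hours": 2},
]

by_system = Counter()
unplanned_hours = 0
for inc in incident_log:
    by_system[inc["system"]] += 1
    unplanned_hours += inc["hours"]

print(by_system.most_common(1))  # → the system that keeps breaking
print(unplanned_hours)           # → total unplanned hours for management
```

The two numbers this produces are exactly the ammunition for the "we spent 400h last quarter" pitch from earlier: what keeps breaking, and what it costs.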
Post-mortems
A post-mortem is a way to replay the incident after its resolution in order to share the knowledge of what happened and how you went about fixing it.
Case Studies based on incidents
Similar to a post-mortem, but more targeted: a self-standing learning artefact. Think of things like "Common phishing attacks to look out for" or "Common PostgreSQL misconfigurations".
Onboarding “challenges” for new team members
If you have a list of things that happened, you have a list of things that might happen again. One thing I like to do is create a test for all my newly hired sysadmins, where they get to solve issues that actually happened, on a simulated production environment.