Building a UEBA Risk Engine

Article by Derek Lin, Exabeam

 

UEBA technology is the confluence of advancements in data infrastructure, security knowledge, and algorithms. Let’s explore how each of these areas relates to anomaly detection and event scoring — the output of a UEBA engine.

 

Data infrastructure

All things UEBA start with security and network logs. Raw log events must be ingested, parsed, normalised, and enriched with context before they are risk-assessed. The data infrastructure that supports this processing pipeline must meet three challenges.

 

First, it must be cloud scale. Only a few years ago, processing 10K to 50K events per second (EPS) seemed satisfactory. Enterprises now routinely demand 100K EPS or more, with 1 million EPS now within reach with New-Scale SIEM™.

 

Second, the pipeline must be intelligent, transforming raw events into semantically meaningful fields before the risk engine can use them. Today, event parsing and normalisation rely heavily on humans to define rules and conditions for specific vendor log formats. This seemingly mundane task is difficult because vendors constantly introduce new security log formats and change existing ones. Keeping up with the myriad changing formats requires a small army of security log experts, which is no small feat.

 

Third, the pipeline must perform additional event enrichment in a timely manner. One example is the per-event IP geolocation lookup, which requires joining the real-time log stream with an IP-geolocation feed. Similarly, filling in a missing IP address or host name in an event requires stateful IP-to-host (and host-to-IP) tracking, which in turn requires real-time event correlation.
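
As a rough illustration of the stateful IP-to-host tracking described above, the sketch below keeps the most recent binding in memory and uses it to fill in whichever field an event is missing. The class name `IpHostTracker`, the event keys `ip` and `host`, and the "latest binding wins" policy are all assumptions made for this example; a production pipeline would also need lease expiry, time-ordering guarantees, and distributed state.

```python
from dataclasses import dataclass, field


@dataclass
class IpHostTracker:
    """Hypothetical in-memory tracker of the latest IP <-> host bindings."""
    ip_to_host: dict = field(default_factory=dict)
    host_to_ip: dict = field(default_factory=dict)

    def observe(self, ip: str, host: str) -> None:
        # Assumption: the newest binding replaces any older one for that IP or host.
        old_host = self.ip_to_host.get(ip)
        if old_host is not None and old_host != host:
            self.host_to_ip.pop(old_host, None)
        old_ip = self.host_to_ip.get(host)
        if old_ip is not None and old_ip != ip:
            self.ip_to_host.pop(old_ip, None)
        self.ip_to_host[ip] = host
        self.host_to_ip[host] = ip

    def enrich(self, event: dict) -> dict:
        """Return a copy of the event with a missing 'ip' or 'host' filled in."""
        enriched = dict(event)
        ip, host = enriched.get("ip"), enriched.get("host")
        if ip and host:
            # Events carrying both fields refresh the tracked state.
            self.observe(ip, host)
        elif ip:
            enriched["host"] = self.ip_to_host.get(ip)
        elif host:
            enriched["ip"] = self.host_to_ip.get(host)
        return enriched


tracker = IpHostTracker()
tracker.observe("10.0.0.5", "laptop-a")          # e.g. learned from a DHCP log
print(tracker.enrich({"ip": "10.0.0.5", "action": "login"}))
```

Because the lookup depends on what the tracker has already seen, events must be processed in roughly the order they occurred, which is why this kind of enrichment calls for real-time correlation rather than a stateless per-event lookup.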

Security knowledge

Should UEBA be a completely data-driven system? Or can it incorporate human knowledge? The answer is yes to both. The best UEBA systems take a hybrid approach, combining machine learning with security domain knowledge.

 

Modern UEBA systems have hundreds to thousands of risk indicators. Humans design these risk indicators, deciding what hints or clues to look for. For example:

 

  • Whether the email size is unusually large
  • Whether this is the user’s first time accessing this asset
  • Whether the user’s activity is performed at an unusual time

 

Crafting these risk indicators is the work of security experts. Machine learning then converts the states of an event’s indicators, commonly termed machine learning features, into a risk score. Security knowledge plays a critical role in providing input to the machine.

 

It is important to understand the distinction between typical UEBA risk indicators and conventional correlation rules in SIEM. UEBA risk indicators do more than correlation rules, and products that run on correlation rules alone cannot pass as UEBA systems. A correlation rule says, “if the number of bytes sent is more than 10 MB, then trigger”.

 

This rule is deterministic and does not take user context into account. On the other hand, a UEBA anomaly risk indicator would say “if the number of bytes sent is unusual for this user, then trigger”.

 

A user may regularly send 10 MB, triggering the correlation rule every time. But the user will not trigger a UEBA alert if sending 10 MB is historically normal behaviour, unless the volume now sent is substantially higher than usual (i.e., an order of magnitude difference). Correlation rules are far more prone to noisy alerts than anomaly-based risk indicators.
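
The contrast between the two kinds of check can be sketched in a few lines. The function names, the use of the median as the user's baseline, the default `factor` of 10 (standing in for "an order of magnitude"), and the `min_history` cut-off are illustrative assumptions, not a description of how any particular product computes its indicators.

```python
from statistics import median

TEN_MB = 10 * 1024 * 1024


def correlation_rule(bytes_sent: int) -> bool:
    """Static rule: fires whenever more than 10 MB is sent, for any user."""
    return bytes_sent > TEN_MB


def unusual_for_user(bytes_sent: int, history: list[int],
                     factor: float = 10.0, min_history: int = 5) -> bool:
    """Anomaly indicator: fires only if the volume dwarfs this user's baseline.

    Assumption: the baseline is the median of the user's past transfers, and
    "unusual" means at least `factor` times that baseline.
    """
    if len(history) < min_history:
        return False  # not enough history to judge what is normal
    return bytes_sent > factor * median(history)


# A user who routinely sends about 11 MB per transfer.
history = [11 * 1024 * 1024] * 30
routine = 11 * 1024 * 1024
spike = 150 * 1024 * 1024

print(correlation_rule(routine), unusual_for_user(routine, history))  # True False
print(correlation_rule(spike), unusual_for_user(spike, history))      # True True
```

The static rule fires on both transfers, while the per-user indicator stays silent on the routine one and fires only on the spike.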

 

Algorithms

Algorithms in UEBA are ways of deriving new information or insights from data. Different algorithm classes apply to different data types. For endpoint logs, in which events have explicit hierarchical or parent-child relationships, graph algorithms that mine the directional links among events are a good fit. For network logs, where events from authentication, authorisation, access, security vendors’ alerts, and similar activities are independent of one another, algorithms at varying levels of sophistication are at our disposal. Below I detail the steps of processing events to generate risk indicators, scoring alerts or events, and ultimately grouping them into presentable units for prioritisation.

 

For risk indicator calculation

 

Events are examined against a collection of risk indicators. Some risk indicators are computed with simple statistical analysis, using a p-value-based metric from hypothesis testing. One example is whether the current VPN login country meets a p-value threshold against a frequency distribution that tracks the countries historically visited.
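
One way such a check could look is sketched below. The article does not spell out how the p-value is computed, so this example assumes a common choice for categorical data: the p-value of an observation is the total historical probability of all outcomes that are no more frequent than the observed one. The function name `country_p_value` and the sample counts are hypothetical.

```python
from collections import Counter


def country_p_value(country: str, history: Counter) -> float:
    """Share of past logins from countries seen no more often than `country`.

    A never-seen country gets 0.0; the user's most common country gets 1.0.
    """
    total = sum(history.values())
    if total == 0:
        return 1.0  # no history yet, so nothing can be called unusual
    observed = history.get(country, 0)
    tail = sum(count for count in history.values() if count <= observed)
    return tail / total


# Hypothetical history of 100 VPN logins for one user.
history = Counter({"US": 95, "CA": 4, "DE": 1})
threshold = 0.02  # assumed cut-off, chosen only for illustration

for country in ("US", "CA", "DE", "FR"):
    p = country_p_value(country, history)
    print(country, p, "anomalous" if p < threshold else "normal")
```

With these counts, the rarely seen "DE" and the never-seen "FR" fall below the threshold, while "US" and "CA" do not.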

 

Other risk indicators require context estimated by machine learning. For example, the user’s peer group label is context required for peer-based risk indicators; similarly, knowing whether the external email address “jdoe211@gmail.com” belongs to the employee John Doe is required context for a risk indicator that monitors employees’ email usage. Such contextual information is typically determined by a periodic offline machine learning process.

 

Lastly, some risk indicators need standalone machine learning processes that run in real time; for example, a deep learning-based method can flag whether the user is connecting to a domain produced by a domain generation algorithm (DGA).

 

For risk indicator scoring

 

Risk indicators that are triggered are notable alerts. Among the millions of events processed, hundreds to thousands may be flagged as anomalous in a typical enterprise. It is impossible to prioritise these anomalous events without some kind of scoring, and there are several ways to score. At the simplest level, without resorting to algorithms, humans can manually assign a severity score to each risk indicator for when it triggers on an event. However, static scoring only goes so far before too many events look equally severe and start to resemble noise. In contrast, algorithms can be employed to score triggered risk indicators or events automatically. For example, an algorithm can give a higher score to a triggered risk indicator that has historically been rare, and a lower score to one that triggers often.
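
A minimal sketch of rarity-based scoring follows. It assumes the engine tracks, per indicator, how many events were evaluated and how many times the indicator triggered; the log-based formula, the add-one smoothing, and the 0 to 100 scale are choices made for this example rather than a method stated in the article.

```python
import math


def rarity_score(trigger_count: int, event_count: int,
                 max_score: float = 100.0) -> float:
    """Score a triggered indicator: the rarer its past triggers, the higher.

    Assumption: rarity is measured as the log of the smoothed historical
    trigger rate, normalised so a never-before-triggered indicator gets
    `max_score`.
    """
    # Add-one smoothing keeps the rate strictly between 0 and 1.
    rate = (trigger_count + 1) / (event_count + 2)
    rarest_rate = 1 / (event_count + 2)
    return max_score * math.log(rate) / math.log(rarest_rate)


events_seen = 100_000
print(rarity_score(3, events_seen))       # rarely triggered: high score
print(rarity_score(30_000, events_seen))  # triggers constantly: low score
```

Unlike a hand-assigned severity, this score drifts downward on its own as an indicator starts firing routinely, which helps keep frequently triggered indicators from crowding out the rare ones.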

 

For organising alerts for presentation

 

A security operations centre (SOC) has limited resources. Investigating alerts individually is a pain because there are hundreds, if not thousands, of anomalous events every day. Here algorithms can summarise these events to make investigation more efficient. Event summarisation can sensibly organise disparate anomalous events into logical groups, or cases, each with a score to facilitate prioritisation.
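
The grouping step can be illustrated with a deliberately simple scheme. The sketch below assumes that a case is all of one user's alerts on one calendar day and that a case's score is the sum of its alert scores; the `Alert` fields, the `build_cases` helper, and the sample data are hypothetical, and a real engine may group by session, asset, or learned relationships instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    user: str
    timestamp: datetime
    indicator: str
    score: float


def build_cases(alerts: list[Alert]) -> list[dict]:
    """Group alerts into per-user, per-day cases, highest total score first."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.user, alert.timestamp.date())].append(alert)

    cases = []
    for (user, day), members in groups.items():
        cases.append({
            "user": user,
            "day": day,
            "score": sum(m.score for m in members),  # assumed: additive scoring
            "indicators": sorted({m.indicator for m in members}),
        })
    return sorted(cases, key=lambda case: case["score"], reverse=True)


alerts = [
    Alert("alice", datetime(2024, 1, 5, 9, 0), "first_asset_access", 30.0),
    Alert("bob", datetime(2024, 1, 5, 10, 0), "large_email", 50.0),
    Alert("alice", datetime(2024, 1, 5, 23, 30), "unusual_time", 40.0),
]
for case in build_cases(alerts):
    print(case["user"], case["day"], case["score"], case["indicators"])
```

Here the analyst sees two cases instead of three alerts, and the case for "alice" ranks first because its two moderate alerts add up to more than the single alert for "bob".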

 

To sum up, a working UEBA engine combines big data technology, human security expertise, and plenty of mathematics and algorithms.