Stop Silent Event Loss at Scale with Windows Event Collector


Your SIEM is only as good as what reaches it. And if you’re running Windows Event Forwarding (WEF) without a properly sized Windows Event Collector (WEC) architecture, a significant portion of your security telemetry is silently disappearing before it ever hits a log. No alerts. No forensic trail. Just gaps where attackers love to hide.

You’ve probably treated WEC like a toggle—flip it on, point a GPO at a server, call it done. That works until you hit scale. At 500 endpoints, that “good enough” architecture starts to crack under the weight of concurrent WinRM connections, XPath queries that forward everything, and a ForwardedEvents log capped at its default 20 MB. At 2,000 endpoints, it’s actively losing events. At 5,000, you’re operating blind and don’t know it yet.

This post covers what an enterprise WEC architecture actually looks like: the subscription model decision you get wrong once, the math behind capacity limits, XPath optimization patterns that cut event volume by half without sacrificing visibility, and how to connect WEC to your SIEM without deploying agents to every endpoint.

How Windows Event Forwarding Works

Windows Event Forwarding (WEF) is a native Windows feature that centralizes event logs from endpoints to a dedicated collection server. No third-party agents required. WEF has two roles. The forwarder is the endpoint, where the Eventlog-ForwardingPlugin (loaded by the WinRM service rather than running as a service of its own) packages and ships matching events. The collector is a Windows Server running the WecSvc service, which acts as subscription manager and writes incoming events to the ForwardedEvents channel.

Communication between forwarder and collector runs over WinRM—specifically HTTP on TCP port 5985 or HTTPS on port 5986. For domain-joined endpoints, Kerberos handles mutual authentication and encrypts the payload at the application layer, even over plain HTTP. For non-domain endpoints (Entra-joined devices, remote systems), you need HTTPS with TLS client certificates.

A subscription is the configuration object that defines which events to collect, how often to deliver them, and which endpoints are authorized forwarders. That subscription decision is where most enterprise deployments go wrong.

Collector-Initiated vs. Source-Initiated Subscriptions

There are two subscription modes. One scales. One doesn’t.

Collector-initiated (pull): The WEC server holds a static list of endpoints and actively polls each one for events. This requires inbound WinRM firewall rules on every endpoint and explicit read permissions for the WEC’s computer account on every endpoint’s event logs. At 50 endpoints it’s annoying. At 500 it’s unmanageable. At scale, the polling overhead consumes enough CPU and connection state on the WEC server to meaningfully degrade performance.

Source-initiated (push): The subscription is defined on the WEC server, but endpoints initiate the connection themselves. A Group Policy Object (GPO) pushes the WEC server’s URI into the SubscriptionManager registry key on each endpoint. Endpoints connect on their own schedule, authenticate via Kerberos, pull their subscription parameters, and push matching events.

For environments with 500 or more endpoints, source-initiated subscriptions are the only viable architecture. The WEC server operates passively—it doesn’t maintain a target list, doesn’t poll, doesn’t need firewall rules to push connections outward. Access control happens through Active Directory groups listed as “Allowed Forwarders” in the subscription configuration. Adding 1,000 endpoints to forwarding is a matter of adding computers to an AD security group, not editing a subscription file.
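As a rough illustration of what the GPO actually delivers, the sketch below builds the string that lands in the SubscriptionManager policy value for a domain-joined, Kerberos-over-HTTP setup. The collector name and refresh interval are placeholders, and HTTPS with certificate authentication needs additional parameters that are not shown here.

```python
def subscription_manager_value(collector_fqdn: str, refresh_seconds: int = 60) -> str:
    """Build the 'Configure target Subscription Manager' GPO value.

    Covers only the HTTP/Kerberos case on TCP 5985. The FQDN and the
    refresh interval are illustrative assumptions, not recommendations.
    """
    return (
        f"Server=http://{collector_fqdn}:5985"
        f"/wsman/SubscriptionManager/WEC,Refresh={refresh_seconds}"
    )


# Hypothetical collector name, used purely for illustration.
print(subscription_manager_value("wec01.corp.example.com"))
```

The Refresh value controls how often endpoints check back for subscription changes, so a very low number adds connection churn on the collector.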

Event Delivery Optimization

Subscriptions have three delivery modes that trade latency against resource consumption:

Mode	Batch Timeout	Use Case
Normal	15 minutes	General operational logging
Minimize Bandwidth	6 hours	Branch offices with limited WAN
Minimize Latency	30 seconds	SOC / SIEM real-time detection

Security operations centers should use Minimize Latency. Waiting 15 minutes to see a lateral movement event isn’t detection; it’s documentation. The tradeoff is that 30-second batching multiplies the number of concurrent WinRM connections, which directly drives memory consumption on the WEC server. That’s a hardware planning input, not a reason to tolerate latency.
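To put a number on that tradeoff, here is a back-of-the-envelope sketch using the batch timeouts from the table above. It computes an upper bound only: it assumes every endpoint has something to send in every batch window, which real fleets rarely do.

```python
# Batch timeouts from the delivery mode table, in seconds.
BATCH_TIMEOUT_SECONDS = {
    "Normal": 15 * 60,
    "Minimize Bandwidth": 6 * 60 * 60,
    "Minimize Latency": 30,
}


def max_deliveries_per_hour(endpoints: int, mode: str) -> int:
    """Worst-case deliveries per hour if every endpoint sends every window."""
    return endpoints * 3600 // BATCH_TIMEOUT_SECONDS[mode]


for mode in BATCH_TIMEOUT_SECONDS:
    print(f"{mode:<20} {max_deliveries_per_hour(4000, mode):>8} deliveries/hour")
```

For 4,000 endpoints the worst case rises from 16,000 deliveries per hour in Normal mode to 480,000 in Minimize Latency, a 30x increase the collector has to absorb.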

Capacity Planning: The Math You Can’t Ignore

A single WEC server has hard limits that Microsoft documents and most administrators discover the wrong way.

EPS Thresholds and Hardware Sizing

The foundational metric is events per second (EPS). In practice, a standard Windows workstation with a reasonable audit policy generates roughly 10-50 EPS. Domain controllers and IIS servers generate significantly more. A WEC server should be sized to handle no more than 3,000-5,000 EPS sustained, with disk I/O as the most frequent bottleneck.

Microsoft’s baseline recommendation is 2,000-4,000 endpoints per WEC server under normal conditions. With heavily optimized XPath filtering, some integrators report scaling a single collector to 10,000 endpoints—but that assumes excellent filter coverage and leaves no fault tolerance headroom. Plan for 4,000 as your ceiling before adding another WEC node.
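The two ceilings above (sustained EPS and endpoint count) can be combined into a quick sizing check. This is a minimal sketch; the defaults are the conservative ends of the figures quoted in this section, and the per-endpoint rate is whatever your filtered subscriptions actually forward, which you have to measure yourself.

```python
import math


def collectors_needed(
    endpoints: int,
    forwarded_eps_per_endpoint: float,
    max_eps_per_collector: int = 3000,       # conservative end of 3,000-5,000
    max_endpoints_per_collector: int = 4000,  # planning ceiling from the text
) -> int:
    """Return how many WEC servers the stricter of the two limits requires."""
    by_eps = math.ceil(endpoints * forwarded_eps_per_endpoint / max_eps_per_collector)
    by_count = math.ceil(endpoints / max_endpoints_per_collector)
    return max(by_eps, by_count, 1)


print(collectors_needed(4000, 0.75))   # filtered down to under 1 EPS per endpoint
print(collectors_needed(4000, 10))     # forwarding the raw rate of a quiet workstation
print(collectors_needed(10000, 0.5))   # endpoint count becomes the binding limit
```

Taken at face value, these numbers leave a budget of roughly one forwarded event per second per endpoint on a fully loaded collector. That is far below the raw 10-50 EPS a workstation generates, which is why the filtering covered later is not optional.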

Hardware minimums for a WEC server managing 2,000-4,000 endpoints:

Resource	Minimum	Notes
CPU	4 cores	WecSvc and WinRM are multi-threaded
RAM	16 GB	Each concurrent WinRM session holds connection state; 4,000 clients across 5-7 subscriptions can push WecSvc above 4 GB alone
Storage	Dedicated high-speed disk	3,000+ IOPS write throughput; never share with the OS volume

The storage point is worth emphasis. The ForwardedEvents log default size is approximately 20 MB. Twenty megabytes. At 3,000 EPS that’s full in seconds. Relocate ForwardedEvents.evtx to a dedicated disk immediately, set the log to several gigabytes, and configure it to archive when full rather than overwrite. Archiving creates dated .evtx snapshots (Archive-ForwardedEvents-2025-10-25.evtx) rather than silently discarding older events.
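The "full in seconds" claim is easy to sanity check. The sketch below assumes an average stored event size of 1 KiB, which is a placeholder rather than a measured value; real event sizes vary with the event ID and payload.

```python
def seconds_to_fill(log_size_bytes: int, eps: float, avg_event_bytes: int = 1024) -> float:
    """Time until a log of the given size is full at a steady event rate.

    avg_event_bytes is an assumed average; measure your own before relying on it.
    """
    return log_size_bytes / (eps * avg_event_bytes)


DEFAULT_LOG = 20 * 1024 * 1024   # the ~20 MB default
RESIZED_LOG = 10 * 1024 ** 3     # a 10 GiB log on a dedicated disk

print(round(seconds_to_fill(DEFAULT_LOG, 3000), 1), "seconds")
print(round(seconds_to_fill(RESIZED_LOG, 3000) / 60), "minutes")
```

Under these assumptions the default log wraps in about 7 seconds, and even a 10 GiB log holds just under an hour of events at 3,000 EPS. Archive-on-full and prompt SIEM ingestion both matter for that reason.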

Scaling Beyond 4,000 Endpoints

WinRM’s Kerberos authentication breaks standard network load balancers. If a Service Principal Name (SPN) doesn’t match the actual hostname terminating the connection, authentication fails. This eliminates F5 and HAProxy as straightforward options.

The practical solution is AD-level distribution: partition your endpoint fleet across Active Directory security groups, each pointing to a distinct WEC server via GPO. An environment with 10,000 workstations and three WEC servers might look like this:

WEF-Group-A (3,334 computers) → GPO references WEC-Server-01
WEF-Group-B (3,333 computers) → GPO references WEC-Server-02
WEF-Group-C (3,333 computers) → GPO references WEC-Server-03
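One way to produce a split like the one above is a simple round-robin over the sorted computer list. The hostnames below are generated placeholders; in practice the list would come from an AD export, and the result would drive group membership changes.

```python
def partition_round_robin(hosts, group_names):
    """Deal hosts into groups one at a time so group sizes differ by at most one."""
    groups = {name: [] for name in group_names}
    for index, host in enumerate(sorted(hosts)):
        groups[group_names[index % len(group_names)]].append(host)
    return groups


# Hypothetical fleet of 10,000 workstations.
fleet = [f"WS-{n:05d}" for n in range(10_000)]
groups = partition_round_robin(fleet, ["WEF-Group-A", "WEF-Group-B", "WEF-Group-C"])

for name, members in groups.items():
    print(name, len(members))
```

Round-robin keeps the groups balanced, but it reshuffles assignments whenever the fleet changes. If endpoints must stay on the same collector over time, assigning by a hash of the hostname is a reasonable alternative.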

If a WEC server goes offline, only its subset of endpoints loses real-time forwarding. The endpoints buffer locally (up to your local event log size limit) and resume from their bookmark position when the server returns—no duplicate events, no permanent gaps.

Key Insight: WEF clients use a bookmarking mechanism to track the last successfully forwarded event. If a WEC server goes offline, clients buffer locally and resume exactly where they left off on reconnect. This means local event log size directly determines your recovery window. Size accordingly.
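To turn "size accordingly" into a number, the sketch below estimates how large a local log must be to survive a given collector outage. Use the rate at which events are written to that local log, not the filtered forwarding rate, because the log holds everything the audit policy produces. The 1 KiB average event size is again an assumption.

```python
def required_log_bytes(outage_hours: float, local_eps: float, avg_event_bytes: int = 1024) -> int:
    """Minimum local log size that avoids overwriting unforwarded events."""
    return int(outage_hours * 3600 * local_eps * avg_event_bytes)


# Example: an endpoint writing 5 events/second to its Security log.
for hours in (8, 72):
    size_mib = required_log_bytes(hours, 5) / (1024 ** 2)
    print(f"{hours:>3} h outage -> {size_mib:,.0f} MiB")
```

Under these assumptions an 8-hour maintenance window needs roughly 140 MiB, while a long weekend needs more than 1 GiB.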

For true redundancy, you can publish multiple WEC server URIs in a single GPO. Clients forward to all of them simultaneously. This doubles EPS load on your SIEM and requires deduplication logic—a real tradeoff that’s worth knowing before you reach for it reflexively.

XPath Query Optimization

Windows endpoints are noisy. A domain controller running a standard audit policy generates process creation events (Event ID 4688), account logon events (Event ID 4624), and network connection events at rates that will exhaust an unoptimized WEC server within hours. Forwarding everything is not a strategy.

WEF uses a subset of the W3C XPath 1.0 standard for filtering. A subscription query wraps its XPath expressions in two kinds of XML elements:

<Select>: Identifies events to collect from a specified log path
<Suppress>: Drops events that match, even if they matched a <Select> clause

If an event matches both Select and Suppress, Suppress wins. This is the basis of noise reduction.
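The precedence rule can be modeled in a few lines. This is a toy model of the decision logic only, with events as plain dictionaries and predicates as Python functions; it does not evaluate real XPath.

```python
def is_forwarded(event, selects, suppresses):
    """An event is forwarded if any Select matches and no Suppress matches."""
    selected = any(matches(event) for matches in selects)
    suppressed = any(matches(event) for matches in suppresses)
    return selected and not suppressed


# Illustrative predicates standing in for Select and Suppress clauses.
selects = [lambda e: e["EventID"] == 4624]
suppresses = [lambda e: e.get("TargetUserName") == "SYSTEM"]

print(is_forwarded({"EventID": 4624, "TargetUserName": "alice"}, selects, suppresses))   # True
print(is_forwarded({"EventID": 4624, "TargetUserName": "SYSTEM"}, selects, suppresses))  # False
print(is_forwarded({"EventID": 4688, "TargetUserName": "alice"}, selects, suppresses))   # False
```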

A Practical XPath Example

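Collecting successful interactive logons (logon types 2, 7, 10, and 11) while suppressing logons by the built-in SYSTEM and LOCAL SERVICE identities:

<QueryList>
  <Query Id="0" Path="Security">
    <!-- Select 4624 events whose LogonType field is 2, 7, 10, or 11.
         The comparison names the field so other Data nodes cannot match. -->
    <Select Path="Security">
      *[System[(EventID=4624)]]
      and
      *[EventData[Data[@Name='LogonType']=2 or Data[@Name='LogonType']=7 or Data[@Name='LogonType']=10 or Data[@Name='LogonType']=11]]
    </Select>
    <!-- Suppress wins over Select: drop logons by built-in identities. -->
    <Suppress Path="Security">
      *[EventData[Data[@Name='TargetUserName']='SYSTEM']]
      or
      *[EventData[Data[@Name='TargetUserName']='LOCAL SERVICE']]
    </Suppress>
  </Query>
</QueryList>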

XPath Gotchas Worth Knowing

The WEF XPath engine has constraints that don’t apply in other XPath contexts:

No regex, no wildcards. You can’t use * for partial string matching or regular expressions for pattern matching. Exact string matches only.

The != operator is deceptive. Because Windows events are XML documents with multiple <Data> nodes, Data != 'SYSTEM' evaluates to true if any data node in the event doesn’t equal ‘SYSTEM’—which is almost always true. Use <Suppress> to exclude unwanted events rather than negation logic in <Select>.

The 20-expression limit. A query that combines more than 20 expressions requires the structured XML query format rather than a simple XPath string. This is a parsing limit in the Windows XPath implementation, not an intentional design.
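The != gotcha above follows from how XPath 1.0 compares a node-set with a string: the comparison is true if any node in the set satisfies it. The sketch below models that rule in plain Python to show why negation misfires. It models the standard's semantics and does not call the Windows XPath engine.

```python
def nodeset_not_equal(values, literal):
    """XPath 1.0: node-set != 'x' is true if ANY node's value differs from 'x'."""
    return any(value != literal for value in values)


def nodeset_equal(values, literal):
    """XPath 1.0: node-set = 'x' is true if ANY node's value equals 'x'."""
    return any(value == literal for value in values)


# Simplified EventData of a logon by SYSTEM (field values are illustrative).
event_data = {"TargetUserName": "SYSTEM", "TargetDomainName": "NT AUTHORITY", "LogonType": "5"}

# Naive Select with Data != 'SYSTEM': other fields differ, so the event still matches.
print(nodeset_not_equal(event_data.values(), "SYSTEM"))          # True

# Suppress with Data[@Name='TargetUserName'] = 'SYSTEM': matches, so the event is dropped.
print(nodeset_equal([event_data["TargetUserName"]], "SYSTEM"))   # True
```

The first result is the trap: the event you meant to exclude is selected anyway. The second shows why a positive match inside Suppress is the reliable way to drop it.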

In practice, properly tuned XPath filters reduce event volume by 40-90% depending on the environment. That range reflects the difference between suppressing only the most egregious noise (conhost.exe, antivirus scanner activity) versus a full tuning exercise. Either end of that range meaningfully changes WEC server load and SIEM ingestion costs.

Pro Tip: Start your XPath tuning with process creation events (Event ID 4688) and firewall connection events—they generate the highest volume and have the most noise to suppress. In practice, removing known-safe system binaries from 4688 forwarding alone can cut your EPS by 30% in a typical enterprise environment.

SIEM Integration Architecture

The architectural payoff of WEF is what happens at SIEM integration time. Instead of maintaining proprietary SIEM agents on 500 endpoints, you maintain agents on a handful of WEC servers. The WEC becomes your telemetry hub.

Common integration patterns:

Microsoft Sentinel: Install the Azure Monitor Agent (AMA) on the WEC server. On-premises WEC servers require Azure Arc enrollment first. That step catches many teams off guard: it’s not optional, and the setup is non-trivial on a hardened WEC server sitting in a restrictive network segment. The Sentinel “Windows Forwarded Events” connector reads from the ForwardedEvents channel and pushes to a Log Analytics Workspace. AMA handles up to 5,000 EPS. If your WEC receives more than that, the agent falls behind; XPath filtering upstream keeps you in range.

Splunk: A Splunk Universal Forwarder on the WEC server monitors the ForwardedEvents channel. Pre-filtering via XPath directly reduces Splunk licensing costs—which, given Splunk pricing, is the only part of this architecture your CFO will actually understand.

NXLog / Chronicle: NXLog’s im_msvistalog module reads ForwardedEvents and forwards in JSON, CEF, or syslog format to any compatible endpoint. For Google Chronicle, the Chronicle Forwarder receives the syslog output. Chronicle parses standard Windows events as WINEVTLOG—solid for standard event IDs, but verify your field mappings before going live if your subscriptions forward any custom application events.

IBM QRadar: IBM WinCollect 10 installs on the WEC server and targets the ForwardedEvents (WEF) channel directly. QRadar’s UI doesn’t support secondary XPath filtering on the ForwardedEvents channel through WinCollect, so your WEF subscription XPath filtering matters more here—it’s your only noise control valve.

Warning: The Azure Monitor Agent (AMA) has a hard ceiling of 5,000 EPS. If your WEC server is receiving more than that—common in environments where XPath filtering hasn’t been tuned—AMA will fall behind and create a growing backlog. Monitor the AMA performance counters before assuming your Sentinel ingestion is complete.

Operational Gotchas at Scale

A few issues appear specifically at enterprise scale that don’t show up in smaller deployments.

The WinRM SDDL Bug (Windows Server 2016/2019)

On Windows Server 2019, Microsoft separated WinRM and WecSvc into distinct svchost processes on machines with more than 3.5 GB RAM. Windows Server 2016 is also affected if the services are manually moved to separate host processes. This isolation broke the default URL Access Control Lists, blocking WecSvc from accessing the HTTP listener and silently preventing endpoints from connecting.

Fix it with netsh:

netsh http delete urlacl url=http://+:5985/wsman/
netsh http add urlacl url=http://+:5985/wsman/ sddl=D:(A;;GX;;;S-1-5-80-569256582-2953403351-2909559716-1301513147-412116970)(A;;GX;;;S-1-5-80-4059739203-877974739-1245631912-527174227-2996563517)

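The SDDL (Security Descriptor Definition Language) string replaces the URL reservation’s ACL with two allow entries, each granting generic execute (GX) to a service SID, so that both the WinRM and WecSvc host processes can use the listener. Referencing the services by SID avoids any dependence on account name resolution. If your collector also accepts HTTPS, the https://+:5986/wsman/ reservation needs the same treatment.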
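If the SDDL string looks opaque, the following sketch splits it into its access control entries so you can see what each one grants. It handles only simple DACL-only strings like the one above and is not a general SDDL parser.

```python
import re

# Field order of an SDDL ACE: type;flags;rights;object_guid;inherit_object_guid;trustee
ACE_FIELDS = ("type", "flags", "rights", "object_guid", "inherit_object_guid", "trustee")


def parse_dacl(sddl: str):
    """Split a DACL-only SDDL string into a list of ACE dictionaries."""
    if not sddl.startswith("D:"):
        raise ValueError("expected an SDDL string that starts with 'D:'")
    aces = []
    for body in re.findall(r"\(([^)]*)\)", sddl[2:]):
        parts = body.split(";")
        if len(parts) != len(ACE_FIELDS):
            raise ValueError(f"malformed ACE: {body!r}")
        aces.append(dict(zip(ACE_FIELDS, parts)))
    return aces


SDDL = (
    "D:(A;;GX;;;S-1-5-80-569256582-2953403351-2909559716-1301513147-412116970)"
    "(A;;GX;;;S-1-5-80-4059739203-877974739-1245631912-527174227-2996563517)"
)

for ace in parse_dacl(SDDL):
    print(ace["type"], ace["rights"], ace["trustee"])
```

Both entries are of type A (allow) with GX (generic execute) rights, and both trustees begin with S-1-5-80, the prefix Windows uses for per-service SIDs.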

Windows Server 2022 fresh installations don’t require this. Servers upgraded from 2019 to 2022 may still need the fix.

NETWORK SERVICE Can’t Read Security Logs

The WEF client runs as the NETWORK SERVICE account. This account has read access to Application and System logs but not to the Security log—which is where Event IDs 4624, 4625, 4688, and most of your high-value events live. If Security event forwarding silently fails or throws error 0x138C, this is why.

Fix via GPO: add NETWORK SERVICE to the Event Log Readers local security group on all endpoints. Apply the GPO before you configure subscriptions.

Registry Bloat from WEF Bookmarks

The WEC server stores a registry key for every unique FQDN that has ever connected—recording bookmark position and heartbeat state. In VDI environments or DHCP-heavy subnets where hostnames change frequently, this registry grows without bound and degrades WEC performance. Schedule a periodic cleanup script to prune stale registry entries for endpoints that haven’t connected in more than 30 days.
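The selection logic for such a cleanup can be sketched independently of how you read the data. The example below assumes you have already collected a last-heartbeat timestamp per source (for instance from the subscription runtime status) and only decides which entries are stale. The hostnames and dates are made up, and actually deleting registry keys is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_seen, now, max_age_days=30):
    """Return FQDNs whose last heartbeat is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(fqdn for fqdn, seen in last_seen.items() if seen < cutoff)


# Hypothetical inventory: FQDN -> last heartbeat time.
now = datetime(2025, 10, 25, tzinfo=timezone.utc)
last_seen = {
    "vdi-0412.corp.example.com": now - timedelta(days=45),
    "ws-0007.corp.example.com": now - timedelta(hours=2),
    "vdi-0388.corp.example.com": now - timedelta(days=31),
}

print(stale_sources(last_seen, now))
```

Keeping selection separate from deletion makes it easy to run the script in report-only mode first and review the list before pruning anything.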

What a Complete Enterprise Architecture Looks Like

Bringing it together for a 2,000-endpoint environment with security operations requirements:

Subscription type: Source-initiated, using AD security groups for access control
Delivery mode: Minimize Latency (30-second batching) for security-relevant subscriptions; Normal (15-minute) for operational/system logs
WEC server: 4-core CPU, 16+ GB RAM, dedicated high-speed disk; ForwardedEvents log at 10+ GB, archive-on-full mode
XPath filtering: Structured XML queries with <Suppress> clauses for high-volume benign events; targeting 40-60% reduction in raw EPS
SIEM integration: Single agent (AMA, Splunk UF, or NXLog) on the WEC server reading ForwardedEvents
Local log sizing: Security log on endpoints at 1-4 GB via GPO, providing buffer capacity during WEC server maintenance windows

Beyond 4,000 endpoints, add WEC nodes, distribute via AD group-based GPOs, and establish a monitoring baseline for WecSvc memory and disk queue length on each collector. When WecSvc memory climbs above 4 GB or disk queue length stays elevated, you’ve found your next capacity inflection point.

Windows Event Forwarding is one of the few places in enterprise IT where the native, built-in tooling legitimately outperforms the third-party alternatives at scale. The architecture isn’t complicated—but it does require understanding the capacity limits, the subscription model trade-offs, and the XPath filtering patterns before you deploy, not after your first event backlog.
