Episode 47 — Capture data access logs that reveal sensitive reads, writes, deletes, and sharing

Data access logs are the evidence that tells you whether sensitive information was touched, not just whether an environment looked suspicious. In this episode, we focus on capturing data access visibility that can answer the most consequential investigation question in many incidents: what happened to the data. Cloud environments produce plenty of signals about configuration changes and identity behavior, but those signals do not automatically prove whether records were read, objects were copied, or files were shared outward. Stakeholders often care less about the initial compromise mechanics and more about whether sensitive data was exposed, altered, or destroyed, and data access logs are how you move that conversation from speculation to defensible fact. Without them, teams frequently overestimate or underestimate impact, both of which carry real cost in response decisions and reporting obligations. Strong data access logging also improves day-to-day governance by making access patterns visible and supporting least privilege reviews with actual usage evidence. The goal is to capture data-plane truth in a way that is searchable, correlatable, and reliable under pressure.

Before we continue, a quick note: this audio course is a companion to our course companion books. The first book covers the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook containing 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

Data access can be defined simply as read, write, delete, and permission changes, because those actions describe how information is consumed, altered, removed, or made available to others. Reads include downloading objects, querying records, listing data structures, or retrieving file contents, and they matter because they represent potential disclosure even if no modification occurs. Writes include uploads, inserts, updates, and overwrites, and they matter because they can indicate tampering, integrity loss, or the staging of data for later exfiltration. Deletes include object removal, record deletion, or destructive operations that reduce availability and can hide evidence. Permission changes include changes to access control lists, policies, sharing links, roles, or trust settings that affect who can read or modify data going forward. This definition helps keep data access logging scoped to actions that directly represent impact rather than abstract infrastructure behavior. It also helps teams prioritize what to log because the goal is evidence of data interaction, not just system health. When you build logging around these actions, you are building the ability to answer what was touched, how it was touched, and what access pathways were created or expanded.
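
To make that scoping concrete, here is a minimal Python sketch that buckets raw operation names into the four categories above. The event names follow AWS S3 conventions purely for illustration; other platforms and services emit different names, so treat the mapping as an assumption to adapt.

    # Illustrative mapping from storage operation names to the four data access
    # categories. Event names follow AWS S3 conventions and are assumptions here;
    # adapt the mapping to whatever your platform actually emits.
    CATEGORY_BY_EVENT = {
        "GetObject": "read",
        "ListObjects": "read",            # listing is discovery, still treated as a read
        "PutObject": "write",
        "CopyObject": "write",
        "DeleteObject": "delete",
        "PutObjectAcl": "permission_change",
        "PutBucketPolicy": "permission_change",
    }

    def classify(event_name: str) -> str:
        """Return read, write, delete, permission_change, or other."""
        return CATEGORY_BY_EVENT.get(event_name, "other")

    print(classify("PutBucketPolicy"))    # -> permission_change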

Object storage access logs are often the first place to start because object storage is commonly used for sensitive files, backups, exports, and application content that can be valuable to attackers. Enabling access logs for key buckets and prefixes allows you to focus on the areas that matter most rather than treating every storage location equally. Prefix focus is important because a single bucket may contain both public content and sensitive content, and logging strategy should reflect that difference. Access logs should capture operations such as reads, writes, deletes, and listing actions, because listing can be an early indicator of discovery and enumeration even before a download occurs. For sensitive areas, you also want visibility into bulk operations, such as batch reads, recursive copies, and lifecycle-driven deletions, because these can represent either legitimate maintenance or suspicious activity depending on context. The practical purpose is to ensure you can identify when sensitive objects were accessed, by whom, from where, and at what time, with enough detail to reconstruct sequences. Object storage logging becomes most valuable when it is paired with identity and network evidence, but it must exist first to support that correlation.
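
On AWS, one way to express that prefix-scoped focus is to enable CloudTrail S3 data events for just the sensitive prefix. The sketch below is a hedged illustration rather than a complete deployment: the trail, bucket, and prefix names are placeholders, and other platforms offer equivalent object-level access logging controls.

    # Sketch: enable object-level (data event) logging for one sensitive prefix
    # on an existing CloudTrail trail. Trail, bucket, and prefix names are
    # placeholders; this is one possible approach, not the only one.
    import boto3

    cloudtrail = boto3.client("cloudtrail")

    cloudtrail.put_event_selectors(
        TrailName="org-data-events-trail",        # assumed existing trail
        EventSelectors=[
            {
                "ReadWriteType": "All",            # capture reads, writes, and deletes
                "IncludeManagementEvents": False,  # control-plane logging handled elsewhere
                "DataResources": [
                    {
                        "Type": "AWS::S3::Object",
                        # The trailing prefix scopes logging to the sensitive area only
                        "Values": ["arn:aws:s3:::payroll-exports/sensitive/"],
                    }
                ],
            }
        ],
    )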

Database audit logs require equal attention because databases often hold the most valuable structured data, and attacks frequently involve targeted queries rather than simple file downloads. Enabling database audit logging for queries and privileged operations provides evidence of access patterns, including which principals ran which queries and whether those queries touched sensitive tables. Privileged operations, such as changes to database users, role grants, schema changes, or configuration alterations that affect logging and encryption, are especially important because they can enable persistence and reduce visibility. Query auditing can be noisy if implemented without intent, so focusing on sensitive datasets, privileged accounts, and high-risk query patterns is often more sustainable than capturing every statement indiscriminately. Context matters as well, because a legitimate application workload may run many read queries as normal behavior, while an administrative identity issuing broad select queries across many tables can be an anomaly. Database audit logs help answer whether data was accessed at the record level and whether the access was consistent with expected application behavior. In many incidents, database audit evidence is what separates suspicion from confirmation.
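
As one example of that focused approach, the sketch below assumes PostgreSQL with the pgaudit extension and uses object auditing so that only statements touching named sensitive tables are logged, along with role and schema changes. The role, table, and connection details are placeholders, and managed database services expose similar audit settings through their own configuration surfaces.

    # Sketch: scope query auditing to sensitive tables using pgaudit object
    # auditing on PostgreSQL. Any statement that requires the granted privileges
    # on these tables is logged, whichever user runs it. Names are placeholders.
    import psycopg2

    statements = [
        "CREATE EXTENSION IF NOT EXISTS pgaudit",
        "CREATE ROLE audit_marker",                        # marker role, never logs in
        "ALTER SYSTEM SET pgaudit.role = 'audit_marker'",
        "ALTER SYSTEM SET pgaudit.log = 'role, ddl'",      # also capture grants and schema changes
        "GRANT SELECT, INSERT, UPDATE, DELETE ON customers, payment_methods TO audit_marker",
        "SELECT pg_reload_conf()",
    ]

    conn = psycopg2.connect("dbname=appdb user=postgres")  # assumed privileged access
    conn.autocommit = True                                  # ALTER SYSTEM cannot run in a transaction
    with conn.cursor() as cur:
        for sql in statements:
            cur.execute(sql)
    conn.close()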

Sharing events are the data access actions that often create the most dangerous long-lived exposure, because they can make sensitive information available outside intended boundaries without requiring repeated downloads. Capturing sharing events includes link creation, access control list changes, and policy edits that affect who can read or write data. Links are particularly important because they can bypass identity expectations, enabling access that does not appear as a normal authenticated read in some contexts. Access control list changes matter because they can quietly expand access to broader groups, external identities, or public exposure, sometimes in ways that are not obvious in everyday dashboards. Policy edits at the storage or service level can change default access behaviors, creating broad access that affects many objects at once. Sharing events should be treated as first-class data access evidence because they represent a shift in who can touch data, which can be as impactful as a direct read. When these events are logged consistently, responders can determine whether exposure was created through sharing rather than through bulk extraction alone.
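
A small sketch helps show why sharing events deserve first-class treatment: even a simple filter over the audit stream can surface the handful of events that broaden exposure. The event and field names below follow CloudTrail-style records and are illustrative; SaaS platforms emit analogous events for link creation and permission edits.

    # Sketch: flag sharing events that broaden who can touch data. Event and
    # field names are CloudTrail-style and illustrative only.
    SHARING_EVENTS = {"PutBucketPolicy", "PutBucketAcl", "PutObjectAcl", "PutAccessPointPolicy"}
    PUBLIC_MARKERS = ("AllUsers", "AuthenticatedUsers", '"Principal": "*"')

    def flag_sharing(event: dict) -> str | None:
        """Return 'public', 'changed', or None for a single audit event."""
        if event.get("eventName") not in SHARING_EVENTS:
            return None
        raw = str(event.get("requestParameters", ""))
        if any(marker in raw for marker in PUBLIC_MARKERS):
            return "public"      # exposure reaches beyond the account boundary
        return "changed"         # access changed; review against expectations

    sample = {"eventName": "PutBucketAcl",
              "requestParameters": {"AccessControlPolicy": {"Grantee": "AllUsers"}}}
    print(flag_sharing(sample))  # -> public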

Data logs become truly useful when they record actor identity, source context, and object identifiers consistently, because investigations depend on precise correlation rather than loose inference. Actor identity should uniquely identify the principal, whether human or nonhuman, and should include information that distinguishes interactive sessions from automated access. Source context should include details such as originating network location, client type, device context where available, and session or token identifiers that can link data events to the identity session that enabled them. Object identifiers should be stable and specific, including bucket and object names for storage, table and record scope for databases where feasible, and resource identifiers that remain consistent even as systems scale. Consistency is critical because data access logs can come from multiple services, and you need common fields to join events across sources. Without consistent identifiers, responders may know that data was accessed but cannot confidently attribute it to a specific identity or session, which weakens both containment and reporting decisions. Strong field discipline also supports automated detection and analytics, because correlation becomes a structured task rather than a manual art. In practical terms, the best data access logging strategies treat fields as design requirements, not as incidental output.
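
One way to enforce that field discipline is to normalize every data access event into a record where the correlation keys are required rather than optional. The sketch below uses field names of our own choosing; the specific names matter less than the fact that none of them can be omitted.

    # Sketch: a normalized data access record that treats actor, source, and
    # object identifiers as required fields. Field names are our own convention.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DataAccessEvent:
        timestamp: str           # ISO 8601, UTC
        action: str              # read | write | delete | permission_change
        actor_id: str            # unique principal, human or workload identity
        actor_type: str          # "user", "service", or "token"
        session_id: str          # links back to the identity session or token
        source_ip: str           # originating network location
        client: str              # SDK, CLI, browser, or application name
        resource_id: str         # stable object identifier (bucket/key, table ARN)
        record_scope: str = ""   # optional finer scope, such as a table or export job

    event = DataAccessEvent(
        timestamp="2024-05-02T03:17:44Z", action="read",
        actor_id="svc-reporting", actor_type="service", session_id="sess-81f2",
        source_ip="203.0.113.7", client="aws-sdk-python",
        resource_id="arn:aws:s3:::payroll-exports/sensitive/q1.csv",
    )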

Reconstructing a suspected exfiltration timeline from data logs is a practice that highlights exactly what evidence you need and where gaps will hurt. The timeline typically starts with discovery behavior, such as listing objects, enumerating tables, or performing metadata queries that indicate the actor is finding what to take. It then progresses to access actions, such as a burst of reads, export jobs, or bulk downloads, often followed by staging behaviors like copying objects to a different location or compressing exports into fewer artifacts. Sharing events may appear as the actor creates links or modifies access controls to enable off-platform retrieval without triggering typical download patterns. Deletion or cleanup behavior can occur afterward, either to hide evidence or to disrupt response, and those deletes are part of the timeline because they indicate intent. Data access logs should allow you to order these actions precisely, attribute them to a principal, and identify the specific datasets involved. When you can reconstruct the timeline, you can answer scope questions with confidence, such as which datasets were accessed, how much was touched, and over what period.
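
Given normalized events like the ones above, a first-pass timeline is little more than a sort and a phase label. The phase rules in this sketch are deliberately crude assumptions; real triage would lean on richer attributes and baselines.

    # Sketch: order one principal's events and label the timeline phases
    # described above. Phase rules are intentionally simplistic.
    PHASES = {
        "list": "discovery", "read": "access", "export": "access",
        "copy": "staging", "share": "sharing", "delete": "cleanup",
    }

    def timeline(events: list[dict], actor_id: str) -> list[tuple[str, str, str]]:
        """Return (timestamp, phase, resource) tuples for one actor, in time order."""
        rows = [
            (e["timestamp"], PHASES.get(e["action"], "other"), e["resource_id"])
            for e in events if e["actor_id"] == actor_id
        ]
        return sorted(rows)

    events = [
        {"timestamp": "2024-05-02T03:02Z", "action": "list", "actor_id": "svc-reporting",
         "resource_id": "payroll-exports/sensitive/"},
        {"timestamp": "2024-05-02T03:17Z", "action": "read", "actor_id": "svc-reporting",
         "resource_id": "payroll-exports/sensitive/q1.csv"},
        {"timestamp": "2024-05-02T03:25Z", "action": "share", "actor_id": "svc-reporting",
         "resource_id": "payroll-exports/sensitive/q1.csv"},
    ]
    for row in timeline(events, "svc-reporting"):
        print(row)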

A common pitfall is relying on storage logs without true data event visibility, which creates a false sense of coverage. Storage-level logs can tell you that an object was accessed, but they may not always capture the nuance of how data was used within a service, especially when access occurs through intermediate layers or when services expose higher-level operations that do not map cleanly to object reads. Similarly, database service logs may show connection activity or administrative operations while missing query-level evidence that reveals what records were actually touched. If teams assume that basic service logs are equivalent to data access logs, they may discover during an incident that they cannot prove which sensitive data was read or exported. This pitfall also appears when teams collect only control-plane changes, such as policy edits, without collecting the subsequent data-plane events that show whether the policy change was exploited. The practical outcome is that investigations become conservative and expensive because the organization cannot rule out exposure. Data event visibility is the difference between knowing there was risk and proving there was impact.

A quick win is to prioritize logs for the most sensitive datasets first, because coverage for everything is rarely feasible immediately and sensitive data drives the highest impact. This prioritization requires understanding which datasets are crown jewels, which contain regulated information, and which support critical business processes where integrity and availability matter most. Once those datasets are identified, you can enable the most detailed access logging available for them, ensure retention aligns with investigation needs, and validate that the logs contain the fields required for correlation. Prioritizing also helps with cost management, because high-fidelity data access logging can be expensive at scale, and the organization should invest in evidence where the payoff is highest. It also improves operational focus, because analysts can build detection and investigation playbooks around the datasets that matter rather than being overwhelmed by low-value telemetry. Over time, coverage can expand, but starting with the most sensitive areas reduces the chance of missing the incident that matters most. This approach is practical, defensible, and aligned with risk management.
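
If sensitivity is already recorded somewhere machine-readable, the prioritization step can be partly automated. The sketch below assumes an internal convention of tagging AWS buckets with a classification tag and simply reports sensitive buckets not yet covered by object-level logging; the tag key and the coverage set are assumptions.

    # Sketch: report buckets tagged as sensitive that still lack object-level
    # logging coverage. The "classification" tag key and the coverage set are
    # internal assumptions, not platform-defined values.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    already_covered = {"payroll-exports"}     # assume pulled from existing trail selectors

    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            tags = {t["Key"]: t["Value"]
                    for t in s3.get_bucket_tagging(Bucket=name)["TagSet"]}
        except ClientError:
            continue                           # bucket has no tags
        if tags.get("classification") == "restricted" and name not in already_covered:
            print(f"Sensitive bucket without data-event logging: {name}")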

A scenario that makes these concepts tangible is investigating a mass download at unusual hours. The first signal may be a spike in read events against a sensitive bucket prefix or a burst of database export queries that exceeds baseline usage. Timing matters because unusual hours often correlate with lower staffing and lower scrutiny, which can indicate intentional evasion, though legitimate maintenance windows must be considered. Data access logs should show which principal performed the reads, what objects or tables were accessed, how many operations occurred, and whether the pattern suggests bulk retrieval rather than normal application behavior. Sharing event logs may show link creation or access control changes that enabled the download or provided a path for external retrieval. Network flow evidence can help confirm whether large outbound transfers occurred and to what destinations, while identity session logs can confirm whether the access came from a normal device and location. The investigative goal is to determine whether this was a legitimate bulk job, an operational mishap, or an adversary extracting data, and data access logs are the core evidence for that determination. When the logs include strong identifiers and context, triage becomes fast and defensible.
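
A simple off-hours volume check over normalized read events illustrates the first triage step in that scenario. The hour window, prefix, and threshold below are arbitrary assumptions standing in for a real baseline.

    # Sketch: flag principals whose off-hours read volume against a sensitive
    # prefix exceeds a crude per-hour baseline. Window, prefix, and threshold
    # are arbitrary assumptions for illustration.
    from collections import Counter
    from datetime import datetime, timezone

    OFF_HOURS = range(0, 6)                               # 00:00-05:59 UTC
    SENSITIVE_PREFIX = "payroll-exports/sensitive/"
    BASELINE_READS_PER_HOUR = 50

    def off_hours_bulk_readers(read_events: list[dict]) -> list[tuple[str, int]]:
        counts: Counter = Counter()
        for e in read_events:
            ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
            ts = ts.astimezone(timezone.utc)
            if ts.hour in OFF_HOURS and e["resource_id"].startswith(SENSITIVE_PREFIX):
                counts[(e["actor_id"], ts.strftime("%Y-%m-%dT%H"))] += 1
        return [(actor, n) for (actor, _), n in counts.items()
                if n > BASELINE_READS_PER_HOUR]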

Correlation is essential because data access logs rarely tell the full story by themselves, especially when you need to understand how access was obtained and where data might have gone. Correlating data events with network flows can reveal whether large volumes of data moved out of expected boundaries or toward unusual endpoints. Correlating with identity sessions can reveal whether the actor’s authentication context was abnormal, such as an unfamiliar device, unusual geographic origin, or token behavior that suggests compromise. Correlating with control-plane events can reveal whether access pathways were created shortly before the data events, such as policy edits that broadened access or key changes that affected encryption behavior. This correlation transforms raw data reads and writes into a narrative that explains intent and scope, which is necessary for containment and reporting decisions. Correlation also helps confirm whether suspicious access succeeded, because a policy change without subsequent data access may represent attempted compromise rather than realized impact. The discipline is to design log fields so correlation is straightforward, using shared identifiers and consistent timestamps across sources. When correlation is built-in, investigations move from reactive searching to structured analysis.
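
In code form, the join step is straightforward once both sources carry the same session identifier. The sketch below enriches data events with identity session context; field names reuse the assumed schema from earlier and remain illustrative.

    # Sketch: join data access events to identity session records on a shared
    # session identifier so each read carries its authentication context.
    def enrich_with_sessions(data_events: list[dict], sessions: list[dict]) -> list[dict]:
        by_session = {s["session_id"]: s for s in sessions}
        enriched = []
        for e in data_events:
            s = by_session.get(e["session_id"], {})
            enriched.append({
                **e,
                "auth_device": s.get("device", "unknown"),
                "auth_geo": s.get("geo", "unknown"),
                "mfa_used": s.get("mfa", False),
            })
        return enriched

    sessions = [{"session_id": "sess-81f2", "device": "unmanaged-laptop",
                 "geo": "unexpected-region", "mfa": False}]
    reads = [{"session_id": "sess-81f2", "action": "read", "actor_id": "svc-reporting",
              "timestamp": "2024-05-02T03:17:44Z",
              "resource_id": "payroll-exports/sensitive/q1.csv"}]
    print(enrich_with_sessions(reads, sessions)[0]["auth_device"])   # -> unmanaged-laptop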

A useful memory anchor is a library checkout system for every document. A library can be beautifully organized, but without a checkout record you cannot tell who borrowed a book, when they took it, or whether they returned it, and you cannot explain losses with confidence. Data access logs are the checkout record for cloud data, because they record reads, writes, deletes, and sharing actions that determine whether information was touched and whether access changed. The analogy also reinforces that the record must include who, what, and when, because a checkout slip without an identity or an item identifier is not actionable evidence. It also highlights that sharing is a form of checkout, because giving someone a copy or a link changes who can access the document even if no immediate read is observed in the original location. When teams treat data access logs like a library system, they naturally prioritize coverage for the most valuable collections first. That is exactly the mindset needed for sustainable, risk-driven data logging.

As a mini-review, data access events include reads, writes, deletes, and permission changes, because these actions define disclosure, integrity, availability, and exposure. Enable object storage access logs for key buckets and prefixes so sensitive files and exports are visible at the operation level. Enable database audit logs for queries and privileged operations so record-level access and high-impact administrative actions are captured as evidence. Capture sharing events such as link creation, access control list changes, and policy edits because these actions can create durable exposure paths. Record actor identity, source context, and object identifiers consistently so events can be correlated across identity sessions, network flows, and control-plane changes. Prioritize logging for the most sensitive datasets first to get high-value coverage quickly, and use correlation to reconstruct timelines and confirm scope. When these elements are in place, data access logs become the primary evidence for whether sensitive information was touched during suspicious activity.

To conclude, identify your crown-jewel datasets and enable access logging for them, because the most important question in many incidents is whether the data was accessed or exposed. Crown jewels should be defined based on business impact, regulatory requirements, and operational criticality, not merely on size or convenience. Once identified, ensure that data access logs capture reads, writes, deletes, and sharing actions with consistent actor and object identifiers and sufficient context for correlation. Centralize those logs, protect their integrity, and retain them long enough to support investigations that may begin weeks after the initial access. After the crown jewels are covered, expand logging coverage deliberately to additional datasets based on risk and observed gaps from incident exercises. The goal is defensible evidence and fast scoping, not indiscriminate collection. Identify your crown-jewel datasets and enable access logging.
