Abstract

This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.

This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes.

See PEP 551 for discussion and recommendations on enhancing the security of a Python runtime making use of these auditing APIs.

Background

Python provides access to a wide range of low-level functionality on many common operating systems. While this is incredibly useful for “write-once, run-anywhere” scripting, it also makes monitoring of software written in Python difficult. Because Python uses native system APIs directly, existing monitoring tools either suffer from limited context or auditing bypass.

Limited context occurs when system monitoring can report that an action occurred, but cannot explain the sequence of events leading to it. For example, network monitoring at the OS level may be able to report “listening started on port 5678”, but may not be able to provide the process ID, command line, parent process, or the local state in the program at the point that triggered the action. Firewall controls to prevent such an action are similarly limited, typically to process names or some global state such as the current user, and in any case rarely provide a useful log file correlated with other application messages.

Auditing bypass can occur when the typical system tool used for an action would ordinarily report its use, but accessing the APIs via Python does not trigger this. For example, invoking “curl” to make HTTP requests may be specifically monitored in an audited system, but Python’s “urlretrieve” function is not.

Within a long-running Python application, particularly one that processes user-provided information such as a web app, there is a risk of unexpected behavior. This may be due to bugs in the code, or deliberately induced by a malicious user. In both cases, normal application logging may be bypassed resulting in no indication that anything out of the ordinary has occurred.

Additionally, and somewhat unique to Python, it is very easy to affect the code that is run in an application by manipulating either the import system’s search path or placing files earlier on the path than intended. This is often seen when developers create a script with the same name as the module they intend to use - for example, a

This is not sandboxing, as this proposal does not attempt to prevent malicious behavior (though it enables some new options to do so). See the Why Not A Sandbox section below for further discussion.

Overview of Changes

The aim of these changes is to enable both application developers and system administrators to integrate Python into their existing monitoring systems without dictating how those systems look or behave.

We propose two API changes to enable this: an Audit Hook and Verified Open Hook. Both are available from Python and native code, allowing applications and frameworks written in pure Python code to take advantage of the extra messages, while also allowing embedders or system administrators to deploy builds of Python where auditing is always enabled.

Only CPython is bound to provide the native APIs as described here. Other implementations should provide the pure Python APIs, and may provide native versions as appropriate for their underlying runtimes. Auditing events are likewise considered implementation specific, but are bound by normal feature compatibility guarantees.

Audit Hook

In order to observe actions taken by the runtime (on behalf of the caller), an API is required to
raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as

The following new C APIs allow embedders and CPython implementors to send and receive audit hook messages:

    /* Add an auditing hook */
    typedef int (*hook_func)(const char *event, PyObject *args, void *userData);
    int PySys_AddAuditHook(hook_func hook, void *userData);

    /* Raise an event with all auditing hooks */
    int PySys_Audit(const char *event, PyObject *args);

The new Python APIs for receiving and raising audit hooks are:

    # Add an auditing hook
    sys.addaudithook(hook: Callable[[str, tuple]])

    # Raise an event with all auditing hooks
    sys.audit(str, *args)

Hooks are added by calling

When events of interest are occurring, code can either call

For maximum compatibility, events using the same name as an event in the reference interpreter CPython should make every attempt to use compatible arguments. Including the name or an abbreviation of the implementation in implementation-specific event names will also help prevent collisions. For example, a

While event names may be arbitrary UTF-8 strings, for consistency across implementations it is recommended to use valid Python dotted names and avoid encoding specific details in the name. For
example, an

When an event is audited, each hook is called in the order it was added (as much as is possible), passing the event name and arguments. If any hook returns with an exception set, later hooks are ignored and in general the Python runtime should terminate - exceptions from hooks are not intended to be handled or treated as expected occurrences. This allows hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or immediately terminate the process with an operating system exit call.

When an event is audited but no hooks have been set, the

As hooks may be Python objects, they need to be freed during interpreter or runtime finalization. These should not be triggered at any other time, and should raise an event hook to ensure that any unexpected calls are observed.

Below in Suggested Audit Hook Locations, we recommend some important operations that should raise audit events. In general, events should be raised at the lowest possible level. Given the choice between raising an event from Python code or native code, raising from native code should be preferred.

Python implementations should document which operations will raise audit events, along with the event schema. It is intentional that

Verified Open Hook

Most operating systems have a mechanism to distinguish between files that can be executed and those that cannot. For example, this may be an execute bit in the permissions field, a verified hash of the file contents to detect potential code tampering, or file system path restrictions. These are an important security mechanism for ensuring that only code that has been approved for a given environment is executed.

Most kernels offer ways to restrict or audit binaries loaded and executed by the kernel. File types owned by Python appear as regular data and these features do not apply.

This open hook allows Python embedders to integrate with operating system support when launching scripts or importing Python code.

The new public C API for the verified open hook is:

    /* Set the handler */
    typedef PyObject *(*hook_func)(PyObject *path, void *userData);
    int PyFile_SetOpenCodeHook(hook_func handler, void *userData);

    /* Open a file using the handler */
    PyObject *PyFile_OpenCode(const char *path);

The new public Python API for the verified open hook is:

    # Open a file using the handler
    io.open_code(path : str) -> io.IOBase

The

A custom handler may be set by calling

Note that these hooks can import and call the

If the hook determines that the file should not be loaded, it should raise an exception of its choice, as well as performing any other logging.

All import and execution functionality involving code from a file will be changed to use

File accesses that are not intentionally planning to execute code are not expected to use this function. This includes loading pickles, XML or YAML files, where code execution is generally considered malicious rather than intentional. These operations should provide their own auditing events, preferably distinguishing between normal functionality (for example,

A few examples: if the file type normally requires an execute bit (on POSIX) or would warn when marked as having been downloaded from the internet (on Windows), it should probably use

There is no Python API provided for changing the open hook. To
modify import behavior from Python code, use the existing functionality provided by

API Availability

While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. Most descriptions here refer to the CPython implementation, and while other implementations should provide the functions, there is no requirement that they behave the same. For example,

Suggested Audit Hook Locations

The locations and parameters in calls to

Table 1 acts as both suggestions of operations that should trigger audit events on all implementations, and examples of event schemas. Table 2 provides further examples that are not required, but are likely to be available in CPython. Refer to the documentation associated with your version of Python to see which operations provide audit events.

Table 1: Suggested Audit Hooks
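
As a rough illustration of what an event schema looks like in practice (a dotted event name plus a tuple of arguments), the sketch below raises one event and observes it from a hook. It assumes CPython 3.8 or later, where sys.addaudithook and sys.audit are available. The event name myapp.config.load, the load_config function, and the file name are hypothetical and are not taken from the tables in this PEP.

```python
import sys

# (event, args) pairs seen by the hook; kept in memory for the demonstration.
observed = []


def record_event(event, args):
    """Audit hook: receives the event name and its arguments as a tuple."""
    # Ignore events raised by the runtime itself; keep only the
    # hypothetical application events used in this sketch.
    if event.startswith("myapp."):
        observed.append((event, args))


# Hooks cannot be removed once added, so this stays active for the
# lifetime of the interpreter.
sys.addaudithook(record_event)


def load_config(path):
    """Hypothetical operation that announces itself before doing any work."""
    # A dotted name identifies the operation; details go in the arguments.
    sys.audit("myapp.config.load", path)
    return {"path": path}


load_config("settings.ini")
print(observed)
```

Keeping the path in the arguments rather than in the event name follows the recommendation above to avoid encoding specific details in the name, so a hook can match on a small, stable set of names.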
Performance Impact

The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case - once a developer has added audit hooks they have explicitly chosen to trade performance for functionality. Performance impact with hooks added is not of interest here, since this is opt-in functionality.

Analysis using the Python Performance Benchmark Suite [1] shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster and 1.05x slower.

In our opinion, the performance impact of the set of auditing points described in this PEP is negligible.

Rejected Ideas

Separate module for audit hooks

The proposal is to add a new module for audit hooks, hypothetically

Any such module would need to be a built-in module that is guaranteed to always be present. The nature of these hooks is that they must be callable without condition, as any conditional imports or calls provide opportunities to intercept and suppress or modify events.

Given it is one of the most core modules, the

    import sys; sys.modules['audit'] = type('audit', (object,), {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks against either
This idea is rejected because it makes it trivial to suppress all calls to

Flag in sys.flags to indicate “audited” mode

The proposal is to add a value in

Currently, we are not aware of any legitimate reasons for a program to behave differently in the presence of audit hooks. Both application-level APIs

The argument that this is “security by obscurity” is valid, but irrelevant. Security by obscurity is only an issue when there are no other protective mechanisms; obscurity as the first step in avoiding attack is strongly recommended (see this article for discussion).

This idea is rejected because there are no appropriate reasons for an application to change its behaviour based on whether these APIs are in use.

Why Not A Sandbox

Sandboxing CPython has been attempted many times in the past, and each past attempt has failed. Fundamentally, the problem is that certain functionality has to be restricted when executing the sandboxed code, but otherwise needs to be available for normal operation of Python. For example, completely removing the ability to compile strings into bytecode also breaks the ability to import modules from source code, and if it is not completely removed then there are too many ways to get access to that functionality indirectly. There is not yet any feasible way to generically determine whether a given operation is “safe” or not. Further information and references are available at [2].

This proposal does not attempt to restrict functionality, but simply exposes the fact that the functionality is being used. Particularly for intrusion scenarios, detection is significantly more important than early prevention (as early prevention will generally drive attackers to use an alternate, less-detectable, approach). The availability of audit hooks alone does not change the attack surface of Python in any way, but they enable defenders to integrate Python into their environment in ways that are currently not possible.

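
The difference between exposing an operation and restricting it can be sketched as follows. This is only an illustration, assuming CPython 3.8 or later; the demo. event names, the BLOCKED_EVENTS set, and the hook function are hypothetical. The hook always records what happened, and raising an exception to abort the operation is an optional extra step. The exception is caught here purely so the sketch can show the outcome; as described earlier, exceptions from hooks are not intended to be handled in real deployments.

```python
import sys

# Record of every demo event seen, whether or not it was allowed.
audit_log = []

# Hypothetical policy: events a deployment chooses to abort.
BLOCKED_EVENTS = {"demo.dangerous_operation"}


def detect_and_optionally_block(event, args):
    """Audit hook: detection always happens, blocking is a policy choice."""
    if not event.startswith("demo."):
        return  # leave events raised by the runtime itself alone
    audit_log.append((event, args))  # detection: the use is exposed
    if event in BLOCKED_EVENTS:
        # Prevention: the exception propagates out of sys.audit().
        raise PermissionError(f"audit hook blocked {event}")


sys.addaudithook(detect_and_optionally_block)

sys.audit("demo.safe_operation", 1)

try:
    sys.audit("demo.dangerous_operation", "payload")
    blocked = False
except PermissionError:
    blocked = True

print(audit_log)
print("blocked:", blocked)
```

Note that the blocked event still appears in the log: the record is made before the decision to abort, so an attempt is visible even when it does not succeed.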
Since audit hooks can safely prevent an operation from occurring, this feature does make some level of sandboxing possible. In most cases, however, the intention is to enable logging rather than to create a sandbox.

Relationship to PEP 551

This API was originally presented as part of PEP 551 Security Transparency in the Python Runtime.

For simpler review purposes, and due to the broader applicability of these APIs beyond security, the API design is now presented separately. PEP 551 is an informational PEP discussing how to integrate Python into a secure or audited environment.

References

Copyright

Copyright (c) 2019 by Microsoft Corporation. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).