TL;DR: "An application should assume zero trust in ALL data and resources that it consumes, no matter who created them, including internal staff"
Software, in particular web-based applications, suffers from a problem with trusting data and resources.
Data coming from users has traditionally been considered a source of risk, and applications protect themselves by not trusting that data. While this greatly aids security, that defense doesn't go far enough. If an application enforces Zero Data and Resource Trust (Z-DaRT), it ensures that ALL of the data it uses has security controls applied at the point of use. Data includes:
- User input, including any part of a web request, such as URL, request headers, cookies and request body
- Data from data stores, such as databases, data files (like XML), command line calls and LDAP queries - even if you created that data
- Data from API requests, even if you own the API, or it's from a trusted third party such as Google or Microsoft. For example, your server-side application code may make calls to APIs to handle or process data and functionality. Responses from those APIs should be given no more trust than data coming directly from users.
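As a rough illustration of treating an API response like user input, here's a minimal sketch that validates a response body before the rest of the application touches it. The function name, the field names (id, display_name) and the validation rule are all hypothetical; a real API would need rules that match its own contract.

```python
import json
import re

# Hypothetical rule for this sketch: 1-50 letters, digits, spaces or hyphens.
DISPLAY_NAME = re.compile(r"[A-Za-z0-9 \-]{1,50}")


def parse_profile(raw_body: str) -> dict:
    """Validate an API response body before the application uses it."""
    data = json.loads(raw_body)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    user_id = data.get("id")
    name = data.get("display_name")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool) or user_id <= 0:
        raise ValueError("invalid id")
    if not isinstance(name, str) or not DISPLAY_NAME.fullmatch(name):
        raise ValueError("invalid display_name")

    # Return only the fields that were checked; anything else is dropped.
    return {"id": user_id, "display_name": name}
```

The same function applies whether the response comes from your own API or from a third party. The source of the data doesn't change the checks.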
What do we mean by "at the point of use"?
The point of use is the last point at which you can apply a security control before a potential attack reaches an interpreter. For example, it's common to generate an HTML page on the server before sending it to the client. Adding untrusted data to the HTML page while you create it is the last chance to apply a security control before the page is sent to the user's browser (the HTML interpreter).
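To make "at the point of use" a little more concrete, here's a minimal sketch using Python's standard html.escape. The render_comment function and its parameters are hypothetical. The idea is that every value is encoded as it's placed into the markup, regardless of where it came from.

```python
from html import escape


def render_comment(author: str, comment: str) -> str:
    """Build an HTML fragment, encoding each value as it enters the markup."""
    # escape() replaces &, < and >, and by default also " and '.
    return (
        '<div class="comment">'
        f"<b>{escape(author)}</b>: {escape(comment)}"
        "</div>"
    )


# The values could come from a request, a database row or an API response.
# They are all encoded the same way.
print(render_comment("Mallory", "<script>alert(1)</script>"))
# <div class="comment"><b>Mallory</b>: &lt;script&gt;alert(1)&lt;/script&gt;</div>
```

This encoding suits HTML element content and quoted attribute values. Other contexts, such as JavaScript or URLs, need their own encoding.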
Security Controls for Data
Protections for data are context specific, although there are some concepts that span the areas of both data and resources. Key defenses are:
- Signatures, for example those used by JSON Web Tokens (JWTs). A signature is the output of passing data through a key-based cryptographic algorithm. A server can generate a signature for data using a secret key. If it receives the data again, it can run the data through the same algorithm and compare the result with the original signature. When the key is a secret held only by the server, only that server can create or validate the signature (assuming the key hasn’t been exposed). Data that carries a valid signature can therefore be confirmed to have come from the server and to be unmodified.
- Output encoding - various types of output encoding exist, but the most common example is HTML encoding. It takes a string of data and replaces characters that have special meaning in HTML, for example angle brackets (< and >), with equivalents that can't be mistaken for HTML markup. When they reach an HTML interpreter, they're displayed as text rather than interpreted as HTML.
- Parameterization – this is important where output encoding may not apply. Good examples are SQL queries and command line requests. Parameterization enforces a separation of data and commands so that when reaching the interpreter, data (for example from users) can’t be confused with commands.
- Input validation – ensure that any input is valid, for example by allowing only specific characters and limiting field length. Input validation should focus on accepting data that is known to be valid, which has a positive impact on security.
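Two of the defenses above, signatures and parameterization, can be sketched with Python's standard library. This is only an illustration under some assumptions: the signature uses HMAC-SHA256 with a single secret key (one of several schemes; JWTs can also use public/private key pairs), the key is hard-coded purely for the example, and the users table is hypothetical.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a real system would load this from a secret store, not source code.
SECRET_KEY = b"example-key-for-illustration-only"


def sign(data: bytes) -> str:
    """Return a keyed signature (HMAC-SHA256) for the data."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(data).encode(), signature.encode())


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up a user; name is bound as a parameter, never concatenated into SQL."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory table to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

token = sign(b"user=alice")
print(verify(b"user=alice", token))  # True
print(verify(b"user=admin", token))  # False: the data no longer matches
print(find_user(conn, "alice' OR '1'='1"))  # []: the input is treated as a name
```

In the last call the classic injection string never reaches the SQL interpreter as a command. It's simply compared against the name column and matches nothing.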
Attacks involving resources have become a regular occurrence, with commonly used resources becoming a greater target for attackers. Resources include:
- DLLs or other functionality imported into your application code, for example npm, PyPI or NuGet packages
Can you trust your own code?
No. While the biggest risk comes from third party components, attention should still be given to first party resources. Even if you own those resources, the same security controls should be applied.
There's always the chance, however small, that malicious code has been added to a code repository. Ensure two users are required to add code to the main branch of a code repository, and ensure those users use multi-factor authentication. Reviewing each other’s code is important for a variety of reasons, including security. Code reviews can defend against accidental as well as intentional security issues within your code.
Security Controls for Resources
- Restricting access – Content Security Policy (CSP) is a critical control for limiting external resources on a web page. It states where resources can be loaded from and, to a degree, what those resources can do. This is an excellent defense-in-depth control and should be configured to be as restrictive as possible.
- Server-side egress filtering. Servers should have strict limits on the web addresses they can talk to. If a server is compromised, this makes it harder for attackers to receive outbound communications from it.
- Consider development time – some resources are capable of running code at install time (e.g. npm packages). This can potentially expose development environments to attack.
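As a rough sketch of the first two controls, the code below builds a restrictive Content-Security-Policy header and checks outbound URLs against an allowlist. The host names and function names are hypothetical, and the directive list is just one possible starting point, not a recommended policy for any particular site. The allowlist check is an application-level illustration and isn't a replacement for egress rules enforced at the network level.

```python
from urllib.parse import urlsplit

# Hypothetical hosts this server is expected to call.
ALLOWED_EGRESS_HOSTS = {"api.example.com", "auth.example.com"}


def csp_header() -> tuple[str, str]:
    """Return a restrictive Content-Security-Policy header as (name, value)."""
    directives = [
        "default-src 'none'",      # block everything not explicitly allowed
        "script-src 'self'",       # scripts only from the page's own origin
        "style-src 'self'",
        "img-src 'self'",
        "connect-src 'self'",
        "base-uri 'none'",
        "frame-ancestors 'none'",  # the page can't be framed by other sites
    ]
    return ("Content-Security-Policy", "; ".join(directives))


def egress_allowed(url: str) -> bool:
    """Allow outbound requests only to HTTPS URLs on the explicit allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_EGRESS_HOSTS
```

Server-side code would call egress_allowed before making any outbound request and refuse anything that returns False.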
Why shouldn’t we trust data and resources that we, as the owner of a product, create?
By assuming zero trust over all data and resources, we set a clear expectation that nothing is trusted. This allows for no ambiguity in the treatment of data and resources, no matter where they come from or who created them.
What does Z-DaRT bring that's new?
Applications often choose to place trust in some things and not others, adding security controls without structured thought about attacks or defenses. Z-DaRT sets the expectation that no trust should be applied, no matter who creates the content, removing the need to have a detailed understanding of the threat landscape. This works for the majority of projects, although software with high security requirements (e.g. payments / medical / military) requires additional thought and knowledge.
Got a comment or correction (I’m not perfect) for this post? Please leave a comment below.