Account Abstraction is coming to completely change how we interact with the blockchain. But the version of account abstraction proposed by ERC-4337 is a tough read, and it’s a struggle to understand why there are so many participants and why they interact the way they do.
Could there be something simpler?
In this document, I’ll walk through the process of trying to design a dirt-simple version of account abstraction. We’ll see how, as we add requirements and solve the problems that arise, the design balloons in complexity and gets closer and closer to ERC-4337.
The target audience for this article is someone who has some knowledge of smart contracts but no particular knowledge of account abstraction.
Because this document is exploring the process of inventing account abstraction, there will be many cases where the APIs or behaviors I describe don’t match the final version in ERC-4337.
For example, when I list the fields of a User Operation, please do not assume that those are the actual fields. They represent what might be a first stab at defining a User Operation before being workshopped into a final version.
To get things started, let’s invent a way to protect our most valuable assets. We want to be able to sign most transactions with a single private key (as in a typical account), but our priceless Carbonated Courage NFT should be transferable only if we also sign with a second key that we’ll lock in a bank vault guarded by a three-headed dog.
Here’s the first question:
Every Ethereum account is either a smart contract or an externally-owned account (EOA), where the latter is controlled from off-chain using a private key. Should the account that holds these assets be a smart contract or an EOA?
In fact, the asset-holder must be a smart contract. If it were an EOA, then the assets could always be transferred by transactions signed by the EOA’s private key, which bypasses the security we want.
So unlike most people today, our presence on-chain will be represented by a smart contract instead of an EOA, which we will call a smart contract wallet, or just a “wallet.”
We need a way to issue commands to this smart contract so it carries out the actions that we want. In particular, we need to be able to command the smart contract to make any kind of transfer or call that we could have sent from an EOA.
Each user who wants their assets protected in this way will need their own smart contract. There can’t be one big contract holding the assets of multiple people because the rest of the ecosystem assumes that one address represents one entity and won’t be able to distinguish the individual users.
For example, if someone wanted to send an NFT to someone in a combined-wallet contract, the NFT’s transfer API would only allow the sender to specify the address of the combined-wallet but not an individual user within it.
I’ll deploy a wallet smart contract which will hold my assets and which has one method where I pass it information about what call I want it to make.
Let’s call the data representing the action I want my wallet to perform a user operation or user op.
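So the wallet contract exposes a single method, executeOp(op), which takes the user op to perform.
First, we need the parameters that we would normally pass to eth_sendTransaction: the address to call (to), the calldata (data), the amount of ETH to send (value), and the gas limit (gas).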
On top of that, we need to provide something to authorize the request - that is, a chunk of data that the wallet will look at to decide whether it wants to perform the operation or not.
For our NFT-protecting wallet, for most user ops we’d pass a signature of the rest of the op signed by our main key.
But if the user op is transferring our super-valuable Carbonated Courage NFT, then the wallet will need us to pass signatures of the rest of the op signed by each of our two keys instead.
We’ll also throw in a nonce to prevent replay attacks where someone could resend a previous user op to run it again.
This actually accomplishes the goal!
As long as my Carbonated Courage NFT is held by this contract, it cannot be transferred without two signatures.
While the wallet can choose to interpret the signature and nonce fields however it wants, I would expect almost all wallets to use the signature field to receive some kind of signature over all the other fields to prevent unauthorized parties from forging or tampering with the op. Similarly, I would expect almost any wallet to reject an op with a nonce it has already seen.
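Here’s a rough Python model of the wallet described so far. It is only a sketch under loud assumptions: a real wallet is an on-chain contract, HMAC-SHA256 stands in for a real signature scheme, and the names `Wallet`, `UserOp`, `sign`, and `protected_target` are hypothetical. It also simplifies “transferring the NFT” to “calling the NFT contract’s address.”

```python
import hashlib
import hmac
from dataclasses import dataclass


def sign(key: bytes, payload: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for a real signature scheme such as ECDSA.
    return hmac.new(key, payload, hashlib.sha256).digest()


@dataclass
class UserOp:
    to: str                 # address to call
    data: bytes             # calldata
    value: int              # ETH to send, in wei
    gas: int                # gas limit
    nonce: int              # replay protection
    signatures: tuple = ()  # one signature per required key

    def payload(self) -> bytes:
        # The signatures cover every field except the signatures themselves.
        return repr((self.to, self.data, self.value, self.gas, self.nonce)).encode()


class Wallet:
    def __init__(self, main_key: bytes, vault_key: bytes, protected_target: str):
        self.main_key = main_key
        self.vault_key = vault_key
        self.protected_target = protected_target  # hypothetical: the NFT contract
        self.seen_nonces = set()
        self.calls = []  # record of the calls this wallet has made

    def execute_op(self, op: UserOp) -> None:
        if op.nonce in self.seen_nonces:
            raise PermissionError("nonce already used")
        required_keys = [self.main_key]
        if op.to == self.protected_target:
            required_keys.append(self.vault_key)  # the NFT needs both keys
        if len(op.signatures) != len(required_keys):
            raise PermissionError("wrong number of signatures")
        for key, signature in zip(required_keys, op.signatures):
            if not hmac.compare_digest(signature, sign(key, op.payload())):
                raise PermissionError("bad signature")
        self.seen_nonces.add(op.nonce)
        self.calls.append((op.to, op.data, op.value))
```

An op aimed at the protected target is rejected unless it carries signatures from both keys, and any op whose nonce has been seen before is rejected as a replay.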
One unanswered question here is how executeOp(op) gets called. Since it won’t do anything without the signatures from my private keys, we can let anyone try to call it and there won’t be any security risks. But we do need someone to actually make that call in order for the operation to happen.
On Ethereum, all transactions must originate from an EOA, and the calling EOA must pay for gas with its own ETH.
What I could do is have a separate EOA account whose only purpose is to call my wallet contract. While this EOA won’t have the same two-signature protection as the wallet contract, it only needs to hold enough ETH to pay for the gas to run my wallet, while the more secure wallet contract can hold all my valuable possessions.
So we actually got a big chunk of the account-abstraction functionality with just one pretty simple contract!
What I’m calling a “wallet contract” is referred to in ERC-4337 as an “account.” I find that confusing because I think of every address as an account. I’ll always refer to this participant as a “wallet contract” or just a “wallet.”
A drawback of the above solution is the requirement that I need to run a separate EOA account to call my wallet. What if I don’t want to do that? For now, I’m still willing to pay for my own gas with ETH. I just don’t want to have two separate accounts.
We said that the wallet contract’s executeOp method can be called by anyone, so we could just ask someone else with an EOA to call it for us. I will refer to this EOA and the person running it as the “executor.”
Since the executor is the one paying for gas, not many people would be willing to do that for free. So the new plan is that the wallet contract will hold some ETH, and as part of the executor’s call, the wallet will transfer some ETH to the executor to compensate the executor for any gas used.
“Executor” is not an ERC-4337 term, but it’s a good description of what this participant does. Later on, we’ll replace it with the actual term used by ERC-4337, “bundler,” but it doesn’t make sense to do that yet since we’re not doing any bundling at the moment. Other protocols might also call this participant a “relayer.”
Let’s try to keep it simple. We said that the wallet’s interface was a single method, executeOp(op).
We’ll try modifying the behavior of executeOp so that at the very end, it looks at how much gas it used and sends an appropriate amount of ETH to the executor to pay for it.
If my wallet is trustworthy, then this works great! But the executor needs to be sure that the wallet really will pay out a refund. If the executor calls executeOp but the wallet doesn’t actually refund the gas, the executor will be on the hook for the gas fees.
To try to avoid this scenario, the executor can try simulating the executeOp operation locally, likely with debug_traceCall, and see if it really is being compensated for its gas. Only then will it send the actual transaction.
A problem here is that simulation doesn’t perfectly predict the future. It’s entirely possible that the wallet pays for gas during simulation and fails to do so when the transaction is actually added to a block. A dishonest wallet could do this intentionally, getting its operations executed for free and racking up a huge gas bill for the executor.
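Simulation could differ from real execution for a couple of reasons: the operation might read from storage that changes between simulation and execution, or it might use “environment” opcodes such as TIMESTAMP or BLOCKHASH, whose values vary from block to block.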
One thing the executor could try is to restrict what the operation is allowed to do, such as rejecting any operation that uses any of the “environment” opcodes. But this would be much too harsh a restriction.
Remember that we want the wallet to be able to do anything an EOA could do, so banning these opcodes would block too many legitimate uses. For example, it would prevent the wallet from interacting with Uniswap, which uses TIMESTAMP extensively.
As the wallet’s executeOp can contain arbitrary code and we can’t reasonably restrict it to stop it from fooling a simulation, this problem is insurmountable with the current interface. executeOp is just too much of a black box.
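To make the danger concrete, here is a contrived Python sketch of a dishonest wallet. Everything in it is hypothetical: the `DishonestWallet` class, the even-timestamp trigger, and the idea that `execute_op` simply returns the refund. The point is only that the same call can pay during simulation and not pay in the real block.

```python
class DishonestWallet:
    """Refunds gas only when the block timestamp is even (a contrived trigger)."""

    def execute_op(self, block_timestamp: int, gas_used: int, gas_price: int) -> int:
        # Returns the refund, in wei, that the wallet sends to the executor.
        if block_timestamp % 2 == 0:
            return gas_used * gas_price
        return 0


def executor_profit(wallet, block_timestamp: int,
                    gas_used: int = 100_000, gas_price: int = 10) -> int:
    # The executor always pays for gas; it only breaks even if the wallet refunds it.
    refund = wallet.execute_op(block_timestamp, gas_used, gas_price)
    return refund - gas_used * gas_price


wallet = DishonestWallet()
simulated = executor_profit(wallet, block_timestamp=1_000)  # looks fine off-chain
actual = executor_profit(wallet, block_timestamp=1_001)     # lands one second later
```

In this model the simulation shows the executor breaking even, while the real transaction leaves it paying the full gas bill.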
The problem here is that we’re asking the executor to run code from an untrusted contract. What the executor wants is to run these untrusted operations in a context that grants certain guarantees. This is the entire purpose of smart contracts, so we’ll introduce a new trusted (i.e. audited, source code verified) contract called the entry point and give it a method, handleOp(op), that the executor will call instead.
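handleOp will do the following:
1. Check whether the wallet has enough funds to pay for the maximum amount of gas the op might use (based on the gas field in the user op). If not, reject the op.
2. Call the wallet’s executeOp method, keeping track of how much gas it actually uses.
3. Send some of the wallet’s funds to the executor to pay for that gas.
For the third step to work, we actually need the entry point to hold the gas-payment ETH rather than the wallet itself, because as we saw in the previous section we can’t be sure we’ll be able to get ETH out of the wallet. Thus the entry point also needs a method, deposit, for the wallet (or someone on behalf of the wallet) to put ETH into the entry point to pay for its gas, and another method, withdrawTo, so the wallet can take its ETH back out when it wants to.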
With this implementation, the executor gets refunded for gas no matter what.
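A minimal Python model of this first entry point might look like the following. It is a sketch, not the real contract: `Op`, `StubWallet`, and the snake_case method names are hypothetical, the wallet reports its own gas usage (on chain the entry point would measure it), and I assume a failed call burns its whole gas limit.

```python
from dataclasses import dataclass


@dataclass
class Op:
    gas: int  # maximum gas the op may use


class StubWallet:
    def __init__(self, address: str, gas_used: int, fails: bool = False):
        self.address = address
        self.gas_used = gas_used
        self.fails = fails

    def execute_op(self, op: Op) -> int:
        if self.fails:
            raise RuntimeError("reverted")
        return self.gas_used


class EntryPoint:
    def __init__(self):
        self.deposits = {}  # wallet address -> deposited wei

    def deposit(self, wallet_address: str, amount: int) -> None:
        self.deposits[wallet_address] = self.deposits.get(wallet_address, 0) + amount

    def withdraw_to(self, wallet_address: str, amount: int) -> int:
        if self.deposits.get(wallet_address, 0) < amount:
            raise ValueError("insufficient deposit")
        self.deposits[wallet_address] -= amount
        return amount

    def handle_op(self, wallet, op: Op, gas_price: int) -> int:
        """Run the op and return the wei paid to the executor."""
        max_cost = op.gas * gas_price
        if self.deposits.get(wallet.address, 0) < max_cost:
            raise ValueError("deposit cannot cover the maximum gas cost")
        try:
            gas_used = min(wallet.execute_op(op), op.gas)
        except Exception:
            gas_used = op.gas  # assumption: a failed call burns its whole gas limit
        payment = gas_used * gas_price
        self.deposits[wallet.address] -= payment  # paid out of the deposit
        return payment
```

Note that the executor is paid whether `execute_op` succeeds or raises, which is exactly the property this version is after.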
This is great for the executor! But it’s actually a pretty big problem for the wallet…
Shouldn’t the wallet be able to pay for gas using its own ETH rather than ETH deposited into the entry point? Yes it should! We’ll get to this, but we can’t do it until we have the change from the next section and even then we’ll still need the deposit/withdraw system as well. Plus, we’ll need the deposit/withdraw system later to support paymasters.
We previously defined the wallet interface as a single method, executeOp(op).
This method really does two things: it validates that the user op is authorized, and then it actually executes the call specified by the op. This distinction didn’t matter much when the wallet’s owner was paying for gas with their own account, but now that we’re asking the executor to do it, it’s significant.
Our current implementation has the wallet refund the gas fee to the executor no matter what. But we actually don’t want the wallet to pay if validation fails.
If validation fails, it means that someone with no authority over the wallet asked for the wallet to do something.
In this case, the wallet’s executeOp will correctly block the operation, but under the current implementation the wallet will still have to pay gas.
This is a problem, because someone with no affiliation to the wallet can request a bunch of operations from that wallet and use up all the wallet’s gas money.
By contrast, if the validation succeeds, but the operation fails after that, then the wallet should be charged for gas. This represents that the wallet owner authorized an action that turned out not to succeed, much like sending a reverted transaction from an EOA, and because they authorized it they should be responsible for gas.
The current wallet interface with one method doesn’t provide a way to distinguish between validation failures and execution failures, so we need to split it into two parts.
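Our new wallet interface will have two methods: validateOp(op), which checks that the op is authorized, and executeOp(op), which performs the call the op describes.
The new implementation of the entry point’s handleOp will be:
1. Call validateOp. If it fails, stop here without charging the wallet.
2. Set aside ETH from the wallet’s deposit to pay for the maximum amount of gas the op might use (based on the op’s gas field). If the wallet doesn’t have enough, reject the op.
3. Call executeOp and track how much gas it uses. Whether this call succeeds or fails, refund the executor for the gas from the funds set aside and return the rest to the wallet’s deposit.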
Now things look great for the wallet! It won’t be charged for gas except for operations it authorized.
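The split can be sketched in Python as follows. As before this is a hypothetical model rather than the real contract: an `authorized` flag stands in for a signature check, the wallet reports its own gas usage, and a failed execution is assumed to burn the whole gas limit.

```python
from dataclasses import dataclass


@dataclass
class Op:
    gas: int               # maximum gas the op may use
    authorized: bool       # stand-in for a real signature check
    reverts: bool = False  # whether the requested call fails


class Wallet:
    address = "0xWallet"

    def validate_op(self, op: Op) -> None:
        if not op.authorized:
            raise PermissionError("unauthorized")

    def execute_op(self, op: Op) -> int:
        if op.reverts:
            raise RuntimeError("reverted")
        return op.gas // 2  # pretend half of the gas limit was used


class EntryPoint:
    def __init__(self, deposits: dict):
        self.deposits = dict(deposits)  # wallet address -> deposited wei

    def handle_op(self, wallet, op: Op, gas_price: int) -> int:
        """Return the wei paid to the executor for this op."""
        try:
            wallet.validate_op(op)
        except PermissionError:
            return 0  # validation failed: the wallet is not charged
        max_cost = op.gas * gas_price
        if self.deposits.get(wallet.address, 0) < max_cost:
            raise ValueError("deposit cannot cover the maximum gas cost")
        self.deposits[wallet.address] -= max_cost  # set the maximum aside
        try:
            gas_used = min(wallet.execute_op(op), op.gas)
        except Exception:
            gas_used = op.gas  # assumption: a failed call burns its whole gas limit
        payment = gas_used * gas_price
        self.deposits[wallet.address] += max_cost - payment  # return the rest
        return payment
```

An unauthorized op costs the wallet nothing, while an authorized op that reverts is still charged, matching the two cases described above.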
But things look dicey for the executor again…
We should make sure that an unauthorized user can’t directly call executeOp on the wallet, causing it to take action without validation. The wallet can prevent this by enforcing that executeOp may only be called by the entry point.
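In a Python model, that guard is a one-line check. The `caller` argument here is a hypothetical stand-in for the address that made the call (what a real contract would read from msg.sender).

```python
class Wallet:
    def __init__(self, entry_point_address: str):
        self.entry_point_address = entry_point_address
        self.executed = []

    def execute_op(self, caller: str, op: str) -> None:
        # `caller` plays the role of msg.sender in a real contract call.
        if caller != self.entry_point_address:
            raise PermissionError("only the entry point may call execute_op")
        self.executed.append(op)
```

Anyone can still call the entry point, but the entry point only reaches `execute_op` after validation has passed.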
Why doesn’t a dishonest wallet just do all its execution in validateOp, so that if the execution fails it won’t be charged for gas? We’ll see in a moment that validateOp will have significant restrictions that make it unsuitable for “real” operations.
Now when unauthorized users submit operations for the wallet, the operation will fail in validateOp and the wallet won’t have to pay. But the executor still pays gas for the on-chain execution of validateOp, and it won’t be compensated.
Dishonest wallets can no longer get their operations run for free, but griefers can still cause the executor to lose money on gas for failed operations whenever they want to.
In the previous simulation section, the executor tried simulating the operation locally first to see if it would pass, and only then submitted a transaction to call handleOp on chain.
We ran into problems because the executor could not reasonably restrict execution to prevent it from succeeding during simulation but failing in the real transaction.
But something’s different this time.
The executor doesn’t need to simulate the entire execution, which now consists of validateOp followed by executeOp. It only needs to simulate the first part, validateOp, to know if it’s going to get paid or not. And unlike executeOp, which needs to be able to perform arbitrary actions so that the wallet can freely interact with the blockchain, we can put more stringent restrictions on validateOp.
To be specific, the executor will reject the user op without ever putting it on chain unless validateOp satisfies the following restrictions:
1. It never uses opcodes from a certain banlist, which includes codes like TIMESTAMP, BLOCKHASH, etc.
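2. The only storage it accesses is the wallet’s associated storage, defined as any of the following: storage in the wallet itself; storage in another contract at a slot that corresponds to the wallet in a mapping keyed by address; or storage in another contract at a slot equal to the wallet’s address.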
The goal of these rules is to minimize cases where validateOp succeeds in simulation but fails in real execution.
The banned opcodes are self-explanatory, but those storage restrictions might seem kind of weird.
The idea is that any storage access represents the danger of a false simulation because that storage slot can change between simulation and execution, but if we limit storage to just locations that are associated with this wallet, then a griefer would need to update storage specific to the wallet to falsify the simulation. The hope is that the cost of updating this storage is enough to deter the griefer.
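As a rough illustration, an executor’s check over a simulated validateOp trace could look like this in Python. The trace format is entirely hypothetical: I assume the simulation yields a list of opcode names plus a list of storage accesses, each tagged with the contract it touched and the address its slot is keyed by (if any). The banlist contains only the two opcodes named above; a real one would be longer.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Only the opcodes named in the text; a real banlist would contain more.
BANNED_OPCODES = frozenset({"TIMESTAMP", "BLOCKHASH"})


@dataclass(frozen=True)
class StorageAccess:
    contract: str            # contract whose storage was touched
    keyed_by: Optional[str]  # address the slot is keyed by, if any


def is_associated(access: StorageAccess, wallet: str) -> bool:
    # Storage in the wallet itself, or a slot in another contract keyed by the wallet.
    return access.contract == wallet or access.keyed_by == wallet


def accept_validation_trace(wallet: str,
                            opcodes: Iterable[str],
                            accesses: Iterable[StorageAccess]) -> bool:
    if any(opcode in BANNED_OPCODES for opcode in opcodes):
        return False
    return all(is_associated(access, wallet) for access in accesses)
```

A trace that reads the wallet’s own storage or its own entry in a token’s balance mapping passes; one that reads TIMESTAMP or another account’s slot is rejected before the op ever goes on chain.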
With this simulation in place, both the wallet and the executor are safe.
There’s another benefit to this storage restriction, which is that we know that calls to validateOp for operations on different wallets are unlikely to interfere with each other since the storage they can both access is limited. This will be important when we talk about bundling.
Currently, the wallet provides ETH for gas by first depositing it into the entry point and only then sending a user operation. But an ordinary EOA pays for gas out of its own ETH reserves. Shouldn’t our wallets be able to do the same?
We can do this now that we’ve split validation and execution, because the entry point can require that the wallet sends ETH to the entry point as part of the validation step or the op gets rejected.
We’ll update the wallet’s validateOp method so that it also receives the amount of ETH the entry point is requesting, and have the entry point reject the op if validateOp doesn’t pay that amount to the entry point.
Since at validation time we don’t know the exact amount of gas that will be used during execution, the entry point asks for the maximum amount that execution can possibly use based on the op’s gas field. Then at the end of execution, we want to return the unused gas money to the wallet.
But here we run into a catch.
When writing a smart contract, it’s iffy to send ETH to an arbitrary contract because doing so calls arbitrary code on that contract, which could fail, use unpredictable amounts of gas, or even attempt reentrancy attacks against us. So we won’t directly send the extra gas money back to the wallet.
Instead, we’ll hold on to it and allow the wallet to get it out by making a call to withdraw it later. This is the pull-payment pattern.
So what we’ll actually do is have the excess gas money go into the same place that ETH sent with deposit ends up, and the wallet can take it out later with withdrawTo.
It turns out we did need the deposit/withdraw system after all (or at the very least, the withdraw part of it).
This means that a wallet’s gas payment can actually come from two different places: its ETH held by the entry point, and ETH that the wallet holds itself.
The entry point will try to pay for gas using the deposited ETH first, and then if there isn’t enough deposited it will ask for the remaining portion when calling the wallet’s validateOp.
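The arithmetic can be sketched with two small Python functions. The names `required_payment` and `settle_deposit` are hypothetical, and all amounts are in wei.

```python
def required_payment(deposit: int, max_cost: int) -> int:
    """ETH the entry point asks the wallet for during validation."""
    # Use the deposit first; only ask the wallet for whatever is still missing.
    return max(0, max_cost - deposit)


def settle_deposit(deposit: int, paid_by_wallet: int,
                   max_cost: int, actual_cost: int) -> int:
    """Wallet's deposit after the op; unused gas money stays in the deposit."""
    if deposit + paid_by_wallet < max_cost:
        raise ValueError("op must be rejected: maximum gas cost is not covered")
    if actual_cost > max_cost:
        raise ValueError("actual cost cannot exceed the maximum")
    # Pull-payment: the excess is credited here, not sent back to the wallet.
    return deposit + paid_by_wallet - actual_cost
```

For example, with 300 wei deposited and a maximum cost of 1,000 wei, the wallet is asked for 700 wei; if the op ends up costing 400 wei, the wallet is left with a 600 wei deposit that it can withdraw later.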
At the moment, being an executor is a thankless task. They need to run a lot of simulations, make no profit, and are sometimes forced to pay out-of-pocket for gas when their simulations are falsified.
To compensate executors, we’ll allow wallet owners to submit a tip with their user ops that will go to the executor.
We’ll add a field to user operations to express this: maxPriorityFeePerGas.
Like the similarly named field in regular transactions, maxPriorityFeePerGas represents a fee that the sender is willing to pay to have their operation prioritized.
The executor, when sending its transaction to call the entry point’s handleOp, can choose a lower maxPriorityFeePerGas and pocket the difference.
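As a back-of-the-envelope Python sketch: the function below is hypothetical and deliberately simplified. It assumes the wallet is charged the op’s full priority fee, the executor’s transaction pays its own chosen priority fee, and the base fee is passed through at cost.

```python
GWEI = 10**9


def executor_tip(gas_used: int,
                 op_priority_fee_per_gas: int,
                 tx_priority_fee_per_gas: int) -> int:
    """Wei the executor keeps after paying its own transaction's priority fee."""
    # Simplification: base fee is assumed to be reimbursed exactly, so only
    # the difference between the two priority fees is left over.
    return gas_used * (op_priority_fee_per_gas - tx_priority_fee_per_gas)


tip = executor_tip(100_000, 3 * GWEI, 2 * GWEI)
```

Here the op offers 3 gwei per gas, the executor’s transaction pays 2 gwei per gas, and the executor keeps 1 gwei per gas used. A negative result means the executor overpaid.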
We talked about how the entry point should be a trusted contract and what it does. You might notice that nothing about the entry point is specific to the wallet or the executor. Thus, the entry point can be a singleton across the whole ecosystem. All wallets and all executors will interact with the same entry point contract.
This does mean that we need to tweak user operations so that they also specify which wallet they’re for, so that when the op is passed to entry point’s handleOp, the entry point will know which wallet to ask for validation and execution.
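Let’s update that by adding a field to the user op that holds the address of the wallet it is for (ERC-4337 names this field sender).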
Our goal was to create an on-chain wallet that pays for its own gas without its owner needing to manage a separate EOA, and we have now achieved that!
What we have is a wallet with two methods, validateOp and executeOp.
We also have a blockchain-wide singleton entry point with three methods: handleOp, deposit, and withdrawTo.
When the wallet owner wants to perform an action, they craft a user op and, off-chain, ask an executor to handle it for them.
The executor simulates the wallet’s validateOp method on this user op to decide if it wants to accept it or not.
If it accepts, the executor sends a transaction to the entry point to call handleOp.
The entry point then handles validating and executing the operation on-chain, and afterwards refunds ETH to the executor out of the wallet’s deposited funds.
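The whole flow can be modeled in a few lines of Python. This is a sketch with gas accounting left out and hypothetical names throughout: ops are plain dicts, a `signer` field stands in for a signature, and the executor “simulates” by running validation against a deep copy of the wallet so that no real state changes.

```python
import copy


class Wallet:
    def __init__(self, address: str, owner: str):
        self.address = address
        self.owner = owner
        self.next_nonce = 0
        self.executed = []

    def validate_op(self, op: dict) -> None:
        # Stand-in for a signature check: the op simply names its signer.
        if op["signer"] != self.owner or op["nonce"] != self.next_nonce:
            raise PermissionError("validation failed")
        self.next_nonce += 1

    def execute_op(self, op: dict) -> None:
        self.executed.append(op["call"])


class EntryPoint:
    def __init__(self, wallets: dict):
        self.wallets = wallets  # address -> wallet; models the chain's address space

    def handle_op(self, op: dict) -> None:
        wallet = self.wallets[op["sender"]]
        wallet.validate_op(op)
        wallet.execute_op(op)


class Executor:
    def __init__(self, entry_point: EntryPoint):
        self.entry_point = entry_point

    def simulate_validation(self, op: dict) -> bool:
        # Run validation against a copy so the real wallet state is untouched.
        wallet = copy.deepcopy(self.entry_point.wallets[op["sender"]])
        try:
            wallet.validate_op(op)
        except PermissionError:
            return False
        return True

    def submit(self, op: dict) -> bool:
        if not self.simulate_validation(op):
            return False  # never goes on chain, so nobody pays gas
        self.entry_point.handle_op(op)
        return True
```

The executor only sends the “transaction” (the call to `handle_op`) for ops that passed simulated validation; forged or replayed ops are dropped off-chain.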
That was a lot, but we made it!
Before we get to the next big feature, let’s take a moment for a surprisingly easy optimization.
With what we’ve implemented so far, the executor sends one transaction to perform one user operation. But now that we have an entry point contract that’s not tied to just one wallet, we can save some gas by collecting a bunch of user operations from different people, then executing them all in a single transaction!
This bundling of user ops will save gas by not repeatedly paying a fixed 21,000 gas fee for sending a transaction as well as lowering fees for performing cold storage accesses (accessing the same storage several times in a transaction is cheaper after the first time).
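The fixed-fee part of that saving is simple arithmetic, sketched here in Python. The function name is hypothetical, and the figure ignores both the storage-access savings and any per-op overhead the entry point adds.

```python
TX_BASE_GAS = 21_000  # fixed gas charged for sending any transaction


def base_gas_saved(num_ops: int) -> int:
    """Fixed transaction gas saved by sending num_ops in one bundle."""
    if num_ops < 1:
        raise ValueError("a bundle needs at least one op")
    # One transaction instead of num_ops transactions.
    return TX_BASE_GAS * (num_ops - 1)
```

A bundle of 10 ops pays the fixed fee once instead of ten times, saving 189,000 gas on that component alone.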
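This requires refreshingly few changes: we replace the entry point’s handleOp method, which takes a single op, with a handleOps method that takes a list of ops.
That’s basically it.
The new handleOps method does more or less what you’d expect: for each op, it calls validateOp on the op’s wallet and discards any op that fails validation; then, for each remaining op, it calls executeOp on the op’s wallet, tracks the gas used, and pays the executor for that gas.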
The one thing of note here is that we first perform all the validations and only then perform all the executions, rather than validating and executing each op before moving on to the next one.
This is important to preserve simulations.
If during handleOps we instead executed one op before validating the next one, then the first op’s execution would be able to freely mess with the storage that the second op’s validation depends on and cause it to fail even if the second op passed validation when we simulated it in a vacuum.
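Here is a small Python sketch of the two-phase loop, with gas accounting omitted. The two wallet classes are contrived and hypothetical: one wallet’s execution flips a shared flag that the other wallet’s validation reads, which is exactly the kind of interference described above.

```python
class ClosingWallet:
    """Execution flips a shared flag that another wallet's validation reads."""

    def __init__(self, shared: dict):
        self.shared = shared

    def validate_op(self, op: dict) -> None:
        pass  # always valid in this model

    def execute_op(self, op: dict) -> None:
        self.shared["open"] = False


class GuardedWallet:
    """Validation passes only while the shared flag is set."""

    def __init__(self, shared: dict):
        self.shared = shared
        self.executed = []

    def validate_op(self, op: dict) -> None:
        if not self.shared["open"]:
            raise PermissionError("closed")

    def execute_op(self, op: dict) -> None:
        self.executed.append(op["call"])


def handle_ops(wallets: dict, ops: list) -> list:
    """Validate every op first, then execute the ones that passed."""
    validated = []
    for op in ops:
        try:
            wallets[op["sender"]].validate_op(op)
        except PermissionError:
            continue  # discard ops that fail validation
        validated.append(op)
    for op in validated:
        wallets[op["sender"]].execute_op(op)
    return validated
```

Because every validation runs before any execution, the second op still validates just as it did in simulation, even though the first op’s execution later changes the flag.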
Along similar lines, we want to avoid situations where one op’s validation messes with the validation of a later op in the bundle.
As long as the bundle doesn’t include multiple ops for the same wallet, we actually get this for free because of the storage restrictions discussed above: if the validations of two ops don’t touch the same storage, they can’t interfere with each other. To take advantage of this, executors will make sure that a bundle contains at most one op from any given wallet.
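The one-op-per-wallet rule is easy to sketch in Python. The function name and the dict-shaped ops are hypothetical, and this version simply keeps the first op it sees from each wallet; a real executor would likely choose which one to keep based on fees.

```python
def build_bundle(pending_ops: list) -> list:
    """Keep at most one op per sender, preserving the original order."""
    seen_senders = set()
    bundle = []
    for op in pending_ops:
        if op["sender"] in seen_senders:
            continue  # a later bundle can pick this op up
        seen_senders.add(op["sender"])
        bundle.append(op)
    return bundle
```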
One great thing for the executors is that they have a new income source!
The executor has the opportunity to get some Maximal Extractable Value (MEV) by arranging user ops within a bundle (and possibly inserting their own) in a way that’s profitable.
Now that we have bundling, we can stop calling these participants “executors” and start calling them by their real name, bundlers.
I’ll call them bundlers for the rest of this 4-part series to be consistent with ERC-4337 terminology, but I actually do find “executor” to be a good way to think of them in my head, because it emphasizes that their job is to be the one that actually starts on-chain execution by sending a transaction from an EOA.
We have a setup where wallet owners submit user operations to bundlers in the hopes of having those operations included in a bundle. This is very similar to the setup for ordinary transactions, where account owners submit transactions to block builders in the hopes of having those transactions included in a block, so we can benefit from some of the same network architecture.
Just as nodes store ordinary transactions in a mempool and broadcast them to other nodes, bundlers can store validated user ops in a mempool and broadcast them to other bundlers. Bundlers can validate user ops before sharing them with other bundlers, saving each other the work of validating every operation.
A bundler can benefit by also being a block builder, because if they can choose the block that their bundle is included in, they can reduce or even eliminate the possibility of operations failing during execution after succeeding in simulation. Further, block builders and bundlers can benefit in similar ways by knowing how to extract MEV.
Over time, we might expect that bundlers and block builders merge into the same role.
That was a lot.
So far, we've figured out how to create a smart contract wallet to protect our most valuable assets and how to rely on an executor, or bundler, to call this smart contract wallet on our behalf.
Next up, learn about sponsored transactions, wallet creation, and aggregating signatures!