Onboard BMS should estimate battery’s State-of-Health independently from the cloud

Published in Battery Discovery, Oct 15, 2021 (10 min read)

This post was written in collaboration with Andrew Wang and published on Intercalation Station.

In this post, I develop the ideas that I first verbalised in the summary of the paper “Digital twin for battery systems: Cloud battery management system with online state-of-charge and state-of-health estimation” by Li, Rentemeister, Badeda, Jöst, Schulte, and Sauer. I also spoke with folks from cloud battery analytics companies such as Accure Battery Intelligence, Nortical, and Volytica.

Cloud-based battery intelligence systems typically do the following:

Estimate remaining useful capacity and internal resistance. Below, I refer to them as degradation parameters.
Collect and chart state-of-charge, voltage, current, and degradation parameters to let users see what happens with their individual batteries in the cloud.
Predict the remaining useful life of batteries to inform insurance, capacity planning, and procurement.

Li et al. proposed that batteries’ degradation parameter estimations should be sent from the cloud to the onboard battery management system (BMS) because they could be more accurate than the estimations that the onboard BMS could compute on its own.

I think this might be a reasonable approach only for very small batteries, such as those of electric bikes, but not for the batteries of electric vehicles or energy storage systems. The onboard BMS of the latter should estimate the degradation parameters on its own.

Onboard BMS needs to know the degradation parameters to improve safety and prolong the battery’s life

There are well-known reasons why information about batteries’ degradation should be present in the cloud. Knowing the state of batteries’ degradation enables businesses to predict batteries’ remaining lifetimes and to procure new batteries more accurately. For large battery fleets, keeping track of each battery’s state-of-health is clearly a task for a cloud system, so that operators can schedule replacements when batteries reach their predetermined end-of-life level of degradation. The battery’s state-of-health also informs the price of insurance against a battery failure.

It’s perhaps less widely appreciated how knowing the battery’s degradation parameters in the onboard BMS enables better failure prediction and more efficient control of the battery.

These are some of the possible signs of the onset of a cell failure that a reliable onboard BMS should detect (and, perhaps, stop the battery operation and alert the owner of the battery or the EV):

A cell starts to degrade very rapidly, over the course of weeks or days. Note that a cloud battery intelligence could detect this rapid degradation, too, but only if the batteries come online frequently and upload all their telemetry to the cloud, which is a valid assumption for city taxi and transportation fleets, as well as some forms of stationary energy storage, but not for consumer electric vehicles. See the section “Privacy and familiarity concerns cause public hesitation” below for more on this.
A cell’s coulombic efficiency drops from nearly 100% to a lower value. This could be a sign that a parasitic reaction has started inside the cell, which, in turn, could lead to thermal runaway or venting.
A cell’s self-discharge rate rises from nearly 0 to a noticeable value. This could be a sign of a short circuit.
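
As a rough illustration of the last two signs, here is a minimal Python sketch of how a BMS could track coulombic efficiency and self-discharge from per-cycle counters. The names `CycleRecord`, `first_efficiency_alarm`, and `self_discharge_per_day`, as well as the baseline length and the alarm threshold, are hypothetical and chosen only for this example. A production BMS would estimate these quantities jointly with the other cell parameters rather than by simple ratios.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CycleRecord:
    charge_ah: float     # charge put into the cell during one cycle
    discharge_ah: float  # charge taken out of the cell during the same cycle


def coulombic_efficiency(record: CycleRecord) -> float:
    """Ratio of discharged to charged ampere-hours for one cycle."""
    return record.discharge_ah / record.charge_ah


def first_efficiency_alarm(
    records: Sequence[CycleRecord],
    baseline_cycles: int = 5,  # hypothetical: cycles treated as the healthy baseline
    max_drop: float = 0.005,   # hypothetical: allowed drop below the baseline
) -> Optional[int]:
    """Index of the first cycle whose efficiency drops too far, or None."""
    if len(records) <= baseline_cycles:
        return None
    baseline = sum(
        coulombic_efficiency(r) for r in records[:baseline_cycles]
    ) / baseline_cycles
    for index in range(baseline_cycles, len(records)):
        if baseline - coulombic_efficiency(records[index]) > max_drop:
            return index
    return None


def self_discharge_per_day(soc_start: float, soc_end: float, rest_hours: float) -> float:
    """Fraction of state-of-charge lost per day while the cell rests at zero current."""
    return (soc_start - soc_end) / rest_hours * 24.0


# Six healthy cycles followed by one cycle with a clear efficiency drop.
history = [CycleRecord(10.0, 9.99)] * 6 + [CycleRecord(10.0, 9.90)]
print(first_efficiency_alarm(history))  # 6
```

Both checks use only the cell’s own recent history, so they work without any connection to the cloud.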

I have argued before that a robust algorithm should estimate most cell parameters at once because all parameters are conflated in the sole output signal: the voltage. This means that to detect the onset of some battery failures, the onboard BMS is effectively required to estimate the battery’s degradation parameters.

Knowing the degradation parameters in the onboard BMS could also help to avoid accelerated degradation (a positive feedback loop), thus prolonging the battery’s life. The remaining capacity estimate should be used to adjust the battery’s maximum and minimum cutoff voltages for charge and discharge, respectively. A new battery cell is balanced so that, for example, 0% stoichiometry of the anode corresponds to 95% stoichiometry of the cathode, and the difference between the half-cell open-circuit potentials of the cathode and the anode at these stoichiometries equals the cutoff discharge voltage specified by the battery manufacturer. But after some loss of lithium inventory, 0% stoichiometry of the anode corresponds only to 90% stoichiometry of the cathode, so the cell voltage at that point is still higher than the initial cutoff. Discharging the battery further, down to the initial cutoff voltage, would damage the anode and accelerate the battery degradation.
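
The cutoff adjustment can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the cathode loses no active material, the lithium balance is the simple one where all cyclable lithium sits in the cathode when the anode is at 0%, and both the linear cathode curve `toy_cathode_ocp` and the constant `TOY_ANODE_OCP_EMPTY` are made-up placeholders, not data for any real cell.

```python
def cathode_stoich_at_empty_anode(lithium_inventory_ah: float,
                                  cathode_capacity_ah: float) -> float:
    """With the anode at 0%, all cyclable lithium is in the cathode."""
    return lithium_inventory_ah / cathode_capacity_ah


def toy_cathode_ocp(stoich: float) -> float:
    """Hypothetical linear cathode open-circuit potential, in volts."""
    return 4.3 - 0.9 * stoich


# Hypothetical anode open-circuit potential at 0% stoichiometry, in volts.
TOY_ANODE_OCP_EMPTY = 0.6


def discharge_cutoff_voltage(lithium_inventory_ah: float,
                             cathode_capacity_ah: float) -> float:
    """Cell voltage at which the anode is just emptied."""
    y = cathode_stoich_at_empty_anode(lithium_inventory_ah, cathode_capacity_ah)
    return toy_cathode_ocp(y) - TOY_ANODE_OCP_EMPTY


fresh = discharge_cutoff_voltage(lithium_inventory_ah=95.0, cathode_capacity_ah=100.0)
aged = discharge_cutoff_voltage(lithium_inventory_ah=90.0, cathode_capacity_ah=100.0)
print(f"fresh cutoff: {fresh:.3f} V, aged cutoff: {aged:.3f} V")
```

With these toy numbers the anode-empty voltage rises as lithium inventory is lost, so a BMS that keeps using the original cutoff would push the anode past 0%.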

Similarly, after some loss of active material(s), the BMS should adjust the charge/discharge power limits and the value of “1C current” so that they correspond to the same current density as in a pristine battery.
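
A minimal sketch of that rescaling, assuming the limits simply scale in proportion to the remaining fraction of active material (the function name `rescaled_limits` and its parameters are hypothetical):

```python
from typing import Tuple


def rescaled_limits(nominal_capacity_ah: float,
                    nominal_power_limit_w: float,
                    active_material_fraction: float) -> Tuple[float, float]:
    """Return (1C current in A, power limit in W) for a degraded battery.

    Assumes both limits shrink in proportion to the remaining active
    material, so the current density stays the same as in a pristine cell.
    """
    if not 0.0 < active_material_fraction <= 1.0:
        raise ValueError("active_material_fraction must be in (0, 1]")
    one_c_current_a = nominal_capacity_ah * active_material_fraction
    power_limit_w = nominal_power_limit_w * active_material_fraction
    return one_c_current_a, power_limit_w


# A 60 Ah battery with a 150 kW limit that has lost 10% of its active material.
one_c, power = rescaled_limits(60.0, 150_000.0, 0.9)
print(f"1C current: {one_c:.1f} A, power limit: {power:.0f} W")
```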

Cloud computation exposes energy infrastructure to new risks

Now that we have established that the onboard BMS benefits from having accurate estimates of the battery’s degradation parameters, the next question is how and where they should be computed.

Li et al. suggest in their paper that the degradation parameters should be computed in the cloud and then sent back to the onboard BMS over the internet. I argue below that this approach incurs many problems and, ultimately, is not effective.

When the battery’s onboard BMS regularly receives degradation parameter estimates from the cloud and acts on them, this is a form of control, and centralising control in the cloud exposes the energy infrastructure to some highly impactful risks. Batteries are safety-critical systems: when poorly managed, they can catch fire, explode, and harm people in their vicinity.

Imagine this trivial situation: developers deploy a new version of cloud software that has a bug that causes incorrect estimates, or stops the processing of battery telemetry, or crashes when onboard BMS try to fetch new estimates from it.

Of course, one can say that developers can mitigate this risk by testing the software in a staging environment, rolling out new versions carefully, monitoring errors and quality metrics, etc. Yes, this is all possible, but in practice, developers tend to cut the “unnecessary” steps of the deployment process as long as everything goes well: see Rasmussen’s model of how accidents happen. The only way to combat this dynamic is to introduce hard boundaries that can’t be circumvented. For example, engineers at Amazon AWS can’t make any operational changes or deployments across Availability Zones, which I think is one of the reasons why AWS has had far fewer serious outages than Google Cloud Platform.

In the case of batteries, parameter estimations should be built into the onboard BMS and battery operators (rather than BMS developers) should be responsible for deploying BMS software (EV owners should explicitly agree to install a new over-the-air update). In this situation, new BMS versions will arrive at batteries slowly, over the course of months. If there are bugs in the new BMS that cause problems with batteries, operators and EV owners who installed updates earlier are likely to discover these problems and report them to developers while the new version is still deployed only on a minor portion of all batteries that use that particular BMS.

Even if developers do everything “right” and never let bugs into their cloud estimation of battery parameters, cloud-centric estimation software remains exposed to datacenter and cloud provider outages unless it is implemented on top of multi-datacenter, multi-cloud computing infrastructure.

There are two more types of risk that are less probable, but that cannot be mitigated even in principle when batteries depend on parameter estimations done in the cloud.

Terrorists can hack into the cloud and send wrong estimates to batteries. As far as I know, there haven’t been such attacks on battery infrastructure yet, but there was a ransomware attack on pipeline infrastructure in the United States in May 2021.

Finally, what if the internet is cut off in a grid blackout? This is exactly the moment when batteries must remain operational and reliable.

Cloud estimations will still require onboard backups

Li et al. suggest mitigating some of the risks described above by providing a fallback from the cloud parameter estimations to less accurate estimations that run on the onboard BMS:

“The functions which are required at each time point during operation should also run locally, guaranteeing the system safety. An advanced version of these functions will run in the cloud with advanced algorithms, which provide higher accuracy while requiring high computation power.”

I think there are multiple problems with this architecture.

First of all, it doesn’t address the main risk of cloud estimations: a faulty deployment that starts to send wrong parameter estimates to batteries. If the onboard BMS has to decide between “good” and “bad” estimates, not only is this very fragile, it also implies that the onboard BMS can be at least as “smart” as the estimation software in the cloud: but then, why wouldn’t it make good estimations itself?

In general, implementing a fallback that works not just in theory, but in practice can be very hard and is also subject to sociotechnical problems similar to the problem of “lazy developers not following proper practices”: since the fallback is not normally used, it will get little attention and testing. This could go unnoticed until a big failure happens. Jacob Gabrielson (working for AWS) wrote a good article on this topic: “Avoid fallback in distributed systems”.

Finally, could cloud estimations actually be noticeably more accurate than algorithms running on the onboard BMS? Battery parameter estimation algorithms take just a few time series as their inputs. They are not comparable with real-time video analysis or natural language processing algorithms that either process a lot of data or require large models that cannot fit into embedded computers (however, this too can soon change due to the very quick progress in ML model optimisation). In the end, BMS algorithms that can run on embedded computers would only have a marginally higher error compared to those run on big servers in the cloud. Time and effort are better spent on improving the algorithms themselves, rather than setting up ultra-reliable cloud infrastructure for computing and delivering the estimates to the batteries, as well as maintaining the fallback.

As embedded computers become cheaper and more efficient, more and more advanced algorithms can be executed on them. This trend will not reverse, and I think it will make the architecture where battery parameters are estimated in the cloud mostly obsolete by 2025.

Privacy and familiarity concerns cause public hesitation

If we, the battery industry, want to accelerate the transition towards electric mobility, we should think not only about technical aspects such as performance, efficiency, and safety but also about customer psychology.

Consumers place a lot of value on the trustworthiness of technology solutions. When it comes to battery-powered products like EVs, there is a difference between cloud-connected and cloud-reliant. While folks may be happy that an over-the-air update improves their Tesla’s efficiency, many would not like the risk of doing irreversible damage to their batteries if they drove without internet reception.

This topic has been widely covered in the field of smart homes and the internet of things. It’s not a big deal if a glitch on your internet-connected fridge causes your ice cream to melt or your smart toaster to burn your toast, but what if hackers could unlock your front door via your wifi lightbulb? The same logic could be applied to the safety and security of batteries.

Similarly, a lot of personal information is embedded in the way we cycle our batteries. If all battery history, SoC, and SoH information were leaked from the cloud, one could reverse engineer the data to work out someone’s personal information such as their schedule at home and work, frequented locations, charging spots, and more.

Estimating degradation parameters onboard doesn’t mean we should stop doing estimations in the cloud

There are good reasons to continue estimating a battery’s degradation parameters in cloud battery intelligence systems, while the onboard BMS is also computing these estimations and sending them to the cloud as telemetry. Continuously comparing these two estimates in the cloud, especially if they are computed using different algorithms, ensures that we notice if one of these estimations becomes biased. Any discrepancy between the estimates could be used to learn more about how they work, in a virtuous feedback cycle. The onboard and cloud algorithms have different strengths: the cloud can see how a large number of similar batteries degrade over time and can compare their telemetry to come up with more robust estimates. The onboard algorithms have access to very high-resolution telemetry data.
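
A minimal sketch of such a comparison, assuming both estimators report remaining capacity in Ah for the same battery at the same timestamps. The function names and the tolerance value are hypothetical, and a real system would also account for the uncertainty of each estimate:

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def estimate_discrepancy(onboard_ah: Sequence[float],
                         cloud_ah: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of (onboard - cloud) capacity estimates."""
    if len(onboard_ah) != len(cloud_ah) or len(onboard_ah) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    diffs = [o - c for o, c in zip(onboard_ah, cloud_ah)]
    return mean(diffs), stdev(diffs)


def looks_biased(onboard_ah: Sequence[float],
                 cloud_ah: Sequence[float],
                 tolerance_ah: float = 0.5) -> bool:
    """Flag a systematic offset larger than a (hypothetical) tolerance."""
    bias, _spread = estimate_discrepancy(onboard_ah, cloud_ah)
    return abs(bias) > tolerance_ah


cloud = [57.9, 57.9, 57.4, 57.2]
onboard_ok = [58.0, 57.8, 57.5, 57.1]      # scatters around the cloud estimate
onboard_drifted = [59.0, 58.9, 58.6, 58.3]  # sits about 1 Ah above it
print(looks_biased(onboard_ok, cloud), looks_biased(onboard_drifted, cloud))
```

The check only says that the two estimates disagree systematically. Deciding which of them is wrong still requires a reference measurement or a human investigation.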

Cloud battery intelligence systems could also serve as a playground for making improvements in estimation algorithms. When developers become confident that the new algorithms are robust and more accurate, they can implement them in the onboard BMS.

Conclusion

Computing the degradation parameters of batteries in the cloud makes immediate practical sense when the onboard BMS of the batteries don’t calculate these parameters themselves. However, this is not the right architecture for battery digitalisation if we try to design it from the ground up: computing parameters for batteries in the cloud exposes batteries to unnecessary risks due to centralisation. Also, people may not trust batteries that depend on receiving something from the cloud for safe operation. Yet, reliability and trust are precisely what Li-ion batteries must strive for the most because they already “walk on very thin ice”.

Electric vehicles’ adoption rate could be seriously hampered if a massive battery failure happens because the batteries are controlled from a single cloud location.

Estimating degradation parameters in the cloud and sending them to the onboard BMS makes sense for small batteries, such as those of electric city bikes or scooters, for several reasons. The batteries are smaller and hence fires are less catastrophic. The onboard BMS should consume very little power. Finally, all bikes in these fleets should be constantly online, anyway.

Things like the battery’s remaining life (calendar time or a number of cycles) should still be estimated in the cloud because these estimates depend on the acceptable remaining capacity and internal resistance limits which, in turn, depend on how the particular battery is used (for example, what range the EV should have, or what instantaneous power the energy storage system should be able to deliver). The onboard BMS couldn’t know these acceptable limits a priori. These limits could even be optimisation parameters for algorithms that try to calculate the optimal time when certain batteries, e.g. those involved in peak-shaving energy storage applications, should be replaced with new ones. These algorithms could use the current and projected grid electricity prices, price and performance characteristics of replacement batteries, and other information that the onboard BMS couldn’t and shouldn’t know. The cloud estimation of batteries’ remaining life uses the degradation parameters that are calculated in the onboard BMS and are sent to the cloud periodically.
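
To make the dependence on the acceptable limit concrete, here is a minimal Python sketch of a cloud-side remaining-life estimate. It assumes capacity fades linearly with cycle count, which real batteries often violate, and the function name `remaining_cycles` and the numbers are hypothetical. The capacity series stands for the estimates that the onboard BMS uploads.

```python
from typing import Optional, Sequence


def remaining_cycles(cycle_counts: Sequence[float],
                     capacities_ah: Sequence[float],
                     end_of_life_capacity_ah: float) -> Optional[float]:
    """Extrapolate a least-squares line through the capacity history.

    Returns the number of cycles left until the capacity reaches the
    application-specific end-of-life limit, or None if no fade is visible.
    """
    n = len(cycle_counts)
    if n != len(capacities_ah) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(cycle_counts) / n
    mean_y = sum(capacities_ah) / n
    sxx = sum((x - mean_x) ** 2 for x in cycle_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(cycle_counts, capacities_ah))
    slope = sxy / sxx  # capacity change per cycle, in Ah
    if slope >= 0.0:
        return None    # capacity is not decreasing, nothing to extrapolate
    intercept = mean_y - slope * mean_x
    end_of_life_cycle = (end_of_life_capacity_ah - intercept) / slope
    return max(0.0, end_of_life_cycle - cycle_counts[-1])


cycles = [0, 100, 200, 300]
capacities = [60.0, 59.0, 58.0, 57.0]  # onboard estimates sent as telemetry
# The same battery under two different application limits:
print(remaining_cycles(cycles, capacities, end_of_life_capacity_ah=48.0))
print(remaining_cycles(cycles, capacities, end_of_life_capacity_ah=54.0))
```

The same capacity history gives very different remaining lives depending on the limit, which is why this calculation belongs with whoever knows how the battery is used.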

References

[1] Digital twin for battery systems: Cloud battery management system with online state-of-charge and state-of-health estimation (Li et al., 2020)

[2] Degradation diagnostics for lithium ion cells (Birkl et al., 2017)