Industrial DevOps, page 14
Case Study: NASCAR Advances Race Car Development through Simulation23
NASCAR was founded in the late 1940s by Bill France Sr., a mechanic with a passion for stock car racing. To this day, NASCAR races are among the most watched programs on television. In fact, in 2023 the Daytona 500, one of NASCAR’s premier races, sold out for the eighth year in a row. NASCAR is not only entertaining; it is also a case study in how product development can be continuously improved through data-driven decisions.
In 2018, NASCAR announced that it needed a new generation of race cars. Designing a new race car is complex and challenging due to regulatory requirements, frequent rule changes, aerodynamic testing, and performance optimization. Typically, a new race car goes through thousands of hours of expensive wind tunnel testing as it is developed. Beyond the initial design, NASCAR teams have to continuously improve: NASCAR races almost weekly during the season, and teams typically have less than one week to prepare between races. They are on a new track every week and often get fewer than two practice runs. Recent advances in simulation provided by D2H and Ansys have improved car design and streamlined aerodynamics to the point of nearly eliminating wind tunnel testing.
NASCAR uses instruments on their vehicles to provide full telemetry of RPMs and other performance metrics. They use sensors to track surfaces, weather conditions, and competition rules. They use the data to run exhaustive computer simulations that enable teams to produce three times as many designs without extra development time and resolve issues in minutes to hours. The data-driven approach to decision-making can equate to millions of dollars in purse and sponsorship money weekly.
Getting Started
•Define your yearly OKRs for alignment between strategy and execution. This will help teams understand the direction they are heading, focus priorities, and build connectedness.
•Review the flow of value through the system. To do this analysis, you need to look at the tools. How quickly are features getting completed? Where are the bottlenecks in the flow? Once you find the bottleneck in the system, what will you do?
•Define a small set of metrics. As you review each of the measures, which ones will you start with to help drive the outcomes you are after?
•Objectives are fulfilled by features, and each feature needs defined objective evidence. Capture with each feature how it will be demonstrated.
•Invest in digital integrated tools to improve visibility and progress with real-time data. Many organizations are going through some level of digital transformation, and one area to focus on is the integrated tool environment. Invest there.
Key Takeaways
•Measure progress based on objective evidence of work demonstrated, not on tasks completed. The goal is to regularly demonstrate integrated functionality to stakeholders. This is more challenging in the cyber-physical domain with software and hardware, but embracing digital capabilities and design and architectural patterns for hardware makes it possible.
•Making progress visible and seeing results help shape the next set of prioritized work.
•Visualize the flow of work across the value stream to help find the bottlenecks.
•Progress toward meeting objectives and key results is understood as features are demonstrated. For cyber-physical systems, this is especially challenging, as we measure flow across the value stream, which includes software, hardware, and often suppliers.
•Stakeholder participation and feedback at the demonstrations are critical in validating the solution is meeting the anticipated results.
•Digital tools and engineering environments are enablers for iterative development and demonstrations with cyber-physical solutions.
•There are a variety of metrics that can be used. Know what metrics you are using and why you are using them. Ask yourself if these metrics are helping you understand your progress toward business outcomes.
Questions for Your Team to Answer
•Has your organization captured the value stream for the product? Before you start iterating on designs and development, be sure the teams are organized around the value stream (Principle 1).
•Once you understand your value stream, document the different development environments. When teams do their demonstrations, which environment will the demonstration take place in? What tools are needed?
•Has your team identified which metrics to track and how they will use the metrics to make data-driven decisions?
•Which tools do you need to obtain quantitative data?
Coaching Tips
•Basing decisions on objective evidence requires regular demonstrations of integrated functionality.
•Complete a stakeholder analysis to ensure you have the right participants at the reviews to provide feedback and assist with and improve decision-making.
•Reduce risk and schedule delays through regular observation and solution demonstrations.
•Be intentional in your digital transformation journey by prioritizing your digital capabilities against the greatest return for your needs and goals.
*During my (Robin’s) time in the defense industry, I had the opportunity to work closely with an officer who had been trained in the special forces. He recounted that every time his map of an area did not match the physical terrain, the terrain was always right. A light bulb went off. He was absolutely right; without objective evidence, information can be out of date or incorrect.
Chapter 7
Architect for Change and Speed
Principle 4: Architecting for change and speed provides information on multiple architecture considerations that can reduce dependencies and improve the speed of change.
Good architecture leads to better systems. Having a clear blueprint of what we are going to build before we begin is clearly good practice. It allows teams to effectively communicate the needs and dependencies of the system to those who will be building it. The more complex the system, the greater the need for an easy-to-follow blueprint, especially when that system is really a system of systems comprising software and firmware/hardware. There is a common misunderstanding that Agile or DevOps teams do not need architecture, but nothing could be further from the truth. Architectural artifacts serve many purposes when building cyber-physical systems: a common framework to enable design, a mechanism to communicate the design to stakeholders, a benchmark to verify and validate the system against, and a baseline from which to maintain and evolve the system in the future. Our goal is to build a blueprint that supports these needs while ensuring we can adapt to change rapidly.
But how do we architect complex cyber-physical systems and still develop better systems faster? How do we get to market at the speed demanded of organizations today, without sacrificing quality or safety? How do we achieve the goals of continuously developing, integrating, deploying, and releasing value at speed in complex cyber-physical systems? The key is to architect for scale, modularity, and serviceability.
In this chapter, you will learn which Industrial DevOps architectural considerations matter when designing and building cyber-physical systems, which current technology trends are impacting architecture, and how you can leverage those trends as architectural accelerators (i.e., architect for speed). (We are not trying to teach general rules of good architecture; we focus on the specific Industrial DevOps solutions that help cyber-physical systems architect for speed. If you have less of a background in systems engineering and architecture, you may want to start with Appendix B to learn a little bit more.)
Architecting Cyber-Physical Systems
No matter what you are building, it is important to have an intentional road map of the required elements when architecting your system, because there are many trade-offs to consider, including change, usability, availability, observability, agility, manufacturability, reusability, security, and scalability. Each of these main considerations is further complicated by sub-considerations. The considerations for architecting cyber-physical systems are outlined in a road map in Figure 7.1. By focusing on each of these considerations and making good architecture choices up front, we can avoid costly delays and build better systems faster.
Figure 7.1 Architectural Considerations for Industrial DevOps
In many cases, engineers start the design process of a new system by copying an existing system’s architecture and then applying it to the next system. This is done to save time in architecting a new system from scratch and is a practice used by both software and systems engineers. The problem comes when we don’t stop to ask if the mission or purpose of a new system is the same as the one we are copying. We must take the time to stop and consider the unique needs of this new system. Context matters.
Teams should start any new system architecture by creating a quick table listing the architectural considerations for the system, then rating the impact each has on the new system using something like a Likert score. This quick exercise highlights areas in the architectural pattern you are reusing that may need to be refactored. Let’s walk through several of the major architectural considerations for cyber-physical systems. By planning for each of these up front, we can architect a system for speed.
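The rating exercise above can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the consideration names follow Figure 7.1, but the example scores and the refactoring threshold are hypothetical.

```python
# Sketch of the architectural-considerations rating exercise.
# Consideration names follow Figure 7.1; the scores and the
# refactoring threshold below are hypothetical examples.

CONSIDERATIONS = [
    "change", "usability", "availability", "observability", "agility",
    "manufacturability", "reusability", "security", "scalability",
]

def flag_for_refactoring(scores: dict[str, int], threshold: int = 4) -> list[str]:
    """Return considerations whose Likert impact score (1-5) meets the threshold."""
    return [name for name in CONSIDERATIONS if scores.get(name, 0) >= threshold]

# Rate the impact of each consideration on the new system (1 = low, 5 = high).
scores = {"change": 5, "usability": 2, "availability": 4, "observability": 3,
          "agility": 3, "manufacturability": 1, "reusability": 4,
          "security": 5, "scalability": 2}

# Areas of the inherited architectural pattern to revisit first.
print(flag_for_refactoring(scores))  # ['change', 'availability', 'reusability', 'security']
```

The output is simply the short list of considerations where the copied architecture is most likely to need rework before reuse.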
Architect for Change
Given the rate of change in technologies we are experiencing, it is critical that teams building cyber-physical systems begin by architecting for change and extensibility. Many systems morph throughout the development process, sometimes eventually providing capabilities that they were not intended to provide. When this happens to a system that was not built for change, the result can be a Frankenstein’s monster of work-arounds.
Creating an architecture that is built for change involves designing a system that can be easily modified or adapted to meet the evolving business needs, new technology trends, and changing customer requirements. Key attributes to architect for change include modularity, which is dividing the system into small independent modules, and pluggability, which allows you to easily replace these modules. When architecting for cyber-physical systems, there are several unique challenges, such as heterogeneous interfaces, where we mix hardware and software components from a variety of suppliers and need to ensure that they communicate.
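A minimal sketch of modularity and pluggability with heterogeneous components: two temperature sensors from different (hypothetical) suppliers conform to one standardized interface, so either can be plugged in without changing the rest of the system. The `Sensor` interface and supplier classes are illustrative, not a real API.

```python
# Sketch of modularity and pluggability: components from different suppliers
# implement one standardized interface so they can be swapped freely.
# All class names here are hypothetical.
from abc import ABC, abstractmethod

class Sensor(ABC):
    """Standardized interface every pluggable sensor module must implement."""
    @abstractmethod
    def read_celsius(self) -> float: ...

class SupplierATemperatureSensor(Sensor):
    def read_celsius(self) -> float:
        return 21.5  # would wrap supplier A's hardware driver

class SupplierBTemperatureSensor(Sensor):
    def read_celsius(self) -> float:
        fahrenheit = 70.7  # supplier B's hardware reports Fahrenheit
        return (fahrenheit - 32) * 5 / 9  # adapt it behind the interface

def monitor(sensor: Sensor) -> str:
    # The rest of the system depends only on the interface, not the supplier.
    return f"{sensor.read_celsius():.1f} C"

print(monitor(SupplierATemperatureSensor()))  # 21.5 C
print(monitor(SupplierBTemperatureSensor()))  # 21.5 C
```

Note how the unit mismatch (a typical heterogeneous-interface problem) is absorbed inside the supplier module, so consumers never see it.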
A good example of a cyber-physical system that was defined with change in mind is Lockheed Martin’s SmartSat technology. The smart satellite can be reprogrammed in orbit for entirely different missions. During my (Robin’s) time at Lockheed Martin, executives described the technology as a smartphone in space, which I thought was an excellent metaphor. Smartphones are very powerful computers that can operate as many different devices, like a camera, navigation system, calculator, or a phone. By downloading a new app, you can add entirely new capabilities to your phone. The smart satellite can do the same. The satellite is a module with pluggable capabilities that connect through standardized interfaces.1
The smart satellite technology is a software-defined, hardware-reliant system where the hardware is configured and controlled by elements of the software. The software-defined, hardware-reliant system is a growing trend across all industries and is often known as software-defined everything (SDE). SDE uses software to abstract and control the hardware for computing, storage, networking, and physical devices, which provides a solid foundation for system agility.
To build for change, begin by understanding the drivers of change, such as potential business needs or technology trends on the horizon. Modular design and applied design patterns can help teams build in flexibility. It’s also important to identify areas that may impact scalability, such as data growth or user traffic. Teams should run experiments with hypothetical changes and validate the areas that have the least flexibility.
Architect for Usability
Usability is critical to building better systems faster but is often not given enough emphasis. In the cyber-physical world, usability is much more than the aesthetics of a system; it can also determine safety for the humans in those environments. Take the cockpit design of an aircraft. Architecting for usability is a critical consideration, as a clear and intuitive user interface can help reduce the risk of pilot error and improve flight safety.
When architecting for usability, it is important to account for learnability, efficiency, memorability, failures, and satisfaction. During product design and development, focus on simple, intuitive interfaces that use metaphors, consistent design patterns, and visual elements. Vehicle rearview cameras are an example of an intuitive interface in a cyber-physical system: the camera provides not only a visualization but also guide lines that show where the vehicle is in relation to other objects.
Architect for Availability
Availability refers to the ability of a system or service to be accessible and operational. Cyber-physical systems typically have high availability requirements. For example, cars need to be accessible and operational every time they are in use, or the consequences could be fatal. Key attributes to consider in this area are redundancy, recoverability, latency, and fault tolerance. While these areas have always been considered in architecting cyber-physical systems, new technologies have changed or enhanced how we can accomplish this.
Trends in cloud computing have enabled new ways to achieve redundancy and recoverability through automatic replication of data across multiple datacenters and load balancing. Edge computing, which processes data closer to where it is generated, provides a mechanism to reduce latency. Chaos engineering (the discipline of experimenting on a system to build confidence in its capability to withstand turbulent conditions in production) has improved our approach to fault tolerance by randomly injecting faults into the system, forcing an improvement in resilience. Today, NASA uses chaos engineering in simulators that regularly inject failures into the system, greatly reducing risk in actual launches.
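The chaos-engineering idea can be sketched in miniature: randomly failing calls to a service force the caller to prove that its retry and fallback paths actually work. The service, fault rate, and retry policy below are all hypothetical, not drawn from any real chaos tool.

```python
# Sketch of chaos-engineering-style fault injection: a service that fails
# randomly forces the caller to demonstrate fault tolerance.
# The failure rate and retry policy are hypothetical tuning values.
import random

def flaky_service(fault_rate: float, rng: random.Random) -> str:
    if rng.random() < fault_rate:       # injected fault
        raise ConnectionError("injected fault")
    return "ok"

def resilient_call(fault_rate: float, retries: int = 3, seed: int = 42) -> str:
    rng = random.Random(seed)           # seeded so experiments are repeatable
    for _ in range(retries):
        try:
            return flaky_service(fault_rate, rng)
        except ConnectionError:
            continue                    # fault tolerance: retry on failure
    return "fallback"                   # degrade gracefully instead of crashing

print(resilient_call(fault_rate=0.5))
```

Raising the fault rate in a test environment reveals whether the fallback path is real resilience or an untested branch.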
Architect for Observability
Observability has always been a concern, but the term observability in association with software-intensive systems is relatively new; it really came into focus with the rise of DevOps and the goal of continuous feedback. In addition, the adoption of Industry 5.0 principles has created the need for increased visibility into system performance and health. Systems need to be instrumented to provide full telemetry on their behavior and performance.
Today, technological advances help us build observability in new ways. For example, when you order dinner on Uber Eats or DoorDash, a screen pops up that shows your order being made, the driver picking up the order, and the delivery of the order to your house. The user has full observability into the system. That same level of observability into a cyber-physical system, such as a car, is very helpful in identifying the current state of the system, understanding where updates need to be made, identifying where problematic behavior is coming from, and even alerting the user of intrusions. Key attributes to consider for observability are alerting, monitoring, logging, and tracing.
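The four attributes above can be tied together in a small sketch that handles a single telemetry reading: log it, record a metric, tag it with a trace ID, and alert when it exceeds a threshold. The metric name and alert threshold are hypothetical.

```python
# Sketch connecting the four observability attributes for one telemetry
# reading: monitoring, logging, tracing, and alerting.
# The metric name and threshold are illustrative, not a real vehicle API.
import logging
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("vehicle.telemetry")
metrics: dict[str, list[float]] = {}  # monitoring: a toy time-series store

def record(metric: str, value: float, alert_above: float) -> dict:
    trace_id = uuid.uuid4().hex                      # tracing: correlate this event
    metrics.setdefault(metric, []).append(value)     # monitoring: store the sample
    log.info("trace=%s %s=%.1f", trace_id, metric, value)  # logging
    alert = value > alert_above                      # alerting: flag out-of-bounds
    return {"trace_id": trace_id, "metric": metric, "value": value, "alert": alert}

event = record("engine_temp_c", 118.0, alert_above=110.0)
print(event["alert"])  # True: the reading exceeds the alert threshold
```

In a real system each of these would be a dedicated pipeline (metrics store, log aggregator, distributed tracer, alert manager); the sketch only shows how one event flows through all four.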
New technologies are not only driving improvements in greenfield systems. In September 2020, the DoD installed Kubernetes on a U-2 Dragon Lady during a test flight.2 The flight computers on the U-2 used Kubernetes to run advanced machine-learning algorithms without any impact on the aircraft’s flight or mission systems. This allows the aircraft to capitalize on its high-altitude line of sight and makes it even more survivable in a contested environment.
Architect for Scalability
Systems need to be designed to easily adapt to the changing needs of users and stakeholders. Scale is not just about growth; it’s about the system being able to adapt. For example, a manufacturing system that uses robots to assemble products needs to be able to dynamically allocate robot resources to different assembly lines based on demand. Key attributes to consider for scalability include elasticity, latency, and distribution.
Elasticity is the system’s ability to dynamically adapt resources based on changing needs. Cloud technology has increased our ability to improve elasticity through distribution, where we can spread workloads over multiple resources. Advancements in areas such as in-memory computing, high-speed memory, and faster processors reduce latency concerns.
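Elasticity can be sketched as a simple control rule: pick a worker count from observed load and per-worker capacity, keeping utilization near a target. The target utilization and scaling bounds below are hypothetical tuning values, not from any particular autoscaler.

```python
# Sketch of elasticity: derive a worker count from observed load so that
# utilization stays near a target. Target and bounds are hypothetical.
import math

def desired_workers(load: float, capacity_per_worker: float,
                    target_util: float = 0.7, lo: int = 1, hi: int = 16) -> int:
    """Pick a worker count that keeps utilization near target_util."""
    needed = math.ceil(load / (capacity_per_worker * target_util))
    return max(lo, min(hi, needed))  # clamp within allowed scaling bounds

print(desired_workers(load=350.0, capacity_per_worker=100.0))  # 5 (scale up)
print(desired_workers(load=40.0, capacity_per_worker=100.0))   # 1 (scale down)
```

Cloud autoscalers apply the same idea continuously, adding distribution (spreading work across the chosen resources) on top of this sizing rule.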
Formula One has taken architecture for scalability to the next level by leveraging Amazon Web Services (AWS) to scale their computational fluid dynamics technology environment, which allowed them to use empirical data to design their 2021 car. The environment enabled them to run 1,150 compute cores and analyze more than 550 million data points that model the impact of one car’s aerodynamic wake on another to pick the optimal design. The AWS environment allowed them to scale their computational fluid dynamics environment and reduce simulation time by over 80%.3 The faster they can learn, the more successful they can be.
Architect for Reusability
Top concerns of nearly every organization today are speed and value for money. The easiest way to go faster and be more cost-effective is through reuse. Architecting for reuse requires modularity and standardization, with loose coupling between components and patterns of abstraction. In addition, each component must have disciplined version control with the ability to support backward compatibility.
Modularity is the degree to which the components can be separated and recombined. For example, we can change an embedded system with a field-programmable gate array (FPGA) by simply reprogramming the FPGA to implement a different function, whereas an embedded system containing a fully integrated circuit would require a whole new circuit.
Standardization, especially for interfaces, enables a system to rapidly change by providing the ability to plug in a new component as long as it meets the interface standard. For example, CubeSat Serial Interface, developed by NASA, allows actuators to be easily replaced.
Architecting for reuse requires loose coupling between components, where each component in the system has minimal dependencies on the others. One example of loose coupling in satellite systems can be seen in attitude control: a change to one sensor should not impact anything associated with the magnetometer. The last key to reuse is leveraging patterns of abstraction, borrowed from software design, where complex implementation details of the system are hidden behind simplified interfaces. For decades, we knew rockets as one-and-done, until SpaceX decided to change the game with reusable launch technology. SpaceX reduced launch costs by roughly 60%, and as of December 2021, they have had one hundred landings.4 Key attributes that allow us to increase reuse are standardization, modularity, and patterns of abstraction.
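Loose coupling can be sketched with a publish/subscribe pattern: components communicate through a bus and hold no direct references to one another, so one publisher can be replaced without touching its consumers. The bus, topic name, and reading are all illustrative; real satellite buses use standards such as hardware data buses rather than in-process Python.

```python
# Sketch of loose coupling via publish/subscribe: the only contract shared
# between components is the topic name, so either side can be swapped.
# All names here are hypothetical.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[float], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[float], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, value: float) -> None:
        for handler in self._subs[topic]:
            handler(value)

bus = EventBus()
readings: list[float] = []
bus.subscribe("attitude.rate_dps", readings.append)  # consumer knows only the topic

# Any sensor module can publish here; replacing it is invisible to the
# subscriber because neither side references the other directly.
bus.publish("attitude.rate_dps", 0.25)
print(readings)  # [0.25]
```

The abstraction pattern appears here too: the bus hides delivery details behind two simple methods, `subscribe` and `publish`.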
