Modern Digital Applications with Lee Atchison

Lee Atchison

Welcome to Modern Digital Applications - a podcast for corporate decision makers and executives looking to create or extend their digital business with the help of modern applications, processes, and software strategy. Your host is Lee Atchison, a recognized industry thought leader in cloud computing and a published author with over 30 years of experience. This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy, Podtrac - https://analytics.podtrac.com/privacy-policy-gdrp

Technology

Notice: New podcast starting later this summer

Modern Digital Applications is changing and coming back after its year-long hiatus. Join us for the launch of Modern Digital Business! Modern Digital Business will be coming later this summer. If you'd like to be informed when it's ready to launch, please go to mdb.fm/launch. We hope to see you there!

Episodes

The Great Repo Debate with Kevin Goslar, p2

My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He currently maintains Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he’s here to discuss with me the pros and cons of monorepos vs. polyrepos. This is part 2 of my interview with Kevin.

The Great Repo Debate with Kevin Goslar, p1

My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He currently maintains Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he’s here to discuss with me the pros and cons of monorepos vs. polyrepos.

Learning from Incidents with Beth Long, p2

My guest today is Beth Long. Beth worked at New Relic, where she held roles in both engineering and marketing, including two years leading the Reliability Engineering team, which owned the tooling and process for incident response and analysis. She also led New Relic's collaboration with the SNAFU Catchers, a group of researchers investigating how tech companies learn from incidents. Beth recently left New Relic to join the startup Jeli.io, where she leads the engineering team working on the industry's first incident analysis platform.
Links
• Beth Long, Engineering Manager at Jeli.io
• LinkedIn: https://www.linkedin.com/in/beth-adele-long/
• Twitter: https://twitter.com/BethAdeleLong
Featured in this episode:
• Jeli.io (https://jeli.io)
• Learning From Incidents with Jeli (https://leeatchison.com/atscale/2020/12/07/learning-from-incidents-with-jeli/)
• S3 Outage Mentioned in this Episode (https://thenewstack.io/dont-write-off-aws-s3-outage-fat-finger-folly/)

Learning from Incidents with Beth Long, p1

My guest today is Beth Long. Beth worked at New Relic, where she held roles in both engineering and marketing, including two years leading the Reliability Engineering team, which owned the tooling and process for incident response and analysis. She also led New Relic's collaboration with the SNAFU Catchers, a group of researchers investigating how tech companies learn from incidents. Beth recently left New Relic to join the startup Jeli.io, where she leads the engineering team working on the industry's first incident analysis platform.
Links
• Beth Long, Engineering Manager at Jeli.io
• LinkedIn: https://www.linkedin.com/in/beth-adele-long/
• Twitter: https://twitter.com/BethAdeleLong
Featured in this episode:
• Jeli.io (https://jeli.io)
• Learning From Incidents with Jeli (https://leeatchison.com/atscale/2020/12/07/learning-from-incidents-with-jeli/)
• S3 Outage Mentioned in this Episode (https://thenewstack.io/dont-write-off-aws-s3-outage-fat-finger-folly/)

ICYMI: Reducing the Risk of Your Cloud Migration

The scheduling of a cloud migration is a complex undertaking that should be thought through and planned in advance. But for a migration to be successful, it’s important that you limit your risk as much as possible during the migration itself, so that unforeseen problems don’t show up and cause your migration to go sideways, fail outright, or result in unexpected outages that negatively impact your business.

ICYMI: Avoid Downtime When Migrating Data to the Cloud

Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. During the data transfer, keeping the data intact, in sync, and self-consistent requires either tight correlation or, worse, application downtime. Moving your data and the applications that use the data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Companies often rely on the expertise of a migration architect, a role that can contribute greatly to the success of any cloud migration. Whether or not you have an on-staff cloud architect, there are three primary strategies for migrating application data to the cloud.

Future of Cloud Identity with Thomas Curran, p2

My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including serving as VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which owns a very popular open source, Go-based identity management library named Kratos, along with other open source identity management tools. Now, Thomas is a co-founder of Ory Corp, an Open Source Identity Infrastructure and Services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. In the interest of full disclosure, I’ve worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I’m currently working directly with Thomas at Ory, architecting their new cloud infrastructure.
Links and More Information
* Thomas Curran LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
* Ory (https://ory.sh)
Tech Tapas: History of the Term SaaS
When did software as a service start? Well, that depends on what you mean by the term. Depending on how you define SaaS, the answer is either the early 1960s or somewhere around 2005.
Back in the early days of computing, all applications ran on a centralized computer, and users accessed that computer remotely, initially via punch cards and later via remote terminals. Because the application was centralized, this was, by a strict definition, Software as a Service.
But the modern definition of SaaS is tied much more closely to cloud computing. Nowadays, SaaS refers to software that runs centrally, typically in a public or private cloud environment, and is shared among multiple users. A thin client of some sort, either a web browser or a thin mobile application, is used to front the centralized application.
From a business model standpoint, users don’t buy SaaS software; instead, they rent or lease access to it with monthly or annual fees.
Alternatively, the service could be free and supported by advertising or other monetization processes. This is the heart of the business model for social media, for example.
So, SaaS is an old term that has been given new meaning in recent years. But it’s the recent definition that has really changed the way people think about and build software today.
Tech Tapas: Amazon S3
Amazon S3 is a highly durable, highly available file and object storage mechanism in the cloud. It is the go-to service for most companies that want to store huge quantities of data in the cloud, or that need long-term persistent object storage.
S3 was designed with the goals of being highly available, highly durable, and highly scalable. The design goal for availability is 99.99%, and the design goal for object durability is 99.999999999% (that’s 11 9’s).
How available? Four 9’s of availability translates to a total of about 52 minutes of downtime per year.
How durable? Eleven 9’s of durability means that if every man, woman, and child in the world had an object in S3, Amazon would, on average, lose only about one of those objects every 15 years.
These are amazing goals, and they are one of the reasons S3 has such a great reputation as a high-quality object storage system. S3 was one of the three initial AWS services and was a big part of AWS’s early success.
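The availability and durability figures above can be checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it assumes a 365-day year, and it reads "11 9’s of durability" as each object having a 10^-11 chance of being lost in a given year, independently of every other object (the independence is an assumption made for the sketch). The function names and the 7-billion object count are hypothetical inputs chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target (0 to 1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def years_between_losses(num_objects: float, nines: int) -> float:
    """Average number of years between single-object losses.

    Assumption: a durability of N nines means each object independently
    has a 10**-N chance of being lost in a given year.
    """
    annual_loss_probability = 10.0 ** -nines
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1.0 / expected_losses_per_year


if __name__ == "__main__":
    print(f"99.99% availability: {downtime_minutes_per_year(0.9999):.1f} minutes/year")
    # Hypothetical: one object per person, 7 billion people.
    print(f"11 nines, 7e9 objects: {years_between_losses(7e9, 11):.1f} years between losses")
```

Four 9’s works out to 0.0001 × 525,600 ≈ 52.56 minutes per year. With 7 billion objects the sketch gives roughly 14 years between losses; about 6.7 billion objects gives roughly the 15 years quoted above, so the figure is consistent with this reading of the durability goal.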

Future of Cloud Identity with Thomas Curran, p1

My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including serving as VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which owns a very popular open source, Go-based identity management library named Kratos, along with other open source identity management tools. Now, Thomas is a co-founder of Ory Corp, an Open Source Identity Infrastructure and Services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. In the interest of full disclosure, I’ve worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I’m currently working directly with Thomas at Ory, architecting their new cloud infrastructure.
Links and More Information
* Thomas Curran LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
* Ory (https://ory.sh)

Reducing the Risk of Your Cloud Migration

The scheduling of a cloud migration is a complex undertaking that should be thought through and planned in advance. Typically, a migration architect is involved and makes the difficult technical decisions about what to migrate when, working in concert with the organization’s management to take business needs into account.
But for a migration to be successful, it’s important that you limit your risk as much as possible during the migration, so that unforeseen problems don’t show up and cause your migration to go sideways, fail, or result in unexpected outages that negatively impact your business.
When scheduling the migration, there are a number of things you should keep in mind to increase the likelihood of a successful migration and reduce the risk of the migration itself. Here are five key methods for reducing the risk of your cloud migration, and hence increasing your overall chance of success.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Course: Building a Cloud Roadmap, 2018-2019 (https://leeatchison.com/classes/building-a-cloud-roadmap/)
Key #1. Limit the complexity of migrating your data
The process of migrating your data from your on-premises datastores to the cloud is, itself, the hardest, most dangerous, and most time-consuming part of your migration. There are many ways to migrate your data; some methods are quite complex, and some are very basic.
Some require no downtime at all; others require significant downtime to implement.
You need to make a tradeoff between the complexity of the migration process and the impact that complexity has on the migration, including the potential need for site downtime. While in some scenarios you must implement a complex data migration scheme to reduce or eliminate downtime and reduce risk along the way, in general I recommend choosing as simple a data migration scheme as possible given your system and business constraints. The more complex your data migration strategy, the riskier your migration.
By keeping the data migration process as simple as practical given your business constraints, you reduce the overall risk of failure in your migration.
Be aware, though, that you may require a certain level of migration complexity in order to maintain data redundancy and data availability during the migration itself, so the simplest possible migration process may not be available to you. Still, it’s important that you select the simplest migration process that achieves your business and technical migration goals.
Key #2. Reduce the duration of the in-progress migration as much as possible.
Put another way, do as much preparation work before you migrate as you can, and then, once you start the migration, move as quickly as possible to complete it, postponing as much work as possible until after the migration is complete and validated. By doing as much preparation work as possible before the migration and pushing as much cleanup work as possible until after it, you reduce the duration and complexity of the migration itself.
Given that your application is most at risk of a migration-related failure during the migration process itself, reducing this in-migration time is critical to reducing your overall risk.
For example, it may be possible to accept somewhat lower overall application performance in the short term (during the migration) in order to reach the end of your migration more quickly. Then, after the migration is complete, you can do some performance...
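One way to see why shortening the in-progress window matters (Key #2) is a simple probability model. The Python sketch below rests on an assumption that is not part of the episode: that migration-related incidents occur at a constant average rate while the migration is in progress (a Poisson model), so the chance of at least one incident over a window of length t is 1 - e^(-rate × t). The rate of 0.05 incidents per day is a hypothetical number chosen only for illustration.

```python
import math


def incident_probability(rate_per_day: float, days: float) -> float:
    """Chance of at least one incident during an in-progress migration.

    Hypothetical model: incidents arrive at a constant average rate while
    the migration is in progress (a Poisson process), so
    P(at least one incident) = 1 - exp(-rate * duration).
    """
    return 1.0 - math.exp(-rate_per_day * days)


if __name__ == "__main__":
    rate = 0.05  # hypothetical: on average, one incident per 20 in-migration days
    for days in (30, 10, 3):
        p = incident_probability(rate, days)
        print(f"{days:>2} days in progress: {p:.0%} chance of at least one incident")
```

Under these made-up numbers, cutting the in-progress window from 30 days to 10 lowers the modeled chance of an incident from roughly 78% to roughly 39%. Real migrations will not have a constant failure rate, but the direction of the effect is the point: less time in the risky state means less exposure.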

AWS Certifications with Kevin Downs, p2

Amazon Web Services provides a cloud certification program to encourage and enable you to grow your AWS cloud technical skills, helping you grow your career and your business. Have you wondered what it takes to become AWS certified?
In this episode, I conclude my interview with Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how best to use it.
And then, what was the first AWS service?
This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• AWS Certifications (https://aws.amazon.com/certification/)
• A Cloud Guru (https://acloudguru.com)
• Kevin Downs Twitter (https://twitter.com/kupsand)
• Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is the second and final part of my interview with Kevin Downs.

AWS Certifications with Kevin Downs, p1

Amazon Web Services provides a cloud certification program to encourage and enable you to grow your AWS cloud technical skills, helping you grow your career and your business. Have you wondered what it takes to become AWS certified?
In this episode, join me and Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how best to use it.
And then, what was EC2 like in the old days? Back before it was actually useful?
This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• AWS Certifications (https://aws.amazon.com/certification/)
• A Cloud Guru (https://acloudguru.com)
• Kevin Downs Twitter (https://twitter.com/kupsand)
• Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is part 1 of 2 of my interview with Kevin Downs.

Likelihood and Severity

Likelihood and Severity: two different measures for two different aspects of risk in a modern digital application.
They are both measures of risk, but they measure different things. What is the difference between likelihood and severity? And why does it matter?
In this episode, I’ll discuss likelihood and severity, how they are different, and how they are both useful measures of risk in a modern digital application.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Learning Path - Risk Management (http://leeatchison.com/classes/learning-path-risk-management/)
• O’Reilly Learning Path Course (https://learning.oreilly.com/learning-paths/learning-path-microservices/9781492061106/)
Microservice architectures offer IT organizations many benefits and advantages over traditional monolithic applications. This is especially true in cloud environments, where resource optimization works hand-in-hand with microservice architectures.
So it’s no mystery that so many organizations are transitioning their application development strategies to a microservices mindset. But even in the realm of microservices, building and operating an application at scale can be daunting.
Problems can range from something as fundamental as having too few resources and too little time to continue developing and operating your application, to underestimating the needs of your rapidly growing customer base. At its best, failure to build for scale can be frustrating. At its worst, it can cause entire projects, even whole companies, to fail.
Realistically, we know that it’s impossible to remove all risk from an application.
There is no magic eight ball, no crystal ball, that allows you to see into the future and understand how the decisions you make today will impact your application tomorrow. Risk will always be a burden to you and your application. But we can learn to mitigate risk. We can learn to minimize and lessen the impact of risk before the problems associated with it negatively impact you and your applications.
I’ve worked in many organizations, and have observed many more. Planning for problems is very hard, and it is something most organizations fail to do properly. Technical debt is often a nebulous concept. Quantifying risk is the first step to understanding vulnerability. It also helps set priorities and goals. Is fixing one potential risk more important than fixing another? How can you decide if the risks aren’t understood and quantified?
In this episode, we’re going to talk about how to measure risk, so that you can build, maintain, and operate large, complex, modern applications at scale.
There is a great quote by Donald Rumsfeld, who twice served as U.S. Secretary of Defense. It starts: “Reports that say that something hasn’t happened are always interesting to me,”
He goes on to say: “because, as we know, there are known knowns, there’re things we know we know. We also know there are known...
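To make the likelihood/severity distinction described above concrete, here is a minimal Python sketch of a risk list that rates each risk on both measures separately. The three-point LOW/MEDIUM/HIGH scale, the choice to multiply the two ratings into a combined score, and the tie-break on severity are all assumptions for illustration (one common convention, not necessarily the method discussed in the episode); the example risks are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point rating scale used for both measures."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Risk:
    name: str
    likelihood: Level  # how probable it is that the problem will happen
    severity: Level    # how much damage it does if it does happen

    @property
    def score(self) -> int:
        # Assumption: combine the two measures by multiplying the ratings.
        return int(self.likelihood) * int(self.severity)


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order risks by combined score, breaking ties in favor of higher severity."""
    return sorted(risks, key=lambda r: (r.score, r.severity), reverse=True)


if __name__ == "__main__":
    # Hypothetical example risks.
    risks = [
        Risk("Server crash", Level.HIGH, Level.LOW),
        Risk("Database corruption", Level.LOW, Level.HIGH),
        Risk("Bad deployment", Level.MEDIUM, Level.HIGH),
    ]
    for risk in prioritize(risks):
        print(f"{risk.name}: likelihood={risk.likelihood.name}, "
              f"severity={risk.severity.name}, score={risk.score}")
```

The design point is that the two measures are stored separately rather than only as a single number: a frequent-but-minor server crash and a rare-but-damaging database corruption end up with the same combined score, yet they call for very different responses, and keeping both ratings lets you see that.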

Five Causes of Poor Availability

Building a scalable application that has high availability is not easy. Problems can crop up in unexpected ways that can cause your application to stop working and stop serving your customers’ needs.
No one can anticipate where problems will come from, and no amount of testing will identify and correct all issues. Some issues end up being systemic problems that require the correlation of multiple systems in order for the problems to occur. Some are more basic, but are simply missed or not anticipated.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Application availability is critical to all modern digital applications. But how do you avoid availability problems? You can do so by avoiding the traps that cause poor availability.
There are five main causes of poor availability that impact modern digital applications.
Poor Availability Cause Number 1
Often, the main driver of application failure is success. The more successful your company is, the more traffic your application will receive. The more traffic it receives, the more likely you will run out of some vital resource that your application requires.
Typically, resource exhaustion doesn’t happen all at once. Running low on a critical resource can cause your application to begin to slow down, backlogging requests. Backlogged requests generate more traffic, and ultimately a domino effect drives your application to fail.
But even if it doesn’t fail completely, it can slow down enough that your customers leave. Shopping carts are abandoned, and purchases are left uncompleted.
Potential customers go elsewhere to find what they are looking for.
Increase the number of users on your system, or increase the amount of data those users work with in your system, and your application may fall victim to resource exhaustion. Resource exhaustion can result in a slow or unresponsive application.
Poor Availability Cause Number 2
When traffic increases, assumptions you’ve made in your code about how your application can scale are sometimes proven incorrect. You need to make adjustments and optimizations on the fly to resolve or work around those assumptions and keep your system performant. You need to change your assumptions about what is critical and what is not.
The realization that you need to make these changes usually comes at an inopportune time: when your application is experiencing high traffic and its shortcomings start to become exposed. This means you need a quick fix to keep things operating.
Quick fixes can be dangerous. You don’t have time to architect, design, prioritize, and schedule the work. You can’t think it through to make sure this change is the right long-term change. You need to make changes now to keep your application afloat.
These changes, implemented quickly and at the last minute with little or no forethought or planning, are a common cause of problems. Untested or lightly tested fixes, hastily thought-through fixes, and bad deployments caused by skipping important steps can all introduce defects into your production environment. The very fact that you need to make changes to maintain availability will itself threaten your availability.
Poor Availability Cause Number 3
When an application becomes popular, your business needs usually demand that your...
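The domino effect described under Cause Number 1 (a resource runs low, requests back up, and backlogged requests generate more traffic) can be illustrated with a tiny simulation. This Python sketch is a hypothetical model, not a description of any real system: it assumes a fixed number of new requests per tick, a fixed capacity per tick standing in for the exhausted resource, and that a fixed fraction of the requests still waiting are retried each tick, adding duplicate load.

```python
from __future__ import annotations


def simulate_backlog(arrivals: float, capacity: float,
                     retry_fraction: float, ticks: int) -> list[float]:
    """Return the request backlog at the end of each tick.

    Hypothetical model: `arrivals` new requests arrive every tick, at most
    `capacity` requests are served per tick, and `retry_fraction` of the
    requests still waiting in the backlog are retried, adding duplicate load.
    """
    backlog = 0.0
    history: list[float] = []
    for _ in range(ticks):
        retries = backlog * retry_fraction     # backlogged requests generate more traffic
        demand = backlog + arrivals + retries  # everything waiting to be served this tick
        served = min(demand, capacity)         # the exhausted resource caps throughput
        backlog = demand - served              # whatever is left carries over
        history.append(backlog)
    return history


if __name__ == "__main__":
    print("5% under capacity:", simulate_backlog(95, 100, 0.1, 5))
    print("10% over capacity:", [round(b, 1) for b in simulate_backlog(110, 100, 0.1, 5)])
```

Below capacity, the backlog stays at zero. Once demand exceeds capacity by even 10%, the backlog not only grows but grows faster each tick, because the retries it triggers are themselves added to the load. That accelerating growth is the sketch’s version of the domino effect.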

Why you must scale

We often hear that being able to scale your application is important. But why is it important? Why do we need to be able to suddenly, and without notice, scale our application to handle double, triple, or even ten times the load it is currently experiencing?
Why is scaling important?
In this episode, I am going to talk about four basic reasons why scaling is important to the success of your business.
And then, what is the dynamic cloud?
This is Application Scaling, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Why you must scale
There are many reasons why our applications must scale. A growing business need is certainly one important reason. But there are other reasons why architecting your application so that it can scale is important for your business.
Reason #1. Support your growing business
This is the first and most basic reason why your application has to scale. As your business grows, your application needs grow. But there is more to it than that. There are three aspects of a growing business that impact your application and require it to scale.
First is the most obvious. As you get more customers, your customers make more use of your applications and need more access to your website.
This requires more capacity and more growth in the IT infrastructure for your sites.
But that’s not the only aspect.
As your application itself grows and matures, you will typically add more and more features and capabilities to it. Each new feature and each new capability means customers will make more use of your application. As each customer uses more of your application, the application itself has to scale. Simply by your business maturing over time, even if your customer base doesn’t grow, the computation needs of your application grow, and your application must scale.
And finally, as your business grows and matures, and your application grows and matures, your more complex application will require more engineers to work on it simultaneously, and they will work on more complex components. Your application might be rearchitected to be service based. It might add additional external dependencies and provisions. You will have to support more deployments and more updates. Your application and your application infrastructure will need to scale to support larger development teams and larger projects.
This means you need more mature processes and procedures to scale the speed at which your larger team can improve your application.
Reason #2. Handle surprise situations
The second reason you need to be able to scale your application is to handle surprise situations and conditions. All businesses have their biggest days. These are the days when traffic is at its heaviest. These are days like Black Friday in retail, or the day of the Super Bowl for...

Risk Management with Ken Gavranovic

Ken Gavranovic was the Executive Vice President and GM for product at New Relic. In early 2019, Ken and I were in Boston together for an event, and we recorded an interview discussion about Risk Management in modern digital applications.
Both Ken and I have experience dealing with Risk Management issues in current and past assignments. I discuss Risk Management in my book, Architecting for Scale. Ken used a very similar risk management technique in his past corporate management gigs. In this interview, we compare notes and make recommendations on best practices for Risk Management that everyone can use.
Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Risk Management with Ken Gavranovic Video (https://leeatchison.com/2019/02/06/managing-risk-in-modern-enterprise-applications/)
• Ken Gavranovic Twitter (https://twitter.com/kgavranovic)
• Ken Gavranovic LinkedIn (https://www.linkedin.com/in/gavranovic/)
Risk Management Interview
Ken: I know we both talk to a lot of customers. One of the questions is, where do I get started? What are some of the patterns we see in enterprises and in our own experiences? We have an awesome opportunity to talk to a lot of companies doing digital transformation, but what is something that I can just go do tomorrow to get started?
Lee: One of the things I find very easy to wrap your mind around is risk management. How do you build a risk matrix to track the issues and the risks you have within your system? I like to talk to companies about that because it gets people starting to think about what their system is doing, what problems they have, and how they deal with them.
It gets them thinking beyond just the problem/resolution cycle, and more into a pro/con and risk assessment process. What is the benefit of fixing something versus the benefit of mitigating it versus the benefit of simply ignoring it? I like to talk about that because it gets conversations going within the company about the sorts of things that are important to them.
Creating a risk matrix is an important first step for anyone who is trying to improve their availability, improve their scalability, or modernize their application in many different ways. It helps you get a grip on the issues that already exist in your system and what you are currently doing to manage those risks.
Ken: I 100% agree. I remember in a previous role, I had a couple-hundred-million-dollar project, and I had some challenges. We created a risk matrix, which helped us solve those challenges. So I thought it might be helpful for people watching this video. Let’s double click and see what this might look like.
From my perspective, the key questions need to be asked in a bottoms-up way, not top-down. Agreed?
Lee: Yes, definitely.
Ken: It’s not the people at the top of the organization who are giving you the answers. It’s the team level that gives you the answers you need. Let me give you my shot, and tell me where I miss.
First of all, the things that can go into the risk are the things that can go bump in the...

How to Improve Application Availability, p2

Modern applications require high availability. Our customers expect it, our customers demand it. But building a modern scalable application that has high availability is not easy and does not happen automatically. Problems happen. And when problems happen, availability suffers. Sometimes availability problems come from the simplest of places, but sometimes they can be highly complex.

In this episode, we continue our discussion from last week with the remaining strategies of the five for keeping your modern application highly available.

This is How to Improve Application Availability, on Modern Digital Applications.

Links and More Information

The following are links mentioned in this episode, and links to related information:

Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Robinhood Announcement (https://blog.robinhood.com/news/2020/3/3/an-update-from-robinhoods-founders)

How to Improve Availability, Part 2

Building a scalable application that has high availability is not easy and does not come automatically. Problems can crop up in unexpected ways that can cause your application to stop working for some or all of your customers. No one can anticipate where problems will come from, and no amount of testing will find all issues. Many of these are systemic problems, not merely code problems.

To find these availability problems, we need to step back and take a systemic look at our applications and how they work.

What follows are five things you can and should focus on when building a system to make sure that, as its use scales upwards, availability remains high. In part 1 of this series, we discussed two of these focuses. The first was building with failure in mind. The second was always thinking about scaling.
In part 2 of this series, we conclude with the remaining three focuses.

Number 3 - Mitigate risk

Keeping a system highly available requires removing risk from the system. When a system fails, the cause of the failure could often have been identified as a risk before the failure actually occurred. Identifying risk is a key method of increasing availability.

All systems have risk in them. There is risk that:

A server will crash
A database will become corrupted
A returned answer will be incorrect
A network connection will fail
A newly deployed piece of software will fail

Keeping a system available requires removing risk. But as systems become more and more complicated, this becomes less and less possible. Keeping a large system available is more about managing what your risk is, how much risk is acceptable, and what you can do to mitigate that risk.

This is risk management, and it is at the heart of building highly available systems. Part of risk management is risk mitigation. Risk mitigation is knowing what to do when a problem occurs in order to reduce the impact of the problem as much as possible. Mitigation is about making sure your application works as well and as completely as possible, even when services and resources fail. Risk mitigation requires thinking about the things that can go wrong and putting a plan together now, so you can handle the situation when it does happen.

For example, consider a typical online e-commerce store. Being able to search for products is critical to almost any online store.
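A risk matrix can start as a simple list of risks, each with a likelihood, a severity, and a planned mitigation. The Python sketch below is one minimal way to keep such a list, assuming a hypothetical 1 to 5 scale for likelihood and severity and a likelihood × severity score for ranking. The five risks come from the list above; the numbers, the `accept_below` threshold, and the example mitigations are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int       # hypothetical scale: 1 (rare) to 5 (frequent)
    severity: int         # hypothetical scale: 1 (minor) to 5 (full outage)
    mitigation: str = ""  # planned response; empty means no plan yet

    @property
    def score(self) -> int:
        # Simple likelihood x severity product used to rank risks.
        return self.likelihood * self.severity


def triage(risks, accept_below=6):
    """Rank risks highest score first and label how each one is handled.

    accept_below is a hypothetical cutoff for "acceptable" risk.
    """
    plan = []
    for risk in sorted(risks, key=lambda r: r.score, reverse=True):
        if risk.score < accept_below:
            action = "accept"
        elif risk.mitigation:
            action = "mitigate"
        else:
            action = "needs plan"  # fix the root cause or write a mitigation
        plan.append((risk.description, risk.score, action))
    return plan


# The five risks listed above, with made-up likelihood/severity values.
risks = [
    Risk("A server will crash", 4, 2, "run spare instances behind a load balancer"),
    Risk("A database will become corrupted", 2, 5, "restore from tested backups"),
    Risk("A returned answer will be incorrect", 2, 3),
    Risk("A network connection will fail", 3, 1),
    Risk("A newly deployed piece of software will fail", 3, 4, "roll back the deployment"),
]

if __name__ == "__main__":
    for row in triage(risks):
        print(row)
```

The output is a ranked list that forces the fix, mitigate, or accept decision for each risk. A plain product score can understate rare but catastrophic risks, so it can be worth reviewing every high-severity item regardless of its score.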

How to Improve Application Availability, p1

Modern applications require high availability. Our customers expect it, our customers demand it. But building a modern scalable application that has high availability is not easy and does not happen automatically. Problems happen. And when problems happen, availability suffers. Sometimes availability problems come from the simplest of places, but sometimes they can be highly complex.

In this episode, we will discuss five strategies for keeping your modern application highly available.

This is How to Improve Application Availability, on Modern Digital Applications.

Links and More Information

The following are links mentioned in this episode, and links to related information:

Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)

How to Improve Availability, Part 1

Building a scalable application that has high availability is not easy and does not come automatically. Problems can crop up in unexpected ways that can cause your application to stop working for some or all of your customers. These availability problems often arise from the areas you least expect, and some of the most serious availability problems can originate from extremely simple sources.

Let’s take a simple example from a real-world application that I’ve worked on in the past. This problem really happened.

The software was a SaaS application. Customers could log in to the application and receive a customized experience for their personal use. One of the ways customers could tell they were logged in was that an avatar of themselves appeared in the top right-hand corner. It wasn’t a big deal, but it was a handy indicator that you were receiving a personalized environment.
We’ve all seen this sort of thing; it’s pretty common in online software applications nowadays.

Anyway, by default, when we showed the page, we read the avatar from a third-party avatar service that told us which avatar to display for the current user. One day, that third-party system failed. Our application, which made the poor assumption that the avatar service would always be working, also failed. Simply because we were unable to display a picture of the user in the upper right-hand corner, our entire application crashed and nobody could use it. It was, of course, a major problem for us. It was made harder because the avatar service was out of our control. Our business was directly tied to a third-party service we had no control over, and we weren’t even aware of the dependency.

A very minor feature crashed our entire business…

Our business crashed because of an icon.

Obviously, that was unacceptable.

How could we have avoided this problem? There were a thousand solutions. By far the easiest would have been to detect any failure of the third-party service in real time and, if it failed, show a generic default avatar instead. There was no need to bring down our entire application over this simple problem. A simple check, some error recovery logic, and some fallback options are all it would have taken to avoid crashing our entire business.

No one can anticipate where problems will come from, and no amount of testing will find all issues. Many of these are systemic problems, not merely code problems.

To find these availability problems, we need to step back and take a systemic look at our applications and how they work.

What follows are five things you can and should focus on when building a system to make sure that, as its use scales upwards, availability...
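The avatar fallback described in this episode can be sketched in a few lines. This is a minimal Python sketch, not the code from that application: `avatar_for`, `DEFAULT_AVATAR_URL`, and the injected `fetch_avatar` function are hypothetical names, and the fetcher stands in for whatever client calls the third-party avatar service. A real fetcher should also use a timeout, so that a hung service degrades to the default image rather than stalling the page.

```python
from typing import Callable

# Hypothetical generic image served by our own application.
DEFAULT_AVATAR_URL = "/static/default-avatar.png"


def avatar_for(user_id: str, fetch_avatar: Callable[[str], str]) -> str:
    """Return the user's avatar URL, or a generic default if the
    third-party avatar service fails for any reason."""
    try:
        url = fetch_avatar(user_id)
    except Exception:
        # Service down, timed out, or misbehaving: degrade gracefully
        # instead of letting the error take down the whole page.
        return DEFAULT_AVATAR_URL
    # An empty answer is treated as a failure too.
    return url or DEFAULT_AVATAR_URL


# Stub fetchers standing in for the third-party service.
def working_service(user_id: str) -> str:
    return f"https://avatars.example.com/{user_id}.png"


def broken_service(user_id: str) -> str:
    raise ConnectionError("avatar service unavailable")


if __name__ == "__main__":
    print(avatar_for("alice", working_service))
    print(avatar_for("alice", broken_service))
```

Catching a broad exception is deliberate here: the avatar is cosmetic, so any failure in that dependency should cost the user nothing more than a generic picture.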

How to maintain availability with multiple AWS accounts

We know that using multiple availability zones helps increase your application’s availability and resiliency by distributing it across multiple dispersed data centers. But did you know that availability zones don’t necessarily give you the separation you expect? In fact, it is entirely possible to have two instances of a service running in two distinct availability zones, but actually have them running in the same data center, in the same physical rack, and possibly even on the same physical server! How can this be? And even more importantly, how can we avoid it? The answer involves understanding how availability zones work and how they are structured.

And then, one of the oddest cloud services ever created is also one of the first cloud services. Before AI and before machine learning, humans actually powered a part of the cloud.

This is Life with Multiple AWS Accounts.

Links and More Information

The following are links mentioned in this episode, and links to related information:

How to maintain availability when using multiple AWS accounts (https://www.infoworld.com/article/3444860/5-pain-points-of-modern-software-development-and-how-to-overcome-them.html)
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)

Distributing Your Application

When building a modern, high-performance application at scale, it’s important to make sure the individual application instances are distributed across a variety of data centers so that if any given data center goes offline, the application can continue to function relatively normally. This is an industry-wide best practice, and an important characteristic to architect into your applications to make them sufficiently resilient to data center problems.

The same philosophy applies when you build your application in the cloud.
But when you build a cloud-based application, you typically do not have visibility into which data center a particular server or cloud resource is located in. This is part of the abstraction that gives the cloud its value.

Not having visibility into which data centers your application is operating in makes it difficult to build multi-data-center resiliency into your applications. To solve this problem, AWS created a cloud abstraction of the data center that lets you build in this level of resiliency without being exposed to the details of data center location. The abstraction is the availability zone.

AWS availability zones

An AWS availability zone is an isolated set of cloud resources that lets you build a specific level of isolation into your applications. Resources within a single availability zone may be physically or virtually near each other, to the extent that they can be dependent on each other and share subcomponents. For example, two EC2 servers in the same availability zone may be in the same data center, in the same rack, or even on the same physical server.

However, cloud resources that are in different availability zones are guaranteed to be in distinct data centers. They cannot be in the same data center, they cannot be in the same rack, and they cannot be using the same physical servers. They are distinct and independent from each other.

Hence, the solution to the resiliency problem: you can build your application to live in multiple...
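One documented way two "distinct" zones can turn out to be the same place is when multiple AWS accounts are involved: AWS has stated that availability zone names such as us-east-1a may map to different physical zones in different accounts, while AZ IDs such as use1-az1 refer to the same physical zone in every account. The Python sketch below assumes made-up name-to-ID mappings for two hypothetical accounts, compares placements by zone ID rather than by name, and picks a zone that is not already in use. In a real system, the mappings would come from the EC2 DescribeAvailabilityZones API, which returns both ZoneName and ZoneId.

```python
from collections import Counter

# Hypothetical zone-name -> zone-ID mappings for two accounts in us-east-1.
# Real values come from the EC2 DescribeAvailabilityZones API (ZoneName and
# ZoneId fields) and can differ from account to account.
ZONE_IDS = {
    "account-a": {"us-east-1a": "use1-az1", "us-east-1b": "use1-az2", "us-east-1c": "use1-az4"},
    "account-b": {"us-east-1a": "use1-az2", "us-east-1b": "use1-az1", "us-east-1c": "use1-az4"},
}


def physical_zones(placements):
    """Translate (account, zone_name) placements into zone IDs."""
    return [ZONE_IDS[account][zone] for account, zone in placements]


def shared_zones(placements):
    """Zone IDs that host more than one of the given instances."""
    counts = Counter(physical_zones(placements))
    return sorted(zone_id for zone_id, n in counts.items() if n > 1)


def pick_zone(account, placements):
    """Pick a zone name in `account` whose zone ID is not already in use,
    or None if every zone in that account is already occupied."""
    used = set(physical_zones(placements))
    for name, zone_id in sorted(ZONE_IDS[account].items()):
        if zone_id not in used:
            return name
    return None


if __name__ == "__main__":
    # Different zone names, but the same physical zone (use1-az1).
    placements = [("account-a", "us-east-1a"), ("account-b", "us-east-1b")]
    print(shared_zones(placements))
    print(pick_zone("account-b", placements))
```

Comparing by zone ID is the key design choice: zone names are only meaningful within one account, so any placement logic that spans accounts should reason about IDs.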

Special Edition: The Great Serverless Debate, Redux — Special Guest Clay Smith, part 2

This is a special edition of Modern Digital Applications.

On July 9th, 2018, the “Modern Software Podcast”, a podcast sponsored by New Relic and hosted by New Relic’s Fredric Paul and Tori Wieldt, launched an episode titled “The Great Serverless Debate”. It was a debate between me and a good friend of mine, Clay Smith. Clay and I were guests on the show.

That episode was a huge success, and I still get asked questions about it today. It seemed to me that it was time for an update of that debate…a redux, if you will. Since New Relic’s Modern Software Podcast isn’t active right now, I thought I would take on the challenge myself and host a redo of the great debate, based on what we know about serverless in 2020.

So, on February 21, 2020, Clay and I got together to update our views on the world of serverless.

This is The Great Serverless Debate, Redux. This is the second part of that interview.

Links and More Information

The following are links mentioned in this episode, and links to related information:

Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
The Great Serverless Debate (https://leeatchison.com/2018/07/11/the-great-serverless-debate/)
Clay Smith - Twitter (https://twitter.com/smithclay)
Clay Smith - LinkedIn (https://www.linkedin.com/in/smithclay/)
Clay Smith’s Monitoring Monitoring Newsletter (https://monitoring2.substack.com/)
Modern Software Podcast - Fredric Paul @TheFreditor (https://twitter.com/TheFreditor)
Modern Software Podcast - Tori Wieldt @ToriWieldt (https://twitter.com/ToriWieldt)

Lee’s Guest — Clay Smith

Clay Smith is a good friend of mine.
He was a senior software engineer at several early-stage startup companies and has been building serverless solutions for many years now, from mobile backends to real-time APIs with Amazon Web Services. Clay was a senior Developer Evangelist at New Relic, which is where Clay and I met.

Clay’s newsletter is “Monitoring Monitoring”. You can subscribe to the newsletter at https://monitoring2.substack.com/.

Questions/Issues Discussed

Is Lambda living up to the hype?
Is there an end to the hype anytime soon?
Has Fargate lived up to the hype?
What is the role of containers vs FaaS?
What is the role of Kubernetes?
What types of problems are suited for FaaS, and what kinds of problems are not?
How good were our guesses in 2018 for the state of serverless in...