Thursday, July 3, 2008

Architecture on Agile projects- define a baseline architecture

One of the questions that frequently comes up when people move from waterfall to iterative incremental software development (Scrum, XP), is how architecture fits into this new world. Most people of the opinion that architecture needs to be in place before you can start iterative incremental development. My answer is yes and no. I will try and explain what I mean by that. This will be first in the series.

Whether we are building a simple application or a complex enterprise application, we need to have a starting point. The starting point of an architecture is defining a baseline.

The baseline architecture. Yes, this needs to be in place before we start our first sprint. In sprint 0, we will define our baseline architecture as we build our product backlog. Some of the things that are included in this baseline at this stage are,

  1. What software platform to use? We need to decide whether to use Java/J2EE, or .Net, or Ruby on Rails, or some other technology. We also need to select what version of the technology to use. Do we want to use the latest stable release or beta release of the technology? The decision depends on many factors. Among them are type of application, existing standards (companies may already have made this decision as part of enterprise architecture), experience and expertise of the developers as well as support people, strategic partnership etc.

  2. What hardware platform to use? This decision depends on the decision from the previous question. If we select .Net, then we all know we will have to use Wintel hardware. If we select Java or RoR, the choices are more- Lintel, Sun, AIX, etc. However, we most likely will choose Lintel. We may also have to make a decision about the vendor we will get the hardware from. Most of the time companies have existing relationship with a vendor.

  3. Where to host? We know what technology we are using as well as which platform to run the application on. How about where will the servers be located? Do we want to put the server in-house or do we want to outsource them to one of those hosting providers. If we already have an in-house data center, the answer is clear (even though I am not sure why most companies would want to continue to own and operate their data center now-a-days). If we decide to go with a data center provider, we have a number of options at our disposal- shared, virtual dedicated, dedicated, and cloud. However, who do we select as our data center provider depends on whether or not the hosting provider can support the growth of your application (scalability) in addition to pricing, and reputation. We do not need to know the exact load profile of our application to understand how the platform will scale. But, we need to have an idea about the expected growth of usage of the application and the type of users.

  4. What architecture style and pattern to use? Architectural style sets the tone of the solution whereas Architectural patterns, like design patterns, provides a proven skeleton for implementing the solution following the style. The most popular style is n-tier architecture. In recent years, we have seen growing popularity of SOA (service oriented architecture), and RESTful(representational state transfer) style of architecture. However, the mother of all architecture pattern is MVC- model, view, controller. We cannot go wrong with this and it is applicable to most situations. Other patterns to consider in conjunction with MVC are hub-and-spoke, pipe-and-filter, and publish and subscribe.

  5. What technology tools to use? Most likely you would build the solution from using one or more off-the-shelf products/components/tools, commercial or open-source, instead of doing everything from scratch. Some of the type of tools that you would want to consider are- MVC framework, ORM framework, logging library, instrumentation framework, rule engine, workflow engine, integration engine, BPM, reporting engine, ETL etc. Some of these tool selection may involve a formal vendor selection process. However, the baseline architecture should include a short list of 2/3 choices.

  6. How to handle security of the solution? I have listed this as a separate item to consider for the baseline architecture because of its importance. Even though we all understand the importance of security of any solution, we tend to put it in the back burner. We developers usually like to use adminstrative access (turn off security) to avoid the headache of dealing with security issues when we write codes, which is at odds with "Principle of Least Privilege." The best practice is to start with no previliage and open up just enough to get the job done. The baseline, at a minimum, should define how to perform authentication and authorization as well as data encryption. For example, we need to decide whether to use windows or database or LDAP for authentication and authorization. Also, we need to decide whether to encrypt communication and when to encrypt (e.g. HTTPS).

  7. What development tools to use? Even though it is not part of the solution and hence is not part of the architecture, it assists in implementing the architecture. Also right development tools ensure efficient and quality delivery. The is why typically the architect is on the hook to drive these decisions. The tools are SCM (software configuration management), IDE, testing (unit, integration, functional) frameworks, build tools, project management & collaboration tool (e.g., ScrumPad), and design tool (e.g., Rational Rose, Visio).

  8. Setup of separate development, and test environments. Typically developers write codes on PC/laptop and then push the codes to integrate and run in a separate environment for testing before the application can be put in the production environment for actual use. The test environment must replicate the setup of the production environment. The only difference between the two environments is that the production is setup to handle the expected load. That is why it is important to define a unit footprint of the solution to help setup the test environment. This also allows you to define how to scale your solution in production as well as determine when to scale (in terms of the load supported by the unit of the solution). For example, say your unit of the solution can handle x load, and you need to support 5x load, you know you need to get 5 units of your solution. Even though it may seem common sense , you will be surprised to find out that how many companies do not have this in place. Yes, there are situations when you cannot have the test environment exactly replicate the production environment. This is usually the case when legacy systems are involved. For development environment, we also need to make sure that the developers can integrate with rest of the codes easily. We need to make sure a developer can have access to a representative instance of each element of the architecture. For example, if the solution is using SQL server enterprise edition, the developer may choose to use SQl server express. Sometimes, we may not have separate instance for each component of the solution for individual developers, we may need to make an effort to come up with a work-around. For example, you may need to integrate with a Mainframe application, but it may not be accessible from a developer's laptop because of the absence of network access. You could use the test environment to allow developers to access the mainframe application from their laptops.

I know some of these are common sense elements of any solution, yet I am surprised to see how many projects do not have these in place. We should spend a lot of time to define the baseline. Some of the decisions are related and one decisions will help narrow our choices for the others. Others are purely individual (or organizational) preferences and we should be fine either way. Once we have the baseline defined, the detailed as well as additional elements can be iteratively fleshed out. I will discuss how to evolve a baseline architecture iteratively.