Non-Functional Requirements
Standard Artefact
Tags: architecture, requirements
This article has been replaced with a newer one, which can be found here.
In no particular order, these are the standard non-functional requirements I have used for years. Not all will apply in all circumstances, so like much in life you’ll have to use a bit of common sense, but they form a useful check-list. Each NFR is followed by a description and some discussion on how you might like to satisfy yourself that it’s being met.
As a means of checking NFRs have been met, I use the term ‘walk through’ quite a bit - I do not mean full Fagan-style inspections. If you’re into Agile, then assume that I mean frequent communication and checking of the sort you get doing pair programming, but don’t ignore the benefits, even on an agile project, of spending half a day reviewing where you’ve got to and using multiple brains to poke holes in the design and code. Much of this may sound bureaucratic or heavyweight, but remember these are just tools to help you improve the fitness of what you put out. Make it transparent to the process and you’ll hardly notice it’s happening. But do make sure these are debated early in the project, and make sure business reps are thinking about them. Use ‘what if this sort of thing went wrong’ analogies to get them interested and make the benefits case clear for getting them right.
Business Analysts should collate, question, and prioritise them (for each logical component or sub-system in the solution), just as they would functional requirements, but whereas it’s a developer’s role to meet functional requirements, it is a joint development/architecture role to meet NFRs. Pretty much all NFRs are interdependent - for example, you can’t have high expectations around availability if you completely neglect reliability, as it will only take a couple of boxes to get themselves in a pickle for availability to be affected. NFRs are a vehicle for the operational characteristics of a solution to be defined and then discussed; none of them mean very much in isolation.
The term ‘component’ is used here rather than ‘solution’ or ‘application’, but most NFRs could (and should) be applied at all levels, unless explicitly noted otherwise.
ID | Name | Definition | Example | Notes |
1 | Availability | The time the component is fully operational, as a percentage of total time | Component C will be fully operational for P % of the time over a continuous measured period of 30 days (for example, P = 99.9 is equivalent to about 43 minutes of downtime) |
Don't fall into the trap of allowing phrases like "24x7" for availability. There's no such thing (unless you have an astronomical budget).
Steer the business instead to use percentages: like "99% available",
defined against a time period (e.g. "per calendar month"), and illustrate
what the 1% downtime might mean in lost minutes. Walk through the solution regularly and look for single points of failure (SPOFs). Be prepared for there being different availability requirements for different components in the same solution - but again highlight dependencies between components that may not be available at the same time. |
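To illustrate the lost-minutes arithmetic, here is a minimal Python sketch. The function names and sample targets are invented for illustration, and the combined figure assumes the components sit in series and fail independently, which is only a rough approximation of most real solutions.

```python
def downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, permitted by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


def serial_availability(percentages) -> float:
    """Combine availabilities of components that must all be up at once.

    Assumption: failures are independent, so the fractions simply multiply.
    """
    combined = 1.0
    for pct in percentages:
        combined *= pct / 100.0
    return combined * 100.0


for target in (99.0, 99.9, 99.99):
    minutes = downtime_minutes(target)
    print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")

# Two dependent components with different targets give a lower overall figure.
print(f"Combined availability: {serial_availability([99.9, 99.5]):.4f}%")
```

Showing the business the minutes rather than the percentage tends to make the trade-off between cost and downtime much easier to discuss.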
2 | Capacity | The component consistently maintains all other NFRs as data load demands vary |
Multiple formats along the lines of: Component C will provide sufficient capacity for U users, each with M MB of files; Component C will provide sufficient capacity for archived reports for MM calendar months at a report creation rate of R per calendar month |
How do you know when capacity is OK? When the other NFRs aren't causing you
grief. "Load" here represents data (accounts, reports, general storage) and not activity, which is covered under scalability later. Work closely with operations staff to get this right. Take measurements of all existing and legacy systems, look at historic growth as well as predicted (remember that business forecasts can be optimistic and pessimistic). Plan for peaks, and have a plan in place for what you will do when capacity is exceeded - let's face it, one day capacity will be exceeded. Often you can't afford all the capacity you think you'll need, so cost out what you can, and make sure any remaining risks and issues are well understood by the business. |
3 | Concurrency | The component maintains all other NFRs when under multiple simultaneous loads | Component C will support a concurrent group of U users running pre-defined acceptance script S simultaneously | Here we're looking at business operations being thread-safe, with their state well-managed. You can prove concurrency with performance and soak tests, but code walk-throughs are a very effective preventative measure. The more operations you can run concurrently, the better overall performance you'll get, but there's often a bottleneck somewhere - database connections, serial access to a legacy system, etc. Work out what your code will do if it has to sit in a queue for ages. Making everything asynchronous is rarely a realistic option, so create management plans within the software that will handle situations where access to a constrained resource approaches deadlock. |
4 | Enhanceability | The component supports changes that improve existing business logic, or functionality | This is not a measurable requirement, unless you already know the enhancements. If you do then it's simply a case of stating: Component C must support adding additional feature F using less than M man-days of effort. Whether this is then true is so hard to measure as to be practically unworkable |
Existing code is easier to change if it is isolated from other code,
so look for dependencies, and seek to pass handles to these into
functions (in classic IoC style).
It's also easier to change if it's logically compartmentalised - i.e. it's located where you'd expect it. My suggestion for sign-off here is just to put a logical plan in place and walk the business through it. If they feel you are leaving too much work for the future at least a debate can be had. Less work later often means more work now, so it's all about trade-offs. |
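As a small illustration of passing a dependency handle into a function, here is a hedged Python sketch. The pricing example, the function names and the tax rates are all hypothetical; the point is only that the function receives its collaborator as an argument rather than reaching out for it.

```python
from typing import Callable


def price_with_tax(net: float, region: str,
                   tax_rate_lookup: Callable[[str], float]) -> float:
    """Compute a gross price; the tax lookup is handed in, not imported."""
    return round(net * (1.0 + tax_rate_lookup(region)), 2)


def fixed_rates(region: str) -> float:
    """Hypothetical stand-in for a real tax service."""
    return {"UK": 0.20, "DE": 0.19}[region]


# Production code passes the real lookup; a test can pass a stub instead.
print(price_with_tax(100.0, "UK", fixed_rates))
print(price_with_tax(100.0, "anywhere", lambda region: 0.0))
```

Because the dependency arrives as a parameter, swapping the tax source later touches the caller and not the pricing logic.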
5 | Extensibility | The component supports changes that add new business logic, or functionality, to cater for previously unknown situations | Not a measurable requirement. See the notes on how to improve your chances, but don't get caught up in splitting hairs on whether this has been 'met' or not. It's far more likely that so much will change nobody will remember what you meant by it anyway |
The magic phrase here is 'unknown situations', and in practice there's
as much luck as judgment in catering for the unknown, but it's not impossible to make future work easier. Businesses don't change as much as they think they do, and when they do, everybody knows about it. This is to your advantage, as it means you only need to deal with the former (given that when you are asked to adapt your project time-recording software to work as a content management system, you can sob like a baby and ask for a mountain of cash to meet this outrageous demand). What you are looking for is potential reuse and reasonable granularity - both good building blocks for the future. But don't aim for 'reuse everywhere' (a danger in SOA these days) - it ain't reusable until it's being reused. |
6 | Interoperability | The component communicates with neighbouring components, without undue overhead, to meet all other NFRs | The word 'undue' is somewhat subjective. As for extensibility, use the notes as a guide. Components will either work or they won't (and continuous integration will show this early), but what amounts to undue overhead should be the preserve of the Principal Architect or TDA |
Most projects have to deal with heterogeneous environments:
shrink-wrapped applications talking to bespoke software, legacy, new, third parties, etc.
Each one should get its data across these chasms in the most efficient and standardised way practicable. Practicable means making a compromise between having an enterprise data model used everywhere (giving rise to excess translations into and out of it, for anything but the bespoke applications) and using native lowest common denominator formats, which lead to inextensible, point-to-point integration. |
7 | Strategy Compliant | All components are signed off as meeting the strategic goals of the organisation | Yes - another that can't really be defined by example. All organisations should have some kind of strategy, and someone must own it, so ask them. |
Hard to measure this one, except by going and asking your
group/enterprise architecture team if they are cool with your choices (in many organisations you'll know this already as they will have loaded
you up with seven tons of impossibly optimistic PowerPoint-driven 'strategic' garbage before you even had time for a project kick-off). But let's play nice for a second. Many strategic choices are cost based. Sure, make an argument to use a novel app server if it adds real value to your project, but if using the standard one won't kill you, then concentrate your efforts on making your code better. Turf wars between project and enterprise architects aren't worth it. I've been on both sides, and in my experience all they do is act as distractions from the main event. Look on corporate standards as a challenge, like integrating with the 1970s mainframe that nobody on the project understands. Make it fun. You'll ultimately enjoy it more and live a bit longer. |
8 | Latency | A business operation with a clear start and end point completes within a predefined time period | Acceptance script S completes within T seconds on an unloaded system, and within T2 seconds on a system running at maximum capacity as defined in the Concurrency requirement |
I don't do 'performance' as an NFR, so this is one of the
sub-requirements that many people mean when they say it:
"I click the send button, and the results return to my screen
in less than five seconds" etc. It's not worth having hundreds of these on a project because, like strategic compliance, meeting them can become a distraction. A few key operations (especially the ones that customers will benchmark you by) are enough. Any other overly slow operations can be tweaked later. As per concurrency, walkthroughs are useful here - follow the data through each sub-system and across each interface, look at all the things that could slow it down (especially other operations of the same type). |
9 | Longevity | The life span of the component is compatible with the business's Application Portfolio Management (APM) plan | Add the owner of the APM plan to the reviewer list of the high level architecture documents that detail which components are new, changing (and to what extent) and being decommissioned |
An Application Portfolio what? Indeed. Well, it never ceases to amaze me how many organisations don't have one. It's basically a multi-year plan showing exactly how strategic each application is, and thus when it will be replaced/updated. If your call centre support suite is a pile of poo that can't be extended any more, it's pretty safe to say it's not strategic. Therefore your business shouldn't be investing too much in it. Therefore you don't want to build any castles on that sand. Therefore your architecture should reflect that. |
10 | Maintainability | The component has the ability to undergo routine maintenance operations in a manner that doesn't conflict with other NFRs | List the maintenance activities that the ops team regularly perform (often a good exercise in itself) and define how and when they are required to be performed (elapsed time, time of day, etc) |
Does the component need to be maintained (upgraded, log files
cleansed, backed-up, etc.) while the business is running?
Would shutting it down require action on other
applications or components? The best way to deal with this NFR is to start with operational staff and work through a likely maintenance schedule, then take this to the business and run through maintenance scenarios. Don't forget unplanned maintenance (for example, when a database version is updated and regression tests didn't pick up the fact that it requires a software upgrade to your app server). Then design your architecture accordingly, and get the operations team to sign it off (having operational friends is a recurring pattern in delivering good solutions to NFRs). |
11 | Manageability | The component has the ability to undergo routine administration operations in a manner that doesn't conflict with other NFRs | List the administration activities that the business regularly perform and define how and when they are required to be performed (elapsed time, time of day, etc) |
Administration is different to Maintenance, in the sense that
this set of actions is performed by the business (reference data
updates, new products, price plans, offers etc) whereas operational
actions are performed by Operations
(systems administrators and the like). The business should supply functional requirements as to what and how they want to be able to change core business data entities, but if they don't, just map out a list of all the entities in your enterprise model against CRUD actions and then make sure there's an answer for every cell. Then create an architecture to support this - and scream from the hilltops if the business think it's OK to have operational staff doing product updates using SQL scripts. It is not. |
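The entity-against-CRUD grid can be checked mechanically. The sketch below is one possible way to do it in Python; the entity names and the agreed answers are invented for illustration, and in practice the grid would more likely live in a spreadsheet owned by the business analysts.

```python
from itertools import product

CRUD_ACTIONS = ("create", "read", "update", "delete")


def unanswered_cells(entities, answers):
    """Return every (entity, action) cell that has no agreed answer yet."""
    return [
        (entity, action)
        for entity, action in product(entities, CRUD_ACTIONS)
        if not answers.get((entity, action))
    ]


# Hypothetical entities and answers, purely for illustration.
entities = ["Product", "PricePlan"]
answers = {
    ("Product", "create"): "business admin screen",
    ("Product", "read"): "business admin screen",
    ("Product", "update"): "business admin screen",
    ("Product", "delete"): "business admin screen, with approval",
    ("PricePlan", "create"): "bulk upload by the business",
    ("PricePlan", "read"): "business admin screen",
}

gaps = unanswered_cells(entities, answers)
for entity, action in gaps:
    print(f"No agreed way to {action} {entity}")
```

Any cell that comes back empty is a question to take to the business before the architecture is fixed.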
12 | Monitoring | The present (and possibly historical) status of the component is determinable by operational staff | All errors encountered during events E1, E2, E3 are to be captured by component C and reported into the standard monitoring tool MT |
If having operational friends is key to getting NFRs met,
then this is one of the main ways to make them. Slinging a new system over the wall at operations, and hoping they'll somehow deal with it, is sadly a common practice. Don't let it be yours. Use of basic tools such as SNMP, JMX or similar and a few simple dashboard additions can go a long way to help system administrators keep an eye on what's happening and react better when things go awry. Treat them just like any other customer, and do your damnedest to meet their requirements. Just because your sponsor won't ever want to know how many HTTP requests there have been to the web server in the last ten minutes doesn't mean it's not important. |
13 | Portability | The component has the ability to migrate onto a new architecture, at an acceptable cost | Define future platforms and versions in the requirements. Assess cost of migration (rough order of magnitude is the best you'll get) |
Meeting this doesn't mean you have to write in Java.
And, to be fair, for many organisations this is not a particularly
high priority requirement.
I like to ask though, as it sometimes opens up some illuminating conversations about infrastructure strategy. It's extraordinary the number of companies that were "Unix" and are soon to be "Microsoft" (as if it were that simple..) or vice versa. |
14 | Recoverability | The component has the ability to deal with inconsistent data using a mechanism, and within a time, acceptable to the business |
Define tees in the processing pipeline that can accommodate
inconsistent data. E.g. On receipt of a new Order by component C1, items bearing data attributes not conforming to the model will be submitted to component C2. Define separately the requirements for how C2 will be managed |
There are subtle differences between recoverability,
resilience and reliability so it's as well to be clear on this:
Recoverability measures the ability of the application or component to deal with bad data - e.g. a mistyped post code on an order application.
It's wise to accept from the outset that duff data will creep into your
production system from time to time and that no amount of
defensive coding or Design By Contract
can completely protect you. Rather than allow it to be persisted, I prefer to have a separate queue for items bearing unrecognised attributes and let them be dealt with that way (initially this may have to be manual, but as patterns are detected it is quite easy to automate the process). |
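A separate queue for items bearing unrecognised attributes might look something like the following Python sketch. The attribute names, the in-memory queues and the `route_order` function are all assumptions made for illustration; a real system would use whatever messaging or storage it already has.

```python
from collections import deque

# Hypothetical model: the only attributes an order is allowed to carry.
KNOWN_ATTRIBUTES = {"order_id", "post_code", "quantity"}


def route_order(order, accepted, quarantine):
    """Pass conforming orders on; park anything with unrecognised attributes."""
    unrecognised = sorted(set(order) - KNOWN_ATTRIBUTES)
    if unrecognised:
        # Keep the original item and the reason, so it can be fixed by hand
        # now and by an automated rule once a pattern emerges.
        quarantine.append({"order": order, "unrecognised": unrecognised})
        return False
    accepted.append(order)
    return True


accepted, quarantine = deque(), deque()
route_order({"order_id": 1, "post_code": "AB1 2CD", "quantity": 3}, accepted, quarantine)
route_order({"order_id": 2, "postcode": "AB1 2CD", "quantity": 1}, accepted, quarantine)

print(f"accepted: {len(accepted)}, quarantined: {len(quarantine)}")
print(quarantine[0]["unrecognised"])
```

Recording why each item was parked is what makes it possible to spot the recurring patterns that can later be automated.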
15 | Reliability | The component has the ability to deal with unreliable dependent components |
Define failure scenarios that might constitute
'unreliability' (see notes). Define expected action for each |
If recoverability is the ability to cope with unpredicted data, then reliability is the ability to cope with unpredicted outages of networks and communicating components. Personally, I use Peter Deutsch's Fallacies of Distributed Computing as the benchmark for what can go wrong here. Admittedly, some of his fallacies could more properly relate to other NFRs, but as long as they're covered it doesn't really matter what name you address them under. |
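One way to write down a failure scenario and its expected action is as a small wrapper around the call to the unreliable neighbour. The Python sketch below assumes the scenario "the dependency refuses or drops the connection" and the action "retry a few times, then fall back"; the function name, the retry count and the fallback value are illustrative choices, not a recommendation for every case.

```python
import time


def call_with_retry(operation, attempts=3, delay_seconds=0.0, fallback=None):
    """Call an unreliable dependency, retrying on connection failures.

    Returns the operation's result, or the fallback once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt < attempts:
                time.sleep(delay_seconds)  # back off before trying again
    return fallback


# Hypothetical flaky dependency: fails twice, then answers.
calls = {"count": 0}


def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("dependency unavailable")
    return "price list v7"


print(call_with_retry(flaky_lookup))
print(call_with_retry(lambda: 1 / 0 if False else (_ for _ in ()).throw(ConnectionError()),
                      attempts=2, fallback="cached price list"))
```

Only the named failure is caught here; anything else still propagates, which keeps unexpected errors visible in integration testing.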
16 | Resilience | The component has the ability to manage its data, and itself, in such a way that reliability and recoverability NFRs are consistently maintained |
I've never been convinced you can specify enough requirements to cover all eventualities here, or even that you'd want to.
It would be highly un-agile to disappear for months itemising error conditions before the code is even written. My favoured approach is to specify a consistent and detailed error reporting scheme that allows them to be caught in integration testing, and fixed as you go |
A resilient system deals with internal failures, as opposed to
external (reliability) or data (recoverability) failures, properly. It's closely related to Transactional Integrity in that a resilient component is always in a "safe" state, which can atomically be returned to if the next operation fails. It's a bit more than just making transactions atomic though, and includes considerations like periodic saving or sharing of state data, load balancing etc. |
17 | Scalability | The component consistently maintains all other NFRs as activity load demands vary | Component C will support an increase of factor F in the number of users U while running acceptance script S and maintaining latency within P % of the latency NFR |
Perhaps one of the most misused NFR terms - scalability is not the same as
performance, but the ability to maintain performance (and for that
matter all NFRs) as active users, or business operations, increase. Like capacity, it's hard to set a limit, as business users always want (or have predicted to their boss) more than is realistic. Therefore, a Scalability Plan is also essential to meet this: in the world of smaller/cheaper boxes, there should be a clear horizontal scalability path, not one requiring that servers get more memory added when they run out. And don't forget that horizontal scalability has implications for shared data and application design that should have clear answers in a walk through. |
18 | Security | The component meets regulatory, compliance and local security standards | See notes |
This is a whole topic unto itself, so take this only as a reminder.
I have a 100+ page document I use for security audits on systems.
One day I will put it online, but for the purpose of this checklist
we'll have to be satisfied with 'security' as a catch-all term,
and defer to your specific support in this area. All I can say from my experience is talk to them early, do what you can to get their sign-off, and have the project managers deal with any discrepancies arising from cost etc. The last thing you want is a bun-fight with security. It's worse than having one with group architecture. |
19 | Survivability | The component meets redefined operational demands in extremis | See notes |
Chances are when there's a major disaster, such as a data centre
flood, your business will accept a reduced set of operational
parameters. (Note that survivability really only applies at
solution level, having one component or application ready to
rock when everything else is wet through
is pretty useless businesswise.) Only the most real-time continuous businesses expect to continue to meet high throughput and low latency targets after a catastrophe has befallen them. Many wouldn't have the cash to pay for it. But still, it's always worth the discussion because of what may come up. Start with business continuity planning (BCP) - what processes need to continue, where users get relocated, what the alternative operational plan looks like. These are business issues that need answers before any technology can be put in place to support them. Then look at Disaster Recovery (DR) options - SAN replication, log shipping, etc. and design the architecture accordingly. If the business wants DR but doesn't have a BCP (no?.. really?.. some businesses want technology to solve even the non-technical aspects of their operations?.. surely shome mishtake..) escalate and back out. Many a good project has gone off the rails by this kind of uber-scope-creep. |
20 | Throughput | The component meets specific levels of transactional activity | Component C will support N1 transactions per second, of type T, on an unloaded system, and N2 transactions per second, of type T, on a system loaded as set in the concurrency NFR |
Another spin off from the world of performance this one,
but thankfully easy to define and test. The hardest part is in agreeing what a transaction, in the context of an isolated project, is (something meaningful to the business is usually best), and then where it begins and ends - it's also worthwhile being clear about what's in and out of scope for a transactional service level agreement early on. On more than one occasion I have found suppliers inadvertently agreeing to include "the internet" as part of the transaction they will guarantee throughput times on. |
21 | Transactional Integrity | The component maintains transactions in a predictable, and consistently safe, state | Transactions of type T1, T2, T3.. Tn will have ACID properties |
One word covers this: ACID - Atomic, Consistent, Isolated and Durable. Not just a good interview question, but a good mnemonic for testing whether critical business operations will always be in a predictable state. Atomic means that the operation will either work, or not work (not half-work); Consistent means that the operation takes the data from one valid state to another, so business rules and constraints hold both before and after it; Isolated means that its working, or not working, and its effects, will not be impacted by unconnected events; Durable means once it's worked, its results will remain (unless the resulting data is modified by another ACID transaction of course). |
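The "work, or not work (not half-work)" property can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch only: the accounts table, the names and the amounts are invented, and an in-memory database says nothing about durability.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move an amount between accounts; both updates apply, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)

try:
    # The credit to bob succeeds, but the debit would overdraw alice and
    # breaks the CHECK constraint, so the whole transfer is rolled back.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    print("transfer rejected and rolled back")

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

After the failed transfer, bob's balance shows no trace of the credit that briefly succeeded, which is the behaviour a walk-through of any critical business operation should be able to confirm.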