There is no one-size-fits-all approach to choosing your company’s data architecture. It’s a huge challenge that, when executed correctly, sets the foundation on which to build a data-driven organization.
Not only do we need to keep in mind what technology is already in use at the organization, but we also need to consider business needs, budget, the team that will work on this, and, most importantly, what data we’re willing to store, process, and consume.
What types of data do you need to analyze?
Different businesses have different realities. A financial institution processing transfer data, a logistics company monitoring sensors and even weather data, and a mobile app collecting users’ behavioral data will all have different needs, which will result in different solutions and approaches.
An important part of this first step is to not only select the type of data we need but also identify the data we already have. Is the data we need from external sources, such as public APIs? Is it private, from private databases? Do we need to negotiate with a third party to gain access to it?
Working with the different areas of the company to identify their needs is the first step toward succeeding at this stage. In parallel, we need to be mindful of regulations regarding the consumption, storage, and sharing of private information and how they can affect our strategy.
What are your use cases?
Within the organization you may have different needs, maybe one for each business area, and you need to identify those in order to design an architecture that successfully responds to business needs.
C-level executives and managers are more likely to analyze KPIs and summarized information, perhaps as weekly or monthly briefs on business performance. Operations teams, on the other hand, will need daily or even real-time data to succeed in their tasks.
Considering these aspects is key to designing a robust architecture that allows the delivery of trustworthy information to every member of the organization.
The right technology stack
There are many things to analyze in this regard, as we’ve previously mentioned. The tech stack currently in use, the growth rate for the upcoming years, the team in charge of developing and maintaining data solutions are all factors that must be accounted for.
Modern stacks tend to be cloud-based, which allows faster scaling and growth than a fully on-premises stack. Another perk of cloud-based stacks is that data teams can take advantage of built-in solutions. These vary between cloud providers, such as Google Cloud Platform, Azure, or Amazon Web Services, and they can accelerate the development and deployment of data solutions.
What’s your budget for this initiative?
We’ve asked a lot of questions, but there’s still an important one we haven’t asked: how much are we willing to spend on this? We need to consider the set-up investment as well as the maintenance costs, which will need to be budgeted for as long as this initiative is in place.
This can also be a key point, central not only to the definition of your data architecture but also to the roadmap for building different solutions.
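To make the budget question concrete, here is a minimal sketch of the arithmetic, assuming maintenance is a flat monthly amount. The function name and the figures in the usage example are hypothetical and only illustrate how the set-up investment and the ongoing costs add up over a planning horizon.

```python
def total_cost(setup_cost: float, monthly_maintenance: float, months: int) -> float:
    """Return the set-up investment plus maintenance over the planning horizon.

    Assumption: maintenance is a flat monthly amount. Real cloud bills usually
    vary with usage, so treat this as a first approximation only.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    return setup_cost + monthly_maintenance * months


if __name__ == "__main__":
    # Hypothetical figures: 50,000 to set up, 4,000 per month, 3-year horizon.
    print(total_cost(50_000, 4_000, 36))
```

Running the same calculation for a few different horizons is a quick way to see how fast maintenance overtakes the initial investment.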
Wrapping up
It’s my personal opinion that we need to build a data architecture practice on solid foundations that allow the company to scale, modify and adapt it to business needs and changes all along the road.
I’ll share some initial approaches we can take to start building our company’s data architecture in order to bring all of these ideas down to earth.
Data usage can be divided into two major groups, each served by its own stack: data warehouses, which serve business intelligence, KPIs, and reports, and data lakes, which serve data science, ad-hoc analysis, machine learning, and artificial intelligence.
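As a rough illustration of that split, the mapping below assigns each use case named above to one of the two stacks. It is only a sketch: the names `Stack`, `USE_CASE_TO_STACK`, and `stack_for` are hypothetical, and a real organization will have use cases that straddle both sides.

```python
from enum import Enum


class Stack(Enum):
    DATA_WAREHOUSE = "data warehouse"
    DATA_LAKE = "data lake"


# Use cases taken from the paragraph above; the grouping is illustrative.
USE_CASE_TO_STACK = {
    "business intelligence": Stack.DATA_WAREHOUSE,
    "kpis": Stack.DATA_WAREHOUSE,
    "reports": Stack.DATA_WAREHOUSE,
    "data science": Stack.DATA_LAKE,
    "ad-hoc analysis": Stack.DATA_LAKE,
    "machine learning": Stack.DATA_LAKE,
    "artificial intelligence": Stack.DATA_LAKE,
}


def stack_for(use_case: str) -> Stack:
    """Look up which stack typically serves a use case (case-insensitive)."""
    key = use_case.strip().lower()
    if key not in USE_CASE_TO_STACK:
        raise ValueError(f"unknown use case: {use_case!r}")
    return USE_CASE_TO_STACK[key]


if __name__ == "__main__":
    for case in ("KPIs", "Machine learning"):
        print(f"{case}: {stack_for(case).value}")
```

Listing your own use cases this way can help show which of the two stacks deserves attention first.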
Identifying where to start and how to build our architecture is crucial when it comes to allowing our company to increase and improve its data usage and spread it across every process of the company.
In future posts, I’ll analyze and discuss a new trend called Data Mesh that I found on Martin Fowler’s blog, which appears to be the perfect match for up-and-coming microservices architectures.
Feel free to drop me a line if you want to discuss any of these topics in more depth.