Big Data, Machine Learning, Data Analytics. All of these concepts have been around for a while and, due to recent events, have captured more interest at all levels: from governments to the private sector to the general public, we have all heard of them.
We, at Arion, thought it was relevant to shed some light on all of these concepts and understand what we can expect in the near future. That’s why Martín Bouza sat down for a chat with who we believe is one of the most capable professionals in the field, Máximo Gurméndez, Founder and Chief Engineer at Montevideo Labs and Academic Director of the Bachelor’s Degree in Data Science for Business at Universidad de Montevideo.
The start of it all
It all started with Amazon. The accuracy of its product recommendations was what lit the fire for Máximo. He left Uruguay and settled in the US to study on a Fulbright Scholarship, which enabled him to connect with top-notch professionals and professors. As he explains: “I had the opportunity to meet people who were very relevant in the area of Big Data, in particular a professor who later hired me as an assistant researcher. He was one of the inventors of MapReduce, a framework that Google began to use towards the end of the nineties, resulting in a paradigm shift in the way results were found on the Internet. Before, AltaVista and all those pages were quite bad. Through MapReduce and the use of algorithms such as PageRank, the quality of searches was greatly improved. This professor that I worked with had a relevant part in developing this framework.”
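For readers who have not met the idea, the map, shuffle and reduce steps can be sketched in a few lines of plain Python. This is only a toy, single-machine word count, not the framework Máximo refers to; the function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, invented for this sketch.
counts = word_count(["big data big ideas", "data beats intuition"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on many machines, which is what makes the pattern suitable for very large volumes of data.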
Máximo then set out to turn the theory into practice and became an intern at Dataxu. At the time, this digital marketing startup was composed of 8 engineers with the goal of changing the way ads were bought in different internet media. “We believed that instead of being based on manual arrangements between companies, ads should be bought through auctions where everyone was going to win, because the advertisers who paid exactly the right price for the right ad at the right time would win. Publishers could potentially receive more revenue because if their sites were more relevant, then the auctions would naturally lead to those balances where publishers, media owners, web page owners, those with available space were going to have bigger chances. All of this was done in the context of real-time bids, which happened at a rate of 3 million times per second, which is crazy”.
Dataxu progressed into using Machine Learning to decide how much to bid on each advertising space, at what moment, and for whom, based on certain patterns. In 2019 Dataxu, which had grown from a startup to a company with over 300 people and offices in 12 countries, was acquired by Roku, the streaming giant.
Ten years ago, Big Data and Machine Learning were very new concepts, and not many understood what they meant or had worked with them, especially in Latin America. But that didn’t stop Máximo from coming back to Uruguay and founding Montevideo Labs. “At the beginning, we had to convince people — the first 2 engineers worked from the attic of my house and we grew from there. Today we are almost 40 people. We’re very proud of the process and the personal growth we’ve had”.
What is big data, in a practical sense
What is Big Data? What does it mean for non-academic players? How does it relate to concepts such as Machine Learning? For Máximo, “There are many points of intersection, from academic definitions to a whole lot of things, lots of different literature on Big Data and Artificial Intelligence. Big Data is especially mentioned in a business context, mostly in connection to building products based on data. The conventional systems with which we process the data are not suitable for these volumes of data because there is so much of it, it comes so fast, and it is so varied. Big Data is all of that. But I think that, above all, Big Data is about how to create business value from these data opportunities, which in many cases involve unexplored data that people never thought could be used to improve other aspects of the business”. Máximo believes that this discipline will evolve even faster once it’s fully embraced by business people, shifting management paradigms from experience-driven to data-driven. Of course, in the end, it’s human managers who will have to make decisions, but they will probably do so with a better understanding of data rather than relying on intuition alone.
For Máximo, all of these are technologies with immense potential, but we are still fine-tuning our abilities. For example, the first time he used Shazam, his first impression was of surprise and amazement. “But then I tried singing the same song, and Shazam wouldn’t recognize it. If you sing to it or alter the song even a little bit, the song is no longer recognized. Shazam does what we call overfitting in Machine Learning: it’s programmed to capture that song as it is, but if you vary it a little bit it no longer works so well. And that kind of reflects the current state of Machine Learning: there are things for which Machine Learning is very useful, and in many problems, we are far from creating a complete solution, which we can somehow depend on, understanding why we can depend on these decisions made by these systems. But yes, without a doubt shazaming, when one begins to apply it not only to music but to images, videos, different media, information flows that involve the time dimension, means that what we do today with text searches is going to change. We are on our way to becoming a world where we do not look for relevant information from texts but from anywhere else.”
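The intuition behind that remark can be shown with a toy example. The sketch below is not how Shazam works; the "songs" are invented tuples of note pitches, and both matchers are hypothetical. It only contrasts a matcher that has memorized its catalog exactly, the extreme case of overfitting, with one that tolerates small variations.

```python
# Hypothetical catalog: each "song" is a short sequence of note pitches.
CATALOG = {
    "song_a": (60, 62, 64, 65, 67),
    "song_b": (67, 65, 64, 62, 60),
}


def exact_lookup(query):
    """Memorizer: recognizes a song only if every note matches exactly."""
    for title, notes in CATALOG.items():
        if notes == tuple(query):
            return title
    return None


def nearest_lookup(query, max_distance=3):
    """Tolerant matcher: picks the closest song by total pitch difference."""
    best_title, best_dist = None, None
    for title, notes in CATALOG.items():
        dist = sum(abs(a - b) for a, b in zip(notes, query))
        if best_dist is None or dist < best_dist:
            best_title, best_dist = title, dist
    # Refuse to answer if even the closest song is too far away.
    if best_dist is not None and best_dist <= max_distance:
        return best_title
    return None


original = (60, 62, 64, 65, 67)
sung = (60, 63, 64, 65, 66)  # same tune, two notes slightly off

print(exact_lookup(original))  # the memorizer handles the original recording
print(exact_lookup(sung))      # but fails on the slightly altered version
print(nearest_lookup(sung))    # the tolerant matcher still finds the song
```

The tolerant matcher trades some precision for robustness, which is the kind of balance a model needs to generalize beyond the examples it has seen.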
Coronavirus: a very real motivation
The current pandemic is very interesting in terms of big data. As Máximo explains, all of the predictive models that forecast what would eventually happen were wrong, because they were modeled on other diseases and behaviors. Behavior changed very quickly almost worldwide, and not all companies and systems have models that can adapt to abrupt changes in behavior. He also finds the role of data scientists interesting, because “when you only look at the statistics you don’t know why you’re making the decisions you’re making, and when you look at knowledge engineering the systems are not so accurate because we don’t have enough data to feed these systems without generating inconsistencies. And I think that probably the best outcome is a mix of the two. So, we cannot say that analysts who do not know anything about epidemiology are right, nor can we say that only with the cause-effect analysis we’ll be able to predict what is going to happen. We have to look at data and we have to understand data. There are algorithms that try to classify things they have never seen, like zero-shot learning, which are related to this problem. We have certain notions of things that happened in the past that we can apply to situations in the future, which is called transfer learning”.
For Máximo, all of this will enrich the discipline, because a lot will be learned about how to reconcile these two worlds. Big Data and Machine Learning are positioned uniquely to combat new virus outbreaks and to gather data. He believes people’s perspectives on data gathering will also change. “If I had asked this question a year ago, the first thing that would have come up would have been privacy issues. And today people are saying ‘well, privacy, yes, but my data is also contributing to a common good, which is that fewer people die’. When people are told ‘look, by providing your data fewer people will die’, the perspective changes”. This all means that we are moving towards discussions that are less about technology and more about ethics and where the new limits lie, because technology will continue to advance and algorithms will continue to evolve.
For all of this to happen, the limitations the discipline currently faces need to be addressed. It might be a case of data availability (not in terms of its existence, but of its collection), or it could be related to a still small pool of talent. Or it could be a matter of incentives. As Máximo sees it, “I think it’s connected to all of these. Some things don’t bloom because there is room for improvement in the incentives given for that. And although there is a lot of investment in academia, in order to solve certain problems, the really important funding comes from the industry, the governments, which are also motivated by current problems, not so much projected into the future but current problems. I also think that human beings are naturally curious, both the academic and the engineer are very curious and they will always dedicate some time to explore these borders, maybe not in the intensity and speed that is needed, but there will always be a quota of it”. He isn’t worried about the talent, because the way he sees it, “not only industries but ordinary people, as you said, are learning more about data science. When you look at the coronavirus contagion curve, they are into it, they have learned to read the double-axis, things that ordinary people did not do before. They learn unintuitive topics such as exponential growth. The human mind is not prepared for exponential growth, so there are certain concepts that are incorporated, such as the R-value. People became interested in these concepts”.
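A small calculation shows why exponential growth is so unintuitive. The sketch below uses a deliberately simplified assumption that is ours, not an epidemiological model from the interview: every case infects exactly R new people in each generation, with no immunity or interventions. The function name and the numbers are illustrative only.

```python
def projected_cases(initial_cases, r_value, generations):
    """New cases in a given generation if each case infects r_value others.

    Simplified assumption: growth is purely geometric, so the result is
    initial_cases * r_value ** generations.
    """
    return initial_cases * r_value ** generations


# With R above 1 the count explodes; with R below 1 it fades out.
for r in (0.9, 1.0, 2):
    print(f"R = {r}: {projected_cases(100, r, 10):.0f} cases after 10 generations")
```

Starting from the same 100 cases, the difference between an R-value slightly below 1 and an R-value of 2 is the difference between an outbreak that shrinks and one that multiplies roughly a thousandfold in ten generations.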
The role of experience, or how an experienced data scientist can assess the next step
Montevideo Labs is often faced with different customers asking the same question: what to do, when, and how. Because the world of possibilities in Big Data is enormous, Máximo explains, one can assign 10 engineers to work on a certain problem, build a proof of concept and arrive at a model that makes very precise decisions, or one can have one person work for a month and arrive at a decent model. It is not always easy to assess the return on that investment. And many times, people ultimately end up interacting with these models, for example, in a forecasting system for a campaign. People go to a user interface and try to figure out, “if I put more budget and point more to this profile or this other profile, what will happen, how much will I spend, how many people will I reach, what kind of people will it go to?”. They want to answer those what-if questions as quickly as possible. And sometimes the models give results that are not 100% intuitive, since the data says that raising this or that value actually lowers the cost. “A model that we could even call too exact, too polished, is perhaps not the best for the user experience.” There are a lot of other factors that are not numerical but related to user experience. Sometimes an average model that gives generally correct results and fits the user experience and overall product is better than the most accurate model, and the time spent chasing the perfect model may have been wasted. There’s also the decision of which model to use: today there are millions of types, different approximations, from neural networks to decision trees, to averages, to dividing one number by another. All of these often work, sometimes not, and the real value of experience lies in deciding under what conditions to use more or less complex models.
With infinite resources you can do whatever you want. With limited resources, and without knowing what will happen next in a much larger ecosystem where millions of users interact with your models, it is not so easy.
This is the logic behind the consultative work Máximo does at Montevideo Labs as a data scientist. Máximo describes it as “being the translator between technical-minded and business-minded actors. That is the role of the data scientist, to try to explain to the person who makes corporate decisions why we have to go with this approach or this other approach for the problem they are facing, which are technical approaches that don’t really make any sense to the business person. We also add value in that sense, working in different industries from agritech to programmatic marketing or even data streaming, so we get to bring together different profiles of people who understand each other and develop something that makes sense”. For this to happen, Montevideo Labs tries to expose its people to as many actors as possible so that they have an intuition, at least, of what product managers, CEOs, scientists and hardware engineers are thinking about. That gives them insight into what is going on inside the head of each of these actors and how to bring them together to develop something that has value. The company also runs training activities.
The appeal of Big Data for new generations
For Máximo, Big Data allowed him to connect his credentials with the way he understood the world around him. And he believes this might happen to others as well. Here are his four pieces of advice for young people who might be interested in becoming a part of this discipline.
#1 — Invest in long term wins
“I think the first thing is to think about investing in the long term. In the world of Big Data, let’s not think about short wins when it comes to preparing people. Let us be patient to overcome certain barriers that exist initially, analytical, mathematical barriers, that in the long term end up paying. Going through these long-term processes pays”.
As convenient as it may seem to take a course or a quick training and become a part of the industry, Máximo believes that the contribution people who have focused on quick wins can make is substantially different from the contribution made by people with a deeper understanding.
#2 — Abstraction is everything
Big Data is a way to understand patterns that can be traced to decision-making. But because the volume of data is so large, and so many variables can be considered, scientists need to approach reality with a high level of abstraction. “We are reaching the point where machines create abstractions that humans are not capable of understanding. In the latest Machine Learning algorithms, we cannot understand why they came to that decision. If we try to look into it, they are black boxes, we cannot know why they make the decisions they make, which are generally correct and even more precise than those that humans could reach through logical reasoning. So, we have to be prepared to think in an abstract way and understand those abstractions”.
#3 — Programming is not something to be scared of
For decades, a lot of emphasis in education was placed on mathematics, from the very beginning: in primary education, secondary education and university. Programming received considerably less attention. “Today’s programming is what math was in the past. So, it might be a little challenging at the beginning, but I believe that we are all capable of having that ability and we should not be afraid of it, and we have to be trained, because that’s the way to create simulations, it is the way to test things, so don’t be afraid of programming”.
#4 — Be curious
The key to understanding the world around us is curiosity. For Máximo, questioning what is around us is the stepping stone to being able to find patterns. “I think we must first question why we are presented with the options that are presented to us. Why are you recommending this to me? And be curious: first question whether these options are right for you, then think about how you got one recommendation or another”.
There is an ethical aspect that Máximo feels should be considered here: the options we are given as humans will increasingly be the result of data, or of the patterns we extract from that data. “So, these options are proposed by black boxes. We do not know why the systems give us these options, that is, the options that we have access to today are given by the data, these options are generated from black boxes. We don’t know why they are giving us these options. So how do we live in this world, and what role does human judgment play in making one choice or the other? It is important not to lose sight of it. These black-box algorithms are very optimal but we do not know why they make the decisions they do, whether or not they comply with ethical restrictions”.