The change has also delivered access to managed cloud services that reduce the complexity of building, training, and deploying machine-learning and deep-learning models at scale.
However, challenges remain. Data scientists, data engineers, and developers still have to learn and adapt to this new environment, and there is an ever-expanding, rapidly evolving ecosystem of tools and frameworks to choose from. Many are simply learning as they go.
The very capabilities that make the cloud so exciting also create potential pitfalls. The ease of copying data across diverse systems can create governance challenges if not handled correctly, and the speed of change means data teams can bet on the wrong tool or framework and become stranded.
To reduce the risk of problems and ensure an organisation achieves the results it expects, six key factors should be considered. They are:
1. Make data governance a top priority:
It’s very important to enable iteration and investigation without compromising governance and security. For example, many data scientists intuitively want to copy a dataset before they start working on it. But it’s too easy to make copies, move on and forget they exist, creating a nightmare in terms of compliance, security, and privacy.
2. Forget your preconceptions:
If you’re coming from an on-premises world, you’ll often bring perceptions and biases about infrastructure that no longer apply to modern platforms in the cloud. Approach the cloud from first principles and start with what you want to achieve, not what you think is possible. That’s the only way to push the boundaries and take full advantage of this new environment.
3. Be careful not to create new data silos:
A key element that is closely tied to data governance is the concept of data silos. In the cloud, it’s important not to replicate the fragmentation that’s common in the on-premises world. The proliferation of tools, platforms and vendors is great for innovation, but it can also lead to redundant, inconsistent data being stored in multiple locations.
4. Maintain an open mind:
One of the exciting things in data science is that frameworks and tools are evolving at an incredible pace. However, it’s critical not to get locked into an approach that limits future options when technologies fall in and out of favour. Choose a data platform that won’t tie you into one framework or one way of doing things, with an extensible architecture that can accommodate new tools and technologies as they come along.
5. Incorporate additional data sources:
Cloud platforms make it significantly easier to incorporate external data from partners and data-service providers into existing models. This has been particularly important during the past year, as organisations sought to understand how COVID-19, fluctuations in the economy, and subsequent changes in consumer behaviour would affect their operations.
Some organisations used data about local infection rates, foot traffic in stores, and signals from social media to predict buying patterns and forecast inventory needs. Consider what data you could be using.
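In practice, enriching an internal dataset with an external signal is often a simple join on shared keys. The sketch below is purely illustrative, assuming hypothetical store-level sales data and a foot-traffic feed from an external provider; the column names and figures are invented for the example.

```python
import pandas as pd

# Internal weekly sales by store (illustrative data).
sales = pd.DataFrame({
    "store_id":   [1, 1, 2, 2],
    "week":       ["2021-W01", "2021-W02", "2021-W01", "2021-W02"],
    "units_sold": [120, 95, 200, 310],
})

# External signal from a hypothetical foot-traffic provider.
foot_traffic = pd.DataFrame({
    "store_id": [1, 1, 2, 2],
    "week":     ["2021-W01", "2021-W02", "2021-W01", "2021-W02"],
    "visits":   [1500, 1100, 2400, 3600],
})

# A left join keeps every internal record, even for weeks where the
# external provider has no data yet.
enriched = sales.merge(foot_traffic, on=["store_id", "week"], how="left")
```

The enriched table can then feed an existing demand-forecasting model without changing the internal data pipeline.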
6. Reduce your complexity:
AI technologies like machine learning and deep learning are immensely powerful and have a critical role to play for certain business needs, but they’re not right for every problem. Always start with the simplest option and increase complexity as needed.
Try a simple linear regression or look at averages and medians. Check the accuracy of predictions and whether the ROI of increasing the accuracy justifies a more complex approach.
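The "start simple" advice can be made concrete: fit a trivial mean predictor and a one-variable linear regression, compare their errors, and only escalate if the accuracy gap justifies it. This is a minimal sketch on synthetic data; the variables and numbers are invented for illustration.

```python
import numpy as np

# Synthetic data: a roughly linear relationship plus noise
# (e.g. weekly ad spend vs weekly sales, purely illustrative).
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=200)
y = 3.0 * x + rng.normal(0, 10, size=200)

train_x, test_x = x[:150], x[150:]
train_y, test_y = y[:150], y[150:]

# Baseline 1: always predict the training mean.
mae_mean = np.abs(test_y - train_y.mean()).mean()

# Baseline 2: one-variable least-squares linear regression.
slope, intercept = np.polyfit(train_x, train_y, deg=1)
mae_linear = np.abs(test_y - (slope * test_x + intercept)).mean()

print(f"mean baseline MAE:    {mae_mean:.1f}")
print(f"linear model MAE:     {mae_linear:.1f}")
```

Only if the gap between these errors translates into real business value is a more complex model worth the extra cost and maintenance burden.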
Analytics tools are rapidly becoming more powerful, delivering ever-greater benefits to the organisations that put them to work. By adding cloud resources to the picture, data science teams will be able to take their work even further.
Data science plus the cloud is a very powerful mix.