Introduction to AWS SageMaker

A Review of ML: The Need for AWS SageMaker

If you are pursuing a career in technology, machine learning (ML) has almost certainly crossed your mind. From spam filters to YouTube recommendations, ML has become part and parcel of our lives. However, the machine learning roadmap is no child’s play to traverse. A working solution emerges only after analyzing massive volumes of data, training suitable machine learning models and deploying those models to production environments that serve inferences in real time and at scale.
However, what would the ML engineer or the data scientist do if the required dataset contained billions of records? Datasets of that size are expensive to compute over and process. Storing them makes matters worse, because disk space on private, local infrastructure is limited. Consequently, the ML process is broken down into three independent workflows to distribute the workload:

Data Generation/Building- This process generates example data to solve the problem in question. Suitable data is fetched, cleaned and transformed (wrangled) for model training. For example, suppose you are creating an ML model that tells apples from oranges when given an input image of either fruit. To train this model, the example data you feed it would be images of apples, oranges or any other fruit you want the model to identify. Data analysts and data scientists are responsible for data wrangling and pre-processing.
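
To make the wrangling step concrete, here is a minimal Python sketch, not a definitive pipeline. It assumes a hypothetical list of (file name, label) records for the apples-and-oranges example; the file names, the `clean` and `split` helpers and the 25% test fraction are illustrative assumptions, not anything SageMaker prescribes.

```python
import random

# Hypothetical raw records: (file name, label). None of these files exist;
# they only stand in for a real image dataset.
RAW_RECORDS = [
    ("img_001.jpg", "Apple"),
    ("img_002.jpg", " orange "),
    ("img_003.jpg", None),        # missing label -> dropped
    ("img_004.jpg", "APPLE"),
    ("img_002.jpg", "orange"),    # duplicate file -> dropped
    ("img_005.jpg", "banana"),    # label outside the task -> dropped
    ("img_006.jpg", "Orange"),
]

VALID_LABELS = {"apple", "orange"}


def clean(records):
    """Normalize labels and drop missing, out-of-scope or duplicate rows."""
    seen = set()
    cleaned = []
    for name, label in records:
        if label is None:
            continue
        label = label.strip().lower()
        if label not in VALID_LABELS or name in seen:
            continue
        seen.add(name)
        cleaned.append((name, label))
    return cleaned


def split(records, test_fraction=0.25, seed=0):
    """Shuffle deterministically, then split into train and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


cleaned = clean(RAW_RECORDS)
train_set, test_set = split(cleaned)
print(len(cleaned), len(train_set), len(test_set))  # prints: 4 3 1
```

Holding back a test set at this stage is a design choice that pays off later, because the model's accuracy should be measured on data it has never seen.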

Fig 1: The cleaned data (images) is used to train the model for identification.

Fig 2: After training, the model's accuracy is evaluated

Model Training and Evaluation- Consider a nursery school teacher. Her job is exhausting and immersive, because nursery children cannot yet grasp even the most basic concepts on their own. Consequently, the teacher has to put extra effort into educating them, and there are various methods she could adopt. She could play an educational video, sing informative rhymes or draw on the blackboard with colorful chalk. These different teaching methods can be loosely compared to ‘algorithms’ in machine learning.

Fig 3: The algorithm is the method of doing something. For example, watching videos, singing and drawing are algorithms of teaching.

The choice of algorithm depends on the goal. If the teacher wants to teach her students music, she sings rhymes with them; if she wants to teach them drawing, she draws with them. In ML, it is just as important to determine which algorithm to use when. Provisioning enough resources also matters greatly when the dataset is enormous and the model needs to be trained quickly. If the teacher has to show the video to 10-20 students, she can simply play it on a television. But if she has to show the same video to 200 children, she might need a projector, a screen and a bigger auditorium to accommodate the students and the equipment.

Fig 4: Larger ML models need a broader range of equipment

Similarly, a single general-purpose instance would suffice to train a small ML model. But to train a large model on a massive dataset, a cluster of distributed GPU instances could be leveraged. Finally, assessing the accuracy of the model after training is a must, much like the teacher conducting a test to evaluate what her students have learned. Training and evaluation of ML models are done by data scientists and ML engineers.
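
The train-then-test idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the fruit measurements are made up, and nearest-centroid classification is simply one easy-to-read algorithm chosen for the example, not a recommendation.

```python
from math import dist

# Hypothetical toy features: (weight in grams, redness score from 0 to 1).
TRAIN = [
    ((150.0, 0.90), "apple"),
    ((170.0, 0.80), "apple"),
    ((160.0, 0.85), "apple"),
    ((130.0, 0.30), "orange"),
    ((140.0, 0.20), "orange"),
    ((135.0, 0.25), "orange"),
]
TEST = [
    ((155.0, 0.88), "apple"),
    ((138.0, 0.22), "orange"),
    ((165.0, 0.75), "apple"),
    ((132.0, 0.35), "orange"),
    ((150.0, 0.30), "orange"),  # an unusually heavy orange
]


def fit(samples):
    """Training: compute one centroid (mean feature vector) per label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Pick the label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples):
    """Evaluation: fraction of samples whose prediction matches the label."""
    correct = sum(predict(centroids, f) == label for f, label in samples)
    return correct / len(samples)


centroids = fit(TRAIN)
print(f"accuracy: {accuracy(centroids, TEST):.2f}")  # prints: accuracy: 0.80
```

The heavy orange is deliberately included in the test set so that the accuracy comes out below 100%, which is exactly the kind of gap the evaluation step is meant to reveal.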

Deployment of ML models- This is the last step of the ML process. The model is deployed to the production environment and replicated across it as needed. Software developers are required to expose the ML model through APIs and integrate it with web applications.
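
As a rough sketch of what exposing a model through an API involves, the Python snippet below wraps a model in a request handler that accepts and returns JSON. The `predict` rule, the `handle_request` function and the request shape are hypothetical stand-ins; a real deployment would load a trained model and sit behind a web framework or a managed endpoint.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed redness threshold."""
    return "apple" if features["redness"] >= 0.5 else "orange"


def handle_request(body: str) -> str:
    """Parse a JSON request, run the model and return a JSON response."""
    try:
        payload = json.loads(body)
        label = predict(payload["features"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed input should produce an error message, not a crash.
        return json.dumps({"error": f"bad request: {type(exc).__name__}"})
    return json.dumps({"label": label})


print(handle_request('{"features": {"redness": 0.9}}'))  # prints: {"label": "apple"}
```

Most of the handler is input validation rather than machine learning, which hints at why this step has traditionally been handed to software developers.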

Fig 5: An illustration of independent ML workflows and workload distribution

Why SageMaker?

From the infographic above, we can deduce that data scientists and ML engineers need outside help from software developers to package their deployable ML models and integrate them with web services. However, collaboration between data scientists and software developers is rarely smooth. Misused technical jargon, incompatible project requirements and inadequate knowledge of each other’s field can jeopardize the project. Therefore, the best resolution would be a platform that allows data scientists and ML engineers to wrangle data, train models and deploy ML solutions to scalable production environments without prior knowledge of software engineering pipelines.

AWS SageMaker is one of the most comprehensive Platform-as-a-Service ML environments. It enables data scientists to build, train and deploy ML models using purpose-built features that simplify these workflows remarkably. It accelerates the entire process by providing tools for each independent workflow, such as labelling, data preparation, feature engineering, statistical bias detection, training, hosting, evaluation and monitoring. AWS SageMaker is an umbrella service that brings all ML operations together in a single interface, and it supports key ML frameworks and toolkits like TensorFlow and PyTorch. Besides one-click cloud deployment, AWS SageMaker can be cost-efficient: according to AWS, Managed Spot Training can reduce training costs by up to 90%. For data scientists and ML engineers, AWS SageMaker is among the fastest and easiest platforms to work with, and it is worth leveraging strategically.

Fig 6: AWS SageMaker Features
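
To see how a spot training saving is worked out, here is a small Python sketch. It assumes the formula given in AWS's DescribeTrainingJob documentation, (1 - BillableTimeInSeconds / TrainingTimeInSeconds) * 100; the function name and the sample numbers are illustrative only.

```python
def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Savings from spot training as a percentage of total training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if billable_seconds < 0:
        raise ValueError("billable_seconds must not be negative")
    return (1 - billable_seconds / training_seconds) * 100


# Illustrative numbers: a job that trained for 500 s but was billed for 100 s.
print(round(spot_savings_percent(500, 100), 1))  # prints: 80.0
```

A saving near 90% would therefore mean being billed for roughly a tenth of the time the job spent training.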

AWS SageMaker Vs Microsoft Azure ML Studio

Two of the most powerful tools for machine learning are AWS SageMaker and Microsoft Azure’s ML Studio. Both platforms are resourceful, fully managed and robust. But they are not directly comparable, as they target distinctly different users. So which platform should you use?

The answer depends on the skill set and inclinations of the user. SageMaker has a more code-oriented interface, with collaboration happening primarily through Jupyter notebooks. Since Python is the programming language most commonly used by data scientists, this code-first approach gives them more flexibility and control.

Fig 7: UI of AWS SageMaker. Credits: aws.amazon.com/blogs/aws/sagemaker

ML Studio, by contrast, has a drag-and-drop UI and can be used by professionals who don’t have any coding experience.

Fig 8: Canvas UI of Azure Studio. Credits: docs.microsoft.com/en-us/azure/machine-lear..

ML Studio is more user-friendly, but SageMaker offers more versatility. Both platforms can produce equally accurate results; what differs is the skill set of the people using them. Comparing AWS SageMaker and Azure ML Studio is like comparing Jennifer Aniston with Angelina Jolie: both are equally incredible, but different people tend to prefer one over the other, depending on whether they like F.R.I.E.N.D.S or Maleficent more.

Conclusion

To sum it all up, AWS SageMaker is a blessing for any ML professional or freelancer, as it offers a powerful platform with numerous purpose-built features that enable efficient building, training and deployment of ML models on the cloud. Some firsthand experience with Python is a necessity, but a clear majority of data scientists are already familiar with it, and AWS claims the platform can improve team productivity by up to 10x. AWS SageMaker has drastically simplified ML workflows by allowing data scientists and ML engineers to steer the entire process. Consequently, every ML aspirant should try their hand at this multi-functional platform.

-Sanya Sinha

