Bridging the Curiosity Gap with Software Development and Data Science: an interview with software engineer Samadrita Ghosh.
A data-driven software developer with a passion for uncovering insights from complex datasets, Samadrita is proficient in software development and has started using Python, R, and SQL for data manipulation and analysis. She is eager to apply her skills in Machine Learning, Natural Language Processing, and AI to meaningful projects.
- Q) Could you first introduce yourself to the reader?
My name is Samadrita, and I have been living in London for the past two years. I currently work as a Machine Learning Engineer at a social prescribing company. Its vision is to provide a better customer experience by radically changing human services: using Machine Learning algorithms to personalise services, improve outcomes for citizens and the planet, and provide value for taxpayers.
- Q) Let’s start with the basics. What is the future of Machine Learning and Data Science?
It excites me to think of the bright possibilities revolving around the field of machine learning and data science.
There will be an increased focus on real-time automation and the integration of AI with Machine Learning. Data Science will shift from analysing historical data to real-time analytics, allowing organisations to make data-driven choices more quickly. We will see advancements in Deep Learning and Quantum Computing. Deep Learning is a powerful subset of Machine Learning, and I am currently working on a project that focuses on predicting stock market behaviour using Deep Learning models. Growth will also be seen in speech recognition and anomaly detection. I have worked on quantum computing, which for certain problems can process enormous amounts of information far more quickly than ordinary computers, allowing for ever more complicated data processing. In my experience, there will be a growing emphasis on data ethics as the use of ML and data collection grows. Data scientists will need to consider privacy concerns and bias in algorithms, and ensure transparency in how AI and ML models arrive at their decisions.
- Q) For people who are less familiar with these terms, how would you define data science, machine learning, and artificial intelligence? Because as you mentioned, these are terms that float around a lot in the media and that people absorb, but it’s unclear how they fit together.
That is a really good question, and one I discuss frequently, because the terms themselves are not always defined on solid ground. They are closely related and interconnected, yet each has its own purpose and is used at a different stage. Data Science is a broad term covering various data operations, and ML falls within it; ML is also a subset of AI.
First, using Data Science, I source, clean and process the data to extract meaning from it for later analysis. Then AI combines large amounts of data with iterative processing and intelligent algorithms to help computers learn automatically. These algorithms are ML models: programs that learn from data without being explicitly programmed to do so. That is how the flow works.
To break it down further, AI uses logic and decision trees, ML uses statistical models and Data Science uses structured and unstructured data.
- Q) That is well explained. Can you give us some examples of each to understand the practical usage?
Sure.
Chatbots, voice assistants and language translation are popular examples of AI integrated into software.
For ML, recommendation systems such as Spotify's are a good example: the model gradually learns the preferences of the listener and recommends accordingly. Other examples include facial recognition, speech recognition, and the detection of spam emails and malware.
Healthcare analysis and fraud detection are popular examples of Data Science. Most business decisions are now drawn from data and its analysis, which is why a Data Scientist is crucial in today’s world.
- Q) So, an advanced software developer role emphasizes the usage of data. Could you share with us the most interesting and challenging projects you have done that involved massive amounts of data?
I would love to answer this one.
The first step as a software developer is to assess the feasibility of the project, which includes analysing the concept, goals, scope, specification and availability of data needed to build the software. I remember my first project was to build a portfolio website, for which I used a combination of HTML, CSS and JavaScript.
Recently I researched and built a ‘Hums and Whistles Recognition System’, which I enjoyed building. How many times have we heard a song for the first time, liked it, and forgotten the lyrics right away, so that only the melody remains? The goal of the project was to build an ML model that tries to identify a song from a dataset containing eight different melodies performed as hums and whistles. The dataset was built by a group of 180 students from my university and was first processed to extract the important features, such as pitch, power, voiced frequency, chroma and contrast. I built three different models to compare the results: a Support Vector Classifier (SVC), a Convolutional Neural Network (CNN) and a Random Forest Classifier (RFC). Interestingly, the accuracy of the models was 99%, 91.7%, and 74.85% respectively, which gave me an idea of the best-fit model to carry forward for sound recognition.
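To make the comparison concrete, here is a minimal, illustrative sketch of fitting two of the model types mentioned above and comparing their held-out accuracy. It is not the code from the project: the features are synthetic stand-ins generated with scikit-learn's `make_classification` rather than real pitch or chroma values, the function name `compare_models` and all parameter values are assumptions, and the CNN is left out to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(n_classes=8, random_state=0):
    """Fit an SVC and a random forest on the same split and return their accuracies."""
    # Synthetic stand-in for extracted audio features (assumption, not real data):
    # eight classes play the role of the eight melodies.
    X, y = make_classification(
        n_samples=800,
        n_features=20,
        n_informative=10,
        n_classes=n_classes,
        n_clusters_per_class=1,
        random_state=random_state,
    )

    # Hold out a quarter of the samples so every model is scored on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    models = {
        # SVC is sensitive to feature scale, so it is wrapped with a scaler.
        "SVC": make_pipeline(StandardScaler(), SVC()),
        "RFC": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

Calling `compare_models()` returns a dictionary mapping each model name to its test accuracy. The numbers obtained on synthetic data say nothing about the results quoted in the interview; the point is only that all models are trained and scored on the same split so the comparison is fair.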
- Q) Which tools, frameworks and methods do you use or recommend to software developers and budding Data Science enthusiasts?
I use NetBeans, an open-source integrated development environment (IDE) that helps software engineers like me with coding, debugging, testing and deploying software applications. It is my personal preference because it supports various languages such as Java, C/C++, PHP, HTML, CSS and Ruby, and it is suitable for diverse and complex software projects, offering debugging, syntax highlighting, refactoring, profiling and more.
For Data Science, I code in Python using the Anaconda distribution, with tools like TensorFlow, scikit-learn, Keras, Apache Spark, Hadoop and more, depending on the requirements of the project. Google Colab is a very good alternative, which I used extensively before shifting to Anaconda.
- Q) Do you have any advice for young people interested in doing your kind of job?
I would encourage anyone who is already in the field to play around with the data in hand to understand how different models work; it’s a lot of fun to discover the accuracy of each technique. Maths and coding can be a bit scary at first, especially when you are starting out, but don’t give up. There are amazing things to build with technology.