Machine learning (ML) is the “FUTURE”. I have been reading about it for quite some time now and I am pretty convinced by that statement. We are all talking about Big Data, predictive analytics, etc., but really, when a system dude like me tries to foray into the field of ML, everything seems so overwhelming. The discussion starts with millions of records (if you are lucky). Otherwise, it is 4TB of unstructured data for starters. Your systems brain tries to grasp the big picture and gets lost trying to figure out the details. But after reading around, grappling, experimenting and reading a bit more, I think that learning ML is doable for us system dudes. You do not have to be a math genius (well, it helps if you are). So I will start with this blog of mine, documenting the baby steps needed. I will use it as my reference and you can use it as yours if you find it useful.
In general, before we dive into the ‘code’, we need to understand how ML works. It is actually very simple from a 10,000-foot view.
We have a blob of input data, we run it through some gears and out comes the prediction :D. But seriously, we have many different data streams carrying either structured or unstructured data, and we turn them into usable data, i.e. we refine the blob so that it only contains data we think is useful for our predictions and remove the unnecessary noise. The refined data is then fed into a model and the model output is measured against sample data.
Beware that these steps are iterative. You keep going back to the same step again and again until you find the best model you can. What counts as "best" is defined by the measurement criteria you set up before you start trying to solve whatever problem you are solving with ML.
Of course, there is supervised and unsupervised learning, each with its associated algorithms and models. But roughly 90% of ML is supervised learning, i.e. the refined data already contains the answers your model is trying to predict. You make the model learn from one part of the refined sample data and then run the model on the rest of the data to check whether its predictions are as expected. You simply rinse and repeat the process as many times as it takes.
There are many frameworks (FW) and tools that we can use for ML, as listed on KDnuggets. But we will use something very simple to start with that helps us understand ML.
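To make the learn-on-one-part, check-on-the-rest loop concrete, here is a minimal sketch in plain Python with no ML framework at all. Everything in it is an assumption made up for illustration: the toy data (one number per record, labelled 1 when it is above 5), the 70/30 split, the "model" (a single learned cut-off value) and the helper names `make_labeled_data`, `train_test_split`, `fit_threshold` and `accuracy`. It is not the tool this blog goes on to use, just the shape of the process.

```python
import random


def make_labeled_data(n, seed=0):
    """Hypothetical refined data: one feature x in [0, 10], label 1 when x > 5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 1 if x > 5 else 0))
    return rows


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """'Learn' a cut-off: the midpoint between the mean x of each class."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    """Predict label 1 when x is above the learned cut-off."""
    return 1 if x > threshold else 0


def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the known answer."""
    correct = sum(1 for x, y in rows if predict(threshold, x) == y)
    return correct / len(rows)


data = make_labeled_data(200)
train, test = train_test_split(data)   # learn from one part...
threshold = fit_threshold(train)
score = accuracy(threshold, test)      # ...and measure on the rest

print(f"learned threshold: {threshold:.2f}")
print(f"accuracy on held-out data: {score:.2%}")
```

The measurement criterion here is plain accuracy on the held-out rows. In the rinse-and-repeat loop you would change the refinement step or the model, rerun, and compare that number against the previous attempt.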