Big Data – There you are!
Big Data has become omnipresent in recent years. Not a day passes without articles in newspapers, online portals, business reviews and tech magazines mentioning Big Data.
The survey conducted by Rexer Analytics in 2013 among more than 1’200 data miners from 75 countries shows that Big Data mining is not only a buzzword but an active element of many companies’ plans. 13% of the participants indicated that they have an active Big Data program, and about the same share reported that their company is running a pilot program on large-scale analytics. The remaining 74% therefore do not perform any Big Data analytics so far. The results of the 2015 survey are still pending, but no dramatic changes are expected.
Big Data – What are you processing?
According to Rexer’s survey, the dominant source of large data is consumer transaction data, followed in 2nd place by text data and in 3rd place by time series collected online.
Remarkably, social media data is only in 4th place, equal in quantity to machine-generated information (time series from sensors).
There are many possible explanations for why consumer transaction data is the main focus of Big Data, but let’s concentrate on the most important ones:
- Awareness of the industry: it seems logical that companies which are used to handling tremendous amounts of data in their everyday business (e.g. credit card companies, online shops etc.) have a fully developed awareness of Big Data and fully exploit their experience.
- Ownership of data: data is most accessible when the mined data is in the possession of the respective company. With well-structured general terms and conditions (accepted by users with the magic checkbox), the ownership of the data can be clearly established and possible legal consequences avoided.
- Purity of data: the importance of social media cannot be denied, but the data is subject to distortions from behavioural mindsets. Who hasn’t read some show-off comments on Facebook or Twitter? So why not focus only on user-generated data that users cannot blur (e.g. credit card data) and thereby keep the modeling for Big Data simpler?
While processing all the zeros and ones, data specialists reported that they frequently have challenges to overcome. Time and effort, inadequate computing power and data management seem to bother the data scientists. Notably, the modeling itself doesn’t seem to challenge them.
Big Data – Where will you go?
- Changes in data type and volume: with more and more devices, sensors and trackers being connected (e.g. ad-hoc micro sensor networks, self-driving cars, traffic control, bio-modeling, environment sensors etc.), the nature of the data will change as machine-based information starts to flood the data networks and takes on a more dominant role.
- Modeling evolution: with Big Data getting popular, other research disciplines will be attracted. This will eventually lead to more complex interdisciplinary research fields, e.g. the combination of social behavioural models with association rules, decision trees etc. Maybe social platforms will then significantly increase in importance for analysts.
- Change in tooling: with more business analysts and researchers entering the field of data mining, the data mining tools in use will diversify more strongly according to market forces. This will make it possible to combine even more data types as well as to design better models. The increased release rate of the software RapidMiner and KNIME is a good indication of the change taking place right now.
- Outsourced computing: with increased data volume and larger models, hardware performance will become a bottleneck for processing all the data. The changing requirements will call for more performance and specialisation, which will have a negative cash flow impact on a data mining company. Consequently, companies will start to outsource the Big Data section to experts or rent data centers with the most up-to-date performance.
As mentioned, the future can hardly be predicted, but maybe Big Data will start to add an extra quantum of accuracy to its predictions.