The Big Data and Analytics programme was initiated at the Bombay Stock Exchange (BSE) primarily to tame rumour mongering about stocks. Earlier, the surveillance team kept tabs on certain websites, read stock-specific news in financial newspapers and magazines, and prepared paper clippings. All this material was then aggregated and compiled into a report from which intelligence could be drawn.
This exercise was limited to only a few sources of information from which relevant material could be culled, and with limited accuracy on any given day. The report was submitted to the regulator on the subsequent day, and many important articles were missed for various reasons. The real-time element was nowhere to be seen. In the stock exchange space, where wrong information can wreak havoc in seconds, this practice was highly inefficient.
Taming the market manipulators
BSE decided to leverage a Big Data and analytics (BD&A) solution to automate the process of intelligence gathering and nab rumour mongers and market manipulators. “The BSE, since the implementation got over, has been able to achieve 95-97 percent accuracy. It essentially means the algorithm designed is 95-97 percent accurate in thinking like humans and making the kind of findings and correlations made by humans,” says Kersi Tavadia, CIO, Bombay Stock Exchange. This was initially achieved using a small database; gradually, the volume of data processed has grown at a tremendous pace. BSE generates 300 to 400 GB of data per day, and the data repository now holds over 500 TB.
The data is gathered from across platforms, including social media sites. As soon as an alert is classified as a rumour, a Straight Through Process (STP) sends a message to the company against which the rumour has been created. The company responds immediately, and the clarification is published on the BSE website.
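For illustration only, the sketch below shows what such an alert-to-clarification flow could look like in code. The class names, the classification threshold and the notification functions are hypothetical assumptions for this example and are not a description of BSE's actual STP implementation.

```python
# Illustrative sketch only -- names, threshold and notification functions are
# hypothetical, not BSE's actual straight-through process.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Alert:
    company: str          # listed company the post refers to
    source: str           # e.g. a news site or social media platform
    text: str             # the content that triggered the alert
    rumour_score: float   # output of a classification model, 0..1


def notify_company(request: dict) -> None:
    # Hypothetical stand-in for the message sent to the company.
    print(f"STP notification sent to {request['company']}")


def publish_clarification_placeholder(request: dict) -> None:
    # Hypothetical stand-in for publishing the clarification on the website.
    print(f"Awaiting clarification from {request['company']} for publication")


def process_alert(alert: Alert, threshold: float = 0.9) -> None:
    """If the alert is classified as a rumour, trigger the straight-through
    process: request a clarification from the company and queue it for
    publication."""
    if alert.rumour_score < threshold:
        return  # below threshold: not treated as a rumour
    request = {
        "company": alert.company,
        "excerpt": alert.text[:280],
        "requested_at": datetime.now(timezone.utc).isoformat(),
    }
    notify_company(request)
    publish_clarification_placeholder(request)
```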
If required, a report is immediately relayed to the regulator. This has built considerable confidence in the BSE website as a source of reliable information. “On an average there are about 20-25 cases, which are false positives, where the objective is to fan a stock specific rumour,” informs Tavadia. Many alerts are generated, but only those meant to cause damage are quarantined and investigated.
Keeping tabs on people and machines
Apart from preventing rumour mongering, the Big Data analytics solution is used extensively for creating internal MIS reports for top management, Relationship Managers (RMs) and the IT team. This includes machine-related data, for example the profiles of people logging in from various systems, which helps in user-behaviour analysis; alerts related to the health of the machines; and application and system performance logs. The BD&A platform is also used to convert videos of media interviews of important personalities in the financial world into text. These transcripts are matched against the rumour-monger library created by BSE and the findings are generated, as sketched below.
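The following is a minimal sketch of what matching a transcript against such a library might involve. The phrases in the library and the simple substring matching rule are illustrative assumptions; the article does not disclose BSE's actual matching logic.

```python
# Illustrative sketch only -- the phrases and the matching rule are hypothetical.
import re

# A toy "rumour monger library": phrases previously associated with
# stock-specific rumours (hypothetical examples).
RUMOUR_LIBRARY = {
    "guaranteed returns",
    "insider tip",
    "target price doubling",
}


def match_transcript(transcript: str) -> list[str]:
    """Return the library phrases found in an interview transcript."""
    text = re.sub(r"\s+", " ", transcript.lower())
    return [phrase for phrase in RUMOUR_LIBRARY if phrase in text]


if __name__ == "__main__":
    sample = "He hinted at an insider tip, promising guaranteed returns on the stock."
    print(match_transcript(sample))  # e.g. ['insider tip', 'guaranteed returns']
```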
Why open source?
Hitherto, the data resided in multiple systems, with the same data set existing in multiple versions across different systems, resulting not only in duplication but also in data inconsistency. “Under the BD&A programme, a data warehouse based on the Hadoop framework was designed. The data from multiple systems was consolidated in the warehouse,” says Tavadia.
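As a rough sketch of what consolidating several source systems into one Hadoop-backed warehouse can look like, the PySpark example below reads the same logical data from two sources and writes a single copy. The paths, the JSON format, the partition column and the use of PySpark are assumptions made for illustration, not details of BSE's pipeline.

```python
# Illustrative PySpark sketch -- source paths, schema and warehouse location
# are assumptions for demonstration, not BSE's actual configuration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("warehouse-consolidation-sketch")
    .getOrCreate()
)

# Read the same logical data set from two hypothetical source systems.
surveillance_df = spark.read.json("hdfs:///raw/surveillance/alerts/")
news_df = spark.read.json("hdfs:///raw/newsfeed/articles/")

# Consolidate into one table so a single copy serves all downstream reports,
# instead of each system keeping its own (possibly inconsistent) version.
combined = surveillance_df.unionByName(news_df, allowMissingColumns=True)

combined.write.mode("append").partitionBy("trade_date").parquet(
    "hdfs:///warehouse/market_intelligence/"
)
```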
Reports are now prepared directly from the warehouse and sent to the concerned authorities. “We have cut down significant IT costs by using open source systems. The proprietary systems used initially were not scalable and were expensive to maintain. The biggest pain with these systems was the inability to accommodate increasing capacities in the existing versions of the hardware. The original systems had to be removed and replaced by newer versions of the same systems, which meant buying new hardware. With the open source system, the hardware can be sliced as required; it works on a simple Intel architecture. We have a mix of hardware servers from HPE, Huawei and Lenovo running on a mix of Operating Systems (OS), and it still runs as a single seamless farm. The Hadoop framework is OS agnostic,” informs Tavadia.
When this technology architecture was adopted by BSE a few years back, there weren’t many takers for the idea. BSE now has one of the largest data warehouses built using a community-based approach. After reaping the benefits of freely available open source technologies, BSE has also subscribed to certain specialised services. “We have subscribed to additional governance, security and encryption features. We deliberately didn’t invest in these additional features when the project was kicked off. If we had, the initial benefits from the project would have been killed by the complexities built in by subscribing to a slew of functionalities right at the initiation. We believe in the Keep It Simple and Stupid (KISS) methodology,” says Tavadia.
The cost advantages have been significant. Open source does not require the user to freeze technology lifetimes and refresh cycles upfront; it enables a ‘build-as-you-grow’ approach. “At times it does happen that we overestimate IT procurement cycles, and something that was visualised to be bought again in the next five years has to be bought in, for example, just two years, which leads not only to cost escalation but to complexities too,” says Tavadia. Open source offers cost-effective, commodity infrastructure.
Data lifecycle on Hadoop
The data from various systems is pulled into the Hadoop architecture. Data ingestion is in real time in 99 percent of instances; only on a few occasions is it taken in batch mode. The data moves over the LAN at gigabit speeds, so speed is not an issue. The network component of the BD&A infrastructure is remarkably simple: it runs on a cluster-switching model with a common networking layer, so data flows instantaneously without the separate networking layers between switches that proprietary models require. Apart from the simplicity of the network, availability is also high, because the data does not travel across the other systems and put load on them, which would affect the performance of applications like risk management, trading, surveillance and data feeds. These mission-critical systems are totally isolated from the BD&A platform, yet they still push data into it. The data travels within the cluster of hundreds of servers (the server farm) built on the Hadoop architecture, not between the different applications.
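A hedged sketch of the two ingestion paths follows: a continuously running streaming job for the real-time case and a one-off load for the batch case. The landing directories, the schema and the choice of Spark Structured Streaming are illustrative assumptions, not BSE's stated tooling.

```python
# Illustrative sketch of real-time vs. batch ingestion -- paths and schema are
# assumptions, not BSE's configuration.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

schema = StructType([
    StructField("source", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

# Real-time path (the ~99 percent case): new files landing in a directory are
# picked up continuously and appended to the warehouse.
stream = (
    spark.readStream.schema(schema)
    .json("hdfs:///landing/realtime/")
    .writeStream
    .format("parquet")
    .option("path", "hdfs:///warehouse/events/")
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .start()
)

# Batch path (the occasional case): a one-off load of a static extract.
batch_df = spark.read.schema(schema).json("hdfs:///landing/batch/2024-01-31/")
batch_df.write.mode("append").parquet("hdfs:///warehouse/events/")
```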
Storage handling is also automated. The Hadoop architecture allows data tiering in an automated fashion, depending on how the system is configured. Accordingly, the data gets stored on flash, tape, etc.
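As one concrete example of configuration-driven tiering in the Hadoop ecosystem, HDFS storage policies can pin directories to SSD or archival media. The paths and policy choices below are hypothetical and are shown only to illustrate the mechanism, not BSE's settings.

```python
# Illustrative sketch -- HDFS storage policies are one way Hadoop deployments
# implement configuration-driven tiering; the paths and policies here are
# hypothetical.
import subprocess

TIERING = {
    "/warehouse/events/current": "ALL_SSD",  # hot data on flash
    "/warehouse/events/archive": "COLD",     # older data on archival storage
}

for path, policy in TIERING.items():
    subprocess.run(
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", path, "-policy", policy],
        check=True,
    )
```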