Highlights
- Automating Core Business Lines: created an AI/ML-powered application for financial data crawling and retrieval from multiple websites.
- 10x Data Storage Cost Reduction and Data Processing Time Optimization: improved data storage architecture and reduced the time necessary for manual data analysis from days to hours.
- Retrieving Mission-Critical Business Data: developed a solution to retrieve data from diverse financial reports and annotate it with 150+ business rules.
- Strategic Cloud Transformation: migrated an investment analytics platform to AWS with a serverless architecture, achieving near-limitless scalability, faster processing, improved operational efficiency, and a substantial cost reduction.
Client
Our client, Morningstar, Inc., is one of the premier asset management companies in the United States. It employs more than 12,000 people, operates in 32 countries, and manages an investment portfolio of more than $200 billion; in 2018, the company’s revenue exceeded $1 billion. Morningstar, Inc. provides investors with company information, forecasts, and related proprietary analytics for evaluating the investment potential of various businesses.
Product
SPD Technology is a long-term custom software development partner for Morningstar, Inc., delivering several major projects for the client, including:
- The Kessler Application allows users to create watch lists to track the performance of stocks before purchasing them. Users can build charts to see how stocks perform over time and compare different regional markets and asset classes. There is also an Investment Calendar, which helps users keep track of important investment events such as IPO releases and dividend payments.
- An AI/ML-Powered Web Crawler Application that uses sets of keywords to search for health savings plan data across numerous websites. It retrieves data as text excerpts, then categorizes and prioritizes them for further analysis.
- A Data Collection Application that retrieves data from one of the world’s biggest IPO databases, annotates it, and streams it via a third-party financial application. The retrievable data covers around 70 data points, and the application uses around 150 business rules to validate the manually annotated data (a simplified sketch of such rule-based validation follows this list). Key components of this product include a data annotation module, a data collection module, and a data validation module.
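Since the case study does not disclose the actual validation engine, the following is a minimal, hypothetical sketch (in Python rather than the project’s Java/Spring stack) of how manually annotated data points could be checked against declaratively defined business rules; the record fields and rule names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AnnotatedRecord:
    """A hypothetical annotated IPO record covering a few of the ~70 data points."""
    ticker: str
    offer_price: float
    shares_offered: int
    listing_date: str  # ISO date, e.g. "2024-05-01"

@dataclass
class Rule:
    name: str
    check: Callable[[AnnotatedRecord], bool]

# Two sample rules; the real system applies around 150 of them.
RULES = [
    Rule("offer_price_positive", lambda r: r.offer_price > 0),
    Rule("shares_offered_positive", lambda r: r.shares_offered > 0),
]

def validate(record: AnnotatedRecord) -> list[str]:
    """Return the names of all business rules the record violates."""
    return [rule.name for rule in RULES if not rule.check(record)]

record = AnnotatedRecord("ACME", offer_price=21.5, shares_offered=0, listing_date="2024-05-01")
print(validate(record))  # -> ['shares_offered_positive']
```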
Goals and objectives
For an AI/ML-Powered Web Crawler Application:
- Data Collection Process Automation: developing a cutting-edge, AI-powered solution to automate the complex process of collecting relevant financial information from hundreds of different sources on the web.
- Data Extraction Functionality: enabling our web crawling functionality to correctly extract only relevant data for further analysis by human experts.
- Effective Data Points Comparison: delivering functionality for correct and timely detection of changes in gathered information.
- Performance Optimization: maximizing the efficiency of our Web Crawler app by seamlessly integrating it into the client’s ecosystems without disrupting any business processes.
- Implementing Data Streaming: developing a solution for transferring collected information to the client’s analytics team on another continent to conduct deeper analysis and extract more business insights.
For a Data Collection Application:
- Automation of Financial Reports Data Collection: developing a solution that automates the data collection process and delivers this information to one of the client’s partners via a modern financial app.
For a Kessler Application:
- Cloud Migration: transferring the data from the existing legacy platform to a modern cloud-based solution.
- Benefiting from Serverless Architecture: leveraging serverless architecture to give the client key advantages, including unprecedented scalability and lower infrastructure management costs.
- Introducing Sophisticated Features: developing several new, intricate functionalities for the asset managers who use the app.
Project challenge
- Data Crawling and Retrieval Difficulties: solve several data crawling challenges: many of the target websites had a poor structure and contained a lot of broken links, and the crawler also had to deal with IP detection and CAPTCHAs. It was also necessary to figure out how to handle the large amount of irrelevant, noisy information produced by multi-keyword web crawling.
- Data Entities Prioritization by Relevancy: work out a way to prioritize an overwhelming amount of collected data to be analyzed by the client’s employees every week.
- Poor Software Architecture of the Existing Data Collection App: develop a replacement for the low-performing MVP version of the client’s app that was developed previously by another vendor.
- Complexities in the Graphical User Interface (GUI) of the Data Collection App: the client required the system’s GUI to be implemented as an Excel-like table, which added difficulty to developing this functionality. The system’s annotation capability was also hard to get working properly because of multiple faults in the software architecture of the previously developed application.
- Additional Integrations for Kessler App: integrating the platform with the client’s aggregated accounts system and Stripe required careful planning to ensure seamless communication between components.
- Ensuring the Kessler App’s Resilience to Extremely High Loads: meet the client’s expectation that the updated platform withstand 1 million users daily.
- Tight Deadlines: deliver a significant part of the functionality against a very tight deadline imposed by strict business demands.
Solution
Our collaboration started with working on the Web Crawler application. After the planning stage was complete, we assembled the core team, which included:
- An Engineering Manager
- Full-Stack Developers
- AI/ML Experts
- A Quality Assurance Specialist
To deal with the main challenges of the Web Crawler app, our team leveraged its deep expertise in Artificial Intelligence and Machine Learning. We began the development process by preparing a data set for model training. To solve the problem of irrelevant content that nevertheless contained target keywords, we built Doc2Vec and Word2Vec models that analyze the wider context around a keyword, which made it possible to cut off most of the irrelevant data. These AI/ML algorithms also allowed us to properly process health savings plans with additional reservations (for example, discount-related ones) that alter the plans’ execution logic.
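As a purely illustrative example (the case study does not include the project’s code, which per the tech stack is Java-based), a Doc2Vec relevance filter of the kind described above could be sketched with gensim as follows; the training excerpts, tags, and decision logic are assumptions.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical labeled excerpts: snippets previously judged relevant or irrelevant.
training_excerpts = [
    ("the employer contributes up to 1000 usd to the health savings account", "relevant"),
    ("sign up for our newsletter to receive weekly savings tips", "irrelevant"),
]

corpus = [TaggedDocument(words=text.split(), tags=[label]) for text, label in training_excerpts]

# Train a small Doc2Vec model so whole excerpts, not just keywords, get vector representations.
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

def cosine(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_relevant(excerpt: str) -> bool:
    """Compare the excerpt's inferred vector with the learned 'relevant' and 'irrelevant' tag vectors."""
    vec = model.infer_vector(excerpt.split())
    return cosine(vec, model.dv["relevant"]) > cosine(vec, model.dv["irrelevant"])

print(looks_relevant("the plan includes an employer contribution to the savings account"))
```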
Our project team also employed AI and ML algorithms to reduce the amount of data the client’s employees have to process every week. In a user-friendly manner, the app flags the sites where no data has changed since the previous crawl, and it prioritizes all retrieved results, indicating a relevance percentage for each. While our custom model was still being prepared, we used AWS Comprehend to score the relevance of crawled data.
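For illustration only, interim relevance scoring with Amazon Comprehend might look like the sketch below; the keyword list and scoring formula are assumptions rather than the project’s actual logic, and the call requires AWS credentials with Comprehend access.

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Hypothetical target keywords for health savings plan content.
TARGET_KEYWORDS = {"health savings", "hsa", "employer contribution"}

def relevance_percent(excerpt: str) -> float:
    """Score an excerpt as the share of detected key phrases that mention a target keyword."""
    response = comprehend.detect_key_phrases(Text=excerpt, LanguageCode="en")
    phrases = [kp["Text"].lower() for kp in response["KeyPhrases"]]
    if not phrases:
        return 0.0
    hits = sum(any(kw in phrase for kw in TARGET_KEYWORDS) for phrase in phrases)
    return 100.0 * hits / len(phrases)

# Example: print(relevance_percent("The employer contribution to the health savings account doubled."))
```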
In addition to focusing on optimizing the performance and cost-efficiency of the solutions we were tasked to develop, we also:
- Proactively optimized the client-provided data storage architecture, which resulted in significant cost savings;
- Had our back-end developers implement the front end of the Web Crawler application, further improving cost efficiency for the client;
- Embraced a modular monolith architecture for our API to deliver a high-quality, scalable, and efficient solution.
Fully satisfied with our performance and speed during the Web Crawler app development, the client’s senior technical officers approached us to develop a Data Collection application.
The client already had an app developed by another vendor, but it failed to meet current business demands. Our team audited the existing app and redesigned a major part of its architecture, which allowed us to eliminate 90% of the potential data losses and significantly speed up performance.
Additionally, we redesigned and dramatically expanded the app’s data annotation functionality and built its advanced data validation capability from scratch. Our developers put in extra effort to meet the deadline and implement the system’s GUI in full accordance with the client’s requirements.
After this success, another legacy transformation project for Morningstar, Inc. followed: we completely overhauled the client’s Kessler App into a full-blown cloud application. By skillfully leveraging the potential of AWS, we achieved the required high-load capacity and significantly increased the system’s processing speed.
To achieve the required integration with the client’s aggregated accounts system, we interacted directly with several different internal departments to work out the best solutions possible.
We introduced several business-critical features in the revamped Kessler App, including:
- Advanced investment analytics functionality, including watch lists, charting, the Investment Calendar, portfolio management, and market comparison.
- A significantly expanded advanced search (Screener) functionality that lets users search for stocks by a set of criteria and group the found stocks into criteria-based collections, as sketched below.
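As a minimal illustration of criteria-based screening (the field names, criteria, and grouping approach are assumptions; the real Screener is far richer), the idea can be modeled like this:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stock:
    ticker: str
    sector: str
    market_cap: float      # USD billions
    dividend_yield: float  # percent

Criterion = Callable[[Stock], bool]

def screen(stocks: list[Stock], criteria: list[Criterion]) -> list[Stock]:
    """Return the stocks that satisfy every criterion."""
    return [s for s in stocks if all(c(s) for c in criteria)]

universe = [
    Stock("AAA", "Technology", 120.0, 0.5),
    Stock("BBB", "Utilities", 35.0, 4.1),
]

# A criteria-based collection: large-cap stocks that pay a dividend.
large_cap_payers = screen(universe, [lambda s: s.market_cap > 50, lambda s: s.dividend_yield > 0])
print([s.ticker for s in large_cap_payers])  # -> ['AAA']
```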
Tech Stack
- AWS
- Terraform
- Docker
- Jenkins
- Spinnaker
- JavaScript
- React.js
- Vue.js
- Nuxt.js
- Angular.js
- jQuery
- WebSocket
- Java
- PostgreSQL
- Spring Boot
- Hibernate
- Node.js
- Gradle
- Vavr
- Express
- Stripe
- Crawler4j
- JUnit 5
- Selenide
- Allure
Our results
We successfully completed all three subprojects, demonstrating over the years that Morningstar, Inc. made the right choice in selecting SPD Technology as its full-cycle innovation partner.
- Automated Data Collection from 500+ Websites: developed a custom Web Crawler application that completely automates one of the client’s core business lines.
- 10x Performance Optimization of Data Analysis Tasks: implemented a mechanism that lets 2 people instead of 20 handle data analysis tasks, freeing the company’s employees to focus on more strategic work.
- 10x Data Storage Cost Reduction: found optimal solutions that dramatically reduced the client’s data storage costs.
- Unlocking an Entirely New Business Line for the Client: helped achieve a key business objective by enabling the client to provide IPO data.
- 10x Improvement in Processing Power and Stability: delivered a solution that dramatically improves the Data Collection app’s performance and reliability compared to the previously developed MVP.
- High-Load Kessler App Used by 1 Million Customers: strategically overhauled a legacy system into a robust platform able to handle 1 million users per day.