The Problem

The UK has over 10 million homes owned by social housing providers or private landlords. Each one is bound by safety standards that keep its tenants protected. The industry has only just migrated away from engineers completing reports by hand to generating a PDF document using a digital device. This digitalisation has made obtaining and storing important data possible. However, that data is ultimately not much more useful than a filing cabinet filled with hurried handwriting if you can’t extract it, check it and turn it into knowledge.

TCW introduced a solution six years ago that is able to automatically ingest software-created PDFs and interrogate the information contained within them. This engine enables a housing-stock owner to process all their existing certificates, and any future ones, without requiring any retraining or changes in process for their engineers.

We have enabled a complete change in how organisations manage their housing stock. By making available all the data that was previously locked in their PDF certificates, and performing over 200 regulatory checks depending on the type of certificate, we allow our users to identify critical and potentially dangerous problems and rectify them proactively.

Our Solution (and some new problems)

Our document ingest engine (Escher) can be trained to accurately read any PDF created using a certificating tool. Given how imperative accurate extraction is for our system, we opted very early on not to rely simply on a self-hosted or cloud OCR solution. Mistaking characters isn’t an option. As a result, we ran into interesting scalability issues quite quickly.

Identifying a PDF certificate and generating a data representation is a computationally intensive process. With many different tools writing these PDFs, and many different ways to represent the information within them, we’ve leveraged a largely serverless architecture to ensure we’re not wastefully keeping services alive for documents that might not need them.

Getting this data out of the certificates was initially our key goal. Providing answers to regulatory questions automatically for 100% of certificates, instead of the 10% that has become acceptable in the sector due to limited resources, represented a huge step for our customers. However, a few years on, we are finding more and more uses for the data we have access to. Customers frequently surprise us with requests for reports that we couldn’t have expected, let alone prepared for.

We began to change our strategy for how we arrange data and make it available to our customers. Querying external systems and looking up extra data from different persistence stores to collate a report when it’s requested is now simply too inefficient.

Functions to the rescue again.

All our report generation is handled asynchronously at the time that the data is added to the system. By depending on a combination of Mongo Doc stores, Cosmos Table stores and simple blob storage we’re able to ensure we’ve got pre-prepared report rows ready for any customer who requests them, instantly. Introducing a new report based on a customer request is now simply a case of deploying a new function to collect and persist that data, then re-running some events.
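A minimal sketch of this pattern, written in Python with in-memory stand-ins rather than real functions or stores. Every name here (CertificateIngested, ReportStore, ReportPipeline and the example reports) is hypothetical and chosen for illustration only; it is not our production code. It shows report rows being written when data arrives, and a new report being backfilled by re-running stored events.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CertificateIngested:
    """Hypothetical event raised once a certificate has been read."""
    certificate_id: str
    customer_id: str
    certificate_type: str
    failed_checks: int


# A projector turns one event into one pre-prepared report row.
Projector = Callable[[CertificateIngested], dict]


class ReportStore:
    """In-memory stand-in for the real persistence stores (assumption)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], dict[str, dict]] = {}

    def upsert(self, report: str, customer_id: str, key: str, row: dict) -> None:
        # Keyed by certificate, so re-running an event overwrites its row
        # instead of duplicating it.
        self._rows.setdefault((report, customer_id), {})[key] = row

    def rows(self, report: str, customer_id: str) -> list[dict]:
        # Read path: no joins or external lookups, just return stored rows.
        return list(self._rows.get((report, customer_id), {}).values())


class ReportPipeline:
    """Runs every registered projector when an event arrives."""

    def __init__(self, store: ReportStore) -> None:
        self.store = store
        self.projectors: dict[str, Projector] = {}
        self.event_log: list[CertificateIngested] = []

    def handle(self, event: CertificateIngested) -> None:
        # Write path: do the report work at ingest time, not at request time.
        self.event_log.append(event)
        for name, project in self.projectors.items():
            self._write(name, project, event)

    def add_report(self, name: str, project: Projector) -> None:
        # A new report is a new projector plus a replay of past events.
        self.projectors[name] = project
        for event in self.event_log:
            self._write(name, project, event)

    def _write(self, name: str, project: Projector, event: CertificateIngested) -> None:
        self.store.upsert(name, event.customer_id, event.certificate_id, project(event))


store = ReportStore()
pipeline = ReportPipeline(store)
pipeline.add_report(
    "failures",
    lambda e: {"certificate": e.certificate_id, "failed": e.failed_checks},
)
pipeline.handle(CertificateIngested("c1", "acme", "gas", 2))
pipeline.handle(CertificateIngested("c2", "acme", "electrical", 0))

# A report requested later is backfilled from the events already seen.
pipeline.add_report(
    "types",
    lambda e: {"certificate": e.certificate_id, "type": e.certificate_type},
)
print(store.rows("types", "acme"))
```

The trade-off in a design like this is extra work and storage on the write path in exchange for a read path that is a simple lookup.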

Insights

We’d not be half as excited about our solution or seeing the cogs turn without an easy way to get at usage information, or at problem-solving metrics for when things go wrong. We make extensive use of PowerBI dashboards to expose usage metrics to stakeholders and support staff. These are used to recommend additional training or integration options should we see an account being under-utilised. They also help us maintain a close relationship with our users, to ensure our feature request/user verification and acceptance process is doing what it should.

Finally, a word about ApplicationInsights. In a large, tightly integrated system with many, many moving parts, a simple problem quickly turns into a lot of diagnosing and problem solving. Being able to pull up a trace across all our services for a particular failed request, or for a user experiencing performance problems, has proven invaluable. The first time I demonstrate ApplicationInsights to a new developer, their mouth-open “no way” reaction is always the same.

In conclusion, I have often been of the mindset that relying on a lot of proprietary PaaS offerings for key business functionality is a few too many eggs in a single basket. While I believe elements of this still ring true, the rapid development and incredible instrumentation and integration provided by the extensive Azure stack is a huge payoff and (for me) far outweighs the vendor tie-in heebie-jeebies.

Ben Ford

Chief Technical Officer, TCW

#TCWin #Azure #technology #innovation #UKhousing #PropTech #RegTech #PowerBI #PDF #data #assurance #tenants #housing #safety #NotJustGas #NotJustElectric