We hear you loud and clear: Addressing the nuances of audio security and compliance

Financial firms have long been required to record calls and to monitor voice and electronic communications for regulatory compliance. With remote and hybrid work now the norm, voice calls and communications over video collaboration platforms have become even more important for compliance teams that want to work effectively and stay proactive. FINRA, the FCA, and the SEC are clear that firms must monitor communications to remain compliant.

A study by the National Center for Biotechnology Information found that usage of video conferencing services such as Zoom has increased tenfold compared to pre-lockdown levels. A closer look at the numbers is even more dizzying: in an average workday, a person makes eight 30-minute calls, which equates to four hours of audio, or 12,000 words, per person each day.
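The per-person figures above multiply out quickly once you scale them to a team. This is a minimal sketch that uses only the article's averages (eight 30-minute calls and 12,000 words per person per day); the `daily_audio_load` helper and the team size are hypothetical inputs for illustration.

```python
# Minimal sketch built on the article's per-person averages.
# daily_audio_load and team_size are hypothetical, for illustration only.
CALLS_PER_DAY = 8
MINUTES_PER_CALL = 30
WORDS_PER_DAY = 12_000


def daily_audio_load(team_size: int) -> dict:
    """Estimate a team's daily audio volume from the per-person averages."""
    minutes_per_person = CALLS_PER_DAY * MINUTES_PER_CALL  # 240 minutes
    hours_per_person = minutes_per_person / 60             # 4 hours
    words_per_minute = WORDS_PER_DAY / minutes_per_person  # implied average rate
    return {
        "hours_per_person": hours_per_person,
        "implied_words_per_minute": words_per_minute,
        "team_hours": hours_per_person * team_size,
        "team_words": WORDS_PER_DAY * team_size,
    }


if __name__ == "__main__":
    print(daily_audio_load(team_size=100))
```

For a hypothetical 100-person desk, that works out to 400 hours and 1.2 million words of audio every day, far more than any team could listen to manually.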

As remote work makes teams more global, how can compliance teams keep up with the volume of audio calls and the diversity of languages and jargon? From multilingual complexity to data volume, audio can be one of the most challenging types of communication to monitor.

Audio is Complex

Regulations vary by industry, but they all have one thing in common: firms must monitor all communications, including audio, to ensure proper surveillance and compliance. Random sampling has been standard practice, because sifting through data and monitoring it consistently is a daunting, time-consuming task. However, given the volume of data and the time it takes to listen to each call, audio surveillance demands a great deal of effort, and sampling alone leaves firms exposed to too much risk.

Given the multifaceted nature of audio, compliance teams must balance the complexities of audio itself with the explosion of data sources. Audio isn’t just another data source, however; it belongs in a category of its own, with many characteristics that differ vastly from e-comms surveillance. The biggest of these is that transcription is not an exact science.

When it comes to monitoring audio, we cannot simply read a transcript and expect to grasp the context as easily as we would in an email. Words transcribed on paper can carry an entirely different meaning from their spoken context, because tone, emotion, and gestures are not captured.

On average, an audio transcription accuracy of 80% to 90% is considered “good.” That means 10% to 20% of the words in a transcript can be misheard, missing, or just plain wrong: a scary thought, considering that some of those words might be important keywords.
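Transcription accuracy is commonly reported as one minus the word error rate (WER). WER counts the substituted, deleted, and inserted words needed to turn a transcript into a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard word-level edit-distance calculation; the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution if words differ
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "please keep this trade off the books until next week"
transcript = "please keep this trade of the books until next week"
print(word_error_rate(reference, transcript))
```

Here a single misheard word out of ten gives a WER of 0.1, or 90% accuracy. That is a “good” score, yet the one error (“of” for “off”) is exactly the word that signals the risk.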

Transcription also struggles with different languages and with vernacular that varies from culture to culture. More than half of the world’s population speaks one of just 23 main languages, so even covering the major languages would be a logistical nightmare. On top of that, speakers often switch languages mid-call, so understanding every language used in a single conversation is critical for compliance.

Tackling the Complexities of Audio

We know that elements like context and language matter, so now what? One answer is comprehensive communications surveillance, in which all recorded communications (voice, email, chat, e-comms, video, and others) are brought together into a single data lake and analyzed holistically. According to a GreySpark report, 75% of those sampled considered this approach the most likely end objective when selecting platforms.
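The single-data-lake idea can be sketched as one normalized record shape shared by every channel, so a single scan covers voice, email, and chat alike. This minimal sketch assumes audio and video have already been transcribed to text and that keywords are single words; `CommRecord` and `flag_across_channels` are hypothetical names, not any vendor's schema or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommRecord:
    """Hypothetical normalized record: one shape for every channel."""
    record_id: str
    channel: str                    # e.g. "voice", "email", "chat", "video"
    participants: tuple[str, ...]
    text: str                       # message body, or transcript for voice and video


def flag_across_channels(records: list[CommRecord], keywords: set[str]) -> list[tuple[str, str, set[str]]]:
    """Run one single-word keyword scan over all channels.

    Returns (record_id, channel, matched_keywords) for each record with a hit.
    """
    lowered = {k.lower() for k in keywords}
    hits = []
    for rec in records:
        # Tokenize on letters and apostrophes so punctuation does not block matches.
        words = set(re.findall(r"[a-z']+", rec.text.lower()))
        matched = lowered & words
        if matched:
            hits.append((rec.record_id, rec.channel, matched))
    return hits


records = [
    CommRecord("e1", "email", ("alice", "bob"), "Let's move this offline."),
    CommRecord("v1", "voice", ("bob", "carol"), "keep it off the books for now"),
    CommRecord("c1", "chat", ("alice",), "Lunch at noon?"),
]
print(flag_across_channels(records, {"offline", "books"}))
```

Because every channel shares one shape, the same rule flags the email and the voice transcript without separate per-channel tooling.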

With all the nuances of audio, integrated monitoring is the most practical and efficient way to review this rapidly growing data source. AI can help teams review one hour of audio in one minute, rather than spot-listening or listening to entire conversations. Teams don’t have time to spend on sections of a recording that aren’t relevant.

A best-in-class solution would unite transcripts and audio in one place, so that teams can search by keyword and play back only the risky sections. This helps teams prioritize the riskiest content and decide on next steps to find misconduct and keep it from escalating.
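To illustrate searching by keyword and playing back only the risky sections, here is a minimal sketch. It assumes the transcript arrives as time-stamped segments from some speech-to-text step; the `Segment` type, the `playback_windows` helper, the five-second padding, and the merge step are illustrative choices, not a specific product's behavior.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical transcript segment with start/end offsets in seconds."""
    start: float
    end: float
    text: str


def playback_windows(segments: list[Segment], keywords: set[str], pad: float = 5.0) -> list[tuple[float, float]]:
    """Return merged (start, end) audio windows around segments containing a keyword."""
    lowered = {k.lower() for k in keywords}
    windows = []
    for seg in segments:
        words = set(re.findall(r"[a-z']+", seg.text.lower()))
        if lowered & words:
            # Pad each hit so the reviewer hears the surrounding context.
            windows.append((max(0.0, seg.start - pad), seg.end + pad))
    windows.sort()
    merged: list[tuple[float, float]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlapping windows: extend the previous one instead of replaying audio.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


call = [
    Segment(0.0, 12.0, "Morning, how was the weekend?"),
    Segment(12.0, 20.0, "Fine. About that order, keep it off the books."),
    Segment(20.0, 28.0, "Understood, I'll use my personal phone."),
    Segment(600.0, 610.0, "Let's confirm the price tomorrow."),
    Segment(1800.0, 1806.0, "Call me on my cell, not this line."),
]
print(playback_windows(call, {"books", "phone", "cell"}))
```

In this invented example, the reviewer plays back two windows totaling 42 seconds instead of a call lasting just over 30 minutes. Merging overlapping windows keeps adjacent hits together as one continuous clip.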

As the world becomes more connected, we can confidently say that audio isn’t going anywhere, and regulation is only increasing. Whatever the industry, security and surveillance teams must understand these nuances and use communication surveillance tools that accurately identify risk across every type of communication.

This article originally ran in Security, a twice-monthly security-focused eNewsletter for security end users, brought to you by Security magazine.