How Egnyte Uses AV-Service to Detect Malware

In hybrid or remote desktop environments, content is the most exposed type of data. Egnyte’s all-in-one platform lets IT manage and control a full spectrum of content risks, including accidental data deletion, data exfiltration, and privacy compliance, while giving business users the tools they need to work faster and smarter from any cloud, any device, anywhere.

At Egnyte, we’re committed to mitigating the risk of spreading viruses when a malware file is uploaded to the system. With millions of files uploaded daily, we need an effective, simple, scalable solution and a reliable distributed system that streams messages from publishers to subscriber clients.

The entire Egnyte ecosystem is powered by distributed systems and microservices, many of which benefit from running asynchronous workloads. One critical part of Egnyte’s software infrastructure is AV-Service, a system that detects whether malware/viruses are present in uploaded files. It’s an enhanced version of Egnyte’s file scan service, which was previously restricted to files uploaded from a shared link.

AV-Service uses various proven malware-detection methods, including traditional, signature-based detection, machine learning-based classification, and real-time internet-based cloud reputation lookups. When a file is uploaded, AV-Service scans it for viruses and malware in near real-time without throttling the upload flow. 

In this blog post, we’ll describe how AV-Service detects viruses and enables users to take action to prevent the spread of viruses to all other clients. 

Below is a high-level diagram of AV-Service:

It has the following components:

  1. Producer: Service that sends the events (in this case, ADD_FILE events) to the subscribed clients.
  2. Listener: Consumer that receives messages from Pub/Sub.
  3. Scanner: Antivirus service that scans files for viruses.
  4. Cloud File System: Framework that handles the files that are infected.

Terms

Workgroup: Users/Clients accessing the Egnyte platform. When you sign up at Egnyte, you’ll be allocated a new domain where you and the rest of your team (workgroup) can upload and share files.

In this example, the Producer is the Event Subscription Service (ESS), which publishes events such as ADD_FILE and DELETE_FILE to the subscribers that have registered interest in them. AV-Service subscribes to ESS for the ADD_FILE event, which is raised as soon as a file is uploaded to the system. The event contains the file metadata, the Object Store ID, etc.

AV-Service sneak peek: AV-Service runs on a Kubernetes cluster as a Deployment, which consists of a Pod with two containers. The first container is the actual service that scans files for viruses; the scanner runtime runs as a process in this container. The second container is the service whose Listeners pull the messages. Both containers mount a shared volume, so downloaded files are available to both of them.

The Listener is the actual AV-Service container that consumes the published messages. Egnyte works with various clients, so the filter logic and chain can be tailored to the requirements of each business (e.g., the maximum file size to scan). Once a message passes the filter chain, it is published to another Pub/Sub topic (the Filtered Topic) and acknowledged, indicating that the system has finished processing the message. The Filtered Topic serves as the actual job queue for the scanner. If a message fails the filter chain, it is still acknowledged but is not passed to the Filtered Topic.
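The filter step can be illustrated with a minimal sketch. It is not Egnyte's implementation: the `AddFileEvent` shape, the `max_size_filter` and `handle_event` helpers, and the in-memory lists standing in for the Filtered Topic and the acknowledgement call are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AddFileEvent:
    """Hypothetical event shape; the real ADD_FILE payload is not shown here."""
    object_store_id: str
    size_bytes: int


Filter = Callable[[AddFileEvent], bool]


def max_size_filter(limit_bytes: int) -> Filter:
    """Build a filter that accepts files at or below limit_bytes."""
    return lambda event: event.size_bytes <= limit_bytes


def handle_event(
    event: AddFileEvent,
    filters: List[Filter],
    publish_filtered: Callable[[AddFileEvent], None],
    ack: Callable[[AddFileEvent], None],
) -> bool:
    """Forward the event only if every filter accepts it; acknowledge either way."""
    passed = all(f(event) for f in filters)
    if passed:
        publish_filtered(event)
    ack(event)
    return passed


# Usage with in-memory stand-ins for the Filtered Topic and the ack call.
filtered_topic: List[AddFileEvent] = []
acked: List[AddFileEvent] = []
chain = [max_size_filter(1 * 1024**3)]  # illustrative 1 GiB limit

small = AddFileEvent("obj-1", 200 * 1024**2)
large = AddFileEvent("obj-2", 5 * 1024**3)
handle_event(small, chain, filtered_topic.append, acked.append)
handle_event(large, chain, filtered_topic.append, acked.append)
```

In this sketch both events are acknowledged, but only the small one reaches the stand-in Filtered Topic, which mirrors the behavior described above.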

Since we know the rate at which scanning can happen, we configured the Pub/Sub subscription accordingly. We chose an asynchronous subscriber with flow control and a scheduler, so messages are processed concurrently, each by its own thread. We also configured a dead-letter topic, to which a message is forwarded once it reaches five delivery attempts. An alert is raised if the dead-letter queue fills up.
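The dead-letter behavior can be modeled with a small in-memory simulation. In production the Pub/Sub service itself tracks delivery attempts and forwards messages to the dead-letter topic; the `drain` function below is a hypothetical stand-in that only illustrates the five-attempt rule, and it leaves out flow control and threading.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

MAX_DELIVERY_ATTEMPTS = 5  # the limit mentioned in this post


def drain(
    messages: List[str], handler: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Deliver each message until the handler succeeds or attempts run out.

    Returns (processed, dead_lettered). A handler that raises is treated
    as a failed delivery, which triggers a redelivery.
    """
    queue: Deque[Tuple[str, int]] = deque((m, 1) for m in messages)
    processed: List[str] = []
    dead_lettered: List[str] = []
    while queue:
        message, attempt = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            if attempt >= MAX_DELIVERY_ATTEMPTS:
                dead_lettered.append(message)  # give up after the fifth try
            else:
                queue.append((message, attempt + 1))  # redeliver later
    return processed, dead_lettered
```

An alert on the size of `dead_lettered` would correspond to the alert on the dead-letter queue described above.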

Once a message is received from the Filtered Topic, the service uses the object store ID in the message to download the file from Egnyte Object Store (EOS) to the mount path shared by both containers. The service then makes an HTTP REST call to the other container (Scanner) to scan the file. Note: Egnyte does not store any of the files in the system apart from the storage specified by the customer. Once scanning is complete, the file is deleted.
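A minimal sketch of this download, scan, and delete sequence follows. The `scan_object` helper and its `download` and `scan` callables are hypothetical placeholders for the EOS client and the REST call to the Scanner container, neither of which is shown in this post; the sketch also assumes the object store ID is safe to use as a file name.

```python
import os
import tempfile
from typing import Callable, Dict


def scan_object(
    object_store_id: str,
    download: Callable[[str, str], None],
    scan: Callable[[str], Dict],
    shared_dir: str,
) -> Dict:
    """Download to the shared volume, scan by path, and always clean up."""
    # Assumption: the ID contains no path separators.
    local_path = os.path.join(shared_dir, object_store_id)
    try:
        download(object_store_id, local_path)
        return scan(local_path)
    finally:
        # Delete the local copy whether or not the scan succeeded.
        if os.path.exists(local_path):
            os.remove(local_path)


# Usage with fake download and scan functions in a temporary directory.
def fake_download(object_store_id: str, path: str) -> None:
    with open(path, "wb") as handle:
        handle.write(b"hello")


def fake_scan(path: str) -> Dict:
    with open(path, "rb") as handle:
        return {"infected": False, "size": len(handle.read())}


with tempfile.TemporaryDirectory() as shared:
    result = scan_object("obj-1", fake_download, fake_scan, shared)
    leftover = os.listdir(shared)
```

Putting the cleanup in a `finally` block keeps the shared volume from filling up when a download or scan fails partway through.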

Scanner: To facilitate communication between the AV-Service container and the Scanner container, we wrapped the Scanner in a REST API. The actual scanner runs as a TCP service in the container. We chose FastAPI over Flask because we want to process requests asynchronously. The AV-Service container sends the file path to the Scanner container. Once scanning is done, the Scanner container sends the response back to the AV-Service container. The final prepared message is sent to the Cloud File System (CFS) topics so that CFS can make the necessary changes to the uploaded file.

CFS is a framework that handles uploads of files, saves the metadata to the distributed metadata store, handles all the actions on the UI, takes care of trash purging, etc.

CFS receives the event from the AV-Service, which is published to the Pod-specific topic. Then, the following actions are triggered:

  1. Move the infected file to the quarantine folder. This step is crucial as it will avoid syncing these files to other users in the workgroup.
  2. Record the information in Egnyte’s databases.
  3. Send a notification and email to the admin user and the user who has uploaded the file.

Egnyte offers the following actions in the “Potential Malware” (quarantine) folder:

  1. Restore: Restores the file to its original location, for example when the user considers the detection a false positive.
  2. File and User Details: Displays file and user details such as the IP address from which the file was uploaded, its geolocation, etc.
  3. Malware Details: Displays the name of the virus, the date it was detected, and the type of infection.
  4. Download: The user can download the file at their discretion.
  5. Delete Permanently: The user can delete the file permanently from the “Potential Malware” folder.

Testing:

We recommend and strictly follow Test-Driven Development (TDD). Because Egnyte runs its services in Kubernetes, we use docker-compose to bring up the AV-Service and Scanner containers in the local environment. Since Pub/Sub is the pipeline’s starting point, we spin up a Pub/Sub emulator, which emulates the production Pub/Sub service locally. Although the emulator’s capabilities are limited, it serves the purpose for our integration testing.

Performance Testing:

We were eager to see how the system behaves under the load it would face in the production environment. We collected a sample set of 1,400 objects with an average size of 200 MB, which equals Egnyte’s average file size in Production. We generated 100K ADD_FILE events in a round-robin fashion. We disabled the Horizontal Pod Autoscaler (HPA) for this test, created ten replicas of the Pod, and triggered the performance test.
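Round-robin event generation of this kind can be sketched as follows. The event dictionary and the `obj-N` identifiers are hypothetical; only the counts (1,400 sample objects, 100K events) come from the test described above.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator


def round_robin_events(
    object_ids: Iterable[str], total_events: int
) -> Iterator[Dict[str, str]]:
    """Yield ADD_FILE events, cycling through the sample objects in order."""
    for object_id in islice(cycle(object_ids), total_events):
        yield {"type": "ADD_FILE", "object_store_id": object_id}


# Usage: 100,000 events spread over 1,400 sample objects.
sample_ids = [f"obj-{i}" for i in range(1400)]
events = list(round_robin_events(sample_ids, 100_000))
per_object = Counter(event["object_store_id"] for event in events)
```

Because 100,000 is not a multiple of 1,400, the first 600 objects are used 72 times and the remaining 800 are used 71 times, so the load per object stays nearly uniform.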

Below are the metrics from the exercise:

Time taken to download the file from EOS and scan the file for viruses (99th Percentile).

We were more interested in the scanning time (which is in milliseconds) than the download time. Download time can be reduced in the actual production environments as EOS nodes will be much closer to the processing nodes.

We expected the services not to be resource-intensive, and the Kubernetes metrics confirmed this.

Scalability

Because we knew the scanning rate, we rolled out the Deployment in the Production environment with a minimum of 40 replicas and configured the HPA to scale on the Pub/Sub backlog. We did a canary update of our production environments, starting with the data center (DC) that has the least traffic, monitoring it for a couple of days, and then moving on to the higher-traffic DCs. This service has handled millions of uploaded files.
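As a rough illustration of backlog-based scaling, the sketch below computes a replica count from the number of undelivered messages. Only the minimum of 40 replicas comes from this post; the `target_per_replica` and `max_replicas` values are hypothetical, and the real HPA also applies tolerances and stabilization windows that are omitted here.

```python
import math

MIN_REPLICAS = 40  # the minimum mentioned in this post


def desired_replicas(
    backlog: int, target_per_replica: int, max_replicas: int
) -> int:
    """Scale so each replica handles about target_per_replica messages.

    The result is clamped between MIN_REPLICAS and max_replicas.
    """
    raw = math.ceil(backlog / target_per_replica)
    return max(MIN_REPLICAS, min(max_replicas, raw))


# Usage with illustrative numbers (not production values).
idle = desired_replicas(backlog=0, target_per_replica=100, max_replicas=200)
busy = desired_replicas(backlog=10_000, target_per_replica=100, max_replicas=200)
peak = desired_replicas(backlog=1_000_000, target_per_replica=100, max_replicas=200)
```

With these example inputs, an empty backlog stays at the minimum, a moderate backlog scales proportionally, and a very large backlog is capped at the maximum.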

Considerations

Because the volume mounted in the Pod has limited storage capacity (15 GB), we added a filter on the size of the file to scan (one of the filters in the chain mentioned above). We first restricted the file size to 10 GB and then reduced the limit to 1 GB. We receive far more files of around 200 MB than files of 1 GB or more, so we don’t want the roughly 10% of larger files to block scanning of the 90% that are small.
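The effect of the size limit on the shared volume can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 1,000 MB) and considers only the worst case where every file on the volume is as large as the limit allows; `max_concurrent_files` is a hypothetical helper.

```python
VOLUME_MB = 15_000  # the 15 GB volume mentioned in this post


def max_concurrent_files(volume_mb: int, file_size_mb: int) -> int:
    """Number of files of the given size that fit on the volume at once."""
    return volume_mb // file_size_mb


# Usage: compare the two limits and the typical ~200 MB file.
at_10_gb_limit = max_concurrent_files(VOLUME_MB, 10_000)
at_1_gb_limit = max_concurrent_files(VOLUME_MB, 1_000)
typical_files = max_concurrent_files(VOLUME_MB, 200)
```

Under these assumptions, a 10 GB limit leaves room for only one maximum-size file at a time, a 1 GB limit leaves room for 15, and about 75 typical 200 MB files fit at once.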

Conclusion

We’ve explained how Egnyte stops virus spread by making it fast and simple to quarantine virus-infected files to the correct folder. Users will find it easy to take action on the files listed in the “Potential Malware” folder.

At Egnyte, it all comes down to scale. Although running an antivirus scan on a single file may sound like a simple operation, Egnyte performs this task for millions of streaming uploads daily, provisions computing power as needed, and delivers turnaround in near real-time.
