Comparison of AI website classifiers

CASE STUDY.

The popularity of AI-based solutions is not waning; on the contrary, it grows every year. AI-based website classification has many applications in both cybersecurity and IT management.

Comparison of web classifiers available on the market

There are many solutions on the market for classifying web content using artificial intelligence. What are the most popular web classifiers, how do they differ, and are they really effective? The following comparison answers these questions.

BTC is a pioneer on the Polish market in classifying websites using AI, so this comparison sets the BTC Website Classification solution against four foreign tools: WhoisXML, Cyren, Zvelo and Webroot.

BTC Website Classification

BTC Website Classification is a comprehensive solution that enables IT departments to classify websites quickly and efficiently. The tool analyzes websites in detail to strengthen IT security and streamline key processes within an organization.

The BTC solution analyzes sites based on their actual content and assigns each one to one of 21 categories, determined by machine learning and deep learning. It reports on a website’s productivity and provides a detailed security analysis of the site, using external databases and registries. The classifier recognizes more than 52 languages, so it can accurately categorize foreign-language sites.

The catalog of categorized websites is continuously expanded with up-to-date classifications and is available in the cloud, so it can be used in other solutions via an API. BTC Website Classification already holds more than 9 million categorized websites.

WhoisXML

WhoisXML API Website Categorization Solutions is a tool designed to categorize URLs. It analyzes websites and IP addresses to enhance enterprise security.

The tool classifies websites using machine learning (ML) and natural language processing (NLP). It categorizes sites based on their content, matching them to more than 500 IAB categories and subcategories. In addition to the category, WhoisXML provides a percentage value expressing how confident it is in the result. The classifier cannot categorize Polish-language sites. WhoisXML provides an API that can be used in other solutions. In total, the WhoisXML classifier has already categorized more than 480 million web pages.

Cyren

Cyren Website URL Category Checker is a security-oriented URL classifier. It categorizes websites and determines which ones pose a risk to data security.

Cyren Website URL Category Checker classifies URLs based on their content. Its main task is to analyze the reputation of classified sites and to check URLs and files for malware. The Cyren classifier also monitors which URLs may be fake or used for phishing. In addition, the solution reports a site's position in the Alexa Ranking (a popularity ranking built from millions of web pages). Cyren does not disclose the number of websites it has analyzed.

Zvelo

ZveloCAT is a solution designed to categorize web content. The tool enables real-time classification of URLs.

The ZveloCAT classifier analyzes websites and returns information about their category, assigned target audience and possible threats, as well as illegal content such as gambling or pornography. The solution works automatically. It classifies web content into 480 categories and supports more than 200 languages. ZveloCAT offers an API that gives direct access to the zveloAI platform.

Webroot

Webroot BrightCloud Web Classification & Web Reputation is a tool that enables detailed analysis of URLs, including threat, content and reputation analysis.

The solution classifies websites in real time using machine learning. It sorts websites into 82 categories and operates at high speed: Webroot’s classifier performs 20,000 classifications per second and already has more than 32 billion URLs in its inventory. The solution protects organizations and users against online threats.


Comparison of the classifiers' categorization effectiveness

Basic content classification solutions are no longer enough for most organizations. Most are looking for proven, innovative tools that genuinely improve the organization’s workflow and enhance IT security. We analyzed five of the most popular website classifiers to test their performance and effectiveness.

We compared four sites in Polish and English: an airline site (Wizzair.com), a clothing store (Zara.pl), a news portal (Wp.pl) and a computer software site (Office.com).


BTC Website Classification stands out with the highest classification effectiveness. Each of the 4 URLs, regardless of the language of the page, was classified correctly with a confidence close to 100%. For each site, the result presented two categories obtained through two AI methods.

The Cyren Website URL Category Checker tool performed equally well in the comparison. The classifier correctly categorized all 4 web addresses in Polish and English. The manufacturer does not provide a confidence percentage for the categorization.

The WhoisXML API Website Categorization Solutions tool fared the worst in the comparison. Out of 4 websites in both languages, only one was correctly classified, with a confidence score of 63%. The remaining sites were not analyzed because the page contained too little content or used an unsupported language.

The ZveloCAT solution handled most pages correctly. The problem occurred with the Polish-language page of a clothing store, which the classifier categorized as gambling. Out of 4 URLs, ZveloCAT classified 3 correctly. The manufacturer does not provide a confidence percentage for the categorization.

The Webroot BrightCloud Web Classification & Web Reputation classifier categorized 3 of the 4 URLs correctly. It miscategorized the Polish clothing store site, which it recognized as a site about business and economics. The vendor does not provide a confidence percentage for the categorization.
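The results above can be summarized as a simple accuracy figure per tool. The sketch below is only an illustration of that arithmetic: the counts are the ones reported in this comparison (correct classifications out of 4 URLs), while the names `correct`, `accuracy` and `ranking` are our own and do not come from any vendor's API.

```python
# Correct classifications out of 4 test URLs, as reported in the comparison.
TOTAL_URLS = 4
correct = {
    "BTC Website Classification": 4,
    "Cyren": 4,
    "WhoisXML": 1,
    "ZveloCAT": 3,
    "Webroot BrightCloud": 3,
}


def accuracy(correct_count, total=TOTAL_URLS):
    """Share of test URLs that were categorized correctly."""
    return correct_count / total


def ranking(results):
    """Tools ordered from most to fewest correct classifications."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


for name, count in ranking(correct):
    print(f"{name}: {count}/{TOTAL_URLS} = {accuracy(count):.0%}")
```

With only four test URLs, each single misclassification changes the score by 25 percentage points, so these figures are best read as an indication rather than a precise benchmark.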

Comparison of the effectiveness of classifying dangerous sites

Current cyber threats are the most serious in many years. IT security must be a priority for all organizations, regardless of industry. Innovative AI solutions that offer web classification can help protect data, so they are worth having in your toolset.

Sites in the pornography category are particularly dangerous, so it is worth guarding against them and taking proper care of your IT security. In the following comparison, we check which of the available classifiers correctly identify pornography sites.


The BTC Website Classification solution performed flawlessly in the comparison. The classifier correctly identified pornographic sites, blocking user access to them. In addition, BTC Website Classification indicated a classification confidence score of 100% for each site.

Cyren Website URL Category Checker also correctly flagged all analyzed URLs as pornographic and dangerous to the user. The manufacturer does not provide a confidence score for the classification.

The WhoisXML solution categorized the analyzed porn sites incorrectly. Pornhub.com was classified inconclusively as a sensitive topic with a confidence score of 57%, while xhamster.com was classified as video with a confidence score of 58%.

The ZveloCAT classifier correctly classified all 3 websites and categorized them as pornography. The manufacturer does not provide a confidence percentage for the assigned category.

Webroot BrightCloud Web Classification & Web Reputation correctly categorized all analyzed sites and identified them as pornographic adult sites. The manufacturer does not provide a confidence percentage for the assigned category.
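Confidence scores like these can be used to decide when a result should not be trusted automatically. The following is a minimal sketch of that idea, not a description of how any of the compared products work: the `Classification` type, the `needs_review` helper and the 80% threshold are assumptions made for illustration, and only the two categories and scores are taken from the results above.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    url: str
    category: str
    confidence: float  # 0.0 to 1.0, as reported by the classifier


# Assumed cut-off for illustration; a real deployment would tune this value.
REVIEW_THRESHOLD = 0.80


def needs_review(result, threshold=REVIEW_THRESHOLD):
    """Treat a low-confidence category as inconclusive."""
    return result.confidence < threshold


# Categories and scores reported for WhoisXML in this comparison.
results = [
    Classification("pornhub.com", "sensitive topic", 0.57),
    Classification("xhamster.com", "video", 0.58),
]

flagged = [r.url for r in results if needs_review(r)]
print(flagged)
```

Under this assumed threshold both results would be sent for manual review or re-checked with a second classifier instead of being accepted as they are.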