BTC AI - Website Classification

Quick and hassle-free website classification

We use machine learning and deep learning to automatically classify websites.

Benefits

Why classify websites?

Traditional methods of classifying websites rely on the operator's subjective judgment, URL rules, or ready-made pattern databases, which makes them ineffective on the constantly changing Internet.

The BTC Website Classification solution removes these limitations, classifying any site instantly and accurately. What’s more, every site is reanalyzed each month, which matters especially when its owner or content changes.

Our technology stands out not only for its speed but also for the quality of its results. Classification relies on three independent algorithms whose evaluation mechanisms are continuously improved. This makes the solution well suited to professional IT management and network security systems.

Unlike many foreign classifiers, BTC Website Classification analyzes content in Polish and 51 other languages, making it one of the most comprehensive tools of its kind. In addition, each classification describes not only a site's subject matter but also its impact on productivity and the potential threat it poses to the user. The system detects phishing sites, sites blocked by government authorities (such as gambling sites), and other dangerous resources.

Our solution also offers API access, so it can easily be integrated with other systems, for example to automatically block sites in specific categories such as pornography or phishing. In addition, BTC Website Classification does not analyze only the homepage: when the homepage provides insufficient data, it also examines linked subpages and follows redirects, which significantly improves classification effectiveness.
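The subpage fallback described above can be illustrated with a minimal sketch. It assumes the homepage HTML has already been downloaded and simply collects up to `limit` same-host links that could be fetched next when the homepage text is too thin. The names `LinkCollector` and `same_site_subpages` and the default limit are hypothetical, not part of the BTC API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_subpages(base_url, html, limit=5):
    """Return up to `limit` absolute http(s) URLs on the same host as base_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute != base_url and absolute not in found:
                found.append(absolute)
        if len(found) >= limit:
            break
    return found


html = '<a href="/about">About</a><a href="https://other.com/x">X</a>'
print(same_site_subpages("https://example.com/", html))
```

Restricting candidates to the same host keeps the fallback focused on the site being classified rather than on external sites it happens to link to.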

How the web classifier works

Downloading a list of URLs to categorize

The addresses are sent to the classifier

Downloading the page content

The site code is downloaded for later analysis

Cleaning unnecessary information from the site code

The page code is cleaned of unnecessary data, such as repeated words and HTML tags
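The tag-removal part of this cleaning step can be sketched with Python's standard `html.parser`. This is an illustrative assumption, not the classifier's actual cleaner: it keeps only visible text, drops the contents of `<script>`, `<style>` and `<noscript>`, lowercases the result and collapses whitespace. Filtering of repeated or common words is left to the next step.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text; drops tags and <script>/<style>/<noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Return lowercased visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip().lower()


print(clean_html("<h1>Online  Casino</h1><script>var x=1;</script><p>Play &amp; win</p>"))
```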

Machine Learning: identifying keywords that determine the nature of the site

After the unnecessary components are removed, the words (keywords) that define the nature of the site remain.
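Keyword identification can be approximated by tokenizing the cleaned text, discarding stop words and very short tokens, and counting what remains. In this sketch the tiny English `STOP_WORDS` set and the `extract_keywords` function are hypothetical; a multilingual system would need per-language stop-word lists. Python's `\w` is Unicode-aware, so Polish letters such as "ą" or "ż" stay inside tokens.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "for", "is", "are", "with", "your"}


def extract_keywords(text, top_n=10, min_length=3):
    """Return the top_n most frequent non-stop-word tokens with their counts."""
    tokens = re.findall(r"\w+", text.lower())
    words = [
        t for t in tokens
        if t not in STOP_WORDS and len(t) >= min_length and not t.isdigit()
    ]
    # Ties are returned in the order the words were first encountered.
    return Counter(words).most_common(top_n)


print(extract_keywords("Play poker and roulette in the best online casino. "
                       "Poker bonus for your first poker deposit!", top_n=3))
```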

Deep Learning: analyzing a web page using the trained neural network

Data processing to enhance the effectiveness of the deep learning model
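The page does not specify how data is prepared for the neural network. A common approach, shown here purely as an assumption, is to map tokens to integer IDs from a vocabulary and pad or truncate every page to a fixed length so pages can be batched. `build_vocab`, `encode`, the `<pad>`/`<unk>` tokens and the length limits are all hypothetical.

```python
from collections import Counter


def build_vocab(token_lists, max_size=10000):
    """Assign integer IDs to the most frequent tokens; 0 = padding, 1 = unknown."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, _ in counts.most_common(max_size - len(vocab)):
        vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len=8):
    """Convert tokens to IDs, truncating or right-padding to exactly max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab([["casino", "poker", "casino"], ["news", "poker"]])
print(encode(["poker", "casino", "weather"], vocab, max_len=5))
```

Reserving fixed IDs for padding and unknown words lets the model see words it never met during training without failing.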

Machine Learning: assessing the weight of keywords in defined categories

Repeated keywords are assigned to categories based on a dictionary, and the number (saturation) of words in each category is determined
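Dictionary-based saturation can be sketched as counting how many word occurrences fall into each category's dictionary. Normalizing those hits into shares that sum to 1 is an assumption made here so the scores can later be compared with other algorithms. `CATEGORY_DICTIONARY` and `category_saturation` are hypothetical; the real dictionaries are not described on this page.

```python
import re
from collections import Counter

# Hypothetical category dictionary for illustration only.
CATEGORY_DICTIONARY = {
    "gambling": {"casino", "poker", "roulette", "bet", "jackpot"},
    "news": {"news", "politics", "report", "breaking", "editorial"},
    "shopping": {"cart", "price", "shipping", "discount", "order"},
}


def category_saturation(text, dictionary=CATEGORY_DICTIONARY):
    """Share of dictionary-matched word occurrences that fall into each category."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    hits = {cat: sum(counts[w] for w in words) for cat, words in dictionary.items()}
    total = sum(hits.values())
    if total == 0:
        # No dictionary word appears: no category is saturated.
        return {cat: 0.0 for cat in dictionary}
    return {cat: n / total for cat, n in hits.items()}


print(category_saturation("Casino poker poker news"))
```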

Deep Learning: a global assessment of the web page in the context of the entire site

When a web page is analyzed, its entire context is taken into account, which makes it possible to analyze multi-topic pages more effectively

Determining the site classification

The site is assigned to the category that has been identified as the most probable
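The page states that three independent algorithms contribute to the result but does not say how their outputs are merged. As an assumption, this sketch averages the per-category probability estimates (optionally weighted) and picks the category with the highest combined score. `combine_scores`, `classify` and the sample scores are hypothetical.

```python
def combine_scores(score_sets, weights=None):
    """Weighted average of per-category scores from several algorithms."""
    if weights is None:
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("one weight per score set is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    categories = set().union(*score_sets)  # every category any algorithm scored
    return {
        cat: sum(w * s.get(cat, 0.0) for w, s in zip(weights, score_sets)) / total_weight
        for cat in categories
    }


def classify(score_sets, weights=None):
    """Return (most probable category, its combined score)."""
    combined = combine_scores(score_sets, weights)
    best = max(combined, key=combined.get)
    return best, combined[best]


keyword_scores = {"gambling": 0.75, "news": 0.25}
page_model_scores = {"gambling": 0.6, "news": 0.3, "shopping": 0.1}
context_model_scores = {"gambling": 0.9, "news": 0.1}
print(classify([keyword_scores, page_model_scores, context_model_scores]))
```

Weights give one possible way to let a more reliable algorithm count for more without discarding the others.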
