Domain_Grades.txt
domain_classification_out
High-level classification of the domain, reported by a machine learning algorithm named DomainClassifier. The current values are:
business - The website content indicates a business owns/manages the site
blank - Not enough content to classify
nonbusiness_other - The website content indicates a website for something other than a business: photo, personal or family site, blog, game, etc
nonbusiness_ads - Nothing but ads
nonbusiness_error - Returns an error: 404, server not found, no response, etc
nonbusiness_undercon - Under Construction
frameset - Site content within an HTML frame. Often an error. Low value.
nonbusiness_login - No content, just asking for login credentials
nonbusiness_xxx - Adult content
redirect - The website redirects
timeout - Could not resolve content. Error.
error - Error
Currently, for domains with non-English content, the language code is indicated. Any domain with a lang classification is either business or nonbusiness_other; errors, ads, redirects, etc. never receive a lang classification. Future updates will classify non-English content as business or nonbusiness_other accordingly.
Languages are designated with ISO 639-1 two-letter codes prefixed with lang_, including:
lang_de German
lang_zh Chinese
lang_es Spanish
lang_fr French
lang_ru Russian
lang_pt Portuguese
lang_ja Japanese
lang_nl Dutch
lang_it Italian
lang_id Indonesian
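A minimal sketch of how a consumer might recognize the language values above. It assumes the language code appears as the domain_classification_out value itself (for example "lang_de"); the function name language_of and the LANGUAGES table are hypothetical, and the table covers only the codes listed here.

```python
# Hypothetical lookup table built from the codes listed above.
LANGUAGES = {
    "de": "German",
    "zh": "Chinese",
    "es": "Spanish",
    "fr": "French",
    "ru": "Russian",
    "pt": "Portuguese",
    "ja": "Japanese",
    "nl": "Dutch",
    "it": "Italian",
    "id": "Indonesian",
}


def language_of(classification):
    """Return the language name for a lang_* value, or None otherwise.

    Assumption: non-English domains carry the language code as the
    whole classification value, e.g. "lang_de".
    """
    prefix = "lang_"
    if not classification.startswith(prefix):
        return None
    code = classification[len(prefix):]
    # Codes missing from the table are returned as-is rather than guessed.
    return LANGUAGES.get(code, code)
```

Values such as "business" or "redirect" return None, so the same call can be used to route a record either to language handling or to the ordinary classification logic.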
domain_classification_score_out
domain_classification_score_out indicates the confidence of the domain_classification_out value, as reported by DomainClassifier. The score ranges from 0 to 1, with 1 being the most confident.
0 - 0.5 is Low Confidence
0.5 - 0.7 is Medium Confidence
0.7 - 0.9 is High Confidence
0.9 - 1 is Extremely High Confidence
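The bands above can be sketched as a small lookup. The ranges as written share their boundary values (0.5, 0.7, 0.9), so this sketch assumes a boundary score belongs to the higher band; that choice, and the function name confidence_band, are assumptions rather than documented behavior.

```python
def confidence_band(score):
    """Map a domain_classification_score_out value to its band label.

    Assumption: a score exactly on a boundary (0.5, 0.7, 0.9) falls
    into the higher band; the source text does not say which applies.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.5:
        return "Low Confidence"
    if score < 0.7:
        return "Medium Confidence"
    if score < 0.9:
        return "High Confidence"
    return "Extremely High Confidence"
```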
domain_grade
domain_grade is a letter grade (A through F) derived from Profound's domain_classification attribute, filtered by the domain_classification_score. It provides a simple way to select and filter domains by their high-level B2B business value. For most applications, Grade A and Grade B will be appropriate.
Grade A "business" with domain_classification_score > 0.6
Grade B "business" with domain_classification_score < 0.6
Grade C Includes the following domain_classification categories:
"nonbusiness_other", "nonbusiness_undercon", "nonbusiness_login", "nonbusiness_ads", "blank", "error", "no_homepage", "nonbusiness_error", "nonbusiness_xxx", "timeout"
Note: In some cases, a website may not have enough content to support a high-confidence "business" classification, in which case it may be graded "C". We maintain a list of exceptions. A notable example is google.com, which has thin content and is little more than a search field. If you observe domains that should be graded A or B, we will add them to the exceptions list after our review.
Grade D "redirect". The domain has no website content of its own but redirects to one that does, typically a domain with domain_grade A, B, or C.
Grade E "ns_only". Nameserver only. Typically no content whatsoever.
Grade F "unregistered". An unroutable domain. No nameserver is mapped to this domain, so it is "noise".
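The grade rules above can be sketched as a single function. This is an illustration under stated assumptions, not Profound's implementation: the name domain_grade_for is hypothetical, a "business" score of exactly 0.6 is treated as Grade B because Grade A requires a score strictly above 0.6, and any value the rules do not mention (for example "frameset") returns None. The exceptions list described in the note is not modeled.

```python
# Classification values that the text above assigns to Grade C.
GRADE_C = frozenset({
    "nonbusiness_other", "nonbusiness_undercon", "nonbusiness_login",
    "nonbusiness_ads", "blank", "error", "no_homepage",
    "nonbusiness_error", "nonbusiness_xxx", "timeout",
})

# Values that map directly to a grade regardless of score.
DIRECT_GRADES = {"redirect": "D", "ns_only": "E", "unregistered": "F"}


def domain_grade_for(classification, score=None):
    """Return the letter grade for a classification, or None if unmapped."""
    if classification == "business":
        if score is None:
            raise ValueError("a score is required to grade 'business'")
        # Assumption: exactly 0.6 is graded B; the text leaves it undefined.
        return "A" if score > 0.6 else "B"
    if classification in GRADE_C:
        return "C"
    # Values not covered by the rules fall through to None.
    return DIRECT_GRADES.get(classification)


# Usage: keep only Grade A and Grade B records from hypothetical sample rows.
rows = [("business", 0.9), ("business", 0.3), ("blank", 0.8), ("redirect", 0.5)]
selected = [r for r in rows if domain_grade_for(*r) in ("A", "B")]
```

Returning None for unmapped values keeps the sketch from inventing a grade for classifications the text does not place in any group.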
Note: Domain Grade C includes a very large set of functional and non-functional website domains. Profound intends to split this group into smaller segments. We have observed that some legitimate business domains can be graded C. When making decisions about Grade C domains, factor in other parameters such as Domain_Classification or DBI_Density to separate functional from non-functional domains.