Stanford Project CS229
URL text classification
-
URL features
1.1 meta tag ??
1.2 title
1.3 URL content: keywords or substrings of keywords -> convert to pinyin first? use initial letters (hard to do)?
1.4 keywords: how to obtain them? -> meta tag
1.5 page content
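
A minimal sketch of how the features in 1.1 to 1.4 might be pulled out of a URL and its HTML, using only the Python standard library. The names PageFeatureParser, url_tokens and extract_features are hypothetical, as is the choice to split meta keywords on commas and URLs on non-alphanumeric characters. The pinyin and initial-letter questions from 1.3 are left open here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageFeatureParser(HTMLParser):
    """Collects the <title> text and selected <meta name=... content=...> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            # Assumption: only these two meta names are of interest.
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def url_tokens(url):
    """Split host, path and query into lowercase alphanumeric tokens."""
    parts = urlsplit(url)
    text = " ".join([parts.netloc, parts.path, parts.query]).lower()
    return [t for t in re.split(r"[^a-z0-9]+", text) if t]


def extract_features(url, html):
    """Return a dict with URL tokens, title, and meta keywords/description."""
    parser = PageFeatureParser()
    parser.feed(html)
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return {
        "url_tokens": url_tokens(url),
        "title": parser.title.strip(),
        "meta_keywords": keywords,
        "meta_description": parser.meta.get("description", ""),
    }
```

Page content (1.5) is not handled in this sketch; it would need a decision on which tags to keep and how to tokenize the body text.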
Adult vocabulary
2.1 Adult.txt and gray.txt: how to handle synonyms, and is that needed?
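
One possible way to turn the two word lists into features is sketched below. It assumes Adult.txt and gray.txt hold one term per line in UTF-8, which the notes do not state, and it uses plain substring matching. The function names load_vocab and vocab_features are hypothetical. Synonyms are not handled: a term only counts if it appears literally in the text.

```python
def load_vocab(path):
    """Read one term per line (assumed file format), lowercased, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def vocab_features(text, adult_terms, gray_terms):
    """Count how many distinct terms from each list occur as substrings of text."""
    text = text.lower()
    return {
        "adult_hits": sum(1 for term in adult_terms if term in text),
        "gray_hits": sum(1 for term in gray_terms if term in text),
    }
```

Usage would look like vocab_features(url, load_vocab("Adult.txt"), load_vocab("gray.txt")). Substring matching catches keywords glued together inside a URL, at the cost of false positives on short terms.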
get original URL data