Identifying Suspicious URLs: An Application of Large-Scale Online Learning
Google Tech Talk, May 5, 2010

ABSTRACT

Presented by Justin Ma.

We explore online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly appropriate for online algorithms as the size of the training data is larger than can be efficiently processed in batch and because the distribution of features that typify malicious URLs is changing continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently-developed online algorithms can be as accurate as batch techniques, achieving daily classification accuracies up to 99% over a balanced data set.

Slides: cseweb.ucsd.edu

Justin Ma is a PhD candidate at UC San Diego advised by Stefan Savage, Geoff Voelker and Lawrence Saul. His research interests are in systems and networking with an emphasis on network security, and his current focus is the application of machine learning to problems in security. He will be joining UC Berkeley as a postdoc after graduation. [Home page: www.cs.ucsd.edu]
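To make the idea of online learning over lexical URL features concrete, here is a minimal sketch. It is an illustration only, not the system from the talk: it uses a plain perceptron (chosen for brevity), only bag-of-words tokens from the URL string (no host-based features), and a tiny hand-made stream of hypothetical URLs and labels. The names `lexical_features` and `OnlinePerceptron` are made up for this example.

```python
import re


def lexical_features(url):
    """Split a URL into lowercase alphanumeric tokens (a bag of words)."""
    return {tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok}


class OnlinePerceptron:
    """Mistake-driven linear classifier over sparse binary features."""

    def __init__(self):
        self.weights = {}  # token -> weight; absent tokens count as 0

    def predict(self, feats):
        score = sum(self.weights.get(f, 0.0) for f in feats)
        return 1 if score > 0 else -1

    def update(self, feats, label):
        """Predict first, then adjust weights only if the guess was wrong."""
        pred = self.predict(feats)
        if pred != label:
            for f in feats:
                self.weights[f] = self.weights.get(f, 0.0) + label
        return pred


# Hypothetical labeled stream: +1 = malicious, -1 = benign.
stream = [
    ("http://paypal.com.secure-login.example/verify", 1),
    ("http://example.org/docs/index.html", -1),
    ("http://bank-update.example/login/verify", 1),
    ("http://example.org/blog/post", -1),
]

model = OnlinePerceptron()
mistakes = 0
for url, label in stream:
    # Each URL is seen once, in arrival order; no batch retraining.
    if model.update(lexical_features(url), label) != label:
        mistakes += 1
```

Because the model is updated one example at a time, it never needs the whole training set in memory, and its weights can drift as the tokens that typify malicious URLs change, which are the two properties the abstract cites as reasons to prefer online algorithms.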

May 15th, 2010 at 7:40 am
I find his usage of the word “feature” confusing.
May 15th, 2010 at 11:49 am
8:18 Is this the top of some girl’s head?
May 16th, 2010 at 6:43 am
@arex1338 It is machine-learning jargon. So, it was used appropriately for the audience.
May 16th, 2010 at 6:45 am
This is a great video. Also, very nice refresher on ML algorithms. I’ve bookmarked it as a reference for some of those ML formulas.
May 16th, 2010 at 9:58 pm
gah, what’s that high pitched hiss when he talks
May 17th, 2010 at 1:32 pm
They can also hide their domain completely using feedproxy.google, thank you very much for that spam domain anon service, Google
May 17th, 2010 at 3:42 pm
Justin, a few less “Ummmm…” would be nice.
May 18th, 2010 at 8:21 pm
many really good algorithms mentioned in this video. Great work anyway
May 28th, 2010 at 7:50 pm
very interesting research, congratulations!
June 16th, 2010 at 4:05 pm
Google, one of the world’s largest companies, unable to produce decent audio!?