Free books available online
- Applied Data Science, Columbia University, Ian Langmore and Daniel Krasner
- Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David
- A Course in Machine Learning, Hal Daumé III (draft)
- Deep Learning, Yoshua Bengio (draft), 2016
- Neural Networks and Deep Learning, Michael Nielsen, 2017
- Intermediate Python, Muhammad Yasoob Ullah Khalid, 2015
- Think Bayes, Allen B. Downey, Green Tea Press, 2012
- The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer, 2011
- An Introduction to Statistical Learning, James, Witten, Hastie, Tibshirani, Springer, 2013
- Bayesian Reasoning and Machine Learning, David Barber, Cambridge University Press, 2012
- Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press, 2008
- Mining of Massive Datasets, Rajaraman and Ullman, Cambridge University Press, 2011
- Information Theory, Inference and Learning Algorithms, David MacKay, Cambridge University Press, 2007
- Introduction to Machine Learning, Alex Smola (full draft, very good)
- Text Processing in Python, David Mertz, Addison Wesley, 2003
- A whole collection of nice open-access AI books from Intech
- Interactive Data Visualization for the Web (O'Reilly), a good book on the D3 JavaScript library, based on the following tutorials
Datasets
- Amazon Customer Reviews Dataset
- The Billion Prices project
- Public Datasets on AWS
- UCI Machine Learning Repository
- Berkeley Earth
- Awesome public datasets - a compilation by Xiaming Chen
- ICEWS - Integrated Crisis Early Warning System
- Phoenix Dataset - a near real-time event dataset
- NY City Motor Vehicle Collisions data
- Big Data: 35 Brilliant and Free Data Sources for 2016
- Stanford Large Network Dataset Collection
- Smart* Data Set for Sustainability
- NREL Wind Data
- GroupLens Research Data Sets (recommendation systems, etc.)
- Datasets for network analysis
- Global Database of Events, Language and Tone (Featured in SBP'14)
- NOAA/NGDC - Earth Observation Group - Defense Meteorological Satellite Program, Boulder
- The Corpus of Historical American English (COHA)
- Project Tycho - Public Health Data for Science and Policy Making
- AMiner Citation Network Dataset
- KEEL-dataset repository
- Machine Learning data set repository
- City-sized portions of OpenStreetMap, served weekly ;-) (Note to self: for Basemap, you want the "IMPOSM SHP" format.)
Links
- Tutorial for setting up Hadoop 2.x (most existing tutorials seem to target version 1.x or earlier!)
- Teaching materials for machine learning: notes, slides, homework material
- GloVe: Global Vectors for Word Representation