Data Science.pdf


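Book Description

Summary
Data Science (Reprint Edition, in English) will tell you everything you need to know. Full of insight, it is compiled from the lecture notes of Columbia University's data science course. Now that people have recognized that data can make the difference in elections and business models, data science as a profession keeps growing. But how should you go about working in such a broad and intricate interdisciplinary field?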

Editor's Recommendation
Data Science (Reprint Edition, in English) is published by Southeast University Press.

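About the Authors
Authors: Rachel Schutt (USA), Cathy O'Neil (USA)

Rachel Schutt, Senior Vice President of Data Science at News Corp, is an adjunct professor of statistics at Columbia University and a founding member of the education committee of Columbia's Institute for Data Sciences and Engineering.
Cathy O'Neil, a senior data scientist at Johnson Research Labs, holds a PhD in mathematics from Harvard University, was a postdoc in the MIT mathematics department, and was formerly a professor at Barnard College.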

Table of Contents
Preface
1.Introduction: What Is Data Science?
Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape (with a Little History)
Data Science Jobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
2.Statistical Inference, Exploratory Data Analysis, and the Data Science Process
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
3.Algorithms
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
4.Spam Filters, Naive Bayes, and Wrangling
Thought Experiment: Learning by Example
Why Won't Linear Regression Work for Filtering Spam?
How About k-Nearest Neighbors?
Naive Bayes
Bayes Law
A Spam Filter for Individual Words
A Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake's Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
5.Logistic Regression
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Click Models
The Underlying Math
Estimating α and β
Newton's Method
Stochastic Gradient Descent
Implementation
Evaluation
Media 6 Degrees Exercise
Sample R Code
6.Time Stamps and Financial Modeling
Kyle Teague and GetGlue
Timestamps
Exploratory Data Analysis (EDA)
Metrics and New Variables or Features
What's Next?
Cathy O'Neil
Thought Experiment
Financial Modeling
In-Sample, Out-of-Sample, and Causality
Preparing Financial Data
Log Returns
Example: The S&P Index
Working out a Volatility Measurement
Exponential Downweighting
The Financial Modeling Feedback Loop
Why Regression?
Adding Priors
A Baby Model
Exercise: GetGlue and Timestamped Event Data
Exercise: Financial Data
7.Extracting Meaning from Data
William Cukierski
Background: Data Science Competitions
Background: Crowdsourcing
The Kaggle Model
A Single Contestant
Their Customers
Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
Feature Selection
Example: User Retention
Filters
Wrappers
Embedded Methods: Decision Trees
Entropy
The Decision Tree Algorithm
Handling Continuous Variables in Decision Trees
Random Forests
User Retention: Interpretability Versus Predictive Power
David Huffaker: Google's Hybrid Approach to Social Research
Moving from Descriptive to Predictive
Social at Google
Privacy
Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
8.Recommendation Engines: Building a User-Facing Data Product at Scale
A Real-World Recommendation Engine
Nearest Neighbor Algorithm Review
Some Problems with Nearest Neighbors
Beyond Nearest Neighbor: Machine Learning Classification
The Dimensionality Problem
Singular Value Decomposition (SVD)
Important Properties of SVD
Principal Component Analysis (PCA)
Alternating Least Squares
Fix V and Update U
Last Thoughts on These Algorithms
Thought Experiment: Filter Bubbles
Exercise: Build Your Own Recommendation System
Sample Code in Python
9.Data Visualization and Fraud Detection
Data Visualization History
Gabriel Tarde
Mark's Thought Experiment
What Is Data Science, Redux?
Processing
Franco Moretti
A Sample of Data Visualization Projects
Mark's Data Visualization Projects
New York Times Lobby: Moveable Type
Project Cascade: Lives on a Screen
Cronkite Plaza
eBay Transactions and Books
Public Theater Shakespeare Machine
Goals of These Exhibits
Data Science and Risk
About Square
The Risk Challenge
The Trouble with Performance Estimation
Model Building Tips
Data Visualization at Square
Ian's Thought Experiment
Data Visualization for the Rest of Us
Data Visualization Exercise
……
10.Social Networks and Data Journalism
11.Causality
12.Epidemiology
13.Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
14.Data Engineering: MapReduce, Pregel, and Hadoop
15.The Students Speak
16.Next-Generation Data Scientists, Hubris, and Ethics
Index

Excerpt

Social network analysis was germinated by Harrison White, professor emeritus at Columbia, contemporaneously with Columbia sociologist Robert Merton. Their idea was that people's actions have to be related to their attributes, but to really understand them you also need to look at the networks (aka systems) that enable them to do something. How do we bring that idea to our models? Kelly wants us to consider what he calls the micro versus macro, or individual versus systemic divide: how do we bridge this divide? Or rather, how does this divide get bridged in various contexts?
In the US, for example, we have formal mechanisms for bridging those micro/macro divides, namely markets in the case of the "buying stuff" divide, and elections in the case of political divides. But much of the world doesn't have those formal mechanisms, although they often have a fictive shadow of those things. For the most part, we need to know enough about the actual social network to know who has the power and influence to bring about change.
Terminology from Social Networks
The basic units of a network are called actors or nodes. They can be people, or websites, or whatever "things" you are considering, and are often indicated as a single dot in a visualization. The relationships between the actors are referred to as relational ties or edges. For example, an instance of liking someone or being friends would be indicated by an edge. We refer to pairs of actors as dyads, and triplets of actors as triads. For example, if we have an edge between node A and node B, and an edge between node B and node C, then triadic closure would be the existence of an edge between node A and node C.
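To make these terms concrete, here is a minimal sketch that stores a network as an adjacency-set dictionary. It lists dyads and triads and checks for triadic closure. The sketch rests on several assumptions, none of which come from the book:

* The edge list and the node names A through D are a hypothetical toy network.
* Ties are treated as undirected, although a tie such as "liking someone" could reasonably be directed.
* The helper names `build_adjacency`, `dyads`, `triads`, `is_closed`, and `open_triads` are illustrative only.

```python
from itertools import combinations

# Hypothetical toy network (not from the book): each tuple is an undirected
# tie (edge) between two actors (nodes).
EDGES = [("A", "B"), ("B", "C"), ("C", "D")]


def build_adjacency(edges):
    """Map each actor to the set of actors it shares a tie with."""
    adj = {}
    for u, v in edges:
        # Undirected: record the tie in both directions.
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def dyads(adj):
    """All pairs of actors, whether or not they are tied."""
    return list(combinations(sorted(adj), 2))


def triads(adj):
    """All triplets of actors."""
    return list(combinations(sorted(adj), 3))


def is_closed(adj, a, b, c):
    """True if a-b and b-c are ties and the closing tie a-c also exists."""
    return b in adj[a] and c in adj[b] and c in adj[a]


def open_triads(adj):
    """Triads (a, b, c) where b is tied to a and c, but a and c are not tied."""
    result = []
    for b in sorted(adj):
        # Every pair of b's neighbors is a candidate for closure.
        for a, c in combinations(sorted(adj[b]), 2):
            if c not in adj[a]:
                result.append((a, b, c))
    return result


adj = build_adjacency(EDGES)
print(len(dyads(adj)), len(triads(adj)))  # 6 4
print(is_closed(adj, "A", "B", "C"))      # False: no A-C tie yet
print(open_triads(adj))                   # [('A', 'B', 'C'), ('B', 'C', 'D')]

# Adding the A-C tie closes the A-B-C triad described in the text.
closed = build_adjacency(EDGES + [("A", "C")])
print(is_closed(closed, "A", "B", "C"))   # True
```

An adjacency set makes the closure check a constant-time membership test. Note that closing one open triad can create new ones, as happens with A-C-D here. Any real analysis of closure therefore looks at the whole network rather than a single triplet.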
