Publications by profiles


Data Science Profile

From Zero to Research Scientist full resources guide

This guide is designed for anyone with basic programming knowledge or a computer science background who is interested in becoming a Research Scientist with a focus on Deep Learning and NLP.

Classification based on Topological Data Analysis

Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, even on imbalanced datasets, without any further ML stage.

Hugging Face datasets

One-line dataloaders for many public datasets & Efficient data pre-processing

Data Science: A First Introduction

The book is structured so that learners spend the first four chapters learning how to use the R programming language and Jupyter notebooks to load, wrangle/clean, and visualize data, while answering descriptive and exploratory data analysis questions. The remaining chapters illustrate how to solve four common problems in data science, which are useful for answering predictive and inferential data analysis questions[…]


Bayesian Data Analysis: book & course

This book is intended to have three roles and to serve three associated audiences: an introductory text on Bayesian inference starting from first principles, a graduate text on effective current approaches to Bayesian modeling and computation in statistics and related fields, and a handbook of Bayesian methods in applied statistics for general users of and researchers in applied statistics. Although introductory in its early sections, the book is definitely not elementary in the sense of a first text in statistics.

Tidy Modeling with R

This book provides an introduction to how to use our software to create models. We focus on a dialect of R called the tidyverse that is designed to be a better interface for common tasks using R. If you’ve never heard of or used the tidyverse, Chapter 2 provides an introduction. In this book, we demonstrate how the tidyverse can be used to produce high-quality models. The tools used to do this are referred to as the tidymodels packages.

Machine Learning from scratch (by Danny Friedman)

This book covers the building blocks of the most common methods in machine learning. This set of methods is like a toolbox for machine learning engineers. Those entering the field of machine learning should feel comfortable with this toolbox so they have the right tool for a variety of tasks.

IEEE Use Case–Criteria for Addressing Ethical Challenges in Transparency, Accountability, and Privacy of CTA/CTT

There are substantial public health benefits gained through successfully alerting individuals and relevant public health institutions of a person’s exposure to a communicable disease. Contact tracing techniques have been applied to epidemiology for centuries, traditionally involving a manual process of interview and follow-up. This is time-consuming, difficult, and dangerous work. Manual processes are also open to incomplete information because they rely on individuals being willing and able to remember and report all contact possibilities.

Mastering Shiny

This book complements Shiny’s online documentation and is intended to help app authors develop a deeper understanding of Shiny. After reading this book, you’ll be able to write apps that have more customized UI, more maintainable code, and better performance and scalability.

The Art of Machine Learning (Algorithms + Data + R)

I wrote this book because: • ML is not a recipe. It is not a matter of knowing the syntax and mechanics of various software packages. • ML is an art, not a science. (Hence the title of this book.) • One does not have to be a math whiz or know advanced math in order to use ML effectively, but one does need to understand the concepts well: the Why? and How? of ML methods.

Best Practices in Dataviz: An R Perspective

By the end of this you will have had a whirlwind tour of the very tip of the data visualization best-practices iceberg. We will go over a broad range of topics generally applicable to data science use cases but not dive too deep into any single one. One thing to keep in mind throughout is that none of this is set in stone; in the real world you often have to bend or break some of these rules to do what you want.

Data Structures Succinctly Part 2

Data Structures Succinctly Part 2 is your concise guide to skip lists, hash tables, heaps, priority queues, AVL trees, and B-trees. As with the first book, you’ll learn how the structures behave, how to interact with them, and their performance limitations. Starting with skip lists and hash tables, and then moving to complex AVL trees and B-trees, author Robert Horvick explains what each structure’s methods and classes are, the algorithms behind them, and what is necessary to keep them valid.

Data Structures Succinctly Part 1

Data Structures Succinctly Part 1 is your first step to a better understanding of the different types of data structures, how they behave, and how to interact with them. Starting with simple linked lists and arrays, and then moving to more complex structures like binary search trees and sets, author Robert Horvick explains what each structure’s methods and classes are and the algorithms behind them. Horvick goes a step further to detail their operational and resource complexity, ensuring that you have a clear understanding of what using a specific data structure entails.

MySQL® Notes for Professionals

The MySQL® Notes for Professionals book is compiled from Stack Overflow Documentation (187 pages, published in May 2018).

Fundamentals of Data Visualization

A guide to making visualizations that accurately reflect the data, tell a story, and look professional. It has grown out of my experience of working with students and postdocs in my laboratory on thousands of data visualizations.

Advanced R

Advanced R helps you understand how R works at a fundamental level. It is designed for R programmers who want to deepen their understanding of the language, and programmers experienced in other languages who want to understand what makes R different and special. This book will teach you the foundations of R; three fundamental programming paradigms (functional, object-oriented, and metaprogramming); and powerful techniques for debugging and optimising your code.

IPython Interactive Computing and Visualization Cookbook

IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code.

Text Mining with R (A Tidy Approach)

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and text-heavy. Many of us who work in analytical fields are not trained in even simple interpretation of natural language.

We developed the tidytext (Silge and Robinson 2016) R package because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text.

Data Science at the Command Line

Today, data scientists can choose from an overwhelming collection of exciting technologies and programming languages. Python, R, Hadoop, Julia, Pig, Hive, and Spark are but a few examples. You may already have experience in one or more of these. If so, then why should you still care about the command line for doing data science? What does the command line have to offer that these other technologies and programming languages do not?

R Packages

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this book you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first. So start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better. This is where we are developing the 2nd edition of this book.

R Programming Succinctly

The R programming language on its own is a powerful tool that can perform thousands of statistical tasks, but by writing programs in R, you gain tremendous power and flexibility to extend its base functionality. Senior Succinctly series author and editor James McCaffrey shows you how in R Programming Succinctly.

