So, You Want to Climb the Mountain
Looks hot... But what's several thousand degrees among friends?
Welcome to the inaugural post for An Ascent Of Analytics. Our journey awaits!
Not all pathways up the mountain of Data Science are as fiery as in Pierre-Jacques Volaire’s The Eruption of Mount Vesuvius (1777), but we’ll see what we can do to keep the sunscreen to a minimum. Maybe some pictures of snow later on will help…
This blog, once it gets off the ground, is aimed primarily at anyone who feels that acquiring the skills to practice Data Science is comparable to summiting Mount Everest. I’d like to be a friendly guide, for as far as I can take you, walking together at many points in our journey. At the same time, there are many, many expeditions hiking even higher up the mountain, and I’d value their opinions and counsel.
It may take me a while to complete these first projected articles. They’re not necessarily going to be done in order, but I think all these topics are important. They’ll be accessible in the ‘Categories’ section of the main page. I’ll try to start off with one in each category, however.
R Essentials
- An Introduction to Vectors
- The Art of Commenting Your Code
- read.csv and read.table utilities
Python Essentials
- Lists and Dictionaries
- The Elegance of List Comprehensions
The ongoing debate between R and Python as the primary DS language shows no sign of letting up. While Python is edging ahead, I’d caution everyone not to put all their eggs in one basket. In my opinion, being comfortable working with both makes you a more well-rounded developer and problem-solver. But we’ll revisit that at a later date.
SQL Essentials
- Table Joins for the Common Man (and Woman)
SQL is the Swiss Army knife of modern IT, without a doubt. It deserves its own discussion thread alongside other database approaches (NoSQL stores such as MongoDB, etc.).
Package Coolness
- igraph
- ERGM
- shiny
- SymPy
- caret
And what’s R or Python without a take on the latest or coolest package of the day? We’ll carve out time for that too.
Classic Stats
- To Be, Or Not To Be (Normally Distributed)
For those who are extremely early on in their journey, we’ll cover some classic statistical concepts and their extension into the Big Data methodologies of today.
Data Cleaning
- Missing Data, Demystified
- Amelia: An R Package for Multivariate Imputation
Data Cleaning (also called Data Munging) is one of the most vital and underappreciated parts of Data Science. It’s also one of the most time-consuming, both in the act of cleaning and in the constant skepticism that should be leveled at the source(s) of the data. Data without checks and balances on its veracity is data likely to spawn false recommendations.
Data Mining
- Won’t You Be My (k-nearest) Neighbor?
- Probabilistic Topic Modeling and the LDAvis R Package
What’s a blog on Data Science without some commentary on Data Mining? Supervised methods, unsupervised methods, and everything in between.
Optimization
- A Primer on Linear and Constraint Programming
We’d also be remiss if we left Optimization out of our toolbox. Linear and Constraint Programming, simulation, decision analysis, and shortest-path problems all turn up in practice from time to time.
Visualization
- polygon() Essentials for Region filling
- Tufte’s Data-Ink Ratio, Revisited
- The Grammar of Graphics
- ggmap() and Leaflet for Geographic Data Overlay
Data Visualization could easily be a blog of its own. There are right ways and wrong ways to present data, and we’ll cover milestones in the field including the work of Edward Tufte.
Ecosystem
- RStudio
- Anaconda
- Spyder
- Excel Solver
- PyCharm
- CPLEX
- Tableau
Data Science isn’t all coding and languages. It also involves the use of tools to augment stages of the problem-solving lifecycle. There’s a whole ecosystem of tools just waiting to be discussed.
Communication
- The Power of Plain Writing
- Storytelling as a Data Science Mechanism
Employing good communication skills can mean the difference between hard work embraced and hard work ignored by business units. Clear writing and speaking are vital tools in the Data Scientist’s toolbox.
Job Hunting
- The HR Screening, a.k.a. ‘Leaving the Acronyms at Home’
- Inside the Mind of a Hiring Manager
The hunt for your first job (or your next job) is a step at which many feel like a fish out of water. What is the HR screener looking to hear? What is the hiring manager looking to hear? I’ve been on both sides of the interview and may have some insights that can help.
Ethics Landscape
- Transparency in Action and Intent
- Reproducibility as the Gateway to Trust
- Impostor Syndrome: It’s OK to Talk About It
Lastly, what’s a Data Scientist without an effective compass when it comes to the ethics of data manipulation and retention? And the dueling needs of security versus transparency? I don’t have all the answers, by any stretch of the imagination. On this topic, my goal is to stimulate some (OK, a lot of) conversation. We’re all better off when we’re able to respectfully cross-pollinate each other’s opinions.
Enjoy the journey!