# cracking-the-data-science-interview

**Repository Path**: davidgao7/cracking-the-data-science-interview

## Basic Information

- **Project Name**: cracking-the-data-science-interview
- **Description**: A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2022-02-11
- **Last Updated**: 2022-03-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Here are the sections:

* [Data Science Cheatsheets](#data-science-cheatsheets)
* [Data Science EBooks](#data-science-ebooks)
* [Data Science Question Bank](#data-science-question-bank)
* [Data Science Case Studies](#data-science-case-studies)
* [Data Science Portfolio](#data-science-portfolio)
* [Data Journalism Portfolio](#data-journalism-portfolio)
* [Downloadable Cheatsheets](#downloadable-cheatsheets)

## Data Science Cheatsheets

[This section](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets) contains cheatsheets on basic data science concepts that commonly come up in interviews:

* [SQL](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#sql)
* [Statistics and Probability](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#statistics-and-probability)
* [Mathematics](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#mathematics)
* [Machine Learning Concepts](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#machine-learning-concepts)
* [Deep Learning Concepts](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#deep-learning-concepts)
* [Supervised Learning](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#supervised-learning)
* [Unsupervised Learning](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#unsupervised-learning)
* [Computer Vision](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#computer-vision)
* [Natural Language Processing](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#natural-language-processing)
* [Stanford Materials](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Cheatsheets#stanford-materials)

## Data Science EBooks

[This section](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks) contains books that I have read about data science and machine learning:

* [Intro To Machine Learning with Python](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Intro-To-ML-with-Python)
* [Machine Learning In Action](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Machine-Learning-In-Action)
* [Python Data Science Handbook](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Python-DataScience-Handbook)
* [Doing Data Science - Straight Talk From The Front Line](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Doing-Data-Science-Straight-Talk-From-The-Front-Line)
* [Machine Learning For Finance](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Machine-Learning-For-Finance)
* [Practical Statistics for Data Science](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/Practical-Statistics-For-Data-Science)
* [A/B Testing](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/EBooks/AB-Testing)

## Data Science Question Bank

[This section](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank) contains sample questions that were asked in actual data science interviews:

* [Data Interview Qs](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Data-Interview-Qs)
* [Data Science Prep](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Data-Science-Prep)
* [Interview Query](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Interview-Query)
* [Analytics Vidhya](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Analytics-Vidhya.md)
* [Springboard](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Springboard.md)
* [Elite Data Science](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Elite-Data-Science.md)
* [Workera](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/Workera)
* [150 Essential Data Science Questions and Answers](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Question-Bank/150-Essential-Data-Science-Questions-and-Answers.pdf)

## Data Science Case Studies

[This section](https://github.com/khanhnamle1994/cracking-the-data-science-interview/tree/master/Case-Studies) contains case study questions about designing machine learning systems to solve practical problems.

## Data Science Portfolio

This section contains a portfolio of data science projects completed by me for academic, self-learning, and hobby purposes.
For a more visually pleasant browsing experience, check out [jameskle.com/data-portfolio](https://jameskle.com/data-portfolio).

- ### Recommendation Systems

  - [Transfer Rec](https://github.com/khanhnamle1994/transfer-rec): My ongoing research at the intersection of deep learning and recommendation systems.
  - [Movie Recommendation](https://github.com/khanhnamle1994/movielens): Designed 4 different models that recommend items on the MovieLens dataset.

  _Tools: PyTorch, TensorBoard, Keras, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-Learn, Surprise, Wordcloud_

- ### Machine Learning

  - [Trip Optimizer](https://github.com/khanhnamle1994/trip-optimizer): Used XGBoost and evolutionary algorithms to optimize the travel time for taxi vehicles in New York City.
  - [Instacart Market Basket Analysis](https://github.com/khanhnamle1994/instacart-orders): Tackled the Instacart Market Basket Analysis challenge to predict which products will be in a user's next order.

  _Tools: Pandas, NumPy, Matplotlib, XGBoost, Geopy, Scikit-Learn_

- ### Computer Vision

  - [Fashion Recommendation](https://github.com/khanhnamle1994/fashion-recommendation): Built a ResNet-based model that classifies and recommends fashion images in the DeepFashion database based on semantic similarity.
  - [Fashion Classification](https://github.com/khanhnamle1994/fashion-mnist): Developed 4 different Convolutional Neural Networks that classify images in the Fashion MNIST dataset.
  - [Dog Breed Classification](https://medium.com/nanonets/how-to-easily-build-a-dog-breed-image-classification-model-2fd214419cde): Designed a Convolutional Neural Network that identifies dog breeds.
  - [Road Segmentation](https://medium.com/nanonets/how-to-do-image-segmentation-using-deep-learning-c673cc5862ef): Implemented a Fully-Convolutional Network for semantic segmentation on the KITTI Road dataset.

  _Tools: TensorFlow, Keras, Pandas, NumPy, Matplotlib, Scikit-Learn, TensorBoard_

- ### Natural Language Processing

  - [Classifying Tweets with Weights & Biases](https://www.wandb.com/articles/classifying-tweets-with-wandb): Developed 3 different neural network models that classify tweets on a crowdsourced dataset from Figure Eight.

- ### Data Analysis and Visualization

  - [World Cup 2018 Team Analysis](https://github.com/khanhnamle1994/world-cup-2018): Analysis and visualization of the FIFA 18 dataset to predict the best possible international squad lineups for 10 teams at the 2018 World Cup in Russia.
  - [Spotify Artists Analysis](https://github.com/khanhnamle1994/spotify-artists-analysis): Analysis and visualization of musical styles from 50 different artists with a wide range of genres on Spotify.

  _Tools: Pandas, NumPy, Matplotlib, Rspotify, httr, dplyr, tidyr, radarchart, ggplot2_

## Data Journalism Portfolio

This section contains a portfolio of data journalism articles completed by me for freelance clients and self-learning purposes.
For a more visually pleasant browsing experience, check out [jameskle.com/data-journalism](https://jameskle.com/data-journalism).

- ### Statistics

  - [The 10 Statistical Techniques Data Scientists Need to Master](https://www.kdnuggets.com/2017/11/10-statistical-techniques-data-scientists-need-master.html)
  - [Logistic Regression Tutorial](https://www.datacamp.com/community/tutorials/logistic-regression-R)
  - [Decision Trees Tutorial](https://www.datacamp.com/community/tutorials/decision-trees-R)
  - [Support Vector Machines Tutorial](https://www.datacamp.com/community/tutorials/support-vector-machines-r)
  - [A Friendly Introduction to Data-Driven Marketing for Business Leaders](https://www.topbots.com/data-driven-marketing-for-business-leaders/)

- ### Machine Learning

  - [The 10 Algorithms Machine Learning Engineers Need to Know](https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html)
  - [12 Useful Things to Know About Machine Learning](https://www.kdnuggets.com/2018/04/12-useful-things-know-about-machine-learning.html)
  - [A Tour of The Top 10 Algorithms for Machine Learning Newbies](https://builtin.com/data-science/tour-top-10-algorithms-machine-learning-newbies)
  - [The 10 Data Mining Techniques Data Scientists Need For Their Toolbox](https://builtin.com/data-science/10-data-mining-techniques-data-scientists-need-their-toolbox)
  - [Clustering and Classification in E-Commerce](https://lucidworks.com/2019/01/24/clustering-classification-supervised-unsupervised-learning-ecommerce/)
  - [The ABCs of Learning to Rank](https://lucidworks.com/post/abcs-learning-to-rank/)
  - [6 Ways to Debug a Machine Learning Model](https://www.wandb.com/articles/debug-ml-model)

- ### Deep Learning

  - [The 10 Deep Learning Methods AI Practitioners Need to Apply](https://www.kdnuggets.com/2017/12/10-deep-learning-methods-ai-practitioners-need-apply.html)
  - [The 8 Neural Network Architectures ML Researchers Need to Learn](https://www.kdnuggets.com/2018/02/8-neural-network-architectures-machine-learning-researchers-need-learn.html)
  - [The 5 Deep Learning Frameworks Every Serious Machine Learner Should Be Familiar With](https://heartbeat.fritz.ai/the-5-deep-learning-frameworks-every-serious-machine-learner-should-be-familiar-with-93f4d469d24c)
  - [The 5 Computer Vision Techniques That Will Change How You See The World](https://heartbeat.fritz.ai/the-5-computer-vision-techniques-that-will-change-how-you-see-the-world-1ee19334354b)
  - [Convolutional Neural Networks: The Biologically-Inspired Model](https://www.codementor.io/@james_aka_yale/convolutional-neural-networks-the-biologically-inspired-model-iq6s48zms)
  - [Recurrent Neural Networks: The Powerhouse of Language Modeling](https://builtin.com/data-science/recurrent-neural-networks-powerhouse-language-modeling)
  - [The 7 NLP Techniques That Will Change How You Communicate in the Future](https://heartbeat.fritz.ai/the-7-nlp-techniques-that-will-change-how-you-communicate-in-the-future-part-i-f0114b2f0497)
  - [The 5 Trends Dominating Computer Vision in 2018](https://heartbeat.fritz.ai/the-5-trends-that-dominated-computer-vision-in-2018-de38fbb9bd86)
  - [The 3 Deep Learning Frameworks For End-to-End Speech Recognition That Power Your Devices](https://heartbeat.fritz.ai/the-3-deep-learning-frameworks-for-end-to-end-speech-recognition-that-power-your-devices-37b891ddc380)
  - [The 5 Algorithms for Efficient Deep Learning Inference on Small Devices](https://heartbeat.fritz.ai/the-5-algorithms-for-efficient-deep-learning-inference-on-small-devices-bcc2d18aa806)
  - [The 4 Research Techniques to Train Deep Neural Network Models More Efficiently](https://heartbeat.fritz.ai/the-4-research-techniques-to-train-deep-neural-network-models-more-efficiently-810ea2886205)
  - [The 2 Hardware Architectures for Efficient Training and Inference of Deep Nets](https://heartbeat.fritz.ai/the-2-types-of-hardware-architectures-for-efficient-training-and-inference-of-deep-neural-networks-a034850e26dd)
  - [10 Deep Learning Best Practices to Keep in Mind in 2020](https://nanonets.com/blog/10-best-practices-deep-learning/)

## Downloadable Cheatsheets

These PDF cheatsheets come from [BecomingHuman.AI](https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-science-pdf-f22dc900d2d7).

### 1 - Neural Network Basics

![Neural Network Basics](Neural_Nets_Basics.png)

### 2 - Neural Network Graphs

![Neural Network Graphs](Neural_Nets_Graphs.png)

### 3 - Machine Learning with Emojis

![Machine Learning with Emojis](ML_In_Emoji.png)

### 4 - Scikit-Learn With Python

![Scikit-Learn With Python](Scikit_Learn_With_Python.png)

### 5 - Python Basics

![Python Basics](Python_Basics.png)

### 6 - NumPy Basics

![NumPy Basics](NumPy_Basics.png)

### 7 - Pandas Basics

![Pandas Basics](Pandas_Basics.png)

### 8 - Data Wrangling With Pandas

![Data Wrangling With Pandas Part 1](Data_Wrangling_With_Pandas_Part1.png)

![Data Wrangling With Pandas Part 2](Data_Wrangling_With_Pandas_Part2.png)

### 9 - SciPy Linear Algebra

![SciPy Linear Algebra](SciPy_Linear_Algebra.png)

### 10 - Matplotlib Basics

![Matplotlib Basics](Matplotlib_Basics.png)

### 11 - Keras

![Keras](Keras.png)

### 12 - Big-O

![Big-O](Big-O.png)