HelioCampus’ own Director of Data Science, Renee Teate, recently wrote a book, SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis. We sat down with her to learn more about why she wanted to write a book, what readers can expect to learn, how it applies to her work at HelioCampus, and more. Check out our conversation with her below:
Q: What made you want to write a book?
A: It's exciting to be able to share some of the knowledge I've gained over my career with data science learners, and I was happy that Wiley approached me to propose a topic for a book. I proposed SQL for Data Scientists, because SQL is the one skill that I've used throughout my entire career - from database design and development to data analysis to data science - and there's no sign that it's going away anytime soon. In fact, SQL is one of the most frequently mentioned requirements on current data science job postings. But many older SQL books are written for people in other data roles, like Database Administrators, so I chose to highlight a subset of SQL skills that I use most often as a Data Scientist.
I have also heard from graduates of new data science and analytics degree programs that their coursework focuses on predictive modeling, but their assignments in school start with a pre-made dataset, while in the "real world," data scientists often have to develop their own datasets from large data sources. So I wanted to provide a resource that helps fill that gap, covering the skills that many apparently end up trying to teach themselves in a hurry on the job.
Q: Did you learn anything new or interesting while writing it?
A: In the process of developing the proposal, I learned just how in-demand SQL still is as a tech skill and, by polling my Twitter followers, how many people who are already working as Data Scientists want to learn SQL! Working on the book also reinforced how many different ways there are to approach every problem in SQL, which means there is a lot of creativity involved in developing datasets. When writing the book, I had to pick which approaches to recommend, which made me think a lot harder about why I use certain techniques.
Q: How does the topic apply to your job at HelioCampus?
A: At HelioCampus, our Data Engineers use SQL to extract and consolidate a lot of the data from the university databases, so we don't have to learn every underlying data system. This removes a lot of the work that might otherwise fall on a Data Scientist who didn't have an engineering team to lean on. However, it really helps to be able to read that SQL and understand how the data was transformed from the source. We also often need to modify the granularity or summarization level of the data for our predictive models, so I use SQL regularly to get a dataset into its final form before importing it into our predictive modeling process.
To get more technical, I query our existing data warehouse tables with SQL to get the data into the shape I need, then import it into a pandas DataFrame using Python in a Jupyter notebook. Then, after running our predictive models, I use SQL to write the results back to the data warehouse. I also use SQL when pulling results from the data warehouse into Tableau for display to the end users. So, there are at least three stages in our predictive modeling process that rely on SQL.
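For readers who want to picture those three stages, here is a minimal sketch. It is not Renee's or HelioCampus's actual code: the in-memory SQLite database stands in for the data warehouse, the table and column names (enrollments, predictions, risk_score) are hypothetical, and the "model" is a placeholder formula. Plain Python rows are used to keep the example dependency-free; in a notebook, the same query could be loaded into a DataFrame with pandas.read_sql_query(dataset_sql, conn).

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (
        student_id INTEGER, term TEXT, credits INTEGER, grade_points REAL
    );
    INSERT INTO enrollments VALUES
        (1, '2020FA', 3, 4.0),
        (1, '2020FA', 4, 3.0),
        (2, '2020FA', 3, 2.0),
        (2, '2021SP', 3, 3.0);
    CREATE TABLE predictions (student_id INTEGER PRIMARY KEY, risk_score REAL);
""")

# Stage 1: use SQL to change the granularity of the data,
# from one row per course enrollment to one row per student.
dataset_sql = """
    SELECT student_id,
           COUNT(DISTINCT term)                       AS terms_enrolled,
           SUM(credits)                               AS total_credits,
           SUM(credits * grade_points) / SUM(credits) AS gpa
    FROM enrollments
    GROUP BY student_id
    ORDER BY student_id
"""
rows = conn.execute(dataset_sql).fetchall()

# Placeholder for the predictive model (hypothetical formula, not a real model):
# a lower credit-weighted GPA yields a higher "risk" score.
scored = [(student_id, round(1 - gpa / 4.0, 3))
          for student_id, _terms, _credits, gpa in rows]

# Stage 2: use SQL to write the model results back to the warehouse.
conn.executemany(
    "INSERT INTO predictions (student_id, risk_score) VALUES (?, ?)", scored
)
conn.commit()

# Stage 3: use SQL to pull the results for display, as a reporting tool would.
report = conn.execute(
    "SELECT student_id, risk_score FROM predictions ORDER BY risk_score DESC"
).fetchall()
print(report)
```

The point of the sketch is the division of labor: SQL does the reshaping and the reading and writing, while Python only handles the modeling step in between.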
Q: What was the hardest part of writing it?
A: The schedule. Writing a book while working full-time, and also during a pandemic when the world is changing around us, was very challenging for me. I went way beyond the original estimates for delivering content, and my editors helped get me to the finish line. I've gained a lot of respect for people who produce books regularly, because it truly is a lot of work. I don't plan on writing another book anytime soon!
Q: If you could do it all over again, what advice would you give yourself before getting started?
A: Ask other authors about the process to better understand what you're getting into, and whether what you're experiencing is the norm. Expect it to take up a lot more of your life than you initially plan for, because when you're not working on the book, you'll likely be thinking about the content, or telling yourself you should be working on it.
I wish I had written an entire first draft before signing the contract, so I could spend the scheduled time improving it instead of creating the content for the first time. But then again, if I weren't on a publisher's schedule, I might never have finished it!
If you are interested in purchasing Renee's book, you can find more information here: https://sqlfordatascientists.com/