Categories
analytics analyticsengineering businessintelligence

Running a personal SQL server for free

For some, getting into data analytics outside of an academic or work environment can be very challenging - where do you start? Which database do you use? And how do you do it for low or zero cost?

In this article, I am going to walk through setting up your VM1 & database, connecting to your new remote server using Azure Data Studio, and as a bonus, connecting it to dbt. I've also written about setting up dbt on windows on a previous post.

First, let's talk about requirements & recommendations:

  1. This tutorial is focused on Windows 10 + Linux. You will need Windows 10 Pro where you install your VM.
  2. I recommend that you set up your database on different physical machine than your dev machine. You should probably have at least 32GB of RAM.
  3. Since we are installing the database on another machine, that machine needs to be on the same network as your development machine.

Why use a VM at all? In my experience, running a database on your dev machine makes everything extremely slow. Your database will be very greedy with resources (RAM specifically) - so keeping it in a little box that you can turn on and off allows you to keep using your machine "as normal".

Step 1: Enable HyperV

Open powershell as administrator and run the following command:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All

More info can be found here: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v

Step 2: Create a VM in HyperV

You will need to restart your machine in order to use the HyperV features, so machine sure to do that first. The Microsoft documents to create a VM are exellent - and linked below. Make sure to select Ubuntu 20.04 when you create it.

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/quick-create-virtual-machine

Step 3: Install SQL Server on your VM

We will do the install of SQL Server2 in the CLI on Ubuntu, which MS has laid out again very nicely in their documentation. A couple of notes when walking through this:

  1. Make sure to select "SQL Server Express" as your edition. It limits your database size to 9GB but is otherwise relatively unencumbered by MS licensing.
  2. Write down your SA password. You will need it later when connecting.

This is quite detailed, so head over to this link and follow the instructions in detail: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup?view=sql-server-ver15

Step 4: Update the settings of your virtual switch

The default settings inside HyperV is for an "internal network" on your VM. This is fine if you are accessing your VM from the machine its running on, but the whole point here is that you want it to be a "remote server". Set the virtual switch to "external network" and you can then access your VM from any machine on your network.

Again, MS has great documentation on this here: https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/get-started/create-a-virtual-switch-for-hyper-v-virtual-machines

Step 5: Install Azure Data Studio on your dev machine - and write some SQL!

On your dev machine, make sure you can ping your VM. In my case, my VM is named "jacob-virtual-machine", so the command to validate I can reach it is:

ping jacob-virtual-machine

If you can't ping your VM, you have some networking issues to sort out. While I am no expert here, you will want to make sure you can see your VM outside the host (Step 4, above) and that port 1433 is open on the host and the VM.

Once that is resolved, you can download and install Azure Data Studio3. Now, with the credentials from above and you VM name, you can connect to your remote server. Everything can be left on defaults, but the avoidance of doubt, check out my connection settings below.

SQL Server Connection Settings

Now you have it all working and you have your own nice empty database to play with!

Bonus Content: Connect dbt to SQL Server

For those of you wishing to use dbt with SQL Server, check out the dbt-sqlserver github. It has great details, but I'll summarize the key bits.

You will need to install the dbt connector:

pip install dbt-sqlserver

I also find their explanation of the profiles.yml file kind of confusing, so I've included my own below for reference:

local_sql:
  target: dev
  outputs:
    dev: 
      type: sqlserver
      driver: 'ODBC Driver 17 for SQL Server'
      server: <VM name>
      database: <database name>
      port: 1433
      schema: <schema name>
      user: <username>
      password: <password>

Footnotes

1 You can also probably do this with WSL2, and not install a Linux VM. However, I am going to be running more software on the VM later and I want to split it to another machine. You can also use docker over top of all of this, which I may cover in another post.
2 I'm choosing SQL Server for a couple reasons: I am familiar with it and the documentation and community are large. PostgreSQL also works here, which has the advantage of having a default dbt connector.
3 SSMS works here too, but Azure Data Studio has the advantage of being cross platform. If you are using dbt, you need a SQL runner anyway as the VS code options aren't great.

Categories
analytics analyticsengineering everythinganalytics

Install dbt on Win10 - April 2021 (Updated October 2021)

I was getting a little frustrated with the web interface of dbt cloud, and just wanted to feel more "in control" of my environment. Which lead to this twitter thread:

Which led to lots of good recommendations (for Atom, POP SQL, vim+tmux, DeepChannel, and some others) but ultimately I settled on VS Code after a few auspicious DMs.

The process to install dbt on Win10 isn't exactly friendly for an analyst using dbt without engineering experience, so I wanted to share my journey and hopefully make yours easier too.

Step 0: Install Python

Note: These steps have been tested with Python 3.9.6 and earlier. As of October 15th, 2021, Python 3.10.x is not working!

Before you do anything else, install python. Make sure to check the box to "Add Python to PATH". If you don't, you can only run it explicitly.

low quality screenshot - but peep the box at the bottom.

If you miss this step, you have two options:
1 - re-install of python and check the "Add Python to PATH" box.
2 - manually add the PATH for python; one example linked here.

To confirm it's working, open the command line and enter 'py --version'. It should return the version of python you installed. If you get an error, most likely it is a PATH issue.

Step 1: Install VS Code + MS Build tools

First, download links: VS Code, MS Build tools.

As a quick call out, you need MS Build tools for MSVC v140 or higher, which is an optional component of the C++ build tools.

more low quality images, but check the right box.

The MS Build tool install takes a bit of time, so I would do this when you have a little bit of time (15 minutes or so), and you will need to restart your computer.

Step 2: Config your VS Code Environment

There are a few items that need to be done to get VS Code ready for dbt, so I will list them here.

  • Open the command palette (Ctrl+Shift+P), and type 'Python: Select Interpreter'. It should then bring up and allow you to select your python interpreter.
  • Add the dbt power user plug-in.
  • Take a look at this article for more suggested plug-ins from the dbt team.
  • Open your command line, and update pip to the latest version with the command 'pip install --upgrade pip'.
    • If you get an error here, you may need to run it with the '--user' flag as well, but I got that behavior inconsistently.

Step 3: Install dbt on Win10

If you've made it this far, congrats. If you are finding this article because your 'pip install dbt' isn't working, go back to the top and work through the above steps first.

Note: As of October 15th, 2021, this is paragraph is no longer required, but I've kept it for historical reasons. In your command line, run the following: 'pip install dbt --no-use-pep517 cryptography'. The dependency chain is broken somewhere and this flag fixes it. Do I know why? No. Do I care? Also, no.

Go ahead and run 'pip install dbt' in your CLI. This takes a few minutes to run (5-10 mins), but when it's done, you can check by running 'dbt --version' in the CLI. It should return the latest version (as of this update, 0.20.1).

Lastly, I recommend running 'dbt init' to set up your initial '.dbt' folder that holds your profiles.yml file to allow you to connect to your data warehouse. If you don't run it, you will just need to create that file by hand later. Since that file contains your credentials, it is best practice to put that in another place outside of your source control.

And with that, you are ready to connect to your repo and begin working on your dbt project. And lastly - share your Ws on twitter!