Categories
analytics bi businessintelligence everythinganalytics

Defining Analytics Titles

Previous entries in Everything Analytics:

The Many Wandering Paths to Analytics
Landing Your First Analytics Job

Confusing Web of Titles

The analytics space is rapidly growing & evolving, and this fast growth has led to a convoluted web of overlapping and contradictory job titles. I'm here to help you sort through some of those job titles and even open your eyes to different types of Analytics jobs you may not have considered. Data Scientists get all the press - but there are many more roles out there which may be a better fit for your interests & skills.

You will rarely find consistency from company to company. In fact, I'll start with a couple disclaimers:

Disclaimer 1: Keep in mind that my opinion on the separation between these jobs has no bearing on how the HR department of a company defines their positions.

Disclaimer 2: This list is not exhaustive. There are lots of substructures to these roles as well as other data-adjacent or niche jobs which exist.

Keep In Mind When Applying

Make sure you absolutely understand the job description and ask many clarifying questions during interview rounds to fully understand what you'll be doing. If you aren't thorough in evaluating the job, you may not end up with the work you thought you'd be doing.

Example Job Titles

Data Scientist
(Related: Statistician)
Data Analyst
(Related: BI Analyst)
Data Engineer
(Related: BI Architect, BI Engineer)
Business Analyst
(Related: Technical Project Manager)
Machine Learning Engineer
(Related: Software Engineer)

Reporting Structure

While there is no one-size-fits-all structure, there are general trends:

Data Scientist/Data Analyst/Business Analyst

These roles may report to any part of the business, depending on how centralized the data organization is. The more centralized, the more likely they are on the same team. Sometimes they may be their own team entirely, rolling up to the CEO independent of any other C-Suite leader. Other times they may roll up through the COO or CTO.

Or they may be decentralized and scattered across the company with no specific structure.

Data Engineer / Machine Learning Engineer

Typically these fall under the CTO. Data Engineers may be under IT, or may be their own division. MLEs typically fall into Software Engineering -- see below for more discussion.

Detailed Breakdowns

Data Scientist

Overview: This is the most-publicized job title out there and therefore is the broadest; it can mean many things at many places.

Data scientists are forward-looking and focus on predictive analytics. They certainly can do descriptive analytics, but their value comes from modeling/classification/etc.

Due to the emphasis on modeling, data scientists typically have advanced degrees in statistics, applied math, information science, or similar.

Example task: Predict how much stock of each item a company should order from its manufacturers in advance of the holiday season.

Data Analyst

Overview: While Data Scientists are generally forward-looking, data analysts are generally backward-looking and more entrenched in the business. Their job is to help the business understand what has happened up to this point and provide data in a clear & concise way for decision making in the future.

Typically data analysts focus heavily on making visualizations and presentations for the business and bridge the gap between the business and the data.

Data Analysts are also more jack-of-all-trades. It's common to do a bit of data science, analytics, engineering, and PMing in a single role.

Example task: Create a flexible Tableau dashboard for leadership to track trial conversion to paid users over time

Data Engineer

Overview: Data engineers work on the databases that the other members of the analytics org use to get information to stakeholders. They are responsible for bringing data from the business into some form of data warehouse in an accurate, timely and secure fashion.

This means the typical customers of data engineers are the data analysts/scientists at the company. They also may work directly with different parts of the business as they want their own data ingested automatically into the larger data warehouse.

This role is typically more technical and code-heavy in order to move massive amounts of data around at scale. There is less interaction with the business than other parts of the analytics organization.

Example task: Mirror Salesforce data in a schema in Snowflake, updated every 5 minutes, for analysts & scientists to analyze/visualize

Business Analyst

Overview: Business Analysts are sometimes called a "project/product/program manager", or PM. No matter the name, they are distinct from Data Analysts/Scientists in an important way. They coordinate and organize data projects across the business.

This role typically doesn't exist early on in a data team's existence. Usually individual analysts/scientists take this on until the burden of project managing starts outweighing time spent actually doing analysis. Eventually, the role of Business Analyst comes along.

Business Analysts are not expected to code or be as savvy on the technical side. Rather, their job is to identify problems, gather requirements, allocate resources and coordinate expectations between the data team and the business. This is no small task as many technically minded individuals are great at doing an analysis when there's a clear question, but struggle to work with non-technical individuals across the organization.

Example task: Sales wants standardized KPI dashboards across their worldwide teams available for next quarter's SKO

Machine Learning (ML) Engineer

Overview: This is a bonus position added in, largely since I see Machine Learning discussed commonly on Analytics forums and many of you may be wondering how it fits into a data org. The short answer: this role doesn't fit into the data org per se.

Specifically, this role is commonly found in the Software Engineering (SWE) org and is more of a Software Engineer with an ML focus than anything else. Of the roles above it is most similar to a data scientist, but it is usually more involved with implementing models within the production code of the company to solve whatever problem has been identified.

Example task: Predict which users on the website may want to know about Feature X, which will prompt an informational pop-up

In Conclusion

There are all sorts of roles to explore and this list is by no means exhaustive. As I mentioned at the start, the names above may be conflated with each other at any given job you apply to. Regardless, this gives you some guardrails around what sorts of roles are out there -- from non-technical to technical and everything in between.

Categories
analytics bi businessintelligence everythinganalytics

Landing Your First Analytics Job

Entry Level Position - requires 5 years experience

-Every analytics job posting

This is Part 2 of the Everything Analytics series. Find Part 1 here.

Too few applicants with experience

As I mentioned in The Wandering Path to an Analytics Career, there is a ‘Great Filter’ in Analytics. It looks something like this:

Lots of people want to break into an analytics or data science career, yet not many are able to. This leaves a glut of competition for entry level positions, and not enough qualified applicants for mid-level to senior positions. Once you get your first few years of experience, you’re golden! You have your pick of many options within the data world – but you have to get past the Great Filter.

This rings true for me in an anecdotal sense - I have experienced this as a job seeker, interviewer and in discussions with data hopefuls. Given this is a blog devoted to data, I wanted to quantify the interest in analytics positions just posted on LinkedIn. Unsurprisingly, you’re swimming upstream if you’re just blanket applying to analyst jobs - dozens to hundreds of applicants within a day or two of posting. See below:

For a fantastic & further in-depth analysis, I highly recommend reading the "Glut of New Data Scientists" section of this blog by Vicki Boykis.

Applicants Focus on the Wrong Things

As I’ve combed through resumes, cover letters and LinkedIn messages for the past five years, I’ve noticed applicants consistently missing the mark on what will set them apart. They consistently point to technical ability:

  • Technical skills (SQL, Python, R)
  • Mathematical skills (Statistics, algorithms, modelling)
  • Certifications (Data Science Bootcamp, vendor-specific courses)

Those things are all great, but they don’t differentiate you from the pack. Everyone has some nominal experience in these things, you likely don’t have experience in all the tools the company needs (what if they use Looker instead of Tableau?), and even if you don’t, much of this can be taught on the job.

When I interviewed at my current position, I wasn’t asked one technical question. When the Director of Analytics stopped to see if I had any questions, my first one was “Why aren’t you asking me any technical/SQL questions?”  I’ll never forget his response: “If you’re missing any technical skills, we’ll teach you.” Wow.

This seems counter-intuitive. Isn’t data analytics/data science more technical? Don’t you have to code?  Of course you do! But those aren’t the most sought after skills; they’re a means to an end.

What Top Applicants Demonstrate

Analysts that shine on applications and interviews show they can persuasively communicate complex ideas using data. The job of a data analyst is to work with a stakeholder to generate business value. That doesn’t happen through coding - that happens through understanding the business, understanding the problem (even if it’s not directly stated!), breaking that complex problem down and communicating what the data says to do. Technical ability is solely leveraged to get there.

This is the realm of “soft skills”, which are learned quickly on the job but are tougher to gauge for someone with no experience. How effectively can you work with non-technical stakeholders? Will people like working with you? Can you distill an ambiguous question into an actionable insight?

Top applicants can point to experience showing they can handle these scenarios and that’s why they rise to the top.

Tough to Teach in Classes

Classes are unfortunately a poor place to learn and/or demonstrate critical analytics soft skills. Teachers ask you very precise questions and give you very precise datasets to see if you’ve understood explicit topics listed in the syllabus. In the real world, this isn’t how analytics works.

Sometimes you aren’t even told there’s a question. If you’re asked a question the person may mean an entirely different question. The data might not exist, or it might sort of exist, or you might need to make it yourself. Your presentations are to a potentially skeptical crowd who doesn’t care what methods you used to arrive at your conclusion.

You can see the pattern – it’s near impossible for a teacher to create this sort of ambiguous and dynamic setting in a classroom. Imagine not knowing what day a test was coming, or if there were even questions on the test, or if the questions on the test were the ones you were supposed to answer!

This stuff is learned on-the-job, hence the need for experience.

OK OK, I get it. What do I do?

Now we’re to the crux of the matter – is there anything to give yourself better odds at landing that first job?

Yes.  There’s one overarching tactic, with three options you can do today to gain experience that will make a difference in an application.

Option 0 – Network, network, network!

This applies to all three options below. If you’re throwing your resume into the ether of hundreds of applicants, you’ll find less success than if you network with the leverage of the options below.

Option 1 – Start doing analytics in your current job

This is where many of us (including me!) started. You know how there aren’t enough good analysts out there? Take advantage! That means your company needs data people. Your boss, or some other boss needs help. Discuss their data issues and see if you can take something solvable. Don’t overcomplicate this. Use Excel to start, it’s a great place to iterate and extremely flexible. Move from there – find a pain point someone has with data and try to solve it. Start small and build. This is phenomenally effective, and you can point to this experience when applying to jobs later on (or move to a data position at your current place!).

Option 2 – Work on a data project you are passionate about

I see tips everywhere saying “take on a personal data project” but rarely see much helpful advice beyond that. My top recommendation is to think of a hobby or interest you have and create an end-to-end analysis. Do you love a particular sport? Try to predict something that will happen. Have a favorite hobby? Think of a dashboard you could create to display your time spent/skill improvement. The more interested you are in the data, the further this will take you and the more time you’ll put into it. The options here are endless, but pick something you love.

This gets you experience across the entire analytics pipeline – finding/cleaning/enriching data, asking good questions, visualization. Tableau Public is a great way to publish your results and iterate. The options are endless, and you can demonstrate skill and passion in an interview pointing to a portfolio of data projects. It doesn’t matter what tools you use, though my only recommendation is SQL and some sort of viz tool be involved.

Option 3 – Take online courses to brush up on technical ability

I’ve spent a significant amount of time saying that technical ability won’t separate you. That’s true, but you do still need some technical ability or you may be disregarded as not technical enough. If you don’t know SQL at all, you can take some basics online as part of doing Option 2. Do some basics on Tableau or Python or whatever strikes your fancy. This certainly is the least helpful option for standing out, but it also is a prerequisite if you lack technical ability. Typically if you’re doing Options 1 and 2, you’ll end up needing to do this option anyway.

In Conclusion

Breaking into analytics isn’t easy. But there are methods to get past the Great Filter and get your first job. It’ll take hard work and some luck, but the goal is attainable. Companies need passionate and smart people to make sense of their data, and you can step into that role. Make it happen!

Categories
analytics excel retro

NBA Bubble Sim: A Retrospective

One thing that I really enjoy as an analyst is creating new models - and expanding them. I made a version of the Bubble sim with 1m+ scenarios, for example (that will turn into a blog post here at some point). But I rarely maintain the focus or energy to take a look at it after the fact to determine "how good was it at actually predicting the future?"1 I'm aiming to change that with this real-life example of this NBA model. So with that said, let's dive in.

Predicting individual games

Using ELO to predict individual games should theoretically massively improve the predictive ability of the model versus, say, coin flips. However, as we will see, that was really not the case.
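For readers who haven't seen the mechanics, here is a minimal Python sketch of how an ELO rating gap turns into a single-game win probability. It assumes the standard ELO expected-score formula on a 400-point scale; this post doesn't spell out the exact formula or adjustments the model used, and the ratings below are hypothetical.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for team A against team B (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, chosen only for illustration.
favorite, underdog = 1700.0, 1500.0
p = elo_win_probability(favorite, underdog)
print(f"favorite wins with probability {p:.3f}")
```

Two evenly rated teams come out at exactly 50%, which is the coin-flip baseline the model is being compared against.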

quality of prediction for individual games

Ultimately, we were just slightly better than coin flips. Sort of disappointing if I'm honest. I do think there is some context that ELO is particularly bad at explaining, which we can distill into the statement "ELO overstates the relative strength of teams that have clinched a playoff berth."

I'll dive into this at the end, as I think some faulty modeling by the NBA around this assumption led to some crappy basketball being played.

Predicting which teams made playoffs

When I look at the 1000 scenarios in aggregate (instead of a game by game basis), a much clearer picture of the model and its effectiveness is painted.

quality of prediction for making playoffs

Looks pretty good! A damn good model. HOWEVER - given that for all intents & purposes, 15 out of 16 playoff spots were guaranteed, this really is a false narrative about the effectiveness of the model.
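To make "scenarios in aggregate" concrete, here is a rough Python sketch of the general idea: simulate the remaining games many times and count how often each team ends up on top. This is not the Excel model itself. The function names, the three-team race, the ratings, records and schedule are all hypothetical, and ties are broken at random as a simplification.

```python
import random


def elo_win_probability(rating_a, rating_b):
    """Standard Elo expected score for team A against team B (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def simulate_playoff_odds(ratings, wins, schedule, n_scenarios=1000, seed=0):
    """Estimate how often each team finishes with the most wins.

    ratings:  team -> rating
    wins:     team -> wins before the remaining games
    schedule: list of (team_a, team_b) games still to be played
    """
    rng = random.Random(seed)
    made_it = {team: 0 for team in wins}
    for _ in range(n_scenarios):
        totals = dict(wins)
        for team_a, team_b in schedule:
            if rng.random() < elo_win_probability(ratings[team_a], ratings[team_b]):
                totals[team_a] += 1
            else:
                totals[team_b] += 1
        best = max(totals.values())
        leaders = [team for team, total in totals.items() if total == best]
        made_it[rng.choice(leaders)] += 1  # ties broken at random (simplification)
    return {team: count / n_scenarios for team, count in made_it.items()}


# Hypothetical three-team race for one spot; every number here is made up.
ratings = {"A": 1550, "B": 1500, "C": 1450}
wins = {"A": 32, "B": 33, "C": 32}
schedule = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B")]
odds = simulate_playoff_odds(ratings, wins, schedule)
print(odds)
```

Each scenario is one full play-through of the remaining schedule, so the share of scenarios a team wins is its estimated playoff probability.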

Reducing scope to measure uncertain outcomes

For the purpose of this analysis, I will take a look at the quality of the model as it relates to 3 teams - the New Orleans Pelicans (NOP), the Memphis Grizzlies (MEM), and the Portland Trail Blazers (POR). This is because these are the 3 teams competing for the final playoff spot, so by getting better at predicting these teams, we improve the efficacy of the entire model.

predicting outcomes for POR, MEM & NOP

I can't say these updated stats are particularly great. We are more accurate here than we were for predicting specific games, but far from certain enough to do something like gamble on this model reliably. Even knowing what we did going into the NBA bubble, Portland, who ultimately made the playoffs, only had a 29% chance to do so.

Incorporating some modifications

One obvious observation as the bubble games continued was that "ELO overstated the relative strength of teams that have clinched a playoff berth." With this knowledge, I started tweaking my model to accommodate this new information. Ultimately what I landed on was to reduce the ELO for teams that have already clinched by 20%. This number is totally arbitrary and based on gut feel. I also assumed the Eastern Conference was de facto clinched based on the players who opted out or were injured for the Wizards.
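As an illustration of what such a handicap does to a single matchup, here is a small Python sketch. It takes the most literal reading of "reduce the ELO by 20%", namely multiplying a clinched team's rating by 0.8, which is an assumption rather than a documented detail of the model. The helper names and ratings are hypothetical, and the win probability uses the standard ELO expected-score formula.

```python
def elo_win_probability(rating_a, rating_b):
    """Standard Elo expected score for team A against team B (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def effective_rating(rating, clinched, handicap=0.20):
    """Scale down the rating of a team that has already clinched (assumed reading)."""
    return rating * (1.0 - handicap) if clinched else rating


# Hypothetical ratings for illustration only.
clinched_team, bubble_team = 1700.0, 1500.0
before = elo_win_probability(bubble_team, clinched_team)
after = elo_win_probability(bubble_team, effective_rating(clinched_team, clinched=True))
print(f"bubble team win probability: {before:.2f} -> {after:.2f}")
```

Under this reading, a flat percentage cut moves a typical rating by hundreds of points, so it shifts the odds in favor of teams still fighting for a spot quite aggressively.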

Given the relatively poor performance of the model, I was seeking to explain the following data points:

  • The Bucks & Lakers were playing very poorly.
  • The Suns & Blazers looked unstoppable.

With the modification of the model to reduce ELO for qualified teams by 20%, the new playoff odds looked like this:

playoff odds with ELO reduction for clinched teams

Of course, simply buffing Portland's playoff odds massively increases the accuracy of the prediction, so this might be a bit too reductionist. Furthermore, with some clever configuration of Excel to leverage the solver, the exact handicap percentage could be tweaked to maximize the odds of Portland making the playoffs.2 That being said, let's take a look at how model quality changes with this change:

prediction quality post adjustment

This is MUCH better. Obviously, the updated model has the benefit of some hindsight here. But with a small, targeted change, the model was able to increase accuracy from 54.7% to 69.2%. Precision & recall increased by similar margins. I think there is something here that can be applied to future models of NBA outcomes.
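For anyone who wants to compute these metrics themselves, here is a minimal Python sketch of accuracy, precision, recall and F1 from a list of predicted and actual outcomes. The function name and the sample data are hypothetical and are not the model's real results.

```python
def classification_metrics(predicted, actual):
    """Return accuracy, precision, recall and F1 for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # predicted yes, was yes
    fp = sum(1 for p, a in pairs if p and not a)      # predicted yes, was no
    fn = sum(1 for p, a in pairs if not p and a)      # predicted no, was yes
    tn = sum(1 for p, a in pairs if not p and not a)  # predicted no, was no
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: did the model say "makes playoffs", and did it happen?
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
metrics = classification_metrics(predicted, actual)
print(metrics)
```

Accuracy alone can flatter a model when most outcomes are near-certain, which is why precision and recall are reported alongside it.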

Conclusion

Overall, I am satisfied with the outcomes of this process of exploring the model in the context of the metrics above. The key learning for me is that certainty of outcomes does impact the quality of play, at least in the NBA bubble. After accounting for that, we were able to increase model accuracy by more than 25% in relative terms (from 54.7% to 69.2%, or about 14.5 percentage points). To get more accurate, my analysis would need to be more surgical in approach.

My biggest take-away is that I will be designing future models to enable rapid analysis using the metrics herein. I didn't do that in this case as I didn't account for actually doing this analysis. Having appropriate consideration for accuracy testing up front would have meant I could have backtested assumptions and model changes across a much broader data set. As a result, I didn't have an easy way to test my updated assumption of the 20% ELO discount down at the game level. I'm certain that applying better science techniques could result in an even higher accuracy model.

I do find it super interesting that there was a huge miss on the New Orleans Pelicans' performance vis-a-vis their ELO rating. This entire process was arguably designed to maximize the odds of the Pelicans (& Zion) making the playoffs, and in that regard, the NBA's experiment failed completely. Conversely, one thing that could have been anticipated based on the 20% ELO handicap is that the Phoenix Suns had around a 35% chance to get 7 or 8 wins. Given that, it probably would have made more sense for the NBA to open a mini-tournament at the bottom of the bracket for 7/8/9/10. It would have increased the quality of play and led to a more exciting finish to the regular season. And I think the NBA, which certainly has modelers far more sophisticated than I, should have anticipated the drop in play associated with teams who have already clinched.

footnotes

1I'm using the assessment framework found here on towardsdatascience.com, for accuracy, precision, true positive rate, sensitivity, and F1 score. You can find the definitions within that link - it's worth the read.

2After writing this, I did some Excel tweaking to allow the solver to optimize the handicap for clinched teams. It was 20.00001%. Bizarre.

Categories
analytics bi businessintelligence everythinganalytics

The Many Wandering Paths to Analytics

If we treated careers more like dating, nobody would settle down so quickly.

David Epstein in Range: Why Generalists Triumph in a Specialized World

I consistently receive the same questions from people seeking an Analytics career: What classes should I take? What certifications should I get? Should I learn SQL, Python or R?

Behind those questions there's a consistent assumption: "There must be a clear path to an analytics career."

I'm here to challenge that assumption. There isn't one clear path to work in analytics - most of us got there through a winding, wandering series of career moves. My story is one of many - ask someone in Analytics and you'll hear something similar.

Typical Wandering Path

(1) Get a college degree or other training - not super relevant
(2) Work for a while in some non-analytics job
(3) Recognize interest in analytics
(4) Start doing basic analytics at work (ideally) or on own time
(5) Leverage that experience into first analytics job

I'll call out each step as it happened in my career journey.

My Wandering Path

Initial Career (Years 1-4)

Coming out of college I shared the assumption that careers were linear. After all, life to that point was linear, so why wouldn't careers be the same?

Except, my linear plans fell apart two days before my wedding in 2011. I'd studied International Economic Development (Step 1), interned in Latin America, become fluent in Spanish and was planning to move with my soon-to-be wife to Bolivia. In one phone call and several subsequent conversations, that potential life and career ended. I was sitting in a dead-end job I thought I'd be leaving and had to figure out Plan B.

At first it wasn't obvious - what else should I do? I was a Customer Success Manager (Step 2) but didn't really want to do that as a career. I'd worked in sales departments, but didn't really want to be a salesperson. But then I had an epiphany - there was a part of each of my first few roles that I loved that never was part of my job description.

I was consistently making little analytics & reports (Step 4 - which ironically for me came before step 3!). I'd turn 2,000 customer emails into a digestible summary for the product team. I'd make Salesforce forecasts & dashboards for the executive team. I made a Google Sheet for my Rosetta Stone team to help management track & manage renewal rates for their teams. This stuff was fun! I liked it! (Step 3) But what now?

The Great Filter: Landing First Data Job

Have you heard of the concept of the 'Great Filter'? It's tied to the Fermi Paradox, which ponders why we see no evidence of extraterrestrial life given the seemingly high probability that it should exist in the universe. One candidate for the filter is the step from non-living matter to living matter (abiogenesis). The Great Filter is a catchall for "it's hard to get past this point."

I argue there is a Great Filter for those trying to get into Analytics - getting your first job. In fact, I'm devoting my next blog to this topic, so consider this a lead in to next week.

Passing Through the Great Filter

I realized I had an uphill climb ahead. Perhaps this is where many of you are - how do you get a company to take a chance on you?

I asked lots of current analysts via informational interviews at local coffee shops. They all said "I got here via a pretty random series of events." Sound familiar?

They gave me a breadcrumb trail, though: "You have to get enough experience together and communicate about it well enough to get someone to try you out." Easier said than done, but I did have some experience already in my current role.

I applied everywhere. I was told by recruiters/HR multiple times "Hey, I guess you could be an analyst but I think your future is in sales based on your resume." Leads fizzled out until one day I got a call.

The Meeting That Changed Everything

"Can you show up at the office in an hour? The CFO and SVP of Sales want to talk."

I got that call from Jacob -- the same Jacob here at DataDuel. He was working at Funko, a quickly-growing collectibles company north of Seattle. They didn't have a position open yet, but there was interest in getting analytics going. Before going into the meeting, here is all I knew:

  • The position was planned to be part-time Analytics and part-time something else until analytics skills were proven
  • The position was planned to be a contract position (not great - I'd just bought my first house and wasn't looking for a contract spot)
  • There would be minimal support since no data team existed, so a self-starter attitude was needed

Given those three bullet points, I had three goals going in:

  • Communicate my potential to be great at analytics if given the chance
  • Sell them that I was worth a shot as an employee & not a contractor
  • Demonstrate self-starter attitude to analytics from previous roles

I quickly threw on a dress shirt, re-learned how to tie a tie on YouTube and flew out to the car.

The rest is history and full of the content I'll fill this weekly series with. The conversation went really well and they decided to take a chance on me a couple weeks later (Step 5!). While I got a full-time position, I took a 10% pay cut because I needed to prove myself. I knew the temporary sacrifice would be worth it - I just needed my first position to get past the Great Filter.

In Conclusion

There's no one path to analytics - there are many. I've used my path as an anecdote for the infinite options out there.

The general path, though, is to start doing some analytics in any fashion you can, and leverage that experience to get your first position. It isn't easy - there's a Great Filter out there which prevents many from getting in.

Come back next week and I'll dive into the Analytics Great Filter in more detail, and provide some practical options of how to overcome it.

Categories
analytics bi businessintelligence everythinganalytics

New Weekly Series: Everything Analytics

Do you enjoy working with data in your current role? Are you interested in a Data Analytics career? Are you currently a Data Analyst?

Good news! This weekly series is for you. It'll cover all sorts of topics within analytics, including advice for aspiring analysts, best practices, key skills/tools and industry updates.

Initial blog topics include:

  • The Many Wandering Paths to Analytics
  • Analytics Job/Role Types
  • Key Skill Sets for Analysts
  • Visualization Best Practices
  • Measuring Success of Analysts
  • How to Prioritize Your Work Backlog
  • ...and more!

Much of this will be written from my perspective as an Analyst. There are other perspectives out there for unique positions like Data Scientists and Data Engineering, and while I'll touch on those regularly (and will write an entire post on the difference between those roles), the focus here will be Data Analysts.

See you in a week!

Categories
data viz powerbi tableau

Duel: NBA Bubble Projections

For our inaugural duel - Jacob created a data set based on 538's NBA Predictions. He'll create a deep dive into the mechanics of the model and how to leverage Excel's data table function for no-code simulations in a later post. The data is available at the bottom of this post.

Jacob is a native Excel user and has created similar models for his Fantasy NBA league. He was able to take those models and dress them up for this duel - albeit in a format that was native to PowerBI and Excel. More on how that impacted the Tableau side of the analysis below.

Since we were using the 538 data set, we decided the first part of the challenge should be to replicate the view above in PowerBI & Tableau.

Some of the data weren't readily available, i.e. projected point differentials and team logos. For the purpose of the commentary below, we will be ignoring these facts.

As a phase 2 / stretch goal for this challenge, we also set out to create our own, novel visualization of the scenario combinations. This helped us to answer questions like "When the Bucks make the finals, who are their most likely opponents?" or "What are the paths for the Celtics to the conference finals?".

PowerBI Commentary

This section is written in first-person by Jacob.

Part 1: 538's visualization

Where PowerBI succeeded: Getting the calculations out of the attached data set was fairly easy once I sorted out the data model in my head. While the data wasn't perfectly formed, it was quite easy to shape it using PowerQuery to get what I needed.

PowerBI Data Model

I added a couple of measures on top of it and had the table working pretty quickly. Getting the conditional formatting to match was fairly easy too, although to get an exact color match I used the "color dropper" from PowerPoint on a screenshot of the website (gross).

Where PowerBI struggled: I couldn't quite figure out how to get the sorting to work when I replaced certain values with "icons", i.e. >99% or the "checkmark" icon. PowerBI treats the field as a string and therefore does a character-based sort. This means that applying a single sort on the outcome of the model doesn't really work! Instead, you have to sort by ELO rating or by Projected standings to get a cohesive sort.

After I wrote this initially - I did find a workaround for this sorting issue, sort of. This video from Guy In A Cube explains the "hack" - but it is indeed just a hack.
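The underlying problem is easy to reproduce outside of PowerBI. Here is a small Python sketch (not PowerBI code) showing why sorting on the display label scrambles the order, and how keeping the numeric value around for sorting avoids it. The label rules are my reading of the 538-style table described here, and the teams and odds are hypothetical.

```python
def display_label(probability: float) -> str:
    """Label in the style described above: checkmark, '>99%', '<1%', or a rounded percent."""
    if probability >= 1.0:
        return "✓"
    if probability > 0.99:
        return ">99%"
    if probability < 0.01:
        return "<1%"
    return f"{round(probability * 100)}%"


# Hypothetical teams and odds, for illustration only.
odds = {"A": 1.0, "B": 0.995, "C": 0.29, "D": 0.08, "E": 0.004}

# Sorting on the label compares characters, so "8%" lands ahead of "29%".
by_label = sorted(odds, key=lambda team: display_label(odds[team]), reverse=True)

# Sorting on the number and only displaying the label keeps the intended order.
by_value = sorted(odds, key=lambda team: odds[team], reverse=True)

for team in by_value:
    print(team, display_label(odds[team]))
```

The general pattern is to keep one field for ordering and a separate field for display, whatever tool ends up rendering the table.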

Part 2: Scenario Modeling

I am pretty satisfied with how this visual turned out - but the sorting on probability fields continued to plague me. Also, the mental model for this data was effectively recursive, and I am not sure how to accomplish this in PowerBI, so I imported the same table twice. See the image on the left for how this was accomplished.

After fighting with DAX on and off for a few days, I was able to get a "base scenario" calculation using the ALL filter. This meant that when you selected a Team from "series_winners" you could calculate the odds of that scenario versus the "base" scenario. This surfaces really neat scenarios in the modeler, such as an OKC win in the second round which doubles Milwaukee's championship odds.

You can find the DAX for stripping the filters from the "series winner" table, below.

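All Scenario Win Pct =
-- ALL ( series_winners ) strips any selection on series_winners, giving the "base scenario" value
CALCULATE (
    COUNT ( series_detail[TeamID] )
        / DISTINCTCOUNT ( series_winners[ScenarioID] ),
    ALL ( series_winners )
)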
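If DAX isn't your thing, the same "base versus selected scenario" comparison can be sketched in a few lines of Python. This is an analogy rather than a translation of the measure above: the scenario rows, field names and team outcomes are made up, and it simply contrasts odds over all scenarios with odds over the scenarios matching a selection.

```python
from collections import Counter

# Hypothetical simulated scenarios: who wins a given series and who wins the title.
scenarios = [
    {"series_winner": "OKC", "champion": "MIL"},
    {"series_winner": "OKC", "champion": "MIL"},
    {"series_winner": "OKC", "champion": "LAL"},
    {"series_winner": "HOU", "champion": "LAL"},
    {"series_winner": "HOU", "champion": "MIL"},
    {"series_winner": "HOU", "champion": "LAL"},
    {"series_winner": "HOU", "champion": "LAC"},
    {"series_winner": "HOU", "champion": "LAL"},
]


def title_odds(rows):
    """Share of the given scenarios in which each team wins the title."""
    counts = Counter(row["champion"] for row in rows)
    return {team: n / len(rows) for team, n in counts.items()}


# Base odds ignore the selection entirely, similar in spirit to ALL(series_winners).
base = title_odds(scenarios)

# Selected odds only look at scenarios where the chosen team won its series.
selected = title_odds([row for row in scenarios if row["series_winner"] == "OKC"])

print("MIL base odds:", base["MIL"])
print("MIL odds given an OKC series win:", selected["MIL"])
```

Comparing the two numbers side by side is what makes a "what if this team wins" selection informative.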

Tableau Commentary

This section is written in first-person by Nate.

Part 1: Data Prepping

Where Tableau succeeded: Tableau handled the data really well once I completed a lot of trial-and-error to get the data into the right format. The data model I put together includes two tables twice, so it's likely sub-optimal but functional. Tableau is consistently improving how data can be loaded & prepared (see the recent changes just launched in 2020.2), but my unfamiliarity with those new features meant I didn't have time to give them a go on this analysis.

Tableau Data Model

Where Tableau struggled: My experience getting data into Tableau nearly always involves a connection to SQL - either a direct connection to a table or a very clean CSV output of a SQL query. Since the data model created by Jacob is in PowerBI/Excel, I had to do some manual adjustments to the tables to get them in the format I needed, such as creating long tables (just a few columns) out of wide tables. This resulted in several more hours of work as I did trial & error between modifying data and trying to visualize it in Tableau.
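For anyone unfamiliar with the wide-to-long reshaping mentioned above, here is a minimal Python sketch of the idea. The column names and values are hypothetical and do not come from the actual data model; the real adjustments were done by hand.

```python
# Hypothetical wide table: one row per team, one column per measure.
wide_rows = [
    {"team": "MIL", "make_finals": 0.40, "win_title": 0.25},
    {"team": "LAL", "make_finals": 0.35, "win_title": 0.20},
]


def wide_to_long(rows, id_column, value_columns):
    """Turn each measure column into its own row: (id, measure, value)."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append(
                {id_column: row[id_column], "measure": column, "value": row[column]}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "team", ["make_finals", "win_title"])

for row in long_rows:
    print(row)
```

The long version has just a few columns but more rows, which tends to be the shape Tableau is happiest with when the measure name needs to act as a dimension.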

Part 2: 538's visualization

Tableau's version of 538's table

Where Tableau succeeded: Getting this table created was very simple once I finalized the data model. Sorting works well across all columns and the Tableau method of dragging dimensions & measures around to get colors & formats worked well.

Where Tableau struggled: I could not get some of the nifty 538 features in the table - such as a checkmark at 100%, and a string for "<1%". Instead, Tableau displays 0% for situations that round down to 0%. I tried adding in a decimal place, but that just cluttered up the view. Also, Tableau does not have strong conditional formatting capabilities for the background of cells. It's possible (see the KB article here) but I saved development time for other work by just coloring the numbers and shifting to a darker cell background.

Part 3: Scenario Modeling

Where Tableau succeeded: After several rounds of arm wrestling, pleading and bribing - Tableau finally assented to the view I wanted which included:

(a) The original odds
(b) New odds based on selection
(c) Visualization demonstrating change in odds

There's much more I wish I could have done, but in the interest of time (the playoffs are going now!) it was time to hit 'Publish'.

Where Tableau struggled: I spent multiple hours trying to get the FIXED LOD formula to work so that I could compare the odds from the unfiltered view and the filtered scenario view at the same time. Tableau can be frustrating to use when trying to visualize multiple levels of detail in the same view, and my chosen data model likely contributed to the struggles.

I asked the best Tableau user I know for some advice as I was getting this viz prepared and his advice sums up the struggle well: "When dealing with LODs, I usually just try every permutation until something somehow works." Turns out he was right in this instance, too.

Overall Winner: PowerBI

Category Winners:

538 Duplication: PowerBI - but really neither piece of software allows the customization that the web package used by 538 has. Still, we gave it a decent go. PowerBI does tables a bit better, so it wins here.

Scenario Modeling: While Tableau is very snappy and honestly more "discoverable" (good job with the tooltips, Nate), the PowerBI "tournament visual" is very intuitive for sports fans. Additionally, the analysis at the bottom of the chart is more comprehensive and more understandable than the Tableau bar charts of the same info. We give the edge to PowerBI.

A note on data prep (not scored): The in-app data prep with PowerQuery is no contest when compared to Tableau. This will pretty much always be true and can be both good and bad. Good, because it puts a lot of control at the fingertips of the analyst, and bad, because a lot of code, either in M or DAX, gets added to PowerBI instead of the database engine of your choice.

PowerBI Commentary from Tableau User

Where PowerBI succeeded: PowerBI is more equipped than Tableau to display data in a tabular format such as the one on 538 and that shows in the final product. The ability to put many small views of data into a single dashboard also proved to be powerful in the final scenario modeling output.

Where PowerBI struggled: PowerBI depended too strongly on its native table functionality, resulting in lots of details but a lack of bold & clear visuals. Sorting also turned out to be tough as you may notice when using the final interactive version.

Tableau Commentary from a PowerBI User

Where Tableau succeeded: The data model is much easier to grasp, even for technical users. It's much faster to interact with, and the tool-tips make it very easy to understand. Additionally, clever use of the NBA logo immediately gives the user context.

Where Tableau struggled: As Nate mentioned, the data prep took a significant amount of time. The conditional formatting inside tables is not very finely tuned, especially compared to PowerBI. Hilariously, sorting inside a table has its own set of issues (sort on "Win Championship").

Links

Link: Interactive version of the PowerBI report.

Link: Interactive version of the Tableau report.

Let us know your thoughts below! A list of the files can be found after the jump.

Categories
Charts Reconsidered

Charts Reconsidered: Mask Wearing

Like any good analyst, I enjoy scrolling through r/dataisbeautiful. And when I say enjoy, I really mean "I'm doom-scrolling through reddit because twitter is depressingly worse." Of course this leads me here, to our first entrant into "Charts Reconsidered", where every week I will revisit a chart from reddit and suggest some improvements.

That leads me to this chart - the 5th ranked chart on the subreddit on July 21st, 2020. It tells us who does and doesn't use masks, by a few different breakdowns. It is an interesting story, but it could be told in a better way.

3D bars. Yikes. This reminds me vaguely of "WordArt".

Sorting

There are 4 distinct groups in this chart - Gender, Political Party, Education, & Overall. They are all mashed together with no space.

In Excel, I would use "blank" series to add space between each group to improve readability while enabling shared axes. I would also pick a consistent series to sort on from high to low.

Colors

Green & Grey isn't a great color combo and gets amplified by a lime green gridline color. It's not a good look.

Keep the gridlines in the background, a lighter gray perhaps. For a chart like this, I would use a light and dark tone of the same color. Or you can steal the Ben Evans approach - and use shades of gray + a single color for emphasis (in his case, red).

Labels & Gridlines

Too much info is crammed into this part, which muddles the story. There are major and minor Y gridlines, which are then labeled without a percent sign. The bars are also labeled. Lastly, the X & Y axis labels are switched.

Turn off minor gridlines and space the major gridlines at intervals of either 25 or 50. Add percentage labels so the units are clear. And fix the axis labels (or remove them).

Chart Type

3D bars with series stacked front to back are not a good look. This is most obvious in the GOP group, where the labels overlap the bars. The lack of spacing between groups makes it challenging to see differences between groups as well.

Just use a regular, stacked bar chart.

New visualization

With the magic of PowerBI - I've crafted a new chart, with the same data, to tell a more visually appealing & easier-to-understand story.

Regular stacked bar chart, grouped and then sorted alphabetically.

The labels are removed, the legend is cleaned up, and the colors are simplified. Did it take longer to make this chart? Yes! Does it tell a better story? Also yes!

I hope you found this feedback helpful. Let me know what else you would change in the comments below.

Categories
Rules

What is Data Duel?

Data Duel is a site built around the idea of exploring what tools are available to analysts for interacting with data. Nate & I come with Tableau & PowerBI experience, respectively, but are technical enough to explore other toolsets, and of course, we both love SQL.

We expect to be posting a new "Duel" roughly once per week. Each Duel will compare and contrast ways of visualizing a data set. These data sets will be made available within each post for your own remixing as well.

The constraints of a "Duel" will be mostly based on time - i.e. we will be spending an agreed-upon maximum amount of time on each viz. For the most part, there are no other rules, although bringing in supplemental data should be agreed upon between both parties.

Outside of the "Duel" posts, we will be posting on other topics as we see fit, with Nate focusing on a "breaking into data analytics" series, and myself focusing on "PowerQuery for SQL users". We will also be sharing our thoughts on release notes of our favorite pieces of viz software as well as remixing other content as we see fit.

I hope you find this fun & useful. Consider it our love letter to analytics.