Question-First Approach to Data Gathering and Analysis

Emily Dworkin, Member of Product Management

When introducing data tracking into your product, there are about two and a half approaches:

1. List everything that happens or is clicked on that you can think of.
2. List the questions you’d like to answer about your product, then determine what data is needed.
2.5. List the basic things you’d reasonably like to track, then add elements as you find questions the existing data doesn’t cover.

My suggestion is to use the second approach; however, it’s important to discuss the others to understand their value and when a different approach might be appropriate.

Note: For anyone reading this who is brand new to data concepts, the easiest way to think of data is as an index card. The front tells you what the event or object is, and the back has specifics about it. For example, you may have an event for every time someone logs in. The front of your index card would say something like user_login. The back would have information like user_id, timestamp, and geo_location.
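To make the index card analogy concrete, here is a minimal Python sketch of one card as a data structure. The field names come from the example above; the `Event` class, the sample user ID, and the `"US-NY"` location format are hypothetical choices for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One 'index card': the front is the event name, the back is its details."""
    name: str            # front of the card, e.g. "user_login"
    user_id: str         # back of the card: who did it
    timestamp: datetime  # back of the card: when it happened
    geo_location: str    # back of the card: where (this format is an assumption)


# A single hypothetical login event
login = Event(
    name="user_login",
    user_id="u_123",
    timestamp=datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc),
    geo_location="US-NY",
)

# Flatten the card into a plain dict, e.g. before sending it to storage
print(asdict(login))
```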

Data ≠ Information ≠ Insight

There is an important distinction between data, information, and insight.

Data can be either quantitative (numbers) or qualitative (not numbers). It is raw input that is stored for future use. By itself, it is essentially meaningless, but without it, information and insight wouldn’t be possible.

From data, you can produce descriptive analytics, which tell you what has happened in the past, such as DAU (daily active users), conversion rate, or how many unique users clicked a particular button. You can also use it to produce predictive analytics, which attempt to tell you what will happen, such as predicted DAU next week or predicted conversion rate during a sale. This is information. You have taken raw data and can now form short sentences with it, such as “On Monday, we had 50K unique users log in.”
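As a rough illustration of turning data into information, the sketch below computes two descriptive metrics from a handful of raw events. The tuple layout, the `purchase` event used as the conversion goal, and the sample rows are all assumptions made for the example; a real product would define conversion against its own target action.

```python
from datetime import date

# Hypothetical raw events: (user_id, event_name, day). On their own they say little.
events = [
    ("u1", "user_login", date(2024, 1, 1)),
    ("u2", "user_login", date(2024, 1, 1)),
    ("u1", "user_login", date(2024, 1, 1)),  # repeat login by the same user
    ("u1", "purchase", date(2024, 1, 1)),
    ("u3", "user_login", date(2024, 1, 2)),
]


def daily_active_users(events, day):
    """Descriptive metric: distinct users with any event on the given day."""
    return len({user for user, _, d in events if d == day})


def conversion_rate(events, day, start="user_login", goal="purchase"):
    """Descriptive metric: share of users who did `start` and also did `goal` that day."""
    started = {u for u, e, d in events if d == day and e == start}
    converted = {u for u, e, d in events if d == day and e == goal}
    return len(started & converted) / len(started) if started else 0.0


monday = date(2024, 1, 1)  # 1 January 2024 was a Monday
print(daily_active_users(events, monday))  # 2
print(conversion_rate(events, monday))     # 0.5
```

Each result can be turned into a short sentence (“On Monday, 2 unique users were active”), which is exactly the information stage: useful, but not yet insight.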

However, descriptive and predictive analytics are useless unless they can be turned into actionable insights. This requires more research than a single query. Insight is information with context. Good insight is information with context that leads to action. An insight might be: “On Monday, we had 50K unique users log in; however, only 15% of them completed a target action. Of the remaining 85%, 75% closed the window after less than 2 minutes. We took a representative sample of those users and conducted 10 user interviews; 8 out of 10 said they could not find the button needed to complete the target action because it was not visible without scrolling to the bottom of the page. Changing the layout to accommodate smaller screens or moving the button should increase the number of users who complete the target action and reduce drop-off.” Of course, if you don’t have the right data, or don’t pull the right information from that data, you won’t be able to reach the end goal of generating these insights.
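The counting part of that insight can be sketched in code, assuming hypothetical per-user session summaries with a completion flag and a session length. At the example’s scale, 15% of 50,000 is 7,500 users who completed the action, leaving 42,500, and 75% of those is 31,875 quick exits. The interviews and the layout recommendation are the context that code cannot supply; at most it can produce the list of users to sample from.

```python
# Hypothetical per-user session summaries for one day
sessions = [
    {"user_id": "u1", "completed_target": True, "session_seconds": 600},
    {"user_id": "u2", "completed_target": False, "session_seconds": 45},
    {"user_id": "u3", "completed_target": False, "session_seconds": 90},
    {"user_id": "u4", "completed_target": False, "session_seconds": 300},
]


def funnel_breakdown(sessions, quick_exit_seconds=120):
    """Split users into completers and non-completers, then find quick exits among the rest."""
    completed = [s for s in sessions if s["completed_target"]]
    not_completed = [s for s in sessions if not s["completed_target"]]
    quick_exits = [s for s in not_completed if s["session_seconds"] < quick_exit_seconds]
    return {
        "completion_rate": len(completed) / len(sessions) if sessions else 0.0,
        "quick_exit_rate_of_rest": len(quick_exits) / len(not_completed) if not_completed else 0.0,
        # Candidate pool for user interviews (sampling from it is a separate step)
        "quick_exit_users": [s["user_id"] for s in quick_exits],
    }


print(funnel_breakdown(sessions))
```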

List Everything Approach

Usually, the knee-jerk reaction is to start listing all of the events you think are important to track. You may start with general groupings such as logging in, then break those into smaller actions, but it is essentially an attempt to build an exhaustive list of every event you may need.

The problem with this approach is that you will inevitably miss something, and you are likely to collect data you will never use.
It also treats data as though it were useful by itself; however, as discussed above, data is not useful without context.

Question First Approach

This approach may take more time, but it is time well spent. It requires knowledge of your product, an understanding of insight versus information, and some understanding of how data works (see the index card analogy above).

Identify a list of questions you may have about how the end user interacts with your product.

For this step, I encourage you to get input from everyone who might be interested, from developers to marketing.
When creating this list, don’t worry about how you will answer these questions. There is always a way, but focusing on the how first often kills innovation.
Set the list aside and come back to it a few times.

From the list of questions, start breaking each one down into the events and objects you’d need to query and the attributes each event or object would need. (You may only need to query events if end users don’t create things that are then stored within your platform.)

Be as thorough as you can, but the next step will help you cover anything you miss.
Use this as an opportunity to break down your questions, understand their intent, and come up with questions you may have missed.
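One way to keep this breakdown organized is to record, for each question, the events and attributes it needs, then merge those needs into a single schema. This is a sketch with hypothetical questions, event names, and attributes (`order_total`, for instance, is invented for illustration).

```python
# Hypothetical question plan: question -> {event or object: attributes it needs}
question_plan = {
    "How many unique users log in each day?": {
        "user_login": {"user_id", "timestamp"},
    },
    "Where are the users who log in?": {
        "user_login": {"user_id", "geo_location"},
    },
    "What share of users who log in make a purchase?": {
        "user_login": {"user_id", "timestamp"},
        "purchase": {"user_id", "timestamp", "order_total"},
    },
}


def required_schema(plan):
    """Merge every question's needs into one event -> attributes mapping."""
    schema = {}
    for needs in plan.values():
        for event, attrs in needs.items():
            schema.setdefault(event, set()).update(attrs)
    return schema


for event, attrs in sorted(required_schema(question_plan).items()):
    print(event, sorted(attrs))
```

Keeping the question as the key means every field in the merged schema can be traced back to at least one question that needs it.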

Come back to your list of questions and choose the most important ones (or all of them, if you’re feeling ambitious). Write either an actual query or pseudocode for each question. If you don’t know how to do this, listing the events or objects and the fields you’d need will usually suffice.

This shows why each event or object you’ve listed in your data structure is needed.
It also serves as a reference when you have the data and actually go to run these queries.
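For those comfortable writing real queries, here is one question expressed as SQL and run against a throwaway in-memory SQLite database via Python’s standard `sqlite3` module. The `events` table layout and sample rows are assumptions; the same `SELECT` would need adapting to your actual database and its date functions.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical events table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, user_id TEXT, ts TEXT, geo_location TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("user_login", "u1", "2024-01-01T09:00:00", "US-NY"),
        ("user_login", "u1", "2024-01-01T17:00:00", "US-NY"),  # same user, second login
        ("user_login", "u2", "2024-01-01T10:30:00", "DE-BE"),
        ("user_login", "u3", "2024-01-02T08:15:00", "US-CA"),
    ],
)

# Question: "How many unique users logged in on Monday, 2024-01-01?"
UNIQUE_LOGINS_ON_DAY = """
    SELECT COUNT(DISTINCT user_id)
    FROM events
    WHERE event_name = 'user_login'
      AND date(ts) = ?
"""
(count,) = conn.execute(UNIQUE_LOGINS_ON_DAY, ("2024-01-01",)).fetchone()
print(count)  # 2
```

Saving the query string next to the question it answers gives you the reference described above, ready to run once real data arrives.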

Now look at what data collection you already have (if any). Assess what you can get. If some events aren’t possible to collect (for example, because they happen on a different platform), think of proxies that will get you close.
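The assessment step amounts to a gap analysis: compare what each question needs with what is already collected, and note a proxy where direct collection isn’t possible. In this sketch the `required`, `collected`, and `proxies` mappings are hypothetical, including the idea of using a view of an order-confirmation page as a proxy for a purchase completed on another platform.

```python
# Hypothetical: what the questions need vs. what is collected today
required = {
    "user_login": {"user_id", "timestamp", "geo_location"},
    "purchase": {"user_id", "timestamp", "order_total"},
}
collected = {
    "user_login": {"user_id", "timestamp"},
    "page_view": {"user_id", "timestamp", "url"},
}
# Hypothetical proxies for events that cannot be collected directly
proxies = {
    "purchase": "page_view where url is the order-confirmation page",
}


def gap_report(required, collected, proxies):
    """For each required event, say whether it is covered, needs fields, or needs a proxy."""
    report = {}
    for event, attrs in required.items():
        if event in collected:
            missing = attrs - collected[event]
            report[event] = f"add {sorted(missing)}" if missing else "covered"
        elif event in proxies:
            report[event] = f"not collected; proxy: {proxies[event]}"
        else:
            report[event] = "not collected; instrument it or find a proxy"
    return report


for event, status in gap_report(required, collected, proxies).items():
    print(event, "->", status)
```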

The reason to design a dream data structure first is that, with cloud-based databases, it is more cost-effective to store data that is easy to query, even with redundancies, than to store optimized data that requires very complicated queries. The time spent writing and debugging those queries costs more than the extra storage.
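A small illustration of the trade-off, assuming two hypothetical layouts in an in-memory SQLite database: a normalized one where a user’s country lives only in `users`, and a denormalized one where it is copied onto every login row. Both answer “logins per country” identically, but the denormalized query needs no join. At this toy scale the difference is trivial; the argument above is about how query complexity grows as questions span many tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each fact stored once, so questions need joins.
    CREATE TABLE users (user_id TEXT PRIMARY KEY, country TEXT);
    CREATE TABLE logins (user_id TEXT, ts TEXT);

    -- Denormalized: country copied onto every login row (redundant, simple to query).
    CREATE TABLE logins_wide (user_id TEXT, ts TEXT, country TEXT);

    INSERT INTO users VALUES ('u1', 'US'), ('u2', 'DE');
    INSERT INTO logins VALUES
        ('u1', '2024-01-01T09:00:00'), ('u2', '2024-01-01T10:00:00'), ('u1', '2024-01-02T09:00:00');
    INSERT INTO logins_wide VALUES
        ('u1', '2024-01-01T09:00:00', 'US'), ('u2', '2024-01-01T10:00:00', 'DE'),
        ('u1', '2024-01-02T09:00:00', 'US');
""")

# Normalized layout: a join is required to attach country to each login
normalized = conn.execute("""
    SELECT u.country, COUNT(*) FROM logins AS l
    JOIN users AS u ON u.user_id = l.user_id
    GROUP BY u.country ORDER BY u.country
""").fetchall()

# Denormalized layout: the same question is a single-table GROUP BY
denormalized = conn.execute("""
    SELECT country, COUNT(*) FROM logins_wide
    GROUP BY country ORDER BY country
""").fetchall()

print(normalized)    # [('DE', 1), ('US', 2)]
print(denormalized)  # [('DE', 1), ('US', 2)]
```

One side effect of copying attributes: the denormalized row records the country as it was when the login happened, which may or may not be what a given question wants.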

Implement the final data structure!

This process may mean you miss some events you later want; however, it ensures you have the data needed to answer your most important current questions, and it minimizes collecting data you will not need.

This approach puts insight at the forefront so the true purpose is never lost.

List then Questions Approach

This blends both approaches. You may be in a position where you aren’t ready to start in-depth user analysis but want some surface-level information. Some basic things are fairly universal, such as tracking page visits for web-based products and logins for both web and desktop products. This will at least answer simple questions such as “How many people come to the site?”, “How many of those log in?”, “How many purchases are made?”, “Where are my users?”, and “What time of day are they using the product?”
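A minimal sketch of how a baseline of page views and logins can answer those surface-level questions. The event fields, sample rows, and the rule that a view of a hypothetical `/order-confirmed` page counts as a purchase are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical baseline events: only page views and logins are tracked
baseline = [
    {"event": "page_view", "id": "v1", "ts": "2024-01-01T09:05:00", "country": "US", "url": "/"},
    {"event": "user_login", "id": "v1", "ts": "2024-01-01T09:06:00", "country": "US", "url": "/login"},
    {"event": "page_view", "id": "v1", "ts": "2024-01-01T09:20:00", "country": "US", "url": "/order-confirmed"},
    {"event": "page_view", "id": "v2", "ts": "2024-01-01T13:40:00", "country": "DE", "url": "/"},
    {"event": "page_view", "id": "v3", "ts": "2024-01-01T21:10:00", "country": "US", "url": "/"},
    {"event": "user_login", "id": "v3", "ts": "2024-01-01T21:12:00", "country": "US", "url": "/login"},
]


def basic_answers(events):
    views = [e for e in events if e["event"] == "page_view"]
    logins = [e for e in events if e["event"] == "user_login"]
    country_by_visitor = {e["id"]: e["country"] for e in views}  # one entry per visitor
    return {
        "visitors": len(country_by_visitor),
        "logged_in_users": len({e["id"] for e in logins}),
        # Proxy (assumption): each confirmation-page view counts as one purchase
        "purchases": sum(1 for e in views if e["url"] == "/order-confirmed"),
        "visitors_by_country": Counter(country_by_visitor.values()),
        "logins_by_hour": Counter(datetime.fromisoformat(e["ts"]).hour for e in logins),
    }


print(basic_answers(baseline))
```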

From this foundation, you can add events over time as you encounter questions that the data you already have can’t answer. This is also the approach to continue with after using either of the methods above, as you keep pursuing insight.