Empower DIY Research: Predict Product Success with AI and POS Data 

Editor's note: This article is an automated speech-to-text transcription, edited lightly for clarity.   

The success or failure of a product launch can earn or cost a brand a lot of revenue. And the more traditional way of testing products is not a foolproof method.  

So, the team at Cambri set out to find a more accurate way to test product concepts in order to help clients save money. During the session the organization sponsored on June 11, 2025, as part of the Quirk’s Virtual Sessions – DIY Research series, the new method was explained in depth.  

Cambri Co-Founder, Dani Kamras, explained what the team at Cambri was hoping to achieve when creating this platform, why the traditional method is inaccurate and how the new platform’s method is able to predict product success or failure.

Session transcript 

Joe Rydholm

Hi everybody and welcome to our presentation, “Empower DIY Research: Predict Product Success with AI and POS Data.” I’m Quirk’s Editor, Joe Rydholm, thanks for joining us today.  

Just a quick reminder that you can use the chat tab if you’d like to interact with other attendees during today’s discussion. You can use the Q&A tab to submit questions to the presenters, and we will get to as many as we have time for during the Q&A portion.  

Our session today is presented by Cambri. Dani, take it away!

Dani Kamras 

Great, thanks, Joe. Thanks for the intro and thanks for having us and of course, thanks to all of you joining our session today. The main theme is of course around DIY research.  

DIY has been very much about a faster and cheaper approach, but we want to challenge that and show that you don't need to sacrifice quality or depth of insights either. We will be looking into how combining point-of-sale data (real in-market sales data) with survey data, by utilizing machine learning in your concept testing methodology, can significantly lift the accuracy of picking the right concepts to launch to the market. Also, how it can actually help you understand why a concept is strong or weak and how to improve it.  

So, over the next 15 minutes or so, I'll be walking through our methodology, our AI component that we call Launch AI. Also, what results we have gotten from that compared to some traditional action standards used for concept evaluation. Then after that I'll jump to our platform to show you what the output of that looks like. If you are using our platform as a DIY platform, then you will be able to see what the output of all of this is. 

But first of course a few words about Cambri in general for those who haven't heard about us before.  

We're an insights and innovation platform and an agency. We’ve been around for six or seven years now, working with many of the large CPG companies today. We were born out of the Nordics, where I'm also based, but today we have offices across Europe and actually just opened our first U.S. office in Austin this spring.  

What makes Cambri unique is that we've been built from inception with AI in mind. Our thesis from the beginning has been that there is so much data generated within research, but the majority of it is squandered. So how can we capture the full capacity of the survey data, including the open-ends? And how do we start to deliver insights that speak in business-opportunity terms instead of survey terms only?  

We are not a general market research agency focusing on all types of research. Our focus is on the innovation process. So, supporting our customers throughout their stage gate or innovation process. 

Today's focus will be on stage two and three stuff. So, concept ranking refinement and validation phase. So how our DIY solution is supporting our customers specifically in those stages of the innovation process, but we are supporting the whole innovation process from early stages all the way to post-launch as well.  

Let's jump to the concept testing world. I would say the traditional action standards for a concept test typically rely on KPIs like purchase intent, uniqueness, brand fit and so on. Those are then compared to benchmarks, meaning historical test results. 

There are many reasons why using, for example, a top-two-box purchase intent question, or a combination of these types of KPIs, as the main action standard falls short. Of course, it's quite a simplistic way of evaluating something based on one or a few closed-end questions. So, there are challenges with those methodologies: being fairly simplistic, not being that accurate in predicting a concept's success in market, and giving you very little advice on how to improve the concept and understand the ‘why’ behind it, why it is strong or weak basically. 
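To make the traditional action standard concrete, here is a minimal sketch of how a top-two-box purchase intent score is computed and compared against a benchmark. The response distribution and the benchmark value are invented for illustration, not taken from Cambri's data.

```python
# Illustrative sketch (not Cambri's implementation): a traditional action
# standard computes top-two-box purchase intent and compares it to a benchmark.

def top_two_box(responses):
    """Share of respondents answering 4 ('probably would buy') or 5
    ('definitely would buy') on a 5-point purchase intent scale."""
    return sum(1 for r in responses if r >= 4) / len(responses)

# Hypothetical sample: 300 respondents, skewed positive.
responses = [5] * 120 + [4] * 69 + [3] * 60 + [2] * 30 + [1] * 21
t2b = top_two_box(responses)

BENCHMARK = 0.55  # hypothetical category benchmark from historical tests
print(f"Top-two-box: {t2b:.0%}")                 # 63%
print("Passes action standard:", t2b > BENCHMARK)
```

The whole "action standard" boils down to one threshold comparison on one closed-end question, which is exactly the simplicity being criticized here.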

That has really been our starting point for challenging this. When we have looked into data from concepts that were tested and then launched, and at how they later did in market, we can actually see that these types of action standards are very poor at predicting how products will actually do in market. 

The main observations on the weaknesses of that more traditional and simplistic approach are, first, that the definition of a strong concept has probably not been right. As said, claimed behavior on a purchase intent question doesn't correlate well with how consumers behave in real life. Just because you get a higher KPI compared to your benchmark on, for example, a purchase intent question doesn't actually make it a strong concept. You don't really know how those concepts that were tested earlier and are part of the benchmark database performed in market. Did they actually sell well? You basically just know how they tested, how respondents answered those specific questions.  

The connection to real-life performance is missing, and that is really something that we as a company are challenging. We want to make sure that the insight function takes a step up and connects the insights to real market performance, to real business impact basically.  

The second thing is that there are often contradictions between KPIs and open-ended answers. How should you account for that?  

So, there might be high purchase intent but also a lot of open-ended answers saying something else, indicating that the concept is not that strong. How should you actually account for that?  

The third part is being able to include the analysis of open-ends and historic product launches in your concept evaluation. It requires a lot of manual analysis and intuition, which, even if you have the skills for it, you definitely do not have the time for, and you can't really scale that across an organization.  

So, these are observations that we made, that we have then started to challenge and see how we can overcome and help our customers overcome these challenges and limitations with the traditional methods.  

Our approach isn't ripping up the foundations of KPIs and benchmarks; it builds on top of them. So your preferred KPIs, rolled up and evaluated against benchmarks, are still there like in other methodologies. Where it differs is in incorporating a number of previously separate data sources into one holistic model: open-ends for the context and justification of the respondents' KPI-based answers, but also market performance data to ensure we have a connection between pre-launch testing and ultimate market performance. 

This of course gives us a much more robust and stronger prediction of the concept’s market performance potential, and a far more granular and actionable read on the ‘why’ behind the performance. 

Here is an example of a concept with very high purchase intent to highlight some of these challenges. 

In this case the top-two-box on purchase intent is 83%, so very high, well above the benchmark, but this product actually failed in market. It didn't perform well at all. It is very easy for a respondent to indicate purchase intent on a five-point scale question, clicking that, ‘yes, sounds good, I would buy it.’ But when you then start to look at the open-ends, you can see a lot of conditions attached to that stated intent: “yes, I would buy it if it had fewer calories,” or “I would buy it if the price were right,” or “I would buy it, but it sounds too good to be true.” 

They don't really believe that there is a product that can actually deliver on everything that it's saying. So, in the open-ends we can find a lot of these conditions, but the question is of course then what do we do about them? Do we discount that 83 to 73? And that is of course really hard to do by intuition. So, luckily this is a big data problem and something that AI and machine learning are well suited for.  
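To show why this is a data problem rather than an intuition problem, here is a deliberately naive sketch of "discounting" stated intent by conditions found in open-ends. Cambri's Launch AI uses machine learning for this; the marker phrases and the discount rule below are invented purely for illustration.

```python
# Naive illustration only: flag open-ends that attach conditions to stated
# purchase intent, and discount the top-two-box score by that share.
# A real system would need far more robust language understanding.

CONDITION_MARKERS = ("if it", "if the", "but", "too good to be true", "depends")

def conditional_share(open_ends):
    """Share of open-ended answers that attach a condition to the stated intent."""
    flagged = sum(1 for text in open_ends
                  if any(m in text.lower() for m in CONDITION_MARKERS))
    return flagged / len(open_ends)

open_ends = [
    "Yes, I would buy it if it had fewer calories",
    "I would buy it if the price would be right",
    "Sounds too good to be true",
    "Love the flavour, will definitely try it",
]
raw_t2b = 0.83  # the 83% purchase intent from the example above
discounted = raw_t2b * (1 - conditional_share(open_ends))
print(f"Discounted intent: {discounted:.0%}")
```

The hard question, as noted above, is how much each condition should actually discount the score, which is exactly where hand-tuned rules like this break down and a trained model is needed.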

That is basically how we discovered and defined the problem of the traditional way of evaluating concepts that we then aimed to improve.  

Our model is trained on a combination of survey data including all open-ends together with KPIs and benchmarks, but also point of sales data on products that have launched to the market in the last 24 months. In that way our model can really learn what survey output is indicating actual success in market and what survey output is indicating failure in market.  

The model learns the traits of success and failure, and it does that across a number of drivers that traditional methods do not look at or capture. So you could almost say that we are triangulating closed-end question data with open-ended data and point-of-sale data.  

So, that is how the model is trained. And then when the model is used for testing new concepts, it is able to give you a prediction on what sales quintile the concept has the potential to hit and which drivers are affecting that potential and to what extent. 
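The labeling step implied here can be sketched as follows: each launched product is bucketed into a rate-of-sales quintile, so a model can learn which survey signals precede the top quintiles (success) versus the bottom quintile (failure). The figures and the plain rank-based bucketing are illustrative assumptions, not Cambri's actual pipeline.

```python
# Sketch: assign each launched product a sales-quintile label (1 = top,
# 5 = bottom) from its rate of sales, for use as a training target.

def sales_quintiles(rate_of_sales):
    """Map each product's rate of sales to a quintile label, 1 (top) .. 5 (bottom)."""
    ranked = sorted(rate_of_sales, reverse=True)
    n = len(ranked)
    labels = {}
    for rank, value in enumerate(ranked):
        labels[value] = rank * 5 // n + 1  # evenly sized rank-based buckets
    return [labels[v] for v in rate_of_sales]

# Hypothetical rate-of-sales figures for ten launched products
ros = [12.0, 3.1, 7.5, 0.8, 9.9, 5.2, 1.4, 6.6, 2.2, 8.8]
print(sales_quintiles(ros))  # [1, 4, 2, 5, 1, 3, 5, 3, 4, 2]
```

Once each historical test has a quintile label like this, predicting a new concept's quintile becomes a standard supervised classification task over the survey-derived features.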

So, the model is looking at a number of drivers that are crucial for new product launch success, and through its training it has learned to what extent those different drivers affect success in market.  

Now let's move on to what the solution then delivers.  

It delivers basically two things. First, a prediction of what rate-of-sales quintile the concept has the potential to land in; so it's basically predicting how well the product, if launched to market, will sell. Second, it picks the concept apart into these drivers to show why it is strong or weak, and where and how you should look for improvements.  

For those who are not familiar with the point-of-sale metric, rate of sales, or weighted rate of sales, is a metric indicating how well a product is selling, taking into consideration its distribution. That is a typical metric for evaluating products in market: how well they are actually doing, and whether they should earn more distribution or be delisted.  
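Under a common convention, the metric just described normalizes unit sales by distribution, so two products can be compared fairly even if one is stocked in far more stores. Exact formulas vary by data provider; the version below (units per point of weighted distribution per week) and the sample figures are shown for illustration only.

```python
# Illustrative sketch of a weighted rate-of-sales calculation. Definitions
# differ between POS data providers; this is one widely used convention.

def weighted_rate_of_sales(units_sold, weighted_distribution_pct, weeks):
    """Units sold per point of weighted (e.g. ACV) distribution per week."""
    return units_sold / (weighted_distribution_pct * weeks)

# Hypothetical: product A sells more in total, but product B sells faster
# relative to the distribution it has earned.
a = weighted_rate_of_sales(units_sold=50_000, weighted_distribution_pct=80, weeks=26)
b = weighted_rate_of_sales(units_sold=20_000, weighted_distribution_pct=20, weeks=26)
print(round(a, 1), round(b, 1))  # B's rate of sales is higher despite lower volume
```

This is why the metric supports distribution decisions: product B, on these made-up numbers, is the stronger performer per point of distribution and would be the better candidate to earn more shelf space.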

And we're using that metric not only because it's a key metric that is used in market; it also gives you a terminology for talking to retail, and to everyone, that is connected to true business impact, instead of just talking about survey measures, that we got a purchase intent of 63% or whatever it is. So the dialogue becomes more business-oriented there as well. 

If we look a bit more at those drivers: the model is looking at 10 different drivers that are all important, in this case for any food and beverage category. So we look at things like believability: do people believe that the concept and the brand are able to deliver on the benefits promised?  

Category and brand relationship: understanding where the consumers are today when it comes to the category and the brand, and how far away they are from this product. Differentiation and superiority: is this something that offers superior value compared to what is in market today? And so on. 

In this way, the model can identify both barriers and drivers for success that other methodologies do not. That is why it is much more accurate in identifying concepts that will sell well or poorly.  

Before I jump into the platform, let me show some numbers on how this approach compares to some traditional concept testing metrics when we have validated that. 

So, here we see numbers from real cases where we have compared the accuracy of the solution against a purchase intent top-two-box action standard, looking at how well each was able to identify successful and failed product launches when we checked how these products actually sold six months after they launched to market.  

We can see that the AI model is able to identify both successes and failures with more than 80% accuracy. In this case, the AI model was able to identify 83% of the successful products and 80% of the failed products.  

The traditional way performs significantly weaker, as we can see here. It's of course important that when you're doing concept testing, you are using a model or an action standard that can identify successes, so you do not miss potential revenue by failing to launch products with high potential. And it should, of course, identify failures, so you do not end up wasting time, money and resources launching products that will fail and get delisted quickly.  

It's important that it's not only screening out really heavily. If it's really tight and screens out most of the products, you leave a lot of revenue on the table. The other way around, of course, if it's not able to identify failures, you end up wasting a lot of time and resources putting products out in the market that will not sell well.  
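The validation arithmetic behind figures like these can be sketched simply: given each product's predicted class and its actual in-market outcome six months after launch, compute the hit rate separately for successes and for failures. The sample outcomes below are invented for illustration.

```python
# Sketch: per-class hit rate from (predicted, actual) launch outcomes.
# The pairs are hypothetical, not real validation data.

def hit_rate(pairs, outcome):
    """Share of products with this actual outcome that were predicted correctly."""
    relevant = [(pred, actual) for pred, actual in pairs if actual == outcome]
    hits = sum(1 for pred, actual in relevant if pred == actual)
    return hits / len(relevant)

# (predicted, actual) outcomes for ten hypothetical launches
results = [("success", "success"), ("success", "success"), ("fail", "success"),
           ("success", "success"), ("success", "success"),
           ("fail", "fail"), ("fail", "fail"), ("success", "fail"),
           ("fail", "fail"), ("fail", "fail")]
print(f"Success hit rate: {hit_rate(results, 'success'):.0%}")
print(f"Failure hit rate: {hit_rate(results, 'fail'):.0%}")
```

Reporting the two rates separately matters for exactly the reason given above: a method can look accurate overall while being far too tight on successes or far too lenient on failures.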

Also, if we look at how these two methodologies did on ranking 53 products in a specific category, we can see that the AI model is doing a much better job.  

On the left you see the top 10 products based on actual sales six months after launch. The AI model was able to get six out of 10 into the top 10 of these 53 products, while the traditional method only got two out of 10 into the top 10. 
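The ranking comparison described here reduces to a set overlap: how many of the products that actually ended up in the sales top 10 did each method place in its own predicted top 10? The product IDs and the predicted ordering below are made up for illustration.

```python
# Sketch: top-n overlap between a predicted ranking and the actual
# sales ranking. Product IDs are hypothetical.

def top_n_overlap(predicted_ranking, actual_ranking, n=10):
    """Count how many of the actual top-n products appear in the predicted top n."""
    return len(set(predicted_ranking[:n]) & set(actual_ranking[:n]))

actual = [f"p{i}" for i in range(53)]  # p0..p9 are the real top 10 by sales
top10_pred = ["p0", "p3", "p17", "p5", "p9", "p2", "p30", "p7", "p41", "p22"]
ai_model = top10_pred + [p for p in actual if p not in top10_pred]

print(top_n_overlap(ai_model, actual))  # 6 of the real top 10 recovered
```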

So, we do not see that DIY research, especially in this concept testing environment, needs to be only about speed and cost. With modern technology, it is actually able to deliver very high-quality predictions and insights as well.  

That's a bit about the problem statement and the solution that we have developed. It was a quick walkthrough, but I hope you were able to follow. Please ask questions if anything was unclear.  

Let's now jump to our platform to actually view the DIY output you receive when you run a concept test utilizing the AI model. Let me switch to our platform. Now you should see our demo account.  

I jumped directly to the results tab because that's probably now the most interesting part to look at today. This was a premium noodle concept that was tested both with the concept description like a value proposition and a pack shot of that product. The test was run in the UK with 300 respondents.  

Let's look at the output in the platform. 

The first tab, where I am now, is called the “LAUNCH AI.” That is, I would say, the tip of the iceberg, really summarizing for you the key elements of this concept and then you can dig deeper into other insights in these other tabs. But let's start here on the AI part. 

If I scroll down a bit, I can see that this concept got a score of eight out of 100. If the score is below 20, it's predicting that this concept will end up in the bottom rate-of-sales quintile, indicating it as a failure. So, if you end up in the bottom rate-of-sales quintile, you are typically at big risk of getting delisted really quickly; you are just not performing well enough to keep your place in the market. We see that the model is predicting this concept to be a failure and not recommending you move forward.  
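The score-to-quintile read-out can be sketched as a simple banding. Note that only the bottom band (a score below 20 maps to the bottom rate-of-sales quintile, i.e. a predicted failure) is stated in the talk; equal 20-point bands for the remaining quintiles are an assumption made purely for illustration.

```python
# Illustrative mapping from a 0-100 score to a predicted sales quintile.
# Only the "below 20 = bottom quintile" rule comes from the talk; the
# other band boundaries are assumed for this sketch.

def predicted_quintile(score):
    """Map a 0-100 score to a predicted rate-of-sales quintile (1 = top, 5 = bottom)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return 5 - min(score, 99) // 20  # 0-19 -> 5 (bottom) ... 80-100 -> 1 (top)

print(predicted_quintile(8))  # the demo concept's score of 8 -> bottom quintile
```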

Then if I scroll down here, I start to get an understanding of why this is weak and whether I can do anything to improve it. 

We have these “Launch AI Score drivers.” I can see the positive drivers.  

All good here so far: we see that the appearance is positive. The consumers, the respondents of these surveys, see that the appearance is good. The believability is very positive, so they believe that the concept and the brand will be able to deliver on the benefits promised. It's also scoring high on ‘good for me and the planet,’ so the respondents are indicating that this looks like a very healthy and sustainable concept.  

But then if I scroll down a bit more, I start to get an understanding of why it is predicting that this concept will fail.  

We have three drivers here that score heavily on the negative side – Use occasion and lifestyle, Differentiation and superiority, and Value and intent.  

So, even if the consumers like what they see, the appearance is nice, they believe in it and they see that it's healthy and sustainable, they don't really see that it has a natural place in their life and their lifestyle, in their everyday life basically. That is why it's heavily on the negative side. And probably the reason is that there isn't enough differentiation and superior value compared to what they're consuming today that would make them interested in actually buying this product. That's why the value and the intent to try this product are heavily on the negative side.  

And this is of course a quite typical reason why product launches fail. People might like them, they like what they see, but they just can't immediately see when they would use the product and why they would use it instead of something they are consuming today. And those are elements that most traditional and more simplistic methodologies aren't really able to detect.  

If we want to dig a bit deeper here under all these drivers, I can open them up and I can see a summary of the insights and the input that we have received from the consumers for that specific driver.  

Here we get a summary of what has been the positive and negative feedback for this concept when it comes to the ‘Good for me and the planet’ driver. And we always have access to the raw data, so we can see from what data it has done that summarization and quantification in giving this specific driver a plus-three value. 

I can even click on New Concept here and then it gives me an improved value proposition or concept description focusing on the ‘Good for me and the planet’ driver. So it's looking at the concept that you tested, the original version, it's looking at all the feedback that it received when it comes to ‘Good for me and the planet,’ and giving you a suggestion of what an improved version of the concept sounds like if we want to focus on ‘Good for me and the planet.’ 

So, it's taking the actionability of insights to a new level: it's not just laying out the data for you, it's actually giving you a suggestion of what an improved version of what you tested might look like.  

If we then jump to some of these more traditional parts. If we look at these KPIs that you also get, we can for example look at purchase intent and in this case we see that 63% of the respondents said that they would likely or definitely buy it. So the top two box is 63%.  

I could also click on the open-ends, click on the fours and fives, and then I get a summary of the reasons behind people wanting to buy this product. If I go a layer down, I can again get to the raw data.  

I can see that among those that had high purchase intent, there were 39 positive comments about taste and 31 comments around healthiness, and then I can see the raw data, the actual responses behind that.  

The AI is really working for you in a way where it's going through each open-ended answer, assigning its sentiment, whether it's positive, negative or neutral, but also giving it a label: is the comment about healthiness, taste, trial interest or something else? And then, with the help of generative AI, it even writes a summary of that for you.  
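A toy version of that per-answer tagging pipeline is sketched below: tag each open-end with a sentiment and a topic label, then count per (sentiment, topic) pair. The real platform uses AI models for this; the keyword rules here are crude stand-ins for illustration only.

```python
# Toy illustration of sentiment + topic tagging of open-ends. Keyword rules
# are deliberately crude; a real system would use trained language models.
from collections import Counter

TOPICS = {"taste": ("taste", "flavour", "delicious"),
          "healthiness": ("healthy", "calories", "nutrition")}
NEGATIVE_WORDS = ("not", "don't", "too")

def tag(comment):
    """Return a (sentiment, topic) pair for one open-ended answer."""
    text = comment.lower()
    sentiment = "negative" if any(w in text.split() for w in NEGATIVE_WORDS) else "positive"
    topic = next((t for t, kws in TOPICS.items() if any(k in text for k in kws)), "other")
    return sentiment, topic

comments = ["Sounds delicious", "Great taste", "Very healthy option",
            "Too many calories", "I don't trust the brand"]
print(Counter(tag(c) for c in comments))
```

Counts like these, e.g. "39 positive comments about taste," are what the platform then summarizes and rolls up into the driver scores.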

Then of course, with the help of AI, you also get a prediction: okay, what does that mean for these different drivers? What is their impact on this concept's strength, positive or negative, and to what extent? And then an overall verdict: okay, what does that mean for this concept? How strong is it? What is its potential to actually sell well in market? 

Underneath these other tabs, you can then dig deeper into the data with heat maps like this, understanding better what in the value proposition is resonating well. 

Also on a question level, get a summary of the insights on why people marked these specific elements in that value proposition.  

The same goes for the pack part. We can get a heat map of the pack, and we can see in this case that 41% of the respondents said that it was the logo that first caught their attention. 

Then again, we can dig deeper and look at the open-ended answers that were around the logo and with the help of generative AI, get a summary of all those open-ended answers.  

So, you can get really granular here on the different elements, but as said, the LAUNCH AI tab is really giving you the overall view: is this concept strong enough, and what is its potential to sell in market? It picks the concept apart into these drivers so you understand, okay, which parts is it performing well on and which parts is it performing weakly on.  

And then you can jump into these different drivers and see, okay, how could we actually improve them? And, as said, you always have access to the raw data as well, which is of course very important for transparency, for understanding that this is not just a black box where the AI is doing some magic in the background and giving you a number. We're actually trying to give you the transparency and show that, ‘Okay, this is the raw data that we have received and analyzed, and based on that we have come to this conclusion and this summary basically.’  

In this way, we feel that DIY research as such has been a lot about speed and cost, but often the speed has only been about how fast you can set up a survey and how fast you can get the data collected. I think it's important to, of course, measure speed to insights.  

What is your DIY platform able to do with the actual data? How much is it able to help you create the story and understand the key elements? What is the concept's potential to sell in market? How do I explain why it has a high or low chance of selling well, and how can we improve this concept going forward?  

Good. That was pretty much 30 minutes. I know that this is quite a new approach to many, so I of course want to leave enough time for us to discuss this and for you to be able to ask questions.