Check your tech
Editor’s note: Lisa Horwich is founder and research principal at Pallas Research Associates. She can be reached at lisa@pallasresearch.com. Z (Zontziry) Johnson is founder and owner of MRXplorer. She can be reached at zjohnson@mrxplorer.com.
As the research technology (ResTech) landscape remains a key part of a researcher’s workflow, making smart decisions on which technology to adopt becomes more critical. The wrong decision can leave you vulnerable to costly data privacy violations or find your confidential information used in unauthorized ways.
In September 2025, QRCA offered a comprehensive class on data privacy in the age of AI that featured experts in all aspects of data privacy, including the legal ramifications of the recent laws governing AI. The course had so much rich information that some were left wondering exactly how to implement the advice.
We put together a checklist to help distill the information. We sat down after putting the checklist together to chat through it and thought we’d share the conversation with you.
Disclaimer: We are not attorneys; we are just technologically savvy researchers who stay on top of the security and privacy issues facing the market research industry. The information provided in this article does not, and is not intended to, constitute legal advice. Please consult with your own legal counsel on your situation.
Platform privacy
Lisa Horwich: OK, Z, let’s start out with platform privacy. This covers legal compliance and how both our – as well as research participants’ and respondents’ – personal data is protected. So, this is really where we get into legal things like how the platform handles current laws and how it will take into account any kind of future changes to the laws. What should we be looking for?
Z Johnson: This is one where really it comes down to which laws are the most restrictive – not necessarily for where you’re operating but for where your respondents live. It’s not so much, “I operate in the U.S., therefore, I should just use the most restrictive laws in the United States.” Instead, if you are doing research that would include people in Europe or if your client does business in Europe, you should use Europe's privacy laws (GDPR). Generally, using Europe's privacy laws is a good foundation. Not only do they have strict measures in place but they also have comprehensive coverage and established guidelines to follow.
Lisa: Let’s now talk a little bit about personal data. There are a couple of things to mention here, specifically what data does the vendor collect and then who has access to it. We also should discuss what happens with that data when the project is completed. Also, how much control do we have as the technology platform user or deployer on how that data is used?
Z: This information is extremely important to look for, not just by asking the vendor but also by checking the privacy policy! I’ve started to note that a lot of well-written privacy policies will specifically spell out what data they collect about you and what they do with your personal data. Much of this is important for data in general but personal data is even more important.
There will be cases where your personal data is shared with a third party, for example for marketing purposes. Or it's shared with a third party for tracking purposes (e.g., Google Ads), for advertising purposes or for legal purposes, in order to be able to execute the terms of a contract. I've seen these specified in some privacy policies. You’re definitely looking for, “Who owns my data in your platform? Do I have control over how long it's stored? Who should I contact if I want to exercise my privacy rights, such as right-to-know or right-to-deletion? Can I ask you, the platform provider, who is storing my personal data to delete my personal data from your tools?” This last question is one of the more important things and, honestly, one of the items I find missing most often.
Lisa: Don’t forget to check the terms of service, too. Privacy policies are often limited to what the technology provider is doing with the data it collects about you, the user. What the technology provider is doing with the data you input into the tool (including any personal information included in the data you input) is often governed by the terms of service and related addendums known as data processing agreements or DPAs.
One of the best practices is to limit the amount of personal data you share or that you disclose when using any technology tool. If you don't have to share it, don't do it! And, if the tool allows you to choose, pick the shortest amount of time they store the personal data. I know I want to see granular controls which allow you to select which type of data is stored and how long it will be stored.
Z: It's also important here to define what “personal data” actually means. Any of those single sign-on (SSO) options can include the personal data stored in Google or Facebook, like your birthdate, address, social media channels, websites or even company information that is then stored on the platform – all so that you can continue to sign into their system via SSO. This is another reason to look for those granular controls for what data you can keep and what data you can remove. What information can I take away so the platform doesn’t get to keep this information forever? I should get to decide what stays and what goes and how long.
Terms of service
Lisa: Let's talk a little bit more about data itself. One thing I always get into is the data usage. How is this platform using my data?
Z: It should be in a vendor’s terms of service. Interestingly, in the past few months, I have learned that anything that you, as a user, input into a tool – this is especially relevant for tools that use prompts like AI tools – can be stored by the provider and tied to you and your tool usage. The output from that is also considered data that can be tied to you and your tool usage. What can all of that data be used for? Sometimes it is actually specified and sometimes it is not. Often it will be vague like, “We can use your information to help us improve the services we provide you.” Other times it will be more specific. This specificity is what you want to see from vendors so you can make an informed decision about whether to use the tool. Statements such as:
We will use your information to help us train our models.
We will anonymize the information that you provide to help us train our models.
We will not use the prompts that you enter into this tool to train our models.
We will remove your personal information from the prompts and the outputs when training our models.
We will use this information to help our third-party providers improve their models to then improve our services to our users.
It can get really muddy very quickly!
Lisa: And it’s not enough to just look at the terms of service once, right? I know I came across one the other day that said you are granting somebody rights to your data and that same platform said we will change our terms of service from time to time. We're not going to tell you when. You just have to keep checking back.
Z: I did see one that said that and then followed with, “The fact we have stated that we change our terms and service every once in a while, and have instructed you to come and read our terms of service whenever you use our tool, is enough to say that we have informed you that we change our terms of service – we don’t need to inform you further.”
Lisa: That’s wild! But it illustrates the need to read the terms of service thoroughly before choosing a tool or platform and to be sure to re-review it from time to time – especially if they notify you that there has been a change.
Data storage
Lisa: We talked a little bit previously about understanding where data is coming from to ensure we follow the applicable privacy laws but now we should talk more about data storage, because we really want to know exactly where the data is being stored and processed as there are certain countries where policy states the data has to reside in that country.
Z: Yes, where data is stored has a direct impact on which regulations apply. I actually saw a terms of service the other day that very specifically stated, “While we say that we will be as good as we can with your data, we also need to let you know that your data is stored in Singapore and therefore is only guarded insofar as Singapore rules and regulations allow.” Essentially, they are saying, “Be aware we don't have to be as strict with your data as we would if we were storing your data in Europe or somewhere else.”
Lisa: Yes, very important! And then checking what their policies are around data retention and deletion. One of the best practices I've always said is just pick the shortest amount of time possible – for personal and general data. You want to look for and make sure they've got clear guidelines.
Z: And if you don't find a clear guideline on how long your data is stored or a clear guideline on when your data is deleted, seek clarification or look for a different tool.
Lisa: If you don't see it in their terms of service, ask the vendor and write it into your contract with them if you're writing your own contract.
Compliance
Lisa: Let's turn to compliance. We’ve touched a little bit on it here and there but now let’s talk about it, especially in relation to AI.
Some of the questions we want to ask in this case are, “Who is handling the AI when it comes to training data compliance?” And, “Who is liable for the output’s intellectual property compliance?” I think these fit together because it's all about who owns the rights to that data.
Z: This was something that I did not even know was a thing to be concerned about until Jessica Santos (global compliance and quality director, DPO at Oracle Life Sciences) talked about this during the data privacy webinar. What it boils down to is simply: The AI training data should not be using copyrighted material. It really doesn't matter who is doing the training.
Lisa: There are numerous lawsuits out there right now claiming some of the major LLMs used copyrighted information without permission to train their models. The public domain and copyrighted works are mutually exclusive concepts. A work is either in the public domain and free to use or it is protected by copyright and requires permission to use. The AI provider, deployer or user is responsible for checking that the information obtained from public domain and/or output of the AI are copyright protected.
Z: When it comes to the output, this is another time to ensure the data does not have copyrighted material. So, not only does the material going into the tool need to be checked for copyright, the information coming out also needs to be checked for copyright violations. You want to be sure the platform provider has tools in place that check for copyrighted material both directions.
Lisa: Then who owns that data when it comes out? If I put something in a gen AI tool, who then owns that output?
Z: Again, with many legal battles going on, this could be either the developer, deployer or user of the AI. You might think that because you entered a prompt, you now own the output. But before we talk about output ownership, there’s a group of people who are growing their prompt libraries and they want to copyright them. Basically, they’re arguing, “I spent time and energy building this prompt and honing it so that it would work for this tool and I don't want someone to be taking this prompt and building their own tool based off of my prompt.” This is going to change over time with copyright laws.
It's a heated space to watch and, as marketing researchers, we should keep an eye on it! As for output, some tool providers now say in their terms of service that they, not you, own the output from prompts. We need to be very aware who actually does own and retain the ownership of the output. Because when you are using ResTech that generates reports, who owns those reports at the end of the day? You want to be very clear on that.
Lisa: Right. Let's talk a little bit about models and what we need to know from our platform providers about how they are protecting us and our data going into third-party systems including LLMs.
Z: A first step is making sure it's one of the larger, well-known LLMs and not an LLM that no one has heard of. Also, ensure it has good security and privacy practices around it. Know that sometimes you have to go into the tool and actually change your settings to ensure the controls are set up for maximum privacy and security for you and your clients. Be aware that these tools can change their terms of use anytime.
Lisa: What about an acceptable use policy? What do we want to learn from the vendor at that point?
Z: You’re just trying to understand how the system can be used. You're looking for guardrails that the tool provider expects you to honor. For instance, acceptable use policies may prohibit uploading certain types of data like protected health information. Thus, if you use a ResTech tool for a project you’re working on with a health care client and you upload interview notes that include protected health information, you might be violating the ResTech provider’s terms and exposing both yourself and your client to risk. You can never get rid of all risk – this was something that both Jessica Santos and Jessica Clark (privacy and IP counsel at Kelly & Simmons LLP) pointed out – but you can mitigate risk by taking the time to thoroughly vet and understand the tools you choose to use.
Lisa: Absolutely. This prevents you from using it in a way that it wasn't designed for, which would result in more issues. Setting up a contract with a data usage provision is also recommended. The best method is data minimization – only load the minimum necessary data into the system or transfer to the vendor and delete them as soon as the processing activity is completed.
Security
Z: Exactly! As we're talking about how you can use these tools, we also need to be looking at tool security. Lisa, you’ve talked a lot about security frameworks. Could you talk a little more about what frameworks mean and what we should look for?
Lisa: A framework is the way a technology vendor should be building security into their platform or tool. I know I always want a tool that is built with security in mind from the start. Very often, people build a tool and then they try to secure it.
When talking about frameworks, there are two that are pretty standard, one is from NIST – the National Institute of Standards and Technology. The other is COBIT – Control Objectives for Information and Related Technologies, a different IT governance framework. These frameworks outline exactly how to mitigate risk and what steps a vendor should take to secure their platform or tool. You want to ask your vendor which framework they are using because then you know that security is not an afterthought for them.
Z: Got it! You’ve also talked about checklists, which makes me think of training, and training makes me think of certifications. Are there certifications that we can look for that give us that trigger that yes, this company has been thinking about security when developing this tool?
Lisa: There are two kinds of certifications to look for. The first is ISO 27001, developed by the International Organization for Standardization (it was co-developed by IEC, the International Electrotechnical Commission). This covers information security standards (and what we're talking about when we're talking about data security is information security).
Another key one is SOC 2, which outlines internal controls and checklists that an organization has in place for information security. To get the SOC 2 certification, an organization is audited, looking specifically at a vendor's procedures, controls, who has access to the data, etc. You want to ensure the audit is conducted by a very reputable audit firm. There are companies out there that will give you an SOC 2 certification that are kind of fly-by-night. Also, it's not just enough to have the SOC 2 certification. You also want to look at the audit findings to understand areas that a company could be deficient in within their security controls.
Z: So don’t just say, “Oh, great, they have SOC 2 on their website – done!” Dig a little deeper. Let’s now talk a little bit about encryption. What should we look for when it comes to data encryption?
Lisa: Data encryption is super important because when your data is transmitted – uploaded or downloaded – somebody could get in and grab it. You want to know that it's encrypted versus having somebody's name, Social Security number or other private information easily read. You also want to know that it's encrypted where it's being stored. Wherever that data lives, it needs to be encrypted.
Z: There is another area of security I want to ask you about that is specific to gen AI applications called prompt injection. What does it mean and why should anyone care about prompt injection?
Lisa: Prompt injection is such a weird term. What it means is somebody is using prompts essentially to either inject malicious code into the application or put in instructions to the platform to give up information that it shouldn't, like passwords. If a vendor is not protecting their application against prompt injection, it means somebody could get in there and get your data.
Z: Are there safeguards to prevent unauthorized access or to prevent prompt injection?
Lisa: You definitely want to look for extra authentication protocols like multi-factor authentication, where you are sent a code to enter, or you can do biometric authentication, where you have to show your face or log in with a passkey. It’s an extra layer because anybody can get your username and password, but you want something that's going to ensure it's you or another authorized user using the tool. Ask your vendor what extra authentication protocols they use for sign-in. You don't want it to just be username and password. And this isn't just for AI. This is for any kind of technology tool. There are many cases out there of databases that have been hacked because the vendor didn’t take that extra step.
Z: We talked earlier about tools that are connecting to third parties for everything from marketing and tracking to even connecting to other LLMs. When we are looking at data protection, there's only so much we can do ourselves. What should we look for from the tool provider when it comes to data protection when our data is shared?
Lisa: In the world of technology, a supply chain is all of the different programs or the third parties and their programs, and if it’s a gen AI tool, it includes the LLMs. Also, APIs – application program interfaces – are super important because they act like the bridge between two programs. And you want to make sure that all of those things are protected along the way because any security issue within the supply chain means the entire system isn't secure.

Unique gen AI issues
Z: Shifting gears a little bit, there's the output that we talked about earlier. You type in a prompt, you get something out. It's usually kind of a black box, right? But I've seen this term starting to surface more and more about explainability: AI explainability. What is that?
Lisa: Explainability essentially is, how can I explain what is coming out of this platform or tool? It's super important, especially for us as researchers. What if you deliver your insights and findings to your client or stakeholder and they then ask you, “Where'd you get it? How did you get it?” And if you can't explain it (beyond, “I got it from AI”), it can a be big problem.
For researchers, it is essential for us to “explain” or demonstrate accuracy and replicability (if we do the same analysis again, we should generate the same result).
Z: I'd imagine that would be very true, especially for using AI to do data-quality checks, right? “Explain why you flagged these particular respondents for low data quality.”
Lisa: Right. That's a really great one!
Z: We talked about explainability. We've talked about data protection and prompt injection. We have a lot of agentic AI now where AI can do all of these things on your behalf and you don't have to do anything more than type in a prompt. And so that's been bringing out this term “human in the loop.” At what stages should we have a human in the loop? And where should we look for that when it comes to the tools that we are using?
Lisa: It’s best to keep the human in the loop wherever possible. One of the main reasons to have the human in the loop is because of hallucinations. Let's talk a little bit about hallucination and what happens. How do we mitigate hallucinations and platforms?
Z: You know, I've seen a lot about the idea of how you mitigate for hallucinations, everything from just write a better prompt to making sure that you have this guardrailed project where you are prompting to begin with. You have a limited data set that you are allowing the LLM to pull from. And I think they both have their merits and both have their abilities.
Lisa: I read the other day about why these programs hallucinate, which I found fascinating: It's the reward system. These systems are rewarded to make things up instead of saying, “I don't know.”1 That's why we have to keep checking, because that reward system is set up for the LLM to just make something up.
Z: It’s funny you say that because I was also reading something similar about one of the ways that you can avoid hallucination. So, ChatGPT has projects, Claude has projects and each can have unique, project-specific instructions now. And there was a thing that said as part of the instructions, give it the out of saying “I don't know” and that will reduce the hallucination. That way it doesn't default to, I must give it an answer no matter what the cost. Instead, you now have told it, “If you don't have the information, tell me you don't have the information.” It was another way to avoid the AI hallucination that I found really interesting. Always checking the citations provided by AI is another way to mitigate hallucination.
Lisa: That's a great piece of advice for researchers!
Z: I think there's another type of hallucination we don't necessarily talk enough about and that is bias. We haven't even figured out how to mitigate bias in ourselves but how do we work with bias in the models that we work with?
Lisa: That's so important! If you think about what all of these AI solutions and models have been built on, the training data is inherently biased, because it just is. Find out from the vendor what data it's been trained on and how the vendor checks for bias, if at all. If you have a good understanding that the data was trained on information from a subset of the population, you cannot extrapolate it to the entire population. It's just like thinking about your sample – treat bias like your own sample.
Z: This has been a great conversation about what researchers should look for in their ResTech platforms and tools – especially data privacy and security. Hopefully, we have helped some researchers feel more confident about data privacy and security questions to consider when selecting a technology solution.
Reference
1 Air Street Capital's State of AI Report for 2025.
