Software Test Engineers: Enhance your productivity with GenAI

October 14, 2024

Advances in AI technologies are transforming every IT profession, including software testing, as AI tools become more embedded in our daily work processes. First Line Software is no exception. Earlier this year, we decided to integrate GenAI solutions into our everyday operations and have made significant progress in doing so.

Today, I will share some practical and highly useful skills for working with GenAI models. These will help you transition from traditional software testing to AI-assisted testing, improving your personal efficiency while reducing the repetitive and monotonous tasks that testers often encounter in their daily activities.

Most of you have probably experimented with using GenAI to tackle tasks like generating test checklists, creating test data, or even brainstorming ideas for testing scenarios. While these attempts may have been somewhat successful, I’ve also heard many comments that ChatGPT gave nonsensical results or generated irrelevant answers. These frustrations are, unfortunately, quite common when the prompts you provide to the AI lack depth and context. When prompts are too simple or generic, the AI often produces content that is equally vague and unhelpful, because it lacks the specific guidance needed to tailor its responses to the context of your project.

To help you avoid these kinds of issues, I want to share some practical tips on how to write effective prompts. Well-crafted prompts can help ensure that GenAI models produce answers that are highly relevant to your work tasks, particularly within the context of testing activities, rather than generating broad or disconnected responses that fail to solve the problems at hand. By improving the quality of your prompts, you can significantly increase the utility of AI tools and enhance your testing process.

Getting Started

Incorporating AI-assisted testing can be a game-changer for increasing efficiency, automating mundane tasks, and enhancing the accuracy and creativity of your test cases. For example, GenAI models can assist in generating edge cases, exploring multiple test scenarios, or even helping to identify potential areas of risk that might have been overlooked. But for AI to become a truly valuable part of your workflow, you need to be mindful of how you interact with these tools, starting with prompt engineering.

Before diving into how to leverage GenAI for AI-assisted testing, you must understand the fundamental principles behind how large language models (LLMs) like ChatGPT and other GenAI models work. LLMs are trained on vast amounts of data and are designed to generate human-like responses based on the input they receive. However, their effectiveness largely depends on the quality of the input. When using these models in the context of testing, it’s important to craft prompts that are detailed, clear, and specific to the testing task at hand.

  • LLMs operate by interpreting prompts written by humans and generating corresponding outputs.
  • When utilized properly, LLMs can serve as a valuable aid for Software Testers.
  • A Software Tester can rely on LLMs as a supplementary tool for a variety of tasks, including test planning, design, and automation.
  • It’s essential for Software Testers to critically assess the outputs produced by LLMs.
  • The effectiveness of LLM usage is maximized when the tester applies their expertise to craft precise prompts, leading to more reliable results.
  • If the prompts provided to LLMs are vague or overly broad, the responses will reflect that lack of specificity.
  • A Software Tester should maintain a healthy skepticism toward LLM-generated responses to ensure the information is useful and relevant.

Key Considerations When Working with LLMs

Hallucinations

LLMs can hallucinate, that is, produce plausible-sounding but fabricated or inaccurate information. The challenge is ensuring their output is logical and grounded in reality.

The potential for hallucinations means it’s crucial to approach LLM responses with caution. Keep in mind that the output is predictive, and not always factually correct. Critical thinking should never be set aside simply because an LLM generates human-like responses.

Data Provenance

For most users, both the inner workings of LLMs and the datasets they are trained on remain largely unknown. LLMs can offer valuable suggestions for test automation, but their accuracy isn’t guaranteed, so it’s important to review them thoroughly. A suggestion might be tailored to your existing test repositories and appear correct, yet inadvertently pull in components from a different automation framework simply because that framework is present in another project. Additionally, with tools like Copilot Chat, you can generate automated tests based on existing ones. When crafting prompts for automation, make sure to specify detailed checks and explicitly request wait conditions, as these won’t be generated by default. Maintaining control and oversight is key to ensuring that LLMs enhance, rather than hinder, your test automation process.
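To illustrate the last point, here is a minimal sketch, assuming Selenium 4 and a hypothetical locator, of the kind of explicit wait you would ask the LLM to add, since generated tests often jump straight to interactions and assertions without any synchronization:

    import java.time.Duration;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class ExplicitWaitExample {

        // Waits up to 10 seconds for an element to become visible before the
        // test interacts with it or asserts against it.
        public static WebElement waitForVisible(WebDriver driver, By locator) {
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
            return wait.until(ExpectedConditions.visibilityOfElementLocated(locator));
        }
    }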

Data Privacy

Be mindful of the data you input into an LLM. Since these systems process data on external servers, if you plan to use sensitive corporate or personal information in your prompts, make sure to anonymize or modify the data beforehand.

How to Craft an Effective Prompt

Provide clear and specific instructions

It may seem straightforward, but offering clear, specific instructions in your prompts is essential to getting useful results. Ambiguity in your instructions can lead to vague or incorrect responses, so clarity is crucial for accurate output.
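As a simple, invented illustration of the difference:

    Vague:    Write tests for the login page.

    Specific: Act as a software tester. Write five functional test cases for a
              login page that accepts an email address and a password. Cover an
              invalid email format, empty fields, and a locked account. Present
              each test case as numbered steps with an expected result.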

Use delimiters

While LLMs can infer intentions within different sections of a prompt, using delimiters (such as triple hashes, triple quotes, or XML-style tags) to mark the boundaries between those sections makes your instructions even clearer. Delimiters prevent confusion between the parts of your request, allowing the model to process each part of the prompt more effectively.
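For example, you might wrap the material the model should work from in triple hashes and refer to it explicitly (the requirement text below is invented):

    Generate test cases for the requirement delimited by triple hashes.
    ###
    A user must be able to reset their password via an emailed link that expires
    after 24 hours.
    ###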

Request structured output

Be explicit about the format you want in the response. Providing structure in your prompts ensures the output meets your expectations. If the response is unstructured, it may not be suitable for direct use in automated processes or further testing. 
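One way to do this, purely as an illustration, is to end the prompt with an explicit format requirement:

    Return the test cases as a table with the columns: ID, Title, Preconditions,
    Test Data, Steps, Expected Result. Do not add any text outside the table.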

Challenge assumptions

LLMs may generate incorrect or irrelevant information when uncertain. If you don’t want the model to guess, provide clear instructions that allow it to refrain from answering when it lacks sufficient information. This helps ensure you only receive responses that are meaningful and actionable.
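Such an instruction might read, for example:

    Base your answer only on the requirements delimited by triple hashes. If they
    do not contain enough information to create a test case, reply with "Not
    enough information" instead of guessing.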

Add examples of outputs

Few-shot prompting involves giving a few examples to clarify your instructions. A zero-shot prompt, by contrast, provides no examples. By including examples, you increase the likelihood of the LLM producing more relevant responses.
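A small, invented few-shot example for a testing task could look like this:

    Classify each bug report as UI, API, or Performance.

    Example 1: "The submit button overlaps the footer on mobile." -> UI
    Example 2: "The /orders endpoint returns 500 when the cart is empty." -> API

    Bug report: "The dashboard takes 40 seconds to load with 10,000 rows." ->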

Break the task into steps

By using delimiters, you can break down complex tasks into manageable steps for an LLM to follow. This approach also makes it easier to evaluate each step individually and adjust instructions if needed.
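For instance, a complex request can be laid out as explicit, numbered steps (the wording here is only a sketch):

    Step 1: List the user-facing features described in the requirement between
            the triple hashes.
    Step 2: For each feature, list positive and negative scenarios.
    Step 3: Turn each scenario into a test case in the format shown in the example.
    ###
    (requirement text goes here)
    ###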

Ask the LLM to self-check

Instruct the LLM to evaluate its own output to ensure it aligns with the instructions you provided. This additional check helps prevent errors or hallucinations in the final response, improving the overall reliability of the output.
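A typical self-check instruction, again only as an illustration, might read:

    Before you output the test cases, verify that every test case contains
    preconditions, test data, steps, and an expected result for each step. If any
    test case is missing one of these elements, correct it before responding.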

Now, I hope you have a clear understanding of how crucial it is to craft the right prompt to get high-quality responses. To illustrate these principles, let’s walk through a few examples of how to use them effectively.

In the first example, we’ll look at what is arguably one of the most important activities for a tester: writing test cases or checklists. Below is a sample prompt for generating test cases from a given context. Note that you can provide the context in several ways, for example by attaching the program requirements as a file when using ChatGPT.

Important note! The response from an LLM is usually limited by the number of tokens it can process. So, if you provide a document that is too large, there’s no guarantee that the GenAI will be able to fully cover all the details. For this reason, I recommend decomposing the task into smaller parts. You can either break it down into sections or start by generating higher-level test cases, then use additional prompts to further expand and refine the details.

Strong Prompts to tackle QA tasks
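The prompt in this example is organized into labeled sections, #A through #E. The exact wording will vary from project to project, but a condensed sketch consistent with the breakdown below might look like this (the requirements block is left as a placeholder):

    #A
    You are an expert software tester. Generate test cases based on the user
    stories and business requirements delimited by triple hashes below.

    #B
    Follow these rules:
    - Cover both positive and negative scenarios.
    - Include validation scenarios for all critical functionality.
    - Generate concrete test data for each test case.
    - Each test case must contain a name, preconditions, test data, steps, and
      an expected result for every step.
    - Format the output exactly like the example provided.

    #C
    Before outputting the test cases, check that every suggestion satisfies all
    of the conditions above; discard or fix any that do not.

    #D
    Example output:
    Test case name: Successful password reset
    Preconditions: A registered user with a verified email address exists.
    Test data: email = user@example.com
    Steps and expected results:
    1. Request a password reset link. -> An email containing a reset link is sent.
    2. Open the link within 24 hours. -> The password reset form is displayed.

    #E
    ###
    (user stories and business requirements go here)
    ###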

Let’s break down the prompt a bit.

In section #A, we provide the initial instructions to the LLM, specifying its role as an expert software tester. The LLM is tasked with generating test cases, and it is shown where to source the requirements for these test cases. The LLM is told to pull relevant details from the user stories and business requirements, which are clearly delimited by triple hashes. This ensures that the LLM focuses on the correct context and derives information from a specific section, avoiding any confusion about where to get the necessary information.

In section #B, we give the LLM specific guidelines to shape its response. The test cases should include both positive and negative scenarios, ensuring comprehensive coverage of different outcomes. The LLM is also told to include validation scenarios, which are necessary to ensure that all the critical functionality works as expected. Additionally, the test cases must include generated test data, which provides concrete values that can be used in testing. Furthermore, each test case must contain specific elements, such as the test case name, preconditions, test data, steps, and expected results for each step. These elements guide the LLM to produce test cases that are both detailed and actionable. Finally, the output must match the example format provided later in the prompt, ensuring that the test cases are formatted consistently and in a readable manner.

In section #C, we address tactics for avoiding off-topic or incomplete suggestions. The LLM is instructed to ensure that each suggestion it generates meets all of the conditions specified in the previous section before outputting the test cases. This step acts as a quality assurance mechanism, prompting the LLM to check its work against the criteria provided and ensuring that the final output is aligned with the specified structure and content requirements.

In section #D, we add an example of the desired output to give the LLM a concrete model to follow. The example shows a well-structured test case, starting with the test case name, followed by preconditions, test data, and detailed steps, along with the expected results for each step. The LLM can use this example as a template, making it easier to structure the test cases it generates consistently and clearly.

Finally, we use separators in section #E to clearly distinguish the prompt from the context within which the LLM should generate the test cases. The user stories and business requirements are enclosed within triple hashes, making it easy for the LLM to recognize the relevant section. These separators help to maintain clarity and organization, ensuring that the LLM knows exactly where to look for the data it needs and how to format its output. 

In the second example, I will demonstrate how GenAI can be used in test automation by creating a Page Object Model. An automation project can include a large number of tests, so it’s crucial to apply best practices for code organization to improve maintainability and readability. To achieve this, we use design patterns, and one of the most common ones is the Page Object Model pattern.
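As before, the prompt is organized into labeled sections, here #A through #D. A condensed sketch, using an invented HTML fragment for a simple contact form, might look like this:

    #A
    You are an expert in test automation. Create a Page Object class for the
    contact form page based on the HTML delimited by triple hashes below.

    #B
    Follow these steps:
    - Create a Java class for the contact form page.
    - Import the necessary Selenium libraries.
    - Annotate each form field with @FindBy.
    - Initialize the elements using PageFactory.

    #C
    Before outputting the code, check that it satisfies all of the conditions
    above.

    #D
    ###
    <form id="contact-form">
      <input type="text" id="name" name="name">
      <input type="email" id="email" name="email">
      <textarea id="message" name="message"></textarea>
      <button type="submit" id="submit">Send</button>
    </form>
    ###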

Let’s break down the prompt a bit. In section #A, we assign the LLM the role of an expert in automation testing and guide it to pull the relevant information from the HTML enclosed by triple hashes. This ensures the LLM knows the specific context and data it should focus on when generating the page object.

In section #B, we offer step-by-step instructions to guide the LLM’s response. These hints detail how to create a Java class for the contact form page, import necessary Selenium libraries, annotate form fields with @FindBy, and initialize elements using the PageFactory. The goal here is to give the LLM clear direction on how to structure the code while covering all key elements.

In section #C, we incorporate a checkpoint to ensure the LLM stays on track. By instructing the LLM to check each suggestion against the provided conditions before outputting them, we aim to avoid irrelevant or off-topic suggestions.

Finally, in section #D, we provide an example of the HTML that the LLM will use as a reference to locate form fields and construct the corresponding @FindBy annotations. Separators (such as triple hashes) are used to clearly distinguish the provided HTML from the rest of the prompt and ensure the LLM can focus on generating the correct output.
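For reference, the kind of class such a prompt steers the LLM toward might look roughly like the sketch below; the class name and locators are assumptions tied to the invented HTML above:

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.FindBy;
    import org.openqa.selenium.support.PageFactory;

    // Page Object for the contact form page; locators match the sample HTML above.
    public class ContactFormPage {

        @FindBy(id = "name")
        private WebElement nameField;

        @FindBy(id = "email")
        private WebElement emailField;

        @FindBy(id = "message")
        private WebElement messageField;

        @FindBy(id = "submit")
        private WebElement submitButton;

        public ContactFormPage(WebDriver driver) {
            // Initializes every @FindBy-annotated field on this page.
            PageFactory.initElements(driver, this);
        }

        // Fills in the contact form and submits it.
        public void submitMessage(String name, String email, String message) {
            nameField.sendKeys(name);
            emailField.sendKeys(email);
            messageField.sendKeys(message);
            submitButton.click();
        }
    }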

Final Thoughts

Incorporating GenAI into software testing workflows offers immense potential for boosting efficiency, reducing repetitive tasks, and enhancing the creativity of test scenarios. However, to unlock the full benefits of AI-assisted testing, testers must learn how to interact effectively with these tools. Prompt engineering is key to ensuring LLMs deliver relevant and accurate results. By providing clear, structured, and context-specific instructions, testers can minimize issues such as hallucinations and irrelevant responses. Additionally, maintaining a critical mindset and careful oversight when reviewing AI-generated suggestions ensures that AI complements, rather than compromises, the quality of your work.

As you embrace these AI capabilities, remember that human expertise and judgment remain crucial in guiding the AI toward meaningful contributions in your testing process.

About the author

Alex Meshkov is head of QA and AI evaluation at First Line Software. With more than 12 years of experience, Alexander specializes in software testing, test process organization, test management and AI evaluation.
