There is no question that the rise of artificial intelligence (AI) in cybersecurity has been meteoric, and – as with other emerging technologies – the potential rewards and risks of commercial adoption will materialize over time. For businesses, navigating the AI landscape is increasingly challenging amid a deluge of AI-backed products, promises, and dreams, at a time when certainty about the technology is paramount.

This article explores the rewards and risks in the use of AI in offensive cybersecurity practices, including vulnerability assessments, penetration testing, red teaming, and everything in between. While opportunities certainly exist, a number of potential pitfalls should be considered by offensive practitioners and customers alike.

But what did we do with all the customer data?

Not every company maintains offensive security experts as part of its workforce, and it is commonplace to outsource testing to third-party consultancies. This model has a number of advantages, such as obtaining the services of well-practiced and qualified professionals. However, the rise of AI-assisted tools – such as PentestGPT and BurpGPT, which have operating modes that use OpenAI’s web services or API – could be a tempting option for would-be testers.

During the course of an assessment, a tester could – with ease and at speed – send sensitive data to a third-party system outside of customer control. While testing, it is not uncommon to discover and parse information pertaining to customers and users – such as PII and credentials – and feeding it into such tools could inadvertently disclose that data, and details of unreported vulnerabilities, to an outside party. Leaks of sensitive data to AI services have already been highlighted publicly, as happened to Samsung earlier in the year.
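One partial mitigation is to scrub obvious secrets from tool output before anything leaves the tester’s machine. The sketch below is illustrative only – the patterns are nowhere near exhaustive, and the function and constant names are our own invention, not from any particular tool:

```python
import re

# Illustrative patterns for secrets commonly seen in pentest tool output.
# A real deployment would need far broader coverage (API keys, tokens,
# entropy checks, etc.) - this is a minimal sketch, not a complete filter.
REDACTION_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"), r"\1=<REDACTED>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]

def redact(text: str) -> str:
    """Strip obvious PII and credentials before text is sent to an external LLM."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Even a crude filter like this raises the bar considerably over piping raw scanner output straight into a third-party API.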

Leaks like this force us to ask: what happens when someone performing what should be a trusted service leaks sensitive information? Do organizations understand that risk? Can consultancies demonstrate that they understand it and provide suitable assurances to their customers?

This isn’t to say that ignoring AI completely is the correct course of action, either. If the primary concern around AI use comes from lack of control over data sent to a third-party AI, why not run it locally?

With a selection of base models licensed for commercial use that run on consumer hardware, and cloud services offering professional hardware like the Nvidia A100 GPU, a number of options exist to develop tooling that could aid pentesters while reducing the risk of sensitive information loss.
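If a local model is the policy, tooling can enforce it. The sketch below is a hypothetical guard – the function names and allow-list are invented for illustration – that simply refuses to send prompts anywhere but the local machine:

```python
from urllib.parse import urlparse

# Hosts treated as "local" in this sketch; adapt to your own network policy.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_local_endpoint(url: str) -> bool:
    """Return True only if the LLM endpoint points at the local machine,
    so assessment data never leaves the tester's control."""
    return urlparse(url).hostname in LOCAL_HOSTS

def submit_prompt(url: str, prompt: str) -> str:
    """Send a prompt to a locally hosted model, refusing external endpoints."""
    if not is_local_endpoint(url):
        raise ValueError(f"Refusing to send assessment data to non-local endpoint: {url}")
    # ... here you would POST `prompt` to the locally hosted model
    # (e.g. a llama.cpp server's HTTP API) and return its completion ...
    return "<completion>"
```

A check like this turns “we promised not to send customer data to a cloud AI” from a policy statement into something the tooling actually enforces.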

Not all advice is good advice… the technology doesn’t replace expertise

AI and large language models (LLMs) are not a panacea. As a New York attorney recently discovered, it is not out of the question that information produced by an AI will be utterly false, yet entirely convincing. Because knowledge of how LLMs function is not widespread, and public perceptions of what AI is and can do don’t mirror reality, these sorts of events are likely to occur from time to time.

As an example, we set up an LLM running on consumer hardware with Meta’s Llama2 13B model as the base model and asked it a relatively simple question:

The response isn’t bad and even makes some recommendations as to how the code could be better. However, let’s go for something more unique and specialized. Let’s have a closer look at some code and an unusual property of `scanf` described here, and see if an LLM is aware of the same issue:

The approach is also quite expensive, with a single page load in the browser translating to 59 API requests:

… and a cost of $2.86, at which point we concluded that the test could get quite expensive, rather quickly.
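For context, a rough extrapolation from those figures – assuming, generously, that every page behaves like our sample, which is a simplification – shows how quickly costs stack up:

```python
# Figures from the test above: one page load produced 59 API requests
# at a total cost of $2.86. Extrapolation assumes every page is similar.
REQUESTS_PER_PAGE = 59
COST_PER_PAGE_USD = 2.86

def estimate_scan_cost(pages: int) -> tuple[int, float]:
    """Estimate API request volume and cost for assessing `pages` pages."""
    return pages * REQUESTS_PER_PAGE, round(pages * COST_PER_PAGE_USD, 2)
```

At that rate, even a modest 500-page application would generate nearly 30,000 API requests and roughly $1,430 in charges.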

Today, the technology isn’t ready to provide useful, real-world results to assist pentesters in all situations and environments. That isn’t to say it won’t be useful in other ways – pair-assisted programming being a primary example – nor that usable information won’t be forthcoming at some point, but wider questions around how to train an LLM and expose it to pentesters in a secure way will need to be answered. None of this removes the requirement for the tester to understand the advice given and know whether it is safe to follow.

As a thought experiment, imagine that a pentester is conducting an infrastructure assessment – again in a Windows Active Directory (AD) environment – and either spots that they have control over a user who can modify the membership of a high value built-in group, such as Account Operators, or the AI they feed their tool output into recognizes this state.

Next, they ask the LLM how they can take advantage of this situation and the LLM describes the next steps in the following image:

If the objective of the task was to compromise the ‘Account Operators’ group, it has been achieved. The user account is, by nature of being in the ‘Domain Users’ group, now also in the ‘Account Operators’ group, as is every other domain user in the customer environment.
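The danger of that “advice” is easy to demonstrate with a toy model of nested group membership. The user names and resolution code below are invented for illustration – real AD membership resolution is far more involved – but the nesting behavior is the same:

```python
# Toy model of nested AD group membership after following the LLM's advice:
# 'Domain Users' has been added to 'Account Operators'.
GROUP_MEMBERS = {
    "Account Operators": {"Domain Users"},
    "Domain Users": {"alice", "bob", "carol"},  # i.e. every user in the domain
}

def effective_members(group: str, seen=None) -> set:
    """Flatten nested groups into the set of user accounts they contain."""
    if seen is None:
        seen = set()
    users = set()
    for member in GROUP_MEMBERS.get(group, set()):
        if member in GROUP_MEMBERS:      # member is itself a group: recurse
            if member not in seen:
                seen.add(member)
                users |= effective_members(member, seen)
        else:                            # member is a user account
            users.add(member)
    return users
```

Every account nested under ‘Domain Users’ is now an effective member of ‘Account Operators’ – exactly the blast radius the pentester just created.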

While this example is fictitious and extreme, conditions and weaknesses – such as logical flaws in AD – are introduced by customers into their own environments all the time and found by pentesters every day, and those flaws can be quite complex in nature. The last thing a customer needs is a pentester making the situation orders of magnitude worse by further weakening the security of that environment.

As a caveat, the issue here isn’t even AI itself. It’s perfectly possible to get bad or incorrect advice from other sources of information. The problem is that we stand at the cusp of widespread use of a technology that has the potential to give inherently bad advice, and if caution is not taken, lessons will be learned the hard way.

Even sharp things can be educational

Let’s take a step back from viewing the issue through the nightmarish prism of AI not being quite as reliable as users would like. Once the technology is fully understood, there are a number of ways it can be incorporated into the workflow of any tech professional, including pentesters.

We’ve already touched on pair-assisted programming as a great example of how AI can improve productivity and speed while programming. Additionally, based on the information we were able to elicit from a local LLM, there is evidently viable use as a 1:1 teacher or trainer at certain skill levels, provided the student understands the teacher is not always completely correct.

To learn more about the risks of implementing AI-based tools in your cybersecurity practice, visit the Immersive Labs Resource Center.



October 5, 2023


AI, Cybersecurity


Robert Reeves