Not all advice is good advice… The technology doesn’t replace expertise
AI and large language models (LLMs) are not a panacea. As a New York attorney recently discovered, information produced by an LLM can be utterly false yet entirely convincing. Because knowledge of how LLMs actually work is not widespread, and public perceptions of what AI is and can do often fail to mirror reality, these sorts of incidents are likely to occur from time to time.
As an example, we set up an LLM running on consumer hardware, with Meta’s Llama 2 13B model as the base model, and asked it a relatively simple question:
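As a side note for readers who want to reproduce a comparable local setup, a minimal sketch using the llama-cpp-python bindings with a quantized GGUF build of the 13B chat model might look like the following. The model filename, paths, and generation parameters are illustrative assumptions rather than the exact configuration we used.

```python
# Minimal sketch of querying a locally hosted Llama 2 13B chat model on
# consumer hardware. Assumes llama-cpp-python is installed and a quantized
# GGUF build of the model has already been downloaded; the path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,        # context window size
    n_gpu_layers=35,   # offload some layers to a consumer GPU if available
)

prompt = "<the question put to the model>"  # placeholder, not the actual question we asked
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=512,
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```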
The response isn’t bad, and it even makes some recommendations as to how the code could be improved. However, let’s go for something more niche and specialized. Let’s take a closer look at some code and an unusual property of `scanf` described here, and see whether the LLM is aware of the same issue:
Not to mention quite expensive, with a single page load in the browser translating to 59 API requests:
… and a cost of $2.86 (roughly five cents per request), at which point we concluded that the test could get quite expensive rather quickly.
Today, the technology isn’t ready to provide useful, real-world results to assist pentesters in all situations and environments. That isn’t to say it won’t be useful in other ways – pair programming assistance being a prime example – nor that a point won’t come when usable information is forthcoming, but wider questions around how to train an LLM and expose it to pentesters in a secure way will need to be answered first. It also doesn’t remove the requirement for the tester to understand the advice given and to know whether or not it is safe to follow.
As a thought experiment, imagine that a pentester is conducting an infrastructure assessment – again in a Windows Active Directory (AD) environment – and either spots that they control a user who can modify the membership of a high-value built-in group, such as Account Operators, or the AI into which they feed their tool output recognizes this state.
Next, they ask the LLM how they can take advantage of this situation, and it describes the next steps in the following image:
If the objective of the task was to compromise the ‘Account Operators’ group, it has certainly been achieved. The user account, by virtue of being in the ‘Domain Users’ group, is now also in the ‘Account Operators’ group, as is every other domain user in the customer environment.
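To make the blast radius concrete: judging from the outcome described, the advice amounts to nesting the ‘Domain Users’ group inside ‘Account Operators’, which is a single LDAP modification. The sketch below uses Python’s ldap3 library with entirely hypothetical hostnames, distinguished names, and credentials; it is included to illustrate why the advice is so damaging, not as something to follow.

```python
# Hypothetical sketch only: the single LDAP change that nests 'Domain Users'
# inside 'Account Operators'. Hostnames, DNs, and credentials are placeholders.
from ldap3 import Server, Connection, NTLM, MODIFY_ADD

server = Server("dc01.corp.example", use_ssl=True)
conn = Connection(
    server,
    user="CORP\\controlled.user",   # the account with write access to the group
    password="<password>",
    authentication=NTLM,
    auto_bind=True,
)

# One attribute change is all it takes: every member of 'Domain Users',
# i.e. every user in the domain, now carries Account Operators membership.
conn.modify(
    "CN=Account Operators,CN=Builtin,DC=corp,DC=example",
    {"member": [(MODIFY_ADD, ["CN=Domain Users,CN=Users,DC=corp,DC=example"])]},
)
print(conn.result)
```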
While this example is fictitious and extreme, conditions and weaknesses – such as logical flaws in AD – are introduced by customers into their own environments all the time and found by pentesters every day, and those flaws can be quite complex in nature. The last thing that a customer needs is a pentester making the situation orders of magnitude worse by further weakening the security of that environment.
As a caveat, the issue here isn’t even AI itself; it’s perfectly possible to get bad or incorrect advice from other sources of information. The problem is that we stand on the cusp of widespread use of a technology that has the potential to give inherently bad advice, and if care is not taken, lessons will be learned the hard way.