Key Takeaways:
1) AI projects are not automatically RDTI eligible.
2) LLM-related activities vary significantly in the complexity of their eligibility analysis.
3) Strong experimental documentation is essential for AI claims.
A key point of contention for RDTI claims involving the use of AI is that “the knowledge being sought must go beyond validating a simple progression from what is already known and beyond merely implementing existing knowledge in a different context.” (Body by Michael Pty Ltd v Industry Innovation and Science Australia, ARTA 44 (104) [24 January 2025].)
As AI technologies become increasingly commoditised and accessible, the distinction between eligible experimental activities and the implementation of existing capabilities becomes progressively more difficult to define. In practice, many AI projects involve a combination of infrastructure engineering, software development, data integration, model configuration, and experimentation. The fact that an activity involves AI, machine learning, or LLMs does not itself indicate the presence of eligible R&D activities.
To properly examine this, it is important to make a distinction between the following representative types of LLM-related activities:
1) Creating a new LLM architecture (e.g. a new concept published in a journal)
2) Training an existing LLM architecture (‘creating an LLM’)
3) Training a top layer for an existing LLM (‘fine-tuning an LLM’)
4) Giving an existing LLM access to a pool of external data at run time to augment the request and response (‘implementing a RAG pipeline’)
5) Prompting an existing LLM to perform a specific task or set of tasks (colloquially ‘using AI’)
For Type 1, the new knowledge generated is a new or improved architecture.
For Types 2 and 3, whilst it may seem on the surface that this is simply the practical implementation of something that already exists, training an existing LLM architecture is, in essence, the method by which a new model is created. By feeding the architecture a set of training data, the model itself (i.e. the specific combination of learned weights and parameters) is generated or altered. Thus, it may be argued that the new knowledge generated is a new or improved model.
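The point that training produces a distinct artefact, even where the architecture is unchanged, can be illustrated with a deliberately trivial sketch. A single-parameter linear model stands in for an LLM here; this is purely illustrative and bears no resemblance to real LLM training.

```python
# Toy illustration: "training" an existing architecture produces a new,
# distinct artefact (the learned weights), even though the architecture
# itself is unchanged. A one-parameter linear model y = w * x stands in
# for an LLM; gradient descent stands in for the training process.

def train(weight, data, lr=0.1, epochs=50):
    """Gradient descent on mean squared error for y = weight * x."""
    for _ in range(epochs):
        grad = sum(2 * x * (weight * x - y) for x, y in data) / len(data)
        weight -= lr * grad
    return weight

initial_weight = 0.0                    # the published "architecture" before training
dataset_a = [(1.0, 2.0), (2.0, 4.0)]    # toy data implying y = 2x
dataset_b = [(1.0, 3.0), (2.0, 6.0)]    # different toy data implying y = 3x

model_a = train(initial_weight, dataset_a)
model_b = train(initial_weight, dataset_b)

# Same architecture, different training data -> different learned weights,
# i.e. technically distinct models.
print(round(model_a, 2), round(model_b, 2))
```

The sketch shows why "a new model exists" is easy to establish; as the article goes on to note, that fact alone says nothing about whether the outcome was determinable in advance.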
However, for RDTI purposes, the mere existence of a technically distinct model is not itself sufficient to establish eligibility. The key question is whether the outcome could be known or determined in advance based on existing knowledge, information, or experience. For example, fine-tuning an existing open-source LLM using established methodologies, standard tooling, and predictable optimisation approaches may still constitute the implementation of known machine learning practices, even if the resulting model itself is technically different.
As such, for Types 2 and 3, the critical issue is not simply whether a new model is produced, but whether there existed genuine technical uncertainty regarding whether the intended outcome could be achieved, and whether a process of systematic experimentation was required to resolve that uncertainty.
Ultimately, for Types 1, 2, and 3, it is clearer that something definitively new or improved is being created (a new architecture, a new model, and an improved model, respectively). However, the existence of a new artefact alone does not determine eligibility; the experimental process undertaken to generate that artefact remains the central consideration.
For Types 4 and 5, the question of new knowledge becomes significantly more nuanced.
For Type 4, the existing LLM itself is not new, and the existing data in the RAG pool is generally not new. The RAG pool does not inherently modify the underlying model. In many cases, implementing a conventional RAG pipeline using existing embedding models, vector databases, chunking methods, and retrieval frameworks may represent little more than the application of existing software and machine learning capabilities in a particular domain.
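The conventional pattern described above can be reduced to a very short sketch: retrieve relevant documents from an external pool at run time and prepend them to the request. Naive keyword overlap stands in here for an embedding model and vector database, and no real LLM is called; the names and data are invented for illustration.

```python
# Minimal sketch of a conventional RAG pipeline: documents are retrieved
# from an external pool at run time and used to augment the prompt.
# Keyword overlap is a stand-in for embeddings and a vector database;
# note the underlying model itself is never modified.

def _words(text):
    return {w.strip(".,?").lower() for w in text.split()}

def retrieve(query, pool, top_k=2):
    """Rank pool documents by naive keyword overlap with the query."""
    terms = _words(query)
    ranked = sorted(pool, key=lambda doc: len(terms & _words(doc)), reverse=True)
    return ranked[:top_k]

def augment(query, pool):
    """Build the augmented prompt that would be sent to the existing LLM."""
    context = "\n".join(retrieve(query, pool))
    return f"Context:\n{context}\n\nQuestion: {query}"

case_law_pool = [
    "Smith v Jones concerned the duty of care owed by occupiers.",
    "R v Brown addressed consent in criminal assault.",
    "Donoghue v Stevenson established the modern duty of care.",
]

prompt = augment("What is the duty of care?", case_law_pool)
print(prompt)
```

Each step (chunking, retrieval, prompt assembly) is an established technique; the eligibility question is whether the work goes beyond wiring these known components together.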
As such, based on current knowledge, the applicant must be careful that they are not “validating a simple progression from what is already known” (Ibid). For instance, it may already be known that giving a RAG pool of case law to existing transformer-based LLMs (e.g. ChatGPT, Claude) improves their ability to respond to legal queries. It may also already be known that newer versions of these LLMs perform better than older versions. Therefore, the outcome of giving a RAG pool of case law to a better version of an existing LLM may be considered determinable from the outset.
Through a stricter lens, it could also be argued that the outcome of giving any relevant RAG pool to any sufficiently capable existing LLM is broadly determinable from the outset. As such, the existing knowledge in the relevant field of application must be carefully considered to determine eligibility.
However, this does not necessarily mean that all RAG-related activities are ineligible. In some cases, experimentation may extend beyond conventional implementation into genuinely unresolved technical problems. For example, uncertainties may arise regarding retrieval accuracy under constrained latency conditions, adaptive chunking strategies, hallucination suppression, context window optimisation, multi-agent orchestration, ranking methodologies, or balancing retrieval precision against inference performance at scale.
In these scenarios, the experimental activity may no longer simply be “implementing a RAG pipeline”, but rather investigating unresolved technical questions relating to inference orchestration, retrieval architecture, or system reliability.
For Type 5, in addition to the problems posed above, the fact that one is using an existing tool:
a) as it is meant to be used, and
b) for its intended purpose,
adds another layer of complexity in determining whether there is new knowledge and whether the outcome is known from the outset.
This results in several fundamental questions:
- Is creating new prompts to an existing LLM a valid form of new knowledge or is it simply using or adapting an existing capability to solve a problem? “When assessing whether there is an unknown outcome, you need to consider whether an existing… capability can be adapted to solve a problem” (Department of Industry, Science and Resources, Software-related Activities and the R&D Tax Incentive (May 2024)).
- Is optimising prompts (‘prompt engineering’) essentially just testing the efficiency of algorithms that are already known to work? “There are routine testing steps in software development projects that are frequently incorrectly claimed as core R&D activities: … • Testing the efficiency of different algorithms that are already known to work” (Taxpayer Alert TA 2017/5).
- Is the activity fundamentally directed toward resolving a technical uncertainty, or is it directed toward improving a commercial or functional outcome such as user experience, response quality, or workflow efficiency?
In many cases, iterative prompt refinement may simply represent optimisation of an existing capability rather than the generation of new knowledge. However, in more advanced scenarios involving constrained reasoning frameworks, deterministic inference control, autonomous agent reliability, self-correction architectures, or multi-step orchestration systems, the activity may extend beyond simple prompt engineering into more complex experimental software or systems engineering.
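The "optimisation of an existing capability" characterisation can be made concrete: in its simplest form, iterative prompt refinement is a search over candidate prompts against a fixed evaluation metric. The scoring function below is a crude stand-in for calling a real LLM and grading its output; all names and data are invented for illustration.

```python
# Sketch of iterative prompt refinement as a plain optimisation loop:
# candidate prompts are scored against a fixed evaluation set and the
# best performer is kept. Nothing new is learned about the model itself.

def score(prompt, eval_keywords):
    """Stand-in metric: fraction of required keywords the prompt mentions."""
    return sum(1 for kw in eval_keywords if kw in prompt) / len(eval_keywords)

variants = [
    "Summarise the contract.",
    "Summarise the contract, citing clause numbers.",
    "Summarise the contract, citing clause numbers and listing obligations.",
]
eval_keywords = ["clause", "obligations"]

best = max(variants, key=lambda p: score(p, eval_keywords))
print(best)
```

Where the activity looks like this loop, it resembles "testing the efficiency of different algorithms that are already known to work"; the more complex scenarios listed above sit outside this simple search pattern.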
As at the time of writing, many of these questions have not been directly addressed by published AAT/ARTA decisions. The rapid pace of AI development also means that the boundary between implementation and experimentation is likely to continue evolving.
Importantly, for AI-related claims, contemporaneous documentation remains critical. In practice, many AI projects involve rapid iteration, informal experimentation, changing datasets, evolving prompts, and continuously modified model configurations. Without sufficient records of hypotheses, technical uncertainties, experiment designs, failed attempts, evaluation metrics, datasets, hyperparameters, and conclusions, even genuinely experimental activities may become difficult to substantiate during review processes.
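One possible shape for such a contemporaneous record is sketched below. The field names and example values are assumptions for illustration only, not a prescribed RDTI format; the point is simply that each element listed above (hypothesis, uncertainty, design, datasets, hyperparameters, metrics, outcome, conclusion) is captured at the time of the experiment.

```python
# Illustrative only: one possible structure for a contemporaneous
# experiment record. Field names are assumptions, not a prescribed format.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ExperimentRecord:
    hypothesis: str
    technical_uncertainty: str
    experiment_design: str
    dataset: str
    hyperparameters: dict
    evaluation_metrics: dict = field(default_factory=dict)
    outcome: str = ""      # failed attempts are recorded, not discarded
    conclusion: str = ""
    recorded_on: str = field(default_factory=lambda: str(date.today()))

record = ExperimentRecord(
    hypothesis="Adaptive chunking will improve retrieval accuracy under a 200 ms latency budget.",
    technical_uncertainty="Unknown whether the accuracy target is achievable within the latency constraint.",
    experiment_design="Compare fixed-size vs adaptive chunking on a held-out query set.",
    dataset="internal-eval-v3",
    hyperparameters={"chunk_size": 512, "top_k": 5},
)
record.evaluation_metrics = {"recall@5": 0.71, "p95_latency_ms": 184}
record.outcome = "Adaptive chunking met the latency budget but reduced recall."
record.conclusion = "Hypothesis not supported; further experiments required."

print(record.recorded_on, record.conclusion)
```

Even a lightweight structure like this, populated as experiments run, is far easier to substantiate during review than records reconstructed after the fact.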
Given the complexities involved, before claiming LLM-based projects under the RDTI, it is critical to seek expert advice as to whether “the gap between existing knowledge and the hypothesis being investigated… [is] significant enough to require application of the scientific method” in the context of the specific activities being undertaken (Body by Michael Pty Ltd v Industry Innovation and Science Australia, ARTA 44 (104) [24 January 2025]).
Davi Kofsky is an Associate Director at Rimon Advisory. With degrees in Artificial Intelligence and Information Systems and over 10 years of RDTI experience, Davi brings a deep technical lens and expertise in modern technologies to the Rimon Advisory team.
