
Oracle Cloud has several Large Language Models available within OCI itself that your applications can use, and all usage is charged through OCI, effectively making this LLM-as-a-Service. The most important feature, in our view, is that all data movement is contained within OCI and no data is shared with the LLM provider. Your data stays within your OCI region if the LLM is available there; otherwise it is confined to your region and the region hosting the LLM. If you are using company-confidential information with LLMs to gain valuable insights, you want that data to remain private, and OCI ensures that it does.
Oracle offers the following LLMs on OCI; not all of them are available in every region. For the examples below, we are running our workloads in OCI Sydney and using the LLM hosted in London, United Kingdom. This means that data travels from SYD to LON and back again.
That is a lot of models, and they change reasonably frequently. Different models specialise in different use cases, so you should research and test to determine which model is right for yours.
In the exercise below we analyse survey results, using the three Cohere models to illustrate the differences between them.
Note that 'Cost' refers to the charge per 10,000 characters, counting both the characters sent to the LLM and the characters generated by the model. In our case, the AI ingests our survey results, which total approximately 16,100 characters; the length of the result varied but was generally 1,000 to 2,500 characters. The estimated cost for the Command R model is ((16,100 + 1,000) / 10,000) * 0.00135, or approximately $0.00231 per invocation, so running this query 100 times would cost $0.23085. As we are operating from OCI Sydney, all charges stated here are in Australian dollars.
To estimate the cost for the more expensive and comprehensive models, the calculation is ((16,100 + 2,500) / 10,000) * 0.0234, which gives approximately $0.043524 per invocation. That is a vast difference in cost, roughly 19 times the Command R figure (the per-character rate is about 17 times higher, and the longer output accounts for the rest), and it is an important factor when deciding which model to use for which use case.
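The two estimates above follow the same formula, so it can be worth capturing it once. Here is a minimal Python sketch that reproduces them. The function name `invocation_cost` is our own (hypothetical), and the rates are the AUD per-10,000-character figures quoted above; pricing changes over time, so check current OCI pricing before relying on these numbers.

```python
def invocation_cost(input_chars: int, output_chars: int, rate_per_10k: float) -> float:
    """Estimated cost of one LLM call.

    Both the characters sent (input) and the characters generated (output)
    are billed at rate_per_10k for every 10,000 characters.
    """
    return (input_chars + output_chars) / 10_000 * rate_per_10k


# Figures from the survey example: ~16,100 input characters.
command_r = invocation_cost(16_100, 1_000, 0.00135)      # Command R, shorter output
comprehensive = invocation_cost(16_100, 2_500, 0.0234)   # comprehensive models, longer output

print(f"Command R:     ${command_r:.5f} per call, ${command_r * 100:.5f} per 100 calls")
print(f"Comprehensive: ${comprehensive:.6f} per call")
print(f"Ratio:         {comprehensive / command_r:.1f}x")
```

Keeping input and output as separate arguments makes it easy to see how much of the difference comes from the rate and how much from the longer responses of the larger models.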
Pebble IT has a demonstration Apex application titled "Worker Management System" (WMS) that is run for a fictitious company, "Friendly Contracting Services" (FCS). Its primary role is to record all contractor details, including their insurance requirements, so that contractors have the necessary coverage for performing their work. In addition, the application surveys contractors on their satisfaction working for FCS and captures any safety or other concerns that management should be aware of.
Whilst survey results can be viewed individually and summarised numerically, free-text responses are very difficult for people to review, particularly when there are hundreds of them. We are using AI combined with document generation to build a monthly management report that summarises the results and also highlights anything significant. Two major steps are performed at the user's request: the AI analysis of the survey results and the generation of the report document.
Note that WMS is not a real application that we sell to clients; it is a theoretical demonstration application that shows Apex UI, workflow, document generation and now AI capabilities.
We share below the summarisation section of the report, which answers the question: "Can you give me any trends regarding stress in my workforce?" The results are quite different across the three models, which illustrates that the choice of model can make a significant difference.



We feel that "Command R" would be fine for testing purposes but provides insufficient insight for production-level data. With this model we would quite likely miss important points and have to rely on human scrutiny of the results instead.
This is a big improvement, and at 19x the cost, it definitely should be. It gives us real insights and introduces the 'Actionable Recommendations'. The AI has indicated that there are some stress issues within our contractor workforce, which would be sufficient justification for a detailed analysis of the results so that a remediation plan can be created and executed. Whilst you may form your own opinion on how truly useful this insight is, we believe that more comprehensive prompts (we acknowledge that ours is deliberately very simple) could deliver even further insight and greater value.
The important point is that this model is likely to be seen as production-ready whereas the Command R model is not.
Another leap forward, but at little incremental cost (the result is longer, so the cost is slightly higher). We were pleasantly surprised by this. You may wonder whether the AI might publish confidential details here, and it is certainly capable of doing so: if a contractor were prepared to share confidential details in a survey, they could be surfaced in a summary report, particularly if they contained criminal or graphic descriptions. The AI took it upon itself to name key people for particular classifications. Note that all the names you read here come from our AI-generated data set and do not relate to real people; the survey results are also manufactured, not copied from real results elsewhere.
The information provided is both insightful and actionable. This is the level of information we would want in our survey results report. However, because of the detail it provides, its output requires human moderation. This is not a set-and-forget exercise in which you place 100% trust in the AI. For the staff member responsible for the report, the value is a thorough examination of the results that lets them polish the final report in far less time than a completely manual exercise would take.
When we started experimenting with the different models, we were not expecting such stark results, hence this article. Our experiments have raised some points worth discussing further:
AI and LLMs are the domain of highly specialised experts; however, vendors are making these models and capabilities available to the world, and people then implement them in their own way for their own use cases. Given the power of these models and the capabilities of AI, this will have serious consequences, as we all make assumptions about what we read, hear and see. Not all models are equal, and we believe we have illustrated how stark the differences can be. This article is the very simplest of comparisons; its intention is awareness rather than any sort of measurement model or decision-making tree.
Managing cost is the primary driver here. Development and System Test environments should ideally use cheaper models that are closely related to the more expensive Production model, which is enough to verify that the model and AI have been correctly implemented. Production, UAT and Training environments should use the comprehensive models so that they show the highest-quality outcomes.
LLM costs can be estimated: they are the number of invocations multiplied by (input + output) characters, charged at the model's rate per 10,000 characters. The key is to understand all three of those metrics. Non-Production environments also need to be considered: your business may incur significant model and AI costs in its Dev/Test/UAT/Train environments, and these should not be ignored. Their usage should be included in all budget estimates. Application stakeholders and business analysts will be your primary source for the number of invocations, while input and output sizes are more likely to be the domain of developers. Together, you can calculate the costs.
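Putting the previous two paragraphs together, a monthly budget can be sketched per environment, with each environment charged at the rate of the model tier it uses. This is a minimal Python sketch under loudly stated assumptions: the `Environment` class is hypothetical, the rates are the AUD figures quoted earlier in this article, the input and output sizes come from the survey example, and the invocation counts are made-up placeholders you would replace with figures from your stakeholders and business analysts.

```python
from dataclasses import dataclass

# AUD per 10,000 characters, as quoted earlier in this article (check current pricing).
CHEAPER_RATE = 0.00135       # Command R
COMPREHENSIVE_RATE = 0.0234  # more comprehensive models


@dataclass
class Environment:
    name: str
    rate_per_10k: float  # rate of the model tier this environment uses
    invocations: int     # HYPOTHETICAL monthly invocation count
    input_chars: int     # average characters sent per invocation
    output_chars: int    # average characters generated per invocation

    def monthly_cost(self) -> float:
        # invocations x (input + output) characters, billed per 10,000 characters
        return self.invocations * (self.input_chars + self.output_chars) / 10_000 * self.rate_per_10k


# Cheaper model for Dev/System Test, comprehensive model elsewhere.
# The invocation counts below are placeholders, not measurements.
ENVIRONMENTS = [
    Environment("Dev", CHEAPER_RATE, 500, 16_100, 1_000),
    Environment("System Test", CHEAPER_RATE, 300, 16_100, 1_000),
    Environment("UAT", COMPREHENSIVE_RATE, 200, 16_100, 2_500),
    Environment("Train", COMPREHENSIVE_RATE, 100, 16_100, 2_500),
    Environment("Production", COMPREHENSIVE_RATE, 2_000, 16_100, 2_500),
]

total = sum(env.monthly_cost() for env in ENVIRONMENTS)
non_production = sum(env.monthly_cost() for env in ENVIRONMENTS if env.name != "Production")

if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(f"{env.name:<12} ${env.monthly_cost():>8.2f}")
    print(f"{'Total':<12} ${total:>8.2f} (non-Production: ${non_production:.2f})")
```

Listing every environment explicitly, rather than estimating Production alone, keeps the non-Production spend visible in the budget.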
AI and LLMs are yet another source of variable spend that can easily spiral out of control, much like cloud computing costs did from 2015 onwards. Oracle will no doubt release more cost-control options, but until then, careful budget calculations will be very important. To illustrate the point, a virtual machine from Oracle may cost $90 per month, whereas running 2,000 invocations against a Cohere Command A model, averaging 10,000 characters of input plus output each, will result in $1,450 per month, far more than the cost of the VM.
We are at an AI dawn (I express it this way because there may be more than one as we approach AGI), and there are many differing opinions about jobs being replaced and new jobs being created as a result of AI. What is clear is that AI is a new computing paradigm that is less structured than what we are traditionally used to. We have not had a lot of time to adjust, and change is happening too fast for comfort. We are comfortable with 'deterministic' programs, and we test expecting that the same inputs will produce the same outputs. Payroll is a good example of a structured program that takes a number of inputs and gives the same output each time, and this can be measured across different systems, typically by performing parallel pay runs on an old payroll versus a new payroll. AI is 'non-deterministic': whilst it achieves similar results each time, they will not be exactly the same each time it is invoked with the same inputs. You will notice that most AI systems, such as 'Perplexity.ai', have a "regenerate" button for use when you do not like the answer provided.
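To make the deterministic versus non-deterministic contrast concrete, here is a toy Python illustration. It is hypothetical and greatly simplified, not how Cohere or OCI models are implemented. `payroll` always returns the same result for the same inputs. `sample_next_word` mimics, in miniature, how many language models choose their next word: scores are converted into probabilities (a softmax scaled by a "temperature") and a word is drawn at random, so identical inputs can produce different outputs. With a temperature of 0 it always picks the top-scoring word.

```python
import math
import random


def payroll(hours: float, hourly_rate: float) -> float:
    """Deterministic: the same inputs always produce the same pay."""
    return round(hours * hourly_rate, 2)


def sample_next_word(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Toy next-word sampler (hypothetical, for illustration only)."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy choice: always the highest-scoring word, so the result is repeatable.
        return max(scores, key=scores.get)
    # Softmax-style weights: higher scores are more likely, but not certain, to be chosen.
    weights = [math.exp(score / temperature) for score in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = {"stress": 2.0, "workload": 1.5, "fatigue": 1.0}  # made-up scores
    print(payroll(38, 45.5), payroll(38, 45.5))  # identical every time
    print([sample_next_word(scores, 1.0, random.Random(seed)) for seed in range(5)])
    print([sample_next_word(scores, 0, random.Random(seed)) for seed in range(5)])
```

Regenerating an answer amounts to drawing again from the same probabilities, which is why the results are similar but rarely identical.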
Because of this unpredictability, human oversight is likely to be required in the vast majority of use cases. In our simple example, I would definitely want to verify any extreme results that have been entered into our surveys. For example, an aggrieved contractor might post profanities or significant lies that would cause concern if included in a report sent to senior managers around the organisation. We need to understand where the value of AI lies: it is a tool of efficiency for our workers, not the worker. I trust humans, and whilst I will happily use AI, I will not trust it completely. Expert prompts can reduce the occurrence of hallucinations by instructing the AI to use only the data provided and not to make inferences that the data does not support. What AI does not have is good judgement. If I were responsible for the monthly survey report, I would exclude the profanity-laden rant submitted by an irate contractor, whereas the AI would likely treat it as a real example that needs to be highlighted, effectively skewing the report.
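As an illustration of the kind of prompt described above, here is a minimal Python sketch that wraps survey responses in instructions confining the model to the supplied data. The helper `build_survey_prompt` and its wording are hypothetical; how the prompt is sent to the model (for example from Apex) is not shown, and no prompt removes the need for the human review this paragraph argues for.

```python
def build_survey_prompt(question: str, responses: list[str]) -> str:
    """Assemble a prompt that confines the model to the supplied survey responses.

    Hypothetical helper: the instruction wording is illustrative only.
    """
    rules = (
        "Answer using only the survey responses provided below.\n"
        "Do not make inferences that are not supported by the responses.\n"
        "If the responses do not contain enough information, say so.\n"
    )
    # Number each response so a reviewer can trace claims back to their source.
    numbered = "\n".join(f"Response {i}: {text}" for i, text in enumerate(responses, start=1))
    return f"{rules}\nSurvey responses:\n{numbered}\n\nQuestion: {question}"


if __name__ == "__main__":
    prompt = build_survey_prompt(
        "Can you give me any trends regarding stress in my workforce?",
        ["Too many late shifts this month.", "Happy with my supervisor."],  # made-up responses
    )
    print(prompt)
```

Numbering the responses makes it easier for the person moderating the output to check a statement in the summary against the response it came from.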
A lot of thought and preparation needs to be undertaken as part of adopting AI. Whilst it may seem obvious, selecting the right model is a task that needs to be included in every project, and stakeholders need to understand that different models can be used for the same purpose in different environments to lower costs. OCI's capability of hosting LLMs makes AI very accessible and is an excellent starting point for experimenting to understand what is possible. Oracle Apex is a great tool with built-in capability to invoke AI quickly, so that money is spent on understanding the use of AI rather than on building the framework to reach that understanding.
We have been keen observers of AI for the past four years and have been very careful in our undertakings in this space. We have not jumped in with both feet, but we have moved beyond dipping our toe in; we believe we have insights to share and good capability with AI in the context of Oracle Apex on OCI. Feel free to reach out to us here if you wish to discuss further.




