There are many factors to take into account when answering that question. For instance, you may not have the engineering resources to set up a hosted open-source LLM; in that case, you would be forced to go with a third-party API and prompting.
If you don't have any such constraints, the answer can only be obtained by testing. Open-source LLMs are generally inferior to ChatGPT (GPT-3.5 and GPT-4), so they may or may not be good enough for your task.
The "art" of designing a good prompt for an LLM is called "prompt engineering". It takes a lot of trial and error. There are some useful tricks, like adding "Be concise" if you want short answers. You may even need to design more generic "prompt templates" to use in different cases by just setting some placeholders to specific values.
There are several reasons why prompting is not always practical:
- Not foolproof: the model may still ignore or misinterpret the instructions.
- Consumes tokens from the input, which increases the price of each API call and reduces the number of output tokens the LLM can generate (see the sketch after this list).
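To make the cost point concrete, here is a rough sketch using OpenAI's `tiktoken` tokenizer; the prompt text and the per-token price are illustrative placeholders, so check the provider's current pricing:

```python
# Rough estimate of how prompt instructions inflate per-call cost.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Instructions like these are resent with *every* API call:
system_prompt = ("You are a helpful assistant. Be concise. "
                 "Answer in formal English.")
user_message = "Summarize the following report: ..."

n_tokens = len(enc.encode(system_prompt + user_message))
price_per_1k_input = 0.0015  # assumption: illustrative $/1K input tokens
print(f"{n_tokens} prompt tokens -> "
      f"${n_tokens / 1000 * price_per_1k_input:.5f} per call, on every call")
```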
There are also pros:
- You do not need to retrain a model to make it perform a new task.
Fine-tuning, for its part, is very inconvenient:
- Not foolproof either.
- Not available (for now) for proprietary models such as GPT-* and Claude (but see the comments at the end).
- Needs a lot of computational resources.
- Needs more engineering ability (see the sketch after this list).
- May lead to catastrophic forgetting, where the model loses capabilities it had before fine-tuning.
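To give an idea of the machinery involved, here is a minimal sketch of fine-tuning an open-source causal LM with the Hugging Face `transformers` Trainer; the model choice, toy dataset, and hyperparameters are all illustrative, and a real run needs far more data plus a GPU:

```python
# Minimal fine-tuning sketch (illustrative, not production-ready).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # assumption: any small open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy data; real fine-tuning needs thousands of examples.
texts = ["Q: What is the capital of France?\nA: Paris.",
         "Q: What is 2 + 2?\nA: 4."]

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=64)
    enc["labels"] = enc["input_ids"].copy()  # causal LM objective
    return enc

train_ds = Dataset.from_dict({"text": texts}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_ds,
)
trainer.train()  # even this toy run needs nontrivial compute
```

Even this toy version requires GPU memory, data preparation, and checkpoint management, which is what "needs more engineering ability" refers to.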
So, to answer your question: It depends.
Comments:

- `gpt-3.5-turbo*` (i.e. GPT-3.5) and `gpt-4*` (i.e. GPT-4) were never fine-tuneable; older GPT models like `davinci` are fine-tuneable. Here is the list of fine-tuneable models. – noe Jun 21 '23 at 17:02
- `>=3.5-turbo` – Andy Jun 21 '23 at 17:20