cortex models

This command allows you to start, stop, and manage local or remote models within Cortex.

Usage:

info

You can use the --verbose flag to display more detailed output of the internal processes. To apply this flag, use the following format: cortex --verbose [subcommand].


cortex models [options] [subcommand]
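
For example, to run the list subcommand (documented below) with detailed logging of internal processes:


cortex --verbose models list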

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| -h, --help | Display help information for the command. | No | - | -h |

Subcommands:

cortex models get

info

This CLI command calls the following API endpoint:

This command returns the details of a model defined by a model_id.

Usage:


cortex models get <model_id>

For example, it returns the following:


{
  "ai_template": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "created": 9223372036854775888,
  "ctx_len": 4096,
  "dynatemp_exponent": 1.0,
  "dynatemp_range": 0.0,
  "engine": "llama-cpp",
  "files": ["models/cortex.so/llama3.2/3b-gguf-q4-km/model.gguf"],
  "frequency_penalty": 0.0,
  "gpu_arch": "",
  "id": "Llama-3.2-3B-Instruct",
  "ignore_eos": false,
  "max_tokens": 4096,
  "min_keep": 0,
  "min_p": 0.05000000074505806,
  "mirostat": false,
  "mirostat_eta": 0.10000000149011612,
  "mirostat_tau": 5.0,
  "model": "Llama-3.2-3B-Instruct",
  "n_parallel": 1,
  "n_probs": 0,
  "name": "llama3.2:3b-gguf-q4-km",
  "ngl": 29,
  "object": "model",
  "os": "",
  "owned_by": "",
  "penalize_nl": false,
  "precision": "",
  "presence_penalty": 0.0,
  "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "quantization_method": "",
  "repeat_last_n": 64,
  "repeat_penalty": 1.0,
  "result": "OK",
  "seed": -1,
  "stop": ["<|eot_id|>"],
  "stream": true,
  "system_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
  "temperature": 0.69999998807907104,
  "text_model": false,
  "tfs_z": 1.0,
  "top_k": 40,
  "top_p": 0.89999997615814209,
  "typ_p": 1.0,
  "user_template": "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n",
  "version": "2"
}

info

This command uses the model_id of a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| model_id | The identifier of the model you want to retrieve. | Yes | - | mistral |
| -h, --help | Display help information for the command. | No | - | -h |
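
For example, to retrieve one of the models shown in the cortex models list output below (the model ID is illustrative; substitute one from your own list):


cortex models get llama3.2:3b-gguf-q4-km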

cortex models list

info

This CLI command calls the following API endpoint:

This command lists all the downloaded local and remote models.

Usage:


cortex models list [options]

For example, it returns the following:


+---------+---------------------------------------------------------------------------+
| (Index) | ID |
+---------+---------------------------------------------------------------------------+
| 1 | llama3.2:3b-gguf-q4-km |
+---------+---------------------------------------------------------------------------+
| 2 | tinyllama:1b-gguf |
+---------+---------------------------------------------------------------------------+
| 3 | TheBloke:Mistral-7B-Instruct-v0.1-GGUF:mistral-7b-instruct-v0.1.Q2_K.gguf |
+---------+---------------------------------------------------------------------------+

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| -h, --help | Display help for the command. | No | - | -h |
| -e, --engine | Display the engine for each model. | No | - | --engine |
| -v, --version | Display the version of each model. | No | - | --version |
| --cpu_mode | Display CPU mode. | No | - | --cpu_mode |
| --gpu_mode | Display GPU mode. | No | - | --gpu_mode |
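
For example, to include the engine for each model in the listing (a minimal sketch using the flags above):


cortex models list --engine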

cortex models start

info

This CLI command calls the following API endpoint:

This command starts a model defined by a model_id.

Usage:


cortex models start [options] <model_id>

info

This command uses the model_id of a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| model_id | The identifier of the model you want to start. | Yes | Prompt to select from the available models | mistral |
| --gpus | List of GPUs to use. | No | - | [0,1] |
| --ctx_len | Maximum context length for inference. | No | min(8192, max_model_context_length) | 1024 |
| -h, --help | Display help information for the command. | No | - | -h |
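
For example, to start a model with a 1024-token context length on GPUs 0 and 1 (the model ID and values are illustrative, taken from the examples above):


cortex models start --gpus [0,1] --ctx_len 1024 mistral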

cortex models stop

info

This CLI command calls the following API endpoint:

This command stops a model defined by a model_id.

Usage:


cortex models stop <model_id>

info

This command uses the model_id of a model that you have previously started.

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| model_id | The identifier of the model you want to stop. | Yes | - | mistral |
| -h, --help | Display help information for the command. | No | - | -h |
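
For example, to stop the model started above (the model ID is illustrative):


cortex models stop mistral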

cortex models delete

info

This CLI command calls the following API endpoint:

This command deletes a local model defined by a model_id.

Usage:


cortex models delete <model_id>

info

This command uses the model_id of a model that you have downloaded or that is available in your file system.

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| model_id | The identifier of the model you want to delete. | Yes | - | mistral |
| -h, --help | Display help for the command. | No | - | -h |
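
For example, to delete one of the models from the cortex models list output shown earlier (the model ID is illustrative):


cortex models delete tinyllama:1b-gguf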

cortex models update

info

This CLI command calls the following API endpoint:

This command updates the model.yaml file of a local model.

Usage:


cortex models update [options]

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| -h, --help | Display help for the command. | No | - | -h |
| --model_id | Unique identifier for the model. | Yes | - | --model_id my_model |
| --name | Name of the model. | No | - | --name "GPT Model" |
| --model | Model type or architecture. | No | - | --model GPT-4 |
| --version | Version of the model to use. | No | - | --version 1.2.0 |
| --stop | Stop token to terminate generation. | No | - | --stop "</s>" |
| --top_p | Sampling parameter for nucleus sampling. | No | - | --top_p 0.9 |
| --temperature | Controls randomness in generation. | No | - | --temperature 0.8 |
| --frequency_penalty | Penalizes repeated tokens based on frequency. | No | - | --frequency_penalty 0.5 |
| --presence_penalty | Penalizes repeated tokens based on presence. | No | 0.0 | --presence_penalty 0.6 |
| --max_tokens | Maximum number of tokens to generate. | No | - | --max_tokens 1500 |
| --stream | Stream output tokens as they are generated. | No | false | --stream true |
| --ngl | Number of model layers to offload to the GPU. | No | - | --ngl 4 |
| --ctx_len | Maximum context length in tokens. | No | - | --ctx_len 1024 |
| --engine | Compute engine for running the model. | No | - | --engine CUDA |
| --prompt_template | Template for the prompt structure. | No | - | --prompt_template "###" |
| --system_template | Template for system-level instructions. | No | - | --system_template "SYSTEM" |
| --user_template | Template for user inputs. | No | - | --user_template "USER" |
| --ai_template | Template for AI responses. | No | - | --ai_template "ASSISTANT" |
| --os | Operating system environment. | No | - | --os Ubuntu |
| --gpu_arch | GPU architecture specification. | No | - | --gpu_arch A100 |
| --quantization_method | Quantization method for model weights. | No | - | --quantization_method int8 |
| --precision | Floating-point precision for computations. | No | float32 | --precision float16 |
| --tp | Tensor parallelism degree. | No | - | --tp 4 |
| --trtllm_version | Version of the TRT-LLM library. | No | - | --trtllm_version 2.0 |
| --text_model | The model used for text generation. | No | - | --text_model llama2 |
| --files | File path or resources associated with the model. | No | - | --files config.json |
| --created | Creation date of the model. | No | - | --created 2024-01-01 |
| --object | The object type (e.g., model or file). | No | - | --object model |
| --owned_by | The owner or creator of the model. | No | - | --owned_by "Company" |
| --seed | Seed for random number generation. | No | - | --seed 42 |
| --dynatemp_range | Range for dynamic temperature scaling. | No | - | --dynatemp_range 0.7-1.0 |
| --dynatemp_exponent | Exponent for dynamic temperature scaling. | No | - | --dynatemp_exponent 1.2 |
| --top_k | Top-K sampling to limit token selection. | No | - | --top_k 50 |
| --min_p | Minimum probability threshold for tokens. | No | - | --min_p 0.1 |
| --tfs_z | Token frequency selection scaling factor. | No | - | --tfs_z 0.5 |
| --typ_p | Typicality-based token selection probability. | No | - | --typ_p 0.9 |
| --repeat_last_n | Number of last tokens to consider for repetition penalty. | No | - | --repeat_last_n 64 |
| --repeat_penalty | Penalty for repeating tokens. | No | - | --repeat_penalty 1.2 |
| --mirostat | Mirostat sampling method for stable generation. | No | - | --mirostat 1 |
| --mirostat_tau | Target entropy for Mirostat. | No | - | --mirostat_tau 5.0 |
| --mirostat_eta | Learning rate for Mirostat. | No | - | --mirostat_eta 0.1 |
| --penalize_nl | Penalize newlines in generation. | No | false | --penalize_nl true |
| --ignore_eos | Ignore the end-of-sequence token. | No | false | --ignore_eos true |
| --n_probs | Number of probability outputs to return. | No | - | --n_probs 5 |
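
For example, to update a model's sampling settings in its model.yaml (the model ID and values are illustrative, drawn from the examples above):


cortex models update --model_id my_model --temperature 0.8 --top_p 0.9 --ctx_len 1024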

cortex models import

info

This CLI command calls the following API endpoint:

This command imports a local model using the model's GGUF file.

Usage:


cortex models import --model_id <model_id> --model_path </path/to/your/model.gguf>

Options:

| Option | Description | Required | Default value | Example |
|--------|-------------|----------|---------------|---------|
| -h, --help | Display help for the command. | No | - | -h |
| --model_id | The identifier of the model. | Yes | - | mistral |
| --model_path | The path to the model source file. | Yes | - | /path/to/your/model.gguf |
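
For example, to import a GGUF file already on disk under a new model ID (the ID and path are illustrative, taken from the examples above):


cortex models import --model_id mistral --model_path /path/to/your/model.gguf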