Software vulnerabilities are flaws in code that malicious actors can exploit. Language models pre-trained on source code, such as CodeBERT, GraphCodeBERT, and CodeT5, can detect these vulnerabilities, explain them, and even recommend patches, which makes them useful tools for organizations looking to strengthen their security posture.
AIBugHunter, a VSCode extension, already uses such models to support software security during development.
While ChatGPT and other large language models excel at code-related tasks, no comprehensive study has assessed their potential across the entire vulnerability workflow, from prediction and classification to severity estimation and repair.
Recently, cybersecurity researchers from Monash University, Clayton, Australia, explored ChatGPT's use in software vulnerability tasks, including vulnerability prediction, classification, severity estimation, and repair.
Previous studies have examined large language models for automated program repair, but not the latest ChatGPT versions.
The researchers analyzed ChatGPT's ability on four vulnerability tasks: function and line-level prediction, CWE-ID classification, severity estimation, and automated repair.
ChatGPT's 1.7 trillion parameters vastly exceed those of source-code-oriented models like CodeBERT. Because ChatGPT's weights are proprietary, fine-tuning it for vulnerability tasks is not possible, making prompt-based usage essential.
An example prompt for function and line-level vulnerability prediction (Source – Arxiv)
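A prompt of this kind can be assembled programmatically. The following minimal Python sketch illustrates the idea; the template wording, the `make_prediction_prompt` helper, and the example function are assumptions for illustration, not the paper's exact prompt.

```python
def make_prediction_prompt(function_source: str) -> str:
    # Hypothetical template asking for both a function-level verdict
    # and line-level localization, as in the study's two prediction tasks.
    return (
        "Analyze the following C/C++ function.\n"
        "1. Is the function vulnerable? Answer YES or NO.\n"
        "2. If YES, list the line numbers of the vulnerable lines.\n\n"
        f"```\n{function_source}\n```"
    )

# Example: a function with a classic unbounded-copy weakness.
prompt = make_prediction_prompt(
    "void copy(char *dst, char *src) {\n    strcpy(dst, src);\n}"
)
```

The prompt string would then be sent as a user message to the chat model; the reply must be parsed back into a label and a list of line numbers.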
The researchers evaluated ChatGPT (gpt-3.5-turbo and gpt-4) against code-specific models.
They compared it with AIBugHunter, CodeBERT, GraphCodeBERT, and VulExplainer on four vulnerability tasks using Big-Vul and CVEFixes datasets, addressing four research questions.
Here are all four research questions, along with their respective results:
(RQ1) How accurate is ChatGPT for function and line-level vulnerability predictions?
(RQ2) How accurate is ChatGPT for vulnerability type classification?
(RQ3) How accurate is ChatGPT for vulnerability severity estimation?
(RQ4) How accurate is ChatGPT for automated vulnerability repair?
Prompt for CWE-ID classification (Source – Arxiv)
ChatGPT failed to produce correct repair patches, whereas fine-tuned baselines repaired 7%-30% of vulnerabilities. BLEU and METEOR scores confirm that the baseline patches are closer to the ground-truth patches. This highlights the difficulty of automated vulnerability repair and suggests that ChatGPT requires domain-specific fine-tuning for the task.
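BLEU measures how much n-gram overlap a generated patch shares with the ground-truth patch. The following self-contained Python sketch implements a simplified sentence-level BLEU (modified n-gram precision, geometric mean, brevity penalty, no smoothing) to illustrate the idea; it is not the exact scorer used in the study.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, candidate, max_n=4):
    # Modified n-gram precision for n = 1..max_n.
    precisions = []
    for n in range(1, max_n + 1):
        ref, cand = ngrams(reference, n), ngrams(candidate, n)
        overlap = sum((ref & cand).values())          # clipped matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

# Example: comparing a generated patch to the ground-truth patch, token by token.
reference = "memcpy ( dst , src , n )".split()
candidate = "memcpy ( dst , src , len )".split()
score = bleu(reference, candidate)
```

A score near 1.0 means the generated patch is nearly identical to the ground truth, which is why the baselines' higher BLEU and METEOR scores indicate their patches are closer to the true fixes.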