As the number of computer vulnerabilities and bugs continues to increase, cyber defenders are struggling to track, prioritize, and prevent all the threats and attacks. With more than 213,800 known “keys,” or vulnerabilities, in computer systems, and likely many more unknown, no single person or team can manage all the information. The current method of sharing leads by feeding information into multiple databases is not effective, because there is no map of how adversaries might use most of those bugs to wreak havoc.
To solve this problem, a team of scientists from the Department of Energy’s Pacific Northwest National Laboratory, Purdue University, Carnegie Mellon University, and Boise State University has turned to artificial intelligence. The researchers have integrated three large databases of information about computer vulnerabilities, weaknesses, and likely attack patterns to create an AI-based model that automatically links vulnerabilities to specific lines of attack that adversaries could use to compromise computer systems. The model uses natural language processing and supervised learning to bridge the three separate cybersecurity databases, which cover vulnerabilities, weaknesses, and attacks.
While all three databases have information crucial for cyber defenders, there have been few attempts to knit all three together so that a user can quickly detect and understand possible threats and their origins, and then weaken or prevent these threats and attacks. The new AI model, called VWC-MAP, extends a previous model that linked vulnerabilities and weaknesses, called V2W-BERT, to include attack actions.
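To make the idea of knitting the three databases together concrete, here is a minimal sketch of following links from a vulnerability to its weaknesses and on to attack patterns. It is an illustration only, not the team's code: the IDs, the two dictionaries, and the function name `attacks_for_vulnerability` are all hypothetical, and in the model described above the links are predicted rather than looked up.

```python
# Hypothetical placeholder IDs; real entries would come from the three databases.
VULN_TO_WEAKNESSES = {
    "VULN-1": ["WEAK-A"],
    "VULN-2": ["WEAK-A", "WEAK-B"],
    "VULN-3": ["WEAK-C"],
}
WEAKNESS_TO_ATTACKS = {
    "WEAK-A": ["ATTACK-X", "ATTACK-Y"],
    "WEAK-B": ["ATTACK-Y"],
    # WEAK-C has no attack-pattern link in this toy data.
}


def attacks_for_vulnerability(vuln_id):
    """Follow vulnerability -> weakness -> attack links, keeping first-seen order."""
    attacks = []
    for weakness in VULN_TO_WEAKNESSES.get(vuln_id, []):
        for attack in WEAKNESS_TO_ATTACKS.get(weakness, []):
            if attack not in attacks:  # skip attacks reached via another weakness
                attacks.append(attack)
    return attacks


if __name__ == "__main__":
    for vuln in VULN_TO_WEAKNESSES:
        print(vuln, "->", attacks_for_vulnerability(vuln))
```

A vulnerability whose weakness has no attack-pattern link simply yields an empty list, which mirrors the gap the article describes: most vulnerabilities are not yet mapped to any attack.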
The Benefits of VWC-MAP
The team’s model automatically links vulnerabilities to the appropriate weaknesses with up to 87 percent accuracy, and links weaknesses to appropriate attack patterns with up to 80 percent accuracy. These numbers are much better than what today’s tools provide, but the scientists caution that their new methods need to be tested more widely. The model is open-source, with a portion already available on GitHub, and the team will soon release the rest of the code.
The new AI model will help cyber defenders prioritize threats and respond to them quickly. The higher defenders go in classifying bugs, moving from individual vulnerabilities up to more general weaknesses and attack patterns, the more threats they can stop with one action. The ideal goal is to prevent all possible exploitations. Cyber defenders can deal with hundreds of vulnerabilities a day, and they need to know how those could be exploited and what actions to take to mitigate the threats. The AI model provides interpretation and support for prioritization so that defenders can understand where they are vulnerable and what actions they can take.
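One way to picture "stopping more threats with one action" is to group open vulnerabilities by the weakness they share and rank weaknesses by how many vulnerabilities each one covers. The sketch below assumes the vulnerability-to-weakness links are already available. The IDs, the `OPEN_VULNS` data, and the `rank_weaknesses` helper are hypothetical, and counting is only one possible prioritization rule, not necessarily the one the team uses.

```python
from collections import defaultdict

# Hypothetical open vulnerabilities and the weaknesses each one is linked to.
OPEN_VULNS = {
    "VULN-1": ["WEAK-A"],
    "VULN-2": ["WEAK-A", "WEAK-B"],
    "VULN-3": ["WEAK-C"],
    "VULN-4": ["WEAK-A"],
}


def rank_weaknesses(vuln_to_weaknesses):
    """Return (weakness, vulnerability count) pairs, most widely shared first."""
    covered = defaultdict(set)
    for vuln, weaknesses in vuln_to_weaknesses.items():
        for weakness in weaknesses:
            covered[weakness].add(vuln)
    # Sort by descending count, then by ID so ties are ordered deterministically.
    return sorted(
        ((weakness, len(vulns)) for weakness, vulns in covered.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    for weakness, count in rank_weaknesses(OPEN_VULNS):
        print(f"{weakness}: covers {count} open vulnerabilities")
```

In this toy data, a mitigation aimed at WEAK-A would address three of the four open vulnerabilities at once.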
The Challenges Ahead
One hurdle for the new AI model is the dearth of labeled data for training: currently, fewer than 1% of vulnerabilities are linked to specific attacks. To overcome this issue, the team fine-tuned pretrained natural language models, using both an auto-encoder (BERT) and a sequence-to-sequence model (T5). The first approach used a language model to associate vulnerabilities (CVEs, entries in the Common Vulnerabilities and Exposures list) with weaknesses (CWEs, from the Common Weakness Enumeration), and then CWEs with attack patterns (CAPECs, from the Common Attack Pattern Enumeration and Classification), through binary link prediction. The second approach used sequence-to-sequence techniques to translate CWEs to CAPECs, with intuitive prompts for ranking the associations. The two approaches generated very similar results, which were then validated by the cybersecurity expert on the team.
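The binary link prediction framing in the first approach can be sketched as follows: every (vulnerability, weakness) pair becomes one yes/no training example, and at prediction time the candidate weaknesses are ranked by a pair score. This is a sketch under stated assumptions, not the team's implementation. The descriptions and IDs are hypothetical, and `overlap_score` is a simple word-overlap stand-in for the fine-tuned language model, used here only so the example runs without any external library.

```python
import re

# Hypothetical descriptions; real text would come from the public databases.
WEAKNESSES = {
    "WEAK-A": "improper neutralization of special elements in an sql command",
    "WEAK-B": "out of bounds write past the end of a memory buffer",
}
VULNS = {
    "VULN-1": "sql command built from unsanitized user input allows injection",
}
KNOWN_LINKS = {("VULN-1", "WEAK-A")}  # the scarce labeled data


def tokens(text):
    """Lowercase a description and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(vuln_text, weakness_text):
    """Stand-in scorer: Jaccard word overlap, NOT a fine-tuned language model."""
    a, b = tokens(vuln_text), tokens(weakness_text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_training_pairs(vuln_texts, weakness_texts, known_links):
    """Turn every (vulnerability, weakness) pair into a binary example: 1 = linked."""
    return [
        (v_id, w_id, 1 if (v_id, w_id) in known_links else 0)
        for v_id in vuln_texts
        for w_id in weakness_texts
    ]


def rank_weaknesses_for(vuln_text, weakness_texts, scorer=overlap_score):
    """Score each candidate weakness against one vulnerability, best match first."""
    scored = [(w_id, scorer(vuln_text, text)) for w_id, text in weakness_texts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(make_training_pairs(VULNS, WEAKNESSES, KNOWN_LINKS))
    print(rank_weaknesses_for(VULNS["VULN-1"], WEAKNESSES))
```

Framing the task as pair scoring rather than fixed-label classification means a weakness that was never seen in training can still be ranked from its description alone, which matters when labeled links are scarce. The same pattern would apply to the weakness-to-attack-pattern step.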
The team hopes that cybersecurity experts can put this open-source platform to the test and provide feedback. There are thousands upon thousands of bugs or vulnerabilities out there, and new ones are created and discovered every day. The team needs to develop ways to stay ahead of these vulnerabilities, not only the ones that are known but also the ones that haven’t been discovered yet. The AI model will help cyber defenders prioritize and respond to threats, but it is just one tool in the arsenal of cyber defense.
The new AI model developed by the scientists is a useful tool for cyber defenders to prioritize threats and respond to them quickly. The model integrates information from three large databases to link vulnerabilities, weaknesses, and attack actions. The model is open-source and uses natural language processing and supervised learning to classify the bugs into general categories and understand how an attack might proceed. While the model is still in its early stages and needs to be tested more widely, it has already shown promising results.