Because zero-day phishing and malware campaigns are so hard to recognize, an effective defense draws on experience with past threats. One method for detecting similarities between new files and known threats is to create a type of digital fingerprint known as a "fuzzy hash."
ssdeep is a program designed to generate fuzzy hashes. It detects similar file content by looking for patterns that persist in the code. In other words, even if some parts of the code change, certain elements are likely to remain the same and provide clues for identifying malware.
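To make the idea concrete, here is a minimal sketch of a toy piecewise hash in Python. It is a simplified illustration, not ssdeep's actual algorithm, and its output is not compatible with ssdeep hashes. The names `toy_fuzzy_hash`, `WINDOW`, and `BLOCK_SIZE`, the chosen constants, and the sample pages are all assumptions made for this example.

```python
import string

# Toy context-triggered piecewise hash. A simplified illustration only:
# it is NOT ssdeep's algorithm and is not compatible with ssdeep hashes.
ALPHABET = string.ascii_letters + string.digits + "+/"  # 64 output symbols
WINDOW = 7        # bytes of local context (assumed value)
BLOCK_SIZE = 16   # roughly the average piece length (hypothetical choice)
FNV_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def toy_fuzzy_hash(data: bytes) -> str:
    """Cut data at content-defined boundaries and emit one symbol per piece."""
    symbols = []
    piece_hash = FNV_BASIS
    for i, byte in enumerate(data):
        # Fold the byte into the hash of the current piece (FNV-1a step).
        piece_hash = ((piece_hash ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        # A boundary depends only on the last WINDOW bytes, so an edit
        # early in the file does not move boundaries later in the file.
        window = data[max(0, i - WINDOW + 1): i + 1]
        if sum(window) % BLOCK_SIZE == BLOCK_SIZE - 1:
            symbols.append(ALPHABET[piece_hash % 64])
            piece_hash = FNV_BASIS  # start hashing a new piece
    symbols.append(ALPHABET[piece_hash % 64])  # last (possibly empty) piece
    return "".join(symbols)


# Two hypothetical pages that differ only in the <title> text.
shared_body = b"<body><form action='/login'><input name='email'></form></body>"
page_a = b"<title>Log in</title>" + shared_body
page_b = b"<title>Sign in to continue</title>" + shared_body

print(toy_fuzzy_hash(page_a))
print(toy_fuzzy_hash(page_b))
```

Because each symbol summarizes only one piece of the input, a local change alters the symbols for the pieces it touches, while pieces that fall after the next shared boundary keep the same symbols. A cryptographic hash, by contrast, changes completely when a single byte changes.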
Although ssdeep is efficient at malware detection on its own, it performs better at detecting new threats when supplemented with advanced AI analytics. In this blog, you will learn how ssdeep can improve phishing detection and how it is implemented in Check Point's Zero Phishing to actively detect and block phishing and malicious web campaigns.
Fuzzy Hashing for Phishing and Malware Detection
For a security team that maintains large databases of webpages, fuzzy hashing can uncover important correlations between seemingly unrelated domains and known malicious campaigns.
Using the ssdeep fuzzy hashing program, we can build an effective phishing campaign detection and clustering system that groups together web pages caught in the wild on different domains with similar HTML source code.
Using this methodology, Check Point has already identified thousands of active phishing clusters, helping to protect potential victims globally.
Why ssdeep Cluster Detection is Essential for New Threats
Take, for instance, a Meta phishing campaign. Two phishing pages might differ enough to evade a classic signature detection algorithm:
Figure 1 – Screenshots of Two Facebook Phishing Campaigns
These pages, hosted on two unrelated domains using popular web hosting services, have similar structures but show differences in the text:
- feedbacdeveloper-case[.]d3nstmqzpmeow6[.]amplifyapp[.]com
- personal-interests-2437e1[.]netlify[.]app
Comparing the HTML code reveals minor differences in the <title> and <link> tags, and other elements.
Figure 2 – A Diff Checker Tool Showing Differences Between the Two Webpages' Source Code
As expected, calculating the SHA256 for these two files results in completely different hashes.
However, comparing their ssdeep hashes reveals a high level of similarity between the files: the similarity score calculated by the program is 97%, indicating that the files contain very similar data.
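The contrast between the two kinds of hash can be sketched in Python. This is an illustration under stated assumptions, not the pipeline used to produce the score above: the two pages are hypothetical stand-ins for the real phishing pages, and `similarity_score` is a helper name invented for this example. If the third-party `ssdeep` Python binding is installed, the sketch uses its `hash` and `compare` functions; otherwise it falls back to the standard library's `difflib`, which compares the raw text and is not a fuzzy hash.

```python
import hashlib
from difflib import SequenceMatcher

try:
    import ssdeep  # optional third-party binding, assumed to be installed
except ImportError:
    ssdeep = None


def similarity_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two documents."""
    if ssdeep is not None:
        return ssdeep.compare(ssdeep.hash(a), ssdeep.hash(b))
    # Stand-in only: difflib compares the raw text directly. It is NOT a
    # fuzzy hash and will not reproduce ssdeep's scores.
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())


# Hypothetical pages that differ only in the <title> and <link> tags.
TEMPLATE = """<html><head><title>{title}</title>
<link rel="stylesheet" href="{css}">
</head><body>
<form method="post" action="/verify">
<input name="email" type="text"><input name="pass" type="password">
<button type="submit">Continue</button>
</form></body></html>"""

page_a = TEMPLATE.format(title="Account Review", css="/static/a1.css")
page_b = TEMPLATE.format(title="Appeal Form", css="/assets/b7.css")

sha_a = hashlib.sha256(page_a.encode("utf-8")).hexdigest()
sha_b = hashlib.sha256(page_b.encode("utf-8")).hexdigest()
score = similarity_score(page_a, page_b)

print("SHA256 equal:", sha_a == sha_b)
print("similarity score:", score)
```

The SHA256 digests differ because any change to the input produces an unrelated digest, while the similarity score stays on a 0 to 100 scale that can be thresholded. Inputs this short are only suitable for demonstration; the score for real pages depends on the full HTML source.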
By using cluster methodology, new threats can be identified based on their similarity to known threats, even when they are not identical. Traditional signature-based solutions might miss these connections, but ssdeep helps ensure that similar threats are detected.
Combating the Rise of Phishing Kits with Cluster Methodology and Databases
The rise of automated phishing kits and Phishing-as-a-Service has made it easier to deploy phishing attacks. These kits allow fraudsters with minimal technical skills to create fake websites or send spoofed emails and texts without writing code.
When a phishing kit generates a new campaign targeting a specific brand, the base code of the phishing page is typically prewritten. Different attackers might tweak the text or images within the spoofed page, but the core remains the same.
Sophisticated brand-spoofing detection tools use cluster methodology to build connections between known and new websites that are similar. When a cluster is generated for a webpage produced by a phishing kit, all potential phishing pages created by other attackers using the same tool are blocked, regardless of the hosting domain’s reputation. This methodology effectively blocks attacks generated by popular phishing kits.
Over time, Check Point has amassed a substantial collection of phishing campaign source code hosted on various domains. Given the short lifespan of a hosted phishing page on a single domain, the process involves regularly cleaning the data to retain only active campaigns.
The ssdeep hash is calculated for each extracted HTML code, creating an indexed database of ssdeep hashes, URLs, and targeted spoofed brands.
This database allows for the systematic generation of similarity clusters. By comparing each ssdeep hash to others in the database, the similarity score is recorded for any match where it is above zero.
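The indexing and comparison steps described above can be sketched as follows. This is a minimal illustration, not Check Point's implementation: `PageRecord`, `pairwise_scores`, and `stand_in_compare` are hypothetical names, the URLs and hash strings are placeholders rather than real ssdeep hashes, and the comparison function is a `difflib` stand-in that could be swapped for a real fuzzy hash comparison.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class PageRecord:
    """One indexed row: where the page was seen, which brand it spoofs,
    and the fuzzy hash of its HTML."""
    url: str
    brand: str
    fuzzy_hash: str


def stand_in_compare(hash_a: str, hash_b: str) -> int:
    """Stand-in 0-100 score; a real system would compare ssdeep hashes."""
    ratio = SequenceMatcher(None, hash_a, hash_b, autojunk=False).ratio()
    return round(100 * ratio)


def pairwise_scores(records, compare=stand_in_compare):
    """Compare every pair of records and keep matches scoring above zero."""
    edges = []
    for left, right in combinations(records, 2):
        score = compare(left.fuzzy_hash, right.fuzzy_hash)
        if score > 0:
            edges.append((left.url, right.url, score))
    return edges


# Placeholder records; the hash strings are NOT real ssdeep output.
records = [
    PageRecord("https://a.example/login", "Meta", "abcdef"),
    PageRecord("https://b.example/verify", "Meta", "abcdxf"),
    PageRecord("https://c.example/wallet", "CryptoCo", "zzzzzz"),
]

for left_url, right_url, score in pairwise_scores(records):
    print(left_url, right_url, score)
```

Comparing every pair grows quadratically with the number of records, which is one reason a flat index becomes hard to work with as the database grows.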
As the database grows, linking multiple objects with similarity scores becomes complex, necessitating a transition from a simple index to a more sophisticated graph database. This approach is particularly effective when linking a single object to many others while considering the correlation strength between different objects.
This technique forms clusters by connecting highly correlated nodes. While isolated nodes appear on the periphery, the center reveals strongly correlated clusters of nodes, each representing a different phishing campaign derived from unique source code hosted on a unique domain.
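One simple way to form such clusters is to treat each page as a node, keep only the strong similarity edges, and collect the connected groups. The sketch below does this in memory with a union-find structure as a stand-in for a graph database. The function name `build_clusters`, the threshold of 80, and the sample edges are assumptions chosen for illustration, not values taken from the production system.

```python
from collections import defaultdict


def build_clusters(edges, min_score=80):
    """Group nodes connected by edges scoring at least min_score.

    edges: iterable of (node_a, node_b, score) tuples.
    Returns a list of sets, largest first; weakly linked nodes stay alone.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right, score in edges:
        # Look up both endpoints so weakly linked nodes are still recorded.
        root_left, root_right = find(left), find(right)
        if score >= min_score and root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical similarity edges between pages hosted on different domains.
sample_edges = [
    ("a.example", "b.example", 97),
    ("b.example", "c.example", 85),
    ("c.example", "d.example", 20),  # too weak to join the cluster
    ("e.example", "f.example", 90),
]

for cluster in build_clusters(sample_edges):
    print(sorted(cluster))
```

Raising `min_score` yields smaller, tighter clusters, while lowering it merges loosely related pages, so the threshold is a tuning choice rather than a fixed property of the method.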
Visualizing the Clusters: A Look at Malicious Webpages
Now let’s see the results beyond data and graphs by examining the actual malicious webpages. These pages might appear different, but they are correlated.
Opening our sandbox and exploring different domains within the same cluster reveals the following:
Although the logos and colors vary, when viewed side-by-side, it’s evident that all the webpages in this cluster were created by the same entity.
Examining the source code shows that most of the code is similar, with key differences including:
- Brand logo
- Page title
- Contact information (email and phone number)
- CSS classes
Another popular cluster involves crypto-related webpages. Although these pages might seem to represent different companies, they are part of the same family:
Here, the differences are more apparent. Each page in this cluster represents a different brand, with varying contact details, images, and brand colors. Despite the weaker correlation in this cluster, the similarities are still strong enough to link these webpages.
Summary
Comparing previously unseen URLs to ssdeep-based clusters enables us to block threats solely based on their high similarity and correlation to known malicious clusters in our database. This method not only enhances phishing detection but also helps preemptively block potential threats. ThreatCloud AI currently protects tens of thousands of organizations from phishing attacks using precise and accurate methodologies.
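The lookup step described above can be sketched as a comparison of a new page's fuzzy hash against the hashes stored for each known cluster. This is a minimal sketch under stated assumptions: `match_cluster`, `stand_in_compare`, the cluster names, the threshold of 80, and the placeholder hash strings are all hypothetical, and the `difflib` comparison stands in for a real ssdeep comparison.

```python
from difflib import SequenceMatcher


def stand_in_compare(hash_a: str, hash_b: str) -> int:
    """Stand-in 0-100 score; a real system would compare ssdeep hashes."""
    ratio = SequenceMatcher(None, hash_a, hash_b, autojunk=False).ratio()
    return round(100 * ratio)


def match_cluster(new_hash, clusters, compare=stand_in_compare,
                  block_threshold=80):
    """Return (cluster_name, score) for the best match at or above the
    threshold, or (None, best_score) when nothing is similar enough."""
    best_name, best_score = None, 0
    for name, known_hashes in clusters.items():
        for known in known_hashes:
            score = compare(new_hash, known)
            if score > best_score:
                best_name, best_score = name, score
    if best_score >= block_threshold:
        return best_name, best_score
    return None, best_score


# Placeholder clusters; the strings are NOT real ssdeep hashes.
known_clusters = {
    "meta-kit": ["abcdefgh", "abcdefgx"],
    "crypto-kit": ["12345678"],
}

cluster, score = match_cluster("abcdefgy", known_clusters)
if cluster is not None:
    print("block: matches cluster", cluster, "with score", score)
else:
    print("no cluster match; best score", score)
```

A page that returns no match is not necessarily benign; in this sketch it simply falls through to whatever other detection applies.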
Investigating malicious campaigns within the same cluster improves our understanding of emerging trends, evasion techniques, and popular spoofed brands, continuously enhancing our detection capabilities. This holistic approach ensures we stay ahead of evolving phishing tactics, providing robust protection for our clients.
This blog is brought to you by Check Point’s Zero-Phishing engine, part of ThreatCloud AI, which revolutionizes Threat Prevention by providing industry-leading security as part of Check Point’s Quantum, Harmony, and CloudGuard product lines.