Lecture
A fingerprint (or digital device fingerprint) is a set of data collected about a user's browser and system that allows the user to be identified. The process of collecting such data is called fingerprinting. Even if cookies are disabled, the fingerprint can be used, in whole or in part, to identify the user.
Analytics services have long used basic information about browser configuration to distinguish legitimate users from suspicious traffic. With the development of scripting technologies, it has become possible to extract more unique parameters specific to a particular device. Combining these parameters forms a unique digital fingerprint.
In 2010, the Electronic Frontier Foundation (EFF) showed that a browser fingerprint contains at least 18.1 bits of entropy, enough to distinguish most users. Later, with the advent of Canvas fingerprinting, another 5.7 bits were added to this estimate, making identification even more accurate.
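To make the bit counts concrete, the following minimal sketch converts bits of entropy into the size of the group a browser can be singled out from. The function name is illustrative, and adding the two figures assumes the canvas signal is independent of the earlier attributes.

```python
def one_in_n(bits: float) -> float:
    """Number of equally likely configurations implied by `bits` of entropy."""
    return 2.0 ** bits

# 18.1 bits: a given browser shares its fingerprint with
# roughly one in 280,000 other browsers.
base = one_in_n(18.1)

# Adding the 5.7 canvas bits (assuming the two sources are independent)
# narrows this to roughly one in 14.6 million.
combined = one_in_n(18.1 + 5.7)

print(f"1 in {base:,.0f}")
print(f"1 in {combined:,.0f}")
```

Each additional bit halves the group of browsers that share the same fingerprint.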
Digital fingerprints are now being actively used to detect fraudulent activities, such as identity theft and bank card abuse. Based on a user's signal profile, formed from their fingerprint, it is possible to predict the likelihood of fraud and take preventive measures.
Until 2017, the fingerprint was tied to a specific browser, making it easy to change the identifier by simply switching to another browser. However, in 2017, a group of American researchers presented a method of cross-browser fingerprinting (CBF), which allows tracking a user regardless of the browser used on the same device.
This technology uses browser-independent parameters such as graphics rendering performance, hardware characteristics (such as the number of processor cores), WebGL and Canvas behavior, and graphics processing features. This makes it possible to identify a device with over 99% accuracy, even if the user changes browsers or clears cookies.
The concept of a device fingerprint is related to the practical value of human fingerprints. Ideally, each machine would have a different fingerprint value (distinctiveness) and this value would never change (stability). In such a case, it would be possible to uniquely identify each machine on the network without the user's consent.
In practice, neither distinctiveness nor stability can be fully achieved. Improving one tends to degrade the other.
A digital fingerprint can mean:
Fingerprinting methods can be passive (covert) or active.
Passive fingerprinting takes place without obviously querying the client machine. These methods rely on precise classification of client parameters such as the TCP/IP configuration, OS fingerprint, IEEE 802.11 (Wi-Fi) settings, and clock skew.
Active fingerprinting assumes that the client will tolerate some degree of invasive querying. The most active method is to install executable code directly on the client machine. Such code can access more deeply hidden attributes, such as the MAC address or unique hardware serial numbers. This information is useful to digital rights management (DRM) software.
The motivation for the concept of device fingerprints stems from the forensic value of human fingerprints.
In order to uniquely differentiate between devices over time, fingerprints must be sufficiently diverse and sufficiently stable. In practice, neither diversity nor stability is fully achievable, and improving one tends to negatively impact the other. For example, including additional browser settings in a browser fingerprint usually increases diversity but also reduces stability, because if the user changes this setting, the browser fingerprint will also change.
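The trade-off described above can be illustrated with a minimal sketch. It assumes a fingerprint is simply a hash over a dictionary of collected attributes; the attribute names and values are invented for the example.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonical (key-sorted) serialization of the attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two hypothetical users with identical coarse attributes collide.
alice = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080"}
bob = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080"}
coarse_collision = fingerprint(alice) == fingerprint(bob)

# Adding a language attribute separates them (more diversity) ...
alice_v2 = {**alice, "language": "en-US"}
bob_v2 = {**bob, "language": "de-DE"}
separated = fingerprint(alice_v2) != fingerprint(bob_v2)

# ... but Alice's fingerprint now breaks when she changes that
# setting (less stability).
alice_v3 = {**alice_v2, "language": "en-GB"}
unstable = fingerprint(alice_v2) != fingerprint(alice_v3)

print(coarse_collision, separated, unstable)
```

Real trackers often compare attributes individually rather than hashing them all together, precisely so that a single changed setting does not discard the whole match.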
Entropy is one of several ways to measure diversity.
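As a rough sketch of how entropy measures diversity, the Shannon entropy of an attribute can be estimated from the values observed across visitors. The sample values below are invented for illustration.

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally common screen resolutions carry 2 bits of entropy.
varied = ["1920x1080", "1366x768", "2560x1440", "1280x720"]

# An attribute that is identical for everyone carries none.
uniform = ["1920x1080"] * 4

print(shannon_entropy(varied))
print(shannon_entropy(uniform))
```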
There are many types of digital fingerprinting that websites use to identify and track users. Here are the main categories and specific methods:
Identification of the user by browser and device characteristics:
User-Agent: browser type, OS, version
Screen resolution & color depth
Installed fonts
Language & timezone
Canvas fingerprinting: drawing an image on a canvas element and analyzing the pixels
WebGL fingerprinting: features of 3D graphics rendering
AudioContext fingerprinting: audio API behavior
Touch support: presence of a touch screen
Battery status API (deprecated but used)
Media devices enumeration: list of available cameras and microphones
Collecting data about a physical device:
Hardware specs: CPU, GPU, RAM
Sensor data: accelerometer, gyroscope
Operating system details
Device orientation and motion
Classic storage-based methods:
HTTP cookies: stored in the browser
LocalStorage/SessionStorage
IndexedDB: can be used to store identifiers
Network parameters analysis:
IP address
TCP/IP stack behavior
TLS fingerprinting: the order and types of ciphers used in an HTTPS connection
DNS requests: what domains are being requested
Tracking user behavior:
Mouse and touch movements
Typing speed and style
Site navigation patterns
Response time to interface elements
More hidden or complex methods:
Evercookies: stored in several places (Flash, HTML5, ETags)
ETag tracking: HTTP headers for caching
HSTS supercookies: using security policy to store identifiers
TLS session resumption tracking
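As one illustration of the network-level methods listed above, a TLS fingerprint can be sketched as a hash over the protocol version and the ordered lists of cipher suites and extensions that a client offers. The layout is similar in spirit to JA3-style fingerprints but simplified, and the numeric values are only sample inputs.

```python
import hashlib

def tls_fingerprint(version: int, ciphers: list, extensions: list) -> str:
    """Hash the offered TLS parameters, preserving the order they were sent in."""
    canonical = ",".join([
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
    ])
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()

# Two hypothetical clients offer the same cipher suites in a different
# order, which is typical of different TLS libraries.
client_a = tls_fingerprint(771, [4865, 4866, 49195], [0, 10, 11])
client_b = tls_fingerprint(771, [4866, 4865, 49195], [0, 10, 11])

print(client_a != client_b)
```

Because the order is preserved rather than sorted, two clients that support the same ciphers still produce different fingerprints.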
Sources of Identifying Information Fingerprints in the OSI Model
Applications that are locally installed on a device can collect a large amount of information about the device's software and hardware, often including unique identifiers such as the MAC address and serial numbers assigned to the machine's hardware. Indeed, programs that use digital rights management use this information to uniquely identify the device.
Even if they are not designed to collect and share identifying information, local applications may unwittingly expose identifying information to the remote parties with which they interact. The most prominent example is web browsers, which have been shown to expose diverse and stable information in sufficient quantity to allow remote identification.
Diverse and stable information can also be collected below the application layer by exploiting the protocols used to transmit data. Here are some examples of such protocols, sorted by OSI model layer:
Passive fingerprinting methods only require the fingerprinter to observe traffic originating from the target device, while active fingerprinting methods require the fingerprinter to initiate connections to the target device. Methods that require interaction with the target device over a connection initiated by the latter are sometimes considered semi-passive.
Covert collection of device parameters below the browser level can be done at several layers of the OSI model. During normal operation, various network protocols transmit or broadcast packets and headers from which client parameters can be inferred. Here are some examples of such protocols:
Client fingerprinting (using a browser) can be done using JavaScript or other scripting languages to collect a large number of parameters. Only two classes of network users seriously limit such tracking: mobile devices and security-hardened programs.
A separate problem is that a user can run several browsers, or even several virtual hosts, on one device. Since each of these can have its own fingerprint, the fingerprint can be changed very quickly unless cross-browser fingerprinting is used.
Collecting large amounts of varied and stable information from web browsers is possible in large part thanks to client-side scripting languages that were introduced in the late 1990s.
Browsers provide their name and version, as well as some compatibility information, in the User-Agent request header. Since this is an arbitrary statement made by the client, it should not be trusted when assessing the client's identity. Instead, the browser type and version can be inferred by observing quirks in its behavior: for example, the order and number of HTTP header fields are unique to each browser family, and most importantly, each browser family and version differs in its implementation of HTML5, CSS, and JavaScript. Such differences can be tested remotely using JavaScript. Hamming distance comparisons of parser behavior have been shown to effectively recognize and differentiate most browser versions.
| Browser family | Deleting a property (navigator object) | Reassigning a property (navigator/screen object) |
|---|---|---|
| Google Chrome | available | available |
| Mozilla Firefox | ignored | ignored |
| Opera | available | available |
| Internet Explorer | ignored | ignored |
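The Hamming distance comparison mentioned above can be sketched as follows. Each browser is represented by a vector of pass/fail results from a set of behavioral tests, and an observed vector is assigned to the nearest known profile. The profiles and family names here are hypothetical placeholders, not measured data.

```python
def hamming(a, b) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical reference profiles: 1 = test passed, 0 = test failed.
KNOWN_PROFILES = {
    "family-a": (1, 1, 0, 1, 0, 1),
    "family-b": (0, 0, 1, 1, 0, 1),
    "family-c": (1, 0, 1, 0, 1, 0),
}

def classify(observed) -> str:
    """Return the known profile closest to the observed behavior vector."""
    return min(KNOWN_PROFILES, key=lambda name: hamming(KNOWN_PROFILES[name], observed))

# The observed vector differs from family-a in one test only.
print(classify((1, 1, 0, 1, 0, 0)))  # family-a
```

Nearest-profile matching tolerates a few deviating tests, so a browser is still recognized when it spoofs its User-Agent header.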
A unique combination of browser extensions or plugins can be added directly to the fingerprint. Extensions can also change the behavior of any other browser attributes, adding additional complexity to the user's fingerprint. Adobe Flash and Java plugins were widely used to access user information before their support was discontinued.
User agents may provide information about the system hardware, such as the phone model, in an HTTP header. Properties about the user's operating system, screen size, screen orientation, and aspect ratio may also be obtained by observing the result of CSS media queries with JavaScript.
A fingerprinter can determine which sites a browser has previously visited by taking a list of candidate sites and testing each one with JavaScript and the CSS :visited selector. Typically, a list of 50 popular websites is enough to build a unique profile of a user's history, as well as to reveal information about the user's interests. However, browsers have since mitigated this risk.
The bounding boxes around letters vary between browsers based on font antialiasing and hinting configuration, and can be measured using JavaScript.
Canvas fingerprinting uses the HTML5 canvas element, which WebGL uses to render 2D and 3D graphics in the browser, to obtain identifying information about the installed graphics driver, graphics card, or graphics processing unit (GPU). Canvas-based techniques can also be used to identify installed fonts. Furthermore, if the user does not have a GPU, CPU information can be provided to the fingerprinter instead.
The canvas fingerprinting script first draws text with a specified font, size, and background color. The image of the text as rendered by the user's browser is then retrieved using the Canvas API's toDataURL method. The hash of the text-encoded data becomes the user's fingerprint. The method has been shown to produce 5.7 bits of entropy. Since this technique obtains information about the user's GPU, the entropy gained is "orthogonal" to the entropy of previous browser fingerprinting techniques, such as screen resolution and JavaScript capabilities.
Benchmarks can be used to determine whether a user's CPU is using AES-NI or Intel Turbo Boost by comparing the CPU time used to execute various simple or cryptographic algorithms.
Specialized APIs can also be used, such as the Battery API, which yields a short-term fingerprint based on the device's actual battery state, or OscillatorNode, which can be invoked to produce a waveform based on user entropy.
The device's hardware identifier, a value produced by a cryptographic hash function specified by the device vendor, can also be queried to build a fingerprint.
Offering a simplified fingerprint
Users can try to reduce their exposure to fingerprinting by choosing a web browser that minimizes the availability of identifying information such as browser fonts, device ID, canvas element rendering, WebGL information, and local IP address.
As of 2017, Microsoft Edge was considered the most fingerprintable browser, followed by Firefox and Google Chrome, Internet Explorer, and Safari. Among mobile browsers, Google Chrome and Opera Mini were the most fingerprintable, followed by mobile Firefox, mobile Edge, and mobile Safari.
Tor Browser disables fingerprinting features such as canvas and the WebGL API and notifies users of fingerprinting attempts.
Spoofing some of the information exposed to the fingerprinter (such as the user agent) can reduce diversity. It can also have the opposite effect: a mismatch between the spoofed information and the browser's real information can set the user apart from everyone who does not use such a strategy.
Spoofing the information differently on each visit to a site can reduce stability.
Different browsers on the same computer usually have different fingerprints, but if neither browser is protected against fingerprinting, the two fingerprints can be identified as coming from the same computer.
Blindly blocking client-side scripts served by third-party domains, and possibly also by first-party domains (for example by disabling JavaScript or using NoScript), can sometimes make websites unusable. The preferred approach is to block only third-party domains that appear to track people, either because they are on a blacklist of tracking domains (the approach taken by most ad blockers) or because the tracking intent is inferred from past observations (the approach taken by Privacy Badger).
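A minimal sketch of the blacklist approach, under stated simplifications: the blocklist entries are made-up domains, and "third party" is approximated by comparing hostnames, whereas a real blocker would compare registrable domains using the Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of tracking domains.
BLOCKLIST = {"tracker.example", "ads.example"}

def host_matches(host: str, domain: str) -> bool:
    """True if `host` is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def should_block(request_url: str, page_url: str, blocklist=BLOCKLIST) -> bool:
    """Block third-party requests whose host is on the blocklist."""
    request_host = (urlsplit(request_url).hostname or "").lower()
    page_host = (urlsplit(page_url).hostname or "").lower()
    if host_matches(request_host, page_host):
        return False  # first-party request (approximation): leave it alone
    return any(host_matches(request_host, d) for d in blocklist)

print(should_block("https://cdn.tracker.example/fp.js", "https://news.example/"))
print(should_block("https://static.news.example/app.js", "https://news.example/"))
```

Unlisted third parties are allowed through, which is what keeps most sites usable compared with blocking all scripts.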
The values of some browser attributes can be randomized without any visible effect for the user. These attributes include audio and canvas rendering, which can be perturbed by a small amount of random noise. This defeats a tracker that looks for a fingerprint exactly matching one it has seen before, while the user does not notice the tiny random changes. The approach was proposed and evaluated by Nikiforakis et al. in 2015 and by Laperdrix et al. in 2017, and it was adopted in the Brave browser in 2020.
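The effect of such noise can be sketched as follows. This is an illustration only, not Brave's implementation: the pixel buffer is synthetic and the per-session seeds are hypothetical.

```python
import hashlib
import random

def add_noise(pixels: bytes, rng: random.Random) -> bytes:
    """Shift each channel value by -1, 0, or +1, clamped to the 0..255 range."""
    return bytes(min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in pixels)

def digest(pixels: bytes) -> str:
    return hashlib.sha256(pixels).hexdigest()

# Synthetic stand-in for the pixels a canvas would really produce.
true_pixels = bytes(100 + (i % 50) for i in range(256))

# Each session draws its noise from a different (hypothetical) seed.
session_1 = add_noise(true_pixels, random.Random(1))
session_2 = add_noise(true_pixels, random.Random(2))

# The hashes no longer match across sessions, yet no channel moved by
# more than one step, which is invisible to the eye.
print(digest(session_1) != digest(session_2))
print(max(abs(a - b) for a, b in zip(true_pixels, session_1)))
```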
Canvas fingerprinting is a browser fingerprinting method for tracking online users that allows websites to identify and track visitors using the HTML5 canvas element instead of browser cookies or other similar means. The method received widespread media coverage in 2014 after researchers from Princeton University and KU Leuven described it in their paper The Web Never Forgets.
Description of canvas fingerprinting
Canvas fingerprinting works by using the HTML5 canvas element. As described by Acar et al.:
When a user visits a page, the fingerprinting script first draws the text with the selected font and size and adds background colors (1). The script then calls the ToDataURL Canvas API method to get the canvas pixel data in dataURL format (2), which is basically a Base64-encoded representation of the binary pixel data. Finally, the script takes a hash of the text-encoded pixel data (3), which serves as a fingerprint...
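The three steps quoted above can be sketched outside the browser. This is a simulation under loud assumptions: `simulated_render` is a made-up stand-in for step (1), since real pixel data comes from the browser's canvas, and the raw bytes stand in for the PNG data that toDataURL would actually encode.

```python
import base64
import hashlib

def simulated_render(text: str, subpixel_bias: int) -> bytes:
    """Step 1 (stand-in): pixel bytes that depend on a hypothetical
    per-machine rendering quirk, modeling GPU and driver differences."""
    return bytes((ord(ch) * 7 + i + subpixel_bias) % 256 for i, ch in enumerate(text))

def to_data_url(pixels: bytes) -> str:
    """Step 2: Base64-encode the binary pixel data as a data URL."""
    return "data:image/png;base64," + base64.b64encode(pixels).decode("ascii")

def canvas_fingerprint(text: str, subpixel_bias: int) -> str:
    """Step 3: hash the text-encoded pixel data."""
    data_url = to_data_url(simulated_render(text, subpixel_bias))
    return hashlib.sha256(data_url.encode("ascii")).hexdigest()

TEXT = "Cwm fjordbank glyphs vext quiz"

machine_a = canvas_fingerprint(TEXT, subpixel_bias=0)
machine_a_again = canvas_fingerprint(TEXT, subpixel_bias=0)
machine_b = canvas_fingerprint(TEXT, subpixel_bias=1)

# Same machine gives the same fingerprint; a rendering difference changes it.
print(machine_a == machine_a_again, machine_a != machine_b)
```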
Differences in the installed graphics processing unit (GPU) or graphics driver cause the fingerprint to vary. The fingerprint may be stored and shared with advertising partners to identify users when they visit affiliated websites. A profile may be created based on the user's browsing activity, allowing advertisers to target ads to the user's inferred demographics and preferences.
Since the fingerprint is primarily based on the browser, operating system, and installed graphics hardware, it does not uniquely identify users. In a small study with hundreds of participants from Amazon's Mechanical Turk, an experimental entropy of 5.7 bits was observed. The authors of the study suggest that more entropy would likely be observed in the wild and with more patterns used in fingerprints. While not enough to identify individual users on its own, this fingerprint can be combined with other sources of entropy to produce a unique identifier. It is claimed that since the method effectively fingerprints the GPU, the entropy is "orthogonal" to that of previous browser fingerprinting methods, such as screen resolution and browser JavaScript capabilities.
Users and their advocacy groups may view fingerprinting as a violation of user privacy. Computer security experts may view the ease of fingerprinting as a browser vulnerability.
Criticism of digital fingerprinting is an important topic, especially in the context of privacy, ethics and control over personal data. Here are the main arguments against the use of digital fingerprinting:
Invisibility to the user: Unlike cookies, fingerprinting works invisibly, so users do not see that they are being tracked.
No Opt-Out: Most methods do not require consent, and the user cannot easily disable fingerprinting as is possible with cookies.
GDPR and other laws require transparency and consent, but fingerprinting often circumvents these regulations.
Collecting data about your device, behavior, location, and demographics allows you to create deeply personalized profiles that can be used:
For advertising manipulation
For discrimination (e.g. by price)
For surveillance and control
"Fingerprinting collects almost everything that can be collected to understand how a person uses a device. It's astonishing in scope and depth" — Debbie Reynolds, privacy expert
Digital fingerprints may be sold or transferred to third parties, including data brokers.
Leaks (like those from Facebook and Cambridge Analytica) show how dangerous it is to store such data.
Cyberattacks can use fingerprints to bypass security systems.
Fingerprints are not always unique - devices with the same configuration may have the same fingerprints.
Changes in the system (updates, browser change) can disrupt the stability of the fingerprint, reducing its effectiveness.
Lack of transparency: Users are often unaware that they are being tracked.
Violation of children's rights: There is no way to determine the age of the user, and children have increased privacy rights.
Trust manipulation: Companies may use fingerprinting under the guise of “improving user experience,” hiding the real purpose.