On the Generation of Transient Numeric Identifiers

SI6 Networks
Segurola y Habana 4310 7mo piso
Ciudad Autonoma de Buenos Aires
Argentina
https://www.si6networks.com

Quarkslab
Segurola y Habana 4310 7mo piso
Ciudad Autonoma de Buenos Aires
Argentina
https://www.quarkslab.com

Privacy Enhancements and Assessments
security, vulnerability, algorithm, attack, fingerprinting

This document performs an analysis of the security and privacy implications of different types of "transient numeric identifiers" used in IETF protocols and tries to categorize them based on their interoperability requirements and their associated failure severity when such requirements are not met. Subsequently, it provides advice on possible algorithms that could be employed to satisfy the interoperability requirements of each identifier category while minimizing the negative security and privacy implications, thus providing guidance to protocol designers and protocol implementers. Finally, it describes a number of algorithms that have been employed in real implementations to generate transient numeric identifiers and analyzes their security and privacy properties. This document is a product of the Privacy Enhancements and Assessments Research Group (PEARG) in the IRTF.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Research Task Force (IRTF). The IRTF publishes the results of Internet-related research and development activities. These results might not be suitable for deployment. This RFC represents the consensus of the Privacy Enhancements and Assessments Research Group of the Internet Research Task Force (IRTF). Documents approved for publication by the IRSG are not candidates for any level of Internet Standard; see Section 2 of RFC 7841. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at .

Copyright Notice

Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

Introduction
Terminology
Threat Model
Issues with the Specification of Transient Numeric Identifiers
Protocol Failure Severity
Categorizing Transient Numeric Identifiers
Common Algorithms for Transient Numeric Identifier Generation
- Category #1: Uniqueness (Soft Failure)
- Category #2: Uniqueness (Hard Failure)
- Category #3: Uniqueness, Stable within Context (Soft Failure)
- Category #4: Uniqueness, Monotonically Increasing within Context (Hard Failure)
Common Vulnerabilities Associated with Transient Numeric Identifiers
- Network Activity Correlation
- Information Leakage
- Fingerprinting
- Exploitation of the Semantics of Transient Numeric Identifiers
- Exploitation of Collisions of Transient Numeric Identifiers
- Exploitation of Predictable Transient Numeric Identifiers for Injection Attacks
- Cryptanalysis
Vulnerability Assessment of Transient Numeric Identifiers
- Category #1: Uniqueness (Soft Failure)
- Category #2: Uniqueness (Hard Failure)
- Category #3: Uniqueness, Stable within Context (Soft Failure)
- Category #4: Uniqueness, Monotonically Increasing within Context (Hard Failure)
IANA Considerations
Security Considerations
References
- Normative References
- Informative References
Algorithms and Techniques with Known Issues
- Predictable Linear Identifiers Algorithm
- Random-Increments Algorithm
- Reusing Identifiers Across Different Contexts
Acknowledgements
Authors' Addresses

Introduction

Networking protocols employ a variety of transient numeric identifiers for different protocol objects, such as IPv4 and IPv6 Identification values, IPv6 Interface Identifiers (IIDs), transport-protocol ephemeral port numbers, TCP Initial Sequence Numbers (ISNs), NTP Reference IDs (REFIDs), and DNS IDs. These identifiers typically have specific interoperability requirements (e.g., uniqueness during a specified period of time) and an associated failure severity when those requirements are not met.

For more than 30 years, a large number of implementations of IETF protocols have been subject to a variety of attacks, with effects ranging from Denial of Service (DoS) or data injection to information leakages that could be exploited for pervasive monitoring. The root cause of these issues has been, in many cases, the poor selection of transient numeric identifiers in such protocols, usually as a result of insufficient or misleading specifications. While it is generally trivial to identify an algorithm that can satisfy the interoperability requirements of a given transient numeric identifier, empirical evidence exists that doing so without negatively affecting the security and/or privacy properties of the aforementioned protocols is prone to error. For example, implementations have been subject to security and/or privacy issues resulting from:

predictable IPv4 or IPv6 Identification values,
predictable IPv6 IIDs,
predictable transport-protocol ephemeral port numbers,
predictable TCP Initial Sequence Numbers (ISNs),
predictable initial timestamps in TCP timestamps options, and
predictable DNS IDs.

Recent history indicates that, when new protocols are standardized or new protocol implementations are produced, the security and privacy properties of the associated transient numeric identifiers tend to be overlooked, and inappropriate algorithms to generate such identifiers are either suggested in the specifications or selected by implementers. As a result, advice in this area is warranted.

We note that the use of cryptographic techniques may readily mitigate some of the issues arising from predictable transient numeric identifiers. For example, cryptographic authentication can readily mitigate data injection attacks even in the presence of predictable transient numeric identifiers (such as "sequence numbers"). However, use of flawed algorithms (such as global counters) for generating transient numeric identifiers could still result in information leakages even when cryptographic techniques are employed.

This document contains a non-exhaustive survey of transient numeric identifiers employed in various IETF protocols and aims to categorize such identifiers based on their interoperability requirements and the associated failure severity when such requirements are not met. Subsequently, it provides advice on possible algorithms that could be employed to satisfy the interoperability requirements of each category while minimizing negative security and privacy implications. Finally, it describes several algorithms that have been employed in real implementations to meet such requirements and analyzes their security and privacy properties.

This document represents the consensus of the Privacy Enhancements and Assessments Research Group (PEARG).

Terminology

Transient Numeric Identifier: A data object in a protocol specification that can be used to definitively distinguish a protocol object (a datagram, network interface, transport-protocol endpoint, session, etc.) from all other objects of the same type, in a given context. Transient numeric identifiers are usually defined as a series of bits and represented using integer values. These identifiers are typically dynamically selected, as opposed to statically assigned numeric identifiers. We note that different transient numeric identifiers may have additional requirements or properties depending on their specific use in a protocol. We use the term "transient numeric identifier" (or simply "numeric identifier" or "identifier" as short forms) as a generic term to refer to any data object in a protocol specification that satisfies the identification property stated above.
Failure Severity: The interoperability consequences of a failure to comply with the interoperability requirements of a given identifier. Severity considers the worst potential consequence of a failure, determined by the system damage and/or time lost to repair the failure. In this document, we define two types of failure severity: "soft failure" and "hard failure".
Soft Failure: A recoverable condition in which a protocol does not operate in the prescribed manner but normal operation can be resumed automatically in a short period of time. For example, a simple packet-loss event that is subsequently recovered with a packet retransmission can be considered a soft failure.
Hard Failure: A non-recoverable condition in which a protocol does not operate in the prescribed manner or it operates with excessive degradation of service. For example, an established TCP connection that is aborted due to an error condition constitutes, from the point of view of the transport protocol, a hard failure, since it enters a state from which normal operation cannot be resumed.

Threat Model

Throughout this document, we do not consider on-path attacks. That is, we assume the attacker does not have physical or logical access to the system(s) being attacked and that the attacker can only observe traffic explicitly directed to the attacker. Similarly, an attacker cannot observe traffic transferred between the sender and the receiver(s) of a target protocol but may be able to interact with any of these entities, including by, e.g., sending any traffic to them to sample transient numeric identifiers employed by the target hosts when communicating with the attacker.

For example, when analyzing vulnerabilities associated with TCP Initial Sequence Numbers (ISNs), we consider the attacker is unable to capture network traffic corresponding to a TCP connection between two other hosts. However, we consider the attacker is able to communicate with any of these hosts (e.g., establish a TCP connection with any of them) to, e.g., sample the TCP ISNs employed by these hosts when communicating with the attacker.

Similarly, when considering host-tracking attacks based on IPv6 Interface Identifiers, we consider an attacker may learn the IPv6 address employed by a victim host if, e.g., the address becomes exposed as a result of the victim host communicating with an attacker-operated server. Subsequently, an attacker may perform host-tracking by probing a set of target addresses composed of a set of target prefixes and the IPv6 Interface Identifier originally learned by the attacker. Alternatively, an attacker may perform host-tracking if, e.g., the victim host communicates with an attacker-operated server as it moves from one location to another, thereby exposing its configured addresses. We note that none of these scenarios require the attacker to observe traffic not explicitly directed to the attacker.

Issues with the Specification of Transient Numeric Identifiers

While assessing IETF protocol specifications regarding the use of transient numeric identifiers, we have found that most of the issues discussed in this document arise as a result of one of the following conditions:

protocol specifications that underspecify their transient numeric identifiers
protocol specifications that overspecify their transient numeric identifiers
protocol implementations that simply fail to comply with the specified requirements

A number of IETF protocol specifications underspecified their transient numeric identifiers, thus leading to implementations that were vulnerable to numerous off-path attacks. Examples include the specification of TCP local ports and the specification of the DNS ID.

On the other hand, a number of IETF protocol specifications overspecify some of their associated transient numeric identifiers. For example, one such specification essentially overloads the semantics of IPv6 Interface Identifiers (IIDs) by embedding link-layer addresses in the IPv6 IIDs when the interoperability requirement of uniqueness could be achieved in other ways that do not result in negative security and privacy implications. Similarly, another suggests the use of a global counter for the generation of Identification values when the interoperability requirement of uniqueness per {IPv6 Source Address, IPv6 Destination Address} could be achieved with other algorithms that do not result in negative security and privacy implications.

Finally, there are protocol implementations that simply fail to comply with existing protocol specifications. For example, some popular operating systems still fail to implement transport-protocol ephemeral port randomization or TCP Initial Sequence Number randomization, both of which are recommended by existing specifications.

Protocol Failure Severity

The "Terminology" section defines the concept of "failure severity", along with two types of failure severities that we employ throughout this document: soft and hard. Our analysis of the severity of a failure is performed from the point of view of the protocol in question. However, the corresponding severity on the upper protocol (or application) might not be the same as that of the protocol in question. For example, a TCP connection that is aborted might or might not result in a hard failure of the upper application: if the upper application can establish a new TCP connection without any impact on the application, a hard failure at the TCP protocol may have no severity at the application layer. On the other hand, if a hard failure of a TCP connection results in excessive degradation of service at the application layer, it will also result in a hard failure at the application.

Categorizing Transient Numeric Identifiers

This section includes a non-exhaustive survey of transient numeric identifiers, which are representative of all the possible combinations of interoperability requirements and failure severities found in popular protocols of different layers. Additionally, it proposes a number of categories that can accommodate these identifiers based on their interoperability requirements and their associated failure severity (soft or hard).

Survey of Transient Numeric Identifiers

Identifier	Interoperability Requirements	Failure Severity
IPv6 ID	Uniqueness (for IPv6 address pair)	Soft/Hard (1)
IPv6 IID	Uniqueness (and stable within IPv6 prefix) (2)	Soft (3)
TCP ISN	Monotonically increasing (4)	Hard (4)
TCP initial timestamp	Monotonically increasing (5)	Hard (5)
TCP ephemeral port	Uniqueness (for connection ID)	Hard
IPv6 Flow Label	Uniqueness	None (6)
DNS ID	Uniqueness	None (7)

NOTE:

(1) While a single collision of IPv6 Identification (ID) values would simply lead to a single packet drop (and hence, a "soft" failure), repeated collisions at high data rates might result in self-propagating collisions of IPv6 IDs, thus possibly leading to a hard failure.
(2) While the interoperability requirements are simply that the Interface Identifier results in a unique IPv6 address, for operational reasons, it is typically desirable that the resulting IPv6 address (and hence, the corresponding Interface Identifier) be stable within each network.
(3) While IPv6 Interface Identifiers must result in unique IPv6 addresses, IPv6 Duplicate Address Detection (DAD) allows for the detection of duplicate addresses, and hence, such Interface Identifier collisions can be recovered.
(4) In theory, there are no interoperability requirements for TCP Initial Sequence Numbers (ISNs), since the TIME-WAIT state and TCP's "quiet time" concept take care of old segments from previous incarnations of a connection. However, a widespread optimization allows for a new incarnation of a previous connection to be created if the ISN of the incoming SYN is larger than the last sequence number seen in that direction for the previous incarnation of the connection. Thus, monotonically increasing TCP ISNs allow for such optimization to work as expected and can help avoid connection-establishment failures.
(5) Strictly speaking, there are no interoperability requirements for the initial TCP timestamp employed by a TCP instance (i.e., the TS Value (TSval) in a segment with the SYN bit set). However, some TCP implementations allow a new incarnation of a previous connection to be created if the TSval of the incoming SYN is larger than the last TSval seen in that direction for the previous incarnation of the connection. Thus, monotonically increasing TCP initial timestamps (across connections to the same endpoint) allow for such optimization to work as expected and can help avoid connection-establishment failures.
(6) The IPv6 Flow Label, along with the IPv6 Source Address and the IPv6 Destination Address, is typically employed for load sharing. Reuse of a Flow Label value for the same set {Source Address, Destination Address} would typically cause both flows to be multiplexed onto the same link. However, as long as this does not occur deterministically, it will not result in any negative implications.
(7) DNS IDs are employed, together with the IP Source Address, the IP Destination Address, the transport-protocol Source Port, and the transport-protocol Destination Port, to match DNS requests and responses. However, since an implementation knows which DNS requests were sent for that set of {IP Source Address, IP Destination Address, transport-protocol Source Port, transport-protocol Destination Port, DNS ID}, a collision of DNS IDs would result, if anything, in a small performance penalty (the response would nevertheless be discarded when it is found that it does not answer the query sent in the corresponding DNS query).

Based on the survey above, we can categorize identifiers as follows:

Identifier Categories

Cat #	Category	Sample Numeric IDs
1	Uniqueness (soft failure)	IPv6 Flow Label, DNS ID
2	Uniqueness (hard failure)	IPv6 ID, TCP ephemeral port
3	Uniqueness, stable within context (soft failure)	IPv6 IID
4	Uniqueness, monotonically increasing within context (hard failure)	TCP ISN, TCP initial timestamp

We note that Category #4 could be considered a generalized case of Category #3, in which a monotonically increasing element is added to a stable (within context) element, such that the resulting identifiers are monotonically increasing within a specified context. That is, the same algorithm could be employed for both #3 and #4, given appropriate parameters.

Common Algorithms for Transient Numeric Identifier Generation

The following subsections describe some sample algorithms that can be employed for generating transient numeric identifiers for each of the categories above while mitigating the vulnerabilities analyzed in the "Common Vulnerabilities Associated with Transient Numeric Identifiers" section of this document. All of the variables employed in the algorithms of the following subsections are of "unsigned integer" type, except for the "retry" variable, which is of (signed) "integer" type.

Category #1: Uniqueness (Soft Failure)

The requirement of uniqueness with a soft failure severity can be met with a Pseudorandom Number Generator (PRNG). While most systems provide access to a PRNG, many such PRNG implementations are not cryptographically secure and therefore might be statistically biased or subject to adversarial influence. For example, ISO C rand(3) implementations are not cryptographically secure. On the other hand, a number of systems provide an interface to a Cryptographically Secure PRNG (CSPRNG), which guarantees high entropy, unpredictability, and good statistical distribution of the random values generated. For example, GNU/Linux's CSPRNG implementation is available via the getentropy(3) interface, while OpenBSD's CSPRNG implementation is available via the arc4random(3) and arc4random_uniform(3) interfaces. Where available, these CSPRNGs should be preferred over, e.g., POSIX random(3) or ISO C rand(3) implementations.

In scenarios where a CSPRNG is not readily available to select transient numeric identifiers of Category #1, a security and privacy assessment of employing a regular PRNG should be performed, supporting the implementation decision.

We note that, since the premise is that collisions of transient numeric identifiers of this category only lead to soft failures, in many cases, the algorithm might not need to check the suitability of a selected identifier (i.e., the suitable_id() function, described below, could always return "true").

In scenarios where, e.g., simultaneous use of a given numeric identifier is undesirable and an implementation detects such condition, the implementation may opt to select the next available identifier in the same sequence or select another random number. The "Simple Randomization Algorithm" is an implementation of the former strategy, while "Another Simple Randomization Algorithm" is an implementation of the latter. Typically, the latter results in a more uniform distribution of the generated transient numeric identifiers. However, for transient numeric identifiers where an implementation typically keeps local state about unsuitable/used identifiers, the latter may require many more iterations than the former to generate a suitable transient numeric identifier. This will usually be affected by the current usage ratio of transient numeric identifiers (i.e., the number of numeric identifiers considered suitable / total number of numeric identifiers) and other parameters. Therefore, in such cases, many implementations tend to prefer the "Simple Randomization Algorithm" over "Another Simple Randomization Algorithm".
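As a rough illustration of preferring a CSPRNG, the following C sketch draws an identifier uniformly from [min_id, max_id]. It assumes a Unix-like system that exposes the kernel CSPRNG as /dev/urandom (getentropy(3) or arc4random_uniform(3) could be used instead where available). The names csprng_bytes() and uniform_id() are hypothetical. Draws that would skew the distribution after the "%" reduction are discarded and redrawn (rejection sampling).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fill buf with len bytes from the kernel CSPRNG; returns 0 on success.
 * Assumption: a Unix-like system exposing /dev/urandom. */
static int csprng_bytes(void *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Return a uniformly distributed value in [min_id, max_id] using
 * rejection sampling, or -1 if the CSPRNG could not be read. */
static int64_t uniform_id(uint32_t min_id, uint32_t max_id)
{
    assert(min_id <= max_id);
    uint64_t id_range = (uint64_t)max_id - min_id + 1;
    /* Largest multiple of id_range that does not exceed 2^32. */
    uint64_t limit = (UINT64_C(1) << 32) / id_range * id_range;
    uint32_t r;

    do {
        if (csprng_bytes(&r, sizeof r) != 0)
            return -1;
    } while (r >= limit);   /* reject draws from the biased tail */

    return (int64_t)(min_id + r % id_range);
}
```

A caller could, for example, use uniform_id(49152, 65535) in place of "min_id + (random() % id_range)". Reopening the device on every draw keeps the sketch short; a real implementation would likely keep it open or use a system call.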

Simple Randomization Algorithm

    /* Transient Numeric ID selection function */

    id_range = max_id - min_id + 1;
    next_id = min_id + (random() % id_range);
    retry = id_range;

    do {
        if (suitable_id(next_id)) {
            return next_id;
        }

        if (next_id == max_id) {
            next_id = min_id;
        } else {
            next_id++;
        }

        retry--;
    } while (retry > 0);

    return ERROR;

NOTE:

random() is a PRNG that returns a pseudorandom unsigned integer number of appropriate size. Beware that "adapting" the length of the output of random() with a modulo operator (e.g., C language's "%") may change the distribution of the PRNG. To preserve a uniform distribution, the rejection sampling technique can be used.

suitable_id() is a function that checks, if possible and desirable, whether a candidate numeric identifier is suitable (e.g., whether it is in use or has been recently employed). Depending on how/where the numeric identifier is used, it may or may not be possible (or even desirable) to check whether the numeric identifier is suitable.

All the variables (in this algorithm and all the other algorithms discussed in this document) are unsigned integers, except for "retry", which is a signed integer.

When an identifier is found to be unsuitable, this algorithm selects the next available numeric identifier in sequence. Thus, even when this algorithm selects numeric identifiers randomly, it is biased towards the first available numeric identifier after a sequence of unavailable numeric identifiers. For example, if this algorithm is employed for transport-protocol ephemeral port randomization and the local list of unsuitable port numbers (e.g., registered port numbers that should not be used for ephemeral ports) is significant, an attacker may actually have a significantly better chance of guessing an ephemeral port number.

Assuming the randomness requirements for the PRNG are met, this algorithm does not suffer from any of the issues discussed in "Common Vulnerabilities Associated with Transient Numeric Identifiers".
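The bias described above can be made concrete with a small C sketch. It is an illustration only: the "in_use" table stands in for suitable_id(), the "draw" parameter stands in for the output of random() so that every starting point can be enumerated, and the names select_id() and hits() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_ERROR (-1)

/* Hypothetical suitability check: an identifier is unsuitable if it is
 * marked in a caller-supplied table indexed by (id - min_id). */
static bool suitable_id(const bool *in_use, uint32_t min_id, uint32_t id)
{
    return !in_use[id - min_id];
}

/* Simple Randomization Algorithm; `draw` stands in for random().
 * Assumes a small range (id_range fits in uint32_t). */
static int64_t select_id(uint32_t draw, uint32_t min_id, uint32_t max_id,
                         const bool *in_use)
{
    uint32_t id_range = max_id - min_id + 1;
    uint32_t next_id = min_id + draw % id_range;
    int64_t retry = id_range;       /* signed, as in the pseudocode */

    do {
        if (suitable_id(in_use, min_id, next_id))
            return next_id;
        /* Unsuitable: probe the next identifier in sequence, wrapping. */
        next_id = (next_id == max_id) ? min_id : next_id + 1;
        retry--;
    } while (retry > 0);

    return ID_ERROR;
}

/* Count how many of the id_range equally likely starting points end up
 * selecting `target`. */
static unsigned hits(uint32_t target, uint32_t min_id, uint32_t max_id,
                     const bool *in_use)
{
    unsigned n = 0;
    for (uint32_t draw = 0; draw <= max_id - min_id; draw++)
        if (select_id(draw, min_id, max_id, in_use) == (int64_t)target)
            n++;
    return n;
}
```

With the range [0, 9] and identifiers 3, 4, and 5 marked as in use, four of the ten starting points (3, 4, 5, and 6) select identifier 6, while every other free identifier is selected from exactly one starting point.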

Another Simple Randomization Algorithm

The following pseudocode illustrates another algorithm for selecting a random transient numeric identifier where, in the event a selected identifier is found to be unsuitable (e.g., already in use), another identifier is randomly selected:

    /* Transient Numeric ID selection function */

    id_range = max_id - min_id + 1;
    retry = id_range;

    do {
        next_id = min_id + (random() % id_range);

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry--;
    } while (retry > 0);

    return ERROR;

NOTE:

random() is a PRNG that returns a pseudorandom unsigned integer number of appropriate size. Beware that "adapting" the length of the output of random() with a modulo operator (e.g., C language's "%") may change the distribution of the PRNG. To preserve a uniform distribution, the rejection sampling technique can be used.

suitable_id() is a function that checks, if possible and desirable, whether a candidate numeric identifier is suitable (e.g., if it is not already in use). Depending on how/where the numeric identifier is used, it may or may not be possible (or even desirable) to check whether the numeric identifier is in use (or whether it has been recently employed).

When an identifier is found to be unsuitable, this algorithm selects another random numeric identifier. Thus, this algorithm might be unable to select a transient numeric identifier (i.e., return "ERROR"), even if there are suitable identifiers available, in cases where a large number of identifiers are found to be unsuitable (e.g., "in use").

Assuming the randomness requirements for the PRNG are met, this algorithm does not suffer from any of the issues discussed in "Common Vulnerabilities Associated with Transient Numeric Identifiers".
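The likelihood of a spurious "ERROR" can be estimated with a short calculation. As a simplifying assumption (not stated in the text above), let N be id_range, let U < N be the number of unsuitable identifiers, and treat each of the N draws as independent and uniform:

```latex
\[
  P(\text{ERROR}) = \left(\frac{U}{N}\right)^{N},
  \qquad
  \mathbb{E}[\text{draws}] \approx \frac{N}{N-U}
\]
\[
  N = 16,\; U = 15:\quad
  P(\text{ERROR}) = \left(\tfrac{15}{16}\right)^{16} \approx 0.356
\]
```

In this toy case, roughly one selection in three fails even though a suitable identifier exists, whereas the "Simple Randomization Algorithm" would always find it by probing sequentially. For realistic ranges with moderate usage ratios, the failure probability is negligible.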

Category #2: Uniqueness (Hard Failure)

One of the most trivial approaches for generating a unique transient numeric identifier (with a hard failure severity) is to reduce the identifier reuse frequency by generating the numeric identifiers with a monotonically increasing function (e.g., linear). As a result, any of the algorithms described in "Category #4: Uniqueness, Monotonically Increasing within Context (Hard Failure)" can be readily employed for complying with the requirements of this transient numeric identifier category.

In cases where suitability (e.g., uniqueness) of the selected identifiers can be definitively assessed by the local system, any of the algorithms described in "Category #1: Uniqueness (Soft Failure)" can be readily employed for complying with the requirements of this numeric identifier category.

Category #3: Uniqueness, Stable within Context (Soft Failure)

The goal of the following algorithm is to produce identifiers that are stable for a given context (identified by "CONTEXT") but that change when the aforementioned context changes. In order to avoid storing the transient numeric identifiers computed for each CONTEXT in memory, the following algorithm employs a calculated technique (as opposed to keeping state in memory) to generate a stable transient numeric identifier for each given context.

    /* Transient Numeric ID selection function */

    id_range = max_id - min_id + 1;
    retry = 0;

    do {
        offset = F(CONTEXT, retry, secret_key);
        next_id = min_id + (offset % id_range);

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry++;
    } while (retry <= MAX_RETRIES);

    return ERROR;

NOTE:

CONTEXT is the concatenation of all the elements that define a given context.

F() is a pseudorandom function (PRF). It must not be computable from the outside (without knowledge of the secret key). F() must also be difficult to reverse, such that it resists attempts to obtain the secret key, even when given samples of the output of F() and knowledge or control of the other input parameters. F() should produce an output of at least as many bits as required for the transient numeric identifier. SipHash-2-4 (128-bit key, 64-bit output) and BLAKE3 (256-bit key, arbitrary-length output) are two possible options for F(). Alternatively, F() could be implemented with a keyed hash message authentication code (HMAC). HMAC-SHA-256 would be one possible option for such implementation alternative. Note: Use of HMAC-MD5 or HMAC-SHA1 is not recommended for F(). The result of F() is no more secure than the secret key, and therefore, "secret_key" must be unknown to the attacker and must be of a reasonable length.

"secret_key" must remain stable for a given CONTEXT, since otherwise, the numeric identifiers generated by this algorithm would not have the desired stability properties (i.e., stable for a given CONTEXT). In most cases, "secret_key" should be selected with a PRNG (following established recommendations on choosing secrets) at an appropriate time and stored in stable or volatile storage (as necessary) for future use.

suitable_id() checks whether a candidate numeric identifier has suitable uniqueness properties.

In this algorithm, the function F() provides a stateless and stable per-CONTEXT offset, where CONTEXT is the concatenation of all the elements that define the given context. For example, if this algorithm is expected to produce IPv6 IIDs that are unique per network interface and Stateless Address Autoconfiguration (SLAAC) prefix, CONTEXT should be the concatenation of, e.g., the network interface index and the SLAAC autoconfiguration prefix.

The result of F() is stored in the variable "offset", which may take any value within the storage type range, since we are restricting the resulting identifier to be in the range [min_id, max_id] in a similar way as in the "Simple Randomization Algorithm".

As noted above, suitable_id() checks whether a candidate numeric identifier has suitable uniqueness properties. Collisions (i.e., an identifier that is not unique) are recovered by incrementing the "retry" variable and recomputing F(), up to a maximum of MAX_RETRIES times. However, recovering from collisions will usually result in identifiers that fail to remain constant for the specified context. This is normally acceptable when the probability of collisions is small, as in the case of, e.g., IPv6 IIDs resulting from SLAAC.

For obvious reasons, the transient numeric identifiers generated with this algorithm allow for network activity correlation and fingerprinting within "CONTEXT". However, this is essentially a design goal of this category of transient numeric identifiers.
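The following C sketch shows one way the stable, keyed offset could be computed, using SipHash-2-4 as F(). The function names siphash24() and stable_id() are hypothetical, and two details are assumptions of this sketch rather than requirements of the algorithm: "retry" is encoded as a single byte appended to CONTEXT, and the suitable_id() loop is left to the caller, who would call stable_id() again with retry + 1 on a collision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))
#define SIPROUND do {                                              \
    v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);  \
    v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                       \
    v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                       \
    v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);  \
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* SipHash-2-4 with a 128-bit key and a 64-bit output. */
static uint64_t siphash24(const uint8_t *in, size_t len, const uint8_t key[16])
{
    uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v0 = k0 ^ UINT64_C(0x736f6d6570736575);
    uint64_t v1 = k1 ^ UINT64_C(0x646f72616e646f6d);
    uint64_t v2 = k0 ^ UINT64_C(0x6c7967656e657261);
    uint64_t v3 = k1 ^ UINT64_C(0x7465646279746573);
    uint64_t b = (uint64_t)len << 56;      /* length goes in the top byte */
    size_t i, full = len - (len % 8);

    for (i = 0; i < full; i += 8) {        /* whole 8-byte blocks */
        uint64_t m = load_le64(in + i);
        v3 ^= m; SIPROUND; SIPROUND; v0 ^= m;
    }
    for (; i < len; i++)                   /* trailing bytes */
        b |= (uint64_t)in[i] << (8 * (i % 8));
    v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
    v2 ^= 0xff; SIPROUND; SIPROUND; SIPROUND; SIPROUND;
    return v0 ^ v1 ^ v2 ^ v3;
}

/* next_id = min_id + F(CONTEXT, retry, secret_key) % id_range.
 * Assumption: retry fits in one byte and is appended to CONTEXT. */
static uint32_t stable_id(const uint8_t *context, size_t ctx_len,
                          unsigned retry, const uint8_t key[16],
                          uint32_t min_id, uint32_t max_id)
{
    uint8_t buf[64];
    assert(ctx_len < sizeof buf && retry <= 255 && min_id <= max_id);
    memcpy(buf, context, ctx_len);
    buf[ctx_len] = (uint8_t)retry;

    uint64_t offset = siphash24(buf, ctx_len + 1, key);
    uint64_t id_range = (uint64_t)max_id - min_id + 1;
    return (uint32_t)(min_id + offset % id_range);
}
```

Calling stable_id() twice with the same context, retry value, and key yields the same identifier without any per-context state, which is the property this category relies on. The "%" reduction carries the same distribution caveat noted for random() earlier.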

Category #4: Uniqueness, Monotonically Increasing within Context (Hard Failure)

Per-Context Counter Algorithm

One possible way of selecting unique monotonically increasing identifiers (per context) is to employ a per-context counter. Such an algorithm could be described as follows:

    /* Transient Numeric ID selection function */

    id_range = max_id - min_id + 1;
    retry = id_range;
    id_inc = increment() % id_range;

    if ((next_id = lookup_counter(CONTEXT)) == ERROR) {
        next_id = min_id + random() % id_range;
    }

    do {
        if ((max_id - next_id) >= id_inc) {
            next_id = next_id + id_inc;
        } else {
            /* Wrap around: the first value past max_id is min_id */
            next_id = min_id + id_inc - (max_id - next_id) - 1;
        }

        if (suitable_id(next_id)) {
            store_counter(CONTEXT, next_id);
            return next_id;
        }

        retry = retry - id_inc;
    } while (retry > 0);

    return ERROR;

NOTE:

CONTEXT is the concatenation of all the elements that define a given context.

increment() returns a small integer that is employed to increment the current counter value to obtain the next transient numeric identifier. This value must be larger than or equal to 1, and much smaller than the number of possible values for the numeric identifiers (i.e., "id_range"). Most implementations of this algorithm employ a constant increment of 1. Using a value other than 1 can help mitigate some information leakages (please see below) at the expense of a possible increase in the numeric identifier reuse frequency. The code above makes sure that the increment employed in the algorithm (id_inc) is always smaller than the number of possible values for the numeric identifiers (i.e., "max_id - min_id + 1"). However, as noted above, this value must also be much smaller than the number of possible values for the numeric identifiers.

lookup_counter() is a function that returns the current counter for a given context or an error condition if that counter does not exist.

random() is a PRNG that returns a pseudorandom unsigned integer number of appropriate size. Beware that "adapting" the length of the output of random() with a modulo operator (e.g., C language's "%") may change the distribution of the PRNG. To preserve a uniform distribution, the rejection sampling technique can be used.

store_counter() is a function that saves a counter value for a given context.

suitable_id() checks whether a candidate numeric identifier has suitable uniqueness properties.

Essentially, whenever a new identifier is to be selected, the algorithm checks whether a counter for the corresponding context exists. If it does, the value of such counter is incremented to obtain the new transient numeric identifier, and the counter is updated. If no counter exists for such context, a new counter is created and initialized to a random value and used as the selected transient numeric identifier. This algorithm produces a per-context counter, which results in one monotonically increasing function for each context. Since each counter is initialized to a random value, the resulting values are unpredictable by an off-path attacker.

The choice of id_inc has implications on both the security and privacy properties of the resulting identifiers and also on the corresponding interoperability properties. On one hand, minimizing the increments generally minimizes the identifier reuse frequency, albeit at increased predictability. On the other hand, if the increments are randomized, predictability of the resulting identifiers is reduced, and the information leakage produced by global constant increments is mitigated. However, using larger increments than necessary can result in higher numeric identifier reuse frequency.

This algorithm has the following drawbacks:

It requires an implementation to store each per-context counter in memory. If, as a result of resource management, the counter for a given context must be removed, the last transient numeric identifier value used for that context will be lost. Thus, if an identifier subsequently needs to be generated for the same context, the corresponding counter will need to be recreated and reinitialized to a random value, thus possibly leading to reuse/collision of numeric identifiers.
Keeping one counter for each possible "context" may in some cases be considered too onerous in terms of memory requirements.

Otherwise, the identifiers produced by this algorithm do not suffer from the other issues discussed in the "Common Vulnerabilities Associated with Transient Numeric Identifiers" section.
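The per-context counter pseudocode could be sketched in C roughly as follows. This is a minimal illustration, not a reference implementation: the fixed-size table, the reduction of CONTEXT to a 64-bit key, the ID_ERROR marker, and the trivial suitable_id() and increment() helpers are all assumptions made for the example. rand() merely stands in for random(); it is not a CSPRNG, and the modulo reduction may bias the initial value.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_CONTEXTS 64        /* hypothetical table size */
#define ID_ERROR UINT32_MAX    /* hypothetical error marker; assumes max_id < UINT32_MAX */

struct ctx_counter {
    uint64_t context;          /* assumption: CONTEXT reduced to a 64-bit key */
    uint32_t next_id;          /* last identifier handed out for this context */
    int in_use;
};

static struct ctx_counter counters[MAX_CONTEXTS];

/* Placeholder: a real implementation would check uniqueness requirements. */
static int suitable_id(uint32_t id) { (void)id; return 1; }

/* Placeholder: constant increment of 1, the most common choice. */
static uint32_t increment(void) { return 1; }

static struct ctx_counter *lookup_counter(uint64_t context)
{
    for (int i = 0; i < MAX_CONTEXTS; i++)
        if (counters[i].in_use && counters[i].context == context)
            return &counters[i];
    return NULL;
}

static struct ctx_counter *create_counter(uint64_t context, uint32_t initial)
{
    for (int i = 0; i < MAX_CONTEXTS; i++)
        if (!counters[i].in_use) {
            counters[i] = (struct ctx_counter){ context, initial, 1 };
            return &counters[i];
        }
    return NULL;               /* table full: the memory drawback noted above */
}

uint32_t select_id(uint64_t context, uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;
    uint32_t id_inc = increment() % id_range;
    int64_t retry = id_range;

    if (id_inc == 0)
        return ID_ERROR;       /* degenerate range; guard added for this sketch */

    struct ctx_counter *c = lookup_counter(context);
    if (c == NULL) {
        /* No counter yet: start from a random point in [min_id, max_id]. */
        c = create_counter(context, min_id + (uint32_t)rand() % id_range);
        if (c == NULL)
            return ID_ERROR;
    }

    uint32_t next_id = c->next_id;
    do {
        /* Same wrap-around arithmetic as the pseudocode. */
        if (max_id - next_id >= id_inc)
            next_id += id_inc;
        else
            next_id = min_id + id_inc - (max_id - next_id);

        if (suitable_id(next_id)) {
            c->next_id = next_id;
            return next_id;
        }
        retry -= id_inc;
    } while (retry > 0);

    return ID_ERROR;
}
```

Successive calls for the same context key return consecutive values, while a call for a different context starts its own sequence at an unrelated random point.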

Simple PRF-Based Algorithm

The goal of this algorithm is to produce monotonically increasing transient numeric identifiers (for each given context) with a randomized initial value. For example, if the identifiers being generated must be monotonically increasing for each {Source Address, Destination Address} set, then each possible combination of {Source Address, Destination Address} should have a separate monotonically increasing sequence that starts at a different random value.

Instead of maintaining a per-context counter (as in the algorithm from the "Per-Context Counter Algorithm" section), the following algorithm employs a calculated technique to maintain a random offset for each possible context.

    /* Initialization code */
    counter = 0;

    /* Transient Numeric ID selection function */
    id_range = max_id - min_id + 1;
    id_inc = increment() % id_range;
    offset = F(CONTEXT, secret_key);
    retry = id_range;

    do {
        next_id = min_id + (offset + counter) % id_range;
        counter = counter + id_inc;

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry = retry - id_inc;

    } while (retry > 0);

    return ERROR;

NOTE:

CONTEXT is the concatenation of all the elements that define a given context. For example, if this algorithm is expected to produce identifiers that are monotonically increasing for each set {Source Address, Destination Address}, CONTEXT should be the concatenation of Source Address and Destination Address.

increment() has the same properties and requirements as those specified for increment() in the "Per-Context Counter Algorithm" section.

F() is a PRF, with the same properties as those specified for F() in the algorithm for Category #3 identifiers.

suitable_id() checks whether a candidate numeric identifier has suitable uniqueness properties.

In the algorithm above, the function F() provides a stateless, stable, and unpredictable offset for each given context (as identified by "CONTEXT").

Both the "offset" and "counter" variables may take any value within the storage type range since we are restricting the resulting identifier to be in the range [min_id, max_id] in a similar way as in the algorithm described in . This allows us to simply increment the "counter" variable and rely on the unsigned integer to wrap around.

The result of F() is no more secure than the secret key, and therefore, "secret_key" must be unknown to the attacker and must be of a reasonable length. "secret_key" must remain stable for a given CONTEXT, since otherwise, the numeric identifiers generated by this algorithm would not have the desired properties (i.e., monotonically increasing for a given CONTEXT). In most cases, "secret_key" should be selected with a PRNG (see [RFC4086] for recommendations on choosing secrets) at an appropriate time and stored in stable or volatile storage (as necessary) for future use.

It should be noted that, since this algorithm uses a global counter ("counter") for selecting identifiers (i.e., all counters share the same increment space), this algorithm results in an information leakage (as described in the "Information Leakage" section). For example, if this algorithm were used for selecting TCP ephemeral ports and an attacker could force a client to periodically establish a new TCP connection to an attacker-controlled system (or through an attacker-observable routing path), the attacker could subtract consecutive Source Port values to obtain the number of outgoing TCP connections established globally by the victim host within that time period (up to wrap-around issues and five-tuple collisions, of course). This information leakage could be partially mitigated by employing small random values for the increments (i.e., increment() function), instead of having increment() return the constant "1".

We nevertheless note that this information leakage could be mitigated more effectively by employing the algorithm from the "Double-PRF Algorithm" section instead.
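As a rough illustration, the simple PRF-based selection function might look like the following in C. The toy_prf() helper is a hypothetical stand-in for F(): it is a keyed FNV-1a-style mix used only to keep the sketch self-contained, it is not a cryptographically secure PRF, and a real implementation would substitute a proper keyed PRF. The ID_ERROR marker, the constant increment of 1, and the trivial suitable_id() are likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_ERROR UINT32_MAX    /* hypothetical error marker; assumes max_id < UINT32_MAX */

static uint32_t counter = 0;   /* single global counter shared by all contexts */

/* Toy keyed hash standing in for F(). NOT a secure PRF; illustration only. */
static uint32_t toy_prf(const void *context, size_t len, uint64_t secret_key)
{
    const unsigned char *p = context;
    uint64_t h = 0xcbf29ce484222325ULL ^ secret_key;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return (uint32_t)(h ^ (h >> 32));
}

/* Placeholder: a real implementation would check uniqueness requirements. */
static int suitable_id(uint32_t id) { (void)id; return 1; }

uint32_t select_id(const void *context, size_t len, uint64_t secret_key,
                   uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;
    uint32_t id_inc = 1 % id_range;            /* increment() returning 1 */
    uint32_t offset = toy_prf(context, len, secret_key);
    int64_t retry = id_range;

    if (id_inc == 0)
        return ID_ERROR;       /* degenerate range; guard added for this sketch */

    do {
        /* Unsigned wrap-around of offset + counter is intentional. */
        uint32_t next_id = min_id + (offset + counter) % id_range;
        counter += id_inc;

        if (suitable_id(next_id))
            return next_id;

        retry -= id_inc;
    } while (retry > 0);

    return ID_ERROR;
}
```

Because "counter" is global, an identifier requested for one context advances the sequence seen by every other context; two samples taken for context A with one intervening request for context B differ by 2 rather than 1, which is the leakage discussed above.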

Double-PRF Algorithm

A trade-off between maintaining a single global "counter" variable and maintaining 2**N "counter" variables (where N is the width of the result of F()) could be achieved as follows. The system would keep an array of TABLE_LENGTH values, which would provide a separation of the increment space into multiple buckets. This improvement could be incorporated into the algorithm from the "Simple PRF-Based Algorithm" section as follows:

    /* Initialization code */
    for(i = 0; i < TABLE_LENGTH; i++) {
        table[i] = random();
    }

    /* Transient Numeric ID selection function */
    id_range = max_id - min_id + 1;
    id_inc = increment() % id_range;
    offset = F(CONTEXT, secret_key1);
    index = G(CONTEXT, secret_key2) % TABLE_LENGTH;
    retry = id_range;

    do {
        next_id = min_id + (offset + table[index]) % id_range;
        table[index] = table[index] + id_inc;

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry = retry - id_inc;

    } while (retry > 0);

    return ERROR;

NOTE:

increment() has the same properties and requirements as those specified for increment() in the "Per-Context Counter Algorithm" section.

Both F() and G() are PRFs, with the same properties as those required for F() in the algorithm for Category #3 identifiers.

The results of F() and G() are no more secure than their respective secret keys ("secret_key1" and "secret_key2", respectively), and therefore, both secret keys must be unknown to the attacker and must be of a reasonable length. Both secret keys must remain stable for the given CONTEXT, since otherwise, the transient numeric identifiers generated by this algorithm would not have the desired properties (i.e., monotonically increasing for a given CONTEXT). In most cases, both secret keys should be selected with a PRNG (see [RFC4086] for recommendations on choosing secrets) at an appropriate time and stored in stable or volatile storage (as necessary) for future use.

"table[]" could be initialized with random values, as indicated by the initialization code in the pseudocode above.

The "table[]" array assures that successive transient numeric identifiers for a given context will be monotonically increasing. Since the increment space is separated into TABLE_LENGTH different spaces, the identifier reuse frequency will be (probabilistically) lower than that of the algorithm in the "Simple PRF-Based Algorithm" section. That is, the generation of an identifier for one given context will not necessarily result in increments in the identifier sequence of other contexts. It is interesting to note that the size of "table[]" does not limit the number of different identifier sequences but rather separates the increment space into TABLE_LENGTH different spaces. The selected transient numeric identifier sequence will be obtained by adding the corresponding entry from "table[]" to the value in the "offset" variable, which selects the actual identifier sequence space (as in the algorithm from the "Simple PRF-Based Algorithm" section).

An attacker can perform traffic analysis for any "increment space" (i.e., context) into which the attacker has "visibility" -- namely, the attacker can force a system to generate identifiers for G(CONTEXT, secret_key2), where the result of G() identifies the target "increment space". However, the attacker's ability to perform traffic analysis is greatly reduced when compared to the simple PRF-based identifiers (described in the "Simple PRF-Based Algorithm" section) and the predictable linear identifiers (described in ). Additionally, an implementation can further limit the attacker's ability to perform traffic analysis by further separating the increment space (that is, using a larger value for TABLE_LENGTH) and/or by randomizing the increments (i.e., increment() returning a small random number as opposed to the constant "1").

Otherwise, this algorithm does not suffer from the issues discussed in the "Common Vulnerabilities Associated with Transient Numeric Identifiers" section.
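A minimal C sketch of the double-PRF scheme is shown below, under the same caveats as any illustration: toy_prf() is a hypothetical, non-cryptographic stand-in used for both F() and G() (with different keys), TABLE_LENGTH is set to an arbitrary 16, rand() stands in for random(), and suitable_id() accepts every candidate.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_LENGTH 16        /* arbitrary bucket count chosen for this sketch */
#define ID_ERROR UINT32_MAX    /* hypothetical error marker; assumes max_id < UINT32_MAX */

static uint32_t table[TABLE_LENGTH];

/* Toy keyed hash standing in for F() and G(). NOT a secure PRF. */
static uint32_t toy_prf(const void *context, size_t len, uint64_t key)
{
    const unsigned char *p = context;
    uint64_t h = 0xcbf29ce484222325ULL ^ key;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return (uint32_t)(h ^ (h >> 32));
}

/* Placeholder: a real implementation would check uniqueness requirements. */
static int suitable_id(uint32_t id) { (void)id; return 1; }

/* Initialization code: each bucket starts at a random value. */
void init_table(void)
{
    for (size_t i = 0; i < TABLE_LENGTH; i++)
        table[i] = (uint32_t)rand();   /* rand() is not a CSPRNG */
}

uint32_t select_id(const void *context, size_t len,
                   uint64_t secret_key1, uint64_t secret_key2,
                   uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;
    uint32_t id_inc = 1 % id_range;                               /* increment() == 1 */
    uint32_t offset = toy_prf(context, len, secret_key1);         /* F() */
    size_t idx = toy_prf(context, len, secret_key2) % TABLE_LENGTH; /* G() */
    int64_t retry = id_range;

    if (id_inc == 0)
        return ID_ERROR;       /* degenerate range; guard added for this sketch */

    do {
        uint32_t next_id = min_id + (offset + table[idx]) % id_range;
        table[idx] += id_inc;  /* only this bucket's increment space advances */

        if (suitable_id(next_id))
            return next_id;

        retry -= id_inc;
    } while (retry > 0);

    return ID_ERROR;
}
```

Only contexts that map to the same bucket share an increment space, so an observer sampling one context learns about activity in roughly 1/TABLE_LENGTH of the other contexts rather than all of them.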

Common Vulnerabilities Associated with Transient Numeric Identifiers

Network Activity Correlation An identifier that is predictable within a given context allows for network activity correlation within that context. For example, a stable IPv6 Interface Identifier allows for network activity to be correlated within the context in which the Interface Identifier is stable . A stable per-network IPv6 Interface Identifier (as in ) allows for network activity correlation within a network, whereas a constant IPv6 Interface Identifier (which remains constant across networks) allows not only network activity correlation within the same network but also across networks ("host-tracking"). Similarly, an implementation that generates TCP ISNs with a global counter could allow for fingerprinting and network activity correlation across networks, since an attacker could passively infer the identity of the victim based on the TCP ISNs employed for subsequent communication instances. Similarly, an implementation that generates predictable IPv6 Identification values could be subject to fingerprinting attacks (see, e.g., ).

Information Leakage

Transient numeric identifiers that result in specific patterns can produce an information leakage to other communicating entities. For example, it is common to generate transient numeric identifiers with an algorithm such as:

    ID = offset(CONTEXT) + mono(CONTEXT);

This generic expression generates identifiers by adding a monotonically increasing function (e.g., linear) to a randomized offset. offset() is constant within a given context, whereas mono() produces a monotonically increasing sequence for the given context. Identifiers generated with this expression will generally be predictable within CONTEXT.

The predictability of mono(), irrespective of the predictability of offset(), can leak information that may be of use to attackers. For example, a node that selects transport-protocol ephemeral port numbers, as in:

    ephemeral_port = offset(IP_Dst_Addr) + mono()

that is, with a per-destination offset but a global mono() function (e.g., a global counter), will leak information about the total number of outgoing connections that have been issued by the vulnerable implementation.

Similarly, a node that generates IPv6 Identification values as in:

    ID = offset(IP_Src_Addr, IP_Dst_Addr) + mono()

will leak out information about the total number of fragmented packets that have been transmitted by the vulnerable implementation. The vulnerabilities described in , , and are all associated with the use of a global mono() function (i.e., with a global and constant "CONTEXT") -- particularly when it is a linear function (constant increments of 1).

Predicting transient numeric identifiers can be of help for other types of attacks. For example, predictable TCP ISNs can open the door to trivial connection-reset and data injection attacks (see ).
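The leakage from a global mono() function can be made concrete with a small simulation. The sketch below is illustrative only: the offsets are fixed hypothetical constants rather than PRF outputs, the range [1024, 65535] is an arbitrary choice, and the attacker is assumed to observe two identifiers generated for its own destination.

```c
#include <stdint.h>

static uint32_t mono_counter;  /* global mono(): shared by every destination */

/* ephemeral_port = offset(IP_Dst_Addr) + mono(), folded into [min_id, max_id] */
static uint32_t next_port(uint32_t offset, uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;
    uint32_t port = min_id + (offset + mono_counter) % id_range;

    mono_counter++;
    return port;
}

/* Attacker's view: identifiers generated between two samples, modulo id_range. */
static uint32_t ids_between(uint32_t first, uint32_t second,
                            uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;

    return (second + id_range - first) % id_range;
}

/* Returns the attacker's estimate of connections made to other destinations. */
uint32_t simulate_leak(uint32_t other_connections)
{
    const uint32_t min_id = 1024, max_id = 65535;
    const uint32_t attacker_offset = 40000;  /* hypothetical offset() values */
    const uint32_t other_offset = 12345;

    uint32_t first = next_port(attacker_offset, min_id, max_id);

    /* Victim talks to some unrelated destination in the meantime. */
    for (uint32_t i = 0; i < other_connections; i++)
        (void)next_port(other_offset, min_id, max_id);

    uint32_t second = next_port(attacker_offset, min_id, max_id);

    /* Subtract 1 for the attacker's own first sample. */
    return ids_between(first, second, min_id, max_id) - 1;
}
```

The attacker never learns offset(); subtracting two samples cancels it and leaves only the number of identifiers generated in between, which is accurate as long as fewer than id_range identifiers were generated between the two samples.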

Fingerprinting

Fingerprinting is the capability of an attacker to identify or reidentify a visiting user, user agent, or device via configuration settings or other observable characteristics. Observable protocol objects and characteristics can be employed to identify/reidentify various entities. These entities can range from the underlying hardware or operating system (OS) (vendor, type, and version) to the user. illustrates web-browser-based fingerprinting, but similar techniques can be applied at other layers and protocols, whether alternatively or in conjunction with it.

Transient numeric identifiers are one of the observable protocol components that could be leveraged for fingerprinting purposes. That is, an attacker could sample transient numeric identifiers to infer the algorithm (and its associated parameters, if any) for generating such identifiers, possibly revealing the underlying OS vendor, type, and version. This information could possibly be further leveraged in conjunction with other fingerprinting techniques and sources.

Evasion of protocol-stack fingerprinting can prove to be a very difficult task, i.e., most systems make use of a wide variety of protocols, each of which has a large number of parameters that can be set to arbitrary values or generated with a variety of algorithms with multiple parameters.

Algorithms that, from the perspective of an observer (e.g., the legitimate communicating peer), result in specific values or patterns will allow for at least some level of fingerprinting. For example, the algorithm from will typically allow fingerprinting within the context where the resulting identifiers are stable. Similarly, the algorithms from will result in monotonically increasing sequences within a given context, thus allowing for at least some level of fingerprinting (when the other communicating entity can correlate different sampled identifiers as belonging to the same monotonically increasing sequence).

Thus, where possible, algorithms from should be preferred over algorithms that result in specific values or patterns.

Exploitation of the Semantics of Transient Numeric Identifiers Identifiers that are not semantically opaque tend to be more predictable than semantically opaque identifiers. For example, a Media Access Control (MAC) address contains an Organizationally Unique Identifier (OUI), which may identify the vendor that manufactured the corresponding network interface card. This can be leveraged by an attacker trying to "guess" MAC addresses, who has some knowledge about the possible Network Interface Card (NIC) vendor. discusses a number of techniques to reduce the search space when performing IPv6 address-scanning attacks by leveraging the semantics of IPv6 IIDs.

Exploitation of Collisions of Transient Numeric Identifiers

In many cases, the collision of transient numeric identifiers can have a hard failure severity (or result in a hard failure severity if an attacker can cause multiple collisions deterministically, one after another). For example, predictable IP Identification values open the door to Denial of Service (DoS) attacks (see, e.g., ).

Exploitation of Predictable Transient Numeric Identifiers for Injection Attacks Some protocols rely on "sequence numbers" for the validation of incoming packets. For example, TCP employs sequence numbers for reassembling TCP segments, while IPv4 and IPv6 employ Identification values for reassembling IPv4 and IPv6 fragments (respectively). Lacking built-in cryptographic mechanisms for validating packets, these protocols are therefore vulnerable to on-path data (see, e.g., ) and/or control-information (see, e.g., and ) injection attacks. The extent to which these protocols may resist off-path (i.e., "blind") injection attacks depends on whether the associated "sequence numbers" are predictable and the effort required to successfully predict a valid "sequence number" (see, e.g., and ). We note that the use of unpredictable "sequence numbers" is a completely ineffective mitigation for on-path injection attacks and also a mostly ineffective mitigation for off-path (i.e., "blind") injection attacks. However, many legacy protocols (such as TCP) do not incorporate cryptographic mitigations as part of the core protocol but rather as optional features (see, e.g., ), if available at all. Additionally, ad hoc use of cryptographic mitigations might not be sufficient to relieve a protocol implementation of generating appropriate transient numeric identifiers. For example, use of the Transport Layer Security (TLS) protocol with TCP will protect the application protocol but will not help to mitigate, e.g., TCP-based connection-reset attacks (see, e.g., ). Similarly, use of SEcure Neighbor Discovery (SEND) will still imply reliance on the successful reassembly of IPv6 fragments in those cases where SEND packets do not fit into the link Maximum Transmission Unit (MTU) (see ).

Cryptanalysis A number of algorithms discussed in this document (such as those described in Sections and ) rely on PRFs. Implementations that employ weak PRFs or keys of inappropriate size can be subject to cryptanalysis, where an attacker can obtain the secret key employed for the PRF, predict numeric identifiers, etc. Furthermore, an implementation that overloads the semantics of the secret key can result in more trivial cryptanalysis, possibly resulting in the leakage of the value employed for the secret key.

Vulnerability Assessment of Transient Numeric Identifiers The following subsections analyze possible vulnerabilities associated with the algorithms described in .

Category #1: Uniqueness (Soft Failure) Possible vulnerabilities associated with the algorithms from include the following:

use of flawed PRNGs (please see, e.g., , , , and )
inadvertently affecting the distribution of an otherwise suitable PRNG (please see, e.g., )

Where available, CSPRNGs should be preferred over regular PRNGs, such as POSIX random(3) implementations. In scenarios where a CSPRNG is not readily available, a security and privacy assessment of employing a regular PRNG should be performed, supporting the implementation decision.

When employing a PRNG, many implementations "adapt" the length of its output with a modulo operator (e.g., C language's "%"), possibly changing the distribution of the output of the PRNG. For example, consider an implementation that employs the following code:

    id = random() % 50000;

This example implementation means to obtain a transient numeric identifier in the range 0-49999. If random() produces, e.g., a pseudorandom number of 16 bits (with uniform distribution), the selected transient numeric identifier will have a nonuniform distribution, with the numbers in the range 0-15535 occurring twice as frequently as the numbers in the range 15536-49999. This effect is reduced if the PRNG produces an output that is much longer than the length implied by the modulo operation. We note that to preserve a uniform distribution, the rejection sampling technique can be used.

Use of algorithms other than PRNGs for generating identifiers of this category is discouraged.
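The bias in the 16-bit example, and one way of avoiding it, can be checked with a short C sketch. The function names are hypothetical, and demo_rng16() is a small full-period linear congruential generator included only so that the example is self-contained; it is not suitable for generating real identifiers.

```c
#include <stdint.h>

/* How many of the 65536 possible 16-bit outputs map to `id` under "% n". */
uint32_t preimages_mod(uint32_t id, uint32_t n)
{
    uint32_t count = 0;

    for (uint32_t r = 0; r <= UINT16_MAX; r++)
        if (r % n == id)
            count++;
    return count;
}

/*
 * Rejection sampling: discard raw outputs at or above the largest multiple
 * of n that fits in 16 bits, so every residue has the same number of
 * accepted preimages. n must be nonzero.
 */
uint16_t uniform_below(uint16_t n, uint16_t (*rng16)(void))
{
    uint32_t limit = 65536u - (65536u % n);
    uint16_t r;

    do {
        r = rng16();
    } while (r >= limit);

    return (uint16_t)(r % n);
}

/* Demo-only 16-bit LCG (full period); NOT a CSPRNG. */
static uint16_t lcg_state = 1;

uint16_t demo_rng16(void)
{
    lcg_state = (uint16_t)(lcg_state * 25173u + 13849u);
    return lcg_state;
}
```

With n = 50000, preimages_mod() reports two raw values for each identifier in 0-15535 and one for each identifier in 15536-49999, matching the text above. uniform_below() rejects raw outputs of 50000 and above, leaving exactly one accepted raw value per identifier at the cost of discarding roughly a quarter of the PRNG output.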

Category #2: Uniqueness (Hard Failure) As noted in , this category can employ the same algorithms as Category #4, since a monotonically increasing sequence tends to minimize the transient numeric identifier reuse frequency. Therefore, the vulnerability analysis in also applies to this category. Additionally, as noted in , some transient numeric identifiers of this category might be able to use the algorithms from , in which case the same considerations as in would apply.

Category #3: Uniqueness, Stable within Context (Soft Failure) Possible vulnerabilities associated with the algorithms from are the following:

Use of weak PRFs or inappropriate secret keys (whether inappropriate selection or inappropriate size) could allow for cryptanalysis, which could eventually be exploited by an attacker to predict future transient numeric identifiers.
Since the algorithm generates a unique and stable identifier within a specified context, it may allow for network activity correlation and fingerprinting within the specified context.

Category #4: Uniqueness, Monotonically Increasing within Context (Hard Failure)

The algorithm described in the "Per-Context Counter Algorithm" section for generating identifiers of Category #4 will result in an identifiable pattern (i.e., a monotonically increasing sequence) for the transient numeric identifiers generated for each CONTEXT, and thus will allow for fingerprinting and network activity correlation within each CONTEXT.

On the other hand, a simple way to generalize and analyze the algorithms described in the "Simple PRF-Based Algorithm" and "Double-PRF Algorithm" sections for generating identifiers of Category #4 is as follows:

    /* Transient Numeric ID selection function */
    id_range = max_id - min_id + 1;
    retry = id_range;
    id_inc = increment() % id_range;

    do {
        update_mono(CONTEXT, id_inc);
        next_id = min_id + (offset(CONTEXT) + mono(CONTEXT)) % id_range;

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry = retry - id_inc;

    } while (retry > 0);

    return ERROR;

NOTE:

increment() returns a small integer that is employed to generate a monotonically increasing function. Most implementations employ a constant value for "increment()" (usually 1). The value returned by increment() must be much smaller than the value computed for "id_range".

update_mono(CONTEXT, id_inc) increments the counter corresponding to CONTEXT by "id_inc".

mono(CONTEXT) reads the counter corresponding to CONTEXT.

Essentially, an identifier (next_id) is generated by adding a monotonically increasing function (mono()) to an offset value, which is unknown to the attacker and stable for a given context (CONTEXT).

The following aspects of the algorithm should be considered:

For the most part, it is the offset() function that results in identifiers that are unpredictable by an off-path attacker. While the resulting sequence is known to be monotonically increasing, the use of a randomized offset value makes the resulting values unknown to the attacker.
The most straightforward "stateless" implementation of offset() is with a PRF that takes the values that identify the context and a secret key (not shown in the figure above) as arguments.
One possible implementation of mono() would be to have mono() internally employ a single counter (as in the algorithm from the "Simple PRF-Based Algorithm" section) or map the increments for different contexts into a number of counters/buckets, such that the number of counters that need to be maintained in memory is reduced (as in the algorithm from the "Double-PRF Algorithm" section).
In all cases, a monotonically increasing function is implemented by incrementing the previous value of a counter by increment() units. In the most trivial case, increment() could return the constant "1". But increment() could also be implemented to return small random integers such that the increments are unpredictable (see of this document). This represents a trade-off between the unpredictability of the resulting transient numeric identifiers and the transient numeric identifier reuse frequency.

Considering the generic algorithm illustrated above, we can identify the following possible vulnerabilities:

Since the algorithms for this category are similar to those of Category #3, with the addition of a monotonically increasing function, all the issues discussed in the "Category #3: Uniqueness, Stable within Context (Soft Failure)" section also apply to this case.
mono() can be correlated to the number of identifiers generated for a given context (CONTEXT). Thus, if mono() spans more than the necessary context, the "increments" could be leaked to other parties, thus disclosing information about the number of identifiers that have been generated by the algorithm for all contexts. This information disclosure becomes more evident when an implementation employs a constant increment of 1. For example, an implementation where mono() is actually a single global counter will unnecessarily leak information about the number of identifiers that have been generated by the algorithm (globally, for all contexts). describes one example of how such information leakages can be exploited. We note that limiting the span of the increment space will require a larger number of counters to be stored in memory (i.e., a larger value for the TABLE_LENGTH parameter of the algorithm in the "Double-PRF Algorithm" section).
Transient numeric identifiers generated with the algorithms described in the "Simple PRF-Based Algorithm" and "Double-PRF Algorithm" sections will normally allow for fingerprinting within CONTEXT since, for such context, the resulting identifiers will have an identifiable pattern (i.e., a monotonically increasing sequence).

IANA Considerations This document has no IANA actions.

Security Considerations This entire document is about the security and privacy implications of transient numeric identifiers. recommends that protocol specifications specify the interoperability requirements of their transient numeric identifiers, perform a vulnerability assessment of their transient numeric identifiers, and recommend an algorithm for generating each of their transient numeric identifiers. This document analyzes possible algorithms (and their implications) that could be employed to comply with the interoperability requirements of the most common categories of transient numeric identifiers while minimizing the associated negative security and privacy implications.

References Normative References Internet Protocol Transmission Control Protocol Domain names - implementation and specification This RFC is the revised specification of the protocol and format used in the implementation of the Domain Name System. It obsoletes RFC-883. This memo documents the details of the domain name client - server communication. The MD5 Message-Digest Algorithm This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. This memo provides information for the Internet community. It does not specify an Internet standard. Internet Protocol, Version 6 (IPv6) Specification This document specifies version 6 of the Internet Protocol (IPv6), also sometimes referred to as IP Next Generation or IPng. [STANDARDS-TRACK] Randomness Requirements for Security Security systems are built on strong cryptographic algorithms that foil pattern analysis attempts. However, the security of these systems is dependent on generating secret quantities for passwords, cryptographic keys, and similar quantities. The use of pseudo-random processes to generate secret quantities can result in pseudo-security. A sophisticated attacker may find it easier to reproduce the environment that produced the secret quantities and to search the resulting small set of possibilities than to locate the quantities in the whole of the potential number space. Choosing random quantities to foil a resourceful and motivated adversary is surprisingly difficult. This document points out many pitfalls in using poor entropy sources or traditional pseudo-random number generation techniques for generating such quantities. It recommends the use of truly random hardware techniques and shows that the existing hardware on many systems can be used for this purpose. 
It provides suggestions to ameliorate the problem when a hardware solution is not available, and it gives examples of how large such quantities need to be for some applications. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements. IP Version 6 Addressing Architecture This specification defines the addressing architecture of the IP Version 6 (IPv6) protocol. The document includes the IPv6 addressing model, text representations of IPv6 addresses, definition of IPv6 unicast addresses, anycast addresses, and multicast addresses, and an IPv6 node's required addresses. This document obsoletes RFC 3513, "IP Version 6 Addressing Architecture". [STANDARDS-TRACK] IPv6 Stateless Address Autoconfiguration This document specifies the steps a host takes in deciding how to autoconfigure its interfaces in IP version 6. The autoconfiguration process includes generating a link-local address, generating global addresses via stateless address autoconfiguration, and the Duplicate Address Detection procedure to verify the uniqueness of the addresses on a link. [STANDARDS-TRACK] Handling of Overlapping IPv6 Fragments The fragmentation and reassembly algorithm specified in the base IPv6 specification allows fragments to overlap. This document demonstrates the security issues associated with allowing overlapping fragments and updates the IPv6 specification to explicitly forbid overlapping fragments. [STANDARDS-TRACK] Network Time Protocol Version 4: Protocol and Algorithms Specification The Network Time Protocol (NTP) is widely used to synchronize computer clocks in the Internet. This document describes NTP version 4 (NTPv4), which is backwards compatible with NTP version 3 (NTPv3), described in RFC 1305, as well as previous versions of the protocol. NTPv4 includes a modified protocol header to accommodate the Internet Protocol version 6 address family. 

Algorithms and Techniques with Known Issues

The following subsections discuss algorithms and techniques with known negative security and privacy implications.

Predictable Linear Identifiers Algorithm

One of the most trivial ways to achieve uniqueness with a low identifier reuse frequency is to produce a linear sequence. This type of algorithm has been employed in the past to generate identifiers of Categories #1, #2, and #4 (these categories are analyzed elsewhere in this document).

For example, the following algorithm has been employed in a number of operating systems for selecting IP IDs, TCP ephemeral port numbers, etc.:

    /* Initialization code */
    next_id = min_id;
    id_inc = 1;

    /* Transient Numeric ID selection function */
    id_range = max_id - min_id + 1;
    retry = id_range;

    do {
        if (next_id == max_id) {
            next_id = min_id;
        } else {
            next_id = next_id + id_inc;
        }

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry--;
    } while (retry > 0);

    return ERROR;

NOTE: suitable_id() checks whether a candidate numeric identifier is suitable (e.g., whether it is unique or not).

Because a global counter ("next_id" in the example above) is used to generate the transient numeric identifiers, this algorithm results in predictable sequences: an entity that learns one numeric identifier can infer past numeric identifiers and predict future values to be generated by the same algorithm. Since the value employed for the increments is known (such as "1" in this case), an attacker can sample two values and learn the number of identifiers that were generated in between the two sampled values. Furthermore, if the counter is initialized to some known value (e.g., when the system is bootstrapped), the algorithm will leak additional information, such as the number of transmitted fragmented datagrams in the case of an IP ID generator or the system uptime in the case of TCP timestamps.
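As an illustration of the sampling attack described above, the following C sketch computes how many identifiers a global counter produced between two observed values. The function name ids_between() and its parameters are hypothetical and do not come from any implementation. The sketch assumes an increment of 1, wrapping from "max_id" to "min_id" as in the pseudocode above, at most one wrap between the two samples, and that every candidate value was accepted by suitable_id().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the algorithm above): given two
 * identifiers sampled from a generator that increments a global counter
 * by 1 and wraps from max_id to min_id, return how many identifiers
 * were generated after "first", up to and including "second".
 *
 * Assumption: the counter wrapped at most once between the two samples.
 */
uint32_t ids_between(uint32_t first, uint32_t second,
                     uint32_t min_id, uint32_t max_id)
{
    uint32_t id_range = max_id - min_id + 1;

    assert(min_id <= first && first <= max_id);
    assert(min_id <= second && second <= max_id);

    if (second >= first) {
        /* No wrap: the difference is the number of identifiers. */
        return second - first;
    }

    /* The counter wrapped around between the two samples. */
    return id_range - (first - second);
}
```

For instance, with a 16-bit identifier space, an observer who samples the values 65530 and then 4 from the same generator would conclude under these assumptions that 10 identifiers were generated in between, since ids_between(65530, 4, 0, 65535) returns 10.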

Random-Increments Algorithm

This algorithm offers a middle ground between the algorithms that generate randomized transient numeric identifiers (described elsewhere in this document) and those that generate identifiers with a predictable monotonically increasing function (such as the Predictable Linear Identifiers Algorithm above).

    /* Initialization code */
    next_id = random();        /* Initialization value */
    id_rinc = 500;             /* Determines the trade-off */

    /* Transient Numeric ID selection function */
    id_range = max_id - min_id + 1;
    retry = id_range;

    do {
        /* Random increment */
        id_inc = (random() % id_rinc) + 1;

        if ((max_id - next_id) >= id_inc) {
            next_id = next_id + id_inc;
        } else {
            next_id = min_id + id_inc - (max_id - next_id);
        }

        if (suitable_id(next_id)) {
            return next_id;
        }

        retry = retry - id_inc;
    } while (retry > 0);

    return ERROR;

NOTE:
    random() is a PRNG that returns a pseudorandom unsigned integer number of appropriate size. Beware that "adapting" the length of the output of random() with a modulo operator (e.g., C language's "%") may change the distribution of the PRNG. To preserve a uniform distribution, the rejection sampling technique can be used.

    suitable_id() is a function that checks whether a candidate identifier is suitable (e.g., whether it is unique or not).

This algorithm aims at producing a global monotonically increasing sequence of transient numeric identifiers while avoiding the use of fixed increments, which would lead to trivially predictable sequences. The value "id_rinc" allows for direct control of the trade-off between unpredictability and identifier reuse frequency. The smaller the value of "id_rinc", the more similar this algorithm is to a predictable, global linear identifier generation algorithm (like the Predictable Linear Identifiers Algorithm above). The larger the value of "id_rinc", the more similar this algorithm is to a randomized identifier generation algorithm.

When the identifiers wrap, there is a risk of collisions of transient numeric identifiers (i.e., identifier reuse). Therefore, "id_rinc" should be selected according to the following criteria:

It should maximize the wrapping time of the identifier space.
It should minimize identifier reuse frequency.
It should maximize unpredictability.

Clearly, these are competing goals, and the decision of which value of "id_rinc" to use is a trade-off. Therefore, the value of "id_rinc" is at times a configurable parameter so that system administrators can make the trade-off for themselves. We note that the alternative algorithms discussed throughout this document offer better interoperability, security, and privacy properties than this algorithm, and hence, implementation of this algorithm is discouraged.
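Two points from this section can be sketched in C: drawing the random increment by rejection sampling rather than with a bare modulo operator, and estimating how many identifiers are generated before the identifier space wraps. All names below (rng32_fn, demo_rng(), uniform_below(), random_increment(), expected_ids_per_wrap()) are hypothetical. The xorshift32 generator is included only to keep the sketch self-contained; it is not cryptographically secure and stands in for whatever PRNG an implementation actually uses. The wrap estimate assumes uniformly distributed increments in [1, id_rinc] and ignores retries caused by suitable_id().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PRNG interface: returns a pseudorandom 32-bit value. */
typedef uint32_t (*rng32_fn)(void);

/*
 * Demo-only generator (xorshift32) so that the sketch is self-contained.
 * It is NOT cryptographically secure; a real implementation would plug
 * in a CSPRNG here.
 */
static uint32_t demo_state = 2463534242u;

uint32_t demo_rng(void)
{
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    return demo_state;
}

/*
 * Return a value in [0, bound) by rejection sampling: outputs below
 * "threshold" are discarded so that the number of accepted outputs is
 * an exact multiple of "bound", which removes the modulo bias.
 */
uint32_t uniform_below(rng32_fn rng, uint32_t bound)
{
    uint32_t threshold, r;

    assert(bound > 0);

    /* threshold == 2^32 mod bound, computed without 64-bit arithmetic. */
    threshold = (UINT32_MAX - bound + 1) % bound;

    do {
        r = rng();
    } while (r < threshold);

    return r % bound;
}

/* Random increment in [1, id_rinc], drawn without modulo bias. */
uint32_t random_increment(rng32_fn rng, uint32_t id_rinc)
{
    return uniform_below(rng, id_rinc) + 1;
}

/*
 * Rough estimate of how many identifiers are generated before the
 * space wraps: the mean increment is (id_rinc + 1) / 2, so the result
 * is id_range divided by that mean (rounded down).
 */
uint64_t expected_ids_per_wrap(uint64_t id_range, uint32_t id_rinc)
{
    assert(id_rinc > 0);
    return (2 * id_range) / ((uint64_t)id_rinc + 1);
}
```

Under these assumptions, a 16-bit identifier space with "id_rinc" set to 500 wraps after roughly 261 identifiers (expected_ids_per_wrap(65536, 500)), compared with 65536 identifiers for a fixed increment of 1, which makes the trade-off between unpredictability and wrapping time concrete. On systems that provide it, a library routine such as arc4random_uniform() would be a more natural choice than a hand-written uniform_below().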

Reusing Identifiers Across Different Contexts

Employing the same identifier across contexts in which stability is not required (i.e., overloading the semantics of transient numeric identifiers) usually has negative security and privacy implications.

For example, in order to generate transient numeric identifiers of Category #2 or #3, an implementation or specification might be tempted to employ a source for the numeric identifiers that is known to provide unique values but that may also be predictable or leak information related to the entity generating the identifier. This technique has been employed in the past for, e.g., generating IPv6 IIDs by reusing the MAC address of the underlying network interface card. However, embedding link-layer addresses in IPv6 IIDs not only results in predictable values but also leaks information about the manufacturer of the underlying network interface card, allows for network activity correlation, and makes address-based scanning attacks feasible.
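To make the leak concrete, the following C sketch recovers a MAC address from an IPv6 IID, assuming the IID was built with the Modified EUI-64 format (the two bytes 0xFF 0xFE inserted in the middle of the 48-bit MAC address and the universal/local bit inverted). The function name iid_to_mac() is hypothetical. Note that a randomly generated IID can match the 0xFF 0xFE pattern by chance, so a positive result is only an indication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if "iid" (the low-order 64 bits of an IPv6
 * address) looks like it was derived from a 48-bit MAC address using
 * the Modified EUI-64 format, write the recovered MAC address into
 * "mac" and return true. Otherwise, return false and leave "mac"
 * untouched.
 */
bool iid_to_mac(const uint8_t iid[8], uint8_t mac[6])
{
    assert(iid != NULL && mac != NULL);

    /* Modified EUI-64 inserts 0xFF 0xFE between the two MAC halves. */
    if (iid[3] != 0xFF || iid[4] != 0xFE) {
        return false;
    }

    mac[0] = iid[0] ^ 0x02;  /* undo the inverted universal/local bit */
    mac[1] = iid[1];
    mac[2] = iid[2];         /* mac[0..2] is the manufacturer's OUI */
    mac[3] = iid[5];
    mac[4] = iid[6];
    mac[5] = iid[7];

    return true;
}
```

For example, the IID 02:11:22:ff:fe:33:44:55 maps back to the MAC address 00:11:22:33:44:55, whose first three bytes identify the manufacturer of the network interface card. Because the same 64 bits appear in every address the interface configures, they also allow network activity to be correlated across networks.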

Acknowledgements

The authors would like to thank (in alphabetical order) , , , , , , , , , , , , , , and for providing valuable comments on earlier draft versions of this document.

The authors would like to thank and for their guidance during the publication process of this document.

The authors would like to thank and (Johns Hopkins University) for kindly answering a number of questions.

The authors would like to thank for his magic and inspiration.
