NFS Version 4 Working Group                                   S. Shepler
INTERNET-DRAFT                                          Sun Microsystems
Document: draft-ietf-nfsv4-04.txt                               C. Beame
                                               Hummingbird Communications
                                                            B. Callaghan
                                                        Sun Microsystems
                                                               M. Eisler
                                                        Sun Microsystems
                                                               D. Noveck
                                                       Network Appliance
                                                             D. Robinson
                                                        Sun Microsystems
                                                              R. Thurlow
                                                        Sun Microsystems
                                                            January 2000
NFS version 4 Protocol
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
NFS version 4 is a distributed file system protocol which owes
heritage to NFS protocol versions 2 [RFC1094] and 3 [RFC1813].
Expires: July 2000 [Page 1]
Draft Specification NFS version 4 Protocol January 2000
Unlike earlier versions, the NFS version 4 protocol supports
traditional file access while integrating support for file locking
and the mount protocol. In addition, support for strong security
(and its negotiation), compound operations, client caching, and
internationalization have been added. Of course, attention has been
applied to making NFS version 4 operate well in an Internet
environment.
Copyright
Copyright (C) The Internet Society (1999). All Rights Reserved.
Key Words
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Table of Contents
1. Introduction
1.1. Overview of NFS Version 4 Features
1.1.1. RPC and Security
1.1.2. Procedure and Operation Structure
1.1.3. File System Model
1.1.3.1. Filehandle Types
1.1.3.2. Attribute Types
1.1.3.3. File System Replication and Migration
1.1.4. OPEN and CLOSE
1.1.5. File locking
1.1.6. Client Caching and Delegation
1.2. General Definitions
2. Protocol Data Types
2.1. Basic Data Types
2.2. Structured Data Types
3. RPC and Security Flavor
3.1. Ports and Transports
3.2. Security Flavors
3.2.1. Security mechanisms for NFS version 4
3.2.1.1. Kerberos V5 as security triple
3.2.1.2. LIPKEY as a security triple
3.2.1.3. SPKM-3 as a security triple
3.3. Security Negotiation
3.3.1. Security Error
3.3.2. SECINFO
3.4. Callback RPC Authentication
4. Filehandles
4.1. Obtaining the First Filehandle
4.1.1. Root Filehandle
4.1.2. Public Filehandle
4.2. Filehandle Types
4.2.1. General Properties of a Filehandle
4.2.2. Persistent Filehandle
4.2.3. Volatile Filehandle
4.2.4. One Method of Constructing a Volatile Filehandle
4.3. Client Recovery from Filehandle Expiration
5. File Attributes
5.1. Mandatory Attributes
5.2. Recommended Attributes
5.3. Named Attributes
5.4. Mandatory Attributes - Definitions
5.5. Recommended Attributes - Definitions
5.6. Interpreting owner and owner_group
5.7. Quota Attributes
5.8. Access Control Lists
5.8.1. ACE type
5.8.2. ACE flag
5.8.3. ACE Access Mask
5.8.4. ACE who
6. File System Migration and Replication
6.1. Replication
6.2. Migration
6.3. Interpretation of the fs_locations Attribute
6.4. Filehandle Recovery for Migration or Replication
7. NFS Server Name Space
7.1. Server Exports
7.2. Browsing Exports
7.3. Server Pseudo File System
7.4. Multiple Roots
7.5. Filehandle Volatility
7.6. Exported Root
7.7. Mount Point Crossing
7.8. Security Policy and Name Space Presentation
8. File Locking and Share Reservations
8.1. Locking
8.1.1. Client ID
8.1.2. Server Release of Clientid
8.1.3. nfs_lockowner and stateid Definition
8.1.4. Use of the stateid
8.1.5. Sequencing of Lock Requests
8.1.6. Recovery from Replayed Requests
8.1.7. Releasing nfs_lockowner State
8.2. Lock Ranges
8.3. Blocking Locks
8.4. Lease Renewal
8.5. Crash Recovery
8.5.1. Client Failure and Recovery
8.5.2. Server Failure and Recovery
8.5.3. Network Partitions and Recovery
8.6. Recovery from a Lock Request Timeout or Abort
8.7. Server Revocation of Locks
8.8. Share Reservations
8.9. OPEN/CLOSE Operations
8.10. Open Upgrade and Downgrade
8.11. Short and Long Leases
8.12. Clocks and Calculating Lease Expiration
9. Client-Side Caching
9.1. Performance Challenges for Client-Side Caching
9.2. Delegation and Callbacks
9.2.1. Delegation Recovery
9.3. Data Caching
9.3.1. Data Caching and OPENs
9.3.2. Data Caching and File Locking
9.3.3. Data Caching and Mandatory File Locking
9.3.4. Data Caching and File Identity
9.4. Open Delegation
9.4.1. Open Delegation and Data Caching
9.4.2. Open Delegation and File Locks
9.4.3. Recall of Open Delegation
9.4.4. Delegation Revocation
9.5. Data Caching and Revocation
9.5.1. Revocation Recovery for Write Open Delegation
9.6. Attribute Caching
9.7. Name Caching
9.8. Directory Caching
10. Minor Versioning
11. Internationalization
11.1. Universal Versus Local Character Sets
11.2. Overview of Universal Character Set Standards
11.3. Difficulties with UCS-4, UCS-2, Unicode
11.4. UTF-8 and its solutions
12. Error Definitions
13. NFS Version 4 Requests
13.1. Compound Procedure
13.2. Evaluation of a Compound Request
13.3. Operation Values
14. NFS Version 4 Procedures
14.1. Procedure 0: NULL - No Operation
14.2. Procedure 1: COMPOUND - Compound Operations
14.2.1. Operation 3: ACCESS - Check Access Rights
14.2.2. Operation 4: CLOSE - Close File
14.2.3. Operation 5: COMMIT - Commit Cached Data
14.2.4. Operation 6: CREATE - Create a Non-Regular File Object
14.2.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery
14.2.6. Operation 8: DELEGRETURN - Return Delegation
14.2.7. Operation 9: GETATTR - Get Attributes
14.2.8. Operation 10: GETFH - Get Current Filehandle
14.2.9. Operation 11: LINK - Create Link to a File
14.2.10. Operation 12: LOCK - Create Lock
14.2.11. Operation 13: LOCKT - Test For Lock
14.2.12. Operation 14: LOCKU - Unlock File
14.2.13. Operation 15: LOOKUP - Lookup Filename
14.2.14. Operation 16: LOOKUPP - Lookup Parent Directory
14.2.15. Operation 17: NVERIFY - Verify Difference in Attributes
14.2.16. Operation 18: OPEN - Open a Regular File
14.2.17. Operation 19: OPENATTR - Open Named Attribute Directory
14.2.18. Operation 20: OPEN_CONFIRM - Confirm Open
14.2.19. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
14.2.20. Operation 22: PUTFH - Set Current Filehandle
14.2.21. Operation 23: PUTPUBFH - Set Public Filehandle
14.2.22. Operation 24: PUTROOTFH - Set Root Filehandle
14.2.23. Operation 25: READ - Read from File
14.2.24. Operation 26: READDIR - Read Directory
14.2.25. Operation 27: READLINK - Read Symbolic Link
14.2.26. Operation 28: REMOVE - Remove Filesystem Object
14.2.27. Operation 29: RENAME - Rename Directory Entry
14.2.28. Operation 30: RENEW - Renew a Lease
14.2.29. Operation 31: RESTOREFH - Restore Saved Filehandle
14.2.30. Operation 32: SAVEFH - Save Current Filehandle
14.2.31. Operation 33: SECINFO - Obtain Available Security
14.2.32. Operation 34: SETATTR - Set Attributes
14.2.33. Operation 35: SETCLIENTID - Negotiate Clientid
14.2.34. Operation 36: SETCLIENTID_CONFIRM - Confirm Clientid
14.2.35. Operation 37: VERIFY - Verify Same Attributes
14.2.36. Operation 38: WRITE - Write to File
15. NFS Version 4 Callback Procedures
15.1. Procedure 0: CB_NULL - No Operation
15.2. Procedure 1: CB_COMPOUND - Compound Operations
15.2.1. Operation 3: CB_GETATTR - Get Attributes
15.2.2. Operation 4: CB_RECALL - Recall an Open Delegation
16. Security Considerations
17. IANA Considerations
17.1. Named Attribute Definition
18. RPC definition file
19. Bibliography
20. Authors and Contributors
20.1. Editor's Address
20.2. Authors' Addresses
21. Full Copyright Statement
1. Introduction
The NFS version 4 protocol is a further revision of the NFS protocol
defined already by versions 2 [RFC1094] and 3 [RFC1813]. It retains
the essential characteristics of previous versions: design for easy
recovery, independent of transport protocols, operating systems and
filesystems, simplicity, and good performance. The NFS version 4
revision has the following goals:
o Improved access and good performance on the Internet.
The protocol is designed to transit firewalls easily, perform
well where latency is high and bandwidth is low, and scale to
very large numbers of clients per server.
o Strong security with negotiation built into the protocol.
The protocol builds on the work of the ONCRPC working group in
supporting the RPCSEC_GSS protocol. Additionally, the NFS
version 4 protocol provides a mechanism to allow clients and
servers the ability to negotiate security and require clients
and servers to support a minimal set of security schemes.
o Good cross-platform interoperability.
The protocol features a file system model that provides a
useful, common set of features that does not unduly favor one
file system or operating system over another.
o Designed for protocol extensions.
The protocol is designed to accept standard extensions that do
not compromise backward compatibility.
1.1. Overview of NFS Version 4 Features

To provide a reasonable context for the reader, the major features of
the NFS version 4 protocol will be reviewed in brief. This will be
done to provide an appropriate context for both the reader who is
familiar with the previous versions of the NFS protocol and the
reader who is new to the NFS protocols. For the reader new to the NFS
protocols, there is still a fundamental knowledge that is expected.
The reader should be familiar with the XDR and RPC protocols as
described in [RFC1831] and [RFC1832]. A basic knowledge of file
systems and distributed file systems is expected as well.
1.1.1. RPC and Security

As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS
version 4 protocol are those defined in [RFC1831] and [RFC1832]. To
meet end to end security requirements, the RPCSEC_GSS framework
[RFC2623] will be used to extend the basic RPC security. With the
use of RPCSEC_GSS, various mechanisms can be provided to offer
authentication, integrity, and privacy to the NFS version 4 protocol.
Kerberos V5 will be used as described in [RFC1964] to provide one
security framework. The LIPKEY GSS-API mechanism described in
[RFCXXXX] will be used to provide for the use of user password and
server public key by the NFS version 4 protocol. With the use of
RPCSEC_GSS, other mechanisms may also be specified and used for NFS
version 4 security.

To enable in-band security negotiation, the NFS version 4 protocol
has added a new operation which provides the client a method of
querying the server about its policies regarding which security
mechanisms must be used for access to the server's file system
resources. With this, the client can securely match the security
mechanism that meets the policies specified at both the client and
server.
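
The matching step described above can be pictured with a small
sketch. It is only an illustration under stated assumptions: the
function name, the string labels for mechanisms, and the idea of an
ordered client preference list are all hypothetical and are not taken
from the protocol definition.

```python
from typing import Optional, Sequence


def choose_mechanism(client_policy: Sequence[str],
                     server_policy: Sequence[str]) -> Optional[str]:
    """Pick the first mechanism, in client preference order, that the
    server also accepts. Returns None when the policies do not overlap.

    Both arguments are hypothetical lists of mechanism labels; the real
    protocol exchanges structured security information instead.
    """
    allowed = set(server_policy)
    for mechanism in client_policy:
        if mechanism in allowed:
            return mechanism
    return None


# Illustrative labels only: the client prefers privacy over integrity.
selected = choose_mechanism(
    ["kerberos_v5_privacy", "kerberos_v5_integrity", "lipkey"],
    ["kerberos_v5_integrity", "lipkey"],
)
```

In this sketch the server's answer is treated as a plain list; how the
answer is obtained and protected on the wire is left to the operation
the protocol defines for that purpose.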
1.1.2. Procedure and Operation Structure

A significant departure from the previous versions of the NFS
protocol is the introduction of the COMPOUND procedure. For the NFS
version 4 protocol, there are two RPC procedures, NULL and COMPOUND.
The COMPOUND procedure is defined in terms of operations and these
operations correspond more closely to the traditional NFS procedures.
With the use of the COMPOUND procedure, the client is able to build
simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical file system
operations. For example, without previous contact with a server a
client will be able to read data from a file in one request by
combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
With previous versions of the NFS protocol, this type of single
request was not possible.

The model used for COMPOUND is very simple. There is no logical OR
or ANDing of operations. The operations combined within a COMPOUND
request are evaluated in order by the server. Once an operation
returns a failing result, the evaluation ends and the results of all
evaluated operations are returned to the client.

The NFS version 4 protocol continues to have the client refer to a
file or directory at the server by a "filehandle". The COMPOUND
procedure has a method of passing a filehandle from one operation to
another within the sequence of operations. There is a concept of a
"current filehandle" and "saved filehandle". Most operations use the
"current filehandle" as the file system object to operate upon. The
"saved filehandle" is used as temporary filehandle storage within a
COMPOUND procedure as well as an additional operand for certain
operations.
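
The evaluation model for COMPOUND (operations run in order, evaluation
stops at the first failure, and a current and saved filehandle are
carried between operations) can be sketched as follows. This is a toy
model, not the wire protocol: the status values, the in-memory TREE
and DATA tables, the use of path strings as filehandles, and the
simplified operation arguments are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical status values; the real protocol defines its own errors.
OK, ERR_NOENT, ERR_INVAL = 0, 1, 2

# Toy server contents. Filehandles are modelled as path strings.
TREE: Dict[str, Dict[str, str]] = {
    "/": {"etc": "/etc"},
    "/etc": {"motd": "/etc/motd"},
}
DATA: Dict[str, bytes] = {"/etc/motd": b"hello"}


@dataclass
class CompoundState:
    current_fh: Optional[str] = None  # object most operations act upon
    saved_fh: Optional[str] = None    # temporary storage within a request


Operation = Callable[[CompoundState], Tuple[int, object]]


def putrootfh() -> Operation:
    def op(state: CompoundState) -> Tuple[int, object]:
        state.current_fh = "/"
        return OK, None
    return op


def lookup(name: str) -> Operation:
    def op(state: CompoundState) -> Tuple[int, object]:
        child = TREE.get(state.current_fh or "", {}).get(name)
        if child is None:
            return ERR_NOENT, None
        state.current_fh = child  # result is passed on to the next operation
        return OK, None
    return op


def open_file() -> Operation:
    def op(state: CompoundState) -> Tuple[int, object]:
        return (OK, None) if state.current_fh in DATA else (ERR_INVAL, None)
    return op


def read() -> Operation:
    def op(state: CompoundState) -> Tuple[int, object]:
        data = DATA.get(state.current_fh or "")
        return (OK, data) if data is not None else (ERR_INVAL, None)
    return op


def evaluate_compound(ops: List[Operation]) -> List[Tuple[int, object]]:
    """Evaluate operations in order; stop after the first failing one and
    return the results of every operation that was evaluated."""
    state = CompoundState()
    results: List[Tuple[int, object]] = []
    for op in ops:
        status, value = op(state)
        results.append((status, value))
        if status != OK:
            break
    return results


# One request that looks up, opens, and reads a file.
good = evaluate_compound(
    [putrootfh(), lookup("etc"), lookup("motd"), open_file(), read()])
# A request whose second operation fails; read() is never evaluated.
bad = evaluate_compound([putrootfh(), lookup("missing"), read()])
```

The second request returns only two results, which mirrors the rule
that evaluation ends at the first failing operation.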
1.1.3. File System Model

The general file system model used for the NFS version 4 protocol is
the same as previous versions. The server file system is
hierarchical with the regular files contained within being treated as
opaque byte streams. In a slight departure, file and directory names
are encoded with UTF-8 to deal with the basics of
internationalization.

The NFS version 4 protocol does not require a separate protocol to
provide for the initial mapping between path name and filehandle.
Instead of using the older MOUNT protocol for this mapping, the
server provides a ROOT filehandle that represents the logical root or
top of the file system tree provided by the server. The server
provides multiple file systems by gluing them together with pseudo
file systems. These pseudo file systems provide for potential gaps
in the path names between real file systems.
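
As a rough illustration of how pseudo file systems fill the gaps
between real file systems, the sketch below builds a single name space
from a list of exported paths. The function name, the "real" and
"pseudo" labels, and the assumption that exports are absolute paths
that are not nested inside one another are all hypothetical choices
made for this example.

```python
from typing import Dict, List


def build_namespace(exports: List[str]) -> Dict[str, str]:
    """Return a map from every path between the root and each export to
    either "real" (an exported file system) or "pseudo" (a gap filler).

    Assumes exports are absolute, slash-separated, and not nested.
    """
    namespace: Dict[str, str] = {"/": "pseudo"}
    for export in exports:
        parts = [part for part in export.split("/") if part]
        path = ""
        for part in parts:
            path += "/" + part
            # Intermediate directories exist only to connect the tree.
            namespace.setdefault(path, "pseudo")
        # The final component (or the root itself) is the real export.
        namespace[path if parts else "/"] = "real"
    return namespace


namespace = build_namespace(["/export/home", "/vol/data"])
```

A client walking down from the ROOT filehandle would pass through the
"pseudo" entries until it reaches a "real" one.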
1.1.3.1. Filehandle Types

In previous versions of the NFS protocol, the filehandle provided by
the server was guaranteed to be valid or persistent for the lifetime
of the file system object to which it referred. For some server
implementations, this persistence requirement has been difficult to
meet. For the NFS version 4 protocol, this requirement has been
relaxed by introducing another type of filehandle, volatile. With
persistent and volatile filehandle types, the server implementation
can match the abilities of the file system at the server along with
the operating environment. The client will have knowledge of the
type of filehandle being provided by the server and can be prepared
to deal with the semantics of each.
1.1.3.2. Attribute Types
The NFS version 4 protocol introduces three classes of file system or
file attributes. Like the additional filehandle type, the
classification of file attributes has been done to ease server
implementations along with extending the overall functionality of the
NFS protocol. This attribute model is structured to be extensible
such that new attributes can be introduced in minor revisions of the
protocol without requiring significant rework.
The three classifications are: mandatory, recommended and named
attributes. This is a significant departure from the previous
attribute model used in the NFS protocol. Previously, the attributes
for the file system and file objects were a fixed set of mainly Unix
attributes. If the server or client did not support a particular
attribute, it would have to simulate the attribute the best it could.
Mandatory attributes are the minimal set of file or file system
attributes that must be provided by the server and must be properly
represented by the server. Recommended attributes represent
different file system types and operating environments. The
recommended attributes will allow for better interoperability and the
inclusion of more operating environments. The mandatory and
recommended attribute sets are traditional file or file system
attributes. The third type of attribute is the named attribute. A
named attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate
application specific data with a regular file or directory.
One significant addition to the recommended set of file attributes is
the Access Control List (ACL) attribute. This attribute provides for
directory and file access control beyond the model used in previous
versions of the NFS protocol. The ACL definition allows for
specification of user and group level access control.
1.1.3.3. File System Replication and Migration
With the use of a special file attribute, the ability to migrate or
replicate server file systems is enabled within the protocol. The
file system locations attribute provides a method for the client to
probe the server about the location of a file system. In the event
of a migration of a file system, the client will receive an error
when operating on the file system and it can then query as to the new
file system location. Similar steps are used for replication: the
client is able to query the server for the multiple available
locations of a particular file system. From this information, the
client can use its own policies to access the appropriate file system
location.
1.1.4. OPEN and CLOSE
The NFS version 4 protocol introduces OPEN and CLOSE operations. The
OPEN operation provides a single point where file lookup, creation,
and share semantics can be combined. The CLOSE operation also
provides for the release of state accumulated by OPEN.
1.1.5. File locking
With the NFS version 4 protocol, the support for byte range file
locking is part of the NFS protocol. The file locking support is
structured so that an RPC callback mechanism is not required. This
is a departure from the previous versions of the NFS file locking
protocol, Network Lock Manager (NLM). The state associated with file
locks is maintained at the server under a lease-based model. The
server defines a single lease period for all state held by an NFS
client. If the client does not renew its lease within the defined
period, all state associated with the client's lease may be released
by the server. The client may renew its lease with use of the RENEW
operation or implicitly by use of other operations (primarily READ).
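The lease-based model described above can be sketched roughly as follows. This is an illustration only: the class name `LeaseTable`, its methods, and the use of plain numbers for time are assumptions made for the example and are not defined by the protocol. The text says the server "may" release state after the lease lapses; the sketch simply always does so.

```python
class LeaseTable:
    """Toy server-side lease table (hypothetical helper, not protocol-defined)."""

    def __init__(self, lease_period):
        self.lease_period = lease_period   # one period, shared by all clients
        self.last_renewal = {}             # clientid -> time of last renewal

    def renew(self, clientid, now):
        # Called for an explicit RENEW or for any operation that renews
        # the lease implicitly.
        self.last_renewal[clientid] = now

    def expired(self, clientid, now):
        # A client with no recorded renewal holds no lease at all.
        last = self.last_renewal.get(clientid)
        return last is None or now - last > self.lease_period

    def reap(self, now):
        # Release the state of every client whose lease has lapsed and
        # report which clients were affected.
        dead = sorted(c for c in self.last_renewal if self.expired(c, now))
        for clientid in dead:
            del self.last_renewal[clientid]
        return dead
```

A server built along these lines would call `renew` on each qualifying request and run `reap` periodically.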
1.1.6. Client Caching and Delegation
The file, attribute, and directory caching for the NFS version 4
protocol is similar to previous versions. Attributes and directory
information are cached for a duration determined by the client. At
the end of a predefined timeout, the client will query the server to
see if the related file system object has been updated.
For file data, the client checks its cache validity when the file is
opened. A query is sent to the server to determine if the file has
been changed. Based on this information, the client determines if
the data cache for the file should be kept or released. Also, when
the file is closed, any modified data is written to the server.
If an application wants to serialize access to file data, file
locking of the file data ranges in question should be used.
The major addition to NFS version 4 in the area of caching is the
ability of the server to delegate certain responsibilities to the
client. When the server grants a delegation for a file to a client,
the client is guaranteed certain semantics with respect to the
sharing of that file with other clients. At OPEN, the server may
provide the client either a read or write delegation for the file. If
the client is granted a read delegation, it is assured that no other
client has the ability to write to the file for the duration of the
delegation. If the client is granted a write delegation, the client
is assured that no other client has read or write access to the file.
Delegations can be recalled by the server. If another client
requests access to the file in such a way that the access conflicts
with the granted delegation, the server is able to notify the initial
client and recall the delegation. This requires that a callback path
exist between the server and client. If this callback path does not
exist, then delegations can not be granted. The essence of a
delegation is that it allows the client to locally service operations
such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate
interaction with the server.
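The delegation guarantees above amount to a small conflict rule, sketched here under stated assumptions. The names `conflicts` and `may_grant` and the string constants are hypothetical, and the idea that the server checks access already held by other clients before offering a delegation is an assumption of the example rather than a statement from the text.

```python
READ = "read"
WRITE = "write"

def conflicts(delegation, requested_access):
    """True if another client's access conflicts with a held delegation."""
    if delegation == WRITE:
        return True                       # no other reader or writer allowed
    return requested_access == WRITE      # read delegation excludes writers

def may_grant(callback_path_exists, other_clients_access, wanted):
    """Decide whether a delegation of type `wanted` could be offered."""
    if not callback_path_exists:
        return False                      # a recall would be impossible
    # Assumed policy: the guarantee must already hold against the access
    # that other clients currently have.
    return not any(conflicts(wanted, a) for a in other_clients_access)
```

When `conflicts` returns true for a new request, the server would recall the delegation from the initial client before granting the conflicting access.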
1.2. General Definitions
The following definitions are provided for the purpose of providing
an appropriate context for the reader.
Client    The "client" is the entity that accesses the NFS server's
          resources. The client may be an application which contains
          the logic to access the NFS server directly. The client
          may also be the traditional operating system client that
          provides remote file system services for a set of
          applications.
          In the case of file locking the client is the entity that
          maintains a set of locks on behalf of one or more
          applications. This client is responsible for crash or
          failure recovery for those locks it manages.
          Note that multiple clients may share the same transport and
          multiple clients may exist on the same network node.
Clientid  A 64-bit quantity used as a unique, short-hand reference to
          a client supplied Verifier and ID. The server is
          responsible for supplying the Clientid.
Lease     An interval of time defined by the server for which the
          client is irrevocably granted a lock. At the end of a
          lease period the lock may be revoked if the lease has not
          been extended. The lock must be revoked if a conflicting
          lock has been granted after the lease interval.
          All leases granted by a server have the same fixed
          interval.
Lock      The term "lock" is used to refer to both record
          (byte-range) locks as well as file (share) locks unless
          specifically stated otherwise.
Server    The "Server" is the entity responsible for coordinating
          client access to a set of file systems.
Stateid   A 64-bit quantity returned by a server that uniquely
          defines the locking state granted by the server for a
          specific lock owner for a specific file.
          Stateids composed of all bits 0 or all bits 1 have special
          meaning and are reserved values.
Verifier  A 64-bit quantity generated by the client that the server
          can use to determine if the client has restarted and lost
          all previous lock state.
2. Protocol Data Types
The syntax and semantics to describe the data types of the NFS
version 4 protocol are defined in the XDR [RFC1832] and RPC [RFC1831]
documents. The next sections build upon the XDR data types to define
types and structures specific to this protocol.
2.1. Basic Data Types
Data Type Definition
_____________________________________________________________________
int32_t typedef int int32_t;
uint32_t typedef unsigned int uint32_t;
int64_t typedef hyper int64_t;
uint64_t typedef unsigned hyper uint64_t;
attrlist4 typedef opaque attrlist4<>;
Used for file/directory attributes
bitmap4 typedef uint32_t bitmap4<>;
Used in attribute array encoding.
changeid4 typedef uint64_t changeid4;
Used in definition of change_info
clientid4 typedef uint64_t clientid4;
Shorthand reference to client identification
component4 typedef utf8string component4;
Represents path name components
count4 typedef uint32_t count4;
Various count parameters (READ, WRITE, COMMIT)
length4 typedef uint64_t length4;
Describes LOCK lengths
linktext4 typedef utf8string linktext4;
Symbolic link contents
mode4 typedef uint32_t mode4;
Mode attribute data type
nfs_cookie4 typedef uint64_t nfs_cookie4;
Opaque cookie value for READDIR
nfs_fh4 typedef opaque nfs_fh4<NFS4_FHSIZE>;
Filehandle definition; NFS4_FHSIZE is defined as 128
nfs_ftype4 enum nfs_ftype4;
Various defined file types
nfsstat4 enum nfsstat4;
Return value for operations
offset4 typedef uint64_t offset4;
Various offset designations (READ, WRITE, LOCK, COMMIT)
pathname4 typedef component4 pathname4<>;
Represents path name for LOOKUP, OPEN and others
qop4 typedef uint32_t qop4;
Quality of protection designation in SECINFO
sec_oid4 typedef opaque sec_oid4<>;
Security Object Identifier
The sec_oid4 data type is not really opaque.
Instead it contains an ASN.1 OBJECT IDENTIFIER as used
by GSS-API in the mech_type argument to
GSS_Init_sec_context. See [RFC2078] for details.
seqid4 typedef uint32_t seqid4;
Sequence identifier used for file locking
stateid4 typedef uint64_t stateid4;
State identifier used for file locking and delegation
utf8string typedef opaque utf8string<>;
UTF-8 encoding for strings
verifier4 typedef opaque verifier4[NFS4_VERIFIER_SIZE];
Verifier used for various operations (COMMIT, CREATE,
OPEN, READDIR, SETCLIENTID, WRITE)
NFS4_VERIFIER_SIZE is defined as 8
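As an illustration of how these basic types appear on the wire, the following sketch encodes a pathname4 using the variable-length rules of XDR [RFC1832]: a 4-byte big-endian length, the bytes themselves, and zero padding to a multiple of four. The function names `encode_opaque` and `encode_pathname4` are assumptions made for the example.

```python
import struct

def encode_opaque(data):
    """XDR variable-length opaque: length, data, zero pad to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def encode_pathname4(components):
    """pathname4 is a counted array of component4 (utf8string) values."""
    out = struct.pack(">I", len(components))
    for name in components:
        out += encode_opaque(name.encode("utf-8"))
    return out
```

For example, `encode_pathname4(["a", "bc"])` yields a 4-byte element count followed by two 8-byte padded components, 20 bytes in total.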
2.2. Structured Data Types
nfstime4
struct nfstime4 {
int64_t seconds;
uint32_t nseconds;
};
The nfstime4 structure gives the number of seconds and
nanoseconds since midnight or 0 hour January 1, 1970 Coordinated
Universal Time (UTC). Values greater than zero for the seconds
field denote dates after the 0 hour January 1, 1970. Values
less than zero for the seconds field denote dates before the 0
hour January 1, 1970. In both cases, the nseconds field is to
be added to the seconds field for the final time representation.
For example, if the time to be represented is one-half second
before 0 hour January 1, 1970, the seconds field would have a
value of negative one (-1) and the nseconds field would have a
value of one-half second (500000000). Values greater than
999,999,999 for nseconds are considered invalid.
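The rule that nseconds is always added to seconds, including for times before the epoch, can be checked with a short sketch. The function names are hypothetical, exact fractions are used to avoid floating point rounding, and the 12-byte encoding assumes the XDR layout of a hyper followed by an unsigned int.

```python
import struct
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000

def to_nfstime4(t):
    """Split an exact offset in seconds from the epoch into (seconds, nseconds)."""
    t = Fraction(t)
    seconds = t.numerator // t.denominator        # floor, also for negatives
    nseconds = int((t - seconds) * NSEC_PER_SEC)  # always in 0..999999999
    return seconds, nseconds

def from_nfstime4(seconds, nseconds):
    """Recombine the two fields; nseconds is added in both cases."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds greater than 999,999,999 is invalid")
    return Fraction(seconds) + Fraction(nseconds, NSEC_PER_SEC)

def encode_nfstime4(seconds, nseconds):
    """XDR form: signed 64-bit hyper, then unsigned 32-bit int, big-endian."""
    return struct.pack(">qI", seconds, nseconds)
```

With these definitions, one-half second before the epoch maps to seconds = -1 and nseconds = 500000000, matching the example in the text.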
This data type is used to pass time and date information. A
server converts to and from local time when processing time
values, preserving as much accuracy as possible. If the
precision of timestamps stored for a file system object is less
than defined, loss of precision can occur. An adjunct time
maintenance protocol is recommended to reduce client and server
time skew.
specdata4
struct specdata4 {
uint32_t specdata1;
uint32_t specdata2;
};
This data type represents additional information for the device
file types NF4CHR and NF4BLK.
fsid4
struct fsid4 {
uint64_t major;
uint64_t minor;
};
This type is the file system identifier that is used as a
mandatory attribute.
fs_location4
struct fs_location4 {
utf8string server<>;
pathname4 rootpath;
};
fs_locations4
struct fs_locations4 {
pathname4 fs_root;
fs_location4 locations<>;
};
The fs_location4 and fs_locations4 data types are used for the
fs_locations recommended attribute which is used for migration
and replication support.
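A client-side view of these two structures might look like the sketch below. The dataclass names mirror the XDR definitions, but `choose_location` and its `reachable` callback are hypothetical: the protocol leaves the selection policy to the client, and "first reachable server wins" is only one possible policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class FsLocation4:
    server: List[str]      # utf8string server<>
    rootpath: List[str]    # pathname4 rootpath

@dataclass
class FsLocations4:
    fs_root: List[str]                 # pathname4 fs_root
    locations: List[FsLocation4]       # fs_location4 locations<>

def choose_location(
    locs: FsLocations4, reachable: Callable[[str], bool]
) -> Optional[Tuple[str, List[str]]]:
    """Return the first (server, rootpath) pair whose server is reachable."""
    for loc in locs.locations:
        for server in loc.server:
            if reachable(server):
                return server, loc.rootpath
    return None
```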
fattr4
struct fattr4 {
bitmap4 attrmask;
attrlist4 attr_vals;
};
The fattr4 structure is used to represent file and directory
attributes.
The bitmap is a counted array of 32 bit integers used to contain
bit values. The position of the integer in the array that
contains bit n can be computed from the expression (n / 32) and
its bit within that integer is (n mod 32).
0 1
+-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+--
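The index arithmetic for bitmap4 can be sketched as follows. The helper names are assumptions made for the example; the word index (n / 32) and bit position (n mod 32) follow the expressions given above, and the encoding follows the counted-array layout shown in the diagram.

```python
import struct

def bitmap_set(words, n):
    """Return a copy of the word list with bit n set, growing it if needed."""
    index, bit = divmod(n, 32)
    words = list(words) + [0] * max(0, index + 1 - len(words))
    words[index] |= 1 << bit
    return words

def bitmap_test(words, n):
    """True if bit n is set; bits beyond the array count as clear."""
    index, bit = divmod(n, 32)
    return index < len(words) and bool((words[index] >> bit) & 1)

def encode_bitmap4(words):
    """XDR counted array: 4-byte count, then each 32-bit word, big-endian."""
    return struct.pack(">I%dI" % len(words), len(words), *words)
```

Setting bit 33 in an empty bitmap, for instance, produces two words: the first is zero and the second has bit 1 set.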
change_info4
struct change_info4 {
bool atomic;
changeid4 before;
changeid4 after;
};
This structure is used with the CREATE, LINK, REMOVE, and RENAME
operations to let the client know the value of the change attribute
for the directory in which the target file system object
resides.
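One plausible client-side use of change_info4 is sketched below. The text only states that the structure reports the directory's change attribute; the cache policy shown here (keep cached directory contents only when the operation was atomic and the cached value equals `before`) is an assumption of the example, as is the function name.

```python
def update_dir_cache(cached_change, atomic, before, after):
    """Return (new_cached_change, cache_still_valid) for a directory.

    cached_change is the change attribute value the client last saw.
    """
    if atomic and cached_change == before:
        # Nothing else modified the directory between the cached state
        # and this operation, so cached entries can be kept.
        return after, True
    # Otherwise some other change may have been missed; the cached
    # directory contents have to be refetched.
    return after, False
```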
clientaddr4
struct clientaddr4 {
/* see struct rpcb in RFC 1833 */
string r_netid<>; /* network id */
string r_addr<>; /* universal address */
};
The clientaddr4 structure is used as part of the SETCLIENTID
operation, either to specify the address of the client that is
using a clientid or as part of the callback registration.
cb_client4
struct cb_client4 {
unsigned int cb_program;
clientaddr4 cb_location;
};
This structure is used by the client to inform the server of its
callback address; it includes the program number and client
address.
nfs_client_id4
struct nfs_client_id4 {
verifier4 verifier;
opaque id<>;
};
This structure is part of the arguments to the SETCLIENTID
operation.
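How a server might use the verifier and id from nfs_client_id4 to detect a restarted client can be sketched as follows. The class `ClientRegistry`, its method name, and the choice to hand out a fresh clientid after a restart are assumptions made for illustration; the text only says that the verifier lets the server determine whether the client has restarted.

```python
import itertools

class ClientRegistry:
    """Toy server-side table mapping client ids to clientid values."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_id = {}    # client-supplied id -> (verifier, clientid)

    def setclientid(self, verifier, ident):
        """Return (clientid, restarted) for the given verifier and id."""
        known = self._by_id.get(ident)
        if known is not None and known[0] == verifier:
            return known[1], False          # same client incarnation
        clientid = next(self._next)
        self._by_id[ident] = (verifier, clientid)
        # A known id with a different verifier indicates that the client
        # restarted and lost its previous lock state.
        return clientid, known is not None
```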
nfs_lockowner4
struct nfs_lockowner4 {
clientid4 clientid;
opaque owner<>;
};
This structure is used to identify the owner of an OPEN share or
file lock.
3. RPC and Security Flavor
The NFS version 4 protocol is a Remote Procedure Call (RPC)
application that uses RPC version 2 and the corresponding eXternal
Data Representation (XDR) as defined in [RFC1831] and [RFC1832]. The
RPCSEC_GSS security flavor as defined in [RFC2203] MUST be used as
the mechanism to deliver stronger security for the NFS version 4
protocol.
3.1. Ports and Transports
Historically, NFS version 2 and version 3 servers have resided on
port 2049. The registered port 2049 [RFC1700] for the NFS protocol
should be the default configuration. Using the registered port for
NFS services means the NFS client will not need to use the RPC
binding protocols as described in [RFC1833]; this will allow NFS to
transit firewalls.
The transport used by the RPC service for the NFS version 4 protocol
MUST provide congestion control comparable to that defined for TCP in
[RFC2581]. If the operating environment implements TCP, the NFS
version 4 protocol SHOULD be supported over TCP. The NFS client and
server may use other transports if they support congestion control as
defined above and in those cases a mechanism may be provided to
override TCP usage in favor of another transport.
If TCP is used as the transport, the client and server SHOULD use
persistent connections. This will prevent the weakening of TCP's
congestion control via short lived connections.
3.2. Security Flavors
Traditional RPC implementations have included AUTH_NONE, AUTH_SYS,
AUTH_DH, and AUTH_KRB4 as security flavors. With [RFC2203] an
additional security flavor of RPCSEC_GSS has been introduced which
uses the functionality of GSS-API [RFC2078]. This allows for the use
of varying security mechanisms by the RPC layer without the
additional implementation overhead of adding RPC security flavors.
For NFS version 4, the RPCSEC_GSS security flavor MUST be used to
enable the mandatory security mechanism. Other flavors, such as
AUTH_NONE, AUTH_SYS, and AUTH_DH, MAY be implemented as well.
3.2.1. Security mechanisms for NFS version 4
The use of RPCSEC_GSS requires selection of: mechanism, quality of
protection, and service (authentication, integrity, privacy). The
remainder of this document will refer to these three parameters of
the RPCSEC_GSS security as the security triple.
3.2.1.1. Kerberos V5 as security triple
The Kerberos V5 GSS-API mechanism as described in [RFC1964] MUST be
implemented and provide the following security triples.
column descriptions:
1 == number of pseudo flavor
2 == name of pseudo flavor
3 == mechanism's OID
4 == mechanism's algorithm(s)
5 == RPCSEC_GSS service
1      2     3                    4              5
-----------------------------------------------------------------------
390003 krb5  1.2.840.113554.1.2.2 DES MAC MD5    rpc_gss_svc_none
390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5    rpc_gss_svc_integrity
390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5    rpc_gss_svc_privacy
                                  for integrity,
                                  and 56 bit DES
                                  for privacy.
Note that the pseudo flavor is presented here as a mapping aid to the
implementor. Because this NFS protocol includes a method to
negotiate security and it understands the GSS-API mechanism, the
pseudo flavor is not needed. The pseudo flavor is needed for NFS
version 3 since the security negotiation is done via the MOUNT
protocol.
For a discussion of NFS' use of RPCSEC_GSS and Kerberos V5, please
see [RFC2623].
3.2.1.2. LIPKEY as a security triple
The LIPKEY GSS-API mechanism as described in [RFCXXXX] MUST be
implemented and provide the following security triples. The
definition of the columns matches the previous subsection "Kerberos
V5 as security triple".
1 2 3 4 5
-----------------------------------------------------------------------
390006 lipkey TBD negotiated rpc_gss_svc_none
390007 lipkey-i TBD negotiated rpc_gss_svc_integrity
390008 lipkey-p TBD negotiated rpc_gss_svc_privacy
The mechanism algorithm is listed as "negotiated". This is because
LIPKEY is layered on SPKM-3 and in SPKM-3 [RFCXXXX] the
confidentiality and integrity algorithms are negotiated. Since
SPKM-3 specifies HMAC-MD5 for integrity as MANDATORY, 128 bit
cast5CBC for confidentiality (privacy) as MANDATORY, and further
specifies that HMAC-MD5 and cast5CBC MUST be listed first before
weaker algorithms, specifying "negotiated" in column 4 does not
impair interoperability. In the event an SPKM-3 peer does not
support the mandatory algorithms, the other peer is free to accept or
reject the GSS-API context creation.
Because SPKM-3 negotiates the algorithms, subsequent calls to
LIPKEY's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality
of protection value of 0 (zero). See section 5.2 of [RFC2025] for an
explanation.
LIPKEY uses SPKM-3 to create a secure channel in which to pass a user
name and password from the client to the server. Once the user name
and password have been accepted by the server, calls to the LIPKEY
context are redirected to the SPKM-3 context. See [RFCXXXX] for more
details.
3.2.1.3. SPKM-3 as a security triple
The SPKM-3 GSS-API mechanism as described in [RFCXXXX] MUST be
implemented and provide the following security triples. The
definition of the columns matches the previous subsection "Kerberos
V5 as security triple".
1 2 3 4 5
-----------------------------------------------------------------------
390009 spkm3 TBD negotiated rpc_gss_svc_none
390010 spkm3i TBD negotiated rpc_gss_svc_integrity
390011 spkm3p TBD negotiated rpc_gss_svc_privacy
For a discussion as to why the mechanism algorithm is listed as
"negotiated", see the previous section "LIPKEY as a security triple."
Because SPKM-3 negotiates the algorithms, subsequent calls to SPKM-
3's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality of
protection value of 0 (zero). See section 5.2 of [RFC2025] for an
explanation.
Even though LIPKEY is layered onto SPKM-3, SPKM-3 is specified as a
mandatory set of triples to handle the situation when the initiator
(the client) is anonymous. If the initiator is anonymous, there will
not be a user name and password to send to the target (the server).
3.3. Security Negotiation
With the NFS version 4 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its file system name space
that are available for use by NFS clients. In turn the NFS server
may be configured such that each of these entry points may have
different or multiple security mechanisms in use.
The security negotiation between client and server must be done with
a secure channel to eliminate the possibility of a third party
intercepting the negotiation sequence and forcing the client and
server to choose a lower level of security than required or desired.
3.3.1. Security Error
Based on the assumption that each NFS version 4 client and server
must support a minimum set of security (i.e. LIPKEY, SPKM-3, and
Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its
communication with the server with one of the minimal security
triples. During communication with the server, the client may
receive an NFS error of NFS4ERR_WRONGSEC. This error allows the
server to notify the client that the security triple currently being
used is not appropriate for access to the server's file system
resources. The client is then responsible for determining what
security triples are available at the server and choosing one which
is appropriate for the client.
3.3.2. SECINFO
The new SECINFO operation will allow the client to determine, on a
per filehandle basis, what security triple is to be used for server
access. In general, the client will not have to use the SECINFO
operation except during initial communication with the server or when
the client crosses policy boundaries at the server. It is possible
that the server's policies change during the client's interaction,
therefore forcing the client to negotiate a new security triple.
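The interplay of NFS4ERR_WRONGSEC and SECINFO can be sketched as a retry loop. Everything named here is hypothetical: `call` stands for issuing the client's request under a given security triple, `secinfo` stands for asking the server which triples it accepts for the filehandle in question, and a single retry is an assumption of the example.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_WRONGSEC = "NFS4ERR_WRONGSEC"

def access_with_negotiation(call, secinfo, client_supports, initial_triple):
    """Try an operation; on NFS4ERR_WRONGSEC consult SECINFO and retry once.

    Returns (status, triple_used).
    """
    status = call(initial_triple)
    if status != NFS4ERR_WRONGSEC:
        return status, initial_triple
    # Walk the server's list in the order given and take the first
    # triple that the client also supports.
    for triple in secinfo():
        if triple in client_supports:
            return call(triple), triple
    raise PermissionError("no mutually acceptable security triple")
```

A client would run the same logic again whenever it crosses a policy boundary or the server's policy changes during a session.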
3.4. Callback RPC Authentication
The callback RPC (described later) must mutually authenticate the NFS
server to the principal that acquired the delegation (also described
later), using the same security flavor the original delegation
operation used.
For AUTH_NONE, there are no principals, so this is a non-issue.
For AUTH_SYS, the server simply uses the AUTH_SYS credential that the
user used when it set up the delegation.
For AUTH_DH, one commonly used convention is that the server uses the
credential corresponding to this AUTH_DH principal:
unix.host@domain
where host and domain are variables corresponding to the name of the
server host and the directory services domain in which it lives, such
as a Network Information System domain or a DNS domain.
Regardless of what security mechanism under RPCSEC_GSS is being used,
the NFS server MUST identify itself in GSS-API via a
GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE
names are of the form:
service@hostname
For NFS, the "service" element is
nfs
Implementations of security mechanisms will convert nfs@hostname to
various different forms. For Kerberos V5 and LIPKEY, the following
form is RECOMMENDED:
nfs/hostname
For Kerberos V5, nfs/hostname would be a server principal in the
Kerberos Key Distribution Center database. For LIPKEY, this would be
the username passed to the target (the NFS version 4 client that
receives the callback).
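The recommended conversion from the host-based service name to the mechanism-specific form is a simple string rewrite, sketched below. The function name is an assumption made for the example, and real security mechanism implementations may add further elements (such as a realm) that are not covered by the text.

```python
def hostbased_to_principal(name):
    """Convert 'service@hostname' into the 'service/hostname' form."""
    service, sep, host = name.partition("@")
    if not sep or not service or not host:
        raise ValueError("expected a name of the form service@hostname")
    return service + "/" + host
```

For the NFS server, the input would be `nfs@hostname` and the result `nfs/hostname`.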
It should be noted that LIPKEY may not work for callbacks, since the
LIPKEY client uses a user id/password. If the NFS client receiving
the callback can authenticate the NFS server's user name/password
pair, and if the user that the NFS server is authenticating to has a
public key certificate, then it works.
4. Filehandles
The filehandle in the NFS protocol is a per server unique identifier
for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system
object. Since the filehandle is the client's reference to an object
and the client may cache this reference, the server should not reuse
a filehandle for another file system object. If the server needs to
reuse a filehandle value, the time elapsed before reuse SHOULD be
large enough that it is likely the client no longer has a cached copy
of the reused filehandle value.
4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFS version 2
protocol [RFC1094] and the NFS version 3 protocol [RFC1813], there
exists an ancillary protocol to obtain this first filehandle. The
MOUNT protocol, RPC program number 100005, provides the mechanism of
translating a string based file system path name to a filehandle
which can then be used by the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public
filehandle was introduced in [RFC2054] and [RFC2055]. With the use
of the public filehandle in combination with the LOOKUP procedure in
the NFS version 2 and 3 protocols, it has been demonstrated that the
MOUNT protocol is unnecessary for viable interaction between NFS
client and server.
Therefore, the NFS version 4 protocol will not use an ancillary
protocol for translation from string based path names to a
filehandle. Two special filehandles will be used as starting points
for the NFS client.
4.1.1. Root Filehandle
The first of the special filehandles is the ROOT filehandle. The
ROOT filehandle is the "conceptual" root of the file system name
space at the NFS server. The client uses or starts with the ROOT
filehandle by employing the PUTROOTFH operation. The PUTROOTFH
operation instructs the server to set the "current" filehandle to the
ROOT of the server's file tree. Once this PUTROOTFH operation is
used, the client can then traverse the entirety of the server's file
tree with the LOOKUP procedure. A complete discussion of the server
name space is in the section "NFS Server Name Space".
4.1.2. Public Filehandle
The second special filehandle is the PUBLIC filehandle. Unlike the
ROOT filehandle, the PUBLIC filehandle may be bound to or represent an
arbitrary file system object at the server. The server is
responsible for this binding. It may be that the PUBLIC filehandle
and the ROOT filehandle refer to the same file system object.
However, it is up to the administrative software at the server and
the policies of the server administrator to define the binding of the
PUBLIC filehandle and server file system object. The client may not
make any assumptions about this binding.
4.2. Filehandle Types
In the NFS version 2 and 3 protocols, there was one type of
filehandle with a single set of semantics. The NFS version 4
protocol introduces a new type of filehandle in an attempt to
accommodate certain server environments. The first type of
filehandle is "persistent". The semantics of a persistent filehandle
are the same as the filehandles of the NFS version 2 and 3 protocols.
The second or new type of filehandle is the "volatile" filehandle.
The volatile filehandle type is being introduced to address server
functionality or implementation issues which make correct
implementation of a persistent filehandle infeasible. Some server
environments do not provide a file system level invariant that can be
used to construct a persistent filehandle. The underlying server
file system may not provide the invariant or the server's file system
programming interfaces may not provide access to the needed
invariant. Volatile filehandles may ease the implementation of
server functionality such as hierarchical storage management or file
system reorganization or migration. However, the volatile filehandle
increases the implementation burden for the client. This
increased burden is deemed acceptable based on the overall gains
achieved by the protocol.
Since the client will need to handle persistent and volatile
filehandles differently, a file attribute is defined which may be used
by the client to determine the filehandle types being returned by the
server.
4.2.1. General Properties of a Filehandle
The filehandle contains all the information the server needs to
distinguish an individual file. To the client, the filehandle is
opaque. The client stores filehandles for use in a later request and
can compare two filehandles from the same server for equality by
doing a byte-by-byte comparison. However, the client MUST NOT
otherwise interpret the contents of filehandles. If two filehandles
from the same server are equal, they MUST refer to the same file. If
they are not equal, the client may use information provided by the
server, in the form of file attributes, to determine whether they
denote the same files or different files. The client would do this
as necessary for client side caching. Servers SHOULD try to maintain
a one-to-one correspondence between filehandles and files but this is
not required. Clients MUST use filehandle comparisons only to
improve performance, not for correct behavior. All clients need to
be prepared for situations in which it cannot be determined whether
two filehandles denote the same object and in such cases, avoid
making invalid assumptions which might cause incorrect behavior.
Further discussion of filehandle and attribute comparison in the
context of data caching is presented in the section "Data Caching and
File Identity".
As an example, in the case that two different path names when
traversed at the server terminate at the same file system object, the
server SHOULD return the same filehandle for each path. This can
occur if a hard link is used to create two file names which refer to
the same underlying file object and associated data. For example, if
paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
return the same filehandle for both path name traversals.
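The comparison rules above can be sketched as a three-valued test. This
is a minimal sketch, not a prescribed client algorithm. The dictionary
representation of attributes is hypothetical, and the use of the
(fsid, fileid) pair as an identity key is an assumption of this sketch;
fileid is only a recommended attribute and may be absent.

```python
from typing import Optional


def same_object(fh_a: bytes, fh_b: bytes,
                attrs_a: Optional[dict] = None,
                attrs_b: Optional[dict] = None) -> Optional[bool]:
    """Return True or False when identity can be decided, else None."""
    if fh_a == fh_b:
        return True  # equal filehandles MUST refer to the same file
    if attrs_a is None or attrs_b is None:
        return None  # nothing further to go on
    # unique_handles: distinct filehandles are guaranteed to be
    # distinct objects.
    if attrs_a.get("unique_handles") and attrs_b.get("unique_handles"):
        return False
    # Assumption: (fsid, fileid) identifies an object on the server.
    key_a = (attrs_a.get("fsid"), attrs_a.get("fileid"))
    key_b = (attrs_b.get("fsid"), attrs_b.get("fileid"))
    if None in key_a or None in key_b:
        return None
    return key_a == key_b
```

A result of None corresponds to the case in which the client cannot
determine whether two filehandles denote the same object and must avoid
assuming either answer.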
4.2.2. Persistent Filehandle
A persistent filehandle is defined as having a fixed value for the
lifetime of the file system object to which it refers. Once the
server creates the filehandle for a file system object, the server
MUST accept the same filehandle for the object for the lifetime of
the object. If the server restarts or reboots, the NFS server must
honor the same filehandle value as it did in the server's previous
instantiation. Similarly, if the file system is migrated, the new
NFS server must honor the same filehandle as the old NFS server.
The persistent filehandle will become stale or invalid when the
file system object is removed. When the server is presented with a
persistent filehandle that refers to a deleted object, it MUST return
an error of NFS4ERR_STALE. A filehandle may become stale when the
file system containing the object is no longer available. The file
system may become unavailable if it exists on removable media and the
media is no longer available at the server, if the file system as a
whole has been destroyed, or if the file system has simply been removed
from the server's name space (i.e., unmounted in a Unix environment).
4.2.3. Volatile Filehandle
A volatile filehandle does not share the same longevity
characteristics as a persistent filehandle. The server may determine
that a volatile filehandle is no longer valid at many different
points in time. If the server can definitively determine that a
volatile filehandle refers to an object that has been removed, the
server should return NFS4ERR_STALE to the client (as is the case for
persistent filehandles). In all other cases where the server
determines that a volatile filehandle can no longer be used, it
should return an error of NFS4ERR_FHEXPIRED.
The mandatory attribute "fh_expire_type" is used by the client to
determine what type of filehandle the server is providing for a
particular file system. This attribute is a bitmask with the
following values:
FH4_PERSISTENT
The value of FH4_PERSISTENT is used to indicate a persistent
filehandle, which is valid until the object is removed from the
file system. The server will not return NFS4ERR_FHEXPIRED for
this filehandle. FH4_PERSISTENT is defined as a value in which
none of the bits specified below are set.
FH4_NOEXPIRE_WITH_OPEN
The filehandle will not expire while the client has the file open.
If this bit is set, then the values FH4_VOLATILE_ANY or
FH4_VOL_RENAME do not impact expiration while the file is open.
Once the file is closed or if the FH4_NOEXPIRE_WITH_OPEN bit is
false, the rest of the volatile related bits apply.
FH4_VOLATILE_ANY
The filehandle may expire at any time and will expire during
system migration.
FH4_VOL_MIGRATION
The filehandle will expire during file system migration and only
then. May only be set if FH4_VOLATILE_ANY is not set.
FH4_VOL_RENAME
The filehandle may expire due to a rename. This includes a
rename by the requesting client or a rename by another client.
May only be set if FH4_VOLATILE_ANY is not set.
Servers which provide volatile filehandles should deny a RENAME or
REMOVE that would affect an OPEN file or any of the components
leading to the OPEN file. In addition, the server should deny all
RENAME or REMOVE requests during the grace or lease period upon
server restart.
The reader may be wondering why there are three FH4_VOL* bits and why
FH4_VOLATILE_ANY is exclusive of FH4_VOL_MIGRATION and
FH4_VOL_RENAME. If a filehandle is normally persistent but cannot
persist across a file system migration or a rename, then the presence
of FH4_VOL_MIGRATION or FH4_VOL_RENAME tells the client that it can
treat the filehandle as persistent for purposes of maintaining a
file name to filehandle cache, except for the specific event
described by the bit. However, FH4_VOLATILE_ANY tells the client
that it should not maintain such a cache for unopened files. A
server MUST NOT present FH4_VOLATILE_ANY with FH4_VOL_MIGRATION or
FH4_VOL_RENAME as this will lead to confusion. FH4_VOLATILE_ANY
implies that the filehandle will expire upon migration or rename, in
addition to other events.
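The rules for the fh_expire_type bits can be sketched as follows. The
numeric bit values shown are placeholders chosen for illustration only;
the normative values are those given in the protocol's XDR description.
The function names are hypothetical, and the caching rule for open
files under FH4_VOLATILE_ANY is one reading of the text above, not a
requirement stated by it.

```python
# Placeholder bit values (assumption, for illustration only).
FH4_PERSISTENT = 0x0
FH4_NOEXPIRE_WITH_OPEN = 0x1
FH4_VOLATILE_ANY = 0x2
FH4_VOL_MIGRATION = 0x4
FH4_VOL_RENAME = 0x8


def is_valid_expire_type(bits: int) -> bool:
    """FH4_VOLATILE_ANY excludes FH4_VOL_MIGRATION and FH4_VOL_RENAME."""
    if bits & FH4_VOLATILE_ANY:
        return not bits & (FH4_VOL_MIGRATION | FH4_VOL_RENAME)
    return True


def may_cache_name_to_fh(bits: int, file_is_open: bool) -> bool:
    """Decide whether a client keeps a file name to filehandle entry."""
    if bits & FH4_VOLATILE_ANY:
        # No cache for unopened files. For open files, this sketch
        # caches only if the server promises no expiry while open.
        return file_is_open and bool(bits & FH4_NOEXPIRE_WITH_OPEN)
    # Persistent, or volatile only for a specific, named event.
    return True
```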
4.2.4. One Method of Constructing a Volatile Filehandle
As mentioned, in some instances a filehandle is stale (no longer
valid; perhaps because the file was removed from the server) or it is
expired (the underlying file is valid but since the filehandle is
volatile, it may have expired). Thus the server needs to be able to
return NFS4ERR_STALE in the former case and NFS4ERR_FHEXPIRED in the
latter case. This can be done by careful construction of the volatile
filehandle. One possible implementation follows.
A volatile filehandle, while opaque to the client, could contain:
[volatile bit = 1 | server boot time | slot | generation number]
o slot is an index in the server volatile filehandle table
o generation number is the generation number for the table
entry/slot
If the server boot time in the filehandle is less than the current
server boot time,
return NFS4ERR_FHEXPIRED. If slot is out of range, return
NFS4ERR_BADHANDLE. If the generation number does not match, return
NFS4ERR_FHEXPIRED.
When the server reboots, the table is gone (it is volatile).
If the volatile bit is 0, then it is a persistent filehandle with a
different structure following it.
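The construction above can be sketched as follows. This is a sketch
under stated assumptions, not a required server design. The field
widths, the byte order, the class and method names, and the use of
strings in place of numeric error codes are all hypothetical. The
NFS4ERR_STALE path is an assumption as well: the sketch clears a
slot's object when the object is removed and reports a stale handle
for it.

```python
import struct

# Stand-ins for the protocol's numeric status codes.
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"
NFS4ERR_BADHANDLE = "NFS4ERR_BADHANDLE"
NFS4ERR_FHEXPIRED = "NFS4ERR_FHEXPIRED"

# Assumed layout: 1-byte volatile flag, 8-byte boot time,
# 4-byte slot, 4-byte generation number, big-endian.
FH_FORMAT = ">BQII"


class VolatileTable:
    """In-memory volatile filehandle table; it is lost on reboot."""

    def __init__(self, boot_time: int, size: int):
        self.boot_time = boot_time
        self.objects = [None] * size
        self.generations = [0] * size

    def issue(self, slot: int, obj) -> bytes:
        """Bind obj to slot; older handles for the slot expire."""
        self.generations[slot] += 1
        self.objects[slot] = obj
        return struct.pack(FH_FORMAT, 1, self.boot_time, slot,
                           self.generations[slot])

    def remove(self, slot: int) -> None:
        """Record that the object in slot was removed (assumption)."""
        self.objects[slot] = None

    def check(self, fh: bytes) -> str:
        if len(fh) != struct.calcsize(FH_FORMAT):
            return NFS4ERR_BADHANDLE
        volatile, boot_time, slot, generation = struct.unpack(FH_FORMAT, fh)
        if volatile != 1:
            return NFS4ERR_BADHANDLE  # persistent form is not modeled
        if boot_time < self.boot_time:
            return NFS4ERR_FHEXPIRED
        if slot >= len(self.objects):
            return NFS4ERR_BADHANDLE
        if generation != self.generations[slot]:
            return NFS4ERR_FHEXPIRED
        if self.objects[slot] is None:
            return NFS4ERR_STALE
        return NFS4_OK
```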
4.3. Client Recovery from Filehandle Expiration
If possible, the client SHOULD recover from the receipt of an
NFS4ERR_FHEXPIRED error. The client must take on additional
responsibility so that it may prepare itself to recover from the
expiration of a volatile filehandle. If the server returns
persistent filehandles, the client does not need these additional
steps.
For volatile filehandles, most commonly the client will need to store
the component names leading up to and including the file system
object in question. With these names, the client should be able to
recover by finding a filehandle in the name space that is still
available or by starting at the root of the server's file system name
space.
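The recovery walk from the root of the name space can be sketched as
the construction of a compound request from the stored component
names. The tuple representation of operations and the function name
are hypothetical; only the operation names come from this document.

```python
def recovery_compound(components):
    """Build the operations that re-derive a filehandle from names."""
    ops = [("PUTROOTFH",)]                            # start at the ROOT
    ops += [("LOOKUP", name) for name in components]  # walk each name
    ops.append(("GETFH",))                            # fetch the result
    return ops
```

For a file whose stored component names are "a", "b", and "c", the
sketch yields PUTROOTFH, three LOOKUP operations, and a final GETFH.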
If the expired filehandle refers to an object that has been removed
from the file system, obviously the client will not be able to
recover from the expired filehandle.
It is also possible that the expired filehandle refers to a file that
has been renamed. If the file was renamed by another client, again
it is possible that the original client will not be able to recover.
However, in the case that the client itself is renaming the file and
the file is open, it is possible that the client may be able to
recover. The client can determine the new path name based on the
processing of the rename request. The client can then regenerate the
new filehandle based on the new path name. The client could also use
the compound operation mechanism to construct a set of operations
like:
RENAME A B
LOOKUP B
GETFH
5. File Attributes
To meet the requirements of extensibility and increased
interoperability with non-Unix platforms, attributes must be handled
in a flexible manner. The NFS Version 3 fattr3 structure contains a
fixed list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure cannot be extended as
new needs arise and it provides no way to indicate non-support. With
the NFS Version 4 protocol, the client will be able to ask what
attributes the server supports and will be able to request only those
attributes in which it is interested.
To this end, attributes will be divided into three groups: mandatory,
recommended, and named. Both mandatory and recommended attributes
are supported in the NFS version 4 protocol by a specific and well-
defined encoding and are identified by number. They are requested by
setting a bit in the bit vector sent in the GETATTR request; the
server response includes a bit vector to list what attributes were
returned in the response. New mandatory or recommended attributes
may be added to the NFS protocol between major revisions by
publishing a standards-track RFC which allocates a new attribute
number value and defines the encoding for the attribute. See the
section "Minor Versioning" for further discussion.
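The bit vector used to request attributes can be sketched as follows.
The encoding assumed here, a list of 32-bit words in which attribute
number n occupies bit (n mod 32) of word (n div 32), is an assumption
of this sketch; the normative encoding is the one in the protocol's
XDR description. The function names are hypothetical. The attribute
numbers in the usage note are taken from the attribute tables in this
section.

```python
def encode_bitmap(numbers):
    """Set one bit per attribute number in a list of 32-bit words."""
    words = []
    for n in numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def decode_bitmap(words):
    """Return the attribute numbers whose bits are set, ascending."""
    return [w * 32 + b
            for w, word in enumerate(words)
            for b in range(32)
            if (word >> b) & 1]
```

For example, requesting object_type (1), object_size (4), and owner
(36) sets bits 1 and 4 of the first word and bit 4 of the second.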
Named attributes are accessed by the new OPENATTR operation, which
accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named
attributes and whose data bytes are the value of the attribute. For
example:
LOOKUP "foo" ; look up file
GETATTR attrbits
OPENATTR ; access foo's named attributes
LOOKUP "x11icon" ; look up specific attribute
READ 0,4096 ; read stream of bytes
Named attributes are intended for data needed by applications rather
than by an NFS client implementation. NFS implementors are strongly
encouraged to define their new attributes as recommended attributes
by bringing them to the IETF standards-track process.
The set of attributes which are classified as mandatory is
deliberately small since servers must do whatever it takes to support
them. The recommended attributes may be unsupported, though a server
should support as many as it can. Attributes are deemed mandatory if
the data is both needed by a large number of clients and is not
otherwise reasonably computable by the client when support is not
provided on the server.
5.1. Mandatory Attributes
These MUST be supported by every NFS Version 4 client and server in
order to ensure a minimum level of interoperability. The server must
store and return these attributes and the client must be able to
function with an attribute set limited to these attributes. With
just the mandatory attributes some client functionality may be
impaired or limited in some ways. A client may ask for any of these
attributes to be returned by setting a bit in the GETATTR request and
the server must return their value.
5.2. Recommended Attributes
These attributes are understood well enough to warrant support in the
NFS Version 4 protocol. However, they may not be supported on all
clients and servers. A client may ask for any of these attributes to
be returned by setting a bit in the GETATTR request but must handle
the case where the server does not return them. A client may ask for
the set of attributes the server supports and should not request
attributes the server does not support. A server should be tolerant
of requests for unsupported attributes and simply not return them
rather than considering the request an error. It is expected that
servers will support all attributes they comfortably can and only
fail to support attributes which are difficult to support in their
operating environments. A server should provide attributes whenever
it does not have to "tell lies" to the client. For example, a file
modification time should be either an accurate time or should not be
supported by the server. This will not always be comfortable to
clients but it seems that the client has a better ability to
fabricate or construct an attribute or do without the attribute.
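The client behavior described above can be sketched as a split of the
desired attributes into those worth requesting and those the client
must fabricate or do without. Representing attribute sets as Python
sets of attribute numbers is an assumption of this sketch, and the
function name is hypothetical.

```python
def plan_getattr(wanted: set, supported: set):
    """Return (attributes to request, attributes to do without)."""
    return wanted & supported, wanted - supported
```

With a server that supports only attributes 0 through 4 and
time_modify (52), a client that wants object_size (4), mode (33), and
time_modify (52) requests 4 and 52 and handles mode locally.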
5.3. Named Attributes
These attributes are not supported by direct encoding in the NFS
Version 4 protocol but are accessed by string names rather than
numbers and correspond to an uninterpreted stream of bytes which are
stored with the file system object. The name space for these
attributes may be accessed by using the OPENATTR operation. The
OPENATTR operation returns a filehandle for a virtual "attribute
directory" and further perusal of the name space may be done using
READDIR and LOOKUP operations on this filehandle. Named attributes
may then be examined or changed by normal READ and WRITE and CREATE
operations on the filehandles returned from READDIR and LOOKUP.
Named attributes may have attributes.
It is recommended that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named
attributes, a client which is also able to handle them should be able
to copy a file's data and meta-data with complete transparency from
one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as
well.
Names of attributes will not be controlled by this document or other
IETF standards track documents. See the section "IANA
Considerations" for further discussion.
5.4. Mandatory Attributes - Definitions
Name # Data Type Access Description
___________________________________________________________________
supp_attr 0 bitmap READ The bit vector which
would retrieve all
mandatory and
recommended attributes
that are supported for
this object.
object_type 1 nfs4_ftype READ The type of the object
(file, directory,
symlink)
fh_expire_type 2 uint32 READ Server uses this to
specify filehandle
expiration behavior to
the client. See the
section "Filehandles"
for additional
description.
change 3 uint64 READ A value created by the
server that the client
can use to determine
if file data,
directory contents or
attributes of the
object have been
modified. The server
may return the
object's time_modify
attribute for this
attribute's value but
only if the file
system object can not
be updated more
frequently than the
resolution of
time_modify.
object_size 4 uint64 R/W The size of the object
in bytes.
link_support 5 boolean READ Does the object's file
system support hard
links?
symlink_support 6 boolean READ Does the object's file
system support
symbolic links?
named_attr 7 boolean READ Does this object have
named attributes?
fsid 8 fsid4 READ Unique file system
identifier for the
file system holding
this object. fsid
contains major and
minor components each
of which is a uint64.
unique_handles 9 boolean READ Are two distinct
filehandles guaranteed
to refer to two
different file system
objects?
lease_time 10 nfs_lease4 READ Duration of leases at
server in seconds.
rdattr_error 11 enum READ Error returned from
getattr during
readdir.
5.5. Recommended Attributes - Definitions
Name # Data Type Access Description
_____________________________________________________________________
ACL 12 nfsace4<> R/W The access control
list for the object.
aclsupport 13 uint32 READ Indicates what types
of ACLs are supported
on the current file
system.
archive 14 boolean R/W Whether or not this
file has been
archived since the
time of last
modification
(deprecated in favor
of time_backup).
cansettime 15 boolean READ Whether or not this
object's file system
can fill in the times
on a SETATTR request
without an explicit
time.
case_insensitive 16 boolean READ Are filename
comparisons on this
file system case
insensitive?
case_preserving 17 boolean READ Is filename case on
this file system
preserved?
chown_restricted 18 boolean READ If TRUE, the server
will reject any
request to change
either the owner or
the group associated
with a file if the
caller is not a
privileged user (for
example, "root" in
Unix operating
environments or a
holder of the "Take
Ownership" privilege
in NT).
filehandle 19 nfs4_fh READ The filehandle of
this object
(primarily for
readdir requests).
fileid 20 uint64 READ A number uniquely
identifying the file
within the file
system.
files_avail 21 uint64 READ File slots available
to this user on the
file system
containing this
object - this should
be the smallest
relevant limit.
files_free 22 uint64 READ Free file slots on
the file system
containing this
object - this should
be the smallest
relevant limit.
files_total 23 uint64 READ Total file slots on
the file system
containing this
object.
fs_locations 24 fs_locations READ Locations where this
file system may be
found. If the server
returns NFS4ERR_MOVED
as an error, this
attribute must be
supported.
hidden 25 boolean R/W Is file considered
hidden with respect
to the WIN32 API?
homogeneous 26 boolean READ Whether or not this
object's file system
is homogeneous, i.e.
whether pathconf is
the same for all file
system objects.
maxfilesize 27 uint64 READ Maximum supported
file size for the
file system of this
object.
maxlink 28 uint32 READ Maximum number of
links for this
object.
maxname 29 uint32 READ Maximum filename size
supported for this
object.
maxread 30 uint64 READ Maximum read size
supported for this
object.
maxwrite 31 uint64 READ Maximum write size
supported for this
object. This
attribute SHOULD be
supported if the file
is writable. Lack of
this attribute can
lead to the client
either wasting
bandwidth or not
receiving the best
performance.
mime_type 32 utf8<> R/W MIME body
type/subtype of this
object.
mode 33 mode4 R/W Unix-style permission
bits for this object
(deprecated in favor
of ACLs)
no_trunc 34 boolean READ If a name longer than
maxname is used,
will an error be
returned or will the
name be truncated?
numlinks 35 uint32 READ Number of links to
this object.
owner 36 utf8<> R/W The string name of
the owner of this
object.
owner_group 37 utf8<> R/W The string name of
the group of the
owner of this object.
quota_avail_hard 38 uint64 READ For definition see
"Quota Attributes"
section below.
quota_avail_soft 39 uint64 READ For definition see
"Quota Attributes"
section below.
quota_used 40 uint64 READ For definition see
"Quota Attributes"
section below.
rawdev 41 specdata4 READ Raw device
identifier.
space_avail 42 uint64 READ Disk space in bytes
available to this
user on the file
system containing
this object - this
should be the
smallest relevant
limit.
space_free 43 uint64 READ Free disk space in
bytes on the file
system containing
this object - this
should be the
smallest relevant
limit.
space_total 44 uint64 READ Total disk space in
bytes on the file
system containing
this object.
space_used 45 uint64 READ Number of file system
bytes allocated to
this object.
system 46 boolean R/W Is this file a
system file with
respect to the WIN32
API?
time_access 47 nfstime4 R/W The time of last
access to the object.
time_backup 48 nfstime4 R/W The time of last
backup of the object.
time_create 49 nfstime4 R/W The time of creation
of the object. This
attribute does not
have any relation to
the traditional Unix
file attribute
"ctime" or "change
time".
time_delta 50 nfstime4 READ Smallest useful
server time
granularity.
time_metadata 51 nfstime4 R/W The time of last
meta-data
modification of the
object.
time_modify 52 nfstime4 R/W The time of last
modification to the
object.
5.6. Interpreting owner and owner_group
The recommended attributes "owner" and "owner_group" are represented
in terms of a UTF-8 string. To avoid a representation that is tied
to a particular underlying implementation at the client or server,
the use of the UTF-8 string has been chosen. Note that section 6.1
of [RFC2624] provides additional rationale. It is expected that the
client and server will have their own local representation of owner
and owner_group that is used for local storage or presentation to the
end user. Therefore, it is expected that, when these attributes are
transferred between the client and server, the local representation
is translated to a syntax of the form "user@dns_domain". This allows
a client and server that do not use the same local representation to
translate to a common syntax that can be interpreted by both.
The translation is not specified as part of the protocol. This
allows various solutions to be employed. For example, a local
translation table may be consulted that maps between a numeric id and
the user@dns_domain syntax. A name service may also be used to
accomplish the translation. The "dns_domain" portion of the owner
string is meant to be a DNS domain name. For example, user@ietf.org.
In the case where there is no translation available to the client or
server, the attribute value must be constructed without the "@".
Therefore, the absence of the @ from the owner or owner_group
attribute signifies that no translation was available and the
receiver of the attribute should not attach any special meaning to
the attribute value. Even though the attribute value can not be
translated, it may still be useful. In the case of a client, the
attribute string may be used for local display of ownership.
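The interpretation rule for owner and owner_group strings can be
sketched as follows. The function name is hypothetical. Splitting at
the last "@" rather than the first is an assumption of this sketch,
made because a DNS domain name cannot itself contain "@".

```python
def parse_owner(value: str):
    """Split an owner string into (name, dns_domain).

    dns_domain is None when the string has no "@", which signifies
    that no translation was available to the sender.
    """
    name, sep, domain = value.rpartition("@")
    if not sep:
        return value, None  # untranslated; usable for display only
    return name, domain
```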
5.7. Quota Attributes
For the attributes related to file system quotas, the following
definitions apply:
quota_avail_soft
The value in bytes which represents the amount of additional
disk space that can be allocated to this file or directory
before the user may reasonably be warned. It is understood that
this space may be consumed by allocations to other files or
directories though there is a rule as to which other files or
directories.
quota_avail_hard
The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to
this file or directory before further allocations will be
declined. It is understood that this space may be consumed by
allocations to other files or directories.
quota_used
The value in bytes which represents the amount of disk space used
by this file or directory and possibly a number of other similar
files or directories, where the set of "similar" meets at least
the criterion that allocating space to any file or directory in
the set will reduce the "quota_avail_hard" of every other file
or directory in the set.
Note that there may be a number of distinct but overlapping sets
of files or directories for which a quota_used value is
maintained, e.g. "all files with a given owner", "all files with
a given group owner", etc.
The server is at liberty to choose any of those sets but should
do so in a repeatable way. The rule may be configured per-
filesystem or may be "choose the set with the smallest quota".
5.8. Access Control Lists
The NFS ACL attribute is an array of access control entries (ACE).
There are various access control entry types. The server is able to
communicate which ACE types are supported by returning the
appropriate value within the aclsupport attribute. The types of ACEs
are defined as follows:
Type Description
_____________________________________________________
ALLOW Explicitly grants the access defined in
acemask4 to the file or directory.
DENY Explicitly denies the access defined in
acemask4 to the file or directory.
AUDIT LOG (system dependent) any access
attempt to a file or directory which
uses any of the access methods specified
in acemask4.
ALARM Generate a system ALARM (system
dependent) when any access attempt is
made to a file or directory for the
access methods specified in acemask4.
The NFS ACE attribute is defined as follows:
typedef uint32_t acetype4;
typedef uint32_t aceflag4;
typedef uint32_t acemask4;
struct nfsace4 {
acetype4 type;
aceflag4 flag;
acemask4 access_mask;
utf8string who;
};
To determine if an ACCESS or OPEN request succeeds, each nfsace4 entry
is processed in order by the server. Only ACEs which have a "who"
that matches the requester are considered. Each ACE is processed
until all of the bits of the requester's access have been ALLOWED.
Once a bit (see below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it
is no longer considered in the processing of later ACEs. If an
ACCESS_DENIED_ACE is encountered where the requester's mode still has
unALLOWED bits in common with the "access_mask" of the ACE, the
request is denied.
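The ACE processing order described above can be sketched as follows.
This is a minimal sketch, not a definitive server implementation. The
two type constants are those defined for the "type" field of nfsace4.
Matching "who" by plain string equality is an assumption (group
matching is not modeled), as is the denial of a request whose bits
are never ALLOWED by any ACE.

```python
from dataclasses import dataclass

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001


@dataclass
class Ace:
    type: int
    flag: int
    access_mask: int
    who: str


def access_permitted(acl, requester: str, requested: int) -> bool:
    """Process ACEs in order until every requested bit is ALLOWED."""
    remaining = requested
    for ace in acl:
        if remaining == 0:
            break
        if ace.who != requester:
            continue  # only ACEs whose "who" matches are considered
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            remaining &= ~ace.access_mask  # these bits are now ALLOWED
        elif ace.type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & ace.access_mask:
                return False  # a not yet ALLOWED bit is denied
    # Assumption: bits that no ACE ALLOWED are treated as denied.
    return remaining == 0
```

Because processing is ordered, an ALLOW entry placed before a DENY
entry for the same bit takes effect, and one placed after it does not.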
The bitmask constants used to represent the above definitions within
the aclsupport attribute are as follows:
const ACL4_SUPPORT_ALLOW_ACL = 0x00000001;
const ACL4_SUPPORT_DENY_ACL = 0x00000002;
const ACL4_SUPPORT_AUDIT_ACL = 0x00000004;
const ACL4_SUPPORT_ALARM_ACL = 0x00000008;
5.8.1. ACE type
The semantics of the "type" field follow the descriptions provided
above.
The constants used for the type field are as follows:
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001;
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002;
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003;
5.8.2. ACE flag
The "flag" field contains values based on the following descriptions.
ACE4_FILE_INHERIT_ACE
Can be placed on a directory and indicates that this ACE should be
added to each new non-directory file created.
ACE4_DIRECTORY_INHERIT_ACE
Can be placed on a directory and indicates that this ACE should be
added to each new directory created.
ACE4_INHERIT_ONLY_ACE
Can be placed on a directory but does not apply to the directory,
only to newly created files/directories as specified by the above two
flags.
ACE4_NO_PROPAGATE_INHERIT_ACE
Can be placed on a directory. Normally when a new directory is
created and an ACE exists on the parent directory which is marked
ACE4_DIRECTORY_INHERIT_ACE, two ACEs are placed on the new directory:
one for the directory itself and one which is an inheritable ACE for
newly created directories. This flag tells the server to not place
an ACE on the newly created directory which is inheritable by
subdirectories of the created directory.
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG
For AUDIT and ALARM ACEs, these two flags indicate which outcome
(success or failure) causes the event to be logged. On every ACCESS
or OPEN call which occurs on a file or directory whose ACL contains
an ACE of type ACE4_SYSTEM_AUDIT_ACE_TYPE or
ACE4_SYSTEM_ALARM_ACE_TYPE, the attempted access is compared to the
ace4mask of those ACEs. If the access is a subset of ace4mask and the
identifier matches, an AUDIT trail or an ALARM is generated. By
default this happens regardless of the success or failure of the
ACCESS or OPEN call.
The flag ACE4_SUCCESSFUL_ACCESS_ACE_FLAG only produces the AUDIT or
ALARM if the ACCESS or OPEN call is successful. The
ACE4_FAILED_ACCESS_ACE_FLAG causes the ALARM or AUDIT if the ACCESS
or OPEN call fails.
ACE4_IDENTIFIER_GROUP
Indicates that the "who" refers to a GROUP as defined under Unix.
The bitmask constants used for the flag field are as follows:
const ACE4_FILE_INHERIT_ACE = 0x00000001;
const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002;
const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004;
const ACE4_INHERIT_ONLY_ACE = 0x00000008;
const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010;
const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020;
const ACE4_IDENTIFIER_GROUP = 0x00000040;
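The AUDIT and ALARM rules described for ACE4_SUCCESSFUL_ACCESS_ACE_FLAG and ACE4_FAILED_ACCESS_ACE_FLAG can be sketched as a small decision function. This is an illustration under assumptions: should_log and its parameters are hypothetical names, and the who_matches argument stands in for whatever identifier comparison the server performs.

```python
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010
ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020


def should_log(ace_type, flag, access_mask, who_matches, access, succeeded):
    """Decide whether one AUDIT or ALARM ACE fires for a call.

    `access` is the attempted access and `succeeded` is the outcome
    of the ACCESS or OPEN call. All parameter names are hypothetical.
    """
    if ace_type not in (ACE4_SYSTEM_AUDIT_ACE_TYPE,
                        ACE4_SYSTEM_ALARM_ACE_TYPE):
        return False
    if not who_matches:
        return False
    if access & ~access_mask:
        return False  # attempted access is not a subset of the mask
    outcome = flag & (ACE4_SUCCESSFUL_ACCESS_ACE_FLAG |
                      ACE4_FAILED_ACCESS_ACE_FLAG)
    if outcome == 0:
        return True  # default: log regardless of success or failure
    if succeeded:
        return bool(flag & ACE4_SUCCESSFUL_ACCESS_ACE_FLAG)
    return bool(flag & ACE4_FAILED_ACCESS_ACE_FLAG)
```

With neither outcome flag set the ACE fires on every matching call, and with both set it also fires on every matching call, so the two flags only narrow the default when exactly one is present.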
5.8.3. ACE Access Mask
The access_mask field contains values based on the following:
Access               Description
_______________________________________________________________
READ_DATA            Permission to read the data of the file
LIST_DIRECTORY       Permission to list the contents of a
                     directory
WRITE_DATA           Permission to modify the file's data
ADD_FILE             Permission to add a new file to a
                     directory
APPEND_DATA          Permission to append data to a file
ADD_SUBDIRECTORY     Permission to create a subdirectory in a
                     directory
READ_NAMED_ATTRS     Permission to read the named attributes
                     of a file
WRITE_NAMED_ATTRS    Permission to write the named attributes
                     of a file
EXECUTE              Permission to execute a file
DELETE_CHILD         Permission to delete a file or directory
                     within a directory
READ_ATTRIBUTES      Permission to read basic attributes
                     (non-acls) of a file
WRITE_ATTRIBUTES     Permission to change basic attributes
                     (non-acls) of a file
DELETE               Permission to delete the file
READ_ACL             Permission to read the ACL
WRITE_ACL            Permission to write the ACL
WRITE_OWNER          Permission to change the owner
SYNCHRONIZE          Permission to access file locally at the
                     server with synchronous reads and writes
The bitmask constants used for the access mask field are as
follows:
const ACE4_READ_DATA = 0x00000001;
const ACE4_LIST_DIRECTORY = 0x00000001;
const ACE4_WRITE_DATA = 0x00000002;
const ACE4_ADD_FILE = 0x00000002;
const ACE4_APPEND_DATA = 0x00000004;
const ACE4_ADD_SUBDIRECTORY = 0x00000004;
const ACE4_READ_NAMED_ATTRS = 0x00000008;
const ACE4_WRITE_NAMED_ATTRS = 0x00000010;
const ACE4_EXECUTE = 0x00000020;
const ACE4_DELETE_CHILD = 0x00000040;
const ACE4_READ_ATTRIBUTES = 0x00000080;
const ACE4_WRITE_ATTRIBUTES = 0x00000100;
const ACE4_DELETE = 0x00010000;
const ACE4_READ_ACL = 0x00020000;
const ACE4_WRITE_ACL = 0x00040000;
const ACE4_WRITE_OWNER = 0x00080000;
const ACE4_SYNCHRONIZE = 0x00100000;
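Note that three pairs of constants above share a value (for example, ACE4_READ_DATA and ACE4_LIST_DIRECTORY are both 0x00000001); which name applies depends on whether the object is a file or a directory. The following Python sketch decodes a mask into names accordingly. The describe_mask function and the three tables are hypothetical helpers for illustration, not part of the protocol.

```python
# Names that apply to non-directory objects for the three shared bits.
FILE_NAMES = {
    0x00000001: "READ_DATA",
    0x00000002: "WRITE_DATA",
    0x00000004: "APPEND_DATA",
}
# Names that apply to directories for the same three bits.
DIRECTORY_NAMES = {
    0x00000001: "LIST_DIRECTORY",
    0x00000002: "ADD_FILE",
    0x00000004: "ADD_SUBDIRECTORY",
}
# Bits whose name does not depend on the object type.
COMMON_NAMES = {
    0x00000008: "READ_NAMED_ATTRS",
    0x00000010: "WRITE_NAMED_ATTRS",
    0x00000020: "EXECUTE",
    0x00000040: "DELETE_CHILD",
    0x00000080: "READ_ATTRIBUTES",
    0x00000100: "WRITE_ATTRIBUTES",
    0x00010000: "DELETE",
    0x00020000: "READ_ACL",
    0x00040000: "WRITE_ACL",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def describe_mask(mask, is_directory):
    """Return the names of the bits set in `mask`, lowest bit first."""
    names = dict(COMMON_NAMES)
    names.update(DIRECTORY_NAMES if is_directory else FILE_NAMES)
    return [names[bit] for bit in sorted(names) if mask & bit]
```

For instance, the mask 0x00020003 decodes to READ_DATA, WRITE_DATA, READ_ACL for a file and to LIST_DIRECTORY, ADD_FILE, READ_ACL for a directory.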
5.8.4. ACE who
There are several special identifiers ("who") which need to be
understood universally. Some of these identifiers cannot be
understood when an NFS client accesses the server, but have meaning
when a local process accesses the file. The ability to display and
modify these permissions is permitted over NFS.
Who                  Description
_______________________________________________________________
"OWNER"              The owner of the file.
"GROUP"              The group associated with the file.
"EVERYONE"           The world.
"INTERACTIVE"        Accessed from an interactive terminal.
"NETWORK"            Accessed via the network.
"DIALUP"             Accessed as a dialup user to the server.
"BATCH"              Accessed from a batch job.
"ANONYMOUS"          Accessed without any authentication.
"AUTHENTICATED"      Any authenticated user (opposite of
                     ANONYMOUS)
"SERVICE"            Accessed from a system service.
To avoid conflict, these special identifiers are distinguished by an
appended "@" and should appear in the form "xxxx@" (note: no domain
name after the "@"). For example: ANONYMOUS@.
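A receiver can recognize the special identifiers by the trailing "@" with nothing after it. The Python sketch below shows one way to do so; parse_special_who is a hypothetical helper name, and the set of names is taken from the table above.

```python
SPECIAL_WHO = frozenset({
    "OWNER", "GROUP", "EVERYONE", "INTERACTIVE", "NETWORK",
    "DIALUP", "BATCH", "ANONYMOUS", "AUTHENTICATED", "SERVICE",
})


def parse_special_who(who):
    """Return the special identifier named by a "xxxx@" string.

    Returns None when `who` is not one of the special identifiers,
    for example when a domain name follows the "@".
    """
    if who.endswith("@") and who[:-1] in SPECIAL_WHO:
        return who[:-1]
    return None
```

A string such as "OWNER@example.com" is not treated as special here, since a domain name follows the "@".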
6. File System Migration and Replication
With the use of the recommended attribute "fs_locations", the NFS
version 4 server has a method of providing file system migration or
replication services. For the purposes of migration and replication,
a file system will be defined as all files that share a given fsid
(both major and minor values are the same).
The fs_locations attribute provides a list of file system locations.
These locations are specified by providing the server name (either
DNS domain or IP address) and the path name representing the root of
the file system. Depending on the type of service being provided,
the list will provide a new location or a set of alternate locations
for the file system. The client will use this information to
redirect its requests to the new server.
6.1. Replication
It is expected that file system replication will be used in the case
of read-only data. Typically, the file system will be replicated on
two or more servers. The fs_locations attribute will provide the
list of these locations to the client. On first access of the file
system, the client should obtain the value of the fs_locations
attribute. If, in the future, the client finds the server
unresponsive, the client may attempt to use another server specified
by fs_locations.
If applicable, the client must take the appropriate steps to recover
valid filehandles from the new server. This is described in more
detail in the following sections.
6.2. Migration
File system migration is used to move a file system from one server
to another. Migration is typically used for a file system that is
writable and has a single copy. The expected use of migration is for
load balancing or general resource reallocation. The protocol does
not specify how the file system will be moved between servers. This
server-to-server transfer mechanism is left to the server
implementor. However, the method used to communicate the migration
event between client and server is specified here.
Once the servers participating in the migration have completed the
move of the file system, the error NFS4ERR_MOVED will be returned for
subsequent requests received by the original server. The
NFS4ERR_MOVED error is returned for all operations except GETATTR.
Upon receiving the NFS4ERR_MOVED error, the client will obtain the
value of the fs_locations attribute. The client will then use the
contents of the attribute to redirect its requests to the specified
server. To facilitate the use of GETATTR, operations such as PUTFH
must also be accepted by the server for the migrated file system's
filehandles. Note that if the server returns NFS4ERR_MOVED, the
server MUST support the fs_locations attribute.
If the client requests more attributes than fs_locations, the server
may return fs_locations only. This is to be expected since the
server has migrated the file system and may not have a method of
obtaining additional attribute data.
The server implementor needs to be careful in developing a migration
solution. The server must consider all of the state information
clients may have outstanding at the server. This includes but is not
limited to locking/share state, delegation state, and asynchronous
file writes which are represented by WRITE and COMMIT verifiers. The
server should strive to minimize the impact on its clients during and
after the migration process.
6.3. Interpretation of the fs_locations Attribute
The fs_locations attribute is structured in the following way:
struct fs_location {
        utf8string      server<>;
        pathname4       rootpath;
};
struct fs_locations {
        pathname4       fs_root;
        fs_location     locations<>;
};
The fs_location struct is used to represent the location of a file
system by providing a server name and the path to the root of the
file system. For a multi-homed server or a set of servers that use
the same rootpath, an array of server names may be provided. An
entry in the server array is a UTF-8 string and represents one of a
traditional DNS host name, IPv4 address, or IPv6 address. It is not
a requirement that all servers that share the same rootpath be listed
in one fs_location struct. The array of server names is provided for
convenience. Servers that share the same rootpath may also be listed
in separate fs_location entries in the fs_locations attribute.
The fs_locations struct and attribute then contain an array of
locations. Since the name space of each server may be constructed
differently, the "fs_root" field is provided. The path represented
by fs_root represents the location of the file system in the server's
name space. Therefore, the fs_root path is only associated with the
server from which the fs_locations attribute was obtained. The
fs_root path is meant to aid the client in locating the file system
at the various servers listed.
As an example, there is a replicated file system located at two
servers (servA and servB). At servA the file system is located at
path "/a/b/c". At servB the file system is located at path "/x/y/z".
In this example the client accesses the file system first at servA
with a multi-component lookup path of "/a/b/c/d". Since the client
used a multi-component lookup to obtain the filehandle at "/a/b/c/d",
it is unaware that the file system's root is located in servA's name
space at "/a/b/c". When the client switches to servB, it will need
to determine that the directory it first referenced at servA is now
represented by the path "/x/y/z/d" on servB. To facilitate this, the
fs_locations attribute provided by servA would have an fs_root value
of "/a/b/c" and two entries in its locations array. One entry will
be for servA itself and the other will be for servB with a rootpath
of "/x/y/z". With this information, the client is able to
substitute "/x/y/z" for the "/a/b/c" at the beginning of its access
path and construct "/x/y/z/d" to use for the new server.
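The substitution in this example can be sketched in Python as follows. The rebase_path function is a hypothetical helper; it compares whole path components rather than raw string prefixes so that a path such as "/a/b/cd" is not mistaken for one under "/a/b/c".

```python
def rebase_path(path, fs_root, rootpath):
    """Replace the leading `fs_root` of `path` with `rootpath`.

    `fs_root` is the file system root in the name space of the
    server that supplied fs_locations; `rootpath` is the root of the
    same file system on the server the client is switching to.
    """
    def components(p):
        return [c for c in p.split("/") if c]

    parts, root = components(path), components(fs_root)
    if parts[:len(root)] != root:
        raise ValueError("path is not inside fs_root")
    return "/" + "/".join(components(rootpath) + parts[len(root):])
```

With the values from the example, rebase_path("/a/b/c/d", "/a/b/c", "/x/y/z") yields "/x/y/z/d".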
6.4. Filehandle Recovery for Migration or Replication
Filehandles for file systems that are replicated or migrated
generally have the same semantics as for file systems that are not
replicated or migrated. For example, if a file system has persistent
filehandles and it is migrated to another server, the filehandle
values for the file system will be valid at the new server.
For volatile filehandles, the servers involved likely do not have a
mechanism to transfer filehandle format and content between
themselves. Therefore, a server may have difficulty in determining
if a volatile filehandle from an old server should return an error of
NFS4ERR_FHEXPIRED. Therefore, the client is informed, with the use
of the fh_expire_type attribute, whether volatile filehandles will
expire at the migration or replication event. If the bit
FH4_VOL_MIGRATION is set in the fh_expire_type attribute, the client
must treat the volatile filehandle as if the server had returned the
NFS4ERR_FHEXPIRED error. At the migration or replication event in
the presence of the FH4_VOL_MIGRATION bit, the client will not
present the original or old volatile filehandle to the new server.
The client will start its communication with the new server by
recovering its filehandles using the saved file names.
7. NFS Server Name Space
7.1. Server Exports
On a UNIX server the name space describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server
the name space constitutes all the files on disks named by mapped
disk letters. NFS server administrators rarely make the entire
server's file system name space available to NFS clients. More often
portions of the name space are made available via an "export"
feature. In previous versions of the NFS protocol, the root
filehandle for each export is obtained through the MOUNT protocol;
the client sends a string that identifies the export of name space
and the server returns the root filehandle for it. The MOUNT
protocol supports an EXPORTS procedure that will enumerate the
server's exports.
7.2. Browsing Exports
The NFS version 4 protocol provides a root filehandle that clients
can use to obtain filehandles for these exports via a multi-component
LOOKUP. A common user experience is to use a graphical user
interface (perhaps a file "Open" dialog window) to find a file via
progressive browsing through a directory tree. The client must be
able to move from one export to another export via single-component,
progressive LOOKUP operations.
This style of browsing is not well supported by the NFS version 2 and
3 protocols. The client expects all LOOKUP operations to remain
within a single server file system. For example, the device
attribute will not change. This prevents a client from taking name
space paths that span exports.
An automounter on the client can obtain a snapshot of the server's
name space using the EXPORTS procedure of the MOUNT protocol. If it
understands the server's pathname syntax, it can create an image of
the server's name space on the client. The parts of the name space
that are not exported by the server are filled in with a "pseudo file
system" that allows the user to browse from one mounted file system
to another. There is a drawback to this representation of the
server's name space on the client: it is static. If the server
administrator adds a new export the client will be unaware of it.
7.3. Server Pseudo File System
NFS version 4 servers avoid this name space inconsistency by
presenting all the exports within the framework of a single server
name space. An NFS version 4 client uses LOOKUP and READDIR
operations to browse seamlessly from one export to another. Portions
of the server name space that are not exported are bridged via a
"pseudo file system" that provides a view of exported directories
only. A pseudo file system has a unique fsid and behaves like a
normal, read only file system.
Based on the construction of the server's name space, it is possible
that multiple pseudo file systems may exist. For example,
/a pseudo file system
/a/b real file system
/a/b/c pseudo file system
/a/b/c/d real file system
Each of the pseudo file systems is considered a separate entity and
therefore will have a unique fsid.
7.4. Multiple Roots
The DOS and Windows operating environments are sometimes described as
having "multiple roots". File systems are commonly represented as
disk letters. MacOS represents file systems as top level names. NFS
version 4 servers for these platforms can construct a pseudo file
system above these root names so that disk letters or volume names
are simply directory names in the pseudo root.
7.5. Filehandle Volatility
The nature of the server's pseudo file system is that it is a logical
representation of file system(s) available from the server.
Therefore, the pseudo file system is most likely constructed
dynamically when the server is first instantiated. It is expected
that the pseudo file system may not have an on disk counterpart from
which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the
pseudo file system, the NFS client should expect that pseudo file
system filehandles are volatile. This can be confirmed by checking
the associated "persistent_fh" attribute for those filehandles in
question. If the filehandles are volatile, the NFS client must be
prepared to recover a filehandle value (e.g. with a multi-component
LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED.
7.6. Exported Root
If the server's root file system is exported, one might conclude that
a pseudo-file system is not needed. This would be wrong. Assume the
following file systems on a server:
/ disk1 (exported)
/a disk2 (not exported)
/a/b disk3 (exported)
Because disk2 is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-file system.
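One way a server might find where such bridging is needed is sketched below in Python. This is an illustration only: unexported_gaps and is_ancestor are hypothetical helpers, and the input is assumed to be a list of (mount point, exported) pairs for the server's file systems.

```python
def is_ancestor(parent, child):
    """True if `parent` is a proper ancestor directory of `child`."""
    if parent == child:
        return False
    prefix = parent if parent.endswith("/") else parent + "/"
    return child.startswith(prefix)


def unexported_gaps(file_systems):
    """Return mount points that a pseudo file system must bridge.

    `file_systems` is a list of (mount_point, exported) pairs. A
    mount point is reported when its file system is not exported but
    an exported file system lies somewhere beneath it.
    """
    exported = [mount for mount, is_exported in file_systems if is_exported]
    return sorted(
        mount
        for mount, is_exported in file_systems
        if not is_exported and any(is_ancestor(mount, e) for e in exported)
    )
```

For the three file systems in the example above, the only gap reported is "/a", the mount point of disk2.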
7.7. Mount Point Crossing
The server file system environment may be constructed in such a way
that one file system contains a directory which is 'covered' or
mounted upon by a second file system. For example:
/a/b (file system 1)
/a/b/c/d (file system 2)
The pseudo file system for this server may be constructed to look
like:
/ (place holder/not exported)
/a/b (file system 1)
/a/b/c/d (file system 2)
It is the server's responsibility to present a pseudo file system
that is complete to the client. If the client sends a lookup request
for the path "/a/b/c/d", the server's response is the filehandle of
the file system "/a/b/c/d". In previous versions of the NFS
protocol, the server would respond with the directory "/a/b/c/d"
within the file system "/a/b".
The NFS client will be able to determine if it crosses a server mount
point by a change in the value of the "fsid" attribute.
7.8. Security Policy and Name Space Presentation
The application of the server's security policy needs to be carefully
considered by the implementor. One may choose to limit the
viewability of portions of the pseudo file system based on the
server's perception of the client's ability to authenticate itself
properly. However, with the support of multiple security mechanisms
and the ability to negotiate the appropriate use of these mechanisms,
the server is unable to properly determine if a client will be able
to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo file system, the server
may effectively hide file systems from a client that may otherwise
have legitimate access.
8. File Locking and Share Reservations
Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of "share" file locks the protocol
becomes substantially more dependent on state than the traditional
combination of NFS and NLM [XNFS]. There are three components to
making this state manageable:
o Clear division between client and server
o Ability to reliably detect inconsistency in state between client
and server
o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client
communicates its view of this state to the server as needed. The
client is also able to detect inconsistent state before modifying a
file.
To support Win32 "share" locks it is necessary to atomically OPEN or
CREATE files. Having a separate share/unshare operation would not
allow correct implementation of the Win32 OpenFile API. In order to
correctly implement share semantics, the previous NFS protocol
mechanisms used when a file is opened or created (LOOKUP, CREATE,
ACCESS) need to be replaced. The NFS version 4 protocol has an OPEN
operation that subsumes the functionality of LOOKUP, CREATE, and
ACCESS. However, because many operations require a filehandle, the
traditional LOOKUP is preserved to map a file name to filehandle
without establishing state on the server. The policy of granting
access or modifying files is managed by the server based on the
client's state. These mechanisms can implement policy ranging from
advisory only locking to full mandatory locking.
8.1. Locking
It is assumed that manipulating a lock is rare when compared to READ
and WRITE operations. It is also assumed that crashes and network
partitions are relatively rare. Therefore it is important that the
READ and WRITE operations have a lightweight mechanism to indicate if
they possess a held lock. A lock request contains the heavyweight
information required to establish a lock and uniquely define the lock
owner.
The following sections describe the transition from the heavyweight
information to the eventual stateid used for most client and server
locking and lease interactions.
8.1.1. Client ID
For each LOCK request, the client must identify itself to the server.
This is done in such a way as to allow for correct lock
identification and crash recovery. Client identification is
accomplished with two values.
o A verifier that is used to detect client reboots.
o A variable length opaque array to uniquely define a client.
For an operating system this may be a fully qualified host
name or IP address. For a user level NFS client it may
additionally contain a process id or other unique sequence.
The data structure for the Client ID would then appear as:
struct nfs_client_id {
        opaque verifier[4];
        opaque id<>;
};
It is possible through the mis-configuration of a client or the
existence of a rogue client that two clients end up using the same
nfs_client_id. This situation is avoided by "negotiating" the
nfs_client_id between client and server with the use of the
SETCLIENTID and SETCLIENTID_CONFIRM operations. The following
describes the two scenarios of negotiation.
1 Client has never connected to the server
In this case the client generates an nfs_client_id and
unless another client has the same nfs_client_id.id field,
the server accepts the request. The server also records the
principal (or principal to uid mapping) from the credential
in the RPC request that contains the nfs_client_id
negotiation request (SETCLIENTID operation).
Two clients might still use the same nfs_client_id.id, perhaps
because of a configuration error. For example, a High
Availability configuration where the nfs_client_id.id is
derived from the ethernet controller address and both
systems have the same address. In this case, the result is
a switched union that returns, in addition to
NFS4ERR_CLID_INUSE, the network address (the rpcbind netid
and universal address) of the client that is using the id.
2 Client is re-connecting to the server after a client reboot
In this case, the client still generates an nfs_client_id
but the nfs_client_id.id field will be the same as the
nfs_client_id.id generated prior to reboot. If the server
finds that the principal/uid is equal to the one previously
"registered" for the nfs_client_id.id, then locks associated
with the old nfs_client_id are immediately released. If the
principal/uid is not equal, then this is a rogue client and
the request is returned in error. For more discussion of
crash recovery semantics, see the section on "Crash
Recovery".
To mitigate retransmission of the SETCLIENTID operation,
the client and server use a confirmation step. The server
returns a confirmation verifier that the client then sends
to the server in the SETCLIENTID_CONFIRM operation. Once
the server receives the confirmation from the client, the
locking state for the client is released.
In both cases, upon success, NFS4_OK is returned. To help reduce the
amount of data transferred on OPEN and LOCK, the server will also
return a unique 64-bit clientid value that is a shorthand reference
to the nfs_client_id values presented by the client. From this point
forward, the client will use the clientid to refer to itself.
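The two negotiation scenarios can be sketched from the server's side in Python. This is a simplified illustration under several assumptions: the ClientTable class and its fields are hypothetical, the SETCLIENTID_CONFIRM step is omitted, a repeated request carrying an unchanged verifier is treated as a retransmission, the rogue-client case is reported with NFS4ERR_CLID_INUSE, and clientid uniqueness across server restarts is not addressed.

```python
import itertools

NFS4_OK = "NFS4_OK"
NFS4ERR_CLID_INUSE = "NFS4ERR_CLID_INUSE"


class ClientTable:
    """Toy server-side table keyed by nfs_client_id.id (hypothetical)."""

    def __init__(self):
        self._records = {}                        # id -> record dict
        self._next_clientid = itertools.count(1)  # shorthand clientids
        self.released = []                        # clientids whose locks were dropped

    def setclientid(self, verifier, client_id, principal):
        """Return (status, clientid) for one SETCLIENTID request."""
        record = self._records.get(client_id)
        if record is not None:
            if record["principal"] != principal:
                # Same id presented by a different principal.
                return NFS4ERR_CLID_INUSE, None
            if record["verifier"] == verifier:
                # Assumption: unchanged verifier means a retransmission.
                return NFS4_OK, record["clientid"]
            # Verifier changed: the client rebooted, so release the
            # locks held under its old identity.
            self.released.append(record["clientid"])
        clientid = next(self._next_clientid)
        self._records[client_id] = {
            "verifier": verifier,
            "principal": principal,
            "clientid": clientid,
        }
        return NFS4_OK, clientid
```

A real server would also perform the confirmation exchange before acting on the new identity; it is left out here to keep the sketch short.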
The clientid assigned by the server should be chosen so that it will
not conflict with a clientid previously assigned by the server. This
applies across server restarts or reboots. When a clientid is
presented to a server and that clientid is not recognized, as would
happen after a server reboot, the server will reject the request with
the error NFS4ERR_STALE_CLIENTID. When this happens, the client must
obtain a new clientid by use of the SETCLIENTID operation and then
proceed to any other necessary recovery for the server reboot case
(See the section "Server Failure and Recovery").
The client must also employ the SETCLIENTID operation when it
receives a NFS4ERR_STALE_STATEID error using a stateid derived from
its current clientid since this also indicates a server reboot which
has invalidated the existing clientid (see the next section
"nfs_lockowner and stateid Definition" for details).
8.1.2. Server Release of Clientid
If the server determines that the client holds no associated state
for its clientid and no activity from that client has been received
for some long period of time, the server may choose to release the
clientid. The server may make this choice for an inactive client so
that resources are not consumed by those intermittently active
clients. If the client contacts the server after this release,
the server must ensure the client receives the appropriate error so
that it will use the SETCLIENTID/SETCLIENTID_CONFIRM sequence to
establish a new identity. It should be clear that the server must be
very hesitant to release a clientid since the resultant work on the
client to recover from such an event will be the same burden as if
the server had failed and restarted.
8.1.3. nfs_lockowner and stateid Definition
When requesting a lock, the client must present to the server the
clientid and an identifier for the owner of the requested lock.
These two fields are referred to as the nfs_lockowner and the
definitions of those fields are:
o A clientid returned by the server as part of the client's use of
the SETCLIENTID operation.
o A variable length opaque array used to uniquely define the owner
of a lock managed by the client.
This may be a thread id, process id, or other unique value.
When the server grants the lock, it responds with a unique 64-bit
stateid. The stateid is used as a shorthand reference to the
nfs_lockowner, since the server will be maintaining the
correspondence between them.
The server is free to form the stateid in any manner that it chooses
as long as it is able to recognize invalid and out-of-date stateids.
This requirement includes those stateids generated by earlier
instances of the server. From this, the client can be properly
notified of a server restart. This notification will occur when the
client presents a stateid to the server from a previous
instantiation.
The server must be able to distinguish the following situations and
return the error as specified:
o The stateid was generated by an earlier server instance (i.e.
before a server reboot). The error NFS4ERR_STALE_STATEID should
be returned.
o The stateid was generated by the current server instance but the
stateid no longer designates the current locking state for the
lockowner-file pair in question (i.e. one or more locking
operations have occurred). The error NFS4ERR_OLD_STATEID should
be returned.
This error condition will only occur when the client issues a
locking request which changes a stateid while an I/O request
that uses that stateid is outstanding.
o The stateid was generated by the current server instance but the
stateid does not designate a locking state for any active
lockowner-file pair. The error NFS4ERR_BAD_STATEID should be
returned.
This error condition will occur when there has been a logic
error on the part of the client or server. This should not
happen.
One mechanism that may be used to satisfy these requirements is for
the server to divide stateids into three fields:
o A server verifier which uniquely designates a particular server
instantiation.
o An index into a table of locking-state structures.
o A sequence value which is incremented for each stateid that is
associated with the same index into the locking-state table.
By matching the incoming stateid and its field values with the state
held at the server, the server is able to easily determine if a
stateid is valid for its current instantiation and state. If the
stateid is not valid, the appropriate error can be supplied to the
client.
8.1.4. Use of negative one (-1) the stateid
All READ and WRITE operations contain a stateid. If the nseconds fields would have
nfs_lockowner performs a
value READ or WRITE on a range of one-half second (500000000). Values greater than
999,999,999 for nseconds are considered invalid.
This data type is bytes within a
locked range, the stateid (previously returned by the server) must be
used to pass time and date information. A
server converts to and from local time when processing time
values, preserving as much accuracy as possible. indicate that appropriate lock (record or share) is held. If
no state is established by the
precision of timestamps stored for client, either record lock or share
lock, a file system object is less
than defined, loss stateid of precision can occur. An adjunct time
maintenance protocol all bits 0 is recommended to reduce client and used. If no conflicting locks are
held on the file, the server
time skew.
specdata4
struct specdata4 {
uint32_t specdata1;
uint32_t specdata2;
}
This data type represents additional information may service the READ or WRITE operation.
If a conflict with an explicit lock occurs, an error is returned for
the device
file types NFCHR and NFBLK. operation (NFS4ERR_LOCKED). This allows "mandatory locking" to be
Expires: April July 2000 [Page 18] 59]
Draft Protocol Specification NFS version 4 October 1999
5. File Attributes
To meet the NFS Version 4 requirements of extensibility and increased
interoperability with non-Unix platforms, attributes must be handled
in a more flexible manner. The NFS Version 3 fattr3 structure
contained a fixed list Protocol January 2000
implemented.
A stateid of attributes that not all clients and servers
are able to support or care about, which cannot be extended as new
needs arise, and which provides no way to indicate non-support. With
NFS Version 4, the client will be able to ask what attributes the
server supports, and will be able bits 1 (one) allows READ operations to request only those attributes in
which it is interested.
To this end, attributes will be divided into three groups: mandatory,
recommended and named. Both mandatory and recommended attributes are
supported in bypass record
locking checks at the NFS V4 protocol by a specific and well-defined
encoding, and are identified by number. They server. However, WRITE operations with stateid
with bits all 1 (one) do not bypass record locking checks. File
locking checks are requested by
setting a bit in the bit vector sent in handled by the GETATTR request; OPEN operation (see the
server response includes a bit vector to list what attributes were
returned in response. New mandatory or recommended attributes section
"OPEN/CLOSE Operations").
An explicit lock may not be
added to the granted while a READ or WRITE operation
with conflicting implicit locking is being performed.
8.1.5. Sequencing of Lock Requests
Locking is different than most NFS protocol between revisions operations as it requires "at-
most-one" semantics that are not provided by publishing ONCRPC. In the face of
retransmission or reordering, lock or unlock requests must have a
standards-track RFC which allocates
well defined and consistent behavior. To accomplish this, each lock
request contains a new attribute sequence number value and
defines the encoding for that is a consecutively increasing
integer. Different nfs_lockowners have different sequences. The
server maintains the attribute.
Named attributes are accessed by last sequence number (L) received and the new OPENATTR operation, which
accesses
response that was returned.
If a hidden directory of attributes associated request with a
filesystem object. OPENATTR takes previous sequence number (r < L) is received, it
is rejected with the return of error NFS4ERR_BAD_SEQID. Given a filehandle for
properly-functioning client, the object and
returns response to (r) must have been
received before the filehandle for last request (L) was sent. If a duplicate of
last request (r == L) is received, the attribute hierarchy, which stored response is returned.
If a
directory object accessible by LOOKUP or READDIR, and which contains
files whose names represent request beyond the named attributes and whose data bytes
are next sequence (r == L + 2) is received, it is
rejected with the value return of error NFS4ERR_BAD_SEQID. Sequence
history is reinitialized whenever the attribute. For example:
LOOKUP "foo" ; look up file
GETATTR attrbits
OPENATTR ; access foo's named attributes
LOOKUP "x11icon" ; look up specific attribute
READ 0,4096 ; read stream of bytes
Named attributes are intended primarily for data needed by
applications rather than by an NFS client implementation per se; NFS
implementors are strongly encouraged verifier changes.
Since the sequence number is represented with an unsigned 32-bit
integer, the arithmetic involved with the sequence number is mod
2^32.
It is critical the server maintain the last response sent to define their new attributes
as recommended attributes by bringing them the
client to provide a more reliable cache of duplicate non-idempotent
requests than that of the working group. traditional cache described in [Juszczak].
The set of attributes which are classified as mandatory is
deliberately small, since servers traditional duplicate request cache uses a least recently used
algorithm for removing unneeded requests. However, the last lock
request and response on a given nfs_lockowner must do whatever it takes to
support them. The recommended attributes may be unsupported, though
a server should support cached as many long
as it can. Attributes are deemed the lock state exists on the server.
8.1.6. Recovery from Replayed Requests
As described above, the sequence number is per nfs_lockowner. As
Expires: April July 2000 [Page 19] 60]
Draft Protocol Specification NFS version 4 October 1999
mandatory if Protocol January 2000
long as the data is both needed by a large server maintains the last sequence number of clients received and
is not otherwise reasonably computable by the client when support is
not provided on
follows the server.
5.1. Mandatory Attributes
These MUST be supported by every NFS Version 4 client and server in
order to ensure a minimum level methods described above, there are no risks of interoperability. a
byzantine router re-sending old requests. The server must
store need only
maintain the nfs_lockowner, sequence number state as long as there
are open files or closed files with locks outstanding.
LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and return these attributes, CLOSE each contain a sequence
number and therefore the client must be able to
function with an attribute set limited to risk of the replay of these attributes, though
some operations may be impaired or limited in some ways
resulting in this case.
A client may ask for any of these attributes to be returned by
setting undesired effects is non-existent while the server
maintains the nfs_lockowner state.
8.1.7. Releasing nfs_lockowner State
When a bit in particular nfs_lockowner no longer holds open or file locking
state at the GETATTR request, and server, the server must return
their value.
5.2. Recommended Attributes
These attributes are understood well enough may choose to warrant support in release the
NFS Version 4 protocol, though they sequence
number state associated with the nfs_lockowner. The server may not be supported make
this choice based on all
clients and servers. A client may ask lease expiration, for any the reclamation of these attributes server
memory, or other implementation specific details. In any event, the
server is able to
be returned do this safely only when the nfs_lockowner no
longer is being utilized by setting a bit the client. The server may choose to
hold the nfs_lockowner state in the GETATTR request, but must be able event that retransmitted requests
are received. However, the period to deal with not receiving them. A client may ask for hold this state is
implementation specific.
In the set of
attributes case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is
retransmitted after the server supports and should not request attributes has previously released the
nfs_lockowner state, the server will find that the nfs_lockowner has
no files open and an error will be returned to the client. If the
nfs_lockowner does have a file open, the stateid will not support. A match and
again an error is returned to the client.
In the case that an OPEN is retransmitted and the nfs_lockowner is
being used for the first time or the nfs_lockowner state has been
previously released by the server, the use of the OPEN_CONFIRM
operation will prevent incorrect behavior. When the server should be tolerant observes
the use of requests the nfs_lockowner for
unsupported attributes, the first time, it will direct the
client to perform the OPEN_CONFIRM for the corresponding OPEN. This
sequence establishes the use of an nfs_lockowner and simply not return them, rather than
considering associated
sequence number. See the section "OPEN_CONFIRM - Confirm Open" for
further details.
8.2. Lock Ranges
The protocol allows a lock owner to request an error. a lock with one byte
range and then either upgrade or unlock a sub-range of the initial
lock. It is expected that servers this will
support all attributes they comfortably can, and only fail to support
attributes which are difficult to support in their operating
environments. A server should provide attributes whenever they don't
have to "tell lies" to the client. For example, a file modification
time should be either an accurate time uncommon type of request.
Expires: July 2000 [Page 61]
Draft Specification NFS version 4 Protocol January 2000
In any case, servers or should server file systems may not be supported by able to
support sub-range lock semantics. In the event that a server
receives a locking request that represents a sub-range of current
locking state for the lock owner, the server is allowed to return the server. This will not always be comfortable
error NFS4ERR_LOCK_RANGE to clients but it
seems signify that it does not support sub-
range lock operations. Therefore, the client has a better ability should be prepared to fabricate or construct
an attribute or do without.
Most attributes
receive this error and, if appropriate, report the error to the
requesting application.
The client is discouraged from NFS V3's FSINFO, FSSTAT and PATHCONF procedures
have been added as recommended attributes, so that filesystem info coalescing adjacent ranges since the
server may be collected via not support sub-range requests and for reasons related to
the filehandle recovery of any object the filesystem.
This renders those procedures unnecessary file locking state in NFS V4.
5.3. Named Attributes
These attributes are not supported by direct encoding the event of server failure.
As discussed in the NFS
Version 4 protocol but are accessed by string names rather than
numbers section "Server Failure and correspond Recovery" below, the
server may employ certain optimizations during recovery that work
effectively only when the client's behavior during lock recovery is
similar to an uninterpreted stream the client's locking behavior prior to server failure.
8.3. Blocking Locks
Some clients require the support of bytes which are
Expires: April 2000 [Page 20]
Draft Protocol Specification blocking locks. The NFS version
4 October 1999
stored with the filesystem object. The namespace for these
attributes may be accessed by using the OPENATTR operation protocol must not rely on a callback mechanism and therefore is
unable to get notify a
filehandle client when a previously denied lock has been
granted. Clients have no choice but to continually poll for the
lock. This presents a virtual "attribute directory" and using READDIR and
LOOKUP operations on this filehandle. Named attributes may then be
examined or changed by normal READ fairness problem. Two new lock types are
added, READW and WRITE WRITEW, and CREATE operations on are used to indicate to the filehandles returned from READDIR and LOOKUP. Named attributes
may have attributes, for example, a security label may have access
control information in its own right.
It is recommended server that servers support arbitrary named attributes. A
the client is requesting a blocking lock. The server should not depend on maintain
an ordered list of pending blocking locks. When the ability to store any named attributes
in conflicting lock
is released, the server's filesystem. If a server does support named
attributes, a may wait the lease period for the first
waiting client which to re-request the lock. After the lease period
expires the next waiting client request is also able allowed the lock. Clients
are required to handle them should be able poll at an interval sufficiently small that it is
likely to copy acquire the lock in a file's data timely manner. The server is not
required to maintain a list of pending blocked locks as it is used to
increase fairness and meta-data with complete transparency from
one location not correct operation. Because of the
unordered nature of crash recovery, storing of lock state to another; this stable
storage would imply that there should be no
attribute names which will be considered illegal by required to guarantee ordered granting of blocking
locks.
Servers may also note the server.
Names lock types and delay returning denial of attributes will not be controlled by
the request to allow extra time for a standards body.
However, vendors and application writers are encouraged conflicting lock to register
attribute names and be
released, allowing a successful return. In this way, clients can be
avoid the interpretation and semantics burden of needlessly frequent polling for blocking locks.
The server should take care in the stream length of
bytes via informational RFC so that vendors may interoperate where
common interests exist. delay in the event the
client retransmits the request.
Expires: April July 2000 [Page 21] 62]
Draft Protocol Specification NFS version 4 October 1999
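As an illustration only, the sequence number rules of the section
"Sequencing of Lock Requests" might be sketched as follows. The
class name LockOwnerSeqState, the method handle() and the execute
callback are hypothetical and do not appear in the protocol. The
sketch also assumes that every value other than L or L + 1 (mod
2^32) is rejected, which covers both the (r < L) and the
(r == L + 2) cases described in that section.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"
SEQID_MODULUS = 2 ** 32  # sequence numbers are unsigned 32-bit integers


class LockOwnerSeqState:
    """Hypothetical per-nfs_lockowner record: last seqid (L) and its response."""

    def __init__(self, last_seqid, last_response):
        self.last_seqid = last_seqid % SEQID_MODULUS
        self.last_response = last_response

    def handle(self, seqid, execute):
        """Apply the sequencing rules to request r; execute() performs the operation."""
        delta = (seqid - self.last_seqid) % SEQID_MODULUS
        if delta == 0:
            # r == L: duplicate of the last request, replay the stored response.
            return self.last_response
        if delta != 1:
            # Assumption: anything other than L or L + 1 (mod 2^32) is rejected.
            return NFS4ERR_BAD_SEQID
        # r == L + 1: the next request in sequence; run it and cache the result.
        self.last_response = execute()
        self.last_seqid = seqid % SEQID_MODULUS
        return self.last_response
```

Because the difference is taken mod 2^32, a request with sequence
number 0 is accepted as the successor of 4294967295, and a
retransmission never causes the operation to be performed twice.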
5.4. Mandatory Attributes - Definitions

Name              #   DataType     Access  Description
___________________________________________________________________
supp_attr         0   bitmap       READ
    The bit vector which would retrieve all mandatory and
    recommended attributes which may be requested for this object.
    The client must ask this question to request correct attributes.
object_type       1   nfs4_ftype   READ
    The type of the object (file, directory, symlink)
    The client cannot handle object correctly without type.
persistent_fh     2   boolean      READ
    Is the filehandle for this object persistent?
    Server should know if the filehandles being provided are
    persistent or not. If the server is not able to make this
    determination, then it can choose volatile or non-persistent.
change            3   uint64       READ
    A value created by the server that the client can use to
    determine if file data, directory contents or attributes have
    been modified. The server can just return the file mtime in
    this field though if a more precise value exists then it can be
    substituted, for instance, a sequence number.
    Necessary for any useful caching, likely to be available.
object_size       4   uint64       R/W
    The size of the object in bytes.
    Could be very expensive to derive, likely to be available.
link_support      5   boolean      READ
    Does the object's filesystem support hard links?
    Server can easily determine if links are supported.
symlink_support   6   boolean      READ
    Does the object's filesystem support symbolic links?
    Server can easily determine if links are supported.
named_attr        7   boolean      READ
    Does this object have named attributes?
fsid              8   fsid4        READ
    Unique filesystem identifier for the filesystem holding this
    object. fsid contains major and minor components each of which
    are uint64.
unique_handles    9   boolean      READ
    Are two distinct filehandles guaranteed to refer to two
    different file system objects?
lease_time        10  uint32       READ
    Duration of leases at the server in seconds.
rdattr_error      11  enum         READ
    Error returned from getattr during readdir.

5.5. Recommended Attributes - Definitions

Name              #   Data Type    Access  Description
_____________________________________________________________________
ACL               12  nfsace4<>    R/W
    The access control list for the object. [The nature and format
    of ACLs is still to be determined.]
aclsupport        13  uint32       READ
    Indicates what ACLs are supported on the current filesystem.
archive           14  boolean      R/W
    Whether or not this file has been archived since the time of
    last modification (deprecated in favor of backup_time).
cansettime        15  boolean      READ
    Whether or not this object's filesystem can fill in the times
    on a SETATTR request without an explicit time.
case_insensitive  16  boolean      READ
    Are filename comparisons on this filesystem case insensitive?
case_preserving   17  boolean      READ
    Is filename case on this filesystem preserved?
chown_restricted  18  boolean      READ
    Will a request to change ownership be honored?
filehandle        19  nfs4_fh      READ
    The filehandle of this object (primarily for readdir requests).
fileid            20  uint64       READ
    A number uniquely identifying the file within the filesystem.
files_avail       21  uint64       READ
    File slots available to this user on the filesystem containing
    this object - this should be the smallest relevant limit.
files_free        22  uint64       READ
    Free file slots on the filesystem containing this object - this
    should be the smallest relevant limit.
files_total       23  uint64       READ
    Total file slots on the filesystem containing this object.
fs_locations      24  fs_locations READ
    Locations where this filesystem may be found. If the server
    returns NFS4ERR_MOVED as an error, this attribute must be
    supported.
hidden            25  boolean      R/W
    Is file considered hidden?
homogeneous       26  boolean      READ
    Whether or not this object's filesystem is homogeneous, i.e.
    whether pathconf is the same for all filesystem objects.
maxfilesize       27  uint64       READ
    Maximum supported file size for the filesystem of this object.
maxlink           28  uint32       READ
    Maximum number of links for this object.
maxname           29  uint32       READ
    Maximum filename size supported for this object.
maxread           30  uint64       READ
    Maximum read size supported for this object.
maxwrite          31  uint64       READ
    Maximum write size supported for this object. This attribute
    SHOULD be supported if the file is writable. Lack of this
    attribute can lead to the client either wasting bandwidth or
    not receiving the best performance.
mime_type         32  utf8<>       R/W
    MIME body type/subtype of this object.
mode              33  uint32       R/W
    Unix-style permission bits for this object (deprecated in favor
    of ACLs)
no_trunc          34  boolean      READ
    If a name longer than name_max is used, will an error be
    returned or will the name be truncated?
numlinks          35  uint32       READ
    Number of links to this object.
owner             36  utf8<>       R/W
    The string name of the owner of this object.
owner_group       37  utf8<>       R/W
    The string name of the group of the owner of this object.
quota_hard        38  uint64       READ
    Number of bytes of disk space beyond which the server will
    decline to allocate new space.
quota_soft        39  uint64       READ
    Number of bytes of disk space at which the client may choose to
    warn the user about limited space.
quota_used        40  uint64       READ
    Number of bytes of disk space occupied by the owner of this
    object on this filesystem.
rawdev            41  specdata4    READ
    Raw device identifier.
space_avail       42  uint64       READ
    Disk space in bytes available to this user on the filesystem
    containing this object - this should be the smallest relevant
    limit.
space_free        43  uint64       READ
    Free disk space in bytes on the filesystem containing this
    object - this should be the smallest relevant limit.
space_total       44  uint64       READ
    Total disk space in bytes on the filesystem containing this
    object.
space_used        45  uint64       READ
    Number of filesystem bytes allocated to this object.
system            46  boolean      R/W
    Whether or not this file is a system file.
time_access       47  nfstime4     R/W
    The time of last access to the object.
time_backup       48  nfstime4     R/W
    The time of last backup of the object.
time_create       49  nfstime4     R/W
    The time of creation of the object. This attribute does not
    have any relation to the traditional Unix file attribute
    time'.
time_delta        50  nfstime4     READ
    Smallest useful server time granularity.
time_metadata     51  nfstime4     R/W
    The time of last meta-data modification of the object.
time_modify       52  nfstime4     R/W
    The time since the epoch of last modification of the object.
version           53  utf8<>       R/W
    Version number of this document.
volatility        54  nfstime4     READ
    Approximate time until next expected change on this filesystem,
    as a measure of volatility.

8.4. Lease Renewal

The purpose of a lease is to allow a server to remove stale locks
that are held by a client that has crashed or is otherwise
unreachable. It is not a mechanism for cache consistency and lease
renewals may not be denied if the lease interval has not expired.

The following events cause implicit renewal of all of the leases for
a given client (i.e. all those sharing a given clientid). Each of
these is a positive indication that the client is still active and
that the associated state held at the server, for the client, is
still valid.

o An OPEN with a valid clientid.

o Any operation made with a valid stateid (CLOSE, DELEGRETURN,
  LOCK, LOCKU, OPEN, OPEN_CONFIRM, READ, RENEW, SETATTR, WRITE).
  This does not include the special stateids of all bits 0 or all
  bits 1.

  Note that if the client had restarted or rebooted, the
  client would not be making these requests without issuing
  the SETCLIENTID operation. The use of the SETCLIENTID
  operation (possibly with the addition of the optional
  SETCLIENTID_CONFIRM operation) notifies the server to drop
  the locking state associated with the client.

  If the server has rebooted, the stateids
  (NFS4ERR_STALE_STATEID error) or the clientid
  (NFS4ERR_STALE_CLIENTID error) will not be valid hence
  preventing spurious renewals.

This approach allows for low overhead lease renewal which scales
well. In the typical case no extra RPC calls are required for lease
renewal and in the worst case one RPC is required every lease period
(i.e. a RENEW operation). The number of locks held by the client is
not a factor since all state for the client is involved with the
lease renewal action.

Since all operations that create a new lease also renew existing
leases, the server must maintain a common lease expiration time for
all valid leases for a given client. This lease time can then be
easily updated upon implicit lease renewal actions.

8.5. Crash Recovery

The important requirement in crash recovery is that both the client
and the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and
WRITE operations.

8.5.1. Client Failure and Recovery

In the event that a client fails, the server may recover the client's
locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration.
If the client is able to restart or reinitialize within the lease
period the client may be forced to wait the remainder of the lease
period before obtaining new locks.

To minimize client delay upon restart, lock requests are associated
with an instance of the client by a client supplied verifier. This
verifier is part of the initial SETCLIENTID call made by the client.
The server returns a clientid as a result of the SETCLIENTID
operation. The client then confirms the use of the verifier with
SETCLIENTID_CONFIRM. The clientid in combination with an opaque
owner field is then used by the client to identify the lock owner for
OPEN. This chain of associations is then used to identify all locks
for a particular client.

Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent
loss of locking state. As a result, the server is free to release
all locks held which are associated with the old clientid which was
derived from the old verifier.

For secure environments, a change in the verifier must only cause the
release of locks associated with the authenticated requester. This
is required to prevent a rogue entity from freeing otherwise valid
locks.

Note that the verifier must have the same uniqueness properties of
the verifier for the COMMIT operation.

8.5.2. Server Failure and Recovery

If the server loses locking state (usually as a result of a restart
or reboot), it must allow clients time to discover this fact and re-
establish the lost locking state. The client must be able to re-
establish the locking state without having the server deny valid
requests because the server has granted conflicting access to another
client. Likewise, if there is the possibility that clients have not
yet re-established their locking state for a file, the server must
disallow READ and WRITE operations for that file. The duration of
the recovery period is equal to the duration of the lease period.

A client can determine that server failure (and thus loss of locking
state) has occurred, when it receives one of two errors. The
NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a
clientid invalidated by reboot or restart. When either of these are
received, the client must establish a new clientid (See the section
"Client ID") and re-establish the locking state as discussed below.

The period of special handling of locking and READs and WRITEs, equal
in duration to the lease period, is referred to as the "grace
period". During the grace period, clients recover locks and the
associated state by reclaim-type locking requests (i.e. LOCK requests
with reclaim set to true and OPEN operations with a claim type of
CLAIM_PREVIOUS). During the grace period, the server must reject
READ and WRITE operations and non-reclaim locking requests (i.e.
other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.

If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned and the non-
reclaim client request can be serviced. For the server to be able to
service READ and WRITE operations during the grace period, it must
again be able to guarantee that no possible conflict could arise
between an impending reclaim locking request and the READ or WRITE
operation. If the server is unable to offer that guarantee, the
NFS4ERR_GRACE error must be returned to the client.

For a server to provide simple, valid handling during the grace
period, the easiest method is to simply reject all non-reclaim
locking requests and READ and WRITE operations by returning the
NFS4ERR_GRACE error. However, a server may keep information about
granted locks in stable storage. With this information, the server
could determine if a regular lock or READ or WRITE operation can be
safely processed.

For example, if a count of locks on a given file is available in
stable storage, the server can track reclaimed locks for the file and
when all reclaims have been processed, non-reclaim locking requests
may be processed. This way the server can ensure that non-reclaim
locking requests will not conflict with potential reclaim requests.
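The lock count example for the grace period might be sketched as
follows. This is a minimal illustration under stated assumptions,
not a prescribed server design: the class GraceTracker, its methods,
the PROCESS placeholder and the shape of stable_lock_counts (a
mapping from a file identifier to the number of locks recorded in
stable storage) are all hypothetical. The sketch further assumes
that the stable storage record is complete, so a file that is absent
from it has no locks to be reclaimed, and it does not model reclaim
requests that arrive after the grace period.

```python
import time

PROCESS = "PROCESS"              # placeholder: service the request normally
NFS4ERR_GRACE = "NFS4ERR_GRACE"


class GraceTracker:
    """Hypothetical helper tracking per-file reclaims during the grace period."""

    def __init__(self, start, lease_seconds, stable_lock_counts,
                 clock=time.monotonic):
        # The grace period is equal in duration to the lease period.
        self.grace_end = start + lease_seconds
        # Locks recorded in stable storage that have not yet been reclaimed.
        self.outstanding = dict(stable_lock_counts)
        self.clock = clock

    def in_grace(self):
        return self.clock() < self.grace_end

    def reclaim(self, file_id):
        """Record one reclaim-type locking request for file_id."""
        if not self.in_grace():
            raise ValueError("reclaims after the grace period are not modeled")
        if self.outstanding.get(file_id, 0) > 0:
            self.outstanding[file_id] -= 1
        return PROCESS

    def non_reclaim_lock(self, file_id):
        """Decide a non-reclaim locking request for file_id."""
        if self.in_grace() and self.outstanding.get(file_id, 0) > 0:
            # A reclaim could still arrive for this file, so refuse for now.
            return NFS4ERR_GRACE
        return PROCESS
```

A server without such records would simply return NFS4ERR_GRACE for
every non-reclaim request until the grace period ends, which is the
simple method described above.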
5.6. Interpreting owner Protocol January 2000
With respect to I/O requests, if the server is able to determine that
there are no outstanding reclaim requests for a file by information
from stable storage or another similar mechanism, the processing of
I/O requests could proceed normally for the file.
To reiterate, for a server that allows non-reclaim lock and owner_group
The recommended attributes "owner" I/O
requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and "owner_group" are represented
in terms that no lock
subsequently reclaimed would have prevented any I/O operation
processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should
employ a UTF-8 string. To backoff and retry mechanism for the request. Timeout
periods should be chosen to avoid overwhelming a representation server. The client
must account for the server that is tied able to a particular underlying implementation at perform I/O and non-
reclaim locking requests within the client or server, grace period as well as those
that can not do so.
A reclaim-type locking request outside the use of server's grace period can
only succeed if the UTF-8 string server can guarantee that no conflicting lock or
I/O request has been chosen. Note that section 6.1 granted since reboot or restart.
8.5.3. Network Partitions and Recovery
If the duration of [RFC2624] provides additional rationale. It a network partition is expected that greater than the lease
period provided by the server, the
client and server will have their own local representation of owner
and owner_group that is used not received a
lease renewal from the client. If this occurs, the server may free
all locks held for the client. As a result, all stateids held by the
client will become invalid or stale. Once the client is able to
reach the server after such a network partition, all I/O submitted by
the client with the now invalid stateids will fail with the server
returning the error NFS4ERR_EXPIRED. Once this error is received,
the client will suitably notify the application that held the lock.

As a courtesy to the client or as an optimization, the server may
continue to hold locks on behalf of a client for which recent
communication has extended beyond the lease period. If the server
receives a lock or I/O request that conflicts with one of these
courtesy locks, the server must free the courtesy lock and grant the
new request.

In the event of a network partition with a duration extending beyond
the expiration of a client's leases, the server MUST employ a method
of recording this fact in its stable storage. Conflicting lock
requests from another client may be serviced after the lease
expiration. There are various scenarios involving server failure
after such an event that require the storage of these lease
expirations or network partitions. One scenario is as follows:

A client holds a lock at the server and encounters a network
partition and is unable to renew the associated lease. A second
client obtains a conflicting lock and then frees the lock. After the
unlock request by the second client, the server reboots or
reinitializes. Once the server recovers, the network partition heals
and the original client attempts to reclaim the original lock.

In this scenario and without any state information, the server will
allow the reclaim and the client will be in an inconsistent state
because the server or the client has no knowledge of the conflicting
lock.

The server may choose to store this lease expiration or network
partitioning state in a way that will only identify the client as a
whole. Note that this may potentially lead to lock reclaims being
denied unnecessarily because of a mix of conflicting and non-
conflicting locks. The server may also choose to store information
about each lock that has an expired lease with an associated
conflicting lock. The choice of the amount and type of state
information that is stored is left to the implementor. In any case,
the server must have enough state information to enable correct
recovery from multiple partitions and multiple server failures.

... for local storage or presentation to the end user. Therefore, it
is expected that when these attributes are transferred between the
client and the server that the local representation is translated to
a syntax of the form "user@dns_domain". This will allow for a client
and server that do not use the same local representation the ability
to translate to a common syntax that can be interpreted by both.

The translation is not specified as part of the protocol. This
allows various solutions to be employed. For example, a local
translation table may be consulted that maps between a numeric id to
the user@dns_domain syntax. A name service may also be used to
accomplish the translation. The 'dns_domain' portion of the owner
string is meant to be a DNS domain name. For example, user@ietf.org.

In the case where there is no translation available to the client or
server, the attribute value must be constructed without the '@'.
Therefore, the absence of the @ from the owner or owner_group
attribute signifies that no translation was available and the
receiver of the attribute should not place any special meaning with
the attribute value. Even though the attribute value can not be
translated, it may still be useful. In the case of a client, the
attribute string may be used for local display of ownership.

5.7. Access Control Lists

The NFS ACL attribute is an array of access control entries (ACE).
There are various access control entry types. The server is able to
communicate which ACE types are supported by returning the
appropriate value within the aclsupport attribute. The types of ACEs
are defined as follows:

    Type      Description
    _____________________________________________________
    ALLOW     Explicitly grants the access defined in
              acemask4 to the file or directory.

    DENY      Explicitly denies the access defined in
              acemask4 to the file or directory.

    AUDIT     LOG (system dependent) any access attempt to
              a file or directory which uses an access
              method which is a subset of acemask4.

    ALARM     Generate a system ALARM (system dependent)
              when any access attempt is made to a file or
              directory which is a subset of acemask4.

The NFS ACE attribute is defined as follows:

    struct nfsace4 {
            acetype4        type;
            aceflag4        flag;
            acemask4        access_mask;
            utf8string      who;
    };

Each nfsace4 entry is assumed to be processed in order by the server.
The first Access Control Entry is used where both the "who" and the
"access_mask" match the requester and the type of access desired. Any
later additional Access Control Entries which also match are ignored.

5.7.1. ACE type

The semantics of the 'type' field follow the descriptions provided
above.

5.7.2. ACE flag

The 'flag' field contains values based on the following descriptions.

ACE4_FILE_INHERIT_ACE

Can be placed on a directory and indicates that this ACE should be
added to each new non-directory file created.

ACE4_DIRECTORY_INHERIT_ACE

Can be placed on a directory and indicates that this ACE should be
added to each new directory created.

ACE4_INHERIT_ONLY_ACE

Can be placed on a directory but does not apply to the directory,
only to newly created files/directories as specified by the above two
flags.

ACE4_NO_PROPAGATE_INHERIT_ACE

Can be placed on a directory. Normally when a new directory is
created and an ACE exists on the parent directory which is marked
ACE4_DIRECTORY_INHERIT_ACE, two ACEs are placed on the new directory.
One for the directory itself and one which is an inheritable ACE for
newly created directories. This flag tells the O/S to not place an
ACE on the newly created directory which is inheritable by
subdirectories of the created directory.

8.6. Recovery from a Lock Request Timeout or Abort

In the event a lock request times out, a client may decide to not
retry the request. The client may also abort the request when the
process for which it was issued is terminated (e.g. in UNIX due to a
signal). It is possible though that the server received the request
and acted upon it. This would change the state on the server without
the client being aware of the change. It is paramount that the
client re-synchronize state with the server before it attempts any
other operation that takes a seqid and/or a stateid with the same
nfs_lockowner. This is straightforward to do without a special re-
synchronize operation.

Since the server maintains the last lock request and response
received on the nfs_lockowner, for each nfs_lockowner, the client
should cache the last lock request it sent such that the lock request
did not receive a response. From this, the next time the client does
a lock operation for the nfs_lockowner, it can send the cached
request, if there is one, and if the request was one that established
state (e.g. a LOCK or OPEN operation) the client can follow up with a
request to remove the state (e.g. a LOCKU or CLOSE operation). With
this approach, the sequencing and stateid information on the client
and server for the given nfs_lockowner will re-synchronize and in
turn the lock state will re-synchronize.
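
The re-synchronization approach of section 8.6 can be illustrated
with a small client-side sketch. It is not part of the protocol
definition: the class LockOwnerState, the send callable (which returns
None on a timeout), and the UNDO_OP table are hypothetical names
introduced only for this example, and the seqid handling is reduced to
"replay with the same seqid, then advance".

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical table: operation that removes the state a request established.
UNDO_OP = {"LOCK": "LOCKU", "OPEN": "CLOSE"}


@dataclass
class LockRequest:
    op: str      # e.g. "LOCK", "LOCKU", "OPEN", "CLOSE"
    seqid: int


class LockOwnerState:
    """Client-side bookkeeping for one nfs_lockowner (illustrative only)."""

    def __init__(self, send: Callable[[LockRequest], Optional[str]]):
        self._send = send                 # returns a reply, or None on timeout
        self.next_seqid = 1
        self.unanswered: Optional[LockRequest] = None

    def _issue(self, op: str) -> Optional[str]:
        req = LockRequest(op, self.next_seqid)
        reply = self._send(req)
        if reply is None:
            self.unanswered = req         # cache it for later resynchronization
        else:
            self.next_seqid += 1
        return reply

    def resync(self) -> bool:
        """Replay the cached request; undo it if it established state."""
        req = self.unanswered
        if req is None:
            return True
        if self._send(req) is None:
            return False                  # server still unreachable
        self.unanswered = None
        self.next_seqid = req.seqid + 1
        undo = UNDO_OP.get(req.op)
        if undo is not None:
            return self._issue(undo) is not None
        return True

    def lock_op(self, op: str) -> Optional[str]:
        """Resynchronize first, then issue the new lock operation."""
        if not self.resync():
            return None
        return self._issue(op)
```

In this sketch a timed-out LOCK is replayed with its original seqid
the next time the lockowner is used, followed by a LOCKU, before the
new request goes out.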
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
ACE4_FAILED_ACCESS_ACE_FLAG

Both indicate for AUDIT and ALARM which state to log the event. On
every ACCESS or OPEN call which occurs on a file or directory which
has an ACL that is of type ACE4_SYSTEM_AUDIT_ACE_TYPE or
ACE4_SYSTEM_ALARM_ACE_TYPE, the attempted access is compared to the
ace4mask of these ACLs. If the access is a subset of ace4mask and the
identifier matches, an AUDIT trail or an ALARM is generated. By
default this happens regardless of the success or failure of the
ACCESS or OPEN call.

The flag ACE4_SUCCESSFUL_ACCESS_ACE_FLAG only produces the AUDIT or
ALARM if the ACCESS or OPEN call is successful. The
ACE4_FAILED_ACCESS_ACE_FLAG causes the ALARM or AUDIT if the ACCESS
or OPEN call fails.

ACE4_IDENTIFIER_GROUP

Indicates that the "who" refers to a GROUP as defined under Unix.

5.7.3. ACE Access Mask

The access_mask field contains values based on the following:

    Access                     Description
    ____________________________________________________________________
    READ_DATA                  Permission to read the data of the file
    LIST_DIRECTORY             Permission to list the contents of a
                               directory
    WRITE_DATA                 Permission to modify the file's data
    ADD_FILE                   Permission to add a new file to a
                               directory
    APPEND_DATA                Permission to append data to a file
    ADD_SUBDIRECTORY           Permission to create a subdirectory to a
                               directory
    READ_STREAMS               Permission to read the additional
                               streams of a file
    WRITE_STREAMS              Permission to write the additional
                               streams of a file
    EXECUTE                    Permission to execute a file
    DELETE_CHILD               Permission to delete a file or directory
                               within a directory
    READ_ATTRIBUTES            The ability to read basic attributes
                               (non-acls) of a file
    WRITE_ATTRIBUTES           Permission to change basic attributes
                               (non-acls) of a file
    READ_CONTROL               ?
    READ_EXTENDED_ATTRIBUTES   ?
    WRITE_EXTENDED_ATTRIBUTES  ?
    DELETE                     Permission to Delete the File, IF FILE
                               BASED
    READ_ACL                   Permission to Read the ACL
    WRITE_ACL                  Permission to Write the ACL
    WRITE_OWNER                Permission to change the owner
    SYNCHRONIZE                Allow the forcing of mutual-exclusion to
                               the file

5.7.4. ACE who

There are several special identifiers ("who") which need to be
understood universally. Some of these identifiers cannot be
understood when an NFS client accesses the server, but have meaning
when a local process accesses the file. The ability to display and
modify these permissions is permitted over NFS.

    Who                Description
    _______________________________________________________________
    "OWNER"            The owner of the file.
    "GROUP"            The group associated with the file.
    "EVERYONE"         The world.
    "INTERACTIVE"      Accessed from an interactive terminal.
    "NETWORK"          Accessed via the network.
    "DIALUP"           Accessed as a dialup user to the server.
    "BATCH"            Accessed from a batch job.
    "ANONYMOUS"        Accessed without any authentication.
    "AUTHENTICATED"    Any authenticated user (opposite of
                       ANONYMOUS)
    "SERVICE"          Access from a system service.

To avoid conflict these special identifiers should be of the form
"xxxx@". For example: ANONYMOUS@.

6. Filesystem Migration and Replication

With the use of the recommended attribute "fs_locations", the NFS
version 4 server has a method of providing filesystem migration or
replication services. For the purposes of migration and replication,
a filesystem will be defined as all files that share a given fsid
(major and minor values are the same).

The fs_locations attribute provides a list of filesystem locations.
These locations are specified by providing the server name (either
DNS domain or IP address) and the path name representing the root of
the filesystem. Depending on the type of service being provided, the
list will provide new or alternate locations for the filesystem. The
client will use this information to redirect its requests to the new
server.

6.1. Replication

It is expected that filesystem replication will be used in the case
of read-only data. Typically, the filesystem will be replicated
amongst two or more servers. The fs_locations attribute will provide
the list of these locations to the client. On first access of the
filesystem, the client should obtain the value of the fs_locations
attribute. If, in the future, the client finds the server
unresponsive, the client may attempt to use another server specified
by fs_locations.

If applicable, the client must take the appropriate steps to recover
valid filehandles from the new server. This is described in more
detail in the following sections.

8.7. Server Revocation of Locks

At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and
the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held.

The first instance of lock revocation is upon server reboot or re-
initialization. In this instance the client will receive an error
(NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will
proceed with normal crash recovery as described in the previous
section.

The second lock revocation event can occur as a result of
administrative intervention within the lease period. While this is
considered a rare event, it is possible that the server's
administrator has decided to release or revoke a particular lock held
by the client. As a result of revocation, the client will receive an
error of NFS4ERR_EXPIRED and the error is received within the lease
period for the lock. In this instance the client may assume that
only the nfs_lockowner's locks have been lost. The client notifies
the lock holder appropriately. The client may not assume the lease
period has been renewed as a result of a failed operation.

The third lock revocation event is the inability to renew the lease
period. While this is considered a rare or unusual event, the client
must be prepared to recover. Both the server and client will be able
to detect the failure to renew the lease and are capable of
recovering without data corruption. For the server, it tracks the
last renewal event serviced for the client and knows when the lease
will expire. Similarly, the client must track operations which will
renew the lease period. Using the time that each such request was
sent and the time that the corresponding reply was received, the
client should bound the time that the corresponding renewal could
have occurred on the server and thus determine if it is possible that
a lease period expiration could have occurred.

When the client determines the lease period may have expired, the
client must mark all locks held for the associated lease as
"unvalidated". This means the client has been unable to re-establish
or confirm the appropriate lock state with the server. As described
in the previous section on crash recovery, there are scenarios in
which the server may grant conflicting locks after the lease period
has expired for a client. When it is possible that the lease period
has expired, the client must validate each lock currently held to
ensure that a conflicting lock has not been granted. The client may
accomplish this task by issuing an I/O request, either a pending I/O
or a zero-length read, specifying the stateid associated with the
lock in question. If the response to the request is success, the
client has validated all of the locks governed by that stateid and
re-established the appropriate state between itself and the server.
If the I/O request is not successful, then one or more of the locks
associated with the stateid was revoked by the server and the client
must notify the owner.
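
The client-side bookkeeping for the third revocation event in section
8.7 might look like the following sketch. It rests on stated
assumptions: LeaseTracker and its field names are hypothetical, all
times are read from a single local clock in seconds, and the status
strings "valid", "unvalidated" and "revoked" are illustrative labels
rather than protocol values. Only the send time of a renewing request
is treated as a safe lower bound on when the server renewed the lease.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeaseTracker:
    """Client-side view of one lease (illustrative names only)."""
    lease_period: float                       # seconds, granted by the server
    renewed_no_earlier_than: float = 0.0      # send time of last renewing request
    locks: Dict[str, str] = field(default_factory=dict)  # stateid -> status

    def record_renewal(self, sent_at: float, replied_at: float) -> None:
        # The server renewed the lease somewhere in [sent_at, replied_at];
        # the earlier bound is the only one the client can rely on.
        if sent_at > replied_at:
            raise ValueError("reply cannot precede request")
        self.renewed_no_earlier_than = max(self.renewed_no_earlier_than, sent_at)

    def may_have_expired(self, now: float) -> bool:
        return now - self.renewed_no_earlier_than >= self.lease_period

    def check(self, now: float) -> None:
        # Mark every lock under this lease "unvalidated" once expiry is possible.
        if self.may_have_expired(now):
            for stateid in self.locks:
                self.locks[stateid] = "unvalidated"

    def validation_result(self, stateid: str, io_succeeded: bool) -> None:
        # Outcome of a pending I/O or zero-length read using this stateid.
        self.locks[stateid] = "valid" if io_succeeded else "revoked"
```

A lock that ends up "revoked" in this sketch corresponds to the case
where the client must notify the owner.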
6.2. Migration

Filesystem migration is used to move a filesystem from one server to
another. Migration is typically used for a filesystem that is
writable and has a single copy. The expected use of migration is for
load balancing or general resource reallocation. The protocol does
not specify how the filesystem will be moved between servers. This
server-to-server transfer mechanism is left to the server
implementor. However, the method used to communicate the migration
event between client and server is specified here.

Once the servers participating in the migration have completed the
move of the filesystem, the error NFS4ERR_MOVED will be returned for
subsequent requests received by the original server. The
NFS4ERR_MOVED error is returned for all operations except GETATTR.

8.8. Share Reservations

A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from record locking. When a
client opens a file, it issues an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If
the OPEN fails the client will fail the application's open request.

Pseudo-code definition of the semantics:

    if ((request.access & file_state.deny) ||
        (request.deny & file_state.access))
            return (NFS4ERR_DENIED)

The constants used for the OPEN and OPEN_DOWNGRADE operations for the
access and deny fields are as follows:

    const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
    const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
    const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

    const OPEN4_SHARE_DENY_NONE     = 0x00000000;
    const OPEN4_SHARE_DENY_READ     = 0x00000001;
    const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
    const OPEN4_SHARE_DENY_BOTH     = 0x00000003;
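
As an informal illustration of the share reservation semantics in
section 8.8, the conflict test can be written as a small runnable
function. The constant values are copied from the draft; the function
name share_conflict and its argument names are assumptions made for
this sketch only.

```python
# Constant values as listed in the draft.
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


def share_conflict(req_access: int, req_deny: int,
                   cur_access: int, cur_deny: int) -> bool:
    """True when the OPEN would be refused with NFS4ERR_DENIED.

    cur_access and cur_deny stand for the access and deny bits already
    recorded for the file (file_state in the pseudo-code).
    """
    wants_denied_access = (req_access & cur_deny) != 0
    denies_existing_access = (req_deny & cur_access) != 0
    return wants_denied_access or denies_existing_access
```

For example, a request for WRITE access conflicts with an existing
open that specified deny WRITE, while two opens for READ with deny
NONE do not conflict.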
Upon receiving the NFS4ERR_MOVED error, the client will obtain the
value of the fs_locations attribute. The client will then use the
contents of the attribute to redirect its requests to the specified
server. To facilitate the use of GETATTR, operations such as PUTFH
must also be accepted by the server for the migrated filesystem's
filehandles. Note that if the server returns NFS4ERR_MOVED, the
server MUST support the fs_locations attribute.

If the client requests more attributes than fs_locations, the server
may return fs_locations only. This is to be expected since the
server has migrated the filesystem and may not have a method of
obtaining additional attribute data.

The server implementor needs to be careful in developing a migration
solution. The server must consider all of the state information
clients may have outstanding at the server. This includes but is not
limited to locking/share state, delegation state, and asynchronous
file writes which are represented by WRITE and COMMIT verifiers. The
server should strive to minimize the impact on its clients during and
after the migration process.

8.9. OPEN/CLOSE Operations

To provide correct share semantics, a client MUST use the OPEN
operation to obtain the initial filehandle and indicate the desired
access and what if any access to deny. Even if the client intends to
use a stateid of all 0's or all 1's, it must still obtain the
filehandle for the regular file with the OPEN operation so the
appropriate share semantics can be applied. For clients that do not
have a deny mode built into their open programming interfaces, deny
equal to NONE should be used.

The OPEN operation with the CREATE flag also subsumes the CREATE
operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically.

The CLOSE operation removes all share locks held by the nfs_lockowner
on that file. If record locks are held, the client SHOULD release
all locks before issuing a CLOSE. The server MAY free all
outstanding locks on CLOSE but some servers may not support the CLOSE
of a file that still has record locks held. The server MUST return
failure if any locks would exist after the CLOSE.

The LOOKUP operation is preserved and will return a filehandle
without establishing any lock state on the server. Without a valid
stateid, the server will assume the client has the least access. For
example, a file opened with deny READ/WRITE cannot be accessed using
a filehandle obtained through LOOKUP because it would not have a
valid stateid (i.e. using a stateid of all bits 0 or all bits 1).
6.3. Interpretation of the fs_locations Attribute

The fs_locations attribute is structured in the following way:

    struct fs_location {
            utf8string      server<>;
            pathname4       rootpath;
    };

    struct fs_locations {
            pathname4       fs_root;
            fs_location     locations<>;
    };

The fs_location struct is used to represent the location of a
filesystem by providing a server name and the path to the root of the
filesystem. For a multi-homed server or a set of servers that use
the same rootpath, an array of server names may be provided. An
entry in the server array is a UTF8 string and represents one of a
traditional DNS host name, IPv4 address, or IPv6 address. It is not
a requirement that all servers that share the same rootpath be listed
in one fs_location struct. The array of server names is provided for
convenience. Servers that share the same rootpath may also be listed
in separate fs_location entries in the fs_locations attribute.

The fs_locations struct and attribute then contain an array of
locations. Since the namespace of each server may be constructed
differently, the "fs_root" field is provided. The path represented
by fs_root represents the location of the filesystem in the server's
namespace. Therefore, the fs_root path is only associated with the
server from which the fs_locations attribute was obtained. The
fs_root path is meant to aid the client in locating the filesystem at
the various servers listed.

As an example, there is a replicated filesystem located at two
servers (servA and servB). At servA the filesystem is located at
path "/a/b/c". At servB the filesystem is located at path "/x/y/z".
In this example the client accesses the filesystem first at servA
with a multi-component lookup path of "/a/b/c/d". Since the client
used a multi-component lookup to obtain the filehandle at "/a/b/c/d",
it is unaware that the filesystem's root is located in servA's
namespace at "/a/b/c". When the client switches to servB, it will
need to determine that the directory it first referenced at servA is
now represented by the path "/x/y/z/d" on servB. To facilitate this,
the fs_locations attribute provided by servA would have a fs_root
value of "/a/b/c" and two entries in fs_location. One entry in
fs_location will be for itself (servA) and the other will be for
servB with a path of "/x/y/z". With this information, the client is
able to substitute "/x/y/z" for the "/a/b/c" at the beginning of its
access path and construct "/x/y/z/d" to use for the new server.

8.10. Open Upgrade and Downgrade

When an OPEN is done for a file and the lockowner for which the open
is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and
deny bits specified by the new OPEN as well as those for the existing
OPEN. The result is that there is one open file, as far as the
protocol is concerned, and it includes the union of the access and
deny bits for all of the OPEN requests completed. Only a single
CLOSE will be done to reset the effects of both OPEN's. Note that
the client, when issuing the OPEN, may not know that the same file is
in fact being opened. The above only applies if both OPEN's result
in the OPEN'ed object being designated by the same filehandle.

When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two
different OPEN's of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPEN's with separate
stateid's and will require separate CLOSE's to free them.

When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and
deny bits for the remaining open's may be smaller (i.e. a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change and the client should use it to update the
server so that share reservation requests by other clients are
handled properly.
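
The upgrade and downgrade rules of section 8.10 reduce to taking a
union of bit masks. The following sketch shows one way a client could
decide whether an OPEN_DOWNGRADE is needed when it closes one of
several local opens that share a single open file on the server. The
names merged_state and close_one, and the use of (access, deny)
tuples, are assumptions made for this illustration.

```python
from functools import reduce
from typing import List, Optional, Tuple

Share = Tuple[int, int]   # (access bits, deny bits) of one local open


def merged_state(opens: List[Share]) -> Share:
    """Union of access and deny bits, as kept for the single server open."""
    access = reduce(lambda acc, o: acc | o[0], opens, 0)
    deny = reduce(lambda acc, o: acc | o[1], opens, 0)
    return access, deny


def close_one(opens: List[Share], index: int) -> Tuple[List[Share], Optional[Share]]:
    """Drop one local open.

    Returns the remaining opens and, when the union shrank, the
    (access, deny) pair to send in OPEN_DOWNGRADE; otherwise None.
    An empty remainder means the client would issue CLOSE instead.
    """
    before = merged_state(opens)
    remaining = opens[:index] + opens[index + 1:]
    after = merged_state(remaining)
    if remaining and after != before:
        return remaining, after
    return remaining, None
```

With one open for READ/deny NONE and one for BOTH/deny WRITE, closing
the second leaves a proper subset of the previous bits, so the sketch
reports that a downgrade to READ/deny NONE is needed.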
6.4. Filehandle Recovery for Migration or Replication

Filehandles for filesystems that are replicated or migrated have the
same semantics as for filesystems that are not replicated or
migrated. For example, if a filesystem has persistent filehandles
and it is migrated to another server, the filehandle values for the
filesystem will be valid at the new server.

The same is true for a filesystem which is made up of volatile
filehandles. In fact, in this case the client should expect that the
new server will return NFS4ERR_EXPIRED when old filehandles are
presented; the client will need to recover the filehandles
appropriately.

7. NFS Server Namespace

7.1. Server Exports

On a UNIX server the name-space describes all the files reachable by
pathnames under the root directory "/". On a Windows NT server the
name-space constitutes all the files on disks named by mapped disk
letters. NFS server administrators rarely make the entire server's
file-system name-space available to NFS clients. Typically, pieces
of the name-space are made available via an "export" feature. In
previous versions of NFS, the root file-handle for each export is
obtained through the MOUNT protocol; the client sends a string that
identifies the export of name-space and the server returns the root
file-handle for it. The MOUNT protocol supports an EXPORTS procedure
that will enumerate the server's exports.

8.11. Short and Long Leases

When determining the time period for the server lease, the usual
lease tradeoffs apply. Short leases are good for fast server
recovery at a cost of increased RENEW or READ (with zero length)
requests. Longer leases are certainly kinder and gentler to large
internet servers trying to handle very large numbers of clients.
The number of RENEW requests drops in proportion to the lease time.
The disadvantages of long leases are slower recovery after server
failure (server must wait for leases to expire and grace period
before granting new lock requests) and increased file contention (if
client fails to transmit an unlock request then server must wait for
lease expiration before granting new locks).

Long leases are usable if the server is able to store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with
its clients and therefore long leases are not an issue.
7.2. Browsing Exports

The NFS version 4 protocol provides a root file-handle that clients
can use to obtain file-handles for these exports via a multi-
component LOOKUP. A common user experience is to use a graphical
user interface (perhaps a file "Open" dialog window) to find a file
via progressive browsing through a directory tree. The client must be
able to move from one export to another export via single-component,
progressive LOOKUP operations.

This style of browsing is not well supported by NFS version 2 and 3
protocols. The client expects all LOOKUP operations to remain within
a single server file-system, i.e. the device attribute will not
change. This prevents a client from taking name-space paths that
span exports.

An automounter on the client can obtain a snapshot of the server's
name-space using the EXPORTS procedure of the MOUNT protocol. If it
understands the server's pathname syntax, it can create an image of
the server's name-space on the client. The parts of the name-space
that are not exported by the server are filled in with a "pseudo
file-system" that allows the user to browse from one mounted file-
system to another. There is a drawback to this representation of the
server's name-space on the client: it is static. If the server
administrator adds a new export the client will be unaware of it.

8.12. Clocks and Calculating Lease Expiration

To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration
of the lock. There is also the issue of propagation delay across the
network which could easily be several hundred milliseconds as well as
the possibility that requests will be lost and need to be
retransmitted.

To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is
already 200 msec old when it gets it). In addition, it will take
another 200 msec to get a response back to the server. So the client
must send a lock renewal or write data back to the server 400 msec
before the lease would expire.
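
The arithmetic of section 8.12 can be written out as a short worked
example. This is only a sketch: the function name and parameters are
hypothetical, times are integer milliseconds on the client's local
clock, and the one-way delay is assumed to be an estimate the client
already has. Clock drift and retransmission are not modelled.

```python
def latest_renewal_send_time_ms(reply_received_ms: int,
                                lease_ms: int,
                                one_way_delay_ms: int) -> int:
    """Latest local time at which a renewal can still be sent safely.

    The lease is assumed to be already one_way_delay_ms old when the
    client receives it, and the renewal needs another one_way_delay_ms
    to reach the server.
    """
    lease_age_on_receipt = one_way_delay_ms
    time_to_reach_server = one_way_delay_ms
    return (reply_received_ms + lease_ms
            - lease_age_on_receipt - time_to_reach_server)
```

With the 200 msec estimate used in the text, the result is 400 msec
earlier than the naive value of receipt time plus lease time.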
7.3. Server Pseudo File-System

NFS version 4 servers avoid this name-space inconsistency by
presenting all the exports within the framework of a single server
name-space. An NFS version 4 client uses LOOKUP and READDIR
operations to browse seamlessly from one export to another. Portions
of the server name-space that are not exported are bridged via a
"pseudo file-system" that provides a view of exported directories
only. A pseudo file-system has a unique fsid and behaves like a
normal, read-only file-system.

Based on the construction of the server's name space, it is
possible that multiple pseudo filesystems may exist. For
example,

    /a              pseudo filesystem
    /a/b            real filesystem
    /a/b/c          pseudo filesystem
    /a/b/c/d        real filesystem

Need to discuss the ramifications of multiple pseudo
filesystems.

7.4. Multiple Roots

DOS, Windows 95, 98 and NT are sometimes described as having
"multiple roots". File-Systems are commonly represented as disk
letters. MacOS represents file-systems as top-level names. NFS
version 4 servers for these platforms can construct a pseudo file-
system above these root names so that disk letters or volume names
are simply directory names in the pseudo-root.

7.5. Filehandle Volatility

The nature of the server's pseudo file-system is that it is a logical
representation of file-system(s) available from the server.
Therefore, the pseudo file-system is most likely constructed
dynamically when the NFS version 4 server is first instantiated. It
is expected the pseudo file-system may not have an on-disk
counterpart from which persistent filehandles could be constructed.
Even though it is preferable that the server provide persistent
filehandles for the pseudo file-system, the NFS client should expect
that pseudo file-system file-handles are volatile. This can be
confirmed by checking the associated "persistent_fh" attribute for
those filehandles in question. If the filehandles are volatile, the
NFS client must be prepared to recover a filehandle value (i.e. with
a v4 multi-component LOOKUP) when receiving an error of
NFS4ERR_FHEXPIRED.

9. Client-Side Caching

Client-side caching of data, of file attributes, and of file names is
essential to providing good performance with the NFS protocol.
Providing distributed cache coherence is a difficult problem and
previous versions of the NFS protocol have not attempted it.
Instead, several NFS client implementation techniques have been used
to reduce the problems that a lack of coherence poses for users.
These techniques have not been clearly defined by earlier protocol
specifications and it is often unclear what is valid or invalid
client behavior.

The NFS version 4 protocol uses many techniques similar to those that
have been used in previous protocol versions. The NFS version 4
protocol does not provide distributed cache coherence. However, it
defines a more limited set of caching guarantees to allow locks and
share reservations to be used without destructive interference from
client side caching.

In addition, the NFS version 4 protocol introduces a delegation
mechanism which allows many decisions normally made by the server to
be made locally by clients. This mechanism provides efficient
support of the common cases where sharing is infrequent or where
sharing is read-only.
9.1. Performance Challenges for those
Expires: April 2000 [Page 39]
Draft Protocol Specification NFS version 4 October 1999
filehandles Client-Side Caching
Caching techniques used in question. If the filehandles are volatile, previous versions of the NFS
client must be prepared to recover a filehandle value (i.e. protocol have
been successful in providing good performance. However, several
scalability challenges can arise when those techniques are used with a v4
multi-component LOOKUP)
very large numbers of clients. This is particularly true when receiving an error
clients are geographically distributed which classically increases
the latency for cache revalidation requests.
The previous versions of NFS4ERR_FHEXPIRED.
7.6. Exported Root
If the server's root file-system is exported, it might be easy to
conclude that a pseudo-file-system NFS protocol repeat their file data
cache validation requests at the time the file is not needed. opened. This would be
wrong. Assume the following file-systems on a server:
/ disk1 (exported)
/a disk2 (not exported)
/a/b disk3 (exported)
Because disk2
behavior can have serious performance drawbacks. A common case is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-file-system.
7.7. Mount Point Crossing

The server file-system environment may be constructed in such a way
that one file-system contains a directory which is 'covered' or
mounted upon by a second file-system. For example:

/a/b (file system 1)
/a/b/c/d (file system 2)

The pseudo file-system for this server may be constructed to look
like:

/ (place holder/not exported)
/a/b (file system 1)
/a/b/c/d (file system 2)

It is the server's responsibility to present the pseudo file-system
that is complete to the client. If the client sends a lookup request
for the path "/a/b/c/d", the server's response is the filehandle of
the file system "/a/b/c/d". In previous versions of NFS, the server
would respond with the directory "/a/b/c/d" within the file-system
"/a/b".

The NFS client will be able to determine if it crosses a server mount
point by a change in the value of the "fsid" attribute.

7.8. Security Policy and Namespace Presentation

The application of the server's security policy needs to be carefully
considered by the implementor. One may choose to limit the
viewability of portions of the pseudo file-system based on the
server's perception of the client's ability to authenticate itself
properly. However, with the support of multiple security mechanisms
and the ability to negotiate the appropriate use of these mechanisms,
the server is unable to properly determine if a client will be able
to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo file-system, the server
may effectively hide file-systems from a client that may otherwise
have legitimate access.

one in which a file is only accessed by a single client. Therefore,
sharing is infrequent.

In this case, repeated reference to the server to find that no
conflicts exist is expensive. A better option with regards to
performance is to allow a client that repeatedly opens a file to do
so without reference to the server. This is done until potentially
conflicting operations from another client actually occur.

A similar situation arises in connection with file locking. Sending
file lock and unlock requests to the server as well as the read and
write requests necessary to make data caching consistent with the
locking semantics (see the section "Data Caching and File Locking")
can severely limit performance. When locking is used to provide
protection against infrequent conflicts, a large penalty is incurred.
This penalty may discourage the use of file locking by applications.

7.9. Summary
The NFS version 4 protocol provides LOOKUP and READDIR operations for
browsing of NFS file-systems. These operations are also used to
browse server exports. A v4 server supports export browsing by
including exported directories in a pseudo-file-system. A browsing
client can cross seamlessly between a pseudo-file-system and a real,
exported file-system. Clients must support volatile filehandles and
recognize mount point crossing of server file-systems.

8. File Locking

Integrating locking into NFS necessarily causes it to be stateful.
With the invasive nature of "share" file locks, it becomes
substantially more dependent on state than the traditional
combination of NFS and NLM [XNFS]. There are three components to
making this state manageable:

o Clear division between client and server
o Ability to reliably detect inconsistency in state between client
  and server
o Simple and robust recovery mechanisms

In this model, the server owns the state information. The client
communicates its view of this state to the server as needed. The
client is also able to detect inconsistent state before modifying a
file.

The NFS version 4 protocol provides more aggressive caching
strategies with the following design goals:

o Compatibility with a large range of server semantics.
o Provide the same caching benefits as previous versions of the
  NFS protocol when unable to provide the more aggressive model.
o Requirements for aggressive caching are organized so that a
  large portion of the benefit can be obtained even when not all
  of the requirements can be met.

The appropriate requirements for the server are discussed in later
sections in which specific forms of caching are covered (see the
section "Open Delegation").

9.2. Delegation and Callbacks

Recallable delegation of server responsibilities for a file to a
client improves performance by avoiding repeated requests to the
server in the absence of inter-client conflict. With the use of a
"callback" RPC from server to client, a server recalls delegated
responsibilities when another client engages in sharing of a
delegated file.
A delegation is passed from the server to the client, specifying the
object of the delegation and the type of delegation. There are
different types of delegations but each type contains a stateid to be
used to represent the delegation when performing operations that
depend on the delegation. This stateid is similar to those
associated with locks and share reservations but differs in that the
stateid for a delegation is associated with a clientid and may be
used on behalf of all the nfs_lockowners for the given client. A
delegation is made to the client as a whole and not to any specific
process or thread of control within it.

Because callback RPCs may not work in all environments (due to
firewalls, for example), correct protocol operation does not depend
on them. Preliminary testing of callback functionality by means of a
CB_NULL procedure determines whether callbacks can be supported. The
CB_NULL procedure checks the continuity of the callback path. A
server makes a preliminary assessment of callback availability to a
given client and avoids delegating responsibilities until it has
determined that callbacks are supported. Because the granting of a
delegation is always conditional upon the absence of conflicting
access, clients must not assume that a delegation will be granted and
they must always be prepared for OPENs to be processed without any
delegations being granted.

Once granted, a delegation behaves in most ways like a lock. There
is an associated lease that is subject to renewal together with all
of the other leases held by that client.

Unlike locks, an operation by a second client to a delegated file
will cause the server to recall a delegation through a callback.

To support Windows "share" locks, it is necessary to atomically open
or create files. Having a separate share/unshare operation will not
allow correct implementation of the Windows OpenFile API. In order
to correctly implement share semantics, the existing mechanisms used
when a file is opened or created (LOOKUP, CREATE, ACCESS) need to be
replaced. NFS V4 will have an OPEN procedure that subsumes the
functionality of LOOKUP, CREATE, and ACCESS. However, because many
operations require a file handle, the traditional LOOKUP is preserved
to map a file name to file handle without establishing state on the
server. Policy of granting access or modifying files is managed by
the server based on the client's state. It is believed that these
mechanisms can implement policy ranging from advisory only locking to
full mandatory locking. While ACCESS is just a subset of OPEN, the
ACCESS procedure is maintained as a lighter weight mechanism.

8.1. Definitions

Lock      The term "lock" will be used to refer to both record
          (byte-range) locks as well as file (share) locks unless
          specifically stated otherwise.

Client    Throughout this proposal the term "client" is used to
          indicate the entity that maintains a set of locks on behalf
          of one or more applications. The client is responsible for
          crash recovery of those locks it manages. Multiple clients
          may share the same transport and multiple clients may exist
          on the same network node.

Clientid  A 64-bit quantity returned by the server that uniquely
          corresponds to the client supplied Verifier and ID.
Lease     An interval of time defined by the server for which the
          client is irrevocably granted a lock. At the end of a
          lease period the lock may be revoked if the lease has not
          been extended. The lock must be revoked if a conflicting
          lock has been granted after the lease interval. All leases
          granted by a server have the same fixed interval.

Stateid   A 64-bit quantity returned by a server that uniquely
          defines the locking state granted by the server for a
          specific lock owner for a specific file. Stateids composed
          of all bits 0 or all bits 1 have special meaning and are
          reserved.

Verifier  A 32-bit quantity generated by the client that the server
          can use to determine if the client has restarted and lost
          all previous lock state.

On recall, the client holding the delegation must flush modified
state (such as modified data) to the server and return the
delegation. The conflicting request will not receive a response
until the recall is complete. The recall is considered complete when
the client returns the delegation or the server times out on the
recall and revokes the delegation as a result of the timeout.
Following the resolution of the recall, the server has the
information necessary to grant or deny the second client's request.

At the time the client receives a delegation recall, it may have
substantial state that needs to be flushed to the server. Therefore,
the server should allow sufficient time for the recall to complete
since it may involve numerous RPCs to the server. If the server is
able to determine that the client is diligently flushing state to the
server as a result of the recall, the server may extend the usual
time allowed for a recall. However, the time allowed for recall
completion should not be unbounded.
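
The recall timing described above (a usual time allowance, an optional
extension while the client is seen to be flushing state, and a bound
on the total) could be sketched as follows. This is only an
illustration: the class name RecallTimer, its fields, and the three
outcome strings are hypothetical and are not defined by the protocol,
and the draft does not prescribe any particular timeout values.

```python
from dataclasses import dataclass


@dataclass
class RecallTimer:
    """Tracks how long a server waits for a recalled delegation (sketch)."""
    base_timeout: float   # usual time allowed for a recall (assumed value)
    extension: float      # extra time granted while the client keeps flushing
    max_timeout: float    # hard cap, so the recall time is never unbounded
    started: float = 0.0
    deadline: float = 0.0

    def start(self, now: float) -> None:
        """Record when the recall callback was sent."""
        self.started = now
        self.deadline = now + self.base_timeout

    def note_flush_activity(self, now: float) -> None:
        """Extend the deadline when the client is seen flushing state,
        but never beyond started + max_timeout."""
        cap = self.started + self.max_timeout
        self.deadline = min(max(self.deadline, now + self.extension), cap)

    def outcome(self, now: float, returned: bool) -> str:
        """Return 'complete', 'revoke', or 'waiting' for the recall."""
        if returned:
            return "complete"      # client returned the delegation
        if now >= self.deadline:
            return "revoke"        # server times out and revokes
        return "waiting"           # conflicting request is still held off
```

In this sketch the conflicting request from the second client would be
answered only once outcome() reports "complete" or "revoke".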
An example of this is when responsibility to mediate opens on a given
file is delegated to a client (see the section "Open Delegation").
The server will not know what opens are in effect on the client.
Without this knowledge the server will be unable to determine if the
access and deny state for the file allows any particular open until
the delegation for the file has been returned.

A client failure or a network partition can result in failure to
respond to a recall callback. In this case, the server will revoke
the delegation which in turn will render useless any modified state
still on the client.

9.2.1. Delegation Recovery

There are three situations that delegation recovery must deal with:

o Client reboot or restart
o Server reboot or restart
o Network partition (full or callback-only)

8.2. Locking

It is assumed that manipulating a lock is rare when compared to I/O
operations. It is also assumed that crashes and network partitions
are relatively rare. Therefore it is important that I/O operations
have a light weight mechanism to indicate if they possess a held
lock. A lock request contains the heavy weight information required
to establish a lock and uniquely define the lock owner.

The following sections describe the transition from the heavy weight
information to the eventual stateid used for most client and server
locking and lease interactions.

8.2.1. Client ID

For each LOCK request, the client must identify itself to the server.
This is done in such a way as to allow for correct lock
identification and crash recovery. Client identification is
accomplished with two values.

o A verifier that is used to detect client reboots.
o A variable length opaque array to uniquely define a client.
  For an operating system this may be a fully qualified host
  name or IP address, and for a user level NFS client it may
  additionally contain a process id or other unique sequence.

The data structure for the Client ID would then appear as:

struct nfs_client_id {
    opaque verifier[4];
    opaque id<>;
};
In the event the client reboots or restarts, the failure to renew
leases will result in the revocation of record locks and share
reservations. Delegations, however, may be treated a bit differently.

There will be situations in which delegations will need to be
reestablished after a client reboots or restarts. The reason for
this is the client may have file data stored locally and this data
was associated with the previously held delegations. The client will
need to reestablish the appropriate file state on the server.

To allow for this type of client recovery, the server may extend the
period for delegation recovery beyond the typical lease expiration
period. This implies that requests from other clients that conflict
with these delegations will need to wait. This behavior is
consistent with the normal recall process, which may take significant
time because of the client's need to flush state to the server. This
longer interval would increase the window for clients to reboot and
consult stable storage so that the delegations can be reclaimed. For
open delegations, such delegations are reclaimed using OPEN with a
claim type of CLAIM_DELEGATE_PREV (see the sections on "Data Caching
and Revocation" and "Operation 18: OPEN" for discussion of open
delegation and the details of OPEN respectively).

When the server reboots or restarts, delegations are reclaimed (using
the OPEN operation with CLAIM_DELEGATE_PREV) in a similar fashion to
record locks and share reservations. However, there is a slight
semantic difference. In the normal case if the server decides that a
delegation should not be granted, it performs the requested action
(e.g. OPEN) without granting any delegation. For reclaim, the server
grants the delegation but a special designation is applied so that
the client treats the delegation as having been granted but recalled
by the server. Because of this, the client has the duty to write all
modified state to the server and then return the delegation. This
process of handling delegation reclaim reconciles three principles of
the NFS Version 4 protocol:

o Upon reclaim, a client reporting resources assigned to it by an
  earlier server instance must be granted those resources.
o The server has unquestionable authority to determine whether
  delegations are to be granted and, once granted, whether they
  are to be continued.
o The use of callbacks is not to be depended upon until the client
  has proven its ability to receive them.

It is possible through the mis-configuration of a client or the
existence of a rogue client that two clients end up using the same
nfs_client_id. This situation is avoided by 'negotiating' the
nfs_client_id between client and server with the use of the
SETCLIENTID. The following describes the two scenarios of
negotiation.

1 Client has never connected to the server

In this case the client generates an nfs_client_id and unless another
client has the same nfs_client_id.id field, the server accepts the
request. The server also records the principal (or principal to uid
mapping) from the credential in the RPC request that contains the
nfs_client_id negotiation request.

Two clients might still use the same nfs_client_id.id due to perhaps
configuration error (say a High Availability configuration where the
nfs_client_id.id is derived from the ethernet controller address and
both systems have the same address). In this case, the result is a
switched union that returns, in addition to NFS4ERR_CLID_INUSE, the
network address (the rpcbind netid and universal address) of the
client that is using the id.

2 Client is re-connecting to the server after a client reboot

In this case, the client still generates an nfs_client_id but the
nfs_client_id.id field will be the same as the nfs_client_id.id
generated prior to reboot. If the server finds that the
principal/uid is equal to the previously "registered"
nfs_client_id.id, then locks associated with the old nfs_client_id
are immediately released. If the principal/uid is not equal, then
this is a rogue client and the request is returned in error. For
more discussion of crash recovery semantics, see the section on
"Crash Recovery".

In both cases, upon success, NFS4_OK is returned. To help reduce the
amount of data transferred on OPEN and LOCK, the server will also
return a unique 64-bit clientid value that is a short hand reference
to the nfs_client_id values presented by the client. From this point
forward, the client can use the clientid to refer to itself.

8.2.2. nfs_lockowner and stateid Definition
When requesting a lock, the client must present to the server the
clientid and an identifier for the owner of the requested lock.
These two fields are referred to as the nfs_lockowner and the
definitions of those fields are:

o A clientid returned by the server as part of the client's use of
  the SETCLIENTID procedure
o A variable length opaque array used to uniquely define the owner
  of a lock managed by the client. This may be a thread id,
  process id, or other unique value.

When the server grants the lock it responds with a unique 64-bit
stateid. The stateid is used as a short hand reference to the
nfs_lockowner, since the server will be maintaining the
correspondence between them.

8.2.3. Use of the stateid

All I/O requests contain a stateid. If the nfs_lockowner performs
I/O on a range of bytes within a locked range, the stateid returned
by the server must be used to indicate the appropriate lock (record
or share) is held. If no state is established on the client, either
record lock or share lock, a stateid of all bits 0 is used. If no
conflicting locks are held on the file, the server may grant the I/O
request. If a conflict with an explicit lock occurs, the request is
failed (NFS4ERR_LOCKED). This allows "mandatory locking" to be
implemented.

A stateid of all bits 1 allows read requests to bypass locking checks
at the server. However, write requests with a stateid of all bits 1
do not bypass file locking requirements.

An explicit lock may not be granted while an I/O operation with
conflicting implicit locking is being performed.

The byte range of a lock is indivisible. A range may be locked,
unlocked, or changed between read and write but may not have
subranges unlocked or changed between read and write. This is the
semantics provided by Win32 but only a subset of the semantics
provided by Unix. It is expected that Unix clients can more easily
simulate modifying subranges than Win32 servers adding this feature.

When a network partition occurs, delegations are subject to freeing
by the server when the lease renewal period expires. This is similar
to the behavior for locks and share reservations. For delegations,
however, the server may extend the period in which conflicting
requests are held off. Eventually the occurrence of a conflicting
request from another client will cause revocation of the delegation.
A loss of the callback path (e.g. by later network configuration
change) will have the same effect. A recall request will fail and
revocation of the delegation will result.

A client normally finds out about revocation of a delegation when it
uses a stateid associated with a delegation and receives the error
NFS4ERR_EXPIRED. It also may find out about delegation revocation
after a client reboot when it attempts to reclaim a delegation and
receives that same error. Note that in the case of a revoked write
open delegation, there are issues because data may have been modified
by the client whose delegation is revoked and separately by other
clients. See the section "Revocation Recovery for Write Open
Delegation" for a discussion of such issues. Note also that when
delegations are revoked, information about the revoked delegation
will be written by the server to stable storage (as described in the
section "Crash Recovery"). This is done to deal with the case in
which a server reboots after revoking a delegation but before the
client holding the revoked delegation is notified about the
revocation.

9.3. Data Caching

When applications share access to a set of files, they need to be
implemented so as to take account of the possibility of conflicting
access by another application. This is true whether the applications
in question execute on different clients or reside on the same
client.

Share reservations and record locks are the facilities the NFS
version 4 protocol provides to allow applications to coordinate
access by providing mutual exclusion facilities. The NFS version 4
protocol's data caching must be implemented such that it does not
invalidate the assumptions that those using these facilities depend
upon.
8.2.4. Sequencing of Lock Requests

Locking is different than most NFS operations as it requires "at-
most-one" semantics that are not provided by ONC RPC. In the face of
retransmission or reordering, lock or unlock requests must have a
well defined and consistent behavior. To accomplish this each lock
request contains a sequence number that is a monotonically increasing
integer. Different nfs_lockowners have different sequences. The
server maintains the last sequence number (L) received and the
response that was returned. If a request with a previous sequence
number (r < L) is received it is silently ignored as its response
must have been received before the last request (L) was sent. If a
duplicate of the last request (r == L) is received, the stored
response is returned. If a request beyond the next sequence
(r == L + 2) is received it is silently ignored. Sequences are
reinitialized whenever the client verifier changes.

8.3. Blocking Locks

Some clients require the support of blocking locks. The current
proposal lacks a call-back mechanism, similar to NLM, to notify a
client when the lock has been granted. Clients have no choice but to
continually poll for the lock, which presents a fairness problem.
Two new lock types are added, READW and WRITEW, used to indicate to
the server that the client is requesting a blocking lock. The server
should maintain an ordered list of pending blocking locks. When the
conflicting lock is released, the server may wait the lease period
for the first client to re-request the lock. After the lease period
expires the next waiting client request is allowed the lock. Clients
are required to poll at an interval sufficiently small that it is
likely to acquire the lock in a timely manner. The server is not
required to maintain a list of pending blocked locks as it is used to
increase fairness and not correct operation. Because of the
unordered nature of crash recovery, storing of lock state to stable
storage would be required to guarantee ordered granting of blocking
locks.

9.3.1. Data Caching and OPENs

In order to avoid invalidating the sharing assumptions that
applications rely on, NFS version 4 clients should not provide cached
data to applications or modify it on behalf of an application when it
would not be valid to obtain or modify that same data via a READ or
WRITE operation.

Furthermore, in the absence of open delegation (see the section "Open
Delegation") two additional rules apply. Note that these rules are
obeyed in practice by many NFS version 2 and version 3 clients.

o First, cached data present on a client must be revalidated after
  doing an OPEN. This is to ensure that the data for the OPENed
  file is still correctly reflected in the client's cache. This
  validation must be done at least when the client's OPEN
  operation includes DENY=WRITE or BOTH thus terminating a period
  in which other clients may have had the opportunity to open the
  file with WRITE access. Clients may choose to do the
  revalidation more often (i.e. at OPENs specifying DENY=NONE) to
  parallel the NFS version 3 protocol's practice for the benefit
  of users assuming this degree of cache revalidation.

o Second, modified data must be flushed to the server before
  closing a file OPENed for write. This is complementary to the
  first rule. If the data is not flushed at CLOSE, the
  revalidation done after a client OPENs a file is unable to
  achieve its purpose. The other aspect to flushing the data
  before close is that the data must be committed to stable
  storage before the CLOSE operation is requested by the client.
  In the case of a server reboot or restart and a CLOSEd file, it
  may not be possible to retransmit the data to be written to the
  file. Hence, this requirement.
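
The two OPEN and CLOSE rules above could be sketched on the client
side roughly as follows. This is a minimal sketch under stated
assumptions: the CachedFile class, the function names, and the write
and commit callables are hypothetical stand-ins for a client's WRITE
and COMMIT paths, and comparing a saved change attribute is only one
possible revalidation test.

```python
class CachedFile:
    """Client-side cache state for one file (hypothetical structure)."""

    def __init__(self):
        self.change = None   # change attribute seen when data was cached
        self.clean = {}      # offset -> bytes, unmodified cached data
        self.dirty = {}      # offset -> bytes, modified and not yet flushed


def revalidate_on_open(cache, server_change):
    """First rule: after OPEN, keep cached data only if still valid.

    Returns True when the cached data could be kept.
    """
    valid = cache.change == server_change
    if not valid:
        cache.clean.clear()            # stale data must not be served
        cache.change = server_change
    return valid


def flush_before_close(cache, write, commit):
    """Second rule: write modified data and commit it before CLOSE.

    `write(offset, data)` and `commit()` are assumed callables.
    """
    for offset in sorted(cache.dirty):
        write(offset, cache.dirty[offset])
    if cache.dirty:
        commit()                       # data must reach stable storage
    cache.clean.update(cache.dirty)
    cache.dirty.clear()
```

A real client would also need to refresh its recorded change attribute
after its own writes; that bookkeeping is left out of the sketch.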
9.3.2. Data Caching and File Locking

For those applications that choose to use file locking instead of
share reservations to exclude inconsistent file access, there is an
analogous set of constraints that apply to client side data caching.
These rules are effective only if the file locking is used in a way
that matches in an equivalent way the actual READ and WRITE
operations executed. This is as opposed to file locking that is
based on pure convention. For example, it is possible to manipulate
a two-megabyte file by dividing the file into two one-megabyte
regions and protecting access to the two regions by file locks on
bytes zero and one. A lock for write on byte zero of the file would
represent the right to do READ and WRITE operations on the first
region. A lock for write on byte one of the file would represent the
right to do READ and WRITE operations on the second region. As long
as all applications manipulating the file obey this convention, they
will work on a local file system. However, they may not work with
the NFS version 4 protocol unless clients refrain from data caching.

8.4. Lease Renewal

The purpose of a lease is to allow a server to remove stale locks
that are held by a client that has crashed or is otherwise
unreachable. It is not a mechanism for cache consistency and lease
renewals may not be denied if the lease interval has not expired.

Any I/O request that has been made with a valid stateid is a positive
indication that the client is still alive and locks are being
maintained. This becomes an implicit renewal of the lease. In the
case no I/O has been performed within the lease interval, a lease can
be renewed by having the client issue a zero length READ. Because
the nfs_lockowner contains a unique client value, any stateid for a
client will renew all leases for locks held with the same client
field. This will allow very low overhead lease renewal that scales
extremely well. In the typical case, no extra RPC calls are needed
and in the worst case one RPC is required every lease period
regardless of the number of locks held by the client.

8.5. Crash Recovery
The important requirement in crash recovery is that both the client
and the server know when the other has failed. Additionally it is
required that a client sees a consistent view of data across server
reboots. All I/O operations that may have been queued within the
client or network buffers must wait until the client has successfully
recovered the locks protecting the I/O operations.

8.5.1. Client Failure and Recovery

In the event that a client fails, the server may recover the client's
locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration.
If the client is able to restart or reinitialize within the lease
period the client may be forced to wait the remainder of the lease
period before obtaining new locks.

To minimize client delay upon restart, lock requests contain a
verifier field in the lock_owner. This verifier is part of the
initial SETCLIENTID call made by the client. Since the verifier will
be changed by the client upon each initialization, the server can
compare a new verifier to the verifier associated with currently
held locks and determine that they do not match. This signifies the
client's new instantiation and loss of locking state. As a result,
the server is free to release all locks held which are associated
with the old verifier.

The rules for data caching in the file locking environment are:

o First, when a client obtains a file lock for a particular
  region, the data cache corresponding to that region (if any
  cache data exists) must be revalidated. If the change attribute
  indicates that the file may have been updated since the cached
  data was obtained, the client must flush or invalidate the
  cached data for the newly locked region. A client might choose
  to invalidate all of the non-modified cached data that it has
  for the file but the only requirement for correct operation is
  to invalidate all of the data in the newly locked region.

o Second, before releasing a write lock for a region, all modified
  data for that region must be flushed to the server. The
  modified data must also be written to stable storage.

Note that flushing data to the server and the invalidation of cached
data must reflect the actual byte ranges locked or unlocked.
Rounding these up or down to reflect client cache block boundaries
will cause problems if not carefully done. For example, writing a
modified block when only half of that block is within an area being
unlocked may cause invalid modification to the region outside the
unlocked area. This, in turn, may be part of a region locked by
another client. Clients can avoid this situation by synchronously
performing portions of write operations that overlap that portion
(initial or final) that is not a full block. Similarly, invalidating
a locked area which is not an integral number of full buffer blocks
would require the client to read one or two partial blocks from the
server if the revalidation procedure shows that the data which the
client possesses may not be valid.
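
The advice on cache block boundaries could be illustrated by splitting
a byte range that is being unlocked into an initial partial block, a
run of whole blocks, and a final partial block, so that no write
extends past the range. This is a sketch only: the function name
split_for_flush and the (offset, length) tuple layout are assumptions
made for the example.

```python
def split_for_flush(offset, length, block_size):
    """Split [offset, offset + length) at cache block boundaries.

    Returns (head, middle, tail). Each element is an (offset, length)
    tuple or None. `head` and `tail` are the partial blocks that should
    be written byte-exact (for example synchronously); `middle` covers
    only whole blocks that lie entirely inside the range.
    """
    if length <= 0:
        return (None, None, None)
    end = offset + length
    first_full = -(-offset // block_size) * block_size   # round offset up
    last_full = (end // block_size) * block_size         # round end down
    if first_full >= last_full:
        # The range contains no whole block; treat it all as partial.
        return ((offset, length), None, None)
    head = (offset, first_full - offset) if first_full > offset else None
    middle = (first_full, last_full - first_full)
    tail = (last_full, end - last_full) if end > last_full else None
    return (head, middle, tail)
```

None of the three pieces reaches outside the original range, so bytes
that may be locked by another client are never rewritten.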
The data that is written to the server as a pre-requisite to the
unlocking of a region must be written to stable storage. The client
may accomplish this either with synchronous writes or by following
asynchronous writes with a COMMIT operation. This is required
because retransmission of the modified data after a server reboot
might conflict with a lock held by another client.

A client implementation may choose to accommodate applications which
use record locking in non-standard ways (e.g. using a record lock as
a global semaphore) by flushing to the server more data upon a LOCKU
than is covered by the locked range. This may include modified data
within files other than the one for which the unlocks are being done.
In such cases, the client must not interfere with applications whose
READs and WRITEs are being done only within the bounds of record
locks which the application holds. For example, an application locks
a single byte of a file and proceeds to write that single byte. A
client that chose to handle a LOCKU by flushing all modified data to
the server could validly write that single byte in response to an
unrelated unlock. However, it would not be valid to write the entire
block in which that single written byte was located since it includes
an area that is not locked and might be locked by another client.
Client implementations can avoid this problem by dividing files with
modified data into those for which all modifications are done to
areas covered by an appropriate record lock and those for which there
are modifications not covered by a record lock. Any writes done for
the former class of files must not include areas not locked and thus
not modified on the client.

For secure environments, a change in the verifier must only cause the
release of locks associated with the authenticated requester. This
is required to prevent a rogue entity from freeing otherwise valid
locks. Note that the verifier must have the same uniqueness
properties as the COMMIT verifier.

8.5.2. Server Failure and Recovery

If the server fails and loses locking state, the server must wait the
lease period before granting any new locks or allowing any I/O. An
I/O request during the grace period with a stale stateid will fail
with NFS4ERR_GRACE. To recover the lock and associated state, the
client will reissue the lock request with reclaim set to TRUE. Upon
receiving a successful reply and associated stateid, the client may
reissue the I/O request with the new stateid.

Any time a client receives an NFS4ERR_GRACE error, the client must
assume that all locking state associated with the server returning
the error has been lost. The client should start recovering all
outstanding locks upon receiving NFS4ERR_GRACE.

If the server receives a lock request during its grace period that
does not have reclaim set to TRUE, the server must return
NFS4ERR_GRACE. This error return will trigger the client to recover
all of its locking state by reclaiming locks.
A lock request outside the server's grace period with reclaim set to
TRUE can only succeed if the server can guarantee that no conflicting
lock or I/O request has been granted since reboot.

8.5.3. Network Partitions and Recovery

If the duration of a network partition is greater than the lease
period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free
all locks held for the client. As a result, all stateids held by the
client will become invalid. Once the client is able to reach the
server after such a network partition, all I/O submitted by the
client with the now invalid stateids will fail with the server
returning the error NFS4ERR_EXPIRED. Once this error is received,
the client will suitably notify the application that held the lock.

9.3.3. Data Caching and Mandatory File Locking

Client side data caching needs to respect mandatory file locking when
it is in effect. The presence of mandatory file locking for a given
file is indicated in the result flags for an OPEN. When mandatory
locking is in effect for a file, the client must check for an
appropriate file lock for data being read or written. If a lock
exists for the range being read or written, the client may satisfy
the request using the client's validated cache. If an appropriate
file lock is not held for the range of the read or write, the read or
write request must not be satisfied by the client's cache and the
request must be sent to the server for processing. When a read or
write request partially overlaps a locked region, the request should
be subdivided into multiple pieces with each region (locked or not)
treated appropriately.
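
The subdivision of a partially locked request could look roughly like
the following. This is a hedged sketch: the function name subdivide,
the half-open (start, end) range representation, and the boolean
"may use cache" flag are illustrative assumptions, and the locked
ranges passed in are assumed to be those held by the requesting
client.

```python
def subdivide(offset, length, locked):
    """Split a read or write into pieces by lock coverage.

    `locked` is a list of half-open (start, end) byte ranges for which
    this client holds an appropriate lock. Returns a list of
    (start, end, may_use_cache) tuples covering the request in order:
    locked pieces may be satisfied from the validated cache, the rest
    must be sent to the server.
    """
    end = offset + length
    pieces = []
    pos = offset
    for lo, hi in sorted(locked):
        lo = max(lo, pos)          # clip to the unprocessed part
        hi = min(hi, end)
        if lo >= hi:
            continue               # lock does not overlap what remains
        if pos < lo:
            pieces.append((pos, lo, False))   # gap before the lock
        pieces.append((lo, hi, True))         # covered by a held lock
        pos = hi
    if pos < end:
        pieces.append((pos, end, False))      # trailing unlocked part
    return pieces
```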
9.3.4. Data Caching and File Identity
When clients cache data, the server
returning the error NFS4ERR_EXPIRED. Once this error is received,
the client will suitably notify the application that held file data needs to organized according
to the lock.
As a courtesy file system object to which the client or optimization, data belongs. For NFS version
3 clients, the server may continue typical practice has been to hold locks on behalf assume for the purpose of a
caching that distinct filehandles represent distinct file system
objects. The client for which recent communication then has extended beyond the lease period. If the server receives a lock
or I/O request that conflicts with one of these courtesy locks, choice to organize and maintain the
data cache on this basis.
In the NFS version 4 protocol, there is now the possibility to have
significant deviations from a "one filehandle per object" model
because a filehandle may be constructed on the basis of the object's
pathname. Therefore, clients need a reliable method to determine if
two filehandles designate the same file system object. If clients
were simply to assume that all distinct filehandles denote distinct
objects and proceed to do data caching on this basis, caching
inconsistencies would arise between the distinct client side objects
which mapped to the same server side object.
By providing a method to differentiate filehandles, the NFS version 4
protocol alleviates a potential functional regression in comparison
with the NFS version 3 protocol. Without this method, caching
inconsistencies within the same client could occur and this has not
been present in previous versions of the NFS protocol. Note that it
is possible to have such inconsistencies with applications executing
on multiple clients but that is not the issue being addressed here.
For the purposes of data caching, the following steps allow an NFS
version 4 client to determine whether two distinct filehandles denote
the same server side object:
o If GETATTR directed to two filehandles have different values of
the fsid attribute, then the filehandles represent distinct
objects.
o If GETATTR for any file with an fsid that matches the fsid of
the two filehandles in question returns a unique_handles
attribute with a value of TRUE, then the two objects are
distinct.
o If GETATTR directed to the two filehandles does not return the
fileid attribute for one or both of the handles, then it
cannot be determined whether the two objects are the same.
Therefore, operations which depend on that knowledge (e.g.
client side data caching) cannot be done reliably.
o If GETATTR directed to the two filehandles returns different
values for the fileid attribute, then they are distinct objects.
o Otherwise they are the same object.
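The filehandle comparison steps can be sketched as a small decision
function. This is only an illustration under stated assumptions: the
function name, the dict representation of GETATTR results, and the
three-valued return are hypothetical and not part of the protocol.

```python
from typing import Optional

def same_object(a: dict, b: dict, unique_handles: bool) -> Optional[bool]:
    """Return True/False when identity is decidable, None when it is not.

    a, b: hypothetical dicts of GETATTR results for the two filehandles;
          the 'fsid' and 'fileid' keys mirror the attribute names.
    unique_handles: value of the unique_handles attribute for that fsid.
    """
    if a["fsid"] != b["fsid"]:
        return False   # different fsid values: distinct objects
    if unique_handles:
        return False   # one handle per object: distinct handles, distinct objects
    if a.get("fileid") is None or b.get("fileid") is None:
        return None    # fileid missing: identity cannot be determined
    if a["fileid"] != b["fileid"]:
        return False   # different fileid values: distinct objects
    return True        # otherwise the same object
```

A client that receives None here would refrain from data caching that
depends on knowing whether the two handles name the same object.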
9.4. Open Delegation
When a file is being OPENed, the server may delegate further handling
of opens and closes for that file to the opening client. Any such
delegation is recallable, since the circumstances that allowed for
the delegation are subject to change. In particular, if the server
receives a conflicting OPEN from another client, it must recall the
delegation before deciding whether the OPEN from the other client may
be granted. Making a delegation is up to the server and clients
should not assume that any particular OPEN either will or will not
result in an open delegation. The following is a typical set of
conditions that servers might use in deciding whether OPEN should be
delegated:
o The client must be able to respond to the server's callback
requests. The server will use the CB_NULL procedure for a test
of callback ability.
o The client must have responded properly to previous recalls.
o There must be no current open conflicting with the requested
delegation.
o There should be no current delegation that conflicts with the
delegation being requested.
o The probability of future conflicting open requests should be
low based on the recent history of the file.
o The existence of any server-specific semantics of OPEN/CLOSE
that would make the required handling incompatible with the
prescribed handling that the delegated client would apply (see
below).
There are two types of open delegations, read and write. A read open
delegation allows a client to handle, on its own, requests to open a
file for reading that do not deny read access to others. Multiple
read open delegations may be outstanding simultaneously and do not
conflict. A write open delegation allows the client to handle, on
its own, all opens. Only one write open delegation may exist for a
given file at a given time and it is inconsistent with any read open
delegations.
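The compatibility rule between read and write open delegations can be
sketched as follows. The function name and the list-of-strings
representation are assumptions made for illustration; a real server
would also weigh the other conditions listed earlier in this section.

```python
def may_grant(requested: str, outstanding: list) -> bool:
    """Check only the read/write compatibility rule for one file.

    requested:   'read' or 'write' (hypothetical labels).
    outstanding: types of delegations currently held on the file.
    """
    if requested == "read":
        # Read delegations coexist with each other but not with a write one.
        return "write" not in outstanding
    # A write delegation must be the only delegation on the file.
    return len(outstanding) == 0
```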
When a client has a read open delegation, it may not make any changes
to the contents or attributes of the file but it is assured that no
other client may do so. When a client has a write open delegation,
it may modify the file data since no other client will be accessing
the file's data. The client holding a write delegation may only
affect file attributes which are intimately connected with the file
data: object_size, time_modify, change.
When a client has an open delegation, it does not send OPENs or
CLOSEs to the server but updates the appropriate status internally.
For a read open delegation, opens that cannot be handled locally
(opens for write or that deny read access) must be sent to the
server.
When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following:
o the type of delegation (read or write)
o space limitation information to control flushing of data on
close (write open delegation only, see the section "Open
Delegation and Data Caching")
o an nfsace4 specifying read and write permissions
o a stateid to represent the delegation for READ and WRITE

The stateid is separate and distinct from the stateid for the OPEN
proper. The standard stateid, unlike the delegation stateid, is
associated with a particular nfs_lockowner and will continue to be
valid after the delegation is recalled and the file remains open.
When a request internal to the client is made to open a file and open
delegation is in effect, it will be accepted or rejected solely on
the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as
described in the section "Share Reservations".
o The read and write permissions as determined below.
The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows:
o If the nfsace4 indicates that the open may be done, then it
should be granted without reference to the server.
o If the nfsace4 indicates that the open may not be done, then an
ACCESS request must be sent to the server to obtain the
definitive answer.
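The two-step permission check can be sketched as below. The access
bit constants, the integer mask form of the nfsace4, and the
ask_server callable are all hypothetical stand-ins; the sketch only
shows that a locally allowed open needs no server round trip while a
locally denied one is referred to the server.

```python
# Hypothetical access bits; the real nfsace4 layout is not reproduced here.
ACE_READ = 0x1
ACE_WRITE = 0x2

def open_permitted(allowed_mask: int, wanted: int, ask_server) -> bool:
    """Decide a local open while an open delegation is held.

    allowed_mask: access bits granted by the delegation's nfsace4 (assumed form).
    wanted:       access bits the local open needs.
    ask_server:   callable standing in for an ACCESS request to the server.
    """
    if (wanted & ~allowed_mask) == 0:
        return True               # nfsace4 allows it: grant without the server
    return ask_server(wanted)     # nfsace4 denies it: the server gives the answer
```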
The server may return an nfsace4 that is more restrictive than the
actual ACL of the file. This includes an nfsace4 that specifies
denial of all access. Note that some common practices such as
mapping the traditional user "root" to the user "nobody" may make it
incorrect to return the actual ACL of the file in the delegation
response.
The use of delegation together with various other forms of caching
creates the possibility that no server authentication will ever be
performed for a given user since all of the user's requests might be
satisfied locally. Where the client is depending on the server for
authentication, the client should be sure authentication occurs for
each user by use of the ACCESS operation. This should be the case
even if an ACCESS operation would not be required otherwise. As
mentioned before, the server may enforce frequent authentication by
returning an nfsace4 denying all access with every open delegation.
9.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with
opening and closing files to be eliminated. An open when an open
delegation is in effect does not require that a validation message be
sent to the server. The continued endurance of the "read open
delegation" provides a guarantee that no OPEN for write and thus no
write has occurred. Similarly, when closing a file opened for write
and if write open delegation is in effect, the data written does not
have to be flushed to the server until the open delegation is
recalled. The continued endurance of the open delegation provides a
guarantee that no open and thus no read or write has been done by
another client.
For the purposes of open delegation, READs and WRITEs done without an
OPEN are treated as the functional equivalents of a corresponding
type of OPEN. This refers to the READs and WRITEs that use the
special stateids consisting of all zero bits or all one bits.
Therefore, READs or WRITEs with a special stateid done by another
client will force the server to recall a write open delegation. A
WRITE with a special stateid done by another client will force a
recall of read open delegations.
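The recall rule for special-stateid I/O can be sketched as a small
lookup. The function name and string labels are hypothetical. The
empty result for an ordinary stateid is an assumption of this sketch:
such I/O follows an OPEN, and the conflict would have been handled
when that OPEN was processed.

```python
def delegations_to_recall(op: str, special_stateid: bool) -> set:
    """Return the delegation types another client's READ or WRITE forces
    the server to recall.

    op: 'READ' or 'WRITE' issued by a client that does not hold the delegation.
    """
    if not special_stateid:
        return set()              # assumption: already handled at OPEN time
    if op == "WRITE":
        return {"read", "write"}  # a WRITE conflicts with every delegation
    return {"write"}              # a READ conflicts only with a write delegation
```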
With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The CLOSE operation is
the usual point at which the client is notified of a lack of stable
storage for the modified file data generated by the application. At
the CLOSE, file data is written to the server and through normal
accounting the server is able to determine if the available file
system space for the data has been exceeded (i.e. server returns
NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting includes quotas.
The introduction of delegations requires that an alternative method
be in place for the same type of communication to occur between
client and server.
In the delegation response, the server provides either the limit of
the size of the file or the number of modified blocks and associated
block size. The server must ensure that the client will be able to
flush data to the server of a size equal to that provided in the
original delegation. The server must make this assurance for all
outstanding delegations. Therefore, the server must be careful in
its management of available space for new or modified data taking
into account available file system space and any applicable quotas.
The server can recall delegations as a result of managing the
available file system space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined.
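A client-side check against the two forms of space limit might look
like the sketch below. The dict layout and its key names ('kind',
'filesize', 'num_blocks', 'bytes_per_block') are assumptions made for
illustration and do not reproduce the wire format of the delegation
response.

```python
def within_space_limit(limit: dict, file_size: int, dirty_bytes: int) -> bool:
    """Report whether locally cached modifications stay inside the limit.

    limit: {'kind': 'size', 'filesize': N} or
           {'kind': 'blocks', 'num_blocks': N, 'bytes_per_block': B}
    """
    if limit["kind"] == "size":
        # Limit expressed as a maximum file size.
        return file_size <= limit["filesize"]
    # Limit expressed as a count of modified blocks of a given size.
    blocks_needed = -(-dirty_bytes // limit["bytes_per_block"])  # ceiling division
    return blocks_needed <= limit["num_blocks"]
```

A client that would exceed the limit can flush modified data to the
server before accepting further writes into its cache.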
Based on server conditions, quotas or available file system space,
the server may grant write open delegations with very restrictive
space limitations. The limitations may be defined in a way that will
always force modified data to be flushed to the server on close.
With respect to authentication, flushing modified data to the server
after a CLOSE has occurred may be problematic. For example, the user
of the application may have logged off of the client and unexpired
authentication credentials may not be present. In this case, the
client may need to take special care to ensure that local unexpired
credentials will in fact be available. This may be accomplished by
tracking the expiration time of credentials and flushing data well in
advance of their expiration or by making private copies of
credentials to assure their availability when needed.
9.4.2. Open Delegation and File Locks
When a client holds a write open delegation, lock operations are
performed locally. This includes those required for mandatory file
locking. This can be done since the delegation implies that there
can be no conflicting locks. Similarly, all of the revalidations
that would normally be associated with obtaining locks and the
flushing of data associated with the releasing of locks need not be
done.
9.4.3. Recall of Open Delegation
The following events necessitate recall of an open delegation:
o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid)
o SETATTR issued by another client
o REMOVE request for the file
o RENAME request for the file as either source or target of the
RENAME
Whether a RENAME of a directory in the path leading to the file
results in recall of an open delegation depends on the semantics of
the server's file system. If that file system denies such RENAMEs
when a file is open, the recall must be performed to determine
whether the file in question is, in fact, open.

In addition to the situations above, the server may choose to recall
open delegations at any time if resource constraints make it
advisable to do so. Clients should always be prepared for the
possibility of recall.
The server needs to employ special handling for a GETATTR where the
target is a file that has a write open delegation in effect. In this
case, the client holding the delegation needs to be interrogated.
The server will use a CB_GETATTR callback, if the GETATTR attribute
bits include any of the attributes that a write open delegate may
modify (object_size, time_modify, change).
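The server-side test for when CB_GETATTR is needed can be sketched as
a set intersection. The function name and the use of attribute-name
strings in place of attribute bits are assumptions for illustration.

```python
# Attributes a write open delegate may modify, as named in the text.
DELEGATE_MODIFIABLE = {"object_size", "time_modify", "change"}

def needs_cb_getattr(requested_attrs, write_delegation_held: bool) -> bool:
    """Return True when the delegation holder must be interrogated.

    requested_attrs: iterable of attribute names in the GETATTR request.
    write_delegation_held: True if another client holds a write open
                           delegation on the target file.
    """
    if not write_delegation_held:
        return False
    return bool(DELEGATE_MODIFIABLE & set(requested_attrs))
```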
When a client receives a recall for an open delegation, it needs to
update state on the server before returning the delegation. These
same updates must be done whenever a client chooses to return a
delegation voluntarily. The following items of state need to be
dealt with:
o If the file associated with the delegation is no longer open and
no previous CLOSE operation has been sent to the server, a CLOSE
operation must be sent to the server.
o If the file has other open references at the client, then OPEN
operations must be sent to the server. The appropriate stateids
will be provided by the server for subsequent use by the client
since the delegation stateid will no longer be valid. These
OPEN requests are done with the claim type of
CLAIM_DELEGATE_CUR. This will allow the presentation of the
delegation stateid so that the client can establish the
appropriate rights to perform the OPEN. (See the section
"Operation 18: OPEN" for details.)
o If there are granted file locks, the corresponding LOCK
operations need to be performed. This applies to the write open
delegation case only.
o For a write open delegation, if at the time of recall the file
is not open for write, all modified data for the file must be
flushed to the server. If the delegation had not existed, the
client would have done this data flush before the CLOSE
operation.
o With the write open delegation in place, it is possible that the
file was truncated during the duration of the delegation. For
example, the truncation could have occurred as a result of an
OPEN UNCHECKED with an object_size attribute value of zero.
Therefore, if a truncation of the file has occurred and this
operation has not been propagated to the server, the truncation
must occur before any modified data is written to the server.
o Any modified data for the file needs to be flushed to the
server.
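The state updates a client performs before returning a delegation can
be sketched as an ordered plan. The operation labels, parameter
names, and the overall ordering are assumptions of this sketch; the
text itself only requires that a pending truncation reach the server
before modified data is written.

```python
def return_plan(open_refs: int, close_sent: bool, locks: list,
                truncated: bool, dirty: bool, write_deleg: bool) -> list:
    """Build a list of hypothetical operation labels to send, in order."""
    ops = []
    if open_refs > 0:
        ops.append("OPEN(CLAIM_DELEGATE_CUR)")  # re-establish opens at the server
    if write_deleg and locks:
        ops.append("LOCK")                      # push locally granted file locks
    if write_deleg and truncated:
        ops.append("TRUNCATE")                  # placeholder: must precede data writes
    if write_deleg and dirty:
        ops.append("WRITE")                     # flush modified data
    if open_refs == 0 and not close_sent:
        ops.append("CLOSE")                     # file no longer open locally
    ops.append("RETURN_DELEGATION")
    return ops
```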
In the case of write open delegation, file locking imposes some
additional requirements. Flushing any modified data in any region
for which a write lock was released while the write open delegation
was in effect is what is required to precisely maintain the
associated invariant. However, because the write open delegation
implies no other locking by other clients, a simpler implementation
is to flush all modified data for the file (as described just above)
if any write lock has been released while the write open delegation
was in effect.
9.4.4. Delegation Revocation
At the point a delegation is revoked, if there are associated opens
on the client, the applications holding these opens need to be
notified. This notification usually occurs by returning errors for
READ/WRITE operations or when a close is attempted for the open file.

If no opens exist for the file at the point the delegation is
revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the
file, the user of the application should be notified. Unfortunately,
it may not be possible to notify the user since active applications
may not be present at the client. See the section "Revocation
Recovery for Write Open Delegation" for additional details.
9.5. Data Caching and Revocation
When locks and delegations are revoked, the assumptions upon which
successful caching depends are no longer guaranteed. The owner of the
locks or share reservations which have been revoked needs to be
notified. This notification includes applications with a file open
that has a corresponding delegation which has been revoked. Cached
data associated with the revocation must be removed from the client.
In the case of modified data existing in the client's cache, that
data must be removed from the client without it being written to the
server.

Notification to a lock owner will in many cases consist of simply
returning an error on the next and all subsequent READs/WRITEs to the
open file or on the close. Where the methods available to a client
make such notification impossible because errors for certain
operations may not be returned, more drastic action such as signals
or process termination may be appropriate. The justification for
this is that an invariant on which an application depends may be
violated. Depending on how errors are typically treated for the
client operating environment, further levels of notification
including logging, console messages, and GUI pop-ups may be
appropriate.
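
The client behavior described in the section "Data Caching and
Revocation" can be illustrated with a short, non-normative sketch in
Python. The class and field names (CachedFile, DelegationRevokedError,
clean_blocks, dirty_blocks) are hypothetical and are not defined by
the protocol; the sketch only assumes a client that tracks cached
blocks per open file.

```python
from dataclasses import dataclass, field


class DelegationRevokedError(OSError):
    """Hypothetical error surfaced to applications after a revocation."""


@dataclass
class CachedFile:
    # Hypothetical client-side record for one open file.
    name: str
    clean_blocks: dict = field(default_factory=dict)  # offset -> bytes
    dirty_blocks: dict = field(default_factory=dict)  # offset -> bytes
    revoked: bool = False
    log: list = field(default_factory=list)

    def on_revocation(self):
        """Remove all cached data; modified data is NOT written back."""
        lost = sum(len(b) for b in self.dirty_blocks.values())
        self.clean_blocks.clear()
        self.dirty_blocks.clear()
        self.revoked = True
        if lost:
            # A further level of notification (logging).
            self.log.append(f"{self.name}: {lost} modified bytes discarded")
        return lost

    def read(self, offset):
        # The next and all subsequent READs/WRITEs return an error.
        if self.revoked:
            raise DelegationRevokedError(f"{self.name}: state revoked")
        if offset in self.dirty_blocks:
            return self.dirty_blocks[offset]
        return self.clean_blocks.get(offset)

    def write(self, offset, data):
        if self.revoked:
            raise DelegationRevokedError(f"{self.name}: state revoked")
        self.dirty_blocks[offset] = data
```

Whether a client also needs signals or process termination depends on
whether its operating environment can return errors for every
affected operation; the sketch assumes that it can.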
9.5.1. Revocation Recovery for Write Open Delegation
Revocation recovery for a write open delegation poses the special
issue of modified data in the client cache while the file is not
open. In this situation, any client which does not flush modified
data to the server on each close must ensure that the user receives
appropriate notification of the failure as a result of the
revocation. Since such situations may require human action to
correct problems, notification schemes in which the appropriate user
or administrator is notified may be necessary. Logging and console
messages are typical examples.

If there is modified data on the client, it must not be flushed
normally to the server. A client may attempt to provide a copy of
the file data as modified during the delegation under a different
name in the file system name space to ease recovery. Unless the
client can determine that the file has not been modified by any other
client, this technique must be limited to situations in which a
client has a complete cached copy of the file in question. Use of
such a technique may be limited to files under a certain size or may
only be used when sufficient disk space is guaranteed to be available
within the target file system and when the client has sufficient
buffering resources to keep the cached copy available until it is
properly stored to the target file system.
9.6. Attribute Caching
The attributes discussed in this section do not include named
attributes. Individual named attributes are analogous to files and
caching of the data for these needs to be handled just as data
caching is for ordinary files. Similarly, LOOKUP results from an
OPENATTR directory are to be cached on the same basis as any other
pathnames and similarly for directory contents.

Clients may cache file attributes obtained from the server and use
them to avoid subsequent GETATTR requests. Such caching is write
through in that modification to file attributes is always done by
means of requests to the server and should not be done locally and
cached. The exceptions to this are modifications to attributes that
are intimately connected with data caching. Therefore, extending a
file by writing data to the local data cache is reflected immediately
in the object_size as seen on the client without this change being
immediately reflected on the server. Normally such changes are not
propagated directly to the server but when the modified data is
flushed to the server, analogous attribute changes are made on the
server. When open delegation is in effect, the modified attributes
may be returned to the server in the response to a CB_RECALL call.

The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent. Changes
made in one order on the server may be seen in a different order on
one client and in a third order on a different client.

The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an
environment where the potential incoherences mentioned above can be
reasonably managed. These rules are derived from the practice of
previous NFS protocols.

o All attributes for a given file (per-fsid attributes excepted)
are cached as a unit at the client so that no non-
serializability can arise within the context of a single file.

o An upper time boundary is maintained on how long a client cache
entry can be kept without being refreshed from the server.

o When operations are performed that change attributes at the
server, the updated attribute set is requested as part of the
containing RPC. This includes directory operations that update
attributes indirectly. This is accomplished by following the
modifying operation with a GETATTR operation and then using the
results of the GETATTR to update the client's cached attributes.

Note that if the full set of attributes to be cached is requested by
READDIR, the results can be cached by the client on the same basis as
attributes obtained via GETATTR.

A client may validate its cached version of attributes for a file by
fetching only the change attribute and assuming that if the change
attribute has the same value as it did when the attributes were
cached, then no attributes have changed. The possible exception is
the attribute time_access.
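
The attribute caching rules in the section "Attribute Caching" can be
illustrated with a minimal, non-normative sketch in Python. The names
AttrCache, fetch_all, fetch_change, and max_age are hypothetical; the
two callables stand in for GETATTR requests for the full attribute
set and for the change attribute alone. The sketch ignores the
possible exception for time_access.

```python
import time
from dataclasses import dataclass


@dataclass
class AttrEntry:
    attrs: dict        # all attributes of one file, cached as a unit
    fetched_at: float  # time the entry was last refreshed or validated


class AttrCache:
    """Hypothetical client attribute cache keyed by filehandle."""

    def __init__(self, fetch_all, fetch_change, max_age=30.0,
                 clock=time.monotonic):
        self._fetch_all = fetch_all        # stand-in: GETATTR, full set
        self._fetch_change = fetch_change  # stand-in: GETATTR, change only
        self._max_age = max_age            # upper time boundary (seconds)
        self._clock = clock
        self._entries = {}

    def get(self, fh):
        now = self._clock()
        entry = self._entries.get(fh)
        if entry is not None and now - entry.fetched_at < self._max_age:
            return entry.attrs             # still within the time bound
        if (entry is not None
                and self._fetch_change(fh) == entry.attrs["change"]):
            entry.fetched_at = now         # unchanged: extend the lifetime
            return entry.attrs
        return self.store(fh, self._fetch_all(fh))

    def store(self, fh, attrs):
        """Replace the whole attribute set, for example with the result
        of a GETATTR that follows a modifying operation."""
        self._entries[fh] = AttrEntry(dict(attrs), self._clock())
        return self._entries[fh].attrs
```

Replacing the entire set in store(), rather than merging individual
attributes, is one way to keep the attributes of a file cached as a
unit.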
9.7. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations. Just as in the case of
attribute caching, inconsistencies may arise among the various client
caches. To mitigate the effects of these inconsistencies and given
the context of typical file system APIs, the following rules should
be followed:

o The results of unsuccessful LOOKUPs should not be cached, unless
they are specifically reverified at the point of use.

o An upper time boundary is maintained on how long a client name
cache entry can be kept without verifying that the entry has not
been made invalid by a directory change operation performed by
another client.

When a client is not making changes to a directory for which there
exist name cache entries, the client needs to periodically fetch
attributes for that directory to ensure that it is not being
modified. After determining that no modification has occurred, the
expiration time for the associated name cache entries may be updated
to be the current time plus the name cache staleness bound.

When a client is making changes to a given directory, it needs to
determine whether there have been changes made to the directory by
other clients. It does this by using the change attribute as
reported before and after the directory operation in the associated
change_info4 value returned for the operation. The server is able to
communicate to the client whether the change_info4 data is provided
atomically with respect to the directory operation. If the change
values are provided atomically, the client is then able to compare
the pre-operation change value with the change value in the client's
name cache. If the comparison indicates that the directory was
updated by another client, the name cache associated with the
modified directory is purged from the client. If the comparison
indicates no modification, the name cache can be updated on the
client to reflect the directory operation and the associated timeout
extended. The post-operation change value needs to be saved as the
basis for future change_info4 comparisons.

As demonstrated by the scenario above, name caching requires that the
client revalidate name cache data by inspecting the change attribute
of a directory at the point when the name cache item was cached.
This requires that the server update the change attribute for
directories when the contents of the corresponding directory are
modified. For a client to use the change_info4 information
appropriately and correctly, the server must report the pre and post
operation change attribute values atomically. When the server is
unable to report the before and after values atomically with respect
to the directory operation, the server must indicate that fact in the
change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not
changed the directory.
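
The change_info4 comparison described in the section "Name Caching"
can be sketched as follows. This is a non-normative illustration in
Python. ChangeInfo mirrors the atomic/before/after information the
text attributes to change_info4, while DirNameCache, apply_local_op,
and staleness_bound are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeInfo:
    # Mirrors the information the text attributes to change_info4.
    atomic: bool
    before: int
    after: int


@dataclass
class DirNameCache:
    # Hypothetical per-directory name cache on the client.
    change: int      # change value the cached names are based on
    expires: float   # expiration time of the cached names
    names: dict = field(default_factory=dict)  # name -> filehandle

    def purge(self):
        self.names.clear()

    def apply_local_op(self, cinfo, update, now, staleness_bound):
        """Fold one of this client's directory operations into the cache.

        `update` is a callable that edits `names` to reflect the
        operation. Returns True if the cache was updated and False if
        it was purged.
        """
        unchanged = cinfo.atomic and cinfo.before == self.change
        if unchanged:
            update(self.names)
            self.expires = now + staleness_bound
        else:
            # Non-atomic values or a mismatch: another client may have
            # modified the directory, so the cached names are dropped.
            self.purge()
        # Save the post-operation value as the basis for later
        # comparisons. After a purge the cache is empty, so this is
        # harmless even when the values were not atomic.
        self.change = cinfo.after
        return unchanged
```

Treating a non-atomic change_info4 the same as a mismatch is the
conservative reading of the rule that the client should not assume
that other clients have not changed the directory.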
9.8. Directory Caching
The results of READDIR operations may be used to avoid subsequent
READDIR operations. Just as in the cases of attribute and name
caching, inconsistencies may arise among the various client caches.
To mitigate the effects of these inconsistencies and given the
context of typical file system APIs, the following rules should be
followed:

o Cached READDIR information for a directory which is not obtained
in a single READDIR operation must always be a consistent
snapshot of directory contents. This is determined by using a
GETATTR before the first READDIR and after the last READDIR
that contributes to the cache.

o An upper time boundary is maintained to indicate the length of
time a directory cache entry is considered valid before the
client must revalidate the cached information.

The revalidation technique parallels that discussed in the case of
name caching. When the client is not changing the directory in
question, checking the change attribute of the directory with GETATTR
is adequate. The lifetime of the cache entry can be extended at
these checkpoints. When a client is modifying the directory, the
client needs to use the change_info4 data to determine whether there
are other clients modifying the directory. If it is determined that
no other client modifications are occurring, the client may update
its directory cache to reflect its own changes.

As demonstrated previously, directory caching requires that the
client revalidate directory cache data by inspecting the change
attribute of a directory at the point when the directory was cached.
This requires that the server update the change attribute for
directories when the contents of the corresponding directory are
modified. For a client to use the change_info4 information
appropriately and correctly, the server must report the pre and post
operation change attribute values atomically. When the server is
unable to report the before and after values atomically with respect
to the directory operation, the server must indicate that fact in the
change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not
changed the directory.
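
The consistent-snapshot rule in the section "Directory Caching" can
be sketched as follows. This is a non-normative illustration in
Python. The callables getattr_change and readdir are hypothetical
stand-ins for a GETATTR of the directory's change attribute and for a
single READDIR call; the retry limit max_attempts is an assumption of
the sketch and is not specified by the protocol.

```python
def read_directory_snapshot(getattr_change, readdir, max_attempts=3):
    """Return (change, entries) for a consistent multi-READDIR listing.

    getattr_change() returns the directory's current change value.
    readdir(cookie) returns (entries, next_cookie, eof).
    Returns None if no consistent snapshot could be obtained.
    """
    for _ in range(max_attempts):
        before = getattr_change()      # GETATTR before the first READDIR
        entries, cookie, eof = [], 0, False
        while not eof:
            chunk, cookie, eof = readdir(cookie)
            entries.extend(chunk)
        after = getattr_change()       # GETATTR after the last READDIR
        if before == after:
            return after, entries      # safe to cache as one snapshot
    return None                        # directory kept changing
```

A caller would cache the returned entries together with the change
value and revalidate them once the upper time boundary for the cache
entry has been reached.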
10. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the
need arises, the NFS version 4 protocol contains the rules and
framework to allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version
number will correspond to an RFC. Minor version zero of the NFS
version 4 protocol is represented by this RFC. The COMPOUND
procedure will support the encoding of the minor version being
requested by the client.

The following items represent the basic rules for the development of
minor versions. Note that a future minor version may decide to
modify or add to the following rules as part of the minor version
definition.

1 Procedures are not added or deleted

To maintain the general RPC model, NFS version 4 minor versions
will not add or delete procedures from the NFS program.

2 Minor versions may add operations to the COMPOUND and
CB_COMPOUND procedures.

The addition of operations to the COMPOUND and CB_COMPOUND
procedures does not affect the RPC model.

2.1 Minor versions may append attributes to GETATTR4args, bitmap4,
and GETATTR4res.

This allows for the expansion of the attribute model to allow
for future growth or adaptation.

2.2 Minor version X must append any new attributes after the last
documented attribute.

Since attribute results are specified as an opaque array of
per-attribute XDR encoded results, the complexity of adding new
attributes in the midst of the current definitions will be too
burdensome.

3 Minor versions must not modify the structure of an existing
operation's arguments or results.

Again the complexity of handling multiple structure definitions
for a single operation is too burdensome. New operations should
be added instead of modifying existing structures for a minor
version.
This rule does not preclude the following adaptations in a minor
version.
o adding bits to flag fields such as new attributes to
GETATTR's bitmap4 data type
o adding bits to existing attributes like ACLs that have flag
words
o extendin