Commit df632d3c authored by Linus Torvalds

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
parents 2474542f af283885
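Several commits in the list above deal with sequence-id wraparound. The helper those patches actually use is not visible in this excerpt, so the following is only a minimal userspace sketch of the usual serial-number comparison for a circular 32-bit counter. The name seqid_is_newer() is hypothetical, and the signed cast assumes the two's-complement conversion that common compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel source: returns true when
 * s1 is "newer" than s2 in a 32-bit sequence space that wraps around.
 */
bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32. Viewed as a signed value,
	 * the difference is positive when s1 is ahead of s2 by less than 2^31.
	 */
	return (int32_t)(s1 - s2) > 0;
}

/* Small usage example: 0 counts as newer than 0xffffffff after a wrap. */
void seqid_demo(void)
{
	assert(seqid_is_newer(2u, 1u));
	assert(seqid_is_newer(0u, 0xffffffffu));
	assert(!seqid_is_newer(7u, 7u));
}
```

A plain `s1 > s2` test would give the wrong answer at the wrap point, which is the situation the wraparound commits describe.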
......@@ -12,9 +12,47 @@ and work is in progress on adding support for minor version 1 of the NFSv4
protocol.
The purpose of this document is to provide information on some of the
upcall interfaces that are used in order to provide the NFS client with
some of the information that it requires in order to fully comply with
the NFS spec.
special features of the NFS client that can be configured by system
administrators.
The nfs4_unique_id parameter
============================
NFSv4 requires clients to identify themselves to servers with a unique
string. File open and lock state shared between one client and one server
is associated with this identity. To support robust NFSv4 state recovery
and transparent state migration, this identity string must not change
across client reboots.
Without any other intervention, the Linux client uses a string that contains
the local system's node name. System administrators, however, often do not
take care to ensure that node names are fully qualified and do not change
over the lifetime of a client system. Node names can have other
administrative requirements that require particular behavior that does not
work well as part of an nfs_client_id4 string.
The nfs.nfs4_unique_id boot parameter specifies a unique string that can be
used instead of a system's node name when an NFS client identifies itself to
a server. Thus, if the system's node name is not unique, or it changes, its
nfs.nfs4_unique_id stays the same, preventing collision with other clients
or loss of state during NFS reboot recovery or transparent state migration.
The nfs.nfs4_unique_id string is typically a UUID, though it can contain
anything that is believed to be unique across all NFS clients. An
nfs4_unique_id string should be chosen when a client system is installed,
just as a system's root file system gets a fresh UUID in its label at
install time.
The string should remain fixed for the lifetime of the client. It can be
changed safely if care is taken that the client shuts down cleanly and all
outstanding NFSv4 state has expired, to prevent loss of NFSv4 state.
This string can be stored in an NFS client's grub.conf, or it can be provided
via a net boot facility such as PXE. It may also be specified as an nfs.ko
module parameter. Specifying a uniquifier string is not supported for NFS
clients running in containers.
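The selection rule described above (use the administrator-supplied string when one is set, otherwise fall back to the node name) can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's implementation: pick_client_identity() is a hypothetical name, and how the chosen string is embedded in the nfs_client_id4 value is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel's code: choose the string a client
 * would identify itself with. A non-empty uniquifier wins; otherwise the
 * node name is used.
 */
const char *pick_client_identity(const char *unique_id, const char *nodename)
{
	assert(nodename != NULL);
	if (unique_id != NULL && unique_id[0] != '\0')
		return unique_id;
	return nodename;
}

/* Usage example with made-up values. */
void pick_client_identity_demo(void)
{
	/* Uniquifier set: the node name is ignored. */
	assert(strcmp(pick_client_identity("example-unique-id", "host1"),
		      "example-unique-id") == 0);
	/* Uniquifier unset: fall back to the node name. */
	assert(strcmp(pick_client_identity("", "host1"), "host1") == 0);
}
```

Because the fallback depends on the node name, renaming a host without setting the parameter would change the identity, which is the failure mode the text warns about.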
The DNS resolver
================
......
......@@ -1730,6 +1730,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
will be autodetected by the client, and it will fall
back to using the idmapper.
To turn off this behaviour, set the value to '0'.
nfs.nfs4_unique_id=
[NFS4] Specify an additional fixed unique ident-
ification string that NFSv4 clients can insert into
their nfs_client_id4 string. This is typically a
UUID that is generated at system install time.
nfs.send_implementation_id =
[NFSv4.1] Send client implementation identification
......
......@@ -7,7 +7,6 @@
*/
#include <linux/types.h>
#include <linux/utsname.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/slab.h>
......@@ -19,6 +18,8 @@
#include <asm/unaligned.h>
#include "netns.h"
#define NLMDBG_FACILITY NLMDBG_MONITOR
#define NSM_PROGRAM 100024
#define NSM_VERSION 1
......@@ -40,6 +41,7 @@ struct nsm_args {
u32 proc;
char *mon_name;
char *nodename;
};
struct nsm_res {
......@@ -70,7 +72,7 @@ static struct rpc_clnt *nsm_create(struct net *net)
};
struct rpc_create_args args = {
.net = net,
.protocol = XPRT_TRANSPORT_UDP,
.protocol = XPRT_TRANSPORT_TCP,
.address = (struct sockaddr *)&sin,
.addrsize = sizeof(sin),
.servername = "rpc.statd",
......@@ -83,10 +85,54 @@ static struct rpc_clnt *nsm_create(struct net *net)
return rpc_create(&args);
}
static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res,
struct net *net)
static struct rpc_clnt *nsm_client_get(struct net *net)
{
static DEFINE_MUTEX(nsm_create_mutex);
struct rpc_clnt *clnt;
struct lockd_net *ln = net_generic(net, lockd_net_id);
spin_lock(&ln->nsm_clnt_lock);
if (ln->nsm_users) {
ln->nsm_users++;
clnt = ln->nsm_clnt;
spin_unlock(&ln->nsm_clnt_lock);
goto out;
}
spin_unlock(&ln->nsm_clnt_lock);
mutex_lock(&nsm_create_mutex);
clnt = nsm_create(net);
if (!IS_ERR(clnt)) {
ln->nsm_clnt = clnt;
smp_wmb();
ln->nsm_users = 1;
}
mutex_unlock(&nsm_create_mutex);
out:
return clnt;
}
static void nsm_client_put(struct net *net)
{
struct lockd_net *ln = net_generic(net, lockd_net_id);
struct rpc_clnt *clnt = ln->nsm_clnt;
int shutdown = 0;
spin_lock(&ln->nsm_clnt_lock);
if (ln->nsm_users) {
if (--ln->nsm_users)
ln->nsm_clnt = NULL;
shutdown = !ln->nsm_users;
}
spin_unlock(&ln->nsm_clnt_lock);
if (shutdown)
rpc_shutdown_client(clnt);
}
static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res,
struct rpc_clnt *clnt)
{
int status;
struct nsm_args args = {
.priv = &nsm->sm_priv,
......@@ -94,31 +140,24 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res,
.vers = 3,
.proc = NLMPROC_NSM_NOTIFY,
.mon_name = nsm->sm_mon_name,
.nodename = clnt->cl_nodename,
};
struct rpc_message msg = {
.rpc_argp = &args,
.rpc_resp = res,
};
clnt = nsm_create(net);
if (IS_ERR(clnt)) {
status = PTR_ERR(clnt);
dprintk("lockd: failed to create NSM upcall transport, "
"status=%d\n", status);
goto out;
}
BUG_ON(clnt == NULL);
memset(res, 0, sizeof(*res));
msg.rpc_proc = &clnt->cl_procinfo[proc];
status = rpc_call_sync(clnt, &msg, 0);
status = rpc_call_sync(clnt, &msg, RPC_TASK_SOFTCONN);
if (status < 0)
dprintk("lockd: NSM upcall RPC failed, status=%d\n",
status);
else
status = 0;
rpc_shutdown_client(clnt);
out:
return status;
}
......@@ -138,6 +177,7 @@ int nsm_monitor(const struct nlm_host *host)
struct nsm_handle *nsm = host->h_nsmhandle;
struct nsm_res res;
int status;
struct rpc_clnt *clnt;
dprintk("lockd: nsm_monitor(%s)\n", nsm->sm_name);
......@@ -150,7 +190,15 @@ int nsm_monitor(const struct nlm_host *host)
*/
nsm->sm_mon_name = nsm_use_hostnames ? nsm->sm_name : nsm->sm_addrbuf;
status = nsm_mon_unmon(nsm, NSMPROC_MON, &res, host->net);
clnt = nsm_client_get(host->net);
if (IS_ERR(clnt)) {
status = PTR_ERR(clnt);
dprintk("lockd: failed to create NSM upcall transport, "
"status=%d, net=%p\n", status, host->net);
return status;
}
status = nsm_mon_unmon(nsm, NSMPROC_MON, &res, clnt);
if (unlikely(res.status != 0))
status = -EIO;
if (unlikely(status < 0)) {
......@@ -182,9 +230,11 @@ void nsm_unmonitor(const struct nlm_host *host)
if (atomic_read(&nsm->sm_count) == 1
&& nsm->sm_monitored && !nsm->sm_sticky) {
struct lockd_net *ln = net_generic(host->net, lockd_net_id);
dprintk("lockd: nsm_unmonitor(%s)\n", nsm->sm_name);
status = nsm_mon_unmon(nsm, NSMPROC_UNMON, &res, host->net);
status = nsm_mon_unmon(nsm, NSMPROC_UNMON, &res, ln->nsm_clnt);
if (res.status != 0)
status = -EIO;
if (status < 0)
......@@ -192,6 +242,8 @@ void nsm_unmonitor(const struct nlm_host *host)
nsm->sm_name);
else
nsm->sm_monitored = 0;
nsm_client_put(host->net);
}
}
......@@ -430,7 +482,7 @@ static void encode_my_id(struct xdr_stream *xdr, const struct nsm_args *argp)
{
__be32 *p;
encode_nsm_string(xdr, utsname()->nodename);
encode_nsm_string(xdr, argp->nodename);
p = xdr_reserve_space(xdr, 4 + 4 + 4);
*p++ = cpu_to_be32(argp->prog);
*p++ = cpu_to_be32(argp->vers);
......
......@@ -12,6 +12,10 @@ struct lockd_net {
struct delayed_work grace_period_end;
struct lock_manager lockd_manager;
struct list_head grace_list;
spinlock_t nsm_clnt_lock;
unsigned int nsm_users;
struct rpc_clnt *nsm_clnt;
};
extern int lockd_net_id;
......
......@@ -596,6 +596,7 @@ static int lockd_init_net(struct net *net)
INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
INIT_LIST_HEAD(&ln->grace_list);
spin_lock_init(&ln->nsm_clnt_lock);
return 0;
}
......
......@@ -95,8 +95,8 @@ config NFS_SWAP
This option enables swapon to work on files located on NFS mounts.
config NFS_V4_1
bool "NFS client support for NFSv4.1 (EXPERIMENTAL)"
depends on NFS_V4 && EXPERIMENTAL
bool "NFS client support for NFSv4.1"
depends on NFS_V4
select SUNRPC_BACKCHANNEL
help
This option enables support for minor version 1 of the NFSv4 protocol
......
......@@ -41,6 +41,7 @@
#define PAGE_CACHE_SECTORS (PAGE_CACHE_SIZE >> SECTOR_SHIFT)
#define PAGE_CACHE_SECTOR_SHIFT (PAGE_CACHE_SHIFT - SECTOR_SHIFT)
#define SECTOR_SIZE (1 << SECTOR_SHIFT)
struct block_mount_id {
spinlock_t bm_lock; /* protects list */
......@@ -172,7 +173,6 @@ struct bl_msg_hdr {
/* blocklayoutdev.c */
ssize_t bl_pipe_downcall(struct file *, const char __user *, size_t);
void bl_pipe_destroy_msg(struct rpc_pipe_msg *);
struct block_device *nfs4_blkdev_get(dev_t dev);
int nfs4_blkdev_put(struct block_device *bdev);
struct pnfs_block_dev *nfs4_blk_decode_device(struct nfs_server *server,
struct pnfs_device *dev);
......
......@@ -53,22 +53,6 @@ static int decode_sector_number(__be32 **rp, sector_t *sp)
return 0;
}
/* Open a block_device by device number. */
struct block_device *nfs4_blkdev_get(dev_t dev)
{
struct block_device *bd;
dprintk("%s enter\n", __func__);
bd = blkdev_get_by_dev(dev, FMODE_READ, NULL);
if (IS_ERR(bd))
goto fail;
return bd;
fail:
dprintk("%s failed to open device : %ld\n",
__func__, PTR_ERR(bd));
return NULL;
}
/*
* Release the block device
*/
......@@ -172,11 +156,12 @@ nfs4_blk_decode_device(struct nfs_server *server,
goto out;
}
bd = nfs4_blkdev_get(MKDEV(reply->major, reply->minor));
bd = blkdev_get_by_dev(MKDEV(reply->major, reply->minor),
FMODE_READ, NULL);
if (IS_ERR(bd)) {
rc = PTR_ERR(bd);
dprintk("%s failed to open device : %d\n", __func__, rc);
rv = ERR_PTR(rc);
dprintk("%s failed to open device : %ld\n", __func__,
PTR_ERR(bd));
rv = ERR_CAST(bd);
goto out;
}
......
......@@ -683,8 +683,7 @@ encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
p = xdr_encode_hyper(p, lce->bse_length << SECTOR_SHIFT);
p = xdr_encode_hyper(p, 0LL);
*p++ = cpu_to_be32(PNFS_BLOCK_READWRITE_DATA);
list_del(&lce->bse_node);
list_add_tail(&lce->bse_node, &bl->bl_committing);
list_move_tail(&lce->bse_node, &bl->bl_committing);
bl->bl_count--;
count++;
}
......
......@@ -194,7 +194,7 @@ extern __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy,
struct cb_process_state *cps);
#if IS_ENABLED(CONFIG_NFS_V4)
extern int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt);
extern void nfs_callback_down(int minorversion);
extern void nfs_callback_down(int minorversion, struct net *net);
extern int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation,
const nfs4_stateid *stateid);
extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
......@@ -209,6 +209,5 @@ extern int nfs4_set_callback_sessionid(struct nfs_client *clp);
extern unsigned int nfs_callback_set_tcpport;
extern unsigned short nfs_callback_tcpport;
extern unsigned short nfs_callback_tcpport6;
#endif /* __LINUX_FS_NFS_CALLBACK_H */
......@@ -122,7 +122,15 @@ static struct pnfs_layout_hdr * get_layout_by_fh_locked(struct nfs_client *clp,
ino = igrab(lo->plh_inode);
if (!ino)
continue;
get_layout_hdr(lo);
spin_lock(&ino->i_lock);
/* Is this layout in the process of being freed? */
if (NFS_I(ino)->layout != lo) {
spin_unlock(&ino->i_lock);
iput(ino);
continue;
}
pnfs_get_layout_hdr(lo);
spin_unlock(&ino->i_lock);
return lo;
}
}
......@@ -158,7 +166,7 @@ static u32 initiate_file_draining(struct nfs_client *clp,
ino = lo->plh_inode;
spin_lock(&ino->i_lock);
if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
mark_matching_lsegs_invalid(lo, &free_me_list,
pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
&args->cbl_range))
rv = NFS4ERR_DELAY;
else
......@@ -166,7 +174,7 @@ static u32 initiate_file_draining(struct nfs_client *clp,
pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&free_me_list);
put_layout_hdr(lo);
pnfs_put_layout_hdr(lo);
iput(ino);
return rv;
}
......@@ -196,9 +204,18 @@ static u32 initiate_bulk_draining(struct nfs_client *clp,
continue;
list_for_each_entry(lo, &server->layouts, plh_layouts) {
if (!igrab(lo->plh_inode))
ino = igrab(lo->plh_inode);
if (ino)
continue;
spin_lock(&ino->i_lock);
/* Is this layout in the process of being freed? */
if (NFS_I(ino)->layout != lo) {
spin_unlock(&ino->i_lock);
iput(ino);
continue;
get_layout_hdr(lo);
}
pnfs_get_layout_hdr(lo);
spin_unlock(&ino->i_lock);
BUG_ON(!list_empty(&lo->plh_bulk_recall));
list_add(&lo->plh_bulk_recall, &recall_list);
}
......@@ -211,12 +228,12 @@ static u32 initiate_bulk_draining(struct nfs_client *clp,
ino = lo->plh_inode;
spin_lock(&ino->i_lock);
set_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags);
if (mark_matching_lsegs_invalid(lo, &free_me_list, &range))
if (pnfs_mark_matching_lsegs_invalid(lo, &free_me_list, &range))
rv = NFS4ERR_DELAY;
list_del_init(&lo->plh_bulk_recall);
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&free_me_list);
put_layout_hdr(lo);
pnfs_put_layout_hdr(lo);
iput(ino);
}
return rv;
......
......@@ -93,10 +93,10 @@ static struct nfs_subversion *find_nfs_version(unsigned int version)
spin_unlock(&nfs_version_lock);
return nfs;
}
};
}
spin_unlock(&nfs_version_lock);
return ERR_PTR(-EPROTONOSUPPORT);;
return ERR_PTR(-EPROTONOSUPPORT);
}
struct nfs_subversion *get_nfs_version(unsigned int version)
......@@ -498,7 +498,8 @@ nfs_get_client(const struct nfs_client_initdata *cl_init,
return nfs_found_client(cl_init, clp);
}
if (new) {
list_add(&new->cl_share_link, &nn->nfs_client_list);
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
new->cl_flags = cl_init->init_flags;
return rpc_ops->init_client(new, timeparms, ip_addr,
......@@ -668,7 +669,8 @@ int nfs_init_server_rpcclient(struct nfs_server *server,
{
struct nfs_client *clp = server->nfs_client;
server->client = rpc_clone_client(clp->cl_rpcclient);
server->client = rpc_clone_client_set_auth(clp->cl_rpcclient,
pseudoflavour);
if (IS_ERR(server->client)) {
dprintk("%s: couldn't create rpc_client!\n", __func__);
return PTR_ERR(server->client);
......@@ -678,16 +680,6 @@ int nfs_init_server_rpcclient(struct nfs_server *server,
timeo,
sizeof(server->client->cl_timeout_default));
server->client->cl_timeout = &server->client->cl_timeout_default;
if (pseudoflavour != clp->cl_rpcclient->cl_auth->au_flavor) {
struct rpc_auth *auth;
auth = rpcauth_create(pseudoflavour, server->client);
if (IS_ERR(auth)) {
dprintk("%s: couldn't create credcache!\n", __func__);
return PTR_ERR(auth);
}
}
server->client->cl_softrtry = 0;
if (server->flags & NFS_MOUNT_SOFT)
server->client->cl_softrtry = 1;
......@@ -761,6 +753,8 @@ static int nfs_init_server(struct nfs_server *server,
data->timeo, data->retrans);
if (data->flags & NFS_MOUNT_NORESVPORT)
set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
if (server->options & NFS_OPTION_MIGRATION)
set_bit(NFS_CS_MIGRATION, &cl_init.init_flags);
/* Allocate or find a client reference we can use */
clp = nfs_get_client(&cl_init, &timeparms, NULL, RPC_AUTH_UNIX);
......@@ -855,7 +849,6 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
if (server->wsize > NFS_MAX_FILE_IO_SIZE)
server->wsize = NFS_MAX_FILE_IO_SIZE;
server->wpages = (server->wsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
server->pnfs_blksize = fsinfo->blksize;
server->wtmult = nfs_block_bits(fsinfo->wtmult, NULL);
......
......@@ -2072,7 +2072,7 @@ found:
nfs_access_free_entry(entry);
}
static void nfs_access_add_cache(struct inode *inode, struct nfs_access_entry *set)
void nfs_access_add_cache(struct inode *inode, struct nfs_access_entry *set)
{
struct nfs_access_entry *cache = kmalloc(sizeof(*cache), GFP_KERNEL);
if (cache == NULL)
......@@ -2098,6 +2098,20 @@ static void nfs_access_add_cache(struct inode *inode, struct nfs_access_entry *s
spin_unlock(&nfs_access_lru_lock);
}
}
EXPORT_SYMBOL_GPL(nfs_access_add_cache);
void nfs_access_set_mask(struct nfs_access_entry *entry, u32 access_result)
{
entry->mask = 0;
if (access_result & NFS4_ACCESS_READ)
entry->mask |= MAY_READ;
if (access_result &
(NFS4_ACCESS_MODIFY | NFS4_ACCESS_EXTEND | NFS4_ACCESS_DELETE))
entry->mask |= MAY_WRITE;
if (access_result & (NFS4_ACCESS_LOOKUP|NFS4_ACCESS_EXECUTE))
entry->mask |= MAY_EXEC;
}
EXPORT_SYMBOL_GPL(nfs_access_set_mask);
static int nfs_do_access(struct inode *inode, struct rpc_cred *cred, int mask)
{
......
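The nfs_access_set_mask() helper added in the hunk above folds NFSv4 ACCESS result bits into the VFS MAY_* permission bits. The userspace sketch below mirrors that mapping so it can be tried outside the kernel. The constant values are assumptions (they are believed to match the NFSv4 ACCESS bit definitions and the kernel's MAY_* flags, but are redefined locally here), and access_result_to_mask() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, redefined locally for illustration only. */
#define NFS4_ACCESS_READ    0x0001u
#define NFS4_ACCESS_LOOKUP  0x0002u
#define NFS4_ACCESS_MODIFY  0x0004u
#define NFS4_ACCESS_EXTEND  0x0008u
#define NFS4_ACCESS_DELETE  0x0010u
#define NFS4_ACCESS_EXECUTE 0x0020u

#define MAY_EXEC  0x1u
#define MAY_WRITE 0x2u
#define MAY_READ  0x4u

/* Hypothetical userspace mirror of the mapping shown in the diff. */
uint32_t access_result_to_mask(uint32_t access_result)
{
	uint32_t mask = 0;

	if (access_result & NFS4_ACCESS_READ)
		mask |= MAY_READ;
	/* Any of modify, extend or delete counts as write permission. */
	if (access_result &
	    (NFS4_ACCESS_MODIFY | NFS4_ACCESS_EXTEND | NFS4_ACCESS_DELETE))
		mask |= MAY_WRITE;
	/* Lookup (directories) and execute (files) both map to MAY_EXEC. */
	if (access_result & (NFS4_ACCESS_LOOKUP | NFS4_ACCESS_EXECUTE))
		mask |= MAY_EXEC;
	return mask;
}

/* Usage example: read plus lookup yields read plus exec. */
void access_mask_demo(void)
{
	assert(access_result_to_mask(NFS4_ACCESS_READ | NFS4_ACCESS_LOOKUP) ==
	       (MAY_READ | MAY_EXEC));
}
```

Note that the mapping is many-to-one: three distinct server-side bits collapse into MAY_WRITE, so the original ACCESS result cannot be recovered from the mask.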
......@@ -46,6 +46,7 @@
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/module.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
......@@ -78,6 +79,7 @@ struct nfs_direct_req {
atomic_t io_count; /* i/os we're waiting for */
spinlock_t lock; /* protect completion state */
ssize_t count, /* bytes actually processed */
bytes_left, /* bytes left to be sent */
error; /* any reported error */
struct completion completion; /* wait for i/o completion */
......@@ -190,6 +192,12 @@ static void nfs_direct_req_release(struct nfs_direct_req *dreq)
kref_put(&dreq->kref, nfs_direct_req_free);
}
ssize_t nfs_dreq_bytes_left(struct nfs_direct_req *dreq)
{
return dreq->bytes_left;
}
EXPORT_SYMBOL_GPL(nfs_dreq_bytes_left);
/*
* Collects and returns the final error value/byte-count.
*/
......@@ -390,6 +398,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
user_addr += req_len;
pos += req_len;
count -= req_len;
dreq->bytes_left -= req_len;
}
/* The nfs_page now hold references to these pages */
nfs_direct_release_pages(pagevec, npages);
......@@ -450,23 +459,28 @@ static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
ssize_t result = -ENOMEM;
struct inode *inode = iocb->ki_filp->f_mapping->host;
struct nfs_direct_req *dreq;
struct nfs_lock_context *l_ctx;
dreq = nfs_direct_req_alloc();
if (dreq == NULL)
goto out;
dreq->inode = inode;
dreq->bytes_left = iov_length(iov, nr_segs);
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
if (dreq->l_ctx == NULL)
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
result = PTR_ERR(l_ctx);
goto out_release;
}
dreq->l_ctx = l_ctx;
if (!is_sync_kiocb(iocb))
dreq->iocb = iocb;
NFS_I(inode)->read_io += iov_length(iov, nr_segs);
result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos, uio);
if (!result)
result = nfs_direct_wait(dreq);
NFS_I(inode)->read_io += result;
out_release:
nfs_direct_req_release(dreq);
out:
......@@ -706,6 +720,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
user_addr += req_len;
pos += req_len;
count -= req_len;
dreq->bytes_left -= req_len;
}
/* The nfs_page now hold references to these pages */
nfs_direct_release_pages(pagevec, npages);
......@@ -814,6 +829,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
get_dreq(dreq);
atomic_inc(&inode->i_dio_count);
NFS_I(dreq->inode)->write_io += iov_length(iov, nr_segs);
for (seg = 0; seg < nr_segs; seg++) {
const struct iovec *vec = &iov[seg];
result = nfs_direct_write_schedule_segment(&desc, vec, pos, uio);
......@@ -825,7 +841,6 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
pos += vec->iov_len;
}
nfs_pageio_complete(&desc);
NFS_I(dreq->inode)->write_io += desc.pg_bytes_written;
/*
* If no bytes were started, return the error, and let the
......@@ -849,16 +864,21 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
ssize_t result = -ENOMEM;
struct inode *inode = iocb->ki_filp->f_mapping->host;
struct nfs_direct_req *dreq;
struct nfs_lock_context *l_ctx;
dreq = nfs_direct_req_alloc();
if (!dreq)
goto out;
dreq->inode = inode;
dreq->bytes_left = count;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
if (dreq->l_ctx == NULL)
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
result = PTR_ERR(l_ctx);
goto out_release;
}
dreq->l_ctx = l_ctx;
if (!is_sync_kiocb(iocb))
dreq->iocb = iocb;
......
......@@ -259,7 +259,7 @@ nfs_file_fsync_commit(struct file *file, loff_t start, loff_t end, int datasync)
struct dentry *dentry = file->f_path.dentry;
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct inode *inode = dentry->d_inode;
int have_error, status;
int have_error, do_resend, status;
int ret = 0;
dprintk("NFS: fsync file(%s/%s) datasync %d\n",
......@@ -267,15 +267,23 @@ nfs_file_fsync_commit(struct file *file, loff_t start, loff_t end, int datasync)
datasync);
nfs_inc_stats(inode, NFSIOS_VFSFSYNC);
do_resend = test_and_clear_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
have_error = test_and_clear_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
status = nfs_commit_inode(inode, FLUSH_SYNC);
if (status >= 0 && ret < 0)
status = ret;
have_error |= test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
if (have_error)
if (have_error) {
ret = xchg(&ctx->error, 0);
if (!ret && status < 0)
if (ret)
goto out;
}
if (status < 0) {
ret = status;
goto out;
}
do_resend |= test_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
if (do_resend)
ret = -EAGAIN;
out:
return ret;
}
EXPORT_SYMBOL_GPL(nfs_file_fsync_commit);
......@@ -286,13 +294,22 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
int ret;
struct inode *inode = file->f_path.dentry->d_inode;
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret != 0)
goto out;
mutex_lock(&inode->i_mutex);
ret = nfs_file_fsync_commit(file, start, end, datasync);
mutex_unlock(&inode->i_mutex);
out:
do {
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret != 0)
break;
mutex_lock(&inode->i_mutex);
ret = nfs_file_fsync_commit(file, start, end, datasync);
mutex_unlock(&inode->i_mutex);
/*
* If nfs_file_fsync_commit detected a server reboot, then
* resend all dirty pages that might have been covered by
* the NFS_CONTEXT_RESEND_WRITES flag
*/
start = 0;
end = LLONG_MAX;
} while (ret == -EAGAIN);
return ret;
}
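The retry loop introduced in nfs_file_fsync() can be modelled in userspace to see its control flow: flush, commit, and if the commit step reports -EAGAIN, widen the range to the whole file and go around again. The sketch below is an illustration only. The callback types, fsync_with_resend(), and the demo_* stubs are all hypothetical stand-ins for filemap_write_and_wait_range() and nfs_file_fsync_commit(); locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-ins for the flush and commit steps. */
typedef int (*flush_fn)(long long start, long long end, void *ctx);
typedef int (*commit_fn)(void *ctx);

int fsync_with_resend(long long start, long long end,
		      flush_fn flush, commit_fn commit, void *ctx)
{
	int ret;

	do {
		ret = flush(start, end, ctx);
		if (ret != 0)
			break;
		ret = commit(ctx);
		/* On a retry, cover every page that may need resending. */
		start = 0;
		end = LLONG_MAX;
	} while (ret == -EAGAIN);
	return ret;
}

/* Demo stubs: the commit step asks for a resend a fixed number of times. */
struct demo_state {
	int resends_pending;
	int flushes;
	long long last_start, last_end;
};

int demo_flush(long long start, long long end, void *ctx)
{
	struct demo_state *s = ctx;

	s->flushes++;
	s->last_start = start;
	s->last_end = end;
	return 0;
}

int demo_commit(void *ctx)
{
	struct demo_state *s = ctx;

	if (s->resends_pending > 0) {
		s->resends_pending--;
		return -EAGAIN;
	}
	return 0;
}
```

With one pending resend, the first pass flushes only the requested range and the second pass flushes from 0 to LLONG_MAX, matching the comment in the diff about resending all dirty pages after a server reboot.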
......
......@@ -32,6 +32,8 @@
#include <asm/uaccess.h>
#include "internal.h"
#define NFSDBG_FACILITY NFSDBG_CLIENT
/*