The functions used for working with ceph page vectors are defined
with char pointers, but they're really intended to operate on
untyped data. Change the types of these function parameters
to (void *) to reflect this.
(Note that the functions now assume void pointer arithmetic works
like arithmetic on char pointers.)
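As a rough illustration of that note: GNU C treats sizeof(void) as 1,
so adding an offset to a void pointer steps by bytes, just as it would
for a char pointer. The sketch below is a user-space model, and
buf_advance() is a hypothetical helper, not one of the page vector
functions. It casts explicitly so it also builds with compilers that
lack the extension.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: step an untyped pointer forward by off bytes.
 * The explicit cast spells out what GNU C does implicitly for
 * "p + off" when p is a void pointer.
 */
static const void *buf_advance(const void *p, size_t off)
{
	return (const char *)p + off;
}
```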
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There are three ceph page vector functions declared in
"fs/ceph/super.h" that don't belong there. They're
probably left over from some long-ago code reorganization.
They're properly declared in "include/linux/ceph/libceph.h"
so just delete the ones in "super.h".
This and the next few commits resolve:
http://tracker.ceph.com/issues/4053
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Add support for CEPH_OSD_OP_STAT operations in the osd client
and in rbd.
This operation sends no data to the osd; everything required is
encoded in the identity of the target object.
The result will be ENOENT if the object doesn't exist. If it does
exist and no other error occurs the server returns the size and last
modification time of the target object as output data (in little
endian format). The size is a 64-bit unsigned integer and the time is
a ceph_timespec structure (two unsigned 32-bit integers, representing
a seconds and a nanoseconds value).
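A minimal sketch of decoding such a reply in user space, assuming the
16 bytes arrive in the order described above (size first, then
seconds, then nanoseconds). The struct and function names here are
hypothetical and are not the ones used by the osd client or rbd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative host-order result; the field names are assumptions. */
struct stat_reply {
	uint64_t size;
	uint32_t tv_sec;
	uint32_t tv_nsec;
};

/* Read little-endian values byte by byte, so host endianness is irrelevant. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const unsigned char *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Returns 0 on success, -1 if the buffer is shorter than 16 bytes. */
static int decode_stat_reply(const void *buf, size_t len,
			     struct stat_reply *out)
{
	const unsigned char *p = buf;

	if (len < 16)
		return -1;
	out->size = get_le64(p);
	out->tv_sec = get_le32(p + 8);
	out->tv_nsec = get_le32(p + 12);
	return 0;
}
```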
This resolves:
http://tracker.ceph.com/issues/4007
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The for_each_obj_request*() macros should parenthesize their uses of
the ireq parameter.
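To show why the parentheses matter, here is a small sketch using a
hypothetical struct and macros (not the actual for_each_obj_request*()
definitions). With the unparenthesized form, an argument such as
"reqs + 1" would expand to "reqs + 1->obj_request_count", which does
not compile.

```c
#include <assert.h>

struct img_request {
	int obj_request_count;
};

/* Breaks for any argument that is not a simple identifier. */
#define REQ_COUNT_UNSAFE(ireq)	ireq->obj_request_count

/* Safe: the argument is evaluated as a unit before "->" applies. */
#define REQ_COUNT(ireq)		((ireq)->obj_request_count)
```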
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Simplify the way the data length recorded in a message header is
calculated in ceph_osdc_build_request().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Update ceph_mds_state_name() and ceph_mds_op_name() to include the
newly-added definitions in "ceph_fs.h", and to match their
counterparts in the user space code.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Update most of "include/linux/ceph/ceph_fs.h" to match its user
space counterpart in "src/include/ceph_fs.h" in the ceph tree.
Everything that has changed is either:
- added definitions (therefore no real effect on existing code)
- deleted unused symbols
- added or revised comments
There were some differences between the struct definitions for
ceph_mon_subscribe_item and the open field of ceph_mds_request_args;
those differences remain.
This and the next commit resolve:
http://tracker.ceph.com/issues/4165
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
In osd_req_encode_op() there are a few cases that handle osd
opcodes that are never used in the kernel. The presence of
this code gives the impression it's correct (which really can't
be assumed), and may impose some unnecessary restrictions on
some upcoming refactoring of this code.
So delete this effectively dead code, and report uses of the
previously handled cases as unsupported.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If osd_req_encode_op() is given any opcode it doesn't recognize
it reports an error.
This patch fleshes out that routine to distinguish between
well-defined but unsupported values and values that are simply
bogus.
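The distinction can be sketched roughly as follows. The opcode values
and result codes are made up for illustration; they are not the
values from "rados.h", and this is not the body of
osd_req_encode_op().

```c
#include <assert.h>

/* Hypothetical opcodes and classification results. */
enum demo_op {
	DEMO_OP_READ = 1,
	DEMO_OP_WRITE,
	DEMO_OP_APPEND,
	DEMO_OP_TRUNCATE,
};

enum op_class {
	OP_SUPPORTED,
	OP_UNSUPPORTED,	/* well-defined, but not handled here */
	OP_BOGUS,	/* not a defined opcode at all */
};

static enum op_class classify_op(int op)
{
	switch (op) {
	case DEMO_OP_READ:
	case DEMO_OP_WRITE:
		return OP_SUPPORTED;
	case DEMO_OP_APPEND:
	case DEMO_OP_TRUNCATE:
		return OP_UNSUPPORTED;
	default:
		return OP_BOGUS;
	}
}
```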
This and the next commit are related to:
http://tracker.ceph.com/issues/4126
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Update ceph_osd_op_name() to include the newly-added definitions in
"rados.h", and to match its counterpart in the user space code.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Add the definition of ceph_osd_state_name(), to match its
counterpart in user space.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Update most of "include/linux/ceph/rados.h" to match its user space
counterpart in "src/include/rados.h" in the ceph tree.
Almost everything that has changed is either:
- added or revised comments
- added definitions (therefore no real effect on existing code)
- defining the same value a different way (e.g., "1 << 0" vs "1")
The only exceptions are:
- The declaration of ceph_osd_state_name() was excluded; that
will be inserted in the next patch.
- ceph_osd_op_mode_read() and ceph_osd_op_mode_modify() are
defined differently, but they were never used in the kernel
- CEPH_OSD_FLAG_PEERSTAT is now CEPH_OSD_FLAG_PEERSTAT_OLD, but
that was never used in the kernel
Anything that was present in this file but not in its user space
counterpart was left intact here. I left the definitions of
EOLDSNAPC and EBLACKLISTED using numerical values here; I'm
not sure of the right way to go with those.
This and the next two commits resolve:
http://tracker.ceph.com/issues/4164
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There are no actual users of ceph_osdc_wait_event(). It would
have been used for one-shot events, but we no longer support those,
so just get rid of this function.
Since this leaves nothing else that waits for the completion of an
event, we can get rid of the completion in a struct ceph_osd_event.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is only one caller of ceph_osdc_create_event(), and it
provides 0 as its "one_shot" argument. Get rid of that argument and
just use 0 in its place.
Replace the code in handle_watch_notify() that executes if one_shot
is nonzero in the event with a BUG_ON() call.
While modifying "osd_client.c", give handle_watch_notify() static
scope.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is no caller of ceph_calc_raw_layout() outside of libceph, so
there's no need to export it from the module.
Furthermore, there is only one caller, in calc_layout(), and it
is not much more than a simple wrapper for that function.
So get rid of ceph_calc_raw_layout() and embed it instead within
calc_layout().
While touching "osd_client.c", get rid of the unnecessary forward
declaration of __send_request().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The only callers of ceph_osdc_init() and ceph_osdc_stop() are
ceph_create_client() and ceph_destroy_client() (respectively)
and they are in the same kernel module as those two functions.
There's therefore no need to export those interfaces, so don't.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Two of the three callers of the osd client's send_queued() function
already hold the osd client mutex and drop it before the call.
Change send_queued() so it assumes the caller holds the mutex, and
update all callers accordingly. Rename it __send_queued() to match
the convention used elsewhere in the file with respect to the lock.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The "num_reply" parameter to ceph_osdc_new_request() is never
used inside that function, so get rid of it.
Note that ceph_sync_write() passes 2 for that argument, while all
other callers pass 1. It doesn't matter, but perhaps someone should
verify this doesn't indicate a problem.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is only one caller of ceph_osdc_writepages(), and it always
passes 0 as its "flags" argument. Get rid of that argument and
replace its use in ceph_osdc_writepages() with 0.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is only one caller of ceph_osdc_writepages(), and it always
passes 0 as its "dosync" argument. Get rid of that argument and
replace its use in ceph_osdc_writepages() with 0.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is only one caller of ceph_osdc_writepages(), and it always
passes the value true as its "nofail" argument. Get rid of that
argument and replace its use in ceph_osdc_writepages() with the
constant value true.
This and a number of cleanup patches that follow resolve:
http://tracker.ceph.com/issues/4126
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The layout of struct ceph_osd_req_op leaves lots of holes.
Rearranging things a little for better field alignment
reduces the size by a third.
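The effect can be seen with a pair of made-up structures. The fields
below are hypothetical and are not those of struct ceph_osd_req_op;
the point is only that ordering members from largest to smallest
removes the padding the compiler would otherwise insert.

```c
#include <assert.h>
#include <stdint.h>

/* Small fields interleaved with large ones leave padding holes. */
struct op_loose {
	uint8_t  kind;		/* padding follows, up to 64-bit alignment */
	uint64_t offset;
	uint8_t  flags;		/* padding follows, up to 32-bit alignment */
	uint32_t length;
};

/* Same fields, largest first: only tail padding remains. */
struct op_tight {
	uint64_t offset;
	uint32_t length;
	uint8_t  kind;
	uint8_t  flags;
};
```

With GCC on x86-64 the first layout is usually 24 bytes and the second
16, but the exact numbers depend on the ABI.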
This resolves:
http://tracker.ceph.com/issues/4163
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Somehow, I missed this little item in Documentation/atomic_ops.txt:
*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
Create and use some helper functions that include the proper memory
barriers for manipulating the done field.
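A user-space analogue of such helpers, written with C11 atomics rather
than the kernel's atomic_t and explicit barriers. The structure and
function names are hypothetical. Release/acquire ordering is used here
as a stand-in for the barriers the kernel helpers would issue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_request {
	int result;		/* plain data published by the completer */
	atomic_int done;
};

/*
 * Mark the request done. The release store orders all earlier
 * writes (such as result) before the flag becomes visible.
 */
static void demo_request_done_set(struct demo_request *req)
{
	atomic_store_explicit(&req->done, 1, memory_order_release);
}

/*
 * Test the flag. The acquire load ensures that, once done is seen,
 * later reads observe everything written before it was set.
 */
static bool demo_request_done_test(struct demo_request *req)
{
	return atomic_load_explicit(&req->done, memory_order_acquire) != 0;
}
```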
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This commit:
bc7a62ee5 rbd: prevent open for image being removed
added a check for an rbd device being removed before allowing an open, and used
the same request spinlock for protecting that and updating the open
count as is used for the request queue.
However it used the non-irq protected version of the spinlocks.
Fix that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is a check in the completion path for osd requests that
ensures the number of pages allocated is enough to hold the amount
of incoming data expected.
For bio requests coming from rbd the "number of pages" is not really
meaningful (although total length would be). So stop requiring that
nr_pages be supplied for bio requests. This is done by checking
whether the pages pointer is null before checking the value of
nr_pages.
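The shape of that check, as a hedged sketch: reply_space_ok() and
DEMO_PAGE_SIZE are hypothetical names, and the real code in the osd
client differs in detail. The page count is only consulted when a
page vector was actually supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/* Can the supplied buffers hold data_len bytes of incoming data? */
static bool reply_space_ok(void **pages, unsigned int nr_pages,
			   size_t data_len)
{
	size_t want;

	if (!pages)	/* bio-backed request: page count not meaningful */
		return true;
	want = (data_len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
	return nr_pages >= want;
}
```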
Note that this value is passed on to the messenger, but there it's
only used for debugging--it's never used for validation.
While here, change another spot that used r_pages in a debug message
inappropriately, and also invalidate the r_con_filling_msg pointer
after dropping a reference to it.
This resolves:
http://tracker.ceph.com/issues/3875
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, if the OSD client finds an osd request has had a bio list
attached to it, it drops a reference to it (or rather, to the first
entry on that list) when the request is released.
The code that added that reference (i.e., the rbd client) is
therefore required to take an extra reference to that first bio
structure.
The osd client doesn't really do anything with the bio pointer other
than transfer it from the osd request structure to outgoing (for
writes) and incoming (for reads) messages. So it really isn't the
right place to be taking or dropping references.
Furthermore, the rbd client already holds references to all bio
structures it passes to the osd client, and holds them until the
request is completed. So there's no need for this extra reference
whatsoever.
So remove the bio_put() call in ceph_osdc_release_request(), as
well as its matching bio_get() call in rbd_osd_req_create().
This change could lead to a crash if an old libceph.ko were used with
a new rbd.ko. Add a compatibility check at rbd initialization time to
avoid this possibility.
This resolves:
http://tracker.ceph.com/issues/3798 and
http://tracker.ceph.com/issues/3799
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An upcoming change implements a semantic change that could lead to
a crash if an old version of the libceph kernel module is used with
a new version of the rbd kernel module.
In order to preclude that possibility, this adds a compatibility
check interface. If this interface doesn't exist, the modules are
obviously not compatible. But if it does exist, this provides a way
of letting the caller know whether it will operate properly with
this libceph module.
Perhaps confusingly, it returns false right now. The semantic
change mentioned above will make it return true.
This resolves:
http://tracker.ceph.com/issues/3800
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An open request for a mapped rbd image can arrive while removal of
that mapping is underway. We need to prevent such an open request
from succeeding. (It appears that Maciej Galkiewicz ran into this
problem.)
Define and use a "removing" flag to indicate a mapping is getting
removed. Set it in the remove path after verifying nothing holds
the device open. And check it in the open path before allowing the
open to proceed. Acquire the rbd device's lock around each of these
spots to avoid any races accessing the flags and open_count fields.
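The interplay between the two paths can be modeled in user space as
below. Everything here is a sketch under assumptions: the names are
hypothetical, a C11 atomic_flag spin loop stands in for the rbd
device's lock, and the error codes (-ENOENT, -EBUSY) are chosen for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_dev {
	atomic_flag lock;	/* stand-in for the device's spinlock */
	bool removing;
	unsigned int open_count;
};

static void demo_lock(struct demo_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void demo_unlock(struct demo_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Open path: refuse if removal has already started. */
static int demo_open(struct demo_dev *d)
{
	int ret = 0;

	demo_lock(d);
	if (d->removing)
		ret = -ENOENT;
	else
		d->open_count++;
	demo_unlock(d);
	return ret;
}

/* Remove path: mark the device only once nothing holds it open. */
static int demo_remove(struct demo_dev *d)
{
	int ret = 0;

	demo_lock(d);
	if (d->open_count)
		ret = -EBUSY;
	else
		d->removing = true;
	demo_unlock(d);
	return ret;
}
```

Because both the check and the update happen under the same lock, an
open cannot slip in between the open_count test and the setting of
the flag.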
This addresses:
http://tracker.newdream.net/issues/3427
Reported-by: Maciej Galkiewicz <maciejgalkiewicz@ragnarson.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define a new rbd device flags field, manipulated using bit
operations. Replace the use of the current "exists" flag with a bit
in this new "flags" field. Add a little commentary about the
"exists" flag, which does not need to be manipulated atomically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When we register an osd request to linger, it means that request
will stay around (under control of the osd client) until we've
unregistered it. We do that for an rbd image's header object, and
we keep a pointer to the object request associated with it.
Keep a reference to the watch object request for as long as it is
registered to linger. Drop it again after we've removed the linger
registration.
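A simplified model of that reference discipline follows. The names
are hypothetical and the plain integer count is only for illustration;
it is neither atomic nor the mechanism rbd actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_obj_request {
	int refcount;
	bool lingering;
};

struct demo_watch_dev {
	struct demo_obj_request *watch_request;
};

static void demo_get(struct demo_obj_request *req)
{
	req->refcount++;
}

static void demo_put(struct demo_obj_request *req)
{
	req->refcount--;	/* a real version would free at zero */
}

/* Register to linger: hold a reference for as long as we are registered. */
static void demo_linger_start(struct demo_watch_dev *dev,
			      struct demo_obj_request *req)
{
	demo_get(req);
	req->lingering = true;
	dev->watch_request = req;
}

/* Unregister first, then drop the reference taken at registration. */
static void demo_linger_stop(struct demo_watch_dev *dev)
{
	struct demo_obj_request *req = dev->watch_request;

	req->lingering = false;
	dev->watch_request = NULL;
	demo_put(req);
}
```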
This resolves:
http://tracker.ceph.com/issues/3937
(Note: this originally came about because the osd client was
issuing a callback more than once. But that behavior will be
changing soon, documented in tracker issue 3967.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Decrement the obj_request_count value when deleting an object
request from its image request's list. Rearrange a few lines
in the surrounding code.
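The invariant being restored is that the count always matches the
number of entries on the list. A minimal sketch with a hypothetical
singly linked list (rbd itself uses the kernel's list primitives):

```c
#include <assert.h>
#include <stddef.h>

struct demo_obj {
	struct demo_obj *next;
};

struct demo_img {
	struct demo_obj *head;
	unsigned int obj_count;
};

static void demo_img_add(struct demo_img *img, struct demo_obj *obj)
{
	obj->next = img->head;
	img->head = obj;
	img->obj_count++;
}

/* Unlink obj if present, keeping the count in step with the list. */
static int demo_img_del(struct demo_img *img, struct demo_obj *obj)
{
	struct demo_obj **pp;

	for (pp = &img->head; *pp; pp = &(*pp)->next) {
		if (*pp == obj) {
			*pp = obj->next;
			obj->next = NULL;
			img->obj_count--;
			return 0;
		}
	}
	return -1;	/* not on this list; count untouched */
}
```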
This resolves:
http://tracker.ceph.com/issues/3940
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Switch to keeping track of the object request pointer rather than
the osd request used to watch the rbd image header object.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Move the code that unregisters an rbd device's lingering header
object watch request into rbd_dev_header_watch_sync(), so it
occurs in the same function that originally sets up that request.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Get rid of rbd_req_sync_exec() because it is no longer used. That
eliminates the last use of rbd_req_sync_op(), so get rid of that
too. And finally, that leaves rbd_do_request() unreferenced, so get
rid of that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reimplement synchronous object method calls using the new request
tracking code. Use the name rbd_obj_method_sync().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When we receive notification of a change to an rbd image's header
object we need to refresh our information about the image (its
size and snapshot context). Once we have refreshed our rbd image
we need to acknowledge the notification.
This acknowledgement was previously done synchronously, but there's
really no need to wait for it to complete.
Change it so the caller doesn't wait for the notify acknowledgement
request to complete. And change the name to reflect it's no longer
synchronous.
This resolves:
http://tracker.newdream.net/issues/3877
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Get rid of rbd_req_sync_notify_ack() because it is no longer used.
As a result rbd_simple_req_cb() becomes unreferenced, so get rid
of that too.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Use the new object request tracking mechanism for handling a
notify_ack request.
Move the callback function below the definition of this so we don't
have to do a pre-declaration.
This resolves:
http://tracker.newdream.net/issues/3754
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Get rid of rbd_req_sync_watch(), because it is no longer used.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Implement a new function to set up or tear down a watch event
for a mapped rbd image header using the new request code.
Create a new object request type "nodata" to handle this. And
define rbd_osd_trivial_callback() which simply marks a request done.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
rbd_req_sync_read() is no longer used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reimplement the synchronous read operation used for reading a
version 1 header using the new request tracking code. Name the
resulting function rbd_obj_read_sync() to better reflect that
it's a full object operation, not an object request. To do this,
implement a new OBJ_REQUEST_PAGES object request type.
This implements a new mechanism to allow the caller to wait for
completion for an rbd_obj_request by calling rbd_obj_request_wait().
This partially resolves:
http://tracker.newdream.net/issues/3755
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The two remaining callers of rbd_do_request() always pass a null
collection pointer, so the "coll" and "coll_index" parameters are
not needed. There is no other use of that data structure, so it
can be eliminated.
Deleting them means there is no need to allocate a rbd_request
structure for the callback function. And since that's the only use
of *that* structure, it too can be eliminated.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Now that the request function has been replaced by one using the new
request management data structures the old one can go away.
Deleting it makes rbd_dev_do_request() no longer needed, and
deleting that makes other functions unneeded, and so on.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This patch fully implements the new request tracking code for rbd
I/O requests.
Each I/O request to an rbd image will get an rbd_image_request
structure allocated to track it. This provides access to all
information about the original request, as well as access to the
set of one or more object requests that are initiated as a result
of the image request.
An rbd_obj_request structure defines a request sent to a single osd
object (possibly) as part of an rbd image request. An rbd object
request refers to a ceph_osd_request structure built up to represent
the request; for now it will contain a single osd operation. It
also provides space to hold the result status and the version of the
object when the osd request completes.
An rbd_obj_request structure can also stand on its own. This will
be used for reading the version 1 header object, for issuing
acknowledgements to event notifications, and for making object
method calls.
All rbd object requests now complete asynchronously with respect
to the osd client--they supply a common callback routine.
This resolves:
http://tracker.newdream.net/issues/3741
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The ceph messenger has a few spots that are only used when
bio messages are supported, and that's only when CONFIG_BLOCK
is defined. This surrounds a couple of spots with #ifdef's
that would cause a problem if CONFIG_BLOCK were not present
in the kernel configuration.
This resolves:
http://tracker.ceph.com/issues/3976
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Allow individual fields of the layout to be fetched via getxattr.
The ceph.dir.layout.* vxattrs will "disappear" if the exists_cb
indicates there is no dir layout set.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
This virtual xattr will only appear when there is a dir layout policy
set on the directory. It can be set via setxattr and removed via
removexattr (implemented by the MDS).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
Implement a new method to generate the ceph.file.layout vxattr using
the new framework.
Use 'stripe_unit' instead of 'chunk_size'.
Include pool name, either as a string or as an integer.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
Only include vxattrs in the result if they are not hidden and exist
(as determined by the exists_cb callback).
Note that the buffer size we return when 0 is passed in always includes
vxattrs that *might* exist, forming an upper bound.
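The listing rule can be sketched as follows. The table, names and
function are hypothetical, and this is not the kernel's listxattr
code. One assumption is made explicit: for the size-0 query, every
non-hidden entry is counted whether or not its exists_cb would
currently report it, which is what makes the result an upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_vxattr {
	const char *name;
	bool hidden;
	bool (*exists_cb)(void);	/* NULL means "always exists" */
};

static bool demo_no_layout(void)
{
	return false;	/* pretend no dir layout is set */
}

static const struct demo_vxattr demo_table[] = {
	{ "demo.always", false, NULL },
	{ "demo.layout", false, demo_no_layout },
	{ "demo.hidden", true,  NULL },
};

/*
 * size == 0: return an upper bound on the space needed.
 * Otherwise copy the NUL-terminated names of entries that are not
 * hidden and currently exist, and return the bytes used.
 */
static size_t demo_list_vxattrs(const struct demo_vxattr *v, size_t n,
				char *buf, size_t size)
{
	size_t used = 0, i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(v[i].name) + 1;

		if (size == 0) {
			if (!v[i].hidden)
				used += len;	/* might exist: count it */
			continue;
		}
		if (v[i].hidden || (v[i].exists_cb && !v[i].exists_cb()))
			continue;
		if (used + len > size)
			break;	/* sketch: stop instead of reporting an error */
		memcpy(buf + used, v[i].name, len);
		used += len;
	}
	return used;
}
```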
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>