We need to call fscache_wait_on_page_write in launder_page
for fscache
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need to ihold even in cached mode
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need to call v9fs_cache_inode_set_cookie in create
path also
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
With the old code we were not setting the file->f_op
with cached file operations during creat.
(format correction by jvrao@linux.vnet.ibm.com)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If a transport prefers payload to be sent separate from the PDU
(P9_TRANS_PREF_PAYLOAD_SEP), there is no need to allocate msize
PDU buffers(struct p9_fcall).
This patch allocates only upto 4k buffers for this kind of transports
and there won't be any change to the legacy transports.
Hence, this patch on top of zero copy changes allows user to
specify higher msizes through the mount option
without hogging the kernel heap.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This takes care of copying out error buffers from user buffer
payloads when we are using zero copy. This happens because the
only payload buffer the server has to respond to the request is
the user buffer given for the zero copy read.
Because we only use zerocopy when the amount of data to transfer
is greater than a certain size (currently 4K) and error strings are
limited to ERRMAX (currently 128) we don't need to worry about there
being sufficient space for the error to fit in the payload.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Modify p9_client_readdir() to check the transport preference and act according
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Modify p9_client_write() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Modify p9_client_read() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch adds preferences field to the p9_trans_module.
Through this, now transport layer can express its preference about the
payload. i.e if payload neds to be part of the PDU or it prefers it
to be sent sepearetly so that the transport layer can handle it in
a better way.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Modify p9_virtio_request() and req_done() functions to support
additional payload sent down to the transport layer through
tc->pubuf and tc->pkbuf.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This will be used by the transport layer to determine the out going
request type. Transport layer uses this information to correctly
place the mapped pages in the PDU. Patches following this will make
use of this to achieve zero copy.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch prepares p9_fcall structure for zero copy. Added
fields send the payload buffer information to the transport layer.
In addition it adds a 'private' field for the transport layer to
store mapped/pinned page information so that it can be freed/unpinned
during req_done.
This patch also creates trans_common.[ch] to house helper functions.
It adds the following helper functions.
p9_release_req_pages - Release pages after the transaction.
p9_nr_pages - Return number of pages needed to accomodate the payload.
payload_gup - Translates user buffer into kernel pages.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Current code sets access=user as default for all protocol versions.
This patch chagnes it to "client" only for dotl.
User can always specify particular access mode with -o access= option.
No change there.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The mount option access=client is overloaded as it assumes acl too.
Adding posixacl option to enable POSIX ACLs makes it explicit and clear.
Also it is convenient in the future to add other types of acls like richacls.
Ideally, the access mode 'client' should be just like V9FS_ACCESS_USER
except it underscores the location of access check.
Traditional 9P protocol lets the server perform access checks but with
this mode, all the access checks will be performed on the client itself.
Server just follows the client's directive.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If the kernel is not compiled with CONFIG_9P_FS_POSIX_ACL and the
mount option is specified to enable ACLs current code fails the mount.
This patch brings the behavior inline with other filesystems like ext3
by proceeding with the mount and log a warning to syslog.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
With create/mkdir/mknod in non cached mode we initialize the inode using
v9fs_get_inode. v9fs_get_inode doesn't initialize the cache inode value
to NULL. This is causing to trip on BUG_ON in v9fs_get_cached_acl.
Fix is to initialize acls to NULL and not to leave them in ACL_NOT_CACHED
state.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
In v9fs_get_acl() if __v9fs_get_acl() gets only one of the
dacl/pacl we are not releasing it.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reset vector can be setup by bootloader and kernel doens't need
to touch it. If you require to setup reset vector, please use
CONFIG_MANUAL_RESET_VECTOR throught menuconfig.
It is not possible to setup address 0x0 as reset address because
make no sense to set it up at all.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: John Williams <john.williams@petalogix.com>
If soft reset falls through with no hardware assisted reset, the best
we can do is jump to the reset vector and see what the bootloader left
for us.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: John Williams <john.williams@petalogix.com>
Microblaze vector table stores several vectors (reset, user exception,
interrupt, debug exception and hardware exception).
All these functions can be below address 0x10000. If they are, wrong
vector table is genarated because jump is not setup from two instructions
(imm upper 16bit and brai lower 16bit).
Adding specific offset prevent problem if address is below 0x10000.
For this case only brai instruction is used.
Signed-off-by: Michal Simek <monstr@monstr.eu>
As per RCU glock patch review comments, don't use the _raw
version of this function here.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
native_flush_tlb_others() is called from:
flush_tlb_current_task()
flush_tlb_mm()
flush_tlb_page()
All these functions disable preemption explicitly, so we can use
smp_processor_id() instead of get_cpu() and put_cpu().
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <4D7EC791.4040003@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1b4b:91a3 seems to be another PCI ID for marvell ahci. Add it.
Reported and tested in the following thread.
http://thread.gmane.org/gmane.linux.kernel/1068354
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Alessandro Tagliapietra <tagliapietra.alessandro@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Most ata_id_XXX inlines are simple tests, so we should set
the return value to 'bool' here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For readlinkat() we simply allow empty pathname; it will fail unless
we have dfd equal to O_PATH-opened symlink, so we are outside of
POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH;
let the caller explicitly ask for such behaviour.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
At that point we can't do almost nothing with them. They can be opened
with O_PATH, we can manipulate such descriptors with dup(), etc. and
we can see them in /proc/*/{fd,fdinfo}/*.
We can't (and won't be able to) follow /proc/*/fd/* symlinks for those;
there's simply not enough information for pathname resolution to go on
from such point - to resolve a symlink we need to know which directory
does it live in.
We will be able to do useful things with them after the next commit, though -
readlinkat() and fchownat() will be possible to use with dfd being an
O_PATH-opened symlink and empty relative pathname. Combined with
open_by_handle() it'll give us a way to do realink-by-handle and
lchown-by-handle without messing with more redundant syscalls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New flag for open(2) - O_PATH. Semantics:
* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
* almost all operations on the resulting descriptors shall
fail with -EBADF. Exceptions are:
1) operations on descriptors themselves (i.e.
close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
fcntl(fd, F_SETFD, ...))
2) fcntl(fd, F_GETFL), for a common non-destructive way to
check if descriptor is open
3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
points of pathname resolution
* closing such descriptor does *NOT* affect dnotify or
posix locks.
* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself. Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).
fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them. That protects
existing code from dealing with those things.
There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
File system UUID is made available to application
via /proc/<pid>/mountinfo
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
File system UUID is made available to application
via /proc/<pid>/mountinfo
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We add a per superblock uuid field. File systems should
update the uuid in the fill_super callback
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch add new syscalls to x86_64
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds new syscalls to x86_32
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that VFS check for inode->i_nlink == 0 and returns proper
error, remove similar check from file system
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add inode->i_nlink == 0 check in VFS. Some of the file systems
do this internally. A followup patch will remove those instance.
This is needed to ensure that with link by handle we don't allow
to create hardlink of an unlinked file. The check also prevent a race
between unlink and link
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The hibernate image size autotuning mechanism sets the default
image size to 5/2 of the total system RAM, but it is reported
that on some systems device drivers allocate substantial
amounts of memory during suspend and the creation of the image
fails as a result (too little memory is preallocated).
Modify the autotuning mechanism to use 1/3 instead of 2/5 of RAM
as the default image size, which is reported to be sufficient for
the affected systems.
References: https://bugzilla.kernel.org/show_bug.cgi?id=30482
Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled. The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose which is cumbersome and
inefficient. Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.
For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
strcut syscore_ops objects.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
I have a machine where entering deep C-states broke.
pm_qos was a hot candidate, but I couldn't find any way to double
check without the need of recompiling.
While in this case it was a driver bug (ath9k):
https://bugzilla.kernel.org/show_bug.cgi?id=27532
powertop or others may want to read out cpu_dma_latency
restrictions which could be the cause of preventing a machine
entering deeper C-states.
Output with this patch:
# default value of 2000 * USEC_PER_SEC (0x77359400)
cat /dev/network_latency |hexdump
0000000 9400 7735
0000004
# value of 55 us which is the reason for not entering C2
cat /dev/cpu_dma_latency |hexdump
0000000 0037 0000
0000004
There is no reason to hide this info -> make pm_qos files readable.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
opp_find_freq_exact() documentation has is_available instead
of available. This also fixes warning with the kernel-doc:
scripts/kernel-doc drivers/base/power/opp.c >/dev/null
Warning(drivers/base/power/opp.c:246): No description found for parameter 'available'
Warning(drivers/base/power/opp.c:246): Excess function parameter 'is_available' description in 'opp_find_freq_exact'
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The code handling system-wide power transitions (eg. suspend-to-RAM)
can in theory execute callbacks provided by the device's bus type,
device type and class in each phase of the power transition. In
turn, the runtime PM core code only calls one of those callbacks at
a time, preferring bus type callbacks to device type or class
callbacks and device type callbacks to class callbacks.
It seems reasonable to make them both behave in the same way in that
respect. Moreover, even though a device may belong to two subsystems
(eg. bus type and device class) simultaneously, in practice power
management callbacks for system-wide power transitions are always
provided by only one of them (ie. if the bus type callbacks are
defined, the device class ones are not and vice versa). Thus it is
possible to modify the code handling system-wide power transitions
so that it follows the core runtime PM code (ie. treats the
subsystem callbacks as mutually exclusive).
On the other hand, the core runtime PM code will choose to execute,
for example, a runtime suspend callback provided by the device type
even if the bus type's struct dev_pm_ops object exists, but the
runtime_suspend pointer in it happens to be NULL. This is confusing,
because it may lead to the execution of callbacks from different
subsystems during different operations (eg. the bus type suspend
callback may be executed during runtime suspend of the device, while
the device type callback will be executed during system suspend).
Make all of the power management code treat subsystem callbacks in
a consistent way, such that:
(1) If the device's type is defined (eg. dev->type is not NULL)
and its pm pointer is not NULL, the callbacks from dev->type->pm
will be used.
(2) If dev->type is NULL or dev->type->pm is NULL, but the device's
class is defined (eg. dev->class is not NULL) and its pm pointer
is not NULL, the callbacks from dev->class->pm will be used.
(3) If dev->type is NULL or dev->type->pm is NULL and dev->class is
NULL or dev->class->pm is NULL, the callbacks from dev->bus->pm
will be used provided that both dev->bus and dev->bus->pm are
not NULL.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
Reasoning-sounds-sane-to: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
'n' defaults are pretty pointless and actually bogus when used with
prompt-less config options.
The "bool"/"default y" pair with no prompt can be expressed more
compactly using def_bool.
[rjw: Rebased on top of earlier patches modifying this file.]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The platform bus type is often used to handle Systems-on-a-Chip (SoC)
where all devices are represented by objects of type struct
platform_device. In those cases the same "platform" device driver
may be used with multiple different system configurations, but the
actions needed to put the devices it handles into a low-power state
and back into the full-power state may depend on the design of the
given SoC. The driver, however, cannot possibly include all the
information necessary for the power management of its device on all
the systems it is used with. Moreover, the device hierarchy in its
current form also is not suitable for representing this kind of
information.
The patch below attempts to address this problem by introducing
objects of type struct dev_power_domain that can be used for
representing power domains within a SoC. Every struct
dev_power_domain object provides a sets of device power
management callbacks that can be used to perform what's needed for
device power management in addition to the operations carried out by
the device's driver and subsystem.
Namely, if a struct dev_power_domain object is pointed to by the
pwr_domain field in a struct device, the callbacks provided by its
ops member will be executed in addition to the corresponding
callbacks provided by the device's subsystem and driver during all
power transitions.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-and-acked-by: Kevin Hilman <khilman@ti.com>
The variable pm_flags is used to prevent APM from being enabled
along with ACPI, which would lead to problems. However, acpi_init()
is always called before apm_init() and after acpi_init() has
returned, it is known whether or not ACPI will be used. Namely, if
acpi_disabled is not set after acpi_init() has returned, this means
that ACPI is enabled. Thus, it is sufficient to check acpi_disabled
in apm_init() to prevent APM from being enabled in parallel with
ACPI.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>