The current security model is based around the flags AUTH, ENCRYPT and
SECURE. Starting with support for the Bluetooth 2.1 specification this is
no longer sufficient. The different security levels are now defined as
SDP, LOW, MEDIUM and SECURE.
Previously it was possible to set each security independently, but this
actually doesn't make a lot of sense. For Bluetooth the encryption depends
on a previous successful authentication. Also you can only update your
existing link key if you successfully created at least one before. And of
course the update of link keys without having proper encryption in place
is a security issue.
The new security levels from the Bluetooth 2.1 specification are now
used internally. All old settings are mapped to the new values and this
way it ensures that old applications still work. The only limitation
is that it is no longer possible to set authentication without also
enabling encryption. No application should have done this anyway since
this is actually a security issue. Without encryption the integrity of
the authentication can't be guaranteed.
As default for a new L2CAP or RFCOMM connection, the LOW security level
is used. The only exception here are the service discovery sessions on
PSM 1 where SDP level is used. To have similar security strength as with
a Bluetooth 2.0 and before combination key, the MEDIUM level should be
used. This is according to the Bluetooth specification. The MEDIUM level
will not require any kind of man-in-the-middle (MITM) protection. Only
the HIGH security level will require this.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In order to decide if listening RFCOMM sockets should be accept()ed
the BD_ADDR of the remote device needs to be known. This patch adds
a socket option which defines a timeout for deferring the actual
connection setup.
The connection setup is done after reading from the socket for the
first time. Until then writing to the socket returns ENOTCONN.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The L2CAP and RFCOMM applications require support for authorization
and the ability of rejecting incoming connection requests. The socket
interface is not really able to support this.
This patch does the ground work for a socket option to defer connection
setup. Setting this option allows calling of accept() and then the
first read() will trigger the final connection setup. Calling close()
would reject the connection.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.
Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb. So remove net_alive allowing
packet reception run a little faster.
Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Table size is defined as unsigned, wheres the table maximum size is
defined as a signed integer. The calculation of max is 8 or 4,
multiplied the table size. Therefore the max value is aligned to
unsigned.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The untracked conntrack actually does usually have events marked for
delivery as its not special-cased in that part of the code. Skip the
actual delivery since it impacts performance noticeably.
Signed-off-by: Patrick McHardy <kaber@trash.net>
A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.
skb_truesize_check() was added in order to try and catch this error
more systematically.
However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.
Signed-off-by: David S. Miller <davem@davemloft.net>
During peeloff/accept() sctp needs to save the parent socket state
into the new socket so that any options set on the parent are
inherited by the child socket. This was found when the
parent/listener socket issues SO_BINDTODEVICE, but the
data was misrouted after a route cache flush.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP incorrectly doubles rto ever time a Hearbeat chunk
is generated. However RFC 4960 states:
On an idle destination address that is allowed to heartbeat, it is
recommended that a HEARTBEAT chunk is sent once per RTO of that
destination address plus the protocol parameter 'HB.interval', with
jittering of +/- 50% of the RTO value, and exponential backoff of the
RTO if the previous HEARTBEAT is unanswered.
Essentially, of if the heartbean is unacknowledged, do we double the RTO.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sctp crc32c checksum is always generated in little endian.
So, we clean up the code to treat it as little endian and remove
all the __force casts.
Suggested by Herbert Xu.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a new version of my patch, now using a module parameter instead
of a sysctl, so that the option is harder to find. Please note that,
once the module is loaded, it is still possible to change the value of
the parameter in /sys/module/sctp/parameters/, which is useful if you
want to do performance comparisons without rebooting.
Computation of SCTP checksums significantly affects the performance of
SCTP. For example, using two dual-Opteron 246 connected using a Gbe
network, it was not possible to achieve more than ~730 Mbps, compared to
941 Mbps after disabling SCTP checksums.
Unfortunately, SCTP checksum offloading in NICs is not commonly
available (yet).
By default, checksums are still enabled, of course.
Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instructions for time stamping outgoing packets are take from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set. It's disabled if all of these are off.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now. For headers exposed to
userspace, we must only expose the __ prefixed versions.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a more flexible BSS lookup function so that mac80211 or
other drivers can actually use this for getting the BSS to
connect to.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch introduces cfg80211_unlink_bss, a function to
allow a driver to remove a BSS from the internal list and
make it not show up in scan results any more -- this is
to be used when the driver detects that the BSS is no
longer available.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When cfg80211 users have their own allocated data in the per-BSS
private data, they will need to free this when the BSS struct is
destroyed. Add a free_priv method and fix one place where the BSS
was kfree'd rather than released properly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds basic scan capability to cfg80211/nl80211 and
changes mac80211 to use it. The BSS list that cfg80211 maintains
is made driver-accessible with a private area in each BSS struct,
but mac80211 doesn't yet use it. That's another large project.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The atomic requirement for the TSF callbacks
is outdated. get_tsf() is only called by
ieee80211_rx_bss_info() which is indirectly
called by the work queue ieee80211_sta_work().
In the same context are called several other
non-atomic functions, too.
And the atomic requirement causes problems
for drivers of USB wifi cards.
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The problem is that in_atomic() will return false inside spinlocks if
CONFIG_PREEMPT=n. This will lead to deadlockable GFP_KERNEL allocations
from spinlocked regions.
Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
will instead use GFP_ATOMIC from this callsite. Hence we won't get the
might_sleep() debugging warnings which would have informed us of the buggy
callsites.
Solve both these problems by switching to in_interrupt(). Now, if someone
runs a gfp_any() allocation from inside spinlock we will get the warning
if CONFIG_PREEMPT=y.
I reviewed all callsites and most of them were too complex for my little
brain and none of them documented their interface requirements. I have no
idea what this patch will do.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giving the signal in dB isn't much more useful to userspace
than giving the signal in unspecified units. This removes
some radiotap information for zd1211 (the only driver using
this flag), but it helps a lot for getting cfg80211-based
scanning which won't support dB, and zd1211 being dB is a
little fishy anyway.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The function sock_alloc_send_pskb is completely useless if not
exported since most of the code in it won't be used as is. In
fact, this code has already been duplicated in the tun driver.
Now that we need accounting in the tun driver, we can in fact
use this function as is. So this patch marks it for export again.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
And switch bsockets to atomic_t since it might be changed in parallel.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy <kaber@trash.net> suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.
This patch uses qdisc->flags field of "suspected" child qdisc.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables low-level driver independent debugging of the TSF and remove the driver specific things of ath5k and ath9k from the debugfs.
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using only the RTNL has a number of problems, most notably that
ieee80211_iterate_active_interfaces() and other interface list
traversals cannot be done from the internal workqueue because it
needs to be flushed under the RTNL.
This patch introduces a new mutex that protects the interface list
against modifications. A more detailed explanation is part of the
code change.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a driver is given a wiphy and it wants to get to its private
mac80211 driver area it can use wiphy_to_ieee80211_hw() to get first
to its ieee80211_hw and then access the private structure via hw->priv. The
wiphy_priv() is already being used internally by mac80211 and drivers
should not use this. This can be helpful in a drivers reg_notifier().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows drivers to request strict regulatory settings to
be applied to its devices. This is desirable for devices where
proper calibration and compliance can only be gauranteed for
for the device's programmed regulatory domain. Regulatory
domain settings will be ignored until the device's own
regulatory domain is properly configured. If no regulatory
domain is received only the world regulatory domain will be
applied -- if OLD_REG (default to "US") is not enabled. If
OLD_REG behaviour is not acceptable to drivers they must
update their wiphy with a custom reuglatory prior to wiphy
registration.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers may need more information than just who set the last regulatory domain,
as such lets just pass the last regulatory_request receipt. To do this we need
to move out to headers struct regulatory_request, and enum environment_cap. While
at it lets add documentation for enum environment_cap.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers without firmware can also have custom regulatory maps
which do not map to a specific ISO / IEC alpha2 country code.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This can be used by drivers on the reg_notifier()
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds wiphy_apply_custom_regulatory() to be used by drivers
prior to wiphy registration to apply a custom regulatory domain.
This can be used by drivers that do not have a direct 1-1 mapping
between a regulatory domain and a country.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a flag to notify drivers to start and stop
beaconing when needed, for example, during a scan run. Based
on Sujith's first patch to do the same, but now disables
beaconing for all virtual interfaces while scanning, has a
separate change flag and tracks user-space requests.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since the standards only define 12 legacy rates, 32 is certainly
a sane upper limit and we don't need to use u64 everywhere. Add
sanity checking that no more than 32 rates are registered and
change the variables to u32 throughout.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Then one place can be a static const.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This should help implement suspend/resume in mac80211, these
hooks will be run before the device is suspended and after it
resumes. Therefore, they can touch the hardware as much as
they want to.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A new nl80211 command, NL80211_CMD_SET_MGMT_EXTRA_IE, can be used to
add arbitrary IE data into the end of management frames. The interface
allows extra IEs to be configured for each management frame subtype, but
only some of them (ProbeReq, ProbeResp, Auth, (Re)AssocReq, Deauth,
Disassoc) are currently accepted in mac80211 implementation.
This makes it easier to implement IEEE 802.11 extensions like WPS and
FT that add IE(s) into some management frames. In addition, this can
be useful for testing and experimentation purposes.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For any callbacks in ieee80211_ops, specify what values the return
codes represent. While at it, fix a couple of capitalization and
punctuation differences.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reviewed-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows user space to determine whether a driver supports MFP and
behave properly without having to ask user to configure this in
MFP-optional mode.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If driver/firmware/hardware does not support CCMP for management
frames, it can now request mac80211 to take care of encrypting and
decrypting management frames (when MFP is enabled) in software. The
will need to add this new IEEE80211_KEY_FLAG_SW_MGMT flag when a CCMP
key is being configured for TX side and return the undecrypted frames
on RX side without RX_FLAG_DECRYPTED flag to use software CCMP for
management frames (but hardware for data frames).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add mechanism for managing BIP keys (IGTK) and integrate BIP into the
TX/RX paths.
Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add flags for setting STA entries and struct ieee80211_if_sta to
indicate whether management frame protection (MFP) is used.
Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We add support for multiple drivers to provide a regulatory_hint()
on a system by adding a wiphy specific regulatory domain cache.
This allows drivers to keep around cache their own regulatory domain
structure queried from CRDA.
We handle conflicts by intersecting multiple regulatory domains,
each driver will stick to its own regulatory domain though unless
a country IE has been received and processed.
If the user already requested a regulatory domain and a driver
requests the same regulatory domain then simply copy to the
driver's regd the same regulatory domain and do not call
CRDA, do not collect $200.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This modifies hardware flags for powersave to support three different
flags:
* IEEE80211_HW_SUPPORTS_PS - indicates general PS support
* IEEE80211_HW_PS_NULLFUNC_STACK - indicates nullfunc sending in software
* IEEE80211_HW_SUPPORTS_DYNAMIC_PS - indicates dynamic PS on the device
It also adds documentation for all this which explains how to set the
various flags.
Additionally, it fixes a few things:
* a spot where && was used to test flags
* enable CONF_PS only when associated again
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This will be needed for drivers that set the
IEEE80211_HW_NO_STACK_DYNAMIC_PS flag and still
want to handle dynamic PS.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The channel_type really doesn't need to be the only member in
a new structure, so remove the struct. Additionally, remove
the _CONF_CHANGE_HT flag and use _CONF_CHANGE_CHANNEL when the
channel type changes, since that's enough of a change to require
reprogramming the hardware anyway.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I missed this during review of "mac80211: Fix tx power setting",
the user_power_level shouldn't be available to the driver but
rather be an internal value used to calculate the value for the
driver.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The set_key callback now seems rather odd, passing a MAC address
instead of a station struct, and a local address instead of a
vif struct. Change that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Bob Copeland <me@bobcopeland.com> [ath5k]
Acked-by: Ivo van Doorn <ivdoorn@gmail.com> [rt2x00]
Acked-by: Christian Lamparter <chunkeey@web.de> [p54]
Tested-by: Kalle Valo <kalle.valo@nokia.com> [iwl3945]
Tested-by: Samuel Ortiz <samuel@sortiz.org> [iwl3945]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
power_level in ieee80211_conf is being used for more than one
purpose. It being used as user configured power limit and the
final power limit given to the driver. By doing so, except very
first time, the tx power limit is taken from min(chan->max_power,
local->hw.conf.power_level) which is not what we want. This patch
defines a new memeber in ieee80211_conf which is meant only for
user configured power limit.
Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We can simply use conf_is_ht() check where needed.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In HT capable drivers you often need to check if you
are currently using HT20 or HT40. This adds a few small
helpers to let drivers figure that out.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In commit 9db66bdcc8 (net: convert
TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one
occurrence of rwlock_t to spinlock_t
I believe sizeof(raw_spinlock_t) might be > 0 on !CONFIG_SMP if
CONFIG_DEBUG_SPINLOCK while sizeof(raw_rwlock_t) should be 0 in this
case.
Fortunatly, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and
rwlock_t, but at this might change in the future (being able to debug
spinlocks but not rwlocks for example), better to be safe.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
crc32c algorithm provides a byteswaped result. On little-endian
arches, the result ends up in big-endian/network byte order.
On big-endinan arches, the result ends up in little-endian
order and needs to be byte swapped again. Thus calling cpu_to_le32
gives the right output.
Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv4 multicast routing netns-aware.
Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4.
At the moment, this variable is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv4 multicast routing netns-aware.
Declare IPv multicast routing variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv4.
At the moment, these variables are only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv4 multicast routing netns-aware.
Declare variable cache_resolve_queue_len per-namespace: move it into
struct netns_ipv4.
This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ipmr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc_cache.
Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.
At the moment, cache_resolve_queue_len is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv4 multicast routing netns-aware.
Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array,
and move it to struct netns_ipv4.
At the moment, mfc_cache_array is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast routing netns-aware.
Dynamically allocate interface table vif_table and move it to
struct netns_ipv4, and update MIF_EXISTS() macro.
At the moment, vif_table is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv4 multicast routing netns-aware.
Make IPv4 multicast routing mroute_socket per-namespace,
moves it into struct netns_ipv4.
At the moment, mroute_socket is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.
So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.
Attached picture shows bind() time depending on number of already bound
sockets.
Green area corresponds to the usual binding to zero port process, which
turns on kernel port selection as described above. Red area is the bind
process, when number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). The same exponential growth (hidden by the green
area) before number of ports reaches sysctl limit.
At this time bind hash table has exactly one reuse-enbaled socket in a
bucket, but it is possible that they have different addresses. Actually
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but with time number of port to check
after random start will increase. And that will have exponential growth,
but because of above random selection, not every next port selection
will necessary take longer time than previous. So we have to consider
the area below in the graph (if you could zoom it, you could find, that
there are many different times placed there), so area can hide another.
Blue area corresponds to the port selection optimization.
This is rather simple design approach: hashtable now maintains (unprecise
and racely updated) number of currently bound sockets, and when number
of such sockets becomes greater than predefined value (I use maximum
port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at first matching bucket after random start. Above
limit roughly corresponds to the case, when bind hash table is full and
we turned on mechanism of allowing to bind more reuse-enabled sockets,
so it does not change behaviour of other sockets.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix (delete) more mac80211 kernel-doc:
Warning(linux-2.6.28-git13//include/net/mac80211.h:375): Excess struct/union/enum/typedef member 'retry_count' description in 'ieee80211_tx_info'
Warning(linux-2.6.28-git13//net/mac80211/sta_info.h:308): Excess struct/union/enum/typedef member 'last_txrate' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The kernel-doc was referring to member @debufs_dentry instead of
@debugfs_dentry.
Reported by Randy Dunlap http://marc.info/?l=linux-netdev&m=123147942302885&w=2
As well, escape the colon in the field's text description, as it is
causing the generated text to be erraticly broken up (with paragraphs
moved down). Could not find a reason why it is happening so, even when
other field descriptions use colons and work as expected.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
ioat: fix self test for multi-channel case
dmaengine: bump initcall level to arch_initcall
dmaengine: advertise all channels on a device to dma_filter_fn
dmaengine: use idr for registering dma device numbers
dmaengine: add a release for dma class devices and dependent infrastructure
ioat: do not perform removal actions at shutdown
iop-adma: enable module removal
iop-adma: kill debug BUG_ON
iop-adma: let devm do its job, don't duplicate free
dmaengine: kill enum dma_state_client
dmaengine: remove 'bigref' infrastructure
dmaengine: kill struct dma_client and supporting infrastructure
dmaengine: replace dma_async_client_register with dmaengine_get
atmel-mci: convert to dma_request_channel and down-level dma_slave
dmatest: convert to dma_request_channel
dmaengine: introduce dma_request_channel and private channels
net_dma: convert to dma_find_channel
dmaengine: provide a common 'issue_pending_all' implementation
dmaengine: centralize channel allocation, introduce dma_find_channel
dmaengine: up-level reference counting to the module level
...
Reported by Randy Dunlap from a warning in the v2.6.29 merge window
tree as of 2009/1/8.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO support for IPv6. IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Definitions for the user/kernel API protocol through generic
netlink. User space can copy it verbatim and use it.
Kernel API definition declares the main data types and calls for the
drivers to integrate into the WiMAX stack. Provides usage
documentation.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the general-purpose channel allocation provided by dmaengine.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed. Once the clients are done re-enable module
removal.
Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
is currently done, requires a complicated scheme to avoid cache-line
bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
dma-driver can be gracefully removed ahead of its user (net, md, or
dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
if such an engine were built one day we still would not need to notify
clients of remove events. The driver can simply return NULL to a
->prep() request, something that is much easier for a client to handle.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Thanks to excellent diagnosis by Eduard Guzovsky.
The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up. If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.
IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).
So simply mimick this on the ipv6 side.
Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.
Signed-off-by: Paul Moore <paul.moore@hp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
See commit 1045b03e07 ("netlink: fix
overrun in attribute iteration") for a detailed explanation of why
this patch is necessary.
In short, nlmsg_next() can make "remaining" go negative, and the
remaining >= sizeof(...) comparison will promote "remaining" to an
unsigned type, which means that the expression will evaluate to
true for negative numbers, even though it was not intended.
I put "theoretical" in the title because I have no evidence that
this can actually happen, but I suspect that a crafted netlink
packet can trigger some badness.
Note that the last test, which seemingly has the exact same
problem (also true for nla_ok()), is perfectly OK, since we
already know that remaining is positive.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.
8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)
This option gets the current number of associations that are attached
to a one-to-many style socket. The option value is an uint32_t.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Data Center Bridging (DCB) had no way to know if setstate had failed in the
driver. This patch enables dcb netlink code to handle the status for the DCB
setstate interface. Likewise it allows the driver to return a failed status
if MSI-X isn't enabled.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements dynamic power save for mac80211. Basically it
means enabling power save mode after an idle period. Implementing it
dynamically gives a good compromise of low power consumption and low
latency. Some hardware have support for this in firmware, but some
require the host to do it.
The dynamic power save is implemented by adding an timeout to
ieee80211_subif_start_xmit(). The timeout can be enabled from userspace
with Wireless Extensions. For example, the command below enables the
dynamic power save and sets the time timeout to 500 ms:
iwconfig wlan0 power timeout 500m
Power save now only works with devices which handle power save in firmware.
It's also disabled by default and the heuristics when and how to enable is
considered as a policy decision and will be left for the userspace to handle.
In case the firmware has support for this, drivers can disable this feature
with IEEE80211_HW_NO_STACK_DYNAMIC_PS.
Big thanks to Johannes Berg for the help with the design and code.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds option for HT-enabled drivers to report HT rates
(HT20/HT40, short GI, MCS index) to mac80211. These rates are
currently not in the rate table, so the rate_idx is used to indicate
MCS index.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
HT management is done differently for AP and STA modes, unify
to just the ->config() callback since HT is fundamentally a
PHY property and cannot be per-BSS.
Rename enum nl80211_sec_chan_offset as nl80211_channel_type to denote
the channel type ( NO_HT, HT20, HT40+, HT40- ).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds signal strength and transmission bitrate
to the station_info of nl80211.
Signed-off-by: Henning Rogge <rogge@fgan.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
GPRS TX flow control won't need to lock the underlying socket anymore.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to pad irda_skb_cb in order to keep it safe accross dev_queue_xmit()
calls. This is some ugly and temporary hack triggered by recent qisc code
changes.
Even though it fixes bugzilla.kernel.org bug #11795, it will be replaced by a
proper fix before 2.6.29 is released.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the TCP-specific portion of GRO. The criterion for
merging is extremely strict (the TCP header must match exactly apart
from the checksum) so as to allow refragmentation. Otherwise this
is pretty much identical to LRO, except that we support the merging
of ECN packets.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO support for IPv4.
The criteria for merging is more stringent than LRO, in particular,
we require all fields in the IP header to be identical except for
the length, ID and checksum. In addition, the ID must form an
arithmetic sequence with a difference of one.
The ID requirement might seem overly strict, however, most hardware
TSO solutions already obey this rule. Linux itself also obeys this
whether GSO is in use or not.
In future we could relax this rule by storing the IDs (or rather
making sure that we don't drop them when pulling the aggregate
skb's tail).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the newly introduced sta_notify_ps function,
which can be used to notify the driver about every power state
transition for all associated stations, by integrating its functionality
back into the original sta_notify callback.
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's no driver that actually does fragmentation on the
device, and the callback is buggy (when it returns an error,
mac80211's fragmentation status is changed so reading the
frag threshold from userspace reads the new value despite
the error). Let's just remove it, if we really find some
hardware supporting it we can add it back later.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mention more possible STA entries and document the atomic requirement.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable 'reg_vif_num' per-namespace, moves into struct netns_ipv6.
At the moment, this variable is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.
At the moment, these variables are only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable cache_resolve_queue_len per-namespace: moves it into
struct netns_ipv6.
This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc6_cache.
Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.
At the moment, cache_resolve_queue_len is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6.
At the moment, mfc6_cache_array is only referenced in init_net.
Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates interface table vif6_table and moves it to
struct netns_ipv6, and updates MIF_EXISTS() macro.
At the moment, vif6_table is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Make IPv6 multicast forwarding mroute6_socket per-namespace,
moves it into struct netns_ipv6.
At the moment, mroute6_socket is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes htmldocs warning:
Warning(mac80211.h:379): No description found for parameter 'pad[2]'
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is necessary in order to provide a proper Access point support for p54.
Unfortunately for us, there is no documented way to disable the interfering
power save buffering mechanism in firmware completely.
Therefore we give in and notify the driver through our new sta_notify_ps callback,
so that we can update the filter state.
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
No need to pad the header so no constant needed for that,
no need to carry any version number from netbsd nor CVS
IDs from them.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
further reducing wext code in mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch moves the SIOCGIWNAME handling from mac80211 to cfg80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds new NL80211_CMD_SET_WIPHY attributes
NL80211_ATTR_WIPHY_FREQ and NL80211_ATTR_WIPHY_SEC_CHAN_OFFSET to allow
userspace to set the operating channel (e.g., hostapd for AP mode).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
With the introduction of CONFIG_DYNAMIC_PRINTK_DEBUG it is possible to
allow debugging without having to recompile the kernel. This patch turns
all BT_DBG() calls into pr_debug() to support dynamic debug messages.
As a side effect all CONFIG_BT_*_DEBUG statements are now removed and
some broken debug entries have been fixed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth subsystem was not using the HCI Reset command when doing
device initialization. The Bluetooth 1.0b specification was ambiguous
on how the device firmware was suppose to handle it. Almost every device
was triggering a transport reset at the same time. In case of USB this
ended up in disconnects from the bus.
All modern Bluetooth dongles handle this perfectly fine and a lot of
them actually require that HCI Reset is sent. If not then they are
either stuck in their HID Proxy mode or their internal structures for
inquiry and paging are not correctly setup.
To handle old and new devices smoothly the Bluetooth subsystem contains
a quirk to force the HCI Reset on initialization. However maintaining
such a quirk becomes more and more complicated. This patch turns the
logic around and lets the old devices disable the HCI Reset command.
The only device where the HCI_QUIRK_NO_RESET is still needed are the
original Digianswer devices and dongles with an early CSR firmware.
CSR reported that they fixed this for version 12 firmware. The last
official release of version 11 firmware is build ID 115. The first
version 12 candidate was build ID 117.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an implementation of David Miller's suggested fix in:
https://bugzilla.redhat.com/show_bug.cgi?id=470201
It has been updated to use wait_event() instead of
wait_event_interruptible().
Paraphrasing the description from the above report, it makes sendmsg()
block while UNIX garbage collection is in progress. This avoids a
situation where child processes continue to queue new FDs over a
AF_UNIX socket to a parent which is in the exit path and running
garbage collection on these FDs. This contention can result in soft
lockups and oom-killing of unrelated processes.
Signed-off-by: dann frazier <dannf@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all other gen_estimator functions use bstats and rate_est params
together, and searching for them is optimized now, let's use this also
in gen_estimator_active(). The return type of gen_estimator_active()
is changed to bool, and gen_find_node() parameters to const, btw.
In tcf_act_police_locate() a check for ACT_P_CREATED is added before
calling gen_estimator_active().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using one atomic_t per protocol, use a percpu_counter
for "sockets_allocated", to reduce cache line contention on
heavy duty network servers.
Note : We revert commit (248969ae31
net: af_unix can make unix_nr_socks visbile in /proc),
since it is not anymore used after sock_prot_inuse_add() addition
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Found that while trying average rate policing, it was possible to
request average rate policing without a rate estimator. This results
in no policing which is harmless but incorrect.
Since policing could be setup in two steps, need to check
in the kernel.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make
net.core.xfrm_aevent_etime
net.core.xfrm_acq_expires
net.core.xfrm_aevent_rseqth
net.core.xfrm_larval_drop
sysctls per-netns.
For that make net_core_path[] global, register it to prevent two
/proc/net/core antries and change initcall position -- xfrm_init() is called
from fs_initcall, so this one should be fs_initcall at least.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SA and SPD flush are executed with NULL SA and SPD respectively, for
these cases pass netns explicitly from userspace socket.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()
[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.
Take it from socket or netdevice. Stub DECnet to init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per-netns hashes are independently resizeable.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disallow spurious wakeups in __xfrm_lookup().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
State GC is per-netns, and this is part of it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().
To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of this is implicit passing which netns's hashes should be resized.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since hashtables are per-netns, they can be independently resized.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid unnecessary complications with passing netns around.
* set once, very early after allocating
* once set, never changes
For a while create every xfrm_state in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds API to cfg80211 to allow wireless drivers to inform
us if their firmware can handle regulatory considerations *and*
they cannot map these regulatory domains to an ISO / IEC 3166
alpha2. In these cases we skip the first regulatory hint instead
of expecting the driver to build their own regulatory structure,
providing us with an alpha2, or using the reg_notifier().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds country IE parsing to mac80211 and enables its usage
within the new regulatory infrastructure in cfg80211. We parse
the country IEs only on management beacons for the BSSID you are
associated to and disregard the IEs when the country and environment
(indoor, outdoor, any) matches the already processed country IE.
To avoid following misinformed or outdated APs we build and use
a regulatory domain out of the intersection between what the AP
provides us on the country IE and what CRDA is aware is allowed
on the same country.
A secondary device is allowed to follow only the same country IE
as it make no sense for two devices on a system to be in two
different countries.
In the case the AP is using country IEs for an incorrect country
the user may help compliance further by setting the regulatory
domain before or after the IE is parsed and in that case another
intersection will be performed.
CONFIG_WIRELESS_OLD_REGULATORY is supported but requires CRDA
present.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
fix this warning:
net/netfilter/nf_conntrack_proto_tcp.c: In function \u2018tcp_in_window\u2019:
net/netfilter/nf_conntrack_proto_tcp.c:491: warning: unused variable \u2018net\u2019
net/netfilter/nf_conntrack_proto_tcp.c: In function \u2018tcp_packet\u2019:
net/netfilter/nf_conntrack_proto_tcp.c:812: warning: unused variable \u2018net\u2019
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.
This approach has a number of benefits:
1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
of TCP has to be aware of this (this was not the case with
some other approach that was, well, quite intrusive all
around).
4) Clean_rtx_queue can release most of the pages using single
put_page instead of previous PAGE_SIZE/mss+1 calls
In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.
TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).
I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
1) ambiguous => no sensible measurement can be taken anyway
2) non-ambiguous is due to reordering => having the timestamp
of the last segment there is just skewing things more off
than does some good since the ack got triggered by one of
the holes (besides some substle issues that would make
determining right hole/skb even harder problem). Anyway,
it has nothing to do with this change then.
I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.
In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.
Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.
TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)
My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...
[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]
...To counter that, I gave up on the fifth advantage:
5) When growing previous SACK block, less allocs for new skbs
are done, basically a new alloc is needed only when new hole
is detected and when the previous skb runs out of frags space
...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.
With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:
TCPSackShifted 398
TCPSackMerged 877
TCPSackShiftFallback 320
TCPSACKCOLLAPSEFALLBACKGSO 0
TCPSACKCOLLAPSEFALLBACKSKBBITS 0
TCPSACKCOLLAPSEFALLBACKSKBDATA 0
TCPSACKCOLLAPSEFALLBACKBELOW 0
TCPSACKCOLLAPSEFALLBACKFIRST 1
TCPSACKCOLLAPSEFALLBACKPREVBITS 318
TCPSACKCOLLAPSEFALLBACKMSS 1
TCPSACKCOLLAPSEFALLBACKNOHEAD 0
TCPSACKCOLLAPSEFALLBACKSHIFT 0
TCPSACKCOLLAPSENOOPSEQ 0
TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
TCPSACKCOLLAPSENOOPSMALLLEN 0
TCPSACKCOLLAPSEHOLE 12
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.
The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.
We define a large value :
#define LISTENING_NULLS_BASE (1U << 29)
So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the slub allocator is used, kmem_cache_create() may merge two or more
kmem_cache's into one but the cache name pointer is not updated and
kmem_cache_name() is no longer guaranteed to return the pointer passed
to the former function. This patch stores the kmalloc'ed pointers in the
corresponding request_sock_ops and timewait_sock_ops structures.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can avoid some useless instructions if !CONFIG_NET_NS
Because of RCU, we use INET_MATCH or INET_TW_MATCH twice for the found
socket, so thats six instructions less per incoming TCP packet.
Yet another tbench speedup :)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds #include <linux/timer.h> in lib80211.h to avoid
these compilation erros.
> In file included from /work/src/wireless-testing/net/wireless/lib80211.c:24:
> /work/src/wireless-testing/include/net/lib80211.h:113: error: field
> 'crypt_deinit_timer' has incomplete type
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_init':
> /work/src/wireless-testing/net/wireless/lib80211.c:83: error: implicit
> declaration of function 'setup_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_free':
> /work/src/wireless-testing/net/wireless/lib80211.c:95: error: implicit
> declaration of function 'del_timer_sync'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_deinit_handler':
> /work/src/wireless-testing/net/wireless/lib80211.c:157: error:
> implicit declaration of function 'add_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_delayed_deinit':
> /work/src/wireless-testing/net/wireless/lib80211.c:182: error:
> implicit declaration of function 'timer_pending'
> make[3]: *** [net/wireless/lib80211.o] Error 1
> make[2]: *** [net/wireless] Error 2
> make[1]: *** [net] Error 2
> make: *** [sub-make] Error 2
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise, the BUILD_BUG_ON calls in ieee80211_tx_info_clear_status can
fail on some architectures.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These bits are shared already between ipw2x00 and hostap, and could
probably be shared both more cleanly and with other drivers. This
commit simply relocates the code to lib80211 and adjusts the drivers
appropriately.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Delete kernel-doc struct descriptions for fields that don't exist:
Warning(include/net/mac80211.h:1263): Excess struct/union/enum/typedef member 'conf_ht' description in 'ieee80211_ops'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'addr' description in 'sta_info'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'aid' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On ARM alignment is done slightly different from other architectures.
struct ieee80211_tx_rate is aligned to word size, even though it only has 3
single-byte members, which triggers the BUILD_BUG_ON in
ieee80211_tx_info_clear_status
This patch marks the struct ieee80211_tx_rate as packed, so that ARM
behaves like the other architectures.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adds an interface to configure the Backward Congestion Notification
(BCN) feature. In a BCN capabale network, congestion notifications
from congested points out in the network can cause the end station
limit the rate of a given traffic flow.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a netlink interface for Data Center Bridging (DCB) to get and set
the enable state of the Priority Flow Control (PFC) feature.
Primarily, this is a way to turn off PFC in the driver while DCB
remains enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds interface for Data Center Bridging (DCB) to query (and set if
supported) the number of traffic classes currently supported by the
device for the two (DCB) features: priority groups (PG) and priority
flow control (PFC).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds to the netlink interface for Data Center Bridging (DCB), allowing
the DCB capabilities supported by a device to be queried.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.
/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.
This should speedup writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prepares RCU migration of listening_hash table for
TCP/DCCP protocols.
listening_hash table being small (32 slots per protocol), we add
a spinlock for each slot, instead of a single rwlock for whole table.
This should reduce hold time of readers, and writers concurrency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first argument to csum_partial is const void *
casts to char/u8 * are not necessary
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before ieee80211_notify_mac() was added, it was presented with the
use case of using it to tell mac80211 that the association may
have been lost because the firmware crashed/reset.
Since then, it has also been used by iwlwifi to (slightly) speed
up re-association after resume, a workaround around the fact that
mac80211 has no suspend/resume handling yet. It is also not used
by any other drivers, so clearly it cannot be necessary for "good
enough" suspend/resume.
Unfortunately, the callback suffers from a severe problem: It only
works for station mode. If suspend/resume happens while in IBSS or
any other mode (but station), then the callback is pointless.
Recently, it has created a number of locking issues, first because
it required rtnl locking rather than RCU due to calling sleeping
functions within the critical section, and now because it's called
by iwlwifi from the mac80211 workqueue that may not use the rtnl
because it is flushed under rtnl.
(cf. http://bugzilla.kernel.org/show_bug.cgi?id=12046)
I think, therefore, that we should take a step back, remove it
entirely for now and add the small feature it provided properly.
For suspend and resume we will need to introduce new hooks, and for
the case where the firmware was reset the driver will probably
simply just pretend it has done a suspend/resume cycle to get
mac80211 to reprogram the hardware completely, not just try to
connect to the current AP again in station mode. When doing so, we
will need to take into account locking issues and possibly defer
to schedule_work from within mac80211 for the resume operation,
while the suspend operation must be done directly.
Proper suspend/resume should also not necessarily try to reconnect
to the current AP, the time spent in suspend may have been short
enough to not be disconnected from the AP, mac80211 will detect
that the AP went out of range quickly if it did, and if the
association is lost then the AP will disassoc as soon as a data
frame is sent. We might also take into account WWOL then, and
have mac80211 program the hardware into such a mode where it is
available and requested.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
net/netfilter/nfnetlink_log.c:537:1: warning: symbol 'nfulnl_log_packet' was not declared. Should it be static?
Including the proper header also revealed an incorrect prototype.
Signed-off-by: Patrick McHardy <kaber@trash.net>
As for now, the creation and update of conntracks via ctnetlink do not
propagate an event to userspace. This can result in inconsistent situations
if several userspace processes modify the connection tracking table by means
of ctnetlink at the same time. Specifically, using the conntrack command
line tool and conntrackd at the same time can trigger unconsistencies.
This patch also modifies the event cache infrastructure to pass the
process PID and the ECHO flag to nfnetlink_send() to report back
to userspace if the process that triggered the change needs so.
Based on a suggestion from Patrick McHardy.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds module loading for helpers via ctnetlink.
* Creation path: We support explicit and implicit helper assignation. For
the explicit case, we try to load the module. If the module is correctly
loaded and the helper is present, we return EAGAIN to re-start the
creation. Otherwise, we return EOPNOTSUPP.
* Update path: release the spin lock, load the module and check. If it is
present, then return EAGAIN to re-start the update.
This patch provides a refactorized function to lookup-and-set the
connection tracking helper. The function removes the exported symbol
__nf_ct_helper_find as it has not clients anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the macro MODULE_ALIAS_NFCT_HELPER that defines a
way to provide generic and persistent aliases for the connection
tracking helpers.
This next patch requires this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Simply delete ops from list and let list debugging do the job.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As found in the past (commit f1dd9c379c
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.
We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.
for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0
As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)
Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.
"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.
This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.
Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.
__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)
Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a straightforward patch, using hlist_nulls infrastructure.
RCUification already done on UDP two weeks ago.
Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.
Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix this warning:
net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used
this is a lockdep macro problem in the !LOCKDEP case.
We cannot convert it to an inline because the macro works on multiple types,
but we can mark the parameter used.
[ also clean up a misaligned tab in sock_lock_init_class_and_name() ]
[ also remove #ifdefs from around af_family_clock_key strings - which
were certainly added to get rid of the ugly build warnings. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.
The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The urg_ptr field is not used anywhere and is merely confusing.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Every user is under CONFIG_NET_DMA already, so ifdef field as well.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using read_pnet() and write_pnet() in neighbour code ease the reading
of code.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can shrink size of "struct inet_bind_bucket" by 50%, using
read_pnet() and write_pnet()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces two helpers that deal with reading and writing
struct net pointers in various network structures.
Their implementation depends on CONFIG_NET_NS
For symmetry, both functions work with "struct net **pnet".
Their usage should reduce the number of #ifdef CONFIG_NET_NS,
without adding many helpers for each network structure
that hold a "struct net *pointer"
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
->pde isn't actually needed, since name is stashed in ->id.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure regulatory converstion macros safely accept
multiple arguments and make REG_RULE() use them.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new attribute, NL80211_ATTR_WIPHY_TXQ_PARAMS, that can be used with
NL80211_CMD_SET_WIPHY for userspace (e.g., hostapd) to set TX queue
parameters (txop, cwmin, cwmax, aifs).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new attribute, NL80211_ATTR_BSS_BASIC_RATES, that can be used with
NL80211_CMD_SET_BSS for userspace (e.g., hostapd) to set which rates are
in the basic rate set.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds a helper function that, given a bitmap of basic
rates and a bitrate returns the response rate for this rate.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Send a notification to the driver on succesful
reception of an ADDBA response, add IEEE80211_AMPDU_TX_RESUME
for this purpose.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove the SSID from the driver API since now there is no
driver that requires knowing the SSID and I think it's
unlikely that any hardware design that does require the
SSID will play well with mac80211.
This also removes support for setting the SSID in master
mode which will require a patch to hostapd to not try.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Previously I assumed that the receive queues of candidates don't
change during the GC. This is only half true, nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.
This can result in inc_inflight_move_tail() erronously increasing the
"inflight" counter for a unix socket for which dec_inflight() wasn't
previously called. This in turn can trigger the "BUG_ON(total_refs <
inflight_refs)" in a later garbage collection run.
Fix this by only manipulating the "inflight" counter for sockets which
are candidates themselves. Duplicating the file references in
unix_attach_fds() is also needed to prevent a socket becoming a
candidate for GC while the skb that contains it is not yet queued.
Reported-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.
- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
IPv6-specific routines. Both regular bonds and VLANs over bonds
are supported.
- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
IPv6 Neighbor Advertisements that are sent on a failover event.
Default is 1.
- Creates two new IPv6 neighbor discovery functions:
ndisc_build_skb()
ndisc_send_skb()
These were required to support VLANs since we have to be able to
add the VLAN id to the skb since ndisc_send_na() and friends
shouldn't be asked to do this. These two routines are basically
__ndisc_send() split into two pieces, in a slightly different order.
- Updates Documentation/networking/bonding.txt and bumps the rev of bond
support to 3.4.0.
On failover, this new code will generate one packet:
- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
learn that the address has moved to the new slave.
Testing has shown that sending just the NA results in pretty good
behavior when in active-back mode, I saw no lost ping packets for example.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
A packet dequeued and stored as gso_skb in qdisc_peek_dequeued() should
be seen as part of the queue for sch->q.qlen queries until it's really
dequeued with qdisc_dequeue_peeked(), so qlen needs additional updating
in these functions. (Updating qstats.backlog shouldn't matter here.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies xt_NFLOG to suppress the call to nf_log_packet()
function. The call of this wrapper in xt_NFLOG was causing NFLOG to
use the first initialized module. Thus, if ipt_ULOG is loaded before
nfnetlink_log all NFLOG rules are treated as plain LOG rules.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the 'supports_ipv6' scheduler flag since all schedulers now
support IPv6.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One parameter wasn't described and one I forgot to update when
renaming it; also update TBDs in sta_info.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The code needs to be split out and cleaned up, so as a
first step remove the capability, to add it back in a
subsequent patch as a separate function. Also remove the
publically facing return value of the function and the
wiphy argument. A number of internal functions go from
being generic helpers to just being used for alpha2
setting.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The regdom struct is given to the core, so it might as well
free it in error conditions.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Wireless HW without any dedicated queues for aggregation
do not need the ampdu_queues mechanism present right now
in mac80211. Since mac80211 is still incomplete wrt TX MQ
changes, do not allow aggregation sessions for drivers that
set ampdu_queues.
This is only an interim hack until Intel fixes the requeue issue.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is unnecessary and of questionable value. Also remove
is_empty_ssid, as it is also unnecessary.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This function requires an internal lock to be held, so it cannot
be published to other modules in the kernel.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>