linux

Author	SHA1	Message	Date
Linas Vepstas	ac325acd50	[PATCH] powerpc/pseries: clear PCI failure counter if no new failures The current PCI error recovery system keeps track of the number of PCI card resets, and refuses to bring a card back up if this number is too large. The goal of doing this was to avoid an infinite loop of resets if a card is obviously dead. However, if the failures are rare, but the machine has a high uptime, this mechanism might still be triggered; this is too harsh. This patch will avoids this problem by decrementing the fail count after an hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it will continue to be reset indefinitely. If it's failure rate is greater than that, it will be taken off-line permanently. This patch is larger than it might otherwise be because it changes indentation by removing a pointless while-loop. The while loop is not needed, as the handler is invoked once fo each event (by schedule_work()); the loop is leftover cruft from an earlier implementation. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-22 18:46:13 +10:00
Linas Vepstas	a219be2cf4	[PATCH] powerpc/pseries: fix device name printing, again. The recent patch to print device names in EEH reset messages was lacking ... this patch works better. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:37:02 +11:00
Linas Vepstas	8df83028cf	[PATCH] powerpc/pseries: print message if EEH recovery fails The current code prints an ambiguous message if the recovery of a failed PCI device fails. Give this special case its own unique message. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:35:01 +11:00
Linas Vepstas	b4f382a3e5	[PATCH] powerpc/pseries: Cleanup device name printing. This avoids printk'ing a NULL string. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:46 +11:00
Olaf Hering	273d280381	[PATCH] powerpc: fix NULL pointer in handle_eeh_events This patch fixes a crash in handle_eeh_events, but ethtool -t still doesnt work right. ... pepino:~ # cpu 0x3: Vector: 300 (Data Access) at [c00000005192bbe0] pc: c00000000004a380: .handle_eeh_events+0xe0/0x23c lr: c00000000004a374: .handle_eeh_events+0xd4/0x23c sp: c00000005192be60 msr: 9000000000009032 dar: 268 dsisr: 40000000 current = 0xc0000001fe7bf1a0 paca = 0xc00000000048b280 pid = 16322, comm = eehd enter ? for help [c00000005192bf00] c00000000004a808 .eeh_event_handler+0xcc/0x130 [c00000005192bf90] c000000000025e00 .kernel_thread+0x4c/0x68 ... (none):/# /usr/sbin/ethtool -i eth0 driver: e100 version: 3.5.10-k2-NAPI firmware-version: N/A bus-info: 0000:21:01.0 (none):/# /usr/sbin/ethtool -t eth0 Call Trace: [C00000000F8DEFF0] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000F8DF0A0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000F8DF150] [C000000000049E58] .eeh_check_failure+0x10c/0x138 [C00000000F8DF1E0] [C0000000002DFDB0] .e100_hw_reset+0x70/0xf4 [C00000000F8DF270] [C0000000002E1BBC] .e100_hw_init+0x2c/0x260 [C00000000F8DF310] [C0000000002E2464] .e100_loopback_test+0x8c/0x220 [C00000000F8DF3C0] [C0000000002E28DC] .e100_diag_test+0xdc/0x16c [C00000000F8DF490] [C000000000420BE0] .dev_ethtool+0xf24/0x14f8 [C00000000F8DF8F0] [C00000000041F4A8] .dev_ioctl+0x5cc/0x740 [C00000000F8DFA20] [C00000000040FEFC] .sock_ioctl+0x3d0/0x404 [C00000000F8DFAC0] [C0000000000D513C] .do_ioctl+0x68/0x108 [C00000000F8DFB50] [C0000000000D56B0] .vfs_ioctl+0x4d4/0x510 [C00000000F8DFC10] [C0000000000D5740] .sys_ioctl+0x54/0x94 [C00000000F8DFCC0] [C0000000000FB6EC] .ethtool_ioctl+0x11c/0x150 [C00000000F8DFD60] [C0000000000F7E40] .compat_sys_ioctl+0x338/0x3bc [C00000000F8DFE30] [C00000000000871C] syscall_exit+0x0/0x40 EEH: Detected PCI bus error on device 0000:21:01.0 EEH: This PCI device has failed 1 times since last reboot: <NULL> - modprobe: FATAL: Could not load /lib/modules/2.6.16-rc4-git7/modules.dep: No such file or directory Cannot get strings: No such device (none):/# (none):/# EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 (none):/# Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> and so on Signed-off-by: Olaf Hering <olh@suse.de> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-28 16:25:54 +11:00
Al Viro	d04e4e115b	[PATCH] eeh_driver NULL noise removal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-02-07 20:58:33 -05:00
Paul Mackerras	18eb3b398d	powerpc: Fix up some compile errors in the PCI error recovery code <asm/systemcfg.h> is gone now, and the PCI error recovery constants in include/linux/pci.h changed their names in the process of getting accepted. Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 5a2516156c591fc3d2059fbd93f97e15eb6010d6 commit)	2006-01-10 15:32:31 +11:00
Linas Vepstas	3914ac7b0e	[PATCH] powerpc: handle multifunction PCI devices properly 239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE "partition endpoint" and config address, ad thus any EEH events will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)	2006-01-10 15:30:23 +11:00
Linas Vepstas	b6495c0c8f	[PATCH] powerpc: Don't continue with PCI Error recovery if slot reset failed. 238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)	2006-01-10 15:30:14 +11:00
Linas Vepstas	9fb40eb883	[PATCH] powerpc: Remove duplicate code 234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)	2006-01-10 15:29:33 +11:00
Linas Vepstas	77bd741561	[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)	2006-01-10 15:28:32 +11:00

11 Commits