Post by Mike ChristieDid you bring the target back up and if so did you do it with the same
target name?
Sorry for the delay in getting back, been travelling on business. But thanks
very much for the reply!
Yes, I did bring the target back up and with the same name. Although some of
the LUNs have moved around as I rebuilt the machine to match its partner which
the VMs RAID 1 it against.
Post by Mike ChristieWhat is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?
Looks like 120 but just in case, here's the entire contents:
scsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
See anything amiss? I now have around 8 processes stuck on this system. I'm
going to have to reboot it this weekend to clear up the issue but I would
really like to find out what is really going on and how to avoid it before
taking such measures.
Post by Mike ChristieIt sounds like the scsi scan IO is stuck on a target that disappeared
and never came back, or it is a Centos scsi layer bug. Could you send
the /var/log/messages.
The entire file is rather large but here are some of the messages relevant to
iscsi:
Jul 4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
<skip many of these no route to host messages, happened while I was rebuilding the target with ip 10.0.1.11>
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 secs
Jul 4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 secs
Jul 4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 secs
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
<skip lots of these connection refused messages)
Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
<skip lots of these>
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 attempts)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 attempts)
Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 attempts)
Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP
Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type 12
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0
Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type 12
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0
Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP
Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type 12
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0
Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
<skip lots of these>
Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0
Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0
Thanks for any insight you can provide!
--
Tracy Reed