Discussion:
Rescan LUNs hangs
Tracy Reed
2013-07-24 00:32:52 UTC
Permalink
Hello all,

I am running iscsi-initiator-utils-6.2.0.872-13.el5 on CentOS release 5.4 (will
patch up at next reboot) initiator to scsi-target-utils-1.0.24-2.el6.x86_64 on
CentOS 6.4.

I have Xen running on the initiator machine with luns from the target machine
as storage. I actually have two target machines and do software raid1 in the
VMs.

I needed to upgrade one of the target machines so I split the mirrors in the
VMs and shutdown the target machine, upgraded some disk, reinstalled new OS
etc. Now when I do /sbin/iscsiadm -m node -R on the initiator machine it hangs.
The process seems uninterruptable. I can't kill -9 it and it spins using 100%
of a cpu. I think perhaps before I shutdown the target machine I should have
logged out the initiator from it. Now I have a bunch of hung processes and I
can't access new disk volumes because I can't rescan the LUNs.

How do I clear this up, preferably without rebooting the initiator?

Thanks!
--
Tracy Reed
Mike Christie
2013-07-25 00:15:16 UTC
Permalink
Post by Tracy Reed
Hello all,
I am running iscsi-initiator-utils-6.2.0.872-13.el5 on CentOS release 5.4 (will
patch up at next reboot) initiator to scsi-target-utils-1.0.24-2.el6.x86_64 on
CentOS 6.4.
I have Xen running on the initiator machine with luns from the target machine
as storage. I actually have two target machines and do software raid1 in the
VMs.
I needed to upgrade one of the target machines so I split the mirrors in the
VMs and shutdown the target machine, upgraded some disk, reinstalled new OS
etc. Now when I do /sbin/iscsiadm -m node -R on the initiator machine it hangs.
Did you bring the target back up and if so did you do it with the same
target name?
Post by Tracy Reed
The process seems uninterruptable. I can't kill -9 it and it spins using 100%
of a cpu. I think perhaps before I shutdown the target machine I should have
logged out the initiator from it. Now I have a bunch of hung processes and I
can't access new disk volumes because I can't rescan the LUNs.
What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?

It sounds like the scsi scan IO is stuck on a target that disappeared
and never came back, or it is a Centos scsi layer bug. Could you send
the /var/log/messages.
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/groups/opt_out.
Tracy Reed
2013-08-01 18:43:01 UTC
Permalink
Post by Mike Christie
Did you bring the target back up and if so did you do it with the same
target name?
Sorry for the delay in getting back, been travelling on business. But thanks
very much for the reply!

Yes, I did bring the target back up and with the same name. Although some of
the LUNs have moved around as I rebuilt the machine to match its partner which
the VMs RAID 1 it against.
Post by Mike Christie
What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?
Looks like 120 but just in case, here's the entire contents:

scsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes

See anything amiss? I now have around 8 processes stuck on this system. I'm
going to have to reboot it this weekend to clear up the issue but I would
really like to find out what is really going on and how to avoid it before
taking such measures.
Post by Mike Christie
It sounds like the scsi scan IO is stuck on a target that disappeared
and never came back, or it is a Centos scsi layer bug. Could you send
the /var/log/messages.
The entire file is rather large but here are some of the messages relevant to
iscsi:

Jul 4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
<skip many of these no route to host messages, happened while I was rebuilding the target with ip 10.0.1.11>
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 secs
Jul 4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 secs
Jul 4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 secs
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
<skip lots of these connection refused messages)
Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
<skip lots of these>
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 attempts)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 attempts)
Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 attempts)
Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP
Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type 12
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0
Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type 12
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0
Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP
Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type 12
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0
Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
<skip lots of these>
Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0
Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0

Thanks for any insight you can provide!
--
Tracy Reed
Mike Christie
2013-08-03 19:50:23 UTC
Permalink
Post by Tracy Reed
Post by Mike Christie
Did you bring the target back up and if so did you do it with the same
target name?
Sorry for the delay in getting back, been travelling on business. But thanks
very much for the reply!
Yes, I did bring the target back up and with the same name. Although some of
the LUNs have moved around as I rebuilt the machine to match its partner which
the VMs RAID 1 it against.
Post by Mike Christie
What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?
scsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
See anything amiss? I now have around 8 processes stuck on this system. I'm
going to have to reboot it this weekend to clear up the issue but I would
really like to find out what is really going on and how to avoid it before
taking such measures.
Post by Mike Christie
It sounds like the scsi scan IO is stuck on a target that disappeared
and never came back, or it is a Centos scsi layer bug. Could you send
the /var/log/messages.
The entire file is rather large but here are some of the messages relevant to
Jul 4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
<skip many of these no route to host messages, happened while I was rebuilding the target with ip 10.0.1.11>
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 secs
Jul 4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 secs
Jul 4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 secs
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
<skip lots of these connection refused messages)
Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
<skip lots of these>
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 attempts)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 attempts)
Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 attempts)
Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP
Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type 12
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0
Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type 12
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0
Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP
Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type 12
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0
Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
<skip lots of these>
Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0
Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0
I am not sure. The problem is that the code in Centos is older and I do
not know what other patches they have put in there.

Did you delete the luns from userspace at some point? Run the scsi
rescan bug script or write to the remote/delete sysfs file or did you
run the iscsi logout command? Did you do some login command after you
did your management stuff? It looks like you created new sessions but
the old ones were left there?

Are the /dev/sdX nodes for the old sessions still there?

What does

iscsiadm -m session -P 3

report?

It looks like you might be hitting a bug in older kernels where when
devices/sessions are removed the scsi layer could cause hangs. The iscsi
rescan command then gets caught on that.
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/groups/opt_out.
Loading...