Sunday, June 10, 2012

Oracle VM at home

Thanks to Yury Velikanov's posts about Oracle VM Server, I started my journey with that tool. First of all, installing and configuring Oracle VM 3.1 and Oracle VM Manager on one box went well, and I was able to connect to it via a browser (see Yury's posts for details). I started configuring the environment, but then hit the first glitch. Oracle VM can create local storage on a whole disk only (correct me if I'm wrong), but I had installed it on my test PC, which was already running other Linux distributions. I had one free partition (not a whole disk) and there was no simple way to add it.

Adding a file system as a repository using NFS on the loopback interface

Oracle VM supports NFS as well as iSCSI/FC disks, so I decided to use NFS to present the free partition as a repository. OVM is based on the OEL distribution and already had an NFS server installed. Here is my configuration:
[root@OVMiddleEarth ~]# cat /etc/fstab
…
/dev/sdb2  /nfs_pool  ext3    defaults       0 0

[root@OVMiddleEarth ~]# cat /etc/exports
/nfs_pool *(rw,insecure,no_root_squash,sync)

[root@OVMiddleEarth ~]# chkconfig --level 2345 nfs on
[root@OVMiddleEarth ~]# chkconfig --list nfs
nfs             0:off   1:off   2:on    3:on    4:on    5:on    6:off
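If you want to script the NFS part, a minimal sketch could look like the one below. The helper names (append_once, configure_nfs_pool) and the root-directory argument are my own assumptions, added so the function can be tried against a scratch directory before touching the real /etc:

```shell
# Append a line to a file unless exactly that line is already present.
append_once() {
    local line=$1 file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: add the fstab and exports entries shown above under
# ROOT (pass "/" on the real host). Safe to run more than once.
configure_nfs_pool() {
    local root=${1:-/}
    local dev=/dev/sdb2 mnt=/nfs_pool
    mkdir -p "$root/etc"
    append_once "$dev  $mnt  ext3    defaults       0 0" "$root/etc/fstab"
    append_once "$mnt *(rw,insecure,no_root_squash,sync)" "$root/etc/exports"
}
```

On the host itself that would be `configure_nfs_pool /`, followed by `mkdir -p /nfs_pool`, `mount /nfs_pool`, `service nfs start` and `exportfs -ra`; `showmount -e localhost` should then list /nfs_pool.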

So far so good: I was able to add the local NFS server as a repository for Oracle VM, but within the next 5 minutes I hit another issue: you can import Assemblies (pre-configured machines) via the http(s)/ftp protocols only.

Adding a local Apache (httpd) server

OK, let's add Apache to Oracle VM. I had already added the yum repository from OEL 5.8 (thanks, Yury!), so installing the httpd package was simple.
[root@OVMiddleEarth ~]# ls /etc/yum.repos.d/
public-yum-el5.repo
[root@OVMiddleEarth ~]# yum install httpd
Then a simple configuration change to disable the welcome page:
[root@OVMiddleEarth ~]# cat /etc/httpd/conf.d/welcome.conf
#
# This configuration file enables the default "Welcome"
# page if there is no default index page present for
# the root URL.  To disable the Welcome page, comment
# out all the lines below.
#
#<LocationMatch "^/+$">
#    Options -Indexes
#    ErrorDocument 403 /error/noindex.html
#</LocationMatch>
Then I moved my assemblies into /var/www/html:
[root@OVMiddleEarth ~]# ls -l /var/www/html/
total 571644
-rw-r--r-- 1 root root 584785920 Jan 20 22:51 OVM_OL6U1_x86_64_PVM.ova
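Before pointing Oracle VM Manager at the web server, it can help to list the exact URLs to import. A small sketch, assuming the assemblies sit in the default document root; list_assembly_urls is a hypothetical helper, not part of Oracle VM:

```shell
# Print the http:// URL for every .ova file in DOCROOT, as seen from HOST.
list_assembly_urls() {
    local docroot=${1:-/var/www/html} host=${2:-localhost} f
    for f in "$docroot"/*.ova; do
        # with no .ova files the glob stays literal, so skip it
        [ -e "$f" ] || continue
        printf 'http://%s/%s\n' "$host" "${f##*/}"
    done
}
```

For the file above, `list_assembly_urls /var/www/html localhost` prints http://localhost/OVM_OL6U1_x86_64_PVM.ova, and `curl -sI` on that URL should return an HTTP 200 header if Apache is serving it.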
That was simple as well, and I was ready to create a virtual machine. I started importing assemblies from the local HTTP server (http://localhost/OVM_OL6U1_x86_64_PVM.ova), but it hung after a minute or so. I waited a while but nothing happened, so I started digging. First of all, there was no disk activity at all. Hmm, I know this one quite well: D-state. A check with
ps -eo pid,stat,wchan,cmd | awk '$2 ~ /^D/'
showed processes waiting in the DN state, so it looked like a problem with the NFS server. I checked /var/log/messages and this is what I found:
Jun  4 13:24:21 OVMiddleearth kernel: INFO: task nfsd:3639 blocked for more than 120 seconds.
Jun  4 13:24:21 OVMiddleearth kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  4 13:24:21 OVMiddleearth kernel: nfsd            D 0000000000000000     0  3639      2 0x00000000
Jun  4 13:24:21 OVMiddleearth kernel:  ffff8800a7e49be0 0000000000000246 00080094a7e49b60 00000000000121c0
Jun  4 13:24:21 OVMiddleearth kernel:  ffff8800a7e46500 ffff8800b0cfc0c0 ffff8800af48ac80 ffff8800af7fca80
Jun  4 13:24:21 OVMiddleearth kernel:  0000000000000000 ffff8800aa54c540 ffffffff81009d5d ffff8800a7e49bc0
Jun  4 13:24:21 OVMiddleearth kernel: Call Trace:
Jun  4 13:24:21 OVMiddleearth kernel:  [] ? xen_force_evtchn_callback+0xd/0x10
Jun  4 13:24:21 OVMiddleearth kernel:  [] ? check_events+0x12/0x20
Jun  4 13:24:21 OVMiddleearth kernel:  [] ? ext3_mark_dquot_dirty+0x60/0x60 [ext3]
Jun  4 13:24:21 OVMiddleearth kernel:  [] ? xen_restore_fl_direct_reloc+0x4/0x4
Jun  4 13:24:21 OVMiddleearth kernel:  [] ? kmem_cache_alloc+0xab/0x190
Jun  4 13:24:21 OVMiddleearth kernel:  [] schedule+0x45/0x60
Jun  4 13:24:24 OVMiddleearth kernel:  [] __mutex_lock_slowpath+0xd6/0x150
Jun  4 13:24:26 OVMiddleearth kernel:  [] ? dquot_file_open+0x4a/0x50
Jun  4 13:24:30 OVMiddleearth kernel:  [] mutex_lock+0x2b/0x50
Jun  4 13:24:32 OVMiddleearth kernel:  [] ima_rdwr_violation_check+0x67/0x100
Jun  4 13:24:33 OVMiddleearth kernel:  [] ima_file_check+0x20/0x50
Jun  4 13:24:40 OVMiddleearth kernel:  [] nfsd_open+0x121/0x170 [nfsd]
Jun  4 13:24:44 OVMiddleearth kernel:  [] nfsd_write+0xb3/0x100 [nfsd]
Jun  4 13:24:46 OVMiddleearth kernel:  [] nfsd3_proc_write+0x103/0x140 [nfsd]
Jun  4 13:24:50 OVMiddleearth kernel:  [] nfsd_dispatch+0xbb/0x220 [nfsd]
Jun  4 13:24:51 OVMiddleearth kernel:  [] svc_process_common+0x324/0x650 [sunrpc]
Jun  4 13:25:05 OVMiddleearth kernel:  [] ? nfsd_set_nrthreads+0x190/0x190 [nfsd]
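When the messages file gets long, a small filter can summarise which tasks the hung-task detector reported. This is a sketch under my own assumptions (hung_tasks is a made-up name); it only relies on the "INFO: task NAME:PID blocked for more than" wording seen above:

```shell
# Summarise hung-task reports from kernel log text read on stdin.
# sed keeps only "INFO: task NAME:PID blocked for more than ..." lines,
# reduced to "NAME PID"; uniq -c then prints "COUNT NAME PID", which awk
# reorders to "NAME PID COUNT".
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked for more than.*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}
```

Usage: `hung_tasks < /var/log/messages`.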
Oops, it looks like a problem with the kernel / Xen stack. My first idea was to google the error, but only a few pages came up. Oracle VM 3.1 is the latest version and it uses the Oracle kernel as well, so I decided to reinstall everything with Oracle VM 3.0.3 and test it again. After an hour I had Oracle VM 3.0.3 up and running and was ready for tests. This time I got one step further: I was able to import assemblies into Oracle VM, but it hung when I started the create-template process.
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219126] INFO: task nfsd:6446 blocked for more than 120 seconds.
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219128] nfsd          D ffff880062372d2c     0  6446      2 0x00000000
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219129]  ffff8800efb85960 0000000000000246 ffffffff8002c2f0 0000000000000400
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219131]  ffffffff80618bc0 ffff8800efb825c0 0000000000009480 ffff8800efb829a0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219132]  ffff8800efb82680 ffff8800efb825c0 ffff8800f54da740 ffff8800efb829a0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219133] Call Trace:
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219135]  [] ? target_load+0x30/0x70
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219137]  [] ? tcp_transmit_skb+0x3d3/0x730
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219138]  [] ? _spin_lock_bh+0x13/0x120
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219140]  [] __mutex_lock_slowpath+0xd9/0x1a0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219141]  [] mutex_lock+0x1e/0x40
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219143]  [] generic_file_aio_write+0x44/0xb0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219145]  [] ? generic_file_aio_write+0x0/0xb0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219146]  [] do_sync_readv_writev+0xed/0x130
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219148]  [] ? iput+0x2b/0x70
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219150]  [] ? autoremove_wake_function+0x0/0x40
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219152]  [] ? find_acceptable_alias+0x23/0x140 [exportfs]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219155]  [] ? __kmalloc+0x80/0x160
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219156]  [] ? security_file_permission+0x11/0x20
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219158]  [] do_readv_writev+0xcb/0x1e0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219161]  [] ? nfsd_setuser+0x113/0x2d0 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219164]  [] ? nfsd_setuser_and_check_port+0x5c/0x60 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219165]  [] vfs_writev+0x39/0x60
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219168]  [] nfsd_vfs_write+0x106/0x430 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219170]  [] ? dentry_open+0x4d/0xb0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219173]  [] ? nfsd_open+0x15c/0x1e0 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219176]  [] nfsd_write+0xe5/0x100 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219179]  [] nfsd3_proc_write+0xfe/0x140 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219182]  [] nfsd_dispatch+0xb5/0x230 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219187]  [] svc_process+0x477/0x780 [sunrpc]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219188]  [] ? wake_up_process+0x10/0x20
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219191]  [] ? nfsd+0x0/0x150 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219193]  [] nfsd+0xbd/0x150 [nfsd]
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219195]  [] kthread+0x8e/0xa0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219197]  [] child_rip+0xa/0x20
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219199]  [] ? kthread+0x0/0xa0
Jun  4 15:23:21 OVMiddleEarth kernel: [  361.219200]  [] ? child_rip+0x0/0x20
There were similar errors in /var/log/messages, so the same issue appeared on two different kernels and the kernel version is probably not the problem. This time the call trace had direct references to networking (tcp_transmit_skb), so after thinking for a while I decided to check the network stack. That seemed to be it: the kernel network parameters were at their defaults, so I set a number of them:
net.core.wmem_max = 12582912
net.core.rmem_max = 12582912
net.ipv4.tcp_rmem = 10240 87380 12582912
net.ipv4.tcp_wmem = 10240 87380 12582912
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
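Values set at runtime (for example with sysctl -w) are lost at the next reboot, so they can go into /etc/sysctl.conf instead. Note that tcp_rmem and tcp_wmem take three values: minimum, default and maximum buffer size in bytes. A sketch, assuming you want to replace any existing entry for the same key; set_sysctl and apply_nfs_tuning are hypothetical helpers:

```shell
# Set KEY = VALUE in a sysctl config file, dropping any earlier entry for KEY.
# The dots in KEY act as regex wildcards, which is harmless for these keys.
set_sysctl() {
    local key=$1 value=$2 conf=${3:-/etc/sysctl.conf}
    touch "$conf"
    # grep exits 1 when it prints nothing (e.g. empty file), which is fine here
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$conf" > "$conf.tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}

# The settings listed above, written to CONF (default /etc/sysctl.conf).
apply_nfs_tuning() {
    local conf=${1:-/etc/sysctl.conf}
    set_sysctl net.core.wmem_max 12582912 "$conf"
    set_sysctl net.core.rmem_max 12582912 "$conf"
    set_sysctl net.ipv4.tcp_rmem "10240 87380 12582912" "$conf"
    set_sysctl net.ipv4.tcp_wmem "10240 87380 12582912" "$conf"
    set_sysctl net.ipv4.tcp_window_scaling 1 "$conf"
    set_sysctl net.ipv4.tcp_timestamps 1 "$conf"
    set_sysctl net.ipv4.tcp_sack 1 "$conf"
}
```

After `apply_nfs_tuning`, running `sysctl -p` loads /etc/sysctl.conf into the running kernel.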
After that change the issue went away for a short time, but then it happened again. I ended up installing tshark to investigate the NFS packets, and there were a lot of "ACKed lost segment" warnings on the loopback interface. I stopped Oracle VM Manager and used the command below to replicate the unpacking of assemblies (afterwards I started Oracle VM Manager again):
[root@OVMiddleEarth ~]# cat /OVS/Repositories/0004fb0000030000c7347e844b6d10ac/Assemblies/0004fb0011c5ece/unpacked/System.img | gzip -dc | dd of=/OVS/Repositories/0004fb0000030000c7347e844b6d10ac/VirtualDisks/marcin.img bs=1M
and in another window:
[root@OVMiddleEarth ~]# tshark -i lo -w lo.trc
When the D-state appeared again, I had a trace file to investigate:
[root@OVMiddleEarth ~]# tshark -r lo.trc | grep -i NFS
...
5238   5.870485 192.168.1.30 -> 192.168.1.30 TCP nfs > 725 [ACK] Seq=46726805 Ack=73719217 Win=194 Len=0 TSV=79041 TSER=79041
5239   5.870490 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73751985 Win=194 Len=0 TSV=79041 TSER=79041
5240   5.870493 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73784753 Win=194 Len=0 TSV=79041 TSER=79041
5241   5.870497 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73817521 Win=194 Len=0 TSV=79041 TSER=79041
5243   5.870500 192.168.1.30 -> 192.168.1.30 TCP nfs > 725 [ACK] Seq=46726805 Ack=73850289 Win=194 Len=0 TSV=79041 TSER=79041
5244   5.870502 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73883057 Win=194 Len=0 TSV=79041 TSER=79041
5246   5.912057 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73932209 Win=90 Len=0 TSV=79051 TSER=79041
5248   5.952204 192.168.1.30 -> 192.168.1.30 TCP nfs > 725 [ACK] Seq=46726805 Ack=73948593 Win=26 Len=0 TSV=79061 TSER=79051
5268   6.163832 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79114 TSER=79114
5270   6.375827 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=79167 TSER=79114
5271   6.375849 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79167 TSER=79114
5272   6.799804 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=79273 TSER=79167
5273   6.799820 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79273 TSER=79114
5309   7.647833 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=79485 TSER=79273
5310   7.647857 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79485 TSER=79114
5361   9.343832 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=79909 TSER=79485
5362   9.343864 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79909 TSER=79114
5792  12.735830 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=80757 TSER=79909
5793  12.735852 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=80757 TSER=79114
5966  19.519835 192.168.1.30 -> 192.168.1.30 TCP [TCP Keep-Alive] 725 > nfs [ACK] Seq=73955248 Ack=46726805 Win=8197 Len=0 TSV=82453 TSER=80757
5967  19.519866 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=82453 TSER=79114
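Instead of reading the listing line by line, the TCP expert-analysis markers that tshark prints in square brackets can be counted. A quick sketch (tcp_anomalies is my own helper name) that reads tshark text output on stdin:

```shell
# Count tshark's TCP expert-analysis markers, e.g. "[TCP ZeroWindow]" or
# "[TCP ACKed lost segment]", in "tshark -r" text output read on stdin.
# Prints "COUNT MARKER" lines, most frequent first; plain "[ACK]" flags
# are not counted because the pattern requires "[TCP ".
tcp_anomalies() {
    grep -o '\[TCP [A-Za-z -]*]' | sort | uniq -c | sort -rn
}
```

Usage: `tshark -r lo.trc | tcp_anomalies`.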
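So something went wrong around packets 5268-5270: from packet 5268 on, the NFS server advertises a zero receive window (Win=0) and the client only sends keep-alive probes, so the connection is effectively stalled. Let's see what happened just before that: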
[root@OVMiddleEarth ~]# tshark -r lo.trc | grep -e "^52[456]."
Running as user "root" and group "root". This could be dangerous.
524   3.914358 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] 725 > nfs [ACK] Seq=7745 Ack=6477389 Win=6148 Len=0 TSV=78551 TSER=78551
525   3.914369 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] 725 > nfs [ACK] Seq=7745 Ack=6510157 Win=6148 Len=0 TSV=78551 TSER=78551
526   3.914379 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] 725 > nfs [ACK] Seq=7745 Ack=6542925 Win=6148 Len=0 TSV=78551 TSER=78551
5240   5.870493 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73784753 Win=194 Len=0 TSV=79041 TSER=79041
5241   5.870497 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73817521 Win=194 Len=0 TSV=79041 TSER=79041
5242   5.870499 192.168.1.30 -> 192.168.1.30 RPC [TCP Previous segment lost] Continuation
5243   5.870500 192.168.1.30 -> 192.168.1.30 TCP nfs > 725 [ACK] Seq=46726805 Ack=73850289 Win=194 Len=0 TSV=79041 TSER=79041
5244   5.870502 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73883057 Win=194 Len=0 TSV=79041 TSER=79041
5245   5.870513 192.168.1.30 -> 192.168.1.30 RPC Continuation
5246   5.912057 192.168.1.30 -> 192.168.1.30 TCP [TCP ACKed lost segment] nfs > 725 [ACK] Seq=46726805 Ack=73932209 Win=90 Len=0 TSV=79051 TSER=79041
5247   5.912074 192.168.1.30 -> 192.168.1.30 RPC Continuation
5248   5.952204 192.168.1.30 -> 192.168.1.30 TCP nfs > 725 [ACK] Seq=46726805 Ack=73948593 Win=26 Len=0 TSV=79061 TSER=79051
5249   6.023955 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=16637 Ack=3042 Win=48 [TCP CHECKSUM INCORRECT] Len=831 TSV=79079 TSER=78940
5250   6.024731 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3042 Ack=17468 Win=48 [TCP CHECKSUM INCORRECT] Len=38 TSV=79079 TSER=79079
5251   6.024773 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [ACK] Seq=17468 Ack=3080 Win=48 Len=0 TSV=79079 TSER=79079
5252   6.024835 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=17468 Ack=3080 Win=48 [TCP CHECKSUM INCORRECT] Len=220 TSV=79079 TSER=79079
5253   6.024954 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3080 Ack=17688 Win=48 [TCP CHECKSUM INCORRECT] Len=51 TSV=79079 TSER=79079
5254   6.063807 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [ACK] Seq=17688 Ack=3131 Win=48 Len=0 TSV=79089 TSER=79079
5255   6.071880 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=17688 Ack=3131 Win=48 [TCP CHECKSUM INCORRECT] Len=212 TSV=79091 TSER=79079
5256   6.072005 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3131 Ack=17900 Win=48 [TCP CHECKSUM INCORRECT] Len=50 TSV=79091 TSER=79091
5257   6.072064 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [ACK] Seq=17900 Ack=3181 Win=48 Len=0 TSV=79091 TSER=79091
5258   6.079861 192.168.1.30 -> 192.168.1.30 TCP 57168 > 0 [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSV=79093 TSER=0 WS=8
5259   6.079876 192.168.1.30 -> 192.168.1.30 TCP 0 > 57168 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
5260   6.159933 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=17900 Ack=3181 Win=48 [TCP CHECKSUM INCORRECT] Len=251 TSV=79113 TSER=79091
5261   6.160069 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3181 Ack=18151 Win=48 [TCP CHECKSUM INCORRECT] Len=50 TSV=79113 TSER=79113
5262   6.160126 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [ACK] Seq=18151 Ack=3231 Win=48 Len=0 TSV=79113 TSER=79113
5263   6.160183 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=18151 Ack=3231 Win=48 [TCP CHECKSUM INCORRECT] Len=236 TSV=79113 TSER=79113
5264   6.160303 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3231 Ack=18387 Win=48 [TCP CHECKSUM INCORRECT] Len=26 TSV=79113 TSER=79113
5265   6.160410 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [PSH, ACK] Seq=18387 Ack=3257 Win=48 [TCP CHECKSUM INCORRECT] Len=249 TSV=79113 TSER=79113
5266   6.160533 192.168.1.30 -> 192.168.1.30 TCP 54321 > 34311 [PSH, ACK] Seq=3257 Ack=18636 Win=48 [TCP CHECKSUM INCORRECT] Len=50 TSV=79113 TSER=79113
5267   6.163807 192.168.1.30 -> 192.168.1.30 RPC Continuation
5268   6.163832 192.168.1.30 -> 192.168.1.30 TCP [TCP ZeroWindow] nfs > 725 [ACK] Seq=46726805 Ack=73955249 Win=0 Len=0 TSV=79114 TSER=79114
5269   6.199807 192.168.1.30 -> 192.168.1.30 TCP 34311 > 54321 [ACK] Seq=18636 Ack=3307 Win=48 Len=0 TSV=79123 TSER=79113
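Note that the regular expression ^52[456]. above also matches frames 524-526 (a three-digit number followed by a space), which is why they show up at the top of the output. Filtering on the numeric frame number avoids that; a sketch with a hypothetical frame_range helper:

```shell
# Print only frames FIRST..LAST (inclusive) from "tshark -r" text output on
# stdin. The frame number is the first field; non-numeric lines such as the
# "Running as user ..." warning evaluate to 0 and are dropped for FIRST > 0.
frame_range() {
    awk -v first="$1" -v last="$2" '$1 + 0 >= first + 0 && $1 + 0 <= last + 0'
}
```

Usage: `tshark -r lo.trc | frame_range 5240 5269`.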
So it looks like the NFS connection stalls whenever other packets from Oracle VM Manager or the local Oracle XE database appear on the loopback interface. Probably (I can't prove it so far) the missing ACKs are part of the problem, but why is [nfsd] hanging while writing to disk?
Anyway, I still wanted to test Oracle VM, so I decided to use iSCSI on the loopback interface instead of NFS.

Adding a local iSCSI server

I found documentation on how to set up an iSCSI server here. So let's start again:
# yum install scsi-target-utils

====================================================================================================================================================================
 Package                                       Arch                             Version                                  Repository                            Size
====================================================================================================================================================================
Installing:
 scsi-target-utils                             x86_64                           1.0.14-2.el5                             el5_latest                           172 k
Installing for dependencies:
 libibverbs                                    x86_64                           1.1.3-2.el5                              el5_latest                            45 k
 libnes                                        x86_64                           0.9.0-2.el5                              el5_latest                            13 k
 librdmacm                                     x86_64                           1.0.10-1.el5                             el5_latest                            22 k
 openib                                        noarch                           1.4.1-6.el5                              el5_latest                            20 k
 perl                                          x86_64                           4:5.8.8-38.el5                           el5_latest                            12 M
 perl-Config-General                           noarch                           2.40-1.el5                               el5_latest                            68 k
Now it is time to add some block devices to share. We need at least two: one will be used as the voting disk for OCFS2 and the other for keeping data. TGT (the iSCSI target server) is quite flexible, so we can also present a file on a file system as a block device.
[root@OVMiddleEarth ~]# dd if=/dev/zero of=/etc/tgt/small_disk bs=1M count=1000
[root@OVMiddleEarth ~]# vi /etc/tgt/targets.conf

<target iqn.2008-09.com.example:server1.trial>
    backing-store /dev/sdb2 # my free partition
    backing-store /etc/tgt/small_disk # small file for the OCFS2 voting disk, at least 1 GB
    write-cache off # very important: tgtd is killed at reboot, so a write cache would not be synced
</target>
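If you prefer to generate the stanza rather than edit it by hand, a sketch could look like this. tgt_target_stanza is a hypothetical helper; it emits only the directives used above (backing-store and write-cache) inside a target block for the given IQN:

```shell
# Print a targets.conf stanza: IQN first, then one or more backing stores
# (tgt exposes each one as a separate LUN, starting at LUN 1).
tgt_target_stanza() {
    local iqn=$1 store
    shift
    printf '<target %s>\n' "$iqn"
    for store in "$@"; do
        printf '    backing-store %s\n' "$store"
    done
    # keep the write cache off, as explained in the comment above
    printf '    write-cache off\n'
    printf '</target>\n'
}
```

For example `tgt_target_stanza iqn.2008-09.com.example:server1.trial /dev/sdb2 /etc/tgt/small_disk >> /etc/tgt/targets.conf`, after checking that the file does not already contain a stanza for the same IQN.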

Let's start TGT:
[root@OVMiddleEarth ~]# service tgtd start
[root@OVMiddleEarth ~]# chkconfig tgtd on
A little hack to start tgtd right after the network service and before iSCSI:
[root@OVMiddleEarth ~]# cd /etc
[root@OVMiddleEarth etc]# mv rc2.d/S39tgtd rc2.d/S11tgtd
[root@OVMiddleEarth etc]# mv rc3.d/S39tgtd rc3.d/S11tgtd
[root@OVMiddleEarth etc]# mv rc4.d/S39tgtd rc4.d/S11tgtd
[root@OVMiddleEarth etc]# mv rc5.d/S39tgtd rc5.d/S11tgtd
[root@OVMiddleEarth etc]# ls -lR rc?.d/*tgt*
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc0.d/K35tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc1.d/K35tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc2.d/S11tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc3.d/S11tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc4.d/S11tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc5.d/S11tgtd -> ../init.d/tgtd
lrwxrwxrwx 1 root root 14 Jun  7 17:33 rc6.d/K35tgtd -> ../init.d/tgtd
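The four mv commands can also be written as a loop. A sketch, with a hypothetical move_start_link helper that takes the /etc directory as an argument so it can be tried elsewhere first:

```shell
# Rename an init script's start link in run levels 2-5, e.g.
# S39tgtd -> S11tgtd. ETC is the directory holding rc2.d ... rc5.d
# (normally /etc); run levels without the OLD link are skipped.
move_start_link() {
    local etc=$1 old=$2 new=$3 lvl
    for lvl in 2 3 4 5; do
        if [ -L "$etc/rc$lvl.d/$old" ]; then
            mv "$etc/rc$lvl.d/$old" "$etc/rc$lvl.d/$new"
        fi
    done
}
```

Usage: `move_start_link /etc S39tgtd S11tgtd`. Keep in mind that chkconfig creates these links from the `# chkconfig:` header of the init script, so running `chkconfig tgtd on` (or `chkconfig --add tgtd`) again may restore the old priority; changing the start number in that header is an alternative worth checking.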
And now load the new configuration:
[root@OVMiddleEarth ~]# tgt-admin --execute
List active targets:
[root@OVMiddleEarth ~]# tgtadm --lld iscsi --mode target --op show
Target 1: iqn.2008-09.com.example:server1.trial
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 200006 MB, Block size: 512
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/sdb2
            Backing store flags:
        LUN: 2
            Type: disk
            SCSI ID: IET     00010002
            SCSI SN: beaf12
            Size: 1049 MB, Block size: 512
            Online: Yes
            Removable media: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /etc/tgt/small_disk
            Backing store flags:
    Account information:
    ACL information:
        ALL
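To double-check that both LUNs are exported before going to Oracle VM Manager, the show output can be reduced to LUN number and backing store. A sketch (tgt_luns is a hypothetical helper) that skips the LUN 0 controller, whose path is None:

```shell
# Reduce "tgtadm --lld iscsi --mode target --op show" output (stdin) to
# "LUN PATH" lines, skipping LUNs without a backing store (path "None").
tgt_luns() {
    awk '$1 == "LUN:" { lun = $2 }
         $1 == "Backing" && $2 == "store" && $3 == "path:" && $4 != "None" { print lun, $4 }'
}
```

For the configuration above, `tgtadm --lld iscsi --mode target --op show | tgt_luns` prints "1 /dev/sdb2" and "2 /etc/tgt/small_disk".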
So now we have an iSCSI server and can add it to Oracle VM. I added it as a new Storage Array using the iSCSI Storage Server type and added the new iSCSI initiators to the Access Group.

When both LUNs had been presented to Oracle VM, I created a server pool (it has to be a clustered one, even for a single server; I'm still not sure why, but I was unable to create an OCFS2 repository for a non-clustered server pool).



Then I created a repository and was able to import assemblies and create a template without any issues. Creating my first VM from the template worked as well, and in the end I had my first Oracle VM machine.


So, what I like in Oracle VM:
  • Assemblies from Oracle with preconfigured tools
What I dislike in Oracle VM (this may change once I know the tool better):
  • Tricky installation process in a non-production environment
  • Local storage (repository) only on a whole empty disk, unless you set up NFS / iSCSI on the local host
  • Assemblies can be imported via an http(s)/ftp path only: why is there no SCP-and-register functionality (or maybe I don't know how to do it)?
  • Oracle VM Manager is quite big: after some tuning it can run in 2 GB, but that is still a lot for a management tool
  • No command-line tools: tricky to manage if you only have an SSH connection
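One more hack: Oracle VM Manager should be started after all Oracle VM Server processes, so I renamed its start links to sort last:
[root@OVMiddleEarth etc]# mv rc2.d/S99ovmm rc2.d/S99xovmm
[root@OVMiddleEarth etc]# mv rc3.d/S99ovmm rc3.d/S99xovmm
[root@OVMiddleEarth etc]# mv rc4.d/S99ovmm rc4.d/S99xovmm
[root@OVMiddleEarth etc]# mv rc5.d/S99ovmm rc5.d/S99xovmm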
[root@OVMiddleEarth etc]# ls -l rc?.d/S*ovmm
lrwxrwxrwx 1 root root 14 Jun  4 14:46 rc2.d/S99xovmm -> ../init.d/ovmm
lrwxrwxrwx 1 root root 14 Jun  4 14:46 rc3.d/S99xovmm -> ../init.d/ovmm
lrwxrwxrwx 1 root root 14 Jun  4 14:46 rc4.d/S99xovmm -> ../init.d/ovmm
lrwxrwxrwx 1 root root 14 Jun  4 14:46 rc5.d/S99xovmm -> ../init.d/ovmm
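Since the rc script starts the S links in glob (lexical) order, a quick way to confirm that ovmm now starts last is to look at the last S link in each run level. A sketch with a hypothetical last_start_script helper:

```shell
# Print the name of the last S* (start) link in a run-level directory.
# The rc script starts these links in glob order, so this is the service
# that starts last in that run level.
last_start_script() {
    local dir=$1 last="" f
    for f in "$dir"/S*; do
        # skip the unexpanded pattern when there are no S* entries;
        # -L also accepts dangling symlinks
        [ -e "$f" ] || [ -L "$f" ] || continue
        last=${f##*/}
    done
    printf '%s\n' "$last"
}
```

Running `for l in 2 3 4 5; do last_start_script /etc/rc$l.d; done` should print S99xovmm four times, unless some other link sorts after it.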
regards,
Marcin

1 comment:

Morten Jensen said...

Hey Marcin,

Just one point for working around the disk issue when you have partitions on it.
I quickly came to the conclusion that booting from something else would probably solve this problem.
So I am using an 8MB USB stick to boot from, and before installation I make sure that there are no partitions on the hard disk; the installation takes less than 4GB.
Works nicely and no problem creating a local repository.
The very old USB stick I use is very slow, and could be due for an upgrade, but once the VM Server is booted there's very little I/O going to/from the device.