Eucalyptus
mrhearn
Offline
Joined: 08/04/2009

Install Type: From 1.6.2 Source
O/S: Oracle Enterprise Linux
Tiers: server1: CLC/Walrus, server2: SC/CC, server3: NC
Walrus: 1 x EMI and associated kernel and ramdisk loaded into Walrus

As VM deployments seem to take longer than expected, I watched the processes on the NC for clues.
I noticed that every time a deployment runs, a copy is initiated from the running location to the cache, e.g.

cp -a /eucalyptus/instances/user1/i-41EC08DE/root /eucalyptus/instances/eucalyptus/emi-DC34182C/root

Isn't the expected behaviour to do this once, to cache images locally?

Secondly, although I seem to have enough resources (memory, disk space), I am getting the warning "Not Enough Resources". I made sure the VM 'type' used covers the EMI's space requirements, so I looked at the CC log to see what it thinks the available resources are. I believe it is returning false information about available disk: in the example below I agree with mem and cores, but not disk.
node=xh0755.mydomain.com mem=16155/15387 disk=2287/1237 cores=4/1

What mechanism does the CC use to gather disk data: libvirt?

Many Thanks

graziano
Offline
Joined: 01/14/2010

Hello,

the NC will copy the image into the cache the first time it sees (is asked to download) the image. After that it needs to make one copy per instance, so you will see further cp commands every time you start an instance. If it's taking too long, perhaps the image is very big? Or the disk subsystem is very slow?

You can look at eucalyptus.conf to see where the NC keeps the cache for the instances, and check how much space you have available. Also, make sure that you have enough public IPs (in case you are running in MANAGED* mode). The message says that there is about 15GB of disk space available. You shouldn't rely on those numbers, though, since they are internal to Eucalyptus and subject to change from release to release.

cheers
graziano

mrhearn
Offline
Joined: 08/04/2009
Clarity?

Graziano - As ever thanks for your response!
Can you clarify the last part of your explanation? You said the returned data indicates that there is roughly 15GB of disk available, based on:

node=xh0755.mydomain.com mem=16155/15387 disk=2287/1237 cores=4/1

Did you mix up the mem and disk values? Assuming the values are in MB, isn't there about 1.2GB available?

In actual fact that's not bothering me too much. What does bother me is that Eucalyptus does not seem to be reporting accurate disk resources. For instance, if I look at the latest disk data in the cc.log file I see

node=xh0755.us.oracle.com mem=15360/14336 disk=210/0 cores=16/12

Because the CC sees zero disk available (disk=210/0), it will not deploy a VM to the NC. However, if I look at the disk mount point on the NC, there seems to be plenty of space available under the INSTANCE_PATH, /OVS.

[root@xh0755 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb1              3050060   2285044    607584  79% /
/dev/sda1               101086     45901     49966  48% /boot
tmpfs                   524372         0    524372   0% /dev/shm
/dev/sdc1            103210940  39036784  58931348  40% /OVS

Is this a bug or have I misunderstood something?

Thanks again.

graziano
Offline
Joined: 01/14/2010

Hello,

yes you are right: I swapped the values. My apologies.

Can you restart the NC which has no disk, and post here the beginning of the nc.log (of course, the beginning after it restarts)? It should contain some information on the disk space the NC believes to be available. Also, is that a machine with a lot of disk activity? That is, is there a lot of variability in the available disk? And did you restart the NC after you changed the INSTANCE_PATH?

cheers
graziano

mrhearn
Offline
Joined: 08/04/2009
Requested Details

Graziano
To answer your questions:
- The disk is dedicated to the INSTANCE_PATH, so activity is confined to the 3 VMs deployed to it, which are not being actively used. (I'm testing capacity and stability at this point.)

- The INSTANCE_PATH was not changed between NC service restarts

- I restarted the NC and here are the startup details found in nc.log:
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] NC is looking for configuration in /opt/eucalyptus162-0/etc/eucalyptus/eucalyptus.conf//opt/eucalyptus162-0/etc/eucalyptus/eucalyptus.local.conf
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] SC is looking for configuration in files (/opt/eucalyptus162-0/etc/eucalyptus/eucalyptus.conf,/opt/eucalyptus162-0/etc/eucalyptus/eucalyptus.local.conf)
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] euca_init_cert(): using file /opt/eucalyptus162-0/var/lib/eucalyptus/keys/node-cert.pem
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] euca_init_cert(): using file /opt/eucalyptus162-0/var/lib/eucalyptus/keys/node-pk.pem
[Tue Mar 9 03:16:06 2010][021326][EUCADEBUG ] doInitialized() invoked
[Tue Mar 9 03:16:06 2010][021326][EUCADEBUG ] system_output(): [/opt/eucalyptus162-0/usr/lib/eucalyptus/euca_rootwrap /opt/eucalyptus162-0/usr/share/eucalyptus/get_xen_info]
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] Using 16 cores
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] Using 15360 memory
[Tue Mar 9 03:16:06 2010][021326][EUCAINFO ] looking for existing domains
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] - adopted running domain i-3DBD0733 from user user2
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] - adopted running domain i-3E6107F5 from user admin
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] - adopted running domain i-425E076F from user admin
[Tue Mar 9 03:16:07 2010][021326][EUCADEBUG ] vnetApplySingleTableRule(): applying single table (filter) rule (-A FORWARD -p udp -m udp --sport 67:68 --dport 67:68 -j LOG --log-level 6)
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] vnetInit(): VNET Configuration: eucahome=/opt/eucalyptus162-0, path=/opt/eucalyptus162-0/var/run/eucalyptus/net, dhcpdaemon=, dhcpuser=, pubInterface=eth0, privInterface=eth0, bridgedev=xenbr0, networkMode=SYSTEM
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] checking the integrity of instances directory (/OVS/eucalyptus/instances)
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: could not stat file /OVS/eucalyptus/instances/user2/i-3DBD0733/root
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: non-standard instance directory /OVS/eucalyptus/instances/user2/i-3DBD0733
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: could not stat file /OVS/eucalyptus/instances/admin/i-425E076F/root
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: non-standard instance directory /OVS/eucalyptus/instances/admin/i-425E076F
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: could not stat file /OVS/eucalyptus/instances/admin/i-3E6107F5/root
[Tue Mar 9 03:16:07 2010][021326][EUCAWARN ] warning: non-standard instance directory /OVS/eucalyptus/instances/admin/i-3E6107F5
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] checking the integrity of the cache directory (/OVS/eucalyptus/instances/eucalyptus/cache)
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] - cached image eki-2D421DA0 directory, size=1929801
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] - cached image eri-5AAB1E30 directory, size=2305679
[Tue Mar 9 03:16:07 2010][021326][EUCAINFO ] Maximum disk available = 210 (under /OVS/eucalyptus/instances)
[Tue Mar 9 03:16:07 2010][021326][EUCADEBUG ] doDescribeResource() invoked
[Tue Mar 9 03:16:07 2010][021326][EUCADEBUG ] Starting monitoring thread

- INSTANCE_PATH size is:
[root@xh0755 instances]# df -h /OVS/eucalyptus/instances
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 99G 38G 57G 40% /OVS

Cheers!
M.

graziano
Offline
Joined: 01/14/2010

Hello,

may I ask which compiler you used to compile Eucalyptus? Also, can you try adding MAX_DISK=0 to eucalyptus.conf on the NC?

cheers
graziano

mrhearn
Offline
Joined: 08/04/2009

Graziano
First things first. I added MAX_DISK to the NC's config file, trying various values.
Adding MAX_DISK=0 had no effect on the reported disk resource:
node=xh0755.us.oracle.com mem=15360/14336 disk=210/0 cores=16/12

Setting MAX_DISK to a value between 1 and 209 was echoed in the reported max disk output, e.g.
MAX_DISK=100
node=xh0755.us.oracle.com mem=15360/14336 disk=100/0 cores=16/12

Setting a value greater than 209 produced no change in the output:
node=xh0755.us.oracle.com mem=15360/14336 disk=210/0 cores=16/12
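
The three experiments above are consistent with MAX_DISK acting as a cap on the detected value rather than as an override. A minimal sketch of that reading follows. It is inferred only from the reported outputs, not taken from the Eucalyptus source, and the function name effective_disk_max is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, NOT Eucalyptus code: models the observed behaviour
 * in which MAX_DISK only takes effect when it is positive and smaller than
 * the value the NC detected on its own. All values are in MB. */
static long long effective_disk_max(long long configured_max_disk,
                                    long long detected_mb)
{
    assert(detected_mb >= 0);

    if (configured_max_disk > 0 && configured_max_disk < detected_mb) {
        return configured_max_disk;   /* e.g. MAX_DISK=100 -> disk=100/... */
    }
    /* MAX_DISK=0 or a value above the detected size changes nothing. */
    return detected_mb;
}
```

Under this reading, raising MAX_DISK can never work around a detected value that is itself too low.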

Regarding the compilation question, the code was compiled on the same machine I'm running the NC on.
uname -a
Linux xh0755.us.oracle.com 2.6.18-128.2.1.4.9.el5xen #1 SMP Fri Oct 9 14:57:31 EDT 2009 i686 i686 i386 GNU/Linux

Any clues?

Thanks

graziano
Offline
Joined: 01/14/2010

Hello,

this is somewhat puzzling. Eucalyptus does a statfs on the INSTANCE_PATH directory and works out from the result how much space is available to it. So, in your case, it looks like Eucalyptus is finding only 210M available (looking at the nc.log is probably better in this case) and I'm not quite sure why. Are there permission problems? Is that directory a link? Can you try to create another temporary INSTANCE_PATH and check what Eucalyptus finds on it?
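
One way to cross-check what statfs reports for a directory, independently of Eucalyptus, is a small standalone helper like the sketch below. It assumes Linux (the <sys/vfs.h> header), the function name available_mb is hypothetical, and it is not the NC's actual code. Both fields are widened to 64 bits before multiplying, purely as a precaution.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>   /* Linux-specific header providing statfs() */

/* Hypothetical helper: returns the space available to unprivileged users
 * under `path`, in MB, or -1 if statfs() fails. */
static long long available_mb(const char *path)
{
    struct statfs fs;

    assert(path != NULL);
    if (statfs(path, &fs) != 0) {
        perror("statfs");
        return -1;
    }

    /* f_bavail is a block count and f_bsize the block size in bytes. */
    long long bytes = (long long)fs.f_bsize * (long long)fs.f_bavail;
    return bytes / (1024LL * 1024LL);
}
```

Calling available_mb("/OVS/eucalyptus/instances") and comparing the result with the Avail column of df for the same path would show whether the filesystem data or the later arithmetic is at fault.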

cheers
graziano

mrhearn
Offline
Joined: 08/04/2009
I believe solved

Graziano
I decided to look at the source code to understand what the problem might be.
I found the underlying calculation in handlers.c and added a couple of lines to print the values statfs was collecting, along with the result of multiplying the block size by the available blocks, e.g.

nc_state.disk_max = fs.f_bsize * fs.f_bavail + instances_bytes; /* max for Euca, not total */

What I found was that although the values of fs.f_bsize and fs.f_bavail were correct, the result of multiplying them was incorrect.
To cut a long story short (and not being a seasoned C developer): after noticing that disk_max is of type long long, I created two variables, BSIZE and BAVAIL, assigned f_bsize and f_bavail to them, and slightly altered the line shown above, e.g.

long long BAVAIL;
long long BSIZE;
..................................
BAVAIL = fs.f_bavail;
BSIZE = fs.f_bsize;
nc_state.disk_max = BSIZE * BAVAIL + instances_bytes; /* max for Euca, not total */

After making the adjustment, the reported disk usage under INSTANCE_PATH makes more sense:

node=xh0755 mem=15360/14336 disk=57554/55981 cores=16/12

as opposed to

node=xh0755 mem=15360/14336 disk=210/0 cores=16/12
df
/dev/sdc1 103210940 39036784 58931348 40% /OVS

Hope this makes sense. Bug?

Cheers

graziano
Offline
Joined: 01/14/2010

Hello,

thanks for digging this up! May I ask you to try a different change? In the offending line, can you add (long long) in front of f_bavail and f_bsize and retest? Also, do you have a 32-bit OS or a 64-bit one?

And yes this is clearly a bug!

cheers
graziano

mrhearn
Offline
Joined: 08/04/2009

Graziano
Adding (long long) is a far neater solution and works just fine, e.g.

nc_state.disk_max = (long long) fs.f_bsize * (long long) fs.f_bavail + instances_bytes;

The OS is 32-bit.
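
For anyone wanting to reproduce the effect, here is a minimal sketch of why the uncast product goes wrong on a 32-bit build. It rests on assumptions not stated in the thread: a filesystem block size of 4096 bytes, which with the df figure of 58931348 available 1K-blocks would give f_bavail = 14732837, and uint32_t standing in for the 32-bit integer types the fields would have there. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models the original expression on a 32-bit build: the product is
 * computed in 32 bits and wraps modulo 2^32 before it is widened. */
static uint64_t avail_bytes_32bit(uint32_t bsize, uint32_t bavail)
{
    assert(bsize > 0);
    uint32_t wrapped = (uint32_t)(bsize * bavail);
    return (uint64_t)wrapped;
}

/* Models the fixed expression: both operands are widened to 64 bits
 * first, so the multiplication itself is carried out in 64 bits. */
static uint64_t avail_bytes_64bit(uint32_t bsize, uint32_t bavail)
{
    assert(bsize > 0);
    return (uint64_t)bsize * (uint64_t)bavail;
}
```

With the assumed inputs (4096, 14732837), the 32-bit version yields 216158208 bytes (about 206 MB) while the 64-bit version yields 60345700352 bytes (about 57550 MB). If the block-size assumption holds, these sit close to the 210 and 57554 reported before and after the fix, with the small remainder plausibly coming from the instances_bytes term.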

graziano
Offline
Joined: 01/14/2010

Hello,

great! Thanks for checking!

cheers
graziano