How Objects are Stored in the OpenSim Asset Database

From Open Metaverse Wiki
Revision as of 13:54, 28 September 2023 by Kayaker Magic (talk | contribs) (hack off incomplete sentence)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Opensimulator: Grid Owners: How Objects are Stored in the OpenSim Asset Database

How Assets are Stored in OpenSim

I'm writing code to examine the OpenSim databases. Why? To calculate perceptual hashes, detect copy-botted items, search for corrupted textures, etc. In the Region Databases, objects are stored in a relational way: For example each prim in a linked object has a separate database record in the prim, primitems and primshapes tables. However the main asset database stores only 'atomic' items like the data for one mesh or a texture map. Then there is another atomic 'Object Object' that describes how to link them all together. I believe that system prims are just parameters in this Object and there is no need to store them as atoms in this database. While figuring out how to read Objects, I decided to write down my notes and share them here. At the end of this document are some diagrams showing the process in short form, skip ahead if you like pictures. When it is in better shape I'll also stick the PHP code that reads object somewhere here on the Wiki.

Assets as Blobs vs File System

OpenSim has two ways to store the data of each asset: In an 'asset' database as blobs (binary database objects), or as files in a tree of folders starting with a folder named 'fsassets' (for File System Assets). The documentation says that blobs are good for small grids, fsassets are better for large grids. Database experts say that blobs are super efficient in modern databases and will always be more efficient. Most grids today use fsassets, despite the problem that documentation at OpenSimulator.org about this option is sorely missing. I got used to using fsassets so my examples will be assuming this mode. Using blobs would actually be simpler so if that is your setup, I will leave figuring out how to read those as an exercise for the student.

Searching for User IDs, Inventories and Assets

I'm searching for items from users one at a time, so I'm reading data from the user accounts. I need to know the UUID of each user, here's the SQL to get one by Avatar Name from the Robust database:

SELECT PrincipalID FROM UserAccounts WHERE FirstName='Kayaker' AND LastName='Magic';

I could use this to search for all the inventory folders of that user and then search for all items in each folder, but it is easier to just find all the items that belong to that user:

SELECT  assetID, assetType, inventoryName, creatorID FROM inventoryItems
	 WHERE avatarID='$PrincipalID';

If the assetType is 6 (Object) then the assetID references an XML block that describes the root object and all its child prims. To fetch that block you next read the SHA256 hash from the fsassets table:

SELECT hash FROM fsassets WHERE id='$assetID';

Sample hash: 583379B04FEDE508BA980972A17F79E07DCA238DB98E1789EEF073AB06DF7ED6

Hack this up into pieces to make a file name for the fsassets folder tree. The first 2 letters make the first folder layer, 2 more letters for the second folder layer, 2 more for the 3rd folder layer, 4 more for the last folder layer, then the file name is the whole hash with .gz as the file type. For example:

.../fsassets/data/58/33/79/B04F/583379B04FEDE508BA980972A17F79E07DCA238DB98E1789EEF073AB06DF7ED6.gz

Actually, the server first looks for a file without the .gz on the end and reads that one if it exists. Then tries again (probably wasting time traversing all the folders again) to try with .gz on the end.

Reading an Object Object (assetType 6)

Read that file, gunzip it and the result will be a chunk of xml. It is packed into a single line with no newlines or indentations. I ran an example through "Prettify XML" to convert it into an almost human readable format and here are the first few lines:

<SceneObjectGroup>
   <RootPart>
       <SceneObjectPart>
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema">
           <AllowedDrop>false</AllowedDrop>
           <CreatorID>
               <UUID>332f0d3e-035b-427d-b4b5-4394976e0c47</UUID>
           </CreatorID>
           <FolderID>
               <UUID>7e017b15-2ceb-4a0f-8a0d-977ba861d97e</UUID>
           </FolderID>
           <InventorySerial>0</InventorySerial>
           <UUID>
               <UUID>7e017b15-2ceb-4a0f-8a0d-977ba861d97e</UUID>
           </UUID>
           <LocalId>2578086358</LocalId>
           <Name>Spectacles</Name>
               ...

This XML data will describe everything about the object that you would find in multiple tables and records in the region database. The <RootPart> contains the root prim of the object, <OtherParts> contains a list of <Part>s that describe all the child prims.

The <RootPart> and <Part>s each have a <SceneObjectPart> that each have a bunch of values including the prim type, flexy, light, prim torturing parameters, etc. What I am interested in are things like the permissions, <CreatorID>, <CreatorData>, <Name>, <LinkNum> and <Shape>.

The SceneObjectPart can have a <TaskInventory> section containing a list of <TaskInventoryItem>'s, one for each item in the prim inventory.

The <Shape> contains a <TextureEntry>, <ExtraParams>, <SculptTexture>, <SculptType> among many other things.

The <TextureEntry> contains a base64 encoded blob with a list of texture asset UUIDs in binary. Each UUID is 16 binary bytes terminated by an extra byte. If the extra byte is 0 that is the end of the list of diffuse textures for the prim. I don't know how to read the spectacular and normal textures yet.

If the <SculptType> is 6 then the <SculptTexture> is the asset UUID of a mesh in the asset table. Use the asset UUID to fetch a hash from the fsasset table, use that hash to read a file from the hard disk, gunzip the file, the result is a block of memory with a mesh in Binary LLSD format. This is documented down near the bottom of https://wiki.secondlife.com/wiki/LLSD. Ignore all the XML, JSON and 'Notation Serialization' discussion, they never used that! Only learn the Binary LLSD section. There are libraries referenced there for reading LLSD in many languages, but the PHP library only reads and writes the XML file format which is never used by Linden Labs or OpenSim.

Each <TaskInventoryItem> contains an <AssetID>, <CreatorID>, permissions, <InvType> <LastOwnerID>, <Name>, <OwnerID>, <Type>. Yes, asset type and inventory type are similar but different. For example inventory type 'Wearable' can have an asset type of 'Clothing', 'Bodypart', etc. And yes, the OpenSimulator.org documentation does call out a difference between the inventory and asset assignments for Animation and Gesture. If the asset type is Clothing or Bodypart then there is a sub-type for Shape, Skin, Hair etc. This subtype is hacked into the lower bits of a <Flags> value.

Value Inventory Types Asset Types
-2 Material (not used)
-1 User Defined Unknown
0 Texture Texture
1 Sound Sound
2 Calling Card Calling Card
3 Landmark Landmark
4
5 Clothing
6 Object Object
7 Notecard Notecard
8
9
10 Script LSLText (Script)
11
12
13 Bodypart
14
15 Snapshot
16
17 Attachment
18 Wearable
19 Animation
20 Gesture Animation
21 Gesture
22 Mesh (not used?)
23
24 Link
25 Settings
26 Material (not used?)
...
49 Mesh
57 Material (PBR)

Diagram: How to Read from the Asset Database

Diagram: Outline of an Object Asset Type