Xen 2.0 Roadmap. Feedback appreciated.

Coordinator
Apr 12, 2010 at 10:34 PM
Edited Apr 12, 2010 at 10:40 PM

Hi,
First of all, a warning: this is entirely subject to change.

My first question is: How important is maintaining support for XNA 3.1? I'm expecting performance to go down slightly moving to 4.0 (although I'm prepared to be surprised! and realistically it's not going to affect many people).

The big issue is support for shaders in XNA 4.
After a lot of thinking, I believe I've found an efficient solution that is very similar to how I get Xbox shaders to work*. The end result will be that each shader will have an internal effect, each with a single VS and PS constant array dealt with using a single EffectParam. Hopefully by reducing usage of the effect system to a minimum it'll keep things efficient.
*In Xen, Xbox shaders don't use Effects - it would just be a similar build process.
The point is, in theory the shader system won't change much at all. It'll still have all the same capabilities and limitations that it currently has. So automatic bindings (e.g. GLOBAL, WORLDVIEWPROJECTION, etc.) will all still work just the same, but it will still be limited to one pass and no state changes (which I think is a good thing!).

Second question: How important is it to support standard XNA Effects alongside Xen shaders? And if it is important, how important is supporting multi pass Effects?

Minor API changes: 

I intend for 2.0 to break backward compatibility, and to break it fairly significantly.

I would like to cut out some of the more specific, less necessary features / classes. This includes things such as the Batch drawing interfaces, etc. I think they are probably just confusing. Also I want to have a bit of a cull in Xen.Ex, there are a couple of things in there that don't really fit.
I haven't decided on the scope here, but I'd prefer a leaner API.

Major API changes:

I want to break DrawState into 2 or 3 separate states. The first is 'FrameState', which will be passed in to the Application Draw method (which may be renamed 'Frame'). This will include adding an interface IFrameDraw, or similar. The intention of this change is that it defines a distinction between a class that draws geometry and a class that performs render target operations. Such an example is all the DrawTarget classes and the BlurFilter class. These cannot be nested for very specific reasons, yet this isn't obvious or prevented.

The third split will be a ContentState class. The use of this should be pretty clear.
This will accompany a split of the IContentOwner interface, adding IContentOwnerUnload. Because frankly, you never really need Unload.
The LoadContent method itself needs a tidy up too.

I'm very happy with UpdateState, so it stays how it is.

Application probably needs looking at.

 

Then we get on to DrawState itself, this is where the bulk of the changes will come. (Yes, the bulk :-)

The first big change I plan is how shaders interact with DrawState. Currently, IShaderSystem is publicly implemented by DrawState. This is why you can call shader.Bind(state); - DrawState explicitly implements IShaderSystem and the cast happens for you.
I plan to move the IShaderSystem implementation into a private class within the render system. This way, you cannot manually call Bind(). Because of this, the Bind method will be an explicit interface implementation, only visible through the IShader interface. (So generated shaders will not appear to have a visible Bind method.)

Wait?! What?! Am I mad?

Hold on, I'll explain.
It's part of a bigger plan. I intend to standardize all stacks that DrawState stores internally, and in the process add one for shaders.
I want them to be separate classes, and accessed by properties instead of methods directly in DrawState.

What does this mean?

Well, it means that:

state.PushRenderState(...);
state.PushWorldMatrix(...);

becomes:

state.RenderState.Push(...);
state.WorldMatrix.Push(...);

This cleans up the DrawState API *a lot*, and it makes things much easier to find. 'WorldMatrix' would have 'Set', 'SetTranslation', 'Push', 'PushMultiply', 'PushTranslation', 'PushTranslationMultiply', etc. All in one place.
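To make the shape of that change concrete, here is a minimal sketch of a stack exposed as a property. Everything in it is an assumption for illustration: StateStack<T> and DrawStateSketch are hypothetical names, and a string and a float stand in for the real render state and world matrix types. It is not the actual Xen implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one of the per-state stacks; not the real Xen class.
public sealed class StateStack<T>
{
    private readonly Stack<T> saved = new Stack<T>();
    private T current;

    public StateStack(T initial) { current = initial; }

    public T Value { get { return current; } }

    // Replace the current value without touching the saved values.
    public void Set(T value) { current = value; }

    // Save the current value so that Pop() can restore it later.
    public void Push() { saved.Push(current); }

    // Save the current value, then replace it.
    public void Push(T value) { saved.Push(current); current = value; }

    // Restore the most recently saved value.
    public void Pop()
    {
        if (saved.Count == 0)
            throw new InvalidOperationException("Pop() without a matching Push()");
        current = saved.Pop();
    }
}

// Hypothetical draw state: each stack is reached through a read-only property.
public sealed class DrawStateSketch
{
    public StateStack<string> RenderState { get; private set; }
    public StateStack<float> WorldMatrix { get; private set; }

    public DrawStateSketch()
    {
        RenderState = new StateStack<string>("default");
        WorldMatrix = new StateStack<float>(1f);
    }
}
```

With this layout, every operation on a given piece of state sits on one small class, which is what keeps the DrawState surface itself short.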

What does this mean for shaders? Well, then shaders become:

state.Shader.Push(...);

This has the added benefit that I can make sure Bind() is only called when absolutely needed. Currently, if you call Bind() then change the world matrix, then draw, internally I have to call Bind() again because the world matrix changed. It's not a huge overhead, but it probably could add up.
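One way to get that behaviour is a dirty flag that defers the bind until a draw actually needs it. The sketch below only illustrates the idea: LazyBinder and its members are hypothetical, and a counter stands in for the real Bind() work.

```csharp
using System;

// Hypothetical sketch: defer the bind until a draw call needs it.
public sealed class LazyBinder
{
    private string shader;
    private bool dirty;

    // Counts how many times the stand-in bind work actually ran.
    public int BindCount { get; private set; }

    public void SetShader(string name) { shader = name; dirty = true; }

    // Changing the world matrix invalidates the bound shader constants...
    public void SetWorldMatrix() { dirty = true; }

    // ...but the bind itself only happens here, at most once per draw.
    public void Draw()
    {
        if (shader == null)
            throw new InvalidOperationException("No shader set");

        if (dirty)
        {
            BindCount++; // stand-in for the real Bind() work
            dirty = false;
        }
    }
}
```

Setting a shader and then changing the world matrix before drawing costs one bind here instead of two, and repeated draws with nothing changed cost none.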

Note:
The only catch here is that you cannot directly assign the property, so the following cannot be done:

state.Shader.Push();
state.Shader = ...; 

The only way to do this would be:

state.ShaderStack.Push();
state.Shader = ...;

However I don't like this. So I'd rather just have:

state.Shader.Push();
state.Shader.Set(...);     (obviously, Push(...) is the same here, but this is an example)


This also opens the door for overloading the shader methods with:

state.Shader.Push(IShader)
state.Shader.Push(Effect)

 

There may be other areas where DrawState gets cleaned up, but those are the major changes.

As for supporting Effects, the tricky problem there is how to deal with WorldMatrix, etc. I haven't yet figured out a good solution to this problem.

 

Anyway, that's all for now. I'm after feedback as to how you feel this might affect you, etc. It's obviously a fairly chunky set of changes - and it would naturally take quite a while to implement (especially as busy as I've been - although I'm feeling a lot better about things now I think XNA 4 may no longer be the death knell I thought it was...)

As usual, feedback and suggestions appreciated.

Coordinator
Apr 12, 2010 at 10:43 PM

What would really help, is if you could tell me the specific problems you've encountered and things that haven't made sense.
Usually in such a case it's an API design issue more than a doc issue. And obviously, it'd be nice to catch them.

Apr 13, 2010 at 8:39 AM

Hi StatusUnknown,

I'll try to give you some feedback on how I used Xen and what were the main issues I encountered using it (including ease of use as I think it's one of the most important features of any API :) )

So, first, I really, really, really love how you structure applications with the Application class; the DrawState and UpdateState references in the Draw and Update methods provide us with a clean, clear and clever way to manage all that matters for us.
I think the proposed breaking changes to DrawState make total sense and will ease its usage over time.

Now, to answer your first 2 questions:

A. I do think that Xen 2.0 should focus on Xna 4 support. As Xen takes its roots from Xna, you should keep the versions in sync to avoid adding bugs and confusion: Xen 1.8.1 is already a very good and stable version that could remain available while Xna 3.1 is still being used, and Xen 2.0 could focus on providing what Xna 4 lacks.

B. One of the things that prevented me from fully using the Xen shader system was the fact that I couldn't use RenderTargets as I would have liked (which was the only way to get my desired effect). I therefore had to add special code to sometimes bypass the Xen draw pipeline and use my own, derived from the Xna Effect system. It would be cool to actually be able to use Xna Effects when required (using a bit flag on the DrawState instance, or potentially a (PreDraw, Draw, PostDraw) system? ;)

Otherwise, I think it would be great if you could make the core Application class less dependent on the Drawing system. This is directly linked to the Xen.dll requirement for Xen.IShaderSystem: removing this dependency would be a really good idea.

Finally, I think Xen.Ex could be revisited to move some of the classes (or even all of them) inside the core Xen assembly. I often needed just one or two classes from the Ex assembly, and in such cases had to link it to my project, thereby losing the benefits of separate assemblies (e.g. the great CameraControlled3D and FirstPersonControlled3d classes).
Maybe you could consider Xen.Ex being part of the core assembly and creating multiple Assemblies to provide standard classes/features for other uses by domain?

Those were my 2 cents ;)

Philippe

Apr 13, 2010 at 5:11 PM

how can i convert hlsl file to xen file to use it as .dll file?

Coordinator
Apr 14, 2010 at 9:53 AM

Hi philip. Thanks for the feedback!
Can you elaborate on what specifically you couldn't do? The only thing you genuinely cannot use with Xen is the ResolveTexture2D, which has some very fiddly syntax.
It may be there is a special case I'm not taking into account.

Hi Cheesy, Xen doesn't directly use the Effect system - instead generating C# classes that contain static arrays storing the Vertex and Pixel shader compiled bytes. So in this way, any class library with a generated xen shader can be used with any other xen project as a .dll import.
Thanks :-)

Apr 14, 2010 at 12:28 PM

Hi StatusUnknown,

The issue I got was trying to implement inside my Xen inherited application the following shader:

http://www.youtube.com/watch?v=yiLeATHdMxk

Philippe

Coordinator
Apr 17, 2010 at 12:40 AM

I'm really not too sure why specifically that effect couldn't be achieved with DrawTargets and cloning.

Anyway, I do have some good news, I've passed the first critical point on the road to XNA 4. I've managed to make a proof of concept that generates Techniques in the shader system that correctly map shader constants. It's a *long* way till a working prototype, but it's the big hurdle to making sure it should actually work.

In theory, in terms of D3D API calls, etc, nothing should change. It's just going through a nasty Effect system layer, most likely slowing things down somewhat.

Apr 17, 2010 at 1:05 PM

Congrats StatusUnknown! Good to hear you're making progress on the issues and have found some motivation to spend your spare time on Xen again ;)

Coordinator
Apr 17, 2010 at 7:28 PM

Ok. I'm pretty confident that I can get this working, however it is going to have some side effects.
MaterialShader is going to need some pretty extensive changes. A light will no longer specifically set whether it's per-pixel or per-vertex; that will have to be done at the shader level.

Supporting v_fetch on the 360 is going to be quite tricky.

Coordinator
Apr 18, 2010 at 11:14 PM

Ok. I've figured out how to do vFetch too :-)

It's a total massive hack - but it's actually more flexible than the old vfetch hack and should work better :-D

Apr 20, 2010 at 4:01 AM

I'm glad you're not abandoning Xen. I started using this API about 2 months ago and quite honestly, I don't know why more developers aren't using it.

It's so impressive that the latest HDR Tutorial looks as good as the Rendering API used in Uncharted 2 for PS3.

I am really looking forward to your upcoming updates for 2.0.

Thanks for all of your hard work.

Coordinator
Apr 21, 2010 at 8:58 PM
Edited Apr 21, 2010 at 9:02 PM

Thank you for your kind words and the extra bonus :-)
I appreciate it a lot

As for why more people aren't using it, well, honestly most people (myself included back in the day) usually want to do everything themselves. Especially in the indy scene. It's not until you scale up that it's obvious you need to use all the tools you can get your hands on. Once you are working, well, doing everything yourself is crazy :-)

Funny you should mention Uncharted. It turns out the 'Film Approximation' technique is exactly what Uncharted 2 uses. A friend told me about it, but at first I didn't realise he'd got it from this excellent paper about lighting in Uncharted. You might recognise the Kodak image, as I reused it in the tut.

Apr 21, 2010 at 10:39 PM

That's awesome. My co-workers/friends told me that there was no way to produce graphics as rich and as nice as Uncharted 2 with XNA...
After showing them your HDR tutorial, they're all asking me for books and tutorials on XNA. LOL

I definitely hope this transfers over well to XNA 4.0, but I still have time to make a game with Xen 1.8.
XNA 4.0 probably will not be released until after August.

Oh, BTW, if you take care of me, I take care of you. Got it?  :)

Apr 22, 2010 at 2:37 AM


Great news about the transition!
I really want to give you some useful feedback, but I feel like I've just been exploring Xen and learning from it so I don't think there's much I can say that could help you.

The little I can say from my experience:
In my strange architecture I ended up using a PreDraw(DrawState state) to avoid some problems I was having with rendertargets.  Not sure it was the best solution but it worked for me.
I tried to use normalized coordinates but didn't really get it to work, though I do need to take a better look at it... X)
I ended up creating a TextSprite class based on text rendered to a texture because TextElement didn't allow me to play with scale, rotation and some other details. (Again, not sure if it is a good solution but it works without apparently breaking anything :D )

Everything I tried in Xen, from the UpdateState to creating particle systems, I found to be a great improvement over XNA.

Xen is great! I do hope you're able to keep with it!
It's a huge help to us, thank you for putting your time into building something this awesome =)

Cheers

Coordinator
Apr 27, 2010 at 9:52 PM
Edited Apr 27, 2010 at 9:56 PM

Changes to Material Shader

One of the subtle features of the shader system is that you can merge two shaders together. The resulting shader takes the vertex shader from one and the pixel shader from the other (assuming there is a set of matching constants on either).

This is no longer possible with an FX based system, so it has the side effect that I have to cut back the Material Shader quite dramatically.
Currently, there is a pretty huge combinatorial explosion going on. MS has a simply massive number of tweakables.

So, as I have to rewrite a large chunk of it, I might as well reengineer it too.
So this is what I'm looking at doing:

My overall goal is to make it require less hand holding. Currently it's so tweakable it's just too much work to setup.

So I want it to be easier to use, look better and be slimmer (it's currently got a massive number of shaders as it stands).
This means cutting back the combinations.

This is what my plan currently looks like:

  • No more X vertex lights + Y pixel lights. This was great and all, but I doubt anyone seriously needed 11 lights!
  • If you set a normal map, it will use per-pixel lighting. Otherwise, it'll use per vertex. Simple.
  • I will remove the per-pixel specular flag. It was too confusing. This limits the shaders to 2 pixel lights instead of 2 or 5.
  • Vertex lights will be reduced to 2 or 3. Previously it was 6, but 4,5 & 6 all used the same shader...
  • Fog!
  • The output will be Gamma corrected. I may make it toggleable for linear output (so it can be blended).
  • I'm going to simplify the light attenuation by removing the linear term. (In reality, the linear term is usually only used to account for a lack of gamma correction).
  • The VS will sample a spherical harmonic, just like the HDR tutorial samples it in the PS.
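For reference, the attenuation change in the list above amounts to dropping one term from the classic constant/linear/quadratic falloff. A small sketch, assuming that classic form is what the Material Shader currently uses (the thread does not show the actual shader code, and the names here are made up):

```csharp
using System;

public static class AttenuationSketch
{
    // Classic falloff: 1 / (constant + linear * d + quadratic * d^2)
    public static float WithLinearTerm(float distance, float constant, float linear, float quadratic)
    {
        return 1f / (constant + linear * distance + quadratic * distance * distance);
    }

    // Simplified falloff with the linear term removed: 1 / (constant + quadratic * d^2)
    public static float WithoutLinearTerm(float distance, float constant, float quadratic)
    {
        return 1f / (constant + quadratic * distance * distance);
    }
}
```

The two agree whenever the linear coefficient is zero, so existing lights that never set it would be unaffected.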

I realise that in some ways this is getting dangerously close to BasicEffect, especially since XNA 4 will have an animated version of BasicEffect.

I want to counter this by having lighting actively and intelligently managed by the LightCollection. If there are more lights than can be displayed by the VS/PS, I want to dynamically store the extra lights in the spherical harmonic - basically this is exactly what the Unreal Engine does. As you move away from a light, it fades out and fades into the SH, freeing up space for a closer light.

The idea being you can throw in 100 lights into the light collection, and reuse it for all your models. It won't be terribly efficient, but it'll be simple!
This will have the side effect that there will need to be an intermediate class between the light collection and the material shader.
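A rough sketch of what such an intermediate class might do: keep the nearest few lights as real lights and fold the rest into a low-frequency term. This is heavily simplified and entirely hypothetical. A single ambient float stands in for the spherical harmonic, the weighting is an arbitrary inverse-square falloff, and the gradual fade between the two sets is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical light record; a scalar intensity keeps the example short.
public struct PointLightSketch
{
    public float X, Y, Z;
    public float Intensity;
}

public static class LightSelector
{
    private static float DistanceSquared(PointLightSketch l, float x, float y, float z)
    {
        float dx = l.X - x, dy = l.Y - y, dz = l.Z - z;
        return dx * dx + dy * dy + dz * dz;
    }

    // Returns the 'maxDirect' lights nearest to (x, y, z). The remaining lights
    // are summed, distance weighted, into 'ambient' (a stand-in for the SH).
    public static List<PointLightSketch> Select(
        IEnumerable<PointLightSketch> lights, float x, float y, float z,
        int maxDirect, out float ambient)
    {
        List<PointLightSketch> ordered = lights
            .OrderBy(l => DistanceSquared(l, x, y, z))
            .ToList();

        ambient = ordered
            .Skip(maxDirect)
            .Sum(l => l.Intensity / (1f + DistanceSquared(l, x, y, z)));

        return ordered.Take(maxDirect).ToList();
    }
}
```

Sorting every light per model is the "not terribly efficient, but simple" trade-off mentioned above; a real version would presumably cache or partition the lights.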

This would actually end up quite similar to how Avatars are lit...

Apr 30, 2010 at 10:12 PM
Edited Apr 30, 2010 at 10:16 PM

It's great to see that you are thinking about Xen 2! Especially since I am becoming very attached to it. =)

However, one thing that bothers me a bit is the xml for particle systems. It is hard to understand by just looking at it. I feel like it would be so much easier if it were some sort of scripting language. Using a markup language to write code just doesn't work for me. My only wish is that you provide an alternative to xml in Xen 2. I think it would be possible to create a custom tool that "compiles" particle scripts (something that looks like LUA, JS, Python... etc) into xml. That way you could keep a lot of the current particle system code.

Coordinator
Apr 30, 2010 at 10:19 PM

The simple reason for using XML is that you can use a schema. This pretty much does all your validation for you, and the app can parse the XML without having to validate the input. It also means you get intellisense, which you really don't appreciate until you have to write code without it. :-)

Basically, it made writing the system much, much easier.. However the tradeoff is that it's not the nicest way to write code.

You might notice I'm quite a fan of strictly typed things :-)

Coordinator
Apr 30, 2010 at 11:21 PM
Edited May 8, 2010 at 6:11 PM

I managed to get the first test of Xen with the Effect system done tonight (first pass, got it compiling).
Still a bunch of things to do (Material shader, particles, bugs), but this is rather promising for a very first test (I was expecting it to either crash spectacularly or display garbage or nothing at all..):

Tutorial 28 using XNA Effect objects compiled through XenFX (first build!)

May 1, 2010 at 4:41 PM

wow. this is XNA 4.0? so far it looks like the transition is going well. :)

 

Coordinator
May 1, 2010 at 9:29 PM
Edited May 1, 2010 at 11:06 PM

Not quite. It's still XNA 3.1, but using the Effect system instead of the low level shader APIs.

The XNA 4 CTP only works on the phone emulator, and the phone doesn't let you use custom shaders. So if that *was* XNA 4, then holy crap. :-)

Coordinator
May 2, 2010 at 1:46 PM
Edited May 2, 2010 at 1:47 PM

Changes to animation / instancing:

This one isn't set in stone, but I am looking at making some fairly significant changes to how you do animation and instancing in a shader and in code.
Currently, it's quite explicit in both. I'm looking to make the process a tad more automated.

So, what I've come up with is the following, which should be self explanatory:

technique MyShader
{
	pass
	{
		vertexshader = compile vs_2_0 MyVS();
		pixelshader = compile ps_2_0 MyPS();
	}
	pass Blending
	{
		vertexshader = compile vs_2_0 MyVS_Blending();
	}
	pass Instancing
	{
		vertexshader = compile vs_2_0 MyVS_Instancing();
	}
}

Also, as an extension, I plan to make the blend matrices a semantic driven value - ie, similar to how you currently tag 'WORLDVIEWPROJECTION', you would tag the matrices 'BLENDMATRICES'.

This way, the shader itself defines if it supports blending or instancing.
This internally makes a number of things much easier, the biggest change being that ModelInstance can use the currently bound shader (assuming it supports blending). (No more Shader override!)
It also makes managing blend matrices a bit more efficient.

May 8, 2010 at 1:24 AM
Edited May 8, 2010 at 4:55 AM

Two more suggestions for possible additions to Xen 2.0:

- It would be really nice if the DrawPredicate had a constructor that took in a BoundingBox or BoundingSphere for the predicate. I think most people use it with a bounding volume anyways and requiring the user to create a bounding volume drawer seems tedious.

- Currently the DrawPredicate automatically draws the complex geometry if (queryPixelCount >= pixelCount && item.CullTest(state)). Keep in mind that most of the time the simple predicate geometry will be drawn with a simple fill shader like FillSolidColour. If you are using occlusion queries or the DrawPredicate, it is likely because you are drawing a lot of geometry. If much of that geometry uses the same shader, you could be binding back and forth between the complex geometry shader and the predicate geometry shader a lot. For example, if you are drawing a forest of 100 trees using the same shader, you will end up switching shaders a lot. This seems like a waste to me. It would be nice if the DrawPredicate class could set some sort of boolean variable to the visibility of the object rather than calling the object's draw function automatically. This way you could test the visibility of the 100 trees and then draw the visible trees without switching between predicate and complex geometry shaders on every predicate test.
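The second suggestion boils down to a two-pass loop: record visibility for every object first, then draw the visible ones. A minimal sketch of that loop, with hypothetical types (TreeSketch, ForestRenderer) and a plain integer standing in for the occlusion query result:

```csharp
using System;
using System.Collections.Generic;

public sealed class TreeSketch
{
    public int VisiblePixels; // stand-in for an occlusion query result
    public bool Visible;      // flag written by the test pass
}

public static class ForestRenderer
{
    // Returns the number of shader binds needed for the whole forest.
    public static int Draw(List<TreeSketch> trees, int pixelThreshold, Action<TreeSketch> drawTree)
    {
        int shaderBinds = 0;

        // Pass 1: bind the cheap predicate shader once and test every tree.
        shaderBinds++;
        foreach (TreeSketch tree in trees)
            tree.Visible = tree.VisiblePixels >= pixelThreshold;

        // Pass 2: bind the tree shader once and draw only the visible trees.
        shaderBinds++;
        foreach (TreeSketch tree in trees)
            if (tree.Visible)
                drawTree(tree);

        return shaderBinds;
    }
}
```

The bind count stays at two however many trees there are, where testing and drawing each tree in turn would alternate shaders for every visible tree.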

Coordinator
May 8, 2010 at 6:09 PM
Edited May 9, 2010 at 12:03 AM

Yup. That makes sense, thanks for the feedback.

Changes to interaction with standard XNA classes: 

This has always been a right pain in the arse. With Xen taking over control of the GraphicsDevice (for all the right reasons) it was always a bit of a nightmare integrating with existing XNA classes.
This is where the likes of 'DirtyInternalRenderState()' and 'BeginGetGraphicsDevice()' came in. Which, to be honest, are downright terrible APIs.

If you are going to make a mistake, it's going to be with one of these APIs.
So, what to do?

How about removing them entirely!
I'm not 100% sure on this, but what I'm thinking I'll do is a nice compromise. Instead of requiring that the user dirties the render state whenever they draw a sprite batch (for example) I'm going to go in a different direction.

Xen already had a method 'DrawVertexBuffer' in DrawState. This took a standard XNA VertexBuffer and drew it, using the internal state checking code to make sure nothing would go wrong (and actually optimise rendering somewhat).
So, I've decided to look at taking that one step further.

Xen 2.0 is going to include an extensions class in the core namespace ('XnaExtensions'). This extensions class is going to provide extension methods to the common XNA classes, adding a Draw(DrawState) method.
This will mean that a VertexBuffer will be drawable in almost exactly the same way a Vertices<> object is. I'll look at providing them for SpriteBatch as well, and the standard XNA component.

Hopefully, this will make it easier to mix the two APIs without totally messing up the xen internals.
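For anyone who hasn't used them, extension methods are what make buffer.Draw(state) possible on a class Xen doesn't own. A minimal sketch of the pattern follows. The two Fake types are stand-ins so the example compiles on its own, XnaExtensionsSketch is a hypothetical name, and the body is a placeholder for whatever state checking the real method would do.

```csharp
using System;

// Stand-ins for DrawState and the XNA VertexBuffer; only the shape matters here.
public sealed class FakeDrawState { public int DrawCalls; }
public sealed class FakeVertexBuffer { public int VertexCount; }

public static class XnaExtensionsSketch
{
    // The 'this' modifier lets callers write buffer.Draw(state)
    // as if Draw were declared on the buffer class itself.
    public static void Draw(this FakeVertexBuffer buffer, FakeDrawState state)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (state == null) throw new ArgumentNullException("state");

        // Placeholder: the real method would apply the tracked render state
        // and then issue the draw call through the state object.
        state.DrawCalls++;
    }
}
```

Because the call goes through the state object, the internal state tracking stays valid and nothing has to be dirtied afterwards.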

Coordinator
May 10, 2010 at 9:06 PM
Edited May 10, 2010 at 9:07 PM

More detail on the changes to animation and instancing:

Earlier, I mentioned I'd be looking into making a shader explicitly support instancing or blending.
Well, I've been working on it, and it actually made a lot of sense. So I'm going with this design.

So now a shader explicitly declares if it supports blending or instancing by providing a blend or instance Pass.

But there is the question of how you integrate that with the rest of the code. I wasn't too keen on having a DrawState global that said 'use instancing!' etc. So I had a think, and what I have come up with (I think) makes a lot of sense:

You specify instancing or blending when drawing geometry - at the vertex buffer level.

What this means, is that the Vertices<> class, etc, now have three Draw methods:

Draw,
DrawBlended,
DrawInstances

The call is explicit, and Blended/Instances now will take in a new parameter specifying the data.

What are the new classes?

Blending:
Previously, for animation there was MaterialAnimationTransformHeirachy in Xen.Ex - a bit of a mouthful. Well, now this class has been moved into the core Xen API (and renamed 'AnimationTransformArray' to make it more obvious). You pass one of these into 'DrawBlended' and bingo! the shader gets the appropriate animation data (when needed and when changed).
If you pass in null, then it defaults to a normal Draw() call.
The shader gets the blend matrices in the same format as older Xen shaders, except now they are a known semantic:

float4 blendMatrices[72*3] : BLENDMATRICES;

That's all you need to do, the API takes care of making sure they are set correctly. Because it's standardised, it can be optimised quite a bit too (there is still room to eliminate a redundant copy of the matrices too, so it will potentially get even faster). If you use the shader twice in a row with the same matrices, they will not be copied twice.
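The two behaviours described here, falling back to a plain draw on null and skipping the copy when the matrices haven't changed, can be sketched roughly as below. The names are hypothetical and a change counter stands in for whatever the API actually uses to detect modified matrices.

```csharp
using System;

// Hypothetical stand-in for AnimationTransformArray.
public sealed class AnimationDataSketch
{
    public int ChangeIndex; // incremented by the owner whenever the matrices change
}

public sealed class BlendingDrawer
{
    private AnimationDataSketch lastAnimation;
    private int lastChangeIndex = -1;

    public int Uploads { get; private set; }
    public int PlainDraws { get; private set; }
    public int BlendedDraws { get; private set; }

    public void Draw() { PlainDraws++; }

    public void DrawBlended(AnimationDataSketch animation)
    {
        // Null animation data: behave exactly like a normal Draw() call.
        if (animation == null) { Draw(); return; }

        // Only copy the matrices if they differ from what was last sent.
        if (!ReferenceEquals(animation, lastAnimation) || animation.ChangeIndex != lastChangeIndex)
        {
            Uploads++; // stand-in for copying the blend matrices to the shader
            lastAnimation = animation;
            lastChangeIndex = animation.ChangeIndex;
        }

        BlendedDraws++;
    }
}
```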

Instancing:
Previously, VerticesGroup had some obscure Draw overloads that let you specify a StreamFrequency object. This object set up instancing on the PC using vertex stream frequency hacks. Internally this was used by DrawBatch(), etc., however it really wasn't at all obvious.
Internally there is a class called InstanceBuffer; a call to BeginDrawBatch() would return an InstanceBuffer, which you would then fill with your instances. Call EndDrawBatch() and the instances would be drawn. This was OK, and internally it was quite efficient - but it's not that nice to use. Its big problem was that there is quite a lot of overhead copying the instance data to the GPU every frame.

Now, instead, DrawInstances directly takes an InstanceBuffer object as its extra parameter. And the InstanceBuffer is now user creatable. This allows an InstanceBuffer to be static. DrawState will get a 'GetDynamicInstanceBuffer(int size)' method for creating a dynamic instance buffer, for use within a single frame.

Once again, pass in null and it will be just like a regular call to Draw().

 

The upshot of all of this is that rendering that involves blending or instancing becomes much simpler. Especially when the rendering is sometimes blending, sometimes not.

The best example is ModelInstance. Currently, if you want to customise the shader it uses, it's a right pain. You have to use ShaderOverride. With this system, the model instance just calls DrawBlended(...), passing in the animation data (if it even exists). Whatever shader you had set will be used.

 

I'm really keen to hear feedback. The last thing I want is 'OMG you broke all my code!' right as I release it!
If I don't get any feedback, I'm assuming it's OK to just go with it.

Cheers.

May 13, 2010 at 4:05 AM
StatusUnknown wrote:

I'm really keen to hear feedback. The last thing I want is 'OMG you broke all my code!' right as I release it!
If I don't get any feedback, I'm assuming it's OK to just go with it.

It might be a good idea to post a link to this thread on the front page for this project. I think some people come to check for updates without checking the discussion section.

Coordinator
May 18, 2010 at 10:44 PM
Edited May 18, 2010 at 10:51 PM

Ok. Here is an API extension that is completely out-there (but I actually kinda like it).

I've been working away at these API changes, and an interesting problem has popped up.
Now that everything is more heavily stack based, it means you spend more time worrying about getting the Push() / Pop() calls lined up.

For example:

	state.WorldMatrix.PushMultiply(ref this.worldMatrix);

	if (sphereGeometry.CullTest(state))
	{
		state.Shader.Push(this.shader);

		sphereGeometry.Draw(state);

		state.Shader.Pop();
	}

	state.WorldMatrix.Pop();

 

It works in this simple case, however I'm finding it's not especially readable as the size of the implementation grows.


So, what I'm considering is allowing the following syntax to work (using the magic of C#'s using statement):

 

	using (state.WorldMatrix.PushMultiply(ref this.worldMatrix))
	{
		if (sphereGeometry.CullTest(state))
		{
			using (state.Shader.Push(this.shader))
			{
				sphereGeometry.Draw(state);
			}
		}
	}

 

This makes the scope for the push/pop very explicit.

Which got me thinking, I can push this one step further down the crazy road:

 

	using (state * worldMatrix)
	{
		if (sphereGeometry.CullTest(state))
		{
			using (state + shader)
			{
				sphereGeometry.Draw(state);
			}
		}
	}

 

Thoughts? (Note, all three options can be supported at once)
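For readers wondering how the operator form could even compile: an overloaded operator can return a disposable struct, and that struct pops the stack when the using block ends. A minimal sketch, with a float standing in for the world matrix and every name hypothetical:

```csharp
using System;
using System.Collections.Generic;

public sealed class StateSketch
{
    private readonly Stack<float> savedWorld = new Stack<float>();

    public float World = 1f; // a float stands in for the world matrix

    // Returned by the operator; restores the saved value when disposed.
    public struct Scope : IDisposable
    {
        private readonly StateSketch state;

        internal Scope(StateSketch state) { this.state = state; }

        public void Dispose() { state.World = state.savedWorld.Pop(); }
    }

    // 'state * matrix' saves the current value, multiplies, and hands back the scope.
    public static Scope operator *(StateSketch state, float matrix)
    {
        state.savedWorld.Push(state.World);
        state.World *= matrix;
        return new Scope(state);
    }
}
```

Used as `using (state * 2f) { ... }`, the multiply happens on entry and the pop on exit. Note that the same expression outside a using statement would push without ever popping, which is one argument for keeping the explicit methods as the primary API.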

May 19, 2010 at 9:03 AM

I personally like the using keyword approach but not the latter (state * worldMatrix), which for me is less readable: what am I actually doing when I multiply state by worldMatrix and then add shader to state???

Sometimes less typing is the developer's enemy... Especially when working on a team that needs to read your code :p

However, I wonder if you wouldn't get some allocation/deallocation and Garbage Collector related performance issues with the "using" statement?...

From my understanding, "using" actually creates an instance that is then automatically disposed and collected by the GC when the block is closed. Doing this every frame for rendering and for many objects, if my assumptions are right, would have a terrible effect on Xen, wouldn't it?

Philippe

Coordinator
May 19, 2010 at 9:13 AM

No, if a struct is passed into a using statement, then the C# compiler is smart enough to treat it as such (unless it has an explicit Dispose implementation).
In theory it should be no less efficient than calling the method directly.
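A sketch of the pattern being described: Push returns a small struct that implements IDisposable, so the token handed to the using statement is a value type and is not itself a heap allocation. The names are hypothetical and a float stands in for a matrix.

```csharp
using System;
using System.Collections.Generic;

public sealed class MatrixStackSketch
{
    private readonly Stack<float> saved = new Stack<float>();

    public float Current = 1f; // a float stands in for a matrix

    // Saves the current value, multiplies, and returns a token that pops on Dispose.
    public PopToken PushMultiply(float value)
    {
        saved.Push(Current);
        Current *= value;
        return new PopToken(this);
    }

    public void Pop() { Current = saved.Pop(); }

    // A struct with a public Dispose method: the token is a value type,
    // so handing it to 'using' creates no garbage for the collector.
    public struct PopToken : IDisposable
    {
        private readonly MatrixStackSketch owner;

        internal PopToken(MatrixStackSketch owner) { this.owner = owner; }

        public void Dispose() { owner.Pop(); }
    }
}
```

The explicit calls still work alongside it: PushMultiply followed by a manual Pop() behaves the same as the using form, as long as the returned token is simply ignored rather than disposed as well.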

May 19, 2010 at 1:55 PM

I like the less wordy approach.
I believe it's not that hard to understand, and having shorter lines of code speeds up reading and writing.

I would prefer it to using push and pop explicitly due to the issues you stated above.

May 24, 2010 at 4:51 PM

I like the last implementation the best. However, I think the first implementation should remain for people who don't want to use the using statement.

-------------------------------------------------------------------------------------------

Off topic:

I think the following practice is a little annoying for value types as small as a vector3:

                Vector3 camera;
                state.Camera.GetCameraPosition(out camera);

I would rather do

                Vector3 camera = state.Camera.GetCameraPosition();

or

                Vector3 camera = state.Camera.Position;

or

                Vector3 camera = new Vector3();
                camera.X = state.Camera.Position.X;
                camera.Y = state.Camera.Position.Y;
                camera.Z = state.Camera.Position.Z;

May 24, 2010 at 9:42 PM

Hi Arriu,

This is not so much a convention as an optimisation :)

By setting a parameter using the out statement, you avoid creating another instance inside the method to be returned.

Take a simple example:

public string GetMyName()
{
    return "Philippe"; // this is the same as: return new String("Philippe")
}
static void Main()
{
    string myName = GetMyName(); // Here, we create another string instance.
}

If you use this kind of call, you are actually creating two instances of string: one inside the method (destroyed when the method returns) and the one that receives the result of the method.

By doing:

public void GetMyName(out string myname)
{
    myname = "Philippe"; // We do not create an instance but fill the one passed in the method's signature
}

static void Main()
{
    string myname = string.Empty;
    GetMyName(out myname);
}

We just create one instance of string that is used inside the method.

This specific case isn't really important performance-wise, but when you make a few calls to a method every frame, you can get a really good performance increase using such a pattern.

StatusUnknown is just giving us some good API methods with performances in mind ;)

Coordinator
May 24, 2010 at 11:24 PM
Edited May 24, 2010 at 11:25 PM

Actually, string is a sort of special case. It's a class, so it's passed by reference. However it's immutable - so any modification you do will produce a new string. So in that case, you do not duplicate the string.
If it was a struct type, such as VectorX or Matrix, then yes, it would be duplicated.

It's good practice in XNA to pass your Vectors and Matrices by ref or out. The compact framework that runs on the 360 generally doesn't do any inlining, or optimising out redundant data copies.

It also forces you to think about the method calls you are making. But I do understand that the vast majority of people simply don't care :-)
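
To make the struct/class distinction above concrete, here is a minimal sketch. It assumes a hypothetical `Vec3` struct standing in for XNA's `Vector3` (so it compiles without XNA) and a hypothetical `Camera` class; neither is Xen's actual API. It only shows the calling shapes, not any timing.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector3 so the sketch compiles without XNA.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical camera class, not Xen's.
public class Camera
{
    private Vec3 position = new Vec3(1f, 2f, 3f);

    // Return by value: the struct is copied out of the method, and may be
    // copied again into the caller's variable if the JIT does not elide it.
    public Vec3 GetPosition() { return position; }

    // 'out' passes the address of the caller's variable, so the method
    // writes the result straight into it.
    public void GetPosition(out Vec3 result) { result = position; }
}

public static class Demo
{
    public static void Main()
    {
        Camera camera = new Camera();

        Vec3 byValue = camera.GetPosition();

        Vec3 byOut;                    // no initialisation needed for 'out'
        camera.GetPosition(out byOut);

        // Both calls yield the same value; only the copying differs.
        Console.WriteLine(byValue.X == byOut.X && byValue.Y == byOut.Y && byValue.Z == byOut.Z);

        // string is a reference type: assignment copies a reference, not the characters.
        string a = "Philippe";
        string b = a;
        Console.WriteLine(object.ReferenceEquals(a, b));
    }
}
```

Whether the by-value version really costs an extra copy depends on the runtime and JIT, so treat this as an illustration of the two signatures rather than a benchmark.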

May 25, 2010 at 1:59 AM
philippedasilva wrote:
public void GetMyName(out string myname)
{
myname = "Philippe"; // We do not create an instance but fill the one passed in the method's signature
}

If memory is really a concern, then you are forcing the existence of at least another variable by having a method return "Philippe". If this is confusing, consider the following:

public class Camera
{
      private Vector3 position;

      public void GetPosition(out Vector3 position)
      {
            position.X = this.position.X;
            position.Y = this.position.Y;
            position.Z = this.position.Z;
      }

}

Here I cannot access the camera position from another class without creating another variable for it. This has some advantages though (access protection). But as StatusUnknown has mentioned, sometimes I don't care =)

public class Camera
{
      public Vector3 Position;
}

Using the above class I can directly reference the position saving the need to create local variables inside other classes. Also, I am able to get Position.X without getting the entire Vector3 ;)

May 25, 2010 at 8:33 AM
StatusUnknown wrote:
Actually, string is a sort of special case. It's a class, so it's pass by reference. However it's immutable - so any modification you do will produce a new string. So in that case, you do not duplicate the string.
If it was a struct type, such as VectorX or Matrix, then yes, it would be duplicated.

Ok StatusUnknown, my example wasn't the best one :p

arriu wrote:

If memory is really a concern, then you are forcing the existence of at least another variable by having a method return "Philippe". If this is confusing, consider the following:

public class Camera
{
      private Vector3 position;

      public void GetPosition(out Vector3 position)
      {
            position.X = this.position.X;
            position.Y = this.position.Y;
            position.Z = this.position.Z;
      }

}

Here I cannot access the camera position from another class without creating another variable for it. This has some advantages though (access protection). But as StatusUnknown has mentioned, sometimes I don't care =)

public class Camera
{
      public Vector3 Position;
}

Using the above class I can directly reference the position saving the need to create local variables inside other classes. Also, I am able to get Position.X without getting the entire Vector3 ;)

In your example, you are simply returning a stored value that isn't used inside of the method call, while the Xen Camera class stores the camera position and orientation in a single Matrix. The camera position is therefore computed/copied from the Matrix instance. To avoid creating a Vector3 structure for each call to Camera.GetCameraPosition (which you may well be using quite often in a single frame), it simply uses the out pattern ;)

May 26, 2010 at 6:03 AM

You know, this should probably be in its own thread, and possibly in a different forum, but my original reaction to the question about using the out keyword was that of course it's for performance reasons.  I never trust myself when I think something is obvious, so I compiled and timed a few variants, including using a property, the Vector3 vec = cam.GetPosition(); method, and Vector3 vec; cam.GetPosition(out vec); 

I figured I'd share the results, for what they're worth.  First off, I've got to preface this by saying that I'm using VS2010 and .NET 4.0.  Right away, that means this data is not directly relevant.  (I'm dying for the XNA team to release 4.0 so I can get back on my projects that use Xen -- when Xen 2.0 is released of course.)  I also timed only on a PC, not an XBox360.  But there might still be something of interest here.

What I found using ildasm was that storage for all local variables was predefined and zeroed by the compiler, so none of the three snippets required any new object initialization.  (Using "ref" instead of "out" did though, and slowed things down ridiculously).  All three used an identical callvirt operation to access the property/method in the Camera class.  And in the release assembly there was no excess copying for any method.  The method that returned the Vector3 and the Property simply put the position attribute on top of the stack and returned, and the main method immediately performed a stloc to put it in the local variable.  The version with the out keyword used the address of the local variable as a parameter and had a stobj instruction in the method to put the data in the local variable, but didn't execute any store after returning.  Using the out keyword was about 20-30% faster over 100 million reps.  (I'm assuming the difference is in the implementation of stobj vs stloc, but I didn't ngen any of this, so I don't know what the difference is exactly.)

Looking at the Debug assembly is a little more interesting.  For both the property and the method that returned the Vector3, there *is* an extra copy of the position field data.  Not only that, but there's an unconditional branch to the following instruction (i.e. the instruction does nothing.  It's a fairly common artefact in debug assemblies).  And the performance?  The property and method call returning a Vector3 again took pretty much exactly the same time to execute.  Using the "out" parameter?  That ran in *one third* the time of the other two.  Now consider a Vector3 is just 3 floats... obviously Matrices and larger structs are going to take longer to copy, so the relative performance of using the "out" parameter will increase that much more.

So, given that the compiler doesn't optimize away redundant copies when targeting the compact framework on the XBox (one has to ask why???), it sure seems like using "out" is the right thing to do.  It gives a pretty nice boost in a release assembly on the PC, and vastly better performance on the XBox.  (Why the heck doesn't the compiler optimize away redundant copies when targeting the XBox???)

 

So back to the original thread question.  I'm totally happy to have Xen 2.0 break anything and everything.  I'm still experimenting with stuff and don't have any serious library built up with dependencies on the lower releases.

Coordinator
May 27, 2010 at 12:48 PM
Edited May 27, 2010 at 12:50 PM

Hmmmm.... Interesting. :-)

I have a lot of code that follows the pattern:

changed |= GetValue(ref value, ref change_idx)

(GetWorldMatrix, etc)

Where GetValue returns true if the target value has changed according to the index, and 'value' is typically a Vector4 array element. At the least, I should change 'value' to out instead of ref, as it won't make any difference in the implementation.
This pattern is especially common in the shader system (and is even more common in 2.0)

Perhaps this would make more sense:
changed |= GetValue(out value, change_idx, out change_idx)

It's a tad counter intuitive but should work.
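
For readers unfamiliar with the change-index pattern described above, here is a minimal sketch of how such a method might look. `TrackedValue`, `Vec4` and the field names are hypothetical, not Xen's actual types. One thing the sketch makes visible: this version skips the write when nothing changed, which only works with ref, because an out parameter must be assigned on every return path.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector4.
public struct Vec4
{
    public float X, Y, Z, W;
    public Vec4(float x, float y, float z, float w) { X = x; Y = y; Z = z; W = w; }
}

// Hypothetical value source that bumps a counter whenever its value changes.
public class TrackedValue
{
    private Vec4 value;
    private int changeIndex = 1;

    public void Set(Vec4 newValue)
    {
        value = newValue;
        changeIndex++;
    }

    // Returns true (and refreshes 'target') only when the caller's cached
    // index is stale; otherwise leaves 'target' untouched.
    public bool GetValue(ref Vec4 target, ref int cachedIndex)
    {
        if (cachedIndex == changeIndex)
            return false;

        target = value;
        cachedIndex = changeIndex;
        return true;
    }
}

public static class Demo
{
    public static void Main()
    {
        TrackedValue source = new TrackedValue();
        Vec4[] constants = new Vec4[1];   // e.g. a shader constant array
        int cached = 0;

        bool changed = false;
        changed |= source.GetValue(ref constants[0], ref cached);
        Console.WriteLine(changed);       // True: first read is always stale

        changed = false;
        changed |= source.GetValue(ref constants[0], ref cached);
        Console.WriteLine(changed);       // False: nothing changed in between

        source.Set(new Vec4(1f, 2f, 3f, 4f));
        changed |= source.GetValue(ref constants[0], ref cached);
        Console.WriteLine(changed);       // True: value was updated
    }
}
```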

Coordinator
May 27, 2010 at 12:59 PM
Edited May 27, 2010 at 10:40 PM

I'll also mention:

I've got the first pass of the revised MaterialShader working. It's been quite a struggle getting the instruction count down. The most complex pixel shader needs to do two lights (with specular power/mask/colour, directional/point flag), fogging, normal mapping (tangent space matrix, etc), vertex colour, input gamma->linear conversions, alpha output, and toggleable output gamma correction - all in 63 arithmetic instructions! (ps_2_0 limit is 64).

It's surprising where the shader compiler omits obvious optimisations. For example:

a += (b * c) + (d * e);

Is one more instruction than:

a += b * c;
a += d * e;

From what I can tell, this is due to how INF/NaN might change the output. The second is two MADD instructions, the first is MADD,MUL,ADD. Although on a lot of hardware they should be identical in terms of performance.

May 27, 2010 at 10:30 PM

Heh.  Ain't optimization a pain?  Performance timing is a minefield, too, and I wouldn't bank on the "back-of-the-envelope" rig I put together to be particularly precise.  I was more interested in the factor of three... you don't need much faith in the precision of the measurement to pick the option that runs three times as fast... :-)

Re:  changed |= GetValue(out value, change_idx, out change_idx):    I suspect that changed |= GetValue(out value, ref change_idx); might be faster.

The ref keyword was only slowing things down because you have to pass an initialized object as ref, which meant initobj had to be called before the method call.  With out, or when returning a value to be stored in a local, you don't have to initialize the local first.  My test rig was probably the worst case scenario for the ref keyword... I newed up the object within the timing loop to see how much the initobj instruction cost compared to the callvirt.  Pulling the initialization out of the timing loop resulted in the ref and out versions running identical times, as you'd expect.

In any case, change_idx sounds like an integer, so you're not going to be calling initobj on it anyway, and passing 3 parameters instead of 2 will cost a little.  So instead of worrying about that, find us a new feature to use up that last pixel shader instruction you're wasting!  ;o)

Coordinator
May 27, 2010 at 10:57 PM
Edited May 28, 2010 at 12:08 AM

I realise now I misread your reply. "Wow! out is that much faster than ref!, I need to get to work"
When I really should have read 'wow, initialising or invoking an empty constructor on a struct can sometimes be really expensive'

Yeah duh, I should have paid more attention. :-) I've pretty much gone back to what I had.

 

I'm running some tests right now. Invoking the binding method on the HDR tutorial character shader (very heavy shader) 10 million times takes 2 seconds. With out (and using initialisers) it was ~2.2 seconds. Of course, if you actually try and set the constants on the GPU then that goes up to ~15 seconds. :P
Sure, it's not a very productive target for optimisation work, but it's still interesting.

Jun 12, 2010 at 3:20 PM
Edited Jun 12, 2010 at 3:21 PM
It would be nice if Xen 2.0 supported the following method of instancing (as described here and here) on the xbox 360 or at least allowed us to code it ourselves (using DrawPrimitives in this way is not supported by Xen, neither are multiple vertex streams per draw call from what I can tell):

Draw geometry by using DrawPrimitives instead of DrawIndexedPrimitives, so the GPU generates incrementing index values
Store your real index values in vertex stream #1
Store vertex data in stream #2
Store instance data in either constant registers or stream #3
Then the real indices can be vfetched inside the shader.

Upside: no longer need to replicate any geometry data at all (thus saves memory)
Downside: disables post T&L vertex caching (thus increases vertex processing workload by about 10-20%)
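
The addressing in the steps above can be sketched on the CPU. This is only an emulation, assuming the non-indexed draw hands the shader an incrementing vertex id; on the 360 the lookups would be vfetch instructions in the vertex shader. `ShadeVertex` and the array layout are hypothetical names chosen for illustration.

```csharp
using System;

public static class VfetchInstancingSketch
{
    // Emulates what the vertex shader would do for one auto-generated vertex id.
    public static float[] ShadeVertex(int vertexId, int[] indices,
                                      float[][] positions, float[][] instanceOffsets)
    {
        int instanceId = vertexId / indices.Length;          // which instance this vertex belongs to
        int realIndex = indices[vertexId % indices.Length];  // "vfetch" of the real index
        float[] p = positions[realIndex];                    // "vfetch" of the vertex data
        float[] o = instanceOffsets[instanceId];             // per-instance data
        return new[] { p[0] + o[0], p[1] + o[1], p[2] + o[2] };
    }

    public static void Main()
    {
        // One quad: 4 unique vertices, 6 indices, stored once regardless of instance count.
        float[][] positions =
        {
            new[] { 0f, 0f, 0f }, new[] { 1f, 0f, 0f },
            new[] { 0f, 1f, 0f }, new[] { 1f, 1f, 0f }
        };
        int[] indices = { 0, 1, 2, 2, 1, 3 };
        float[][] instanceOffsets = { new[] { 0f, 0f, 0f }, new[] { 10f, 0f, 0f } };

        // The non-indexed draw covers indices.Length * instanceCount vertices.
        int vertexCount = indices.Length * instanceOffsets.Length;
        for (int id = 0; id < vertexCount; id++)
        {
            float[] v = ShadeVertex(id, indices, positions, instanceOffsets);
            Console.WriteLine(id + ": " + v[0] + ", " + v[1] + ", " + v[2]);
        }
    }
}
```

Every vertex id is shaded separately here, which is the CPU-side picture of why the post-transform vertex cache gives no benefit with this method.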

Coordinator
Jun 12, 2010 at 4:54 PM
Edited Jun 12, 2010 at 4:58 PM

In 1.8, instancing support on the 360 currently uses technique #2 in Shawn's post. It automatically generates a duplicated index buffer internally. You need to provide the shaders for this, but it's much easier in 2.0.
Of course, the downside is you can't draw a subset of the triangles and it does duplicate data.

In 2.0, instancing is done by calling 'DrawInstances' on a vertex buffer. The shader itself defines if it supports instancing, and if it doesn't, xen will internally fall back to simply using multiple draw calls.
All built in shaders support instancing out of the box, including material shader.

The XNA team are doing some work to make instancing easier, so I'll need to look into how they are doing that - but I suspect it's the method you mention. Unfortunately that method is quite a lot slower for very heavy vertex shaders, as you don't get any benefit from the vertex cache - but the 360 is very fast at vertex work so it's not such a big deal. I'm half tempted to change to that method, for the majority of people it will be more flexible even if the vertex load will be higher.

Downside: disables post T&L vertex caching (thus increases vertex processing workload by about 10-20%) 

It's actually significantly higher than that. An optimally indexed flat grid mesh will be closer to 4x higher vertex load.

Jul 6, 2010 at 8:58 PM
Edited Jul 6, 2010 at 9:38 PM
I would really like to know how you got vfetch to work with xen.

The reason I ask is because I would like to use vfetch in the following way:
- I want to draw about 500,000 quads (all the same size, facing the same direction, only position differs).
- I'd like to do a drawprimitives call which generates the indices automatically.
- then I could store the positions of each quad in another stream (vertex stream 2).
- then I could vfetch the position of the quad for each vertex in vertex stream 1.


I could just set the quad positions in vertex stream 1 but then the setdata() call would be a lot more expensive... (I am planning on double buffering and calling setdata pretty much every update, which leads to about 20-25 MB that need to be sent :$... I might be able to set dirty ranges but vfetch appears to be a better choice)

Coordinator
Jul 7, 2010 at 5:38 PM

To use vfetch in xen, you have to use the asm_vfetch() macro, and #include "asm_vfetch". You can find quite a lot of shaders that do this (although you don't need the include in xen 2).

What you are wanting to do is pretty tricky, and XNA doesn't really expose any nice way to do it (outside of dynamically updating a VB and using vfetch).

Coordinator
Jul 7, 2010 at 11:42 PM

I'd also suggest considering texture lookup using vertex texture fetch. If you can get the GPU to generate the data for you, then you cut the CPU out of the loop and generally get significantly better performance.

This of course is exactly how the particle system works in xen. :-)

Jul 9, 2010 at 9:27 PM

Thanks for the reply!

I have given it some more thought. I am not sure that using vfetch instancing is a great idea in my situation. It does have many pros but the lack of portability is a pretty large disadvantage.  Also, you have pointed out that it would be a lot more computationally expensive as well and the way I intended to use it would not be very feasible.

 

StatusUnknown wrote:

I'd also suggest considering texture lookup using vertex texture fetch. If you can get the GPU to generate the data for you, then you cut the CPU out of the loop and generally get significantly better performance.

This of course is exactly how the particle system works in xen. :-)

I have implemented a particle system similar to the one you've implemented in Xen. Before I considered using vfetch I thought about using textures to store quad positions. However, the terrain I am working on is "infinite" and fully destructible. The logistics of making it all work on the GPU make me wanna curl up into a ball. However, the perlin noise function would be A LOT more efficient on the GPU (ugh, and that happens to be a very attractive plus which may drive me to change my mind). I would use a technique similar to what is described in Chapter 1 of GPU Gems 3 but the lack of DX10 makes it a lot more expensive.

Instead, I think I am going to try to use a single vertexbuffer which will be divided into "pages" of equal size. Then I will create a table of nodes with references to the pages of data in the vb. This will act like a file system, allowing me to cram multiple vertex buffers of data into a single buffer (unused memory will consist of vertices at position (0,0,0) and drawing a few triangles of 0 area should not be a huge problem (thoughts/suggestions?)). This won't reduce the memory footprint, like instancing would, but it will reduce the total number of draw calls (which is about 30-200 at the moment, since my terrain is divided into 32x32x32 chunks, each of which has its own vb).
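
A minimal sketch of the paged bookkeeping described above, assuming fixed-size pages and a free list. `PagedBufferAllocator` and its members are hypothetical names. It only tracks offsets; the actual vertex upload and the zeroing of unused pages are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical page table for one large vertex buffer split into equal pages.
public class PagedBufferAllocator
{
    private readonly int pageSize;
    private readonly Stack<int> freePages = new Stack<int>();
    private readonly Dictionary<int, List<int>> pagesByChunk = new Dictionary<int, List<int>>();

    public PagedBufferAllocator(int totalVertices, int pageSize)
    {
        this.pageSize = pageSize;
        int pageCount = totalVertices / pageSize;
        // Push in reverse so the lowest page is handed out first.
        for (int page = pageCount - 1; page >= 0; page--)
            freePages.Push(page);
    }

    public int FreePageCount { get { return freePages.Count; } }

    // Reserves enough pages for vertexCount vertices and returns the vertex
    // offset of each page, or null if the buffer is too full.
    public List<int> Allocate(int chunkId, int vertexCount)
    {
        int needed = (vertexCount + pageSize - 1) / pageSize;   // round up
        if (needed > freePages.Count)
            return null;

        List<int> pages;
        if (!pagesByChunk.TryGetValue(chunkId, out pages))
        {
            pages = new List<int>();
            pagesByChunk[chunkId] = pages;
        }

        List<int> offsets = new List<int>(needed);
        for (int i = 0; i < needed; i++)
        {
            int page = freePages.Pop();
            pages.Add(page);
            offsets.Add(page * pageSize);
        }
        return offsets;
    }

    // Returns a chunk's pages to the free list. The caller would still need to
    // overwrite them with degenerate (0,0,0) vertices if the whole buffer is drawn.
    public void Free(int chunkId)
    {
        List<int> pages;
        if (!pagesByChunk.TryGetValue(chunkId, out pages))
            return;
        foreach (int page in pages)
            freePages.Push(page);
        pagesByChunk.Remove(chunkId);
    }
}

public static class Demo
{
    public static void Main()
    {
        PagedBufferAllocator allocator = new PagedBufferAllocator(1024, 256);   // 4 pages
        List<int> offsets = allocator.Allocate(7, 300);                        // needs 2 pages
        Console.WriteLine(string.Join(", ", offsets.ConvertAll(o => o.ToString()).ToArray()));
        Console.WriteLine(allocator.FreePageCount);
        allocator.Free(7);
        Console.WriteLine(allocator.FreePageCount);
    }
}
```

A chunk's pages need not be contiguous here, so drawing the whole buffer in one call relies on unused pages holding zero-area triangles, as the post suggests.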

If you're interested, here is what I'm working on.

 

Coordinator
Jul 10, 2010 at 1:44 AM
Edited Jul 10, 2010 at 1:45 AM

I'm curious what type of effect you are actually going for? Is this for some kind of grass / shrub effect?

I can't help thinking that rendering a heightmap on the GPU - and using VTF when rendering all your cubes - would probably be quite an efficient way to draw the terrain.
If you are trying to do something like grass, then that'd work quite nicely too.

And 200 draw calls isn't too bad - provided the number of state changes between each isn't too high.

Coordinator
Jul 18, 2010 at 5:25 PM
Edited Jul 18, 2010 at 5:26 PM

I'm currently in the process of porting to the XNA 4 Beta. It's going to be a fairly long process, but I've got the shader compiler sorted (which I was expecting to be the hardest bit).

One thing is clear, the shader compiler requires that I create a 'HiDef' capable device. In other words, a DX10 level device.
It's quite possible that Xen will not build on DX9 hardware.

Note: Xen will still run on DX9 hardware - just it may not build.

 

I've had to do some exceptionally nasty hacks to get things to work, as a number of lower level APIs have been removed. For example, sampler states are no longer accessible through an Effect (native wrapper ahoy!).

Coordinator
Jul 26, 2010 at 12:36 AM
Edited Jul 26, 2010 at 12:41 AM

Humble beginnings:

 

Around 500+ build errors later.... :-)

I must say, XNA 4 is a damn sight better. I've had to remove a tonne of special case and error preventing code. Although a number of the cuts made certain things rather tricky.

 

And some even better news,

It took some work,  but I've managed to work out a way to efficiently automatically generate instancing and animation shaders.

Coordinator
Jul 29, 2010 at 9:20 PM

yay

Jul 30, 2010 at 1:11 AM

Xen 4 is looking better each day!
 Great work =)

Aug 11, 2010 at 3:58 AM

Hi StatusUnknown. I've been keeping an eye on progress and I gotta say I'm really impressed. I'm looking forward to giving Xen 2 a crack when we start our next Xbox 360 3D project.

Well done for keeping it up to date with Xna 4 - I've just converted Square Off over and there were a lot of (mostly minor) edits required. I imagine it would've been much harder in your case  ;)

Coordinator
Aug 11, 2010 at 9:41 AM

Thanks mate :-) I love square off, fantastic little game. You didn't overstretch and it shows.

I'm not going to lie; it's been difficult to keep motivated to get xen ported. You wouldn't believe some of the nasty things I've had to do to the shader system :-) Stealing COM pointers, native code, hijacking render state, reverse engineering internal XNA classes... :-)
Also the sheer number of build changes has been pretty overwhelming. Simple example: They have removed Point rendering, which the GPU particle system relied on heavily.
And I still have a bunch of areas to fix up (the big one being instancing)
To cap it all off, I'm in the middle of a fairly big transition in life (again!) so I've been distracted. Ohh and my laptop's GPU is dying too!

But... With all that said, I'm feeling pretty happy with how it's going. I'm really happy with the crazy hacks I've made in the shader system - automatically generating animation and instancing shaders. I think this will help a lot of people - it certainly made the material shader much simpler.
Also, it's a much tighter integration with normal XNA now. It should be much easier to integrate with other XNA game components, etc.

I'm thinking about what I want to do with xen long-term. In a lot of ways, the API is fairly stable and not really going to change much. I'm open to ideas.

Aug 20, 2010 at 12:37 PM

Hi!

I have a question. First of all, my apologies. It's probably a silly question! I'm new to the XNA world, so now I'm trying to find a good framework to make this hard start-up easier for me (no 3D modeling knowledge either, so you can imagine... 8·D).

My question is: is XEN 2.0 going to have a Windows Phone version? If so, will it be the same? Or will it have some limitations?

Thanks in advance for your reply 8·)

 

Coordinator
Aug 20, 2010 at 9:04 PM
Hi. I can't say yet about Windows Phone support. XNA 4 won't support custom shaders on the phone - and xen is based very heavily around the shader system. Also, no matter what, 3D will be hard on the phone. Sorry I cannot give a definite answer. Also, unfortunately, my dev laptop died last week, and it'll be at least another 10 days till it's fixed. :-( In the meantime I have to reply on my slow phone, or at work (which I will be leaving soon) - so xen is unfortunately not moving forward right this moment. :-( Thanks for posting, I'll try to keep the thread up to date.