Monday, September 26, 2005

XmlSerializer Performance Issue

Ever notice that constructing a new System.Xml.Serialization.XmlSerializer is really slow?

As it turns out, the XmlSerializer constructor is slow since it creates the serialization and deserialization code and compiles it into a temporary assembly at runtime. This occurs for each type that is passed in as a parameter to the XmlSerializer constructor.

On older machines or machines where the performance is I/O bound, the time it takes for the constructor call balloons very quickly. I’ve seen it take as long as 8 seconds.
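To see the cost on your own machine, a minimal timing sketch along these lines may help. The Options class here is a hypothetical stand-in for whatever type you serialize, and the numbers will vary widely with hardware and framework version. The second call in the same process is typically much faster, since the framework reuses the generated assembly for the XmlSerializer(Type) constructor.

```csharp
using System;
using System.Diagnostics;
using System.Xml.Serialization;

// Hypothetical sample type; substitute the type you actually serialize.
public class Options
{
    public string Name;
    public int Level;
}

public static class ConstructorTiming
{
    // Returns the elapsed milliseconds for a single XmlSerializer construction.
    public static long TimeConstruction(Type type)
    {
        Stopwatch watch = Stopwatch.StartNew();
        XmlSerializer serializer = new XmlSerializer(type);
        watch.Stop();
        GC.KeepAlive(serializer);
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // The first call pays for code generation and compilation.
        Console.WriteLine("First construction:  {0} ms", TimeConstruction(typeof(Options)));
        // Later calls for the same type in this process usually reuse the result.
        Console.WriteLine("Second construction: {0} ms", TimeConstruction(typeof(Options)));
    }
}
```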

The fact is that in the majority of cases you do not need this code generation to happen at runtime, because the mapping between the object graph and the XML is already known at compile time.

Workaround in the .NET Framework 1.0 and 1.1
In .NET 1.0 and 1.1, you can work around this issue by capturing the generated code in a test run, writing an XmlSerializer subclass that uses it, and using that subclass instead.

In order to capture the XmlSerializer generated code, you need to set a diagnostic switch in the application config file of the app:

<?xml version="1.0"?>
<configuration>
  <runtime>
    <gcConcurrent enabled="true" />
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <publisherPolicy apply="yes" />
      <probing privatePath="bin\debug" />
    </assemblyBinding>
  </runtime>
  <system.diagnostics>
    <switches>
      <add name="XmlSerialization.Compilation" value="4"/>
    </switches>
  </system.diagnostics>
</configuration>

After performing a test run with this app config file, the generated code can be found in the temp folder. Now that you have that code, you need to write a subclass of the XmlSerializer that looks something like this (the actual names of the methods and classes are generated and are subject to change):

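using System;
using System.Xml;
using System.Xml.Serialization;

// XmlSerializationReader1, XmlSerializationWriter1, Write4_Options and
// Read5_Options come from the captured generated code; your names will differ.
public class MySerializer : XmlSerializer
{
    // Empty on purpose: no code generation happens here.
    public MySerializer()
    {
    }

    protected override XmlSerializationReader CreateReader()
    {
        return new XmlSerializationReader1();
    }

    protected override XmlSerializationWriter CreateWriter()
    {
        return new XmlSerializationWriter1();
    }

    // Assumes the input is always a document this serializer can read.
    public override Boolean CanDeserialize(XmlReader xmlReader)
    {
        return true;
    }

    protected override void Serialize(Object objectToSerialize, XmlSerializationWriter writer)
    {
        ((XmlSerializationWriter1)writer).Write4_Options(objectToSerialize);
    }

    protected override System.Object Deserialize(XmlSerializationReader reader)
    {
        return ((XmlSerializationReader1)reader).Read5_Options();
    }
}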

Given that you’ve added the class above and the generated code to your project, the last thing that remains is to change your instantiation of the XmlSerializer:

From:
XmlSerializer serializer = new XmlSerializer(typeof(Options));

To:
XmlSerializer serializer = new MySerializer();

The MySerializer() constructor is empty, so it takes essentially no time. The time previously spent generating and compiling code while instantiating the XmlSerializer is eliminated.

The other nice thing about this is that you can look at the generated code to see how the serialization/deserialization code is implemented.
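If you are not sure which files in the temp folder are the generated ones, a small helper like the following can narrow it down. This is only a sketch: it assumes the generated source is left in the folder returned by Path.GetTempPath() with a .cs extension, and the class and method names are made up for illustration. Hosted applications may use a different temp folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TempCodeFinder
{
    // Returns the .cs files in 'folder' that were last written at or after 'since'.
    public static string[] FindRecentSourceFiles(string folder, DateTime since)
    {
        List<string> recent = new List<string>();
        foreach (string file in Directory.GetFiles(folder, "*.cs"))
        {
            if (File.GetLastWriteTime(file) >= since)
            {
                recent.Add(file);
            }
        }
        return recent.ToArray();
    }

    public static void Main()
    {
        // Assumption: the test run with the diagnostic switch finished within the last 10 minutes.
        DateTime since = DateTime.Now.AddMinutes(-10);
        foreach (string file in FindRecentSourceFiles(Path.GetTempPath(), since))
        {
            Console.WriteLine(file);
        }
    }
}
```

Run it right after the test run so the time window stays small and unrelated files are filtered out.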

The downside is that you need to ensure that you keep the generated code and the code for the class being serialized in sync. You may want to wait until your code stabilizes before switching to the captured code.
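One way to catch a captured serializer that has drifted out of sync is a test that compares its output with that of the regular XmlSerializer. The sketch below is an assumption about how such a check could look, not part of the framework. Options is a hypothetical sample type, and the second serializer is a placeholder for your captured subclass (for example, new MySerializer()).

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type; substitute the type you actually serialize.
public class Options
{
    public string Name;
    public int Level;
}

public static class SerializerSyncCheck
{
    // Serializes 'value' with the given serializer and returns the XML text.
    public static string ToXml(XmlSerializer serializer, object value)
    {
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // True when both serializers produce identical XML for 'value'.
    public static bool ProduceSameXml(XmlSerializer expected, XmlSerializer actual, object value)
    {
        return ToXml(expected, value) == ToXml(actual, value);
    }

    public static void Main()
    {
        Options sample = new Options();
        sample.Name = "demo";
        sample.Level = 3;

        XmlSerializer reference = new XmlSerializer(typeof(Options));
        // Placeholder: in a real project this would be the captured serializer.
        XmlSerializer candidate = new XmlSerializer(typeof(Options));

        Console.WriteLine("In sync: {0}", ProduceSameXml(reference, candidate, sample));
    }
}
```

Since the reference serializer pays the slow constructor cost, a check like this belongs in a test project rather than in the application itself.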

sGen in the .NET Framework 2.0
In the .NET Framework 2.0 (Whidbey), there is a new command-line tool called sGen (sgen.exe) that is also accessible from the Visual Studio 2005 IDE, in the Build project settings, as “Generate serialization assembly”.

Essentially, this automates the steps above and creates a corresponding AssemblyName.XmlSerializers.dll, where "AssemblyName" is the name of the assembly that contains the types being serialized.

When the XmlSerializer constructor is called, the .NET Framework will look for a corresponding *.XmlSerializers.dll assembly and load the statically generated serializers from it.

It’s important to realize that this is happening, because if you forget to deploy your *.XmlSerializers.dll files, your deployed application will have far worse performance than what you are seeing on your development machines. If you don’t know about this mechanism, it’ll be hard to figure out why that is.
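A cheap safeguard is a startup check that warns when the serializer assembly is missing. The following is a minimal sketch that assumes the *.XmlSerializers.dll is deployed in the same folder as the assembly it belongs to. The class and method names are hypothetical, and the framework may also find the assembly in other locations, so treat a warning as a hint rather than proof.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class SerializerAssemblyCheck
{
    // Builds the path where the pre-generated serializer assembly is expected,
    // following the AssemblyName.XmlSerializers.dll naming convention.
    public static string ExpectedSerializerAssemblyPath(Assembly assembly)
    {
        string location = assembly.Location;
        // Location can be empty (for example, for assemblies loaded from memory).
        string directory = string.IsNullOrEmpty(location)
            ? AppDomain.CurrentDomain.BaseDirectory
            : Path.GetDirectoryName(location);
        string fileName = assembly.GetName().Name + ".XmlSerializers.dll";
        return Path.Combine(directory, fileName);
    }

    public static void Main()
    {
        // Use the assembly that contains your serialized types here.
        Assembly assembly = typeof(SerializerAssemblyCheck).Assembly;
        string path = ExpectedSerializerAssemblyPath(assembly);
        if (!File.Exists(path))
        {
            Console.WriteLine("Warning: {0} not found; serializers may be generated at runtime.", path);
        }
    }
}
```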

7 Comments:

At 2:57 AM, Anonymous Anonymous said...

Hi,

I have used the technique which you have explained in your blog. The genrated code is not compiling due to the following missing statement in the code.

if (!needType) {
System.Type t = o.GetType();
if (t == typeof(BusinessEntitiesToSerialize.Supplier))
;
else {
throw CreateUnknownTypeException(o);
}

if (!needType) {
System.Type t = o.GetType();
if (t == typeof(System.Object))
;
else if (t == typeof(BusinessEntitiesToSerialize.Supplier)) {
Write1_Supplier(n, ns, (BusinessEntitiesToSerialize.Supplier)o, isNullable, true);
return;
}

Could you pls let me know the missing piece of code which I have to add in to the generated code to make it working?

Your reply is highly appreciated.

Thanks Regds,
Vasudevan

 
At 9:23 PM, Blogger Jim Nakashima said...

What version of the framework are you using and what do you mean by "the genrated [sic] code is not compiling due to the following missing statement in the code"?

I would like to help but do not have enough context as to what the problem is.

 
At 8:31 AM, Anonymous Anonymous said...

Hi,

I used XML, my classes are really big. I don't use soapformatter or binaryformatter, I use a web service(proxy) (SoapHttpClientProtocol) (Xmlserializer) to all call.

It takes 35 seconds to create the schema of my classes the first time I use the application.

I found something weird, I use [XmlSchemaProvider("CreateEntiteSchema")]Public class Eid3000_contratCollection.

When I use this attribute "XmlSchemaProvider", the framework bypasses my XmlSerializer dll created by sgen.exe.

When I remove this attribute, the framework uses it, and it takes 20 seconds instead of 35 seconds.

This class is the biggest of the application and generated by codesmith.

Any idea how to reduce the waiting time? (This waiting time appears only the first time; the second time takes about 2 seconds.)

I even tried the compression like http://msdn2.microsoft.com/en-gb/library/ms979193.aspx but no gain.

Thanks.

 
At 8:30 PM, Anonymous Anonymous said...

Very very nice workaround if you want to compile to .NET 1.x!

 
At 6:19 AM, Blogger Yos said...

Hi
Where can I find the temp code/assembly

 
At 4:59 AM, Blogger gray said...

Hi,
Have gone through an article on MSDN. Here is the snippet from

http://msdn2.microsoft.com/en-us/library/system.xml.serialization.xmlserializer(VS.85).aspx

----------------------------

To increase performance, the XML serialization infrastructure dynamically generates assemblies to serialize and deserialize specified types. The infrastructure finds and reuses those assemblies. This behavior occurs only when using the following constructors:

System.Xml.Serialization.XmlSerializer(Type)

System.Xml.Serialization.XmlSerializer(Type,String)

----------------------------


It clearly says that the runtime finds and reuses the dynamically generated assemblies. If such is the case then, it should not be a performance issue.

I didn't see any performance overhead when using XmlSerializer(Type) constructor.

 
At 10:07 AM, Anonymous jnak said...

Since they've updated to Google accounts I can't seem to log in to this blog anymore. I tried the legacy sign-in too; I guess I must've forgotten my username/pass.

Anyway, the performance issue *is* the auto generation of an assembly -- dynamically generating an assembly at runtime is costly.

That said, in VS 2005 and beyond, Microsoft solved this problem by providing a tool that generates this assembly at development time, say as a post-build step.

 
