SOLVED

PDF size increases dramatically with every submit.

Level 8

I have a PDF form designed using Adobe LiveCycle Designer ES2.

It has a submit button that submits the form to the server (IIS and ASP.NET) using this JavaScript command:

event.target.submitForm( {cURL: "http://server/ASPNETWebPage.ASPX", aPackets:["datasets","pdf"], cSubmitAs: "XDP"});

On the server, in ASP.NET, I use the following code to extract the submitted "chunk" element and convert it from Base64 to a binary PDF file:

            ' Assumes mXML is the submitted XDP loaded into an XmlDocument
            ' and mFormFileNameFolder is the full path of the output PDF file
            Using theFileStream As New System.IO.FileStream(mFormFileNameFolder, IO.FileMode.Create)
                Using theWriter As New System.IO.BinaryWriter(theFileStream)
                    ' Get the chunk element from the submitted XML
                    Using srChunk As New System.IO.StringReader(mXML.GetElementsByTagName("chunk")(0).InnerText)
                        Do
                            Dim theChunkLine As String = srChunk.ReadLine()
                            ' ReadLine returns Nothing at the end of the text
                            If theChunkLine Is Nothing Then Exit Do
                            theChunkLine = theChunkLine.Trim()
                            ' Skip blank lines (e.g. a line break right after <chunk>) instead of stopping
                            If theChunkLine.Length = 0 Then Continue Do
                            ' Each line is decoded on its own, so it must hold a multiple of 4 Base64 characters
                            theWriter.Write(Convert.FromBase64String(theChunkLine))
                        Loop
                    End Using
                End Using
            End Using

The above code works fine, and the PDF is generated successfully.
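For comparison, the same extraction step can be sketched in Node.js. This is a minimal sketch, not the code above: it assumes the whole XDP is available as a string and that the Base64 text sits in a single `<chunk>` element with no namespace prefix, and the function name `extractPdfFromXdp` is hypothetical. Because Node's Base64 decoder skips line breaks, the chunk can be decoded in one call instead of line by line, at the cost of holding both the text and the decoded PDF in memory.

```javascript
// Minimal sketch: pull the Base64 PDF out of an XDP submission and decode it.
// Assumes a single <chunk> element without a namespace prefix.
function extractPdfFromXdp(xdpText) {
  const match = /<chunk>([\s\S]*?)<\/chunk>/.exec(xdpText);
  if (match === null) {
    throw new Error("No <chunk> element found in the XDP");
  }
  // Buffer.from ignores the line breaks and spaces inside the Base64 text,
  // so the whole chunk can be decoded in one call.
  const pdf = Buffer.from(match[1], "base64");
  // Look for the PDF header near the start of the decoded bytes.
  if (pdf.subarray(0, 1024).indexOf("%PDF-") === -1) {
    throw new Error("Decoded chunk does not look like a PDF");
  }
  return pdf;
}

// Usage with a tiny stand-in PDF, wrapped onto short Base64 lines.
const fakePdf = Buffer.from("%PDF-1.7\n%%EOF\n", "latin1");
const wrapped = fakePdf.toString("base64").replace(/.{8}/g, "$&\n");
const xdp = '<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/"><pdf><document><chunk>\n' +
  wrapped + "</chunk></document></pdf></xdp:xdp>";
console.log(extractPdfFromXdp(xdp).length); // prints 15
```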

I have one problem.

With every submit, the generated PDF size increases dramatically. I reported this to Adobe Support, and they confirmed that this is by design: with every submit, the previous PDF state is saved and the new state is added. That is why I get a huge PDF file.

I was told that the only way to solve this problem is to submit the form as PDF only, save the PDF file to the file system, and then use the Adobe service/process "exportData" to extract the XML data from the PDF.

This is a really big change for me. I was hoping there is a way to identify the latest PDF state from the chunk element.

Any help will be greatly appreciated.

Tarek.

1 Accepted Solution

Correct answer by
Level 4

The heart of the problem was that large images were being placed in XFA image fields. Due to the design of PDF and incremental updates, copies of these images were being added to the file with each save. I'll write more on this later, most likely on the ADEP product blog, but for now the solution is to limit the size of the image in the field. (As background, the image was used for a 1x1 inch thumbnail of a face, which is well served by a highly compressed 72 DPI JPG of around 20-40K bytes or less. The images in the file were on the order of megabytes, which caused massive issues.)

John Brinkman did a blog post on how to check the image size and generate an error if it is too large.  You can see this on John's Formfeed blog, and it is quite elegant.
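The idea behind such a check can be sketched in plain JavaScript. This is not John's code: the function names and the 40 KB limit (taken from the 20-40K guidance above) are assumptions. An image field's value is Base64 text, so its decoded size follows from the string length; in an XFA form the string would typically come from the field's `rawValue`, but where to hook the check is left open. The two functions use older JavaScript syntax in the hope that they can be adapted to form scripting, which has not been verified here; the usage line at the end runs in Node.js.

```javascript
// Size check for a Base64-encoded image, kept to ES5 syntax.
// decodedBase64Size and checkImageSize are hypothetical names.

// Number of bytes a Base64 string decodes to (whitespace ignored).
function decodedBase64Size(b64) {
  var clean = b64.replace(/\s+/g, "");
  var padding = 0;
  if (clean.charAt(clean.length - 1) === "=") padding++;
  if (clean.charAt(clean.length - 2) === "=") padding++;
  return Math.floor(clean.length * 3 / 4) - padding;
}

// Assumed limit, based on the 20-40K figure for a 1x1 inch thumbnail.
var MAX_IMAGE_BYTES = 40 * 1024;

// Returns an error message if the image is too large, otherwise null.
function checkImageSize(b64, maxBytes) {
  var limit = maxBytes || MAX_IMAGE_BYTES;
  var size = decodedBase64Size(b64);
  if (size > limit) {
    return "The image is " + Math.round(size / 1024) + " KB; please use one of at most " +
      Math.round(limit / 1024) + " KB.";
  }
  return null;
}

console.log(checkImageSize("A".repeat(80000)));
// prints The image is 59 KB; please use one of at most 40 KB.
```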

Former Community Member

Are you submitting as XDP because you want the data and the PDF separately? If not, why not just submit the data and leave the PDF out of it? You can change the cSubmitAs parameter to XML, and then you will get the data only. Are there signatures involved in this scenario? Do you have LiveCycle Server at the back end?

Paul

Level 8

Thanks Paul,

The main idea of using Adobe PDF is to save the result as a PDF on the server with all the digital signatures. Otherwise, I would use HTML forms or ASP.NET forms.

I am now looking for a method to remove all unwanted bytes from the chunk element, and keep only the minimum.

I would appreciate it if anyone can help.

Tarek.

Former Community Member

So it is the signatures themselves that are making the PDF grow... there is nothing I can do about that.

Paul

Level 8

No, it is not the signatures that are causing the problem. Even if I do not add any signature, the size still increases.

Tarek

Former Community Member

Then that makes no sense... the file should grow slightly as more data is added, but signatures cause a copy of the PDF to be saved so you can compare it to the pre-signature version, and that is generally what causes the PDF size to grow large. Is the file Reader Extended?

Paul

Level 8

Yes, the file is Reader Extended.

According to Adobe Support, it is only because I am using this command:

event.target.submitForm( {cURL: "http://server/ASPNETWebPage.ASPX", aPackets:["datasets","pdf"], cSubmitAs: "XDP"});

This command sends the XML data, the current state of the PDF, and the last state of the PDF embedded in it (before any change). So basically, the PDF size doubles with every submit.

I was told that if I send only the PDF (without XDP), then only the last PDF state is sent. But then I have to use another service/process, "exportData", to extract the XML data from the PDF.

I have never used "exportData" from .NET. So I am now looking for a way to extract only the last state of the PDF from the submitted XDP.

Tarek.

Former Community Member

What version of Reader Extensions are you using? The last state of the PDF should not be changed if you have not signed it yet!

Paul

Level 8

I am using Adobe LiveCycle Reader Extensions Server ES2. I can send you a short screenr.com video I recorded to show you the submission process and how the size is doubling!

I can send you a private message with a link to the video.

Tarek

Former Community Member

No need... I am not doubting that it is doubling in size; I'm just trying to figure out why.

There was an issue at one time where Reader Extensions was affecting the size of the PDF but that was in earlier versions than the one that you have.

Have you submitted just the PDF and if so does the size get affected?

Paul

Level 8

Do you mean that I should try to submit the form as PDF only?

If so, I will add another button that submits the PDF to a test URL, but I don't know how it will be received on the server! Will it be received as a binary or a text stream? How do I generate the PDF file on the server?

I will give it a try.

Tarek.

Level 8

I did a quick test, and I used the following javascript command:

event.target.submitForm( {cURL: "http://server/ASPNETWebPage.ASPX", cSubmitAs: "PDF"});

and on the server, using ASP.NET, I converted the input stream to a byte array and saved the result as binary to a file stream. The result was a working PDF file, and the size was OK. I submitted several times with a few changes, and the size increased by only 10-20 bytes at most.
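The receive-and-save step described above can be sketched in Node.js instead of ASP.NET; the function names, output path and status messages are hypothetical. As the test shows, with cSubmitAs: "PDF" the request body is the PDF itself, so the server only needs to collect the bytes and write them unchanged. The header check is an extra sanity test that the original code does not perform.

```javascript
const fs = require("fs");

// Joins the raw request-body chunks and checks that they form a PDF.
function bodyToPdf(chunks) {
  const pdf = Buffer.concat(chunks);
  if (pdf.subarray(0, 1024).indexOf("%PDF-") === -1) {
    throw new Error("Request body is not a PDF");
  }
  return pdf;
}

// Hypothetical handler for an http.createServer callback: collects the body
// (the PDF that Acrobat posts when cSubmitAs is "PDF") and writes it unchanged.
function handlePdfSubmit(req, res, outPath) {
  const chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("end", () => {
    let pdf;
    try {
      pdf = bodyToPdf(chunks);
    } catch (err) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end(err.message);
      return;
    }
    fs.writeFile(outPath, pdf, (err) => {
      res.writeHead(err ? 500 : 200, { "Content-Type": "text/plain" });
      res.end(err ? "Could not save the PDF" : "Saved");
    });
  });
}
```

To wire it up, something like `http.createServer((req, res) => handlePdfSubmit(req, res, "thePDFFile.pdf")).listen(8080)` would do, with the form's cURL pointing at that server; the port and file name are placeholders.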

So, if there is no way to use this command:

event.target.submitForm( {cURL: "http://server/ASPNETWebPage.ASPX", aPackets:["datasets","pdf"], cSubmitAs: "XDP"});

and still generate a PDF of reasonable size with every submit, then I have to use {cSubmitAs:"PDF"} and look for a way to extract the XML data from the PDF using .NET.

Appreciate your help.

Tarek.

Level 4

Did you catch my comment on your recorded video, assuming that you're the same person? The issue I saw there is that you were including images in the form data, and that the image was a 1.9MB TIFF file. Each image that you include becomes part of the saved PDF. You should use an appropriately sized image.

And yes, Lee, Paul and I have all been talking about this issue.

Level 8

Hi Chuck,

Thanks for the feedback.

I was recording the video over a VPN connection, which is why there is no sound and the mouse was moving in a funny way.

Yes, I know I am using large images, but the problem is still there. If I use smaller images, the problem is also there. Even if I don't use any image, the size doubles with every submit.

I just used large images to see how far the process can go without breaking. I have received reports from various users that they are getting OutOfMemoryException. When I analysed the situation, I discovered the root cause, which is the topic of this thread. Later, I changed the method for converting the "chunk" element from a Base64 string to binary and used buffering to avoid this error, and I succeeded.
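The buffering idea can be sketched in Node.js (this is not the code from the linked ASP.NET thread, and the class name is hypothetical). The one subtlety is that Base64 decodes in groups of 4 characters, so a piece of text that ends mid-group must carry its leftover characters into the next piece. Each decoded piece can then be written straight to a file, so the full PDF never has to sit in one buffer.

```javascript
// Decodes Base64 text that arrives in pieces. Base64 maps every 4 characters
// to 3 bytes, so only whole 4-character groups are decoded; any leftover
// characters are carried over to the next piece.
class Base64StreamDecoder {
  constructor(onBytes) {
    this.onBytes = onBytes; // receives each decoded Buffer (e.g. to write to a file)
    this.pending = "";
  }

  write(text) {
    const data = this.pending + text.replace(/\s+/g, "");
    const usable = data.length - (data.length % 4);
    this.pending = data.slice(usable);
    if (usable > 0) {
      this.onBytes(Buffer.from(data.slice(0, usable), "base64"));
    }
  }

  end() {
    // Input without "=" padding can leave a short final group.
    if (this.pending.length > 0) {
      this.onBytes(Buffer.from(this.pending, "base64"));
      this.pending = "";
    }
  }
}

// Usage: feed the encoded text in 7-character pieces and collect the output.
const parts = [];
const decoder = new Base64StreamDecoder((bytes) => parts.push(bytes));
const encoded = Buffer.from("Hello, incremental Base64!").toString("base64");
for (let i = 0; i < encoded.length; i += 7) {
  decoder.write(encoded.slice(i, i + 7));
}
decoder.end();
console.log(Buffer.concat(parts).toString()); // prints Hello, incremental Base64!
```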

Now I am not getting "OutOfMemoryException", but the size continues to increase with every submit.

I am now working on a new project for Staff Appraisal, and it involves 4 users: Staff, Manager, Director, and HR. Each one will have to submit the form at least 3 times (in one calendar year), and each time they have to sign the form. I need to do something now to solve the root cause of the problem for the new project. This new project is critical and upper management are watching!

Tarek.

Former Community Member

Going back to our last test, where you submitted just the PDF and saw an insignificant increase in size: this should be no different from submitting as XDP (except that the XDP will have all of the data as well, so it should be a few K bigger). Chuck has mentioned the use of images... are you including the images in your data stream? Are you also including the template in your data stream? If you are unsure, can you write the inbound XDP for a couple of submissions to separate files and send me the results? We can have a look at them here and see where the file size is coming from. You can send the files to LiveCycle8@gmail.com
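Paul's suggestion of writing each inbound submission to its own file can be sketched as a small Node.js helper; the name, the `submission-NNN.xdp` file pattern and the directory are hypothetical. The body is saved byte for byte, so the files can later be compared in a text editor or by size.

```javascript
const fs = require("fs");
const path = require("path");

// Diagnostic helper (hypothetical): writes each raw submission, unchanged, to its
// own numbered file so that consecutive submissions can be compared.
function saveSubmission(dir, body, ext = ".xdp") {
  fs.mkdirSync(dir, { recursive: true });
  const existing = fs.readdirSync(dir).filter((name) => name.startsWith("submission-"));
  const number = String(existing.length + 1).padStart(3, "0");
  const file = path.join(dir, "submission-" + number + ext);
  fs.writeFileSync(file, body);
  return file;
}
```

Concurrent submissions could race for the same number, which is acceptable for a short diagnostic run but not for production use.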

Paul

Level 8

Thanks Paul,

When I did the test for submitting only the PDF, I forgot to test with the same large images. This is the new server-side code I used to generate the PDF:

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        ' The request body is the submitted PDF itself; save it unchanged
        Dim thePath As String = System.IO.Path.Combine(MapPath("."), "thePDFFile.pdf")
        Dim thePDFStream As New System.IO.BinaryReader(Page.Request.InputStream)
        ' ReadBytes takes an Integer, so convert the Long stream length explicitly
        Dim thePDFBytes() As Byte = thePDFStream.ReadBytes(CInt(Page.Request.InputStream.Length))
        System.IO.File.WriteAllBytes(thePath, thePDFBytes)
        ' Link to the saved file and redirect to it after 2 seconds
        Dim theURL As String = CSLAIDB.Library.GetWebRoot() & System.IO.Path.GetFileName(thePath)
        thePDFLink.NavigateUrl = System.IO.Path.GetFileName(thePath)
        Response.AppendHeader("Refresh", "2; URL=" & theURL)
    End Sub

I ran the test again as before, and the size increased by about 2MB with every submission, even when I made only a small change.

What do you mean by "send you the inbound XDP"?

Do you want me to send the resulting PDF file, or the resulting XDP file in text format?

Note: The support team has the PDF samples I generated: the original size and the one with a huge size.

Tarek.

Former Community Member

Then let's leave it with support and let them get to the root of the issue.

Paul

Level 4

I've been in touch with the support person in Edinburgh and looked at your files and your various attempts to make the size smaller. 

Simply stated, it really is working as designed, but it is difficult to appreciate that unless you go a bit deeper into the file. 

Or, to say this another way, it grows dramatically in size because you have added dramatic amounts of data.

Take your example of this form, with a 1.9MB TIF image embedded five times as an inch-square thumbnail. I picked up most of this information using publicly available tools, such as the document font list (Document Properties/Fonts), text editors, and Windjack's Canopener.

I'll give you a few metrics and comments which may help:

  1. The base PDF file size is about 1.4MB. Much of this is because of your embedded fonts, which take over 1.1MB.
  2. Your form is a Reader-extended dynamic XFA form. That means the PDF itself does not contain the real pages as PDF marking operators; they are generated each time you open the file in Reader, from the XFA form definition and your data.
  3. The image itself is 1.9MB. But remember that this image is Base64-encoded, so it takes four bytes of XML for every three bytes of image. That makes the XML data 2.6MB per image. And I'll note again that this is an incredibly large image to use for a square-inch picture.
  4. The file you've given us has the image repeated 5 times. That explains the 14MB file size (2.6*5+1=14). You can see a snapshot of the XML data and its size in the Canopener view for "big".
  5. I presume that you know that PDF files have a versioned structure, where changes to the file are appended in incremental update sections. The file you sent has two such areas: one of about 1MB and one of 13MB. You can see these if you open the file in a good text editor and search for %%EOF, which appears at the end of each incremental change. In other words, the incremental change is all the XML data, and there is only one incremental update area. See section 7.5.6 of the PDF reference manual if you'd like to know more about incremental updates.
  6. You also observed that if you open this file in Acrobat 9.1 and save it, the file shrinks from 14MB to 4MB. This is due to a feature Acrobat added that compresses parts of the XFA data stream. You can see this in the Canopener view for "small": it has exactly the same uncompressed size, but Flate compression reduces it by 10MB. So you can thank Acrobat engineering, but it won't help your form submission issue much.
  7. I'll also note that a basic check I did on your file was to export the form data (Tools/Forms/More Form Options/Manage Form Data/Export in Acrobat 10), and I saw the same size XML data stream for both files.
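The Base64 arithmetic in items 3 and 4 can be checked with a small Node.js function. Whether the encoder in this workflow wraps lines, and at what width, is an assumption (76 characters with CRLF breaks by default below): without wrapping, 1.9MB (taken as 1,900,000 bytes) encodes to about 2.53MB; with 76-character CRLF lines it comes to about 2.6MB. Five copies then account for roughly 13MB of XML data.

```javascript
// Estimates the size of binary data once Base64-encoded into XML.
// Base64 turns every 3 bytes (or part of 3) into 4 characters; lineLength and
// eolBytes model line wrapping and are assumptions (76 characters and CRLF here).
function base64EncodedSize(rawBytes, lineLength = 76, eolBytes = 2) {
  const chars = 4 * Math.ceil(rawBytes / 3);
  const lines = Math.ceil(chars / lineLength);
  // One line break between consecutive lines.
  return chars + Math.max(lines - 1, 0) * eolBytes;
}

// Taking 1.9MB as 1,900,000 bytes:
console.log(base64EncodedSize(1900000, Infinity, 0)); // no wrapping: prints 2533336
console.log(base64EncodedSize(1900000)); // 76-char lines, CRLF: prints 2600002
```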

You're basically running up against basic laws of space conservation: put a number of big things in a flexible sack, and the sack grows. I'd suggest that you give strong guidance to folks on the size of the image that they use.

PDF can be a bit mysterious if you can't see what's happening.  That's why tools like Canopener are key to shedding daylight on the dark insides.

Finally, I will note that your file size WILL increase when you add digital signatures. The size comes when you sign, not when you add the field. Simply stated, Acrobat (or Reader) makes a PDF marking set of the pages each time the form is signed; that's the record part of it, and it is a new level of incremental change. So you can expect the file to grow as signatures are added. Again, this is even more reason to use appropriately sized images.

Level 8

Thanks a lot, C. Myers,

Your explanation helped me understand what is happening.

I have been following the same method for the past 4 years, and I was hit by this problem (OutOfMemoryException) only when some users started using images larger than 500KB. Then I decided to report this problem.

I was able to rewrite the code to convert from Base64 to binary using buffering:

http://forums.asp.net/t/1662571.aspx/1?URGENT+Exception+OutOfMemoryException+thrown+when+when+conver...+

So far, I am not getting OutOfMemoryExceptions, but the PDF size continues to grow with every submit. However, if all the images are smaller than 50KB, the increase is not significant.

Please allow me to ask this question:

Is there a way to change the above code so that I can extract only the last version of the submitted PDF from the data stream's "chunk" element?

Sooner or later, someone will notice that such PDF sizes are not logical. Even when the PDF has no images, I have noticed in the past that some PDF sizes (for the Staff Profile Data Collection Form) are something like 15MB!!! I was not able to figure out why, but now I understand. I think the user must have submitted the form for saving many times.

Now things are OK, but I will post back if this problem comes back.

Tarek.

Level 4

I'd like to see one of the files that has grown so much. Or, better yet, I'd like to see a sequence of files: the base, after a submit with one image, after the next submit that adds another image, etc., and we can diagnose from there; also a step where just some of the form data, not pictures, is changed. But I'd also suggest that you get the 10-day trial of PDF Canopener from Windjack to inspect the files yourself for the base data, AND that you count the number of %%EOF markers so that you can see the number of incremental updates (sounds like a good use of GREP). But let's get some scientific numbers on the problem.

Best would be to send the files to support on the existing case number.

And I'd like to take this to a point of conclusion and then even do a brief blog on this topic.  I can only imagine that other people have these same issues.

As for "getting just the last chunk," it really depends on the software you are running on the server. "Simple" PDF utilities will always just make an incremental update. Richer software, like Form Data Integration in LiveCycle processes, lets you export the data and then import it into a clean form. There are also tools in LiveCycle, like Assembler, that will consolidate the incremental updates.

But the overall question is "what software are you using to merge the XML data into the form?" Is it from Adobe or somewhere else? Your forum posts don't shed any light on this.

Level 1

I will try to prepare the files you requested, and I will send them all to support.

I am not using any tool to merge the XML Data with PDF. I have developed a .NET Program to merge XML with PDF using XDP format. The result is rendered to the client browser as XDP MIME Type using VB.NET "Response.write()"

When the PDF is rendered on the client and the user clicks "Submit" or save, the PDF is sent to an ASPX page on the server. The "chunk" element is extracted from "Page.InputStream", converted from Base64 to a binary array, and then saved on the server as a PDF file. All this is done by a .NET program under IIS on Windows Server 2003.
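For reference, the general shape of such an XDP, with the PDF embedded as Base64 in the same `chunk` element that the server code reads back, can be sketched in Node.js. The element layout and namespace URIs below follow the usual XDP structure from general knowledge rather than from this thread, and `buildXdp` is a hypothetical name; treat them as assumptions to check against the XDP specification.

```javascript
// Sketch of an XDP document that carries both the form data and the PDF,
// the PDF being Base64 text inside <chunk>. Layout and namespaces are assumed
// from the usual XDP structure; buildXdp is a hypothetical name.
function buildXdp(dataXml, pdfBytes) {
  // Wrap the Base64 text at 76 characters per line (no trailing break).
  const chunk = pdfBytes.toString("base64").replace(/.{76}(?!$)/g, "$&\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">',
    '<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">',
    "<xfa:data>" + dataXml + "</xfa:data>",
    "</xfa:datasets>",
    '<pdf xmlns="http://ns.adobe.com/xdp/pdf/"><document><chunk>',
    chunk,
    "</chunk></document></pdf>",
    "</xdp:xdp>",
  ].join("\n");
}
```

The result would then be written to the response with the XDP content type, typically `application/vnd.adobe.xdp+xml`.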

I will try to use the LiveCycle Assembler service to consolidate the incremental updates, but I have never done that from ASP.NET.

Tarek.