OpenXML tag search

The problem with finding tags is that text is not always stored in the underlying XML in the form in which it appears in Word. For example, in your sample XML the <!TAG1!> tag is split across multiple runs like this: <w:r> <w:rPr> <w:lang w:val="en-GB"/> </w:rPr> <w:t>&lt;!TAG1</w:t> </w:r> <w:proofErr w:type="gramEnd"/> <w:r> <w:rPr> <w:lang …
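One way to cope with split runs is to join the text of all w:t elements in a paragraph before searching. The following is a minimal sketch using only the Python standard library. The second run in SAMPLE is an assumption (the original excerpt is cut off at that point), and the helper names paragraph_text and find_tags are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: the second run is a guess at how <!TAG1!> is completed,
# because the excerpt above is truncated.
SAMPLE = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:rPr><w:lang w:val="en-GB"/></w:rPr><w:t>&lt;!TAG1</w:t></w:r>
  <w:proofErr w:type="gramEnd"/>
  <w:r><w:rPr><w:lang w:val="en-GB"/></w:rPr><w:t>!&gt;</w:t></w:r>
</w:p>
""".strip()


def paragraph_text(paragraph):
    """Concatenate the text of every w:t element in document order."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))


def find_tags(paragraph):
    """Return the names of all <!NAME!> tags found in the joined text."""
    return re.findall(r"<!(\w+)!>", paragraph_text(paragraph))


paragraph = ET.fromstring(SAMPLE)
print(find_tags(paragraph))  # prints ['TAG1']
```

Searching the joined text finds the tag even though no single run contains it. Replacing the tag in place would additionally require mapping the match back to the runs it spans, which this sketch does not attempt.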

How to extract text from Word files (.doc, .docx, .xlsx, .pptx) in PHP

Here is a simple class which does the job for .doc/.docx, PHP docx reader: Convert MS Word Docx files to text. class DocxConversion{ private $filename; public function __construct($filePath) { $this->filename = $filePath; } private function read_doc() { $fileHandle = fopen($this->filename, "r"); $line = @fread($fileHandle, filesize($this->filename)); $lines = explode(chr(0x0D),$line); $outtext = ""; foreach($lines as …
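For the .docx case, the underlying idea is that the file is a ZIP archive whose main text lives in word/document.xml. Below is a rough Python sketch of that idea, not a port of the PHP class above. The function names docx_to_text and make_demo_docx are hypothetical, and the demo archive contains only word/document.xml, so it is not a complete document that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the text of a .docx file, one line per w:p paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Join all runs so text split across w:r elements stays together.
        lines.append("".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")))
    return "\n".join(lines)


def make_demo_docx():
    """Build a tiny in-memory archive holding only word/document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


print(docx_to_text(make_demo_docx()))
```

This handles only the main document body. Headers, footers, footnotes and the binary .doc format are stored elsewhere and are out of scope for the sketch.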

VBA: Using WithEvents on UserForms

You can create an event-sink class that will contain the event-handling code for all of your controls of a particular type. For example, create a class called TextBoxEventHandler as follows: Private WithEvents m_oTextBox As TextBox Public Property Set TextBox(ByVal oTextBox As TextBox) Set m_oTextBox = oTextBox End Property Private Sub m_oTextBox_Change() ' Do something …

.doc to pdf using python

A simple example using comtypes, converting a single file, with input and output filenames given as command-line arguments: import sys import os import comtypes.client wdFormatPDF = 17 in_file = os.path.abspath(sys.argv[1]) out_file = os.path.abspath(sys.argv[2]) word = comtypes.client.CreateObject('Word.Application') doc = word.Documents.Open(in_file) doc.SaveAs(out_file, FileFormat=wdFormatPDF) doc.Close() word.Quit() You could also use pywin32, which would be the same except for: import …

Is there a Java API that can create rich Word documents? [closed]

In 2007 my project successfully used OpenOffice.org’s Universal Network Objects (UNO) interface to programmatically generate MS-Word compatible documents (*.doc), as well as corresponding PDF documents, from a Java Web application (a Struts/JSP framework). OpenOffice UNO also lets you build MS-Office-compatible charts, spreadsheets, presentations, etc. We were able to dynamically build sophisticated Word documents, including charts …

Create Word Document using PHP in Linux [closed]

Real Word documents: If you need to produce "real" Word documents, you need a Windows-based web server and COM automation. I highly recommend Joel’s article on this subject. Fake HTTP headers for tricking Word into opening raw HTML: A rather common (but unreliable) alternative is: header("Content-type: application/vnd.ms-word"); header("Content-Disposition: attachment; filename=document_name.doc"); echo "<html>"; echo "<meta http-equiv=\"Content-Type\" …

Reading/Writing a MS Word file in PHP

Reading binary Word documents would involve creating a parser according to the published file format specifications for the DOC format. I do not think this is a feasible solution. You could use the Microsoft Office XML formats for reading and writing Word files; these are compatible with the 2003 and 2007 versions of Word. For …