I was developing a program which allows me to take a bunch of HTML files and put it into a single document. These files need to be put in as rendered in IE and the HTML code tags should not be visible. Still all the links, etc. should be along in the document so generated. Manual way to do it would be to open the HTML files one by one in Internet Explorer or any other browser, press CTRL+A, then CTRL+C, open the word document, go to End of File (EOF) and then press CTRL+V. One of the questions is the use case. Why would I need to do something like this?
We are all aware of e-books in CHM (Complied HTML format). I have loads of them. And sometimes, the TOC (Table of Contents) gets corrupt due to one or other reason and I cannot navigate around. The other thing is that if I want to print, I cannot print all of the topics in the ebook. I need to select one by one and I don’t know if it is only on my PC, that I cannot get them to print properly. Most of the time, the output is blank pages. Also you loose the flexibility of converting to another favorable format such as PDF, etc. Hence I generally de-compile the HTML using Microsoft HTML workshop and then make a doc out of it. If you have 5 HTML pages to deal with, it is okay to do so manually. And I found it very boring with 5 anyways.
Read more about it, get the binaries and source code in the forum link: http://www.naresh.se/phpBB/viewtopic.php?f=17&t=11. The same link will also talk more on the implementation details about pipes, events, COM interfaces, Wrappers, etc. once I dissect the code and start with explanations interested people to follow. It also uses the Full Duplex anonymous pipes as explained in the post: http://www.naresh.se/2009/09/16/anonymous-pipes-in-windows/
Ahhh… and one more thing before I close the post. You might be wondering as to why I am writing code for doing the shit, when I can open word document, click on Insert File and select all the HTML files and it just works as my code does. Well, it does, but try doing it with 100 files and Word would simply not be able to do it. At least it didn’t work on my machine and I don’t know if my machines Word installation is fucked up or if Word seriously cannot do it.
Cheers & have fun computing…