舊版報紙資訊網: Initial Thoughts and Technical Review

I have spent the past few days looking at one of the important early postwar newspapers in Taiwan, 臺灣新生報, using the microfilm collection in the Taiwan resource center on the 6th floor of the National Taiwan Library (國立中央圖書館臺灣分館). Yesterday, I happened to catch a glimpse of someone viewing old copies of another important postwar Taiwan paper, 民報, in an online database. I offer some comments on that database below, after a brief opening rant.

I’m not a big fan of microfilm newspapers. The advantages of this medium over providing access to physical copies or bound printed copies are obvious. They include: 1) preservation, 2) space conservation, 3) the ability to zoom, and 4) the ability to print zoomed-in articles from microfilm machines onto various sizes of paper.

However, from the point of view of the historian, the disadvantages soon become apparent. Consider the bound printed copies of, for example, 中央日報, 申報, or 大公報 that I can find in various libraries: even though these sometimes shrink the newspaper so much that the characters in the articles are barely legible, it is possible to browse through these collections quite fast. It is simply faster to turn the pages of a book and scan a newspaper page for interesting articles than it is to operate the knob of a microfilm machine and zoom in and out on interesting-looking pieces.

More importantly, the higher contrast of black text on white paper makes looking at bound volumes far more pleasant than using a microfilm machine. With the exception of some digital microfilm readers that artificially boost the contrast, the vast majority of microfilm readers I have used in Taiwan, Korea, Japan, and the United States are extremely hard on the eyes. If you have to sit at the reader for 4 to 10 hours, with some short breaks, even for a few days in a row, the impact on your eyes is noticeable. The sickly yellow background of the microfilm reader (or white text on black, the default for many newspapers I viewed at Korea’s national library, which is hardly better), desperately trying to push light to the viewer through its lenses, always seems to fall short of what my eyes want: real paper, or even the greater contrast of a computer screen!

Digital databases of newspapers are always welcome. In addition to the power of database searching, they offer some of the benefits of both bound paper and microfilm collections, but they also have some more serious defects. It is not all one glorious march towards progress. In my experience, digital newspaper collections (as well as many library OPAC databases and other online resources) are often designed by people who appear to greatly underestimate the importance of browsing. It isn’t just about what is in a given article or even on a given page; historians often want to know what can be found near that article, page, or issue. Sometimes we aren’t looking for a single article about a single topic, but are trying to get a feel for the kinds of things being written in the days and weeks surrounding a particular historical event. It is all part of the task of surveying the discursive environment of a time or place.

Now, having made these comments, let me turn to the database I discovered completely by chance yesterday: 舊版報紙資訊網.

Contents

From what I can tell, 舊版報紙資訊網 is a digital newspaper archive project put together by 國立台中圖書館 and first launched online in the spring of 2003. Its own introduction reports that it currently includes, among others, these newspapers: 民報 (held from October 11, 1945 [民國34] to February 1947 [民國36]; ceased publication), 公論報 (collected from March 1952 [民國41]; ceased publication), 民聲報 (collected from January 1952 [民國41]; ceased publication), and 香港工商日報 (collected from May 1950 [民國39]; ceased publication). Another page reports that it contains 民報, 正氣中華, 外交部周報, 工人報, 台東新報, 大華新聞, 更生報, 攝影新聞, and others.

This is a truly impressive list and a great service to historians of early postwar Taiwan, but even more exciting is the claim that they want to add 中央日報, 中華日報, and 台灣新生報 to their collection. The last two are particularly important for historical projects involving early postwar Taiwan.

Unless I’m missing something (please let me know if I am!), there is no way to browse by newspaper or date on the website, so I could not easily confirm which issues of which newspapers in the above lists are available. As far as I can tell, the only way to access the contents of the database is through its search page. Using various search terms, I got hits from the useful 民報 for most searches covering 1945 up to the paper’s suppression in 1947 following the 2.28 Incident, and from 外交部通報週報, 青年新報, and 攝影新聞 for searches in the 1950s. However, among these four papers, I could only view images of 民報; no images appeared for the others.

This inability to browse, whether through the available date range of a newspaper or within a single newspaper, is truly crippling. It need not be this way, since the database clearly can index articles by newspaper name and by date. I very much hope that they will add this capability in the future, and also add an “update” page (ideally with an RSS feed) listing recent additions to the database, so that we can follow its growth and return when an important addition has been made (e.g. 中華日報 and 台灣新生報).
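To show how little is needed once articles are indexed by newspaper and date, here is a minimal browse sketch. It uses SQLite from Python’s standard library as a stand-in; the `articles` table, its columns, and the helper names are hypothetical, since the site’s real schema is unknown. The three queries correspond to the three levels of browsing a historian needs: pick a paper, pick an issue, then see everything on that issue’s pages.

```python
import sqlite3

# Hypothetical schema: the real database's tables and columns are unknown.
SCHEMA = """
CREATE TABLE articles (
    paper      TEXT NOT NULL,     -- newspaper name, e.g. 民報
    issue_date TEXT NOT NULL,     -- ISO date string, YYYY-MM-DD
    page       INTEGER NOT NULL,
    title      TEXT NOT NULL
);
CREATE INDEX idx_paper_date ON articles (paper, issue_date);
"""


def list_papers(conn):
    """Return every newspaper name in the collection (top of a browse page)."""
    rows = conn.execute("SELECT DISTINCT paper FROM articles ORDER BY paper")
    return [r[0] for r in rows]


def list_issues(conn, paper, start=None, end=None):
    """Return the issue dates held for one paper, optionally within [start, end]."""
    sql = "SELECT DISTINCT issue_date FROM articles WHERE paper = ?"
    params = [paper]
    if start is not None:
        sql += " AND issue_date >= ?"
        params.append(start)
    if end is not None:
        sql += " AND issue_date <= ?"
        params.append(end)
    sql += " ORDER BY issue_date"
    return [r[0] for r in conn.execute(sql, params)]


def list_pages(conn, paper, issue_date):
    """Return (page, title) pairs for one issue, in page order."""
    return conn.execute(
        "SELECT page, title FROM articles WHERE paper = ? AND issue_date = ? "
        "ORDER BY page, title",
        (paper, issue_date),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows for illustration only; these are not real holdings.
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        [
            ("民報", "1945-10-11", 1, "article A"),
            ("民報", "1945-10-11", 2, "article B"),
            ("民報", "1945-10-12", 1, "article C"),
            ("公論報", "1952-03-01", 1, "article D"),
        ],
    )
    print(list_papers(conn))
    print(list_issues(conn, "民報"))
    print(list_pages(conn, "民報", "1945-10-11"))
```

The composite index on `(paper, issue_date)` is what makes these browse queries cheap even for a large collection.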

Technical Review

Let us look more closely at some of the technical aspects of this online database. As already mentioned, the website does not seem to have images for some of the newspapers that appear in search results: I could not get images for any newspaper other than 民報.

The search mechanism, when it functions, does let you search by a large number of metadata categories, such as article title or author. It was not clear to me how much data was actually indexed from each article. You can limit searches by year or exact date (in Western or 民國 years), but not by newspaper.
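Since the search form accepts both Western and 民國 years, it helps to recall the arithmetic: 民國 1 is 1912, so the Western year is the 民國 year plus 1911 (民國34 is 1945, 民國41 is 1952). The sketch below is only an illustration of that conversion; how the site itself parses year input is unknown, and the rule in `normalize_year` that bare numbers below 1000 are 民國 years is an assumption of mine.

```python
MINGUO_OFFSET = 1911  # 民國 1 corresponds to 1912


def minguo_to_western(year: int) -> int:
    """Convert a 民國 year to a Western (Gregorian) year."""
    if year < 1:
        raise ValueError("民國 years start at 1 (1912)")
    return year + MINGUO_OFFSET


def western_to_minguo(year: int) -> int:
    """Convert a Western year (1912 or later) to a 民國 year."""
    if year < 1912:
        raise ValueError("the 民國 calendar begins in 1912")
    return year - MINGUO_OFFSET


def normalize_year(text: str) -> int:
    """Accept '1945', '民國34', or '34' and return a Western year.

    Assumption (hypothetical input rule): a bare number below 1000
    is read as a 民國 year.
    """
    s = text.strip()
    if s.startswith("民國"):
        return minguo_to_western(int(s[2:]))
    year = int(s)
    return minguo_to_western(year) if year < 1000 else year


if __name__ == "__main__":
    for raw in ("民國34", "36", "1952"):
        print(raw, "->", normalize_year(raw))
```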

Unlike many of the world’s worst online newspaper databases (Japanese and Korean historical databases are, in my opinion, the most infamous in this regard), this collection fortunately does not appear to require any special plugins, ActiveX components, etc. The creators of this collection bravely decided that standard web images are more than sufficient to show images on the screen. The small preview images are JPEG files that are displayed shrunk down and can be seen at a larger size if you download them, while the full-size image is a TIFF file, which can be downloaded and viewed on any operating system. I find the choice of JPEG for the preview images unusual, since black-and-white or greyscale images are often much smaller and clearer when saved as GIF files. They are, however, clear.

When a search is performed, a list of matching articles is returned. When an article is clicked, the screen splits into two: an image of the newspaper page containing the article appears on the left, and on the right a list of the article titles on that page, with the article searched for marked. While the list of other articles is a useful addition, this design has a flaw. Splitting the screen in half wastes valuable horizontal space, forcing the developers to use a much smaller page image than necessary. If the article information were placed above or below the image, perhaps broken into two columns to minimize its height, the newspaper image could be twice the size on the screen, and thus far more legible. As it is, the small image makes only some of the headlines on the page legible, and it is difficult to tell how long the article one is looking for is.

If popups are allowed, a popup window intended to show a much larger image of the page often opens in the background; the same popup also opens if you click the smaller page image. However, I have never been able to get the larger image to appear inside these popups myself, though I have seen it appear on a library computer, so the problem may have something to do with the version of Internet Explorer I used (versions 7 and 8 on Windows XP).

Now let me list some of the many design and coding flaws of this database that need to be fixed if this collection is to reach its potential. I hope this will serve not only as a critique of this newspaper collection but also as a warning to others creating similar collections about the kinds of mistakes to avoid:

Only Works on Internet Explorer Using Windows – This problem follows the long tradition of digital archives, especially in East Asia, created by programmers who apparently don’t know that there are standards-compliant browsers other than Internet Explorer (Firefox, Safari, Opera, etc.) and operating systems other than Windows (OS X, Linux). If you try to open the search page using Safari on a Mac, you get this completely unusable page:

[Screenshot: the search page as rendered by Safari on a Mac]

If you open the search page using Firefox, you will find that, due to poor JavaScript programming, the drop-down menu items are missing, so you cannot choose the type of search you wish to do:

[Screenshot: the search page in Firefox, with the search-type drop-down menu empty]

If you try to search using either Firefox or Safari you will get this message:

[Screenshot: the message returned when searching from Firefox or Safari]

Web standards are important. Ignoring them may have been more accepted back in 2003, when this site was developed, but it is no longer acceptable for a web page to fail to work with those standards, which are what give a site maximum durability into the future. The problems above are due to some simple JavaScript errors, especially references to page objects written in a way that only Internet Explorer understands.

Text Encoding Is Missing – Another common problem in East Asian historical databases is that programmers assume that every computer viewing their website has Internet Explorer configured with their own favorite encoding as the default, in this case Big5 for traditional Chinese. But what if you are viewing the web page on a mainland Chinese computer, or a Korean, Japanese, or American computer? The result is that some pages will appear like this:

[Screenshot: a page whose buttons and messages are garbled because no encoding is declared]

As the source below shows, the developer did not include a meta tag declaring the Big5 encoding on this page, so the buttons and messages are unreadable until you right-click and change the encoding manually. This is despite the fact that they did include this tag correctly on other pages:

[Screenshot: the page source, with no Big5 meta tag]
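The fix is a single line in each page’s head (or, equivalently, a `Content-Type: text/html; charset=big5` HTTP header from the server). As a rough sketch of how a developer might audit and patch pages, the Python below checks for any `<meta ... charset=...>` declaration and, if none is present, inserts a Big5 one right after `<head>`. The function names are hypothetical, and regular expressions are a convenience here rather than a robust HTML parser. The last lines show why the declaration matters: the same Big5 bytes read with another encoding (GBK, the mainland default) turn into different characters.

```python
import re

BIG5_META = '<meta http-equiv="Content-Type" content="text/html; charset=big5">'


def has_charset_declaration(html: str) -> bool:
    """True if the page already declares its encoding in a meta tag."""
    return re.search(r"<meta[^>]+charset\s*=", html, re.IGNORECASE) is not None


def ensure_big5_meta(html: str) -> str:
    """Insert a Big5 declaration right after <head> if the page lacks one."""
    if has_charset_declaration(html):
        return html
    # <head(\s[^>]*)?> matches <head> or <head ...> but not <header>.
    return re.sub(
        r"(<head(\s[^>]*)?>)",
        lambda m: m.group(1) + BIG5_META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    page = "<html><head><title>舊版報紙資訊網</title></head><body></body></html>"
    print(ensure_big5_meta(page))

    raw = "舊版報紙資訊網".encode("big5")           # bytes as the server sends them
    print(raw.decode("big5"))                      # read correctly as Big5
    print(raw.decode("gbk", errors="replace"))     # misread as GBK: garbled
```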

Poor Overall Design – Some aspects of this site are simply poorly designed. These include serious problems with text and tables overlapping background images, and a background pattern that is not wide enough for the higher resolutions of today’s monitors, which makes some text almost illegible.

Here are three examples:

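[Screenshot 1: text and tables overlapping a background image]

[Screenshot 2: text and tables overlapping a background image]

[Screenshot 3: text and tables overlapping a background image]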

There are also several completely mashed buttons. I’m not sure how they produced this effect, but the result is that three of the collection’s special features are largely invisible to visitors, who cannot read the button titles:

[Screenshot: the three mashed buttons]

Here is an original version of one button:

[Screenshot: the original version of one of the buttons]

Incidentally, it puzzles me that they offer special searches for things like advertisements and riddles (riddles being one of the other buttons) but no way to browse by individual newspaper!

Frequent Server Errors – In many cases, after searching for an entry and going through several pages of hits, I was eventually given an error page instead of the next page:

[Screenshot: the error 500 page returned instead of the next page of hits]

An error 500 is an internal server error, meaning the script generating the page failed; here it suggests that the script could not handle the parameters passed in the request for the next page.

Incorrect File Type – This collection doesn’t require special plug-ins, which is very handy. It lets you download a full TIFF image of individual newspaper pages (PDF downloads of whole issues would be nice, but are perhaps asking too much).

However, the downloaded file does not have a .tif, .tiff, or any other recognizable file extension. It is a TIFF file, but it will not be recognized as one by Windows, nor by OS X or Linux if the file is transferred there. People with more computing experience will know that they can simply change the file extension to .tif to get it recognized as an image, but less technically inclined historians may not know how to open the downloaded “.xtf” images. These TIFF images are certainly not XTF files.

[Screenshot: the file-save dialog offering the page image as an “.xtf” file]

The developers should ensure that the downloaded file is given the correct file extension.
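On the server side, the standard remedy would be to send `Content-Type: image/tiff` together with a `Content-Disposition: attachment; filename="....tif"` header. Until that happens, users can repair their own downloads. A TIFF file can be recognized by its first four bytes, `II*\0` (little-endian) or `MM\0*` (big-endian), regardless of its name, so a small script can rename mislabeled files safely. The sketch below is a hypothetical helper, not something the site provides; it only renames files whose contents really start with a classic TIFF signature and leaves everything else alone.

```python
import tempfile
from pathlib import Path

# Classic TIFF signatures: byte-order mark followed by the magic number 42.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def is_tiff(path: Path) -> bool:
    """Check the file's first four bytes against the TIFF signatures."""
    with Path(path).open("rb") as f:
        return f.read(4) in TIFF_SIGNATURES


def fix_tiff_extension(path) -> Path:
    """Rename a TIFF saved under the wrong extension to .tif; return the new path."""
    path = Path(path)
    if path.suffix.lower() in (".tif", ".tiff") or not is_tiff(path):
        return path  # already correct, or not actually a TIFF
    target = path.with_suffix(".tif")
    path.rename(target)
    return target


def fix_folder(folder) -> list:
    """Apply fix_tiff_extension to every .xtf file in a download folder."""
    return [fix_tiff_extension(p) for p in Path(folder).glob("*.xtf")]


if __name__ == "__main__":
    # Demo in a throwaway directory with a fake (header-only) TIFF file.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "page001.xtf").write_bytes(b"II*\x00" + b"\x00" * 8)
        print([p.name for p in fix_folder(d)])
```

Checking the signature rather than trusting the name means the script will not mislabel some other file that happens to end in .xtf.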

100 Hit Limit Too Small – I understand why databases cap the number of hits: they need to manage the load on the database server, which processes the searches of all visitors. However, 100 does seem too small, and I would suggest 250 or even 500 as a more reasonable number. I also think it is lazy programming to omit a link at the end of the first hundred hits that lets the user query the database for the next 100. As every web database programmer knows, this is easily done with a simple modification to the query (in MySQL, for example, a LIMIT clause with an offset). That way, someone who really wants to go through many hundreds of hits can do so without requesting all the results in a single query.
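Here is roughly what such a “next 100 hits” link needs behind it. This sketch uses Python’s built-in SQLite as a stand-in for whatever database the site actually runs; the `articles` table and `search_page` function are hypothetical. The `LIMIT ? OFFSET ?` clause works the same way in MySQL. Asking for one row more than the page size is a common trick for deciding whether to show the “next” link without running a separate count query.

```python
import sqlite3

PAGE_SIZE = 100  # the site's current cap; paging makes the cap harmless


def search_page(conn, term, page=0, page_size=PAGE_SIZE):
    """Return one page of hits and whether another page exists.

    Fetches page_size + 1 rows: if the extra row comes back,
    there is at least one more page to link to.
    """
    rows = conn.execute(
        "SELECT id, title FROM articles WHERE title LIKE ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (f"%{term}%", page_size + 1, page * page_size),
    ).fetchall()
    has_next = len(rows) > page_size
    return rows[:page_size], has_next


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    # Placeholder rows standing in for search hits.
    conn.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [(f"hit {i}",) for i in range(250)],
    )
    page = 0
    while True:
        hits, more = search_page(conn, "hit", page)
        print(f"page {page}: {len(hits)} hits, next link: {more}")
        if not more:
            break
        page += 1
```

Each request still touches at most 101 rows, so the server load stays bounded while users can page through every hit.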

Conclusion

I was delighted to find this online digital archive, 舊版報紙資訊網. The downloaded images themselves, in standard TIFF format, are very clear, often more so than the bound or microfilm versions I have been looking at so far.

If it eventually contains the half dozen or so most important early postwar Taiwanese newspapers, it will give powerful search capabilities to scholars without access to the hard copies or microfilm collections, which are often found only in Taiwan. If good, solid browsing features are added, with fast and easy viewing of the high-contrast images, it will easily outdo microfilm in usefulness. And if someday OCR software indexed some of the text on the pages (at least the clearer text in the headlines, where OCR stands a chance of interpreting the content accurately), or if article titles were embedded as metadata in the files and the database offered PDF downloads of these pages, we could search our own downloaded collections of relevant articles when we are away from the internet and the online database.

I hope that this online collection receives sufficient support and funding to continue its digitization and indexing efforts, and that it will also invest some time and effort in improving the code and design of the site’s existing infrastructure.

3 Comments

  1. Did you notice any information on whether they plan to put 人民導報 in this archive or how to access this newspaper, which was important in the period leading up to 228?

  2. Last week I spoke to someone who knows a bit more about the project, and they were very pessimistic about anything being added to it. They said it may be in its “finished state” as it is now (that is, with all the technical issues and without some of the newspapers that it claims to have). I have also been seeing a lot of references to articles in 人民導報 relevant to my interests, and I wish the archive included the paper.
