Ralf Herrmann Posted April 29, 2015 There are already some helpful comments in the other topic. I am starting another one to discuss the database structure only. I just set up a test installation to play around with the categories and fields, but there are still many questions. Which data would you want to collect and be able to use as filters in an online archive of type specimens? There are obvious and easy things like “year of release” and “title”. But would you need “type of publication”, e.g. book/booklet/leporello/sheet/poster? What about specimen size? Scan resolution? Page count? What else? Some things are useful but tricky, for example “country”. In 500 years of printing, borders have shifted and names of countries have changed. Filters need strict, fixed choices. When the country field is just a text field, it’s easier to put something in but difficult to get good filter results. The same problem exists for foundry names. That’s one of the most important fields for such a database. But again, names have changed often, and that tears apart entries which belong together and which users might expect to appear together. Maybe @Lars has some recommendations about this? He is experienced in dealing with font data from a developer perspective.
Lars Posted April 29, 2015 I must admit I haven't completely followed "the other topic", but I'll catch up on it. Actually there are existing schemes for bibliographic records like this (MARC 21 for example), so it would be a good idea to support this in some way. For countries I would simply stick to current names: even though there are a few incunabula type specimens, these should be recorded as Italy, Germany, Netherlands and so on. Dealing with old country names is like dealing with old and new style dates/calendars: these should be recorded somewhere, but do not need to be searchable. Scans and photographs need a scale reference, so photographs should at least show a scale for each page; otherwise it's not possible to estimate type sizes later on, for example. Foundries ... hmmm ... it's important to have a good reference here, incl. successor, predecessor and acquisitions. This makes it much easier to track down the origins of a face. Dates are also important of course. I can make a suggestion for a scheme and post it here for discussion next week.
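A minimal sketch of how the country and foundry advice above might look as records, assuming Python dataclasses. All names here (`Foundry`, `Place`, `latest_successor`) and the placeholder foundries are hypothetical; nothing below comes from MARC 21 or any existing scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """Country as used for filtering, plus the name found in the source."""
    country: str                # current country name, the only filterable value
    historical_name: str = ""   # name at the time of printing, recorded but not filtered


@dataclass(frozen=True)
class Foundry:
    """One foundry name; the successor link ties renamed or acquired foundries together."""
    name: str
    successor: Optional[str] = None  # name of the foundry that continued or acquired this one


def latest_successor(name: str, foundries: dict[str, Foundry]) -> str:
    """Follow successor links until a foundry without a successor is reached."""
    visited: set[str] = set()
    while name in foundries and foundries[name].successor and name not in visited:
        visited.add(name)  # guards against accidental cycles in the data
        name = foundries[name].successor
    return name


# Placeholder data, purely illustrative.
foundries = {
    "Foundry A": Foundry("Foundry A", successor="Foundry B"),
    "Foundry B": Foundry("Foundry B", successor="Foundry C"),
    "Foundry C": Foundry("Foundry C"),
}
group_key = latest_successor("Foundry A", foundries)
```

Grouping entries by `latest_successor` would keep specimens together even when the foundry name on the title page changed, while the original name stays on the record.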
Riccardo Sartori Posted April 29, 2015 For the examples you give, there would need to be some sort of correlation in place between keywords. It could be hierarchical (so, for example, looking for “Germany” one would have results also for “Prussia”, or looking for “ATF” one would have results also for “Binny & Ronaldson”, but not the other way around). Or it could be arranged in “clusters”, and any given keyword would give, along with the results, a list of related terms useful for expanding or narrowing the search. In any case, I think there should be a simple free-form search (for the casual user) and a very advanced one with filters, operators and so forth (for the dedicated scholar). Thinking about how I would peruse such a resource, the one thing I really would like to reference (and thus have in search results too) is the showing of a single specific font, regardless of year, page, foundry, country, and so on.
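The hierarchical variant could be sketched roughly like this, assuming a simple lookup table from a broader keyword to the narrower or earlier ones it should also match. The table holds only the two examples from the post; `NARROWER`, `expand` and `search` are made-up names.

```python
# Broader term -> narrower or earlier terms it should also match.
# Only the two examples from the post; a real table would be curated by hand.
NARROWER: dict[str, list[str]] = {
    "Germany": ["Prussia"],
    "ATF": ["Binny & Ronaldson"],
}


def expand(term: str, narrower: dict[str, list[str]] = NARROWER) -> set[str]:
    """Return the term plus everything below it in the hierarchy."""
    result: set[str] = set()
    pending = [term]
    while pending:
        current = pending.pop()
        if current not in result:
            result.add(current)
            pending.extend(narrower.get(current, []))
    return result


def search(term: str, records: list[dict]) -> list[dict]:
    """Keep records whose 'keyword' value falls under the searched term."""
    wanted = expand(term)
    return [r for r in records if r["keyword"] in wanted]


records = [
    {"title": "Specimen 1", "keyword": "Prussia"},
    {"title": "Specimen 2", "keyword": "Germany"},
]
broad = search("Germany", records)   # matches both records
narrow = search("Prussia", records)  # matches only the Prussia record
```

Because the table is only read from broader to narrower, the expansion is one-directional, which gives the "but not the other way around" behaviour.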
Ralf Herrmann Posted May 7, 2015 On 29 April 2015 at 9:46 PM, Lars said: “Actually there are existing schemes for bibliographic records like this (MARC 21 for example), so it would be a good idea to support this in some way.” Oh my, that looks complicated.
Lars Posted May 12, 2015 Just a few more thoughts: I would probably focus on publications rather than on faces from publications. So one would record the publication and just a basic list of face names. Recording all face details (incl. item/order number per size) from a publication seems far too much effort from my point of view. Also, for earlier faces the name is basically useless; they require a visual representation. Nevertheless, adding information on face level (at least a face name, publisher/foundry and date) should be possible afterwards.
My suggestion would be to focus on recording the publication itself with as much information as available, plus the images. So for a publication you would need the title, publisher, date and number of pages at minimum. Date can be extended to hold information about the date type (common or local, so you could record local date conventions where the year does not start on 1st January, for example; obviously only necessary for the really old publications) and a reference to the date (direct or indirect mention). I would record the imprint and index as simple text fields, allowing the users to paste in the OCR version of the imprint and the index to faces. For images one should be able to assign a type (title, index, imprint, type showing, other) and some technical details (photo or scan, original parameters used for the scan/photo and so on). Publishers require at least a name, a role (printer, private press, foundry, distributor, other) and a location. Other metadata can be added later on, as long as the publishers are unique.
The model/scheme should work with two types of "data input":
A) users scan the complete publication
B) users scan or photograph just the most basic pages (title, imprint, index)
I presume that B will be used more often, and the title, imprint and index images should be more or less mandatory.
Fine-tuning search and other optional metadata input can be done afterwards in a second step, imho. Taxonomy/tags and similar are taken for granted. After reading all this, it sounds like an archive.org/openlibrary.org kind of thing would do fine: focusing on the images, supporting OCR to make all content searchable, and the option to add various metadata and tags.
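As a rough sketch, the publication-centred model described above could be written down like this, assuming Python dataclasses. The class and field names are invented for illustration; only the field lists (minimum publication data, image types, publisher roles, the three required page images) are taken from the post.

```python
from dataclasses import dataclass, field
from enum import Enum


class ImageType(Enum):
    TITLE = "title"
    INDEX = "index"
    IMPRINT = "imprint"
    TYPE_SHOWING = "type showing"
    OTHER = "other"


class PublisherRole(Enum):
    PRINTER = "printer"
    PRIVATE_PRESS = "private press"
    FOUNDRY = "foundry"
    DISTRIBUTOR = "distributor"
    OTHER = "other"


@dataclass
class Publisher:
    name: str
    role: PublisherRole
    location: str


@dataclass
class PageImage:
    filename: str
    type: ImageType
    capture: str = "scan"  # "scan" or "photo"; further capture parameters omitted here


@dataclass
class Publication:
    title: str
    publisher: Publisher
    date: str            # kept as text; date type and reference could be extra fields
    page_count: int
    imprint_text: str = ""   # pasted OCR text of the imprint
    index_text: str = ""     # pasted OCR text of the index of faces
    face_names: list[str] = field(default_factory=list)
    images: list[PageImage] = field(default_factory=list)

    def missing_required_images(self) -> set[ImageType]:
        """Input type B: title, imprint and index images are treated as mandatory."""
        required = {ImageType.TITLE, ImageType.IMPRINT, ImageType.INDEX}
        return required - {img.type for img in self.images}


# Placeholder record, purely illustrative.
pub = Publication(
    title="Example specimen",
    publisher=Publisher("Example Foundry", PublisherRole.FOUNDRY, "Example City"),
    date="1900",
    page_count=48,
    images=[PageImage("title.jpg", ImageType.TITLE, capture="photo")],
)
still_needed = pub.missing_required_images()
```

Input type A (a complete scan) would just mean more `PageImage` entries of type `TYPE_SHOWING`; the same check covers both cases.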
Ralf Herrmann Posted May 12, 2015 Thanks Lars, it looks like you understand the concept almost exactly as I envision it. I agree with almost everything you said. By the way: Do you have recommendations about hosting? I am very happy with my German go-to host, which I have used for over 10 years, but I already noticed that I can’t use it in this case: uploading and processing such huge files hits all the limits of the shared hosting packages, and a full server package is too expensive to start off such a non-commercial project.
Lars Posted May 12, 2015 Ralf, I would suggest sticking with the usual suspects when it comes to hosting. Due to the nature of this project the circle of users will be relatively small, so a simple shared hosting setup would do. Having a virtual, managed or even root server would be nice for adding OCR functionality server-side, and having a separate CDN provider for static content (images/scans) would be nice, too, but not really necessary. For most users, images scaled to a resulting file size of c. 1 MB should work fine, and with a regular shared hosting this should work for up to like 50,000 images? Even though I like the libre idea, I would probably add a "pro" account to serve original images to paying users via a separate CDN. I'm using Hetzner for everything from simple shared hosting to virtual, root or managed servers and I still like them.
Ralf Herrmann Posted May 12, 2015 Lars said: “For most users, images scaled to a resulting file size of c. 1 MB should work fine, and with a regular shared hosting this should work for up to like 50,000 images?” If people make a 600 dpi scan of a full page or even a poster and then upload that straight to the server, where a thumbnail for the default screen view is created, it needs a lot of RAM and computing time. It doesn’t work with my current host. It fails for images above roughly 3000 pixels in width. I could ask users to upload low-res and hi-res images separately and not do any server-side manipulation, but that would not be very user-friendly or professional. Lars said: “Even though I like the libre idea, I would probably add a "pro" account to serve original images to paying users via a separate CDN.” Yes, that’s currently my favourite model as well. It could even be a simple member group functionality, where you don’t “buy” individual files with complicated licensing, but instead you just make one payment and get full access to all hi-res files for a year or so.
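To get a feeling for why server-side thumbnailing of such uploads is heavy, here is a back-of-the-envelope estimate. It assumes an A4 page (210 × 297 mm), 3 bytes per pixel (uncompressed RGB), and an image library that decodes the whole image into memory before resizing; the actual behaviour of any given host or library may differ, and `decoded_size` is just an illustrative helper.

```python
MM_PER_INCH = 25.4


def decoded_size(width_mm: float, height_mm: float, dpi: int,
                 bytes_per_pixel: int = 3) -> tuple[int, int, int]:
    """Return (width in px, height in px, uncompressed size in bytes) of a scan."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


for dpi in (200, 600):
    w, h, size = decoded_size(210, 297, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {size / 1e6:.0f} MB uncompressed")
```

Under these assumptions an A4 page at 600 dpi comes to 4961 × 7016 pixels and roughly 104 MB in memory, against about 12 MB at 200 dpi. Both are wider than the ~3000 pixel limit mentioned above only in the 600 dpi case.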
Lars Posted May 13, 2015 600 dpi? Wow. How about dealing with only like 200 dpi, and allowing people to request higher-resolution scans of specific pages directly from the owner of the book if they really need to view the details of a 6p face? 600 dpi (and higher) sounds like a scholar/research level thing to me, whereas 200 dpi sounds like a good average quality to view specimens online and zoom into them to show an "o.k." level of detail.
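For a rough idea of what the two resolutions mean for small type, the body height in pixels can be estimated as size × dpi / points per inch. This sketch reads "6p" as 6 point and assumes 72 points per inch; historical point systems differ slightly, so treat the numbers as approximate. `body_height_px` is a made-up helper name.

```python
def body_height_px(size_pt: float, dpi: int, points_per_inch: float = 72.0) -> float:
    """Approximate height in pixels of the type body at a given scan resolution."""
    return size_pt * dpi / points_per_inch


for dpi in (200, 600):
    print(f"6 pt at {dpi} dpi: about {body_height_px(6, dpi):.1f} px")
```

With these assumptions a 6 pt body is about 17 pixels tall at 200 dpi and 50 pixels at 600 dpi, so the lower resolution shows the overall shapes but little of the fine detail.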
Ralf Herrmann Posted May 13, 2015 Well, it is an ambitious project with long-term value. So I figured, why not ask for the best scan quality possible. When I started the German Font-Wiki 2 or 3 years ago, I defined the record image as 75 by 75 pixels. Now, with a Retina screen, I already regret that. But okay, maybe the type specimen archive should start out more modestly than I originally thought, and the image and server specs can be raised over time when the project becomes financially more stable. Edited May 13, 2015 by Ralf Herrmann
Ralf Herrmann Posted May 15, 2015 More rapid prototyping. Started adding filters: