Design, query, and evaluate information retrieval systems.
Information retrieval and librarianship have changed drastically in the last few decades. Computers have changed the way information is sought and retrieved, and library information retrieval systems (IRSs) have moved from card catalogs to digital databases. Traditional card catalog systems relied on individually printed bibliographic records stored on-site in filing cabinets or similar storage. Digital databases also contain individual bibliographic records, but the records are stored on servers in digital format and are accessible over the internet, so retrieving the desired information no longer requires physical on-site access. Because of this ease of access and the speed with which information can be retrieved, the card catalog became obsolete in favor of digital databases and virtual IRSs. Cataloging and classification standards remain largely the same, as covered in detail in Competency G, but electronic records can convey more of the aboutness of a record, that is, the subject of the document and what the information is about, rather than just a description of the document (Weedman, 2018, p. 141). This gives a user a better sense of what the record is about and whether it addresses their needs. Regardless of the type of IRS used, the basic function remains the same: to enable a user to perform a search and retrieve information that is useful and relevant. To do this effectively, three elements must be considered: design, query, and evaluation.
Design
An information retrieval system has two parts: a database and a search engine. The database is the collection of records and is where the information is stored. The search engine is the mechanism that allows a user to search for the specific record they are looking for. Together, they allow virtual access and nearly instant retrieval of information. For users to receive the most relevant information, the software engineers need to design the database with utility and user experience in mind. Complexity should be kept to a minimum, while still providing a comprehensive search function that can draw on a controlled vocabulary of search terms. This can be accomplished by creating an index of the documents. According to Ceri et al. (2013), "the index is a logical view where documents in a collection are represented through a set of index terms or keywords" (section 2.1.1). The keywords can be specific terms extracted from the document or assigned from topic or subject headings. Whatever keywords are assigned to a specific document, they must achieve two goals: exhaustiveness, ensuring enough terms are assigned to the document, and specificity, ensuring those terms are semantic and non-generic to avoid inflating the index (Ceri et al., 2013, section 2.2). Only terms that add value to the controlled vocabulary and refer to the aboutness of the document should be included. As information professionals, our focus may be more on the user end of the system, but it is still beneficial to know the basic design of an IRS and its database.
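The index described above can be illustrated with a minimal sketch in Python. The records, their index terms, and the function name build_index are hypothetical examples invented for illustration, not part of any system discussed here; the sketch only shows the general idea of representing documents through a set of index terms.

```python
from collections import defaultdict

# Hypothetical records: record id -> index terms assigned from a small
# controlled vocabulary (all values are made up for illustration).
RECORDS = {
    1: ["virtual reality", "teen services", "public libraries"],
    2: ["virtual reality", "academic libraries"],
    3: ["teen services", "makerspaces"],
}


def build_index(records):
    """Map each index term to the set of record ids it describes."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            # Normalize case so "Virtual Reality" and "virtual reality" match.
            index[term.lower()].add(record_id)
    return dict(index)


index = build_index(RECORDS)
print(sorted(index["virtual reality"]))  # [1, 2]
print(sorted(index["teen services"]))    # [1, 3]
```

In this sketch, a search for a term becomes a simple lookup in the index rather than a scan of every record, which is one reason an index supports fast retrieval.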
Query
A query is simply a question or, in the virtual sense, a search. It expresses what we are looking for in an information retrieval system, be it an academic journal or database, an online public access catalog (OPAC), or even a search engine like Google. Whatever medium we decide to use, we must know the strategy for receiving the best results. This is more than just being a good "Googler". As information professionals, we need to demonstrate refined search methods to get better results. As Tucker (2018) highlights, "this means having an informed view and exploring both the database content and the options for searching that content" (p. 319). This involves using the advanced search functions of whatever engine we are using. For example, throughout the MLIS program, I conducted research for papers and literature reviews. Using the MLK library, I would initiate a search and look through the results, then refine my search using Boolean operators and the advanced search features. I would continue to refine by limiting to peer-reviewed articles and a date range to further narrow the results.
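The effect of Boolean operators on a result set can be sketched in Python using set operations. This is only an illustration under stated assumptions: the INDEX contents and the helper names search_and, search_or, and search_not are hypothetical, and real library databases implement Boolean searching in their own ways.

```python
# Hypothetical index: search term -> set of matching record ids.
INDEX = {
    "virtual reality": {1, 2, 4},
    "teens": {1, 3, 4},
    "academic libraries": {2, 4},
}


def search_and(index, *terms):
    """AND narrows: keep only records that match every term."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """OR broadens: keep records that match at least one term."""
    return set().union(*(index.get(term, set()) for term in terms))


def search_not(results, index, term):
    """NOT excludes: drop records that match the unwanted term."""
    return results - index.get(term, set())


both = search_and(INDEX, "virtual reality", "teens")
either = search_or(INDEX, "virtual reality", "teens")
narrowed = search_not(both, INDEX, "academic libraries")

print(sorted(both))      # [1, 4]
print(sorted(either))    # [1, 2, 3, 4]
print(sorted(narrowed))  # [1]
```

The sketch shows why AND and NOT are useful for refining a large result list, while OR is useful when an initial search returns too little.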
Evaluate
An evaluation is an investigation into how well our stated goals and expectations have been met. When measuring the performance of an information retrieval system, relevance must be taken into consideration. Relevance is measured in two main ways: precision and recall. Ceri et al. (2013) state that precision "is the fraction of retrieved documents that are relevant to a query and provides a measure of the 'soundness' of the system" (section 1.1). Essentially, precision measures the proportion of relevant results among the retrieved results, or how useful the results are. Recall "is the fraction of relevant documents that are effectively retrieved and thus provides a measure of the 'completeness' of the system" (Ceri et al., 2013, section 1.1). Recall is the proportion of all relevant documents in the database that were actually retrieved, or how complete the results are. Imagine a search for documents about virtual reality use in teen spaces in libraries. You receive 40 results, 30 of which are relevant. Precision would be 30/40, or 3/4. If there were an additional 30 relevant documents available that were not retrieved, the recall would be 30/60, or 1/2. Why are these numbers important? These statistics show how effective the database is at retrieving the information we want. They allow us to determine whether the design needs to be modified or the algorithms edited to provide more relevant information.
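The arithmetic in the virtual reality example can be checked with a short Python sketch. The function and variable names are my own for illustration; the numbers are the ones used in the example above.

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of the retrieved documents that are relevant."""
    if total_retrieved == 0:
        return 0.0  # nothing was retrieved, so no result was useful
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Fraction of all relevant documents that were retrieved."""
    if total_relevant == 0:
        return 0.0  # no relevant documents exist in the collection
    return relevant_retrieved / total_relevant


retrieved = 40           # results returned by the search
relevant_retrieved = 30  # returned results that are relevant
relevant_missed = 30     # relevant documents that were not returned

p = precision(relevant_retrieved, retrieved)
r = recall(relevant_retrieved, relevant_retrieved + relevant_missed)

print(p)  # 0.75
print(r)  # 0.5
```

Note that the denominator for recall counts every relevant document in the collection, retrieved or not, which is why recall is harder to measure in practice than precision.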
Supporting Evidence
I first learned the technicalities of databases and database design in INFO 202: Information Retrieval System Design. A group assignment tasked us with designing a database in WebDataPro for a collection of non-traditional items, for which we chose coffee tables. We also had to write a paper detailing our statement of purpose and our rules and guidelines for each database record. Both team members had an equal part in coming up with the fields, vocabulary, rules, and guidelines, and we were each responsible for creating three records in our database (I created the two Mathis and Everett brand records). For the paper, I was responsible for the data structure plan and the overall editing and formatting. This assignment demonstrates my understanding of how to create a database and the corresponding search function, and my ability to identify the target audience and simplify the search parameters.
The database link in the paper requires the following credentials; username: alyce.scott@sjsu.edu password: moxie
To demonstrate competency in database design and information retrieval, I am using my final assignment from INFO 246: Python Programming. For this assignment, which can be accessed here, I created a program that runs a simple game called "The Wizard". The game asks the user to enter a simulated monetary donation before it answers questions posed by the user. The program retrieves information from a database I created on the accompanying server. It calls external functions for data validation and then retrieves answers from a list, depending on what the user enters on the screen. This demonstrates my ability to use technology, as I coded this program in its entirety, and my ability to develop databases in unconventional ways, beyond a typical database for books or other materials. Coding with Python can lead to programming databases for information retrieval that draw on longer lists, more information, and more complex queries.
Conclusion
Information retrieval systems, web-based systems in particular, have become ingrained in society. Google search has given us access to a plethora of information that was unheard of only twenty-five years ago. In libraries, gone are the days of the card catalog; modern libraries have fully incorporated online systems for information retrieval, whether it's a source or a reference database. As an iSchool student, I have found information retrieval systems to be an integral part of my studies. Learning how to properly use different databases and search more effectively has given me a deeper understanding of how databases work. We may not all design databases in the future, but understanding the design, query, and evaluation principles of information retrieval systems can help us be better information professionals. Knowing the basics of database design can help us better assist patrons by demonstrating how to design search strategies that use particular search terms, integrate Boolean operators, and take advantage of advanced search features.
References
Ceri, S., Bozzon, A., Brambilla, M., Della Valle, E., Fraternali, P., & Quarteroni, S. (2013). Web information retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39314-3
Tucker, V. M. (2018). Lecture 6: Search. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (5th ed., pp. 317-326). AcademicPub.
Weedman, J. (2018). Lecture 3 supplement: Subject metadata. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (5th ed., pp. 317-326). AcademicPub.