Blog Posts and Project Reports and Contributor Guides, Oh My! (Week 13)

This week I’ve been looking back at previous assignments, blog posts, and Google Doc notes to sum up the evolution of our project for the project report. My part of the contributor guide, which covers how to process oral histories and other incoming files, depends somewhat on how Dr. Sies’ classes have handled these processes in the past; we will find out more after Suzanne and Jenny meet with her on Thursday morning. In the meantime I’ve been continuing to gather best practices from the list of collection resource guides Jesse provided to us. I also cleaned up our Airtable People controlled vocabulary, which entailed splitting records that listed multiple people into separate person entries, alphabetizing them, and deleting duplicates. We decided to move any specific people from the Subjects controlled vocabulary to the People vocabulary, leaving only family names in Subjects. We ended up with 120 Subject entries, 257 Places/Institutions, and 613 People for 746 items!
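
For anyone repeating the People cleanup on a larger export, here is a minimal sketch of the three steps (split, de-duplicate, alphabetize) in Python. I did this by hand in Airtable, so this is not the process we actually used. The function name, the semicolon separator, and the sample names are all assumptions made up for illustration, not data from our inventory.

```python
def clean_people_vocabulary(records, separator=";"):
    """Split multi-person records, drop duplicates, and alphabetize.

    Assumes (hypothetically) that each record is a string and that
    multiple people in one record are separated by `separator`.
    """
    seen = set()
    people = []
    for record in records:
        for name in record.split(separator):
            name = " ".join(name.split())  # collapse stray whitespace
            key = name.casefold()          # compare names case-insensitively
            if name and key not in seen:
                seen.add(key)
                people.append(name)
    # Alphabetize without regard to capitalization
    return sorted(people, key=str.casefold)


# Placeholder names only; not real entries from the vocabulary
sample = ["Doe, Jane; Roe, Richard", "roe, richard", "Adams, Ann"]
cleaned = clean_people_vocabulary(sample)
print(cleaned)
```

Note that this only catches duplicates that differ in capitalization or spacing. Variant spellings of the same person's name would still need a human eye.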

Week 12: Where Metadata Ends and Contributor Guide Begins

It’s been satisfying this week to see our combined controlled vocabulary go from 346 Places/Institutions down to 261 with the elimination of duplicates. I’ve also been looking through the community heritage resources Jesse sent us, and they seem like they’ll be very helpful for my part of the Contributor Guide describing how to input and process future files. Although we haven’t heard back from MITH yet (as Maya pointed out), we did just hear back pretty quickly from Dr. Sies, the American Studies professor whose classes have worked on the Lakeland Community Heritage Project in the past. We’re working on coordinating a meeting with Dr. Sies in the near future to get the missing context we discussed last week. Hopefully next week we’ll be reporting back on a successful meeting!

Speaking of Missing JPEGs…

I have now catalogued the metadata in Airtable for my quarter (186) of the 746 Omeka records, in time for our self-imposed deadline of tomorrow. Along the way I have found that in several records, for some reason particularly ones describing JPEGs, there are file descriptions, but the file itself is not attached. Hopefully MITH, or someone at least, still has these files on one of the 3 hard drives associated with this project, since they don’t seem to be on the Omeka servers.

These records in particular, unsurprisingly, have especially poor metadata, which is both sparse and rife with spelling and grammatical errors and inconsistent capitalization and punctuation. These were lower-numbered records, so they seem to be among the earliest entered (but among the last I catalogued, because the Omeka pages work backward from the most recently entered), and an Omeka learning curve could help explain some of these errors. We’re awaiting access to the admin side of the Omeka site so we can better understand the file and record input process.

Fortunately, most of the records are indeed attached to files, with varying levels of metadata that are interesting to analyze. I can definitely see the value of our controlled vocabularies in organizing and standardizing the metadata, and our inventory in helping MITH assess how much overlap there is between their hard drives and the Omeka site and fill in any gaps. But I wish we had more time for this part of the project! Now we prepare to move on to writing the contributor guide, which I don’t think I will enjoy as much as I’ve enjoyed cataloging and learning about the people and history of Lakeland!

…And More Metadata! (Week 10)

Like my fellow group members, I am spending both this week and next week entering my quarter of the LCHP Omeka site’s metadata into our Airtable inventory spreadsheet. At this point I have entered 70 records, with 115 more to be done by April 12 (some of them tonight and tomorrow).

So far I’ve been mostly focusing on inventorying what is already there, and only adding what I can add quickly, but I plan to go back and add more at the end. I ended up leaving the “Document Creation Location” column in my tab for now, because once I got to the Oral Histories it seemed useful to note where they were recorded.

For further along in the project plan, Jesse said that he has some examples of contributor guides he can show us, which should be very helpful.

Now back to metadata!

Reflections and Approaches (Week 6)

The main lesson I took away from our group’s first meeting with MITH and our LCHP clients a week ago was, as Suzanne noted, the challenge of finding resources (both time and money) for small community archives like LCHP. Our meeting was very detailed and helpful, but much of it was spent trying to coordinate future times that worked for everyone for further meetings and grant application deadlines. A couple of technology issues also sometimes made it difficult for us all to hear one another over the conference call (we and the MITH folks were all together in person, but our LCHP clients joined the meeting via conference call and screen sharing). The fact that so many stakeholders from different organizations, including the former mayor of College Park, are willing to take time out of their busy schedules to help with this project is impressive. This project has been going on for years, through different UMD classes and groups of volunteers, and there are so many details and moving parts that it can be hard for us to catch up! But we did get some ideas for possible approaches from LCHP’s summary of their survey of informal “focus groups” of Lakeland-area residents, both young and older, who are the prime audience for the LCHP digital archives. According to LCHP, the young people said that in terms of the design of the digital archive, they’d prefer mostly pictures or diagrams and oral histories on the main page to draw them in, and then some text but not too much. One request that was consistent across both groups was to organize the digital archive documents by geographic location and possibly also by family or community relationships.

So LCHP wants to work with us to build a prototype using these organizing principles, with a preference for photos and other items that are easily shareable over social media. Before putting additional materials onto the public website, they want to sort them into two separate groups: those that already have geographic locations associated with them, and those that don’t, so they know which they can put up and which they need more information for. Sorting materials by age and subject can also help them decide which community members they can go to for this information. Their ideal goal is to eventually have an interactive app that provides a walking tour of Lakeland with geolocated photos and documents associated with certain locations. Suzanne’s idea of using HistoryPin would be perfect for this, but we haven’t brought this up with the client because we don’t know how we would integrate it with their existing digital archive, and we feel we have to manage their expectations for the limited time we have. We will probably start with helping them organize their existing Dublin Core metadata and adding some more metadata to their Airtable spreadsheets, and then go from there. I think all of the capacity assessment tools that we read about for this week have some aspects that could be helpful for our project, especially the Digital Preservation Management Tutorial “Survey of Institutional Readiness,” because it states that it “is intended to help you take stock of the requisite components of a digital preservation program and to help you begin or proceed with your digital preservation planning” (page 1, emphasis mine), which seems realistic for the small organization with which we’re working.
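
The two-group sort LCHP described (items that already have a geographic location versus items that still need one) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the "Location" field name, the function name, and the sample items are hypothetical and do not reflect the actual Omeka or Airtable column names.

```python
def split_by_location(items, field="Location"):
    """Partition items into those with and without a location value.

    `items` is assumed to be a list of dicts, one per record;
    `field` is a hypothetical column name.
    """
    located, unlocated = [], []
    for item in items:
        value = (item.get(field) or "").strip()
        if value:
            located.append(item)    # ready to go on the public site
        else:
            unlocated.append(item)  # needs more information first
    return located, unlocated


# Placeholder records for illustration only
sample_items = [
    {"Title": "Photo A", "Location": "Example Street"},
    {"Title": "Photo B", "Location": ""},
    {"Title": "Letter C"},
]
located, unlocated = split_by_location(sample_items)
print(len(located), "with a location;", len(unlocated), "without")
```

Treating blank and missing values the same way seemed safest to me, since sparse records often leave a field empty rather than omitting it.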

Pragmatism and Information Science: Are They Mutually Exclusive?

Like Andy, I don’t think Yakel’s information science approach and Dallas’s “pragmatic” view are as different as Dallas thinks they are. While Yakel’s article does make information professionals essential to her definition of digital curation, she also recognizes the gap between them and data creators. She goes on to cite the e-Science Curation report, which encourages the involvement of both records creators and information professionals in digital curation, making it more like the “contact zone” that Dallas advocates.

Once my group has our meeting with our client tomorrow, we hope to have a better idea of which of these approaches will work best for all stakeholders in our project. I have a feeling it will be some combination of the two, using Yakel’s definition as an ideal goal to strive towards, but in practice perhaps leaning slightly more toward Dallas’s real-world “in the wild” approach.

I’m also looking forward to rereading Yakel’s article in the context of my Wikipedia work on the Data Curation page. I’ve been trying to think of ways to differentiate data curation from digital curation while still showing how they overlap. I like how Yakel brings the different aspects of data curation and preservation together under the umbrella concept of digital curation. Of all the reports Yakel cites, the National Science Foundation reports are the only ones that reference data curation more than digital curation. This explains why the data curation Wikipedia page is specific to scientific research data, while the Digital Curation page is presented as more broadly applicable to other digital assets.

Week 4: Data Curation

I have been assigned the Data Curation article, which has been rated as Start-Class on the quality scale (a fair rating in my opinion), and Mid-importance on the importance scale (I think it could gain importance if it were of a higher quality). There are several potentially confusing aspects to this article.

For example, it does not include a link to the Digital Curation page anywhere (this is a content gap), although the Digital Curation page does link to it. While Data Curation and Digital Curation are not interchangeable, they are related, so I do think they should link to each other. Andy, since you’re working on the Digital Curation article, I’m curious what your thoughts are on this?

The Data Curation page is less library-specific, and its opening definition much broader, than the Digital Curation page. While the Data Curation page is listed under the Information Science category, it is also listed as within the scope of the WikiProject Computational Biology. Unlike the Digital Curation page, the Data Curation page is mainly about data in non-library contexts, but does go on to cite a definition from the University of Illinois’ Graduate School of Library and Information Science.

The sentence “The exact curation process undertaken within any organization depends on . . . how much noise the data contains . . .” is not clear about what it means by “noise”. Does it mean superfluous data, data that can be discarded? More precision would be helpful here.

There are also some positives about this article, such as the number of links to other Wikipedia articles. The link to the Data page is especially important because, to learn about Data Curation, it is essential to first understand the definition of data itself. However, the broad definition of curation does not link to the Wikipedia page on Curation (another content gap), though the Curation page does link to both the Data Curation and Digital Curation pages.

I fixed some of the grammar in the Data Curation article, but I’m still reading through all of the related linked pages, continuing to identify content gaps or overlap, and making notes of references to add to cover these gaps.

Blog Post 1

Learning WikiCode goes well with this course and with the other course I am taking in which we’re learning HTML!

I definitely see a major drop-off in quality and depth of coverage from the more fleshed out, sourced, and developed articles like Digital Preservation, to the more sparse Community archives article. At first I was looking at it on an iPad, but now that I look at it on a larger desktop computer screen, its lack of depth really stands out among the other articles. I like the ideas on the Talk page about what’s missing and how to improve this article; it just seems like nobody has implemented them yet, so this could be a good place from which to start.

Overall I’m very impressed with Wikipedia, from the suggestions on how to make articles better, to the many suggested sources, to the relative civility on most of the Talk pages. A lot of the Talk pages are debates about semantics, but this is important for an encyclopedia.

As a former professional editor, I did find some grammatical issues, such as where the Data curation page reads: “Data curation is typically user initiated and maintains metadata rather than the database itself.” I think it should read: “The user, rather than the database itself, typically initiates data curation and maintains metadata.”

I’m also still learning what qualifies as a subjective value judgment on Wikipedia. For example, on the Digital Preservation page, the line “It is a difficult and critical process because the remaining selected records will shape researchers’ understanding of that body of records, or fonds” seems true, but I probably would have left out the first part and just written “The remaining selected records will shape researchers’ understanding of that body of records, or fonds.”

Interestingly, as I write this post, the blog is underlining curation as a word it does not recognize!