Writing about the Project

I have spent this week catching up on the group paper and starting the outlines for my sections. I will be working on the Overview, the Inventory Analysis, and the Policy Recommendations for CreativeWorks. The Overview will include the obvious background information, along with a description of our initial findings and proposal. I think it’s particularly interesting to note that our initial impression centered on the need for a digital museum/gallery experience for all of the great content, but ultimately we’ve ended up identifying ways to organize that content so that future uses can be implemented later. This definitely isn’t a fault in the collaboration, but rather an example of how projects can shift quickly and how it’s beneficial to remain flexible and communicative to achieve the best results.

The Inventory Analysis section will likely include some details from my previous post. I am also hoping to include the steps to repeat both the inventory export and the inventory analysis. However, some of the data is still a bit unclear (e.g., should we count the “Folders” file type?), and the analysis process has been mostly manual so far, so I need to confirm a good, repeatable method. There are several different ways to interpret the results, but ultimately this information makes clear that the CW team is likely to continue producing more and more content with each passing semester, and it helps make the case for a more formal storage and backup solution.
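For the “repeatable export” part, one option would be a small script that walks the drive and writes one row per file. This is only a sketch of the idea, not the method we actually used; the root path, column names, and the choice of modified-time are all my assumptions here.

```python
import csv
import os
import time

def export_inventory(root, out_csv):
    """Walk `root` and write one CSV row per file: path, extension,
    size in bytes, and the year of last modification."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "extension", "size_bytes", "modified_year"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                ext = os.path.splitext(name)[1].lstrip(".").lower()
                stat = os.stat(full)
                year = time.localtime(stat.st_mtime).tm_year
                writer.writerow([full, ext, stat.st_size, year])
```

Something like this would let CW re-run the export each semester without re-learning a manual process.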

Finally, the Policy Recommendations will include details for CW staff, teachers, and students, with the approach that things like file naming, folder organization, and backup processes are everyone’s job and part of professional behavior. Pulling all of this together in a way that is approachable for each of these audiences will be the challenge here. Having outlined these sections, I feel ready to prepare my initial drafts for the team to review.

Analysis and Next Steps

For CreativeWorks, I continued working on the digital inventory of all the files collected on the hard drive. As I mentioned before, I had done much of the analysis using Excel prior to exploring OpenRefine, and the remainder of the work I was trying required arithmetic best accomplished via Excel, so I haven’t had a chance to work with OpenRefine much more. However, I am still interested in seeing if we can document an ongoing inventory filter with that tool that CW can use to track inventory moving forward.

The analysis I’ve conducted still needs some refinement. The total number of files from each year does not add up to the total number of rows in the inventory document, so I am trying to identify the source of the difference. (The inventory states we have 42,301 rows, but the total from my table indicates 41,566, so the difference is 735 files.) I also want to validate my math for the years we have and try to determine some patterns so we can help CW set expectations for the types and sizes of files they may generate moving forward. Each year is very different from the last, but 2017 and 2018 definitely show huge leaps in both number and size of files. This may be due to previous file loss (i.e., they generated similar files/sizes previously but the files are now gone) or to changes in the program (i.e., maybe they are using different software or focusing on different project types that mean more files/larger file sizes). Additional cleanup is needed to see if we need to account for the file types “Data” and “Folders” since those seem less useful and are perhaps redundant.
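One way to chase down a discrepancy like that is to count files per year while giving rows with no usable year their own bucket, so the buckets are guaranteed to sum to the row total. A sketch assuming the inventory is loaded into a pandas DataFrame with a “year” column (the column name is a placeholder):

```python
import pandas as pd

def year_counts(inventory: pd.DataFrame) -> pd.Series:
    """Count files per year, with missing years in an 'unknown' bucket,
    so the buckets always add up to the total number of rows."""
    years = inventory["year"].fillna("unknown")
    counts = years.value_counts()
    # If this ever fails, the per-year table is dropping rows somewhere.
    assert counts.sum() == len(inventory), "buckets must account for every row"
    return counts
```

In my case, checking the size of the “unknown” bucket would tell me whether the missing 735 rows are files with bad or blank dates rather than a math error.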

My next steps will be to continue inventory data validation and then add to our team document regarding process and policy recommendations. We have already culled some helpful “one-sheet” information regarding file naming conventions, but I plan to spend more time with suggested curriculum additions/enhancements that can start to make this kind of “digital hygiene” part of the normal routine for students. It seems to fit nicely with professional training such as job hunting and resume writing, so we think this could be a useful way to engage the students with the goals for file maintenance and organization.

Data analysis and standardization

I’ve been spending this week working with the Excel inventory Lauren generated during our previous site visit. The initial analysis was pretty simple, though it was basically a manual review of the Excel doc based on the easily accessible information. However, we’d like to drill down into the information further in a few ways:

1) We want to present the inventory by year, so CreativeWorks can get some concrete data about the amount of digital content they generate.
2) We want to identify more of the file types, since the initial inventory does not provide all the data we need in each column.
3) It would be helpful to standardize the file sizes, since the inventory combines bytes, KB, GB, MB, etc., into a single column.
4) There is an odd scenario where Folders seem to be counted towards the total MB, but we suspect that may mean Folders and their contents are contributing duplicative info to the total size.

Working with Excel and some internet research, I created a MB column that should account for the file size standardization using these kinds of formulas.
– bytes to MB: # / 1048576
– GB to MB: # * 1024
– KB to MB: # / 1024
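In a script, the same conversions could be centralized in one lookup so every row is normalized the same way. A sketch, assuming the inventory stores size as a number plus a separate unit label (binary units: 1 MB = 1024 KB = 1,048,576 bytes):

```python
# Factor to multiply each unit by to get MB.
TO_MB = {
    "bytes": 1 / 1048576,  # bytes to MB
    "KB": 1 / 1024,        # KB to MB
    "MB": 1.0,
    "GB": 1024.0,          # GB to MB
}

def size_in_mb(value: float, unit: str) -> float:
    """Normalize a (value, unit) pair to megabytes."""
    return value * TO_MB[unit]
```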

Next, I tried to use the Text-To-Columns feature to isolate the file names that used extensions to help identify some of the unknown file types. There is also a helpful CODEC column in our inventory that lets me isolate information even further.
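Outside of Excel, the same extension split could be done with the standard library; a sketch (the CODEC cross-check is omitted, and “unknown” is my placeholder label for extensionless files):

```python
import os

def extension_of(filename: str) -> str:
    """Return the lowercase file extension, or 'unknown' if there isn't one."""
    ext = os.path.splitext(filename)[1]
    return ext.lstrip(".").lower() if ext else "unknown"
```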

I have been more stymied trying to isolate the information by year. There is the expected difference between file creation and file modified dates, further complicated by the fact that some of these files have metadata that sets them in the 1960s… I need to massage this more to see if I can get better results.
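One hedge against those 1960s timestamps would be to take, for each file, the earliest date that falls within a plausible range and flag anything with no plausible date at all. A sketch; the cutoff years are pure assumptions on my part:

```python
from datetime import date
from typing import Optional

PLAUSIBLE_START = 2005  # assumption: nothing in the program predates this
PLAUSIBLE_END = date.today().year

def plausible_year(created: Optional[int], modified: Optional[int]) -> Optional[int]:
    """Return the earliest plausible year from the created/modified pair,
    or None if neither falls in range (e.g., bogus 1960s metadata)."""
    candidates = [y for y in (created, modified)
                  if y is not None and PLAUSIBLE_START <= y <= PLAUSIBLE_END]
    return min(candidates) if candidates else None
```

Files that come back as None would then be a short list to review by hand rather than noise skewing the per-year totals.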

Of course, as I’ve been wrangling this data, I learned more about OpenRefine from this week’s reading and now I want to give that some additional thought as an option. While my manual processes will get us a one-time analysis that’s helpful for the purposes of the report we will present CW, a more repeatable method would be infinitely more useful to them.

Digital Content and Envisioning Next Steps for Inventory

This week, the team made progress refining the organization of the external hard drives, continuing to iterate on the folder structure, and (tomorrow) exporting an inventory of the compiled files. The process of gathering and evaluating all of the content has been time-consuming, and I have concerns that we may get the CreativeWorks team to a point where they feel more organized now, for the snapshot of time where we applied our efforts, but still lack concrete steps to develop and maintain long-lasting improvements. In March, we started a document that tries to capture best practices for digital organization in a way that is easy to read and share, but it still feels like the organization would benefit from prioritizing the creation of official digital processes, policies, and procedures and from placing greater emphasis on the human and technical resources needed to support that effort. It is easy enough to pass along that kind of recommendation but more difficult to imagine that they have the organizational capacity and desire to act on it.

Similarly, last week, our team found ourselves discussing the value of the inventory we propose to generate now that (we think) all of the digital content is consolidated and backed up. The file names are rarely informative, the metadata is scarce and inconsistent, and duplicates are expected and sometimes unavoidable. On one hand, it seems impossible to recommend a long-term digital preservation plan without understanding all of the assets. On the other hand, however, the staff has neither the interest nor the time to clean up old data (outside of the identified high-value works). So once again, I return to the value their program could provide by teaching students how to label and identify their digital work according to more rigorous standards from the get-go, as part of the generalized job training they provide. While the organization effort we’ve accomplished so far is notable and incredibly helpful for the staff, I hope that we are able to translate this into an organizational priority in terms of student skills.

Tomorrow, I plan to spend more time considering the implications of the Common Heritage grant for CreativeWorks. I appreciate the broad support the funding suggests regarding the importance of digitizing cultural heritage materials and of organizing outreach through community events around these materials. There is no doubt that JME and CreativeWorks participate in activities that fall under that umbrella. However, most of what CreativeWorks generates now is already digital and is presented at public outreach events that build support and broaden their audience, so some of the grant focus feels redundant. While I think that our group can hopefully offer some valuable insight into how JME and CW might best leverage their assets and skills for additional funding, I am tempted to recommend they identify a technology infrastructure grant that could help meet their networking needs, a more pressing current issue.

Clash of Organizational Systems

After our recent follow-up meeting with the CreativeWorks (CW) team at Joe’s Movement Emporium (JME), I think our team has a better appreciation for the current status of their files and organizational systems, and we are starting to identify some of the needs we may be able to help them address. They would probably benefit from standardized file naming and hierarchy rules as well as working from a single server with regular backups. Their hardware and software setups are sufficient for student needs in the moment, but lack software updates, security features, and regular IT maintenance/support. And as my teammates have mentioned, each project, student, and staff member seems to implement their own system, with varying degrees of clarity and success. And so, working with their teams to flesh out the details of these systems, in addition to preparing their students and staff to move forward with them, is certainly easier said than done.

Broadly speaking, our conversations serve both to document these challenges and to allow the staff to interact and discuss them in a different setting. While Andy and Zach were detailing their findings from the computers, I found myself returning to the Weiss and Holtzblatt techniques to further interview staff. Sierra reviewed the challenges of running the Digital Media Lab, including inventorying equipment, signing resources in and out, and providing appropriate guidance and training without creating barriers between students and their projects. Asking for more details about Patrick’s comments regarding a former staff member helped identify that some previous work had been done to retrieve student files before a contract with a provider of computer gear ended, the details of which we should try to leverage. Finally, we also learned that there is a “Joe’s Metadata Template” actively used in Adobe Bridge.

As the requirements grow more complicated, I think it will help us to keep in mind that it is a service to document these discussions, even if they have to do with “pain points” that our project and future projects may not be able to fix. To my mind, as we recommend improved intake and filing processes to track their assets, these may lend themselves naturally to helping CW make process improvements in other areas. Furthermore, since the files themselves go through an identifiable and relatively predictable life cycle, we can point out specific resulting deliverables (final projects, resumes, reports, etc.) that may help motivate both students and staff to participate in some new organizational processes.

WikiArticles and WikiArchives: Second post

I have been volunteering at the Maryland State Archives for a few months, and the Director of the Special Collections department recently posed a question she said she’d asked of each of the many MLIS students who have worked there over the past few years, always getting a different answer. The question: “How do you define digital curation?” Having so recently revisited Yakel’s article “Digital Curation” (2007), I gave the definition from the text almost verbatim: “Digital curation is the active involvement of information professionals in the management, including the preservation, of digital data for future use.” The Director nodded, pleased with this answer, although she wasn’t specifically familiar with this definition from Yakel. She also said she hadn’t heard this exact answer in previous conversations. I suspect this answer appealed to her because its first emphasis is the active involvement of information professionals. In an archive that manages both physical and digital collections, with a great deal of legacy technology (for better or for worse), within an often slow-moving government institutional context, she did not need to have heard of Yakel to appreciate the strength of the definition. I share this anecdote because I think it represents an interesting interaction between theory and practice, and highlights the difference between defining and doing an activity.

So in comparing that definition to the one currently given by the Wikipedia article: “Digital curation is the selection, preservation, maintenance, collection and archiving of digital asset,” we encounter one example of a digital library topic area on Wikipedia that is generally representative of the field but could use more nuance and support. Furthermore, I agree with the previous blog post that this article relies too heavily on information from the Digital Curation Centre, which, while strong and valid, may represent a UK-specific viewpoint; the article would benefit from more discussion of fundamentals from a variety of sources. By contrast, the Digital Preservation article seems both deeply and broadly researched, with sufficient details and citations to support each section. (I think we already mentioned that the “start-class” designation here is confusing.)

For my article assignment, I’m excited to be working on Community archives. There are loads of great ideas listed on the Talk page for potential new content. The section called “Issues” is particularly rich:

  • Digitization as a way to build or link community archives
  • Digital divide re: community members accessing their own material
  • Archival training
  • Community representation in the profession
  • Ethics of access
  • Capacity challenges (e.g., funding, disaster planning)

Furthermore, I have identified four other sources to evaluate from the Intro to Archives and Digital Curation Fall 2017 syllabus. Also, since I questioned the use of one of the articles (Woodward) as a citation, I will examine whether that one is worth replacing. I did check to see if anyone responded to my Talk comment about that, but have not seen any answers yet.

WikiEdits and WikiEvaluations: First post

I use Wikipedia as a resource regularly, both as a way to answer questions that come up throughout the course of a normal day and as a jumping-off point for academic research into topics that are very new to me. In class discussion last week, we talked about how school kids are often taught that they should never reference Wikipedia in their research papers, and I had almost completely forgotten that kind of bias exists. Although I consider myself a regular visitor to the site, that reminder forced me to acknowledge the amount of filtering and evaluating I do while reading a WikiArticle in order to identify useful, fact-based information and ignore speculation and bad writing.

When I started using a more critical eye (instead of filtering past weak contributions), I noticed that most of the articles dealing with digital curation and digital preservation topics could use a decent amount of attention and improvement. At first glance, each one seemed to have a healthy number of citations given the article length, no glaring errors or omissions, and no major problems. However, much of the writing could be improved and some of the organization was confusing. “Digital Curation” seemed like it could use more introductory information and also leaned heavily on information from the Digital Curation Centre. That is definitely a great resource that should be presented on this page, but other venues cover this topic and represent alternative methodologies and viewpoints.

I ended up focusing mostly on “community archives” and had the experience of watching one of our classmates edit it in real-time, which was pretty great. That some of my initial feedback was immediately addressed by her contributions effectively demonstrated how much value can be added with each edit. The rest of my evaluation of the original content still ended up being pretty critical and I didn’t have many great suggestions for how to address every one of my concerns, but I expect I will revisit it again with some new ideas. I still feel a little tentative about editing Wikipedia, perhaps even more now that we’ve gone through formal training and had some of the desired rigor emphasized. However, the classroom exercises and sandbox activities have helped me gain confidence and I made contributions to both the article and the talk page.