0:03
Hello everyone, and welcome to today's Infopeople webinar, "Better Sharing Through Metadata: Good Practices for Better Discovery." I am now happy to turn it over to our presenter, Matthew McKinley. He is the metadata harvest programmer at the California Digital Library.
0:23
Hi everybody. Yeah, so I'm Matthew McKinley from the California Digital Library, as they've already stated, and we're here to talk about better sharing through metadata, or ways that you can improve your metadata to make it more discoverable and more shareable on the web.
0:41
So first, I want to go over the schedule very briefly. I'm going to start with a brief intro and just talk a bit about why shareable metadata actually matters. Then I'm going to move into something that I'm calling the six C's of shareable metadata, sort of larger concepts of how to create shareable metadata. After that, we'll move into strategies and tools for creating shareable metadata, or concrete things you can do to actually create and share this metadata with the world. And then finally we'll leave some time at the end for questions.
1:10
And a bit of wrap-up. Also, before I begin, I'd be remiss if I didn't mention that this workshop series is coordinated by the California Digital Library as part of its Harvesting California's Bounty project. This is a project supported by the U.S. Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, which is administered in California by the California State Librarian.
1:40
I also want to say that a lot of the content for this workshop comes from a workshop called "Metadata for You and Me: A Training Program for Shareable Metadata," which was a 2006 collaboration between the University of Illinois and Indiana University. Now, this is a great resource, but it's also from 13 years ago, so we've made some updates and customizations for our own purposes.
2:04
All right. So, diving right in. Since you're already here, I'm guessing you have a pretty good idea of why crafting shareable metadata is important, but just to frame things, let's say you're in charge of making some of your institution's collections available online. You've put in the work installing the software, digitizing, describing, and organizing, and now your brand-new digital collections are accessible to the whole world. Great job. How do you attract users to the spiffy new content? The answer lies in maximizing the discoverability of your metadata.
2:34
Discoverability is just what it sounds like: the ability to be discovered, in this case by your content's current users and by any potential users. But that sounds like even more work. Why should you put more time and effort into the records that you've just made available?
2:52
Before I answer that question, let's take a step back. A lot of what we'll cover today can improve your records' discoverability in any context, but we're also specifically focusing on sharing your records with large-scale cultural aggregators. So what is an aggregator? It's a web-based service that harvests digital collection metadata records from many different institutions and combines them into a single search and browse interface.
3:15
As an example, these are screenshots of the interface for Calisphere. It's a free aggregation service provided by the California Digital Library, and it's what I spend my time working on. Calisphere offers educators, students, and the general public access to nearly two million digital objects from over 200 cultural institutions across California.
3:35
The California Digital Library also shares collections with the Digital Public Library of America, which is an aggregator much like Calisphere but operating with a countrywide scope. What this means is that everything that goes into Calisphere will eventually also be pulled into DPLA. Most other states and regions have their own aggregation services that also serve as content hubs for DPLA, which means all this diverse regional content gets funneled to a single easy-to-use interface representing content from across the country. Now, this is the first point where I'm going to pause and ask you all a question, which you can feel free to answer via the chat box. I'll give you a minute to give your answers, and we'll get to the answers in a few slides. The question for this slide is: where are you sharing your content, and has anyone here shared their content with an aggregator already? So feel free to post your answers in the chat and we'll go over them in a minute. While you answer, I'll move on to a more concrete example of the benefits of fine-tuning your metadata to be more shareable.
4:41
So take a look at this resource on the left. It's from a series of survey reports in Calisphere. The metadata might be a little minimal, but it names the resource and it gives date, format, and some usage info. It makes total sense within the context of the rest of the collection in its local repository.
5:00
But the vast majority of users aren't searching for resources from within your repository. They're coming from aggregation platforms such as DPLA and Calisphere, or federated catalogs such as WorldCat, or good old-fashioned search engines.
5:13
And if the terms they're using for search don't align with your metadata, they probably won't find your resources. Now take another look at this metadata on the left.
5:25
Here's a new and improved version of that metadata record. What's different here? First of all, the title's more descriptive, and it doesn't include a local ID, which may confuse some users. There is an original date included, with the 2017 date clearly labeled as a date scanned. The description mentions a more keyword-friendly region. There's a more specific format, and finally rights information, so the user is a hundred percent clear on how the resource can be used.
5:51
Now that we've punched up this metadata a bit, any one of these user search queries is going to return the record within the user's chosen search interface.
6:01
When records are easily found, they're more easily used, reused, and shared. This creates a positive feedback loop, getting more eyeballs
6:08
from more places on your original resource, driving up page views, and ultimately increasing the resource's prominence in search engine indexes. It also makes the job easier for the various crawlers, indexers, enrichers, and other data-munging software that actually runs the web. If you can get your metadata records to adhere to the best practices and protocols we'll cover today, the full descriptive richness of the record can be accessed by tools that will index it, share it, and enrich it in ways you may never even have dreamed of. So it's really pretty simple. Most users aren't coming into your repository via the front door or the main landing page. The internet is hyperconnected by design, so it's only a matter of time until records in your repository escape into the wider web. You can plan for this by making sure your metadata is optimized for linking and for sharing. Like a lot of library and archives work, in the end it's all about increasing access.
7:07
I'm going to pause and Mary's going to read out some of the answers for the first question we had.
7:14
about where you're sharing your content and whether anyone has shared with an aggregator already.
7:19
Okay. Yes.
7:20
We have archive.org, Calisphere and DPLA, Alabama Mosaic, California Revealed, Digital Commonwealth, and Online Archive of California. Cool. Yeah, a lot of those are actually aggregators in their own right, so at least two people are already sharing and are familiar with the aggregators. We'll even cover some of those later on in the presentation, but it's good to hear that people are sharing in a diversity of places.
7:59
So yeah, some people already have kind of a head start, but whether or not you are already sharing your content with an aggregator, these metadata tips should help. So, all right, cool, moving on.
8:12
So these are access statistics for 2018 for a single institution's collection in Calisphere, in this case the Los Angeles Public Library. Now, in 2018 they had about 550,000 total page views, meaning an end user navigated to a collection or object page within their Calisphere collection. Of these, a little over 61,000 end users clicked through from the Calisphere record or object page to the local repository page itself.
8:41
So these numbers are all in addition to the traffic being driven to LAPL's repository via other methods. Beyond enabling OAI harvesting and fine-tuning things a bit with us, LAPL didn't have to do anything extra to get all that new traffic.
8:57
So let's move into the six C's of shareable metadata.
9:02
Well-crafted metadata should not be one-size-fits-all. In other words, don't think of metadata as a single, written-in-stone record, but as multiple views that can be projected outwards from a single resource. Monolithic records that try to hold all possible metadata for all possible audiences are problematic for a few different reasons.
9:22
First, the most widely used metadata schemas, and especially those used by aggregators, are geared towards simple sharing, and often they don't have the complexity to clearly communicate every bit of information stuffed into them. So if you're trying to include every bit of metadata from a more domain-specific scheme or from multiple schemes, things can quickly get confusing.
9:45
Secondly, end users and aggregator software can be confused when search results are diluted by extraneous or ambiguous information. For example, let's say a metadata record for a photograph holds two date values, one from 1975 and one from 2008. Neither the user nor an indexing program will necessarily know just by looking at the record that the former represents the date the photograph was taken and the latter the date the photograph was digitized.
10:14
At the most basic level, a metadata record is simply a view of the resource, and that view may change depending on audience, use, and context. Shareable metadata should be both useful and usable to services outside of its local context. This doesn't mean you need to tailor your metadata for all circumstances and all services, or create separate records for every possible aggregator that may use your metadata, but it does mean you should think carefully about the uses and services that your institution would like to support
10:43
through your metadata.
10:48
This is a standardized set of metrics for judging the quality of a metadata record, as laid out by Thomas Bruce and Diane Hillmann in their 2004 book Metadata in Practice. These include a record's completeness, with all known and relevant information included; the accuracy of a record, in that it is error-free
11:05
and conforms to standards for things such as proper names and abbreviations; the record's provenance, or how the record was created and updated and by whom; a record's conformance to expectations, which means you're providing the necessary amount and type of metadata that your intended audience expects; the consistency and coherence of a record, using unambiguous standard vocabularies and profiles wherever possible; the timeliness of a record, meaning the metadata is up to date and best reflects current descriptive best practices; and finally the accessibility of a record, or how available it is to users with different physical and intellectual
11:44
capacities. These are all certainly good values to keep in mind when creating any metadata record, and they form the basis of the concepts we're about to go through. But to really get at what makes for good shareable metadata, we need to go beyond them. Hence the six C's of shareable metadata.
12:02
First off, make sure your content is optimized for sharing and that it is consistently created and applied.
12:08
Coherency and context are two metrics that become even more important in the world of shareable metadata. And finally, you need to be thoughtful about how you communicate your metadata to aggregators, and make sure that your metadata conforms to predetermined standards so the aggregators can make sense of it.
12:26
Let's jump right into content. No matter what view of a resource a metadata record describes, the content within is one of the most crucial parts. What's a metadata record without content? So you need to consider this content carefully. A shareable metadata record should describe the resource at a level appropriate to the resource and its intended use. Is item-level or group-level description more appropriate for the resource? What fields do users need to discover and make sense of that resource? What fields might an aggregator use to make it more discoverable?
12:57
These are all great questions to ask when creating shareable metadata.
13:02
Now, if you know which aggregator you plan to use, poke around on the site and take a look at some other collections to get a sense of which fields they may be showcasing or indexing for search and browse. If possible, consider reaching out to the administrators to see what fields they require or recommend. Tuning your metadata towards the aggregator you plan to contribute to will maximize its discoverability. For example, this is the Calisphere collection browse screen. Now, title is the obvious field here, as it displays with every object.
13:32
But we also index the object type and derive a decade value from the date field, so you'll want to make sure those fields are completed and shared for your users to take full advantage of this interface. You also need to be very explicit with your metadata so that aggregators know how to interpret it.
13:50
If you're using a controlled vocabulary to restrict a particular field to a limited set of values, you should indicate which vocabulary you use. In the same way, if a URL link is included, give some indication of what it points to. This segues into the next point, which is consistency. Records can be indexed, enhanced, and discovered so much more easily if you're consistent about which field you store metadata values in and how those values are written to the field.
14:20
When fields aren't used consistently, for example, putting date information in either a date field
14:27
or a temporal field for two different records from a single collection, it can confuse both end users and the aggregators attempting to index it, and it could dilute search results.
14:41
It's also very important to keep values within a field consistent. The date field is a common culprit. Take a look on the right here at all the different ways to express a single date value. Now imagine a collection that contains each of these different formats in the date field, depending on the record. Confusing, right? Well, aggregators think so too. Here's another question for the chat. I'll pause at the end of the slide to have Mary read through some of the answers, but the question is: what other fields can you think of where consistency is especially important?
15:13
So predictability is key for machines that might be indexing your metadata, especially if there's no way to explicitly encode the controlled vocabulary or content format. The better you adhere to a consistent vocabulary, whether an external vocabulary or one made in-house, the easier it is for an aggregator to recognize and enhance your content.
15:35
In other words, getting as close as you can to a controlled vocabulary for all descriptive fields makes your records cleaner, clearer, and more flexible.
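As a rough illustration of why mixed date formats are hard on machines, here is a minimal Python sketch that tries to bring a few common spellings of one date into a single ISO 8601 form, and then derives a decade value of the kind mentioned for the Calisphere browse screen. The format list and the names `KNOWN_FORMATS`, `normalize_date`, and `decade` are assumptions made for this example; nothing here describes how Calisphere or any other aggregator actually processes dates.

```python
from datetime import datetime

# Hypothetical list of formats one collection might mix in a single date field.
# "%m/%d/%Y" assumes US month-first order, which is itself a guess.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y", "%b %d, %Y"]

def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

def decade(iso_date):
    """Derive a decade label such as '1970s' from a YYYY-MM-DD string."""
    year = int(iso_date[:4])
    return f"{year // 10 * 10}s"

variants = ["1975-06-05", "06/05/1975", "June 5, 1975", "5 June 1975", "circa 1975"]
normalized = [normalize_date(v) for v in variants]
```

The last variant comes back as `None`, which is the point: every format a harvester has not been told about is a record it has to guess at or skip.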
15:45
Now I'll pause and see if we can have Mary read through some more of the answers for: what other fields can you think of where consistency is especially important? We have names, type, subjects, locations, title,
16:03
author. Those are all good picks, and yeah, I mean, it's a wide swath. As I was saying, for some fields it's a little easier than others, but really, the closer you can get for any field to having very consistent results, the easier it is for users and for aggregators to make sense of it.
16:28
All right, next up: keeping it coherent. Now that your records are consistent, what are they actually saying?
16:35
You need to include enough information that a record is self-explanatory, since records discovered in an aggregator or search engine are almost always removed from the immediate context of both the institution and the collection.
16:48
Avoid local jargon if you can. Something that makes sense to local users might mystify those who aren't familiar with your institution.
16:59
Finally, I want to point out that Calisphere isn't a hosting or preservation service. We simply use metadata and a representative thumbnail to create a version of a resource that is already hosted in your local repository. A vital part of this includes linking the Calisphere record to its original resource in that repository, and for that we need a value in your metadata that makes it clear how this should be done.
17:22
So you need to include a single and stable URL clearly linking back to your local resource. This means either a single unambiguous URL link or some indication of which of several links points back to that resource.
17:37
Now, for that single and stable URL, it's a good idea to use a persistent identifier such as a Digital Object Identifier, ARK, or Handle for this URL link. These services create unique identifiers designed to be associated with a single resource in perpetuity, even when the location of that resource changes. In this way, you can avoid what's called link rot, or an outdated URL pointing to a resource that has been moved or removed.
18:07
Another way to keep it coherent: if you have multiple values for a single metadata field, think carefully about whether you need all of them. If you do choose to use multiple values, it's always better to use repeated instances of the field instead of packing a single field with multiple values, since no in-field delimiter, such as an ampersand or a comma, is foolproof, especially at the level of mass aggregation.
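To make the repeated-field advice concrete, here is a small Python sketch that takes a record with several subjects packed into one `dc:subject` element and rewrites it with one element per value. The sample record, its subject values, and the function name `split_subjects` are made up for illustration, and the semicolon delimiter is an assumption about how the values were packed.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def split_subjects(record, delimiter=";"):
    """Replace each packed dc:subject with one dc:subject element per value."""
    tag = f"{{{DC_NS}}}subject"
    for packed in list(record.findall(tag)):
        values = [v.strip() for v in (packed.text or "").split(delimiter) if v.strip()]
        position = list(record).index(packed)
        record.remove(packed)
        # Re-insert the separate values where the packed element used to be.
        for offset, value in enumerate(values):
            element = ET.Element(tag)
            element.text = value
            record.insert(position + offset, element)
    return record

# Made-up sample record with three subjects packed into one field.
record = ET.fromstring(
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:title>Teddy Roosevelt on Horseback</dc:title>"
    "<dc:subject>Presidents; Horses; Ranch life</dc:subject>"
    "</record>"
)
split_subjects(record)
subjects = [s.text for s in record.findall(f"{{{DC_NS}}}subject")]
```

Splitting after the fact only works when the delimiter is known and never appears inside a value. A name written as "Roosevelt, Theodore" would be cut in half by a comma split, which is why repeating the field at the source is the safer choice.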
18:32
When crafting shareable metadata, you'll need to think about the new context surrounding your records.
18:38
For example, in a collection about Teddy Roosevelt within your local repository, surrounded by other resources about the former president, the title "On Horseback" might be sufficient for a record, given the context.
18:51
When this record is aggregated with records from other collections and removed from that context, "On Horseback" is frustratingly vague.
19:00
All right. Now I'm going to pause to ask a question and then give you some time to answer, and then we'll go through the results. So, okay: how would you improve this title field to provide more clarity and context? I'm going to give about 30 seconds to read it and put in your answer.
19:30
We're getting some answers, so I'll go ahead and read them as they come in. Sure. Include location, a couple of those. Location. Add year and place if possible. And then we have some suggestions: "Cowboys on Horseback," "President Roosevelt on Horseback."
19:49
Location, date and place, collection data. Yeah, yeah, location and time, those are both great things to include. It's one of those things we've got to balance, obviously: putting everything in the title versus putting in some descriptive information. But any of those things would help improve the sense-making of this record beyond just "On Horseback."
20:13
For our part, we just chose "Teddy Roosevelt on Horseback." Even just saying that immediately gives us so much more context than just saying "On Horseback," but there were a lot of good answers in the chat, too.
20:26
So it pays to think of each record and the information it conveys as an atomic unit. What does that discrete record tell you when separated from the rest of the collection? At the very least, you should include some basic information on your institution and the collection each record is a part of when exposing your records for aggregation.
20:44
Now, on the flip side, there may be some local contextual information that is relatively useless in aggregation. The scanning date and technician, for example, may not need to be shared in every context. If your repository software allows for it, you may even be able to expose some metadata for harvesting that is hidden from the record view within your local repository, or vice versa.
21:06
Remember an aggregator will always point back to the complete resource and metadata on your local site. So you can afford to be choosy about which Fields you share for aggregation.
21:16
This is a good time to point out that context varies with time as well as location. Older methods of describing resources and widely used terms can sometimes be rendered obsolete at best and offensive at worst. If you do have such sensitive metadata in your collection, consider possibly updating or removing it, or at the very least acknowledging how it reflects the time and place the original metadata was created.
21:42
Now let's get to the nitty-gritty of how you're communicating your metadata records to aggregators. There are a lot of different ways to expose metadata for harvesting, and your chosen method will have a lot to do with what the aggregator accepts, as well as whichever method makes the most sense for your institution and your collection. Let's take a quick look at three of the more common metadata communication standards: MARC records, APIs, and OAI-PMH.
22:09
MARC stands for Machine-Readable Cataloging. MARC standards were developed by computer scientist Henriette Avram while working at the Library of Congress in the 1960s. It became the very first international data standard for transmitting bibliographic information. If you're at a public or academic library and you're on their online catalog looking for a book, you're likely looking at MARC records. Most online public catalogs store information in MARC format.
22:35
In its most basic form, MARC is a set of fields that contain different types of descriptive information about a resource, each with a corresponding three-digit tag. For example, 020 corresponds to the ISBN and 100 corresponds to the author. The fields are further refined by subfields, which you can see here as letters preceded by a dollar-sign delimiter.
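The tag-and-subfield layout described above can be sketched in a few lines of Python. This is only an illustration: it works on a human-readable display line (three-digit tag, space, two indicator characters, space, then dollar-sign-delimited subfields), not on binary MARC, and both the fixed layout and the sample values are assumptions made for the example. The function name `parse_marc_line` is hypothetical, and real MARC processing is normally done with a dedicated library.

```python
def parse_marc_line(line):
    """Split a display-style MARC field into tag, indicators, and subfields.

    Assumes the layout 'TAG II $aValue$bValue', where TAG is three digits
    and II is two indicator characters ('#' standing in for blank).
    """
    tag, indicators, data = line[:3], line[4:6], line[7:]
    subfields = []
    # The text before the first '$' is empty, so skip it.
    for chunk in data.split("$")[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}

# Made-up sample fields: 020 holds the ISBN, 100 holds the author.
isbn_field = parse_marc_line("020 ## $a0000000000")
author_field = parse_marc_line("100 1# $aDoe, Jane$d1950-")
```

A value that itself contains a dollar sign would break this split, which is one more reason to leave real-world parsing to purpose-built tools.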
22:55
Most MARC-based systems provide options to export records in raw MARC format or an XML format. You can see the examples here, with raw MARC being on the left and XML being on the right. So if you're at a library that manages metadata for your unique digital collections using MARC format, that can be one option for sharing records with aggregators.
23:18
API stands for application programming interface. APIs are used in a lot of different programmatic contexts, but here we're concerned with web APIs. I'm going to leave out a lot of the technical details for simplicity's sake. At its core, a web API is a set of specifications that define both requests made to a web-based application by an outside agent and the response to that request.
23:43
You can think of this as an agreement between the user and application.
23:47
In other words, if either an end user or another application sends a request to a web application's API to return a set of resources or perform a service, then as long as that request is structured properly according to the API's documentation, the user or program can expect to receive a response with the data requested, again structured according to that API documentation. This makes automating this whole process a lot easier.
24:11
Many digital collection management systems and platforms support these APIs, which can be used by an aggregator to query and retrieve metadata records. For example, both Flickr and YouTube have their own APIs. So if your library is posting digital photographs with metadata to Flickr or videos to YouTube, an aggregator may be able to use this API to programmatically grab those records.
24:36
Now, OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting. You can think of this as a very specific version of an API. A data provider, in this case your local repository, exposes structured metadata in XML format according to the specifications of the OAI-PMH protocol. A service provider, in this case a harvester such as Calisphere, then makes a request via a URL to harvest that structured metadata. The metadata records within OAI-PMH are most typically formatted using a standard called Dublin Core. Dublin Core basically defines a handful of key elements for describing any type of resource, whether it's a book, a digitized photograph, a sculpture, or whatever. These elements include things like title, date, format, extent, and so on. If it's available, OAI-PMH is often the best choice for exposing metadata for harvesting and aggregation, since it was built specifically for archival metadata
25:34
harvesting. It's included or baked into a lot of widely used repository software such as Islandora, Omeka, and DSpace. Most aggregators support this protocol, so if you have a system that uses OAI-PMH, you should be able to easily share your collections.
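The request-and-response exchange described above can be sketched roughly as follows in Python. The `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The base URL, the set name `photos`, the function names, and the sample response are placeholders invented for this example, and the response is heavily abbreviated (a real one also carries elements such as a response date and possibly a resumption token). Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for simple Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

def extract_records(xml_text):
    """Return one dict per record, mapping DC element names to lists of values."""
    root = ET.fromstring(xml_text)
    records = []
    for dc_block in root.iterfind(".//oai_dc:dc", NS):
        fields = {}
        for element in dc_block:
            name = element.tag.split("}")[1]  # strip the '{namespace}' prefix
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return records

# Abbreviated, made-up response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Teddy Roosevelt on Horseback</dc:title>
          <dc:subject>Presidents</dc:subject>
          <dc:subject>Horses</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai", set_spec="photos")
records = extract_records(SAMPLE_RESPONSE)
```

Storing every element as a list keeps repeated fields, such as the two subjects here, intact instead of overwriting one with the other.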
25:54
Now we're going to move to a little bit different of an activity: a poll of which standard you have the most experience with and/or feel the most comfortable using. So I'll let Mary take over to go to the poll technology. There we go.
26:10
I will give you 30 seconds to a minute or so to put in your answer.
26:42
All right, so we have the results here, and yeah, not super surprising. MARC's been around for a while, so it's got a lot of buy-in in the library and archives community. Yeah, looks like most people chose MARC, but we do have some people who are also familiar with OAI and APIs. I guess I'm curious about the "other." If some people wouldn't mind posting in the chat: what other harvesting protocols or standards are you using?
27:21
Well, it's fine, we can
27:24
move on to the next slide. I'll let you know if someone types in; no one has posted yet. Cool, thanks. Yeah, I was just curious. But yeah, okay. So MARC is used pretty much across the board, and that's reflected in the poll.
27:38
So, all right, finally: conforming to standards. Conforming to agreed-upon standards is recommended for all descriptive metadata, but it's essential if you want to share your records with aggregators and diverse end users. As you can see, there are standards at a lot of different levels. At the highest level, you have the structural metadata, such as the requirements of the sharing protocol you use between the data provider and the aggregator, and the field designations for the metadata structure within each record being shared.
28:08
If these are not adhered to properly, an aggregator might not read your metadata at all.
28:13
At the content level, we have the standards that aren't essential for metadata transmission but are necessary for both human and machine agents to make sense of your metadata.
28:22
There are controlled vocabularies and syntax rules to follow for individual types of content, such as the Library of Congress Subject Headings, and then there are domain-specific content standards to follow in actually doing the description, such as Describing Archives: A Content Standard.
28:39
At the lowest level, on a character-by-character basis, you want to make sure any special characters or entities are handled according to whatever text encoding you're using. If this looks like a lot to keep track of, that's because it is, but there is help available. In the next section, I'll cover resources for performing quality control on your shareable metadata.
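As one small example of character-level care, here is a Python sketch that normalizes a text value to a single Unicode form and escapes the characters XML reserves before the value is placed in a record. The choice of NFC normalization and UTF-8 output is an assumption for illustration (your protocol or aggregator may specify something else), and `clean_value` is a hypothetical helper name.

```python
import unicodedata
from xml.sax.saxutils import escape

def clean_value(text):
    """Normalize to Unicode NFC, then escape the XML-reserved &, <, and >."""
    return escape(unicodedata.normalize("NFC", text))

# 'e' followed by a combining acute accent becomes the single character 'é'.
title = clean_value("Cafe\u0301 & Bar <interior>")
encoded = title.encode("utf-8")  # bytes ready to write into a UTF-8 XML file
```

Normalizing first means two visually identical titles compare as equal, and escaping keeps an ampersand in a title from breaking the XML that carries it.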
29:01
So now we're going to pause for a reality check. It's sort of an overarching question that addresses all of these six C's of shareable metadata: could a person with no prior knowledge determine and convey what your record describes? This is the kind of question you want to keep in mind when you're crafting metadata for shareability, because as you can see from that Teddy Roosevelt example, it's not always clear, absent any context, what that record is about.
29:30
All right, so now we're going to have another kind of break, a mini activity, and again, post your answers in the chat. Take a look at this example metadata record. Knowing what we know now about the six C's of shareable metadata, what might you change or add to better align with the six C's that we just covered? Take a few minutes to post your answers in the chat before we review. Multiple subject lines instead of using a semicolon. Yeah, that's a good one: split up the subject line.
30:26
Again, make the subject separate entries rather than a single one.
30:31
But quite a few of those.
30:34
Someone's asking about the date: the date in standard form.
30:39
Yeah, very good, Define dates.
30:47
Okay. Yeah. Yeah both both subject and date.
30:53
Yeah, and title could be more descriptive for sure.
30:58
So let me just dive into some of the changes that I made to this record. You can see all the changes here in green. First of all, I included the namespace declaration. That's not something that we covered in this workshop due to time, and it's something that should be automatically generated by your XML program, so it's not something you necessarily need to worry about, but it is something that aggregators use to make sense of what metadata format they're looking at.
31:29
So yeah, there was a dc:coverage value, 2001, that I moved to a date field, and as several people mentioned, there's also the subject field, which we split into multiple fields.
31:43
Yeah, and then adding "date scanned" to that 2001 date field to give it a little more context and sense. And then finally, you can see in one of the identifier fields,
31:56
we got rid of that period there. You know, that might be part of your identifier, but unnecessary punctuation can confuse both aggregators and end users, so you want to make sure that is the correct value. You can see too that there were two identifiers that are very similar. If they're that similar, you often just want to remove one, again so as not to dilute or muddy up search results or to confuse anybody.
32:21
But yeah, a lot of good picks.
32:25
Let's see.
32:29
So now we're going to move on. I'll save questions for the end of the presentation, but let's move directly into strategies and tools for now, or basically ways to put some of the concepts you just learned into practice.
32:43
The first is: know your audience. In order to best optimize your metadata for sharing, you need to know the audience you're sharing it with, and for that you'll need to do some investigating. Who are your current users, and what do they find interesting or useful? What about potential users you haven't yet reached?
32:59
Think about the unique value your collections provide and how potential end users might utilize and amplify that value.
33:07
You could also do a scan of your current users.
33:09
How are they interacting with your resources? Where are they using your resources, and why? What sort of reference questions are you getting about a particular collection? If possible, gather usage data from your repository platform, or consider using a tool such as Google Analytics. Now, armed with all this research, you can construct some basic user profiles to have a clearer idea of what sort of metadata drives your current and any potential users. Once again, context is king. Think about the context of your records when end users encounter them in aggregation. What is clear in a local context might be confusing or unnecessary in aggregation, and vice versa. If there's a relationship or information implied by the local context of your digital collection, it's a good idea to make those connections explicit when exposing metadata for aggregation. This means finding a way to consistently include institutional identity information such as full name and contact info.
34:08
It also means providing links and metadata for any related copies or versions of a resource, since the two objects won't always remain contextually grouped.
34:17
Finally, this is important enough to restate: maintain stable record links. Any relationship built on a URL in your metadata breaks down as soon as that explicit link is changed or becomes otherwise inaccessible. So again, consider using a persistent identifier service for those URLs.
34:39
Now I'm going to ask a question, and then you can again answer in the chat window, and we'll go over the answers at the end of the slide. What are some other fields where you might add descriptive context if a record is separated from its collection?
34:53
Here are a few examples that I mentioned previously, spelled out. Imagine these fields are part of a metadata record from the California Historical Society. You can see they've included their full institutional name and address, including phone and email, to make it as easy as possible for a curious end user to get in contact. They've also included explicit links to related resources, in this case a link to a finding aid for the records collection and another resource that is referenced by the current resource.
35:22
Note the use of DOIs and ARKs.
35:27
Now we're going to pause to have Mary read through some of the answers for what other fields you might add for descriptive context if a record is separated from its collection.
35:39
We have one answer so far. Someone says series title.
35:44
Yeah, very true. Something like series, being the larger context of what an object is part of, becomes a lot more important once it's separated from its initial context.
35:58
We have illustrator.
36:04
Someone is saying it's also useful to spell out abbreviations for international audiences.
36:09
Yeah, very true. Very true.
36:13
Yeah, that's one of those things that makes perfect sense to you, but you need to think about the whole variety of users who are looking at your object. And a more descriptive rights or licensing statement. Yes, rights are super important, and we'll cover that in a few slides from now. Those are all great answers. There's really no wrong answer in this context. Again, it depends on your object, it depends on who you're reaching out to, and it depends on how you describe that object.
36:46
So if all this is starting to seem a bit much, remember, you don't have to get it perfect right out of the gate. Quality shareable metadata is a journey, not a destination. To borrow a concept popular in Silicon Valley, you want to think about the minimum viable record.
37:01
What are the essential metadata fields necessary to convey the value and uniqueness of that particular record, or the collection as a whole? Focus on getting those core fields just right, and leave the helpful but not required stuff for future updates.
37:19
So you don't have to map everything at first. You want to start by focusing on the essential fields.
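One way to make the minimum viable record concrete is a small check that flags records missing core fields. This is only a sketch: the field names in `REQUIRED_FIELDS` and the function `missing_core_fields` are hypothetical and should be replaced with whatever your own aggregator actually requires.

```python
# Hypothetical set of core fields; substitute whatever your aggregator requires.
REQUIRED_FIELDS = ("title", "identifier", "rights", "type")


def missing_core_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or blank in a record."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Illustrative record: "rights" is blank and "type" is absent.
record = {"title": "Street scene", "identifier": "local-0001", "rights": "  "}
print(missing_core_fields(record))
```

Running a check like this across a whole collection gives a quick list of records to fix first, before spending time on the optional fields.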
37:24
That said, it's also helpful to think about search engine optimization, or how the technical makeup of your metadata can help drive more traffic to your records.
37:33
Focus on widely used terms and vocabulary, avoid jargon, and above all, find something to title a minimally described resource other than "Untitled."
37:42
Due to how search engines crawl and index sites, titles can be very important for getting content indexed by search engines, so you want to make sure the title is actually saying something.
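A simple way to find titles that say nothing is to compare them against a list of known placeholders. The list in `PLACEHOLDER_TITLES` below is an assumption for illustration, not a standard, so extend it with whatever placeholders turn up in your own records.

```python
# Hypothetical list of placeholder titles worth flagging for review.
PLACEHOLDER_TITLES = {"untitled", "unknown", "no title", "[untitled]"}


def has_placeholder_title(title):
    """Return True when a title is empty or a generic placeholder."""
    normalized = (title or "").strip().lower()
    return normalized == "" or normalized in PLACEHOLDER_TITLES


titles = ["Untitled", "Street scene, Cairo, Illinois", None]
flagged = [t for t in titles if has_placeholder_title(t)]
print(flagged)
```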
37:52
Vague and imprecise wording isn't just frustrating. It can also be an accessibility issue. Think about the sight-impaired user who has to glean all they can entirely from the text and alt text associated with an object, or a non-native speaker or perhaps grade-school-age user who might get tripped up by complicated or esoteric terms.
38:13
Now, when it comes to metadata structure, you're confined a bit by the aggregator you plan to use. In other words, aggregators will have their own metadata profile, and you or the aggregator will typically need to map your local metadata to their schema. A simple spreadsheet is a good tool for building and communicating metadata crosswalks, or you can use an online service such as Google Sheets to easily collaborate on a crosswalk. This is a screenshot from a Google Sheets crosswalk CDL uses to map metadata
38:43
from external sources, such as the OAI provider on the right, to our Calisphere-specific metadata model.
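The same crosswalk idea can be sketched in code as a lookup table from local field names to target field names. Everything here is illustrative: the `dc:` and `local:` field names, the target names, and `apply_crosswalk` are assumptions, not CDL's actual mapping or Calisphere's schema.

```python
# Hypothetical crosswalk: local field name -> aggregator field name.
CROSSWALK = {
    "dc:title": "title",
    "dc:creator": "creator",
    "dc:date": "date",
    "dc:rights": "rights",
}


def apply_crosswalk(local_record, crosswalk=CROSSWALK):
    """Map a local record to target fields; collect fields with no mapping."""
    mapped, unmapped = {}, []
    for field, value in local_record.items():
        target = crosswalk.get(field)
        if target is None:
            # Keep track of fields the crosswalk does not cover yet.
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = apply_crosswalk(
    {"dc:title": "Street scene", "local:shelf": "B12"}
)
print(mapped, unmapped)
```

Returning the unmapped fields alongside the mapped record makes gaps in the crosswalk visible, which is useful when documenting mapping decisions.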
38:53
Once decided upon, make sure you document your local metadata mappings, conventions, and practices for your own purposes, and so that you can explain any decisions if called upon by an aggregator or an end user.
39:11
When describing your content, use content standards wherever possible. They're tried and true for a reason, and are based on fundamental archival and cataloging values. That said, these standards are not 100% unimpeachable. Cultural values and lenses are changing all the time, and the journey of descriptive metadata is also a journey towards inclusivity.
39:32
If your legacy metadata is insensitive or unnecessarily exclusionary, consider noting or revising it, and don't stay absolutely beholden to the rules.
39:43
You want to avoid structural formatting. This includes line breaks, bullet points, etc. This might look great in your local system, but this formatting often doesn't translate through to the aggregator.
39:53
Finally, for scanned items, be certain of what you're emphasizing in your description: the original object or the digitized resource. For example, should you put "photograph" or "JPG" in the format field? Should you include digitization equipment and information, or should you focus more on the original photographer?
40:12
Here's an example of HTML formatting getting mangled during the harvest process. On the left is how this description field displayed on the original site: a nice, clean bullet list. However, if the HTML that formats this bullet list is exposed within the description field for harvesting, the aggregator won't know how to interpret it and will treat it like text.
40:32
In this case, it encodes all those greater-than and less-than signs, and you end up with this unreadable mess.
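The effect is easy to reproduce. The sketch below uses Python's standard `html.escape` to stand in for an aggregator that treats a description as plain text; the sample description is made up, and a real harvester may encode things differently.

```python
import html

# Made-up description field that carries its own HTML bullet list.
description = "<ul><li>12 photographs</li><li>2 letters</li></ul>"

# Treating the field as plain text escapes the markup, so the tags show up
# as literal characters instead of rendering as a bullet list.
escaped = html.escape(description)
print(escaped)
```

The printed value is full of `&lt;` and `&gt;` sequences, which is roughly what an end user sees when markup leaks into a harvested field.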
40:40
Here's a more visual example from Calisphere. You can see the description value from the original repository on the left. Note the greater-than and less-than signs here in the middle, which are a key part of HTML and other XML-based encoding. It looks fine here, but once this description gets harvested through to Calisphere,
41:02
our harvester interprets that HTML, and they get decoded into this far messier description value.
41:11
So always think about the encoding. If there are any weird characters in your description field, you want to clear it with your aggregator, because you might end up with something like this.
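If markup does need to be cleaned out before sharing, one option is a small pass that decodes entities, drops tags, and collapses whitespace. This is a sketch using only the Python standard library; the `plain_text` helper is hypothetical, and it is still worth confirming with your aggregator what form of text they expect.

```python
import html
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes and drop the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(value):
    """Decode entities, drop tags, and collapse whitespace to single spaces."""
    parser = _TextOnly()
    # Unescape first so already-encoded markup is also removed.
    parser.feed(html.unescape(value))
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


print(plain_text("<ul><li>12 photographs</li>\n<li>2 letters</li></ul>"))
```

Note that this flattens a bullet list into a single line, so you may want to add your own separator, such as a semicolon, between list items.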
41:23
Now, let's talk a little more about content standards. In the world of shareable metadata, these standards are all about reducing ambiguity. This is especially important for fields that are often indexed for discovery by aggregators, such as geographic places and date values.
41:40
Take a look at this example. "Cairo" on its own as a discrete location value might be interpreted very differently than "Cairo, Alexander County, Illinois." Unless the aggregator is explicitly set up to handle geographic values as these discrete, atomized groupings, it's best to go with the least ambiguous single-field approach.
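If your local system stores place parts in separate fields, the single-field approach can be sketched as a join that skips blanks. The `place_label` helper and the comma-separated, most-specific-first ordering are assumptions for illustration; follow whatever form your content standard or aggregator prescribes.

```python
def place_label(*parts):
    """Join place parts, most specific first, skipping blank or missing ones."""
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return ", ".join(cleaned)


print(place_label("Cairo", "Alexander County", "Illinois"))
```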
42:01
While controlled vocabularies tend to restrict themselves to a single field type, content standards can exist at a variety of levels. The Library of Congress Name Authority File provides standardized versions of proper names, which can be used within any field that contains such a name. At a higher level, the Cataloging Cultural Objects standard is used by visual resources catalogers and others for guidance on shaping the whole record of a cultural object.
42:26
Now, we don't have time to fully dive into implementing linked data, but if your metadata model supports it, it's never a bad idea to include URI references wherever you can for descriptive fields.
42:37
And finally, it's crucial to provide clear and unambiguous rights data, so that users and aggregators know exactly how a resource may or may not be used. As long as you craft it thoughtfully and apply it consistently, providing a custom rights statement is A-OK, but we do highly recommend going with one of two rights standards.
42:58
For objects your institution has the rights to, consider Creative Commons. They offer a wide variety of licenses, from CC0, which has no restrictions on reuse, all the way to CC BY-NC-ND, which requires attribution and does not allow for commercial use or any derivative creative remixes of the work. Essentially, this will only allow for resharing with proper attribution.
43:21
Creative Commons even offers a tool to help you choose the correct license based on a set of questions regarding how you want your resource to be shared and used.
43:34
For resources not owned by your institution, or resources the owner does not wish to make available via Creative Commons, consider rights statements from RightsStatements.org, which breaks the statements down into three different types: in copyright, no copyright, and copyright unclear.
43:52
Within these groupings are more detailed licenses that deal specifically with orphan works, European Union versus U.S. copyright, educational use, and so on. They're meant to cover nearly any copyright situation. And again, the whole point is reducing ambiguity. If an aggregator knows it is only receiving a limited set of licenses, it can integrate functionality around those licenses in a clear and robust way.
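To see which records already use one of the two recommended rights standards, a rough check on the rights value can help. This sketch assumes the rights field holds a URI and tests only the host and path prefix; the `STANDARD_RIGHTS` table and `is_standard_rights_uri` are hypothetical names, and the check does not confirm that a specific license or statement actually exists at that address.

```python
from urllib.parse import urlparse

# Hosts and path prefixes treated as "standard" rights URIs in this sketch.
STANDARD_RIGHTS = {
    "creativecommons.org": ("/licenses/", "/publicdomain/"),
    "rightsstatements.org": ("/vocab/",),
}


def is_standard_rights_uri(value):
    """Return True if value looks like a Creative Commons or RightsStatements.org URI."""
    parsed = urlparse((value or "").strip())
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    prefixes = STANDARD_RIGHTS.get(host)
    # str.startswith accepts a tuple of candidate prefixes.
    return bool(prefixes) and parsed.path.startswith(prefixes)


print(is_standard_rights_uri("http://rightsstatements.org/vocab/InC/1.0/"))
print(is_standard_rights_uri("All rights reserved"))
```

Records that fail the check are not necessarily wrong, since a thoughtfully written custom statement is still acceptable, but they are the ones worth reviewing.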
44:17
With all these recommendations and tweaks, how do you make sure your many records are up to snuff? Well, not to sound like a broken record, but once again, consistency is key. If standards are consistently applied and content is consistently described according to content standards and your own methodology, you'll end up with far fewer quality control issues overall. Any mistakes will be easier to spot due to all that consistency applied everywhere else.
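One lightweight way to spot inconsistency is to reduce each value in a field to its shape and count the shapes. This is a sketch under simple assumptions: records are dictionaries, and `value_pattern` and `profile_field` are hypothetical helpers, not part of any assessment toolkit.

```python
import re
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value.strip())
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_field(records, field):
    """Count the shapes seen in one field so outliers stand out."""
    return Counter(value_pattern(r[field]) for r in records if r.get(field))


# Made-up records: two dates share one shape, one is an outlier, one is missing.
records = [
    {"date": "1923-05-01"},
    {"date": "1931-11-12"},
    {"date": "May 1923"},
    {},
]
print(profile_field(records, "date"))
```

When most values share one shape, the rare shapes are the records to look at first.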
44:44
No matter how you try, though, some outliers are going to sneak through. There are too many repository and schema options out there to recommend any specific tool or approach, but luckily the Digital Library Federation's metadata working group has put together a thoroughly researched assessment toolkit.
45:01
It lays out all the necessary steps when approaching metadata assessment, including choosing the scope and fields to assess, gathering data, and performing and documenting the assessment. It also includes a repository of tools to use for assessment, and it includes a clearinghouse of metadata application profiles, so you can be sure you're assessing towards the right profile for your aggregator.
45:24
Finally, some aggregators, including Calisphere and DPLA, can even generate analysis reports to spot discrepancies in your metadata.
45:36
Now we're going to move on to the final section, which is letting it go.
45:41
Now, as a rule, all metadata records in DPLA are made available under a Creative Commons Zero (CC0) license, which basically means no rights reserved. Anyone is free to download and reuse the metadata without requiring permission, compensation, or attribution. Since all Calisphere records are eventually harvested by DPLA, this applies to all metadata records in Calisphere as well.
46:05
Their thinking, and ours, is that the least amount of restrictions encourages the most use. They defend this position by arguing that descriptive metadata records are not subject to copyright because they only contain either objective facts or modes of expression so limited they cannot be classified as creative, and thus as copyrightable, content.
46:25
Now, bear in mind that the CC0 license applies to the descriptive metadata record only. The rights of the actual object content are a totally different matter.
46:36
So again, the Creative Commons Zero license waives all copyright and dedicates the metadata record to the public domain. This unrestricted free use encourages sharing and reuse by removing barriers for both machines and humans.
46:49
One particular gray area is if a transcription of an audio resource or significant passages of a text resource are included in the metadata description field. If this text veers closer to the content of an object than a description, we'll often have to remove it from the record before it gets to Calisphere, so as not to share intellectual property without restrictions.
47:10
This is also not to say that all CC0 metadata records should have all context immediately removed. In both policy and site design, most aggregators visibly encourage users to properly attribute any records they share to the original source.
47:26
Here's an example of a Calisphere citation link. If you click the Get Citation button for any Calisphere record page, it will automatically generate citation information for you to use. This is all just generated from metadata we're given, like the title, date, collection, and owning institution, and then also including the date of access and a permalink. So it has everything you need right here to share this metadata with the actual resource.
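Generating a citation from fields already in the record can be sketched as simple string assembly. The citation layout, the field names, and `build_citation` below are assumptions for illustration; this is not Calisphere's actual citation format or code.

```python
from datetime import date


def build_citation(record, accessed):
    """Assemble a citation string from fields already in the record."""
    parts = [
        record["title"],
        record.get("date"),
        record.get("collection"),
        record.get("institution"),
    ]
    # Skip any optional field that is missing or empty.
    body = ". ".join(part for part in parts if part)
    return f"{body}. Accessed {accessed.isoformat()}. {record['permalink']}"


# Made-up record using placeholder names and a placeholder URL.
record = {
    "title": "Street scene",
    "date": "1923",
    "collection": "Example Collection",
    "institution": "Example Historical Society",
    "permalink": "https://example.org/item/1",
}
print(build_citation(record, date(2019, 1, 15)))
```

Because the citation is built entirely from the shared metadata, its quality depends on the same core fields discussed earlier being present and accurate.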
47:54
All right. Finally, I want to touch on how to share your collections with a digital aggregator. This is a generic, basic metadata sharing workflow. Even for a so-called basic workflow, there are a lot of steps, but don't worry. Beyond planning and creating metadata, which hopefully we've given you the resources to do today, an aggregator such as Calisphere or DPLA can handle all the technical work of harvesting, transforming, and sharing your work, and can assist you in collecting usage data for assessment.
48:26
So I encourage any California institutions who aren't yet contributing to Calisphere to reach out to us.
48:32
Non-California institutions can find their state or regional aggregator by visiting DPLA's service hubs page for a list. These aggregators will also push your metadata to DPLA.
48:43
Even if your content is barely described or even digitized, there are often initiatives that can help you with these early stages of the digital object life cycle. For example, in California we have the California Revealed project, which partners with local libraries and heritage organizations to digitize and help provide access to California primary sources. In fact, we already heard in the presentation today that several people are using California Revealed. Now, this is not exclusive to California.
49:10
You can check with your service hub if you're in another state or region to see if a similar service exists.
49:19
And that pretty much does it. Thank you so much for attending this webinar. If you have any questions, we have a little bit of time here, so feel free to post them in the chat or the questions box and I'll try to answer them. I do want to point out that all of these slides, including the notes, will be shared by Infopeople as handouts for this presentation. Also, with these handouts there's a link to a whole bunch of different resources and sources I used to compile this workshop.
49:48
So if you want to dig further into any of these topics, that will be very helpful to consult.
49:57
Yeah, thanks again. Again, my name is Matthew McKinley with the California Digital Library. This has been creating shareable metadata, and I hope you got a lot out of this. Let me know if you have any questions.
50:11
Okay, we're getting a lot of thank-yous, no questions yet, and Megan from California Revealed says thank you for the shout-out. Yeah, no problem, Megan.
50:22
And actually, you'll see in the chat we just put a link to a survey. Yes, very good.
50:34
Yeah, and I've included my contact info with this presentation. You probably got it as part of the course presentation. I'm happy to answer any questions via email as well. Okay, it looks like we have a couple of questions among the thank-yous here. Someone is saying: I'm a librarian with limited metadata experience. I'm interested in volunteering and learning, something like an internship. Do you have any recommendations?
51:00
Yeah, it's hard to offer a specific recommendation. I think the impulse is a great one. I think the best way to get experience and really learn the ins and outs is to volunteer with an organization. I would say most library systems have local history rooms or things like that, and they're often fairly underfunded, so they often appreciate any volunteers
51:27
they can get, and you can end up pretty quickly working on some pretty cool stuff, just because they can use these helping hands. I would say a local library, or there are also often educational opportunities through contacting a local college or university. Most archives and libraries have some sort of capacity for volunteers to help and gain experience.
51:52
So, I'm sorry that's not a super specific answer, but I do think it's a great impulse, and I will say that a lot of different libraries and learning institutions offer that, so I'd encourage you to reach out to them.
52:07
Do you think that entering metadata in a Native American language will put off non-natives accessing it?
52:17
That is a good question. That sort of goes back to tailoring your content to your audience. If a significant portion of your user base does understand and speak this language, then I think it would obviously make sense to include that language within the object.
52:35
Even if they don't, if a description, or whatever field you're using, in that language is a significant part of that collection and what that collection is all about, I would maybe include that description and then, if possible, also include an English translation of whatever those values are. But yeah, I think it will be important to share. I also think there's something to be said for including what could be seen as more unique metadata, like these underrepresented languages. So yeah, I think definitely include it, but also consider including a translation so you're reaching the widest possible audience.
53:18
Okay, again, there are a lot of thank-yous and nice comments, which we will make sure to share with you afterward, Matthew. Great.
53:30
Good. I'm glad this was useful, and yeah, always happy to answer questions in the future, and hopefully the handouts are helpful as well.
53:40
Alrighty, then we can go ahead and wrap it up. Thank you very much. That was awesome, with great visuals, and thank you to the audience also for contributing and commenting and answering the questions.
53:57
Yep. Thank you everybody again. One more time, my name is Matthew McKinley, with the California Digital Library. Thanks for attending, and I hope this was helpful.
54:09
All right, and so for our audience, everyone who registered and attended today's webinar will receive a follow-up email tomorrow that includes a link to the archived recording of this webinar as well as a link to a certificate of attendance. We also have a short survey, as I pointed out. You'll see it in the chat area. If you can, please take a few minutes to fill that out. It helps us in planning future training. So, thanks again everyone, and we'll see you at our next webinar.