Agenda 2025-05-06 #12

Open
TzviyaSiegman opened this issue Apr 25, 2025 · 16 comments

Comments

@TzviyaSiegman (Collaborator) commented Apr 25, 2025

Agenda: C2PA and IPTC

Speakers: Leonard Rosenthol and Brendan Quinn

  • Responses to Authentic Web Questions
  • Overview of C2PA
  • IPTC work on C2PA
  • Discussion

Meeting information

Minutes

@TzviyaSiegman changed the title from "Agenda 202-05-06" to "Agenda 2025-05-06" on Apr 25, 2025
@lrosenthol

Two other items that we will cover in the talk...

  • Relationship to Creative Assertion Working Group (CAWG)
  • Relationship to JPEG Trust (ISO 21617)

@TzviyaSiegman (Collaborator, Author)

C2PA Questionnaire is available for preread

@tantek (Member) commented May 6, 2025

Additional suggested pre-read for workshop participants, given the primary topic of this meeting:

I have a conflict for most of the duration of this mini-workshop instance.

The time of this second instance is also the same as the first, which is exceptionally unfriendly to participants in Asia and Oceania, such as the author of the above pre-read.

If there is an intention to continue this mini-workshop series, I request rotating future event instances across times that are more friendly and accommodating across timezones in order to be more inclusive of global participants.

Lastly, here is another suggested pre-read on a fallacy I noted in the prior mini workshop^1:

Thank you for your attention to both of these suggested pre-reads.

Stay skeptical, my friends.

Tantek Çelik, Mozilla Advisory Committee Representative, Member of W3C Credible Web Community Group (https://credweb.org/)

^1 https://github.com/w3c/authentic-web-workshop/blob/main/minutes/2025-03-12AuthWeb.md

(Originally published at: https://tantek.com/2025/126/t1/)

@TzviyaSiegman (Collaborator, Author)

Questions we didn't get to in today's session:

  1. How have the soft-binding algorithms been security tested? Are any of them cryptographically secure?
  2. How does the certificate lifecycle impact credentials' validity?
  3. What, if anything, prevents the IPTC signing process from being used by every user of email, web, or social media to sign content they create?
  4. The browser certificate ecosystem discovered vulnerabilities with EV certificates: it's possible to register companies with misleading names, which then justifies issuing misleading EV certificates. (This was partially discussed, but many questions about EV certs remain.)

It would also be great to hear your response to @martinthomson's article above.

Thanks @lrosenthol and Brendan Quinn

@jyasskin (Member) commented May 7, 2025

@lrosenthol mentioned that it would be great to get browsers to display information from C2PA. Let's think about what that might look like:

It might be possible to let users go through the context menu to open a "metadata" view, which could show C2PA, EXIF, and any other interesting metadata. However, not many users are going to go to that trouble, so it probably doesn't really handle the use case of helping users know when content is authentic.

It would be possible to badge images with a little logo, but it would be important that users tend to draw the right conclusions from seeing that badge. If it's mostly GenAI companies putting C2PA into their images, and users really want to see an indication of content that's human-created, we might accidentally decrease the average user's understanding.

One could also imagine showing the signing entity's name when the user clicks the logo. That's where the EV experience is worrying: it's likely possible to register a company named "BBC" in London, Texas, and it would be important that the UI for content signed by this company not be confusable with the real BBC's content.

The idea came up of providing a browser API to parse C2PA data. That's probably not useful: the server can provide a JS library to do the same thing, and it can write whatever it wants to the page's UI. However, I could imagine extending the Integrity-Policy and https://github.com/WICG/signature-based-sri proposals to look at embedded signatures. Then images on, say, the BBC's site, which weren't signed by the BBC's key, could fail to load instead of showing unauthorized content. @chrisn, would that be attractive?
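To make the Integrity-Policy idea above concrete, here is a rough sketch of how such an enforcement check might behave. Everything here is hypothetical: the per-origin policy format, the function names, and the use of HMAC with toy symmetric keys (standing in for the real asymmetric signatures that a signature-based-SRI mechanism would use) are all invented for illustration.

```python
import hashlib
import hmac

# Hypothetical per-origin policy: images served on this origin must carry a
# signature that verifies against one of these keys. Toy symmetric keys
# stand in for the site's real public keys.
ORIGIN_POLICY = {"https://news.example": [b"bbc-key-1", b"bbc-key-2"]}

def sign_image(image: bytes, key: bytes) -> bytes:
    """What the publisher would embed alongside the image."""
    return hmac.new(key, image, hashlib.sha256).digest()

def may_load(origin: str, image: bytes, embedded_sig: bytes) -> bool:
    """Browser-side check: fail the load if no allowed key verifies the signature."""
    for key in ORIGIN_POLICY.get(origin, []):
        if hmac.compare_digest(sign_image(image, key), embedded_sig):
            return True
    # Origins that declare no policy load freely, as today.
    return not ORIGIN_POLICY.get(origin)

img = b"\xff\xd8...pretend jpeg bytes"
ok_sig = sign_image(img, b"bbc-key-1")
print(may_load("https://news.example", img, ok_sig))         # True
print(may_load("https://news.example", img, b"\x00" * 32))  # False: unauthorized image fails to load
```

The interesting design property is the failure mode: an unauthorized image simply fails to load on the declaring origin, rather than loading with a warning badge the user must interpret.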

@martinthomson (Member)

> @lrosenthol mentioned that it would be great to get browsers to display information from C2PA

I've heard this a few times and - because I think that this is not helpful - it's partly what motivated me to write that article.

It's technically possible to display badges if you have the necessary trust infrastructure (which invokes all of the problems that @jyasskin describes regarding who gets to assert what, and a whole lot more). But as he observes, you need to consider what that badge might represent. If it rounds to "this is AI generated", that's not going to improve the situation much.

There are also some very fundamental problems with the C2PA architecture that need to be addressed before even that sort of thing is feasible.

The whole idea of a traceable chain of provenance is, to be frank, implausible. Perceptual hashing is not strong enough to rely on for strong assertions about identity of sources. And cryptographic hashing doesn't survive even the most superficial of edits. You are more likely to be able to do - as I think Jeffrey is implying - something about authenticating the "source" of content (the BBC, not the specific camera used by the journalist).
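The fragility of cryptographic hashing under edits is easy to demonstrate. A minimal Python sketch (not C2PA-specific): flipping a single bit of the content produces a completely different SHA-256 digest, so a hard binding cannot survive even trivial re-encoding.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()

original = b"pretend image bytes..." * 1000  # stand-in for a media file
edited = bytearray(original)
edited[0] ^= 0x01                            # a one-bit "superficial edit"

print(digest(original) == digest(bytes(edited)))  # False: the hash no longer matches
```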

If you were to limit this to "NYT asserts that this image is authentic", that's a useful signal. But it is equally-well provided by HTTPS, with a little extra metadata. You go to their website and download the image, along with some assertion from them that this is "genuine" according to whatever criteria you might mutually agree upon. No signing necessary.

Signing might seem to help if the image is being distributed via other means, but that too is more readily addressed by HTTPS: include a link to the original point of distribution. Anyone who cares can talk to the WaPo or LA Times site to get the original, along with any original assertions about its provenance, authenticity, etc...

And then there is the question whether this information is really going to address the problems we care about. Ultimately, the best that any system like this can do is provide people with information. They still have to know how to use it. And want to.

@lrosenthol

Adding my presentation here for easy access...

Content Credentials @ W3C.pdf

@lrosenthol

> @lrosenthol mentioned that it would be great to get browsers to display information from C2PA. Let's think about what that might look like:

I think Brendan did, but I would certainly support the idea!

> It might be possible to let users go through the context menu to open a "metadata" view, which could show C2PA, EXIF, and any other interesting metadata. However, not many users are going to go to that trouble, so it probably doesn't really handle the use case of helping users know when content is authentic.

Agreed, and I would also note that is one of the various reasons (including the later comment about APIs) why the old W3C proposal for metadata APIs (https://www.w3.org/TR/mediaont-api-1.0/) and related other attempts (whose links I can't immediately find) never saw adoption.

> It would be possible to badge images with a little logo, but it would be important that users tend to draw the right conclusions from seeing that badge. If it's mostly GenAI companies putting C2PA into their images, and users really want to see an indication of content that's human-created, we might accidentally decrease the average user's understanding.

Yes, and this is why some of the companies that have implemented support for Content Credentials in their UX don't use a simple "badge", but instead show larger statements like "Made with AI" or "Made by Human" - since, when scrolling through a social feed, that is what you might want to know up front. And then, per the earlier note about selective disclosure, they display info about the signer and others. But that is also why you need validation status to accompany that, including chaining up to a trust anchor on the trust list - especially one that involves verified identity, such as is used in countries other than the US.

@lrosenthol

> There are also some very fundamental problems with the C2PA architecture that need to be addressed before even that sort of thing is feasible.

Such as??

> The whole idea of a traceable chain of provenance is, to be frank, implausible.

Given that it's been out there "in the wild" for 3+ years now - I would say that it is quite plausible.

However, l think that it depends on what you think you are getting.

If you are expecting to get a COMPLETE chain of provenance since creation through to current viewing - I agree 100% that is unrealistic. But C2PA doesn't promise that. Content Credentials provide a mechanism for storing tamper-evident provenance via cryptographic means tied to a trust model - no more, no less. What is contained in that provenance might be only the latest stop in the life of the asset - such as a publisher just putting their "seal" on the asset....or it might indeed be complete from camera (or AI) through editing to publishing.
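The "tamper-evident provenance via cryptographic means" idea can be sketched in miniature. This is a toy illustration only, not the real C2PA manifest format: HMAC with an invented key stands in for X.509 signing, and a plain dict stands in for the binary manifest. The point is just that the claims are bound to the asset's hash, so any edit to either the asset or the claims invalidates the signature.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"stand-in for the claim generator's private key"  # hypothetical

def sign_manifest(asset: bytes, claims: dict) -> dict:
    """Bind provenance claims to the asset's hash and sign the result."""
    manifest = {"asset_sha256": hashlib.sha256(asset).hexdigest(), "claims": claims}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(asset: bytes, manifest: dict) -> bool:
    """Any edit to the asset or the claims invalidates the signature."""
    unsigned = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and unsigned["asset_sha256"] == hashlib.sha256(asset).hexdigest())

photo = b"\x89PNG...pretend image data"
m = sign_manifest(photo, {"publisher": "Example News", "action": "published"})
print(verify_manifest(photo, m))         # True
print(verify_manifest(photo + b"!", m))  # False: edit detected
```

Note that, as in the paragraph above, this says nothing about whether the claims are *true* - only that they have not been altered since signing.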

> You are more likely to be able to do - as I think Jeffrey is implying - something about authenticating the "source" of content (the BBC, not the specific camera used by the journalist).

The C2PA does not try to authenticate anything. Validation of a Content Credential simply serves to provide a consumer/user with the state(s) of that Content Credential, as described at https://c2pa.org/specifications/specifications/2.2/specs/C2PA_Specification.html#_validation_states. Given the normal scenario, that means that the Content Credential is well formed, valid, and its contents were created by a trusted claim generator. No more - no less.

> If you were to limit this to "NYT asserts that this image is authentic", that's a useful signal.

Which is a version of what I said above, though we aren't vouching for either the NYT (in this example) or the authenticity. We are vouching for the fact that the software/hardware used to create the Content Credential has met the conformance requirements of the C2PA (and at a particular security assurance level), such that it was issued a certificate by a trusted CA, for the purpose of signing Content Credentials.

> But it is equally-well provided by HTTPS, with a little extra metadata. You go to their website and download the image, along with some assertion from them that this is "genuine" according to whatever criteria you might mutually agree upon. No signing necessary.

There are (at least) two problems with this - the same two that you and I have discussed in the context of "AI Controls" at the IETF.

1 - Most content authors don't own the sites where their content is distributed
2 - Many pieces of content aren't distributed & consumed over HTTPS

And then add to that, in this case, that provenance is NOT the same as "genuine", and even that term ("genuine") can mean many different things. For example, the Piltdown Man (https://en.wikipedia.org/wiki/Piltdown_Man) is genuine, yet still a hoax/fake.

> And then there is the question whether this information is really going to address the problems we care about.

Which is??

> Ultimately, the best that any system like this can do is provide people with information.
>
> They still have to know how to use it. And want to.

Absolutely on all three counts! And that is why education is needed and has been ongoing for quite some time.

And while some don't like the lock icon in the browser - the analogy is real. When it first appeared, most users had no clue what it was nor did they care. Over time - through all sorts of means (both technical and social) we got to the point we are at today where it is so ubiquitous that many would like to remove it from the UX as it no longer serves the same purpose.

@martinthomson (Member)

> Such as??

I did follow that statement with a lot more text. Here's a little more.

This probably isn't the best medium for this, but I don't consider the trust architecture of C2PA to be viable as defined. There are many types of claims that can be made (or worse, implied), all of which are lumped together. The WebPKI concentrates on a single claim ("controls an identifier") and even there it often fails. You are seeking to provide a large number of claims, none of which seem to be verifiable. In particular, provenance claims like "ingredients" are virtually impossible to validate; you need to trust the software that combined the ingredients, and that leads to DRM. At best, someone can make claims and the only recourse an interlocutor might have would be to decide which of those claims is or isn't true.

The use of X.509 allows for delegation of trust, but none of that usage is well-specified. There are lots of unanswered questions. For instance, how might that delegation be scoped to certain claims? What sort of process would be required to make a delegation that meant something? For the one simple claim in the WebPKI, you have hundreds of pages of procedures. C2PA has virtually nothing to say on the subject, from what I can see.

I see no evidence of any PKI that would make this usable. You claim that is not necessary, but then how is someone to decide even the basics, like whether the identity of a claimant is accurate? It says "BBC". Is it really?

The use of soft bindings is fundamentally incompatible with the hard decisions that digital signatures enable. Perceptual hashing systems are famously vulnerable to attack. (You also have many options, which potentially only reduces the strength of the system to that of the weakest, while reducing interoperability.) I don't know definitively the answer to @TzviyaSiegman's question, but until a number of cryptographers tell me otherwise, I'll continue to view soft binding as fundamentally incompatible with use for security purposes.

> [re HTTPS] There are (at least) two problems with this [...]

You skipped the part where I addressed those concerns.

> some don't like the lock icon in the browser

Count me in that group. It's a bad model to chase. We're close to finally eliminating it for the Web PKI.

But it is also the worst possible analogy. You are either verifiably connected to the entity that holds a given domain name, or you are not. That is a binary question, and one that computers can make good decisions about.

Whether a given piece of information is "true" is often a matter of judgment, nuanced, or contested. Your Piltdown Man example only highlights that.

The idea that a technical mechanism like C2PA might contribute positively to the quest for truth is really a point that needs far stronger evidence than has been presented.

@jyasskin (Member) commented May 9, 2025

I think there is a kernel of use in C2PA, to associate a piece of content, independent of where it's hosted, with a group of entities who claim it's true, so that we can update our trust in those entities when we get new information about whether the content actually was true.

But I agree with @martinthomson that all the rest of the features are likely to be distractions. That doesn't make it useless: X.509 is chock full of such distractions, and it still works for the Web PKI.

If @lrosenthol wants the Web to take advantage of the other features, I think it's up to him to answer:

> And then there is the question whether this information is really going to address the problems we care about.
>
> Which is??

@lrosenthol commented May 11, 2025

@martinthomson

> This probably isn't the best medium for this, but I don't consider the trust architecture of C2PA to be viable as defined. There are many types of claims that can be made (or worse, implied), all of which are lumped together. The WebPKI concentrates on a single claim ("controls an identifier") and even there it often fails. You are seeking to provide a large number of claims, none of which seem to be verifiable.

We are not trying to verify (or enable verification of) those claims (or what we call assertions). It is clearly stated right at the start of our Trust Model documentation (https://c2pa.org/specifications/specifications/2.2/specs/C2PA_Specification.html#_overview_5).

> ...is the consumer (who is not specified in the trust model), who uses the identity of the signer, along with other trust signals, to decide whether the assertions made about an asset are true.

> In particular, provenance claims like "ingredients" are virtually impossible to validate; you need to trust the software that combined the ingredients,

Correct. And if that software has undergone conformance testing, which granted it a certificate that chains up to a trust anchor on the C2PA Trust List - that is a huge signal. Of course, you may choose not to trust it even then because it is closed source or you don't like the author or whatever...which is why trust is all on humans.

> and that leads to DRM.

You completely lost me on this one. What does DRM have to do with any of this?? Nothing in C2PA prevents access to content, metadata or any other information.

> At best, someone can make claims and the only recourse an interlocutor might have would be to decide which of those claims is or isn't true.

Based on whether they trust (or not) the claimant - 100%. Same thing folks have been doing for the 30+ year history of X.509, where it has served that purpose on the web, in PDF signatures and many other places. And then you add to that newer models such as the Verifiable Credentials, ZKPs, etc. - all of which are compatible with Content Credentials.

> The use of X.509 allows for delegation of trust, but none of that usage is well-specified.

Some usages of X.509 allow for that - but none of the standard usages, such as those noted above (aka SSL/TLS and PDF signatures).

> For the one simple claim in WebPKI, you have hundreds of pages of procedures. C2PA has virtually nothing to say on the subject, from what I can see.

We don't need to re-invent (or restate) what is already well specified in the world today. Standards such as those from ETSI/ESI, ISO, IETF and elsewhere provide the documentation for how to use X.509 certs - both to sign and to verify. Why would we need to say anything new?? That is a huge part of what we have done - use existing standards in ways they weren't being used before. For example, you can think of the C2PA Specification as providing for "signing arbitrary media" (vs. signing a PDF).

> how is someone to decide even the basics, like whether the identity of a claimant is accurate? It says "BBC". Is it really?

Same way they do it today on the web, in PDF, and every other use of X.509 - trust lists!

However, the C2PA Ecosystem has recognized that in the case of human and organizational identity that isn't always enough - which is why the CAWG (https://cawg.io/) has defined their CAWG Identity Assertion (https://cawg.io/identity/1.1/) extension to Content Credentials. This provides for humans and organizations to not only leverage other forms of identity (e.g. VCs, OAuth with social media, etc.) but also lets them sign (aka take responsibility for) a specific subset of the assertions in a given Claim (to your point earlier).

> The use of soft bindings is fundamentally incompatible with the hard decisions that digital signatures enable.

I don't believe anyone - including myself - has ever said otherwise.

Soft bindings, however, serve a significantly useful purpose in a world where existing systems do not maintain Content Credentials during ingestion (usually not for malicious reasons) and some types of media can be easily "re-encoded" either digitally (e.g. photos of photos) or through an analog operation (e.g. print & scan).
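The trade-off described here can be illustrated with a toy "average hash" over raw pixel values - a drastically simplified stand-in for real perceptual-hash algorithms, with invented data. Mild re-encoding noise leaves the soft binding intact while breaking the cryptographic hash, which is exactly why soft bindings are useful for rediscovery yet unsuitable for hard security decisions:

```python
import hashlib

def average_hash(pixels: list[int]) -> int:
    """Toy perceptual hash: one bit per pixel, set if above the mean."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

original = [10, 200, 30, 220, 15, 240, 25, 210]  # stand-in grayscale pixels
reencoded = [p + 2 for p in original]            # mild compression noise

# The soft binding survives the re-encode...
print(average_hash(original) == average_hash(reencoded))  # True
# ...but the cryptographic hash does not.
print(hashlib.sha256(bytes(original)).hexdigest() ==
      hashlib.sha256(bytes(reencoded)).hexdigest())       # False
```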

> some don't like the lock icon in the browser
>
> Count me in that group. It's a bad model to chase. We're close to finally eliminating it for the Web PKI.

Agreed - but it took how many years to get to this point (~30, if my math is correct). I would argue that we are at the beginning of a similar journey now for digital content provenance and authenticity - and in the same way, it will take time to get to the world of "no lock icon" for content.

> Whether a given piece of information is "true" is often a matter of judgment, nuanced, or contested. Your Piltdown Man example only highlights that.

100% - on that we agree ;).

And the only way for a user to make that judgment is to have "trust signals" (or whatever you want to call them) that they can evaluate. The C2PA specification for Content Credentials provides the technical means for the actors involved in the creation/editing/publishing of that content to provide those signals - we call that provenance. Just as the world has relied on provenance for physical objects (for centuries!) to determine their authenticity (and value in many cases), bringing that same concept to digital content/objects is a logical progression.

@lrosenthol

> If @lrosenthol wants the Web to take advantage of the other features, I think it's up to him to answer:

I believe that the top level answer to that is simple (and related to the discussions above):

Users need access to the "trust signals" related to the content they are consuming in the place they are consuming it.

As I noted in a previous response, the C2PA specification for Content Credentials provides the technical means for the actors involved in the creation/editing/publishing of that content to provide those signals - we call that provenance. So, for that side, we believe that we have that "half" covered.

What is needed now is a standardized way for users to "access" that information when using a web browser to consume their content. I expect that dedicated apps that present information to users can/will present the information as they see fit - as they do today. But users consuming in a browser (UA) expect some level of consistency and that is what we are looking for here. HOW that works - I wouldn't presume to tell the UA vendors...but I hope to work with them to figure out that answer.

[NOTE: I am purposefully ignoring issues such as "browser vs. web view" as currently being debated in the W3C's "public-webview" CG.]

@jyasskin (Member)

There are lots of sub-discussions here, which ideally should be split out into their own issues and then documents. I see:

  • What are we trying to achieve with an "authentic web"? Is it to give users access to trust signals in case they want to spend the time to verify them; to increase the chance that a person who visits a web page quickly develops the same beliefs as if they'd taken the time to investigate it; to increase the fraction of pages that are "authentic"; or something else?
  • What sorts of software need to be able to produce "authentic[ated]" content? The mention of "software has undergone conformance testing, which granted it a certificate that chains up to a trust anchor on the C2PA Trust List" implies that the software incorporates the private key for that certificate, which requires that end-users can't extract the private key to sign their own forgeries with it. This requires the software to run in the cloud or possibly in a DRM-controlled sandbox on end-user machines. Is that acceptable, or does open-source software need to be able to participate in this part of the ecosystem?
  • Which pieces of C2PA are relevant to which kinds of users? E.g. maybe signatures by the camera and along the chain of provenance are more useful to content producers, while creator assertions are more useful to the average end-user. If we develop an ecosystem where fact-checkers can rate creators for accuracy, maybe those fact-checkers want to see a creator's justifications for endorsing a piece of content, so if a creator is tricked into signing something, the fact-checker can demote the people who tricked them instead of just the final creator.
  • How can we express a signer's identity to an end-user, in a way that's not vulnerable to the problems that doomed EV certificates?

@lrosenthol

@jyasskin Good idea! As you can see I broke out the four issues you suggested...and have made comments there...

@chrisn (Member) commented May 16, 2025

> The idea came up of providing a browser API to parse C2PA data

I created #20 to continue discussion of this idea.


6 participants