Publicly-Verifiable Certificates of Statistical Validity
Research · · 2 min readKeywords: PAC Verification, SQ Model, Adaptive Data Analysis, Fiat-Shamir Transformation, Differential Privacy, Delegation of Learning
Say, there is a researcher who wants to use a cloud computing service to run a machine learning algorithm. In the present world, this is a centralized model as in the researcher must trust the service to run their algorithm according to specification. But if the researcher cannot trust the service, what can the researcher do in order to do machine learning, even with an untrustworthy computing service?
Prior Work: A line of work on PAC verification [GRSY ‘21] claims a solution by drawing on a long history of interactive proofs. That is, if we allow the researcher and service to communicate, there may be a way for the researcher to verify that the service did in fact do what it was supposed to do. The main question is: how efficient can the verifier be? For example, can the verifier skip over computation, or use less samples? A follow-up paper [MS ‘23] applies this to learning over thhe Statistical Query (SQ) model.
Result: In our work, we generalize on the PAC verification framework and show how to verify the delegation of all SQ algorithms for any binary task where, in some cases, the verifier only needs exponentially fewers data samples than the learning algorithms needs in the first place. Thus, we provide a way for a researcher to check the cloud computing service’s work with exponentially less data.
In addition, we achieve a non-interactive proof system, thus propose a new paradigm of publicly-verifiable delegation of learning where the untrusted party precomputes a learning algorithm and a certificate of statistical validity, then a verifier can check if the learning is good for a different data distribution. We call these Publicly-Verifiable Certificates of Statistical Validity (pvCSVs). This naturally lends to a notion of distributional robustness where verification passes if the marginals of the prover and verifier’s distribution w.r.t. a fixed SQ algorithm are sufficiently close.
We obtain pvCSVs, even for randomized SQ algorithms an randomized SQ protocols, by leveraging a Fiat-Shamir-style transformation, and characterizing the number of rounds fresh randomness is drawn by a new complexity measure we dub epoch complexity. If epoch complexity is constant, then any negligible soundness is preserved.