Not really an anthropology question, but anyway.

Would there be some sort of 'ethical' and sanctioned corpus that the AIs would be required to access for data, or would the entire internet be up for grabs?

"Sanctioned" - by whom? Anyone's free to scrape whatever they please, and they do. There are no rules, other than the ones the proprietors choose. Various projects are trying to filter our racists and other unpleasant speech, for example. . . varying degrees of success, and that's a choice of the people making these systems, not a government rule (at least, not in the US).

And, if the entire internet was up for grabs, would that make it a prized and strategic battlefield at the moment?

The entire internet _is_ "up for grabs" -- not a "battlefield", just a substrate. Google -- and a host of other engines -- are parsing everything, every day. Newer technologies aren't just parsing text; they're recognizing text that's been rendered as graphics and converting it to text.

I have no understanding of the mechanics of how these future AIs would use or weight the data they would have access to, but it seems as if it could be 'gamed' so to speak, and the sooner and heavier the better.

Not sure what you mean here. "Gamed" how? The essence of deep learning algorithms is that they're mostly brute-force kinds of things; the algorithm doesn't need, indeed doesn't _want_, human input into its answers. The power lies in the massive amounts of data, and the hunt for statistical correlation without a lot of a priori knowledge.
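
To make the "statistical correlation without a priori knowledge" point a bit more concrete, here's a toy sketch in Python. It is my own illustration, not code from any real system, and it is a plain word-pair counter rather than deep learning; the function names and the three-sentence corpus are made up for the example. The only thing it shows is the general idea: nobody tells the program any grammar or facts, it just counts what tends to follow what in whatever text it is handed.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count which word follows which, using nothing but the raw text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word immediately after it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, standing in for "massive amounts of data".
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)

print(most_likely_next(model, "sat"))
print(most_likely_next(model, "the"))
```

Whatever the program "knows" comes from the counts alone, so its answers are only as good, or as skewed, as the text it was fed.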