• 2 Posts
  • 25 Comments
Joined 2 years ago
Cake day: August 16th, 2023

  • (Sorry for the late response.) Well, it depends a lot on the site. Since I focus on books and scholarly articles, the ideal way is to find the URL of the original PDF. The website might show you only individual pages as images, but the link to the PDF may still be hidden somewhere in the page's code. Alternatively, you can collect the URLs of all the individual page images, put them into a download manager, and later bundle the downloaded images into a new PDF. (When you open the “inspect element” window, you just have to figure out which part of the code is meant to display the pages/images to you.) Sometimes the PDFs and page images can be found in your browser cache, as I mention in the OP. There’s quite a bit of variety among the different sites, but with even the most rudimentary knowledge of web design you should be able to figure out most of them.

    If you need help with ripping something in particular, DM me and I’ll give it a try.
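
    For the "look through the code" step, here is a minimal Python sketch of the idea, not a tool that works on any particular site. It assumes you have already saved the page's HTML (for example via "view source") and that the viewer exposes the PDF or page images through ordinary `href`, `src`, or `data-*` attributes. The names `LinkCollector` and `collect_links`, the `example.org` URL, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects candidate PDF links and page-image URLs from saved HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only look at attributes that usually carry URLs.
            if not value or not (name in ("href", "src") or name.startswith("data-")):
                continue
            url = urljoin(self.base_url, value)  # resolve relative links
            path = urlparse(url).path.lower()
            if path.endswith(".pdf"):
                if url not in self.pdf_urls:
                    self.pdf_urls.append(url)
            elif tag == "img" and name in ("src", "data-src"):
                if url not in self.image_urls:
                    self.image_urls.append(url)


def collect_links(html, base_url):
    """Return (pdf_urls, image_urls) found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_urls, collector.image_urls


# Hypothetical viewer markup: a hidden PDF link plus two page images.
sample = (
    '<div class="viewer">'
    '<a data-download="/files/book.pdf">Download</a>'
    '<img src="pages/001.png"><img data-src="pages/002.png">'
    "</div>"
)
pdfs, images = collect_links(sample, "https://example.org/reader/")
print(pdfs)
print(images)
```

    The printed image URLs can be pasted into a download manager as described above. Sites that build their URLs in JavaScript won't show up this way, so checking the network tab or the browser cache is still the fallback.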









  • Something of the sort has already been claimed for language/linguistics, i.e. that LLMs can be used to understand human language production. One linguist wrote a pretty good reply to such claims, which can be summed up as “this is like inventing an airplane and using it to figure out how birds fly”. I mean, who knows, maybe that could even work, but it should be admitted that the approach appears extremely roundabout and might very well be utterly fruitless.