Downloading posts as HTML

2 months ago by da_cow (she/her) to c/lemmy_support

So I recently started building a tool to download Lemmy posts as raw HTML. I am using bash for this.

I've got all of my other code working so far, but the problem is that when I fetch a post using just the post ID, the entire formatting is quite fucked. An example: [screenshots not included]

Does anyone know how I can fetch the post with curl after everything has been loaded, so that in the end I can see the post as I would normally see it through the web interface? If you know any other CLI program that could do this, feel free to recommend it to me.

SteveTech 2 points 2 months ago

It's missing the CSS, and I don't think this is something you can do solely with curl, at least without a bunch of parsing. Wget does have a --page-requisites option that should work though.
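A rough sketch of what that could look like, in case it helps. The instance URL, post ID, output directory, and the function names `post_url` and `fetch_post` are all placeholders made up for illustration, and the `/post/<id>` path is assumed from how the web interface links to posts. Whether the result looks right will depend on how the instance serves its stylesheets and images.

```shell
#!/usr/bin/env bash

# Build the web URL of a post (assumes the usual <instance>/post/<id> layout).
# $1 = instance base URL, $2 = post id
post_url() {
  printf '%s/post/%s\n' "$1" "$2"
}

# Download the post page plus the files it needs to render offline.
# $1 = instance base URL, $2 = post id, $3 = output directory
fetch_post() {
  # --page-requisites : also fetch the CSS, images and scripts the page references
  # --convert-links   : rewrite links so the saved copy points at the local files
  # --adjust-extension: save the page with a .html suffix
  # --span-hosts      : allow requisites that live on a different host
  wget --page-requisites \
       --convert-links \
       --adjust-extension \
       --span-hosts \
       --directory-prefix="$3" \
       "$(post_url "$1" "$2")"
}

# Example call (placeholder values, not run automatically):
# fetch_post "https://lemmy.example" 123456 ./saved_post
```

Opening the saved `.html` file from the output directory in a browser should then pick up the local CSS instead of showing the bare markup.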

Also, your first screenshot has included the HTTP headers, which are not part of the HTML. You probably need to remove a -v from the curl command, although it doesn't really fix the main issue.
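For the headers part, something along these lines should leave only the HTML body in the file. This is just a sketch: `fetch_html` is a made-up helper name and the URL in the example call is a placeholder. Note that `-v` prints the headers to stderr, so if they ended up inside the saved file, `-i`/`--include` or a `2>&1` redirect might be the actual culprit.

```shell
#!/usr/bin/env bash

# Save only the response body of a page to a file.
# $1 = full URL of the post, $2 = output file
fetch_html() {
  # --silent     : no progress meter
  # --show-error : still print an error message if the request fails
  # --fail       : exit non-zero on HTTP errors instead of saving the error page
  # --location   : follow redirects
  # --output     : write the body to the given file
  # No -v and no -i, so no HTTP headers are printed or written to the file.
  curl --silent --show-error --fail --location --output "$2" "$1"
}

# Example call (placeholder URL, not run automatically):
# fetch_html "https://lemmy.example/post/123456" post.html
```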

Edit: You also might get some more replies if you posted to one of the programming communities.
