RedGlow.2715 Posted June 16, 2019
Hello everyone,
I'm in the process of writing an /items scraper, and I'm wondering if there's a more efficient way to keep the data up to date than running a full crawl every X hours. Can HTTP headers such as Last-Modified or If-Modified-Since be used to know whether there are new items to download, or whether some items have been changed?
Maybe the question has already been answered, but if so, I can't find it.
Thanks everyone!
Steven.6309 Posted June 17, 2019
I don't know about any headers or how to use them. What I do is get the entire index of items (/v2/items), discard the ids that I already have in my storage and then fetch the ids that remain. If the build number changes (/v2/build), I wipe my storage.
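The approach described above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: `sync_items`, `storage`, `state` and the three `fetch_*` callables are hypothetical names, and the callables are assumed to wrap requests to /v2/build, /v2/items and a bulk details request respectively. The default batch size of 200 is likewise an assumption about how many ids one bulk request accepts.

```python
from typing import Callable, Dict, List


def sync_items(
    storage: Dict[int, dict],
    state: Dict[str, int],
    fetch_build: Callable[[], int],
    fetch_index: Callable[[], List[int]],
    fetch_details: Callable[[List[int]], List[dict]],
    batch_size: int = 200,  # assumed bulk-request limit
) -> List[int]:
    """Fetch only the items that are not in storage yet.

    Returns the list of ids that were requested during this run.
    """
    # A new build may have changed any item, so start from scratch.
    build = fetch_build()
    if state.get("build") != build:
        storage.clear()
        state["build"] = build

    # Keep only the ids we have not stored yet.
    missing = [item_id for item_id in fetch_index() if item_id not in storage]

    # Request the missing items in batches.
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for item in fetch_details(batch):
            storage[item["id"]] = item

    return missing
```

Passing the fetchers in as callables keeps the sketch independent of any particular HTTP library and makes it easy to try out with canned data.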
Leo.3428 Posted June 17, 2019
I was about to ask the same question and was hoping for an endpoint for diff listing. Thank you @StevenL.3761, refreshing the whole data on a new build is a good compromise.
Zok.4956 Posted July 21, 2019
@"StevenL.3761" said:
I don't know about any headers or how to use them. What I do is get the entire index of items (/v2/items), discard the ids that I already have in my storage and then fetch the ids that remain. If the build number changes (/v2/build), I wipe my storage.

I use the same strategy for https://www.gw2gh.com/ with one exception: if the build number changes, I do not delete my storage but mark all items in my storage as "dirty".
"Dirty" items are replaced with current ones as soon as possible. But if the API fails and (some) items are no longer available over the API (which has happened a few times in the past), I can still use the "dirty" items from my storage.
Of course, this means that old items that were removed from the game will not be automatically deleted from my storage. This is acceptable for me.
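A minimal sketch of this mark-instead-of-delete variant, assuming each stored record wraps the item together with a flag. The names `mark_all_stale` and `refresh_stale`, the record layout and the batch size of 200 are all hypothetical; `fetch_details` is assumed to return whatever subset of the requested items the API can currently deliver.

```python
from typing import Callable, Dict, List


def mark_all_stale(storage: Dict[int, dict]) -> None:
    """Flag every stored record after a build change instead of deleting it."""
    for record in storage.values():
        record["stale"] = True


def refresh_stale(
    storage: Dict[int, dict],
    fetch_details: Callable[[List[int]], List[dict]],
    batch_size: int = 200,  # assumed bulk-request limit
) -> List[int]:
    """Replace flagged records with fresh data where the API delivers it.

    Returns the ids that are still flagged afterwards. Their old data
    stays in storage and remains usable as a fallback.
    """
    stale_ids = [item_id for item_id, rec in storage.items() if rec["stale"]]

    for start in range(0, len(stale_ids), batch_size):
        batch = stale_ids[start:start + batch_size]
        for item in fetch_details(batch):
            storage[item["id"]] = {"item": item, "stale": False}

    # Anything the API did not return keeps its flag and its old data.
    return sorted(i for i in stale_ids if storage[i]["stale"])
```

Ids that keep coming back from `refresh_stale` are candidates for items removed from the game; whether to delete them is a separate decision, as noted in the post.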
Steven.6309 Posted July 21, 2019
That's undeniably better than deleting data. Minor suggestion: "stale" is a more appropriate term than "dirty" (each item doesn't necessarily change between builds).
Leo.3428 Posted July 21, 2019
One strategy I have been considering is keeping all item records with a timestamp for creation and last check, and when an item actually changes, creating a new record, in order to keep track of the evolution of the items. I am a bit worried this might be against the ToS though, what do you think?
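The record-keeping idea above might look something like this. It is only a sketch under assumptions: `record_check`, the `history` mapping and the field names `created` and `last_checked` are hypothetical, and "changed" is taken to mean that the freshly fetched item differs from the most recently stored version.

```python
import copy
import time
from typing import Dict, List, Optional


def record_check(
    history: Dict[int, List[dict]],
    item: dict,
    now: Optional[float] = None,
) -> bool:
    """Record one check of an item and report whether a new version was stored."""
    now = time.time() if now is None else now
    versions = history.setdefault(item["id"], [])

    # Unchanged since the latest version: only update the last-check time.
    if versions and versions[-1]["data"] == item:
        versions[-1]["last_checked"] = now
        return False

    # First sighting or a real change: append a new version record.
    versions.append({
        "data": copy.deepcopy(item),  # guard against later mutation by the caller
        "created": now,
        "last_checked": now,
    })
    return True
```

Each id then maps to a list of versions in chronological order, so the evolution of an item can be read directly from that list.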
Archived
This topic is now archived and is closed to further replies.