
WF API - duplicate and missing objects in paginated lookups

I have noticed a problem when I look up large numbers of objects over the Workfront API. For example, when I look up 14,038 task entries (2,000 at a time) and then compare their IDs, I find 12 IDs that duplicate entries already retrieved. This also means that 12 entries were not retrieved, because the duplicates have "pushed" them out of the result. Things get weird when I run the same search again a relatively short time afterwards: the number of duplicates decreases. I still get exactly the same number of entries, but with fewer duplicates and more unique entries. Sometimes the second lookup returns entirely unique values; sometimes it takes a third full lookup. The lookup is done in a very simple C# console application that doesn't save anything between executions, so I don't think there is any caching going on at my end. For now, I can hack around the issue by simply running the full search again and again until I get no duplicates, but that's obviously not a pretty solution. The problem happens for every object type I have tried (where I have more than 2,000 entries to look up), and the duplicates tend to pop up in the last chunks of 2,000 objects that I retrieve. I have tried retrieving all fields or only the ID, and the situation is the same. Has anyone seen this problem before, and does anyone know of a way to mitigate it? Thanks in advance. Martin Lausen
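To put numbers on the symptom described above, you can count retrieved, unique, duplicated and missing IDs across all pages. A minimal Python sketch, assuming each page is a list of dicts with an `"ID"` key and that the expected total is known from somewhere else (for example a separate count request); `summarize_pages` is a hypothetical helper name:

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def summarize_pages(pages: List[List[Record]], expected_total: int) -> Dict[str, Any]:
    """Count retrieved, unique, duplicated and missing IDs across all pages.

    Assumes every record has an "ID" key and that `expected_total` is the
    number of objects the search should have matched.
    """
    counts = Counter(record["ID"] for page in pages for record in page)
    unique = len(counts)
    return {
        "retrieved": sum(counts.values()),
        "unique": unique,
        "duplicate_ids": sorted(i for i, n in counts.items() if n > 1),
        # objects the search should have matched but that never showed up
        "missing": expected_total - unique,
    }


pages = [[{"ID": "a"}, {"ID": "b"}], [{"ID": "c"}, {"ID": "b"}]]
print(summarize_pages(pages, expected_total=4))
# {'retrieved': 4, 'unique': 3, 'duplicate_ids': ['b'], 'missing': 1}
```

With Martin's figures this would report 14,038 retrieved, 14,026 unique, and 12 missing.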
Topics: API
15 Replies

That's interesting. I'll be getting into this sort of query within a month or two. Will keep an eye on this thread. Regards, David Cornwell

Hi Martin, I've now come across this issue, and the behaviour is exactly as you describe. As you found, the number of duplicates varies between executions. I've found that once I get the set down to around 14,000 records (7 pages), it usually doesn't give me duplicates. However, for sets of 50,000+ records I'm generally getting around 600 duplicates (and, I think, missing the same number of unique records). I don't think I could even try your "not a pretty solution" of re-running the set of queries multiple times, because at this scale I will always get duplicates. However, as a workaround, I'm thinking I may be able to run the full set of queries 2 or 3 times, then append the results and remove duplicates. Which records are duplicated or excluded each time seems pretty random, so if it's random enough I should get fairly close to the full set of records. I'm going to log a support ticket as well, but I don't expect a quick fix. @Vazgen Babayan and @Doug Den Hoed - AtAppStore, if you have any gems of advice, I'd really love to hear them. Thanks. Regards, David Cornwell
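David's proposed workaround (run the full set of queries several times, append, de-duplicate) amounts to a union keyed by ID. A minimal Python sketch, assuming each run is a flat list of dicts with an `"ID"` key; `merge_runs` is a hypothetical name. It only helps to the extent that each missing record shows up in at least one run.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def merge_runs(runs: Iterable[List[Record]]) -> List[Record]:
    """Union several complete retrievals of the same search, keyed by ID.

    The first copy of each ID wins; later copies (including duplicates
    within a single run) are ignored. A record missed by every run is
    still missing from the result.
    """
    merged: Dict[Any, Record] = {}
    for run in runs:
        for record in run:
            merged.setdefault(record["ID"], record)
    return list(merged.values())


run_1 = [{"ID": 1}, {"ID": 2}, {"ID": 2}]  # 3 was pushed out by a duplicate
run_2 = [{"ID": 1}, {"ID": 3}, {"ID": 3}]  # 2 was pushed out this time
print([r["ID"] for r in merge_runs([run_1, run_2])])  # [1, 2, 3]
```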

Hi David, Raising this through support should be the best path to take. One possible cause is that when multiple API queries are executed in quick succession, they land on parallel threads on the server (to optimize the load), and this results in duplicates. As a potential workaround, you can introduce a delay between the calls in your script so that the queries are not run in parallel. Vazgen Babayan, Product Manager, Workfront
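Vazgen's workaround (strictly sequential requests with a pause between them) might look like the Python sketch below. `fetch_page(first, limit)` is a hypothetical callable assumed to perform one search request, for example using Workfront's `$$FIRST`/`$$LIMIT` paging parameters; check those against the API documentation. Note that later in the thread David reports that a 5-second delay did not remove the duplicates.

```python
import time
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# Hypothetical: fetch_page(first, limit) performs ONE search request and
# returns that page of records (e.g. via $$FIRST=first&$$LIMIT=limit).
PageFetcher = Callable[[int, int], List[Record]]


def fetch_all_sequential(fetch_page: PageFetcher,
                         page_size: int = 2000,
                         delay_seconds: float = 1.0) -> List[Record]:
    """Request pages strictly one after another, pausing between requests."""
    results: List[Record] = []
    first = 0
    while True:
        page = fetch_page(first, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # a short or empty page means there is nothing left
        first += page_size
        time.sleep(delay_seconds)  # wait before issuing the next request
    return results


data = [{"ID": i} for i in range(5)]


def fake_fetch(first: int, limit: int) -> List[Record]:
    return data[first:first + limit]


print(len(fetch_all_sequential(fake_fetch, page_size=2, delay_seconds=0)))  # 5
```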

Thanks David, We move a lot of data around, especially in our Snapshot and Merge/Split solutions, but I don't recall having run into this one. We typically use a "chunking" subroutine, appending each chunk (e.g. 2,000 at a time) into a dictionary or collection, so perhaps that approach (or the inherent delay, as Vazgen surmised) has provided protection. Regards, Doug Den Hoed - AtAppStore. Got Skills? Lend a hand! https://community.workfront.com/participate/unanswered-threads
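A chunking loop of the kind Doug describes could look roughly like this Python sketch. The thread doesn't show AtAppStore's code, so `fetch_into_dict` and `fetch_page` are hypothetical. A dictionary keyed by ID silently absorbs duplicates, which is why the sketch also records repeated IDs: it removes the extra copies but cannot recover objects the server never returned.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def fetch_into_dict(fetch_page: Callable[[int, int], List[Record]],
                    page_size: int = 2000) -> Tuple[Dict[Any, Record], List[Any]]:
    """Append each chunk into a dict keyed by ID and log any repeated IDs."""
    by_id: Dict[Any, Record] = {}
    repeated: List[Any] = []
    first = 0
    while True:
        page = fetch_page(first, page_size)
        for record in page:
            if record["ID"] in by_id:
                repeated.append(record["ID"])  # ID already stored from an earlier record
            by_id[record["ID"]] = record
        if len(page) < page_size:
            break
        first += page_size
    return by_id, repeated


# Fake server whose second chunk repeats "b".
chunks = {0: [{"ID": "a"}, {"ID": "b"}], 2: [{"ID": "b"}, {"ID": "c"}], 4: [{"ID": "d"}]}


def fake_fetch(first: int, limit: int) -> List[Record]:
    return chunks.get(first, [])


by_id, repeated = fetch_into_dict(fake_fetch, page_size=2)
print(sorted(by_id), repeated)  # ['a', 'b', 'c', 'd'] ['b']
```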

Hi @Doug Den Hoed - AtAppStore and @Vazgen Babayan, I'm pretty sure I'm already doing what you're both describing. These queries are run sequentially, and then the results are appended. I will try adding a delay per Vazgen's suggestion; perhaps the issue is that the second query is coming in a little too quickly after the previous one. Do you know if there is any other way I can force it not to run on a parallel thread? Some more background, in case it helps: we have been using API-unsupported so that we can access the resource contour data. Yes, I know the whole point is that it is not supported and we shouldn't rely on it, but it is the only way we can get this data, and it has worked OK until now. Late yesterday, while troubleshooting, I set up queries to the V9 production API for comparison. I saw significantly fewer duplicates, but there were definitely still some. Then the API-unsupported queries ran with very few duplicates. This happened a few times, so I thought perhaps there had been an infrastructure issue that was getting better. This morning, the V9 API has given me no duplicates at all, while API-unsupported is giving lots of duplicates again. While troubleshooting API-unsupported, I tried bypassing the Akamai cache, with no improvement. I also tried removing resourceContours from the returned data in case it was causing the issue, but that hasn't helped. So, as far as I can see at the moment, the duplicates occur consistently on API-unsupported at various levels, and rarely (but sometimes) on the V9 API. Vazgen, I currently have support ticket 1161614 open. I know my main issue seems to be with API-unsupported, but it does sometimes happen with V9 too (as for the other customer who started this thread).
Given that the issue seems a lot worse in API-unsupported, hopefully Workfront can look into it, as it may cause wider issues in the future if it makes its way into the production API. In the meantime, I'm going to try adding a delay, and also try merging the results from API V9 and API-unsupported to hopefully get data that's good enough for our reporting. Thanks for any further thoughts or assistance you can organise. David Cornwell

Hi David, Thanks for the extra info. We'll take a closer look at our Resource Contouring solution this week, and I'll let you know if we see any symptoms. Regards, Doug Den Hoed - AtAppStore

Cheers Doug - I hope you're not having issues too, but if you're not, I wonder why not? I'll obviously let you know what I find out with my testing. David Cornwell

Hi, I have tried two more troubleshooting steps: (1) added a delay of 5 seconds between each page of the API queries, per Vazgen's suggestion; (2) tested both the production and unsupported APIs on the Preview sandbox, to rule out user data changes as the cause. In both cases it made no difference, and I'm still getting duplicates and missed records. I'm now going to focus on a workaround of using both the production API and API-unsupported and merging the results; hopefully this reduces the impact of the missing records until we can work out why we're getting them. If not, I will have to stop using resourceContours and go back to using workPerDayList via the V9 API only, which will at least mean we have very few (if any) missing records. This is my least-preferred option, because my users rely on seeing the correct daily values from the contouring. Regards, David Cornwell

Ouch. That's unfortunate, David. Some long-shot ideas: to access the contouring data, we go through the Assignments object and haven't seen any issues that way (but will keep an eye out). We also do not group our query (in case you do). We don't know if the API supports sorting, but if it does, perhaps that would help. By coincidence, I have a call scheduled with Vazgen shortly, and will let you know if we come up with anything else. Regards, Doug Den Hoed - AtAppStore

Hi David, Vazgen and I chatted; no other insights, but (via your helpdesk ticket) perhaps the Networking team can trace the actual calls to determine the source of the duplicates. Regards, Doug Den Hoed - AtAppStore

Hi Doug, No, we don't use grouping (I think that's only for report queries anyway), and I'm not aware of any way to sort the results. In a way I'm glad you're not experiencing these issues, because it suggests it's not a core problem with the API functionality. So hopefully we can work through my ticket to a resolution. Once I know more, I might take it offline and have a chat with you, so I can share how my queries operate and see whether you can replicate the same errors. For the moment, my workaround of using both the production API V9 and API-unsupported and merging the results seems to be working pretty well. We only lose the granularity of the resource contour data on the records that were lost; for those, it is replaced with the averaged workPerDayList values. Will keep you posted. Even though it's not resolved, it's helpful to have people who do the same thing taking an interest! Cheers, David Cornwell
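The merge workaround described above (production V9 plus API-unsupported) can be sketched as a keyed merge in which the API-unsupported copy wins and V9 fills the gaps. This is an illustration, not David's code: `merge_sources` is hypothetical, and the sample values are toy data, not the real shape of `resourceContours` or `workPerDayList`.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def merge_sources(unsupported: List[Record],
                  v9: List[Record]) -> Tuple[Dict[Any, Record], List[Any]]:
    """Combine two retrievals of the same objects, keyed by ID.

    Records from API-unsupported win because they carry resourceContours;
    an ID found only in the V9 results falls back to the V9 record (with
    its averaged workPerDayList). Returns the merged dict and the IDs
    that fell back.
    """
    merged: Dict[Any, Record] = {r["ID"]: r for r in v9}
    from_unsupported = set()
    for record in unsupported:
        merged[record["ID"]] = record
        from_unsupported.add(record["ID"])
    fallback = sorted(i for i in merged if i not in from_unsupported)
    return merged, fallback


# Toy data: "t2" was lost from the API-unsupported results.
v9 = [{"ID": "t1", "workPerDayList": [4.0, 4.0]}, {"ID": "t2", "workPerDayList": [8.0]}]
unsupported = [{"ID": "t1", "resourceContours": [6.0, 2.0]}]
merged, fallback = merge_sources(unsupported, v9)
print(sorted(merged), fallback)  # ['t1', 't2'] ['t2']
```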

Along the lines of what @Doug Den Hoed - AtAppStore was thinking, it turns out that it is indeed possible to sort the results by adding the following to the query: &ID_Sort=asc. This example sorts by the ID field in ascending order; for descending order, use &ID_Sort=desc. Thanks to @Tyler Reid for digging up this info. He is looking into getting it added to the API documentation. Based on my experience, Doug, I'd strongly suggest you implement this to avoid potential issues. :) Cheers, David Cornwell
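Here is a small Python helper that adds the `ID_Sort` parameter from the tip above to every page request. The `$$FIRST`/`$$LIMIT` paging parameters, the `/attask/api/v9.0/task/search` path, and the assumption that the same `<field>_Sort` pattern works for other fields are not confirmed in this thread, so check them against the Workfront API documentation. The host name is a placeholder, and authentication is omitted.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def build_search_url(base_url: str, first: int, limit: int,
                     sort_field: str = "ID", direction: str = "asc",
                     fields: Optional[Sequence[str]] = None) -> str:
    """Build the URL for one page of a search with an explicit sort order."""
    params: Dict[str, object] = {
        "$$FIRST": first,                 # offset of the first result in this page
        "$$LIMIT": limit,                 # page size
        f"{sort_field}_Sort": direction,  # e.g. ID_Sort=asc
    }
    if fields:
        params["fields"] = ",".join(fields)
    # keep "$" and "," readable instead of percent-encoding them
    return f"{base_url}?{urlencode(params, safe='$,')}"


BASE = "https://example.my.workfront.com/attask/api/v9.0/task/search"  # placeholder host
print(build_search_url(BASE, first=2000, limit=2000))
# https://example.my.workfront.com/attask/api/v9.0/task/search?$$FIRST=2000&$$LIMIT=2000&ID_Sort=asc
```

The important part is that every page uses the same sort, so each request slices the same ordering of the matching rows.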

Hi David, Thanks for the tip. Since IDs are somewhat random (and tough to read), I think I'll sort by the RefNumber instead, which is numeric and always increases (albeit not necessarily sequentially). That reminds me: we had another theory as to why you (and others) might be seeing duplicates where we are not: "volume". To my knowledge, the API does not support the concept of a static cursor (i.e. return only what existed at the time I made my request), but instead always brings back, "live", whatever matches at the time of the next fetch. Although we make some pretty serious calls, perhaps yours are even larger, and/or are hitting data that is being edited while you are working (whereas ours typically run off hours, after locking out users for a Merge/Split). If so, and particularly without the sort order you've now discovered, perhaps that combination is leading to the duplicates you've observed, and sorting (e.g. by RefNumber) might help ensure that each unique row is picked up only once. Another option, slightly more work but perhaps even better insurance, could be to do a "fast" retrieval of only the unique IDs that match your initial request (effectively creating a static cursor), and then a second loop that drives off those unique values and pulls back the associated details in a more orderly (albeit still batched/chunked/next...) fashion. Regards, Doug Den Hoed - AtAppStore
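Doug's two-phase idea (snapshot the matching IDs, then fetch details for that fixed list) could be sketched as follows. Both callables are hypothetical: `fetch_id_page` would request only the ID field, one page at a time, and `fetch_details` would look up one batch of known IDs. How to filter by a list of IDs in a single Workfront request is left open here. Phase 1 is still an ordinary paged search, so it benefits from a sort order too.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def two_phase_fetch(fetch_id_page: Callable[[int, int], List[str]],
                    fetch_details: Callable[[Sequence[str]], List[Record]],
                    page_size: int = 2000,
                    batch_size: int = 100) -> List[Record]:
    """Snapshot the matching IDs first, then pull details for that fixed list."""
    # Phase 1: a "fast" pass that asks only for IDs (ideally with a sort order).
    ids: List[str] = []
    seen = set()
    first = 0
    while True:
        page = fetch_id_page(first, page_size)
        for object_id in page:
            if object_id not in seen:  # drop any repeated IDs
                seen.add(object_id)
                ids.append(object_id)
        if len(page) < page_size:
            break
        first += page_size

    # Phase 2: drive off the frozen ID list in batches.
    details: Dict[str, Record] = {}
    for start in range(0, len(ids), batch_size):
        for record in fetch_details(ids[start:start + batch_size]):
            details[record["ID"]] = record
    # Keep the snapshot order; skip IDs whose object vanished in between.
    return [details[i] for i in ids if i in details]


ALL_IDS = ["a", "b", "c", "d", "e"]
DB = {i: {"ID": i, "name": i.upper()} for i in ALL_IDS}


def fake_id_page(first: int, limit: int) -> List[str]:
    return ALL_IDS[first:first + limit]


def fake_details(batch: Sequence[str]) -> List[Record]:
    return [DB[i] for i in batch]


result = two_phase_fetch(fake_id_page, fake_details, page_size=2, batch_size=2)
print([r["name"] for r in result])  # ['A', 'B', 'C', 'D', 'E']
```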

Hi @Doug Den Hoed - AtAppStore, the reference number is a good idea if you're querying an object that has one, of course. However, some of my queries (assignments, hours) are on objects that don't have one. Still, sorting by a date or something other than ID would reduce the chance of reordering due to data edits while the queries are running. On that topic, I actually tested last week in our sandbox environment (where there is no user activity) and the duplicates still occurred, so they were not being caused by that. So the sort simply gives the database a consistent way to order the returned results, and that avoids the issue. It still doesn't explain why I was seeing such a large number of duplicates in Preview vs Production, but I'll let that rest, seeing as we have a solution. Cheers, David Cornwell
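A toy simulation helps show why a consistent order matters for offset paging. It assumes the server re-runs the query for every page and that, without an explicit sort, matching rows may come back in a different order on each request. This is a simplified model. It makes no claim about Workfront's actual internals, and the thread never established the root cause.

```python
import random
from typing import List


def paged_fetch(rows: List[int], page_size: int, sort: bool, rng: random.Random) -> List[int]:
    """Simulate offset paging where each page request re-evaluates the query.

    Without a sort, this toy server returns the rows in an order that may
    differ on every request; with a sort, the order is the same every time.
    """
    collected: List[int] = []
    for first in range(0, len(rows), page_size):
        if sort:
            ordered = sorted(rows)
        else:
            ordered = rows[:]
            rng.shuffle(ordered)  # no ORDER BY: any order is "valid"
        collected.extend(ordered[first:first + page_size])
    return collected


rows = list(range(1000))
rng = random.Random(0)
unsorted_ids = paged_fetch(rows, 100, sort=False, rng=rng)
sorted_ids = paged_fetch(rows, 100, sort=True, rng=rng)
print(len(unsorted_ids), len(set(unsorted_ids)))  # 1000 rows, typically far fewer unique IDs
print(len(sorted_ids), len(set(sorted_ids)))      # 1000 1000
```

Both runs return the same number of rows, which matches Martin's observation that the count stays the same while some entries are repeated and others are missing. Real databases are far less chaotic than a full shuffle, so the effect is much smaller in practice.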

And "pop" goes another theory, David. I suspect we'd need some detailed log tracing from the DBAs to get to the root cause of this one, but I agree: since adding a sort to an API query works around the sporadic duplication, that's Good Enough. Well done, glad you got it, and thanks for sharing! Regards, Doug Den Hoed - AtAppStore