(50g) Removing Objects From Lists Quickly

01052019, 05:16 PM
Post: #1




(50g) Removing Objects From Lists Quickly
Note: The examples in this post require the libraries GoferLists and ListExt. The last example also requires LSORT.
The GoferLists command Filter selects elements of a list on stack level 2 that meet criteria described by a user program on level 1. For instance: Code:
Filter can similarly be used to remove items from a list by inverting the logic of the program, e.g. 2 MOD NOT will return the even numbers from the list. If one wants to remove multiple instances of one object from a list, Filter can be used like so: Code:
but Filter is quite slow for that purpose. A much faster method is illustrated by the following short program using ListExt commands: Code:
with the list on level 2 and the object to be removed on level 1. If one wants to remove all instances of multiple objects from a list, one can use the GoferLists command Diff (set difference). Diff is reasonably fast for small lists but slows down appreciably for larger lists. The following program is a bit slower than Diff for small lists but more than 10 times as fast for lists containing hundreds of objects: Code:
A similar case involves removing duplicate objects from a list. The GoferLists command Nub does this, and so does the ListExt command LDDUP, which is quite a bit faster. Both commands preserve the order of the (unique) objects in the list. If preserving the order of list objects is not required or, especially, if sorting them is desirable, the following short sequence is much faster: Code:


01062019, 07:28 PM
Post: #2




RE: (50g) Removing Objects From Lists Quickly
(01052019 05:16 PM)John Keith Wrote: If one wants to remove all instances of multiple objects from a list, one can use the GoferLists command Diff (set difference). Diff is reasonably fast for small lists but slows down appreciably for larger lists. The following program is a bit slower than Diff for small lists but more than 10 times as fast for lists containing hundreds of objects: That's a clever approach! I doubt I'd have ever come up with something like that. I was curious to see how Diff worked, so I took a look at what it does. GoferList's Diff is essentially: Code: « SWAP « Delete » Foldl » Foldl internally uses STREAM, which is reasonably efficient, but I think the slowdown comes from repeated calls to Delete in this case. It looks for the matching item, and if found it then splits the source list into "before the found item" and "after the found item" sublists, then concatenates them. That results in the remaining list objects having to be copied into new objects two separate times for each found (common) object, which slows things down considerably. The ListExtbased program you came up with only has to replicate the source data twice (as opposed to twice per found object). The first replication occurs with LREPL, and the second with LRMOV. While looking through the Diff code, I also realized that it handles duplicates differently than the ListExt version. For example: Code: { 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 } ...gives this result: { 1 2 2 3 3 4 4 5 5 } Using the ListExt sequence with the same list objects yields an empty list result. I can see both possibilities being appropriate in different circumstances, though. Nice implementations, John! 

01072019, 05:19 PM
(This post was last modified: 01092019 11:10 PM by John Keith.)
Post: #3




RE: (50g) Removing Objects From Lists Quickly
Thanks, David.
From my (limited) understanding of set theory, the difference function (as well as intersection and, of course, union) should eliminate any duplicates. Set functions in Mathematica work this way. All three of those functions in GoferLists return lists with duplicate objects in some circumstances. I believe that this behavior is erroneous. (Maybe not see post #11 below.) Along those lines, I also came up with a similar program for set intersection. It is also much faster than GoferLists' Intersect: Code:
The order of commands may seem strange but gives correct results faster than any other program I could come up with. Others are encouraged to improve the code, and especially to test it for correctness. 

01072019, 08:05 PM
Post: #4




RE: (50g) Removing Objects From Lists Quickly
(01072019 05:19 PM)John Keith Wrote: From my (limited) understanding of set theory, the difference function (as well as intersection and, of course, union) should eliminate any duplicates. Set functions on the Prime and in Mathematica work this way. I'm not even remotely a mathematician, so I'll take your word for it that removing duplicates is the correct way to go. Python sets don't even allow duplicates to begin with, so you could also include that as another reference supporting that premise. (01072019 05:19 PM)John Keith Wrote: Along those lines, I also came up with a similar program for set intersection. It is also much faster than GoferLists' Intersect and never returns lists with duplicate objects: Once again, an interesting approach. You are indeed creative! I've got some ideas for implementations of Union/Difference/Intersection that involve parts of the underlying code behind LDDUP/LREPL/LRMOV. I'll add that investigation to my stack of "look at this sometime" items. Those would seem to be useful and a natural fit for ListExt. 

01072019, 10:36 PM
Post: #5




RE: (50g) Removing Objects From Lists Quickly  
01092019, 06:29 PM
Post: #6




RE: (50g) Removing Objects From Lists Quickly
So...
Union seems fairly straightforward, and in this case would be functionally similar to "+ LDDUP" (though implemented a bit more efficiently). Intersection doesn't have a shortandsweet equivalent, but seems functionally understandable as well (list of unique elements that are in both lists). Difference may need some clarification, though. Am I correct in assuming that the target here is what is described on this page as the "relative complement"? That would imply that order of arguments is important (ie. "difference" is not commutative). 

01092019, 08:31 PM
Post: #7




RE: (50g) Removing Objects From Lists Quickly
(01092019 06:29 PM)DavidM Wrote: Difference may need some clarification, though. Am I correct in assuming that the target here is what is described on this page as the "relative complement"? Yes. There's also a symmetric difference: 

01092019, 10:34 PM
Post: #8




RE: (50g) Removing Objects From Lists Quickly
(01092019 06:29 PM)DavidM Wrote: Difference may need some clarification, though. Am I correct in assuming that the target here is what is described on this page as the "relative complement"? That would imply that order of arguments is important (ie. "difference" is not commutative). There is definitely some confusing terminology in this area. "difference" as implemented by GoferLists as Diff and my difference program, is what Wikipedia refers to as "relative complement". Called Complement in Mathematica. Confusingly, the Prime's DIFFERENCE command implements symmetric difference, not complement. You are correct that set difference aka complement is noncommutative. 

01092019, 10:42 PM
Post: #9




RE: (50g) Removing Objects From Lists Quickly
(01092019 08:31 PM)Thomas Klemm Wrote:(01092019 06:29 PM)DavidM Wrote: Difference may need some clarification, though. Am I correct in assuming that the target here is what is described on this page as the "relative complement"? Yes, there is. Assuming my difference program is called DIFF: Code:
I'm sure this could be greatly improved by rewriting it directly rather than in terms of DIFF, but I haven't attempted to do so yet. 

01092019, 10:56 PM
Post: #10




RE: (50g) Removing Objects From Lists Quickly  
01092019, 11:06 PM
Post: #11




RE: (50g) Removing Objects From Lists Quickly
In addition to the terminological confusion, there seem to be several differences (pun intended) in the way these functions are implemented. In Mathematica, both Complement and Union return sorted lists; GoferLists and the Prime return lists containing elements in the original order that they occurred in the input lists.
Also, I have noticed that INTERSECT on the Prime does return lists with some duplicate elements if the first input list contains duplicates. Therefor I am removing the final LDDUP command in my intersection program in post #3. The program will be faster, and users can always execute LDDUP themselves afterwards if they wish. 

« Next Oldest  Next Newest »

User(s) browsing this thread: 1 Guest(s)