MRW

Having fun with Perl-like lists in C++

This article shows how powerful C++ is at emulating other languages. It demonstrates how to get Perl's ease of list handling in C++, using operator overloading and a custom list class.

Perl versus C++

In Perl, there are list constructs like:

  @list = (1, 2, 3, 4);

In C++, the same list is defined as:

  std::list<int> l;
  l.push_back(1);
  l.push_back(2);
  l.push_back(3);
  l.push_back(4);

This is far less concise than in Perl. Moreover, iterating through a list in Perl is as simple as:

  foreach(@list) {
    print $_."\n";
  }

In C++ you need more code, and it is more complicated:

  for (std::list<int>::iterator it(l.begin()); it!=l.end(); ++it)
    std::cout<<*it<<std::endl;

This raises the two questions behind this article: is it possible to simplify this, and is it a good idea?

The stunning answer to the first question is yes! C++ is so powerful that it can emulate the Perl syntax almost exactly. Please note that this works without C macros: I very seldom use them, and they are forbidden in all C++ coding rules that I have edited.

C++ List with (nearly) Perl Syntax

If there is a class like Cont:

class Cont {
  public:
    int i;
    operator int() {
      return i;
    }
    Cont(int a): i(a) {
    }
};

Then this is correct C++ code:

  mrw::fun::List<Cont> l = (Cont(1), Cont(3), Cont(5), Cont(7), Cont(11));

It is also possible to iterate through the list, without the need for an iterator:

  while (++l) std::cout<<*l<<' ';

How did I do this and what's behind the class mrw::fun::List? The ideas behind this are:

  1. to concatenate lists with commas, overload operator,()
  2. to iterate without iterator, store an iterator internally in the list

Therefore the mrw::fun::List looks something like this:

template<typename T> class List {
  private:
    std::list<T> _list;
    typename std::list<T>::iterator _it;
    bool _reset;
  public:
    List(): _reset(true) {}
    // a._it points into a._list, so the copy starts a fresh iteration
    List(const List& a): _list(a._list), _reset(true) {}
    List& operator+=(const T& a) {
      _list.push_back(a);
      return *this;
    }
    bool operator++() {
      if (_reset)
        _it=_list.begin();
      else
        ++_it;
      return !(_reset = (_it==_list.end()));
    }
    T operator*() {
      return *_it;
    }
};

The class internally stores the std::list and an iterator into this list. The flag _reset records whether the next iteration has to restart at the beginning. The copy constructor simply copies another list. operator+= appends an element to the list. operator++ and operator* are used to iterate through the list: operator++ moves to the first element on its first call and to the next element on every further call. When it reaches the end, it returns false and sets _reset, so that the next loop starts over.
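
The behaviour described above can be tried out with a small sketch. It repeats the List class in condensed form and adds two hypothetical helpers, join() and demo(), which are not part of the original code. Initializing _it in the constructor and the assert in operator* are also additions of this sketch.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>

namespace mrw { namespace fun {

// Condensed version of the List class from the text.
template<typename T> class List {
  private:
    std::list<T> _list;
    typename std::list<T>::iterator _it;
    bool _reset;
  public:
    // Addition of this sketch: _it starts at end() instead of being singular.
    List(): _it(_list.end()), _reset(true) {}
    List& operator+=(const T& a) {
      _list.push_back(a);
      return *this;
    }
    bool operator++() {
      if (_reset)
        _it=_list.begin();
      else
        ++_it;
      return !(_reset = (_it==_list.end()));
    }
    T operator*() {
      assert(_it!=_list.end()); // addition: guard against reading past the end
      return *_it;
    }
};

} }

// Hypothetical helper: visits all elements through the internal iterator.
template<typename T> std::string join(mrw::fun::List<T>& l) {
  std::ostringstream out;
  while (++l) out<<*l<<' ';
  return out.str();
}

// Hypothetical demo: the loop can run twice, because reaching the end
// sets _reset and the next ++l starts at the first element again.
std::string demo() {
  mrw::fun::List<int> l;
  ((l+=1)+=2)+=3;
  return join(l)+"| "+join(l);
}
```

Calling demo() yields "1 2 3 | 1 2 3 ": both passes see the complete list, because the first pass ran to the end.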

The only thing missing is the ability to connect list elements with commas. This is done by overloading the comma operator twice, once to connect two values of the given type (here Cont), and once to add a value to an already existing list:

template<typename T> mrw::fun::List<T> operator,(T a, T b) {
  return (mrw::fun::List<T>()+=a)+=b;
}
template<typename T> mrw::fun::List<T> operator,(mrw::fun::List<T> a, T b) {
  return a+=b;
}

As a result, the line:

mrw::fun::List<Cont> l =
  (Cont(1), Cont(2), Cont(3), Cont(4));

is equivalent to:

mrw::fun::List<Cont> l =
  operator,(operator,(operator,(Cont(1), Cont(2)), Cont(3)), Cont(4));
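
Putting the pieces together, the following is a minimal sketch of the complete comma example. The class Cont and both operator,() overloads are taken from the text. The function demo(), the initialization of _it, the resetting copy constructor and the assert in operator* are assumptions of this sketch.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>

namespace mrw { namespace fun {

template<typename T> class List {
  private:
    std::list<T> _list;
    typename std::list<T>::iterator _it;
    bool _reset;
  public:
    List(): _it(_list.end()), _reset(true) {}
    // Assumption of this sketch: a copy always starts a fresh iteration.
    List(const List& a): _list(a._list), _it(_list.end()), _reset(true) {}
    List& operator+=(const T& a) {
      _list.push_back(a);
      return *this;
    }
    bool operator++() {
      if (_reset)
        _it=_list.begin();
      else
        ++_it;
      return !(_reset = (_it==_list.end()));
    }
    T operator*() {
      assert(_it!=_list.end()); // addition: guard against reading past the end
      return *_it;
    }
};

} }

class Cont {
  public:
    int i;
    operator int() {
      return i;
    }
    Cont(int a): i(a) {
    }
};

// Two values of the same type start a new list.
template<typename T> mrw::fun::List<T> operator,(T a, T b) {
  return (mrw::fun::List<T>()+=a)+=b;
}
// A further value is appended to an already existing list.
template<typename T> mrw::fun::List<T> operator,(mrw::fun::List<T> a, T b) {
  return a+=b;
}

// Hypothetical demo: builds the list with commas and prints it.
std::string demo() {
  mrw::fun::List<Cont> l = (Cont(1), Cont(2), Cont(3), Cont(4));
  std::ostringstream out;
  while (++l) {
    int v = *l; // uses Cont::operator int()
    out<<v<<' ';
  }
  return out.str();
}
```

Here demo() returns "1 2 3 4 ". The first comma selects the (T, T) overload, every following comma selects the (List<T>, T) overload.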

Is this a Good Idea?

Now we have a Perl-like list for elements of the class Cont or of any other class. But can we also construct a list from comma-separated int values, like mrw::fun::List<Cont> l = (1, 2, 3, 4);? The answer is no! C++ does not allow overloading operator,() for built-in types: an overloaded operator needs at least one operand of class or enumeration type, so (1, 2, 3, 4) always uses the built-in comma operator and simply evaluates to 4. Overloading the comma operator also carries risks and side effects of its own, because the overloaded operator,() may silently be picked up in places where the programmer expects the standard behaviour. So it is definitely not a good idea to use operator,() for this purpose.
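
What the built-in comma operator does with int operands can be made visible with a short sketch. The names calls, record() and builtinComma() are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper that records every call, to make the evaluation visible.
std::vector<int> calls;
int record(int v) {
  calls.push_back(v);
  return v;
}

// Hypothetical demo: for int operands the built-in comma operator applies.
// It evaluates every operand from left to right and keeps only the last value,
// so no list is built and no user-defined operator,() can step in.
int builtinComma() {
  calls.clear();
  int x = (record(1), record(2), record(3), record(4));
  assert(calls.size()==4); // all four operands were evaluated
  return x;
}
```

builtinComma() returns 4: the first three values are computed and thrown away.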

The second part of our feature, the integrated iterator with the simplified while loop, has two drawbacks. First, it is not thread safe. Second, if you leave the while loop before the end has been reached, the iterator is not reset and the next iteration continues where the previous one was aborted.

Consider the following code:

   while (++l) if (*l == "hello") break;
   std::cout<<"found: "<<*l<<" in:"<<std::endl;
   while (++l) std::cout<<*l<<' ';
   std::cout<<std::endl;

Here the second while does not start at the beginning of the list l, but after the found element. So, between the two while loops, there must be a call like l.reset() that sets the _reset flag. Placing an internal iterator in a list is therefore not really a bad idea, but the programmer must know what he is doing, i.e. never access the iterator from two separate threads and know when a reset method has to be called.
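
The effect of the missing reset can be reproduced with the following sketch. The text only names a call like l.reset(), so the body of reset() shown here is an assumption, as are the helper searchThenPrint(), the initialization of _it and the assert in operator*.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>

namespace mrw { namespace fun {

template<typename T> class List {
  private:
    std::list<T> _list;
    typename std::list<T>::iterator _it;
    bool _reset;
  public:
    List(): _it(_list.end()), _reset(true) {}
    List& operator+=(const T& a) {
      _list.push_back(a);
      return *this;
    }
    bool operator++() {
      if (_reset)
        _it=_list.begin();
      else
        ++_it;
      return !(_reset = (_it==_list.end()));
    }
    T operator*() {
      assert(_it!=_list.end()); // addition: guard against reading past the end
      return *_it;
    }
    // Assumed implementation: the next ++ starts at the first element again.
    void reset() {
      _reset = true;
    }
};

} }

// Hypothetical helper: searches for "hello", leaves the loop early and then
// prints the list, with or without a reset between the two loops.
std::string searchThenPrint(bool doReset) {
  mrw::fun::List<std::string> l;
  l+="a"; l+="hello"; l+="b"; l+="c";
  while (++l) if (*l == "hello") break;
  if (doReset) l.reset();
  std::ostringstream out;
  while (++l) out<<*l<<' ';
  return out.str();
}
```

searchThenPrint(false) returns "b c ", because the second loop continues after the found element. searchThenPrint(true) returns the whole list, "a hello b c ".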

A Better Alternative

Instead of overloading operator,(), it is better to overload operator<<. Also, operator<< should be a member of the class, not a global function, so add the methods:

template<typename T> class List {
  public:
    List(): _reset(true) {};
    List& operator<<(const T& a) {
      _list.push_back(a);
      return *this;
    }
    List& operator<<(const List& a) {
      _list.insert(_list.end(), a._list.begin(), a._list.end());
      return *this;
    }
};

Now, a list of int can easily be set up:

  mrw::fun::List<int> l = mrw::fun::List<int>()<<1<<3<<5<<7<<11;
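
For completeness, here is a minimal sketch that combines the two operator<< methods with the internal iterator, so that the line above can be compiled and tried. The function demo(), the initialization of _it, the resetting copy constructor and the assert in operator* are assumptions of this sketch.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>

namespace mrw { namespace fun {

template<typename T> class List {
  private:
    std::list<T> _list;
    typename std::list<T>::iterator _it;
    bool _reset;
  public:
    List(): _it(_list.end()), _reset(true) {}
    // Assumption of this sketch: a copy always starts a fresh iteration.
    List(const List& a): _list(a._list), _it(_list.end()), _reset(true) {}
    // Appends a single element.
    List& operator<<(const T& a) {
      _list.push_back(a);
      return *this;
    }
    // Appends all elements of another list.
    List& operator<<(const List& a) {
      _list.insert(_list.end(), a._list.begin(), a._list.end());
      return *this;
    }
    bool operator++() {
      if (_reset)
        _it=_list.begin();
      else
        ++_it;
      return !(_reset = (_it==_list.end()));
    }
    T operator*() {
      assert(_it!=_list.end()); // addition: guard against reading past the end
      return *_it;
    }
};

} }

// Hypothetical demo: works for plain int, which the comma variant could not do.
std::string demo() {
  mrw::fun::List<int> odd = mrw::fun::List<int>()<<1<<3<<5;
  mrw::fun::List<int> l = mrw::fun::List<int>()<<0<<odd<<7;
  std::ostringstream out;
  while (++l) out<<*l<<' ';
  return out.str();
}
```

demo() returns "0 1 3 5 7 ". Single values and whole lists can be mixed in one chain, because operator<< is overloaded for both.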