A tokeniser using STL

Using the STL and the find functions of std::wstring (find_first_not_of and find_first_of), you can write a simple and extremely useful tokenise function.

#include <string>
#include <vector>

// Split stringToTokenise into tokens separated by any character in delimiters.
// Runs of consecutive delimiters are skipped, so no empty tokens are produced.
std::vector<std::wstring> Tokenise(const std::wstring& stringToTokenise, const std::wstring& delimiters)
{
	std::vector<std::wstring> tokens;
	size_t startPos = 0;
	size_t endPos = 0;

	// Get the tokens
	while(startPos != std::wstring::npos)
	{
		// Find the start of the next token, beginning from the end of the last one found
		startPos = stringToTokenise.find_first_not_of(delimiters, endPos);
		// Find the end of the next token, beginning from the start just found
		endPos = stringToTokenise.find_first_of(delimiters, startPos);

		// If a token wasn't found, don't try to extract it
		if(startPos != std::wstring::npos)
			tokens.push_back(stringToTokenise.substr(startPos, endPos - startPos));
	}

	return tokens;
}

Using the function is nice and simple:

std::vector<std::wstring> strings = 
    Tokenise(L"Custard Creams;Jaffa Cakes;Hobnobs", L";");

This will return a vector of crumbly deliciousness:

strings[0] = L"Custard Creams"
strings[1] = L"Jaffa Cakes"
strings[2] = L"Hobnobs"

You can also specify multiple delimiters to get the same result:

std::vector<std::wstring> strings = 
    Tokenise(L"Custard Creams?Jaffa Cakes!Hobnobs:)", L"?!:)");
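Nothing in the loop actually depends on the string being wide, so if you need the same thing for plain std::string, one option is to template it on the character type. This is only a sketch of that idea: the name TokeniseT is my own invention, not something from the STL.

```cpp
#include <string>
#include <vector>

// Hypothetical generalisation of Tokenise (TokeniseT is an illustrative name):
// the same find_first_not_of / find_first_of loop, templated on the character
// type so it accepts std::string as well as std::wstring.
template <typename CharT>
std::vector<std::basic_string<CharT>> TokeniseT(const std::basic_string<CharT>& stringToTokenise,
                                                const std::basic_string<CharT>& delimiters)
{
	using String = std::basic_string<CharT>;
	std::vector<String> tokens;
	typename String::size_type startPos = 0;
	typename String::size_type endPos = 0;

	while (startPos != String::npos)
	{
		// Skip any run of delimiters to find the start of the next token
		startPos = stringToTokenise.find_first_not_of(delimiters, endPos);
		// The token ends at the next delimiter (npos means end of string)
		endPos = stringToTokenise.find_first_of(delimiters, startPos);

		// Only extract a token if one was actually found
		if (startPos != String::npos)
			tokens.push_back(stringToTokenise.substr(startPos, endPos - startPos));
	}

	return tokens;
}
```

One catch: a string literal will not deduce CharT on its own, so spell the character type out, e.g. TokeniseT<wchar_t>(L"Custard Creams?Jaffa Cakes!Hobnobs:)", L"?!:)") or TokeniseT<char>("a,,b", ",").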

One thing it does not do is return an empty string for cases like this:

std::vector<std::wstring> strings = 
    Tokenise(L";;", L";");

This returns an empty vector, but in a strange parallel world it could return 3 empty strings. Should it? I will need convincing.
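For anyone who wants to argue the other side, here is roughly what the parallel-world version might look like. It is a sketch, and TokeniseKeepEmpty is a hypothetical name: every single delimiter character ends a token, so adjacent, leading or trailing delimiters produce empty strings.

```cpp
#include <string>
#include <vector>

// Hypothetical variant (not the function above): each delimiter character
// ends exactly one token, so N delimiters always give N + 1 tokens,
// some of which may be empty.
std::vector<std::wstring> TokeniseKeepEmpty(const std::wstring& stringToTokenise, const std::wstring& delimiters)
{
	std::vector<std::wstring> tokens;
	size_t startPos = 0;

	while (true)
	{
		// Find the delimiter that ends the current token (npos if none left)
		size_t endPos = stringToTokenise.find_first_of(delimiters, startPos);
		if (endPos == std::wstring::npos)
		{
			// Whatever follows the last delimiter is the final token,
			// even if it is empty
			tokens.push_back(stringToTokenise.substr(startPos));
			break;
		}
		tokens.push_back(stringToTokenise.substr(startPos, endPos - startPos));
		// The next token starts just past this single delimiter
		startPos = endPos + 1;
	}

	return tokens;
}
```

With this version TokeniseKeepEmpty(L";;", L";") gives three empty strings, and even an empty input gives one empty string rather than an empty vector, which is the sort of thing that would need convincing.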