Parsing Your Input and Formatting Your Output in C++ Introduction While TopCoder problems are generally hard enough to solve as is, sometimes their solution is complicated with the necessity of extra data conversion: parsing input strings into numerical data to be processed, or formatting the final result as a string of special format. Such transformations are not as simple as they seem; if worst comes to worst, they can prevent one from solving the problem, and even the easiest cases can take enough time to lose precious points. At one point, while working on one of the Google CodeJam practice problems, I found myself just giving up after an extremely boring hour of parsing input without ever starting the solution itself. This article is focused on main fast-to-type and convenient means of parsing input and formatting output in cases that are typical for TopCoder problems. All methods are specific to C++, but Java and VB coders may find several helpful ideas. Typical problems The problem of parsing input arises when you are given a string (or a set of strings) of a given format that contains data you will need for your computations. The following parsing cases are the most frequent in TopCoder problems:
Problems in formatting outputs arise when you need to produce the output as a string of a particular format while including the results of numerical computations. Generally speaking, there are fewer commonly used formatting cases, including:
String fragmentation problem We are given string s, and we need to get and process all the text fragments it consists of. We assume that delimiters are spaces, though this assumption is essential only for the last approach (which uses stringstream). This problem allows manual parsing, though the corresponding code is rather lengthy. string tmp=""; int i=0; s+=" "; //so that we needn't check the last text fragment separately while (i<s.size()) { if (s[i]==' ') //check if the current char is a delimiter { if (tmp.size()>0) { // process the current fragment tmp tmp=""; } } else tmp+=s[i]; i++; }Using string type member functions find and find_first_not_of for searching the beginnings of spaces and text fragments makes the solution smarter. string tmp; int j=0,i=s.find_first_not_of(" ",j); while (i != string::npos) { j=s.find(" ",i); //first argument here... tmp=s.substr(i,j-i); i=s.find_first_not_of(" ",j); //...and here is a string of delimiters // process the current fragment tmp }Special function strtok searches delimiters in a string and replaces them with null character (one delimiter at a time): char *st,*buf, sep[]=" .,"; //sep is a string of delimiters buf=strdup(s.c_str()); st=strtok(buf,sep); while (st) { //process the current fragment st st=strtok(0,sep); }The solution that uses stringstream is the shortest and the smartest: string tmp; stringstream a; a<<s; while (a>>tmp) { //process the current fragment tmp }Extracting numbers We are given string s, and we need to extract and process all integers it contains. First we consider that we can neglect all other characters (like in HiddenNumbers (SRM 220, Div1 Easy, Div2 Medium). Then we can replace all characters except for numbers with spaces, and trim ending spaces: for(int i=0;i<s.sz;i++) if ((s[i]>'9')||(s[i]<'0')) s[i]=' '; s.erase(s.find_last_not_of(" "));Then we can use one of these approaches:
char *tek,st[20]; int i; strcpy(st,s.c_str()); tek=st; while (tek-st<s.sz) { i=strtod(tek,&tek); // process the current number i }The next is stringstream based, and doesn't even need to trim ending spaces: int i; stringstream a; a<<s; while (a>>i) { //process the current number i }The third uses the special functions strtok and atoi: char *st,*buf, sep[]=" "; int i; buf=strdup(s.c_str()); st=strtok(buf,sep); while (st) { i=atoi(st); //process the current number i st=strtok(0,sep); }If we need each piece of information that the string contains, we'll need some more information about the string format to be sure that we're getting the right pieces. Let's examine the parsing task in the SimpleCalculator (SRM 178, Div2 Easy). Here we have an integer n1, a character sign and one more integer n2. The parsing here can look like: sscanf(s.c_str(),"%d%c%d",&n1,&sign,&n2);or stringstream A; A<<s; A>>n1>>sign>>n2;Data conversion string to number We are given string s that is a notation of one number, and we need to extract this number into a variable d. First, let's consider a situation when the number is an integer. There are a number of ways to get the d: 0) d=0; for(int i=0; i<s.size(); i++) { d+=s[i]-'0'; d*=10; } 1) d=atoi(s.c_str()); 2) sscanf(s.c_str(),"%d",&d); 3) char *t; d=strtod(s.c_str(),&t); 4) stringstream A(s); A>>d;If we have a floating-point number, written in a standard way (i.e. not with a scientific notation, though many functions recognize that too) with an obligatory decimal point. Corresponding ways to get the number are: 0) d=0; int pos=0, i; for(i=0; s[i]!='.'; i++) d=(d+s[i]-'0')*10; pos=i; i++; for(; i<s.size(); i++) d+=(s[i]-'0')/pow(10,i-pos); 1) d=atof(s.c_str()); 2) sscanf(s.c_str(),"%f",&d); 3) stringstream A(s); A>>d;Formatting doubles We will consider the most common problem definition: we need to represent a positive double d as its notation with the given number of digits before and after decimal point. For simplicity sake let's assume that we need exactly 4 digits before the decimal point and 4 digits after (this task occurs in PointLifeGame (SRM 221, Div2 Medium)). Manual formatting is overly long and full of enumeration of possibilities that include round-up and adding leading and trailing zeroes. The sprintf function, however, is perfect for the task: char st[10]; sprintf(st,"%09.4f",d); //9 stands for total result length s=string(st);The stringstream can manage this task, too, though it takes several hard-to-remember flags: stringstream A; A<<setw(9)<<setfill('0')<<fixed<<setprecision(4)<<d; A>>s;Other special functions yield worse results. The …toa family doesn't support conversion from doubles at all. ecvt and fcvt support either the total number of digits or the number of digits after decimal point -- but not both -- and neither of the functions actually writes the decimal point. You could use these with lots of extra work, like checking the actual position of the decimal point and inserting it by hand, but even manual formatting is probably preferable. Managing time Let's consider one more problem that is an example of parsing and formatting time. Given a string time that contains notation of time in format "HH:MM", we need to extract integer numbers of hours h and of minutes m (cases that include seconds, "am" or "pm" modifiers, or don't include hours should be processed the same way). Manual parsing is short and elegant here, if we use the property of numeric characters: h = (time[0]-'0')*10+(time[1]-'0'); m = (time[3]-'0')*10+(time[4]-'0');Other approaches are not much smarter, because this task is extremely easy and can be performed in dozens of ways. For atoi the code is: h = atoi( time.c_str() ); m = atoi( time.c_str()+3 );For strtod it looks like this: char *end; h=strtod( time.c_str(), &end); m=strtod( time.c_str()+3, &end);For sscanf we get: sscanf( time.c_str(), "%d:%d",&h,&m);Using stringstream gives us: stringstream s; time[2]=' '; s<<time; s>>h>>m;Now let's consider a pair of integer numbers of hours h and minutes m, which we need to write to a string time as "HH:MM". This task is an example of writing an integer with leading zeros. Manual formatting is easy but comparatively long: time=" : "; time[0] = h/10+'0'; time[1] = h%10+'0'; time[3] = m/10+'0'; time[4] = m%10+'0';The problem with using the special functions here is that some of them don't give you a chance to set the resulting string's width. Thus, if you choose to use itoa, you will need to add zeros manually: time=""; char hs[3],ms[3]; itoa(h,hs,10); itoa(m,ms,10); if (strlen(hs)==1) time +="0"; time +=string(hs)+":"; if (strlen(ms)==1) time +="0"; time +=string(ms);You can use fcvt with no digits after decimal point, but it will also need adding zeros: time =""; int dec,sign; char *hs; hs=fcvt(h,0,&dec,&sign); if (strlen(hs)==1) time +='0'; time +=string(hs)+":"; hs=fcvt(m,0,&dec,&sign); if (strlen(hs)==1) time +="0"; time +=string(hs);sprintf fulfils the task easily, though using it requires data conversion from char[] to string: char st[6]; sprintf(st,"%02d:%02d",h,m); time=string(st);Using stringstream, with its format functions, gives us: #include <iomanip> stringstream A; A<<setw(2)<<setfill('0')<<h<<':'<<setw(2)<<setfill('0')<<m; A>>time;Should you use a template? The appeal of using pre-written templates is that, if you have written all the functions you might need for parsing and formatting beforehand, all you will need to do during the Coding Phase is just choose the best one to use. This approach unites the advantages of other methods, and it is extremely handy. The disadvantage, though, is that when the template is written its author can't be sure that it covers all the parsing and formatting problems they will face during their coder career. (And, as I'm pretty sure that our problem writers' imaginations are far from exhausted, we should expect to encounter many more exotic parsing tasks.) The sensible advice here is to take a compromise approach -- write a nice short template for cases that are hard to remember, and memorize some quick and multi-purpose methods for both typical and exotic cases. Experienced coders (tomek, John Dethridge, Eryx, bladerunner) include in their templates a function for string fragmentation into strings or integers, since such parsing tasks are the most frequent in TopCoder problems. Some coders, like bladerunner, have template functions for data conversion that encapsulate a call of some special function with the necessary temporary variables so that they needn't waste time on typing it. One can view their templates in any of their recent solutions. Taking the opposite approach is misof, who doesn't use a template at all; he remembers everything he needs by heart. Beware The problems of parsing and data conversion look rather simple once they are solved correctly, but they can still hold pitfalls. The following are several tricky cases. If you are formatting a double as its notation with a given number of fractional digits, you will need to use some round-up algorithm implemented by the function you've chosen. Remember, though, that the function's round-up might be different from the one the problem statement asks for, so check the boundary cases carefully and consider writing a more complicated conversion. Thus, in the PointLifeGame (SRM 221, Div2 Medium) you need to round the result down to the nearest ten-thousands. In the PiCalculator (SRM 246, Div1 Easy) "the last digit(s) should be rounded according to the standard rounding rules (less than five round down, more than or equal to five round up)". One more rounding-up problem might occur when working with negative doubles, though it is a rare case for TopCoder problems. I wouldn't say that you should format the numbers manually, but using a certain combination of floor, ceil and constants like 0.49 before writing the number to a string might be helpful in case of special rounding rules. Another pitfall is computational speed. Data conversion is a rather slow operation compared to numerical computations, so you shouldn't use them needlessly. For example consider the ApocalypseSomeday (TCHS SRM 2, Div1 Medium) problem. There are various ways to check whether the number contains "666" in its notation. If you convert the number into a string using stringstream and then search for "666" with find member function, you will run into the time limit on the maximum test. You should also take into account the fact that using different methods of data conversion takes significantly different time. If we compare times of conversions string to integer using the described above methods, we can see that atoi is the fastest, sscanf is about 3 times slower, strtod is 4 times slower than sscanf, and stringstream is 9 times slower than strtod. Therefore, if you need several thousands of data conversions, you should weight the chosen function against others. And finally, there exist problems concerned with parsing and formatting that aren't limited to them. Sometimes other knowledge is needed, or just reading the "Constraints" section carefully. Thus, in the PiCalculator (SRM 246, Div1 Easy) you should notice that double and even long double type is not enough to store all 25 digits of Pi, so you will need to write if (precision==1) return "3.1"; if (precision==2) return "3.14"; if (precision==3) return "3.142";…and so on. Eryx recalls a problem in which "some people had a problem with parsing, because they didn't notice that numbers can have more than one digit." Conclusion Different problems and pitfalls might be encountered in different parsing or formatting tasks, but, as misof pointed out, "the most common cause of mistakes is ... trying to do something by hand when there is a library function that does the job." A certain amount of practice is necessary for managing such tasks freely and confidently, however annoying it can be. Practice Room Listed below are a couple of TopCoder problems that illustrate the use of the parsing and formatting tasks outlined above. Some of them were mentioned in the write-up, others are a good practice for a novice. The problems are grouped roughly by the task they implement, though some problems need several ones. String fragmentation VLNString (SRM 226, Div2 Easy) SortBooks (SRM 262, Div2 Medium, Div1 Easy) TreasuresPacking (SRM 308, Div2 Hard) Recipe (SRM 265, Div1 Medium) String fragmentation with non-space delimiters DigitalDisplay (TCHS SRM 6, Div1 Easy) DirectoryTree (SRM 168, Div1 Medium) GradingGridIns (SRM 264, Div1 Medium) Numbers extracting SimpleCalculator (SRM 178, Div2 Easy) MassiveNumbers (SRM 236, Div2 Easy) TwoEquations (SRM 287, Div1 Easy, Div2 Medium) HiddenNumbers (SRM 220, Div1 Easy, Div2 Medium) Managing time VideoEncode (TCO05, Qual 1/3) SlowKeyboard (TCO06 Qual 7/9/14) Time (SRM 144, Div2 Easy) TimeCard (SRM 257, Div2 Hard) SlowDigitalClock (SRM 260, Div1 Medium) Formatting doubles FormatAmt (SRM 149, Div2 Easy) PointLifeGame (SRM 221, Div2 Medium) RoomSummary (CRPF Charity Round 1, Div1 Medium) Planets (TCO'03 Round1, Div1 Medium) |
|